that precedes the relational information in Figure 14.9b and the one that follows it — to constrain the search. This wrapper is a specific example of a generalized structure that parses a page into a head, followed by a sequence of relational items, followed by a tail, where specific delimiters are used to signal the end of the head, the items themselves, and the beginning of the tail. It is possible to infer such wrappers by induction from examples that comprise a set of pages and tuples representing the information derived from each page. This can be done by iterating over all choices of delimiters, stopping when a consistent wrapper is encountered.

FIGURE 14.9 Web page, underlying HTML, and wrapper extracting relational information.
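The extraction procedure of Figure 14.9(c) can be sketched in a few lines of Python. This is a minimal illustration of the head/items/tail wrapper idea, not the exact procedure from the figure: the HTML delimiter strings used below (the heading tag and the bold/italic pairs) are hypothetical stand-ins, since the actual tags belong to the example page.

```python
# A minimal sketch of a head/items/tail wrapper, in the spirit of the
# ExtractCountryCodes procedure in Figure 14.9(c). The delimiter strings
# below (head_marker and the [start, end] pairs) are hypothetical stand-ins;
# a real wrapper would use the tags of the target page.

def extract_pairs(page, head_marker, name_delims, value_delims):
    """Extract (name, value) tuples from the region following head_marker."""
    pos = page.find(head_marker)
    if pos < 0:
        return []
    pos += len(head_marker)
    tuples = []
    while True:
        record = []
        for start, end in (name_delims, value_delims):
            s = page.find(start, pos)
            if s < 0:
                return tuples          # no more items: we have reached the tail
            s += len(start)
            e = page.find(end, s)
            if e < 0:
                return tuples
            record.append(page[s:e].strip())
            pos = e + len(end)
        tuples.append(tuple(record))

# Usage on a toy page resembling Figure 14.9(b):
page = "<h1>Some Country Codes</h1><b>Congo</b> <i>242</i><b>Egypt</b> <i>20</i><hr>"
print(extract_pairs(page, "</h1>", ("<b>", "</b>"), ("<i>", "</i>")))
# [('Congo', '242'), ('Egypt', '20')]
```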
One advantage of automatic wrapper induction is that recognition then depends on a minimal set of cues, providing some defense against extraneous text and markers in the input. Another is that when errors are caused by stylistic variants, it is a simple matter to add these to the training data and reinduce a new wrapper that takes them into account.

14.3.1.1 Document Clustering with Links

Document clustering techniques are normally based on the documents' textual similarity. However, the hyperlink structure of Web documents, encapsulated in the "link graph" in which nodes are Web pages and links are hyperlinks between them, can be used as a different basis for clustering. Many standard graph clustering and partitioning techniques are applicable (e.g., Hendrickson and Leland [1995]). Link-based clustering schemes typically use factors such as these:
• The number of hyperlinks that must be followed to travel in the Web from one document to the other
• The number of common ancestors of the two documents, weighted by their ancestry distance
• The number of common descendants of the documents, similarly weighted
These can be combined into an overall similarity measure between documents. In practice, a textual similarity measure is usually incorporated as well, to yield a hybrid clustering scheme that takes account of both the documents' content and their linkage structure. The overall similarity may then be determined as the weighted sum of four factors (e.g., Weiss et al. [1996]). Clearly, such a measure will be sensitive to the stylistic characteristics of the documents and their linkage structure, and given the number of parameters involved, there is considerable scope for tuning to maximize performance on particular data sets.

14.3.1.2 Determining "Authority" of Web Documents

The Web's linkage structure is a valuable source of information that reflects the popularity, sometimes interpreted as "importance," "authority," or "status," of Web pages. For each page, a numeric rank is computed. The basic premise is that highly ranked pages are ones that are cited, or pointed to, by many other pages. Consideration is also given to (1) the rank of the citing page, to reflect the fact that a citation by a highly ranked page is a better indication of quality than one from a lesser page, and (2) the number of outlinks from the citing page, to prevent a highly ranked page from artificially magnifying its influence simply by containing a large number of pointers. This leads to a simple algebraic equation to determine the rank of each member of a set of hyperlinked pages [Brin and Page, 1998]. Complications arise because some links are "broken" in that they lead to nonexistent pages, and because the Web is not fully connected, but these are easily overcome. Such techniques are widely used by search engines (e.g., Google) to determine how to sort the hits associated with any given query. They provide a social measure of status that relates to standard techniques developed by social scientists for measuring and analyzing social networks [Wasserman and Faust, 1994].
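The rank computation described above can be illustrated with a short power-iteration sketch. This is a generic rendering of the idea in Brin and Page [1998], not the algorithm of any particular search engine; the damping factor of 0.85 and the toy link graph are illustrative assumptions.

```python
# A generic power-iteration sketch of the rank computation described above.
# The damping factor (0.85) and the toy link graph are illustrative choices.

def page_rank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it points to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p, outlinks in links.items():
            if not outlinks:                      # dangling page: spread its rank evenly
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
            else:                                 # each citation passes rank / outdegree
                share = damping * rank[p] / len(outlinks)
                for q in outlinks:
                    new_rank[q] += share
        rank = new_rank
    return rank

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
print(sorted(page_rank(links).items(), key=lambda kv: -kv[1]))
```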
14.4 Human Text Mining

All scientific researchers are expected to use the literature as a major source of information during the course of their work to provide new ideas and supplement their laboratory studies. However, some feel that this can be taken further: that new information, or at least new hypotheses, can be derived directly from the literature by researchers who are expert in information seeking but not necessarily in the subject matter itself. Subject-matter experts can only read a small part of what is published in their fields and are often unaware of developments in related fields. Information researchers can seek useful linkages between related literatures that may be previously unknown, particularly if there is little explicit cross-reference between the literatures. We briefly sketch an example to indicate what automatic text mining may eventually aspire to but is nowhere near achieving yet. By analyzing chains of causal implication within the medical literature, new hypotheses for causes of rare diseases have been discovered, some of which have received supporting experimental evidence [Swanson, 1987; Swanson and Smalheiser, 1997].
While investigating causes of migraine headaches, Swanson extracted information from titles of articles in the biomedical literature, leading to clues like these:
• Stress is associated with migraines
• Stress can lead to loss of magnesium
• Calcium channel blockers prevent some migraines
• Magnesium is a natural calcium channel blocker
• Spreading cortical depression is implicated in some migraines
• High levels of magnesium inhibit spreading cortical depression
• Migraine patients have high platelet aggregability
• Magnesium can suppress platelet aggregability
These clues suggest that magnesium deficiency may play a role in some kinds of migraine headache, a hypothesis that did not exist in the literature at the time Swanson found these links. Thus a new and plausible medical hypothesis was derived from a combination of text fragments and the information researcher's background knowledge. Of course, the hypothesis still had to be tested via nontextual means.
14.5 Techniques and Tools

Text mining systems use a broad spectrum of different approaches and techniques, partly because of the great scope of text mining and consequent diversity of systems that perform it, and partly because the field is so young that dominant methodologies have not yet emerged.
14.5.1 High-Level Issues: Training vs. Knowledge Engineering

There is an important distinction between systems that use an automatic training approach to spot patterns in data and ones that are based on a knowledge engineering approach and use rules formulated by human experts. This distinction recurs throughout the field but is particularly stark in the areas of entity extraction and information extraction. For example, systems that extract personal names can use handcrafted rules derived from everyday experience. Simple and obvious rules involve capitalization, punctuation, single-letter initials, and titles; more complex ones take account of baronial prefixes and foreign forms. Alternatively, names could be manually marked up in a set of training documents and machine-learning techniques used to infer rules that apply to test documents.

In general, the knowledge-engineering approach requires a relatively high level of human expertise — a human expert who knows the domain and the information extraction system well enough to formulate high-quality rules. Formulating good rules is a demanding and time-consuming task for human experts and involves many cycles of formulating, testing, and adjusting the rules so that they perform well on new data. Markup for automatic training is clerical work that requires only the ability to recognize the entities in question when they occur. However, it is a demanding task because large volumes are needed for good performance.

Some learning systems can leverage unmarked training data to improve the results obtained from a relatively small training set. For example, an experiment in document categorization used a small number of labeled documents to produce an initial model, which was then used to assign probabilistically weighted class labels to unlabeled documents [Nigam et al., 1998]. Then a new classifier was produced using all the documents as training data. The procedure was iterated until the classifier remained unchanged. Another possibility is to bootstrap learning based on two different and mutually reinforcing perspectives on the data, an idea called "co-training" [Blum and Mitchell, 1998].
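The iterative labeling procedure of Nigam et al. [1998] can be sketched roughly as follows. This simplified version assigns hard labels to the unlabeled documents rather than the probabilistically weighted labels used in the original work, and the choice of a Naive Bayes bag-of-words classifier (via scikit-learn) is an illustrative assumption.

```python
# A schematic of the iterative labeling loop described above: train on the
# labeled documents, label the unlabeled ones, retrain on everything, and
# repeat until the assigned labels stop changing. Using hard labels and a
# Naive Bayes bag-of-words classifier is a simplification for illustration.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

def self_train(labeled_docs, labels, unlabeled_docs, max_rounds=20):
    vec = CountVectorizer()
    X_all = vec.fit_transform(labeled_docs + unlabeled_docs)
    X_lab, X_unlab = X_all[:len(labeled_docs)], X_all[len(labeled_docs):]
    model = MultinomialNB().fit(X_lab, labels)          # initial model from labeled data
    assigned = model.predict(X_unlab)                   # provisional labels
    for _ in range(max_rounds):
        model = MultinomialNB().fit(X_all, np.concatenate([labels, assigned]))
        new_assigned = model.predict(X_unlab)
        if np.array_equal(new_assigned, assigned):      # classifier unchanged: stop
            break
        assigned = new_assigned
    return model, assigned
```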
14.5.2 Low-Level Issues: Token Identification

Dealing with natural language involves some rather mundane decisions that nevertheless strongly affect the success of the outcome.
Tokenization, or splitting the input into words, is an important first step that seems easy but is fraught with small decisions: how to deal with apostrophes and hyphens, capitalization, punctuation, numbers, alphanumeric strings, whether the amount of white space is significant, whether to impose a maximum length on tokens, what to do with nonprinting characters, and so on. It may be beneficial to perform some rudimentary morphological analysis on the tokens — removing suffixes [Porter, 1980] or representing them as words separate from the stem — which can be quite complex and is strongly language-dependent. Tokens may be standardized by using a dictionary to map different, but equivalent, variants of a term into a single canonical form. Some text-mining applications (e.g., text summarization) split the input into sentences and even paragraphs, which again involves mundane decisions about delimiters, capitalization, and nonstandard characters.

Once the input is tokenized, some level of syntactic processing is usually required. The simplest operation is to remove stop words, which are words that perform well-defined syntactic roles but from a nonlinguistic point of view do not carry information. Another is to identify common phrases and map them into single features. The resulting representation of the text as a sequence of word features is commonly used in many text-mining systems (e.g., for information extraction).

14.5.2.1 Basic Techniques

Tokenizing a document and discarding all sequential information yield the "bag of words" representation mentioned above under document retrieval. Great effort has been invested over the years in a quest for document similarity measures based on this representation. One is to count the number of terms in common between the documents: this is called coordinate matching. This representation, in conjunction with standard classification systems from machine learning (e.g., Naïve Bayes and Support Vector Machines; see Witten and Frank [2000]), underlies most text categorization systems. It is often more effective to weight words in two ways: first by the number of documents in the entire collection in which they appear ("document frequency") on the basis that frequent words carry less information than rare ones; second by the number of times they appear in the particular documents in question ("term frequency"). These effects can be combined by multiplying the term frequency by the inverse document frequency, leading to a standard family of document similarity measures (often called "tf × idf"). These form the basis of standard text categorization and information retrieval systems.
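A minimal sketch of the tf × idf weighting and a cosine-style similarity over the bag-of-words representation follows; the logarithmic form of the inverse document frequency used here is one common variant among several.

```python
# A small sketch of tf x idf weighting and cosine similarity over the
# bag-of-words representation described above. The log form of the inverse
# document frequency is one common variant among several.

import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))   # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)                                     # term frequency
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

docs = [["text", "mining", "text"], ["web", "mining"], ["web", "usage", "mining"]]
v = tfidf_vectors(docs)
print(round(cosine(v[1], v[2]), 3))
```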
A further step is to perform a syntactic analysis and tag each word with its part of speech. This helps to disambiguate different senses of a word and to eliminate incorrect analyses caused by rare word senses. Some part-of-speech taggers are rule based, while others are statistically based [Garside et al., 1987] — this reflects the "training" vs. "knowledge engineering" distinction referred to earlier. In either case, results are correct about 95% of the time, which may not be enough to resolve the ambiguity problems.

Another basic technique for dealing with sequences of words or other items is to use Hidden Markov Models (HMMs). These are probabilistic finite-state models that "parse" an input sequence by tracking its flow through the model. This is done in a probabilistic sense so that the model's current state is represented not by a particular unique state but by a probability distribution over all states. Frequently, the initial state is unknown or "hidden," and must itself be represented by a probability distribution. Each new token in the input affects this distribution in a way that depends on the structure and parameters of the model. Eventually, the overwhelming majority of the probability may be concentrated on one particular state, which serves to disambiguate the initial state and indeed the entire trajectory of state transitions corresponding to the input sequence. Trainable part-of-speech taggers are based on this idea: the states correspond to parts of speech (e.g., Brill [1992]). HMMs can easily be built from training sequences in which each token is pre-tagged with its state. However, the manual effort involved in tagging training sequences is often prohibitive. There exists a "relaxation" algorithm that takes untagged training sequences and produces a corresponding HMM [Rabiner, 1989]. Such techniques have been used in text mining, for example, to extract references from plain text [McCallum et al., 1999].
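The state-distribution tracking described above is the forward pass of an HMM. The following minimal sketch uses a hypothetical two-state model whose transition and emission probabilities are made-up numbers, not parameters of any cited tagger.

```python
# A minimal sketch of the state-distribution tracking described above: the
# forward pass of an HMM. The two states and all probabilities are made-up
# illustrative numbers, not taken from any cited tagger.

import numpy as np

start = np.array([0.5, 0.5])                     # hidden initial state: a distribution
transition = np.array([[0.7, 0.3],               # P(next state | current state)
                       [0.4, 0.6]])
emission = {"the": np.array([0.6, 0.1]),         # P(token | state)
            "dog": np.array([0.1, 0.5]),
            "runs": np.array([0.1, 0.4])}

def forward(tokens):
    """Return the state distribution after observing the token sequence."""
    belief = start.copy()
    for tok in tokens:
        belief = (belief @ transition) * emission[tok]   # predict, then weight by evidence
        belief /= belief.sum()                           # renormalize to a distribution
    return belief

print(forward(["the", "dog", "runs"]))   # probability mass concentrates on one state
```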
If the source documents are hypertext, there are various basic techniques for analyzing the linkage structure. One, evaluating page rank to determine a numeric "importance" for each page, was described above. Another is to decompose pages into "hubs" and "authorities" [Kleinberg, 1999]. These are recursively defined as follows: A good hub is a page that points to many good authorities, while a good authority is a page pointed to by many good hubs. This mutually reinforcing relationship can be evaluated using an iterative relaxation procedure. The result can be used to select documents that contain authoritative content to use as a basis for text mining, discarding all those Web pages that simply contain lists of pointers to other pages.

14.5.2.2 Tools

There is a plethora of software tools to help with the basic processes of text mining. A comprehensive and useful resource at nlp.stanford.edu/links/statnlp.html lists taggers, parsers, language models, and concordances; several different corpora (large collections, particular languages, etc.); dictionaries, lexical, and morphological resources; software modules for handling XML and SGML documents; and other relevant resources such as courses, mailing lists, people, and societies. It classifies software as freely downloadable and commercially available, with several intermediate categories.

One particular framework and development environment for text mining, called General Architecture for Text Engineering or GATE [Cunningham, 2002], aims to help users develop, evaluate, and deploy systems for what the authors term "language engineering." It provides support not just for standard text-mining applications such as information extraction but also for tasks such as building and annotating corpora and evaluating the applications.

At the lowest level, GATE supports a variety of formats including XML, RTF, HTML, SGML, email, and plain text, converting them into a single unified model that also supports annotation. There are three storage mechanisms: a relational database, a serialized Java object, and an XML-based internal format; documents can be reexported into their original format with or without annotations. Text encoding is based on Unicode to provide support for multilingual data processing, so that systems developed with GATE can be ported to new languages with no additional overhead apart from the development of the resources needed for the specific language.

GATE includes a tokenizer and a sentence splitter. It incorporates a part-of-speech tagger and a gazetteer that includes lists of cities, organizations, days of the week, etc. It has a semantic tagger that applies handcrafted rules written in a language in which patterns can be described and annotations created as a result. Patterns can be specified by giving a particular text string, or annotations that have previously been created by modules such as the tokenizer, gazetteer, or document format analysis. It also includes semantic modules that recognize relations between entities and detect coreference. It contains tools for creating new language resources and for evaluating the performance of text-mining systems developed with GATE.

One application of GATE is a system for entity extraction of names that is capable of processing texts from widely different domains and genres. This has been used to perform recognition and tracking tasks of named, nominal, and pronominal entities in several types of text. GATE has also been used to produce formal annotations about important events in a text commentary that accompanies football video program material.
14.6 Conclusion

Text mining is a burgeoning technology that is still, because of its newness and intrinsic difficulty, in a fluid state — akin, perhaps, to the state of machine learning in the mid-1980s. Generally accepted characterizations of what it covers do not yet exist. When the term is broadly interpreted, many different problems and techniques come under its ambit. In most cases, it is difficult to provide general and meaningful evaluations because the task is highly sensitive to the particular text under consideration. Document classification, entity extraction, and filling templates that correspond to given relationships between entities are all central text-mining operations that have been extensively studied. Using structured data such as Web pages rather than plain text as the input opens up new possibilities for extracting information from individual pages and large networks of pages.
Automatic text-mining techniques have a long way to go before they rival the ability of people, even without any special domain knowledge, to glean information from large document collections.
References Agarwal, R. and Srikant, R. (1994) Fast algorithms for mining association rules. Proceedings of the International Conference on Very Large Databases VLDB-94. Santiago, Chile, pp. 487–499. Aone, C., Bennett, S.W., and Gorlinsky, J. (1996) Multi-media fusion through application of machine learning and NLP. Proceedings of the AAAI Symposium on Machine Learning in Information Access. Stanford, CA. Appelt, D.E. (1999) Introduction to information extraction technology. Tutorial, International Joint Conference on Artificial Intelligence IJCAI’99. Morgan Kaufmann, San Francisco. Tutorial notes available at www.ai.sri.com/~appelt/ie-tutorial. Apte, C., Damerau, F.J., and Weiss, S.M. (1994) Automated learning of decision rules for text categorization. ACM Trans Information Systems, Vol. 12, No. 3, pp. 233–251. Baeza-Yates, R. and Ribiero-Neto, B. (1999) Modern information retrieval. Addison-Wesley Longman, Essex, U.K. Bell, T.C., Cleary, J.G. and Witten, I.H. (1990) Text Compression. Prentice Hall, Englewood Cliffs, NJ. Blum, A. and Mitchell, T. (1998) Combining labeled and unlabeled data with co-training. Proceedings of the Conference on Computational Learning Theory COLT-98. Madison, WI, pp. 92–100. Borko, H. and Bernier, C.L. (1975) Abstracting concepts and methods. Academic Press, San Diego, CA. Brill, E. (1992) A simple rule-based part of speech tagger. Proceedings of the Conference on Applied Natural Language Processing ANLP-92. Trento, Italy, pp. 152–155. Brin, S. and Page, L. (1998) The anatomy of a large-scale hypertextual Web search engine. Proceedings of the World Wide Web Conference WWW-7. In Computer Networks and ISDN Systems, Vol. 30, No. 1–7, pp. 107–117. Califf, M.E. and Mooney, R.J. (1999) Relational learning of pattern-match rules for information extraction. Proceedings of the National Conference on Artificial Intelligence AAAI-99. Orlando, FL, pp. 328–334. Cavnar, W.B. and Trenkle, J.M. (1994) N-Gram-based text categorization. Proceedings of the Symposium on Document Analysis and Information Retrieval. Las Vegas, NV, pp. 161–175. Cheeseman, P., Kelly, J., Self, M., Stutz., J., Taylor, W., and Freeman, D. (1988) AUTOCLASS: A Bayesian classification system. Proceedings of the International Conference on Machine Learning ICML-88. San Mateo, CA, pp. 54–64. Cohen, W.W. (1995) Fast effective rule induction. Proceedings of the International Conference on Machine Learning ICML-95. Tarragona, Catalonia, Spain, pp. 115–123. Cunningham, H. (2002) GATE, a General Architecture for Text Engineering. Computing and the Humanities, Vol. 36, pp. 223–254. Dumais, S.T., Platt, J., Heckerman, D., and Sahami, M. (1998) Inductive learning algorithms and representations for text categorization. Proceedings of the International Conference on Information and Knowledge Management CIKM-98. Bethesda, MD, pp. 148–155. Efron, B. and Thisted, R. (1976) Estimating the number of unseen species: how many words did Shakespeare know? Biometrika, Vol. 63, No. 3, pp. 435–447. Fisher, D. (1987) Knowledge acquisition via incremental conceptual clustering. Machine Learning, Vol. 2, pp. 139–172. Frank, E., Paynter, G., Witten, I.H., Gutwin, C., and Nevill-Manning, C. (1999) Domain-specific keyphrase extraction. Proceedings of the International Joint Conference on Artificial Intelligence IJCAI99. Stockholm, Sweden, pp. 668–673. Franklin, D. (2002) New software instantly connects key bits of data that once eluded teams of researchers. Time, December 23.
Freitag, D. (2000) Machine learning for information extraction in informal domains. Machine Learning, Vol. 39, No. 2/3, pp. 169–202. Garside, R., Leech, G., and Sampson, G. (1987) The Computational Analysis of English: A Corpus-Based Approach. Longman, London. Green, C.L., and Edwards, P. (1996) Using machine learning to enhance software tools for Internet information management. Proceedings of the AAAI Workshop on Internet Based Information Systems. Portland, OR, pp. 48–56. Grefenstette, G. (1995) Comparing two language identification schemes. Proceedings of the International Conference on Statistical Analysis of Textual Data JADT-95. Rome, Italy. Harman, D.K. (1995) Overview of the third text retrieval conference. In Proceedings of the Text Retrieval Conference TREC-3. National Institute of Standards, Gaithersburg, MD, pp. 1–19. Hayes, P.J., Andersen, P.M., Nirenburg, I.B., and Schmandt, L.M. (1990) Tcs: a shell for content-based text categorization. Proceedings of the IEEE Conference on Artificial Intelligence Applications CAIA90. Santa Barbara, CA, pp. 320–326. Hearst, M.A. (1999) Untangling text mining. Proceedings of the Annual Meeting of the Association for Computational Linguistics ACL99. University of Maryland, College Park, MD, June. Hendrickson, B. and Leland, R.W. (1995) A multi-level algorithm for partitioning graphs. Proceedings of the ACM/IEEE Conference on Supercomputing. San Diego, CA. Huffman, S.B. (1996) Learning information extraction patterns from examples. In S. Wertmer, E. Riloff, and G. Scheler, Eds. Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, Springer-Verlag, Berlin, pp. 246–260. Kleinberg, J.M. (1999) Authoritative sources in a hyperlinked environment. Journal of the ACM, Vol. 46, No. 5, pp. 604–632. Kolata, G. (1986) Shakespeare’s new poem: an ode to statistics. Science, No. 231, pp. 335–336, January 24. Kushmerick, N., Weld, D.S., and Doorenbos, R. (1997) Wrapper induction for information extraction. Proceedings of the International Joint Conference on Artificial Intelligence IJCAI-97. Nayoya, Japan, pp. 729–735. Lancaster, F.W. (1991) Indexing and abstracting in theory and practice. University of Illinois Graduate School of Library and Information Science, Champaign, IL. Lenat, D.B. (1995) CYC: a large-scale investment in knowledge infrastructure. Communications of the ACM, Vol. 38, No. 11, pp. 32–38. Lewis, D.D. (1992) An evaluation of phrasal and clustered representations on a text categorization task. Proceedings of the International Conference on Research and Development in Information Retrieval SIGIR-92. pp. 37–50. Copenhagen, Denmark. Liere, R. and Tadepalli, P. (1996) The use of active learning in text categorization. Proceedings of the AAAI Symposium on Machine Learning in Information Access. Stanford, CA. Mani, I. (2001) Automatic summarization. John Benjamins, Amsterdam. Mann, T. (1993) Library research models. Oxford University Press, New York. Martin, J.D. (1995) Clustering full text documents. Proceedings of the IJCAI Workshop on Data Engineering for Inductive Learning at IJCAI-95. Montreal, Canada. McCallum, A., Nigam, K., Rennie, J., and Seymore, K. (1999) Building domain-specific search engines with machine learning techniques. Proceedings of the AAAI Spring Symposium. Stanford, CA. Mitchell, T.M. (1997) Machine Learning. McGraw Hill, New York. Nahm, U.Y. and Mooney, R.J. (2000) Using information extraction to aid the discovery of prediction rules from texts. 
Proceedings of the Workshop on Text Mining, International Conference on Knowledge Discovery and Data Mining KDD-2000. Boston, pp. 51–58. Nahm, U.Y. and Mooney, R.J. (2002) Text mining with information extraction. Proceedings of the AAAI2002 Spring Symposium on Mining Answers from Texts and Knowledge Bases. Stanford, CA. Nardi, B.A., Miller, J.R. and Wright, D.J. (1998) Collaborative, programmable intelligent agents. Communications of the ACM, Vol. 41, No. 3, pp. 96-104.
Nigam, K., McCallum, A., Thrun, S., and Mitchell, T. (1998) Learning to classify text from labeled and unlabeled documents. Proceedings of the National Conference on Artificial Intelligence AAAI-98. Madison, WI, pp. 792–799. Porter, M.F. (1980) An algorithm for suffix stripping. Program, Vol. 13, No. 3, pp. 130–137. Quinlan, R. (1993) C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA. Rabiner, L.R. (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE Vol. 77, No. 2, pp. 257–286. Salton, G. and McGill, M.J. (1983) Introduction to Modern Information Retrieval. McGraw Hill, New York. Sebastiani, F. (2002) Machine learning in automated text categorization. ACM Computing Surveys, Vol. 34, No. 1, pp. 1–47. Soderland, S., Fisher, D., Aseltine, J., and Lehnert, W. (1995) Crystal: inducing a conceptual dictionary. Proceedings of the International Conference on Machine Learning ICML-95. Tarragona, Catalonia, Spain, pp. 343–351. Swanson, D.R. (1987) Two medical literatures that are logically but not bibliographically connected. Journal of the American Society for Information Science, Vol. 38, No. 4, pp. 228–233. Swanson, D.R. and Smalheiser, N.R. (1997) An interactive system for finding complementary literatures: a stimulus to scientific discovery. Artificial Intelligence, Vol. 91, pp. 183–203. Tkach, D. (Ed.). (1998) Text Mining Technology: Turning Information into Knowledge. IBM White Paper, February 17, 1998. Wasserman, S. and Faust, K. (1994) Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge, U.K. Weiss, R., Velez, B., Nemprempre, C., Szilagyi, P., Duda, A., and Gifford, D.K. (1996) HyPursuit: A hierarchical network search engine that exploits content-link hypertext clustering. Proceedings of the ACM Conference on Hypertext. Washington, D.C., March, pp. 180–193. Willett, P. (1988) Recent trends in hierarchical document clustering: a critical review. Information Processing and Management, Vol. 24, No. 5, pp. 577–597. Witten, I.H., Moffat, A., and Bell, T.C. (1999) Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann, San Francisco, CA. Witten, I.H. and Frank, E. (2000) Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco, CA. Witten, I.H. and Bainbridge, D. (2003) How to Build a Digital Library. Morgan Kaufmann, San Francisco, CA.
15 Web Usage Mining and Personalization

Bamshad Mobasher

CONTENTS
Abstract
15.1 Introduction and Background
15.2 Data Preparation and Modeling
    15.2.1 Sources and Types of Data
    15.2.2 Usage Data Preparation
    15.2.3 Postprocessing of User Transactions Data
    15.2.4 Data Integration from Multiple Sources
15.3 Pattern Discovery from Web Usage Data
    15.3.1 Levels and Types of Analysis
    15.3.2 Data-Mining Tasks for Web Usage Data
15.4 Using the Discovered Patterns for Personalization
    15.4.1 The kNN-Based Approach
    15.4.2 Using Clustering for Personalization
    15.4.3 Using Association Rules for Personalization
    15.4.4 Using Sequential Patterns for Personalization
15.5 Conclusions and Outlook
    15.5.1 Which Approach?
    15.5.2 The Future: Personalization Based on Semantic Web Mining
References
Abstract

In this chapter we present a comprehensive overview of the personalization process based on Web usage mining. In this context we discuss a host of Web usage mining activities required for this process, including the preprocessing and integration of data from multiple sources, and common pattern discovery techniques that are applied to the integrated usage data. We also present a number of specific recommendation algorithms for combining the discovered knowledge with the current status of a user's activity in a Website to provide personalized content. The goal of this chapter is to show how pattern discovery techniques such as clustering, association rule mining, and sequential pattern discovery, performed on Web usage data, can be leveraged effectively as an integrated part of a Web personalization system.
15.1 Introduction and Background

The tremendous growth in the number and the complexity of information resources and services on the Web has made Web personalization an indispensable tool for both Web-based organizations and end users. The ability of a site to engage visitors at a deeper level and to successfully guide them to useful and pertinent information is now viewed as one of the key factors in the site's ultimate success.
Web personalization can be described as any action that makes the Web experience of a user customized to the user's taste or preferences. Principal elements of Web personalization include modeling of Web objects (such as pages or products) and subjects (such as users or customers), categorization of objects and subjects, matching between and across objects and/or subjects, and determination of the set of actions to be recommended for personalization.

To date, the approaches and techniques used in Web personalization can be categorized into three general groups: manual decision rule systems, content-based filtering agents, and collaborative filtering systems. Manual decision rule systems, such as Broadvision (www.broadvision.com), allow Website administrators to specify rules based on user demographics or static profiles (collected through a registration process). The rules are used to affect the content served to a particular user. Collaborative filtering systems such as Net Perceptions (www.netperceptions.com) typically take explicit information in the form of user ratings or preferences and, through a correlation engine, return information that is predicted to closely match the user's preferences. Content-based filtering systems such as those used by WebWatcher [Joachims et al., 1997] and the client-side agent Letizia [Lieberman, 1995] generally rely on personal profiles and the content similarity of Web documents to these profiles for generating recommendations.

There are several well-known drawbacks to content-based or rule-based filtering techniques for personalization. The type of input is often a subjective description of the users by the users themselves, and thus is prone to biases. The profiles are often static, obtained through user registration, and thus the system performance degrades over time as the profiles age. Furthermore, using content similarity alone may result in missing important "pragmatic" relationships among Web objects based on how they are accessed by users.

Collaborative filtering [Herlocker et al., 1999; Konstan et al., 1997; Shardanand and Maes, 1995] has tried to address some of these issues and, in fact, has become the predominant commercial approach in most successful e-commerce systems. These techniques generally involve matching the ratings of a current user for objects (e.g., movies or products) with those of similar users (nearest neighbors) in order to produce recommendations for objects not yet rated by the user. The primary technique used to accomplish this task is the k-Nearest-Neighbor (kNN) classification approach that compares a target user's record with the historical records of other users in order to find the top k users who have similar tastes or interests.
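A compact sketch of the kNN prediction step just described: find the k users whose rating vectors are most similar to the target user's, and predict an unrated item's score as a similarity-weighted average of their ratings. The cosine similarity measure, the toy rating matrix, and k = 2 are illustrative choices.

```python
# A compact sketch of the kNN step described above. Cosine similarity,
# the toy rating matrix, and k = 2 are illustrative assumptions.

import numpy as np

ratings = np.array([        # rows: users, columns: items, 0 = not rated
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 4, 4],
])

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v) / denom if denom else 0.0

def predict(target_user, item, k=2):
    # candidate neighbors: other users who have rated the item in question
    sims = [(cosine(ratings[target_user], ratings[u]), u)
            for u in range(len(ratings)) if u != target_user and ratings[u, item] > 0]
    neighbors = sorted(sims, reverse=True)[:k]                  # top-k most similar raters
    weight = sum(s for s, _ in neighbors)
    return sum(s * ratings[u, item] for s, u in neighbors) / weight if weight else 0.0

print(predict(target_user=1, item=1))   # estimate user 1's rating for item 1
```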
However, collaborative filtering techniques have their own potentially serious limitations. The most important of these limitations is their lack of scalability. Essentially, kNN requires that the neighborhood formation phase be performed as an online process, and for very large data sets this may lead to unacceptable latency for providing recommendations. Another limitation of kNN-based techniques emanates from the sparse nature of the data set. As the number of items in the database increases, the density of each user record with respect to these items will decrease. This, in turn, will decrease the likelihood of a significant overlap of visited or rated items among pairs of users, resulting in less reliable computed correlations. Furthermore, collaborative filtering usually performs best when explicit nonbinary user ratings for similar objects are available. In many Websites, however, it may be desirable to integrate the personalization actions throughout the site involving different types of objects, including navigational and content pages, as well as implicit product-oriented user events such as shopping cart changes or product information requests.

A number of optimization strategies have been proposed and employed to remedy these shortcomings [Aggarwal et al., 1999; O'Conner and Herlocker, 1999; Sarwar et al., 2000a; Ungar and Foster, 1998; Yu, 1999]. These strategies include similarity indexing and dimensionality reduction to reduce real-time search costs, as well as offline clustering of user records, allowing the online component of the system to search only within a matching cluster. There has also been a growing body of work in enhancing collaborative filtering by integrating data from other sources such as content and user demographics [Claypool et al., 1999; Pazzani, 1999].

More recently, Web usage mining [Srivastava et al., 2000] has been proposed as an underlying approach for Web personalization [Mobasher et al., 2000a]. The goal of Web usage mining is to capture and model the behavioral patterns and profiles of users interacting with a Website.
The discovered patterns are usually represented as collections of pages or items that are frequently accessed by groups of users with common needs or interests. Such patterns can be used to better understand behavioral characteristics of visitors or user segments, to improve the organization and structure of the site, and to create a personalized experience for visitors by providing dynamic recommendations. The flexibility provided by Web usage mining can help enhance many of the approaches discussed in the preceding text and remedy many of their shortcomings. In particular, Web usage mining techniques, such as clustering, association rule mining, and navigational pattern mining, that rely on offline pattern discovery from user transactions can be used to improve the scalability of collaborative filtering when dealing with clickstream and e-commerce data.

The goal of personalization based on Web usage mining is to recommend a set of objects to the current (active) user, possibly consisting of links, ads, text, products, or services tailored to the user's perceived preferences as determined by the matching usage patterns. This task is accomplished by matching the active user session (possibly in conjunction with previously stored profiles for that user) with the usage patterns discovered through Web usage mining. We call the usage patterns used in this context aggregate usage profiles because they provide an aggregate representation of the common activities or interests of groups of users. This process is performed by the recommendation engine, which is the online component of the personalization system. If the data collection procedures in the system include the capability to track users across visits, then the recommendations can represent a longer-term view of the user's potential interests based on the user's activity history within the site. If, on the other hand, aggregate profiles are derived only from user sessions (single visits) contained in log files, then the recommendations provide a "short-term" view of the user's navigational interests. These recommended objects are added to the last page in the active session accessed by the user before that page is sent to the browser.

The overall process of Web personalization based on Web usage mining consists of three phases: data preparation and transformation, pattern discovery, and recommendation. Of these, only the latter phase is performed in real time. The data preparation phase transforms raw Web log files into transaction data that can be processed by data-mining tasks. This phase also includes data integration from multiple sources, such as backend databases, application servers, and site content. A variety of data-mining techniques can be applied to this transaction data in the pattern discovery phase, such as clustering, association rule mining, and sequential pattern discovery. The results of the mining phase are transformed into aggregate usage profiles, suitable for use in the recommendation phase. The recommendation engine considers the active user session in conjunction with the discovered patterns to provide personalized content.
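One simple way to realize the matching step described above is sketched below. Representing each aggregate usage profile as a set of pageview weights and scoring candidate pageviews by profile similarity times profile weight is an illustrative formulation, not the only one used in the literature, and the profiles themselves are made-up examples.

```python
# An illustrative sketch of the matching step described above: each aggregate
# usage profile is a set of pageview weights, the active session is a set of
# visited pageviews, and candidate recommendations are scored by the profile's
# similarity to the session times the pageview's weight within that profile.

import math

profiles = [                       # hypothetical aggregate usage profiles
    {"home": 1.0, "products": 0.8, "cart": 0.6},
    {"home": 0.9, "news": 0.7, "sports": 0.6},
]

def match_score(session, profile):
    overlap = sum(profile[p] for p in session if p in profile)
    return overlap / (math.sqrt(len(session)) * math.sqrt(sum(w * w for w in profile.values())))

def recommend(active_session, top_n=3):
    scores = {}
    for profile in profiles:
        sim = match_score(active_session, profile)
        for page, weight in profile.items():
            if page not in active_session:               # only recommend unseen pageviews
                scores[page] = max(scores.get(page, 0.0), sim * weight)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend({"home", "products"}))   # e.g. ['cart', 'news', 'sports']
```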
In this chapter we present a comprehensive view of the personalization process based on Web usage mining. A generalized framework for this process is depicted in Figure 15.1 and Figure 15.2. We use this framework as our guide in the remainder of this chapter. We provide a detailed discussion of a host of Web usage mining activities necessary for this process, including the preprocessing and integration of data from multiple sources (Section 15.2) and common pattern discovery techniques that are applied to the integrated usage data (Section 15.3). We then present a number of specific recommendation algorithms for combining the discovered knowledge with the current status of a user's activity in a Website to provide personalized content to a user. This discussion shows how pattern discovery techniques such as clustering, association rule mining, and sequential pattern discovery, performed on Web usage data, can be leveraged effectively as an integrated part of a Web personalization system (Section 15.4).
15.2 Data Preparation and Modeling

An important task in any data mining application is the creation of a suitable target data set to which data-mining algorithms are applied. This process may involve preprocessing the original data, integrating data from multiple sources, and transforming the integrated data into a form suitable for input into specific data-mining operations. Collectively, we refer to this process as data preparation. The data preparation process is often the most time-consuming and computationally intensive step in the knowledge discovery process.
FIGURE 15.1 The offline data preparation and pattern discovery components.
FIGURE 15.2 The online personalization component.
Web usage mining is no exception: in fact, the data preparation process in Web usage mining often requires the use of special algorithms and heuristics not commonly employed in other domains. This process is critical to the successful extraction of useful patterns from the data. In this section we discuss some of the issues and concepts related to data modeling and preparation in Web usage mining. Although this discussion is in the general context of Web usage analysis, we are focused especially on the factors that have been shown to greatly affect the quality and usability of the discovered usage patterns for their application in Web personalization.
15.2.1 Sources and Types of Data

The primary data sources used in Web usage mining are the server log files, which include Web server access logs and application server logs. Additional data sources that are also essential for both data preparation and pattern discovery include the site files and metadata, operational databases, application templates, and domain knowledge. Generally speaking, the data obtained through these sources can be categorized into four groups [Cooley et al., 1999; Srivastava et al., 2000]:

15.2.1.1 Usage Data

The log data collected automatically by the Web and application servers represents the fine-grained navigational behavior of visitors. Depending on the goals of the analysis, this data needs to be transformed and aggregated at different levels of abstraction. In Web usage mining, the most basic level of data abstraction is that of a pageview. Physically, a pageview is an aggregate representation of a collection of Web objects contributing to the display on a user's browser resulting from a single user action (such as a clickthrough). These Web objects may include multiple pages (such as in a frame-based site), images, embedded components, or script and database queries that populate portions of the displayed page (in dynamically generated sites). Conceptually, each pageview represents a specific "type" of user activity on the site, e.g., reading a news article, browsing the results of a search query, viewing a product page, adding a product to the shopping cart, and so on. On the other hand, at the user level, the most basic level of behavioral abstraction is that of a server session (or simply a session). A session (also commonly referred to as a "visit") is a sequence of pageviews by a single user during a single visit. The notion of a session can be further abstracted by selecting a subset of pageviews in the session that are significant or relevant for the analysis tasks at hand. We shall refer to such a semantically meaningful subset of pageviews as a transaction (also referred to as an episode according to the W3C Web Characterization Activity [W3C]). It is important to note that a transaction does not refer simply to product purchases, but it can include a variety of types of user actions as captured by different pageviews in a session.
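These abstractions (pageview, session, transaction) might be represented with simple record types such as the following sketch; the field names and types are illustrative assumptions rather than a prescribed schema.

```python
# One way to represent the abstractions just introduced. The field names and
# types are illustrative assumptions, not a prescribed schema.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Pageview:
    pageview_id: str          # normally a URL uniquely identifying the pageview
    pv_type: str              # e.g., "content", "product view", "index page"
    duration: float = 0.0     # seconds spent, if known

@dataclass
class Session:
    user_id: str              # anonymous identifier (e.g., from a client-side cookie)
    pageviews: List[Pageview] = field(default_factory=list)

@dataclass
class Transaction:
    """A semantically meaningful subset of a session's pageviews (an 'episode')."""
    session: Session
    pageviews: List[Pageview] = field(default_factory=list)
```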
15.2.1.2 Content Data

The content data in a site is the collection of objects and relationships that are conveyed to the user. For the most part, this data is comprised of combinations of textual material and images. The data sources used to deliver or generate this data include static HTML/XML pages, images, video clips, sound files, dynamically generated page segments from scripts or other applications, and collections of records from the operational databases. The site content data also includes semantic or structural metadata embedded within the site or individual pages, such as descriptive keywords, document attributes, semantic tags, or HTTP variables. Finally, the underlying domain ontology for the site is also considered part of the content data. The domain ontology may be captured implicitly within the site or it may exist in some explicit form. The explicit representations of domain ontologies may include conceptual hierarchies over page contents, such as product categories, structural hierarchies represented by the underlying file and directory structure in which the site content is stored, explicit representations of semantic content and relationships via an ontology language such as RDF, or a database schema over the data contained in the operational databases.

15.2.1.3 Structure Data

The structure data represents the designer's view of the content organization within the site. This organization is captured via the interpage linkage structure among pages, as reflected through hyperlinks. The structure data also includes the intrapage structure of the content represented in the arrangement of HTML or XML tags within a page. For example, both HTML and XML documents can be represented as tree structures over the space of tags in the page.

The structure data for a site is normally captured by an automatically generated "site map" that represents the hyperlink structure of the site.
A site mapping tool must have the capability to capture and represent the inter- and intra-pageview relationships. This necessity becomes most evident in a frame-based site where portions of distinct pageviews may represent the same physical page. For dynamically generated pages, the site mapping tools must either incorporate intrinsic knowledge of the underlying applications and scripts, or must have the ability to generate content segments using a sampling of parameters passed to such applications or scripts.

15.2.1.4 User Data

The operational databases for the site may include additional user profile information. Such data may include demographic or other identifying information on registered users, user ratings on various objects such as pages, products, or movies, past purchase or visit histories of users, as well as other explicit or implicit representations of a user's interests. Obviously, capturing such data would require explicit interactions with the users of the site. Some of this data can be captured anonymously, without any identifying user information, so long as there is the ability to distinguish among different users. For example, anonymous information contained in client-side cookies can be considered a part of the users' profile information and can be used to identify repeat visitors to a site. Many personalization applications require the storage of prior user profile information. For example, collaborative filtering applications usually store prior ratings of objects by users, though such information can be obtained anonymously as well.
15.2.2 Usage Data Preparation

The required high-level tasks in usage data preprocessing include data cleaning, pageview identification, user identification, session identification (or sessionization), the inference of missing references due to caching, and transaction (episode) identification. We provide a brief discussion of some of these tasks below; for a more detailed discussion see Cooley [2000] and Cooley et al. [1999].

Data cleaning is usually site-specific and involves tasks such as removing extraneous references to embedded objects, graphics, or sound files, and removing references due to spider navigations. The latter task can be performed by maintaining a list of known spiders and through heuristic identification of spiders and Web robots [Tan and Kumar, 2002]. It may also be necessary to merge log files from several Web and application servers. This may require global synchronization across these servers. In the absence of shared embedded session IDs, heuristic methods based on the "referrer" field in server logs along with various sessionization and user identification methods (see the following text) can be used to perform the merging. Client- or proxy-side caching can often result in missing access references to those pages or objects that have been cached. Missing references due to caching can be heuristically inferred through path completion, which relies on the knowledge of site structure and referrer information from server logs [Cooley et al., 1999]. In the case of dynamically generated pages, form-based applications using the HTTP POST method result in all or part of the user input parameters not being appended to the URL accessed by the user (though, in the latter case, it is possible to recapture the user input through packet sniffers on the server side).

Identification of pageviews is heavily dependent on the intrapage structure of the site, as well as on the page contents and the underlying site domain knowledge. For a single frame site, each HTML file has a one-to-one correlation with a pageview. However, for multiframed sites, several files make up a given pageview. Without detailed site structure information, it is very difficult to infer pageviews from a Web server log. In addition, it may be desirable to consider pageviews at a higher level of aggregation, where each pageview represents a collection of pages or objects — for example, pages related to the same concept category.

Not all pageviews are relevant for specific mining tasks, and among the relevant pageviews some may be more significant than others. The significance of a pageview may depend on usage, content and structural characteristics of the site, as well as on prior domain knowledge (possibly specified by the site designer and the data analyst).
For example, in an e-commerce site, pageviews corresponding to product-oriented events (e.g., shopping cart changes or product information views) may be considered more significant than others. Similarly, in a site designed to provide content, content pages may be weighted higher than navigational pages. In order to provide a flexible framework for a variety of data-mining activities, a number of attributes must be recorded with each pageview. These attributes include the pageview ID (normally a URL uniquely representing the pageview), duration, static pageview type (e.g., information page, product view, or index page), and other metadata, such as content attributes.

The analysis of Web usage does not require knowledge about a user's identity. However, it is necessary to distinguish among different users. In the absence of registration and authentication mechanisms, the most widespread approach to distinguishing among users is with client-side cookies. Not all sites, however, employ cookies, and due to abuse by some organizations and because of privacy concerns on the part of many users, client-side cookies are sometimes disabled. IP addresses alone are not generally sufficient for mapping log entries onto the set of unique users. This is mainly due to the proliferation of ISP proxy servers that assign rotating IP addresses to clients as they browse the Web. It is not uncommon, for instance, to find a substantial percentage of IP addresses recorded in server logs of a high-traffic site as belonging to America Online proxy servers or other major ISPs. In such cases, it is possible to more accurately identify unique users through combinations of IP addresses and other information such as user agents, operating systems, and referrers [Cooley et al., 1999].

Since a user may visit a site more than once, the server logs record multiple sessions for each user. We use the phrase user activity log to refer to the sequence of logged activities belonging to the same user. Thus, sessionization is the process of segmenting the user activity log of each user into sessions. Websites without the benefit of additional authentication information from users and without mechanisms such as embedded session IDs must rely on heuristic methods for sessionization. A sessionization heuristic is a method for performing such a segmentation on the basis of assumptions about users' behavior or the site characteristics. The goal of a heuristic is the reconstruction of the real sessions, where a real session is the actual sequence of activities performed by one user during one visit to the site. We denote the "conceptual" set of real sessions by $\mathcal{R}$. A sessionization heuristic $h$ attempts to map $\mathcal{R}$ into a set of constructed sessions, which we denote as $C \equiv C_h$. For the ideal heuristic, $h^*$, we have $C \equiv C_{h^*} = \mathcal{R}$. Generally, sessionization heuristics fall into two basic categories: time-oriented or structure-oriented. Time-oriented heuristics apply either global or local time-out estimates to distinguish between consecutive sessions, while structure-oriented heuristics use either the static site structure or the implicit linkage structure captured in the referrer fields of the server logs. Various heuristics for sessionization have been identified and studied [Cooley et al., 1999]. More recently, a formal framework for measuring the effectiveness of such heuristics has been proposed [Spiliopoulou et al., 2003], and the impact of different heuristics on various Web usage mining tasks has been analyzed [Berendt et al., 2002b].
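A global time-oriented sessionization heuristic can be sketched as follows. The 30-minute timeout and the (user, timestamp, URL) record format are illustrative assumptions; a real implementation would first apply the user-identification heuristics discussed above.

```python
# A sketch of a global time-oriented sessionization heuristic: start a new
# session whenever the gap between a user's consecutive requests exceeds a
# timeout. The 30-minute threshold and the (user, timestamp, url) record
# format are illustrative assumptions.

from collections import defaultdict

def sessionize(log_records, timeout=30 * 60):
    """log_records: iterable of (user_id, unix_timestamp, url), assumed time-ordered."""
    sessions = defaultdict(list)      # user_id -> list of sessions (each a list of urls)
    last_seen = {}
    for user, ts, url in log_records:
        if user not in last_seen or ts - last_seen[user] > timeout:
            sessions[user].append([])            # gap too long: open a new session
        sessions[user][-1].append(url)
        last_seen[user] = ts
    return sessions

log = [("u1", 0, "/home"), ("u1", 120, "/products"), ("u1", 4000, "/home"), ("u2", 50, "/news")]
print(dict(sessionize(log)))
# {'u1': [['/home', '/products'], ['/home']], 'u2': [['/news']]}
```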
Finally, transaction (episode) identification can be performed as a final preprocessing step prior to pattern discovery in order to focus on the relevant subsets of pageviews in each user session. As noted earlier, this task may require the automatic or semiautomatic classification of pageviews into different functional types or into concept classes according to a domain ontology. In highly dynamic sites, it may also be necessary to map pageviews within each session into "service-based" classes according to a concept hierarchy over the space of possible parameters passed to script or database queries [Berendt and Spiliopoulou, 2000]. For example, the analysis may ignore the quantity and attributes of an item added to the shopping cart and focus only on the action of adding the item to the cart.

The above preprocessing tasks ultimately result in a set of $n$ pageviews, $P = \{p_1, p_2, \ldots, p_n\}$, and a set of $m$ user transactions, $T = \{t_1, t_2, \ldots, t_m\}$, where each $t_i \in T$ is a subset of $P$. Conceptually, we can view each transaction $t$ as an $l$-length sequence of ordered pairs:
$$t = \langle (p_1^t, w(p_1^t)), (p_2^t, w(p_2^t)), \ldots, (p_l^t, w(p_l^t)) \rangle$$

where each $p_i^t = p_j$ for some $j \in \{1, \ldots, n\}$, and $w(p_i^t)$ is the weight associated with pageview $p_i^t$ in the transaction $t$.
The weights can be determined in a number of ways, in part based on the type of analysis or the intended personalization framework. For example, in collaborative filtering applications, such weights may be determined based on user ratings of items. In most Web usage mining tasks, the focus is generally on anonymous user navigational activity where the primary sources of data are server logs. This allows us to choose two types of weights for pageviews: weights can be binary, representing the existence or nonexistence of a pageview in the transaction, or they can be a function of the duration of the pageview in the user's session. In the case of time durations, it should be noted that usually the time spent by a user on the last pageview in the session is not available. One commonly used option is to set the weight for the last pageview to be the mean time duration for the page taken across all sessions in which the pageview does not occur as the last one.

Whether or not the user transactions are viewed as sequences or as sets (without taking ordering information into account) is also dependent on the goal of the analysis and the intended applications. For sequence analysis and the discovery of frequent navigational patterns, one must preserve the ordering information in the underlying transaction. On the other hand, for clustering tasks as well as for collaborative filtering based on kNN and association rule discovery, we can represent each user transaction as a vector over the $n$-dimensional space of pageviews, where dimension values are the weights of these pageviews in the corresponding transaction. Thus, given the transaction $t$ above, the $n$-dimensional transaction vector $\vec{t}$ is given by:

$$\vec{t} = \langle w_{p_1}^t, w_{p_2}^t, \ldots, w_{p_n}^t \rangle$$

where $w_{p_j}^t = w(p_i^t)$ for some $i \in \{1, \ldots, n\}$ if $p_j$ appears in the transaction $t$, and $w_{p_j}^t = 0$ otherwise. For example, consider a site with 6 pageviews A, B, C, D, E, and F. Assuming that the pageview weights associated with a user transaction are determined by the number of seconds spent on them, a typical transaction vector may look like $\langle 11, 0, 22, 5, 127, 0 \rangle$. In this case, the vector indicates that the user spent 11 sec on page A, 22 sec on page C, 5 sec on page D, and 127 sec on page E. The vector also indicates that the user did not visit pages B and F during this transaction.

Given this representation, the set of all $m$ user transactions can be conceptually viewed as an $m \times n$ transaction–pageview matrix that we shall denote by TP. This transaction–pageview matrix can then be used to perform various data-mining tasks. For example, similarity computations can be performed among the transaction vectors (rows) for clustering and kNN neighborhood formation tasks, or an association rule discovery algorithm, such as Apriori, can be applied (with pageviews as items) to find frequent itemsets of pageviews.
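The transaction–pageview matrix TP can be assembled directly from a set of weighted transactions, as in the following sketch; the pageviews and duration weights reuse the illustrative numbers from the example above.

```python
# Building the m x n transaction-pageview matrix TP described above from a
# set of weighted transactions. The pageviews and duration weights are the
# illustrative numbers from the example in the text.

import numpy as np

pageviews = ["A", "B", "C", "D", "E", "F"]
transactions = [                               # each transaction: pageview -> weight (seconds)
    {"A": 11, "C": 22, "D": 5, "E": 127},
    {"A": 3, "B": 40, "F": 9},
]

def transaction_pageview_matrix(transactions, pageviews):
    index = {p: j for j, p in enumerate(pageviews)}
    TP = np.zeros((len(transactions), len(pageviews)))
    for i, t in enumerate(transactions):
        for p, w in t.items():
            TP[i, index[p]] = w                # zeros remain for unvisited pageviews
    return TP

TP = transaction_pageview_matrix(transactions, pageviews)
print(TP[0])    # [ 11.   0.  22.   5. 127.   0.] -- the example vector from the text
```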
15.2.3 Postprocessing of User Transactions Data

In addition to the preprocessing steps described above, which lead to the user transaction matrix, there are a variety of transformation tasks that can be performed on the transaction data. Here, we highlight some of the data transformation tasks that are likely to have an impact on the quality and actionability of the patterns discovered by mining algorithms. Indeed, such postprocessing transformations on session or transaction data have been shown to result in improvements in the accuracy of recommendations produced by personalization systems based on Web usage mining [Mobasher et al., 2001b].

15.2.3.1 Significance Filtering

Using binary weights in the representation of user transactions is often desirable due to efficiency requirements in terms of storage and the computation of similarity coefficients among transactions. In this context, however, it becomes more important to determine the significance of each pageview or item access. For example, a user may access an item p only to find that he or she is not interested in that item, subsequently backtracking to another section of the site. We would like to capture this behavior by discounting the access to p as an insignificant access. We refer to the process of removing page or item requests that are deemed insignificant as significance filtering.
The significance of a page within a transaction can be determined manually or automatically. In the manual approach, the site owner or the analyst is responsible for assigning significance weights to various pages or items. This is usually performed as a global mapping from items to weights, so the significance of a pageview is not dependent on a specific user or transaction. More commonly, a function of pageview duration is used to automatically assign significance weights. In general, though, it is not sufficient simply to filter out pageviews with small durations, because the amount of time spent by users on a page is not based merely on the user's interest in the page. The page duration may also depend on the characteristics and the content of the page. For example, we would expect users to spend far less time on navigational pages than they do on content or product-oriented pages.

Statistical significance testing can help capture some of the semantics illustrated above. The goal of significance filtering is to eliminate irrelevant items whose time duration in a transaction falls significantly below a certain threshold. Typically, statistical measures such as the mean and variance can be used to systematically define this threshold. In general, it can be observed that the distribution of access frequencies as a function of the amount of time spent on a given pageview is characterized by a log-normal distribution. For example, Figure 15.3 (left) shows the distribution of the number of transactions with respect to time duration for a particular pageview in a typical Website, and Figure 15.3 (right) shows the same distribution plotted as a function of time on a log scale. The log transformation can be observed to produce a Gaussian distribution. After this transformation, we can proceed with standard significance testing: the weight associated with an item in a transaction is set to 0 if the amount of time spent on that item is significantly below the mean time duration of the item across all user transactions. The significance of the deviation from the mean is usually measured in terms of multiples of the standard deviation. For example, if, in a given transaction t, the amount of time spent on a pageview p is 1.5 to 2 standard deviations lower than the mean duration for p across all transactions, then the weight of p in transaction t might be set to 0. In such a case, it is likely that the user was either not interested in the contents of p, or mistakenly navigated to p and quickly left the page.
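The following sketch illustrates one way such significance filtering could be implemented, zeroing out weights whose log-durations fall well below the pageview's mean log-duration across all transactions. It is a simplified illustration assuming strictly positive durations; the function name `significance_filter` and the default cutoff of 1.5 standard deviations are illustrative choices, not prescribed by the chapter.

```python
# Minimal sketch of significance filtering on log-transformed durations.
# Function and variable names are illustrative, not from the chapter.
import math
import statistics

def significance_filter(transactions, num_std=1.5):
    """transactions: list of dicts mapping pageview -> duration in seconds.
    Sets a pageview's weight to 0 in a transaction when its log-duration is
    more than `num_std` standard deviations below that pageview's mean
    log-duration across all transactions."""
    # Collect log-durations per pageview across all transactions.
    log_durations = {}
    for t in transactions:
        for p, secs in t.items():
            log_durations.setdefault(p, []).append(math.log(secs))
    stats = {p: (statistics.mean(v), statistics.pstdev(v))
             for p, v in log_durations.items()}
    filtered = []
    for t in transactions:
        ft = {}
        for p, secs in t.items():
            mean, std = stats[p]
            if std > 0 and math.log(secs) < mean - num_std * std:
                ft[p] = 0.0            # deemed an insignificant access
            else:
                ft[p] = secs
        filtered.append(ft)
    return filtered
```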
15.2.3.2 Normalization

There are also some advantages in using the fully weighted representation of transaction vectors (based on time durations). One advantage is that for many distance- or similarity-based clustering algorithms, more granularity in feature weights usually leads to more accurate results. Another advantage is that, because relative time durations are taken into account, the need for performing other types of transformations, such as significance filtering, is greatly reduced.
FIGURE 15.3 Distribution of pageview durations: raw-time scale (left), log-time scale (right).
However, raw time durations may not be an appropriate measure of the significance of a pageview, because a variety of factors, such as the structure, length, and type of the pageview, as well as the user's interest in a particular item, may affect the amount of time spent on that item. Appropriate weight normalization can play an essential role in correcting for these factors. Generally, two types of weight normalization are applied to user transaction vectors: normalization across the pageviews in a single transaction and normalization of each pageview's weights across all transactions. We call these transformations transaction normalization and pageview normalization, respectively. Pageview normalization is useful in capturing the relative weight of a pageview for a user with respect to the weights of the same pageview for all other users. Transaction normalization, on the other hand, captures the importance of a pageview to a particular user relative to the other items visited by that user in the same transaction. The latter is particularly useful in focusing on the "target" pages in the context of short user histories.
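A minimal sketch of these two normalizations is shown below, applied to the transaction–pageview matrix TP. The use of max-scaling here is an assumption made for illustration; other normalization schemes (e.g., scaling vectors to unit length) could be substituted.

```python
# Minimal sketch of the two normalizations discussed above, applied to the
# transaction-pageview matrix TP (a list of m rows of n weights).
# Max-scaling is one illustrative choice of normalization.

def transaction_normalize(TP):
    """Scale each transaction (row) so its largest pageview weight is 1."""
    result = []
    for row in TP:
        top = max(row) or 1.0          # avoid division by zero for empty rows
        result.append([w / top for w in row])
    return result

def pageview_normalize(TP):
    """Scale each pageview (column) so its largest weight across all
    transactions is 1."""
    cols = list(zip(*TP))
    tops = [max(col) or 1.0 for col in cols]
    return [[w / tops[j] for j, w in enumerate(row)] for row in TP]
```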
15.2.4 Data Integration from Multiple Sources

In order to provide the most effective framework for pattern discovery and analysis, data from a variety of sources must be integrated. Our earlier discussion already alluded to the necessity of considering content and structure data in a variety of preprocessing tasks such as pageview identification, sessionization, and the inference of missing data. The integration of content, structure, and user data in other phases of the Web usage mining and personalization processes may also be essential in providing the ability to further analyze and reason about the discovered patterns, derive more actionable knowledge, and create more effective personalization tools.

For example, in e-commerce applications, the integration of both user data (e.g., demographics, ratings, purchase histories) and product attributes from operational databases is critical. Such data, used in conjunction with usage data in the mining process, can allow for the discovery of important business intelligence metrics such as customer conversion ratios and lifetime values. On the other hand, the integration of semantic knowledge from the site content or domain ontologies can be used by personalization systems to provide more useful recommendations. For instance, consider a hypothetical site containing information about movies that employs collaborative filtering on movie ratings or pageview transactions to give recommendations. The integration of semantic knowledge about movies (possibly extracted from site content) can allow the system to recommend movies not just based on similar ratings or navigation patterns, but also perhaps based on similarities in attributes such as movie genres or commonalities in casts or directors.

One direct source of semantic knowledge that can be integrated into the mining and personalization processes is the collection of content features associated with items or pageviews on a Website. These features include keywords, phrases, category names, or other textual content embedded as meta information. Content preprocessing involves the extraction of relevant features from text and metadata. Metadata extraction becomes particularly important when dealing with product-oriented pageviews or those involving nontextual content. In order to use features in similarity computations, appropriate weights must be associated with them. Generally, for features extracted from text, we can use standard techniques from information retrieval and filtering to determine feature weights [Frakes and Baeza-Yates, 1992]. For instance, a commonly used feature-weighting scheme is tf.idf, which is a function of the term frequency and inverse document frequency. More formally, each pageview p can be represented as a k-dimensional feature vector, where k is the total number of features extracted from the site in a global dictionary. Each dimension in a feature vector represents the corresponding feature weight within the pageview. Thus, the feature vector for a pageview p is given by:
$$p = \langle fw(p, f_1), fw(p, f_2), \ldots, fw(p, f_k) \rangle$$
where $fw(p, f_j)$ is the weight of the $j$th feature in pageview $p \in P$, for $1 \leq j \leq k$. For features extracted from the textual content of pages, the feature weight is usually the normalized tf.idf value for the term. In order to combine feature weights from metadata (specified externally) and feature weights from the text content, proper normalization of those weights must be performed as part of preprocessing.

Conceptually, the collection of these vectors can be viewed as an $n \times k$ pageview–feature matrix in which each row is a feature vector corresponding to one of the $n$ pageviews in P. We shall call this matrix PF. The feature vectors obtained in this way are usually organized into an inverted file structure containing a dictionary of all extracted features and posting files for each feature specifying the pageviews in which the feature occurs, along with its weight. This inverted file structure corresponds to the transpose of the matrix PF.

Further preprocessing on content features can be performed by applying text-mining techniques. This would provide the ability to filter the input to, or the output from, usage-mining algorithms. For example, classification of content features based on a concept hierarchy can be used to limit the discovered usage patterns to those containing pageviews about a certain subject or class of products. Similarly, performing clustering or association rule mining on the feature space can lead to composite features representing concept categories. The mapping of features onto a set of concept labels allows for the transformation of the feature vectors representing pageviews into concept vectors. The concept vectors represent the semantic concept categories to which a pageview belongs, and they can be viewed at different levels of abstraction according to a concept hierarchy (either preexisting or learned through machine-learning techniques). This transformation can be useful both in the semantic analysis of the data and as a method for dimensionality reduction in some data-mining tasks, such as clustering.

A direct approach for the integration of content and usage data for Web usage mining tasks is to transform user transactions, as described earlier, into "content-enhanced" transactions containing the semantic features of the underlying pageviews. This process, performed as part of data preparation, involves mapping each pageview in a transaction to one or more content features. The range of this mapping can be the full feature space or the concept space obtained as described above. Conceptually, the transformation can be viewed as the multiplication of the transaction–pageview matrix TP (described in Section 15.2.2) with the pageview–feature matrix PF. The result is a new matrix $TF = \{t'_1, t'_2, \ldots, t'_m\}$, where each $t'_i$ is a $k$-dimensional vector over the feature space. Thus, a user transaction can be represented as a content feature vector, reflecting that user's interests in particular concepts or topics. A variety of data-mining algorithms can then be applied to this transformed transaction data.
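The content-enhanced transformation described above amounts to a matrix product, as the following minimal sketch illustrates. The function name is invented for this example, and the plain-Python implementation is for clarity only; a real system would likely use sparse matrix operations.

```python
# Minimal sketch of deriving content-enhanced transactions by multiplying the
# transaction-pageview matrix TP (m x n) with the pageview-feature matrix PF
# (n x k). Names are illustrative.

def content_enhanced_transactions(TP, PF):
    """Return TF = TP x PF, an m x k matrix in which each row represents a
    user transaction over the feature (or concept) space."""
    m, n, k = len(TP), len(PF), len(PF[0])
    TF = [[0.0] * k for _ in range(m)]
    for i in range(m):
        for j in range(n):
            w = TP[i][j]
            if w:                      # skip pageviews absent from the transaction
                for f in range(k):
                    TF[i][f] += w * PF[j][f]
    return TF
```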
The above discussion focused primarily on the integration of content and usage data for Web usage mining. However, as noted earlier, data from other sources must also be considered as part of an integrated framework. Figure 15.4 shows the basic elements of such a framework.

The content analysis module in this framework is responsible for extracting and processing linkage and semantic information from pages. The processing of semantic information includes the steps described above for feature extraction and concept mapping. Analysis of dynamic pages may involve (partial) generation of pages based on templates, specified parameters, or database queries, using the information captured from log records. The outputs from this module may include the site map capturing the site topology, as well as the site dictionary and the inverted file structure used for content analysis and integration. The site map is used primarily in data preparation (e.g., in pageview identification and path completion). It may be constructed through content analysis or through the analysis of usage data (using the referrer information in log records). The site dictionary provides a mapping between pageview identifiers (for example, URLs) and content or structural information on pages; it is used primarily for "content labeling" both in sessionized usage data and in the integrated e-commerce data. Content labels may represent conceptual categories based on sets of features associated with pageviews.

The data integration module is used to integrate sessionized usage data, e-commerce data (from application servers), and product or user data from databases. User data may include user profiles, demographic information, and individual purchase activity. E-commerce data includes various product-oriented events, including shopping cart changes, purchase information, impressions, clickthroughs, and other basic metrics, primarily used in the data transformation and loading mechanism of the data mart.
FIGURE 15.4 An integrated framework for Web usage analysis.
The successful integration of this type of e-commerce data requires the creation of a site-specific "event model" based on which subsets of a user's clickstream are aggregated and mapped to specific events, such as the addition of a product to the shopping cart. Product attributes and product categories, stored in operational databases, can also be used to enhance or expand the content features extracted from site files.

The e-commerce data mart is a multidimensional database integrating data from a variety of sources and at different levels of aggregation. It can provide precomputed e-metrics along multiple dimensions, and it is used as the primary data source in OLAP analysis, as well as in data selection for a variety of data-mining tasks (performed by the data-mining engine). We discuss the different types and levels of analysis that can be performed based on this basic framework in the next section.
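As a rough illustration of such an event model, the sketch below maps portions of a clickstream onto product-oriented events. The URL patterns, parameter names, and event labels are entirely hypothetical; an actual event model would have to be defined specifically for the site in question.

```python
# Minimal sketch of a site-specific "event model": mapping portions of a
# clickstream to product-oriented events. All URL patterns and event names
# are hypothetical placeholders.

def map_clickstream_to_events(session):
    """session: ordered list of (url, query_params) pairs for one user.
    Returns a list of (event, item_id) pairs, ignoring quantities and other
    parameters as discussed earlier."""
    events = []
    for url, params in session:
        if url.startswith("/cart/add"):
            events.append(("add_to_cart", params.get("item")))
        elif url.startswith("/checkout/confirm"):
            events.append(("purchase", params.get("order")))
        elif url.startswith("/product/"):
            events.append(("product_view", url.rsplit("/", 1)[-1]))
    return events
```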
15.3 Pattern Discovery from Web Usage Data

15.3.1 Levels and Types of Analysis

As shown in Figure 15.4, different kinds of analysis can be performed on the integrated usage data at different levels of aggregation or abstraction. The types and levels of analysis, naturally, depend on the ultimate goals of the analyst and the desired outcomes. For instance, even without the benefit of an integrated e-commerce data mart, statistical analysis can be performed on the preprocessed session or transaction data. Indeed, static aggregation (reports) constitutes the most common form of analysis. In this case, data is aggregated by predetermined units such as days, sessions, visitors, or domains. Standard statistical techniques can be used on this data to gain knowledge about visitor behavior. This is the approach taken by most commercial tools available for Web log analysis (however, most such tools do not perform all of the necessary preprocessing tasks
described earlier, thus resulting in erroneous or misleading outcomes). Reports based on this type of analysis may include information about the most frequently accessed pages, average view time of a page, average length of a path through a site, common entry and exit points, and other aggregate measures. The drawback of this type of analysis is the inability to "dig deeper" into the data or find hidden patterns and relationships. Despite this lack of depth, the resulting knowledge can be potentially useful for improving system performance and providing support for marketing decisions. The reports give quick overviews of how a site is being used and require minimal disk space or processing power. Furthermore, in the past few years, many commercial products for log analysis have incorporated a variety of data-mining tools to discover deeper relationships and hidden patterns in the usage data.

Another form of analysis on integrated usage data is Online Analytical Processing (OLAP). OLAP provides a more integrated framework for analysis with a higher degree of flexibility. As indicated in Figure 15.4, the data source for OLAP analysis is a multidimensional data warehouse that integrates usage, content, and e-commerce data at different levels of aggregation for each dimension. OLAP tools allow changes in aggregation levels along each dimension during the analysis. Indeed, the server log data itself can be stored in a multidimensional data structure for OLAP analysis [Zaiane et al., 1998]. Analysis dimensions in such a structure can be based on various fields available in the log files, and may include time duration, domain, requested resource, user agent, referrers, and so on. This allows the analysis to be performed, for example, on portions of the log related to a specific time interval, or at a higher level of abstraction with respect to the URL path structure. The integration of e-commerce data in the data warehouse can further enhance the ability of OLAP tools to derive important business intelligence metrics. For example, in Buchner and Mulvenna [1999], an integrated Web log data cube was proposed that incorporates customer and product data, as well as domain knowledge such as navigational templates and site topology.

OLAP tools, by themselves, do not automatically discover usage patterns in the data. In fact, the ability to find patterns or relationships in the data depends solely on the effectiveness of the OLAP queries performed against the data warehouse. However, the output from this process can be used as the input for a variety of data-mining algorithms. In the following sections we focus specifically on various data-mining and pattern discovery techniques that are commonly performed on Web usage data, and we discuss some approaches for using the discovered patterns for Web personalization.
15.3.2 Data-Mining Tasks for Web Usage Data

We now focus on specific data-mining and pattern discovery tasks that are often employed when dealing with Web usage data. Our goal is not to give detailed descriptions of all applicable data-mining techniques but to provide some relevant background information and to illustrate how some of these techniques can be applied to Web usage data. In the next section, we present several approaches to leveraging the discovered patterns for predictive Web usage mining applications such as personalization.

As noted earlier, the preprocessing and data transformation tasks ultimately result in a set of $n$ pageviews, $P = \{p_1, p_2, \ldots, p_n\}$, and a set of $m$ user transactions, $T = \{t_1, t_2, \ldots, t_m\}$, where each $t_i \in T$ is a subset of $P$. Each transaction $t$ is an $l$-length sequence of ordered pairs $t = \langle (p_1^t, w(p_1^t)), (p_2^t, w(p_2^t)), \ldots, (p_l^t, w(p_l^t)) \rangle$, where each $p_i^t = p_j$ for some $j \in \{1, \ldots, n\}$, and $w(p_i^t)$ is the weight associated with pageview $p_i^t$ in the transaction $t$.

Given a set of transactions as described above, a variety of unsupervised knowledge discovery techniques can be applied to obtain patterns. Techniques such as clustering of transactions (or sessions) can lead to the discovery of important user or visitor segments. Other techniques such as item (e.g., pageview) clustering, association rule mining [Agarwal et al., 1999; Agrawal and Srikant, 1994], or sequential pattern discovery [Agrawal and Srikant, 1995] can be used to find important relationships among items based on the navigational patterns of users in the site. In the cases of clustering and association rule discovery, generally, the ordering relation among the pageviews is not taken into account; thus a transaction is
viewed as a set (or, more generally, as a bag) of pageviews $s_t = \{p_i^t \mid 1 \leq i \leq l\}$ with $w(p_i^t) = 1$. In the case of sequential patterns, however, we need to preserve the ordering relationship among the pageviews within transactions in order to effectively model users' navigational patterns.

15.3.2.1 Association Rules

Association rules capture the relationships among items based on their patterns of cooccurrence across transactions (without considering the ordering of items). In the case of Web transactions, association rules capture relationships among pageviews based on the navigational patterns of users. Most common approaches to association discovery are based on the Apriori algorithm [Agrawal and Srikant, 1994, 1995], which follows a generate-and-test methodology. This algorithm finds groups of items (pageviews appearing in the preprocessed log) occurring frequently together in many transactions (i.e., satisfying a user-specified minimum support threshold). Such groups of items are referred to as frequent itemsets. Given a transaction set $T$ and a set $I = \{I_1, I_2, \ldots, I_k\}$ of frequent itemsets over $T$, the support of an itemset $I_i \in I$ is defined as

$$\sigma(I_i) = \frac{|\{t \in T : I_i \subseteq t\}|}{|T|}.$$
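For concreteness, the following sketch computes itemset support exactly as defined above, using the sample transactions that appear later in Figure 15.8 (each reduced to a set of pageviews). The function name is illustrative.

```python
# Minimal sketch of computing the support of an itemset over a set of
# transactions, each transaction reduced to a set of pageviews.

def support(itemset, transactions):
    """sigma(I) = |{t in T : I is a subset of t}| / |T|."""
    itemset = set(itemset)
    count = sum(1 for t in transactions if itemset <= set(t))
    return count / len(transactions)

transactions = [{"A", "B", "D", "E"}, {"A", "B", "E", "C", "D"},
                {"A", "B", "E", "C"}, {"B", "E", "A", "C"},
                {"D", "A", "B", "E", "C"}]
print(support({"B", "E"}, transactions))       # 1.0
print(support({"B", "C", "E"}, transactions))  # 0.8
```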
An important property of support in the Apriori algorithm is its downward closure: if an itemset does not satisfy the minimum support criterion, then neither do any of its supersets. This property is essential for pruning the search space during each iteration of the Apriori algorithm. Association rules that satisfy a minimum confidence threshold are then generated from the frequent itemsets. An association rule $r$ is an expression of the form $X \Rightarrow Y$ $(\sigma_r, \alpha_r)$, where $X$ and $Y$ are itemsets and $\sigma_r = \sigma(X \cup Y)$ is the support of $X \cup Y$, representing the probability that $X$ and $Y$ occur together in a transaction. The confidence of the rule $r$, denoted $\alpha_r$, is given by $\sigma(X \cup Y)/\sigma(X)$ and represents the conditional probability that $Y$ occurs in a transaction given that $X$ has occurred in that transaction.

The discovery of association rules in Web transaction data has many advantages. For example, a high-confidence rule such as {special-offers/, /products/software/} $\Rightarrow$ {shopping-cart/} might provide some indication that a promotional campaign on software products is positively affecting online sales. Such rules can also be used to optimize the structure of the site. For example, if a site does not provide direct linkage between two pages A and B, the discovery of a rule {A} $\Rightarrow$ {B} would indicate that providing a direct hyperlink might aid users in finding the intended information.

The results of association rule mining can be used to produce a model for recommendation or personalization systems [Fu et al., 2000; Lin et al., 2002; Mobasher et al., 2001a; Sarwar et al., 2000b]. The top-N recommender system proposed in Sarwar et al. [2000b] uses association rules for making recommendations. First, all association rules are discovered from the purchase information. A customer's historical purchase information is then matched against the left-hand sides of the rules in order to find all rules supported by that customer. All right-hand-side items from the supported rules are sorted by confidence, and the N highest-ranked items are selected as the recommendation set. One problem for association rule recommendation systems is that the system cannot give any recommendations when the data set is sparse. In Fu et al. [2000], two potential solutions to this problem were proposed. The first solution is to rank all discovered rules by the degree of intersection between the left-hand side of the rule and the user's active session and then to generate the top k recommendations. The second solution is to utilize a collaborative filtering technique: the system finds "close neighbors" who have interests similar to those of the target user and makes recommendations based on the close neighbors' histories. In Lin et al. [2002], a collaborative recommendation system using association rules was presented. The proposed mining algorithm finds an appropriate number of rules for each target user by automatically selecting the minimum support. The recommendation engine generates association rules among both users and items for each user. If a user's minimum support is greater than a threshold, it gives recommendations based on user associations; otherwise, it uses article associations.
In Mobasher et al. [2001a], a scalable framework for recommender systems using association rule mining was proposed. The recommendation algorithm uses an efficient data structure for storing frequent itemsets and produces recommendations in real time without the need to generate all association rules from the frequent itemsets. We discuss this recommendation algorithm based on association rule mining in more detail in Section 15.4.3.

A problem with using a global minimum support threshold in association rule mining is that the discovered patterns will not include "rare" but important items that may not occur frequently in the transaction data. This is particularly important when dealing with Web usage data; it is often the case that references to deeper content or product-oriented pages occur far less frequently than those of top-level navigation-oriented pages. Yet, for effective Web personalization, it is important to capture patterns and generate recommendations that contain these items. Liu et al. [1999] proposed a mining method with multiple minimum supports that allows users to specify different support values for different items. In this method, the support of an itemset is defined as the minimum support of all items contained in the itemset. The specification of multiple minimum supports allows frequent itemsets to potentially contain rare items that are nevertheless deemed important. It has been shown that the use of multiple-support association rules in the context of Web personalization can dramatically increase the coverage (recall) of recommendations while maintaining a reasonable precision [Mobasher et al., 2001a].

15.3.2.2 Sequential and Navigational Patterns

Sequential patterns (SPs) in Web usage data capture the Web page trails that are often visited by users, in the order that they were visited. Sequential patterns are those sequences of items that frequently occur in a sufficiently large proportion of transactions. A sequence $\langle s_1, s_2, \ldots, s_n \rangle$ occurs in a transaction $t = \langle p_1, p_2, \ldots, p_m \rangle$ (where $n \leq m$) if there exist $n$ positive integers $1 \leq a_1 < a_2 < \cdots < a_n \leq m$ such that $s_i = p_{a_i}$ for all $i$. We say that $\langle cs_1, cs_2, \ldots, cs_n \rangle$ is a contiguous sequence in $t$ if there exists an integer $0 \leq b \leq m - n$ such that $cs_i = p_{b+i}$ for all $i = 1$ to $n$. In a contiguous sequential pattern (CSP), each pair of adjacent elements, $s_i$ and $s_{i+1}$, must appear consecutively in a transaction $t$ that supports the pattern, while a sequential pattern can represent noncontiguous frequent sequences in the underlying set of transactions. Given a transaction set $T$ and a set $S = \{S_1, S_2, \ldots, S_n\}$ of frequent sequential (respectively, contiguous sequential) patterns over $T$, the support of each $S_i$ is defined as follows:
$$\sigma(S_i) = \frac{|\{t \in T : S_i \text{ is a (contiguous) subsequence of } t\}|}{|T|}$$

The confidence of the rule $X \Rightarrow Y$, where $X$ and $Y$ are (contiguous) sequential patterns, is defined as

$$\alpha(X \Rightarrow Y) = \frac{\sigma(X \circ Y)}{\sigma(X)},$$
where $\circ$ denotes the concatenation operator. Note that the support thresholds for SPs and CSPs also satisfy the downward closure property; i.e., if a (contiguous) sequence of items $S$ has any subsequence that does not satisfy the minimum support criterion, then $S$ does not have minimum support. The Apriori algorithm used in association rule mining can also be adapted to discover sequential and contiguous sequential patterns. This is normally accomplished by changing the definition of support to be based on the frequency of occurrences of subsequences of items rather than subsets of items [Agrawal and Srikant, 1995].

In the context of Web usage data, CSPs can be used to capture frequent navigational paths among user trails [Spiliopoulou and Faulstich, 1999; Schechter et al., 1998]. In contrast, items appearing in SPs, while preserving the underlying ordering, need not be adjacent, and thus they represent more general navigational patterns within the site.
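A minimal sketch of these two notions of pattern occurrence, and of the corresponding support computation, is given below. The helper names are invented for this illustration.

```python
# Minimal sketch of testing whether a pattern occurs in a transaction as a
# (noncontiguous) subsequence or as a contiguous sequence, and computing
# the pattern's support accordingly.

def is_subsequence(pattern, transaction):
    """True if pattern occurs in transaction with ordering preserved."""
    it = iter(transaction)
    return all(item in it for item in pattern)

def is_contiguous(pattern, transaction):
    """True if pattern occurs as a block of consecutive items."""
    n, m = len(pattern), len(transaction)
    return any(list(pattern) == list(transaction[b:b + n])
               for b in range(m - n + 1))

def sequence_support(pattern, transactions, contiguous=False):
    test = is_contiguous if contiguous else is_subsequence
    return sum(test(pattern, t) for t in transactions) / len(transactions)
```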
Frequent itemsets, discovered as part of association rule mining, represent the least restrictive type of navigational patterns, because they focus on the presence of items rather than the order in which they occur within the user session.

The view of Web transactions as sequences of pageviews allows us to employ a number of useful and well-studied models that can be used to discover or analyze user navigation patterns. One such approach is to model the navigational activity in the Website as a Markov chain. In general, a Markov model is characterized by a set of states $\{s_1, s_2, \ldots, s_n\}$ and a transition probability matrix $[p_{i,j}]_{n \times n}$, where $p_{i,j}$ represents the probability of a transition from state $s_i$ to state $s_j$. Markov models are especially suited for predictive modeling based on contiguous sequences of events. Each state represents a contiguous subsequence of prior events. The order of the Markov model corresponds to the number of prior events used in predicting a future event; thus, a kth-order Markov model predicts the probability of the next event by looking at the past k events. Given a set of all paths $R$, the probability of reaching a state $s_j$ from a state $s_i$ via a (noncyclic) path $r \in R$ is given by $p(r) = \prod_{k=i}^{j-1} p_{k,k+1}$, and the probability of reaching $s_j$ from $s_i$ is the sum over all such paths: $p(j \mid i) = \sum_{r \in R} p(r)$.

In the context of Web transactions, Markov chains can be used to model transition probabilities between pageviews. In Web usage analysis, they have been proposed as the underlying modeling machinery for Web prefetching applications or for minimizing system latencies [Deshpande and Karypis, 2001; Palpanas and Mendelzon, 1999; Pitkow and Pirolli, 1999; Sarukkai, 2000]. Such systems are designed to predict the next user action based on a user's previous surfing behavior. In the case of first-order Markov models, only the user's current action is considered in predicting the next action, and thus each state represents a single pageview in the user's transaction. Markov models can also be used to discover high-probability user navigational trails in a Website. For example, in Borges and Levene [1999], the user sessions are modeled as a hypertext probabilistic grammar (or, alternatively, an absorbing Markov chain) whose higher-probability paths correspond to the users' preferred trails. An algorithm is provided to efficiently mine such trails from the model.

As an example of how Web transactions can be modeled as a Markov model, consider the set of Web transactions given in Figure 15.5 (left). The Web transactions involve pageviews A, B, C, D, and E. For each transaction, the frequency of occurrences of that transaction in the data is given in the table's second column (thus there are a total of 50 transactions in the data set). The (absorbing) Markov model for this data is also given in Figure 15.5 (right). The transitions from the "start" state represent the prior probabilities for transactions starting with pageviews A and B. The transitions into the "final" state represent the probabilities that the paths end with the specified originating pageviews. For example, the transition probability from state A to B is 16/28 = 0.57 because, out of the 28 occurrences of A in transactions, B occurs immediately after A in 16 cases. Higher-order Markov models generally provide a higher prediction accuracy.
However, this is usually at the cost of lower coverage and much higher model complexity due to the larger number of states. In order to remedy the coverage and space complexity problems, Pitkow and Pirolli [1999] proposed all-kth-order Markov models (for coverage improvement) and a new state reduction technique called longest repeating subsequences (LRS) (for reducing model size). The use of all-kth-order Markov models generally requires the generation of separate models for each of the k orders; if the model cannot make a prediction using the kth order, it attempts to make a prediction by incrementally decreasing the model order. This scheme can easily lead to even higher space complexity because it requires the representation of all possible states for each k. Deshpande and Karypis [2001] propose selective Markov models, introducing several schemes to tackle the model complexity problem of all-kth-order Markov models. The proposed schemes involve pruning the model based on criteria such as support, confidence, and error rate. In particular, support-pruned Markov models eliminate all states with low support, determined by a minimum frequency threshold.
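To make the first-order case concrete, the following sketch estimates a first-order Markov model, including the artificial "start" and "final" states of Figure 15.5, from a set of transactions using maximum-likelihood transition probabilities. The function name is illustrative.

```python
# Minimal sketch of estimating a first-order Markov model from transactions,
# with artificial "start" and "final" states as in Figure 15.5.
from collections import defaultdict

def first_order_markov(transactions):
    """transactions: iterable of pageview sequences.
    Returns {state: {next_state: probability}} with maximum-likelihood
    transition probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for t in transactions:
        path = ["start"] + list(t) + ["final"]
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    model = {}
    for a, outgoing in counts.items():
        total = sum(outgoing.values())
        model[a] = {b: c / total for b, c in outgoing.items()}
    return model

# For example, the most likely next pageview after state "A" would be
# max(model["A"], key=model["A"].get).
```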
FIGURE 15.5 An example of modeling navigational trails as a Markov chain.
Another way of efficiently representing navigational trails is by inserting each trail into a trie structure [Spiliopoulou and Faulstich, 1999]. It is also possible to insert frequent sequences (after or during sequential pattern mining) into a trie structure [Pei et al., 2000]. A well-known example of this approach is the notion of an aggregate tree introduced as part of the WUM (Web Utilization Miner) system [Spiliopoulou and Faulstich, 1999]. The aggregation service of WUM extracts the transactions from a collection of Web logs, transforms them into sequences, and merges those sequences with the same prefix into the aggregate tree (a trie structure). Each node in the tree represents a navigational subsequence from the root (an empty node) to a page and is annotated by the frequency of occurrences of that subsequence in the transaction data (and possibly other information such as markers to distinguish among repeat occurrences of the corresponding page in the subsequence). WUM uses a powerful mining query language, called MINT, to discover generalized navigational patterns from this trie structure. MINT includes mechanisms to specify sophisticated constraints on pattern templates, such as wildcards with user-specified boundaries, as well as other statistical thresholds such as support and confidence.

As an example, again consider the set of Web transactions given in the previous example. Figure 15.6 shows a simplified version of WUM's aggregate tree structure derived from these transactions. The advantage of this approach is that the search for navigational patterns can be performed very efficiently, and the confidence and support for the sequential patterns can be readily obtained from the node annotations in the tree. For example, consider the navigational sequence $\langle A, B, E, F \rangle$. The support for this sequence can be computed as the support of F divided by the support of the first pageview in the sequence, A, which is 6/28 = 0.21, and the confidence of the sequence is the support of F divided by the support of its parent, E, or 6/16 = 0.375. The disadvantage of this approach is the possibly high space complexity, especially in a site with many dynamically generated pages.

15.3.2.3 Clustering Approaches

In general, there are two types of clustering that can be performed on usage transaction data: clustering the transactions (or users) themselves, or clustering pageviews. Each of these approaches is useful in different applications and, in particular, both approaches can be used for Web personalization. There has been a significant amount of work on the applications of clustering in Web usage mining, e-marketing, personalization, and collaborative filtering. For example, an algorithm called PageGather has been used to discover significant groups of pages based on user access patterns [Perkowitz and Etzioni, 1998]. This algorithm uses, as its basis, clustering of pages based on the Clique (complete link) clustering technique. The resulting clusters are used to automatically synthesize alternative static index pages for a site, each reflecting possible interests of one user segment. Clustering of user rating records has also been used as a prior step to collaborative filtering in order to remedy the scalability problems of the k-nearest-neighbor algorithm [O'Conner and Herlocker, 1999].
FIGURE 15.6 An example of modeling navigational trails in an aggregate tree.
Both transaction clustering and pageview clustering have been used as integrated parts of a Web personalization framework based on Web usage mining [Mobasher et al., 2002b].

Given the mapping of user transactions into a multidimensional space as vectors of pageviews (i.e., the matrix TP in Section 15.2.2), standard clustering algorithms, such as k-means, generally partition this space into groups of transactions that are close to each other based on a measure of distance or similarity among the vectors. Transaction clusters obtained in this way can represent user or visitor segments based on their navigational behavior or other attributes that have been captured in the transaction file. However, transaction clusters by themselves are not an effective means of capturing an aggregated view of common user patterns. Each transaction cluster may potentially contain thousands of user transactions involving hundreds of pageview references. The ultimate goal in clustering user transactions is to provide the ability to analyze each segment for deriving business intelligence, or to use the segments for tasks such as personalization.

One straightforward approach to creating an aggregate view of each cluster is to compute the centroid (or mean vector) of each cluster. The dimension value for each pageview in the mean vector is computed by finding the ratio of the sum of the pageview weights across transactions to the total number of transactions in the cluster. If pageview weights in the original transactions are binary, then the dimension value of a pageview p in a cluster centroid represents the percentage of transactions in the cluster in which p occurs. Thus, the centroid dimension value of p provides a measure of its significance in the cluster. Pageviews in the centroid can be sorted according to these weights, and lower-weight pageviews can be filtered out. The resulting set of pageview-weight pairs can be viewed as an "aggregate usage profile" representing the interests or behavior of a significant group of users. We discuss how such aggregate profiles can be used for personalization in the next section.

As an example, consider the transaction data depicted in Figure 15.7 (left). In this case, the feature (pageview) weights in each transaction vector are binary. We assume that the data has already been clustered using a standard clustering algorithm such as k-means, resulting in three clusters of user transactions. The table in the right portion of Figure 15.7 shows the aggregate profile corresponding to cluster 1. As indicated by the pageview weights, pageviews B and F are the most significant pages characterizing the common interests of users in this segment. Pageview C, however, appears in only one transaction and might be removed given a filtering threshold greater than 0.25.
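A minimal sketch of deriving such an aggregate usage profile from a single transaction cluster is shown below. With binary weights, the computed profile weights are the fractions described above; the default threshold of 0.25 mirrors the filtering example and is otherwise an arbitrary choice.

```python
# Minimal sketch of deriving an aggregate usage profile (filtered centroid)
# from one transaction cluster, following the procedure described above.

def aggregate_profile(cluster, pageviews, threshold=0.25):
    """cluster: list of transaction vectors (rows of TP) assigned to one
    cluster; returns {pageview: weight} keeping only weights above threshold."""
    size = len(cluster)
    profile = {}
    for j, p in enumerate(pageviews):
        # With binary weights this is the fraction of the cluster's
        # transactions containing pageview p.
        weight = sum(row[j] for row in cluster) / size
        if weight > threshold:
            profile[p] = round(weight, 2)
    return profile
```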
FIGURE 15.7 An example of deriving aggregate usage profiles from transaction clusters.
Note that it is possible to apply a similar procedure to the transpose of the matrix TP, resulting in a collection of pageview clusters. However, traditional clustering techniques, such as distance-based methods, generally cannot handle this type of clustering, because the transactions, rather than the pageviews, must be used as dimensions, and their number is in the tens to hundreds of thousands in a typical application. Furthermore, dimensionality reduction in this context may not be appropriate, as removing a significant number of transactions may result in losing too much information. Similarly, the clique-based clustering approach of the PageGather algorithm [Perkowitz and Etzioni, 1998] discussed above can be problematic, because finding all maximal cliques in very large graphs is not, in general, computationally feasible.

One approach that has been shown to be effective for this type of (i.e., item-based) clustering is Association Rule Hypergraph Partitioning (ARHP) [Han et al., 1998]. ARHP can efficiently cluster high-dimensional data sets and provides automatic filtering capabilities. In ARHP, association rule mining is first used to discover a set $I$ of frequent itemsets among the pageviews in $P$. These itemsets are used as hyperedges to form a hypergraph $H = \langle V, E \rangle$, where $V \subseteq P$ and $E \subseteq I$. A hypergraph is an extension of a graph in the sense that each hyperedge can connect more than two vertices. The weight associated with each hyperedge can be computed based on a variety of criteria, such as the confidence of the association rules involving the items in the frequent itemset, the support of the itemset, or the "interest" of the itemset. The hypergraph $H$ is recursively partitioned until a stopping criterion for each partition is reached, resulting in a set of clusters $C$. Each partition is examined to filter out vertices that are not highly connected to the rest of the vertices of the partition. The connectivity of a vertex $v$ (a pageview appearing in the frequent itemset) with respect to a cluster $c$ is defined as

$$\mathrm{conn}(v, c) = \frac{\sum_{e \subseteq c,\, v \in e} \mathrm{weight}(e)}{\sum_{e \subseteq c} \mathrm{weight}(e)}$$
A high connectivity value suggests that the vertex has strong edges connecting it to the other vertices in the partition. The vertices with a connectivity measure greater than a given threshold value are considered to belong to the partition, and the remaining vertices are dropped from the partition. The connectivity value of an item (pageview), as defined above, is also important because it is used as the primary factor in determining the weight associated with that item within the resulting aggregate profile. This approach has also been used in the context of Web personalization [Mobasher et al., 2002b], and its performance in terms of recommendation effectiveness has been compared to the transaction clustering approach discussed above.

Clustering can also be applied to Web transactions viewed as sequences rather than as vectors. For example, in Banerjee and Ghosh [2001], a graph-based algorithm was introduced to cluster Web transactions based on a function of longest common subsequences. The novel similarity metric used for clustering takes into account both the time spent on pages and a significance weight assigned to pages.
Finally, we also observe that clustering approaches such as those discussed in this section can also be applied to content data or to the integrated content-enhanced transactions described in Section 15.2.4. For example, the results of clustering user transactions can be combined with "content profiles" derived from the clustering of text features (terms or concepts) in pages [Mobasher et al., 2000b]. The feature clustering is accomplished by applying a clustering algorithm to the transpose of the pageview–feature matrix PF, defined earlier. This approach treats each feature as a vector over the space of pageviews. Thus the centroid of a feature cluster can be viewed as a set (or vector) of pageviews with associated weights. This representation is similar to that of the usage profiles discussed above; however, in this case the weight of a pageview in a profile represents the prominence of the features in that pageview that are associated with the corresponding cluster. The combined set of content and usage profiles can then be used seamlessly for more effective Web personalization. One advantage of this approach is that it solves the "new item" problem that often plagues purely usage-based or collaborative approaches: when a new item (e.g., a page or product) has recently been added to the site, it is not likely to appear in usage profiles due to the lack of user ratings or accesses to that page, but it may still be recommended according to its semantic attributes captured by the content profiles.
15.4 Using the Discovered Patterns for Personalization

As noted in the Introduction, the goal of the recommendation engine is to match the active user session with the aggregate profiles discovered through Web usage mining and to recommend a set of objects to the user. We refer to the set of recommended objects (represented by pageviews) as the recommendation set. In this section we explore recommendation procedures that perform the matching between the discovered aggregate profiles and an active user's session. Specifically, we present several effective recommendation algorithms based on clustering (which can be seen as an extension of standard kNN-based collaborative filtering), association rule mining (AR), and sequential pattern (SP) or contiguous sequential pattern (CSP) discovery. In the cases of AR, SP, and CSP, we consider efficient and scalable data structures for storing frequent itemsets and sequential patterns, as well as recommendation generation algorithms that use these data structures to directly produce real-time recommendations (without the a priori generation of all rules).

Generally, only a portion of the current user's activity is used in the recommendation process. Maintaining a history depth is necessary because most users navigate several paths leading to independent pieces of information within a session. In many cases these sub-sessions have a length of no more than three or four references. In such a situation, it may not be appropriate to use references a user made in a previous sub-session to make recommendations during the current sub-session. We can capture the user history depth within a sliding window over the current session. A sliding window of size n over the active session allows only the last n visited pages to influence the recommendation value of items in the recommendation set. For example, if the current session (with a window size of 3) is $\langle A, B, C \rangle$ and the user accesses the pageview D, then the new active session becomes $\langle B, C, D \rangle$. We call this sliding window the user's active session window.

Structural characteristics of the site or prior domain knowledge can also be used to associate an additional measure of significance with each pageview in the user's active session. For instance, the site owner or the site designer may wish to consider certain page types (e.g., content vs. navigational) or product categories as having more significance in terms of their recommendation value. In this case, significance weights can be specified as part of the domain knowledge.
15.4.1 The kNN-Based Approach

Collaborative filtering based on the kNN approach involves comparing the activity record for a target user with the historical records of other users in order to find the top k users who have similar tastes or interests. The mapping of a visitor record to its neighborhood could be based on similarity in ratings of items, access to similar content or pages, or purchases of similar items. The identified neighborhood is
then used to recommend items not already accessed or purchased by the active user. Thus, there are two primary phases in collaborative filtering: the neighborhood formation phase and the recommendation phase.

In the context of personalization based on Web usage mining, kNN involves measuring the similarity or correlation between the active session $s$ and each transaction vector $t$ (where $t \in T$). The top $k$ most similar transactions to $s$ are considered to be the neighborhood of the session $s$, which we denote by $NB(s)$ (taking the size $k$ of the neighborhood to be implicit):

$$NB(s) = \{t_{s_1}, t_{s_2}, \ldots, t_{s_k}\}$$

A variety of similarity measures can be used to find the nearest neighbors. In traditional collaborative filtering domains (where feature weights are item ratings on a discrete scale), the Pearson r correlation coefficient is commonly used. This measure is based on the deviations of users' ratings on various items from their mean ratings on all rated items. However, this measure may not be appropriate when the primary data source is clickstream data (particularly in the case of binary weights). Instead we use the cosine coefficient, commonly used in information retrieval, which measures the cosine of the angle between two vectors. The cosine coefficient can be computed by normalizing the dot product of two vectors with respect to their vector norms. Given the active session $s$ and a transaction $t$, the similarity between them is obtained by

$$\mathrm{sim}(t, s) = \frac{t \cdot s}{\|t\| \times \|s\|}.$$

In order to determine which items (not already visited by the user in the active session) are to be recommended, a recommendation score is computed for each pageview $p_i \in P$ based on the neighborhood for the active session. Two factors are used in determining this recommendation score: the overall similarity of the active session to the neighborhood as a whole, and the average weight of each item in the neighborhood. First we compute the mean vector (centroid) of $NB(s)$. Recall that the dimension value for each pageview in the mean vector is computed by finding the ratio of the sum of the pageview's weights across transactions to the total number of transactions in the neighborhood. We denote this vector by $\mathrm{cent}(NB(s))$. For each pageview $p$ in the neighborhood centroid, we can now obtain a recommendation score as a function of the similarity of the active session to the centroid vector and the weight of that item in the centroid. Here we have chosen to use the following function, denoted by $\mathrm{rec}(s, p)$:

$$\mathrm{rec}(s, p) = \mathrm{weight}(p, NB(s)) \times \mathrm{sim}(s, \mathrm{cent}(NB(s)))$$

where $\mathrm{weight}(p, NB(s))$ is the mean weight for pageview $p$ in the neighborhood, as expressed in the centroid vector. If the pageview $p$ is in the current active session, then its recommendation value is set to zero. If a fixed number $N$ of recommendations is desired, then the top $N$ items with the highest recommendation scores are considered to be part of the recommendation set. In our implementation, we normalize the recommendation scores for all pageviews in the neighborhood (so that the maximum recommendation score is 1) and return only those that satisfy a threshold test. In this way, we can compare the performance of kNN across different recommendation thresholds.
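The following sketch summarizes the kNN procedure just described: cosine similarity, neighborhood formation, centroid computation, and normalized recommendation scores. It is a simplified illustration over dense vectors; the function and variable names are invented for this example.

```python
# Minimal sketch of the kNN-based recommendation procedure described above.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_recommend(session, transactions, k=3):
    """session and transactions are vectors over the pageview space.
    Returns normalized recommendation scores (max score is 1)."""
    # Neighborhood: the k transactions most similar to the active session.
    neighborhood = sorted(transactions, key=lambda t: cosine(t, session),
                          reverse=True)[:k]
    # Centroid (mean vector) of the neighborhood.
    centroid = [sum(col) / len(neighborhood) for col in zip(*neighborhood)]
    sim = cosine(session, centroid)
    # rec(s, p) = weight(p, NB(s)) * sim(s, cent(NB(s))), zeroed for pages
    # already in the active session.
    scores = [0.0 if session[j] > 0 else centroid[j] * sim
              for j in range(len(session))]
    top = max(scores) or 1.0
    return [sc / top for sc in scores]
```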
15.4.2 Using Clustering for Personalization

The transaction clustering approach discussed in Section 15.3.2 results in a set $TC = \{c_1, c_2, \ldots, c_k\}$ of transaction clusters, where each $c_i$ is a subset of the set of transactions $T$. As noted in that section, from
each transaction cluster we can derive an aggregate usage profile by computing the centroid vector for that cluster. We call this method PACT (Profile Aggregation Based on Clustering Transactions) [Mobasher et al., 2002b]. In general, PACT can take a number of other factors into account in determining the item weights within each profile and in determining the recommendation scores. These additional factors may include the link distance of pageviews from the current user location within the site or the rank of the profile in terms of its significance. However, to be able to consistently compare the performance of the clustering-based approach to that of kNN, we restrict the item weights to be the mean feature values of the transaction cluster centroids. In this context, the only difference between PACT and the kNN-based approach is that we discover the transaction clusters offline and independently of a particular target user session.

To summarize the PACT method, given a transaction cluster $c$, we construct an aggregate usage profile $pr_c$ as a set of pageview-weight pairs:
$$pr_c = \{\langle p, \mathrm{weight}(p, pr_c)\rangle \mid p \in P,\ \mathrm{weight}(p, pr_c) \geq \mu\}$$

where the significance weight, $\mathrm{weight}(p, pr_c)$, of the pageview $p$ within the usage profile $pr_c$ is given by
$$\mathrm{weight}(p, pr_c) = \frac{1}{|c|} \sum_{t \in c} w_p^t$$
and $w_p^t$ is the weight of pageview $p$ in transaction $t \in c$. The threshold parameter $\mu$ is used to prune out very low-support pageviews in the profile. An example of deriving aggregate profiles from transaction clusters was given in the previous section (see Figure 15.7).

This process results in a number of aggregate profiles, each of which can, in turn, be represented as a vector in the original $n$-dimensional space of pageviews. The recommendation engine can compute the similarity of an active session $s$ with each of the discovered aggregate profiles. The top matching profile is used to produce a recommendation set in a manner similar to that of the kNN approach discussed in the preceding text. If $pr$ is the vector representation of the top matching profile, we compute the recommendation score for the pageview $p$ by
$$\mathrm{rec}(s, p) = \mathrm{weight}(p, pr) \times \mathrm{sim}(s, pr),$$

where $\mathrm{weight}(p, pr)$ is the weight of pageview $p$ in the profile $pr$. As in the case of kNN, if the pageview $p$ is in the current active session, then its recommendation value is set to zero.

Clearly, PACT results in a dramatic improvement in scalability and computational performance, because most of the computational cost is incurred during the offline clustering phase. We would expect, however, that this decrease in computational cost would be accompanied by a decrease in recommendation effectiveness. Experimental results [Mobasher et al., 2001b] have shown that, through proper data preprocessing and some of the data transformation steps discussed earlier, we can dramatically improve the recommendation effectiveness when compared to kNN.

It should be noted that the pageview clustering approach discussed in Section 15.3.2 can also be used with the recommendation procedure detailed above. In that case, also, the aggregate profiles are represented as collections of pageview-weight pairs and thus can be viewed as vectors over the space of pageviews in the data.
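A minimal sketch of the PACT recommendation step is given below; it assumes the aggregate profiles have already been derived offline as vectors over the pageview space, and it repeats a small cosine helper for self-containment. Names are illustrative.

```python
# Minimal sketch of the PACT recommendation step: match the active session
# against aggregate profiles and score pageviews from the best match.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pact_recommend(session, profiles):
    """session: active-session vector; profiles: list of aggregate-profile
    vectors derived offline from transaction clusters."""
    best = max(profiles, key=lambda pr: cosine(pr, session))
    sim = cosine(session, best)
    # rec(s, p) = weight(p, pr) * sim(s, pr), zeroed for pages already visited.
    return [0.0 if session[j] > 0 else best[j] * sim
            for j in range(len(session))]
```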
15.4.3 Using Association Rules for Personalization

The recommendation engine based on association rules matches the current user session window against frequent itemsets to find candidate pageviews for giving recommendations. Given an active session window $w$ and a group of frequent itemsets, we consider only the frequent itemsets of size $|w| + 1$
containing the current session window. The recommendation value of each candidate pageview is based on the confidence of the corresponding association rule whose consequent is the singleton containing the pageview to be recommended. In order to facilitate the search for itemsets (of size |w| + 1) containing the current session window w, the frequent itemsets are stored in a directed acyclic graph, here called a Frequent Itemset Graph. The Frequent Itemset Graph is an extension of the lexicographic tree used in the "tree projection algorithm" [Agarwal et al., 1999]. The graph is organized into levels from 0 to k, where k is the maximum size among all frequent itemsets. Each node at depth d in the graph corresponds to an itemset I of size d and is linked to itemsets of size d + 1 that contain I at level d + 1. The single root node at level 0 corresponds to the empty itemset. To be able to match different orderings of an active session with frequent itemsets, all itemsets are sorted in lexicographic order before being inserted into the graph. The user's active session is also sorted in the same manner before matching with patterns.
Given an active user session window w, sorted in lexicographic order, a depth-first search of the Frequent Itemset Graph is performed to level |w|. If a match is found, then the children of the matching node n containing w are used to generate candidate recommendations. Each child node of n corresponds to a frequent itemset w ∪ {p}. In each case, the pageview p is added to the recommendation set if the support ratio σ(w ∪ {p}) / σ(w) is greater than or equal to α, where α is a minimum confidence threshold. Note that σ(w ∪ {p}) / σ(w) is the confidence of the association rule w ⇒ {p}. The confidence of this rule is also used as the recommendation score for pageview p. It is easy to observe that in this algorithm the search process requires only O(|w|) time given active session window w.
To illustrate the process, consider the example transaction set given in Figure 15.8. Using these transactions, the Apriori algorithm with a frequency threshold of 4 (minimum support of 0.8) generates the itemsets given in Figure 15.9. Figure 15.10 shows the Frequent Itemset Graph constructed from the frequent itemsets in Figure 15.9. Now, given the active session window ⟨B, E⟩, the recommendation generation algorithm finds items A and C as candidate recommendations. The recommendation scores of items A and C are 1 and 4/5, corresponding to the confidences of the rules {B, E} → {A} and {B, E} → {C}, respectively.
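The matching step just described can be illustrated with a short sketch. Rather than an explicit node-and-link graph, the sketch below indexes the frequent itemsets in a dictionary keyed by the itemset itself; the function name, the dictionary-based storage, and the threshold name alpha are assumptions made for the example and are not the authors' data structures.

```python
def recommend_ar(window, itemsets, alpha=0.5):
    """Generate recommendations from frequent itemsets.

    window:   set of pageviews in the active session window w
    itemsets: dict mapping frozenset(itemset) -> support count
    alpha:    minimum confidence threshold
    Returns {pageview: confidence of the rule w => {p}}.
    """
    w = frozenset(window)
    if w not in itemsets:
        return {}
    recs = {}
    for iset, sup in itemsets.items():
        # Candidate itemsets have size |w| + 1 and contain the window.
        if len(iset) == len(w) + 1 and w < iset:
            (p,) = iset - w
            conf = sup / itemsets[w]
            if conf >= alpha:
                recs[p] = conf
    return recs

# Frequent itemsets of Figure 15.9 (support counts as shown there).
itemsets = {
    frozenset("A"): 5, frozenset("B"): 6, frozenset("C"): 4, frozenset("E"): 5,
    frozenset("AB"): 5, frozenset("AC"): 4, frozenset("AE"): 5,
    frozenset("BC"): 4, frozenset("BE"): 5, frozenset("CE"): 4,
    frozenset("ABC"): 4, frozenset("ABE"): 5, frozenset("ACE"): 4,
    frozenset("BCE"): 4, frozenset("ABCE"): 4,
}
print(recommend_ar({"B", "E"}, itemsets))   # {'A': 1.0, 'C': 0.8}
```

With the window {B, E}, the only supersets of size 3 are {A, B, E} and {B, C, E}, giving the confidences 1 and 4/5 quoted in the text.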
15.4.4 Using Sequential Patterns for Personalization
The recommendation algorithm based on association rules can be adapted to work also with sequential or contiguous sequential patterns. In this case, we focus on frequent (contiguous) sequences of size |w| + 1 whose prefix contains an active user session w. The candidate pageviews to be recommended are the last items in all such sequences. The recommendation values are based on the confidence of the patterns. If the confidence satisfies a threshold requirement, then the candidate pageviews are added to the recommendation set.

T1: {A, B, D, E}
T2: {A, B, E, C, D}
T3: {A, B, E, C}
T4: {B, E, B, A, C}
T5: {D, A, B, E, C}

FIGURE 15.8 Sample Web transactions involving pageviews A, B, C, D, and E.
Size 1: {A}(5)   {B}(6)   {C}(4)   {E}(5)
Size 2: {A,B}(5)   {A,C}(4)   {A,E}(5)   {B,C}(4)   {B,E}(5)   {C,E}(4)
Size 3: {A,B,C}(4)   {A,B,E}(5)   {A,C,E}(4)   {B,C,E}(4)
Size 4: {A,B,C,E}(4)

FIGURE 15.9 Example of discovered frequent itemsets.
FIGURE 15.10 An example of a Frequent Itemset Graph (the frequent itemsets of Figure 15.9 arranged by size, from the empty root at depth 0 to the size-4 itemset at depth 4).
A simple trie structure, which we call a Frequent Sequence Trie (FST), can be used to store both the sequential and contiguous sequential patterns discovered during the pattern discovery phase. The FST is organized into levels from 0 to k, where k is the maximal size among all sequential or contiguous sequential patterns. There is a single root node at depth 0 containing the empty sequence. Each nonroot node N at depth d contains an item s_d and represents a frequent sequence ⟨s_1, s_2, ..., s_{d-1}, s_d⟩ whose prefix ⟨s_1, s_2, ..., s_{d-1}⟩ is the pattern represented by the parent node of N at depth d - 1. Furthermore, along with each node we store the support (or frequency) value of the corresponding pattern. The confidence of each pattern (represented by a nonroot node in the FST) is obtained by dividing the support of the current node by the support of its parent node.
The recommendation algorithm based on sequential and contiguous sequential patterns has a structure similar to that of the algorithm based on association rules. For each active session window w = ⟨w_1, w_2, ..., w_n⟩, we perform a depth-first search of the FST to level n. If a match is found, then the children of the matching node N are used to generate candidate recommendations. Given a sequence S = ⟨w_1, w_2, ..., w_n, p⟩ represented by a child node of N, the item p is added to the recommendation set as long as the confidence of S is greater than or equal to the confidence threshold. As in the case of the Frequent Itemset Graph, the search process requires O(|w|) time for an active session window of size |w|.
To continue our example, Figure 15.11 and Figure 15.12 show the frequent sequential patterns and frequent contiguous sequential patterns, respectively, with a frequency threshold of 4 over the example transaction set given in Figure 15.8.

Size 1: ⟨A⟩(5)   ⟨B⟩(6)   ⟨C⟩(4)   ⟨E⟩(5)
Size 2: ⟨A,B⟩(4)   ⟨A,C⟩(4)   ⟨A,E⟩(4)   ⟨B,C⟩(4)   ⟨B,E⟩(5)   ⟨C,E⟩(4)
Size 3: ⟨A,B,E⟩(4)   ⟨A,E,C⟩(4)   ...

FIGURE 15.11 Example of discovered sequential patterns.
Size 1: ⟨A⟩(5)   ⟨B⟩(6)   ⟨C⟩(4)   ⟨E⟩(5)
Size 2: ⟨A,B⟩(4)   ⟨B,E⟩(4)

FIGURE 15.12 Example of discovered contiguous sequential patterns.
FIGURE 15.13 Example of a Frequent Sequence Trie (FST) for the sequential patterns of Figure 15.11.
Figure 15.13 and Figure 15.14 show the trie representations of the sequential and contiguous sequential patterns listed in Figure 15.11 and Figure 15.12, respectively. The sequential pattern ⟨A, B, E⟩ appears in Figure 15.13 because it is a subsequence of four transactions: T1, T2, T3, and T5. However, ⟨A, B, E⟩ is not a frequent contiguous sequential pattern because only three transactions (T2, T3, and T5) contain the contiguous sequence ⟨A, B, E⟩. Given a user's active session window ⟨A, B⟩, the recommendation engine using sequential patterns finds item E as a candidate recommendation. The recommendation score of item E is 1, corresponding to the rule ⟨A, B⟩ ⇒ ⟨E⟩. On the other hand, the recommendation engine using contiguous sequential patterns will, in this case, fail to give any recommendations.
It should be noted that, depending on the specified support threshold, it might be difficult to find large enough itemsets or sequential patterns that could be used for providing recommendations, leading to reduced coverage. This is particularly true for sites with very small average session sizes. An alternative to reducing the support threshold in such cases would be to reduce the session window size. This latter choice may itself lead to some undesired effects, since we may not be taking enough of the user's activity history into account. Generally, in the context of recommendation systems, using a larger window size over the active session can achieve better prediction accuracy. But, as in the case of a higher support threshold, larger window sizes also lead to lower recommendation coverage.
In order to overcome this problem, we can use the all-kth-order approach discussed in the previous section in the context of Markov chain models. The above recommendation framework for contiguous sequential patterns is essentially equivalent to kth-order Markov models; however, rather than storing all navigational sequences, only the frequent sequences resulting from the sequential pattern mining process are stored. In this sense, the above method is similar to the support-pruned models described in the previous section [Deshpande and Karypis, 2001], except that the support pruning is performed by the Apriori algorithm in the mining phase. Furthermore, in contrast to standard all-kth-order Markov models, this
FIGURE 15.14 Example of an FST for contiguous sequences.
framework does not require additional storage because all the necessary information (for all values of k) is captured by the FST structure described above. The notion of all-kth-order models can also be easily extended to the context of general sequential patterns and association rules. We extend these recommendation algorithms to generate all-kth-order recommendations as follows. First, the recommendation engine uses the largest possible active session window as its input. If the engine cannot generate any recommendations, the size of the active session window is iteratively decreased until a recommendation is generated or the window size becomes 0.
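The recommendation step over a trie of frequent sequences, together with the all-kth-order fallback just described, might look like the following minimal sketch. The nested-dictionary trie and the function names are illustrative assumptions; an actual FST would be populated from the output of the sequential pattern mining phase.

```python
def build_fst(patterns):
    """Build a trie from frequent sequences.

    patterns: dict mapping tuple(sequence) -> support count.
    Each trie node is {"support": count, "children": {item: node}}.
    """
    root = {"support": None, "children": {}}
    for seq, sup in sorted(patterns.items(), key=lambda x: len(x[0])):
        node = root
        for item in seq:
            node = node["children"].setdefault(
                item, {"support": None, "children": {}})
        node["support"] = sup
    return root

def recommend_seq(window, root, alpha=0.5):
    """Recommend items extending the window, scored by rule confidence."""
    node = root
    for item in window:                      # descend to the node matching w
        if item not in node["children"]:
            return {}
        node = node["children"][item]
    if not node["support"]:
        return {}
    recs = {}
    for item, child in node["children"].items():
        conf = child["support"] / node["support"]
        if conf >= alpha:
            recs[item] = conf
    return recs

def recommend_all_kth(session, root, alpha=0.5, max_window=3):
    """All-kth-order fallback: shrink the window until something matches."""
    for k in range(min(max_window, len(session)), 0, -1):
        recs = recommend_seq(session[-k:], root, alpha)
        if recs:
            return recs
    return {}

# A subset of the sequential patterns of Figure 15.11, for illustration.
patterns = {("A",): 5, ("B",): 6, ("E",): 5, ("A", "B"): 4,
            ("B", "E"): 5, ("A", "B", "E"): 4}
fst = build_fst(patterns)
print(recommend_all_kth(["A", "B"], fst))   # {'E': 1.0}
```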
15.5 Conclusions and Outlook
In this chapter we have attempted to present a comprehensive view of the personalization process based on Web usage mining. The overall framework for this process was depicted in Figure 15.1 and Figure 15.2. In the context of this framework, we have discussed a host of Web usage mining activities necessary for this process, including the preprocessing and integration of data from multiple sources, and pattern discovery techniques that are applied to the integrated usage data. We have also presented a number of specific recommendation algorithms for combining the discovered knowledge with the current status of a user's activity in a Website to provide personalized content to a user. The approaches we have detailed show how pattern discovery techniques such as clustering, association rule mining, and sequential pattern discovery, performed on Web usage data, can be leveraged effectively as an integrated part of a Web personalization system.
In this concluding section, we provide a brief discussion of the circumstances under which some of the approaches discussed might provide a more effective alternative to the others. We also identify the primary problems, the solutions of which may lead to the creation of the next generation of more effective and useful Web-personalization and Web-mining tools.
15.5.1 Which Approach?
Personalization systems are often evaluated based on two statistical measures, namely precision and coverage (also known as recall). These measures are adaptations of similarly named measures often used in evaluating the effectiveness of information retrieval systems. In the context of personalization, precision measures the degree to which the recommendation engine produces accurate recommendations (i.e., the proportion of relevant recommendations to the total number of recommendations), while coverage (or recall) measures the ability of the recommendation engine to produce all of the pageviews that are likely to be visited by the user (i.e., the proportion of relevant recommendations to all pageviews that will be visited, according to some evaluation data set). Neither of these measures individually is sufficient to evaluate the performance of the recommendation engine; however, they are both critical. A low precision in this context will likely result in angry customers or visitors who are not interested in the recommended items,
whereas low coverage will result in the inability of the site to produce relevant cross-sell recommendations at critical points in the user's interaction with the site. In previous work [Mobasher et al., 2001a, 2002a, 2002b], many of the approaches presented in this chapter have been evaluated based on these measures using real usage data. Here we present a summary of the findings.
In the case of clustering approaches, we have compared the performance of the transaction clustering method, PACT, with the pageview clustering approach based on hypergraph partitioning, ARHP [Mobasher et al., 2002a]. In general, the ARHP approach performs better when the data set is filtered to focus on more "interesting" objects (e.g., content-oriented pages that are situated more deeply within the site). It seems to produce a smaller set of high-quality, more specialized recommendations even when a small portion of the user's clickstream is used by the recommendation engine. On the other hand, PACT provides a clear performance advantage when dealing with all the relevant pageviews in the site, particularly as the session window size is increased. Thus, if the goal is to provide a smaller number of highly focused recommendations, then the ARHP approach may be a more appropriate method. This is particularly the case if only specific portions of the site (such as product-related or content pages) are to be personalized. On the other hand, if the goal is to provide a more generalized personalization solution integrating both content and navigational pages throughout the whole site, then using PACT as the underlying aggregate profile generation method seems to provide clear advantages. More generally, clustering, in contrast to association rule or sequential pattern mining, provides a more flexible mechanism for personalization even though it does not always lead to the highest recommendation accuracy. The flexibility comes from the fact that many inherent attributes of pageviews can be taken into account in the mining process, such as time durations and possibly relational attributes of the underlying objects.
The association rule (AR) models also perform well in the context of personalization. In general, the precision of AR models is lower than that of the models based on sequential patterns (SP) and contiguous sequential patterns (CSP), but they often provide much better coverage. Comparisons with kNN have shown that all of these techniques outperform kNN in terms of precision. In general, kNN provides better coverage (usually on par with the AR model), but the difference in coverage is diminished if we insist on higher recommendation thresholds (and thus more accurate recommendations). Overall, the SP and the AR models provide the best choices for personalization applications. The CSP model can do better in terms of precision, but the coverage levels are often too low when the goal is to generate as many good recommendations as possible. This last observation about the CSP models, however, does not extend to other predictive applications such as prefetching, where the goal is to predict the immediate next action of the user (rather than providing a broader set of recommendations). In this case, the goal is not usually to maximize coverage, and the high precision of CSP makes it an ideal choice for this type of application. The structure and the dynamic nature of a Website can also have an impact on the choice between sequential and nonsequential models.
For example, in a highly connected site, reliance on fine-grained sequential information in user trails is less meaningful. On the other hand, in a site with many dynamically generated pages, where a contiguous navigational path often represents a semantically meaningful sequence of user actions, each depending on the previous ones, sequential models are better suited to providing useful recommendations.
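As a concrete illustration of the precision and coverage measures used in the comparisons above, the following sketch evaluates one recommendation set against the pageviews a user actually visited in a held-out portion of a session; the function name and the simple set-based evaluation are assumptions made for the example, not the evaluation code used in the cited studies.

```python
def precision_and_coverage(recommended, visited):
    """Evaluate one recommendation set.

    recommended: set of recommended pageviews
    visited:     set of pageviews actually visited in the evaluation data
    """
    relevant = recommended & visited
    precision = len(relevant) / len(recommended) if recommended else 0.0
    coverage = len(relevant) / len(visited) if visited else 0.0
    return precision, coverage

# Example: three of four recommendations were visited, out of five visits.
print(precision_and_coverage({"A", "B", "C", "D"}, {"A", "B", "C", "E", "F"}))
# (0.75, 0.6)
```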
15.5.2 The Future: Personalization Based on Semantic Web Mining
Usage patterns discovered through Web usage mining are effective in capturing item-to-item and user-to-user relationships and similarities at the level of user sessions. However, without the benefit of deeper domain knowledge, such patterns provide little insight into the underlying reasons for which such items or users are grouped together. It is possible to capture some of the site semantics by integrating keyword-based content-filtering approaches with collaborative filtering and usage-mining techniques. These approaches, however, are incapable of capturing more complex relationships at a deeper semantic level based on the attributes associated with structured objects.
Indeed, with the growing interest in the notion of the semantic Web, an increasing number of sites use structured semantics and domain ontologies as part of the site design, creation, and content delivery. The primary challenge for the next generation of personalization systems is to effectively integrate semantic knowledge from domain ontologies into the various parts of the process, including the data preparation, pattern discovery, and recommendation phases. Such a process must involve some or all of the following tasks and activities:
1. Ontology learning, extraction, and preprocessing: Given a page in the Web site, we must be able to extract domain-level structured objects as semantic entities contained within this page. This task may involve the automatic extraction and classification of objects of different types into classes based on the underlying domain ontologies. The domain ontologies, themselves, may be prespecified or may be learned automatically from available training data [Craven et al., 2000]. Given this capability, the transaction data can be transformed into a representation that incorporates complex semantic entities accessed by users during a visit to the site.
2. Semantic data mining: In the pattern discovery phase, data-mining algorithms must be able to deal with complex semantic objects. A substantial body of work in this area already exists, including extensions of data-mining algorithms (such as association rule mining and clustering) that take into account a concept hierarchy over the space of items. Techniques developed in the context of "relational" data mining are also relevant here. Indeed, domain ontologies are often expressed as relational schema consisting of multiple relations, and relational data-mining techniques have focused on precisely this type of data.
3. Domain-level aggregation and representation: Given a set of structured objects representing a discovered pattern, we must then be able to create an aggregated representation as a set of pseudo objects, each characterizing objects of different types occurring commonly across the user sessions. Let us call such a set of aggregate pseudo objects a Domain-level Aggregate Profile. Thus, a domain-level aggregate profile characterizes the activity of a group of users based on the common properties of objects as expressed in the domain ontology. This process will require both general and domain-specific techniques for comparison and aggregation of complex objects, including ontology-based semantic similarity measures.
4. Ontology-based recommendations: Finally, the recommendation process must also incorporate semantic knowledge from the domain ontologies. This requires further processing of the user's activity record according to the ontological structure of the objects accessed and the comparison of the transformed "semantic transactions" to the discovered domain-level aggregate profiles. To produce useful recommendations for users, the results of this process must be instantiated to a set of real objects or pages that exist in the site.
The notion of "Semantic Web Mining" was introduced in Berendt et al. [2002a]. Furthermore, a general framework was proposed for the extraction of a concept hierarchy from the site content and the application of data-mining techniques to find frequently occurring combinations of concepts.
An approach to integrating domain ontologies into the personalization process based on Web usage mining was proposed in Dai and Mobasher [2002], including an algorithm to construct domain-level aggregate profiles from a collection of semantic objects extracted from user transactions. Efforts in this direction are likely to be the most fruitful in the creation of much more effective Web usage mining and personalization systems that are consistent with the emergence and proliferation of the semantic Web.
References

Agarwal, R., C. Aggarwal, and V. Prasad. A Tree Projection Algorithm for Generation of Frequent Itemsets. In Proceedings of the High Performance Data Mining Workshop, Puerto Rico, April 1999.
Aggarwal, C. C., J. L. Wolf, and P. S. Yu. A New Method for Similarity Indexing for Market Data. In Proceedings of the 1999 ACM SIGMOD Conference, Philadelphia, PA, June 1999.
Agrawal, R. and R. Srikant. Fast Algorithms for Mining Association Rules. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'94), Santiago, Chile, September 1994.
Agrawal, R. and R. Srikant. Mining Sequential Patterns. In Proceedings of the International Conference on Data Engineering (ICDE'95), Taipei, Taiwan, March 1995.
Banerjee, A. and J. Ghosh. Clickstream Clustering Using Weighted Longest Common Subsequences. In Proceedings of the Web Mining Workshop at the 1st SIAM Conference on Data Mining, Chicago, IL, April 2001.
Berendt, B., A. Hotho, and G. Stumme. Towards Semantic Web Mining. In Proceedings of the First International Semantic Web Conference (ISWC'02), Sardinia, Italy, June 2002a.
Berendt, B., B. Mobasher, M. Nakagawa, and M. Spiliopoulou. The Impact of Site Structure and User Environment on Session Reconstruction in Web Usage Analysis. In Proceedings of the 4th WebKDD 2002 Workshop, at the ACM-SIGKDD Conference on Knowledge Discovery in Databases, Edmonton, Alberta, Canada, July 2002b.
Berendt, B. and M. Spiliopoulou. Analysing navigation behaviour in web sites integrating multiple information systems. VLDB Journal, Special Issue on Databases and the Web, 9(1): 56-75, 2000.
Borges, J. and M. Levene. Data Mining of User Navigation Patterns. In B. Masand and M. Spiliopoulou, Eds., Web Usage Analysis and User Profiling: Proceedings of the WEBKDD'99 Workshop, LNAI 1836, pp. 92-111. Springer-Verlag, New York, 1999.
Buchner, A. and M. D. Mulvenna. Discovering internet marketing intelligence through online analytical Web usage mining. SIGMOD Record, 4(27), 1999.
Claypool, M., A. Gokhale, T. Miranda, P. Murnikov, D. Netes, and M. Sartin. Combining Content-based and Collaborative Filters in an Online Newspaper. In Proceedings of the ACM SIGIR'99 Workshop on Recommender Systems: Algorithms and Evaluation, Berkeley, CA, August 1999.
Cooley, R. Web Usage Mining: Discovery and Application of Interesting Patterns from Web Data. Ph.D. dissertation, Department of Computer Science, University of Minnesota, Minneapolis, MN, 2000.
Cooley, R., B. Mobasher, and J. Srivastava. Data preparation for mining World Wide Web browsing patterns. Journal of Knowledge and Information Systems, 1(1), 1999.
Craven, M., D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam, and S. Slattery. Learning to construct knowledge bases from the World Wide Web. Artificial Intelligence, 118(1-2): 69-113, 2000.
Dai, H. and B. Mobasher. Using Ontologies to Discover Domain-Level Web Usage Profiles. In Proceedings of the 2nd Semantic Web Mining Workshop at ECML/PKDD 2002, Helsinki, Finland, August 2002.
Deshpande, M. and G. Karypis. Selective Markov Models for Predicting Web-Page Accesses. In Proceedings of the First International SIAM Conference on Data Mining, Chicago, IL, April 2001.
Frakes, W. B. and R. Baeza-Yates. Information Retrieval: Data Structures and Algorithms. Prentice Hall, Englewood Cliffs, NJ, 1992.
Fu, X., J. Budzik, and K. J. Hammond. Mining Navigation History for Recommendation. In Proceedings of the 2000 International Conference on Intelligent User Interfaces, New Orleans, LA, ACM Press, New York, January 2000.
Han, E., G. Karypis, V. Kumar, and B. Mobasher. Hypergraph based clustering in high-dimensional data sets: A summary of results. IEEE Data Engineering Bulletin, 21(1): 15-22, March 1998.
Herlocker, J., J. Konstan, A. Borchers, and J. Riedl. An Algorithmic Framework for Performing Collaborative Filtering. In Proceedings of the 22nd ACM Conference on Research and Development in Information Retrieval (SIGIR'99), Berkeley, CA, August 1999.
Joachims, T., D. Freitag, and T. Mitchell. WebWatcher: A Tour Guide for the World Wide Web. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI'97), Los Angeles, CA, August 1997.
Konstan, J., B. Miller, D. Maltz, J. Herlocker, L. Gordon, and J. Riedl. GroupLens: Applying collaborative filtering to Usenet news. Communications of the ACM, 40(3), 1997.
Lieberman, H. Letizia: An Agent that Assists Web Browsing. In Proceedings of the 1995 International Joint Conference on Artificial Intelligence (IJCAI'95), Montreal, Canada, August 1995.
Lin, W., S. A. Alvarez, and C. Ruiz. Efficient adaptive-support association rule mining for recommender systems. Data Mining and Knowledge Discovery, 6: 83-105, 2002.
Liu, B., W. Hsu, and Y. Ma. Association Rules with Multiple Minimum Supports. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'99, poster), San Diego, CA, August 1999.
Mobasher, B., R. Cooley, and J. Srivastava. Automatic personalization based on Web usage mining. Communications of the ACM, 43(8): 142-151, 2000a.
Mobasher, B., H. Dai, T. Luo, and M. Nakagawa. Effective Personalization Based on Association Rule Discovery from Web Usage Data. In Proceedings of the 3rd ACM Workshop on Web Information and Data Management (WIDM'01), Atlanta, GA, November 2001a.
Mobasher, B., H. Dai, T. Luo, and M. Nakagawa. Improving the Effectiveness of Collaborative Filtering on Anonymous Web Usage Data. In Proceedings of the IJCAI 2001 Workshop on Intelligent Techniques for Web Personalization (ITWP'01), Seattle, WA, August 2001b.
Mobasher, B., H. Dai, T. Luo, and M. Nakagawa. Using Sequential and Non-Sequential Patterns in Predictive Web Usage Mining Tasks. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM'02), Maebashi City, Japan, December 2002a.
Mobasher, B., H. Dai, T. Luo, Y. Sun, and J. Zhu. Integrating Web Usage and Content Mining for More Effective Personalization. In E-Commerce and Web Technologies: Proceedings of the EC-WEB 2000 Conference, Lecture Notes in Computer Science (LNCS) 1875, pp. 165-176. Springer-Verlag, New York, September 2000b.
Mobasher, B., H. Dai, M. Nakagawa, and T. Luo. Discovery and evaluation of aggregate usage profiles for Web personalization. Data Mining and Knowledge Discovery, 6: 61-82, 2002b.
O'Conner, M. and J. Herlocker. Clustering Items for Collaborative Filtering. In Proceedings of the ACM SIGIR Workshop on Recommender Systems, Berkeley, CA, August 1999.
Palpanas, T. and A. Mendelzon. Web Prefetching Using Partial Match Prediction. In Proceedings of the 4th International Web Caching Workshop (WCW99), San Diego, CA, March 1999.
Pazzani, M. A Framework for Collaborative, Content-Based, and Demographic Filtering. Artificial Intelligence Review, 13(5-6): 393-408, 1999.
Pei, J., J. Han, B. Mortazavi-Asl, and H. Zhu. Mining Access Patterns Efficiently from Web Logs. In Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'00), Kyoto, Japan, April 2000.
Perkowitz, M. and O. Etzioni. Adaptive Web Sites: Automatically Synthesizing Web Pages. In Proceedings of the 15th National Conference on Artificial Intelligence, Madison, WI, July 1998.
Pitkow, J. and P. Pirolli. Mining Longest Repeating Subsequences to Predict WWW Surfing. In Proceedings of the 2nd USENIX Symposium on Internet Technologies and Systems, Boulder, CO, October 1999.
Sarukkai, R. R. Link Prediction and Path Analysis Using Markov Chains. In Proceedings of the 9th International World Wide Web Conference, Amsterdam, May 2000.
Sarwar, B., G. Karypis, J. Konstan, and J. Riedl. Application of Dimensionality Reduction in Recommender Systems: A Case Study. In Proceedings of the WebKDD 2000 Workshop at the ACM-SIGKDD Conference on Knowledge Discovery in Databases (KDD'2000), Boston, MA, August 2000a.
Sarwar, B. M., G. Karypis, J. Konstan, and J. Riedl. Analysis of Recommender Algorithms for E-Commerce. In Proceedings of the 2nd ACM E-Commerce Conference (EC'00), Minneapolis, MN, October 2000b.
Schechter, S., M. Krishnan, and M. D. Smith. Using Path Profiles to Predict HTTP Requests. In Proceedings of the 7th International World Wide Web Conference, Brisbane, Australia, April 1998.
Shardanand, U. and P. Maes. Social Information Filtering: Algorithms for Automating "Word of Mouth." In Proceedings of the Computer-Human Interaction Conference (CHI'95), Denver, CO, May 1995.
Spiliopoulou, M. and H. Faulstich. WUM: A Tool for Web Utilization Analysis. In Proceedings of EDBT Workshop at WebDB'98, LNCS 1590, pp. 184-203. Springer-Verlag, New York, 1999.
Spiliopoulou, M., B. Mobasher, B. Berendt, and M. Nakagawa. A framework for the evaluation of session reconstruction heuristics in Web usage analysis. INFORMS Journal of Computing, Special Issue on Mining Web-Based Data for E-Business Applications, 15(2), 2003.
Srivastava, J., R. Cooley, M. Deshpande, and P. Tan. Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data. SIGKDD Explorations, 1(2): 12-23, 2000.
Tan, P. and V. Kumar. Discovery of Web robot sessions based on their navigational patterns. Data Mining and Knowledge Discovery, 6: 9-35, 2002.
Ungar, L. H. and D. P. Foster. Clustering Methods for Collaborative Filtering. In Proceedings of the Workshop on Recommendation Systems at the 15th National Conference on Artificial Intelligence, Madison, WI, July 1998.
W3C, World Wide Web Consortium. Web Usage Characterization Activity. http://www.w3.org/WCA.
Yu, P. S. Data Mining and Personalization Technologies. In Proceedings of the International Conference on Database Systems for Advanced Applications (DASFAA'99), Hsinchu, Taiwan, April 1999.
Zaiane, O., M. Xin, and J. Han. Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs. In Proceedings of the IEEE Conference on Advances in Digital Libraries (ADL'98), Santa Barbara, CA, April 1998.
16 Agents

Joseph P. Bigus
Jennifer Bigus

CONTENTS
Abstract
16.1 Introduction
16.2 What Is an Intelligent Agent?
16.3 Anatomy of an Agent
  16.3.1 An Agent Architecture
  16.3.2 Sensors: Gathering Input
  16.3.3 Perception
  16.3.4 Decision-Making Behavior
  16.3.5 Communication
  16.3.6 Effectors: Taking Action
  16.3.7 Mobility
  16.3.8 An Example Agent
16.4 Multiagent Teams
16.5 Intelligent Agents on the Internet
  16.5.1 Bots
  16.5.2 Agents behind Websites
16.6 Research Issues Related to Agents and the Internet
  16.6.1 Agents and Human–Computer Interfaces
  16.6.2 Agents and Privacy
  16.6.3 Agents and Security
  16.6.4 Autonomic Computing and Agents
16.7 Summary
16.8 Further Information
16.9 Glossary
Acknowledgments
References
Abstract
The rise of distributed networked computing and of the Internet has spurred the development of autonomous intelligent agents (also called software robots or bots). Software agents are used to advise or assist users in performing tasks on the Internet, to help automate business processes, and to manage the network infrastructure. In this chapter, we explore the essential attributes of intelligent agents, describe an abstract agent architecture, and discuss the functional components required to implement the architecture. We examine the various types of intelligent agents and application-specific bots that are available to help users perform Internet-related tasks and describe how agents are used behind commercial Websites. Finally, we discuss some of the issues surrounding the widespread adoption of agent technology, including the human–computer interface, security, privacy, trust, delegation, and management of the Internet computing infrastructure.
16.1 Introduction
The rise of distributed networked computing and of the Internet has spurred the development of autonomous intelligent agents (also called software robots or bots). Software agents are used to advise or assist users in performing tasks on the Internet, to help automate business processes, and to manage the network infrastructure. While the last decade of Internet growth has provided a fertile and highly visible ground for agent applications, autonomous software agents can trace their heritage back to research in artificial intelligence spanning the past half century.
In this chapter, we explore the essential attributes of intelligent agents, describe a prototypical agent architecture, and discuss the functional components required to implement the architecture. We examine the various types of intelligent agents and application-specific bots that are available to help users perform Internet-related tasks and describe how agents are used behind commercial Websites. Finally, we discuss some of the issues surrounding the widespread adoption of agent technology, including the human–computer interface, security, privacy, trust, delegation, and management of the Internet computing infrastructure.
16.2 What Is an Intelligent Agent?
So, what is an intelligent agent and how does it differ from other software that is used on the desktop or on the Internet? There is no generally agreed-upon definition of intelligent agent, but there are several generally agreed-upon attributes of an agent [Franklin and Graesser, 1996]. First and foremost, agents are autonomous, meaning they can take actions on their own initiative. The user delegates authority to them, so they can make decisions and act on the user's behalf. Second, agents typically run for days or weeks at a time. This means that agents can monitor and collect data over a substantial time interval. Because they are long running, agents can "get to know" the user by watching the user's behavior while performing repetitive tasks [Maes, 1994] and by detecting historical trends in Web data sources that are of interest. While most agents stay in one place, some are mobile, meaning they can move between computer systems on the network. Finally, all agents must be able to communicate, either directly with people using a human–computer interface or with other agents using an agent communication language [Bradshaw, 1997].
16.3 Anatomy of an Agent
In this section we explore the basic architectural elements of an autonomous intelligent software agent. We start with a formal definition to motivate our discussion of the essential technical attributes of an agent. An intelligent agent is an active, persistent software component that can perceive, reason, act, and communicate [Huhns and Singh, 1998]. Let us examine this definition in more detail.
Whereas intelligence, especially artificial intelligence, is somewhat in the eye of the beholder, a commonly accepted notion of intelligence is that of rationality. People are considered rational if they make decisions that help them achieve their goals. Likewise, a rational software agent is one that behaves with human-like decision-making abilities [Russell and Norvig, 1995]. An intelligent agent is a persistent software component, meaning it is not a transient software program but a long-running program whose current state, past experiences, and actions are maintained in persistent memory. For an agent to perceive, it must be able to sense its environment and process external data either through events or by polling. For an agent to reason, it must contain some form of domain knowledge and associated reasoning logic. To take action, an agent must have a decision-making component with the ability to change its environment through direct actions on hardware devices or by invoking actions on other software components. Finally, an agent must be able to communicate with humans and with other agents.
FIGURE 16.1 A basic intelligent agent architecture.
16.3.1 An Agent Architecture
In Figure 16.1 we show a diagram of the major components of an autonomous intelligent agent. The input and output modules, called sensors and effectors, are the interfaces between the agent and its world. The sensory input is preprocessed by an optional perceptual component and passed into a decision-making or behavioral component. This component must include decision-making logic, some domain knowledge that allows the agent to perform the prescribed task, and working memory that stores both short- and long-term memories. Once behavioral decisions are made by the decision-making component, any actions are taken through the effector component.
There are several major variations on the basic architecture described in Figure 16.1. Most of these variations deal with the structure of the behavioral or decision-making component. These include reactive agents that are typically simple stimulus–response agents [Brooks, 1991], deliberative agents that have some reasoning and planning components, and Belief–Desire–Intention (BDI) agents that contain complex internal models representing their beliefs about the state of the world, their current desires, and their committed intentions of what goals are to be achieved [Rao and Georgeff, 1991]. These architectures can be combined and layered, with lower levels being reactive and higher levels being more deliberative [Sloman, 1998]. While simple agents may have their goals hard coded into the decision-making logic, most agents have a control interface that allows a human user to specify goals and to provide feedback to the agent as it tries to meet those goals. Social agents also include the ability to communicate and cooperate with other agents. Many agents also include a learning component that allows the agent to adapt based on experience and user feedback. Figure 16.2 shows a more complete and flexible agent architecture.
In the following sections we describe the purpose and technical requirements of the various components used in the agent architectures, with a special focus on the elements of the behavioral subsystem.
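The component roles in Figure 16.1 can be mirrored in a small control-loop skeleton. The class and method names below are illustrative assumptions, not taken from any particular agent framework; they simply show how sensors, perception, decision-making logic, working memory, and effectors fit together.

```python
class SimpleAgent:
    """A minimal sense-perceive-decide-act agent skeleton."""

    def __init__(self, sensors, effectors, rules):
        self.sensors = sensors      # callables returning raw input
        self.effectors = effectors  # dict: action name -> callable
        self.rules = rules          # decision logic: percept -> action name
        self.memory = []            # working memory of past percepts

    def perceive(self, raw):
        """Preprocess raw sensor data into a percept (trivial here)."""
        return raw.strip().lower() if isinstance(raw, str) else raw

    def decide(self, percept):
        """Map a percept to an action using simple domain knowledge."""
        return self.rules.get(percept, "ignore")

    def step(self):
        """One pass of the agent's control loop."""
        for sense in self.sensors:
            percept = self.perceive(sense())
            self.memory.append(percept)
            action = self.decide(percept)
            if action in self.effectors:
                self.effectors[action](percept)

# Example wiring: one sensor, one effector, one rule.
agent = SimpleAgent(
    sensors=[lambda: "Disk Full"],
    effectors={"alert": lambda p: print("ALERT:", p)},
    rules={"disk full": "alert"},
)
agent.step()
```

A reactive agent would stop at a mapping like this; deliberative and BDI agents replace the decide step with reasoning, planning, or explicit belief and goal models.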
FIGURE 16.2 An expanded intelligent agent architecture.

16.3.2 Sensors: Gathering Input
Any software program needs a way to get input data. For agents, the inputs are provided by sensors, which allow the agent to receive information from its environment. Sensors can be connectors to a Web server to read HTML pages, to an NNTP server to read articles in newsgroups, to an FTP connection to download files, or to messaging components that receive external events containing requests from users or other agents. A sensor could also be implemented to actively poll or request input data from an external hardware device, from a software component, or from another agent in the environment. In this architecture, sensors provide a unified input channel for low-level environmental inputs as well as high-level communications. An alternative would provide a separate channel for human–agent and agent–agent communications.
16.3.3 Perception Although having sensors to gather data from the environment is necessary, the real trick is to turn that data into useful information. Like animals that have very specialized preprocessing systems for touch, smell, taste, hearing, and sight, software agents need similar preprocessing capabilities. The sensors gather raw data, and the perceptual subsystem converts the data into a format that can be easily digested by the decision-making component. For example, as events are streaming into an agent, it may need some way to detect patterns in those events. This job can be performed by an event correlation engine that takes events and turns them into higher-level percepts or situations. Doing this as a preprocessing step greatly lessens the burden on the agent’s decision-making components. Another important job of the perceptual component is to filter out noise. The agent must be able to differentiate between the usual, normal inputs and the unusual, abnormal ones. People have a natural ability to tune out noise and focus their attention on novel or interesting inputs. Agents need this same capability. In order to detect what is abnormal, it is often necessary to build internal models of the world. When data comes in, it is checked against the model of what is expected. If the data matches, then it is a normal occurrence, but if there is a mismatch, it is a new situation that has been detected. This function can also be performed by a separate attentional subsystem that works closely with the perceptual system to not only transform the raw sensor data but also to indicate the relative importance of that data. When exceptional conditions are detected, some agent designs provide special mechanisms for distribution of alarm signals that are processed at a higher priority than other input signals.
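One simple way to implement the "model of what is expected" described above is to keep running statistics over a stream of numeric inputs and flag values that deviate strongly from them. The sketch below is only one plausible realization; the three-standard-deviation threshold and the class name are arbitrary choices for illustration.

```python
class NoveltyFilter:
    """Flags numeric inputs that deviate from a running model of 'normal'."""

    def __init__(self, threshold=3.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0            # sum of squared deviations (Welford)
        self.threshold = threshold

    def observe(self, x):
        """Return True if x is abnormal, then update the model."""
        abnormal = False
        if self.n > 1:
            std = (self.m2 / (self.n - 1)) ** 0.5
            if std > 0 and abs(x - self.mean) > self.threshold * std:
                abnormal = True
        # Welford's online update of the mean and variance estimates
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return abnormal

f = NoveltyFilter()
readings = [10, 11, 9, 10, 12, 10, 11, 50]   # the last value is unusual
print([f.observe(r) for r in readings])      # only the last one is flagged
```

Abnormal percepts flagged this way are natural candidates for the higher-priority alarm handling mentioned above.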
16.3.4 Decision-Making Behavior
The behavioral or decision-making component of an agent can range from a simple Tcl/Tk or Perl script to C++ or Java code, to a complex reasoning and inferencing engine. There are three major pieces of this subsystem: domain knowledge, working memory, and decision-making logic. When the behavior is defined by procedural code, the domain knowledge and decision-making logic can be one and the same. When the behavior is defined by rules or semantic networks and processed by inferencing engines, then the domain knowledge is separate from the decision-making logic. The working memory can be as simple as local data or variables stored as part of the agent, or as sophisticated as an associative or content-addressable memory component. In the next sections we explore details of the decision-making component, including domain knowledge, reasoning, planning, and learning.
16.3.4.1 Domain Knowledge
How do we represent the information that an agent needs to perform its assigned task? The answer depends on the type of knowledge that must be represented in the agent. Perhaps the most common type of knowledge is procedural knowledge. Procedural knowledge is used to encode processes: step-by-step instructions for what to do and in what order. Procedural knowledge can be directly represented by computer programs.
A second type of knowledge is relational knowledge. A common format for relational knowledge is relational databases, where groups of related information are stored in rows or tuples, and the set of related attributes are stored in the columns or fields of the database table. Although this is relational, it does not explicitly allow for the definition of the relationships between the fields. For the latter, graph-based representations such as semantic networks can be used to represent the entities (nodes) and the relationships (links). Another type of knowledge representation is hierarchical or inheritable knowledge. This type of knowledge representation allows "kind-of," "is-a," and "has-a" relationships between objects and allows reasoning about classes using a graph data structure.
Perhaps the most popular knowledge representation is simple if–then rules. Rules are easily understood by nontechnical users and support a declarative knowledge representation. Because each rule stands alone, the knowledge is declared and explicitly defined by the antecedent conditions on the left-hand side of the rule and the consequent actions on the right-hand side of the rule. Note that whereas individual rules are clear and concise, large sets of rules or rulesets are often required to cover any nontrivial domain, introducing complex issues related to rule management, maintenance, priorities, and conflict resolution.
16.3.4.2 Reasoning
Machine reasoning is the use of algorithms to process knowledge in order to infer new facts or to prove that a goal condition is true. If the most common knowledge representation is if–then rules, then the most common machine reasoning algorithms are inferencing using forward or backward chaining. With rule-based reasoning, there are three major components: data, rules, and a control algorithm. In forward chaining, the initial set of facts is expanded through the firing of rules whose antecedent conditions are true, until no more rules can fire [Forgy, 1982]. This process uses domain knowledge to enhance the understanding of a situation using a potentially small set of initial data. For example, given a customer's age and income, forward chaining can be used to infer whether the customer is a senior citizen or is entitled to silver, gold, or platinum level discounts. The backward-chaining algorithm can use the same set of rules as forward chaining, but works back from the goal condition through the antecedent clauses of the rules to find a set of bindings of data to variables such that the goal condition is proved true or false [Bratko, 1986]. For example, an expert system whose goal is to offer product selection advice can backward chain through the rules to guide the customer through the selection process.
A popular alternative to inferencing using Boolean logic is the use of fuzzy logic [Zadeh, 1994]. An advantage of fuzzy rule systems is that linguistic variables and hedges provide an almost natural language-like knowledge representation, allowing expressions such as "almost normal" and "very high" in the rules.
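A minimal forward-chaining loop in the spirit of the description above (far simpler than a production system built on the Rete algorithm [Forgy, 1982]) might look like the following sketch; the encoding of rules as (antecedent set, consequent fact) pairs is an assumption made for the example.

```python
def forward_chain(facts, rules):
    """Repeatedly fire rules whose antecedents are satisfied.

    facts: set of known facts (strings)
    rules: list of (antecedent_set, consequent_fact) pairs
    Returns the expanded set of facts when no more rules can fire.
    """
    facts = set(facts)
    fired = True
    while fired:
        fired = False
        for antecedents, consequent in rules:
            if antecedents <= facts and consequent not in facts:
                facts.add(consequent)
                fired = True
    return facts

rules = [
    ({"age>=65"}, "senior_citizen"),
    ({"senior_citizen", "income>=50k"}, "gold_discount"),
]
print(forward_chain({"age>=65", "income>=50k"}, rules))
# {'age>=65', 'income>=50k', 'senior_citizen', 'gold_discount'}
```

Backward chaining would instead start from a goal such as gold_discount and work back through the rule antecedents to see whether the known facts can prove it.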
16.3.4.3 Planning or Goal-Directed Behavior
When a task-oriented agent is given a goal, the agent must first determine the sequence of actions that must be performed to reach the goal. Planning algorithms are used to go from the initial state of the world to the desired end state or goal by applying a sequence of operators to transform the initial state into intermediate states until the goal state is reached. The sequence of operators defines the plan. Once the planning component determines the sequence of actions, the plan is carried out by the agent to accomplish the task. The main algorithms in AI planning are the operator-based STRIPS approach and the hierarchical task network (HTN) approach [IEEE Expert, 1996].
Although planning looks easy at first glance, planning algorithms become very complex due to constraints on the order of operations and changes in the world that occur after the plan was calculated but before it is completed. Taking into account uncertainties and choosing between alternative plans also complicate things. Because multiple solutions may exist, the number of possible combinations of actions can overwhelm a planning agent due to a combinatorial explosion.
16.3.4.4 Learning
A key differentiator for an agent is the ability to adapt and learn from experience. Despite our best efforts, there is no way to anticipate, a priori, all of the situations that an agent will encounter. Therefore, being able to adapt to changes in the environment and to improve task performance over time is a big advantage that adaptive agents have over agents that cannot learn. There are several common forms of learning, such as rote learning or memorization; induction, or learning by example, where the important parameters of a problem are extracted in order to generalize to novel but similar situations; and chunking, where similar concepts are clustered into classes. There are a wide variety of machine-learning algorithms that can discern patterns in data. These include decision trees, Bayesian networks, and neural networks. Neural networks have found a large niche in applications for classification and prediction using data sets and are a mainstay of business data-mining applications [Bigus, 1996].
There are three major paradigms for learning: supervised, unsupervised, and reinforcement learning. In supervised learning, explicit examples of inputs and corresponding outputs are presented to the learning agent. These could be attributes of an object and its classification, or elements of a function and its output value. Common supervised learning algorithms include decision trees, back propagation neural networks, and Bayesian classifiers. As an example, data-mining tools can use these algorithms to classify customers as good or bad credit risks based on past experience with similar customers and to predict future profitability based on a customer's purchase history. In unsupervised learning, the data is presented to the learning agent and common features are used to group or cluster the data using a similarity or distance metric. Examples of unsupervised learning algorithms include Kohonen map neural networks and K-nearest-neighbor classifiers. A common use of unsupervised learning algorithms is to segment customers into affinity groups and to target specific products or services to members of each group. Reinforcement learning is similar to supervised learning in that explicit examples of inputs are presented to the agent, but instead of including the corresponding output value, a nonspecific reinforcement signal is given after a sequence of inputs is presented. Examples of reinforcement learning algorithms include temporal difference learning and Q-learning.
In addition to the general learning paradigm is the issue of whether the learning agent is trained using all the data at once (batch mode or offline) or whether it can learn from one example at a time (incremental mode or online). In general, incremental learning and the ability to learn from a small number of examples are essential attributes of agent learning.
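To make the idea of incremental (online) learning concrete, the sketch below updates a simple linear scoring model one example at a time from user feedback. The delta-rule style update is just one of many possible algorithms and is not prescribed by this chapter; the class and feature names are assumptions for the example.

```python
class OnlineScorer:
    """Incrementally learned linear model over named features."""

    def __init__(self, rate=0.1):
        self.weights = {}
        self.rate = rate

    def score(self, features):
        """features: dict {name: value}; returns a relevance score."""
        return sum(self.weights.get(f, 0.0) * v for f, v in features.items())

    def learn(self, features, feedback):
        """feedback: +1 (user liked the item) or -1 (user did not)."""
        error = feedback - self.score(features)
        for f, v in features.items():
            self.weights[f] = self.weights.get(f, 0.0) + self.rate * error * v

scorer = OnlineScorer()
for _ in range(20):                                   # repeated feedback
    scorer.learn({"python": 1, "spam": 0}, feedback=+1)
    scorer.learn({"python": 0, "spam": 1}, feedback=-1)
print(scorer.score({"python": 1}) > scorer.score({"spam": 1}))   # True
```

Because each example is processed as it arrives, the agent never needs to store or retrain over its entire history, which is exactly the property emphasized above.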
16.3.5 Communication
The communication component must be able to interact with two very different types of partners. First is the human user who tells the agent what to do. The interface can range from a simple command line or form-based user interface to a sophisticated natural language text or speech interface. Either way, the human must be able to tell the agent what needs to be done, how it should be done, and when the task must be completed. Once this is accomplished, the agent can autonomously perform the task. However, there may be times when the agent cannot complete its assigned task and has to come back to the user to ask for guidance or permission to take some action. There are complex issues related to how humans interact and relate to intelligent agents. Some of the issues are discussed in detail in the final section of this chapter.
The second type of communication is interaction with other agents. There are two major aspects to any communication medium: the protocol for the communication and the content of the communication.
The protocol determines how two agents find each other, and how messages are formatted so that the receiving agent can read the content. This metadata, or data about the data, is handled by an agent communication language such as KQML (knowledge query and manipulation language) [Labrou and Finin, 1997]. The second aspect of communication is the semantic content or meaning of the message. This is dependent on a shared ontology, so that both agents know what is meant when a certain term is used. The Semantic Web project is an attempt to define ontologies for many specialized domains [Berners-Lee et al., 2001]. Agents use these ontologies to collaborate on tasks or problems.
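As a rough illustration of the protocol/content split, the structure below is loosely modeled on KQML-style messages. The field names follow common KQML parameters, but the class itself is only an illustration of the idea, not an implementation of the KQML specification.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class AgentMessage:
    """A KQML-style message: the performative is the protocol-level verb,
    while content, ontology, and language carry the semantic payload."""
    performative: str       # e.g., "ask-one", "tell", "achieve"
    sender: str
    receiver: str
    content: str            # expression the receiver should interpret
    ontology: str           # shared vocabulary the content refers to
    language: str = "text"  # language of the content expression
    reply_with: str = ""    # correlation id for the expected reply

    def encode(self):
        return json.dumps(asdict(self))

msg = AgentMessage(
    performative="ask-one",
    sender="shopping-agent-17",
    receiver="price-directory",
    content="(price ?item laptop)",
    ontology="retail-catalog",
    reply_with="q42",
)
print(msg.encode())
```

The performative and addressing fields belong to the protocol; interpreting the content correctly depends entirely on both agents sharing the named ontology.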
16.3.6 Effectors: Taking Action Effectors are the way that agents take actions in their world. An action could be sending a motion command to a robotic arm, sending an FTP command to a file server, displaying an HTML Web page, posting an article to an NNTP newsgroup, sending a request to another agent, or sending an e-mail notification to a user. Note that effectors may be closely tied to the communication component of the agent architecture.
16.3.7 Mobility Agents do not have to stay in one place. Mobility allows an agent to go where the data is, which can be a big advantage if there is a large amount of data to examine. A disadvantage is that there must be a mobile agent infrastructure in place for agents to move around a network. Long-running processes must be available on each computer system where the agent may wish to reside. Security becomes a major issue with mobile agents because it is sometimes difficult to differentiate legitimate agents acting on behalf of authorized users from illegitimate agents seeking to steal data or cause other havoc with the computing systems.
16.3.8 An Example Agent
In this section, we describe the design and development of an intelligent agent for filtering information, specifically articles in Internet newsgroups. The goal is to build an intelligent assistant to help filter out spam and uninteresting articles posted to one or more newsgroups and to rank and present the interesting articles to the user. The mechanics of how we do this are straightforward. The agent interacts with the newsgroup server using the NNTP protocol. Its sensors and effectors are sockets through which NNTP commands and NNTP responses are sent and received. The human–computer interface is a standard newsgroup reader interface that allows the user to select the newsgroup to monitor, download the articles from the selected newsgroups, and display their subject lines in a list so the user can select them for viewing.
What we have described so far is a standard newsgroup reader. Simple bots could be used to automate this process and automatically download any unread articles for the user when requested. A more powerful agent could allow the user to specify keywords of interest. The agent could score the articles posted to the newsgroup based on those keywords. The articles could be presented to the user, ordered by their score. The next level of functionality would be to allow the user to provide feedback to the agent so that the agent could adapt and tune its scoring mechanism. Instead of a preset list of keywords, the agent could use the feedback to add weighting to certain keywords and refine the scoring mechanism. Neural networks could be used to build a model of articles and keyword counts mapped to the expected interest level of the user. Using feedback over time, the scoring mechanism would be tuned to reflect the user's weightings of the various keywords. An agent of this type was developed in Bigus and Bigus [2001] using a Java agent framework.
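A sketch of the scoring-and-feedback core of such a filtering agent is shown below. It omits the NNTP plumbing and the user interface, and the keyword-weighting scheme is only one plausible realization of the adaptive scoring described above, not the design used in Bigus and Bigus [2001].

```python
import re

class ArticleFilterAgent:
    """Scores newsgroup articles by learned keyword weights."""

    def __init__(self, keywords, rate=0.2):
        # Start with equal interest in the user-specified keywords.
        self.weights = {k.lower(): 1.0 for k in keywords}
        self.rate = rate

    def _counts(self, text):
        words = re.findall(r"[a-z']+", text.lower())
        return {k: words.count(k) for k in self.weights}

    def score(self, article):
        counts = self._counts(article)
        return sum(self.weights[k] * c for k, c in counts.items())

    def rank(self, articles):
        """Return articles ordered from most to least interesting."""
        return sorted(articles, key=self.score, reverse=True)

    def feedback(self, article, liked):
        """Adjust keyword weights up or down based on user feedback."""
        direction = 1.0 if liked else -1.0
        for k, c in self._counts(article).items():
            if c:
                self.weights[k] += self.rate * direction * c

agent = ArticleFilterAgent(["agent", "java", "golf"])
articles = ["Java agent frameworks compared",
            "Weekend golf scores",
            "Buy cheap watches now"]
print(agent.rank(articles))          # keyword-bearing articles rank first
agent.feedback("Weekend golf scores", liked=False)   # user skipped this one
print(agent.weights)                 # interest in 'golf' has decreased
```

A more sophisticated agent would replace the hand-tuned weight updates with a learned model, such as the neural network approach mentioned above.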
16.4 Multiagent Teams
While individual agents can be useful for simple tasks, most large-scale applications involve a large number of agents. Each agent plays specific roles in the application and contains task- or domain-specific knowledge and capabilities. When agents collaborate to solve a problem, they often must reason not only about their individual goals but also about other agents' intentions, beliefs, and goals. They must make commitments to other agents about what goals and actions they intend to pursue and, in turn, depend on other agents to fulfill their commitments [Cohen and Levesque, 1990; Sycara, 1998].
A community of agents requires a set of common services to operate efficiently, much like a city needs basic services and infrastructure to work well. These include yellow pages or directory services where agents can register themselves and their capabilities and interests so that other agents can find them. The agents need a communication infrastructure so they can send messages, ask questions, give answers, and plan and coordinate group operations.
16.5 Intelligent Agents on the Internet In this section, we discuss common uses of agents on the Internet today. This includes a review of software robots or bots commonly available on the Web and of agents used as part of Websites and e-business applications.
16.5.1 Bots A whole cottage industry has been spawned for highly specialized agents that are useful for Web-oriented tasks. These cover the gamut from Web searching, information tracking, downloading software, and surfing automation agents to Internet auctions, monitoring stocks, and Internet games [Williams, 1996]. These agents are available for download and personal use, and range in price from $5 to $50 or more depending on their sophistication and power. One of the first and most useful applications of agents on the Internet was their use in solving the problem of finding information. Commercial Web search sites such as AltaVista, Yahoo, and Google utilize the combination of an information taxonomy constructed by hand and the content information gleaned from hundreds of thousands of Websites using specialized agents called spiders. These agents scour the Web for information and bring it back to the search site for inclusion in the search database. Search bots allow you to enter queries and then submit the queries to multiple Internet search sites. They collect and interpret the results from the search and provide a unified set of results. Tracking bots allow you to keep an eye on Websites and Web pages of interest. They can notify you when site content has been updated and even provide snapshots of the old and new content with changes highlighted. These bots specialize in news and stock information tracking and can also be used to monitor the health of your own Website. File-sharing bots such as KaZaA, WinMX, and Morpheus allow you to share data on your computer's hard drive with hundreds or thousands of other Internet users. These bots enable peer-to-peer computing because your computer can share data directly with other file-sharing bot users without going through a central server computer. Download bots help Internet users automate the process of downloading programs, music, and videos from Websites. They add functionality to browsers by improving the download speed through multiple threads and by being able to recover and pick up where they left off when the network connection is lost. Personal assistant bots read Web-page and e-mail content aloud to users (for the visually impaired). Some automatically translate text from one national language to another. Surf bots can help users avoid the annoying pop-up ads that have proliferated at some commercial Websites by closing them immediately. Browsers store URLs and cache Web pages, and sites leave cookies of information on a user's computer. An intrusive or malicious user could see where you
have been and what you have been up to on the Internet. Privacy bots can be used to remove all traces of your activities while Web surfing. Shopping bots help consumers find the potential sellers of products as well as comparison shop by price. The economic benefit of these agents to the buyer is obvious. The potential impact on sellers is less obvious. The widespread use of shopping bots on the Internet could possibly lead to price wars and loss of margins for most commodities. Auction bots have been developed to assist Internet users who make use of Internet auction sites such as eBay or Yahoo to buy or sell items. These agents monitor the bidding on specific auctions and inform their owner of significant changes in the status of the auction. Stock bots perform a similar function by notifying the user regarding the movement of stock prices over the course of a trading day, based on user-specified conditions. Chatterbots are agents that can carry on a natural language dialogue or discussion with a human. The earliest chatterbot was the AI program ELIZA, which was a simple pattern-matching program intended to simulate a psychotherapist. Modern chatterbots use a combination of natural language understanding and modeling in an attempt to educate and entertain. Game bots were developed to act as computer players in Internet games. They allow a single user to interact with multiple agent players to try alternative strategies for practice or just to play the game.
16.5.2 Agents behind Websites Whereas most users of the Internet think of the World Wide Web as a collection of HTML and dynamic HTML pages, commercial Websites often have intelligent agents working behind the scenes to provide personalization, customization, and automation. For example, Website reference bots allow site owners to check their current standings on the most popular search engines. They also provide automated submission of their site for inclusion in the search engine's index. One of the promises of the Internet from a customer-relationship perspective is to allow businesses to have a personalized relationship with each customer. This one-to-one marketing can be seen in the personalization of a user's Web experience. When a consumer logs onto a Website, different content is displayed based on past purchases or interests, whether expressed explicitly via profiles or inferred from past browsing patterns. Companies can offer specials based on perceived interests and the likelihood that the consumer would buy those items. Personalization on Websites is done using a variety of intelligent technologies, ranging from statistical clustering to neural networks. Large e-commerce Websites can get hundreds or thousands of e-mail inquiries a day. Most of these are routine requests for information or simple product requests that can be handled by automated e-mail response agents. The e-mail is first analyzed and classified, and then an appropriate response is generated using a knowledge base of customer and product information. Agents are also used behind customer self-service applications, such as product configurators and product advisors. The future of applications on the Internet seems to center on the development of electronic commerce and Web Services. Web Services depend on standards including WSDL (Web Services Description Language) and SOAP (Simple Object Access Protocol), which allow companies to specify the services they can provide as well as the methods and bindings to invoke those services. Intelligent agents will certainly play a role in the definition and provisioning of these Internet services. Another application area where intelligent agents play a role is in business-to-business purchasing and supply-chain replenishment. A typical company has hundreds of suppliers, and a manufacturing company may have thousands of part suppliers. Managing the ordering and shipment of parts to manufacturing plants to ensure the speedy output and delivery of finished products is a complex problem. Aspects of this application handled by intelligent agents include the solicitation of part providers, negotiation of terms and conditions of purchase agreements, and tracking of delivery dates and inventory levels so that stocks can be kept at appropriate levels to maximize manufacturing output while minimizing the carrying costs of the parts inventory.
16.6 Research Issues Related to Agents and the Internet In this section we discuss issues related to the adoption and widespread use of agents on the Internet, including human–computer interfaces, privacy, security, and autonomic computing.
16.6.1 Agents and Human–Computer Interfaces The interface between humans and autonomous agents has some elements that are similar to any human–computer interface, but there are some additional issues to consider when designing the user interface for an intelligent agent. Unlike computer software tools that are used under the direct control and guidance of the user, intelligent agents are autonomous, acting on the authority of the user, but outside the user’s direct control. This model, in which the user delegates tasks to the agent, has implications and effects that must be considered when designing the human–computer interface. Studies conducted in the organizational and management sciences have shown that delegation is an important skill for successful managers. However, they often fail to delegate for a number of reasons including the amount of time it takes to explain what needs to be done, the loss of control over the task while still being held accountable for the results, and fear that the task will not get done or will not be done well. Trust is an issue in human-to-agent delegation as well. Intelligent agents are more suitable for some tasks than for others. The user interface must be designed in such a way that the user can opt out of delegating to the agent if the cost of using the agent outweighs the benefits. Part of the cost of using an intelligent agent is the time it takes to communicate with the agent. The interface design should allow the user to convey intentions and goals to the agent using natural language interfaces or by demonstrating what needs to be done. The agent must be capable of conveying its understanding of the task to the user, often through the anthropomorphic use of gestures, facial expressions, and vocalization [Milewski and Lewis, 1997]. Anthropomorphic agents are becoming a more common user interface paradigm, especially for agents that act as advisors, assistants, or tutors. An anthropomorphic interface makes an agent more personable, helping to establish trust and a comfort level beyond what is normally experienced with a traditional human–computer interface. Agents represented as three-dimensional characters are judged by users to have both a high degree of agency and of intelligence [King and Ohya, 1996]. Care must be taken, however, to ensure that the social interaction capabilities of the agent are sophisticated enough to meet the user’s expectations. Failure to measure up to those expectations can negatively impact the user’s perception of the agent [Johnson, 2003]. The anthropomorphic characteristics of an agent are often used to convey emotion. A character that expresses emotion increases the credibility and believability of an agent [Bates, 1994]. Emotion also plays an important role in motivation of students when interacting with a pedagogical agent. Having an agent that “cares” about how a student is doing can encourage the student to also care about his or her progress. In addition, enthusiasm for the subject matter, when conveyed by the agent, may foster enthusiasm in the student. Agents that have personality make learning more fun for the student, which, in turn, can make the student more receptive to learning the subject matter [Johnson et al. 2000]. Overall, the human–computer interface of an intelligent agent provides an additional dimension to the agent that can further add to the effectiveness of the agent as a personal assistant. 
The affective dimension of autonomous agent behavior can give the interface an emotional component to which the user reacts and relates. The challenge lies in building agents that are believable and lifelike, enhancing the interaction between the agent and the user without being condescending, intrusive, or annoying.
16.6.2 Agents and Privacy One of the biggest impediments to wider use of the Internet for electronic commerce is the concern consumers have about privacy. This includes the ability to perform transactions on the Internet without
leaving behind a trail of cookies and Web logs. Another equally important aspect is the privacy and security of their personal, financial, medical, and health data maintained by companies with whom they do business. The ease-of-use and vast array of information are enticing, but the thought of unauthorized or criminal access to personal information has a chilling effect for many people. What role can agents play in this space? One could argue that sending out an autonomous (and anonymous) agent to make a purchase could assuage those concerns. Of course, this assumes that the privacy of any information contained in the agent is maintained. A secure network connection protocol, such as HTTPS, could be used by the agent to communicate with the Websites.
16.6.3 Agents and Security Another major issue for the use of autonomous agents on the Internet is related to security. Although agents can help legitimate users perform tasks such as searching for information on Websites and bidding in online auctions, they can also be used by unscrupulous users to flood Websites with requests or send e-mail spam. How can we leverage intelligent agent technology to enhance the usefulness of the Internet without providing new opportunities for misuse? One approach is to ensure that each and every agent can be traced back to a human user. The agent would be able to adopt the authorization of the user and perform transactions on the user’s behalf. Each agent would require a digital certificate used to prove the agent’s identity and to encrypt any sensitive messages. The same security mechanisms that make electronic commerce safe for human users on Websites could be used to secure agent-based applications.
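As a sketch of this approach, the following Java program signs an agent's request with its owner's private key and then verifies the signature, using the standard java.security API. It is illustrative only: a deployed system would bind the key pair to the user with a certificate issued by a certificate authority and would also encrypt sensitive message content, neither of which is shown here.

    import java.security.KeyPair;
    import java.security.KeyPairGenerator;
    import java.security.Signature;

    /** Sketch: an agent signs a request so it can be traced back to its owner, and a service verifies it. */
    public class SignedAgentMessage {
        public static void main(String[] args) throws Exception {
            // In practice the key pair would be bound to the user by a certificate from a CA;
            // here we simply generate one for illustration.
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            KeyPair ownerKeys = gen.generateKeyPair();

            byte[] request = "bid 25.00 on auction 12345 for user jdoe".getBytes("UTF-8");

            // Agent side: sign the request with the owner's private key.
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(ownerKeys.getPrivate());
            signer.update(request);
            byte[] signature = signer.sign();

            // Service side: verify the signature against the owner's public key (from the certificate).
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(ownerKeys.getPublic());
            verifier.update(request);
            System.out.println("request authentic: " + verifier.verify(signature));
        }
    }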
16.6.4 Autonomic Computing and Agents A major computing initiative and grand challenge in computer science is in the area of self-configuring, self-healing, self-optimizing, and self-protecting systems known as autonomic computing. Researchers in industry and academia have identified complexity in the distributed computing infrastructure as a major impediment to realizing sustainable growth and reliability in the Internet [Kephart and Chess, 2003]. An autonomic computing architecture has been developed to help manage this complexity. The architecture, shown in Figure 16.3, is made up of four major elements, known as the MAPE loop, where M stands for monitoring, A for data analysis, P for planning, and E for execution. A central component of the architecture is the domain knowledge. The autonomic manager components can be implemented as intelligent agents using sensors and effectors to interact with other autonomic elements and the resources they are managing [Bigus et al., 2002].
[Figure 16.3: An Autonomic Manager whose Monitor, Analyze, Plan, and Execute components operate over shared Knowledge, sitting above a Managed Element.]
FIGURE 16.3 An autonomic computing architecture.
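A minimal sketch of such a MAPE loop in Java appears below. The interfaces and class names (ManagedElement, AutonomicManager, KnowledgeBase) are illustrative placeholders, not the API of the ABLE toolkit described by Bigus et al. [2002], and the analysis and planning steps are reduced to single rules.

    /** Minimal sketch of the MAPE control loop of an autonomic manager.
     *  The interfaces and the Metrics/Symptom/Plan types are hypothetical placeholders. */
    interface ManagedElement {               // touch point exposing sensors and effectors
        Metrics sense();                     // sensor: read monitored data
        void apply(Plan plan);               // effector: execute a change plan
    }

    record Metrics(double cpuLoad) {}
    record Symptom(String description) {}
    record Plan(String action) {}

    class KnowledgeBase {                    // shared policies and models
        Symptom analyze(Metrics m) {
            return m.cpuLoad() > 0.9 ? new Symptom("overload") : null;
        }
        Plan plan(Symptom s) {
            return new Plan("add server instance");
        }
    }

    class AutonomicManager implements Runnable {
        private final ManagedElement element;
        private final KnowledgeBase knowledge = new KnowledgeBase();

        AutonomicManager(ManagedElement element) { this.element = element; }

        public void run() {
            for (int cycle = 0; cycle < 5; cycle++) {             // a few demo cycles
                Metrics m = element.sense();                      // Monitor
                Symptom s = knowledge.analyze(m);                 // Analyze
                if (s != null) {
                    Plan p = knowledge.plan(s);                   // Plan
                    element.apply(p);                             // Execute
                }
                try { Thread.sleep(1000); } catch (InterruptedException e) { return; }
            }
        }

        public static void main(String[] args) {
            ManagedElement webServer = new ManagedElement() {     // stand-in managed resource
                public Metrics sense() { return new Metrics(Math.random()); }
                public void apply(Plan p) { System.out.println("executing: " + p.action()); }
            };
            new Thread(new AutonomicManager(webServer)).start();
        }
    }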
16.7 Summary In this chapter, we defined intelligent agents and described their essential attributes, such as autonomy, persistence, reasoning, and communication. Basic and expanded intelligent agent architectures were presented, and the various functional components including sensors, effectors, perception, reasoning, planning, and learning were described. We identified many types of bots used on the Internet today and introduced several research issues related to the successful application of intelligent agent technology.
16.8 Further Information The research, development, and deployment of intelligent agent applications on the Internet are very active areas in computer science today. Information on the latest research can be found at the major agent conferences including the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), the National Conference on Artificial Intelligence (AAAI), and the International Joint Conference on Artificial Intelligence (IJCAI). Major magazines and journals include IEEE Internet Computing, IEEE Intelligent Systems, Artificial Intelligence Magazine (AAAI), Artificial Intelligence Journal (Elsevier), Journal of Artificial Intelligence Research (JAIR), and The Journal of Experimental and Theoretical Artificial Intelligence (JETAI). Popular Web resources include the BotSpot site found at www.botspot.com, the PC AI magazine site at http://www.pcai.com/, and the University of Maryland Baltimore County (UMBC) Agent Web site at http://agents.umbc.edu.
16.9 Glossary Agent communication language: A formal language used by agents to talk with one another. Agent platform: A set of distributed services including agent lifecycle, communication and message transport, directory, and logging services on which multiple agents can run. Anthropomorphic agent: An agent whose representation has virtual human characteristics such as gestures, facial expressions, and vocalization used to convey emotion and personality. Artificial intelligence: Refers to the ability of computer software to perform activities normally thought to require human intelligence. Attentional subsystem: The software component that determines the relative importance of input received by an agent. Autonomous agent: An agent that can take actions on its own initiative. Behavioral subsystem: The decision-making component of an agent that includes domain knowledge, working memory, and decision-making logic. Belief–Desire–Intention agent: A sophisticated type of agent that holds complex internal states representing its beliefs, desires, and intended actions. Bot: A shorthand term for a software robot or agent, usually applied to agents working on the Internet. Effector: The means by which an agent takes action in its world. Intelligent Agent: An active, persistent software component that can perceive, reason, act, and communicate. Learning: The ability to adapt behavior based on experience or feedback. Mobility: The ability to move around a network from system to system. Multiagent system: An application or service comprised of multiple intelligent agents that communicate and use the services of an agent platform. Perceptual subsystem: The software component that converts raw data into the format used by the decision-making component of an agent. Planning: The ability to reason from an initial world state to a desired final state producing a partially ordered set of operators. Reasoning: The ability to use inferencing algorithms with a knowledge representation.
Sensor: The means by which an agent receives input from its environment. Social agent: An agent with the ability to interact and communicate with other software agents and to join in cooperative and competitive activities.
Acknowledgments The authors would like to acknowledge the support of the IBM T.J. Watson Research Center and the IBM Rochester eServer Custom Technology Center.
References Bates, Joseph. The role of emotion in believable agents. Communications of the ACM, 37(7): 122–125, July 1994. Berners-Lee, Tim, James Hendler, and Ora Lassila. The Semantic Web. Scientific American, May 2001. Bigus, Joseph P. Data Mining with Neural Networks. McGraw Hill, New York, 1996. Bigus, Joseph P. and Jennifer Bigus. Constructing Intelligent Agents using Java, 2nd ed., John Wiley & Sons, New York, 2001. Bigus, Joseph P., Donald A. Schlosnagle, Jeff R. Pilgrim, W. Nathanial Mills, and Yixin Diao. ABLE: A toolkit for building multiagent autonomic systems. IBM Systems Journal, 41(3): 350–370, 2002. Bradshaw, Jeffrey M. Ed. Software Agents. MIT Press, Cambridge, MA, 1997. Bratko, Irving. Prolog Programming for Artificial Intelligence. Addison-Wesley, Reading, MA, 1986. Brooks, Rodney A. Intelligence without representation. Artificial Intelligence Journal, 47: 139–159, 1991. Cohen, Philip R. and Hector J. Levesque. Intention is choice with commitment. Artificial Intelligence Journal, 42(2–3): 213–261, 1990. Forgy, Charles L. Rete: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence, 19: 17–37, 1982. Franklin, Stan and Art Graesser. Is it an agent, or just a program? A taxonomy for autonomous agents. Proceedings of the 3rd International Workshop on Agent Theories, Architectures, and Languages. Springer-Verlag, New York, 1996. Huhns, Michael N. and Munindar P. Singh, Eds. Readings in Agents. Morgan Kaufmann, San Francisco, 1998. IEEE Expert. AI planning systems in the real world. IEEE Expert, December, 4–12, 1996. Johnson, W. Lewis. Interaction tactics for socially intelligent pedagogical agents. International Conference on Intelligent User Interfaces, 251–253, 2003. Johnson, W. Lewis, Jeff W. Rickel, and James C. Lester. Animated pedagogical agents: face-to-face interaction in interactive learning environments. International Journal of Artificial Intelligence in Education, 11: 47–78, 2000. Kephart, Jeffrey O. and David M. Chess. The vision of autonomic computing. IEEE Computer, January 2003. King, William J. and Jun Ohya. The representation of agents: anthropomorphism, agency, and intelligence. Proceedings CHI’96 Conference Companion, 289–290, 1996. Labrou, Yannis and Tim Finin. Semantics and conversations for an agent communication language. Proceedings of the 15th International Conference on Artificial Intelligence, 584–491. International Joint Conferences on Artificial Intelligence, 1997. Maes, Patti. Agents that reduce work and information overload. Communications of the ACM, 7: 31–40, 1994. Milewski, Allen E. and Steven H. Lewis. Delegating to software agents. International Journal of Human–Computer Studies, 46: 485–500, 1997. Rao, Anand S. and Michael P. Georgeff. Modeling rational agents within a BDI-architecture. Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning, 473–484, 1991.
Russell, Stuart J. and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, Englewood Cliffs, NJ, 1995. Sloman, Aaron. Damasio, Descartes, alarms, and meta-management. Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2652–2657, San Diego, CA, 1998. Sycara, Katia. Multiagent systems. AI Magazine, 19(2): 79–92, 1998. Williams, Joseph, Ed. Bots and Other Internet Beasties. Sams.net, Indianapolis, IN, 1996. Zadeh, Lotfi A. Fuzzy logic, neural networks, and soft computing. Communications of the ACM, 3: 78–84, 1994.
17 Multiagent Systems for Internet Applications
Michael N. Huhns and Larry M. Stephens

CONTENTS
Abstract
17.1 Introduction
    17.1.1 Benefits of an Approach Based on Multiagent Systems
    17.1.2 Brief History of Multiagent Systems
17.2 Infrastructure and Context for Web-Based Agents
    17.2.1 The Semantic Web
    17.2.2 Standards and Protocols for Web Services
    17.2.3 Directory Services
17.3 Agent Implementations of Web Services
17.4 Building Web-Service Agents
    17.4.1 Agent Types
    17.4.2 Agent Communication Languages
    17.4.3 Knowledge and Ontologies for Agents
    17.4.4 Reasoning Systems
    17.4.5 Cooperation
17.5 Composing Cooperative Web Services
17.6 Conclusion
Acknowledgment
References
Abstract The World Wide Web is evolving from an environment for people to obtain information to an environment for computers to accomplish tasks on behalf of people. The resultant Semantic Web will be computer friendly through the introduction of standardized Web services. This chapter describes how Web services will become more agent-like, and how the envisioned capabilities and uses for the “Semantic Web” will require implementations in the form of multiagent systems. It also describes how to construct multiagent systems that implement Web-based software applications.
17.1 Introduction Web services are the most important Internet technology since the browser. They embody computational functionality that corporations and organizations are making available to clients over the Internet. Web services have many of the same characteristics and requirements as simple software agents, and because of the kinds of demands and expectations that people have for their future uses, it seems apparent that Web services will soon have to be more like complete software agents. Hence, the best way to construct Web services will be in terms of multiagent systems. In this chapter, we describe the essential characteristics and features of agents and multiagent systems, how to build them, and then how to apply them to Web services.
The environment for Web services, and computing in general, is fast becoming ubiquitous and pervasive. It is ubiquitous because computing power and access to the Internet is being made available everywhere; it is pervasive because computing is being embedded in the very fabric of our environment. Xerox Corporation has coined the phrase "smart matter" to capture the idea of computations occurring within formerly passive objects and substances. For example, our houses, our furniture, and our clothes will contain computers that will enable our surroundings to adapt to our preferences and needs. New visions of interactivity portend that scientific, commercial, educational, and industrial enterprises will be linked, and human spheres previously untouched by computing and information technology, such as our personal, recreational, and community life, will be affected.

This chapter suggests a multiagent-based architecture for all of the different devices, components, and computers to understand each other, so that they will be able to work together effectively and efficiently. The architecture that we describe is becoming canonical. Agents are used to represent users, resources, middleware, security, execution engines, ontologies, and brokering, as depicted in Figure 17.1. As the technology advances, we can expect such specialized agents to be used as standardized building blocks for information systems and Web services.

Multiagent systems are applicable not only to the diverse information soon to be available locally over household, automobile, and environment networks but also to the huge amount of information available globally over the World Wide Web being made available as Web services. Organizations are beginning to represent their attributes, capabilities, and products on the Internet as services that can be invoked by potential clients. By invoking each other's functionalities, the Web services from different organizations can be combined in novel and unplanned ways to yield larger, more comprehensive functionalities with much greater value than the individual component services could provide. Web services are XML-based, work through firewalls, are lightweight, and are supported by all software companies. They are a key component of Microsoft's .NET initiative, and are deemed essential to the business directions being taken by IBM, Sun, and SAP. Web services are also central to the envisioned Semantic Web [Berners-Lee et al., 2001], which is what the World Wide Web is evolving into. But the Semantic Web is also seen as a friendly environment for software agents, which will add capabilities and functionality to the Web. What will be the relationship between multiagent systems and Web services?

[Figure 17.1: A user interacts with a user agent, which consults broker and ontology agents and hands the request to an execution or mediator agent; resource agents manage the databases, and the execution agent drives an application program.]
FIGURE 17.1 The agents in a multiagent Internet application first determine a user's request (the responsibility of the user agent) and then satisfy it by managing its processing. Under the control of the execution agent, the request might be sent to one or more databases or Websites, which are managed by resource agents.
17.1.1 Benefits of an Approach Based on Multiagent Systems Multiagent systems can form the fundamental building blocks for not only Web services but also software systems in general, even if the software systems do not themselves require any agent-like behaviors [Jennings, 2000]. When a conventional software system is constructed with agents as its modules, it can exhibit the following characteristics:
• Agent-based modules, because they are active, more closely represent real-world things that are the subjects of many applications.
• Modules can hold beliefs about the world, especially about themselves and others; if their behavior is consistent with their beliefs, then it will be more predictable and reliable.
• Modules can negotiate with each other, enter into social commitments to collaborate, and can change their mind about their results.
The benefits of building software out of agents are [Coelho et al., 1994; Huhns, 2001]:
1. Agents enable dynamic composability, where the components of a system can be unknown until runtime.
2. Agents allow interaction abstractions, where interactions can be unknown until runtime.
3. Because agents can be added to a system one at a time, software can continue to be customized over its lifetime, even potentially by end users.
4. Because agents can represent multiple viewpoints and can use different decision procedures, they can produce more robust systems. The essence of multiple viewpoints and multiple decision procedures is redundancy, which is the basis for error detection and correction.
An agent-based system can cope with a growing application domain by increasing the number of agents, each agent's capability, the computational resources available to each agent, or the infrastructure services needed by the agents to make them more productive. That is, either the agents or their interactions can be enhanced. As described in Section 17.2.3, agents share many functional characteristics with Web services. We show how to build agents that implement Web services and achieve the benefits listed above. In addition, we describe how personal agents can aid users in finding information on the Web, keeping data current (such as trends in stocks and bonds), and alerting them to problems and opportunities (such as bargains on eBay). We next survey how agent technology has progressed since its inception, and indicate where it is heading.
17.1.2 Brief History of Multiagent Systems Agents and agency have been the object of study for centuries. They were first considered in the philosophy of action and ethics. In this century, with the rise of psychology as a discipline, human agency has been studied intensively. Within the five decades of artificial intelligence (AI), computational agents have been an active topic of exploration. The AI work in its earliest stages investigated agents explicitly, albeit with simple models. Motivated by results from psychology, advances in mathematical logic, and concepts such as the Turing Test, researchers concentrated on building individual intelligent systems or one of their components, such as reasoning mechanisms or learning techniques. This characterized the first 25 years of AI. However, the fact that some problems, such as sensing a domain, are inherently distributed, coupled with advances in distributed computing, led several researchers to investigate distributed problem solving and distributed artificial intelligence (DAI). Progress and directions in these areas became informed more by sociology and economics than by psychology. From the late seventies onward, the resultant DAI research community [Huhns, 1987; Bond and Gasser, 1988; Gasser and Huhns, 1989] concerned itself with agents as computational entities that interacted with each other to solve various kinds of distributed problems. To this end, whereas AI at large borrowed abstractions such as beliefs and intentions from psychology, the DAI community borrowed abstractions
and insights from sociology, organizational theory, economics, and the philosophies of language and linguistics. These abstractions complement rather than oppose the psychological abstractions, but — being about groups of agents — are fundamentally better suited to large distributed applications. With the expansion of the Internet and the Web in the 1990s, we witnessed the emergence of software agents geared to open information environments. These agents perform tasks on behalf of a user or serve as nodes — brokers or information sources — in the global information system. Although software agents of this variety do not involve specially innovative techniques, it is their synthesis of existing techniques and their suitability for their application that makes them powerful and popular. Thus, much of the attention they have received is well deserved.
17.2 Infrastructure and Context for Web-Based Agents 17.2.1 The Semantic Web The World Wide Web was designed for humans. It is based on a simple concept: information consists of pages of text and graphics that contain links, and each link leads to another page of information, with all of the pages meant to be viewed by a person. The constructs used to describe and encode a page, the Hypertext Markup Language (HTML), describe the appearance of the page but not its contents. Software agents do not care about appearance, but rather the contents. The Semantic Web will add Web services that are envisioned to be
• Understandable to computers
• Adaptable and personalized to clients
• Dynamically composable by clients
• Suitable for robust transaction processing by virtual enterprises
There are, however, some agents that make use of the Web as it is now. A typical kind of such agent is a shopbot, an agent that visits the online catalogs of retailers and returns the prices being charged for an item that a user might want to buy. The shopbots operate by a form of “screen-scraping,” in which they download catalog pages and search for the name of the item of interest, and then the nearest set of characters that has a dollar-sign, which presumably is the item’s price. The shopbots also might submit the same forms that a human might submit and then parse the returned pages that merchants expect are being viewed by humans. The Semantic Web will make the Web more accessible to agents by making use of semantic constructs, such as ontologies represented in OWL, RDF, and XML, so that agents can understand what is on a page.
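The screen-scraping behavior just described can be sketched in a few lines of Java: fetch a catalog page, locate the item name, and take the nearest following dollar amount as the price. The retailer URL and item name below are hypothetical, and a real shopbot would also submit forms, follow sessions, and handle errors.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /** Crude shopbot sketch: fetch a catalog page and report the first price found near the item name. */
    public class ShopBot {
        public static void main(String[] args) throws Exception {
            String item = "Practical Handbook";                       // item of interest
            URL catalog = new URL("http://www.example.com/catalog");  // hypothetical retailer page

            StringBuilder page = new StringBuilder();
            try (BufferedReader in = new BufferedReader(new InputStreamReader(catalog.openStream()))) {
                String line;
                while ((line = in.readLine()) != null) page.append(line).append('\n');
            }

            int at = page.indexOf(item);
            if (at < 0) { System.out.println("item not found"); return; }

            // "Screen-scraping": take the first dollar amount that follows the item name.
            Matcher m = Pattern.compile("\\$\\s*([0-9]+(?:\\.[0-9]{2})?)").matcher(page.substring(at));
            System.out.println(m.find() ? item + " costs $" + m.group(1) : "no price found near item");
        }
    }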
17.2.2 Standards and Protocols for Web Services A Web service is a functionality that can be engaged over the Web. Web services are currently based on the triad of functionalities depicted in Figure 17.2. The architecture for Web services is founded on principles and standards for connection, communication, description, and discovery. For providers and requestors of services to be connected and to exchange information, there must be a common language. This is provided by the eXtensible Markup Language (XML). Short descriptions of the current protocols for Web service connection, description, and discovery are in the following paragraphs; more complete descriptions are found elsewhere in this book. A common protocol is required for systems to communicate with each other, so that they can request services, such as to schedule appointments, order parts, and deliver information. This is provided by the Simple Object Access Protocol (SOAP) [Box et al., 2000]. The services must be described in a machine-readable form, where the names of functions, their required parameters, and their results can be specified. This is provided by the Web Services Description Language (WSDL).
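As a rough illustration of the connection and communication layers, the following Java sketch posts a hand-built SOAP envelope to a service endpoint over HTTP. The endpoint URL, namespace, and GetPrice operation are hypothetical stand-ins for what a real WSDL description would specify, and in practice a client would usually invoke the service through a stub generated from that WSDL rather than by constructing XML by hand.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Scanner;

    /** Sketch of invoking a Web service by posting a SOAP envelope directly over HTTP. */
    public class SoapClient {
        public static void main(String[] args) throws Exception {
            String envelope =
                "<?xml version=\"1.0\"?>" +
                "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">" +
                "  <soap:Body>" +
                "    <GetPrice xmlns=\"http://example.com/catalog\">" +   // hypothetical operation
                "      <Item>item-42</Item>" +
                "    </GetPrice>" +
                "  </soap:Body>" +
                "</soap:Envelope>";

            URL endpoint = new URL("http://example.com/services/catalog");  // hypothetical endpoint from the WSDL
            HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
            conn.setRequestProperty("SOAPAction", "\"http://example.com/catalog/GetPrice\"");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(envelope.getBytes("UTF-8"));
            }

            try (Scanner in = new Scanner(conn.getInputStream(), "UTF-8")) {
                while (in.hasNextLine()) System.out.println(in.nextLine());  // raw SOAP response
            }
        }
    }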
[Figure 17.2: A Service Provider (a multiagent system for a cooperative distributed service) publishes its description to a Service Broker (agent broker; directory facilitator) via WSDL (ACL); a Service Requestor (requesting agent) finds the service via UDDI (ACL) and binds to the provider via SOAP (ACL).]
FIGURE 17.2 The general architectural model for Web services, which rely on the functionalities of publish, find, and bind. The equivalent agent-based functionalities are shown in parentheses, where all interactions among the agents are via an agent-communication language (ACL). Any agent might serve as a broker. Also, the service provider's capabilities might be found without using a broker.
Finally, clients — users and businesses — need a way to find the services they need. This is provided by Universal Description, Discovery, and Integration (UDDI), which specifies a registry or “yellow pages” of services. Besides standards for XML, SOAP, WSDL, and UDDI, there is a need for broad agreement on the semantics of specific domains. This is provided by the Resource Description Framework (RDF) [Decker et al., 2000a,b], the OWL Web Ontology Language [Smith et al., 2003], and, more generally, ontologies [Heflin and Hendler, 2000].
17.2.3 Directory Services The purpose of a directory service is for components and participants to locate each other, where the components and participants might be applications, agents, Web service providers, Web service requestors, people, objects, and procedures. There are two general types of directories, determined by how entries are found in the directory: (1) name servers or white pages, where entries are found by their name, and (2) yellow pages, where entries are found by their characteristics and capabilities. The implementation of a basic directory is a simple database-like mechanism that allows participants to insert descriptions of the services they offer and query for services offered by other participants. A more advanced directory might be more active than others, in that it might provide not only a search service but also a brokering or facilitating service. For example, a participant might request a brokerage service to recruit one or more agents that can answer a query. The brokerage service would use knowledge about the requirements and capabilities of registered service providers to determine the appropriate providers to which to forward a query. It would then send the query to those providers, relay their answers back to the original requestor, and learn about the properties of the responses it passes on (e.g., the brokerage service might determine that advertised results from provider X are incomplete and so seek out a substitute for provider X). UDDI is itself a Web service that is based on XML and SOAP. It provides both a white-pages and a yellow-pages service, but not a brokering or facilitating service. The DARPA (Defense Advanced Research Projects Agency) DAML (DARPA Agent Markup Language) effort has also specified a syntax and semantics for describing services, known as DAML-S (now migrating to OWL-S, http://www.daml.org/services). This service description provides
• Declarative ads for properties and capabilities, used for discovery
• Declarative APIs, used for execution
• Declarative prerequisites and consequences, used for composition and interoperation
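The white-pages and yellow-pages distinction described above can be illustrated with a toy in-memory registry, sketched below in Java. The class and method names are our own, and a production directory would instead be a UDDI registry or an agent platform's directory facilitator, possibly extended with the brokering behavior just discussed.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    /** Toy directory service offering white-pages (by name) and yellow-pages (by capability) lookups. */
    public class DirectoryService {
        private final Map<String, String> whitePages = new HashMap<>();        // agent name -> address
        private final Map<String, List<String>> yellowPages = new HashMap<>(); // capability -> agent names

        /** An agent registers its address and advertised capabilities. */
        public void register(String agentName, String address, String... capabilities) {
            whitePages.put(agentName, address);
            for (String c : capabilities) {
                yellowPages.computeIfAbsent(c, k -> new ArrayList<>()).add(agentName);
            }
        }

        public String lookupByName(String agentName) {                 // white pages
            return whitePages.get(agentName);
        }

        public List<String> lookupByCapability(String capability) {    // yellow pages
            return yellowPages.getOrDefault(capability, List.of());
        }

        public static void main(String[] args) {
            DirectoryService df = new DirectoryService();
            df.register("PriceAgent", "tcp://host1:5000", "price-quotes", "order-status");
            df.register("NewsAgent", "tcp://host2:5000", "news-tracking");

            System.out.println(df.lookupByCapability("price-quotes"));  // [PriceAgent]
            System.out.println(df.lookupByName("NewsAgent"));           // tcp://host2:5000
        }
    }

A brokering or facilitating directory would go one step further, forwarding the query to the matching providers and relaying their answers instead of merely returning their names.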
17.3 Agent Implementations of Web Services Typical agent architectures have many of the same features as Web services. Agent architectures provide yellow-page and white-page directories where agents advertise their distinct functionalities and where other agents search to locate the agents in order to request those functionalities. However, agents extend Web services in several important ways: • A Web service knows only about itself but not about its users/clients/customers. Agents are often self-aware at a metalevel and, through learning and model building, gain awareness of other agents and their capabilities as interactions among the agents occur. This is important because without such awareness a Web service would be unable to take advantage of new capabilities in its environment and could not customize its service to a client, such as by providing improved services to repeat customers. • Web services, unlike agents, are not designed to use and reconcile ontologies. If the client and provider of the service happen to use different ontologies, then the result of invoking the Web service would be incomprehensible to the client. • Agents are inherently communicative, whereas Web services are passive until invoked. Agents can provide alerts and updates when new information becomes available. Current standards and protocols make no provision for even subscribing to a service to receive periodic updates. • A Web service, as currently defined and used, is not autonomous. Autonomy is a characteristic of agents, and it is also a characteristic of many envisioned Internet-based applications. Among agents, autonomy generally refers to social autonomy, where an agent is aware of its colleagues and is sociable, but nevertheless exercises its independence in certain circumstances. Autonomy is in natural tension with coordination or with the higher-level notion of a commitment. To be coordinated with other agents or to keep its commitments, an agent must relinquish some of its autonomy. However, an agent that is sociable and responsible can still be autonomous. It would attempt to coordinate with others where appropriate and to keep its commitments as much as possible, but it would exercise its autonomy in entering into those commitments in the first place. • Agents are cooperative and, by forming teams and coalitions, can provide higher-level and more comprehensive services. Current standards for Web services do not provide for composing functionalities.
17.4 Building Web-Service Agents 17.4.1 Agent Types To better communicate some of the most popular agent architectures, this chapter uses UML diagrams to guide an implementer’s design. However, before we describe these diagrams, we need to review some of the basic features of agents. Consider the architecture in Figure 17.3 for a simple agent interacting with an information environment, which might be the Internet, an intranet, or a virtual private network (VPN). The agent senses its environment, uses what it senses to decide upon an action, and then performs the action through its effectors. Sensory input can include received messages, and the action can be the sending of messages. To construct an agent, we need a more detailed understanding of how it functions. In particular, if we are to construct one using conventional object-oriented design techniques, we should know in what ways an agent is more than just a simple object. Agent features relevant to implementation are unique identity, proactivity, persistence, autonomy, and sociability [Weiß, 1999]. An agent inherits its unique identity simply by being an object. To be proactive, an agent must be an object with an internal event loop, such as any object in a derivation of the Java thread class would have. Here is simple pseudocode for a typical event loop, where events result from sensing an environment:
[Figure 17.3: An agent receives inputs from its environment (Internet, intranet, or VPN) through sensors, applies condition-action rules to decide what action to do now given what the world is like now, and acts on the environment through effectors.]
FIGURE 17.3 A simple interaction between an agent and its information environment (Adapted from Russell, Stuart J. and Peter Norvig. Artificial Intelligence: A Modern Approach, 2nd Ed. Prentice-Hall, Upper Saddle River, NJ, 2003.). "What action I should do now" depends on the agent's goals and perhaps ethical considerations, as noted in Figure 17.6.
    Environment e;
    RuleSet r;
    while (true) {
        state = senseEnvironment(e);
        a = chooseAction(state, r);
        e.applyAction(a);
    }
This is an infinite loop, which also provides the agent with persistence. Ephemeral agents would find it difficult to converse, making them, by necessity, asocial. Additionally, persistence makes it worthwhile for agents to learn about and model each other. To benefit from such modeling, they must be able to distinguish one agent from another; thus, agents need unique identities. Agent autonomy is akin to human free will and enables an agent to choose its own actions. For an agent constructed as an object with methods, autonomy can be implemented by declaring all the methods private. With this restriction, only the agent can invoke its own methods, under its own control, and no external object can force the agent to do anything it does not intend to do. Other objects can communicate with the agent by creating events or artifacts, especially messages, in the environment that the agent can perceive and react to. Enabling an agent to converse with other agents achieves sociability. The conversations, normally conducted by sending and receiving messages, provide opportunities for agents to coordinate their activities and cooperate, if so inclined. Further sociability can be achieved by generalizing the input class of objects an agent might perceive to include a piece of sensory information and an event defined by the agent. Events serving as inputs are simply “reminders” that the agent sets for itself. For example, an agent that wants to wait 5 min for a reply would set an event to fire after 5 min. If the reply arrives before the event, the agent can disable the event. If it receives the event, then it knows it did not receive the reply in time and can proceed accordingly. The UML diagrams in Figure 17.4 and Figure 17.5 can help in understanding or constructing a software agent. These diagrams do not address every functional aspect of an agent’s architecture. Instead, they provide a general framework for implementing traditional agent architectures [Weiß, 1999). 17.4.1.1 Reactive Agents A reactive agent is the simplest kind to build because it does not maintain information about the state of its environment but simply reacts to current perceptions. Our design for such an agent, shown in Figure 17.4, is fairly intuitive, encapsulating a collection of behaviors, sometimes known as plans, and the means for selecting an appropriate one. A collection of objects, in the object-oriented sense, lets a developer add and remove behaviors without having to modify the action selection code, since an iterator
[Figure 17.4: UML class diagram of a simple reactive agent. An Agent holds a BehaviorCollection b and an Environment e; its run() method loops forever executing e.takeAction(b.getAction(e.getInput())). Each Behavior offers matches(), inhibits(), and execute() methods, and BehaviorCollection.getAction(state) selects an action as follows.]

    getAction(state):
        Vector match;
        for each b in elements
            if (b.matches(state)) match.add(b);
        for each b in match
            inhibited = false;
            for each c in match
                if (c.inhibits(b)) { inhibited = true; break; }
            if (!inhibited) return b.execute(state);
        return null;

FIGURE 17.4 Diagram of a simple reactive architecture for an agent. The agent's run() method executes the action specified by the current behavior and state.
can be used to traverse the list of behaviors. Each behavior fires when it matches the environment, and each can inhibit other behaviors. Our action-selection loop is not as efficient as it could be, because getAction operates in O(n) time (where n is the number of behaviors). A better implementation could lower the computation time to O(log n) using decision trees, or to O(1) using hardware or parallel processing. The developer is responsible for ensuring that at least one behavior will match for every environment. This can be achieved by defining a default behavior that matches all inputs but is inhibited by all other behaviors that match. 17.4.1.2 BDI Agents A belief–desire–intention (BDI) architecture includes and uses an explicit representation for an agent's beliefs (state), desires (goals), and intentions (plans). The beliefs include self-understanding ("I believe I can perform Task-A"), beliefs about the capabilities of other agents ("I believe Agent-B can perform Task-B"), and beliefs about the environment ("Based on my sensors, I believe I am 3 ft from the wall"). The intentions persist until accomplished or are determined to be unachievable. Representative BDI systems — the Procedural Reasoning System (PRS) and JAM — both define a new programming language and implement an interpreter for it. The advantage of this approach is that the interpreter can stop the program at any time, save its state, and execute some other intention if it needs to. The disadvantage is that the interpreter — not an intention — runs the system; the current intention may no longer be applicable if the environment changes. The BDI architecture shown in Figure 17.5 eliminates this problem. It uses a voluntary multitasking method instead, whereby the environment thread constantly checks to make sure the current intention is applicable. If not, the agent executes stopCurrentIntention(), which will call the intention's stopExecuting() method.
[Figure 17.5: UML class diagram of the BDI architecture. An Agent holds a BeliefSet B, a DesireSet D, an IntentionSet P, a current Intention I, and an Environment e. The Environment runs in its own thread and provides run(), currentIntentionIsOK(), stopCurrentIntention(), and getBestIntention(), feeding the agent through getInput(Agent) and takeAction(Agent, Action). The DesireSet and IntentionSet return their members that are applicable to the current beliefs; a Desire has a type and priority; an Intention has a goal and priority and provides satisfies(Desire), execute(Agent), context(BeliefSet), and stopExecuting(); the BeliefSet incorporates new observations.]
FIGURE 17.5 Diagram of a belief–desire–intention architecture for an agent.
Thus, the intention is responsible for stopping itself and cleaning up. By giving each intention this capability, we eliminate the possibility of a deadlock resulting from an intention having some resource reserved when it is stopped. The following pseudocode illustrates the two main loops, one for each thread, of the BDI architecture. The variables a, B, D, and I represent the agent and its beliefs, desires, and intentions. The agent's run method consists of finding the best applicable intention and executing it to completion. If the result of the execution is true, the meaning is that the desire was achieved, so the desire is removed from the desire set. If the environment thread finds that an executing plan is no longer applicable and calls for a stop, the intention will promptly return from its execute() call with a false. Notice that the environment thread modifies the agent's set of beliefs. The belief set needs to synchronize these changes with any changes that the intentions make to the set of beliefs.

    Agent::run() {
        Environment e;
        e.run();                        // start environment in its own thread
        while (true) {
            I = a.getBestIntention();
            if (I.execute(a))           // true if intention was achieved
                a.D.remove(I.goal);     // I.goal is a desire
        }
    }
Finally, the environment thread's sleep time can be modified, depending on the system's real-time requirements. If we do not need the agent to change intentions rapidly when the environment changes,
the thread can sleep longer. Otherwise, a short sleep will make the agent check the environment more frequently, using more computational resources. A more efficient callback mechanism could easily replace the current run method if the agent's input mechanism supported it.

    Environment::run() {
        while (true) {
            a.B.incorporateNewObservations(e.getInput(a));
            if (!a.currentIntentionIsOK())
                a.stopCurrentIntention();
            sleep(someShortTime);
        }
    }
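The two loops above can be rendered directly in Java, as sketched below. The sketch is deliberately minimal: the Intention here merely counts steps, the belief set holds a single synchronized reading (illustrating the synchronization noted above), and the applicability test is an arbitrary threshold.

    import java.util.concurrent.atomic.AtomicReference;

    /** Compact Java rendering of the two BDI loops above; the intention and beliefs are stubs. */
    public class BdiSketch {

        static class BeliefSet {
            private double reading;
            synchronized void incorporateNewObservation(double input) { reading = input; }
            synchronized double current() { return reading; }
        }

        static class Intention {
            private volatile boolean stopped = false;
            boolean execute() {                       // returns true if the goal was achieved
                for (int step = 0; step < 50 && !stopped; step++) {
                    try { Thread.sleep(20); } catch (InterruptedException e) { return false; }
                }
                return !stopped;
            }
            boolean contextOK(BeliefSet b) { return b.current() < 0.95; }  // applicability test
            void stopExecuting() { stopped = true; }                       // clean up and stop promptly
        }

        public static void main(String[] args) {
            BeliefSet beliefs = new BeliefSet();
            AtomicReference<Intention> current = new AtomicReference<>(new Intention());

            // Environment thread: update beliefs and stop the current intention if it no longer applies.
            Thread environment = new Thread(() -> {
                while (true) {
                    beliefs.incorporateNewObservation(Math.random());   // stand-in for real sensing
                    if (!current.get().contextOK(beliefs)) current.get().stopExecuting();
                    try { Thread.sleep(100); } catch (InterruptedException e) { return; }
                }
            });
            environment.setDaemon(true);
            environment.start();

            // Agent thread (main): run intentions to completion, replacing ones that were stopped.
            for (int i = 0; i < 3; i++) {
                boolean achieved = current.get().execute();
                System.out.println("intention " + i + (achieved ? " achieved" : " stopped early"));
                current.set(new Intention());
            }
        }
    }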
17.4.1.3 Layered Architectures Other common architectures for software agents consist of layers of capabilities, where the higher layers perform higher levels of reasoning. For example, Figure 17.6 shows the architecture of an agent that has a philosophical and ethical basis for choosing its actions. The layers typically interact in one of three ways: (case 1) inputs are given to all of the layers at the same time; (case 2) inputs are given to the highest layer first for deliberation, and then its guidance is propagated downward through each of the lower layers until the lowest layer performs the ultimate action; and (case 3) inputs are given to the lowest layer first, which provides a list of possible actions that are successively filtered by each of the higher layers until a final action remains. The lowest level of the architecture enables an agent to react to immediate events [Müller et al., 1994]. The middle layers are concerned with an agent's interactions with others [Castelfranchi, 1998; Castelfranchi et al., 2000; Rao and Georgeff, 1991; Cohen and Levesque, 1990], whereas the highest level enables the agent to consider the long-term effects of its behavior on the rest of its society [Mohamed and Huhns, 2001]. Agents are typically constructed starting at the bottom of this architecture, with increasingly more abstract reasoning abilities layered on top. Awareness of other agents and of one's own role in a society, which are implicit at the social commitment level and above, can enable agents to behave coherently [Gasser, 1991]. Tambe et al. [2000] have shown how a team of agents flying helicopters will continue to function as a coherent team after their leader has crashed, because another agent will assume the leadership role. More precisely, the agents will adjust their individual intentions in order to fulfill the commitments made by the team.
[Figure 17.6: Layers of agent characterization, from a reactive agent kernel ("responsive") up through a beliefs, desires, and intentions logical reasoner and a decision theory (probabilistic) reasoner ("rational"), a social commitment reasoner ("team player"), a societal norms and conventions constraint reasoner, and a philosophical principles reasoner ("good citizen"); inputs may enter at all layers (case 1), at the top (case 2), or at the bottom (case 3), and actions exit from the kernel.]
FIGURE 17.6 Architecture for a philosophical agent. The architecture defines layers of deliberation for enabling an agent to behave appropriately in a society of agents.
17.4.1.4 Behaviors and Activity Management Most popular agent architectures, including the three we diagrammed, contain a set of behaviors and a method for scheduling them. A behavior is distinguished from an action in that an action is an atomic event, whereas a behavior can span a longer period of time. In multiagent systems, we can also distinguish between physical behaviors that generate actions and conversations between agents. We can consider behaviors and conversations to be classes inheriting from an abstract activity class. We can then define an activity manager responsible for scheduling activities. This general activity manager design lends itself to the implementation of many popular agent architectures while maintaining the proper encapsulation and decomposability required in good object-oriented programming. Specifically, activity is an abstract class that defines the interface to be implemented by all behaviors and conversations. The behavior class can implement any helper functions needed in the particular domain (for example, subroutines for triangulating the agent's position). The conversation class can implement a finite-state machine for use by the particular conversations. For example, by simply filling in the appropriate states and adding functions to handle the transitions, an agent can define a contracting protocol as a class that inherits from conversation. Details of how this is done depend on how the conversation class implements a finite-state machine, which varies depending on the system's real-time requirements. Defining each activity as its own independent object and implementing a separate activity manager has several advantages. The most important is the separation between domain and control knowledge. The activities embody all the knowledge about the particular domain the agent inhabits, while the activity manager embodies knowledge about the deadlines and other scheduling constraints the agent faces. When the implementation of each activity is restricted to a separate class, the programmer must separate the agent's abilities into encapsulated objects that other activities can then reuse. The activity hierarchy forces all activities to implement a minimal interface that also facilitates reuse. Finally, placing the activities within the hierarchy provides many opportunities for reuse through inheritance. For example, the conversation class can implement a general lost-message error-handling procedure that all conversations can use. A sketch of this activity hierarchy and manager appears below.
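The following Java sketch illustrates the activity hierarchy and manager just described. The names follow the text (Activity, Behavior, Conversation, ActivityManager) but the code is illustrative: the scheduler is a simple round-robin queue rather than a deadline-aware scheduler, and the conversation's finite-state machine is reduced to a single transition.

    import java.util.ArrayDeque;
    import java.util.Queue;

    /** Sketch of the activity hierarchy: an abstract Activity with Behavior and Conversation
     *  subclasses, scheduled by a simple ActivityManager. Names are illustrative. */
    abstract class Activity {
        abstract boolean step();           // perform one unit of work; true when the activity is finished
    }

    abstract class Behavior extends Activity {
        // domain helper functions (e.g., triangulating the agent's position) would live here
    }

    abstract class Conversation extends Activity {
        protected String state = "START";  // minimal finite-state machine
    }

    class ActivityManager {
        private final Queue<Activity> ready = new ArrayDeque<>();

        void schedule(Activity a) { ready.add(a); }

        /** Round-robin scheduling: control knowledge lives here, domain knowledge in the activities. */
        void run() {
            while (!ready.isEmpty()) {
                Activity a = ready.poll();
                if (!a.step()) ready.add(a);   // not finished: put it back in the queue
            }
        }

        public static void main(String[] args) {
            ActivityManager manager = new ActivityManager();
            manager.schedule(new Behavior() {
                int ticks = 0;
                boolean step() { System.out.println("behavior tick " + ++ticks); return ticks == 2; }
            });
            manager.schedule(new Conversation() {
                boolean step() {
                    if (state.equals("START")) { System.out.println("send proposal"); state = "WAIT"; return false; }
                    System.out.println("received accept");   // single transition of the state machine
                    return true;
                }
            });
            manager.run();
        }
    }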
17.4.1.5 Architectural Support

Figure 17.4 and Figure 17.5 provide general guidelines for implementing agent architectures using an object-oriented language. As agents become more complex, developers will likely have to expand upon our techniques. We believe these guidelines are general enough that it will not be necessary to rewrite the entire agent from scratch when adding new functionality. Of course, a complete agent-based system requires an infrastructure to provide for message transport, directory services, and event notification and delivery. These are usually provided as operating system services or, increasingly, in an agent-friendly form by higher-level distributed protocols such as Jini (http://www.sun.com/jini/), Bluetooth (http://www.bluetooth.com), and the emerging standards of FIPA (the Foundation for Intelligent Physical Agents, at http://www.fipa.org/). See http://www.multiagent.com/, a site maintained by José Vidal, for additional information about agent tools and architectures.

The canonical multiagent architecture, shown in Figure 17.1, is suitable for many applications. The architecture incorporates a variety of resource agents that represent databases, Websites, sensors, and file systems. Specifications for the behaviors of two types of resource agents are shown in Figure 17.7 and Figure 17.8. Figure 17.7 contains a procedural specification for the behavior of a resource (wrapper) agent that makes a database system active and accessible to other agents. In a supply-chain management scenario, the database agent might represent a supplier's order-processing system; customers could send orders to the agent (inform) and then issue queries for status and billing information. A procedural specification for the behavior of an Internet agent that actively monitors a Website for new or updated
[Figure 17.7 diagram: control flow for the database resource agent. The agent gets its database parameters, registers with the Directory Facilitator, starts its query and inform behaviors, and then receives and parses each message. For an inform whose predicate is a table name, it writes or updates the entry in the database table; for a query whose predicate is a table name, it queries the database and replies with the table entry; for any other predicate, it replies with NotUnderstood.]
FIGURE 17.7 Activity diagram showing the procedural behavior of a resource agent that makes a database system active and accessible to other agents.
[Figure 17.8 diagram: control flow for the Website monitoring agent. The agent gets its Website parameters, registers with the DF, checks for new data and, if any is available, notifies the data user; it then sleeps until a message arrives or a timeout expires, and receives and parses the message. On STOP or KILL it deletes itself; for any other predicate it replies NOT_UNDERSTOOD and resumes monitoring.]
FIGURE 17.8 Activity diagram showing the procedural behavior of an Internet agent that actively monitors a Website for new or updated information.
information is shown in Figure 17.8. An agent that implements this procedure could be used by customers looking for updated pricing or product information.
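A rough Java rendering of the Figure 17.8 control flow is sketched below. It is deliberately schematic: checkForNewData, notifyDataUser, and the message handling stand in for domain- and platform-specific pieces, and a real agent would use its platform's messaging and directory services rather than the placeholder calls shown here.

    // Schematic monitoring loop following Figure 17.8 (names are illustrative only).
    class WebsiteMonitoringAgent {
        private volatile boolean alive = true;

        void run() throws InterruptedException {
            getWebsiteParameters();               // e.g., URL and polling interval
            registerWithDirectoryFacilitator();
            while (alive) {
                String newData = checkForNewData();
                if (newData != null) notifyDataUser(newData);
                String msg = waitForMessageOrTimeout(60_000);  // sleep until message or timeout
                if (msg == null) continue;                     // timeout: poll again
                if (msg.equals("STOP") || msg.equals("KILL")) {
                    alive = false;                             // delete agent
                } else {
                    reply("NOT_UNDERSTOOD");                   // unrecognized predicate
                }
            }
            deregister();
        }

        // Placeholders for platform- and domain-specific functionality.
        void getWebsiteParameters() {}
        void registerWithDirectoryFacilitator() {}
        String checkForNewData() { return null; }
        void notifyDataUser(String data) {}
        String waitForMessageOrTimeout(long millis) throws InterruptedException {
            Thread.sleep(millis);
            return null;
        }
        void reply(String performative) {}
        void deregister() {}
    }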
17.4.2 Agent Communication Languages

Agents representing different users might collaborate in finding and fusing information, but compete for goods and resources. Similarly, service agents may collaborate or compete with user, resource, and other service agents. Whether they are collaborators or competitors, the agents must interact purposefully with each other. Most purposeful interactions — whether to inform, query, or deceive — require the agents to talk to one another, and talking intelligibly requires a mutually understood language.

17.4.2.1 Speech Acts

Speech acts have to do with communication; they have nothing to do with speaking as such, except that human communication often involves speech. Speech act theory was invented in the 1950s and 1960s to help understand human language [Austin, 1962]. The idea was that with language you not only make statements but also perform actions. For example, when you request something, you do not just report on a request; you actually cause the request. When a justice of the peace declares a couple man and wife, she is not reporting on their marital status but changing it. The stylized syntactic form for speech acts that begins "I hereby request …" or "I hereby declare …" is called a performative. With a performative, literally, saying it makes it so. Verbs that cannot be put in this form are not speech acts. For example, "solve" is not a performative because "I hereby solve this problem" is not sufficient. Several thousand verbs in English correspond to performatives. Many classifications have been suggested for these, but the following are sufficient for most computing purposes:
• Assertives (informing)
• Directives (requesting or querying)
• Commissives (promising)
• Prohibitives
• Declaratives (causing events in themselves as, for example, the justice of the peace does in a marriage ceremony)
• Expressives (expressing emotions)

In natural language, it is not easy to determine what speech act is being performed. In artificial languages, we do not have this problem. However, the meanings of speech acts depend on what the agents believe, intend, and know how to perform, and on the society in which they reside. It is difficult to characterize meaning because all of these things are themselves difficult.

17.4.2.2 Common Language

Agent projects have investigated languages for many years. Early on, agents were local to each project, and their languages were mostly idiosyncratic. The challenge now is to have any agent talk to any other agent, which suggests a common language; ideally, all the agents that implement the same language will be mutually intelligible. Such a common language needs an unambiguous syntax so that the agents can all parse sentences the same way. It should have a well-defined semantics, or meaning, so that the agents can all understand sentences the same way. It should be well known, so that different designers can implement it and so that an agent has a reasonable chance of encountering another agent that knows the same language. Further, it should have the expressive power to communicate the kinds of things agents may need to say to one another.

So, what language should you give or teach your agent so that it will understand and be understood? The current popular choice is administered by the Foundation for Intelligent Physical Agents (FIPA), at http://www.fipa.org. The FIPA agent communication language (ACL) separates the domain-dependent part of a communication — the content — from the domain-independent part — the packaging — and then provides a standard for the domain-independent part.
FIPA specifies just six performatives, but they can be composed to enable agents to express more complex beliefs and expectations. For example, an agent can request to be informed about one of several alternatives. The performatives deal explicitly with actions, so requests are for communicative actions to be done by the message recipient. The FIPA specification comes with a formal semantics, and it guarantees that there is only one way to interpret an agent’s communications. Without this guarantee, agents (and their designers) would have to choose among several alternatives, leading to potential misunderstandings and unnecessary work.
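For concreteness, the sketch below shows roughly how one such message might look in the standard textual rendering of a FIPA ACL message. The agent names, ontology name, conversation identifier, and content expression are invented for illustration (real content would be written in an agreed content language such as FIPA SL); the example anticipates the travel ontology discussed in the next section.

    (inform
      :sender          (agent-identifier :name my-agent@platform)
      :receiver        (set (agent-identifier :name your-agent@platform))
      :content         "((isA 777 airplane))"
      :language        fipa-sl
      :ontology        travel-ontology
      :conversation-id c-0042)

The performative and the parameters such as :sender, :receiver, :language, and :ontology form the domain-independent packaging, while the quoted :content carries the domain-dependent part.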
17.4.3 Knowledge and Ontologies for Agents

An ontology is a computational model of some portion of the world. It is often captured in some form of a semantic network, a graph whose nodes are concepts or individual objects and whose arcs represent relationships or associations among the concepts. This network is augmented by properties and attributes, constraints, functions, and rules that govern the behavior of the concepts. Formally, an ontology is an agreement about a shared conceptualization, which includes frameworks for modeling domain knowledge and agreements about the representation of particular domain theories. Definitions associate the names of entities in a universe of discourse (for example, classes, relations, functions, or other objects) with human-readable text describing what the names mean, and with formal axioms that constrain the interpretation and well-formed use of these names. For information systems, or for the Internet, ontologies can be used to organize keywords and database concepts by capturing the semantic relationships among the keywords or among the tables and fields in a database. The semantic relationships give users an abstract view of an information space for their domain of interest.

17.4.3.1 A Shared Virtual World

How can such an ontology help our software agents? It can provide a shared virtual world in which each agent can ground its beliefs and actions. When we talk with our travel agent, we rely on the fact that we all live in the same physical world containing planes, trains, and automobiles. We know, for example, that a 777 is a type of airliner that can carry us to our destination. When our agents talk, the only world they share is one consisting of bits and bytes — which does not allow for a very interesting discussion! An ontology gives the agents a richer and more useful domain of discourse.

The previous section described the FIPA ACL, which standardizes the packaging of messages but not the semantics of their domain-dependent content. The FIPA ACL also allows the agents to state which ontology they are presuming as the basis for their messages. Suppose two agents have access to an ontology for travel, with concepts such as airplanes and destinations, and suppose the first agent tells the second about a flight on a 777. Suppose further that the concept "777" is not part of the travel ontology. How could the second agent understand? The first agent could explain that a 777 is a kind of airplane, which is a concept in the travel ontology. The second agent would then know the general characteristics of a 777. This communication is illustrated in Figure 17.9.

17.4.3.2 Relationships Represented

Most ontologies represent and support relationships among classes of meaning. Among the most important of these relationships are:
• Generalization and inheritance, which are abstractions for sharing similarities among classes while preserving their differences. Generalization is the relationship between a class and one or more refined versions of it. Each subclass inherits the features of its superclass, adding other features of its own. Generalization and inheritance are transitive across an arbitrary number of levels. They are also antisymmetric.
• Aggregation, the part–whole or part-of relationship, in which classes representing the components of something are associated with the class representing the entire assembly. Aggregation is also transitive, as well as antisymmetric. Some of the properties of the assembly class propagate to the component classes.
[Figure 17.9 diagram: "My Agent" and "Your Agent" exchange the messages "777?" and "(isA 777 Airplane)" with reference to a shared transportation ontology (Conveyance, Train, Transport, Plane, Airliner, Boat, Fighter) backed by a database table Airplane with columns id and model.]
FIGURE 17.9 Communication between agents sharing a travel ontology.
• Instantiation, which is the relationship between a class and each of the individuals that constitute it.

Some of the other relationships that occur frequently in ontologies are owns, causes, and contains. Causes and contains are transitive and antisymmetric; owns propagates over aggregation because when you own something, you also own all of its parts.
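These relationships map naturally onto object-oriented constructs, as the small Java sketch below illustrates; the travel-domain class names are ours, chosen to match the Figure 17.9 example rather than any standard ontology.

    import java.util.List;

    // Generalization/inheritance: Airplane is a refined version of Conveyance,
    // and Boeing777 refines Airplane; inheritance is transitive across levels.
    class Conveyance {}
    class Engine {}
    class Airplane extends Conveyance {
        // Aggregation: an airplane is assembled from engines (part-whole).
        List<Engine> engines;
    }
    class Boeing777 extends Airplane {}

    class OntologyDemo {
        public static void main(String[] args) {
            // Instantiation: a particular aircraft is an individual of the class
            // Boeing777 and therefore (transitively) also an Airplane and a Conveyance.
            Airplane tailNumberN9682 = new Boeing777();
            System.out.println(tailNumberN9682 instanceof Conveyance);  // prints true
        }
    }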
17.4.4 Reasoning Systems

A simple and convenient means to incorporate a reasoning capability into an Internet software agent is via a rule-execution engine, such as JESS [Friedman-Hill, 2003]. With JESS, knowledge is supplied in the form of declarative rules. There can be many or only a few rules, and JESS will continually apply them to data in the form of a knowledge base. Typically, the rules represent the heuristic knowledge of a human expert in some domain, and the knowledge base represents the state of an evolving situation. An example rule in JESS is

    (defrule recognize-airliner
      "If an object ?X is a plane and carries passengers, then assert that ?X is an airliner."
      (isA ?X plane)
      (carries ?X passengers)
      =>
      (assert (isA ?X airliner)))
The associated knowledge base might contain facts about an airline company concerning its equipment and its characteristics, such as

    (assert (isA 777-N9682 plane))
    (assert (carries 777-N9682 passengers))
Many of the common agent development environments, such as JADE [Bellifemine and Trucco, 2003], ZEUS [Nwana et al., 1999], and FIPA-OS [Nortel Networks, 2003], include facilities for incorporating JESS into the agents a developer is constructing.
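As a rough illustration, an agent written in Java might embed the rule engine along the following lines. This is only a sketch: it assumes the jess.Rete class with executeCommand and run methods (as in the Jess 6.x API), and the exact class names, method signatures, and exception handling should be checked against the Jess release and agent framework actually used.

    import jess.JessException;
    import jess.Rete;

    class ReasoningAgent {
        public static void main(String[] args) throws JessException {
            Rete engine = new Rete();
            // Load the rule from Section 17.4.4 (it could also be read from a .clp file).
            engine.executeCommand(
                "(defrule recognize-airliner" +
                "  (isA ?X plane) (carries ?X passengers)" +
                "  => (assert (isA ?X airliner)))");
            // Assert the facts about the airline's equipment.
            engine.executeCommand("(assert (isA 777-N9682 plane))");
            engine.executeCommand("(assert (carries 777-N9682 passengers))");
            // Run the engine; the rule fires and derives (isA 777-N9682 airliner).
            engine.run();
        }
    }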
17.4.5 Cooperation

The most widely used means by which agents arrange to cooperate is the contract-net protocol. This interaction protocol allows an initiating agent to solicit proposals from other agents by sending a Call for Proposals (CFP), evaluating their proposals, and then accepting the preferred one (or even rejecting all of them). Any agent can initiate the protocol, so it can be applied recursively. The initiator sends a message with a CFP speech act that specifies the action to be performed and, if needed, conditions upon its execution. The responders can reply by sending a PROPOSE message that includes any preconditions for their action, such as their cost or schedule. Alternatively, responders may send a REFUSE message to indicate their disinterest or a NOT-UNDERSTOOD message to indicate a communication problem. The initiator then evaluates the received proposals and sends an ACCEPT-PROPOSAL message to the agents whose proposals are accepted and a REJECT-PROPOSAL message to the others. Once the chosen responders have completed their task, they respond with an INFORM of the result of the action or with a FAILURE if anything went wrong.
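The sketch below captures the initiator's side of this exchange in plain Java, using a simple in-memory representation of proposals. It is a schematic of the protocol logic rather than code for any particular agent platform, and the class and method names are our own.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    // Minimal stand-in for a responder's PROPOSE reply to a CFP.
    record Proposal(String responder, double cost) {}

    class ContractNetInitiator {
        // Announce the task, collect proposals, accept the cheapest, reject the rest.
        String runProtocol(String task, List<String> responders) {
            List<Proposal> proposals = new ArrayList<>();
            for (String r : responders) {
                Proposal p = sendCfp(r, task);       // CFP -> PROPOSE, REFUSE, or NOT-UNDERSTOOD
                if (p != null) proposals.add(p);
            }
            if (proposals.isEmpty()) return null;    // all refused: no contract awarded
            Proposal best = proposals.stream()
                    .min(Comparator.comparingDouble(Proposal::cost))
                    .get();
            for (Proposal p : proposals) {
                if (p == best) sendAcceptProposal(p.responder(), task);
                else sendRejectProposal(p.responder(), task);
            }
            return waitForInformOrFailure(best.responder());  // INFORM result or FAILURE
        }

        // Messaging placeholders; a real system would send ACL messages here.
        Proposal sendCfp(String responder, String task) { return new Proposal(responder, Math.random()); }
        void sendAcceptProposal(String responder, String task) {}
        void sendRejectProposal(String responder, String task) {}
        String waitForInformOrFailure(String responder) { return "done"; }
    }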
17.5 Composing Cooperative Web Services

Imagine that a merchant would like to enable a customer to track the shipping of a sold item. Currently, the best the merchant can do is to point the customer to the shipper's Website, and the customer can then go there to check on delivery status. If the merchant could compose its own production notification system with the shipper's Web services, the result would be a customized delivery notification service by which the customer — or the customer's agents — could find the status of a purchase in real time.

As Web uses (and thus Web interactions) become more complex, it will be increasingly difficult for one server to provide a total solution and increasingly difficult for one client to integrate solutions from many servers. Web services currently involve a single client accessing a single server, but soon applications will demand federated servers with multiple clients sharing results. Cooperative peer-to-peer solutions will have to be managed, and this is an area where agents have excelled. In doing so, agents can balance cooperation with the interests of their owners.
17.6 Conclusion

Web services are extremely flexible, and a major advantage is that a developer of Web services does not have to know who or what will use the services being provided. Web services can be used to tie together the internal information systems of a single company or the interoperating systems of virtual enterprises. How Web services tie these systems together, however, will be based on technologies being developed for multiagent systems. The result will be a Semantic Web that enables work to get done and better decisions to be made.
Acknowledgment

The US National Science Foundation supported this work under grant number IIS-0083362.
References

Austin, John L. How to Do Things with Words. Clarendon Press, Oxford, 1962.
Bellifemine, Fabio and Tiziana Trucco. Java agent development framework, 2003. http://sharon.cselt.it/projects/jade/.
Berners-Lee, Tim, James Hendler, and Ora Lassila. The semantic web. Scientific American, 284(5): 34–43, 2001.
Bond, Alan and Les Gasser, Eds. Readings in Distributed Artificial Intelligence. Morgan Kaufmann, San Francisco, 1988.
Box, Don, David Ehnebuske, Gopal Kakivaya, Andrew Layman, Noah Mendelsohn, Henrik Frystyk Nielsen, Satish Thatte, and Dave Winer. Simple object access protocol (SOAP) 1.1, 2000. www.w3.org/TR/SOAP.
Castelfranchi, Cristiano. Modelling social action for AI agents. Artificial Intelligence, 103: 157–182, 1998.
Castelfranchi, Cristiano, Frank Dignum, Catholijn M. Jonker, and Jan Treur. Deliberate normative agents: Principles and architecture. In Nicholas R. Jennings and Yves Lesperance, Eds., Intelligent Agents VI: Agent Theories, Architectures, and Languages (ATAL-99), volume 1757, pp. 364–378, Springer-Verlag, Berlin, 2000.
Coelho, Helder, Luis Antunes, and Luis Moniz. On agent design rationale. In Proceedings of the XI Simposio Brasileiro de Inteligencia Artificial (SBIA), pp. 43–58, Fortaleza, Brazil, 1994.
Cohen, Philip R. and Hector J. Levesque. Persistence, intention, and commitment. In Philip Cohen, Jerry Morgan, and Martha Pollack, Eds., Intentions in Communication. MIT Press, Cambridge, MA, 1990.
Decker, Stefan, Sergey Melnik, Frank van Harmelen, Dieter Fensel, Michel Klein, Jeen Broekstra, Michael Erdmann, and Ian Horrocks. The semantic web: the roles of XML and RDF. IEEE Internet Computing, 4(5): 63–74, September 2000a.
Decker, Stefan, Prasenjit Mitra, and Sergey Melnik. Framework for the semantic web: an RDF tutorial. IEEE Internet Computing, 4(6): 68–73, November 2000b.
Friedman-Hill, Ernest J. Jess, the Java expert system shell, 2003. http://herzberg.ca.sandia.gov/jess.
Gasser, Les. Social conceptions of knowledge and action: DAI foundations and open systems semantics. Artificial Intelligence, 47: 107–138, 1991.
Gasser, Les and Michael N. Huhns, Eds. Distributed Artificial Intelligence, Vol. 2. Morgan Kaufmann, London, 1989.
Heflin, Jeff and James A. Hendler. Dynamic ontologies on the Web. In Proceedings of the American Association for Artificial Intelligence Conference (AAAI), pp. 443–449, AAAI Press, Menlo Park, CA, 2000.
Huhns, Michael N., Ed. Distributed Artificial Intelligence. Morgan Kaufmann, London, 1987.
Huhns, Michael N. Interaction-oriented programming. In Paulo Ciancarini and Michael Wooldridge, Eds., Agent-Oriented Software Engineering, Vol. 1957 of Lecture Notes in Artificial Intelligence, pp. 29–44, Springer-Verlag, Berlin, 2001.
Huhns, Michael N. and Munindar P. Singh, Eds. Readings in Agents. Morgan Kaufmann, San Francisco, 1998.
Jennings, Nicholas R. On agent-based software engineering. Artificial Intelligence, 117(2): 277–296, 2000.
Mohamed, Abdulla M. and Michael N. Huhns. Multiagent benevolence as a societal norm. In Rosaria Conte and Chrysanthos Dellarocas, Eds., Social Order in Multiagent Systems, pp. 65–84, Kluwer, Boston, 2001.
Müller, Jörg P., Markus Pischel, and Michael Thiel. Modeling reactive behavior in vertically layered agent architectures. In Michael J. Wooldridge and Nicholas R. Jennings, Eds., Intelligent Agents, Vol. 890 of Lecture Notes in Artificial Intelligence, pp. 261–276, Springer-Verlag, Berlin, 1994.
Nortel Networks. FIPA-OS, 2003. http://fipa-os.sourceforge.net/.
Nwana, Hyacinth, Divine Ndumu, Lyndon Lee, and Jaron Collis. ZEUS: A tool-kit for building distributed multi-agent systems. Applied Artificial Intelligence, 13(1), 1999.
Rao, Anand S. and Michael P. Georgeff. Modeling rational agents within a BDI-architecture. In Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning, pp. 473–484, 1991. Reprinted in Huhns and Singh [1998].
Russell, Stuart J. and Peter Norvig. Artificial Intelligence: A Modern Approach, 2nd ed. Prentice-Hall, Upper Saddle River, NJ, 2003.
Smith, Michael K., Chris Welty, and Deborah McGuinness. Web ontology language (OWL) guide version 1.0, 2003. http://www.w3.org/TR/2003/WD-owl-guide-20030210/.
Tambe, Milind, David V. Pynadath, and Nicolas Chauvat. Building dynamic agent organizations in cyberspace. IEEE Internet Computing, 4(2): 65–73, February 2000.
Weiß, Gerhard, Ed. Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. MIT Press, Cambridge, MA, 1999.
18 Concepts and Practice of Personalization

Manuel Aparicio IV
Munindar P. Singh

CONTENTS
Abstract
18.1 Motivation
18.2 Key Applications and Historical Development
    18.2.1 Desktop Applications such as E-mail
    18.2.2 Web Applications such as E-commerce
    18.2.3 Knowledge Management
    18.2.4 Mobile Applications
18.3 Key Concepts
    18.3.1 Individual vs. Collaborative
    18.3.2 Representation and Reasoning
18.4 Discussion
    18.4.1 Advice
    18.4.2 Metrics
    18.4.3 Futures
References
Abstract

Personalization is how a technical artifact adapts its behavior to suit the needs of an individual user. Information technology creates a strong need for personalization because of its inherent complexity, and it also provides the means to accomplish personalization through its ability to gather information about user preferences and to compute users' needs. Although personalization is important in all information technology, it is particularly important in networked applications. This chapter reviews the main applications of personalization, the major approaches, the tradeoffs among them, the challenges facing practical deployments, and the themes for ongoing research.
18.1 Motivation

Personalization is one of the key attributes of an intelligent system that is responsive to the user's needs. Personalization emerged with the early desktop applications. The emergence of the Internet as a substrate for networked computing has enabled sharing and collaborative applications. More generally, the Internet enables a dynamic form of trading, where users can carry on individualized monetary or information trades. Further, because Internet environments involve parties interacting at a distance, where there is a greater risk that users' needs will be misunderstood, there is a concomitantly increased need for personalization. This need has led to a corresponding evolution in the techniques of personalization.
It is helpful to make a distinction between personalization and customization. Although sometimes these terms are used interchangeably, the distinction between them is key from the standpoints of both users and technology. Customization involves selecting from among some preset options or setting some parameter values that are predefined to be settable. For example, if you choose the exterior color and upholstery for your vehicle, you are customizing it from a menu of options that the manufacturer made available to you. Likewise, if you choose the ring-tones with which your mobile telephone rings, you are also customizing this functionality of the telephone. In the same vein, when you select the elements to display on your personal page at a leading portal, such as Yahoo!, you are customizing your page based on the menu of options made available by the portal. A desktop example is the reordering of menus and the restriction of visible menu items to those used recently.

Such approaches do not attempt to model what a user might care about, but simply remember the recent actions of users with little or no regard for the context in which they were performed. A variation is when an application helps a user or a system administrator construct a user profile by hand. Such profiles are inherently limited in not being able to anticipate situations that have not been explicitly included in the profile. They can represent only a few dimensions of interests and preferences, often missing out on subtle relationships. Moreover, they require user input for all the profile data, which is a major reason why they are limited to a few dimensions. Such profiles also suffer from the problem of not being able to adapt to any changes in users' interests. The profile offers degrading performance as the user's preferences drift and must be reconstructed by hand whenever the user's interests have changed significantly. Such systems work to a small extent even if they do not work very well. Consequently, users are reluctant to change anything for fear of inadvertently breaking something that does work.

By contrast, your experience with a leading e-commerce site, such as Amazon.com, is personalized. This is because a site such as Amazon.com will behave in a manner (chiefly by presenting materials and recommendations) that responds to your needs without any explicit selection of parameters on your part. Therefore, the main difference between personalization and customization lies in the fact that customization involves a more restrictive and explicit choice from a small, predetermined set of alternatives, whereas personalization involves a more flexible, often at least partly implicit, choice from a larger, possibly changing set of alternatives.

A simple approach for personalization involves the development of a domain model or taxonomy. For example, in the world of news portals, we may have a category called business and another category called sports. Business and sports are siblings under news. Under sports, we would have further classifications such as football and hockey. This approach is called category matching. For customization, the user would simply drill down this hierarchy and select the news categories of interest. For personalization, we first try to infer the right categories for the user and then supply the materials that match the user's needs.

The basic philosophy underlying the main classes of applications of personalization can be understood in terms of decision support.
The key point of decision support is to help a human user decide, efficiently and effectively. Specifically, this concept is to be contrasted with automation, conventionally the goal of information technology deployments. The key point of automation is that an agent would replace the human. That is, the agent would be expected to take important decisions on its own accord. It has been shown time and again that such unfettered uses of computational intelligence lead to problems. By keeping the human in the loop and supporting the human’s decision making, we can improve the quality of the typical decision and reduce the delay in obtaining decisions while ensuring that a human is involved, so that the resulting behaviors are trustworthy. We consider decision support as the essence of workflow and logistics applications where personalization can apply. The idea of decision-support extends naturally into other kinds of collaborative applications where peers exchange information about their experiences and expertise. And it even accommodates e-commerce, where the decisions involved are about what to purchase.
Along the lines of decision support, we formulate a user's actions as choices made by the user. Personalization is about learning the kinds of choices that a user would find relevant in his specific circumstances. The term choice indicates that this topic extends more generally than to items or products to be reviewed or purchased, which are simply one family of applications.

Personalization is inherently tied to the vision of an intelligent agent acting as the user's personal assistant. In simple, if somewhat idealized terms, an agent is something that watches its user's environment and the user's decisions and actions, learns its user's preferences, and helps its user in making further decisions. From the perspective of personalization, an agent needs the ability to perceive the user's environment. For desktop applications, this would involve perceiving the user interface down to the level of the graphical widgets that are displayed and selected. For Internet applications, a personal assistant can still benefit from perceiving the elements of desktop interaction. However, personalization in the Internet setting is often based on Website servers, where the perceptions are limited to browser actions that are transmitted to the Website, the so-called click-stream data. Applications that support human-to-human collaboration involve a combination of the two.

Personal assistants can be implemented via a variety of techniques. The chapter on intelligent agents discusses techniques for building agents in general. Of significance in practice are rule-based approaches as well as machine-learning techniques. This chapter concentrates on the concepts of personalization. Some other relevant topics, especially the specific techniques involved, are discussed in the chapters on Web mining, business processes, policies, mobile services, and pervasive computing.

Section 18.2 provides an overview of the main traditional and emerging applications for personalization in networked applications. Section 18.3 reviews the major approaches for personalization. Section 18.4 discusses some practical considerations and the remaining challenges that are guiding current developments in personalization, and summarizes the main themes that drive this large area.
18.2 Key Applications and Historical Development

Personalization is, or rather should be, ubiquitous in computing. It has significant usefulness in a variety of applications. The arrival of the Internet as a commercial medium has created a number of applications where personalization is expected. There are three main reasons for this shift in expectations.
• When the direct human touch is lost, as in many Internet applications, there is a greater need for intelligence and personalization to support effective interactions.
• Modern computers make it possible to carry out the computations necessary to produce effective personalization.
• The networked nature of the applications means that many users will often be aggregated at a single site, which will then have the data available to produce effective personalization based on collaborative filtering.

When little data was available, the best we could do was to offer the user a fixed set of choices about some aspect of the interaction. But across many users, choices can be collected and recommendations made based on shared, common experiences.
18.2.1 Desktop Applications such as E-mail

Desktop computing is synonymous with personal computing and provided the first environment for personalization. Two technical approaches, rule-based and machine-learning systems, were part of many early experiments, and both approaches continue to find their way into all current and future areas of personalization.

Machine learning has been slower to develop because early algorithms were largely intended for batch-oriented, highly parametric data mining. Data mining has its place in personalization, as will be discussed, but the individual modeling of desktop end users implies the use of methods that can learn on the fly, watching and predicting the user's situation and actions automatically, without any knowledge engineering.
Early desktop operating systems also inhibited the development of adaptive personalization because most of them were too "opaque" for learning systems to observe the user's situations and actions. The Mac OS developed a semantic layer for system and application events, beyond keystroke and mouse events, and this allowed some adaptive personalization systems, such as Open Sesame, to use simple but incremental machine-learning techniques. Without access to each application's events and support for third-party observation of such events, the only other early attempts were by the operating systems themselves to model users for more adaptive help systems. For instance, both OS/2 and Windows used various combinations of rule-based and adaptive systems to infer the user's level of expertise and intention in order to provide appropriate support. As Microsoft has also developed desktop applications that rely on elements of the operating system, a Bayesian form of machine learning is increasingly being made available to all applications for modeling and helping end users. In fact, ever since the earliest experiments with such personalization (as in the failed "Bob" interface), Microsoft has persisted in its research and development and has made clear statements about machine learning becoming part of the operating system. Clearly, the value of the system learning about the user, rather than the user having to learn about the system, remains the future of more intelligent computers.

Early desktop applications lacked the machine-learning algorithms and operating-system support that would have helped transfer many research ideas for personalization into actual products, but e-mail is an interesting story in the development of rule-based personalization, which continues to survive and has relevance to current issues with spam. Such rule-based systems are not adaptive. Users are typically assumed to be the authors of the rules that suit them, and as such these systems can be seen as an advanced form of customization rather than personalization. Still, e-mail filtering was an early implementation of the core idea behind personalization: the computer as an intelligent agent that can be instructed (by one method or another) to assist the user's individual needs and desires.

Rule-based e-mail filtering itself has had limited success, although it has been and still is a staple feature of all e-mail products. Rule authoring is notoriously difficult, but early experiences with this problem have led to some improvements. For instance, rule types are typically preauthored and provided as templates, such as "If from:X and subject keyword:Y, then delete." Users simply fill in the slots to build such rules for managing their inbox. As well, many corporations discovered that end users tend not to write rules, but that IT can develop them and successfully support their use across an organization. All of this experience is now finding a place in the war against spam mail. Both rule-based and machine-learning systems are available. E-mail applications and desktop operating systems (dominantly Windows and NT, of course) provide much more transparency of APIs and events, allowing third parties to try many approaches and algorithms. For instance, Bayesian learning is a popular method for building individual models — what each individual user decides is spam or not. Spam filtering is largely a case for personalization.
On the other hand, "blacklist filtering" and corporate/ISP policies are more or less like rule-based filtering. Users tend not to write such rules, but a central IT function can successfully support a group of users, at least to the extent that much of spam is commonly and clearly judged to be spam.

E-mail and spam filtering are also good examples of how the Internet has exploded over the past several years, blurring the distinction between desktop and Web-based applications. Web applications are largely based on transparent, machine-readable, standards-based content, which enables agents and algorithms to better attach to the content the user sees and the actions the user performs. Most critically, however, the Internet and the Web have enabled the networking of users beyond their individual desktops. Approaches to spam also include ideas for adaptive collaboration, but this kind of personalization was best developed through the rise of the Web and e-commerce personalization.
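A per-user Bayesian spam model of the kind described above can be sketched in a few lines of Java. This naive Bayes version is purely illustrative; real filters add careful tokenization, smarter smoothing, and decision thresholds tuned to each user's tolerance for false alarms.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Minimal per-user naive Bayes spam model: the user's own delete/keep decisions
    // supply the training labels, so each mailbox learns its own notion of spam.
    class SpamModel {
        private final Map<String, Integer> spamCounts = new HashMap<>();
        private final Map<String, Integer> hamCounts = new HashMap<>();
        private final Set<String> vocabulary = new HashSet<>();
        private int spamMessages = 0, hamMessages = 0;
        private int spamTokens = 0, hamTokens = 0;

        void train(String[] tokens, boolean isSpam) {
            for (String t : tokens) {
                vocabulary.add(t);
                if (isSpam) { spamCounts.merge(t, 1, Integer::sum); spamTokens++; }
                else { hamCounts.merge(t, 1, Integer::sum); hamTokens++; }
            }
            if (isSpam) spamMessages++; else hamMessages++;
        }

        // Log-odds that a message is spam (add-one smoothed); positive means "probably spam".
        double spamLogOdds(String[] tokens) {
            int v = vocabulary.size();
            double logOdds = Math.log((spamMessages + 1.0) / (hamMessages + 1.0));
            for (String t : tokens) {
                double pSpam = (spamCounts.getOrDefault(t, 0) + 1.0) / (spamTokens + v + 1.0);
                double pHam = (hamCounts.getOrDefault(t, 0) + 1.0) / (hamTokens + v + 1.0);
                logOdds += Math.log(pSpam / pHam);
            }
            return logOdds;
        }
    }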
18.2.2 Web Applications such as E-commerce

The Internet explosion of technical and financial interests provided fertile ground for personalization. The Web grew the scope of computer users to include a mass of human browsers and consumers.
E-commerce in particular combined general ideas of computer personalization with marketing ideas of personalization. Ideal marketing would entail one-to-one marketing in which each customer is known and treated as a unique individual, and it seemed that Internet technologies would allow the electronic capture and use of consumer behaviors, both during the browsing process and at the point of sale.

E-commerce, especially of the business-to-consumer variety, provides the most well-known examples of personalization. A classical scenario is one where a customer is recommended some items to purchase based on previous purchases by the given customer as well as on purchases by other customers. At the larger e-commerce sites, with as many as millions of items in their catalogs, such recommendations are essential to help customers become aware of the available products. Customers would often not be able to search effectively through such large catalogs and would not be able to evaluate which of the items were of best use to them. A recommendation can effectively narrow down the search space for the customers. Because of the obvious payoff and prospective returns on investment, e-commerce has been a prime historical motivator for research into personalization.

The ideal of personalized commerce requires individual consumer models that are richly detailed with the customer's browsing and purchase behavior. In particular, the model should understand an individual's preferences in terms of product features and how those features map to the individual's situational desires. In the same way that desktop systems would like to include a personal secretary for every user, e-commerce systems would like to include a personal shopper for every user — an intelligent agent that would learn all the details of each person's needs and preferences. If not a personal shopper for every consumer, the system should at least be like a well-known salesperson who can understand particular needs, know how those needs map to product features, and remember the consumer's past preferences when making new recommendations in repeat visits.

However, the earliest online catalogues and other available technology left much to be desired. First, understanding the intention of a consumer remains a hard research problem. Search engines must still advance to include the individual meaning of user queries. Systems like AskJeeves continue to work on natural language input rather than just keywords, but personalized meaning is a problem even for sentences or keywords. For instance, consider a consumer looking for a "phone." Assuming that the search engine will retrieve actual phone items (not all the products, such as accessories, that might contain the keyword "phone" for one reason or another), the meaning of phone differs between the corporate user needing a two-line speakerphone and the home consumer wanting a wall phone for the kitchen. Keyword conjunction, navigation of taxonomic trees, and disambiguation dialogs are all appropriate answers to such problems, but user modeling, knowing what the user probably means and probably wants based on past experience, is largely the method used by great personal shoppers and sales agents who individually know their customers.

Second, but most historically significant, early online catalogs were very impoverished, backed by databases of only product SKUs without any schematized product features.
In other words, the transparency of the Web's HTML made it possible to observe purchase behaviors, but there was little to understand about a behavior other than to record it. Necessity is the mother of invention, and given this situation, "collaborative filtering" was invented as a new inference that could still make product recommendations from such impoverished systems. The basic idea was this: using only consumer IDs and product SKUs, given a purchase by one consumer, look for other consumers who have made the same purchase and see what else they have bought as well. Recommend these other products to the consumer.

Collaborative filtering enjoyed the hype of the Internet boom, and many companies and products, such as Firefly, NetPerceptions, and Likeminds, provided more or less this kind of personalization technology. However, the inference of collaborative filtering is actually a form of stereotyping, akin to the market segmentation of individuals into groups. While the personalization industry continued to sell the idea of one-to-one individualized marketing, the technology did not live up to this promise. As well, many mistakes were made in various integrations of the technology. Even if this example is apocryphal, poor consumer experiences were often reported, such as buying a child's book as a gift and then being persistently stereotyped and faced with recommendations for children's books on return visits.
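The core co-purchase inference can be stated in a few lines of code. The Java sketch below implements the lookup described above over an in-memory purchase history; the data structures are illustrative and ignore the scaling, weighting, and context restrictions that production recommenders require.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Co-purchase ("customers who bought X also bought ...") recommendation sketch.
    class CoPurchaseRecommender {
        // consumer ID -> set of product SKUs purchased
        private final Map<String, Set<String>> purchases = new HashMap<>();

        void recordPurchase(String consumerId, String sku) {
            purchases.computeIfAbsent(consumerId, k -> new HashSet<>()).add(sku);
        }

        // Given one purchase, find other consumers who bought the same SKU and
        // rank what else they bought by how often it co-occurs.
        List<String> recommendFor(String consumerId, String purchasedSku, int topN) {
            Set<String> alreadyOwned = purchases.getOrDefault(consumerId, Set.of());
            Map<String, Integer> coCounts = new HashMap<>();
            for (Map.Entry<String, Set<String>> e : purchases.entrySet()) {
                if (e.getKey().equals(consumerId) || !e.getValue().contains(purchasedSku)) continue;
                for (String other : e.getValue()) {
                    if (!other.equals(purchasedSku) && !alreadyOwned.contains(other)) {
                        coCounts.merge(other, 1, Integer::sum);
                    }
                }
            }
            return coCounts.entrySet().stream()
                    .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                    .limit(topN)
                    .map(Map.Entry::getKey)
                    .toList();
        }
    }

Note that the sketch never looks at product features or at the current context, which is precisely the weakness discussed next.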
Copyright 2005 by CRC Press LLC
C3812_C18.fm Page 6 Wednesday, August 4, 2004 8:30 AM
18-6
The Practical Handbook of Internet Computing
Given the general effects of the bursting Internet bubble and general reductions in IT spending over the last few years, personalization companies have also suffered. Even at the time of this writing, industry leaders such as NetPerceptions are planning to liquidate. However, Amazon.com continues to be a strong example of taking lessons learned and using personalization to great effect. For instance, the problem of persistently clustering a customer as a purchaser of children's books is handled by making the collaborative inference only within the context of the current book purchase. In-house experience and persistence in applying personalization have been successful here. Also note that Amazon.com has a general strategy toward personalization, including customization and even basic human factors. For instance, one-click purchasing is low tech but perhaps the most profound design for knowing the customer (in simple profiles) and increasing ease of use and purchase.

Collaborative filtering also suffered from low accuracy rates and thus low return on investment (ROI). Like marketing campaigns based on market segmentation (again, based on group statistics rather than on each individual), hit rates tend to be very low. The "learning" rate of the system is also very slow. For example, consider when a new and potentially "hot" book is first introduced to a catalogue. The collaborative recommendation system is unlikely to recommend it until a large enough group of consumers has already purchased it, because the inference is based on looking at past purchases of product SKUs. In contrast, feature-based inferencing is more direct and can begin recommending new products faster. If a consumer seems to like books with particular features (authors, subjects, etc.), then a new book with such desirable features should be immediately matched to that consumer, even before a single purchase.

However, whereas the business of personalization has suffered, understanding of the technical issues continues to grow and will serve to improve accuracy as personalization reemerges for desktop and Web applications. For instance, the argument between supporters of collaborative inferencing and supporters of individual modeling has been the central argument in personalization. One side argues for the power of collective knowledge; the other side argues for the power of focused knowledge. Recent research (Ariely, Lynch, and Aparicio, in press) found that both are right and both are wrong. Individual modeling is best when the retailer knows a lot about the user. Individual modeling allows the feature-based inferencing just mentioned, which is faster to recommend products and more likely to make accurate recommendations. Such modeling is possible with today's feature-rich catalogues. However, whenever the retailer is faced with a new consumer (or the consumer expresses interests that the retailer has not yet observed) and there is no individual model, collaborative filtering provides additional information to help.

Future personalization systems will combine all this industry experience. Just as the current war on spam is already leveraging experience with early e-mail and collaborative systems, stronger future requirements for personalization will also include the hybridization of individual and collaborative models, as well as the hybridization of rule-based and adaptive systems. Before the burst of the Internet bubble, the personalization industry was heading toward richer, feature-based, more dynamic modeling.
As such interests recover, these directions will reemerge. Beyond e-commerce, such demands for easier, faster, and more accurate product search will extend to general search. The major search engines are still clearly not ideal. As the Internet continues to grow, personalization will be an inevitable requirement for understanding the user's needs and wants.
18.2.3 Knowledge Management

The topic of knowledge management deserves a chapter of its own, and in fact many books have been written on this very broad topic. In regard to personalization, it should be noted that many of the personalization companies and technologies crossed into this application as well. The main justification is that user models of desires and preferences, as in consumer modeling, can be transferred to user models of situations and practices in expertise modeling.

All the same issues apply to knowledge management as were raised for other applications such as e-mail and e-commerce.
Rule-based systems and knowledge engineering are likely to be included in a hybrid system, but here too, knowledge engineering is a secondary task that is difficult and takes users away from their primary tasks. People are less inclined to read and write best practices than they are to simply practice. From a corporate perspective, primary tasks are more related to making money. As such, tacit or implicit knowledge management is preferred. Tacit knowledge can be captured by machine-learning techniques that, as in the case of personalization, can watch user situations, actions, and outcomes to build individual and collaborative models of user practice.

Beyond the recommendation of books and CDs, knowledge management applications tend to be more serious, in the sense that the costs and consequences of decisions extend beyond $10 to $20 consumer purchases. In particular, advanced techniques of personalization will be required for "heavy lifting" sorts of knowledge management and decision support, such as individualized medical practice. The importance of individual differences in pharmacology increasingly outweighs population-level statistics about a drug's safety and efficacy. Such individual differences can be fundamentally one-to-one between a patient and a drug, not predicted by group memberships based on race, age, geography, or behavior.

Workflow enactment is another important application area where personalization finds use. Workflows involve executing a set of tasks with a variety of control and data-flow dependencies among them. Importantly, some of the tasks are performed by humans and some by machines. Thus, workflows involve repeated human involvement. Often, the human is faced with multiple choices for each decision. Personalization of some of the decisions might be appropriate and would reduce the cognitive load on the human. Workflows can be enacted over business intranets or even over the public Internet, and their human-facing steps are often carried out through e-mail.

Workflow is a good example of an application area that underpins many heavy-lifting applications and that can be improved by individual user modeling and personalization. Whereas formal workflows tend to be rigid and developed through hard knowledge engineering of the "correct" procedures, real organizational systems also operate by informal, ad hoc workflows between people. These workflows are defined by user preferences and experiences. For instance, if a user routes an activity to a particular person with particular other orders, the user is likely to use the same routing for a future activity that is similar to the first. As the system builds an individual model for each user to learn various situations and actions, each model becomes a personal assistant for helping to manage the workflow inbox, much as for e-mail but within the context of more structured activities and processes.
18.2.4 Mobile Applications

The recent expansion of mobile applications has opened up another major arena for personalization. Mobile applications are like other networked applications in some respects. However, they are more constrained than applications that execute in wireline environments because they typically involve devices with smaller displays, limited input capabilities, reduced computing power, and low bandwidth. Further, these devices are typically employed when the user is on the run, where there is reduced opportunity for careful interaction. For these reasons, personalization is more important in mobile environments than in others.

Further, mobile devices tend to be more intensely personal by their very nature. For example, phones and PDAs are typically with their users at all times and are used for purposes such as communication and personal information management (e.g., calendar and address book), which inherently ties the devices to their users. Thus they provide a natural locus for personalization.

Mobile environments can and often do support an ability to estimate, in real time, the position of a mobile device. A device's position corresponds to its X-Y coordinates. Position must be mapped to a geographical location, which is a meaningful abstraction over position. That is, location corresponds to something that relates to a user's viewpoint, such as work or home, or maybe even the third aisle in the supermarket. Although the precision of position determination varies, the technology is becoming quite accurate and can support a variety of applications. This has led to an increasing interest in location-based services in wide-area settings. Currently proposed location-based services are limited and may well not become widely popular. Examples include pushing coupons to a user based on what stores are close to his or her current location. We believe that more conventional applications with a location component would perhaps be
more valuable. For example, a user's participation in a logistics workflow may be guided by his or her proximity to a site where a particular step of the workflow is being performed. Both kinds of applications involve adapting a system's actions to suit the (perceived) needs of a user and thus are a form of personalization. Further, any application-level heuristics would be applied in conjunction with user-specific preferences, e.g., whether a user wishes to have coupons pushed to him or her and whether he or she truly wishes to participate in the workflow when in a particular location.

A more general way to think of personalization in mobile environments is to consider two concepts that apply not merely to mobile environments but to all networked environments. Networked environments involve an ability to discern the presence of users. Presence deals with whether a given user is present on a given network. Network here could be a physical subnet or an application-level construct such as a chat room. Presence and location together lead to the richer concept of availability, which deals with more than just whether a user is in a certain place or on a certain network: it concerns whether the user is available for a certain task or activity. Ultimately, knowing a user's availability is what we care about. However, a user's availability for a task depends on whether the task is relevant for him or her right now. This cannot easily be captured through simplistic rules, but requires more extensive personalization. Indeed, personalization in many applications can be framed as inferring a user's availability for various tasks. For example, the decision whether to send a user a coupon or an alert can be based on an estimation of the user's availability for the task or tasks with which the given coupon or alert is associated.

Mobile computing is still in its early days of development, just as desktop and Web-based applications once were. Ideas for personalization are emerging from research just as they did in the early days of desktop and Web applications. For instance, the Remembrance Agent (MIT) is an intelligent agent that remembers what a mobile user does within the context of locale (and other aspects of the entire physical situation) [Rhodes and Maes, 2000]. If the user returns to some place and begins working on a document that was opened in that place in the past, the Remembrance Agent can recall other documents associated with the given document and place. Again, the basic vision is of a personal assistant that can watch and learn from the user, making relevant recommendations that are highly accurate (without annoying false alarms).

One early example of adaptivity for handheld devices is found in the handwriting-recognition approach invented by Jeff Hawkins for the Palm Pilot and Handspring. Rather than have the user learn how to conform completely to the letter templates, Graffiti can adapt to examples provided by the user. This particular example is on the fringe of personalization interests and is more akin to user-dependent input recognition (as is voice recognition), but it represents the philosophy and technology by which adaptive systems will be established one point at a time and then grow to be pervasive. Jeff Hawkins used an associative memory for such machine learning and suggests that, one day, more silicon will be devoted to associative memories than to any other purpose [Hawkins, 1999].
As with Microsoft’s interest in Bayesian learning as part of the operating system, personalization will find a fundamental place in all operating systems, including handheld and other new computing devices. For instance, the proliferation of television channels and content makes browsing of a viewer’s guide increasingly difficult. TV viewing is usually a matter of individual mood, interest, and preference, and personalization technology is already being introduced to set-top devices. Aside from CD purchasing, which was one of the original applications of collaborative filtering, music listening will also benefit from personalization technology to learn the moods and preferences of each individual.
18.3 Key Concepts

Having discussed the applications of personalization, we are now ready to take a closer look at the key concepts, especially as they are instantiated in the major technical approaches. Even as personalization has expanded its role into both desktop and Internet settings, and especially into applications such as e-mail that combine elements of the two, it has faced some challenges.

One set of challenges arises from the hype associated with some of the early work on personalization. Because people expected personalization to work nearly perfectly with little or no user input, they were
disappointed with the actual outcome. The trade press was harsh in its criticisms of personalization. The main problem identified was that personalization was too weak to be cost effective in practical settings where there are many users and the users are not technically savvy. In such cases, simplistic techniques could yield only limited personalization, which was not adequate for effectively guiding operations. The techniques had an extremely low learning rate and yielded high error rates. For example, in e-commerce settings, it was difficult to introduce new products because the approach would require a number of customers to have bought a new product before being able to incorporate it into a recommendation. The systems were not personal enough and yielded at best coarse-grained market segmentation, not mass customization or one-to-one marketing as promised.

The criticisms of the IT press are in many ways valid. There is clearly a need for richer, true individualization, which is being widely recognized in the industry. This realization has led to an increased focus on richer modeling for higher accuracy and resulting value. In the newer approaches, personalization applies to both individual taste and individual practice.

The evolution of personalization as a discipline can be understood in terms of a fundamental debate, which attracted a lot of interest in the 1990s. This debate is between two main doctrines about personalization: whether it should be based on individual models or on collaborative representations. The debate has petered out to some extent because of the recognition that some sort of hybrid or mixed approach is preferable in practice. However, the concepts involved remain essential even today.
18.3.1 Individual vs. Collaborative

Individual modeling involves creating a model for each user. Modulo the difficulties of constructing such models, individual models can be effective because they seek to capture exactly what a user needs. Collaborative modeling, of which the best-known variant is collaborative filtering, involves relating a user to other users and using the choices of those other users to filter the choices made available to the given user. In its canonical form, each of these approaches has the serious deficiency of narcissism.
• With individual modeling, it is difficult to introduce any novelty into the model. Specifically, unless the user has experienced some choices or features, it is not possible to predict what their relevance might be. In this sense, the risk of narcissism is obvious.
• With collaborative modeling too, it is difficult to introduce novelty, although the scope is expanded in that exploration by one or a few users can, in principle, help other users. The risk of narcissism still remains at a system level because, if the users chosen to collaborate with the given user have not yet experienced some choices or features, those choices or features will not be recommended to the given user.
The above debate has a simple resolution: hybridization of the individual and collaborative approaches proves more effective in practice. In a setting such as collaborative filtering, where a recommendation is based on an aggregation by the system of the choices made by relevant users, hybridization can be understood in terms of the following conceptual equation:

Relevance = (user_experience * user_relevance) + ((1 - user_experience) * collaborative_relevance)

Here user_experience corresponds to the relative weight assigned to the individual aspect as opposed to the collaborative aspect. The equation could be applied to make overall judgments about relevance or at the level of specific attributes or features.
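As a minimal sketch of this conceptual equation (the function and variable names simply mirror the equation; the weighting scheme is illustrative, not a prescribed implementation):

def hybrid_relevance(user_experience: float,
                     user_relevance: float,
                     collaborative_relevance: float) -> float:
    """Blend individual and collaborative relevance estimates.

    user_experience, in [0, 1], reflects how much the user's own history
    covers this item or feature; it weights the individual estimate.
    """
    assert 0.0 <= user_experience <= 1.0
    return (user_experience * user_relevance
            + (1.0 - user_experience) * collaborative_relevance)

# A user with little direct experience (0.2) leans mostly on the collaborative score.
print(hybrid_relevance(0.2, user_relevance=0.9, collaborative_relevance=0.4))  # 0.5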
Hybridization does not necessarily have to apply in an aggregated manner. An alternative family of approaches would reveal to the user the identities of the other users and their choices, and let the given user make an informed choice.
18.3.2 Representation and Reasoning

Early collaborative filtering inference was too weak to be cost effective. A general weakness of collaborative approaches is that they are not individualized; they essentially stereotype users. If a user is stereotyped
incorrectly, then suboptimal recommendations result. An example of such an error arises in the approach known as k-means clustering. This approach involves constructing clusters of users based on an appropriate notion of proximity. A user is then stereotyped according to the cluster in which the user falls; that is, the user is treated as if he or she were the centroid of the cluster to which he or she is closest. Thus a user who is an outlier in the assigned cluster is likely to be misunderstood. By contrast, the k-nearest-neighbors approach constructs a cluster for the given user in which the user is close to being the centroid. The recommendation is then based on the choices of the user's nearest neighbors. Figure 18.1 illustrates this point schematically.

Some basic concepts of standard machine learning are worth reviewing. Machine-learning approaches can be supervised or unsupervised, and online or offline. Supervised approaches require inputs from users in order to distinguish between good and bad cases, the user explicitly or implicitly marking the relevant and irrelevant cases. Unsupervised approaches seek to learn a user's preferences without explicit inputs from the user, based instead on observed actions or other coincidences that are somehow reinforced or self-organized. In our setting, such reinforcement may be inferred from heuristics such as that a user who visits a Web page repeatedly likes the content presented therein. An example of self-organizing coincidences would be learning to associate all the items of a shopping cart; given another shopping cart, recalling such associations to additional items provides a method of cross-selling.

The evaluation of machine-learning approaches is based on estimations of true vs. false positives and negatives, usually called "hits," "misses," "false alarms," and "missed opportunities." Sometimes the utility of a correct decision and the cost of an error must be balanced to settle on an optimal risk-reward payoff. For example, if the potential payoff from an unexamined good alternative is high, and the cost of examining an undesirable alternative is low, it might be safer to err on the side of making additional suggestions.

Finally, personalization can be applied in two main kinds of business settings: back room and front room. Back-room settings are those where the data pertaining to personalization are gathered up and processed offline, for example, to plan marketing campaigns and to understand whether a Website ought to be reorganized. By contrast, front-room settings seek to apply the results of personalization in a customer-facing activity. Examples include carrying on dialogs with users, as for customer relationship management (CRM), and making real-time recommendations for products. These call for rapid adaptation to adjust quickly to what the user needs. On the other hand, back-room applications, such as sales campaigns, can be slower and more static.
FIGURE 18.1 The clusters approach distributes the data points (shown by small stars) among a number of clusters. The given user (shown by a large star) falls into one of the clusters and is treated like the center of that cluster. The nearest-neighbors approach associates the given user with other users that are the closest to it. In effect, the given user is the center of its own cluster. The nearest-neighbors approach can thus make more relevant predictions.
Here, rule-based systems are appropriate to implement sales policies and explicit sales promotions. Online machine-learning approaches function incrementally as the data come in and are appropriate for front-room settings, whereas offline approaches operate on data gathered up over several interactions, perhaps across a large data warehouse, and are appropriate for back-room settings.
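To make the nearest-neighbors idea of Figure 18.1 concrete, the following is a minimal sketch of a user-based nearest-neighbors recommender; the cosine similarity measure and the ratings layout are illustrative assumptions rather than any particular product's algorithm:

from math import sqrt

def cosine_sim(a: dict, b: dict) -> float:
    """Cosine similarity between two users' {item: rating} dictionaries."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(target: str, ratings: dict, k: int = 3, top_n: int = 5) -> list:
    """Score items unseen by the target using the target's k nearest neighbors."""
    others = [u for u in ratings if u != target]
    neighbors = sorted(others,
                       key=lambda u: cosine_sim(ratings[target], ratings[u]),
                       reverse=True)[:k]
    seen = set(ratings[target])
    scores = {}
    for u in neighbors:
        w = cosine_sim(ratings[target], ratings[u])
        for item, r in ratings[u].items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + w * r
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

In effect, the target user sits at the center of a neighborhood formed just for him or her, which is exactly the contrast with the fixed clusters of k-means that Figure 18.1 depicts.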
18.4 Discussion

Aside from the past ups and downs of the Internet revolution, personalization remains a fundamental requirement, and the field will continue to gather lessons learned for building effective total solutions. So what is one to do now? This section reviews past experience, discusses how to integrate and measure personalization, and notes what is yet to be discovered. Difficulties remain for search, both in e-commerce and across the Internet in general. Search engines seem to progress in waves, from one sort of technology to another, and personalization will eventually be required for more reasonable accuracy.
18.4.1 Advice

The following pieces of advice include some practical matters of interest to customers of personalization as well as industry consultants' beliefs about personalization vendors.

18.4.1.1 Hybridization

First, consider all the methods that have been described. As evidenced by the breadth of topics in one of the last major reviews [Riecken, 2000], personalization means different things to different people. This is true because personalization solutions need to address issues of human factors as well as underlying technology. Think low tech as well as high tech. Think of customization as well as personalization. For instance, Amazon.com includes "one-click" ordering as a simple, low-tech but powerful method alongside its higher-tech collaborative filtering methods. As described above, also consider the hybridization of individual and collective approaches. The question is no longer which is better, but which is better when, and how the combination can provide the best accuracy overall. Also consider the use of rule-based systems, but not for end-user authoring. Whereas adaptive technologies are preferred for automatically tuning in to customer behaviors, rules still have a strong place if centrally administered, especially for back-end functions such as sales policy and campaign management.

18.4.1.2 Richer Detail

Just as SKU-based online catalogues matured into feature-based descriptions, which allowed better forms of personalization, expect increasing richness in data-source descriptions and in the models themselves. On the data side, industry-standard schemas will help better identify the meaning of items and features. For instance, taxonomies will help organize product categories, and more complete ontologies (more than just taxonomies) will help describe the structural meaning of items and features. As these improved descriptions are fed into personalization systems, more intelligent algorithms can be included, such as analogical reasoning that partially transfers what is learned from trucks to cars and vice versa.

On the model side, algorithms must become more dynamic and internally complex. Many current implementations suffer from their linearity; in other words, many approaches keep only independent accounts of each item or feature. Such linear modeling is inexpensive, and naïve Bayesian modeling, for instance, shows some very good results. However, as decisions become more complex and better accuracy is required, individual models must be based on nonlinear techniques. In other words, the models must assume context dependency, in that features can depend on one another; the desirability of one feature is often largely dependent on the other features. In heavy-lifting decision support,
the interaction of elements within a given situation will become critical in representation and modeling. Of course, nonlinear modeling is more difficult and computationally expensive. For one thing, many nonlinear methods, such as most neural networks, are highly parametric (they require much back-room knob-tuning) and are trained in batch mode (they cannot be dynamically updated case by case on the fly). In contrast, a class of modeling called "lazy learning" includes case-based, memory-based, and other similar techniques that could provide nonparametric, incremental, nonlinear modeling as required. Newer algorithms must emerge to handle vast numbers of individual models, while also becoming internally richer and more powerful in prediction.

18.4.1.3 Grounding

Without the intelligence and common sense of real salespeople and experts to assist the user, the grounding problem (recording situations and actions and ensuring what they mean) has been difficult. For instance, the apparent time a user spends on a page can be totally misleading if the user walks away for a meeting or just to get coffee. This will continue to be a matter of research for online question-and-answer systems, but practical progress is also possible. New ideas in information retrieval are presented next under Metrics, but before situations and actions can be measured, they must at least be available. In the design of the user interface, it is critically important to gather and display the context of the user. What is the task? What is already included, whether it be a shopping basket or a research paper? What did the user do? What did the user really intend to do? How can we measure the final outcome? For instance, is the user browsing or intending to buy? But even a purchase is not the final outcome. A fully integrated system must also know whether an item is returned or not, to ground a better sense of customer satisfaction, not merely the first act of purchasing.
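Returning to the lazy-learning style advocated under Richer Detail, the following minimal sketch is nonparametric and incremental: learning is just storing a case, prediction is a distance-weighted combination of stored cases, and feature interactions are captured implicitly because whole cases are compared. The representation and distance function are assumptions made for the sake of the example:

def predict(cases: list, query: tuple, k: int = 5) -> float:
    """Distance-weighted prediction from stored (feature_vector, outcome) cases."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(cases, key=lambda c: dist(c[0], query))[:k]
    weights = [1.0 / (1.0 + dist(f, query)) for f, _ in nearest]
    return sum(w * o for w, (_, o) in zip(weights, nearest)) / sum(weights)

def learn(cases: list, features: tuple, outcome: float) -> None:
    """Incremental update: simply remember the new case (no batch retraining)."""
    cases.append((features, outcome))

cases = []
learn(cases, (1.0, 0.0), 0.9)   # e.g., (feature1, feature2) -> observed preference
learn(cases, (0.0, 1.0), 0.2)
print(predict(cases, (0.9, 0.1)))  # pulled toward 0.9, the outcome of the nearer case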
18.4.2 Metrics

Personalization is a specialized enhancement of information retrieval. In general, a user submits a query and the system returns a set of answers. Depending on the task, these answers might be documents, products, or both. Therefore, information retrieval (IR) measures are often quoted as measures for personalization. Precision and recall are the classic examples. Precision measures how well the system recommends "hits" rather than "false alarms." Recall, as its complement, measures how well the system recommends all of the hits. The relevance score of a system is often measured by marking the "right" answers in the catalog that the system should recommend for any given query, then reporting the precision and recall for the query.

However, recent IR research is moving to better definitions of relevance. For one, relevance is understood as more personal. Rather than assuming some set of right answers, a philosophy of personalized relevance is coming to dominate. What the user intends by even a single query word often differs from person to person. Therefore, the precision and recall of a recommendation system can be measured only against each person's own measure of success.

IR is also moving toward notions of effective relevance. This again is part of the grounding problem, but we can measure the effectiveness of an answer in several ways. First, if a user thinks it is worth the effort to at least explore the recommendation, we can consider this some measure of potential relevance. IR is also moving to allow partial relevance. Rather than marking an item as a hit or a miss, as being absolutely relevant or irrelevant, many cases of browsing and querying are more complex and fuzzy, and therefore recommendations can only be more or less relevant. To the degree that the user makes a cost-benefit tradeoff in deciding to investigate, we can attempt to measure some degree of partial relevance.

Of course, business ROI must be grounded in business revenue as much as possible. Therefore, we can consider the purchase action as marking more effective relevance of the item; as mentioned, nonreturn of an item marks even greater effectiveness. However, even this should not be the ultimate measure.
Instead, customer satisfaction should be measured by repeat visits by the customer, which is typically the real bottom line for customer relationship management.
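As a small, self-contained illustration of the classic measures (using an assumed set of marked "right" answers, as described above), precision and recall for a single query can be computed as follows:

def precision_recall(recommended: set, relevant: set) -> tuple:
    """Precision: fraction of recommendations that are hits.
    Recall: fraction of all relevant items that were recommended."""
    if not recommended or not relevant:
        return 0.0, 0.0
    hits = recommended & relevant
    return len(hits) / len(recommended), len(hits) / len(relevant)

# Three recommendations, two of them among four relevant items overall.
print(precision_recall({"a", "b", "c"}, {"a", "c", "d", "e"}))  # (0.666..., 0.5)

The personalized and partial notions of relevance discussed above amount to replacing the fixed relevant set with one defined per user, or with graded relevance values.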
18.4.3 Futures

Effective personalization systems can be deployed and measured as described, but several remaining issues should also be kept in mind. These topics are a matter of research, but near-term awareness and action are also advised.

18.4.3.1 Novelty Injection

Repeat visits by the customer should be the ultimate measure of personalized recommendation systems, but ongoing research indicates that more advanced technologies are required to achieve it. There are two problems to overcome. The first stems from the assumptions of traditional information retrieval: that precision and recall for a given query can be measured by return of the right answers, which are absolutely relevant, while avoiding return of false positives that are absolutely not relevant. The underlying assumption is also that there is only one query, one shot to get it right or wrong. The second problem stems from the nature of personalization technology itself. All adaptive recommendation systems tend to be narcissistic. Whether individual or collaborative, recommendation systems observe user behaviors and make new recommendations based on such past experience. The danger is that the recommendation system focuses the user on only such past behaviors, which creates myopic users and limited recommenders.

Instead, personalization technologies must assume partial relevance and the need to explore more uncertain items to determine whether they are relevant. Personalization technologies must also explicitly inject novelty and variety into recommendation sets. For e-commerce systems, this is the definition of real choice: if a recommendation system suggests all very relevant but similar items, then it is not really giving the user a choice. In decision support, such novelty and variety are a matter of thinking out of the box, considering alternative hypotheses, and learning about new sources.

Indeed, the injection of novelty is suited to building a long-term customer relationship (Dan Ariely and Paulo Oliveira), but there is a delicate tradeoff. Early sessions with the customer need to optimize relevance for each single query in order to build trust and to optimize purchase "hits" for the single occasion. However, long-term relationships are built by sacrificing some pure relevance in favor of novelty. By blending items most likely to be purchased based on past experience with items that are partially relevant but somewhat unfamiliar, the user broadens his or her knowledge of the product catalog while the system broadens its knowledge of the user. Technically, this requires user models that can report levels of user experience, expressed as the novelty of a given product, and that can ensure variety, the dissimilarity among a set of relevant suggestions.
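A minimal sketch of such novelty-and-variety injection is a re-ranking step that blends predicted relevance with unfamiliarity and then enforces dissimilarity among the chosen items; the weights, threshold, and function names below are illustrative assumptions, not a published algorithm:

def blend_score(relevance: float, familiarity: float, novelty_weight: float = 0.3) -> float:
    """Trade predicted relevance off against novelty (1 - familiarity)."""
    return (1.0 - novelty_weight) * relevance + novelty_weight * (1.0 - familiarity)

def diversify(scored_items: list, similarity, top_n: int = 5, max_sim: float = 0.8) -> list:
    """Greedily pick items, skipping any too similar to those already chosen.

    scored_items: list of (item, blended_score) pairs.
    similarity: function(item_a, item_b) -> value in [0, 1].
    """
    chosen = []
    for item, _ in sorted(scored_items, key=lambda pair: pair[1], reverse=True):
        if all(similarity(item, c) < max_sim for c in chosen):
            chosen.append(item)
        if len(chosen) == top_n:
            break
    return chosen

Raising novelty_weight over the life of the relationship is one way to express the tradeoff described above: optimize pure relevance early to build trust, then gradually favor exploration.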
18.4.3.2 Context Switching

The role of context is a growing issue within artificial intelligence. Even the most principled rules are not thought of as true or false, but as true or false under a set of other specific conditions (other rules). Machine learning is also grappling with contextual issues of different perspectives and situational dependencies. Personalization can be seen as resonant with such thinking: rather than assuming absolute right answers, individual modeling aims to capture the perspectives of different users as at least one dimension of context. Yet the representation, capture, and use of context will remain a difficult set of issues.

As mentioned above, one problem is how to extract and hold context in order to understand the intentions and current situation of the user. For instance, when searching a topic for a research paper, it would be helpful for the system to also "look" at any already selected references and perhaps even the status of the paper as it is being written. For applications such as e-commerce and entertainment, it would be helpful to understand the location and mood of the recipients before making recommendations. Carrying on a dialog with the user without being annoying and intrusive is a major user-interface and application-design problem.
Context switching is a deeper and harder problem. Situations change. Consumer tastes change. The fundamental problem of context for adaptive systems is this: when is a model wrong, and when is it merely inappropriate? Ideally, a truly intelligent salesperson remembers the customer's habits for appropriate repetition, injects reasonable variety into recommendations, and moves quickly to learn new tastes as the customer changes. However, even when tastes change, the old knowledge is not exactly wrong, nor is it no longer needed: when the consumer again changes taste, perhaps back to the first model, the salesperson should easily remember (not relearn) the earlier habits of such a well-known customer. Knowing when to switch and how to switch across radical changes, whether in e-commerce customers or decision-support situations, remains a challenge. There is currently no elegant technical answer. Some absolute context switching, such as a model for "on diet" and a separate model for "off diet," can be explicitly made and controlled. However, human brains do this more elegantly and with ease, and we should expect intelligent personalization agents of the future to be richly detailed and dynamic enough to handle even this extreme. Neuroscience and psychology have long studied how associative memories are formed, inhibited (but not forgotten), and then re-recalled. Some computational modeling of these phenomena has been advanced [Aparicio and Strong, 1992] but not yet fully researched and applied. Once fully developed, such advances will provide long-term knowledge of and responsiveness to each customer, which is the holy grail of personalization.
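One simple way to realize the explicit, controlled form of context switching described above (for example, separate "on diet" and "off diet" models) is to keep one model per named context and route observations and predictions through the active one, so that dormant models are inhibited rather than forgotten. The structure below is a hedged sketch under those assumptions, not a recipe from the chapter:

class ContextSwitchingProfile:
    """One preference model per context; switching inhibits, but does not erase."""

    def __init__(self):
        self.models = {}     # context name -> {item: preference score}
        self.active = None

    def switch(self, context: str) -> None:
        """Activate a context, creating an empty model on first use."""
        self.models.setdefault(context, {})
        self.active = context

    def observe(self, item: str, score: float) -> None:
        """Update only the active model; dormant models are retained untouched."""
        model = self.models[self.active]
        model[item] = 0.8 * model.get(item, score) + 0.2 * score

    def preference(self, item: str) -> float:
        return self.models[self.active].get(item, 0.0)

profile = ContextSwitchingProfile()
profile.switch("on diet")
profile.observe("salad", 0.9)
profile.switch("off diet")
profile.observe("pizza", 0.8)
profile.switch("on diet")           # earlier habits are remembered, not relearned
print(profile.preference("salad"))  # 0.9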
References

Aparicio, Manuel, Donald Gilbert, B. Atkinson, S. Brady, and D. Osisek, The role of intelligent agents in the information infrastructure. 1st International Conference on the Practical Application of Intelligent Agents and Multi-Agent Technology (PAAM), London, 1995.
Aparicio, Manuel and P.N. Strong, Propagation controls for true Pavlovian conditioning. In Motivation, Emotion, and Goal Direction in Neural Networks, D.S. Levine and S.J. Leven (Eds.), Lawrence Erlbaum, Hillsdale, NJ, 1992.
Ariely, Dan, John G. Lynch Jr., and Manuel Aparicio, Learning by collaborative and individual-based recommendation agents. Journal of Consumer Psychology. In press.
Breese, John S., David Heckerman, and Carl Kadie, Empirical analysis of predictive algorithms for collaborative filtering. Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, 1998, pp. 43–52.
Good, Nathaniel, J. Ben Schafer, Joseph A. Konstan, Al Borchers, Badrul Sarwar, Jon Herlocker, and John Riedl, Combining collaborative filtering with personal agents for better recommendations. Proceedings of the National Conference on Artificial Intelligence, 1999, pp. 439–446.
Hawkins, Jeff, "That's not how my brain works — Q&A with Charles C. Mann." MIT Technology Review, July/August 1999, pp. 76–79.
Horvitz, Eric, Principles of mixed-initiative user interfaces. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), ACM Press, New York, 1999, pp. 159–166.
Resnick, Paul, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, and John Riedl, GroupLens: an open architecture for collaborative filtering of netnews. Proceedings of the ACM Conference on Computer Supported Cooperative Work, ACM Press, New York, 1994, pp. 175–186.
Rhodes, Bradley J. and Pattie Maes, Just-in-time information retrieval agents. IBM Systems Journal (special issue on the MIT Media Laboratory), Vol. 39, Nos. 3 and 4, pp. 685–704, 2000.
Riecken, D. (Guest Ed.), Special issue on personalization. Communications of the ACM, 43(9), September 2000.
Sarwar, Badrul M., George Karypis, Joseph A. Konstan, and John Riedl, Analysis of recommendation algorithms for e-commerce. ACM Conference on Electronic Commerce, 2000, pp. 158–167.
Shardanand, Upendra and Pattie Maes, Social information filtering: algorithms for automating "word of mouth." Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), Vol. 1, ACM Press, 1995, pp. 210–217.
19 Online Marketplaces

Michael P. Wellman

CONTENTS
19.1 What Is an Online Marketplace?
19.2 Market Services
  19.2.1 Discovery Services
  19.2.2 Transaction Services
19.3 Auctions
  19.3.1 Auction Types
  19.3.2 Auction Configuration and Market Design
  19.3.3 Complex Auctions
19.4 Establishing a Marketplace
  19.4.1 Technical Issues
  19.4.2 Achieving Critical Mass
19.5 The Future of Online Marketplaces
References
Even before the advent of the World Wide Web, it was widely recognized that emerging global communication networks offered the potential to revolutionize trading and commerce [Schmid, 1993]. The Web explosion of the late 1990s was thus accompanied immediately by a frenzy of effort attempting to translate existing markets and introduce new ones to the Internet medium. Although many of these early marketplaces did not survive, quite a few important ones did, and there are many examples where the Internet has enabled fundamental change in the conduct of trade. Although we are still in early days, automating commerce via online markets has in many sectors already led to dramatic efficiency gains through reduction of transaction costs, improved matching of buyers and sellers, and broadening the scope of trading relationships. Of course, we could not hope to cover in this space the full range of interesting ways in which the Internet contributes to the automation of market activities. Instead, this chapter addresses a particular slice of electronic commerce in which the Internet provides a new medium for marketplaces. Since the population of online marketplaces is in great flux, we focus on general concepts and organizing principles, illustrated by a few examples rather than attempting an exhaustive survey.
19.1 What Is an Online Marketplace? Marketplace is not a technical term, so unfortunately, there exists no precise and well-established definition clearly distinguising what is and is not an online marketplace. However, we can attempt to delimit its meaning with respect to this chapter. To begin, what do we mean by a “market”? This term, too, lacks a technical definition, but for present purposes, we consider a market to be an interaction mechanism where the participants establish deals (trades) to exchange goods and services for monetary payments (i.e., quantities of standard currency). Scoping the “place” in “marketplace” can be difficult, especially given the online context. Some would say that the Web itself is a marketplace (or many marketplaces), as it provides a medium for buyers and
sellers to find each other and transact in a variety of ways and circumstances. However, for this chapter we adopt a narrower conception, limiting attention to sites and services attempting to provide a well-scoped environment for a particular class of (potential) exchanges.

Many preexisting marketplaces are now online simply because the Internet has provided an additional interface to existing protocols. For example, online brokerages have enabled any trader to route orders (with some indirection) to financial exchanges and electronic crossing networks (e.g., Island or REDIBook). Although such examples certainly qualify as online marketplaces, the plethora of different interfaces, and the usually nontransparent indirections, make them less pure instances of online marketplaces. For high-liquidity marketplaces like equity exchanges, these impurities may not substantially impede vibrant trade. For newer and more completely online marketplaces, directness and transparency are hallmarks of the value they provide in facilitating exchange.

Perhaps the most well-known and popularly used online marketplace is eBay [Cohen, 2002], an auction site with over 12 million items (in hundreds of categories and subcategories) available for bid every day. The canonical "person-to-person" marketplace, eBay has upwards of 69 million registered users.1 Whereas many eBay sellers (and some buyers) earn their livelihood trading on the site (which is why the "consumer-to-consumer" label would be inaccurate), participation requires only a lightweight registration process, and most aspects of the transaction (e.g., shipping, payment) are the ultimate responsibility of the respective parties to arrange. Note the contrast with the brokered trading model employed in financial markets, where securities are generally exchanged between broker-dealers on behalf of clients.

Many online marketplaces define commerce domains specific to an industry or trading group. One of the most prominent of these is Covisint, formed in 2000 by a consortium of major automobile manufacturers (Ford, General Motors, DaimlerChrysler, and Renault-Nissan, later joined by Peugeot Citroen) to coordinate trading processes with a large universe of suppliers.2 Covisint provides electronic catalog tools, operates online procurement auctions, and supports a variety of document management and information services for its trading community. Although many of the online marketplaces launched by industry consortia in the late 1990s have since failed, as of 2002 there were still dozens of such exchanges, with projections for renewed (albeit slower) growth [Woods, 2002]. Similarly, the number of person-to-person auction sites had reached into the hundreds during the speculative Internet boom. Clearly, eBay dominates the field, but many niche auctions remain as well.

The examples of person-to-person auctions (eBay), industry-specific supplier networks (Covisint), and online brokerages illustrate the diversity of online marketplaces that have emerged on the Internet over the past decade. Another category of major new markets comprises the exchanges in electric power and other commodities corresponding to recently (partially) deregulated industries. Many of these are hidden from view, running over private (or virtually private) networks, but these, too, constitute online marketplaces, and they play an increasingly significant role in the overall economy.
19.2 Market Services

What does a marketplace do? In order to facilitate conduct of trade, a marketplace may support any or all phases in the lifecycle of a transaction. It can be useful to organize commerce activities into three stages, representing the fundamental steps that parties must go through in order to conduct a transaction:
1. The Connection: searching for and discovering the opportunity to engage in a commercial interaction
2. The Deal: negotiating and agreeing to terms
3. The Exchange: executing a transaction
1. Source: http://pages.ebay.com/community/aboutebay and internetnews.com, May 2003.
2. 76,000 members as of January 2003. Source: http://www.covisint.com/about/history.
FIGURE 19.1 The fundamental steps of a commerce interaction: discover (the connection), negotiate (the deal), and execute (the exchange).
These steps are illustrated in Figure 19.1. Of course, the boundaries between steps are not sharp, and these activities may be repeated, partially completed, retracted, or interleaved along the way to a complete commercial transaction. Nevertheless, keeping the three steps in mind is useful as a way to categorize particular marketplace services, which tend to focus on one stage or another. In this chapter, we concentrate on the negotiation phase, not because it is necessarily the most important, but because it often represents the core functionality of an online marketplace. Discovery and exchange are relatively open-ended problems, with services often provided by third parties outside the scope of a particular marketplace, as well as within the marketplace itself. Moreover, several aspects of these services are covered by other chapters of this handbook. Still, a brief overview of some discovery and transaction facilities helps illustrate some of the opportunities provided by the online medium, as well as the requirements for operating a successful marketplace.
19.2.1 Discovery Services

At a bare minimum, marketplaces must support discovery to the extent of enabling users to navigate the opportunities available at a site. More powerful discovery services might include electronic catalogs, keyword-based or hierarchical search facilities, and so forth. The World Wide Web has precipitated a resurgence in the application of information retrieval techniques [Belew, 2000], especially those based on keyword queries over large textual corpora. Going beyond generic search, a plethora of standards have been proposed for describing and accessing goods and services across organizations (UDDI [Ariba Inc. et al., 2000], SOAP, and a variety of XML extensions), all of which support discovering connections between parties to a potential deal. For the most part these are designed to support search using standard query-processing techniques. Some recent proposals have suggested using semantic Web [Berners-Lee et al., 2001] techniques to provide matchmaking services based on inference over richer representations of the goods and services offered and demanded [Di Noia et al., 2003; Li and Horrocks, 2003].

The task of discovering commerce opportunities has also inspired several innovative approaches that go beyond matching of descriptions to gather and disseminate information relevant to comparing and evaluating those opportunities. Here we merely enumerate some of the important service categories:
• Recommendation [Resnick and Varian, 1997; Schafer et al., 2001]. Automatic recommender systems suggest commerce opportunities (typically products and services to consumers) based on prior user actions and a model of user preferences. Often this model is derived from cross-similarities among activity profiles across a collection of users, in which case it is termed collaborative filtering [Riedl and Konstan, 2002]. A familiar example of collaborative filtering is Amazon.com's "customers who bought" feature.
• Reputation. When unfamiliar parties consider a transaction with each other, third-party information bearing on their reliability can be instrumental in establishing sufficient trust to proceed. In particular, for person-to-person marketplaces, the majority of exchanges represent one-time interactions between a particular buyer and seller. Reputation systems [Dellarocas, 2003; Resnick et al., 2002] fill this need by aggregating and disseminating subjective reports on transaction results across a trading community. One of the most prominent examples of a reputation system is eBay's "Feedback Forum" [Cohen, 2002; Resnick and Zeckhauser, 2002], which some credit significantly for eBay's ability to achieve a critical-mass network of traders.
• Comparison shopping. The ability to obtain deal information from a particular marketplace suggests an opportunity to collect and compare offerings across multiple marketplaces. The emergence on the Web of price comparison services followed soon on the heels of the proliferation of searchable retail Web sites. One early example was BargainFinder [Krulwich, 1996], which compared prices for music CDs available across nine retail Web sites. The University of Washington ShopBot [Doorenbos et al., 1997] demonstrated the ability to automatically learn how to search various sites, exploiting known information about products and the regularity of retail site organization. Techniques for rapidly adding sites and product information have continued to improve, and are employed in the many comparison-shopping services active on the Web today.
• Auction aggregation. The usefulness of comparison shopping for fixed-price offerings suggested that similar techniques might be applicable to auction sites. Such information services might be even more valuable in a dynamically priced setting, as there is typically greater inherent uncertainty about the prevailing terms. The problem is also more challenging, however, as auction listings are often idiosyncratic, making it difficult to recognize all correspondences. Nevertheless, several auction aggregation services (BidFind, AuctionRover, and others) were launched in the late 1990s. Concentration in the online auction industry, combined with the difficulty of delivering reliable information, has limited the usefulness of such services, however, and relatively few are operating today.
19.2.2 Transaction Services

Once a deal is negotiated, it remains for the parties to execute the agreed-upon exchange. Many online marketplaces support transaction services to some extent, recognizing that integrating "back-end" functions, such as logistics, fulfillment, and settlement, can reduce overall transaction costs and enhance the overall value of a marketplace [Woods, 2002]. A critical component of market-based exchange, of course, is payment, the actual transfer of money as part of an overall transaction. The online medium enables the automation of payment in new ways, and, indeed, the 1990s saw the introduction of many novel electronic payment mechanisms [O'Mahony et al., 1997] offering a variety of interesting features [MacKie-Mason and White, 1997], including many not available in conventional financial clearing systems. For example, some of the schemes supported anonymity [Chaum, 1992], micropayments [Manasse, 1995], or atomic exchange of digital goods with payment [Sirbu and Tygar, 1995].

As it turned out, none of the innovative electronic payment mechanisms really caught on. There are several plausible explanations [Crocker, 1999], including the inconvenience of special-purpose software, network effects (i.e., the need to achieve a critical mass of buyers and sellers), the rise of advertising-supported Internet content, and decreases in credit-card processing fees. Nevertheless, some new payment services have proved complementary with marketplace functions and have thrived. The most well-known example is PayPal, which became extremely popular among buyers and sellers in person-to-person auctions, who benefited greatly from simple third-party payment services. PayPal's rapid ascension was in large part due to an effective "viral marketing" launch strategy, in which one could send money to any individual, who would then be enticed to open an account.
19.3 Auctions

Until a few years ago, if one said the word "auction," most hearers would conjure up images of hushed rooms with well-dressed art buyers bidding silently while a distinguished-looking individual leads the proceeding from a podium with a gavel. Or they might have envisioned a more rowdy crowd watching livestock while yelling out bids to a slick auctioneer speaking with unintelligible rapidity. Another common picture may have been the auctioneer at the fishing dock lowering the price until somebody agrees to haul away that day's catch. Today, one is just as likely to suggest a vision (based on direct experience) of an auction happening online. Such is the extent to which online auctions have emerged as a familiar mode of commercial interaction.
Speculation abounds regarding the source of the popularity of online auctions. For some, it is a marketing gimmick: enticing customers by making a game of the buying process. Indeed, participating in auctions can be fun, and this factor undoubtedly plays a significant role. More fundamentally, however, auctions support dynamic formation of prices, thereby enabling exchanges in situations where a fixed price, unless it happened to be set exactly right, would not support as many deals. Dynamic market pricing can improve the quality of trades to the extent there is significant value uncertainty, as with sparsely traded goods, high demand variability, or rapid product obsolescence. Distribution of information is, of course, the rationale for auctions in offline contexts as well.

The online environment is particularly conducive to auctions, due to at least two important properties of the electronic medium. First, the network supports inexpensive, wide-area, dynamic communication. Although the primitive communication protocol is point-to-point, a mediating server (i.e., the auction) can easily manage a protocol involving thousands of participants. Moreover, the information revelation process can be carefully controlled: unlike the human auctioneer orchestrating a room of shouting traders, a network auction mediator can dictate exactly which participants receive which information and when, according to the auction rules. Second, to the extent that auction-mediated negotiation is tedious, it can be automated. Not only the auctioneer, but also the participating traders, may be represented by computational processes. For example, many sellers employ listing software tools to post large collections of goods for sale over time. To date, trading automation appears to be only minimally exploited by buyers in popular Internet auctions, for example via "sniping" services that submit bids automatically at designated times, thus freeing the bidder from the necessity of manual monitoring.
19.3.1 Auction Types

Despite the variety in imagery of the auction scenarios above, most people would recognize all of them as auctions, with items for sale, competing buyers, and a progression of tentative prices, or bids, until the final price, or clearing price, is reached. How the initial price is chosen, whether the tentative prices are announced by the auctioneer or the traders (i.e., bidders) themselves, and even whether the prices go up or down toward the result are defining details of the particular type of auction being executed. Although the specific rules may differ, what makes all of these auctions is that they are organized according to well-defined rules, and at the end of the process these rules dictate what deal, if any, is struck as a consequence of the bidding activity of the auction participants.

Many obvious variants on the above scenarios clearly qualify as auctions as well. For example, there might be several items for sale instead of one, or the bidders might compete to sell, rather than buy, the good or goods in question. Once we consider how auction rules can vary, we see that auctions naturally group themselves into types, where auctions of a given type share some distinctive feature. For example, the scenarios described above are all instances of open outcry auctions, which share the property that all status information (e.g., the tentative prices) is conveyed immediately and globally to all participants. Another form of open outcry auction is the familiar "trading pit" of a commodities or securities exchange. Although this might not always be viewed as an auction in common parlance, it shares with the examples above some essential features. Even the seemingly chaotic trading pit operates according to rules governing who is allowed to shout what and when, and what the shouts entail in terms of offers of exchange. The most immediately distinguishing feature of the trading pit is that it is two-sided: both buyers and sellers play the role of bidders in this protocol. In contrast, the art, livestock, and fish auctions alluded to above are one-sided: a single seller offers an item to multiple bidding buyers. The inverse one-sided auction, where a single buyer receives bids from multiple competing sellers, is sometimes called a reverse auction and is often employed by businesses in procurement, where it may be called a request for quotations (RFQ).

Unlike open outcry events, in sealed-bid auctions the participants do not learn the status of the auction until the end, if then, or until some other explicit action by the auctioneer.
Familiar examples of sealed-bid auctions include government sales of leases for offshore oil drilling and the procedures by which real-estate developers let construction contracts. Note that the latter is an example of a procurement auction, as it is the sellers of construction services who do the bidding. Sealed-bid auctions may be one-shot, or may involve complex iterations over multiple rounds, as in the prominent U.S. FCC spectrum auctions held in recent years [McAfee and McMillan, 1996]. Like open outcry auctions, sealed-bid auctions may also come in one-sided or two-sided varieties.

Internet auctions, like their offline counterparts, come in a range of types and variations. Today, almost all consumer-oriented auctions are one-sided (i.e., they allow buy bids only), run by the sellers themselves or by third parties. The prevailing format can be viewed as an attempt to mimic the familiar open outcry auctions, posting the current high bid and the bidder's identity. The very familiarity of these mechanisms is an advantage, and may be the reason they have proliferated on the Internet.
19.3.2 Auction Configuration and Market Design

Auctions operated in business-to-business marketplaces are also predominantly one-sided (typically procurement or reverse auctions), though some two-sided auctions (often called exchanges) persist. Familiarity is also a factor in designing business-oriented auctions, though we should expect less of a tendency toward a one-size-fits-all approach, for several reasons:
• Industry trade groups may have preexisting prevailing conventions and practices, which will be important to accommodate in market designs.
• Participants earn their livelihood by trading, and so are more willing to invest in learning market-specific trading rules. Adopting more complex rules may be worthwhile if they enable more efficient trading over time.
• Transactions will involve higher stakes on average, and so even small proportional gains to customization may be justified.
The Michigan Internet AuctionBot [Wurman et al., 1998] was an early attempt to support general configurability in auction operation. Although the ability to customize auction rules proved not to be very useful for consumer-oriented marketplaces, for the reasons stated above, this capability provides potentially greater value for specialized commercial trade. The AuctionBot provided a model for the auction platform now distributed as Ariba Sourcing™, which underlies several business-to-business marketplaces.

19.3.2.1 Dimensions of Market Design

Flexible market infrastructure supports a variety of market rules, covering all aspects of operating a market. The infrastructure is configurable if the designer can mix and match operating rules across the various categories of market activity. We have found it particularly useful to organize market design around three fundamental dimensions (Figure 19.2), which correspond to three core activities performed by the market.
1. Bidding rules. Traders express offers to the market in messages called bids, describing deals in which they are willing to engage. The market's bidding policy defines the form of bids, and dictates what bids are admissible, when, and by whom, as a function of bids already received.
2. Clearing policy. The object of the market is to determine exchanges, or deals, by identifying compatible bids and clearing them as trades. The clearing policy dictates how and when bids are matched and formed into trades, including determining the terms of the deals in cases where there exist many consistent possibilities.
3. Information revelation policy. Markets typically post intermediate information about the status of bidding prior to the determination of final trades. Determining what information is available to whom, when, and in what form is the subject of information revelation policy.
FIGURE 19.2 Three dimensions of market design: bidding rules, clearing policy, and information revelation policy.
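To suggest how a configurable infrastructure might expose these three dimensions, here is a hedged sketch of an auction configuration record; the field names and option values are illustrative assumptions and do not reflect the AuctionBot or Ariba Sourcing interfaces:

from dataclasses import dataclass

@dataclass
class AuctionConfig:
    """Mix-and-match policies along the three design dimensions."""
    # Bidding rules: who may bid, in what form, and when.
    sides: str = "one-sided"         # or "two-sided"
    allow_withdrawal: bool = False
    beat_the_quote: bool = True      # new bids must improve on the best so far
    # Clearing policy: how and when bids are matched into trades.
    clearing: str = "at-close"       # or "continuous", "periodic"
    pricing: str = "first-price"     # or "second-price", "uniform"
    # Information revelation policy: what is posted, to whom, and when.
    quotes: str = "best-bid-only"    # or "full-order-book", "sealed"
    reveal_bidder_ids: bool = False

# An ascending, one-sided configuration in the style of consumer auction sites.
ascending = AuctionConfig(beat_the_quote=True, clearing="at-close",
                          quotes="best-bid-only", reveal_bidder_ids=True)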
19.3.2.2 Bidding Rules

To illustrate the specification of market rules across these dimensions, let us consider some of the possible range of bidding rules an auction can impose. The outline below is far from exhaustive, even for the bidding dimension alone; I include it merely to illustrate the great variety of separately definable auction features. A more comprehensive and technically precise exposition of the auction design space is presented by Wurman et al. [2001].

We generally assume that any trader can always submit a bid to an auction. The bidding rules determine whether it will be admitted to the auction. Admitted bids are entered into the auction's order book, which stores the bids considered currently active. Most bidding rules can be defined to hold for everyone or specialized to hold for particular classes of traders. In general, a bidding rule may consider the current order book, previous bids by this trader, or, for that matter, any aspect of the auction's history. However, it is helpful to focus the examples on forms of bidding rules corresponding to particularly useful categories.
• Allowable bid modifications. These rules regulate when a bid revision is permitted, as a function of the previous pattern of bids by this trader or others.
  • Withdrawal/replace allowed: Whether or when a new bid may be submitted to supersede a previous one.
  • Bid frequency restrictions: Set over the entire course of the auction or for designated periods. For example, an auction might define a notion of stage, or round, and allow each trader to bid once per round.
• Static restrictions on bid content. Content rules define what bids are admissible, based on the specifics of the offer. A content rule is static if it can be defined independently of other bids that have been submitted by this trader or others.
  • One- vs. two-sided: Competitive bidding on both the buy and sell sides, or just one or the other. As discussed above, in a one-sided auction, only one distinguished trader is allowed to sell (buy); all others can submit only buy (sell) bids.
  • Bid quantities: Offers can be for single or multiple units and, if multiunit, the allowable offer patterns. For example, a multiunit offer may be limited to a single price point, or arbitrary price-quantity schedules may be allowed. Similarly, quantity bidding rules control such issues as whether or not indivisible ("all-or-none") bids are allowed.
• Dynamic restrictions on bid content. A content rule is dynamic if it depends on previous bids by this trader or on the current order book.
  • Beat-the-quote: A new bid must be better than some designated benchmark, such as the best offer received so far. These rules can be used to implement an ascending (or descending) auction, where prices progress in a given direction until the final price is reached (a minimal admission-check sketch appears after this list).
  • Bid dominance: In a manner analogous to beat-the-quote, we can require that a new bid improve the trader's own previous bid. There are various versions of this rule, based on different criteria for comparing bids.
  • Eligibility: Defines the conditions under which a trader is eligible to submit bids, or the prices or quantities allowed in those bids. Eligibility is typically based on trader qualifications (e.g., credit ratings) or prior bidding history. For example, activity rules define eligibility based on the extent of current bids: the stronger the current bids, the greater the trader's eligibility for subsequent bidding.
• Payments: Sometimes restrictions such as those above can be waived on agreement to pay a fixed or variable fee. For example, an initial bid may require an entry fee (refundable or not), or withdrawals may be allowed on payment of a decommitment penalty.
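As a minimal illustration of how a bidding rule such as beat-the-quote translates into an admission check at the market (the order-book representation and the fixed increment are illustrative assumptions):

def admit_bid(order_book: list, new_bid: float, min_increment: float = 1.0) -> bool:
    """Beat-the-quote rule for an ascending, one-sided (buy-bid) auction.

    order_book holds the prices of currently active bids. A new bid is admitted
    only if it improves on the best standing bid by at least the increment.
    """
    best = max(order_book, default=0.0)
    if new_bid < best + min_increment:
        return False             # rejected: does not beat the quote
    order_book.append(new_bid)   # admitted: joins the order book
    return True

book = []
print(admit_bid(book, 10.0))   # True: first bid
print(admit_bid(book, 10.5))   # False: fails to beat 10.0 by the minimum increment
print(admit_bid(book, 12.0))   # True

A bid-dominance rule would compare the new bid against the same trader's previous bid rather than against the market-wide best.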
19.3.2.3 Criteria for Auction Design

Given the wide range of possible ways to run an auction, how is the designer to choose the policies for a particular market? The first step is to define one's objectives. There are many characteristics of a market we may care about, and these may be categorized roughly into process-oriented and outcome-oriented features.

Process-oriented features bear on the operation of the market and the participation effort required of traders or other interested parties. For example, we generally prefer that market rules be as simple and familiar as possible, all else being equal, as this promotes ease of learning and participation. Markets may also differ in how much time they impose for bid preparation and monitoring, or how much information they require the traders to reveal. Some market structures might be considered more transparent than others or otherwise present perceived differences in fairness. All of these may be important issues for marketplace designers.

Outcome-oriented features represent properties of the results that would reasonably be expected from the market. Natural measures include expected revenue from a seller-run auction or expected expenditures in a procurement auction. Often we care most directly about overall efficiency, that is, how well the market allocates resources to their most valuable uses. A natural index of efficiency is total surplus, the aggregate gain (measured in currency units) from trade, summed over all participants. Other considerations include the resistance of the mechanism to market manipulation, collusion, or various forms of cheating.

To take such issues into account, the designer, of course, needs some way to relate the market rules to these desired characteristics. Fortunately, there exists a substantial body of theory surrounding auctions [Klemperer, 1999; Krishna, 2002], starting from the seminal (Nobel Prize-winning) work of Vickrey [1961]. Auction theory tends to focus on outcome-oriented features, analyzing markets as games of incomplete information [Fudenberg and Tirole, 1991]. One of the key results of the field of mechanism design is the impossibility of guaranteeing efficiency through a mechanism where rational agents are free to participate or not, without providing some subsidy [Myerson and Satterthwaite, 1983]. It follows that auction design inevitably requires tradeoffs among desirable features. In recent years, the field has accumulated much experience from designing markets for privatization [Milgrom, 2003], yielding many lessons about market process as well as performance characteristics.

19.3.3 Complex Auctions

The discussion of market types above focused attention on "simple" auctions, where a single type of good (one or more units) is to be exchanged, and the negotiation addresses only price and quantity. In a multidimensional auction, bids may refer to multiple goods or to multiple features of a good. Although such complex auctions are not yet prevalent, automation has only recently made them feasible, and they are likely to grow in importance in online marketplaces. A combinatorial auction [de Vries and Vohra, 2003] allows indivisible bids for bundles of goods. This enables the bidder to indicate a willingness to obtain goods if and only if the combination is available. Such a capability is particularly important when the goods are complementary, that is, the value of
obtaining some is increased when the others are obtained as well. For example, a bicycle assembler needs both wheels and frames; neither part constitutes a bicycle without the other. Bidding rules for combinatorial auctions dictate what bundles are expressible, and the clearing policy defines the method for calculating overall allocations and payments.

A multiattribute auction [Bichler, 2001] allows bids that refer to multiple features of a single good. For example, a shipment of automobile tires might be defined by wheel diameter, tread life, warranty, delivery date, and performance characteristics (antiskid, puncture resistance), as well as the usual price and quantity. Multiattribute bids may specify the value of particular feature vectors or express a correspondence of values over extended regions of the attribute space. The forms of such bids are defined by the bidding rules, and the clearing policy dictates the method of matching multiattribute offers.

Multidimensional negotiation constitutes an area of great potential for online marketplaces, enabling a form of trading automation not previously possible. Ultimately, combinatorial and multiattribute negotiation could, in principle, support the negotiation of general contracts [Reeves et al., 2002]. Before that vision becomes reality, however, numerous technical issues in multidimensional negotiation must be addressed, including such problems as:
• What are the best forms for expressing combinatorial and multiattribute bids?
• What intermediate information should be revealed as these auctions proceed?
• How can we reduce the complexity of participating in multidimensional negotiations?
• What strategies should we expect from combinatorial and multiattribute bidders?
• What is the appropriate scope of a market? Combining related goods in a single negotiation avoids market coordination failures, but imposes synchronization delays and other potential costs in computation, communication, and organization.
Although several existing proposals and models address these questions in part, multidimensional negotiation remains an active research topic in market design.
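As a small, hedged illustration of why combinatorial clearing is computationally demanding, the following brute-force winner-determination sketch enumerates feasible combinations of bundle bids; the toy data and function name are assumptions, and practical systems use integer programming or specialized search instead:

from itertools import combinations

def winner_determination(bids: list, items: set):
    """Choose a revenue-maximizing, non-overlapping set of bundle bids.

    bids: list of (frozenset_of_items, price). Exponential enumeration; a sketch only.
    """
    best_value, best_set = 0.0, []
    for r in range(1, len(bids) + 1):
        for subset in combinations(bids, r):
            bundles = [b for b, _ in subset]
            allocated = set().union(*bundles)
            # Feasible only if no item is awarded twice and all items exist.
            if sum(len(b) for b in bundles) == len(allocated) and allocated <= items:
                value = sum(p for _, p in subset)
                if value > best_value:
                    best_value, best_set = value, list(subset)
    return best_value, best_set

bids = [(frozenset({"wheel", "frame"}), 100.0),
        (frozenset({"wheel"}), 40.0),
        (frozenset({"frame"}), 45.0)]
print(winner_determination(bids, {"wheel", "frame"}))  # the 100.0 bundle wins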
19.4 Establishing a Marketplace

To build an effective online marketplace, one needs to identify unfulfilled trading opportunities, design a suitable negotiation mechanism, and provide (directly or through ancillary parties) well-integrated discovery and transaction services. This is, of course, quite a tall order, and the specifics are dauntingly open-ended. Nevertheless, assembling all of these functions still is not sufficient to ensure marketplace success. Unfortunately, despite several useful sources of advice on establishing an online marketplace [Kambil and van Heck, 2002; Woods, 2002], much of the prevailing wisdom is based on anecdotal experience, accrued within a dynamic technological and economic environment, and continues to evolve rapidly. In this section, I briefly note some of the additional technical and organizational issues that can prove instrumental in making an online marketplace really work. As the field matures, we can expect that some of these will become routinely addressed by common infrastructure, and that others will become more precisely understood through the accumulation and analysis of experience.
19.4.1 Technical Issues
The section on auctions above discusses economic as well as technical issues in the design and deployment of negotiation mechanisms, focusing on the logic of market procedures. To underpin the market logic, we require a robust computational infrastructure to ensure its proper operation under a range of conditions, loads, and extraordinary events. By their very nature, online marketplaces operate over distributed networks, typically accessed by a heterogeneous collection of traders and observer nodes. For example, one user might submit a simple bid through a Web page accessed via telephone modem, whereas another might automatically submit large arrays of trades through a programmatic interface from a fast
workstation connected through a high-bandwidth network. Access is generally asynchronous and conducted over public networks. In many respects, the processing issues faced by a marketplace are identical to those in other transaction-processing applications. We naturally care a great deal about general system reliability, availability, and transparency of operation. It is important that transactions be atomic (i.e., an operation either completes or has no effect), and that state is recoverable in case of an outage, system crash, or other fault event.

There may also be some additional issues particularly salient for market applications. For example, maintaining temporal integrity can be critically important for correct and fair implementation of market rules. In the market context, temporal integrity means that the outcome of a negotiation is a function of the sequence of communications received from traders, independent of delays in computation and communication internal to the market. One simple consequence of temporal integrity is that bids be processed in the order received, despite any backlog that may exist. (Bid processing may in general require a complex computation on the order book, and it would be most undesirable to block incoming messages while this computation takes place.) Another example involves synchronization with market events. If the market is scheduled to clear at time t, then this clear should reflect all bids received before t, even if they are not all completely processed by this time. One way to enforce this kind of temporal integrity is to maintain a market logical time, which may differ from (i.e., lag somewhat behind) the actual clock time [Wellman and Wurman, 1998]. Given this approach, any information revealed by the market can be associated with a logical time, thus indicating the correct state based on bids actually received by that logical time.

Despite its apparent importance, strikingly few online markets provide any meaningful guarantees of temporal integrity, or even indications relating the information they reveal to the times of the states it reflects. For example, in a typical online brokerage, one is never sure about the exact market time corresponding to the posted price quotes. This makes it difficult for a trader (or even a regulator!) to audit the stream of bids to ensure that all deals were properly determined. The likely explanation is that these systems evolved on top of semi-manual legacy systems for which such fine-grained accounting was not feasible. As a result, to detect improper behavior it is often necessary to resort to pattern-matching and other statistical techniques [Kirkland et al., 1999]. With increased automation, a much higher standard of temporal integrity and accountability should normally be achievable, and this should be the goal of new market designers.

Finally, one cannot deploy a marketplace without serious attention to issues of privacy and security. We cannot do justice to such concerns here, and hence will just note that, simply by virtue of their financial nature, markets represent an obvious security risk. In consequence, the system must carefully authenticate and authorize all market interactions (e.g., both bidding and access to revealed market information). Moreover, online marketplaces are often quite vulnerable to denial-of-service and other resource-oriented attacks.
Because negotiation necessarily discloses sensitive information (as do other market activities, such as search and evaluation), it is an essential matter of privacy to ensure that the market reveals no information beyond that dictated by the stated revelation policy.
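A minimal sketch of the logical-time approach to temporal integrity discussed above follows; the class and field names are invented, and a real exchange would layer order-book logic, persistence, and authentication on top of this. The point is only that a clear at time t is computed from exactly the bids stamped before t, in arrival order, regardless of how far bid processing has fallen behind.

import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Bid:
    arrival_time: float                      # stamped when the bid is received
    trader: str = field(compare=False)
    price: float = field(compare=False)

class Market:
    def __init__(self):
        self._inbox = []                     # min-heap keyed on arrival time

    def receive(self, bid):
        heapq.heappush(self._inbox, bid)     # cheap; never blocked by clearing work

    def clear(self, t):
        """Process, in arrival order, every bid stamped before logical time t."""
        batch = []
        while self._inbox and self._inbox[0].arrival_time < t:
            batch.append(heapq.heappop(self._inbox))
        return batch                         # matching / order-book logic would go here

m = Market()
m.receive(Bid(3.0, "slow_modem_user", 10.0))
m.receive(Bid(1.0, "fast_workstation", 12.0))
print([b.trader for b in m.clear(t=2.0)])    # only the bid stamped before t = 2.0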
19.4.2 Achieving Critical Mass
If we build an electronic marketplace for a compelling domain, with rich supporting services and sound economic design, and make it technically solid in all respects — will the traders come? Alas, it (still) depends… Trading in markets is a network activity, in the sense that the benefit of participating depends on the participation of others. Naturally, it is a waste of effort searching for deals in a market where attractive counterparties are scarce. To overcome these network effects [Shapiro and Varian, 1998; Shy, 2001], it is often necessary to invest up front to develop a critical mass of traders that can sustain itself and attract additional traders. In effect, the marketplace may need to subsidize the early entrants, helping them overcome the initial fixed cost of entry until there are sufficient participants such that gains from trading itself outweigh the costs. Note that enticing entrants by promising or suggesting some advantage in the
market itself is generally counterproductive, as it inhibits the traders on the "other side" who will ultimately render the market profitable overall.

It is commonplace to observe in this context that the key to a successful marketplace is achieving sufficient liquidity. A market is liquid to the extent that it is readily possible to make a trade at the "prevailing" price at any time. In a thin market, in contrast, it is often the case that one can execute a transaction only at a disadvantageous price, due to frequent temporary imbalances caused by the sparseness of traders. For example, markets in equities listed on major stock exchanges are famously liquid, due to large volume as well as the active participation of market makers or specialists with express obligations (incurred in return for their privileged status in the market) to facilitate liquidity by trading on their own account when necessary.

It is perhaps unfortunate that the financial markets have provided the most salient example of a functionally liquid market. Many of the first generation of online marketplaces appear to have attempted to achieve liquidity by emulating these markets, quite often hiring key personnel with primary experience as traders in organized equity or commodity exchanges. For example, traditional financial securities markets employ variants of the continuous double auction [Friedman and Rust, 1993], which matches buy and sell orders instantaneously whenever compatible offers appear on the market. However, many eminently tradeable goods inherently lack the volume potential of financial securities, and for such markets instantaneous matching might reasonably be sacrificed in favor of designs more likely to produce robust and stable prices. In principle, new marketplaces provide an opportunity for introducing customized market designs. In practice, however, familiarity and other factors introduce a bias toward "legacy" trading processes.
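To illustrate the continuous-matching rule mentioned above, here is a minimal Python sketch of a continuous double auction. The data structures, the trade-at-standing-price convention, and the example orders are illustrative simplifications rather than a description of any real exchange.

import heapq

class CDA:
    """Minimal continuous double auction: an incoming order trades immediately
    against the best compatible standing order, otherwise it rests in the book."""

    def __init__(self):
        self.bids = []   # max-heap of buy orders, stored as (-price, trader)
        self.asks = []   # min-heap of sell orders, stored as (price, trader)

    def submit(self, side, price, trader):
        if side == "buy":
            if self.asks and self.asks[0][0] <= price:
                ask_price, seller = heapq.heappop(self.asks)
                return (trader, seller, ask_price)      # trade at standing ask
            heapq.heappush(self.bids, (-price, trader))
        else:
            if self.bids and -self.bids[0][0] >= price:
                neg_bid, buyer = heapq.heappop(self.bids)
                return (buyer, trader, -neg_bid)        # trade at standing bid
        return None

market = CDA()
print(market.submit("sell", 102, "s1"))   # None: no compatible bid yet, order rests
print(market.submit("buy", 105, "b1"))    # ('b1', 's1', 102): matched instantly

In a thin market, this kind of instantaneous matching executes against whatever happens to be resting in the book, which is one reason a design that accumulates orders before clearing (for instance, a periodic call market) may produce more robust and stable prices.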
19.5 The Future of Online Marketplaces
Anyone contemplating a prediction of the course of online marketplaces will be cautioned by the memory of prevailing late-1990s forecasts that proved to be wildly optimistic. Though many online marketplaces came and went during "the Bubble," the persistence of some through the pessimistic "Aftermath" is surely evidence that online marketplaces can provide real value. Even the failed attempts have left us with cautionary tales and other learning experiences [Woods, 2002], and in some instances, useful technologies. So without offering any specific prognostications with exponential growth curves, this chapter ends with a generally positive outlook plus a few suggestions about what we might see in the next generation of online marketplaces.

First, while specific marketplaces will come and go, the practice of online trading will remain, and will likely stabilize over time through recognition of successful models and standardization of interfaces. Decisions about joining marketplaces or starting new ones should perhaps be driven less by strategic concerns (e.g., the "land grab" mentality that fueled the Bubble), and more by the objective of supporting trading activities that improve industry efficiency and productivity.

Second, as discussed above, there is currently a large amount of research attention, as well as some commercial development, devoted to the area of multidimensional negotiation. Combinatorial and multiattribute auctions support richer expressions of offers, accounting for multiple facets of a deal and interactions between parts of a deal. Whereas multidimensional negotiation is not a panacea (presenting additional costs and complications, and unresolved issues), it does offer the potential to get beyond some of the rigidities inhibiting trade in online marketplaces.

Finally, trading is a labor-intensive activity. Whereas online marketplaces can provide services to assist discovery and monitoring of trading opportunities, they may nevertheless present too many plausible options for a person to reasonably attend to. Ultimately, therefore, it is reasonable to expect the trading function itself to be automated, and for online marketplaces to become primarily the province of programmed traders. Software agents can potentially monitor and engage in many more simultaneous market activities than could any human. A recently inaugurated annual trading agent competition [Wellman et al., 2003] presents one vision of a future of online markets driven by autonomous trading agents.
References
Ariba Inc., IBM Corp., and Microsoft Corp. Universal description, discovery, and integration (UDDI). Technical white paper, UDDI.org, 2000.
Belew, Richard K. Finding Out About. Cambridge University Press, 2000.
Berners-Lee, Tim, James Hendler, and Ora Lassila. The semantic web. Scientific American, 284(5): 34–43, 2001.
Bichler, Martin. The Future of e-Markets: Multidimensional Market Mechanisms. Cambridge University Press, Cambridge, U.K., 2001.
Chaum, David. Achieving electronic privacy. Scientific American, 267(2): 96–101, 1992.
Cohen, Adam. The Perfect Store: Inside eBay. Little, Brown, and Company, New York, 2002.
Crocker, Steve. The siren song of Internet micropayments. iMP: The Magazine on Information Impacts, 1999.
de Vries, Sven and Rakesh Vohra. Combinatorial auctions: a survey. INFORMS Journal on Computing, 15: 284–309, 2003.
Dellarocas, Chrysanthos. The digitization of word-of-mouth: promise and challenges of online reputation mechanisms. Management Science, 1407–1424, 2003.
Di Noia, Tommaso, Eugenio Di Sciascio, Francesco M. Donini, and Marina Mongiello. A system for principled matchmaking in an electronic marketplace. In Twelfth International World Wide Web Conference, Budapest, 2003; in press, Int. J. Elec. Comm., 2004.
Doorenbos, Robert B., Oren Etzioni, and Daniel S. Weld. A scalable comparison-shopping agent for the world-wide web. In First International Conference on Autonomous Agents, pages 39–48, 1997.
Friedman, Daniel and John Rust, Eds. The Double Auction Market. Addison-Wesley, Reading, MA, 1993.
Fudenberg, Drew and Jean Tirole. Game Theory. MIT Press, Cambridge, MA, 1991.
Kambil, Ajit and Eric van Heck. Making Markets. Harvard Business School Press, Boston, 2002.
Kirkland, J. Dale, Ted E. Senator, James J. Hayden, Tomasz Dybala, Henry G. Goldberg, and Ping Shyr. The NASD Regulation advanced-detection system (ADS). AI Magazine, 20(1): 55–67, 1999.
Klemperer, Paul D. Auction theory: A guide to the literature. Journal of Economic Surveys, 13: 227–286, 1999.
Krishna, Vijay. Auction Theory. Academic Press, San Diego, CA, 2002.
Krulwich, Bruce T. The BargainFinder agent: Comparison price shopping on the Internet. In Joseph Williams, Ed., Bots and Other Internet Beasties, chapter 13, pages 257–263. Sams Publishing, Indianapolis, IN, 1996.
Li, Lei and Ian Horrocks. A software framework for matchmaking based on semantic web technology. In Twelfth International World Wide Web Conference, Budapest, 2003; in press, Int. J. Elec. Comm., 2004.
MacKie-Mason, Jeffrey K. and Kimberly White. Evaluating and selecting digital payment mechanisms. In Gregory L. Rosston and David Waterman, Eds., Interconnection and the Internet: Selected Papers from the 24th Annual Telecommunications Policy Research Conference. Lawrence Erlbaum, Hillsdale, NJ, 1997.
Manasse, Mark S. The Millicent protocols for electronic commerce. In First USENIX Workshop on Electronic Commerce, pages 117–123, New York, 1995.
McAfee, R. Preston and John McMillan. Analyzing the airwaves auction. Journal of Economic Perspectives, 10(1): 159–175, 1996.
Milgrom, Paul. Putting Auction Theory to Work. Cambridge University Press, Cambridge, U.K., 2003.
Myerson, Roger B. and Mark A. Satterthwaite. Efficient mechanisms for bilateral trading. Journal of Economic Theory, 29: 265–281, 1983.
O'Mahony, Donal, Michael Pierce, and Hitesh Tewari. Electronic Payment Systems. Artech House, Norwood, MA, 1997.
Reeves, Daniel M., Michael P. Wellman, and Benjamin N. Grosof. Automated negotiation from declarative contract descriptions. Computational Intelligence, 18: 482–500, 2002.
Resnick, Paul and Hal R. Varian. Recommender systems. Communications of the ACM, 40(3): 56–58, 1997.
Resnick, Paul and Richard Zeckhauser. Trust among strangers in Internet transactions: Empirical analysis of eBay's reputation system. In Michael R. Baye, Ed., The Economics of the Internet and E-Commerce, volume 11 of Advances in Applied Microeconomics. Elsevier Science, Amsterdam, 2002.
Resnick, Paul, Richard Zeckhauser, Eric Friedman, and Ko Kuwabara. Reputation systems. Communications of the ACM, 43(12): 45–48, 2000.
Riedl, John and Joseph A. Konstan. Word of Mouse: The Marketing Power of Collaborative Filtering. Warner Books, New York, 2002.
Schafer, J. Ben, Joseph A. Konstan, and John Riedl. E-commerce recommendation applications. Data Mining and Knowledge Discovery, 5: 115–153, 2001.
Schmid, Beat F. Electronic markets. Electronic Markets, 3(3), 1993.
Shapiro, Carl and Hal R. Varian. Information Rules: A Strategic Guide to the Network Economy. Harvard Business School Press, Boston, 1998.
Shy, Oz. The Economics of Network Industries. Cambridge University Press, Cambridge, MA, 2001.
Sirbu, Marvin and J. D. Tygar. NetBill: an Internet commerce system optimized for network delivered services. IEEE Personal Communications, 2(4): 34–39, 1995.
Vickrey, William. Counterspeculation, auctions, and competitive sealed tenders. Journal of Finance, 16: 8–37, 1961.
Wellman, Michael P., Amy Greenwald, Peter Stone, and Peter R. Wurman. The 2001 trading agent competition. Electronic Markets, 13: 4–12, 2003.
Wellman, Michael P. and Peter R. Wurman. Real time issues for Internet auctions. In IEEE Workshop on Dependable and Real-Time E-Commerce Systems, Denver, CO, 1998.
Woods, W. William A. B2B Exchanges 2.0. ISI Publications, Hong Kong, 2002.
Wurman, Peter R., Michael P. Wellman, and William E. Walsh. The Michigan Internet AuctionBot: A configurable auction server for human and software agents. In Second International Conference on Autonomous Agents, pages 301–308, Minneapolis, 1998.
Wurman, Peter R., Michael P. Wellman, and William E. Walsh. A parameterization of the auction design space. Games and Economic Behavior, 35: 304–338, 2001.
20 Online Reputation Mechanisms
Chrysanthos Dellarocas
CONTENTS
20.1 Introduction
20.2 An Ancient Concept In a New Setting
20.3 A Concrete Example: eBay's Feedback Mechanism
20.4 Reputation in Game Theory and Economics
20.4.1 Basic Concepts
20.4.2 Reputation Dynamics
20.5 New Opportunities and Challenges of Online Mechanisms
20.5.1 Understanding the Impact of Scalability
20.5.2 Eliciting Sufficient and Honest Feedback
20.5.3 Exploiting the Information Processing Capabilities of Feedback Mediators
20.5.4 Coping with Easy Name Changes
20.5.5 Exploring Alternative Architectures
20.6 Conclusions
References
Online reputation mechanisms harness the bidirectional communication capabilities of the Internet in order to engineer large-scale, word-of-mouth networks. They are emerging as a promising alternative to more established assurance mechanisms such as branding and formal contracting in a variety of settings ranging from online marketplaces to peer-to-peer networks. This chapter surveys our progress in understanding the new possibilities and challenges that these mechanisms represent. It discusses some important dimensions in which Internet-based reputation mechanisms differ from traditional word-of-mouth networks and surveys the most important issues related to designing, evaluating, and using them. It provides an overview of relevant work in game theory and economics on the topic of reputation. It further discusses how this body of work is being extended and combined with insights from computer science, information systems, management science, and psychology in order to take into consideration the special properties of online mechanisms, such as their unprecedented scalability, the ability to precisely design the type of feedback information they solicit and distribute, and challenges associated with the relative anonymity of online environments.
20.1 Introduction
A fundamental aspect in which the Internet differs from previous technologies for mass communication is its bidirectional nature: Not only has it bestowed upon organizations a low-cost channel through which to reach audiences of unprecedented scale but also, for the first time in human history, it has enabled
individuals to almost costlessly make their personal thoughts and opinions accessible to the global community of Internet users. An intriguing family of electronic intermediaries is beginning to harness this unique property, redefining and adding new significance to one of the most ancient mechanisms in the history of human society. Online reputation mechanisms, also known as reputation systems [Resnick, Zeckhauser, Friedman, and Kuwabara, 2000] and feedback mechanisms [Dellarocas, 2003b], use the Internet's bidirectional communication capabilities to artificially engineer large-scale word-of-mouth networks in online environments. Online reputation mechanisms allow members of a community to submit their opinions regarding other members of that community. Submitted feedback is analyzed, aggregated with feedback posted by other members, and made publicly available to the community in the form of member feedback profiles. Several examples of such mechanisms can already be found in a number of diverse online communities (Figure 20.1).

Perhaps the best-known application of online reputation mechanisms to date has been as a technology for building trust in electronic markets. This has been motivated by the fact that many traditional trust-building mechanisms, such as state-enforced contractual guarantees and repeated interaction, tend to be less effective in large-scale online environments [Kollock, 1999]. Successful online marketplaces such as eBay are characterized by large numbers of small players, physically located around the world and often known to each other only via easily changeable pseudonyms. Contractual guarantees are usually difficult or too costly to enforce due to the global scope of the market and the volatility of identities. Furthermore, the huge number of players makes repeated interaction between the same set of players less probable, thus reducing the incentives for players to cooperate on the basis of hoping to develop a profitable relationship. Online reputation mechanisms have emerged as a viable mechanism for inducing cooperation among strangers in such settings by ensuring that the behavior of a player towards any other player becomes
eBay (online auction house): Buyers and sellers rate one another following transactions. Solicited feedback: positive, negative, or neutral rating plus short comment; ratee may post a response. Feedback profiles: sums of positive, negative, and neutral ratings received during the past 6 months.

eLance (professional services marketplace): Contractors rate their satisfaction with subcontractors. Solicited feedback: numerical rating from 1–5 plus comment; ratee may post a response. Feedback profiles: average of ratings received during the past 6 months.

Epinions (online opinions forum): Users write reviews about products/services; other members rate the usefulness of reviews. Solicited feedback: users rate multiple aspects of reviewed items from 1–5; readers rate reviews as "useful", "not useful", etc. Feedback profiles: averages of item ratings; % of readers who found a review "useful".

Google (search engine): Search results are rank ordered based on how many sites contain links that point to them [Brin and Page, 1998]. Solicited feedback: how many links point to a page, how many links point to the pointing page, etc. Feedback profiles: rank ordering acts as an implicit indicator of reputation.

Slashdot (online discussion board): Postings are prioritized or filtered according to the rating they receive from readers. Solicited feedback: readers rate posted comments. Feedback profiles: rank ordering acts as an implicit indicator of reputation.

FIGURE 20.1 Some examples of online reputation mechanisms used in commercial Websites.
publicly known and may therefore affect the behavior of the entire community towards that player in the future. Knowing this, players have an incentive to behave well towards each other, even if their relationship is a one-time deal. A growing body of empirical evidence seems to demonstrate that these systems have managed to provide remarkable stability in otherwise very risky trading environments (see, for example, Bajari and Hortacsu [2003]; Dewan and Hsu [2002]; Houser and Wonders [2000]; Lucking-Reiley et al. [2000]; Resnick and Zeckhauser [2002]).

The application of reputation mechanisms in online marketplaces is particularly interesting because many of these marketplaces would probably not have come into existence without them. It is, however, by no means the only possible application domain of such systems. Internet-based feedback mechanisms are appearing in a surprising variety of settings: For example, Epinions.com encourages Internet users to rate practically any kind of brick-and-mortar business, such as airlines, telephone companies, resorts, etc. Moviefone.com solicits and displays user feedback on new movies alongside professional reviews, and Citysearch.com does the same for restaurants, bars, and performances. Even news sites, perhaps the best embodiment of the unidirectional mass media of the previous century, are now encouraging readers to provide feedback on world events alongside professionally written news articles.

The proliferation of online reputation mechanisms is already changing people's behavior in subtle but important ways. Anecdotal evidence suggests that people now increasingly rely on opinions posted on such systems in order to make a variety of decisions, ranging from what movie to watch to what stocks to invest in. Only 5 years ago the same people would primarily base those decisions on advertisements or professional advice. It might well be that the ability to solicit, aggregate, and publish mass feedback will influence the social dynamics of the 21st century as powerfully as the ability to broadcast to the masses influenced our societies in the 20th century.

The rising importance of online reputation systems not only invites but also necessitates rigorous research on their functioning and consequences. How do such mechanisms affect the behavior of participants in the communities where they are introduced? Do they induce socially beneficial outcomes? To what extent can their operators and participants manipulate them? How can communities protect themselves from such potential abuse? What mechanism designs work best in what settings? Under what circumstances can these mechanisms become viable substitutes (or complements) of more established institutions, such as contracts, legal guarantees, and professional reviews? This is just a small subset of the questions that invite exciting and valuable research.

This chapter surveys our progress so far in understanding the new possibilities and challenges that these mechanisms represent. Section 20.2 discusses some important dimensions in which Internet-based reputation mechanisms differ from traditional word-of-mouth networks. Section 20.3 provides a case study of eBay's feedback mechanism, perhaps the best known reputation system at the time of this chapter's writing. The following two sections survey our progress in developing a systematic discipline that can help answer those questions. First, Section 20.4 provides an overview of relevant past work in game theory and economics.
Section 20.5 then discusses how this body of work is being extended in order to take into consideration the special properties of online mechanisms. Finally, Section 20.6 summarizes the main points and lists opportunities for future research.
20.2 An Ancient Concept In a New Setting
Word-of-mouth networks constitute an ancient solution to a timeless problem of social organization: the elicitation of good conduct in communities of self-interested individuals who have short-term incentives to cheat one another. The power of such networks to induce cooperation without the need for costly and inefficient enforcement institutions has historically been the basis of their appeal. Before the establishment of formal law and centralized systems of contract enforcement backed by the sovereign power of a state, most ancient and medieval communities relied on word-of-mouth as the primary enabler of economic and social activity [Benson, 1989; Greif, 1993; Milgrom, North, and Weingast, 1990]. Many aspects of social and economic life still do so today [Klein, 1997].
What makes online reputation mechanisms different from word-of-mouth networks of the past is the combination of (1) their unprecedented scale, achieved through the exploitation of the Internet's low-cost, bidirectional communication capabilities, (2) the ability of their designers to precisely control and monitor their operation through the introduction of automated feedback mediators, and (3) new challenges introduced by the unique properties of online interaction, such as the volatile nature of online identities and the almost complete absence of contextual cues that would facilitate the interpretation of what is, essentially, subjective information.

• Scale enables new applications. Scale is essential to the effectiveness of word-of-mouth networks. In an online marketplace, for example, sellers care about buyer feedback primarily to the extent that they believe that it might affect their future profits; this can only happen if feedback is provided by a sufficient number of current customers and communicated to a significant portion of future prospects. Theory predicts that a minimum scale is required before reputation mechanisms have any effect on the behavior of rational agents [Bakos and Dellarocas, 2002]. Whereas traditional word-of-mouth networks tend to deteriorate with scale, Internet-based reputation mechanisms can accumulate, store, and flawlessly summarize unlimited amounts of information at very low cost. The vastly increased scale of Internet-based reputation mechanisms might therefore make such mechanisms effective social control institutions in settings where word-of-mouth previously had a very weak effect. The social, economic, and perhaps even political consequences of such a trend deserve careful study.

• Information technology enables systematic design. Online word-of-mouth networks are artificially induced through explicitly designed information systems (feedback mediators). Feedback mediators specify who can participate, what type of information is solicited from participants, how it is aggregated, and what type of information is made available to them about other community members. They enable mechanism designers to exercise precise control over a number of parameters that are very difficult or impossible to influence in brick-and-mortar settings. For example, feedback mediators can replace detailed feedback histories with a wide variety of summary statistics, they can apply filtering algorithms to eliminate outlier or suspect ratings, they can weight ratings according to some measure of the rater's trustworthiness, etc. Such a degree of control can impact the resulting social outcomes in nontrivial ways (see Section 20.5.2, Section 20.5.3, and Section 20.5.4). Understanding the full space of design possibilities and the consequences of specific design choices introduced by these new systems is an important research challenge that requires collaboration between traditionally distinct disciplines, such as computer science, economics, and psychology, in order to be properly addressed.

• Online interaction introduces new challenges. The disembodied nature of online environments introduces several challenges related to the interpretation and use of online feedback. Some of these challenges have their roots in the subjective nature of feedback information. Brick-and-mortar settings usually provide a wealth of contextual cues that assist in the proper interpretation of opinions and gossip (such as the fact that we know the person who acts as the source of that information or can infer something about her through her clothes, facial expression, etc.). Most of these cues are absent from online settings. Readers of online feedback are thus faced with the task of making sense out of opinions of complete strangers. Other challenges have their root in the ease with which online identities can be changed. This opens the door to various forms of strategic manipulation. For example, community members can build a good reputation, milk it by cheating other members, and then disappear and reappear under a new online identity and a clean record [Friedman and Resnick, 2001]. They can use fake online identities to post dishonest feedback for the purpose of inflating their reputation or tarnishing that of their competitors [Dellarocas, 2000; Mayzlin, 2003]. Finally, the mediated nature of online reputation mechanisms raises questions related to the trustworthiness of their operators. An important prerequisite for the widespread acceptance of online reputation mechanisms as legitimate trust-building
institutions is, therefore, a better understanding of how such systems can be compromised, as well as the development of adequate defenses.
20.3 A Concrete Example: eBay's Feedback Mechanism
The feedback mechanism of eBay is, arguably, the best known online reputation mechanism at the time of this writing. Founded in September 1995, eBay is the leading online marketplace for the sale of goods and services by a diverse global community of individuals and businesses. Today, the eBay community includes more than 100 million registered users.

One of the most remarkable aspects of eBay is that the transactions performed through it are not backed up by formal legal guarantees. Instead, cooperation and trust are primarily based on the existence of a simple feedback mechanism. This mechanism allows eBay buyers and sellers to rate one another following transactions and makes the history of a trader's past ratings public to the entire community.

On eBay, users are known to each other through online pseudonyms (eBay IDs). When a new user registers to the system, eBay requests that she provide a valid email address and a valid credit card number (for verification purposes only). Most items on eBay are sold through English auctions. A typical eBay transaction begins with the seller listing an item he has on sale, providing an item description (including text and optionally photos), a starting bid, an optional reserve price, and an auction closing date/time. Buyers then place bids for the item up until the auction closing time. The highest bidder wins the auction. The winning bidder sends payment to the seller. Finally, the seller sends the item to the winning bidder.

It is easy to see that the above mechanism incurs significant risks for the buyer. Sellers can exploit the underlying information asymmetries to their advantage by misrepresenting an item's attributes (adverse selection) or by failing to complete the transaction (moral hazard), i.e., keeping the buyer's money without sending anything back. It is clear that without an adequate solution to these adverse selection and moral hazard problems, sellers have an incentive to always cheat and/or misrepresent, and therefore, expecting this, buyers would either not use eBay at all or place very low bids.

To address these problems, eBay uses online feedback as its primary trust building mechanism. More specifically, following completion of a transaction, both the seller and the buyer are encouraged to rate one another. A rating designates a transaction as positive, negative, or neutral, together with a short text comment. eBay aggregates all ratings posted for a member into that member's feedback profile. An eBay feedback profile consists of four components (Figure 20.2 and Figure 20.3):
1. A member's overall profile makeup: a listing of the sum of positive, neutral, and negative ratings received during that member's entire participation history with eBay
2. A member's summary feedback score, equal to the sum of positive ratings received by unique users minus the number of negative ratings received by unique users during that member's entire participation history with eBay
3. A prominently displayed eBay ID card, which displays the sum of positive, negative, and neutral ratings received during the most recent 6-month period (further subdivided into ratings received during the past week, past month, and past 6 months)
4. The complete ratings history, listing each individual rating and associated comment posted for a member in reverse chronological order
Seller feedback profiles are easily accessible from within the description page of any item for sale.
More specifically, all item descriptions prominently display the seller’s eBay ID, followed by his summary feedback score (component B in Figure 20.2). By clicking on the summary feedback score, prospective buyers can access the seller’s full feedback profile (components A, B, and C) and can then scroll through the seller’s detailed ratings history (component D).
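The following short Python sketch computes summary information in the spirit of the four profile components just described. The ratings data and the 182-day approximation of "6 months" are invented for illustration; eBay's actual implementation is not described here.

from collections import Counter
from datetime import datetime, timedelta

# Each rating: (rater_id, value in {+1, 0, -1}, timestamp, comment). The data
# below is invented; only the aggregation rules follow the description above.
ratings = [
    ("alice", +1, datetime(2004, 7, 1), "fast shipping"),
    ("bob",   -1, datetime(2004, 3, 5), "item not as described"),
    ("alice", +1, datetime(2003, 1, 9), "great again"),
]

def profile(ratings, now=datetime(2004, 8, 4)):
    # (1) overall profile makeup over the entire history
    makeup = Counter(value for _, value, _, _ in ratings)
    # (2) summary feedback score: unique positive raters minus unique negative raters
    pos_raters = {r for r, v, _, _ in ratings if v > 0}
    neg_raters = {r for r, v, _, _ in ratings if v < 0}
    score = len(pos_raters) - len(neg_raters)
    # (3) ID-card style makeup over the most recent 6 months (about 182 days)
    recent = Counter(v for _, v, t, _ in ratings if now - t <= timedelta(days=182))
    # (4) complete history in reverse chronological order
    history = sorted(ratings, key=lambda r: r[2], reverse=True)
    return makeup, score, recent, history

makeup, score, recent, history = profile(ratings)
print(score)    # 0: one unique positive rater minus one unique negative rater
print(recent)   # Counter({1: 1, -1: 1})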
FIGURE 20.2 Profile summary of eBay member.
FIGURE 20.3 Detailed feedback history.
The feedback mechanism of eBay is based on two important assumptions: first, that members will indeed leave feedback for each other (feedback is currently voluntary and there are no concrete rewards or penalties for providing it or for failing to do so); second, that in addition to an item's description, buyers will consult a seller's feedback profile before deciding whether to bid on a seller's auction. Based on the feedback profile information, buyers will form an assessment of the seller's likelihood to be honest in completing the transaction, as well as to accurately describe the item's attributes. This assessment will help buyers decide whether they will indeed proceed with bidding. It will, further, influence the amounts they are willing to bid. Sellers with "bad" profiles (many negative ratings) are therefore expected to receive lower bids or no bids.

Knowing this, sellers with long horizons will find it optimal to behave honestly, even towards one-time buyers, to avoid jeopardizing their future earnings on eBay. At equilibrium, therefore, the expectation is that buyers will trust sellers with "good" profiles to behave honestly and sellers will indeed honor the buyers' trust. Initial theoretical and empirical evidence suggests that, despite its simplicity, eBay's feedback mechanism succeeds to a large extent in achieving these objectives (see Dellarocas [2003b] and Resnick et al. [2002] for surveys of relevant studies).
20.4 Reputation in Game Theory and Economics
Given the importance of word-of-mouth networks in human society, reputation formation has been extensively studied by economists using the tools of game theory. This body of work is perhaps the most promising foundation for developing an analytical discipline of online reputation mechanism design. This section surveys past work in this area, emphasizing the results that are most relevant to the design of online reputation mechanisms. Section 20.5 then discusses how this body of work is being extended to address the unique properties of online systems.
20.4.1 Basic Concepts
According to Wilson [1985], reputation is a concept that arises in repeated game settings when there is uncertainty about some property (the "type") of one or more players in the mind of other players. If "uninformed" players have access to the history of past stage game (iteration) outcomes, reputation effects then often allow informed players to improve their long-term payoffs by gradually convincing uninformed players that they belong to the type that best suits their interests. They do this by repeatedly choosing actions that make them appear to uninformed players as if they were of the intended type (thus "acquiring a reputation" for being of that type).

The existence of some initial doubt in the mind of uninformed players regarding the type of informed players is crucial in order for reputation effects to occur. To see this, consider an eBay seller who faces a sequence of sets of one-time buyers in a marketplace where there are only two kinds of products: high-quality products that cost 1 to the seller and are worth 3 to the buyers, and low-quality products that cost 0 to the seller and are worth 1 to the buyers. Buyers compete with one another on a Vickrey (second-price) auction and therefore bid amounts equal to their expected valuation of the transaction outcome. The winning bidder sends payment to the seller, and the seller then has the choice of either "cooperating" (producing a high-quality good) or "cheating" (producing a low-quality good). The resulting payoff matrix is depicted in Figure 20.4.

If the seller cannot credibly precommit to cooperation and buyers are certain that they are facing a rational, utility-maximizing seller, the expected outcome of all transactions will be the static Nash equilibrium: sellers will always cheat and, expecting this, buyers always place low bids. This outcome is socially inefficient in that the payoffs of both parties are equal to or lower than those they could achieve if they cooperated.

The concept of reputation allows the long-run player to improve his payoffs in such settings. Intuitively, a long-run player who has a track record of playing a given action (e.g., cooperate) often enough in the past acquires a reputation for doing so and is "trusted" by subsequent short-run players to do so in the future as well. However, why would a profit-maximizing long-term player be willing to behave in such a way and why would rational short-term players use past history as an indication of future behavior?
              Cooperate    Cheat
Bid high       0, 2        -2, 3
Bid low        2, 0         0, 1

FIGURE 20.4 Payoff matrix of a simplified "eBay" bilateral exchange stage game (first number in each cell represents buyer payoff, second number represents seller payoff).
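The entries can be checked directly from the numbers given above, writing each cell as (buyer payoff, seller payoff) = (value received − price paid, price received − production cost), with the winning buyer paying her bid (3 when bidding high, 1 when bidding low), high quality costing the seller 1 and being worth 3 to the buyer, and low quality costing 0 and being worth 1:

\[
\begin{aligned}
(\text{bid high},\ \text{cooperate}):\;& (3-3,\; 3-1) = (0,\ 2)\\
(\text{bid high},\ \text{cheat}):\;& (1-3,\; 3-0) = (-2,\ 3)\\
(\text{bid low},\ \text{cooperate}):\;& (3-1,\; 1-1) = (2,\ 0)\\
(\text{bid low},\ \text{cheat}):\;& (1-1,\; 1-0) = (0,\ 1)
\end{aligned}
\]

Cheating yields the seller a strictly higher payoff in every column (3 versus 2, and 1 versus 0), and a buyer who expects cheating prefers the low bid (0 versus −2), which is exactly the static Nash equilibrium described in the text.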
To explain such phenomena, Kreps, Milgrom, Roberts, and Wilson [1982], Kreps and Wilson [1982], and Milgrom and Roberts [1982] introduced the notion of “commitment” types. Commitment types are long-run players who are locked into playing the same action.1 An important subclass of commitment types are Stackelberg types: long-run players who are locked into playing the so-called Stackelberg action. The Stackelberg action is the action to which the long-run player would credibly commit if he could. In the above eBay-type example, the Stackelberg action would be to cooperate; cooperation is the action that maximizes the seller’s lifetime payoffs if the seller could credibly commit to an action for the entire duration of the game; therefore, the Stackelberg type in this example corresponds to an “honest” seller who never cheats. In contrast, an “ordinary” or “strategic” type corresponds to a profit-maximizing seller who cheats whenever it is advantageous for him to do so. Reputation models assume that short-run players know that commitment types exist, but are ignorant of the type of the player they face. An additional assumption is that short-run players have access to the entire history of past stage game outcomes. The traditional justification for this assumption is that past outcomes are either publicly observable or explicitly communicated among short-run players. The emergence of online feedback mechanisms provides, of course, yet another justification (however, the private observability of outcomes in online systems introduces a number of complications; see Section 20.5.2). A player’s reputation at any given time, then, consists of the conditional posterior probabilities over that player’s type, given a short-run player’s prior probabilities over types and the repeated application of Bayes’ rule on the history of past stage game (iteration) outcomes. In such a setting, when selecting his next move, the informed player must take into account not only his short-term payoff, but also the long-term consequences of his action based on what that action reveals about his type to other players. As long as the promised future gains due to the increased (or sustained) reputation that comes from playing the Stackelberg action offset whatever short-term incentives he might have to play otherwise, the equilibrium strategy for an “ordinary” informed player will be to try to “acquire a reputation” by masquerading as a Stackelberg type (i.e., repeatedly play the Stackelberg action with high probability.) In the eBay-type example, if the promised future gains of reputation effects are high enough, rational sellers are induced to overcome their short-term temptation to cheat and to try to acquire a reputation for honesty by repeatedly producing high quality. Expecting this, buyers will then place high bids, thus increasing the seller’s long-term payoffs.
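The Bayesian updating just described can be made concrete with a small Python sketch. The prior, the assumed cooperation probability of the strategic type, and the noise level eps are illustrative inputs rather than parts of the cited models; in equilibrium the strategic type's behavior would itself be derived rather than fixed.

# The buyer holds a prior that the seller is an "honest" (Stackelberg) type who
# always cooperates, versus a "strategic" type assumed (purely for illustration)
# to cooperate with some fixed probability. Monitoring noise flips the observed
# signal with probability eps; set eps = 0 for the perfect-monitoring case.

def update(prior_honest, signal, p_coop_strategic=0.6, eps=0.0):
    """Posterior probability of the honest type after one observed signal
    ('coop' or 'cheat'), by Bayes' rule."""
    def p_signal(p_coop):
        p_observe_coop = p_coop * (1 - eps) + (1 - p_coop) * eps
        return p_observe_coop if signal == "coop" else 1 - p_observe_coop

    num = prior_honest * p_signal(1.0)                 # honest type always cooperates
    den = num + (1 - prior_honest) * p_signal(p_coop_strategic)
    return num / den

belief = 0.5                       # prior probability that the seller is honest
for signal in ["coop", "coop", "cheat"]:
    belief = update(belief, signal, eps=0.1)
    print(round(belief, 3))
# With eps = 0, a single observed 'cheat' would drive the belief to 0; with
# noisy monitoring it merely lowers it, so beliefs move only gradually.

The eps parameter anticipates the imperfect-monitoring discussion in Section 20.4.2.2: under noise, no single observation reveals the seller's type, and beliefs evolve slowly rather than collapsing after one bad signal.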
20.4.2 Reputation Dynamics
The derivation of equilibrium strategies in repeated games with reputation effects is, in general, quite complicated. Nevertheless, a small number of specific cases have been extensively studied. They provide interesting insight into the complex behavioral dynamics introduced by reputational considerations.
1. Commitment types are sometimes also referred to as "irrational" types because they follow fixed, "hard-wired" strategies as opposed to "rational" profit-maximizing strategies. An alternative way to justify such players is to consider them as players with non-standard payoff structures such that the "commitment" action is their dominant strategy given their payoffs.
20.4.2.1 Initial Phase
In most cases, reputation effects begin to work immediately and, in fact, are strongest during the initial phase, when players must work hard to establish a reputation. Holmstrom [1999] discusses an interesting model of reputational considerations in the context of an agent's "career" concerns: Suppose that wages are a function of an employee's innate ability for a task. Employers cannot directly observe an employee's ability; however, they can keep track of the average value of her past task outputs. Outputs depend both on ability and labor. The employee's objective is to maximize her lifetime wages while minimizing the labor she has to put in. At equilibrium, this provides incentives to the employee to work hard right from the beginning of her career in order to build a reputation for competence. In fact, these incentives are strongest at the very beginning of her career when observations are most informative.

During the initial phase of a repeated game it is common that some players realize lower, or even negative, profits while the community "learns" their type. In those cases players will only attempt to build a reputation if the losses from masquerading as a Stackelberg type in the current round are offset by the present value of the gains from their improved reputation in the later part of the game. In trading environments, this condition usually translates to the need for sufficiently high profit margins for "good quality" products in order for reputation effects to work. This was first pointed out in [Klein and Leffler, 1981] and explored more formally in [Shapiro, 1983].

Another case where reputation effects may fail to work is when short-run players are "too cautious" vis-à-vis the long-run player and therefore update their beliefs too slowly in order for the long-run player to find it profitable to try to build a reputation. Such cases may occur when, in addition to Stackelberg ("good") types, the set of commitment types also includes "bad" or "inept" types: players who always play the action that the short-run players like least. In the eBay-type example, a "bad" type corresponds to a player who always cheats. If short-run players have a substantial prior belief that the long-run player may be a "bad" type, then the structure of the game may not allow them to update their beliefs fast enough to make it worthwhile for the long-run player to try to acquire a reputation.

Diamond's [1989] analysis of reputation formation in debt markets presents an example of such a setting. In Diamond's model there are three types of borrowers: safe borrowers, who always select safe projects (i.e., projects with zero probability of default); risky borrowers, who always select risky projects (i.e., projects with higher returns if successful but with nonzero probability of default); and strategic borrowers, who will select the type of project that maximizes their long-term expected payoff. The objective of lenders is to maximize their long-term return by offering competitive interest rates, while at the same time being able to distinguish profitable from unprofitable borrowers. Lenders do not observe a borrower's choice of projects, but they do have access to her history of defaults. In Diamond's model, if lenders believe that the initial fraction of risky borrowers is significant, then despite the reputation mechanism, at the beginning of the game interest rates will be so high that strategic players have an incentive to select risky projects.
Some of them will default and will exit the game. Others will prove lucky and will begin to be considered safe players. It is only after lucky strategic players have already acquired some initial reputation (and therefore begin to receive lower interest rates) that it becomes optimal for them to begin "masquerading" as safe players by consciously choosing safe projects in order to maintain their good reputation.

20.4.2.2 Steady State (or Lack Thereof)
Reputation games are ideally characterized by an equilibrium in which the long-run player repeatedly plays the Stackelberg action with high probability and the player's reputation converges to the Stackelberg type. The existence of such equilibria crucially depends on the ability to perfectly monitor the outcomes of individual stage games. In games with perfect public monitoring of stage game outcomes, such a steady state almost always exists. For example, consider the "eBay game" that serves as an example throughout
this section, with the added assumption that buyers perfectly and truthfully observe and report the seller's action. In such cases, the presence of even a single negative rating on a seller's feedback history reveals the fact that the seller is not honest. From then on, buyers will always choose the low bid in perpetuity. Since such an outcome is not advantageous for the seller, reputation considerations will induce the seller to cooperate forever.

The situation changes radically if monitoring of outcomes is imperfect. In the eBay example, imperfect monitoring means that even when the seller produces high quality there is a possibility that an eBay buyer will post a negative rating, and, conversely, even when the seller produces low quality, the buyer may post a positive rating. A striking result is that in such "noisy" environments, reputations cannot be sustained indefinitely. If a strategic player stays in the game long enough, short-run players will eventually learn his true type and the game will inevitably revert to one of the static Nash equilibria [Cripps, Mailath, and Samuelson, 2002].

To see the intuition behind this result, note that reputations under perfect monitoring are typically supported by a trigger strategy. Deviations from the equilibrium strategy reveal the type of the deviator and are punished by a switch to an undesirable equilibrium of the resulting complete-information continuation game. In contrast, when monitoring is imperfect, individual deviations neither completely reveal the deviator's type nor trigger punishments. Instead, the long-run convergence of beliefs ensures that eventually any current signal of play has an arbitrarily small effect on the uninformed player's beliefs. As a result, a player trying to maintain a reputation ultimately incurs virtually no cost (in terms of altered beliefs) from indulging in a single small deviation from Stackelberg play. But the long-run effect of many such small deviations from the commitment strategy is to drive the equilibrium to full revelation.

Holmstrom's "career concerns" paper provides an early special case of this striking result: the longer an employee has been on the market, the more "solid" the track record she has acquired and the less important her current actions in influencing the market's future assessment of her ability. This provides diminishing incentives for her to keep working hard. Cripps, Mailath, and Samuelson's result, then, states that if the employee stays on the market for a really long time, these dynamics will lead to an eventual loss of her reputation.

These dynamics have important repercussions for systems like eBay. If eBay makes the entire feedback history of a seller available to buyers (as it does today) and if an eBay seller stays on the system long enough, the above result predicts that once he establishes an initial reputation for honesty, he will be tempted to occasionally cheat buyers. In the long run, this behavior will lead to an eventual collapse of his reputation and therefore of cooperative behavior. The conclusion is that, if buyers pay attention to a seller's entire feedback history, eBay's current mechanism fails to sustain long-term cooperation.

20.4.2.3 Endgame Considerations
Since reputation relies on a tradeoff between current "restraint" and the promise of future gains, in finitely repeated games incentives to maintain a reputation diminish and eventually disappear as the end of the game comes close.
A possible solution is to assign some postmortem value to reputation, so that players find it optimal to maintain it throughout the game. For example, reputations can be viewed as assets that can be bought and sold in a market for reputations. Tadelis [1999] shows that a market for reputations is indeed sustainable. Furthermore, the existence of such a market provides “old” agents and “young” agents with equal incentives to exert effort [Tadelis, 2002]. However, the long-run effects of introducing such a market can be quite complicated since good reputations are then likely to be purchased by “inept” agents for the purpose of depleting them [Mailath and Samuelson, 2001; Tadelis, 2002]. Further research is needed in order to fully understand the long-term consequences of introducing markets for reputations as well as for transferring these promising concepts to the online domain.
20.5 New Opportunities and Challenges of Online Mechanisms
In Section 20.2, a number of differences between online reputation mechanisms and traditional word-of-mouth networks were discussed. This section surveys our progress in understanding the opportunities and challenges that these special properties imply.
20.5.1 Understanding the Impact of Scalability
Bakos and Dellarocas [2002] model the impact of information technology on online feedback mechanisms in the context of a comparison of the social efficiency of litigation and online feedback. They observe that online feedback mechanisms provide linkages between otherwise disconnected smaller markets (each having its own informal word-of-mouth networks) in which a firm operates. This, in turn, is equivalent to increasing the discount factor of the firm when considering the future impacts of its behavior on any given transaction. In trading relationships, a minimum discount factor is necessary to make reputation effects productive at all in inducing cooperative behavior. Once this threshold is reached, however, the power of reputation springs to life in a discontinuous fashion and high levels of cooperation can be supported. Thus, the vastly increased potential scale of Internet-based feedback mechanisms and the resulting ability to cover a substantial fraction of economic transactions are likely to render these mechanisms into powerful quality assurance institutions in environments where the effectiveness of traditional word-of-mouth networks has heretofore been limited. The social, economic, and perhaps even political consequences of such a trend deserve careful study.
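A back-of-the-envelope illustration of the discount-factor threshold, using the stage-game payoffs of Figure 20.4 and a simple trigger-strategy argument rather than the richer model of the cited paper: a seller who is trusted earns 2 per period, while a one-time deviation earns 3 today and the static Nash payoff of 1 in every later period, so honest play is self-enforcing for a per-period discount factor \(\delta\) whenever

\[
\frac{2}{1-\delta} \;\ge\; 3 + \frac{\delta}{1-\delta}\cdot 1
\quad\Longleftrightarrow\quad 2 \;\ge\; 3 - 2\delta
\quad\Longleftrightarrow\quad \delta \;\ge\; \tfrac{1}{2}.
\]

Linking previously disconnected markets means that a larger share of a firm's future business observes any given transaction, which acts like an increase in \(\delta\) and can push the firm past such a threshold; below it, reputation has no bite at all, which is the discontinuity referred to above.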
20.5.2 Eliciting Sufficient and Honest Feedback
Most game theoretic models of reputation formation assume that stage game outcomes (or imperfect signals thereof) are publicly observed. Online reputation mechanisms, in contrast, rely on private monitoring of stage game outcomes and voluntary feedback submission. This introduces two important new considerations: (1) ensuring that sufficient feedback is, indeed, provided and (2) inducing truthful reporting.

Economic theory predicts that voluntary feedback will be underprovided. There are two main reasons for this. First, feedback constitutes a public good: once available, everyone can costlessly benefit from it. Voluntary provision of feedback leads to suboptimal supply, since no individual takes account of the benefits that her provision gives to others. Second, provision of feedback presupposes that the rater will assume the risks of transacting. Such risks are highest for new products: prospective consumers may be tempted to wait until more information is available. However, unless somebody decides to take the risk of becoming an early evaluator, no feedback will ever be provided. Avery, Resnick, and Zeckhauser [1999] analyze mechanisms whereby early evaluators are paid to provide information and later evaluators pay so as to balance the budget. They conclude that any two of three desirable properties for such a mechanism can be achieved, but not all three, the three properties being voluntary participation, no price discrimination, and budget balance.

Since monitoring is private and assessments usually subjective, an additional consideration is whether feedback is honest. Miller, Resnick, and Zeckhauser [2002] propose a mechanism for eliciting honest feedback based on the technique of proper scoring rules. A scoring rule is a method for inducing decision makers to reveal their true beliefs about the distribution of a random variable by rewarding them based on the actual realization of the random variable and their announced distribution [Cooke, 1991]. A proper scoring rule has the property that the decision maker maximizes the expected score when he truthfully announces his belief about the distribution. Their mechanism works as long as raters are assumed to act independently.

Collusive behavior can defeat proper scoring rules. Unfortunately, online environments are particularly vulnerable to collusion. The development of effective mechanisms for dealing with collusive efforts to manipulate online ratings is currently an active area of research. Dellarocas [2000, 2004] explores the use of robust statistics in
aggregating individual ratings as a mechanism for reducing the effects of coordinated efforts to bias ratings. To date, however, there is no effective solution that completely eliminates the problem.
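To make the proper-scoring-rule idea concrete, the following minimal sketch (an illustration only, not the mechanism of Miller, Resnick, and Zeckhauser [2002]) uses the logarithmic scoring rule: a rater is rewarded with the logarithm of the probability she assigned to the outcome that actually occurred, so her expected reward is maximized only when she announces her true belief.

```python
import math

def log_score(announced, outcome):
    """Logarithmic scoring rule: reward = ln(probability the rater
    assigned to the outcome that actually occurred)."""
    return math.log(announced[outcome])

def expected_score(true_belief, announced):
    """Expected reward when outcomes are drawn from true_belief but the
    rater reports `announced`."""
    return sum(p * log_score(announced, o) for o, p in true_belief.items())

# A rater who believes the seller delivers "high" quality 70% of the time.
belief = {"high": 0.7, "low": 0.3}

honest = expected_score(belief, {"high": 0.7, "low": 0.3})
exaggerated = expected_score(belief, {"high": 0.95, "low": 0.05})
understated = expected_score(belief, {"high": 0.4, "low": 0.6})

print(honest, exaggerated, understated)
# Honest reporting yields the highest expected score: roughly -0.611,
# versus about -0.935 and -0.795 for the two misreports.
```

The quadratic (Brier) rule has the same truth-telling property; as noted above, no such rule by itself survives collusion among raters.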
20.5.3 Exploiting the Information Processing Capabilities of Feedback Mediators

Most game theoretic models of reputation assume that short-run players have access to the entire past history of stage game outcomes and update their prior beliefs by repeated application of Bayes' rule on that information. Online feedback mediators completely control the amount and type of information that is made available to short-run players. This opens an entire range of new possibilities: for example, feedback mediators can hide the detailed history of past feedback from short-run players and replace it with a summary statistic (such as the sum, mean, or median of past ratings) or with any other function of the feedback history. They can filter outlying or otherwise suspect ratings. They can offer personalized feedback profiles; that is, present different information about the same long-run player to different short-run players. Such information transformations can have nontrivial effects on the resulting equilibria and can allow online reputation mechanisms to induce outcomes that are difficult or impossible to attain in standard settings. The following are two examples of what can be achieved.

As discussed in Section 20.4.2.2, in environments with imperfect monitoring, traditional reputation models predict that reputations are not sustainable; once firms build a reputation they are tempted to "rest on their laurels," and this behavior ultimately leads to a loss of reputation. Economists have used a variety of devices to construct models that do not exhibit this undesirable behavior. For instance, Mailath and Samuelson [1998] assume that in every period there is a fixed, exogenous probability that the type of the firm might change. Horner [2002] proposes a model in which competition among firms induces them to exert sustained effort. Online feedback mediators provide yet another, perhaps much more tangible, approach to eliminating such problems: by designing the mediator to publish only recent feedback, firms are given incentives to constantly exert high effort. In the context of eBay, this result argues for the elimination of the detailed feedback history from feedback profiles and the use of summaries of recent ratings as the primary focal point of decision-making. Dellarocas [2003a] studied the equilibria induced by a variation of eBay's feedback mechanism in which the only information available to buyers is the sum of positive and negative ratings posted on a seller during the most recent N transactions. He found that, in trading environments with opportunistic sellers, imperfect monitoring of a seller's effort level, and two possible transaction outcomes (corresponding to "high" and "low" quality, respectively), such a mechanism induces high levels of cooperation that remain stable over time. Furthermore, the long-run payoffs are independent of the size of the window N: a mechanism that only publishes the single most recent rating is just as efficient as a mechanism that summarizes larger numbers of ratings.

A second example of improving efficiency through proper mediator design can be found in Dellarocas [2002], which studied settings in which a monopolist sells products of various qualities and announces the quality of each product. The objective of a feedback mechanism in such settings is to induce truthful announcements.
Once again, Cripps, Mailath, and Samuelson’s [2002] result predicts that, in noisy environments, a mechanism that simply publishes the entire history of feedback will not lead to sustainable truth-telling. Dellarocas proposes a mechanism that acts as an intermediary between the seller and the buyers. The mechanism does not publish the history of past ratings. Instead, it keeps track of discrepancies between past seller quality announcements and corresponding buyer feedback and then punishes or rewards the seller by “distorting” the seller’s subsequent quality announcements so as to compensate for whatever “unfair” gains or losses he has realized by misrepresenting the quality of his items. If consumers are risk-averse, at equilibrium this induces the seller to truthfully announce quality throughout the infinite version of the game.
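For illustration, the sketch below captures the flavor of the window-based designs discussed in this subsection (it is a simplification for exposition, not Dellarocas's model): the mediator retains only a seller's most recent N binary ratings and publishes just their summary, so older feedback stops mattering and the seller must keep exerting effort.

```python
from collections import deque

class WindowedFeedbackProfile:
    """Publishes only a summary of the most recent N binary ratings;
    older feedback is silently dropped."""

    def __init__(self, window_size):
        # True = positive rating, False = negative rating
        self.ratings = deque(maxlen=window_size)

    def add_rating(self, positive):
        self.ratings.append(positive)

    def summary(self):
        positives = sum(self.ratings)
        negatives = len(self.ratings) - positives
        return {"positive": positives, "negative": negatives}

profile = WindowedFeedbackProfile(window_size=10)
for outcome in [True] * 12 + [False] * 3:
    profile.add_rating(outcome)
print(profile.summary())  # {'positive': 7, 'negative': 3}: only the last 10 ratings count
```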
20.5.4 Coping with Easy Name Changes

In online communities it is usually easy for members to disappear and reregister under a completely different online identity at zero or very low cost. Friedman and Resnick [2001] refer to this property as "cheap pseudonyms." It hinders the effectiveness of reputation mechanisms: community members can build a reputation, milk it by cheating other members, and then vanish and reenter the community with a new identity and a clean record.

Friedman and Resnick [2001] discuss two classes of approaches to this issue: either make it more difficult to change online identities, or structure the community in such a way that exit and reentry with a new identity becomes unprofitable. The first approach makes use of cryptographic authentication technologies and is outside the scope of this chapter. The second approach is based on imposing an upfront cost on each new entrant, such that the benefits of "milking" one's reputation are exceeded by the cost of subsequent reentry. This cost can be an explicit entrance fee or the implicit cost of having to go through an initial reputation-building (or "dues-paying") phase with low or negative profits. Friedman and Resnick [2001] show that, although dues-paying approaches incur efficiency losses, such losses are an inevitable consequence of easy name changes. Dellarocas [2003a] shows how such a dues-paying approach can be implemented in an eBay-like environment where feedback mediators only publish the sum of recent ratings. He proves that, in the presence of easy name changes, the design that results in optimal social efficiency is one where the mechanism sets the initial profile of new members to correspond to the "worst" possible reputation.2 He further shows that, although this design incurs efficiency losses relative to the case where identity changes are not possible, its efficiency is the highest attainable by any mechanism if players can costlessly change their identities.
2 For example, if the mechanism summarizes the 10 most recent ratings, newcomers would begin the game with a profile indicating that all 10 recent ratings were negative. An additional assumption is that buyers cannot tell how long a given seller has been on the market and therefore cannot distinguish between newcomers with "artificially tarnished" profiles and dishonest players who have genuinely accumulated many negative ratings.

20.5.5 Exploring Alternative Architectures

The preceding discussion has assumed a centralized architecture in which feedback is explicitly provided and a single trusted mediator controls feedback aggregation and distribution. Though the design possibilities of even that simple architecture are not yet fully understood, centralized reputation mechanisms do not begin to exhaust the new possibilities offered by information technology. In recent years the field of multiagent systems [Jennings, Sycara, and Wooldridge, 1998] has been actively researching online reputation systems as a technology for building trust and inducing good behavior in artificial societies of software agents. Two lines of investigation stand out as particularly novel and promising.

20.5.5.1 Reputation Formation Based on Analysis of "Implicit Feedback"

In our networked society, several traces of an agent's activities can be found in publicly accessible databases. Instead of (or in addition to) relying on explicitly provided feedback, automated reputation mechanisms can potentially infer aspects of an agent's attributes, social standing, and past behavior through the collection and analysis of such "implicit feedback" information. Perhaps the most successful application of this approach to date is the Google search engine. Google assigns a measure of reputation to each Web page that matches the keywords of a search request and then uses that measure to rank the search hits. Google's page reputation measure is based on the number of links that point to a page, the number of links that point to the pointing page, and so on [Brin and Page, 1998]. The underlying assumption is that if enough people consider a page important enough to link to it from their own pages, and if the pointing pages
are "reputable" themselves, then the information contained on the target page is likely to be valuable. Google's success in returning relevant results is testimony to the promise of that approach. Pujol, Sangüesa, and Delgado [2002] propose a generalization of the above algorithm that "extracts" the reputation of nodes in a general class of social networks. Sabater and Sierra [2002] describe how direct experience, explicit feedback, and implicit feedback can be combined into a single reputation mechanism. Basing reputation formation on implicit information is a promising solution to the problems of eliciting sufficient and truthful feedback. Careful modeling of the benefits and limitations of this approach is needed in order to determine in what settings it might be a viable substitute for, or complement to, voluntary feedback provision.

20.5.5.2 Decentralized Reputation Architectures

Our discussion of reputation mechanisms has so far implicitly assumed the honesty of feedback mediators. Alas, mediators are also designed and operated by parties whose interests may sometimes diverge from those of community participants. Decentralizing the sources of reputation is a promising approach for achieving robustness in the presence of potentially dishonest mediators and for addressing privacy concerns. A number of decentralized reputation mechanisms have recently been proposed (Zacharia, Moukas, and Maes [2000]; Mui, Szolovits, and Ang [2001]; Sen and Sajja [2002]; Yu and Singh [2002]). The emergence of peer-to-peer networks provides a further motivation for developing decentralized reputation systems. In such networks, reputation systems represent a promising mechanism for gauging the reliability and honesty of the various network nodes. Initial attempts to develop reputation mechanisms for peer-to-peer networks are reported in Aberer and Despotovic [2001], Kamvar, Schlosser, and Garcia-Molina [2003], and Xiong and Liu [2003]. Though novel and intriguing, none of these works provides a rigorous analysis of the behavior induced by the proposed mechanisms or an explicit discussion of their advantages relative to other alternatives. More collaboration is needed in this promising direction between computer scientists, who better understand the new possibilities offered by technology, and social scientists, who better understand the tools for evaluating the potential impact of these new systems.
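To make the link-based notion of reputation in Section 20.5.5.1 concrete, the following is a bare-bones power-iteration computation in the spirit of the measure described by Brin and Page [1998]; it is a toy sketch and omits the many refinements a production ranking system would use.

```python
def link_reputation(out_links, damping=0.85, iterations=50):
    """Iteratively computes a reputation score for each node of a link
    graph: a node is reputable if reputable nodes point to it, with each
    pointing node's influence split among its out-links."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in out_links.items():
            if not targets:                       # dangling node: spread its rank evenly
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
            else:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Tiny example graph: page "c" is pointed to by both "a" and "b".
graph = {"a": ["c"], "b": ["a", "c"], "c": ["a"]}
print(link_reputation(graph))  # "a" and "c" end up with higher scores than "b"
```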
20.6 Conclusions

Online reputation mechanisms harness the remarkable ability of the Internet not only to disseminate but also to collect and aggregate information from large communities at very low cost in order to artificially construct large-scale word-of-mouth networks. Such networks have historically proven to be effective social control mechanisms in settings where information asymmetries can adversely impact the functioning of a community and where formal contracting is unavailable, unenforceable, or prohibitively expensive. They are fast emerging as a promising alternative to more established trust-building mechanisms in the digital economy.

The design of such mechanisms can greatly benefit from the insights produced by more than 20 years of economics and game theory research on the topic of reputation. These results need to be extended to take into account the unique new properties of online mechanisms, such as their unprecedented scalability, the ability to precisely design the type of feedback information that is solicited and distributed, the volatility of online identities, and the relative lack of contextual cues to assist interpretation of what is, essentially, subjective information.

The most important conclusion drawn from this survey is that reputation is a powerful but subtle and complicated concept. Its power to induce cooperation without the need for costly and inefficient enforcement institutions is the basis of its appeal. On the other hand, its effectiveness is often ephemeral and depends on a number of additional tangible and intangible environmental parameters. In order to translate these initial results into concrete guidance for implementing and participating in effective reputation mechanisms, further advances are needed in a number of important areas. The
following list contains what the author considers to be the research imperatives in the most important open areas of reputation mechanism design:

• Scope and explore the design space and limitations of mediated reputation mechanisms. Understand what set of design parameters works best in what settings. Develop formal models of those systems in both monopolistic and competitive settings.
• Develop effective solutions to the problems of sufficient participation, easy identity changes, and strategic manipulation of online feedback.
• Conduct theory-driven experimental and empirical research that sheds more light on buyer and seller behavior vis-à-vis such mechanisms.
• Compare the relative efficiency of reputation mechanisms to that of more established mechanisms for dealing with information asymmetries (such as state-backed contractual guarantees and brand-name building) and develop theory-driven guidelines for deciding which set of mechanisms to use and when.
• Identify new domains where reputation mechanisms can be usefully applied.

Online reputation mechanisms attempt to artificially engineer social phenomena that have heretofore emerged naturally. Through the use of information technology, what had traditionally fallen within the realm of the social sciences is, to a large extent, being transformed into an engineering design problem. The potential to engineer social outcomes through the introduction of carefully crafted information systems is opening a new chapter on the frontiers of information technology. It introduces new methodological challenges that require collaboration between several traditionally distinct disciplines, such as economics, computer science, management science, sociology, and psychology, in order to be properly addressed. Our networked societies will benefit from further research in this exciting area.
References

Aberer, K., Z. Despotovic. 2001. Managing trust in a peer-2-peer information system. Proceedings of the Tenth International Conference on Information and Knowledge Management. Association for Computing Machinery, Atlanta, GA, 310–317.
Avery, C., P. Resnick, R. Zeckhauser. 1999. The Market for Evaluations. American Economic Review 89(3): 564–584.
Bajari, P., A. Hortacsu. 2003. Winner's Curse, Reserve Prices and Endogenous Entry: Empirical Insights from eBay Auctions. Rand Journal of Economics 34(2).
Bakos, Y., C. Dellarocas. 2002. Cooperation without Enforcement? A Comparative Analysis of Litigation and Online Reputation as Quality Assurance Mechanisms. L. Applegate, R. Galliers, J. I. DeGross, Eds. Proceedings of the 23rd International Conference on Information Systems (ICIS 2002). Association for Information Systems, Barcelona, Spain, 127–142.
Benson, B. 1989. The Spontaneous Evolution of Commercial Law. Southern Economic Journal 55(January): 644–661. Reprinted in D. B. Klein, Ed. 1997. Reputation: Studies in the Voluntary Elicitation of Good Conduct. University of Michigan Press, Ann Arbor, MI, 165–189.
Brin, S., L. Page. 1998. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Computer Networks and ISDN Systems 30(1–7): 107–117.
Cripps, M., G. Mailath, L. Samuelson. 2002. Imperfect Monitoring and Impermanent Reputations. Penn Institute for Economic Research Working Paper 02-021, University of Pennsylvania, Philadelphia, PA (available at: http://www.econ.upenn.edu/Centers/pier/Archive/02-021.pdf).
Dellarocas, C. 2000. Immunizing online reputation reporting systems against unfair ratings and discriminatory behavior. Proceedings of the 2nd ACM Conference on Electronic Commerce. Association for Computing Machinery, Minneapolis, MN, 150–157.
Dellarocas, C. 2002. Goodwill Hunting: An economically efficient online feedback mechanism in environments with variable product quality. J. Padget, O. Shehory, D. Parkes, N. Sadeh, W. Walsh, Eds. Agent-Mediated Electronic Commerce IV. Designing Mechanisms and Systems. Lecture Notes in Computer Science 2351, Springer-Verlag, Berlin, 238–252.
Dellarocas, C. 2003a. Efficiency and Robustness of Binary Feedback Mechanisms in Trading Environments with Moral Hazard. MIT Sloan Working Paper No. 4297-03, Massachusetts Institute of Technology, Cambridge, MA (available at: http://ssrn.com/abstract=393043).
Dellarocas, C. 2003b. The Digitization of Word-of-Mouth: Promise and Challenges of Online Feedback Mechanisms. Management Science (October 2003).
Dellarocas, C. 2004. Building Trust On-Line: The Design of Robust Reputation Mechanisms for Online Trading Communities. G. Doukidis, N. Mylonopoulos, N. Pouloudi, Eds. Social and Economic Transformation in the Digital Era. Idea Group Publishing, Hershey, PA.
Dewan, S., V. Hsu. 2002. Adverse Selection in Reputations-Based Electronic Markets: Evidence from Online Stamp Auctions. Working Paper, Graduate School of Management, University of California, Irvine, CA (available at: http://web.gsm.uci.edu/~sdewan/Home%20Page/Adverse%20Selection%20in%20Reputations.pdf).
Diamond, D. 1989. Reputation Acquisition in Debt Markets. Journal of Political Economy 97(4): 828–862.
Friedman, E., P. Resnick. 2001. The Social Cost of Cheap Pseudonyms. Journal of Economics and Management Strategy 10(1): 173–199.
Fudenberg, D., D. Levine. 1992. Maintaining a Reputation When Strategies are Imperfectly Observed. Review of Economic Studies 59(3): 561–579.
Greif, A. 1993. Contract Enforceability and Economic Institutions in Early Trade: The Maghribi Traders' Coalition. American Economic Review 83(June): 525–548.
Holmstrom, B. 1999. Managerial Incentive Problems: A Dynamic Perspective. Review of Economic Studies 66(1): 169–182.
Horner, J. 2002. Reputation and Competition. American Economic Review 92(3): 644–663.
Houser, D., J. Wooders. 2000. Reputation in Auctions: Theory and Evidence from eBay. Department of Economics Working Paper 00-01, The University of Arizona, Tucson, AZ (available at: http://infocenter.ccit.arizona.edu/~econ/working-papers/Internet_Auctions.pdf).
Jennings, N., K. Sycara, M. Wooldridge. 1998. A Roadmap of Agent Research and Development. Autonomous Agents and Multi-Agent Systems 1(1): 275–306.
Kamvar, S. D., M. T. Schlosser, H. Garcia-Molina. 2003. The Eigentrust algorithm for reputation management in P2P networks. Proceedings of the 12th International Conference on World Wide Web. Association for Computing Machinery, Budapest, Hungary, 640–651.
Klein, D., Ed. 1997. Reputation: Studies in the Voluntary Elicitation of Good Conduct. University of Michigan Press, Ann Arbor, MI.
Klein, B., K. Leffler. 1981. The Role of Market Forces in Assuring Contractual Performance. Journal of Political Economy 89(4): 615–641.
Kollock, P. 1999. The Production of Trust in Online Markets. E. J. Lawler, M. Macy, S. Thyne, and H. A. Walker, Eds. Advances in Group Processes (Vol. 16). JAI Press, Greenwich, CT.
Kreps, D., P. Milgrom, J. Roberts, R. Wilson. 1982. Rational Cooperation in the Finitely Repeated Prisoners' Dilemma. Journal of Economic Theory 27(2): 245–252.
Kreps, D., R. Wilson. 1982. Reputation and Imperfect Information. Journal of Economic Theory 27(2): 253–279.
Lucking-Reiley, D., D. Bryan, et al. 2000. Pennies from eBay: The Determinants of Price in Online Auctions. Working Paper, University of Arizona, Tucson, AZ (available at: http://eller.arizona.edu/~reiley/papers/PenniesFromEBay.html).
Mailath, G. J., L. Samuelson. 1998. Your Reputation Is Who You're Not, Not Who You'd Like to Be. Center for Analytic Research in Economics and the Social Sciences (CARESS) Working Paper 98-11, University of Pennsylvania, Philadelphia, PA (available at: http://www.ssc.upenn.edu/~gmailath/wpapers/rep-is-sep.html).
Mailath, G. J., L. Samuelson. 2001. Who Wants a Good Reputation? Review of Economic Studies 68(2): 415–441.
Mayzlin, D. 2003. Promotional Chat on the Internet. Working Paper #MK-14, Yale School of Management, New Haven, CT.
Milgrom, P. R., D. North, B. R. Weingast. 1990. The Role of Institutions in the Revival of Trade: The Law Merchant, Private Judges, and the Champagne Fairs. Economics and Politics 2(1): 1–23. Reprinted in D. B. Klein, Ed. 1997. Reputation: Studies in the Voluntary Elicitation of Good Conduct. University of Michigan Press, Ann Arbor, MI, 243–266.
Milgrom, P., J. Roberts. 1982. Predation, Reputation and Entry Deterrence. Journal of Economic Theory 27(2): 280–312.
Miller, N., P. Resnick, R. Zeckhauser. 2002. Eliciting Honest Feedback in Electronic Markets. Research Working Paper RWP02-039, Harvard Kennedy School, Cambridge, MA (available at: http://www.si.umich.edu/~presnick/papers/elicit/index.html).
Mui, L., P. Szolovits, C. Ang. 2001. Collaborative sanctioning: applications in restaurant recommendations based on reputation. Proceedings of the 5th International Conference on Autonomous Agents. Association for Computing Machinery, Montreal, Quebec, Canada, 118–119.
Pujol, J. M., R. Sanguesa, J. Delgado. 2002. Extracting reputation in multi agent systems by means of social network topology. Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems. Association for Computing Machinery, Bologna, Italy, 467–474.
Resnick, P., R. Zeckhauser, E. Friedman, K. Kuwabara. 2000. Reputation Systems. Communications of the ACM 43(12): 45–48.
Resnick, P., R. Zeckhauser. 2002. Trust Among Strangers in Internet Transactions: Empirical Analysis of eBay's Reputation System. Michael R. Baye, Ed. The Economics of the Internet and E-Commerce (Advances in Applied Microeconomics, Vol. II). JAI Press, Greenwich, CT.
Resnick, P., R. Zeckhauser, J. Swanson, K. Lockwood. 2002. The Value of Reputation on eBay: A Controlled Experiment. Working Paper, University of Michigan, Ann Arbor, MI (available at: http://www.si.umich.edu/~presnick/papers/postcards/index.html).
Sabater, J., C. Sierra. 2002. Reputation and social network analysis in multi-agent systems. Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems. Association for Computing Machinery, Bologna, Italy, 475–482.
Sen, S., N. Sajja. 2002. Robustness of reputation-based trust: boolean case. Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems. Association for Computing Machinery, Bologna, Italy, 288–293.
Shapiro, C. 1983. Premiums for High Quality Products as Returns to Reputations. The Quarterly Journal of Economics 98(4): 659–680.
Tadelis, S. 1999. What's in a Name? Reputation as a Tradeable Asset. The American Economic Review 89(3): 548–563.
Tadelis, S. 2002. The Market for Reputations as an Incentive Mechanism. Journal of Political Economy 92(2): 854–882.
Wilson, R. 1985. Reputations in Games and Markets. A. Roth, Ed. Game-Theoretic Models of Bargaining. Cambridge University Press, Cambridge, U.K., 27–62.
Xiong, L., L. Liu. 2003. A Reputation-Based Trust Model for Peer-to-Peer eCommerce Communities. IEEE Conference on E-Commerce (CEC'03), Newport Beach, CA, June 2003.
Yu, B., M. Singh. 2002. An evidential model of distributed reputation management. Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems. Association for Computing Machinery, Bologna, Italy, 294–301.
Zacharia, G., A. Moukas, P. Maes. 2000. Collaborative Reputation Mechanisms in Electronic Marketplaces. Decision Support Systems 29(4): 371–388.
21 Digital Rights Management
Mikhail Atallah, Keith Frikken, Carrie Black, Susan Overstreet, and Pooja Bhatia
CONTENTS
Abstract
21.1 Introduction
21.2 Overview
21.3 Digital Rights Management Tools
  21.3.1 Software Cracking Techniques and Tools
  21.3.2 Protection Mechanisms
  21.3.3 Further Remarks About Protection Mechanisms
21.4 Legal Issues
Acknowledgment
References
Abstract

This chapter surveys Digital Rights Management (DRM). The primary objective of DRM is to protect the rights of copyright owners of digital media, while protecting the privacy and the usage rights of the users. We review the various approaches to DRM, along with their strengths and weaknesses. We also survey the tools that are used to circumvent DRM protections, and techniques that are commonly used for making DRM protections more resilient against attack. We also take a brief look at the legal issues of DRM. The chapter gives an overview of these techniques and issues without going into intricate technical details. While the reader cannot expect to find all DRM topics in this chapter, nor will the chapter's coverage of its topics be complete, the reader should be able to obtain sufficient information for initial inquiries, along with references to more in-depth literature.
21.1 Introduction

The ability to create exact replicas of digital data reduces the ability to collect payment for an item, and since there is no degradation of quality for these easily made copies, this "piracy" is damaging to the owners of digital content. The fear of piracy has kept many content owners from embracing the Internet as a distribution channel, and has kept them selling their wares on physical media and in physical stores. Compared to the digital distribution of data, which requires no physical media and no middlemen, this is not an economically efficient way of doing business, but it has been lucrative for content owners; they are in no rush to make it easier for pirates to steal their movies. However, this "physical delivery" model's profitability for the content owners has decreased with the online availability of pirated versions of their content, which is more convenient to download than taking a trip to the store (and this pirated version is also "free" to the unscrupulous). Some content owners have recently turned to online delivery, using a mix of DRM technology and legal means to protect their rights, with mixed results. The protections are typically
defeated by determined attackers, sometimes after considerable effort. Some of the deployed and proposed techniques for protecting digital data have potential for serious damage to consumer privacy.

Digital Rights Management, more commonly known as DRM, aims at making possible the distribution of digital content while protecting the rights of the parties involved (mainly the content owners and the consumers). The content owners run risks such as the unauthorized duplication and redistribution of their digital content (piracy), or violations of use policies by a user, whereas the consumers run risks that include potential encroachments on their privacy as a result of DRM technology. Any computer user also runs the risk that, because of a combination of DRM-related legislation (that mandates one particular approach to DRM) and corporate decisions from within the IT and entertainment industries, future computers may become too constrained in their architectures and too inflexible to continue being the wonderful tools for creation and tinkering that they have been in the past. While it may not be possible to eliminate all these risks using purely technological means (because legal, ethical, business, and other societal issues play important roles), sound DRM technology will play a crucial role in mitigating these risks. DRM technology also has many other "side uses," such as ensuring that only authorized applications run (e.g., in a taximeter, odometer, or any situation where tampering is feared) and that only valid data is used.

This chapter is a brief introduction to DRM. Before we delve further into DRM-related issues, we need to dispel several misconceptions about DRM, including that DRM is primarily about antipiracy, that DRM and access control are the same problem, and that DRM is a purely technical problem. DRM is about more than antipiracy. It is just as much about new business models, new business and revenue opportunities for content owners and distributors, and wide-ranging policy enforcement (preventing uncontrolled copying and distribution is but one kind of policy). As we shall see later in this chapter, DRM technologies can also have profound implications for antivirus protection, integrity checking and preservation, and the security of computer systems and networks in general.

The line between DRM and access control is blurred by many DRM proponents. Access control is a different problem than DRM; the existing and proposed solutions for access control are simpler, less restrictive, and have less wide-ranging implications for system architectures than those for DRM. How does DRM differ from access control? The latter is about control of clients' access to server-side content, whereas the former is about control of clients' access to client-side content. For example, if the server wants to make sure that only authorized personnel access a medical records database through the network, this is access control; but if the software on the client's machine wants to allow a user to view a movie only in a specified manner (for example, the user can only view the movie three times, or within the next 30 days, or the user cannot copy the movie, etc.), then this is DRM. A remarkable number of otherwise knowledgeable corporate spokespersons misrepresent DRM technology or proposed DRM legislation as "protecting the user," when the truth is that the user is the adversary from whom they seek protection and whom they wish to constrain.
DRM is largely about constraining and limiting the user's ability to use computers and networks. It may be socially desirable to so constrain the user, and thus to protect the revenue stream of digital content owners and distributors (the creators of digital content have grocery bills to pay, too). But it should be stated for what it is: protection from the user, not of the user. The interested user may consent to such constraints as a condition for using a product or system, viewing a movie, etc. However, as we will see below, there are attempts to impose an exorbitant dose of DRM-motivated costs on all users, even legally constraining what digital devices (including computers) can and cannot do. The rest of this chapter proceeds as follows: Section 2 gives an overview of the general techniques used in DRM; Section 3 discusses the tools used for DRM protection (encryption wrappers, watermarking, code obfuscation, trusted hardware, etc.) and the tools that attackers use to circumvent DRM technologies; and Section 4 discusses legal (as well as social and political) issues related to DRM.
21.2 Overview

The purpose of this section is to provide an overview of how DRM systems operate. There are various design questions that need to be addressed, and in many situations there is no clear, strictly superior solution.
Content owners will have differing requirements that they desire to place upon their content (e.g., how many times it can be viewed, whether it can be copied or modified, etc.). Hence, to have a general DRM system there must be some mechanism for describing this range of policies. Metadata is used for this purpose; metadata describes the content and the policies relating to it (ownership, usage, distribution, etc.). Standards and techniques exist for expressing, in a way that is computer-interpretable yet convenient for humans to use, the policies concerning the access and use of digital content. As elsewhere, open standards are desirable, as they ensure interoperability. Examples include XrML (originated from XEROX PARC) and ODRL (Open Digital Rights Language); XrML is becoming the dominant industry standard. The rights to be managed are not limited to duplication, but also cover modification, printing, expiration, release times, etc. They are described in a license, whose terms and conditions are verified and enforced by the reader (e.g., a movie viewer) that mediates access to the digital content.

With a standard for expressing metadata, there is a design issue that needs to be addressed: should the metadata be embedded into the content or should it be stored separately? An advantage of keeping the content and the metadata separate is that the policies can be updated after purchase (e.g., if the customer wants to buy three additional viewings of the movie, this can be done without having to redownload the movie). When the metadata is embedded, on the other hand, the policies for the file are self-contained, and any attack against the file must either change the metadata or prevent applications from looking for it. These attacks can also be carried out when the metadata is stored separately, but there is then another avenue of attack that tries to modify the linking of the content to its metadata. Thus, embedding the metadata is likely to be more secure, but keeping it separate is more convenient.

In some commercial products the policy is stored on a remote secure server, allowing more central control and rapid modification and enforcement of policies. This has been popular with some corporations: corporate emails and internal documents are always sent in encrypted form, so viewing an email means that the mail application has to obtain the decryption key from the server, which can enforce the policy related to that email (which could be "view once only," "not printable," "not forwardable," etc.). Such a corporation can now remotely "shred" emails and documents by having the server delete the key that corresponds to them (so that they become impossible to read, even though they remain on various employee computers). A common policy like "all emails should be deleted after three months," which is next to impossible to enforce otherwise (employees simply do not comply), now becomes easily enforceable. This may be socially undesirable because it facilitates the automatic deletion of evidence of wrongdoing, but it appeals to many corporations who are worried about ancient emails causing them huge liability (as so often happens with lawsuits alleging harassment, hostility, discrimination, etc.).

What are some of the approaches one can take to prevent the unrestricted replication and dissemination of digital content?
We begin with the case of media (audio, video, text, and structured aggregates thereof) because the question of protecting media ultimately reduces to the question of protecting software (as will become clear in what follows). One way is to force access to the media to take place through approved applications ("players") that enforce the DRM policy (embedded in the media or separate). One policy could take the form of "tying" the media to the hardware, so that copying it to another machine would make it no longer viewable. (Of course, the DRM data attached to the media would now specify the approved hardware's identifying parameters.) This is problematic if the customer repairs or upgrades the hardware, and would seem to require that the customer obtain certification from some approved dealer that she did indeed perform the upgrade (or repair) and thus deserves a cost-free copy of the media (which would now be tied to the new hardware). The policy could alternatively take the form of tying the media to a particular customer, who can view it on many different kinds of hardware after undergoing some form of credentials-checking (possibly involving communication with the media owner's server). Finally, the policy could be that the customer never actually acquires the media as a file to be played at the customer's convenience; the player plays a file that resides on a remote server under the control of the media owner. However, requiring that media files reside on a remote server will add
a large communication overhead and will disallow the playing of media files on systems that are not connected to a network.

Given a choice between being able to prevent a violation of policy and being able to detect the violation and take legal action, it is obvious that prevention is preferred. However, it is often easier to detect violations than to prevent them. This is evident if one contrasts the mechanisms that need to be in place for detection versus prevention. In either case some data needs to be bound to the object (in the case of detection this would be the ownership information, while in the prevention case it would be the policies that apply to the file). For prevention, however, there is the additional step of requiring "viewers" of the data to enforce the rules, and steps need to be taken to prevent renegade players. Thus, the mechanisms that need to be in place for prevention are a superset of those for detection. Relying on the legal system for DRM protection is problematic, since many violators are in countries outside the reach of our legal system. Even with domestic violators, legal means can be difficult: for example, if there is piracy by a large number of parties, it is tedious and expensive to prosecute each case. There are advantages and disadvantages to both approaches, and it is often best to use a hybrid of the two.

To create a DRM system that is resilient to attack, one must understand how an attacker would compromise such a system. In all approaches that rely on an approved player for enforcing DRM policy, the point of attack for digital bandits becomes the player. "Cracking" the player produces a version of it that is stripped of the functionality that the digital pirates dislike (the part that does credentials-checking and enforces other DRM policies), while preserving the player's ability to view the digital media. Many "cracked" versions of such player software can be found on the Internet (e.g., on "warez" sites). Thus the question of preventing the cracking (unauthorized modification) of software is central to the protection of media. If the player proves unbreakable, then another avenue of attack is to modify the policy files (i.e., the metadata) in some way that allows more liberal access. This could be done by either removing the policy altogether or replacing it with a more "liberal" policy. A design decision for a player is how to handle files without policies. There are two options: (1) treat all such files as files over which users have complete control, or (2) treat all such files as pirated and refuse to "play" them for the user. If the second approach is used, then one does not need to worry about attacks that remove policies, but it is dangerous since it obsoletes many current systems (including ones where users have legitimate access), which may cause users to boycott the new technology.

A final issue addressed in this section is why DRM is difficult. DRM is difficult because it is not enough to just prevent an average user from breaking the DRM system. It is not difficult to prevent the average person from "cracking" a DRM scheme, but if an expert creates a "cracked" player or a tool for removing copyright protections, then this expert can make it available to the public. If this happens, then even a novice user can use it and effectively be as powerful as the expert.
Thus DRM is difficult because a single break of the system can allow many users, even those with limited technical knowledge, to circumvent it. DRM is easier in situations where it is enough to postpone the defeat of a protection "long enough" rather than stop it altogether; what "long enough" means varies according to the business in question. For example, computer game publishers make most of the revenue from a product in the first few weeks after its initial release.
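As a deliberately simplified illustration of the player-enforced policies discussed in this overview, the sketch below attaches a usage policy to a content item and has a "player" refuse to render the content when the policy is missing or exhausted. The class and field names are invented for the example; they do not correspond to XrML, ODRL, or any actual product.

```python
import time

class License:
    """Toy usage policy: a maximum number of plays and an expiry time."""
    def __init__(self, max_plays, expires_at):
        self.max_plays = max_plays
        self.expires_at = expires_at
        self.plays_used = 0

class Player:
    """A 'trusted player' that checks the license before rendering content.
    Policy-free content is treated as unplayable (the second, stricter
    option discussed above)."""
    def play(self, content, license):
        if license is None:
            raise PermissionError("no license attached; refusing to play")
        if time.time() > license.expires_at:
            raise PermissionError("license expired")
        if license.plays_used >= license.max_plays:
            raise PermissionError("allowed number of plays exhausted")
        license.plays_used += 1
        print(f"playing {content!r} ({license.plays_used}/{license.max_plays})")

movie_license = License(max_plays=3, expires_at=time.time() + 30 * 24 * 3600)
player = Player()
for _ in range(3):
    player.play("movie.mp4", movie_license)   # succeeds three times
# A fourth attempt raises PermissionError, as does playing with license=None.
```

Of course, as stressed above, such checks are only as strong as the player that performs them: a cracked player simply skips them.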
21.3 Digital Rights Management Tools

In this section we take a look at the various tools used for software cracking and for DRM protection. The attacker's toolkit consists of decompilers, debuggers, performance profilers, and static and dynamic program analysis tools. The defender's toolkit can be divided into two groups: hardware and software. Software techniques include encryption wrappers, watermarking, fingerprinting, code obfuscation, and software aging. Hardware techniques include dongles and trusted hardware. This section should be viewed as a survey of these techniques, since discussing them in depth would require much more space than is available for this chapter.
21.3.1 Software Cracking Techniques and Tools

The following is a brief survey of the tools that exist or that are likely to be developed to attack software protection mechanisms (some of these are not normally thought of as attack tools).

1. Disassemblers, decompilers, and debuggers. These allow attackers to:
   • Locate events in the code. The places where a variable (say, a password) is used, a routine is called (say, to open an information box), or a jump is made (say, to an "exit message") can be identified.
   • Trace variable use in the code. One can follow the password and variables derived from it. This is tedious to do unless the use is simple, but it is faster than reading the code.
   • Create a more understandable version of the code. This is the principal use of these tools when code protections are complex.
2. Performance profilers: These can be used to identify short, computation-intensive code segments. Some protection mechanisms (like encryption wrappers) have this property.
3. Static and dynamic program analysis tools: These, and their natural extensions, provide valuable information to the attacker. Program slicing identifies which instructions actually affect the value of a variable at any point in the program. Value tracing identifies and displays the ancestors and descendants of a particular variable value. Distant dependencies (dependencies that are widely separated) are also displayed. This tool can help locate protection mechanisms and show how values propagate through code. Constant tracing identifies all variables that always acquire the same value at run-time in a program, that is, values that are independent of program input. These are important in protection, as they are keys to authorizations (passwords, etc.), to protection (checksums), and to implicit protections (constants in identities), so knowing these values gives the attacker important clues about how the code is protected.
4. Pattern matching: A protected program may have many protection-related code fragments and identities inserted into it. These are normally short pieces of code, so one can look for repeated patterns of such code. More sophisticated versions of these tools look for approximate matches in the patterns.
21.3.2 Protection Mechanisms

There are a plethora of protection mechanisms for DRM. These include encryption wrappers, watermarking, fingerprinting, code obfuscation, software aging, and various hardware techniques. The following section presents an overview of these tools; their technical details are omitted. None of these tools is a "magic bullet" that fixes all DRM problems, and in most cases a combination is used.

21.3.2.1 Encryption Wrappers

In this protection technique the software file (or portions thereof) is encrypted, and self-decrypts at runtime. This, of course, prevents a static attack and forces the attacker to run the program to obtain a decrypted image of it. The protection scheme usually tries to make this task harder in various ways, some of which we briefly review next. One is to include in the program antidebug, antimemory-dump, and other defensive mechanisms that aim to deprive the attacker of the use of various attack tools. For example, to prevent the attacker from running the program in a synthetic (virtual machine) environment in which automated analysis and attack tools are used, the software can contain instructions that work fine on a real machine but otherwise cause a crash. This is sometimes done by having one instruction x corrupt an instruction y in memory at a time when it is certain that y would be in the cache on a real machine; so it is the uncorrupted version of y, the one in the cache, that executes on a real machine, but otherwise it is the corrupted one that tries to execute and often causes a crash. These protections can usually be defeated by a determined adversary (virtual machines exist that emulate a PC very faithfully, including the cache behavior). Another way to make the attacker's task harder is to take care not to expose the whole program in unencrypted form at any one time: code decrypts as needed for execution, leaving other parts of
the program still encrypted. This way a snapshot of memory does not expose the whole decrypted program, only parts of it. This forces the attacker to take many such snapshots and try to piece them together to obtain the overall unencrypted program. But there is often a much less tedious route for the attacker: to figure out the decryption key, which, after all, must be present in the software. (Without the key the software could not self-decrypt.) Encryption wrappers usually use symmetric (rather than public key) encryption for performance reasons, as public key encryption is much slower than symmetric encryption. The encryption is often combined with compression, resulting in less space taken by the code and also making the encryption harder to defeat by cryptanalysis (because the outcome of the compression looks random). To reap these benefits the compression has to be done prior to the encryption, otherwise there is no point in doing any compression. (The compression ratio would be poor because of compressing random-looking data, and the cryptanalyst would have structured rather than random cleartext to figure out from the ciphertext.)

21.3.2.2 Watermarking and Fingerprinting

The goal of watermarking is to embed information into an object such that the information becomes part of the object and is hard to remove by an adversary without considerably damaging the object. There are many applications of watermarking, including inserting ownership information, inserting purchaser information, detecting modification, placing caption information, etc. If the watermarking scheme degrades the quality of an item significantly, then it is useless, since users will not want the reduced-quality object. There are various types of watermarks, and the intended purpose of the watermark dictates which type is used. Various decisions need to be made about the type of watermark for a given application, including the following.

1. Should the watermark be visible or indiscernible? Copyright marks do not always need to be hidden, as some watermarking systems use visible digital watermarks which act as a deterrent to an attacker. Most of the literature has focused on indiscernible (e.g., invisible, inaudible) digital watermarks, which have wider applications; hence we focus on invisible watermarks in this section.
2. Should the watermark be different for each copy of the media item or should it be the same? A specific type of watermarking is fingerprinting, which embeds a unique message in each instance of digital media so that a pirated version can be traced back to the culprit. This has consequences for the adversary's ability to attack the watermark, as two differently marked copies often make possible a "diff" attack that compares the two copies and allows the adversary to create a usable copy that carries neither of the two marks.
3. Should the watermark be robust or fragile? A fragile watermark is destroyed if even a small alteration is made to the digital media, while a robust one is designed to withstand a wide range of attacks, including anything that attempts to remove or modify the watermark without destroying the object.

In steganography, the very existence of the mark must not be detectable. The standard way of securing communication is by encrypting the traffic, but in some situations the very fact that an encrypted message has been sent between two parties has to be concealed.
Steganography hides the very existence of a secret message, as it provides a covert communication channel between two parties, a channel whose existence is unknown to a possible attacker. A successful attack now consists of detecting the existence of this communication (e.g., using statistical analysis of images with and without hidden information), so the attacker need not actually learn the secret message to be considered successful. Because invisible watermarking schemes often rely on inserting information into the redundant parts of data, inserting watermarks into compressed or encrypted data is problematic, since data in this form either appears random (encryption) or does not contain redundancy (compression). Although watermarks may be embedded in any digital medium, by far most of the published research on watermarking has dealt with images. Robust image watermarking commonly takes two forms: spatial domain and
frequency domain. The former inserts the mark in the pixels of the image, whereas the latter inserts it in a transform (Discrete Cosine, Fourier, etc.) of the image. One spatial-domain watermarking technique mentioned in the literature slightly modifies the pixels in one of two randomly selected subsets of an image; modification might include, for example, flipping the low-order bit of each pixel representation. The easiest way to watermark an image is thus to change the pixel values directly in the spatial domain. A more advanced way is to insert the watermark in the frequency domain, using one of the well-known transforms: FFT, DCT, or DWT. There are other techniques for watermarking images; for example, there are techniques that use fractals to watermark an image.

Software is also watermarkable, usually at design time by the software publisher. Software watermarks can be static, readable without running the software, or could appear only at run-time, possibly in an evanescent form that does not linger for long. In either case, reading the watermark can require knowing a secret key, without which the watermark remains invisible.

Watermarks may be used for a variety of applications, including proof of authorship or ownership of a multimedia creation, fingerprinting for identifying the source of illegal copies of multimedia objects, authentication of multimedia objects, tamper-resistant copyright protection of multimedia objects, and captioning of multimedia objects to provide information about the object. We now give a more detailed explanation of each of these applications.

1. Proof of ownership: The creators or owners of the digital object need to prove that it is theirs and insert a watermark to that effect. For example, a photographer may want to be able to prove that a specific photograph was taken by him in order to prevent a magazine from using the photo without paying royalties. By inserting a robust watermark into the item stating ownership information, the owner would be able to prove ownership of the item.
2. Culprit tracing: Inserting a robust watermark with information about the copyright owner, as well as the entity who is purchasing (downloading) the item, will allow traceback to that entity if the item were to be illegally disseminated to others by this entity. This is a form of fingerprinting. Breaking the security of the scheme can have nasty consequences, allowing the attacker to "frame" an innocent victim. One problem with this model is that a user could claim that another party stole his copy and pirated it; such claims could be true and are difficult to disprove.
3. Credentials checking: If a trusted player wants to verify that the user has a license for an item, then a robust watermark inserted into the item can be used for this purpose.
4. Integrity checking: There are many cases where unauthorized modification of an object needs to be detected. In this case, placing a fragile watermark into an item (one that would be destroyed upon modification), and requiring the presence of the watermark at viewing time, would prevent such modifications. If requirements are such that both a robust and a fragile watermark need to be inserted into an object, then obviously the robust watermark should be inserted first.
5. Restricting use: Another use of robust watermarks is to embed captions into the information that contain the metadata for an object. For example, if an object is read-only, then a watermark specifying this would allow a player to enforce the policy.
What an attack aims to do depends on the type of watermark: for a robust watermark, the goal is often to remove or modify the watermark; for a fragile watermark, the goal is to make changes without destroying the watermark. The attacker may also not care about removing an existing watermark; in this case, a second watermark may be inserted that is just as likely to be authentic as the first, preventing the effective use of the first in court. Any attack that removes a watermark must not destroy too much of the data being protected (e.g., the image must not become too blurry as a result), otherwise the attack defeats its own purpose. In the case where the watermark is a fingerprint, the scheme must be resilient against multiple entities colluding to remove the marks by comparing their copies of the item (a "diff" attack). In summary, watermarks are a powerful DRM tool. However, their use can require trusted players to check for the existence of the watermarks, which implies that watermarking requires other DRM technologies to be effective.
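The "flip the low-order bit" spatial-domain idea mentioned above can be illustrated in a few lines. The sketch below embeds and extracts a short bit string in the least-significant bits of selected pixel values; it is a toy fragile mark with none of the robustness measures a real scheme would need, and the key-driven choice of positions is only hinted at.

```python
def embed_lsb(pixels, positions, bits):
    """Embed watermark bits into the least-significant bit of the
    pixels at the chosen positions (pixels are 0-255 gray values)."""
    marked = list(pixels)
    for pos, bit in zip(positions, bits):
        marked[pos] = (marked[pos] & ~1) | bit
    return marked

def extract_lsb(pixels, positions):
    """Read the watermark back out of the same positions."""
    return [pixels[pos] & 1 for pos in positions]

image = [200, 13, 77, 154, 90, 31, 240, 66]   # a tiny "image"
secret_positions = [1, 3, 4, 6]               # derived from a secret key in practice
watermark = [1, 0, 1, 1]

marked_image = embed_lsb(image, secret_positions, watermark)
assert extract_lsb(marked_image, secret_positions) == watermark
# Each marked pixel changes by at most 1, so the image is visually unchanged,
# but almost any modification of those pixels destroys the mark (a fragile mark).
```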
21.3.2.3 Code Obfuscation

Code obfuscation is a process that takes code and "mangles" it so that it is harder to reverse engineer. As pointed out in [Collberg and Thomborson, 2002], with code being distributed in higher-level formats such as Java bytecode, protecting code from reverse engineering is even more important. Code obfuscation has many applications in DRM. For example, a company that produces software with algorithms and data structures that give it a competitive edge has to prevent its competitors from reverse engineering the code and then using the proprietary algorithms and data structures in their own software. Code obfuscation also has applications beyond protecting the rights of software companies; it is also useful for protecting players from being "cracked." A trusted player has specific areas where policy checks are made, and these will likely be the target of a person trying to remove the checks; obfuscating the code makes it more difficult to find the code fragments where such checks are made.

Many transformations can be used to obfuscate code. Collberg and Thomborson [2002] give a set of requirements for these transformations, which are summarized here. It is important that the transformations do not change what the program does. To make the obfuscation resilient to attack, it is desirable to maximize the obscurity of the code, and the transformations need to be resilient against tools designed to automatically undo them. The transformations should also not be easily detectable by statistical means, which helps prevent automatic tools from finding the locations of transformations. A limiting factor is that the performance of the code should not be affected too much. For more information on code obfuscation see Collberg et al. [1997], Collberg and Thomborson [2002], and Wroblewski [2002]. An interesting note about obfuscation is that if the transformations hide information by adding crude, inefficient ways of doing simple tasks, then the code optimizer in the compiler may remove much of the obscurity and "undo" much of the obfuscation. If, on the other hand, the obfuscation "fools" the optimizer and prevents it from properly doing its job, then experience has shown that the performance hit due to obfuscation is considerable; so either way one loses something. This seems to speak in favor of low-level obfuscation (at the assembly level), because it does not prevent the code optimizer from doing its job, yet most existing automatic obfuscators are essentially source-to-source translators.

Below are some of the types of obfuscation transformations that have been proposed:

1. Layout obfuscation involves manipulating the "physical appearance" of the code. Examples include replacing important variable names with random strings, removing all formatting (making nested conditional statements harder to read), and removing comments. These are the easiest transformations to make and they are not as effective as the other means; yet, when combined with the other techniques, they do contribute confusion to the overall picture.
2. Data obfuscation focuses on the data structures used within a program. This includes manipulating the representation and the methods of using the data, merging data (arrays or inheritance relationships) that is independent, or splitting up data that is dependent, and reordering the layout of the objects used.
This is a helpful tactic, since the data structures are the elements that contain important information that any attacker needs to comprehend. 3. Control obfuscation attempts to manipulate the control flow of a program in such a way that a person is not able to discern its “true” (pre-obfuscation) structure. This is achieved through merging or splitting various fragments of code, reordering any expressions, loops or blocks, etc. The overall process is quite similar to creating a spurious program that is embedded into the original program and “tangled” with it, which aids in obfuscating the important features of a program. 4. Preventive transformations are manipulations that are made to stop any possible deobfuscation tool from extracting the “true” program from within the obfuscation. This can be done through the use of what Collberg et al. [1997] call opaque predicates. These are the main method that an obfuscation has to prevent any portions of the inner spurious program from being identified by the attacker. An opaque predicate is basically a conditional statement that always evaluates to true, but in a manner that is extremely hard to recognize, thereby confusing the attacker.
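To make the idea of an opaque predicate concrete, the following Java sketch contrasts a readable policy check with an obfuscated variant guarded by an opaquely true condition. The predicate, the license check, and all names are invented for illustration; a real obfuscator would generate such constructs automatically and in a far less readable form.

    // Illustrative sketch only: a hand-written "opaquely true" predicate of the kind
    // described by Collberg et al. [1997]. All names are invented for the example.
    public class OpaquePredicateDemo {

        // For any integer x, y = x*x makes y*(y+1) a product of two consecutive
        // integers, hence always even (Java's wrap-around arithmetic preserves
        // parity). The test below therefore always succeeds, but that fact is not
        // obvious from a casual reading of the code.
        private static boolean opaquelyTrue(int x) {
            long y = (long) x * x;
            return (y * (y + 1)) % 2 == 0;
        }

        // Original, readable policy check ...
        static void checkLicenseOriginal(boolean licensed) {
            if (!licensed) {
                throw new IllegalStateException("no valid license");
            }
        }

        // ... and an obfuscated variant: the real check is tangled with a spurious
        // branch guarded by the opaque predicate, so an attacker (or a deobfuscation
        // tool) cannot easily tell which path belongs to the "true" program.
        static void checkLicenseObfuscated(boolean licensed, int seed) {
            if (opaquelyTrue(seed)) {           // always taken, but hard to prove
                if (!licensed) {
                    throw new IllegalStateException("no valid license");
                }
            } else {
                // dead code that exists only to confuse the reverse engineer
                System.out.println("diagnostic mode " + (seed ^ 0x5f3759df));
            }
        }

        public static void main(String[] args) {
            checkLicenseOriginal(true);
            checkLicenseObfuscated(true, 12345); // behaves exactly like the original
        }
    }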
Like most protections, obfuscation only delays a determined attacker; in other words, it "raises the bar" against reverse engineering but does not prevent it. The paper by Barak et al. [2001] gives a family of functions that are provably impossible to completely obfuscate. Thus code obfuscation is good for postponing reverse engineering, and if the time by which it can be postponed is acceptable to an organization, then obfuscation can be a viable approach.

21.3.2.4 Software Aging

Software aging uses a combination of software updates and encryption to "age" the software and protect the intellectual property of the software owner. More explicitly, through mandated updates, the software maintains its compatibility with other copies of the same software. The focus of this technique is to deter piracy, where a user obtains a copy of the software and then (with or without modification) redistributes it at a discount. Unlike most other DRM techniques, this method can only be applied to specific types of software: those that generate files or messages that will need to be interpreted or viewed by another user with the same software. Examples include word processors and spreadsheets, since users often create a document and then send it to other users to view or modify.

The way software aging works is that every copy of the software sold has two unique identifiers: a registration number that is different for every copy, and a key that is the same for all copies of the software. The registration number is needed for the software update process, while the key is used to encrypt every file or message that a user creates with the software. More importantly, the key is also used to decrypt every file or message that the software reads, whether it is the user's own file or someone else's. With an out-of-date key, the user will not be able to read documents encrypted with a more recent key. In applications that involve a continual exchange of information with other users, this forces users to continually update their software, since they cannot read objects created by other versions. The software is "aged" by making documents produced by newer versions of the software incompatible with older versions, thus forcing a software pirate to create a new cracked version for each update.

One advantage of this scheme is that it allows for possible piracy detection. If an illegal purchaser were to try to download the updates, this could lead to a correlation indicating that this user has purchased or obtained an illegal copy of the software. So by observing the interaction of illegal users with the pirate, where they would be attempting to receive the updates, a linkage between the illegal users and the pirate could be obtained. This possibility of detection could deter future piracy and thereby reduce the amount of pirated software.

There are several disadvantages to this scheme. First, the technique is only possible with certain types of software, making it not very useful for various types of media or other kinds of software; it is not as extensible as other DRM technologies, which can be used for several different types of media. Also, the scheme can carry a substantial overhead cost for the encryption and decryption process. Furthermore, the scheme will be a nuisance to many users whenever they get a document that they cannot use. This nuisance is amplified if an illegal user receives an update before a legitimate user does. There could also be privacy concerns with this scheme. Because the registration number is sent to the vendor's central server every time a copy of the software receives an update, a form of ID linkage between the purchaser and a particular copy of the software is possible. Also, because the user downloads every new update as it becomes available, the constant interaction between the purchaser and the vendor makes it necessary to ensure that no user-tracking software is included in the download. Thus there needs to be a check in place so that this constant interaction between the user's PC and the vendor is not abused. To summarize, software aging provides a useful mechanism to deter piracy for a class of software, but the disadvantages of the technique make its usage cumbersome.

21.3.2.5 Hardware Techniques

Any DRM protection mechanism that requires software players to make checks to determine whether a user has the required license to use a file is vulnerable to an attacker modifying the player so that these checks
are rendered ineffective (bypassed or their outcome ignored). Thus, this type of system does not protect against highly skilled attackers with large resources, and if such an attacker "cracks" a player and makes the cracked version publicly available, then any user can circumvent the protection mechanisms. In Lipton et al. [2002] a proof is given that no fully secure all-software protection exists and that some form of trusted hardware is needed for truly foolproof protection. This is strong motivation to use some form of tamper-resistant hardware in situations where a crack needs to be prevented rather than merely postponed "long enough." If a DRM solution requires the presence of a hardware mechanism that cannot be circumvented with software alone, then any circumvention scheme would require that the hardware be cracked as well, a much harder task. Even if the hardware is cracked, the attacker who cracked it may not have an easy way to make the attack available to others. Whether he does depends on the details of the system and the extent to which the hardware protection is defeated; a compromise of secret keys within the hardware could, in some cases, lead to massive consequences.

The presence of hardware can be used to prove authorization (e.g., to access a file), or the hardware can itself control the access and require proof of authorization before allowing access. In one type of protection the software checks for the presence of the hardware and exits if the hardware is absent, but that is open to attack, for example by modifying the software so it does not perform the check (or still performs it but ignores its outcome). Another form of protection is when the tamperproof hardware performs some essential function without which the software cannot be used (such as computing a function not easily guessed from observing its inputs and outputs). Another type of hardware protection controls access and requires the user to prove adequate authority for using the file for the specific use. An example of this is for a sound card to refuse to play a copyrighted sound file until proper authorization has been proven. Another example is to have the operating system and the hardware prevent the usage of unauthorized players. These mechanisms are covered in the section titled "other hardware approaches" below.

21.3.2.1.1 Dongles

Some software applications use a copy protection mechanism most commonly referred to by end users as a dongle. A dongle is a hardware device that connects to the printer port, serial port, or USB port on a PC. Dongles were originally proposed for software piracy protection, but they can also be used to protect media files. The purpose of a dongle is to require its presence in order to play a media file, which ties a piece of hardware to the digital media. Hence, in order to pirate the media file one must either (1) work around the need for the hardware or (2) duplicate the hardware. The difficulty of reversing a dongle depends on its complexity; there are various kinds of dongles, including (1) a device that just outputs a serial number (vulnerable to a "replay attack"), (2) a device that engages in a challenge-response protocol (different data each time, which prevents replay attacks), (3) a device that decrypts content, and (4) a device that provides some essential feature of a program or media file. Dongles are still used by many specialized applications, typically those with relatively higher pricing.
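As an illustration of the challenge-response variant (2) above, the following Java sketch simulates a dongle that answers random challenges with a keyed hash. The HMAC algorithm, the shared secret, and all class names are assumptions made for the example; a real dongle computes the response inside the hardware token.

    import java.nio.charset.StandardCharsets;
    import java.security.SecureRandom;
    import java.util.Arrays;
    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;

    // Illustrative sketch of a challenge-response dongle check.
    public class DongleCheckDemo {

        // Stand-in for the hardware token: it holds a secret and answers each
        // challenge with a keyed hash of that challenge.
        static class SimulatedDongle {
            private final byte[] secret;
            SimulatedDongle(byte[] secret) { this.secret = secret.clone(); }
            byte[] respond(byte[] challenge) throws Exception {
                Mac mac = Mac.getInstance("HmacSHA1");
                mac.init(new SecretKeySpec(secret, "HmacSHA1"));
                return mac.doFinal(challenge);
            }
        }

        // The protected application issues a fresh random challenge each time, so
        // a recorded answer cannot simply be replayed (unlike a fixed serial number).
        static boolean donglePresent(SimulatedDongle dongle, byte[] secret) throws Exception {
            byte[] challenge = new byte[16];
            new SecureRandom().nextBytes(challenge);
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secret, "HmacSHA1"));
            byte[] expected = mac.doFinal(challenge);
            return Arrays.equals(expected, dongle.respond(challenge));
        }

        public static void main(String[] args) throws Exception {
            byte[] secret = "example-shared-secret".getBytes(StandardCharsets.UTF_8);
            SimulatedDongle dongle = new SimulatedDongle(secret);
            if (!donglePresent(dongle, secret)) {
                throw new IllegalStateException("dongle missing: refuse to play the file");
            }
            System.out.println("dongle present, playback allowed");
        }
    }

Note that the checking program itself embeds the same secret and the same if statement; this is exactly the kind of software-side check that a determined attacker can locate and bypass, which is why dongles only "raise the bar."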
Dongles have several disadvantages that have limited their usage. Users dislike them for a variety of reasons, but mainly because they can be troublesome to install and use: they often require a special driver and can interfere with the use of peripherals such as printers and scanners. Since no standard exists for dongles, each protected program requires an additional dongle, which leads to an unwieldy number of connected dongles. Dongles are also not an option for many software companies, since they add a manufacturing expense to each copy of the program. Dongles also do not facilitate Internet-based distribution of software, since a dongle must be shipped to each customer to allow operation of the software. In summary, dongles "raise the bar" for the attacker but are not a particularly good copy-protection mechanism for most software applications.

There has been some recent work to make dongles practical for DRM. Dongles are traditionally used for software piracy protection; however, in Budd [2001] a modification to the dongle was introduced, in the form of a smartcard with a lithium battery that wears out, to provide a viable method for media protection. The key idea of this system was that there would be a universal key that would decrypt any digital object, and that the profit from battery sales would be distributed among the owners of such
protected objects. To distribute the profit fairly, the smartcards would keep track of usage information that would be collected through battery-return programs for which economic incentives are given. It is pointed out in Budd [2001] that a weakness of such a scheme is that if one smartcard is broken, then all digital objects protected with that key become unprotected.

Programs that use a dongle query the port at startup and at programmed intervals thereafter, and terminate if it does not respond with the dongle's programmed validation response. Thus, users can make as many copies of the program as they want but must pay for each dongle. The idea was clever, but it was initially a failure, as users disliked tying up a serial port this way. Almost all dongles on the market today pass data through the port and monitor for certain codes, with minimal if any interference with devices further down the line. This innovation was necessary to allow daisy-chaining dongles for multiple pieces of software. The devices are still not widely used, as the software industry has moved away from copy-protection schemes in general.

21.3.2.1.2 Trusted Hardware Processors

In any DRM mechanism, there are certain places in the code where important choices are made, e.g., whether or not to allow a user access to a movie. If the software's integrity is not preserved at these crucial checkpoints, then the DRM mechanism can be compromised. If checks are made in software, then one must prevent the software from being modified. To ensure that every single component running on a machine is legally owned and properly authorized, including the operating system, and that no infringement of any copyright (software, images, videos, text, etc.) occurs, one must work from the lowest level possible in order to verify the integrity of everything on a particular machine. This could be done by a tamper-resistant chip or some other form of trusted hardware. The tamper-resistant hardware would check and verify every piece of hardware and software that exists or that requests to be run on a computer, starting at the boot-up process, as described in Arbaugh et al. [1997]. This tamper-resistant, trusted hardware could guarantee integrity through a chain-like process by checking one entity at a time when the machine boots up, and every entity that wants to be run or used on that machine after it has booted. Trusted hardware would restrict certain activities of the software through those particular hardware devices.

We now give an example of how such trusted hardware could operate. The trusted hardware stores all of the keys necessary to verify signatures, to decrypt licenses and software before running them, and to encrypt messages in online handshakes and protocols that it may need to run with a software vendor or with another trusted server on the network. Software downloaded onto a machine would be stored in encrypted form on the hard drive and would be decrypted and executed by the trusted hardware, which would also encrypt and decrypt information it sends to and receives from the random-access memory. The same software or media would be encrypted in a different way for each trusted processor that would execute it, because each processor would have a distinctive decryption key. This would put quite a dent in the piracy problem, as disseminating one's software or media files to others would not do them much good (because it would not be "matched" to the keys in their own hardware).
It would also be the ultimate antivirus protection: a virus cannot attach itself to a file that is encrypted with an unknown key and, even if it could somehow manage that feat (say, because the file is not encrypted), it would nevertheless be rendered harmless by the fact that files could also carry a signed hash of their content as a witness to their integrity. The trusted hardware would verify their integrity by verifying the signature and comparing the result to a hash of the file that it computes. (If they do not match, then the trusted hardware knows that the file is corrupted.) Note the inconvenience to the user if, for example, an electric power surge "fries" the processor and the user has to get it replaced by an approved hardware repair shop. Even though the disk drive is intact, the software and movies on it are now useless and must be downloaded again (because they are "matched" to the old processor but not to the new one). To prevent abuse from claims of fake accidents, this redownloading may require the hapless user to prove that the power-surge accident did occur (presumably with the help of the repair shop).
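The signed-hash check described above can be sketched in software as follows; in a real system the keys and the verification would of course reside inside the tamper-resistant hardware, and the choice of SHA-1 and RSA here is merely illustrative.

    import java.nio.charset.StandardCharsets;
    import java.security.KeyPair;
    import java.security.KeyPairGenerator;
    import java.security.MessageDigest;
    import java.security.PrivateKey;
    import java.security.PublicKey;
    import java.security.Signature;

    // Software sketch of the signed-hash integrity check: a file carries a signed
    // hash of its content, and the verifier recomputes the hash and checks the
    // signature before allowing the file to be used.
    public class IntegrityCheckDemo {

        static byte[] hash(byte[] content) throws Exception {
            return MessageDigest.getInstance("SHA-1").digest(content);
        }

        // Attach the "witness": a signature over the hash of the content.
        static byte[] signHash(byte[] content, PrivateKey key) throws Exception {
            Signature s = Signature.getInstance("SHA1withRSA");
            s.initSign(key);
            s.update(hash(content));
            return s.sign();
        }

        // Done by the trusted hardware before use: recompute the hash and verify
        // the attached signature; reject the file if they do not match.
        static boolean verify(byte[] content, byte[] witness, PublicKey key) throws Exception {
            Signature s = Signature.getInstance("SHA1withRSA");
            s.initVerify(key);
            s.update(hash(content));
            return s.verify(witness);
        }

        public static void main(String[] args) throws Exception {
            KeyPair kp = KeyPairGenerator.getInstance("RSA").generateKeyPair();
            byte[] media = "pretend this is a protected media file".getBytes(StandardCharsets.UTF_8);
            byte[] witness = signHash(media, kp.getPrivate());
            System.out.println("intact:   " + verify(media, witness, kp.getPublic()));
            media[0] ^= 1; // simulate corruption, e.g., a virus attaching itself
            System.out.println("tampered: " + verify(media, witness, kp.getPublic()));
        }
    }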
The trusted hardware could enforce the rule that only "approved" sound cards, video cards, output devices, etc., are part of the computer system. It could enforce such rules as "No bad content is played on this machine": the type of each content item can be ascertained via digital signatures and allowed to play only if it is "approved" (i.e., signed by an authority whose signature verification key is known to the trusted hardware). The potential implications for censorship are chilling.

One of the disadvantages of these types of systems is the time spent encrypting and decrypting; this can be made acceptable through special hardware, but for low-end machines the boot-up time could be an annoyance. Trusted hardware also limits the use and functionality of many hardware devices that the public currently uses and expects to keep using. By limiting and hindering certain activities, this could cause a negative reaction from consumers; it is unlikely that consumers will appreciate being limited to purchasing only certain "approved" hardware, software, and media. There are also serious privacy concerns relating to these hardware-based protection schemes, because they could enable more stealthy ways for software publishers to spy on users, to know what is on a user's computer, and to control what the user can and cannot execute, view, connect to the computer, print, etc. Censorship becomes much easier to enforce, whistleblowing by insiders about their organization's misdeeds becomes harder, and the danger increases of unwise legislation mandating DRM-motivated hardware that would coerce the sale of only severely restricted computers. This would have disastrous consequences for IT innovation and for the exportability of our IT technology; foreign buyers will balk at purchasing machines that run only encrypted software that may do all kinds of unadvertised things besides the advertised word processing.

Another drawback of hardware protections is their inflexibility: they are more awkward to modify and update than software-based ones. Markets and products change rapidly, consumers respond unpredictably, business and revenue models evolve, and the flexibility of software-based protections may in some cases offset their lower level of security. The largest software producer and the largest hardware producer are nevertheless both planning the introduction of such hardware-based technology, and at the same time lobbyists from the movie industry and other content providers have attempted to convince Congress to pass laws making it mandatory.
21.3.3 Further Remarks About Protection Mechanisms

We now give a succinct summary of the protection mechanisms outlined in this section.
1. Watermarking: A watermark allows information to be placed into an object. This information can be used to provide a trusted player with authorization or authentication information.
2. Fingerprinting: This is a special case of watermarking, in which the watermark is different for each user so that pirated copies can be traced to the original culprit.
3. Code obfuscation: This makes reverse engineering more difficult and can protect the trusted player from modification.
4. Software aging: This is an obscure protection mechanism that deters piracy by making pirates update their software frequently. However, it works only for certain kinds of software.
5. Dongles: A dongle is a hardware device that provides some information (or some capability) needed to use a file.
6. Hardware-based integrity: One of the difficulties in creating a DRM system is protecting the trusted players against unauthorized modification. An "integrity chip" can verify that a player has not been modified before allowing it to run and access copyrighted material.
In summary, the main functions that DRM protection mechanisms perform are to: (1) protect the Trusted Player (column PROTECT in the table below), (2) provide information to the Trusted Player (INFO), (3) make piracy easier to trace (TRACE), and (4) make piracy more difficult by confining the digital
object in antidissemination ways (CONFINE), i.e., in ways other than (1) to (3). The table below summarizes the above techniques in these regards, matching each technique with its main function as described in the summary list above.

Technique           PROTECT   INFO   TRACE   CONFINE
Watermarking                   X
Fingerprinting                         X
Code Obfuscation       X
Software Aging                                  X
Dongles                        X
Integrity Chip         X
Each protection method needs to be evaluated according to its impact on the following software cost and performance characteristics:
• Portability
• Ease of use
• Implementation cost
• Maintenance cost
• Compatibility with installed base
• Impact on running time
• Impact on the size of the program
Applying a particular protection mechanism improves a security feature but can simultaneously degrade one or more of the above traditional (nonsecurity) characteristics. In summary, the application of DRM techniques is situational, and no single technique stands out as best in all situations. A hybrid of many mechanisms is often desirable.
21.4 Legal Issues

DRM is a problem where issues from such areas as law, ethics, economics, and business play a role at least as important as (and probably more important than) purely technical issues. A crucial role will be played by the insurance industry: an insurance rate is a valuable "price signal" from the marketplace about the effectiveness (or lack thereof) of a deployed DRM technology, just as the home-insurance rebate one gets for a burglar alarm system is a valuable indicator of how much such a system decreases the risk of burglary. It is now possible to buy insurance against such Internet-related risks as the inadvertent use of someone else's intellectual property.

Any DRM technology is bound to eventually fail in the face of a determined attacker who deploys large enough resources and can afford a long enough time to achieve his goal. It is when technology fails that the law is most valuable. Section 1201(a) of the 1998 Digital Millennium Copyright Act (DMCA) says: "No person shall circumvent a technological measure that effectively controls access to a work protected under this title." This can detract from security in the following way. In cryptography, it is through repeated attempts by researchers to break certain cryptosystems that we have acquired the confidence we have today in their security (many others fell by the wayside after being found not resistant enough to cryptanalysis). If reputable researchers are prevented from probing the weaknesses of various protection schemes and reporting their findings, then we may never acquire such confidence in them: they may be weak, and we would never know it (to the ultimate benefit of evildoers, who will be facing weaker protections than otherwise).

Also forbidden is the design and dissemination of tools whose primary use is to defeat DRM protections. This should not apply to tools whose primary intended use is legal but that can be misused as attack tools against DRM protections. For example, software debuggers are a major attack tool for
software crackers, and yet creating and distributing debuggers is legal. However, there are huge potential problems for someone who creates a tool whose main purpose is legal but specialized and not "mainstream," a tool that is later primarily used to launch powerful attacks on DRM protections. This could generate much DMCA-related headache for its creator (the argument that would be made by the entities whose revenue stream it threatens is that its main use is so esoteric that its creator must surely have known it would be used primarily as an attack tool). The freedom to "creatively play" that was behind so much of the information technology revolution can be threatened.

Legislation often has unintended consequences, and the DMCA, enacted to provide legal weapons against digital pirates, has been no exception. This is especially so for its antitamper provisions, which forbid circumventing (or developing tools to circumvent) the antipiracy protections embedded into digital objects; these provisions ban both acts of circumvention and the dissemination of techniques that make circumvention possible. We review some of the unintended consequences next.

First, the DMCA has resulted in lawsuits that had a chilling effect on free speech and scientific research. This is a short-sighted and truly Pyrrhic victory that will ultimately result in less protection for digital objects: cryptographic tools (and our trust in them) would today be much weaker had something like the DMCA existed at their inception and "protected" them by preventing the scientific investigations that have succeeded in cracking and weeding out the weaker schemes. We may end up knowing even less than we do today about whether the security technologies we use every day are vulnerable, because no one will have dared to probe them for vulnerabilities, or to publicize those vulnerabilities, for fear of being sued under the DMCA.

Second, by banning even legitimate acts of circumvention, the DMCA has inadvertently protected anticustomer and anticompetitive behavior on the part of some manufacturers and publishers. What is a legitimate act of tampering with what one has bought? Here is an example. Suppose Alice buys a device that runs on a widely available type of battery, but later observes that the battery discharges faster when she uses any brand of battery other than the "recommended" (and particularly overpriced) one. Alice is a tinkerer. She digs deeper and discovers to her amazement that the device she bought performs a special challenge-response protocol with the overpriced battery to convince itself that this is the type of battery being used. If the protocol fails to convince the device, then the device concludes that another kind of battery is being used and misbehaves deliberately (e.g., by discharging the battery faster). This kind of anticustomer and anticompetitive behavior is well documented (it occurs in both consumer and industrial products), and there are all-software equivalents of "Alice's story" that the reader can easily imagine (or personally experience). The point is that such anticustomer and anticompetitive practices do not deserve the protection of a law that bans both tampering with them and developing or disseminating techniques for doing so. Who will argue that it is not Alice's right to "tamper" with what she bought, to overcome what looks like an unreasonable constraint on her use of it?
For a more in-depth treatment of this issue, and many specific examples of the unintended consequences of the DMCA, we refer the reader to the excellent Website of the Electronic Frontier Foundation (www.eff.org).

When they cannot prevent infringement, DRM technologies report it, and publishers then use the legal system. The targets of such lawsuits are often businesses rather than individuals. Businesses cannot claim that they are mere conduits for information, the way phone companies or ISPs do to be exempt from liability; after all, businesses already exercise editorial and usage control by monitoring and restricting what their employees can do (they try to ferret out hate speech, sexual harassment, viruses and other malware, inappropriate Web usage, the leakage of corporate secrets, etc.). Businesses will increase workplace monitoring of their employees as a result of DRM, for fear of being held liable for infringements by their employees. In fact, a failure by an organization to use DRM could one day be viewed as a failure to exercise due care and by itself expose the buyer to legal risks.

But what is infringement of intellectual property in cyberspace? What is trespassing in cyberspace? As usual, some cases are clear-cut, but many are at the "boundary," and the legal system is still sorting them out. Throughout history, the legal system has considerably lagged behind major technological advances, sometimes by decades. This has happened with the automobile, and is
now happening with the Internet. Many Internet legal questions (of liability, intellectual property, etc.) have no clear answer today. The issue of what constitutes "trespass" in cyberspace is still not completely clear. Even linking to someone's Website can be legally hazardous. While it is generally accepted that linking to a site is allowed without having to ask for permission, and in fact doing so is generally viewed as "flattering" and advantageous to that site, there have nevertheless been lawsuits for copyright infringement based on the fact that the defendant had linked deep into the plaintiff's site (bypassing the intended path to the targeted Web pages, which included the viewing of commercial banners). Generally speaking, although explicit permission to link is not required, one should not link if the target site explicitly forbids such linking. Such a prohibition must be stated explicitly, but it might appear as part of the material that one must "click to accept" before entering the site, and thus can easily be overlooked.
Acknowledgment

Portions of this work were supported by Grants ETA-9903545 and ISS-0219560 from the National Science Foundation, by Contract N00014-02-1-0364 from the Office of Naval Research, by sponsors of the Center for Education and Research in Information Assurance and Security, and by Purdue Discovery Park's e-Enterprise Center.
References

Anderson, Ross. Security Engineering: A Guide to Building Dependable Distributed Systems. John Wiley & Sons, New York, February 2001, pp. 413–452.
Anderson, Ross. TCPA/Palladium Frequently Asked Questions. www.cl.cam.ac.uk/users/rja14/tcpafaq.html, 2003.
Anderson, Ross. Security in Open versus Closed Systems — The Dance of Boltzmann, Coase, and Moore. www.ftp.cl.cam.ac.uk/ftp/users/rja14/toulouse.pdf, 2002.
Arbaugh, William, David Farber, and Jonathan Smith. A Secure and Reliable Bootstrap Architecture. Proceedings of the IEEE Symposium on Security and Privacy, Oakland, CA, 1997.
Barak, Boaz, Oded Goldreich, Russell Impagliazzo, Steven Rudich, Amit Sahai, Salil Vadhan, and Ke Yang. On the (Im)possibility of Obfuscating Programs. Electronic Colloquium on Computational Complexity, Report No. 57, 2001.
Budd, Timothy. Protecting and Managing Electronic Content with a Digital Battery. Computer, vol. 34, no. 8, pp. 2–8, August 2001.
Caldwell, Andrew E., Hyun-Jin Choi, Andrew B. Kahng, Stefanus Mantik, Miodrag Potkonjak, Gang Qu, and Jennifer L. Wong. Effective Iterative Techniques for Fingerprinting Design IP. Design Automation Conference, New Orleans, LA, 1999, pp. 843–848.
Camenisch, Jan. Efficient Anonymous Fingerprinting with Group Signatures. In ASIACRYPT, Kyoto, Japan, 2000, LNCS 1976, pp. 415–428.
Collberg, Christian, Clark Thomborson, and Douglas Low. A Taxonomy of Obfuscating Transformations. Technical Report 148, Department of Computer Science, University of Auckland, 1997.
Collberg, Christian and Clark Thomborson. Watermarking, Tamper-Proofing, and Obfuscation Tools for Software Protection. IEEE Transactions on Software Engineering, vol. 28, no. 8, pp. 735–746, 2002.
Jakobsson, Markus and Michael Reiter. Discouraging Software Piracy Using Software Aging. Digital Rights Management Workshop, Philadelphia, PA, 2001, pp. 1–12.
Kahng, A., J. Lach, W. Mangione-Smith, S. Mantik, and I. Markov. Watermarking Techniques for Intellectual Property Protection. Design Automation Conference 1998, San Francisco, CA, pp. 776–781.
Maude, Tim and Derwent Maude. Hardware Protection Against Software Piracy. Communications of the ACM, vol. 27, no. 9, pp. 950–959, September 1984.
Pfitzmann, Birgit and Ahmad-Reza Sadeghi. Coin-Based Anonymous Fingerprinting. In EUROCRYPT, Prague, 1999, LNCS 1592, pp. 150–164.
Pfitzmann, Birgit and Matthias Schunter. Asymmetric Fingerprinting. In Advances in Cryptology (EUROCRYPT), Saragossa, Spain, 1996, vol. 1070 of LNCS, pp. 84–95.
Qu, Gang and Miodrag Potkonjak. Fingerprinting Intellectual Property Using Constraint-Addition. Design Automation Conference, Los Angeles, CA, 2000, pp. 587–592.
Qu, Gang, Jennifer L. Wong, and Miodrag Potkonjak. Optimization-Intensive Watermarking Techniques for Decision Problems. In Proceedings of the 36th ACM/IEEE Design Automation Conference, 1999, pp. 33–36.
Schneier, Bruce. Secrets and Lies: Digital Security in a Networked World. John Wiley & Sons, New York, 2000.
Seadle, M., J. Deller, and A. Gurijala. Why Watermark? The Copyright Need for an Engineering Solution. Joint Conference on Digital Libraries, Portland, OR, July 2002.
Wang, Chenxi, Jonathan Hill, John Knight, and Jack Davidson. Software Tamper Resistance: Obstructing Static Analysis of Programs. Ph.D. dissertation, University of Virginia, Charlottesville, VA.
Wroblewski, Gregory. General Method of Program Code Obfuscation. Ph.D. dissertation, Wroclaw University of Technology, Institute of Engineering Cybernetics, Wroclaw, Poland, 2002.
Part 3 Information Management
22 Internet-Based Enterprise Architectures

François B. Vernadat

CONTENTS
22.1 Introduction
22.2 From Client–Server to n-Tier Architectures
  22.2.1 Client–Server Architecture
  22.2.2 Remote Procedure Calls (RPC)
  22.2.3 Messaging Systems
  22.2.4 n-Tier Architectures
22.3 New Keys to Interoperability: XML, SOAP, Web Services, Meta Data Registries, and OMG's MDA
  22.3.1 XML
  22.3.2 SOAP
  22.3.3 Web Services
  22.3.4 Meta Data Registries
  22.3.5 OMG Model-Driven Architecture (MDA)
22.4 The J2EE Architecture
  22.4.1 The J2EE Layered Approach
  22.4.2 J2EE Container Model
  22.4.3 Web Container
  22.4.4 EJB Container
  22.4.5 Java Message Service (JMS)
  22.4.6 Java Naming and Directory Interface (JNDI)
  22.4.7 Java Database Connectivity (JDBC)
  22.4.8 J2EE Application Architectures
22.5 The .NET Architecture
  22.5.1 Basic Principles
  22.5.2 .NET Application Architectures
  22.5.3 Microsoft Transaction Server (MTS) and Language Runtime
  22.5.4 Microsoft Message Queue (MSMQ)
  22.5.5 Active Directory
22.6 Comparison of J2EE and .NET
22.7 Conclusion: The Global Architecture
References
This chapter describes and evaluates common industry approaches such as J2EE and .NET to build Internet-based architectures for interoperable information systems within business or administrative organizations of any size. It discusses the directions in which enterprise architectures are moving, especially with respect to supporting business process operations and offering information and services both to internal and external users (staff, clients, or partners).
22.1 Introduction

Today's organizations, be they profit or non-profit organizations, face major challenges in terms of flexibility and reactivity — also called agility — to manage day-to-day internal or interorganizational business processes. Indeed, they have to cope with changing conditions in their environment, stronger connectivity with their trading partners (e.g., networked enterprise, integrated supply chains), interoperability of their information systems, and innovation management in their business area. Among the business needs frequently mentioned, we can list:
• Wider distribution support for data, applications, services, and people
• More intense Web-based operations with a variety of partners and clients
• Intense transaction activity across heterogeneous platforms
• Integration of different information systems (legacy and new ones)
• High service availability (preferably 24 hours a day, 7 days a week)
• Scalability and growth capabilities of application systems
• High security and performance standards
• Ability to reuse existing building blocks (also called enterprise components)
• Ability to quickly adapt the current architecture to business evolution needs
There is, therefore, a need to build open, flexible, and modular enterprise architectures on top of a sound IT architecture [Vernadat, 2000]. However, the IT architecture should by no means constrain the enterprise architecture, as happened too often in the past with large, monolithic enterprise information systems (ERP systems still suffer from this deviation, especially in small and medium-sized enterprises). The enterprise architecture, which deals with people, customers, suppliers, partners, business processes, application systems, and interorganizational relationships (e.g., CRM, SCM), now looks like a constellation of nodes in a networked organization (Figure 22.1). The IT architecture plays the role of the enabling infrastructure that supports the operations of the enterprise architecture, and it must be as transparent as possible to the enterprise architecture.
FIGURE 22.1 Principle of the networked organization. (ERP: Enterprise Resource Planning; SCM: Supply Chain Management; CRM: Customer Relationship Management.)
Recent technological advances that contribute to making all this a reality include:
• Greater computing power, now widely available (it is still doubling every 18 months and makes PC-based data centers possible for small and medium-sized organizations)
• Increased connectivity by means of low-cost, broad-reach Internet, wireless, or broadband access
• Proliferation of electronic devices that can be easily connected to networks nearly anywhere (PCs, personal digital assistants or PDAs, cellular phones, etc.)
• Internet standards, especially XML and ebXML, allowing XML-based integration
• Enterprise Application Integration (EAI), i.e., methods and tools aimed at connecting and coordinating the computer applications of an enterprise. Typically, an enterprise has an existing base of installed legacy applications and databases, wants to continue to use them while adding new systems, and needs to have all systems communicating with one another.
22.2 From Client–Server to n-Tier Architectures

Interoperation over computer networks and the Internet has become a must for agile enterprise architectures, especially to coordinate and synchronize business processes within a single enterprise or across networked enterprises [Sahai, 2003; Vernadat, 1996]. Interoperability is defined in Webster's dictionary as "the ability of a system to use the parts of another system." Enterprise interoperability can be defined as the ability of an enterprise to use information or services provided by one or more other enterprises. This assumes, on the part of the IT architecture, the ability to send and receive messages made of requests or data streams (requests and responses). These exchanges can be made in synchronous or asynchronous mode. Interoperability also assumes extensive use of standards for data exchange as well as for computing paradigms. This section reviews some of the key paradigms for computer systems interoperability.
22.2.1 Client–Server Architecture

Wherever the information is stored, there comes a time at which it needs to be transferred within or outside of the enterprise. To support this, the client–server architecture, together with Remote Procedure Calls (RPCs), was the system-wide exchange computing paradigm of the 1990s. The basic idea of the client–server architecture is to have a powerful machine, called the server, provide centralized services that can be accessed from remotely located, less powerful machines, called clients. This two-level architecture characterizes client–server systems, in which the client requests a service (i.e., function execution or access to data) and the server provides it.

The client–server architecture makes possible the connection of different computer environments on the basis of a simple synchronous message-passing protocol. For instance, the server can be a Unix machine hosting a corporate relational database management system (RDBMS), while the clients could be PCs under Apple OS or MS-Windows running interface programs, spreadsheets, or local database applications, and used as front-ends by business users. Figure 22.2 illustrates the basic principle of the client–server approach. A client issues a request (via a computer network) to a server. The server reads the request, processes it, and sends the result(s) back to the client. The server can provide one or more services to clients and is accessed via one of its interfaces, i.e., a set of callable functions. Callable functions (e.g., open, close, read, write, fetch, update, or execute) are defined by their signature (i.e., name and list of formal parameters). Their implementation is internal to the server and transparent to the client. On the server side, the server can itself become a client to one or more other servers, which in turn can be clients to other servers, and so forth, making the architecture more complex. Clients and servers can be distributed over several machines connected by a computer network or may reside on the same node of the network.
FIGURE 22.2 Principle of the client–server approach.
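The request/response cycle of Figure 22.2 can be sketched with plain TCP sockets as follows. The port number and the one-line "protocol" (the server simply upper-cases the request) are invented for the example; a real server would dispatch the request to one of its callable functions.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.ServerSocket;
    import java.net.Socket;

    // Minimal sketch of the client-server request/response cycle over TCP sockets.
    public class ClientServerDemo {

        // Server side: read the request, process it, send the result back.
        static void runServer(int port) throws Exception {
            try (ServerSocket server = new ServerSocket(port);
                 Socket client = server.accept();
                 BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()));
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                String request = in.readLine();          // read request
                String response = request.toUpperCase(); // "process" the request
                out.println(response);                   // send results
            }
        }

        // Client side: send a request and return the server's response.
        static String callServer(String host, int port, String request) throws Exception {
            try (Socket socket = new Socket(host, port);
                 PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()))) {
                out.println(request);                    // send request
                return in.readLine();                    // read and return results
            }
        }

        public static void main(String[] args) throws Exception {
            new Thread(() -> {
                try { runServer(9090); } catch (Exception e) { e.printStackTrace(); }
            }).start();
            Thread.sleep(200); // crude wait so the server socket is bound before the client connects
            System.out.println(callServer("localhost", 9090, "fetch latest orders"));
        }
    }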
On the client side, the client can be a simple program (i.e., an interface) used to interact with the application hosted on the server. In this case, it implements no complex features and all heavy computing duties are performed by the server; it is then called a thin client. Conversely, due to the increasing power and disk capacity of PCs, the client can be a sophisticated program able to locally execute a significant part of the functionality of the server, to reduce the burden of communication exchanges and improve processing speed; it is then called a thick client. Good examples of universal thin clients in the Internet computing world are Web browsers (e.g., Netscape, Internet Explorer, Opera). These are widely available tools on most operating systems. They can provide access to any Web site, provided that one knows the site's URL. They can also give access to enterprise portals from anywhere in the world, thus making it possible for employees on travel to remain connected with their enterprise.
22.2.2 Remote Procedure Calls (RPC)

RPC can be defined as an interprocess communication mechanism used to start processes on remotely located processors. In pure computer science terms, it is a way to start a process located in an address space separate from that of the calling code. It can, therefore, be used as one of the synchronous messaging protocols for implementing a client–server system, especially in non-Windows environments (Windows environments can, for instance, use Windows sockets). The goal of the RPC mechanism is to ease the development of client–server applications in large-scale, heterogeneous, and distributed environments and to free system developers from the burden of dealing with the complexity of the low-level layers of networks. The RPC mode of operation is similar to a local procedure call but extended to a network environment. In other words, for the calling application, the procedure is executed as if it were executed locally.

The RPC distribution mechanism involves an RPC client and an RPC server. The RPC client is the part of an application that initiates a remote procedure call, while the RPC server is the part of the application that receives the remote call and takes charge of the procedure execution (Figure 22.3). Communication between the two applications (connection, transfer, and disconnection) is handled by the RPC mechanism, installed as a software extension of both applications and interacting with the operating system hosting each application. It consists of a software layer made of an RPC Library and a client stub on the client machine and a server stub on the server machine. The client stub is an interface that prepares the parameters (or arguments) of the procedure called by the client application and puts them in a package that will be transmitted by the RPC routines (provided by the RPC Library). The client application can explicitly identify the server on which the procedure will be executed or let the RPC Library itself locate the corresponding server. Each application server is identified by its unique UUID (Universal Unique IDentifier). This name is known to the client stub. A name server is available on the network; it maintains a table containing, among other things, the list of all servers running server applications (identified by their UUID). On the server side, the RPC Library
FIGURE 22.3 Principle of the Remote Procedure Call (RPC).
receives the package sent by the client and transmits it to the server stub (the RPC interface between the RPC Library and the server application) of the application concerned by the call (there can be several server stubs on the same server). The server stub decodes the parameters received and passes them to the called procedure. Results are prepared as a new package and sent back by the server application to the calling application in a manner symmetric to the way the call was issued (Figure 22.3).
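The role of the client stub can be illustrated with the following Java sketch, in which a dynamically generated proxy stands in for the stub: the calling application invokes an ordinary local interface, while the proxy packages the procedure name and arguments and hands them to a stand-in for the RPC Library and the remote server. All interface and procedure names are invented; real stubs marshal the parameters into a wire format and use the name server to locate the target machine.

    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;
    import java.util.Arrays;

    // Sketch of the client-stub idea: a dynamic proxy plays the stub, and a local
    // method stands in for the network, the server stub, and the server application.
    public class RpcStubDemo {

        // The procedure signature shared by the client and the server.
        interface CatalogService {
            double priceOf(String item);
        }

        // Stand-in for RPC Library + transmission + server stub + server application.
        static Object transmit(String procedure, Object[] args) {
            System.out.println("RPC request: " + procedure + Arrays.toString(args));
            if (procedure.equals("priceOf") && "widget".equals(args[0])) {
                return 9.99; // "remote" execution of the called procedure
            }
            return 0.0;
        }

        // The client stub: prepares the parameters, "sends" them, returns the result.
        static CatalogService clientStub() {
            InvocationHandler handler =
                (Object proxy, Method method, Object[] args) -> transmit(method.getName(), args);
            return (CatalogService) Proxy.newProxyInstance(
                CatalogService.class.getClassLoader(),
                new Class<?>[] { CatalogService.class },
                handler);
        }

        public static void main(String[] args) {
            CatalogService catalog = clientStub();                    // looks like a local object
            System.out.println("price = " + catalog.priceOf("widget")); // executed "remotely"
        }
    }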
22.2.3 Messaging Systems

Messaging systems are middleware components used to handle the exchange of messages, i.e., specially formatted data structures describing events, requests, and replies, among applications that need to communicate. They are usually associated with a message queuing system to serialize, prioritize, and temporarily store messages. IBM MQSeries, Sun Microsystems' Java Message Service (JMS), and Microsoft MSMQ are common examples of messaging systems, also called message-oriented middleware (MOM). There are two major messaging models: the point-to-point model (in which the addresses of the issuer and of the receiver need to be known) and the publish/subscribe model (in which the message is broadcast by the issuer and intercepted by the relevant recipients, which may confirm receipt of the message). Messaging makes it easier for programs to communicate across different programming environments (languages, compilers, and operating systems), since the only thing that each environment needs to understand is the common messaging format and protocol.
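As an example of the point-to-point model, the following sketch sends an XML text message to a queue through the standard JMS API. How the ConnectionFactory and the queue are obtained is provider-specific; looking them up in JNDI under the names used below is an assumption made for the example.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.naming.InitialContext;

    // Sketch of point-to-point messaging with the JMS API.
    public class JmsSenderDemo {
        public static void main(String[] args) throws Exception {
            InitialContext jndi = new InitialContext(); // provider configured via jndi.properties
            ConnectionFactory factory = (ConnectionFactory) jndi.lookup("ConnectionFactory");
            Queue queue = (Queue) jndi.lookup("queue/Orders");

            Connection connection = factory.createConnection();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);

            // The message body is just XML text: the receiving application only needs
            // to understand the agreed message format, not our platform or language.
            TextMessage message =
                session.createTextMessage("<order id=\"42\"><item>widget</item></order>");
            producer.send(message);

            connection.close();
        }
    }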
22.2.4 n-Tier Architectures

Client–server architectures assume tight, direct, peer-to-peer communication and exchange. They have paved the way to more general and flexible architectures able to serve many clients concurrently, namely 3-tier architectures and, by extension, n-tier architectures. 3-tier architectures are organized in three levels, as illustrated by Figure 22.4:
• The level of clients or service requestors (thin or thick clients)
• The level of the application server (also called the middle tier or business level), which provides the services required by clients
• The level of secondary servers, usually database or file servers but also specialized servers and mainframes, which provide back-end services to the middle tier
FIGURE 22.4 Principle of the 3-tier architecture.
As shown in Figure 22.4, a 3-tier architecture clearly separates the presentation of information to requesters (front-end) from the business logic of the application (middle tier), itself dissociated from the data servers or supporting information systems (back-end). Software applications based on the 3-tier architecture principle have become very popular; since the year 2000, it is estimated that more than half of new application developments use this approach. It is especially well suited to the development of enterprise portals, data warehouses, business intelligence applications, or any other type of application that must serve a large number of clients (users or systems); access, reformat, or even aggregate data issued from several sources; support concurrent transactions; and meet security and performance criteria.

The idea of making applications talk to one another is not new. Distributed architectures such as OSF/DCE (Open Software Foundation/Distributed Computing Environment), OMG/CORBA (Object Management Group/Common Object Request Broker Architecture), or Microsoft DCOM (Distributed Component Object Model) already support exchange among applications using the RPC mechanism. However, each of these architectures uses a proprietary exchange protocol, and therefore they are not easily interoperable. For instance, it is difficult to make a CORBA object communicate with a COM object. Furthermore, the protocols used to transport these objects are often blocked by enterprise firewalls. To go one step further towards enterprise interoperability, n-tier architectures based on open and standardized protocols are needed. This is the topic of the next section.
22.3 New Keys to Interoperability: XML, SOAP, Web Services, Meta Data Registries, and OMG's MDA

22.3.1 XML

XML (eXtensible Markup Language) is a language derived from SGML (Standard Generalized Markup Language, ISO 8879) [Wilde, 2003; W3C, 2000a]. It is a standard of the World Wide Web Consortium (W3C) intended to be a universal format for structured content and data on the Web. Like HTML (Hypertext Markup Language), XML is a tagged language, i.e., a language that delimits information with tags. HTML has a finite set of predefined tags oriented to information presentation (e.g., title, paragraph, image, hypertext link, etc.). Conversely, XML has no predefined tags. The XML syntax uses matching start and end tags, such as <price> and </price>, to mark up information. It is a meta-language that allows the creation of an unlimited set of new tags to characterize all the pieces, or aggregates of pieces, of basic information that a Web page can be made of. No XML tag conveys presentation information; that is the role of XSL and XSLT.

XSL (eXtensible Stylesheet Language) is another language recommended by the World Wide Web Consortium (W3C) to specify the representation of information contained in an XML document [W3C,
2000b]. XSL is used to produce XSL pages that define the style sheets for the formatting of XML documents. XSL can be divided into two components:
• XSLT (eXtensible Stylesheet Language Transformations), which allows the hierarchical structure of an XML document to be transformed into a different structure by means of template rules
• The data layout language, which allows the definition of the page layout of textual or graphical elements contained in the information flow issued from the XSLT transformation
Thanks to the clear separation between content and layout, XML provides neutral support for data handling and exchange. It is totally independent of the physical platform that transports or publishes the information, because XSL style sheets manage the layout of the information according to the support or destination while the content of the basic XML file remains the same. It is also totally independent of the software that will process it (it is a nonproprietary format). The only dependence on a specific domain appears in the way the information structured in an XML file must be interpreted with respect to that domain. This can be specified by means of Document Type Definitions (DTDs), which define the structure of the domain objects. This so-called data neutralization concept is essential to achieving systems interoperability.
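The following sketch shows a small XML fragment, with invented tags, being read through the standard JAXP DOM API. The document carries pure content; any number of XSL style sheets could later render the same data for the Web, for print, or for a mobile device.

    import java.io.StringReader;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.xml.sax.InputSource;

    // A small XML fragment with invented tags, parsed with the standard DOM API.
    public class XmlContentDemo {
        public static void main(String[] args) throws Exception {
            String xml =
                "<priceList>" +
                "  <product>" +
                "    <name>Widget</name>" +
                "    <price currency=\"EUR\">9.99</price>" +
                "  </product>" +
                "</priceList>";

            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // Navigate the content without any knowledge of how it will be displayed.
            String name = doc.getElementsByTagName("name").item(0).getTextContent();
            String price = doc.getElementsByTagName("price").item(0).getTextContent();
            System.out.println(name + " costs " + price);
        }
    }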
22.3.2 SOAP

SOAP stands for Simple Object Access Protocol. It is a communication protocol and a message layout specification that defines a uniform way of passing XML-encoded data between two interacting software entities. SOAP has been submitted to the World Wide Web Consortium to become a standard in the field of Internet computing [W3C, 2002a]. The original focus of SOAP is to provide:
• A framework for XML-based messaging systems using RPC principles for data exchange
• An envelope to encapsulate XML data for transfer in an interoperable manner that provides for distributed extensibility and evolvability, as well as intermediaries such as proxies, caches, and gateways
• An operating-system-neutral convention for the content of the envelope when messages are exchanged by RPC mechanisms (sometimes called "RPC over the Web")
• A mechanism to serialize data based on XML schema data types
• A nonexclusive mechanism layered on the HTTP transport layer (to be able to interconnect applications, via RPC for instance, across firewalls)
Although the SOAP standard does not yet specify any security mechanisms (security handling is delegated to the transport layer), SOAP is likely to become the standard infrastructure for distributed applications. First, as its name suggests, it keeps things simple. Second, it nicely complements the data neutralization principle of XML (it transfers plain ASCII data). Furthermore, it is independent of any particular platform, language, library, or object technology (it builds on top of these). Finally, it does not dictate a transport mechanism: the SOAP specification defines how to send SOAP messages over HTTP, but other transport protocols, such as SMTP or raw TCP, can be used.
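The following sketch posts a SOAP 1.1 request over HTTP using only the standard Java networking classes. The endpoint URL, the getQuote operation, and the SOAPAction value are invented for illustration; only the envelope namespace and the HTTP mechanics are standard.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    // Sketch of a SOAP 1.1 request posted over HTTP.
    public class SoapCallDemo {
        public static void main(String[] args) throws Exception {
            String envelope =
                "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
              + "<soap:Body>"
              + "<getQuote xmlns=\"urn:example:quotes\"><symbol>ACME</symbol></getQuote>"
              + "</soap:Body>"
              + "</soap:Envelope>";

            URL endpoint = new URL("http://services.example.org/quote"); // hypothetical service
            HttpURLConnection http = (HttpURLConnection) endpoint.openConnection();
            http.setRequestMethod("POST");
            http.setDoOutput(true);
            http.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
            http.setRequestProperty("SOAPAction", "\"urn:example:quotes#getQuote\"");

            try (OutputStream out = http.getOutputStream()) {
                out.write(envelope.getBytes(StandardCharsets.UTF_8)); // the XML request crosses the firewall as plain HTTP
            }
            System.out.println("HTTP status: " + http.getResponseCode());
            // The response body (not read here) is another SOAP envelope carrying the
            // XML-encoded result, which the caller parses like any other XML document.
        }
    }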
22.3.3 Web Services

The aim of Web Services is to connect computers and devices with each other using the Internet, to exchange data and to process data dynamically, i.e., on the fly. It is a relatively new and still rapidly evolving concept, but one that has the power to drastically change the way we build computer-based applications and that can soon make computing over the Web a reality. It will no doubt become a fundamental building block for implementing agile distributed architectures [W3C, 2002b]. Web Services can be defined as interoperable software objects that can be assembled over the Internet using standard protocols and exchange formats to perform functions or execute business processes [IST
Diffuse Project, 2002; Khalaf et al., 2003]. A concise definition is proposed by DestiCorp [IST Diffuse Project, 2002]: "Web Services are encapsulated, loosely coupled, contracted software functions, offered via standard protocols." In a sense, Web Services can be likened to autonomous software agents hosted on servers connected to the Web. These services can be invoked by means of their URL by any calling entity via the Internet, using XML and SOAP to send requests and receive responses. Their internal behavior can be implemented in any language (e.g., C, Java, PL/SQL) or even in a software system (e.g., SAS, Business Objects, Ilog Rules). Their granularity can be of any size. The great idea behind Web Services is that a piece of functionality can be made available on the Web (published) and accessed (subscribed to) by whoever needs it without having to know either its location or its implementation details, i.e., in a very transparent way.

Thanks to Web Services, direct exchanges are made possible among applications in the form of XML flows only. This makes possible on-the-fly software execution as well as business process execution through the use of loosely coupled, reusable software components. In other words, the Internet itself becomes a programming platform supporting active links, or transactions, between business entities. Because businesses will no longer be tied to particular applications and underlying technical infrastructures, they will become more agile in operation and more adaptive to changes in their environment. Partnerships and alliances can be set up, tested, and dissolved much more rapidly. Figure 22.5 shows the evolution and capabilities of programming technology culminating in Web Services and clarifies this point.

Web Services rely on the following components, which are all standards used in industry:
• WSDL (Web Service Description Language): a contract language used to declare Web Service interfaces and access methods using specific description templates [W3C, 2001]
• UDDI (Universal Description, Discovery, and Integration): an XML-based registry or catalog of businesses and the Web Services they provide (described in WSDL), offered as a central facility on the Web to publish, find, and subscribe to Web Services [OASIS, 2002]
• XML: the language used to structure and formulate messages (requests or responses)
• SOAP: the messaging protocol for sending requests and responses among Web Services
• SMTP/HTTP(S) and TCP/IP: the ubiquitous transport and Internet protocols
[Figure: evolution of programming technology, from monolithic systems (Cobol) with sub-programs, through static linking (C++), dynamic linking (Java), and Remote Procedure Call (RPC) with function adapters and binary messages, to Web Services, where a main program invokes remote functions as services via SOAP/XML.]
FIGURE 22.5 Evolution of programming technology.
Because all these are open and noninvasive standards and protocols, messages can go through enterprise firewalls. In business terms, this means that services can be delivered and paid for as fluid streams of messages, as opposed to the economic models of packaged software products, which in most cases entail license, operation, and maintenance costs. Using Web Services, a business can transform assets into licensable, traceable, and billable products running on its own or third-party computers. The subscribers can be other applications or end users. Moreover, Web Services are technology-independent and rely on a loosely coupled Web programming model. They can therefore be accessed by a variety of connecting devices able to interpret XML or HTML messages (PCs, PDAs, mobile phones). Figure 22.6 gives an overview of what the next generation of applications is going to look like, i.e., "Plug and Play" Web Services over the Internet.
22.3.4 Metadata Registries

Metadata Registries (MDRs) are database systems used to collect and manage descriptive and contextual information (i.e., metadata) about an organization's information assets [Lavender, 2003; Metadata Registries, 2003]. In layman's terms, these are databases for metadata (i.e., data about data) used to make sure that interacting systems talk about the same thing. For instance, metadata in national statistical institutes concerns the definition of statistical objects, nomenclatures, code lists, methodological notes, etc., so that statistical information can be interpreted the same way by all institutes. Metadata Registries constitute another fundamental building block of enterprise architectures for achieving enterprise interoperability. For instance, UDDI, as used for Web Services registration, is a kind of MDR. An LDAP database (i.e., a database for the Lightweight Directory Access Protocol, used to locate organizations, individuals, and other resources such as files and devices in a network, whether on the public Internet or on a corporate intranet) is another kind. A thesaurus of specialized terms and a database of code lists in a specific application domain are two other examples. The main roles of Metadata Registries are:
• To enable publishing and discovery of information (e.g., user lists, code lists, XML files, DTDs, UML models, etc.) or services (e.g., Web Services)
• To allow organizations to locate relevant business process information
• To provide content management and cataloging services (repository)
• To provide services for sharing information among systems
[Figure: end users on PCs, PDAs, and GSM/WAP phones exchange XML (or HTML) messages with business logic exposed as Web Services; internal, private, and public Web Services, OS services, and back-end data servers communicate over open Internet protocols (HTTP, SMTP, XML, SOAP), forming globally available, federated Web Services.]
FIGURE 22.6 Web Services and business applications.
Depending on their scope, three types of registries can be defined:
• Private scope registries: access is strictly limited to a few applications or groups of users (for instance, business application integration catalogs).
• Semi-private scope registries: industry- or corporation-specific (for instance, an enterprise portal catalog of users).
• Public scope registries: unrestricted access.
Because of their ability to describe the context, structure, and life cycle stages of operational data as well as of system components and their states (e.g., modules, services, applications), MDRs can play an essential role as semantic mediators at the unification level of systems interoperability. This role is reinforced by a number of recently published standards concerning the creation and structure of registries (e.g., ISO/IEC 11179 and ISO/IEC 20944, the OASIS/ebXML Registry Standard) or their contents (e.g., ISO 704, ISO 1087, ISO 16642, Dublin Core Registries) [Metadata Registries, 2003].
22.3.5 OMG Model-Driven Architecture (MDA)

The Object Management Group (OMG) has contributed a number of IT interoperability standards and specifications, including the Common Object Request Broker Architecture (CORBA), the Unified Modeling Language (UML), the Meta Object Facility (MOF), XML Metadata Interchange (XMI), and the Common Warehouse Metamodel (CWM). Details on each of these standards can be found on the Web at www.omg.org. However, recent IT trends have led OMG to two realizations. First, there are limits to the interoperability level that can be achieved by creating a single set of standard programming interfaces. Second, the increasing need to incorporate Web-based front ends and to make links to business partners, who may be using proprietary interface sets, can force integrators back to low-productivity activities of writing glue code to hold multiple components together. OMG has, therefore, produced a larger vision necessary to support increased interoperability, with specifications that address integration through the entire system life cycle: from business modeling to system design, to component construction, to assembly, integration, deployment, management, and evolution. The vision is known as OMG's Model Driven Architecture (MDA), which is described in [OMG, 2001, 2003] and from which this section is adapted. MDA first of all defines the relationships among OMG standards and how they can be used today in a coordinated fashion. Broadly speaking, MDA defines an approach to IT system specification that separates the specification of system functionality from the specification of the implementation of that functionality on a specific platform. To this end, the MDA defines an architecture for models that provides a set of guidelines for structuring specifications expressed as models. As defined in [OMG, 2001], "a model is a representation of a part of the function, structure, and/or behavior of a system (in the system-theoretic sense and not restricted to software systems). A platform is a software infrastructure implemented with a specific technology (e.g., Unix platform, CORBA platform, Windows platform)." Within the MDA, an interface standard (for instance, a standard for the interface to an Enterprise Resource Planning (ERP) system) would include a Platform-Independent Model (PIM) and at least one Platform-Specific Model (PSM). The PIM provides formal specifications of the structure and function of the system that abstract away technical details, while the PSM is expressed in terms of the specification model of the target platform. In other words, the PIM captures the conceptual design of the interface standard, regardless of the special features or limitations of a particular technology. The PSM deals with the "realization" of the PIM in terms of a particular technology (for instance, CORBA), wherein target platform concepts such as exception mechanisms, specific implementation languages, parameter formats and values, etc. have to be considered. Figure 22.7 illustrates how OMG standards fit together in MDA. It provides an overall framework within which the roles of various OMG and other standards can be positioned.
[Figure: the MDA core of UML, MOF, CWM, and XMI/XML, surrounded by platform and service layers (CORBA, Java/EJB, .NET, Web Services, directory services, events, transactions, security, and other pervasive services) and, in the outer ring, application domains such as finance, manufacturing, e-commerce, telecom, space, transportation, and health care.]
FIGURE 22.7 OMG’s Model Driven Architecture.
The core of the architecture (inner layer) is based on OMG's modeling standards: UML, the MOF, and CWM. It comprises a number of UML profiles for different computing aspects. For example, one will deal with Enterprise Computing, with its component structure and transactional interaction (business processes); another will deal with Real-Time Computing, with its special needs for resource control. The total number of profiles in MDA is planned to remain small. Each UML profile is intended to represent common features to be found on all middleware platforms for its category of computing. Its UML definition must be independent of any specific platform. The first step when developing an MDA-based application is to create a PIM of the application, expressed in UML using the appropriate UML profile. Then, the application model (i.e., the PIM) must be transformed into a platform-specific UML model (i.e., a PSM) for the targeted implementation platform (e.g., CORBA, J2EE, COM+, or .NET). The PSM must faithfully represent both the business and technical run-time semantics in compliance with the run-time technology to be used. This is expressed by the second inner layer in Figure 22.7. Finally, all applications, independent of their context (Manufacturing, Finance, e-Commerce, Telecom), rely on some essential services. The list varies somewhat depending on the application domain and IT context but typically includes Directory Services, Event Handling, Persistence, Transactions, and Security. This is depicted by the outer layer of Figure 22.7, which makes the link with application domains. The MDA core, based on OMG technologies, is used to describe PIMs and PSMs. Both PIMs and PSMs can be refined repeatedly in the course of their development until the desired level of system description and model consistency is obtained. These core technologies include:
• UML (Unified Modeling Language): It is used to model the architecture, objects, interactions between objects, and data modeling aspects of the application life cycle, as well as the design aspects of component-based development, including construction and assembly.
• XMI (XML Metadata Interchange): It is used as the standard interchange mechanism between various tools, repositories, and middleware. It plays a central role in MDA because it marries the worlds of modeling (UML), metadata (MOF and XML), and middleware (UML profiles for Java, EJB, CORBA, .NET, etc.).
• MOF (Meta Object Facility): Its role is to provide the standard modeling and interchange constructs that are used in MDA. These constructs can be used in UML and CWM models. This common foundation provides the basis for model/metadata interchange and interoperability, and is the mechanism through which models are exchanged via XMI.
• CWM (Common Warehouse Metamodel): This is the OMG data warehouse standard. It covers the full life cycle of designing, building, and managing data warehouse applications.
22.4 The J2EE Architecture

The Java language was introduced by Sun Microsystems in the 1990s to cope with the growth of Internet-centric activities and was soon predicted to change the face of software development as well as the way industry builds software applications. This has indeed happened, especially for Internet applications, as many key players in the software industry have joined the bandwagon, and there is now a large and growing Java community of programmers around the world, with its specific Java jargon. Although robust, high-performing, open, portable, and based on object-oriented libraries and dynamic linking, Java by itself cannot fulfill all the integration, interoperation, and other IT needs of industry. The J2EE (Java 2 Platform, Enterprise Edition) application server framework has been proposed as a de facto standard for building complex applications or IT infrastructures using Java. A J2EE-compliant platform is supposed to preserve previous IT investments, be a highly portable component framework, deal with heterogeneity of applications, offer scalability and distributed transactions, and, as much as possible, ensure security and a high degree of availability. This standard has been accepted by industry, and most software and computer vendors provide J2EE-compliant application server environments, such as WebLogic by BEA Systems, WebSphere by IBM, Internet Operating Environment (IOE) by HP, or iAS by Oracle, to name only four.
22.4.1 The J2EE Layered Approach

The J2EE framework [SUN Microsystems, 2001] adopts a three-layer approach to building n-tier applications, as illustrated by Figure 22.8. It assumes that these applications execute on a Java Virtual Machine (JVM) and are thus, in principle, independent of any specific operating system intricacies. The framework is made of three layers:
• Presentation Layer. Deals with the connection to the outside environment and the way information is presented to users and client applications.
[Figure: Web browsers and client applications sit on top of the Presentation Layer, the Business Object Layer (EJB), and the Back-End Layer (JCA), which run on a Java Virtual Machine and connect to data servers.]
FIGURE 22.8 The J2EE layered approach.
• Business Object Layer. Performs the business processing (or application server part) offered by the Java application to be developed. The application logic is programmed by means of Enterprise Java Beans, or EJBs. An EJB is a (reusable) platform-independent software component written in Java, with self-descriptive capabilities, invoked via its methods.
• Back-End Layer (or JCA, for J2EE Connector Architecture). Deals with the connection to back-end systems such as database servers, ERP systems, remote systems, or other J2EE or non-J2EE platforms.
This three-layered approach is implemented according to the architecture specified by the J2EE Container Model. This model defines a number of architectural parts, called Containers, and some satellite components (JMS, JNDI, JDBC).
22.4.2 J2EE Container Model

The J2EE Container Model provides a global view of the elements that need to be developed or used when building the three layers of a J2EE platform (Figure 22.9). A container is an interface between a component and a specific platform functionality. Container settings customize the underlying support provided by the J2EE Server, including services such as transaction management, security, naming and directory services, or remote connectivity. The role of the container is also to manage other services such as component life cycle, database connection, resource pooling, and so forth. The model defines three types of containers: Application Client Container, Web Container, and EJB Container. On the client side (Client Machine), access to the J2EE Server can be made by means of a Web browser via HTTP in the case of a Web-based application, for instance to present HTML pages to the client, or by an application program encapsulated in an Application Client Container. In the latter case, the application can access HTML pages (i.e., talk to the Web Container), or request functions written in Java (i.e., invoke the EJB Container), or both. On the server side, the J2EE Server is the middleware that supports communication with a Web client by means of server-side objects called Web components (Servlets and Java Server Pages). Web components are activated in the Web Container environment. In addition to Web page creation and assembly, the role of the Web Container is to provide access to built-in services of the J2EE platform such as request dispatching, security authorization, concurrency and transaction management, remote connectivity, or life cycle management. These services are provided by components of the EJB Container.
[Figure: a client machine hosts a browser and an Application Client Container; the J2EE Server contains a Web Container (servlets and JSP pages) and an EJB Container (enterprise beans) on top of the operating system, with satellite components JMS, JNDI, and JDBC and a connection to a database.]
FIGURE 22.9 The J2EE Container Model and satellite components.
These components are called Enterprise Java Beans (EJBs). Because an EJB is a server-side Java component that encapsulates some piece of business logic, it can be accessed by a client application through one of the methods defined in the bean's interface.
22.4.3 Web Container

As mentioned earlier, the Web Container is an environment of the J2EE Server that supports Web-based communication (usually via the HTTP protocol) with a client (browser or Web application). There are two types of Web components in a Web Container: Java servlets and Java Server Pages (JSPs). A servlet is a Java programming language class used to extend the capabilities of servers that host applications accessed via a request-response programming model. Although servlets can respond to any type of request, they are commonly used to extend the applications hosted by Web servers. For such applications, the Java Servlet technology defines HTTP-specific servlet classes. Java Server Pages (JSPs) are easier to use than servlets. JSPs are simple HTML files containing embedded code in the form of scripts (HTML files with the .jsp extension containing clauses of the form <% script code %>). They allow the creation of Web content that has both static and dynamic components. JSPs can provide all the dynamic capabilities of servlets, but they provide a more natural approach for the creation of static content. The main features of JSPs are:
• A language for developing JSP pages, which are text-based documents that describe how to process a request and construct a response.
• Constructs for accessing server-side objects.
• Mechanisms to define extensions to the JSP language in the form of custom tags. Custom tags are usually distributed in the form of tag libraries, each one defining a set of related custom tags and containing the objects that implement the tags.
In developing Web applications, servlets and JSP pages can be used interchangeably. Each one has its strengths:
• Servlets are best suited for the management of the control functions of an application, such as dispatching requests, and for handling non-textual data.
• JSP pages are more appropriate for generating text-based markup languages.
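As a rough illustration of the request-response model, the following minimal servlet sketch generates an HTML page in response to an HTTP GET request. The class and parameter names are illustrative, and the servlet would still have to be mapped to a URL in the Web application's deployment descriptor (web.xml).

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// The Web Container instantiates the servlet and calls doGet() for each GET request.
public class HelloServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String name = request.getParameter("name"); // query parameter, may be null
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html><body>");
        out.println("<h1>Hello, " + (name != null ? name : "world") + "!</h1>");
        out.println("</body></html>");
    }
}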
22.4.4 EJB Container

Enterprise Java Beans (EJBs) are both portable and reusable components for creating scalable, transactional, multiuser, and secure enterprise-level applications. These components are Java classes dealing with the business logic of an application and its execution on the server side. They are invoked by clients by means of the methods declared in the bean's interface (via Remote Method Invocation, or RMI, a principle similar to RPC). Clients can discover the bean's methods by means of a self-description mechanism called introspection: standard methods allow an automatic analysis of the bean, which, thanks to its syntactic structure, describes itself. Because they are written in Java, programming with EJBs makes the application code very modular and platform-independent and, hence, simplifies the development and maintenance of distributed applications. For instance, EJBs can be moved to a different, more scalable platform should the need arise. This allows "plug and work" with off-the-shelf EJBs without having to develop them or have any knowledge of their inner workings. The role of the EJB Container is to provide system-level services such as transaction management, life-cycle management, security authorization, and connectivity support to carry out application execution. Thus, the client developer does not have to write "plumbing" code, i.e., routines that implement transactional behavior, access to databases, or access to other back-end resources, because these are provided as built-in components of the EJB Container. The EJB Container can contain three types of EJBs: Session Beans, Entity Beans, and Message-Driven Beans.
Session Beans: A session bean is an EJB that acts as the counterpart of a single client inside the J2EE Server. It performs work for its client, shielding the client from server-side complexity. To access applications deployed on the server, the client invokes the session bean's methods. There are two types of session beans: stateful and stateless.
• In a stateful session bean, the instance variables represent the state of a unique client-bean session for the whole duration of that session.
• A stateless session bean does not maintain any conversational state for a client beyond the duration of a method invocation. Except during method invocation, all instances of a stateless bean are equivalent, thereby allowing better scalability.
Entity Beans: An entity bean represents a business object in a persistent storage mechanism, e.g., an RDBMS. Typically, entity beans correspond to an underlying table, and each instance of the bean corresponds to a row in that table. Entity beans differ from session beans in several ways: they are persistent, allow shared access, have primary keys, and may participate in relationships with other beans. There are two types of persistence for entity beans: bean-managed (BMP) and container-managed (CMP). Being part of an entity bean's deployment descriptor (DD), the abstract schema defines the bean's persistent fields and relationships and is referenced by queries written in the Enterprise Java Beans Query Language (EJB QL).
Message-Driven Beans (MDBs): An MDB is an EJB that allows J2EE applications to process messages asynchronously by acting as a Java Message Service (JMS) listener (see the next section). Messages may be sent by any J2EE component or even by a system that does not use J2EE technology. An MDB has only a bean class (no client-visible interfaces); its instance variables can contain state across the handling of client messages.
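The following sketch shows the three pieces that make up a stateless session bean under the EJB 2.x model described here: a remote interface, a home interface, and the bean class whose life cycle callbacks are driven by the container. All names are illustrative, and the deployment descriptor that ties them together is omitted.

import java.rmi.RemoteException;
import javax.ejb.CreateException;
import javax.ejb.EJBHome;
import javax.ejb.EJBObject;
import javax.ejb.SessionBean;
import javax.ejb.SessionContext;

// Remote interface: the business methods a client may invoke via RMI.
interface CurrencyConverter extends EJBObject {
    double toEuro(double amount, String currency) throws RemoteException;
}

// Home interface: used by clients (after a JNDI lookup) to obtain bean instances.
interface CurrencyConverterHome extends EJBHome {
    CurrencyConverter create() throws CreateException, RemoteException;
}

// Bean class: the container manages its life cycle through the SessionBean callbacks.
public class CurrencyConverterBean implements SessionBean {
    public double toEuro(double amount, String currency) {
        // Hypothetical fixed rate; a real bean would look the rate up.
        return "USD".equals(currency) ? amount * 0.85 : amount;
    }

    public void ejbCreate() {}
    public void ejbActivate() {}
    public void ejbPassivate() {}
    public void ejbRemove() {}
    public void setSessionContext(SessionContext context) {}
}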
22.4.5 Java Message Service (JMS)

The Java Message Service (JMS) is an application programming interface (the JMS API) providing a reliable, flexible service for the asynchronous exchange of critical business data and events throughout the enterprise. It provides queuing facilities that allow a J2EE application to create, send, receive, and read messages to/from another application (J2EE or not) using either a point-to-point or a publish/subscribe policy (Figure 22.10). JMS, which has been part of J2EE platforms since version 1.3, has the following features:
• Application clients, EJBs, and Web components can send or synchronously receive a JMS message
• Application clients can, in addition, receive JMS messages asynchronously
• MDBs enable the asynchronous consumption of messages; a JMS provider may optionally implement concurrent processing of messages by MDBs
• Messages sent and received can participate in distributed transactions
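A point-to-point exchange might look like the following sketch, which sends a text message to a queue using the JMS 1.1 API. The JNDI names of the connection factory and the queue are illustrative; they are administered objects whose actual names depend on the server configuration.

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

public class OrderSender {
    public static void main(String[] args) throws Exception {
        // Look up the administered objects in JNDI.
        InitialContext jndi = new InitialContext();
        ConnectionFactory factory = (ConnectionFactory) jndi.lookup("jms/ConnectionFactory");
        Queue queue = (Queue) jndi.lookup("jms/OrderQueue");

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);

            // The message body could just as well be an XML document.
            TextMessage message = session.createTextMessage("<order><item>book</item></order>");
            producer.send(message);
        } finally {
            connection.close();
        }
    }
}

On the receiving side, a message-driven bean's onMessage() method would be invoked asynchronously by the container for each message placed on the queue.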
22.4.6 Java Naming and Directory Interface (JNDI)

The Java Naming and Directory Interface (JNDI) is a J2EE resource location service accessible via its application programming interface (the JNDI API). It provides naming and directory functionality to applications.
[Figure: an application client sends messages to a message queue; message-driven bean (MDB) instances in the EJB Container of the J2EE Server receive them.]
FIGURE 22.10 Java Message Service (JMS) principles.
[Figure: a Java application calls the JNDI API; a naming manager dispatches requests through the JNDI SPI to one of several possible directory implementations, such as LDAP, DNS, NIS, NDS, RMI, or CORBA.]
FIGURE 22.11 The JNDI layered approach.
JNDI has been designed to be independent of any specific directory service implementation (e.g., LDAP, DNS, UDDI). It uses a layered approach (Figure 22.11) in which the JNDI SPI (Service Provider Interface) adapts the API to the selected directory implementation (e.g., LDAP, DNS, CORBA). A JNDI name is a user-friendly name for an object. Each name is bound to its corresponding object by the naming and directory service provided by the J2EE Server. This way, a J2EE component can locate objects by invoking the JNDI lookup method. Note, however, that the JNDI name of a resource and the name of the resource referenced are not the same. This allows a clean separation between code and implementation.
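In code, a lookup is a single call against the initial naming context, as in the following sketch; the logical name used here is illustrative and would be declared in the deployment descriptor and mapped to a real resource at deployment time.

import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public class ResourceLocator {
    public static DataSource findOrderDataSource() throws NamingException {
        // The initial context is configured by the container (or by jndi.properties
        // when running outside a container).
        Context ctx = new InitialContext();

        // "java:comp/env/jdbc/OrdersDS" is a logical, user-friendly name; the code
        // stays independent of the actual directory entry it is bound to.
        return (DataSource) ctx.lookup("java:comp/env/jdbc/OrdersDS");
    }
}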
22.4.7 Java Database Connectivity (JDBC)

The Java Database Connectivity (JDBC) API is another application programming interface provided by the J2EE Server, giving standard access to any relational database management system (RDBMS). It has been designed to be independent of any specific database system. The API is used to encode access request statements in standard Structured Query Language (SQL), which are then passed to the program that manages the database; the results are returned through a similar interface.
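The following sketch shows the typical pattern against the standard JDBC API: obtain a connection from a data source, send a parameterized SQL statement, and read the result set. The data source name, table, and columns are illustrative.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class CountryCodeDao {
    public int lookupCode(String country) throws Exception {
        DataSource ds = (DataSource) new InitialContext().lookup("java:comp/env/jdbc/OrdersDS");
        try (Connection con = ds.getConnection();
             PreparedStatement stmt =
                 con.prepareStatement("SELECT code FROM country_codes WHERE name = ?")) {
            stmt.setString(1, country);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getInt("code") : -1; // -1 signals "not found"
            }
        }
    }
}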
22.4.8 J2EE Application Architectures

Figure 22.12 presents what could be a typical J2EE application architecture. Many of the components mentioned in Figure 22.12 have been presented earlier. The Web browser talks to the servlet/JSP components, exchanging XML- or HTML-formatted messages via HTTP, possibly over HTTPS, a connection secured with SSL (Secure Sockets Layer). Other applications (either internal or external to the enterprise) can access the application server via an RMI/IIOP connection, i.e., by Remote Method Invocation using the Internet Inter-ORB (Object Request Broker) Protocol. Legacy systems can be connected to the application server via the J2EE Connector Architecture (JCA), in which Java Beans handle the communication with each legacy system. (This is made much easier when the legacy system has a properly defined API.) The role of the JCA is to ensure connection management, transaction management, and security services.
22.5 The .NET Architecture

Microsoft's .NET architecture (DNA or .NET) is the counterpart, in the Windows world, of the J2EE technology in the Java world [Microsoft, 2001]. The aim is to provide a software technology for connecting information, people, systems, and various types of devices using various communication media (e.g., e-mail, faxes, telephones, PDAs).
[Figure: CORBA clients, Java applications (Swing, AWT), and Java applets reach the EJB/application server via RMI/IIOP; Web browsers exchange XML/HTML with servlet/JSP components over HTTP, optionally secured with SSL; EJB session and entity beans access an RDBMS through JDBC; legacy systems connect through the J2EE Connector; JMS and JNDI are available as satellite services; the whole stack runs on a variety of operating systems (OS/2, Mac OS, Windows, AIX, Solaris, OS/390, HP/UX, etc.).]
FIGURE 22.12 Typical J2EE n-tier application architecture.
.NET is intended to make the entire range of computing devices work together and to have user information automatically updated and synchronized on all of them. It will provide centralized data storage (for wallet, agenda, contacts, documents, services, etc.), which will increase efficiency and ease of access to information, as well as synchronization among users and services (see Figure 22.6).
22.5.1 Basic Principles

According to Microsoft, .NET is both a vision, i.e., a new platform for the digital era, and a set of technologies and products to support Windows users in implementing their open enterprise architectures. This vision fully relies on a Web Service strategy and the adoption of major Internet computing standards (more specifically, HTTP, SMTP, XML, and SOAP). A Web Service is defined by Microsoft as a programmable entity that provides a particular element of functionality, such as application logic, and is accessible to a number of potentially disparate systems through the use of Internet standards. The idea is to provide individual and business users with a seamlessly interoperable and Web-enabled interface for applications and computing devices, making computing activities increasingly Web browser-oriented. The .NET platform includes servers, building-block services (such as Web-based data storage or e-business facilities), developer tools, and device software for smart clients (PCs, laptops, personal digital assistants, WAP or UMTS phones, video-equipped devices, etc.), as illustrated by Figure 22.13. As seen in Figure 22.13, smart clients (application software and operating systems) enable PCs and other smart computing devices to act on XML Web Services, allowing anywhere, anytime access to information. XML Web Services, whether internal, private, or public, are small, reusable applications interacting in XML. They are being developed by Microsoft and others to form a core set of Web Services (such as authentication, contact lists, calendaring, wallet management, ticketing services, e-business shops, news providers, and weather forecast services) that can be combined with other services or used directly by smart clients. Developer tools include Microsoft Visual Studio and the Microsoft .NET Framework as a complete solution for developers to build, deploy, and run XML Web Services. Servers can belong to the MS Windows 2000 server family, the .NET Enterprise Servers, and the upcoming Windows Server 2003 family to host the Web Services or other back-end applications.
[Figure: .NET smart clients (home PCs, portable PCs, PDAs, GSM phones, local and corporate servers) consume XML Web Services that are built with developer tools and hosted on servers, communicating through open standards: XML, WSDL/SOAP, UDDI, HTTP/SMTP, and TCP/IP.]
FIGURE 22.13 Principles of the .NET architecture.
22.5.2 .NET Application Architectures

.NET, like J2EE platforms, clearly separates information presentation and layout in browsers from the business logic handled by the application server platform (Figure 22.14). .NET extends Microsoft's classical OLE-COM-DCOM-COM+ development chain for application integration technology. The counterparts of EJBs in Microsoft environments are ActiveX and COM+ objects. COM+ components are activated in the Microsoft Transaction Server (MTS), the counterpart of the EJB Server. The connection with Web browsers is managed by the Microsoft Web server, called Internet Information Server (IIS), and its Active Server Pages (ASP.NET, formerly called ASP). ASP.NET allows a Web site builder to build Web pages dynamically, on the fly, as required by clients, by inserting elements in Web pages that let their contents be dynamically generated. These elements can be pure presentation elements or queries to a relational database, but they can also result from business process execution carried out by ActiveX business components. ASP.NET supports code written in compiled languages such as Visual Basic, C++, C#, and Perl. ASP.NET files can be recognized by their .aspx extension. The application can serve calling Web Services and can itself call supporting Web Services to do its job. Figure 22.14 presents what could be a typical .NET application architecture, to be compared with the J2EE one in Figure 22.12. Some similarities between the two architectures are obvious, such as JNDI, which becomes Active Directory; JMS, which is replaced by MSMQ, the Microsoft message queuing system; and ODBC (Open Database Connectivity), which plays a role similar to JDBC. (Microsoft also offers ActiveX Data Objects [ADO] and OLE DB solutions to access data servers.) External applications can talk to the application server via DCOM if they are developed with Microsoft technology, or through a bridge to DCOM otherwise. For instance, a CORBA client needs a CORBA/COM bridge to transform CORBA objects into COM objects. Users accessing the system with a Web browser to interact through XML or HTML pages can be connected by an HTTP/HTTPS connection. In this case the session is managed by the IIS/ASP module of the architecture previously mentioned.
22.5.3 Microsoft Transaction Server (MTS) and Language Runtime

The Microsoft Transaction Server (MTS), based on the Component Object Model (COM), which is the middleware component model for Windows NT, is used for creating scalable, transactional, multiuser, and secure enterprise-level server-side components. It is a program that manages application and database transaction requests, or calls to Web Services, on behalf of a client computer user.
[Figure: CORBA clients reach the MTS application server through a CORBA/COM bridge; ActiveX controls and other applications connect via DCOM; Web browsers exchange XML/HTML with IIS/ASP over HTTP, optionally secured with SSL; COM+ components in MTS access the RDBMS via ADO, OLE DB, or ODBC; MSMQ, Active Directory, a shared property manager, and a Babylon integration server connecting to legacy systems complete the picture, all running on Windows 2000/XP.]
FIGURE 22.14 Typical Microsoft .NET n-tier application architecture.
The Transaction Server screens the user and client computer from the need to formulate requests for unfamiliar databases and, if necessary, forwards the requests to database servers or to other types of servers. It also manages security, connection to servers, and transaction integrity. MTS can also be defined as a component-based programming model (for managed components). An MTS component is a type of COM component that executes in the MTS run-time environment. MTS supports building enterprise applications using ready-made MTS components and allows "plug and work" with off-the-shelf MTS components developed by component developers. It is important to realize that MTS is a stateless component model, whose components are always packaged as an in-proc DLL (Dynamic Link Library). Since they are made of COM objects, MTS components can be implemented in a wide variety of languages, including C++, Java, Object Pascal (Delphi), Visual Basic, and even COBOL. Business objects can therefore be implemented in these various languages as long as they are compatible with the Common Language Runtime (CLR), the controlled environment that executes transactions of the Microsoft Transaction Server. The difference with J2EE servers is that in Microsoft's .NET application servers the business objects (COM+ components) are compiled into an intermediate language called MSIL (Microsoft Intermediate Language). MSIL is a machine-independent language that is compiled into machine code by a just-in-time compiler rather than interpreted, as with Java. This peculiarity makes the approach much more flexible and much faster at run time than Java-based systems. Sitting on top of the CLR, MTS allows the easy deployment of applications based on the three-tier architecture principle using objects from COM/COM+ libraries. It also allows deployment of DCOM, the Distributed Component Object Model of Microsoft COM. It is enough to install MTS on a machine to be able to store and activate COM components, for instance as a DLL if the component is written in Visual Basic. These components can be accessed by a remote machine by registering them on this new server. Like EJBs, these components are able to describe all their interfaces when called by specific methods. The components can be clustered into groups, each group executing a business process. Such a process group is called an MTS package. Each package can be maintained separately, which ensures its independence and its isolation from the transactional computation point of view. MTS ensures the management of transactions and packages.
For instance, frequently solicited packages are automatically kept in main memory, while less frequently used packages are moved to permanent storage.
22.5.4 Microsoft Message Queue (MSMQ)

MSMQ is the message-oriented middleware provided by Microsoft within .NET. It is the analog of JMS in J2EE. The Microsoft Message Queue Server (MSMQ) provides a simple, reliable, and scalable means of asynchronous communication, freeing client applications to do other tasks without waiting for a response from the other end. It provides loosely coupled and reliable network communication services based on a message queuing model. MSMQ makes it easy to integrate applications, implement a push-style business event delivery environment between applications, and build reliable applications that work over unreliable but cost-effective networks.
22.5.5 Active Directory

Active Directory is Microsoft's proprietary directory service, a component of the Windows 2000 architecture. Active Directory plays the role of J2EE's JNDI in .NET. It is a centralized and standardized object-oriented storage organization that automates network management of user data, security, and distributed resources, and enables interoperation with other directories (e.g., LDAP). It has a hierarchical organization that provides a single point of access for system administration (regarding management of user accounts, client servers, and applications, for example). It organizes domains into organizations, organizations into organization units, and organization units into elements.
22.6 Comparison of J2EE and .NET

The two technologies, J2EE and .NET, are very similar in their principles. Both are strongly based on new standards, a common one being XML. Figure 22.12 and Figure 22.14 have been intentionally drawn using the same layout to make the comparison more obvious. They differ considerably, however, at the system and programming levels, although both are object-oriented and service-based. While the .NET technology is younger than J2EE and is vendor-dependent, J2EE remains a specification and .NET is a product. However, there are many J2EE-compliant products on the market, and therefore both technologies are operational and open to each other, despite some compatibility problems among J2EE platforms when the standard is not fully respected. From a practical standpoint, .NET offers solutions that are faster to develop and that execute faster, while J2EE offers a stronger component abstraction. In particular, the library interfaces and functionality are well defined and standardized in the J2EE development process. Hence, a key feature of J2EE-based application development is the ability to interchange plug-compatible components of different manufacturers without having to change a line of code. J2EE is recommended, even for Windows environments, when there are many complex applications to be integrated and there is a large base of existing applications to be preserved (legacy systems). However, performance tuning of J2EE platforms (the JVM in particular) remains an issue for experts due to their inherent complexity. This is less critical for the CLR of .NET and its language interface, which recognizes 25 programming languages. However, load balancing is better managed in J2EE products because EJBs can dynamically replicate the state of a session (stateful session entities), while .NET allows session replication for ASP.NET pages but not for COM+ components. Finally, in terms of administration, a .NET platform, thanks to its better integration, is easier to manage than a J2EE platform. Table 22.1 provides a concise comparison of the two types of platform. The main difference lies in the fact that with .NET Microsoft addresses the challenge of openness in terms of interoperability, while Sun addresses it in terms of portability. Experience has shown that .NET is easier to integrate with the existing environment of the enterprise and that the Web Service philosophy is not as straightforward for J2EE platforms.
TABLE 22.1 J2EE and .NET Comparison

J2EE
• J2EE = a specification
• Middle-tier components: EJB (stateless and stateful service components plus data components)
• Naming and directory: JNDI
• J2EE features: portability is the key; supported by key players among middleware vendors (30+); rapidly evolving specifications (J2EE Connector, EJB); strong competencies in object technology required; many operating systems supported and strong portability of the JVM
• J2EE at work: dynamic Web pages: Servlets/JSP; database access: JDBC, SQL/J; interpreter: Java Runtime Environment; SOAP, WSDL, UDDI: yes; de facto industry standard

.NET/DNA
• .NET = a product
• Middle-tier components: COM+ (stateless service components, .NET managed components)
• Naming and directory: Active Directory
• DNA features: interoperability is the key; integrated environment that simplifies its administration; reduced impact of code granularity on system performance; fully dependent on Windows 2000; open to other platforms
• .NET at work: dynamic Web pages: ASP.NET; database access: ADO.NET; interpreter: Common Language Runtime; SOAP, WSDL, UDDI: yes; productivity and performance; reduced time-to-market
22.7 Conclusion: The Global Architecture

Interoperability of enterprise systems is the key to forthcoming e-business and e-government organizations, or X2X organizations, be they business-to-customer (B2C), business-to-business (B2B), business-to-employee (B2E), government-to-citizen (G2C), government-to-employee (G2E), government-to-business (G2B), or government-to-nongovernment (G2NG) relationships. In any case, Enterprise Interoperability (EI), i.e., the ability to make systems work better together, is first of all an organizational issue and only then a technological issue [Vernadat, 1996]. The aim of this chapter was to review emerging technologies for defining and building a sound Corporate Information Systems Architecture on which an open, flexible, and scalable Enterprise Architecture, dealing with the management of business processes, information and material flows, as well as human, technical, and financial resources, can sit. Among these, the J2EE and .NET architectures are going to play a premier role because of their component-oriented approach, their Web Service orientation, and their ability to support execution of interorganizational business processes in a totally distributed operational environment while preserving legacy systems. As illustrated by Figure 22.15, there are two other essential components in the big picture of Internet-based IT systems architecture for achieving enterprise interoperability that have not been discussed in this chapter. These are Enterprise Application Integration (EAI) and Enterprise Information Portals (EIP). Enterprise Application Integration: Implementing business processes requires the integration of new or legacy applications of a different nature (e.g., SQL databases, bills-of-materials processors, ERP systems, home-made analytical systems, etc.). EAI tools and standards facilitate communication, i.e., data and message exchange, among applications within a company or with partner companies. EAI techniques are strongly based on the data neutralization principle by means of common exchange formats (e.g., EDI, STEP, Gesmes, HTML, XML). Enterprise Information Portals: These are front-end systems providing a personalized user interface and a single point of access, based on Web browsing technology, to different internal and external information sources, services, or applications, and to integrate or substitute existing user interfaces for host access or client computing. To complete the global picture of Figure 22.15, a number of features can be added to the architecture, especially at the level of the application server.
[Figure: the global architecture spans a presentation layer, an application server layer, and a back-end layer; users and client/server, Web, and mobile applications enter through an enterprise portal and a request manager; servlet/JSP components with XML/XSL handle requests and responses; common services include authentication/authorization, logging and tracing, session management, profiling, content management, encryption, and a business rule engine backed by business rules and configuration/profiling repositories; an XML gateway and EAI connectors link to back-end systems, data warehouses, and legacy databases and repositories.]
FIGURE 22.15 The global architecture of future IT systems.
One of these is the single sign-on (SSO) facility, which prevents users from having to log on again each time they access another system in the architecture (authentication/authorization module). Another is the content management module, which deals with the storage and management of any kind of electronic document to be accessed by client applications or presented to users within Web pages. Profiling is another very useful feature: it allows different groups of users to be defined and associates specific access rights and specific sets of functionality with each group. In addition, encryption/decryption capabilities have proven to be very useful for protecting confidential or sensitive data. The business rule engine module is a component that controls the execution of business processes and enforces predefined declarative rules about enterprise operations. With J2EE, .NET, Enterprise Portal, and Web Service technologies, the IT community is providing the business and administration communities with the necessary tools and techniques to build interoperable enterprise architectures, i.e., open, expandable, modular, reusable, and cooperative solutions to their integration needs.
References

IST Diffuse Project. Will Web services revolutionize e-Commerce? Proceedings of the Second Annual Diffuse Conference, Brussels, Belgium, February 6, 2002.
Khalaf, R., Curbera, F., Nagy, W., Mukhi, N., Tai, S., and Duftler, M. Web services, In Practical Handbook of Internet Computing (M. Singh, Ed.), CRC Press, Boca Raton, FL, 2005.
Lavender, G. Directory services, In Practical Handbook of Internet Computing (M. Singh, Ed.), CRC Press, Boca Raton, FL, 2005.
Metadata Registries. Web site of Open Forum 2003 on Metadata Registries, http://metadata-stds.org/openforum2003/frmain.htm, 2003.
Microsoft. The Dot Net Architecture, http://www.microsoft.com/net, 2001.
OASIS. UDDI: Universal Description, Discovery, and Integration, http://www.uddi.org, 2002.
OMG. Model Driven Architecture (MDA), Document number ormsc/2001-07-01, Object Management Group, 2001 (http://www.omg.org/mda/).
OMG. MDA Guide Version 1.01, Document number omg/2003-06-01, Object Management Group, June 2003.
Sahai, A. Business processes integration, In Practical Handbook of Internet Computing (M. Singh, Ed.), CRC Press, Boca Raton, FL, 2005.
SUN Microsystems. J2EE: Java 2 Platform, Enterprise Edition, http://java.sun.com, 2001.
Vernadat, F.B. Enterprise Modeling: Principles and Applications, Chapman & Hall, London, 1996.
Vernadat, F.B. Enterprise Integration and Management in Agile Organizations, In Agile Manufacturing (A. Gunasekaran, Ed.), Springer-Verlag, Berlin, 2000.
Wilde, E. Advanced XML technologies, In Practical Handbook of Internet Computing (M. Singh, Ed.), CRC Press, Boca Raton, FL, 2005.
W3C. World Wide Web Consortium, XML: eXtensible Markup Language, http://www.w3.org/xml, 2000a.
W3C. World Wide Web Consortium, XSL: eXtensible Stylesheet Language, http://www.w3.org/Style/XSL, 2000b.
W3C. World Wide Web Consortium, WSDL: Web Service Description Language, http://www.w3.org/TR/wsdl, 2001.
W3C. World Wide Web Consortium, SOAP: Simple Object Access Protocol, http://www.w3.org/TR/SOAP, 2002a.
W3C. World Wide Web Consortium, Web Service, http://www.w3.org/2002/ws, 2002b.
23 XML Core Technologies

CONTENTS
23.1 Introduction
23.2 Core Standards
  23.2.1 XML
  23.2.2 XML Namespaces
23.3 XML Data Models
  23.3.1 XML Information Set (XML Infoset)
  23.3.2 XML Path Language (XPath)
  23.3.3 XML Application Programming Interfaces
23.4 XML Schema Languages
  23.4.1 XML Schema
  23.4.2 RELAX NG
  23.4.3 Document Schema Definition Languages (DSDL)
Erik Wilde
References
23.1 Introduction The Extensible Markup Language (XML) has become the foundation of very diverse activities in the context of the Internet. Its uses range from an XML format for representing Internet Request For Comments (RFC) (created by Rose [1999]) to very low-level applications such as data representation in remote procedure calls. Put simply, XML is a format for structured data that specifies a framework for structuring mechanisms and defines a syntax for encoding structured data. The application-specific constraints can be defined by XML users, and this openness of XML makes it adaptable to a large number of application areas. Ironically, XML today resembles the Abstract Syntax Notation One (ASN.1) of the Open Systems interconnection (OSI) protocols, which at the time it was invented was rejected by the Internet community as being unnecessarily complex. However, XML is different from ASN.1 in that it has only one set of “encoding rules,” the XML document format, which is character-based. There is an ongoing debate whether that is a boon (because XML documents are very bulky and handling binary data is a problem) or a bane (because applications only need to know one encoding format), but so far no alternative encodings have been successful on a large scale. Apart from this difference in encoding rules, XML can be viewed as the “presentation layer” (referring to the OSI reference model) of many of today’s Internetbased applications. XML users can be divided into two camps, the “document-oriented” and the “dataoriented.” Whereas the document-oriented users will probably reject XML as a data presentation format as being too simplistic, the overwhelming majority of XML users — the data-oriented users — will probably agree with this view. In this chapter, XML and a number of accompanying specifications are examined. The topics covered are by no means exhaustive, as the number of XML-related or XML-based technologies is rising practically daily. This chapter discusses the most important technologies and standards; for a more exhaustive and up-to-date list, please refer to the online glossary of XML technologies at http://dret.net/glossary/.
This chapter discusses XML technologies as they are and as they depend on each other. Each of the technologies may be used in very different contexts, and sometimes the context may be very important to decide which technology to use and how to use it. To help make these decisions, a set of general guidelines for using XML in Internet protocols has been published as RFC 3470 by Hollenbeck et al. [2003]. In Section 23.2, XML core technologies that are the foundation of almost all applications of XML today are described. XML’s data model is discussed in Section 23.3, which describes XML on a more abstract level. In many application areas, it is necessary to constrain the usage of XML. This is the task of XML schema languages, which are described in Section 23.4. In Chapter 24, more technologies built on top of XML are discussed, in particular style sheet languages for using XML as the source for presentation, and technologies for supporting XML processing in various environments and application scenarios.
23.2 Core Standards XML itself is a rather simple standard. Its specification is easy to read and defines the syntax for structured documents, the “XML documents,” and a schema language for constraining documents, the Document Type Definition (DTD). The complexity of XML today is a result of the multitude of standards that in one way or the other build on top of XML. In this first section, another core standard is described, which in a way can be regarded as being part of “Core XML” today. This standard is XML Namespaces, a mechanism for making names in an XML environment globally unique. XML Base by Marsh [2001], a specification for the interpretation of relative URIs in XML documents, and XML Inclusions by Marsh and Orchard [2004], as a standard syntax for including documents or document fragments, are also considered a core part of XML, but are not covered in detail here.
23.2.1 XML The XML specification was first released in February 1998 by Bray et al. [1998], followed by a “second edition” in October 2000 and then a “third edition” in February 2004 by Bray et al. The second and third editions of XML 1.0 did not make any substantial changes to the specification; they simply corrected known errors of the preceding editions. XML 1.1 by Cowan [2004] reached recommendation status in early 2004, but is mainly concerned with character set and encoding issues,1 and thus also leaves most of XML as it is. XML, when seen as an isolated specification, is rather simple. It is a subset of the much older Standard Generalized Markup Language (SGML) by the International Organization for Standardization [1986], which has been an ISO standard since 1986. XML, which started out under the title “SGML on the Web,” has been developed by the World Wide Web Consortium (W3C). It was an effort to overcome the limited and fixed vocabulary of the Hypertext Markup Language (HTML) and enable users to define and use their own vocabularies for document structures. It turned out that the result of this effort, XML, also was extremely useful to people outside of the document realm, who started using XML for any kind of structured data, not just documents. Since its very beginning, XML has been used in application areas that were not part of the original design goals, which is the reason for some of the shortcomings of XML in today’s application areas, such as the support of application-oriented datatypes, or some way to make XML documents more compact. Figure 23.1 shows a simple example of an XML document. It starts with an (optional) XML declaration, which is followed by the document’s content, in this case, a book represented by a number of elements. Element names are enclosed in angle brackets, and each element must have an opening and a closing 1 Specifically, XML 1.1 is a reaction to Unicode moving from 2.0 past 3.0. Further, to facilitate XML processing on mainframe computers, the NEL character (U + 85) has been added to XML’s list of end-of-line characters.
XML Core Technologies
23-3
Practical Handbook of Internet Computing … XML Core Technologies The Extensible Markup Language (XML) … XML The XML specification has been first released in February
1998 by , followed by a second edition of the specification in October 2000 … …
… FIGURE 23.1 Example XML 1.0 document.
Elements must be properly nested, so that the structure of elements of the document can be regarded as a tree, with the so-called document element acting as the tree's root. In addition to elements, XML knows a number of other structural concepts, such as attributes (for example, the editor attribute of the book element) and text (the title elements contain text instead of other elements). Elements may also be empty (such as the ref element), in which case they may use two equivalent forms of markup, full (<ref></ref>) or abbreviated (<ref/>). If a document conforms to the syntactic rules defined by the XML specification, it is said to be well-formed.

The XML specification is rather easy to read and, in fact, can serve as a reference when questions about syntactic details of XML arise. DuCharme [1998] has published an annotated version of the specification (the original version, not the later editions), which makes it even easier to read. Further, the W3C provides an increasing number of translated specifications,2 which are intended to increase awareness and global ease of use.

2 http://www.w3.org/Consortium/Translation/

Logically, the XML specification defines two rather separate things:

• A syntax for structured data: An XML document is a document using a markup language as a structuring mechanism. XML markup is a special case of the more generalized model of SGML markup, using a fixed set of markup characters. (The most visible ones are the "<" and ">" characters for delimiting tags.) For HTML users, XML markup is straightforward; XML simply disallows some markup abbreviations that HTML provides through its use of SGML markup minimization features. Specifically, XML always requires full markup (the only exception is an abbreviation for empty elements), so the HTML method of omitting markup (such as element end tags and attribute value delimiters) is not possible. The most important structural parts of XML documents are elements and attributes. Elements can be nested and thus make it possible to create arbitrarily complex tree structures (see Figure 23.1 for an example). XML also allows processing instructions, comments, and some other constructs, but these are less frequently used in XML applications. If an XML document conforms to the syntax defined in the XML specification, it is said to be well-formed.

• A schema language for constraining XML documents: As many applications want to restrict the structural possibilities of XML, for example, by defining a fixed set of elements and their allowed combinations, XML also defines a schema language, the DTD. A DTD defines element types, and the most important aspects of element types are their attributes and content models (see Figure 23.2 for an example). If an XML document conforms to all constraints defined in a DTD, it is said to be valid with regard to this DTD. DTDs are part of XML because they are also part of SGML, and it is this heritage from the document-oriented world that makes it easier to understand the limitations of DTDs. For example, DTDs provide only very weak datatyping support, and these types are applicable only to attributes. With XML's success and its application in very diverse areas, the need for alternative schema languages became apparent, and alternative schema languages were developed (see Section 23.4 for further information).

Even though XML documents and DTDs are not strictly separated (it is possible to embed a DTD or parts of it into an XML document), they should be regarded as logically separate. In a (purely hypothetical) new major version of XML, it is conceivable that DTDs would be left out of the core and become a separate specification.

Figure 23.2 shows an example of a DTD, in this case a DTD for the XML document shown in Figure 23.1. All elements and attributes must be declared. Elements are declared by defining the allowed content, which may be none (for example, the ref element is declared to be EMPTY), only elements (for example, the book element is declared to have as content a sequence of one title and any number of chapter elements), or text mixed with elements (for example, the para element may contain text or emph, quote, or ref elements). The element declarations, in effect, define a grammar for the legal usage of elements. In addition, a DTD defines attribute lists that declare which attributes may or must be used with which elements (for example, a chapter element must have id and date attributes and may have an author attribute). If a document conforms to the definitions contained in a DTD, it is said to be valid with regard to this DTD.

FIGURE 23.2 Example DTD for the XML document of Figure 23.1.
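A sketch of what such a DTD might contain, reconstructed from the description above, is shown below; the content model of the book element is an assumption, and the figure may contain further declarations not repeated here:

<!ELEMENT book    (title, chapter*)>
<!ATTLIST book    editor CDATA #REQUIRED>
<!ELEMENT chapter (title, para+, section*)>
<!ATTLIST chapter id     ID    #REQUIRED
                  date   CDATA #REQUIRED
                  author CDATA #IMPLIED>
<!ELEMENT section (title, para+)>
<!ELEMENT title   (#PCDATA)>
<!ELEMENT para    (#PCDATA | emph | quote | ref)*>
<!ELEMENT emph    (#PCDATA | quote | ref)*>
<!ELEMENT quote   (#PCDATA | emph)*>
<!ELEMENT ref     EMPTY>
<!ATTLIST ref     id ID #REQUIRED>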
To summarize, XML itself is a rather simple specification for structured documents (using a markup language) and for a schema language to define constraints for these documents.
23.2.2 XML Namespaces

XML Namespaces, defined by Bray et al. [1999], are a mechanism for associating a URI with one or more names. A new version of the specification by Bray et al. [2004] reached recommendation status in early 2004. The URI of a namespace serves as a globally unique identifier, which makes it possible to refer to the names in the namespace in a globally unique way. Namespaces are important when XML documents may contain names (elements and attributes) from different schemas. In this case, the names must be associated with a namespace, and XML Namespaces use a model of declarations and references to these declarations.

Figure 23.3 shows an example of using namespaces. It uses the XML Linking Language (XLink) defined by DeRose et al. [2001], which has a vocabulary for embedding linking information into XML documents. XLink is based on namespaces; all information in a document that is relevant for XLink can be easily identified through belonging to the XLink namespace. In order to use a namespace, it must be declared in the XML document using attribute syntax (xmlns:xlink="http://www.w3.org/1999/xlink"). The namespace declaration binds a prefix to a namespace name, in this case the prefix xlink to the namespace name http://www.w3.org/1999/xlink. This namespace name is defined in the XLink specification, and thus serves as an identifier that all names referenced from this namespace have the semantics defined in the XLink specification. An element or attribute is identified as belonging to a namespace by prefixing its name with a namespace prefix, such as in xlink:type="simple". In this case, the application interpreting the document knows that this attribute is the type attribute from the http://www.w3.org/1999/xlink namespace, and can take the appropriate action.

Namespace declarations are recursively inherited by child elements and can be redeclared (in XML Namespaces 1.1, they can also be undeclared). There also is the concept of the default namespace, which is not associated with a prefix (the default namespace is the only namespace that can be undeclared in XML Namespaces 1.0). It is important to note that namespace prefixes have only local significance; they are the local mechanism that associates a name with a namespace URI. XML Namespaces impose some additional constraints on XML documents, the most important being that colons may not be used within names (they are reserved for namespace declarations and qualified names) and that the use of namespace declarations and qualified names must be consistent (e.g., no undefined prefixes are used). As namespaces are used in almost all XML application areas today, care should be taken that all XML documents are namespace-compliant.

FIGURE 23.3 Namespace example.
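A sketch of a namespace-using document of the kind shown in Figure 23.3 might look as follows; the namespace declaration and the xlink:type attribute follow the discussion above, while the link element name and the xlink:href value are assumptions:

<?xml version="1.0"?>
<book editor="..." xmlns:xlink="http://www.w3.org/1999/xlink">
  <title>Practical Handbook of Internet Computing</title>
  <chapter id="core" date="..." author="Erik Wilde">
    <title>XML Core Technologies</title>
    <para>... <link xlink:type="simple"
                    xlink:href="http://www.w3.org/TR/REC-xml"/> ...</para>
  </chapter>
</book>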
23.3 XML Data Models

Although XML defines a syntax for representing structured data, from the application point of view the goal in most cases is to access that structured data and not the syntax. In fact, much of XML's success is based on the fact that application programmers never have to deal with XML syntax directly, but can use existing tools to handle the syntax. Thus, from the application point of view, XML is often used in abstractions of various levels. For example, in most applications, whitespace in element tags is irrelevant, but for programmers building an XML editor, which should preserve the input without any changes, this whitespace is significant and must be accessible through the XML tools they are using.

According to the terminology specified by Pras and Schönwälder [2003], XML itself implements a data model. At a more abstract level, there is the question about XML's information model (to quote Pras and Schönwälder [2003], "independent of any specific implementations or protocols used to transport the data"). There is an ongoing debate about whether XML should have had an information model and not just a syntax, but there is no simple answer to this question, and XML's success demonstrates that a data model without an information model can indeed be very successful. However, from the developers' point of view, this answer is unsatisfactory. The question remains: What is relevant in an XML document and what is not? The relevance of attribute order or of whitespace in element tags is one example of this question, and another is whether character references are preserved or resolved to characters.

As XML itself does not define an information model, and as many other standards emerged in the XML world, it became apparent that the same questions about the information model come up again and again. In an effort to provide a solution to this problem, the W3C defined the XML Information Set, which is used by a number of W3C specifications as an information model. However, there still are several other data models in use, and it is unlikely that there will be a single and universally accepted XML information model soon. In the following sections, some of the more important information models are described.
23.3.1 XML Information Set (XML Infoset)

The XML Information Set (XML Infoset) defined by Cowan and Tobin [2001] was the first attempt to solve the problem of different views of XML's information model in different application areas. The Infoset is not intended to be a specification that is directly visible to XML users; it is intended to be used by specification writers as a common XML information model. The Infoset defines 11 types of information items (such as element and attribute items), which are each described by a number of properties. The Infoset is defined informally, and does not prescribe any particular data format or API.3 The Infoset also explicitly states that users are free to omit parts of it or add parts to it, even though it is unclear how this should be done.

3 Seen in this way, an XML document is just one way to represent the information specified by the Infoset.

The Infoset omits a number of syntax-specific aspects of an XML document, such as whitespace in element tags and attribute order. The Infoset specification contains a list of omissions; some of them are minor, whereas others are important and considered by many people to be problematic. Among these omissions are CDATA sections, the entity structure of the XML document, and character references.

Figure 23.4 shows an example of an XML Infoset. It depicts a part of the Infoset of the XML document shown in Figure 23.3. In particular, it shows how namespace declarations appear as two information items (an attribute and a namespace item) on the element where the namespace is declared, and as namespace items on all descendants of this element. This explains why, in Figure 23.3, the xlink prefix for the XLink namespace can be used for the attributes of the link element even though the namespace is declared on another element (specifically, the book document element).

FIGURE 23.4 Example XML Infoset (document, element, attribute, namespace, and character information items).

Even though it could be argued that the Infoset could be improved in terms of extensibility (Wilde [2002]), its current informal version makes it hard to extend in an interoperable way. However, the Infoset is used by a number of other specifications and, as such, is certainly the most successful information model of XML. One interesting application of the Infoset is Canonical XML, defined by Boyer [2001]. It defines a normalization of XML documents, which makes it easier to compare documents (based on their Infoset content, so the goal is to compare Infosets rather than XML documents with all their possible syntactic variations). Canonical XML simply defines a serialization process of an Infoset into XML document syntax, and so could be specified rather easily by building on top of the abstraction provided by the Infoset.
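As a constructed illustration of the kind of syntactic variation that the Infoset (and thus Canonical XML) abstracts away, the following two elements differ in attribute order, attribute delimiters, whitespace inside the start tag, and the use of a character reference, yet they have the same Infoset and therefore the same canonical form:

<chapter id="core" author="Erik Wilde" >XML&#32;Core Technologies</chapter>

<chapter author='Erik Wilde' id="core">XML Core Technologies</chapter>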
23.3.2 XML Path Language (XPath)

One of the most interesting XML technologies is the XML Path Language (XPath), defined by Clark and DeRose [1999]. It is a language for selecting parts of an XML document, based on a structural view of the document. XPath came into existence in an interesting way. When the Extensible Stylesheet Language (XSL) (described in Section 24.1.2) was created, it was discovered that the transformation part of the process would be useful in its own right, and so it was subsequently created as a stand-alone technology, the XSL Transformations (XSLT) (described in Section 24.2.2). Working on XSLT, it was discovered that the core of the language, the ability to select parts of an XML document, also had the potential to be useful in contexts other than XSLT. Consequently, XPath was created, so that currently XPath is referenced by XSLT, which in turn is referenced by XSL. In the meantime, this design has shown its usefulness,
as XPath has already been used by a number of other XML technologies, most notably XML Schema, Schematron, the Document Object Model (DOM), and the XML Query Language (XQuery).

XPath is a language that can be most easily understood through the analogies of XML as a tree structure and a file system as a tree structure. In the same way that most operating systems provide tools or commands for navigating through the file system's structure, XPath provides a language for navigating through an XML document's structure. In comparison to the simple but powerful cd command of operating systems, XPath provides the following generalizations:

• Node types: While a file system tree only has nodes of one type (files), the nodes of an XML document tree can be of different types. The XML Infoset described in the previous section defines 11 types of information items. XPath reduces the complexity of the Infoset model to seven node types. The most important XPath node types are element and attribute nodes.

• Axes: Navigation in a file system tree implicitly uses children for navigation, and XPath generalizes this by introducing axes, which are used to select nodes with relationships different from that of direct children (to remain as intuitive as possible, the child axis is the default axis, which means that if no axis is specified, the child axis is chosen). To illustrate these concepts, the XPath /html/head/title selects the title child of the head child of the html document element (assuming an HTML document structure), and is equivalent to the more explicit /child::html/child::head/child::title XPath. On the other hand, the /descendant::title XPath selects all title descendants (children, children's children, and so on) from the document root, and thus uses a different axis. XPath defines a number of axes, most of them navigating along parent/child relationships, with the exception of a special axis for attributes and another for namespaces (both of these are XPath nodes).

• Predicates: Axes and node types make it possible to select certain nodes depending on their relationship with other nodes. To make selection more powerful, it is possible to apply predicates to a set of selected nodes. Each predicate is evaluated for every node that has been selected by the node type and axis, and only if it evaluates to true will this node be ultimately selected. Predicates are simply appended after the axis and node specifiers, as in the XPath /descendant::meta[position()=2], which selects the second meta element in the document.

The three concepts described above are the three constituents of a location step. Many XPaths are so-called location paths, and a location path is simply a sequence of location steps separated by slashes. The examples given above are XPath location paths, concatenating several steps with an axis (if the axis is omitted, the default child axis is taken), a node test,4 and optional predicates.

4 The node test can also be a name test, not only testing for node type but also testing for a specific name from among named nodes (i.e., elements and attributes).

XPath's type system is rather simple; there are strings, numbers, booleans, and node sets. The first three types are well-known, for example, from programming languages. Node sets are specific to XPath and can contain any number of nodes from the XPath node tree. XPath makes it easy to select parts of an XML document, based on complex structural criteria. Depending on the context where XPath is being used, this selection can be supported by additional concepts such as variables. XPath provides a function library of some basic string, numerical, and boolean functions.
However, XPath's strength is the support for accessing XML structures; the function library is sufficient for basic requirements, but lacks more sophisticated functionality (such as regular expression handling for strings). XPath applications are free to extend XPath's function library, and most XPath applications (such as XSLT) exploit this feature.

Figure 23.5 shows three examples of XPath expressions, which all refer to the XML example document shown in Figure 23.1.

1. /book/title
2. /book/chapter[@id='core']/@author
3. //ref[@id='xml10-spec']/ancestor::chapter

FIGURE 23.5 XPath examples.

The first XPath selects the title children of the book element, and thus selects the title element with the "Practical Handbook of Internet Computing" content. The second XPath selects the book element, then its chapter child where the id attribute has the value "core," and then the author attribute of this element. Consequently, this XPath selects the attribute with the value "Erik Wilde". The third XPath selects all ref elements within the document (regardless of their location in the element tree), and from these only the ones with the id attribute set to "xml10-spec". From these elements, it selects all ancestor elements (i.e., all the elements that are hierarchically above them in the element tree) that are chapter elements. In effect, this XPath selects all chapters containing references to the XML 1.0 specification.

For a common data model to be used with the XML Query Language (XQuery) (described in Section 24.3.2) and for a new version of XSLT (XSLT 2.0), a heavily extended version (2.0) of XPath is currently under construction by Berglund et al. [2003]. Formally, the new version of XPath will be based on an explicit data model (Fernández et al. [2003]), which is in turn based on the XML Infoset as well as on the Post Schema Validation Infoset (PSVI) contributions of XML Schema (see Section 23.4.1 for PSVI details). XPath 2.0 will be much more powerful than XPath 1.0, and it is planned to make it as backward compatible as possible (probably not 100%).
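Staying with XPath 1.0, a few further illustrative expressions show how axes, predicates, and functions can be combined; they assume the document structure sketched for Figure 23.1 and are not part of the original set of examples:

/book/chapter[position()=last()]/title
    (the title of the last chapter of the book)

count(/book/chapter)
    (the number of chapter elements, returned as a number rather than a node set)

//para[contains(., 'XML')]/ancestor::chapter/@id
    (the id attributes of all chapters containing a para whose text mentions "XML")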
23.3.3 XML Application Programming Interfaces

In many cases, XML users are not interested in handling XML markup directly, but want to have access to XML documents through some Application Programming Interface (API). In fact, this is one of the biggest reasons for XML's success: A multitude of tools are available on very different platforms, and it is very rarely the case that users must deal with XML directly. To further underscore this point, every place in a system where XML is accessed directly (e.g., by reading markup from a file and inspecting it) should be reviewed with great care, because it usually means that code that already exists (in this case, in XML parsers) is being rewritten. This is not only a waste of resources but also dangerous, because it usually results in inferior XML parsing code, which is not robust against the great variety of representations that XML allows.

There are two main APIs for XML; one is the Document Object Model (DOM) described in Section 23.3.3.1, and the other one is the Simple API for XML (SAX) described in Section 23.3.3.2. Many modern parsers provide a DOM as well as a SAX interface, and Figure 23.6 shows how this is usually implemented. While SAX is a rather simple interface, which simply reports events while parsing a stream of characters from an XML document, DOM provides a tree model, which can be used for rather complex tasks. Many parsers implement a thin SAX layer on top of their parsing engine and provide access to this layer. However, parsers also implement a DOM layer on top of this SAX layer, which consumes the SAX events and uses them to create an in-memory representation of the document tree. Consequently, if an application only needs the sequence of events generated by a parser, then SAX is the right API. However, if a more sophisticated representation is necessary, and the application wants to have random access to the document structure, then a DOM parser should be chosen, which consumes more resources than a SAX parser (in particular, memory for the document tree structure), but on the other hand provides a more powerful abstraction of the XML document.

FIGURE 23.6 XML processor supporting SAX and DOM: a scanner (lexical analysis) and parser (syntactic analysis) feed an interpretation and SAX event generation layer, whose SAX events either go directly to the application or are consumed by a DOM tree builder that serves the application's DOM calls.

Apart from the DOM and SAX APIs described in the following two sections, there are a number of alternative APIs for XML. Two of the more popular among these are JDOM and the Java API for XML Parsing (JAXP). Whereas JDOM can be regarded as an essentially DOM-like interface optimized for Java, JAXP is not an XML API itself, but an abstraction layer (i.e., an API for other APIs). JAXP can be used to access XML-specific functionality from within Java programs. It supports parsing XML documents using the
DOM or SAX APIs, and processing XML documents with XSLT using the TrAX API (more information about XSLT can be found in Section 24.2.2).

23.3.3.1 Document Object Model (DOM)

The Document Object Model (DOM) is the oldest API for XML. In fact, the DOM existed before XML was created, and started as an API that provided an interface between JavaScript code and HTML pages. This first version of the DOM is often referred to as DOM Level 0. When XML was created, it quickly became apparent that the DOM would be a good starting point for an XML API, and a first version of the DOM was created that supported HTML as well as XML. This first version, defined by Wood et al. [2000], is called the DOM Level 1 (DOM1). DOM quickly established itself as the most popular API for XML, one of the reasons being that it is not restricted to a particular programming language. The DOM is defined using the Interface Definition Language (IDL), which is a language for specifying interfaces independent of any particular programming language. The DOM specification also contains two language bindings, which are mappings from IDL to the peculiarities of a specific programming language. The two programming language bindings contained in the DOM specifications cover Java and ECMAScript (which is the standardized version of JavaScript), but DOM language bindings for a multitude of other languages are also available. The advantage of this independence from a programming language is that programmers can quickly transfer their knowledge about the DOM from one language to another, by simply looking at the new language binding. The disadvantage is that the DOM is not a very elegantly designed interface because it cannot benefit from the features of a particular programming language. This is the reason why JDOM was created, which has been inspired by the DOM but has been specifically designed for Java.

Since the DOM has been very successful, many programmers were using it and requested additional functionality. Consequently, the DOM has evolved into a module-based API. DOM Level 2 (DOM2), defined by Le Hors et al. [2000], and DOM Level 3 (DOM3), defined by Le Hors et al. [2003], are both markers along this road, with DOM3 being the current state of the art. DOM3 even contains an XPath module, which makes it possible to select nodes from the document by using XPath expressions (for more information about XPath refer to Section 23.3.2). Implementations of the DOM (usually XML processors) may support only a subset of the modules, so before selecting a particular piece of software that supports the DOM, it is important to check the version of the DOM and the modules it supports.
23.3.3.2 Simple API for XML (SAX)

Although the DOM is a very powerful API, it also is quite complicated and requires a lot of resources (because a DOM implementation needs to create an in-memory representation of the document tree, which can be an issue when working with large documents). In an effort to create a lightweight XML API, the Simple API for XML (SAX) was created. As shown in Figure 23.6, it works differently from DOM because it is event-based. This means that SAX applications register event handlers with the API, which are then called whenever the corresponding event is triggered during parsing. These events are closely related to markup structures, such as recognizing an element, an attribute, or a processing instruction while parsing the document.

This event-based model results in a fundamentally different way of designing an application. While DOM-based applications operate in a "pull-style" model, pulling the relevant information from an XML document by accessing the corresponding parts of the tree, SAX-based applications operate in a "push-style" model. In this model, the document is "pushed" through the parser, which simply calls the event handlers that have been registered. Essentially, the program flow of a DOM-based application is controlled by the application writer, whereas the program flow of a SAX-based application is mostly determined by the document being processed.5

5 This push vs. pull model is also an issue with the XSLT programming language (described in Section 24.2.2), which can be used in both ways.

A SAX parser is a more lightweight piece of software because it does not have to build an internal tree model of the document. However, this also means that applications cannot access a document in the random access mode that DOM supports. Consequently, the question of whether a particular application should be built on top of a SAX or DOM interface depends on the application's requirements. For rather simple streaming applications, SAX is probably sufficient, whereas applications needing a more flexible way of accessing the document are probably better supported by a DOM interface.
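The following Java fragment sketches this stylistic difference using the standard JAXP factory classes: the DOM variant builds the complete tree and then queries it, whereas the SAX variant registers a handler and reacts to events as the document streams by. Error handling is omitted, and the element name "title" is just an example.

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class TitleCounter {

    // DOM: the whole document tree is built in memory and can be
    // accessed randomly afterwards.
    static int countWithDom(File f) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(f);
        return doc.getElementsByTagName("title").getLength();
    }

    // SAX: the parser pushes events to the registered handler while
    // reading the document; no tree is kept in memory.
    static int countWithSax(File f) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("title".equals(qName)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(f, handler);
        return count[0];
    }
}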
23.4 XML Schema Languages

As described in Section 23.2.1, the XML specification defines a syntax for XML documents as well as a schema language for these documents, the Document Type Definition (DTD). However, as XML became successful and was used in very diverse application areas, it quickly became apparent that the document-oriented and rather simple features of DTDs were not sufficient for a growing number of users.

In principle, everything that can be described in a schema language (such as a DTD) can also be checked programmatically, so that it would be possible to use XML without any schema language and implement constraint checking in the application. However, a schema has some very substantial advantages over code: it is declarative; it is easier to write, understand, modify, and extend than code; and it can be processed with a variety of software tools, as long as these tools support the schema language. Basically, in the same way that XML is useful because people can use existing tools to process XML documents, a schema language is useful because people can use existing tools to do constraint checking for XML documents.

Lee and Chu [2000] and van der Vlist [2002] have published comparisons of different schema languages that have been proposed by various groups. Since a schema language must reach critical mass to be useful (the language must be supported by a number of tools to have the benefit of being able to exchange schemas between different platforms), many of these proposals never had significant success. The W3C developed a schema language of its own, which took a number of proposals and used them to create a new and powerful schema language. The result of this effort is XML Schema, described in Section 23.4.1. However, XML Schema has received some criticism because it is very big and bulky, and because it tries to solve too many problems at once. As an alternative, a simpler and less complicated schema language has been developed outside of the W3C, the RELAX NG language described in Section 23.4.2.
It is questionable whether there ever will be one schema language to satisfy the requirements from all areas where XML is used, and the likely answer to this question is that it is impossible. If it is ever developed, this schema language will be so complex and powerful that it would essentially have become a programming language, departing from the advantages of a small and declarative language. Consequently, a reasonable approach to schema languages is to live with a number of them, provide features to combine them, and handle validation as a modular task. This is the approach taken by the Document Schema Definition Languages (DSDL) activity, which is described in Section 23.4.3.
23.4.1 XML Schema

XML Schema has been designed to meet user needs that go beyond the capabilities of DTDs. With one exception (the definition of entities, which are not supported by XML Schema), XML Schema is a superset of DTDs, which means that everything that can be done with DTDs can also be done with XML Schema. XML Schema is a two-part specification; the first part by Thompson et al. [2001] defines the structural capabilities, whereas the second part by Biron and Malhotra [2001] defines the datatypes that may be used in instances or for type derivation.

Figure 23.7 shows an example of an XML Schema. It is based on the DTD shown in Figure 23.2, but only parts are shown because the complete schema is much longer than the DTD. Obviously, the XML Schema differs from the DTD in that it uses XML syntax, whereas DTDs use a special syntax.

The XML fragment of Figure 25.1 has two photometa elements at the top level. The location element near the bottom of the figure is nested within the second photometa element and in turn contains line, city, state, and note as subelements. The time element on line 3 has a zone attribute with value PST.

At first glance, our sample data appears to possess a uniform structure. However, a closer examination reveals several variations: The second photometa element is missing a camera subelement as found in the first. The two time elements differ in their formats. The first includes a zone attribute indicating the time zone and uses a 24-hour format that includes seconds, whereas the second has no time zone information and uses a 12-hour format without seconds. The location elements also exhibit differing structures. The first specifies the location using a brief label, while the second specifies the different components of a location separately using a set of subelements.
FIGURE 25.1 Photo metadata in XML.
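A sketch of what the two photometa fragments might look like follows; the element names are inferred from the column names of Figure 25.2 and the DTD of Figure 25.14 and are otherwise assumptions, as is the num attribute value:

<photometa>
  <date>2002-12-21</date>
  <time zone="PST">15:03:07</time>
  <flen>10mm</flen>
  <f>8.0</f>
  <shutter>1/125</shutter>
  <flash>n</flash>
  <location>Death Valley</location>
  <camera>Nikon 995</camera>
</photometa>
<photometa>
  <date>2002-12-23</date>
  <time>10:37 PM</time>
  <flen>28mm</flen>
  <f>2.2</f>
  <shutter>1/250</shutter>
  <flash>y</flash>
  <location>
    <line num="1">2127 Firewood Ln</line>
    <city>Springfield</city>
    <state>MA</state>
    <note>Living Room</note>
  </location>
</photometa>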
Photometa

ID     Date         Time       Flen     F        Shutter  Flash      Location        Camera
(int)  (date)       (time)     (float)  (float)  (float)  (boolean)  (varchar(100))  (varchar(50))
1001   2002-12-21   23:03:07   10       8.0      0.008    0          Death Valley    Nikon 995
1002   2002-12-23   22:37:00   28       2.2      0.004    1          Springfield     null

FIGURE 25.2 A simple relational schema.
In Section 25.2, we explore several schemes for coping with such variations in structure. Our discussion is based on general methods for encoding semistructured data, specifically XML, into relational tables. In Section 25.3, we turn our attention to specific XML features found in commercial products from four database vendors. We compare the features with each other and relate them to the general methods introduced in Section 25.2. Throughout this chapter, we use our running example to tie together ideas from different methods and database products into a common framework.
25.2 Relational Schemas for Semistructured Data

In this section, we present some simple methods for representing semistructured data in relational form. For concreteness, we will focus on data in XML format, using the data of Figure 25.1 as a running example.
25.2.1 Using Tuple-Generating Elements

Perhaps the simplest method for storing the photo data is to use a relation with one attribute for each subelement of the photometa element, as suggested by Figure 25.2. In general, this method is based on choosing one XML element type as the tuple-generating element (TGE) type. We use a relation that has one attribute for each possible subelement (child) of the tuple-generating element, along with an ID attribute that serves as an artificial key of the relation. Each instance of a tuple-generating element is mapped to a tuple. The content of each subelement of this tuple-generating element is the value of the corresponding attribute in the tuple.

One drawback of this method is that instances of the tuple-generating element that do not have every possible subelement result in tuples with nulls. In the sample data, the missing camera subelement of the second photometa element results in a null in the second tuple of Figure 25.2. Another drawback is that subelements with subelements of their own (i.e., those with element or mixed content [Bray et al., 1998]) are not represented well. In the sample data, the location element of the second photometa element has several subelements; the relational representation stores only the city. This problem is more severe for data that has a deeply nested structure.
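A sketch of the corresponding table definition follows; the column names and types are taken from Figure 25.2, but the exact syntax (for instance, the boolean type or the treatment of Date and Time as column names) varies across database systems:

create table Photometa (
  ID       int primary key,   -- artificial key generated for each photometa element
  Date     date,
  Time     time,
  Flen     float,
  F        float,
  Shutter  float,
  Flash    boolean,
  Location varchar(100),
  Camera   varchar(50)
);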
25.2.2 Representing Deep Structure

We may address the second drawback above by creating relational attributes for not only the immediate subelements of a tuple-generating element, but also the subelements at deeper levels. Essentially, this method flattens the nested structure that occurs within tuple-generating elements. Figure 25.3 illustrates this method for the sample data. In general, this method is based on selecting a tuple-generating element type and creating a relation with an ID attribute and an attribute for every possible subelement (direct or indirect) of such elements. For our running example, we have chosen photometa as the tuple-generating element to yield the representation of Figure 25.3. Subelements that have only element content are not mapped to attributes in this method. The location subelement of the second photometa element does not generate an attribute for this reason. Although this method addresses the second drawback of our earlier method, it exacerbates the problem of nulls.
Photometa

ID     Date         Time       Flen     F        Shutter  Flash      Addr               City           State     Camera
(int)  (date)       (time)     (float)  (float)  (float)  (boolean)  (varchar(40))      (varchar(40))  (char(2)) (varchar(50))
1001   2002-12-21   23:03:07   10       8.0      0.008    0          Death Valley       null           null      Nikon 995
1002   2002-12-23   22:37:00   28       2.2      0.004    1          2127 Firewood Ln   Springfield    MA        null

FIGURE 25.3 The schema of Figure 25.2 modified for detailed location information.
Note that we have chosen to represent the location information of the first photometa element in the Addr attribute of the relation. The alternative — of including a location attribute in the relation — results in more nulls. These simple representations have the advantage of being easy to query. For example, to locate the IDs of photos taken at a location with the string "field" somewhere in its name, we may use the following query for the first scheme:

select ID from Photometa where Location like '%field%';
The query for the second scheme is only slightly longer:

select ID from Photometa
where Addr like '%field%' or City like '%field%' or State like '%field%';
25.2.3 Representing Ancestors of TGEs

Both the methods described above ignore XML content that lies outside the scope of the selected tuple-generating elements. For example, consider the data depicted in Figure 25.4. This data is similar to that of Figure 25.1, except that the photometa elements are now not the top-level elements, but are grouped within elements representing the trips on which photographs were taken, while trips may themselves be grouped into collections. Our earlier scheme based on photometa as a tuple-generating element ignores the trip and collection elements. Consequently, there would be no way to search for photographs from a given trip or collection.

We may address this shortcoming by adding to the relation schema one attribute for each possible ancestor of the tuple-generating element. The resulting relational representation of the data in Figure 25.4 is depicted in Figure 25.5. This representation allows us to search for tuple-generating elements based on the contents of their ancestors in addition to the contents of their descendants. For example, we may find photometa elements for photos from the Winter 2002 trip that have field in the address using the following query:

select P.ID from Photometa P
where P.Trip_name = 'Winter 2002'
  and (P.Addr like '%field%' or P.City like '%field%' or P.State like '%field%');
We need not restrict our queries to returning only the tuple-generating elements. For example, the following query returns the names of trips on which at least one flash picture was taken:

select distinct P.Trip_name from Photometa P where P.Flash = 1;
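The schema of Figure 25.5 is not spelled out here; roughly, it extends the flattened Photometa schema of Figure 25.3 with one column per ancestor element, along the following lines (the column name Trip_name is taken from the queries above, while Collection_name and the column widths are assumptions):

create table Photometa (
  ID              int primary key,
  Collection_name varchar(40),   -- assumed column for the enclosing collection element
  Trip_name       varchar(40),   -- ancestor trip of the photometa element
  Date            date,
  Time            time,
  Flen            float,
  F               float,
  Shutter         float,
  Flash           boolean,
  Addr            varchar(40),
  City            varchar(40),
  State           char(2),
  Camera          varchar(50)
);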
FIGURE 25.4 The photo metadata of Figure 25.1, grouped into trip and collection elements.

FIGURE 25.14 A candidate DTD for the structured location elements in the data of Figure 25.1.
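A DTD matching the description in the following paragraph might look roughly like this (a reconstruction sketch, not necessarily identical to the figure):

<!ELEMENT location (#PCDATA | line | city | state | note)*>
<!ELEMENT line     (#PCDATA)>
<!ELEMENT city     (#PCDATA)>
<!ELEMENT state    (#PCDATA)>
<!ELEMENT note     (#PCDATA)>
<!ATTLIST line num CDATA #IMPLIED>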
The DAD associates a DTD (document type definition) with the XML-typed attribute. The DTD for our example, photometa.dtd, is exhibited in Figure 25.14. For our purpose, the following simplified description of DTDs suffices: A DTD contains two main kinds of declarations. The first is an ELEMENT declaration, which defines the allowable contents of an element, identified by its name. For example, the third line in Figure 25.14 indicates that city elements have text content. (The notation #PCDATA is used to represent parsed character data, informally, text.) There is at most one element declaration for each element type (name). The declaration for the location element type indicates that such elements contain zero or more occurrences of elements of type line, city, state, and note, along with text. The syntax, following common conventions, uses | to denote choice and * to denote zero or more occurrences. The second kind of declaration found in DTDs is an ATTLIST declaration, which defines the set of attribute-value pairs that may be associated with an element type. For example, the last line in Figure 25.14 indicates that line elements may include a num attribute of text type. (The notation CDATA indicates character data, or text. A technical point is that, unlike #PCDATA, this character data is not parsed and may thus include special characters.)

As indicated above, we may think of side tables as user-level indexes. Essentially, whenever a query on the main table accesses an XML element or attribute (within an XML-typed attribute) that is referenced by a column in a side table, the query may be made more efficient by using the side table to access the column. The increased efficiency is due to the side tables providing access to the required XML elements and attributes without query-time parsing. Although it is possible to use side tables in this manner, the task is made easier by views that are automatically defined by the system when side tables are initialized. Briefly, there is one view for each main table, consisting of a join with all its side tables on the key attribute. For our running example, the system creates the following view:

create view Photometa_view(ID, Date, Time, Flen, F, Shutter, Flash, Camera, Line, City, State) as
  select P.ID, P.Date, P.Time, P.Flen, P.F, P.Shutter, P.Flash, P.Camera, L.Line, C.City, C.State
  from Photometa P, line_side_table L, city_side_table C
  where P.ID = L.ID and P.ID = C.ID;
Our “field in location” query is easily expressed using this view: select P.ID from Photometa_view P where City like ‘%field%’;
It is not necessary to create side tables on all parts of the XML-typed attribute that may be queried. Indeed, since there is a maintenance overhead associated with side tables, a database designer must carefully select the side tables to instantiate. Queries that need to access parts of an XML-typed attribute that are not part of a side table must use extracting functions. The general scheme of extracting functions is extractType(Col, Path), where Type is a SQL type such as Integer or Varchar, Col is the name of an XML-typed column, and Path is an XPath expression indicating the part of the XML attribute that is to be extracted. Such a function returns objects of the type in its name, assuming the XML element or attribute to which the path points can be appropriately coerced (indicating an error otherwise). For our
example, we may use the following query to find photo IDs that have "field" in the text content of either a location element or a city element nested within a location element:

select P.ID from Photometa P, LocationsXML L
where P.LocID = L.LocID
  and (extractVarchar(L.location, /location) like '%field%'
    or extractVarchar(L.location, /location//city) like '%field%');
The path expression in the argument of an extracting function must match no more than one value within the XML-typed attribute for each tuple of the main table. For example, the above query generates an error if there are multiple city elements within the location element corresponding to one photometa element (and tuple). Fortunately, DB2 provides table extracting functions for coping with this situation. The names of table extracting functions are obtained by pluralizing the names of the corresponding (scalar) extracting functions. These functions return a table that can be used in the from clause of a SQL query in the usual manner. For our running example, we may use the following query to find the IDs of photometa elements with “field” in the city, when multiple city elements (perhaps in multiple location elements) are possible. select P.ID from Photometa P, LocationsXML L where P.LocID = L.LocID and exists (select 1 from table(extractVarchars(L.location,/location//city)) as C where C.returnedVarchar like ‘%field%’);
25.3.2 Microsoft SQL Server

Using Microsoft's SQL Server, we may separate the structured and semistructured parts of our data as follows: The structured part is stored in a Photometa table similar to the one used in Section 25.3.1, as suggested by Figure 25.12. However, instead of storing the XML fragment describing the location of each Photometa element in a column of the Photometa table, we store all the location data (for all Photometa elements) in a separate file outside the database system. Let us assume this file is called LocationsXML. In order to maintain the mapping between location and Photometa elements, we shall further assume that location elements have a LocID attribute that is intuitively a foreign key referencing the ID attribute of the appropriate Photometa tuple.

The main tool used to access external XML data is the OpenXML rowset function [Microsoft, 2003]. It parses XML files into a user-specified tabular format. Queries using the OpenXML function include a with clause that provides a sequence of triples consisting of a relational attribute name, the type of the attribute, and a path expression that indicates how the attribute gets its value from the XML data. For example, consider the following query, which finds the identifiers of Photometa elements that contain "field" in the city or description elements:

select P.ID
from Photometa P,
     openXML(@locationsFile, '/location', 2)
     with (LocID int          '@LocID',
           Line  varchar(40)  'line',
           City  varchar(40)  'city',
           State varchar(40)  'state',
           Desc  varchar(100) 'text()') L
where P.LocID = L.LocID
  and (L.City like '%field%' or L.Desc like '%field%');
The first argument to the OpenXML function is a file handle (declared and initialized elsewhere) pointing to the LocationsXML file. The second argument is an XPath expression that matches elements that are to be extracted as tuples. These elements are essentially the tuple-generating elements discussed in Section 25.2.1. The third argument is a flag that indicates that the following with clause is to be used for mapping elements to tuples. The with clause declares the names and types of columns in the generated table in a manner similar to that used in a create table statement. Each attribute name is followed by a type and an XPath expression. This expression is evaluated starting at nodes matching the expression in the OpenXML function (/location in our example) as context nodes. Thus, the first triple in the with clause above indicates that the LocID attribute of a location element is used to populate the LocID attribute of the corresponding tuple. Although our example uses rather simple forms of XPath, it is possible to use more complex forms. For example, the parent or sibling of a context node may be accessed using the appropriate XPath axes (e.g., ../, ../alternate). If the XPath expression for some attribute in the with clause does not match anything for some instance of a context node (matching the expression in the OpenXML function), that position in the generated tuple has a null. It is an error if an XPath expression in the with clause matches more than one item for a given context node.

SQL Server also provides an alternate method for storing XML that uses an Edges table to store the graph representation of XML. This method is essentially the method of Section 25.2.6.
25.3.3 Oracle XSU

Oracle's XML SQL Utility (XSU) includes features for parsing XML and storing the result in a relational table [Higgins, 2001, Chapter 5]. However, such storage works only if there is a simple mapping from the structure of the input XML data to the relational schema of the database. By simple mapping, we mean one similar to the one illustrated in Section 25.2.1. Although this method may be useful for storing well-structured XML data, it does not fare well with semistructured data because it necessitates cumbersome restructuring and schema redesign whenever the input data changes form.

XSU also provides some features for storing unparsed XML text in a relational attribute. It provides the XMLType datatype, which is similar to the XMLVarchar, XMLCLOB, and XMLFile types of DB2. The extraction functions used to parse such attributes at query-execution time are also similar to those used by DB2, but use an object-oriented syntax. For example, the method extract(/product/price) may be used on an XMLType attribute to extract the price element; the numeric value of the result is extracted by a getRealVal method. (Similar methods exist for other types.) For our running example, we use a table similar to the one suggested by Figure 25.12, replacing XMLCLOB with XMLType. The following query may be used to locate the Photometa elements that contain "field" in their city elements:

select P.ID from Photometa P
where P.location.extract(/location/city).getStringVal() like '%field%';
In addition to the extract function, XSU provides a Boolean function existsNode for checking the existence of specified nodes in an XMLType field. For example, the following query finds Photometa tuples whose locations have a city subelement:

select P.ID from Photometa P
where P.location.existsNode(/location//city);
Unlike DB2, XSU does not use side tables as a tool for improving access to selected elements. However, query performance can be improved by using functional indexes based on the extract function. For our running example, we may speed up the execution of queries similar to the “field” query above by creating an index on the city elements as follows:
create index photometa_city_idx on Photometa( location.extract(/location/city).getStringVal());
One may also create a text index on an XML attribute in order to support efficient searches on that attribute. Such an index is implemented by Oracle as a function-based index on the getCLOBVal() method of XMLType objects [Kaminaga, 2002b, 2002a]. For our running example, we may create a text index on the contents of the location attribute as follows: create index photometa_location_txtidx on Photometa(location) indextype is ctxsys.context parameters ('SECTION GROUP ctxsys.path_section_group');
This index may now be used in an alternate version of the "field" query described above:

select P.ID from Photometa P
where contains(P.location, 'field inpath(/location//city)');
25.3.4 Sybase

The XML features of the Sybase Adaptive Server Enterprise (ASE) use a tight coupling with Java classes and methods to aid in storing XML in relational tables [Sybase, 2001, 1999]. Working with a set of Java classes that are customized by the database designer, ASE provides three methods for storing XML: element, document, and hybrid.

In the element storage method, the XML elements of interest are stored separately in relational attributes. This method is similar to the method of Section 25.2.7. For our running example, we may use a Photometa table similar to that suggested by Figure 25.12. Our query to find photographs with "field" in the cities of their locations can then be expressed as follows:

select P.ID from Photometa P
where P.location>>getLocationElement(1, "city") like '%field%';

In this query, the getLocationElement method is invoked for the location column (which is XML-valued) of each tuple in the Photometa table. The second argument to this method specifies the name of the element that is to be extracted, and the first argument is the ordinal number of the element (among those with the same name). In our example query, the method returns the string value of the contents of the first city element.
In this query, the getLocationElement method is invoked for the location column (which is XMLvalued) of each tuple in the Photometa table. The second argument to this method specifies the name of the element that is to be extracted, and the first argument is the ordinal number of the element (among those with the same name). In our example query, the method returns the string value of the contents of the first city element. In the document storage method, XML elements of interest are stored together (coalesced) in a single document, which in turn is stored in a table of suitable schema. In our continuing example, consider a table LocationFiles that has a schema similar to the table LocationsXML described earlier. The difference is that now all location data is stored in a single attribute value. We will assume that location elements have an ID subelement that is a foreign key referencing the ID attribute of the corresponding Photometa tuple. This scheme has the advantage of permitting easy and efficient retrieval of all location data in XML form (perhaps for display or export to another application). However, querying the documentformat XML data is difficult. For example, we may attempt to write our “field in city” query as follows: (select F.XMLText>>getLocationElement(1, "ID") from LocationFiles F where F.XMLText>>getLocationElement(1,"city")like'\%field%') union \\ (select F.XMLText>>getLocationElement(2,"ID") from LocationFiles F where F.XMLText>>getLocationElement(2,"city")like’\%field\%');
However, this query checks only the cities of the first two locations in the XML-valued location attribute. We may extend it to check more cities; however, checking all cities requires the assistance of the host program.
The hybrid storage method is essentially a combination of the element and document storage methods. The relevant data is stored in both unparsed (document) and parsed (element) form, with the former providing efficient document-centric access and the latter providing efficient data-centric access. In this respect, the tables used for element storage in the hybrid method are user-level indexes analogous to side tables in DB2 (described in Section 25.3.1). However, unlike side tables, these user-level indexes are not automatically maintained by the database system; it is the responsibility of application programs to ensure that they remain consistent with the data in document storage.

Although the term semistructured data has not been in use for very long, the kind of data it describes is not new. Traditionally, such data has either been ignored or handled using non-database techniques such as text search engines and application-specific methods. As for structured data, using a database management system for such data provides benefits such as efficient evaluation of ad hoc queries, consistency, concurrency control, and durability. In addition, applications that use both structured and semistructured data benefit from the ability to use a single system for both kinds of data. We have presented several methods for relational storage of semistructured data. As noted, current database systems provide a variety of features for managing such data. Mapping these features to the methods discussed in this chapter provides a framework for determining the best approach for the domain under consideration.
Acknowledgment

This material is based upon work supported by the National Science Foundation under grants IIS-9984296 (CAREER) and IIS-0081860 (ITR). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.
References

Chawathe, Sudarshan S. Managing Historical XML Data, volume 57 of Advances in Computers, pages 109–169. Elsevier Science, 2003. To appear.
Garcia-Molina, Hector, Jeffrey D. Ullman, and Jennifer Widom. Database Systems: The Complete Book. Prentice-Hall, 2002.
Higgins, Shelley. Oracle9i application developer's guide — XML. Available at http://www.oracle.com/, June 2001. Release 1 (9.0.1), part number A88894-01.
IBM. XML Extender administration and programming, version 7. Product information. Available at http://www.ibm.com/, 2000.
Kaminaga, Garrett. Oracle Text 9.0.1 XML features overview. Oracle Technology Network. Available at http://otn.oracle.com/, November 2002a.
Kaminaga, Garrett. Oracle Text 9.2.0 technical overview. Oracle Technology Network. Available at http://otn.oracle.com/, June 2002b.
Microsoft. Using OPENXML. Microsoft SQL Server documentation. Available at http://msdn.microsoft.com/, 2003.
Selinger, Pat. What you should know about DB2 support for XML: A starter kit. The IDUG Solutions Journal, 8(1), May 2001. International DB2 Users Group. Available at http://www.idug.org/.
Shanmugasundaram, Jayavel, Eugene Shekita, Rimon Barr, Michael Carey, Bruce Lindsay, Hamid Pirahesh, and Berthold Reinwald. Efficiently publishing relational data as XML documents. The VLDB Journal, 10(2–3): 133–154, 2001.
Sybase. Using XML with the Sybase Adaptive Server SQL databases. Technical White Paper. Available at http://www.sybase.com/, 1999.
Sybase. XML technology in Sybase Adaptive Server Enterprise. Technical White Paper. Available at http://www.sybase.com/, 2001.
26 Information Security

E. Bertino and E. Ferrari

CONTENTS
26.1 Introduction
26.2 Basic Concepts
    26.2.1 Access Control Mechanisms: Foundations and Models
    26.2.2 A Brief Introduction to XML
26.3 Access Control for Web Documents
    26.3.1 Access Control: Requirements for Web Data
    26.3.2 A Reference Access Control Model for the Protection of XML Documents
26.4 Authentication Techniques for XML Documents
    26.4.1 An Introduction to XML Signature
    26.4.2 Signature Policies
26.5 Data Completeness and Filtering
    26.5.1 Data Completeness
    26.5.2 Filtering
26.6 Conclusions and Future Trends
References
26.1 Introduction
As organizations increase their reliance on Web-based systems for both day-to-day operations and decision making, the security of the information and knowledge available on the Web becomes crucial. Damage to and misuse of the data representing information and knowledge affect not only a single user or application but may have disastrous consequences for the entire organization. Security breaches are typically categorized into unauthorized data observation, incorrect data modification, and data unavailability. Unauthorized data observation results in the disclosure of information to users not entitled to access it. Any organization, from commercial enterprises to social organizations such as healthcare providers, may suffer heavy losses, from both financial and human points of view, as a consequence of unauthorized data observation. Incorrect modifications of data, whether intentional or unintentional, may result in inconsistent and erroneous data. Finally, when data are unavailable, information crucial to the proper functioning of the organization cannot be obtained when needed. A complete solution to the information security problem must thus meet the following three requirements:
1. Secrecy or confidentiality — the protection of data against unauthorized disclosure
2. Integrity — the prevention of unauthorized or improper data modification
3. Availability — the prevention of and recovery from software errors and from malicious denials of service that make data unavailable to legitimate users
In particular, when the data to be secured refer to information concerning individuals, the term privacy is used. Privacy is today becoming increasingly relevant, and enterprises have thus begun to actively manage and promote the level of privacy they provide to their customers. In addition to those traditional requirements, the development of Web-based networked information systems has introduced some new requirements. New requirements that are relevant in such contexts include data completeness, self-protection, and filtering. It is important to note that whereas the traditional requirements are mainly meant to protect data against illegal access and use, the new requirements are meant to protect subjects and users. In particular, by data completeness we mean that a subject receiving an answer to an access request must be able to verify the completeness of the response, that is, the subject must be able to verify that it has received all the data that it is entitled to access according to the stated access control policies. As an example, consider a Website publishing information about medical drugs. In such a case, a completeness policy may require that if a subject has access to information about a specific drug, information concerning side effects of the drug must not be withheld from the subject. By self-protection and filtering, we mean that a subject must be able to specify what information is unwanted and be guaranteed not to receive such information. This requirement is particularly relevant in push-based information systems, which automatically send information to users, and in protecting specific classes of users, such as children, from receiving improper material.

Data security is ensured by various components in a computer system. In particular, the access control mechanism ensures data secrecy. Whenever a subject tries to access a data item, the access control mechanism checks the right of the subject against a set of authorizations, usually stated by security administrators. An authorization states which user can perform which action on which data item. Data security is further enhanced by cryptographic mechanisms that protect the data when transmitted across a network. Data integrity is jointly ensured by several mechanisms. Whenever a subject tries to modify some data item, the access control mechanism verifies that the subject has the right to modify the data, whereas the semantic integrity mechanism verifies that the updated data are semantically correct. In addition, content authentication techniques, such as those based on digital signatures, may be used by a subject to verify the authenticity of the received data contents with respect to the original data. Finally, the error recovery mechanism ensures that data are available and correct despite hardware and software failures. Data availability is further enhanced by intrusion detection techniques that are able to detect unusual access patterns and thus prevent attacks, such as query floods, that may result in denial of service to legitimate users. The interaction among some of the components described above is shown in Figure 26.1.

In this chapter, we will first focus on access control mechanisms and related authorization models for Web-based information systems. In order to make the discussion concrete, we will discuss the case of data encoded according to XML [Extensible Markup Language, 2000].
However, the concepts and techniques we present can be easily extended to other data models or data representation languages. We then briefly discuss authentication techniques for XML data, because this is a key issue for Internet computing, whereas we refer the reader to Bertino [1998] and to any database textbook for details on semantic integrity control, and to Stallings [2000] for cryptography techniques. We then outline recent work on data completeness and filtering, and finally present future research directions.
26.2 Basic Concepts
In this section, we introduce the relevant concepts for the discussion in the subsequent sections. In particular, we first introduce the basic notions of access control mechanisms and present a brief survey of the most relevant models. We then introduce the XML language.
26.2.1 Access Control Mechanisms: Foundations and Models
An access control mechanism can be defined as a system that regulates the operations that can be executed on data and resources to be protected. Its goal is thus to control operations executed by subjects in order to prevent actions that could damage data and resources.
FIGURE 26.1 Main security modules in an enterprise scenario (clients connect over the Internet, protected by cryptography mechanisms and digital signatures, to an enterprise system comprising the access control mechanism, the semantic integrity mechanism, and the data management server).
The basic concepts underlying an access control system are summarized in Figure 26.2. Access control policies specify what is authorized and can thus be regarded as requirements; they are the starting point in the development of any system that has security features. The adopted access control policies mainly depend on organizational, regulatory, and user requirements. They are implemented by mapping them into a set of authorization rules. Authorization rules, or simply authorizations, establish the operations and rights that subjects can exercise on the protected objects. The reference monitor is the control mechanism; it has the task of determining whether a subject is permitted to access the data.

Any access control system is based on some access control model. An access control model essentially defines the various components of the authorizations and all the authorization-checking functions. Therefore, the access control model is the basis on which the authorization language is defined. Such a language allows one to enter authorizations into the system and to remove them, and it specifies the principles according to which access to objects is granted or denied; such principles are implemented by the reference monitor. The most relevant access control models are formulated in terms of objects, subjects, and privileges. An object is anything that holds data, such as relational tables, documents, directories, inter-process messages, network packets, I/O devices, or physical media. A subject is an abstraction of any active entity that performs some computation in the system.
FIGURE 26.2 Main components of an access control system (access control policies are mapped into authorization rules; the reference monitor evaluates each access request against them, and the access is permitted, partially permitted, or denied).
Subjects can be classified into users — single individuals connecting to the system; groups — sets of users; roles — named collections of privileges or functional entities within the organization; and processes — programs executing on behalf of users. Finally, privileges correspond to the operations that a subject can exercise on the objects in the system. The set of privileges thus depends on the resources to be protected; examples of privileges are read, write, and execute privileges for files in a file system, and select, insert, update, and delete privileges for relational tables in a relational DBMS.

Objects, subjects, and privileges can be organized into hierarchies. The semantics of a hierarchy depends on the specific domain considered. An example of a hierarchy is the composite object hierarchy, typical of object-based DBMSs, which relates a given object to its component objects. Another relevant example is the role hierarchy, which relates a role in a given organization to its more specialized roles, referred to as junior roles. Hierarchies allow authorizations to be implicitly propagated, thus increasing the conciseness of authorization rules and reducing the authorization administration load. For example, authorizations given to a junior role are automatically propagated to its ancestor roles in the role inheritance hierarchy, and thus there is no need to explicitly grant these authorizations to the ancestors.

The most well-known types of access control models are the discretionary model and the mandatory model. Discretionary access control (DAC) models govern the access of subjects to objects on the basis of the subjects' identity and of the authorization rules. Authorization rules state, for each subject, the privileges it can exercise on each object in the system. When an access request is submitted to the system, the access control mechanism verifies whether there is an authorization rule authorizing (partially or totally) the access; if so, the access is authorized, and otherwise it is denied. Such models are called discretionary in that they allow subjects to grant, at their discretion, authorizations to other subjects to access the protected objects. Because of such flexibility, DAC models are adopted in most commercial DBMSs.

An important aspect of discretionary access control is authorization administration, that is, the function of granting and revoking authorizations. It is the function by which authorizations are entered into (or removed from) the access control system. Common administration approaches include centralized administration, by which only some privileged users may grant and revoke authorizations, and ownership-based administration, by which grant and revoke operations on a data object are issued by the creator of the object. Ownership-based administration is often extended with features for administration delegation. Administration delegation allows the owner of an object to assign other users the right to grant and revoke authorizations, thus enabling decentralized authorization administration. Most commercial DBMSs adopt ownership-based administration with administration delegation.
More sophisticated administration approaches have been devised, such as joint-based administration, by which several users are jointly responsible for authorization administration; these approaches are particularly relevant for cooperative, distributed applications such as workflow systems and computer-supported cooperative work (CSCW). One of the first DAC models proposed for DBMSs is the model defined by Griffiths and Wade [1976] in the framework of the System R DBMS, which introduced the basic notions underlying the access control models of current commercial DBMSs. This model has since been widely extended with features such as negative authorizations, which express explicit denials and thus support the formulation of exceptions to authorizations granted on sets of objects; more articulated revoke operations; and temporal authorizations, which support the specification of validity intervals for authorizations.

Even though DAC models have been adopted in a variety of systems because of their flexibility in expressing a variety of access control requirements, their main drawback is that they do not impose any control on how information is propagated and used once it has been accessed by subjects authorized to do so. This weakness makes DAC systems vulnerable to malicious attacks, such as attacks through Trojan horses embedded in application programs or through covert channels. A covert channel [Bertino, 1998] is any component or feature of a system that is misused to encode or represent information for unauthorized transmission. A large variety of components or features can be exploited to establish covert channels, including the system clock, the operating system interprocess communication primitives, error messages, the concurrency control mechanism, and so on.

Mandatory access control (MAC) models address this shortcoming of DAC models by controlling the flow of information among the various objects in the system. The main principle underlying MAC models is that information should only flow from less protected objects to more protected objects. Therefore, any flow from more protected objects to less protected objects is illegal and is forbidden by the reference monitor. In general, a MAC system specifies the accesses that subjects have to objects based on a subject–object classification. The classification is based on a partially ordered set of access classes, also called labels, that are associated with every subject and object in the system. An access class generally consists of two components: a security level and a set of categories. The security level is an element of a hierarchically ordered set; a very well known example of such a set is the one including the levels Top Secret (TS), Secret (S), Confidential (C), and Unclassified (U), where TS > S > C > U. An example of an (unordered) set of categories is {NATO, Nuclear, Army}. Access classes are partially ordered as follows: an access class ci dominates (>) a class cj iff the security level of ci is greater than or equal to that of cj and the categories of ci include those of cj. The security level of the access class associated with a data object reflects the sensitivity of the information contained in the object, whereas the security level of the access class associated with a user reflects the user's trustworthiness not to disclose sensitive information to users not cleared to see it.
Categories are used to provide a finer-grained security classification of subjects and objects than that provided by security levels alone, and they are the basis for enforcing need-to-know restrictions. Access control in a MAC system is based on the following two principles, formulated by Bell and LaPadula [1976]:
• No read-up: a subject can read only those objects whose access class is dominated by the access class of the subject.
• No write-down: a subject can write only those objects whose access class dominates the access class of the subject.
Enforcement of these principles prevents information in a sensitive object from flowing into objects at lower or incomparable levels. Because of such restrictions, this type of access control has also been referred to as multilevel security, and database systems that satisfy multilevel security properties are called multilevel secure database management systems (MLS/DBMSs). The main drawback of MAC models is their lack of flexibility; therefore, even though some commercial DBMSs exist that provide MAC, these systems are seldom used.
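The dominance relation and the two Bell–LaPadula principles can be captured in a few lines of code. The following is a minimal sketch, not taken from this chapter, that assumes the TS > S > C > U levels and the category sets mentioned above; names such as AccessClass, can_read, and can_write are illustrative only.

```python
# Minimal sketch of MAC (Bell-LaPadula) checks; assumes the levels and
# categories used in the text. All names are illustrative.
LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}

class AccessClass:
    def __init__(self, level, categories=()):
        self.level = level                  # one of "U", "C", "S", "TS"
        self.categories = frozenset(categories)

    def dominates(self, other):
        # ci dominates cj iff level(ci) >= level(cj) and categories(ci) contain categories(cj)
        return (LEVELS[self.level] >= LEVELS[other.level]
                and self.categories >= other.categories)

def can_read(subject, obj):
    # no read-up: the subject's class must dominate the object's class
    return subject.dominates(obj)

def can_write(subject, obj):
    # no write-down: the object's class must dominate the subject's class
    return obj.dominates(subject)

analyst = AccessClass("S", {"NATO"})
report = AccessClass("C", {"NATO"})
log = AccessClass("TS", {"NATO", "Nuclear"})

print(can_read(analyst, report))   # True: S/{NATO} dominates C/{NATO}
print(can_read(analyst, log))      # False: reading up is forbidden
print(can_write(analyst, log))     # True: writing up is allowed
print(can_write(analyst, report))  # False: writing down is forbidden
```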
FIGURE 26.3 Building blocks of RBAC models: users, roles, permissions, and sessions, related by user assignment (UA), role assignment (RA), permission assignment (PA), and role session assignment (RSA).
In addition to the DAC and MAC models, a third type, known as the role-based access control (RBAC) model, has been proposed more recently. The basic notion underlying such a model is that of role. Roles are strictly related to the organization and can be seen as a set of actions or responsibilities associated with a particular working activity. Under RBAC models, all authorizations needed to perform a certain activity are granted to the role associated with that activity, rather than being granted directly to users. Users are then made members of roles, thereby acquiring the roles' authorizations. User access to objects is mediated by roles: each user is authorized to play certain roles and, on the basis of those roles, he or she can perform accesses on the objects. Because a role groups a (possibly large) number of related authorizations, authorization administration is greatly simplified; whenever a user needs to perform or is assigned a certain activity, the user only needs to be granted the authorization to play the proper role, rather than being directly granted the required authorizations. A session is a particular instance of a connection of a user to the system and defines the subset of activated roles. At any given moment, different sessions for the same user can be active. When users log into the system, they establish a session and, during this session, can request activation of a subset of the roles they are authorized to play. The basic building blocks of RBAC models are presented in Figure 26.3.

The use of roles has several well-recognized advantages from an enterprise perspective. First, because roles represent organizational functions, an RBAC model can directly support the security policies of the organization. Authorization administration is also greatly simplified: if a user moves to a new function within the organization, there is no need to revoke the authorizations he or she had in the previous function and grant the authorizations needed in the new function; the security administrator simply needs to revoke and grant the appropriate role memberships. Last, but not least, RBAC models have been shown to be policy-neutral [Sandhu, 1996]; in particular, by appropriately configuring a role system, one can support different policies, including mandatory and discretionary ones. Because of this characteristic, RBAC models have also been termed policy-neutral models. This is extremely important because it increases the flexibility in supporting the organization's security policies. Although current DBMSs support RBAC concepts, they do not exploit the full potential of these models. In particular, several advanced RBAC models have been developed, supporting, among other features, role hierarchies, by which roles can be defined as subroles of other roles, and constraints supporting separation of duty requirements [Sandhu, 1996]. An RBAC model supporting sophisticated separation of duty constraints has also been recently proposed for workflow systems [Bertino et al., 1999].
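As an illustration of how the UA, PA, and RSA relations of Figure 26.3 fit together, here is a small, hypothetical sketch; the role and permission names are invented, and real RBAC middleware would add role hierarchies and constraints.

```python
# Hypothetical RBAC sketch: users acquire permissions only through the
# roles they activate in a session. Names are illustrative.
PERMISSION_ASSIGNMENT = {            # PA: role -> permissions
    "nurse":  {("read", "health_record")},
    "doctor": {("read", "health_record"), ("write", "health_record")},
}
USER_ASSIGNMENT = {                  # UA: user -> roles he or she may play
    "alice": {"doctor", "nurse"},
    "bob":   {"nurse"},
}

class Session:                       # RSA: a session activates a subset of roles
    def __init__(self, user, requested_roles):
        allowed = USER_ASSIGNMENT.get(user, set())
        self.user = user
        self.active_roles = set(requested_roles) & allowed

    def check(self, privilege, obj):
        # permitted iff some active role holds the (privilege, object) pair
        return any((privilege, obj) in PERMISSION_ASSIGNMENT.get(r, set())
                   for r in self.active_roles)

s = Session("alice", {"nurse"})          # alice activates only the nurse role
print(s.check("read", "health_record"))  # True
print(s.check("write", "health_record")) # False: doctor role not activated
```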
26.2.2 A Brief Introduction to XML
XML [Extensible Markup Language, 2000] is today becoming the standard for data representation and exchange over the Web. The building blocks of any XML document are elements and attributes. Elements can be nested at any depth and can contain other elements (subelements), in turn originating a hierarchical structure. An element contains a portion of the document delimited by two tags: the start tag, at the beginning of the element, and the end tag, at the end of the element. Attributes can have different types, allowing one to specify element identifiers (attributes of type ID), additional information about the element (e.g., attributes of type CDATA containing textual information), or links to other elements (attributes of type IDREF(s)/URI(s)). An example of an XML document modeling an employee dossier is presented in Figure 26.4.
FIGURE 26.4 An example of XML document.
Another relevant feature of XML is the support for the definition of application-specific document types through the use of Document Type Definitions (DTDs) or XML Schemas.
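The content of Figure 26.4 is not reproduced here, but a document of the kind the text describes might look roughly like the following assumed illustration; the element and attribute names are invented, and the snippet uses Python's standard xml.etree module only to show elements, nesting, and an attribute.

```python
# Assumed illustration of an employee-dossier-like XML document;
# element and attribute names are invented, not taken from Figure 26.4.
import xml.etree.ElementTree as ET

dossier_xml = """
<employee_dossier empl_id="e501">
  <name>John Smith</name>
  <professional_experiences>
    <experience>Project X</experience>
  </professional_experiences>
  <health_record>...</health_record>
  <manager_evaluation>...</manager_evaluation>
</employee_dossier>
"""

root = ET.fromstring(dossier_xml)
print(root.tag, root.attrib["empl_id"])       # employee_dossier e501
print(root.find("name").text)                 # John Smith
for child in root:                            # subelements form the hierarchy
    print(" subelement:", child.tag)
```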
26.3 Access Control for Web Documents
In this section, we start by discussing the main access control requirements of Web data, highlighting how protecting Web data is considerably different from protecting data stored in a conventional information system. Then, to make the discussion more concrete, we present a reference access control model for the protection of XML documents [Bertino et al., 2001b; Bertino and Ferrari, 2002a]. Data exchange over the Web often takes the form of documents that are made available at Web servers or that are actively broadcast by Web servers to interested clients. Thus, in the following text we use the terms data and documents as synonyms.
26.3.1 Access Control: Requirements for Web Data
In conventional data management systems, access control is usually performed against a set of authorization rules stated by security administrators or users according to some access control policies. An authorization rule is, in general, specified on the basis of three parameters (s, o, p) and states that subject s is authorized to exercise privilege p on object o. This simple identity-based paradigm, however, does not fit very well in the Web environment.
The main reason is that Web data have peculiar characteristics that must be taken into account when developing an access control model suitable for their protection. The resulting requirements are mainly dictated by the need for flexibility, which is necessary to cope with a dynamic and evolving environment like the Web. In the following, we briefly discuss the main issues that must be taken into account.

26.3.1.1 Subject Qualification
The first aspect when dealing with the protection of Web data is how to qualify the subjects to whom an access control policy applies. In conventional systems, subjects are usually referred to on the basis of an identity-based mechanism; examples of identity information are a user ID or an IP address. Each access control policy is thus bound to a subject (or a set of subjects) simply by including the corresponding IDs in the policy specification. This mechanism, although very simple, is no longer appropriate for the Web environment, where the community of subjects is usually highly dynamic and heterogeneous. More flexible ways of qualifying subjects should therefore be devised that take into account, in addition to the subject identities, other characteristics of the subjects (either personal or deriving from relationships a subject has with other subjects).

26.3.1.2 Object Qualification
The second issue regards the number and typology of protection objects to which a policy may apply, where by protection object we mean a data portion over which a policy can be specified. An important security requirement of the Web environment is the possibility of specifying policies at a wide range of protection granularity levels. This is a crucial need because Web data source contents usually have varying protection requirements. In some cases, the same access control policy may apply to a set of documents in the source. In other cases, different policies must be applied to different components within the same document, and many intermediate situations may also arise. As an example of this need, consider the XML document in Figure 26.4: while the name and the professional experiences of an employee may be available to everyone, access to the health record or the manager evaluation record must be restricted to a limited class of users. To support a differentiated and adequate protection of Web data, the access control model must thus be flexible enough to support a spectrum of protection granularity levels. For instance, in the case of XML data, examples of granularity levels are the whole document, a document portion, a single document component (e.g., an element or attribute), or a set of documents. Additionally, this wide range of protection granularity levels must be complemented by the possibility of specifying policies that exploit both the structure and the content of a Web document. This is a relevant feature because there are cases in which all the documents with the same structure have the same protection needs and thus can be protected by specifying a single policy, as well as situations where documents with the same structure have contents with very different sensitivity degrees.

26.3.1.3 Exception Management and Propagation
Supporting fine-grained policies could lead to the specification of a possibly high number of access control policies.
An important requirement is thus that the access control model be able to reduce the number of policies that need to be specified as much as possible. One means to this end is the support for both positive and negative policies, which provide a flexible and concise way of specifying exceptions in all the situations where a whole protection object has the same access control requirements apart from one (or a few) of its components. A second feature that can be exploited to limit the number of policies is policy propagation, according to which policies (either positive or negative) specified for a protection object at a given granularity level apply by default to all protection objects related to it by a certain relationship. In the case of XML documents, for instance, the relationships that can be exploited derive from the hierarchical structure of the document: one can specify that a policy defined on a given element propagates to its direct and indirect subelements.
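To make the interplay of propagation and negative policies concrete, the following hypothetical sketch (not part of the chapter's model; the element tree, the policy representation, and the conflict rule are invented) shows a positive policy propagating down an element tree while a more specific negative policy overrides it on one subelement.

```python
# Hypothetical sketch of downward policy propagation with a negative exception.
# The element tree and the "closest policy wins" rule are illustrative only.
TREE = {                                   # element -> subelements
    "employee_dossier": ["name", "health_record"],
    "name": [], "health_record": [],
}

# (element, sign): '+' grants access, '-' denies it
POLICIES = {("employee_dossier", "+"), ("health_record", "-")}

def ancestors(elem, tree):
    chain = [elem]
    while True:
        parent = next((p for p, kids in tree.items() if chain[-1] in kids), None)
        if parent is None:
            return chain
        chain.append(parent)

def allowed(elem):
    # walk from the element up to the root; the closest explicit policy applies
    for node in ancestors(elem, TREE):
        if (node, "-") in POLICIES:
            return False
        if (node, "+") in POLICIES:
            return True
    return False                            # closed policy: no rule, no access

print(allowed("name"))           # True: inherited from employee_dossier
print(allowed("health_record"))  # False: local negative policy overrides
```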
26.3.1.4 Dissemination Strategies
The last fundamental issue when dealing with access control for Web data is what kind of strategies can be adopted to release data to subjects in accordance with the specified access control policies. Two main strategies can be devised:
• Pull mode. This is the simplest strategy and the one traditionally used in conventional information systems. Under this mode, a subject explicitly requests data from the Web source when needed. The access control mechanism checks which access control policies apply to the requesting subject and, on the basis of these policies, builds a view of the requested data that contains all and only those portions for which the subject has an authorization. Thus, the key issue in supporting information pull is how to efficiently build data views upon an access request on the basis of the specified access control policies.
• Push mode. Under this approach, the Web source periodically (or when some relevant event arises) sends the data to its subjects, without the need for an explicit request. Even in this case, different subjects may have the right to access different views of the broadcast documents. Because this mode is mainly conceived for data dissemination to a large community of subjects, the key issue is how to efficiently enforce information push while ensuring the confidentiality requirements specified by the defined policies.
In the following section, we illustrate an access control model for XML documents addressing the above requirements. The model has been developed in the framework of the Author-X project [Bertino et al., 2001b; Bertino and Ferrari, 2002a].
26.3.2 A Reference Access Control Model for the Protection of XML Documents
We start by illustrating the core components of the model; we then deal with policy specification and with implementation techniques for the access control model.

26.3.2.1 Credentials
Author-X provides a very flexible way of qualifying subjects, based on the notion of credentials. The idea is that a subject is associated with a set of information describing his or her characteristics (e.g., qualifications within an organization, name, age). Credentials are similar to the roles used in conventional systems; the main difference is that a role can be seen simply as a name associated with a set of privileges, denoting an organizational function that needs those privileges to perform its job, whereas credentials have an associated set of properties that can be exploited in the formulation of access control policies. Credentials thus make it easier to specify policies that apply only to a subset of the users belonging to a role who share some common characteristics (for instance, a policy that authorizes all the doctors with more than 2 years of experience). Because credentials may contain sensitive information about a subject, an important issue is how to protect this information. For instance, some credential properties (such as the subject name) may be made accessible to everyone, whereas other properties may be visible only to a restricted class of subjects. To facilitate credential protection, credentials in Author-X are encoded using an XML-based language called X-Sec [Bertino et al., 2001a]. This allows a uniform protection of XML documents and credentials, in that credentials are themselves XML documents and can thus be protected using the same mechanisms developed for the protection of XML documents. To simplify the task of credential specification, X-Sec allows the specification of credential types, that is, DTDs that act as templates for the specification of credentials with a common structure. A credential type models simple properties of a credential as empty elements and composite properties as elements with element content, whose subelements model the composite property components. A credential is an instance of a credential type and specifies the set of property values characterizing a given subject against the credential type itself. X-Sec credentials are certified by the credential issuer (e.g., a certification authority) using the techniques proposed by the W3C XML Signature Working Group [XML Signature Syntax, 2002].
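As a rough illustration of how a credential-based condition might be checked (Figure 26.5 is not reproduced here), the following assumed sketch represents a subject's credential as a small XML fragment and tests a property value; the element names, the property layout, and the checking function are invented and do not reproduce the actual X-Sec syntax.

```python
# Assumed sketch: checking a credential-based condition against a subject's
# credential, represented here as a small XML fragment (not actual X-Sec).
import xml.etree.ElementTree as ET

credential = ET.fromstring("""
<dept_head cred_id="c17" issuer="CA1">
  <name>John Smith</name>
  <department>HR</department>
</dept_head>
""")

def satisfies(cred, credential_name, prop=None, value=None):
    # True if this credential has the given name and, optionally, a property
    # element whose text equals the required value.
    if cred.tag != credential_name:
        return False
    if prop is None:
        return True
    elem = cred.find(prop)
    return elem is not None and elem.text == value

# e.g., a policy whose credential expression targets HR department heads
print(satisfies(credential, "dept_head", "department", "HR"))   # True
print(satisfies(credential, "dept_head", "department", "R&D"))  # False
```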
The credential issuer is also responsible for certifying the properties asserted by credentials. An example of an X-Sec credential type that can be associated with a department head is shown in Figure 26.5(a), whereas Figure 26.5(b) shows one of its possible instances. A subject can be associated with different credentials, possibly issued by different authorities, that describe the different roles played during the subject's everyday life. For instance, a subject can have, in addition to a dept_head credential, also a credential named IEEE_member, which qualifies him or her as a member of IEEE. To simplify the process of evaluating subject credentials against access control policies, all the credentials a subject possesses are collected into an XML document called an X-profile.

26.3.2.2 Protection Objects
Author-X provides a spectrum of protection granularity levels, such as a whole document, a document portion, a single document component (i.e., an attribute, an element, or a link), and a collection of documents. These protection objects can be identified on the basis of both their structure and their content. Additionally, Author-X allows the specification of policies both at the instance level and at the schema level (i.e., DTD/XMLSchema), where a policy specified at the schema level propagates by default to all the corresponding instances. To simplify the task of policy specification, Author-X supports different explicit propagation options that can be used to reduce the number of access control policies that need to be defined. Propagation options state how policies specified on a given protection object of a DTD/XMLSchema/document propagate (partially or totally) to lower-level protection objects. A natural number n or the special symbol '*' denotes the depth of the propagation: the symbol '*' denotes that the access control policy propagates to all the direct and indirect subelements of the elements specified in the policy specification, whereas the symbol n denotes that the policy propagates to the subelements of the elements specified in the policy specification that are, at most, n levels down in the document/DTD/XMLSchema hierarchy.

26.3.2.3 Access Control Modes
Author-X supports two different kinds of access control policies: browsing policies, which allow subjects to see the information in a document and/or to navigate through its links, and authoring policies, which allow the modification of XML documents under different modes. More precisely, the range of access modes provided by Author-X is {view, navigate, append, write, browse_all, auth_all}. The view privilege authorizes a subject to view an element or some of its components; the navigate privilege authorizes a subject to see the existence of a specific link or of all the links in a given element; the append privilege allows a subject to write information in an element (or in some of its parts) or to include a link in an element, without deleting any preexisting information; and the write privilege allows a subject to modify the content of an element and to include links in the element. The set of access modes is complemented by two additional privileges, browse_all and auth_all, which respectively subsume all the browsing and all the authoring privileges.
FIGURE 26.5 An example of X-Sec credential type for a department head (a) and a corresponding credential instance (b).
FIGURE 26.6 X-Sec access control policy base template.
26.3.2.4 X-Sec Access Control Policies
Access control policies are encoded in X-Sec according to the template reported in Figure 26.6. The template is a DTD where each policy specification is modeled as an element (acc_policy_spec) having an attribute/element for each policy component. The meaning of each component is explained in Table 26.1. The template allows the specification of both positive and negative credential-based access control policies. An access control policy base is therefore an XML document instance of the X-Sec access control policy base template. An example of an access control policy base, referring to the portion of XML source illustrated in Figure 26.4, is reported in Figure 26.7.

26.3.2.5 Implementation Techniques for the Access Control Model
To be compliant with the requirements discussed at the beginning of this section, Author-X allows the release of XML data according to both a push and a pull mode. In the following, we focus on the techniques devised for information push because this constitutes the most innovative way of distributing data on the Web; we refer the interested reader to Bertino et al. [2001b] for further details on information push enforcement. The idea exploited in Author-X is to use encryption techniques for an efficient support of information push. More precisely, the documents to be released under information push are selectively encrypted on the basis of the specified access control policies: all document portions to which the same access control policies apply are encrypted with the same key. The same encrypted document is then sent to all the subjects, whereas each subject receives only the decryption keys corresponding to the document portions he or she is allowed to access. This avoids the problem of generating and sending a different view for each subject (or group of subjects) that must receive a document according to the push mode. A relevant issue in this context is how keys can be efficiently and securely distributed to the interested subjects. To this end, Author-X provides a variety of ways of delivering decryption keys to subjects. The reason for providing different strategies is that the key delivery method may depend on many factors, such as the number of keys that need to be delivered, the number of subjects, specific subject preferences, or the security requirements of the considered domain. Basically, the Security Administrator may select between offline and online key delivery.
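The selective-encryption idea can be sketched in a few lines. The following is a minimal, hypothetical illustration, not the Author-X implementation: portions that share the same set of applicable policies are grouped, each group gets its own key, and a subject is handed only the keys of the groups it may access. It uses the third-party cryptography package's Fernet recipe purely as a convenient stand-in for a real encryption scheme; the policy names and portion names are invented.

```python
# Hypothetical sketch of selective encryption for information push.
# Requires the third-party "cryptography" package; names are illustrative.
from cryptography.fernet import Fernet

# document portion -> set of policies that grant access to it (assumed input)
portion_policies = {
    "name": frozenset({"P1", "P2"}),
    "professional_experiences": frozenset({"P1", "P2"}),
    "health_record": frozenset({"P2"}),
}
portion_content = {p: f"<{p}>...</{p}>".encode() for p in portion_policies}

# one key per distinct policy set; encrypt each portion with its group's key
group_keys = {ps: Fernet.generate_key() for ps in set(portion_policies.values())}
encrypted_doc = {p: Fernet(group_keys[ps]).encrypt(portion_content[p])
                 for p, ps in portion_policies.items()}

def keys_for(subject_policies):
    # a subject gets the key of every group covered by a policy that applies to it
    return {ps: k for ps, k in group_keys.items() if ps & subject_policies}

# the same encrypted_doc is broadcast; only the key sets differ per subject
employee_keys = keys_for({"P1"})       # can decrypt name and experiences only
for portion, ps in portion_policies.items():
    if ps in employee_keys:
        print(portion, Fernet(employee_keys[ps]).decrypt(encrypted_doc[portion]))
```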
TABLE 26.1 Attribute Specification

Attr_name   Parent_node       Meaning
Cred_expr   acc_policy_spec   XPath expression on X-profiles denoting the subjects to which the policy applies
Target      objs_spec         Denotes the documents/DTDs/XMLSchemas to which the policy applies
Path        objs_spec         Denotes selected portions within the target (through XPath)
Priv        acc_policy_spec   Specifies the policy access mode
Type        acc_policy_spec   Specifies whether the access control policy is positive or negative
Prop        acc_policy_spec   Specifies the policy propagation option
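Figure 26.7 is not reproduced here, but combining the template of Figure 26.6 with the components listed in Table 26.1, a single policy in a policy base might look roughly like the following assumed reconstruction; the credential expression, target name, attribute values, and exact layout are guesses, and the snippet merely reads them back with Python's xml.etree.

```python
# Assumed reconstruction of what one X-Sec access control policy might look like;
# attribute names follow Table 26.1, everything else is a guess.
import xml.etree.ElementTree as ET

policy_xml = """
<acc_policy_spec cred_expr="//dept_head[department='HR']"
                 priv="view" type="positive" prop="*">
  <objs_spec target="employee_dossier.xml" path="//manager_evaluation"/>
</acc_policy_spec>
"""

policy = ET.fromstring(policy_xml)
objs = policy.find("objs_spec")
print("subjects :", policy.get("cred_expr"))   # who the policy applies to
print("target   :", objs.get("target"))        # document(s) it protects
print("path     :", objs.get("path"))          # selected portions (XPath)
print("privilege:", policy.get("priv"), "| type:", policy.get("type"),
      "| propagation:", policy.get("prop"))
```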
FIGURE 26.9 X-Sec signature policy base template.
The structure of the template is very similar to that provided for access control policies; the only differences are the attribute duty and the element subjs_spec. Attribute duty specifies the kind of signature policy and can assume two distinct values: sign and joint-sign. The sign duty requires that at least one subject whose X-profile satisfies the credential expression in the signature policy signs the protection objects to which the policy applies, whereas the joint-sign duty requires that, for each credential expression specified in the signature policy, at least one subject whose X-profile satisfies that credential expression signs the protection objects to which the policy applies. To support both simple and joint signatures, the subjs_spec element contains one or more XPath-compliant expressions on X-profiles. A signature policy base is an instance of the signature policy base template previously introduced. An example of a signature policy base for the XML document in Figure 26.4 is reported in Figure 26.10. For instance, the first signature policy in Figure 26.10 imposes that each employee sign his or her resume, whereas according to the second policy the manager must sign the employee's evaluation. The third signature policy requires a joint signature by two members of the board of directors on the employee evaluation made by the board of directors.
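As with Figure 26.7, the content of Figure 26.10 is not available here; the following is only an assumed sketch of what the second policy described in the text (the manager must sign the employee's evaluation) might look like, reusing the element and attribute names mentioned above (sign_policy_spec, duty, subjs_spec, objs_spec) with invented values and omitting the remaining attributes.

```python
# Assumed sketch of one signature policy; the element and attribute names
# come from the text, while the values and the omission of other attributes
# are guesses rather than actual X-Sec syntax.
import xml.etree.ElementTree as ET

sign_policy_xml = """
<sign_policy_spec duty="sign">
  <subjs_spec>//manager</subjs_spec>
  <objs_spec target="employee_dossier.xml" path="//manager_evaluation"/>
</sign_policy_spec>
"""

sp = ET.fromstring(sign_policy_xml)
print("duty   :", sp.get("duty"))                        # sign vs. joint-sign
print("signers:", [s.text for s in sp.findall("subjs_spec")])
print("objects:", sp.find("objs_spec").get("path"))
```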
26.5 Data Completeness and Filtering
In this section, we discuss possible approaches to the enforcement of data completeness and of self-protection and filtering, two additional security requirements that are particularly crucial when dealing with Web-based information systems.
26.5.1 Data Completeness
By data completeness we mean that any subject requesting data from a Web data source must be able to verify that he or she has received all the data (or portions of data) that the subject is entitled to access, according to the stated access control policies. This requirement is particularly crucial when data are released according to a third-party architecture. Third-party architectures for data publishing over the Web are today receiving growing attention due to their scalability properties and to their ability to efficiently manage a large number of subjects and a great amount of data. In a third-party architecture, there is a distinction between the Owner and the Publisher of information: the Owner is the producer of the information, whereas Publishers are responsible for managing (a portion of) the Owner's information and for answering subject queries. A relevant issue in this architecture is how the Owner can ensure a complete publishing of its data even if the data are managed by an untrusted third party. A possible approach to completeness verification [Bertino et al., 2002b] relies on the use of the secure structure of an XML document.
FIGURE 26.10 An example of signature policy base.
In a third-party architecture, the secure structure is sent by the Owner to the Publisher and subsequently returned to subjects together with the answer to a query on the associated document. By contrast, if a traditional architecture is adopted, the secure structure is sent directly to subjects when they submit queries to the information Owner. This additional document contains the tag names and the attribute names and values of the original XML document, hashed with a standard hash function, so that the receiving subject cannot obtain information he or she is not allowed to access. To verify the completeness of the received answer, the subject performs the same query submitted to the Publisher on the secure structure. Clearly, the query is first transformed to substitute each tag and attribute name and each attribute value with the corresponding hash value. By comparing the results of the two queries, the subject is able to verify completeness for a wide range of XPath queries [Bertino et al., 2002b], without accessing any confidential information.
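The following is a minimal sketch of this hashing idea, assuming SHA-256 as the "standard hash function"; the cited work's actual construction, query rewriting, and XPath support are richer, and all names here are illustrative.

```python
# Minimal sketch of a "secure structure": tag names and attribute values are
# replaced by their hashes, so queries can be re-run without revealing content.
# SHA-256 and the truncation are assumptions; the cited scheme is richer.
import hashlib
import xml.etree.ElementTree as ET

def h(s):
    # prefix with a letter so the hash is usable as an element tag
    return "h" + hashlib.sha256(s.encode()).hexdigest()[:12]

def secure_structure(elem):
    sec = ET.Element(h(elem.tag),
                     {h(k): h(v) for k, v in elem.attrib.items()})
    for child in elem:
        sec.append(secure_structure(child))
    return sec

doc = ET.fromstring(
    '<employee_dossier><name>John Smith</name><health_record/></employee_dossier>')
sec = secure_structure(doc)

# The subject re-runs the (hashed) query on the secure structure and compares
# the number of matches with the answer received from the Publisher.
answer_from_publisher = doc.findall("name")          # what the Publisher returned
matches_in_secure_structure = sec.findall(h("name")) # independent count
print(len(answer_from_publisher) == len(matches_in_secure_structure))  # True
```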
26.5.2 Filtering
The Web has the undoubted benefit of making a huge amount of information easily available to almost everyone. However, the ease with which information of any kind can be accessed through the Web poses the urgent need to protect specific classes of users, such as children, from receiving improper material. To this purpose, Internet filtering systems are being developed that can be configured according to differing needs. The aim of a filtering system is to block access to improper Web content. Filtering systems operate according to a set of filtering policies that state what can be accessed and what should be blocked. Thus, differently from what happens in access control mechanisms, filtering policies should not be specified by content producers (i.e., target-side), but by content consumers (i.e., client-side). Clearly, a key requirement for Internet filtering is the support for content-based filtering.
This feature relies on the possibility of associating a computer-effective description of the semantic content (the meaning) with Web documents. The current scenario for implementing and using content-based filtering systems implies that content providers or independent organizations associate PICS-formatted labels with Web documents. However, such an approach has serious shortcomings. The first drawback is related to completeness, i.e., the possibility of representing the semantic content of Web documents to the necessary level of detail. A second important problem concerns the low level of neutrality of the current rating/filtering proposals. A third drawback is that current filtering systems do not have the ability to associate different policies with different classes of users on the basis of their credentials. This is a relevant feature when filtering systems must be used by institutional users (private users, such as publishers, or public users, such as NGOs or parental associations) that must specify different filtering policies for different classes of users. Thus, a promising research direction is the development of multistrategy filtering systems, able to support a variety of ways of describing the content of Web documents and a variety of different filtering policies. An example of this research trend is the system called MaX [Bertino, Ferrari, and Perego, 2003], developed in the framework of the IAP EUFORBIA project. MaX provides the ability to specify credential-based filtering policies, and it is almost independent of the techniques used to describe the content of a Web document: it can support standard PICS-based rating systems as well as keyword-based and concept-based descriptions. However, much more work needs to be done in the field of multistrategy filtering systems. A first research direction is related to the fact that most of the filtering systems developed so far are mainly conceived for textual data; an interesting issue is thus how to provide content-based filtering for other kinds of multimedia data, such as images, audio, and video. Another relevant issue is how to complement existing filtering systems with supervision mechanisms according to which accesses to a Website are allowed, provided that a subject or a group of subjects (e.g., parents) are preventively informed of these accesses.
26.6 Conclusions and Future Trends
Information security, and in particular data protection from unauthorized accesses, remains an important goal of any networked information system. In this chapter, we have outlined the main concepts underlying security, as well as research results and innovative approaches. Much more, however, needs to be done. As new applications emerge on the Web, new information security mechanisms are required. In this chapter, we could not discuss many emerging applications and research trends. In conclusion, however, we would like to briefly discuss some important research directions.

The first direction is related to the development of trust negotiation systems. The development of such systems is motivated by the fact that the traditional identity-based approach to establishing trust is not suitable for decentralized environments, where most of the interactions occur between strangers. In this context, the involved parties need to establish mutual trust before the release of the requested resource. A promising approach is represented by trust negotiation, according to which mutual trust is established through an exchange of property-based digital credentials. Disclosure of credentials, in turn, must be governed by policies that specify which credentials must be received before the requested credential can be disclosed. Several approaches to trust negotiation have been proposed (e.g., Yu and Winslett [2003]). However, all such proposals focus on only some of the aspects of trust negotiation, such as the language to express policies and credentials, the strategies for performing trust negotiations, or the efficiency of the interactions, and none of them provides a comprehensive solution. We believe that the key ingredients of such a comprehensive solution are, first of all, a standard language for expressing credentials and policies; in this respect, an XML-based language could be exploited for the purpose. The language should be flexible enough to express a variety of protection needs, because the environments in which a negotiation could take place can be very heterogeneous. Another important issue is to devise different strategies to carry on the negotiation, on the basis of the sensitivity of the requested resource, the degree of trust previously established by the involved parties, and the required efficiency. For instance, there can often be cases in which two parties negotiate the same or a similar resource.
In such a case, instead of performing the same negotiation from scratch several times, the results of previous negotiations can be exploited to speed up the current one.

Another research direction is related to providing strong privacy. Even though privacy-preserving data-releasing techniques have been widely investigated in the past in the area of statistical databases, the Web makes available a large number of information sources that can be combined by other parties to infer private, sensitive information, perhaps also through the use of modern data-mining techniques. Therefore, individuals are increasingly concerned about releasing their data to other parties, as they do not know with what other information the released information could be combined, possibly resulting in the disclosure of their private data. Current standards for privacy preferences are a first step towards addressing this problem. However, much more work needs to be done, in particular in the area of privacy-preserving data-mining techniques.

Another research direction deals with mechanisms supporting distributed secure computations on private data. Research proposals in this direction include mechanisms for secure multiparty computation — allowing two or more parties to jointly compute some function of their inputs while hiding their inputs from each other — and protocols for private information retrieval — allowing a client to retrieve a selected data object from a database while hiding the identity of this data object from the server managing the database. Much work, however, needs to be carried out to apply such approaches to the large variety of information available on the Web and to the diverse application contexts.

Finally, interesting research issues concern how the protection mechanisms that have been and are being developed for Web documents can be incorporated into existing technology and Web-based enterprise information system architectures. The Web community regards XML as the most important standardization effort for information exchange and interoperability. We argue that compatibility with XML and its companion technologies is an essential requirement in the implementation of any access control and protection mechanism.
References
Bell, D.E. and L.J. LaPadula. Secure computer systems: Unified exposition and Multics interpretation. Technical report, MITRE, March 1976.
Bertino, E. Data security. Data and Knowledge Engineering, 25(1–2): 199–216, March 1998.
Bertino, E., V. Atluri, and E. Ferrari. The specification and enforcement of authorization constraints in workflow management systems. ACM Transactions on Information and System Security, 2(1): 65–104, February 1999.
Bertino, E. and E. Ferrari. Secure and selective dissemination of XML documents. ACM Transactions on Information and System Security, 5(3): 290–331, August 2002a.
Bertino, E., E. Ferrari, and S. Castano. On specifying security policies for Web documents with an XML-based language. In Proceedings of the 1st ACM Symposium on Access Control Models and Technologies (SACMAT'01), pages 57–65. ACM Press, New York, May 2001a.
Bertino, E., E. Ferrari, and S. Castano. Securing XML documents with Author-X. IEEE Internet Computing, 5(3): 21–31, May/June 2001b.
Bertino, E., E. Ferrari, and L. Parasiliti. Signature and access control policies for XML documents. In Proceedings of the 8th European Symposium on Research in Computer Security (ESORICS 2003), Gjovik, Norway, October 2003.
Bertino, E., E. Ferrari, B. Thuraisingam, A. Gupta, and B. Carminati. Selective and authentic third-party distribution of XML documents. IEEE Transactions on Knowledge and Data Engineering, in press.
Bertino, E., A. Vinai, and B. Catania. Transaction models and architectures. In Encyclopedia of Computer Science and Technology, volume 38, pages 361–400. Marcel Dekker, New York, 1998.
Extensible Markup Language (XML) 1.0 (Second Edition). World Wide Web Consortium, W3C Recommendation, 6 October 2000. URL http://www.w3.org/TR/REC-xml.
Geuer-Pollmann, C. The XML security page. URL http://www.nue.et-inf.uni-siegen.de/euer-pollmann/xml_security.html.
Griffiths, P.P. and B.W. Wade. An authorization mechanism for a relational database system. ACM Transactions on Database Systems, 1(3): 242–255, September 1976.
Sandhu, R. Role hierarchies and constraints for lattice-based access controls. In Proceedings of the 4th European Symposium on Research in Computer Security (ESORICS 96), Lecture Notes in Computer Science 1146, pages 65–79. Rome, September 1996.
Stallings, W. Network Security Essentials: Applications and Standards. Prentice Hall, Englewood Cliffs, NJ, 2000.
Yu, T. and M. Winslett. A unified scheme for resource protection in automated trust negotiation. In Proceedings of the IEEE Symposium on Security and Privacy, Oakland, CA, 2003.
Yu, T., K.E. Seamons, and M. Winslett. Supporting structured credentials and sensitive policies through interoperable strategies for automated trust negotiation. ACM Transactions on Information and System Security, 6(1): 1–42, February 2003.
XML Path Language (XPath) 1.0. World Wide Web Consortium, W3C Recommendation, 16 November 1999. URL http://www.w3.org/TR/1999/REC-xpath-19991116.
XML Signature Syntax and Processing. World Wide Web Consortium, W3C Recommendation, 12 February 2002. URL http://www.w3.org/TR/xmldsig-core/.
27 Understanding Web Services
Rania Khalaf, Francisco Curbera, William Nagy, Stefan Tai, Nirmal Mukhi, and Matthew Duftler

CONTENTS
Abstract
27.1 Introduction
27.2 Service Oriented Computing
27.3 Understanding the Web Services Stack
27.4 Transport and Encoding
27.4.1 SOAP
27.5 Quality of Service
27.5.1 Security
27.5.2 Reliability
27.5.3 Coordination
27.6 Description
27.6.1 Functional Definition of a Web Service
27.6.2 A Framework for Defining Quality of Service
27.6.3 Service Discovery
27.7 Composition
27.7.1 Choreography
27.8 Summary
References
Abstract
Web services aim to support highly dynamic integration of applications in natively interorganizational environments. They pursue a platform-independent integration model, based on XML standards and specifications, that should allow new and existing applications created on proprietary systems to seamlessly integrate with each other. This chapter provides an overview of the Web services paradigm, starting with the motivations and principles guiding the design of the Web services stack of specifications. Practical descriptions are provided of the key specifications that define how services may communicate; define and adhere to quality of service requirements; provide machine-readable descriptions to requesting applications; and be composed into business processes. An example is discussed throughout the chapter to illustrate the use of each specification.
27.1 Introduction
Web services is one of the most powerful and, at the same time, most controversial developments of the software industry in the last 5 years. Starting with the initial development of the Simple Object Access Protocol specification (SOAP 1.0) in 1999, followed by Universal Description, Discovery, and Integration (UDDI 1.0)
and the Web Services Description Language (WSDL 1.0) in 2000, the computer industry quickly bought into a vision of cross-vendor interoperability, seamless integration of computing systems, falling information technology (IT) costs, and the rapid creation of an Internet-wide service economy. The vision was soon backed by an array of products supporting the emerging pieces of the Web services framework and endorsed by most industry analysts. It is easy to miss, among this unprecedented level of industry support, the flurry of new specifications, and the skepticism of many (often from the academic community), the main motivating factors behind the Web services effort and its distinguishing features. There are three major motivating forces:
1. A fundamental shift in the way enterprises conduct their business, toward greater integration of their business processes with their partners. The increased importance of the business-to-business (B2B) integration market is a reflection of this trend [Yates et al., 2000].
2. The realization that the installed computing capacity is largely underutilized, and a renewed focus on efficient sharing. The development of scientific computing Grids is the most prominent example of this trend [Foster et al., 2001].
3. A recognition of the power of Internet standards to drive technical and business innovation.

Two important trends in the way business is conducted are particularly relevant to our first point. On the one hand, enterprises are moving to concentrate on their core competencies and outsource other activities to business partners. Businesses today assume that fundamental parts of their core processes may be carried out by trusted partners and will be effectively out of their direct control. On the other hand, a much more dynamic business environment and the need for more efficient processes have made "just-in-time" techniques key elements of today's business processes (in production, distribution, etc.). Just-in-time integration of goods and services provided by partners into core processes is now commonplace for many businesses.

A parallel trend has taken place in the scientific community. The idea that scientific teams are able to remotely utilize other laboratories' specialized applications and scientific data, instead of producing and managing it all locally, is the counterpart of the business trends we just described. In addition, the Grids initiative has helped focus the industry's attention on the possibility of achieving higher levels of resource utilization through resource sharing. A standard framework for remote application and data sharing will necessarily allow more efficient exploitation of installed computing capacity.

These changes in the way we think about acquiring services and goods have taken place simultaneously with the expansion of the Web into the first global computing platform. In fact, the wide availability of Internet networking technologies has been the technical underpinning for some of these developments. More importantly, however, the development of the Internet has shown the power of universal interoperability and the need to define standards to support it. HTTP and HTML were able to spark and support the development of the Web into a global human-centric computing and information sharing platform. Would it be possible to define a set of standards to support a global, interoperable, and application-centric computing platform? The Web services effort emerges in part as an attempt to answer this question.
Web services is thus an effort to define a distributed computing platform to support a new way of doing business and conducting scientific research in a way that ensures universal application-to-application interoperability. The computing model required to support this program necessarily has to differ from previous distributed computing frameworks if it is to support the requirements of dynamism, openness, and interoperability outlined here. Service Oriented Computing (SOC) provides the underlying conceptual framework on which the Web services architecture is being built. We review the main assumptions of this model in the next section.

In the rest of this chapter we provide an overview of the key specifications of the Web services framework. Section 27.2 discusses the principal assumptions of the SOC model. Section 27.3 contains an introduction to the Web services specification stack. Section 27.4 covers the basic remote Web services interaction mechanisms, and Section 27.5 describes the quality of service protocols that support Web services interactions. In Section 27.6 we discuss how Web services are described and in Section 27.7 we
explain how Web services can be combined to form service compositions. We conclude in Section 27.8 with a summary of the contents of this chapter.
27.2 Service Oriented Computing

Service Oriented Computing (SOC) tries to capture the essential characteristics of a distributed computing system that natively supports the type of environment and the goals which we have described in the previous section. SOC differentiates the emerging platform from previous distributed computing architectures. Simplifying the discussion, we may summarize the requirements implied by our discussion above in three main ideas:
• The assumption that computing relies on networks of specialized applications owned and maintained by independent providers
• The concept of just-in-time, automated application integration, including pervasive support for a dynamic binding infrastructure
• The assumption that a set of basic protocols will provide universal interoperability between applications regardless of the platform on which they run

The exact nature of service oriented computing has not yet been clearly and formally articulated (though there is a growing interest in the space, see ICSOC, 2001; Papazoglou and Georgakopoulos [2003]), but we can state a set of key characteristics of SOC platforms that follow directly from these assumptions.

Platform independence: The set of protocols and specifications that support the SOC framework should avoid any specific assumption about the capabilities of the implementation platforms on which services run. The realization of the wide heterogeneity of platforms and programming models and the requirement for universal support and interoperability motivate this principle.

Explicit metadata: Applications ("services") must declaratively define their functional and nonfunctional requirements and capabilities ("service metadata") in an agreed, machine readable format. The aim is to reduce the amount of out-of-band and implicit assumptions regarding the operation and behavior of the application by making all technical assumptions about the service explicit. Implicit assumptions about service properties limit a service's ability to operate in interorganizational environments.

Metadata driven dynamic binding: Based on machine readable declarative service descriptions, automated service discovery, selection, and binding become native capabilities of SOC middleware and applications, allowing just-in-time application integration. A direct consequence of the dynamic binding capability is a looser coupling model between applications. Applications express their dependencies on other services in terms of a set of behavioral characteristics and potentially discover the actual services they will utilize at a very late stage of their execution (late binding).

A componentized application model: In a SOC environment, services are basic building blocks out of which new applications are created. New applications are built out of existing ones by creating service compositions. A service composition combines services following a certain pattern to achieve a business goal, solve a scientific problem, or provide new service functions in general. Whether these services are found inside or outside the organization is essentially irrelevant from the application integration perspective, once we assume a SOC-enabled middleware platform. Service composition thus provides a mechanism for application integration which seamlessly supports cross-enterprise (business-to-business, B2B) and intra-enterprise application integration (EAI). Service-oriented computing is thus naturally a component oriented model.
A peer-to-peer interaction model: The interaction between services must be able to naturally model the way organizations interact, which does not necessarily follow the traditional asymmetric client-server model. Rather, service interactions are much like business interactions: bidirectional and conversational. A typical interaction involves a series of messages exchanged between two parties over a possibly long-running conversation. Different modes of service coupling should be possible; however, loosely-coupled services combine traditional decoupled messaging-style interactions (data exchange) with the explicit application
contracts of tighter-coupled (interface-driven) object-oriented interaction styles. Note that with tightly coupled services the notion of an application contract similar to those in object-oriented systems has a different flavor, since it relies on explicit specification of behavior and properties in metadata, as opposed to implicit assumptions about the state of an object.
27.3 Understanding the Web Services Stack

The Web services framework consists of a set of XML standards and specifications that provide an instantiation of the service-oriented computing paradigm discussed in the previous section. In this section we provide an overview of these different specifications. We organize them into a Web services stack, which is illustrated in Figure 27.1. The Web services stack is extensible and modular. As the framework continues to mature and evolve, new specifications may be added in each of the layers to address additional requirements. The modularity enables a developer to use only those pieces of the stack deemed necessary for the application at hand. Using alternative transport and encoding protocols is an example of this extensibility.

The framework is split into four main areas: transport and encoding, quality of service, descriptions, and business processes. The transport and quality of service layers define basic "wire" protocols that every service will be assumed to support to ensure interoperability. A basic messaging protocol (SOAP [Gudgin et al., 2003]) and encoding format (XML) provide basic connectivity. The specifications in the quality of service (QoS) layer define protocols for handling QoS requirements such as exchanging messages reliably, securing interactions, and executing interactions with transactional semantics. The focus of these protocols is to provide "on the wire" interoperability by describing the normative requirements of the exchanged sequences of messages. Implementation and programming model implications are necessarily absent in order to guarantee full implementation and platform independence. It is important to observe that the Web services framework requires standard protocols only for interoperability reasons. They may be replaced by platform-specific ones when appropriate, while still complying with higher levels of the specification stack.

Description layer specifications deal with two problems. The first is how to represent service behavior, capabilities, and requirements in a machine readable form. The second is how to enable potential service users to discover and dynamically access services. The Web Services Description Language (WSDL) [Christensen et al., 2001] is used to define the functional capabilities of a service, such as the operations, service interfaces, and message types recognized by the service. WSDL also provides deployment information such as the network address, transport protocol, and encoding format of the interaction. Quality of service requirements and capabilities are declared using the WS-Policy framework [Box et al., 2002b]. WS-Policy enables Web services quality of service "policies" to be attached to different parts of a WSDL definition. Different policy "dialects" are defined to represent specific types of QoS protocols, such as reliable messaging, security, etc. Thus, the description layer enables the creation of
FIGURE 27.1 The Web services stack. From bottom to top: Transport and Encoding (SOAP; XML and encodings), Quality of Service (security, reliable messaging, transactions, coordination, and other protocols), Description (WSDL, Policy, UDDI, Inspection), and Business Processes (BPEL4WS), with room for other protocols and services at each layer.
standard metadata associated with the protocols that form the other layers. A second set of specifications defines how to publish, categorize, and search for services based on their service descriptions. The Universal Description, Discovery, and Integration (UDDI) specification [Bellwood et al., 2002] and WS-Inspection are the two protocols in this category. Finally, the business process layer deals with the composition of services. Building on the service (component) descriptions defined using WSDL and WS-Policy, the specifications in this layer define how services may be combined to create new applications or services. Currently only one specification, the Business Process Execution Language for Web Services (BPEL4WS) [Curbera et al., 2002b], provides this type of functionality. BPEL4WS defines a process or workflow oriented composition model particularly well suited to deal with business applications. Other composition mechanisms will likely emerge in the near future.

In order to understand how these layers fit together, consider a simple example of a loan request interaction. We will detail this example throughout this chapter as we cover the key specifications from the different parts of the stack. In this example, Joe, a college student, wants to request a loan to pay for his college tuition. From his friends Joe learns about a company called LoanCo that processes such requests using Web services and decides to work with it. He looks at the functional definition of the service as contained in its WSDL document and sees that he can access its loan processing operation ("request") using SOAP over HTTP. Joe then sends a secure, reliable loan request to the service. On the other side of the wire, LoanCo used BPEL4WS to create its loan service. As part of its operation, the loan service interacts with two other Web services, one from a credit reporting agency and another from a highly reputable financial institution, to decide whether or not to provide the loan. LoanCo's process is not tied to specific agencies or banks, and it can decide at runtime which ones it would like to use for each customer, perhaps finding them through a query to a UDDI registry.

In the following sections we present the different layers of the stack in more detail, focusing on the specifications that are most relevant in each layer. The example which we just described will be used in each section to illustrate how the corresponding specifications are used.
27.4 Transport and Encoding

To achieve the goal of universal interoperability, a small set of application-to-application interaction protocols needs to be defined and adopted across the industry. The SOAP specification defines a messaging model which provides basic interoperability between applications, regardless of the platforms in which they are implemented. In this respect, SOAP is for Web services what the HTTP protocol is for the Web. As we mentioned before, the definition and widespread adoption of a common interaction protocol for Web services does not preclude the use of other platform or application specific protocols. For example, two applications may replace SOAP with a shared technology whose use provides additional benefits such as increased quality of service.
27.4.1 SOAP

SOAP is a lightweight, platform independent, XML-based messaging and RPC protocol. SOAP messages can be carried on any existing transport such as HTTP, SMTP, TCP, or proprietary messaging protocols like IBM's MQ Series. SOAP does not require the use of any of these, but it does define a standard way of carrying SOAP messages inside an HTTP request. In its simplest form, a SOAP message is nothing but an envelope containing an optional header and a mandatory body. The envelope, header, and body are each expressed as XML elements. The header and the body sections can contain arbitrary XML elements. The following XML snippet shows an example of a (particularly simple) SOAP request that carries no header section.
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <loan:request xmlns:loan="http://loanco.example.com/loans">
         <firstName>Joe</firstName>
         <amount>10000</amount>
      </loan:request>
   </soap:Body>
</soap:Envelope>
(Element and namespace names in this and the following snippets are illustrative.)
In addition, SOAP introduces the notion of "actors," which can be used to indicate the intended recipient of various parts of a SOAP message. The actor construct is used to guide a SOAP message through a sequence of intermediaries, with each intermediary processing its portion of the message and forwarding the remainder.

27.4.1.1 Messaging using SOAP

The basic level of functionality provided by SOAP is that of a messaging protocol. The SOAP envelope wraps the application message in its body. The header section contains XML elements that support middleware level protocols such as reliable messaging, authentication, distributed transactions, etc. The envelope may be transmitted using any number of protocols, such as HTTP; it may even flow over different protocols as it reaches its destination. In a typical example, Joe's client may utilize LoanCo's service by sending it a SOAP envelope as the content of an HTTP POST request. When messaging semantics are used, the SOAP body contains arbitrary XML data; the service and client must come to an agreement as to how the data is to be encoded.

27.4.1.2 Remote Procedure Calls using SOAP

In addition to pure messaging semantics, the SOAP specification also defines a mechanism for performing remote procedure calls (RPC). This mechanism places constraints upon the messaging protocol defined above, such as how the root element in the body of the SOAP envelope is to be named. In order to carry the structured and typed information necessary for representing remote procedure calls, the SOAP specification provides suggestions as to how the data should be encoded in the messages which are exchanged. XML Schema [Fallside, 2001], another W3C specification, provides a standard language for defining the structure of XML documents as well as the data types of XML structures. While SOAP allows one to use whatever encoding style or serialization rules one desires, SOAP does define an encoding style based on XML Schema that may be used. This encoding style allows for the generation of an XML representation for almost any type of application data. The requests and responses of RPC calls are represented using this XML encoding.

No matter what one's platform and transport protocol of choice are, there is most likely a SOAP client and server implementation available. There are literally dozens of implementations out there, many of which are capable of automatically generating and/or processing SOAP messages. So long as the generated messages conform to the SOAP specification, SOAP peers can exchange messages without regard to implementation language or platform. That being said, it is important to once again point out that supporting SOAP is not a requirement for being considered a Web service; it simply provides a fall-back if no better-suited communication mechanism exists.
27.5 Quality of Service

Just like any other distributed computing platform, the Web services framework must provide mechanisms to guarantee specific quality of service (QoS) properties such as performance, reliability, security, and others. Business interactions are simply not feasible if they cannot be assumed to be secure and reliable. These requirements are often formalized through contracts and service agreements between organizations. In a SOC environment, protocols are defined as strict "wire protocols," in that they only describe the external or "visible" behavior of each participant ("what goes on the wire"), avoiding any unnecessary specification of how the protocol should be implemented. This is a consequence of the platform and implementation independence design principle, and it distinguishes Web service protocol specifications from their counterparts in other distributed computing models. In addition, the SOC model requires that the QoS requirements and capabilities of services be declaratively expressed in machine readable form to enable open access and dynamic binding. In this section we examine three key QoS protocols: security, reliability, and the ability to coordinate activities with multiple parties (for example, using distributed transaction capabilities). The policy framework described in Section 27.6.2 provides a generic mechanism to describe the QoS characteristics of a service.
27.5.1 Security

Security is an integral part of all but the most trivial of distributed computing interactions. The security mechanisms applied may range from a simple authentication mechanism, such as that provided by HTTP Basic Authentication, to message encryption and support for nonrepudiation. In most cases, the issues that must be addressed in Web services are almost identical to those faced by other computing systems, and so it is generally a matter of applying existing security concepts and techniques to the technologies being used. As always, securing anything is a very complex task and requires a delicate balance between providing security and maintaining the usability of a system.

27.5.1.1 Authentication/Authorization

Web services are used to expose computational resources to outside consumers. As such, it is important to be able to control access to them and to guarantee that they are used only in the prescribed manner. This guarding of resources is performed by the code responsible for authentication and authorization. There are many existing programming models and specifications which provide authentication and authorization mechanisms that can be used in conjunction with Web services. As with communications protocols, the support for preexisting technologies, such as Kerberos [Steiner et al., 1988], allows Web services to be integrated into an existing environment, although such a choice does limit future integration possibilities and may only prove useful for single-hop interactions. A set of new proposals, such as the Security Assertions Markup Language (SAML) [Hallam-Baker et al., 2002] and WS-Security [Atkinson et al., 2002], have been developed to provide authentication and authorization mechanisms which fit naturally into XML-based technologies. SAML provides a standard way to define and exchange user identification and authorization information in an XML document. The SAML specification also defines a profile for applying SAML to SOAP messages. WS-Security defines a set of SOAP extensions that can be used to construct a variety of security models, including authentication and authorization mechanisms. Both SAML and WS-Security allow authentication and authorization information to be propagated along a chain of intermediaries, allowing a "single sign-on" to be achieved. In our example, if LoanCo requires that Joe be authenticated before he can access the service, then his initial SOAP request may contain a SOAP header like the following (assuming that LoanCo uses WS-Security):
<soap:Header>
   <wsse:Security>
      <wsse:UsernameToken>
         <wsse:Username>Joe</wsse:Username>
         <wsse:Password>money</wsse:Password>
      </wsse:UsernameToken>
   </wsse:Security>
</soap:Header>
(Namespace declarations are omitted; a UsernameToken is only one of the token types that WS-Security supports.)
27.5.1.2 Confidentiality

The data traveling between Web services and their partners are often confidential in nature, and so we need to be able to guarantee that the messages can only be read by the intended parties. As with other distributed systems, this functionality is typically implemented through some form of encryption. In scenarios where IP is being used for the communication transport, we can use SSL/TLS to gain confidentiality on a point-to-point level. If SOAP is being used, the WS-Security specification defines how XML Encryption [Dillaway et al., 2002] may be applied to allow us to encrypt/decrypt part or all of the SOAP message, thereby allowing finer-grain access to the data for intermediaries and providing a flexible means for implementing end-to-end message level confidentiality. Again, the choice of technologies depends upon the environment in which we are deploying.

27.5.1.3 Integrity

In addition to making sure that prying eyes are unable to see the data, there is a need to be able to guarantee that the message which was received is the one that was actually sent. This is usually implemented through some form of a digital signature or through encryption. If we are using IP, we can again use SSL/TLS to gain integrity on a point-to-point level. WS-Security defines how XML-SIG (XML Signature Specification) [Bartel et al., 2002] may be used to represent and transmit the digital signature of a message and the procedures for computing and verifying such signatures to provide end-to-end message level integrity.
27.5.2 Reliability

Messages sent over the Internet using an unreliable transport like SOAP-over-HTTP may never reach their target, due to getting lost in transit, unreachable recipients, or other failures. Additionally, they may be reordered by the time they arrive, resulting in surprising and unintended behavior. Interactions generally depend on the reliable receipt and proper ordering of the messages that constitute them, yet most high-level communication protocols do not themselves include reliability semantics. Reliable messaging systems, on the other hand, enable fire-and-forget semantics at the application level. In order to address reliability in a uniform way, a Web services specification named WS-ReliableMessaging [Bilorusets et al., 2003], WS-RM for short, has been proposed. WS-RM is not tied to a particular transport protocol or implementation strategy; however, a binding for its use with SOAP has been defined in the specification.

The basic idea is for a recipient to send an acknowledgment of the receipt of each message back to the sender, possibly including additional information to ensure certain requirements. A WS-RM enabled system provides an extensible set of delivery assurances, four of which are defined in the specification: at most once, at least once, exactly once, and in order (which may be combined with one of the other three). If an assurance is violated, a fault must be thrown. In order to track and ensure the delivery of a message, the message must include a "sequence" element that contains a unique identifier for the sequence, a message number signaling where the message falls in the sequence, and optional elements containing an expiration time and/or indicating the last message in the sequence. An acknowledgment is sent back using the "sequenceAcknowledgment" element, which contains the sequence identifier and the ranges of message numbers successfully received. Using the message numbers in a sequence, a recipient is able to rearrange messages if they arrive out of order. A "SequenceFault" element is defined to signal erroneous situations. WS-RM defines a number of faults
to signal occurrences such as invalid acknowledgments and exceeding the maximum message number in a sequence.

Continuing with our example, let's assume that both Joe's middleware and LoanCo implement the WS-RM specification. When Joe's client sends his request to his middleware, it may decide to split the message into two smaller messages before transmitting it over SOAP. The SOAP header of each would contain a <wsrm:Sequence> element, such as the one below for the second message (element names and prefixes are shown here, and in the acknowledgment below, in simplified form):
<wsrm:Sequence>
   <wsrm:Identifier>http://loanapp235.com/RM/xyz</wsrm:Identifier>
   <wsrm:MessageNumber>2</wsrm:MessageNumber>
</wsrm:Sequence>
Now assume that the LoanCo service acknowledges the first message but not the second:
<wsrm:SequenceAcknowledgment>
   <wsrm:Identifier>http://loanapp235.com/RM/xyz</wsrm:Identifier>
   <wsrm:AcknowledgmentRange Lower="1" Upper="1"/>
</wsrm:SequenceAcknowledgment>
After some time, Joe's system assumes that the second part has been lost and resends it. Upon receiving the resent message, LoanCo's middleware recombines the two messages and hands the appropriate input to the service. Later, LoanCo may actually receive the original second message, but knows from the sequence identifier and the message number that it is a duplicate it can safely ignore. Note that WS-RM itself does not specify which part of a system assembles the pieces or how the assembly is accomplished. That is left to the implementation. A very basic system without reliability support might do it in the application itself, but ideally reliability would be handled in the middleware layer. The specification does, however, include the concept of message sources and targets (Joe and LoanCo's middleware) that may be different from the initial sender and ultimate receiver of the message (Joe and LoanCo). So far, we have not mentioned how a service may declare the delivery assurances it offers but simply how reliable message delivery can be carried out in the Web services framework. In Section 27.6.2, we illustrate the use of a pluggable framework for declaring such information on a service's public definition.
27.5.3 Coordination

Multiparty service interactions typically require some form of transactional coordination. For example, a set of distributed services that are invoked by an application may need to reach a well-defined, consistent agreement on the outcome of their actions. In Joe's case, he may wish to coordinate the loan approval with an application to college. Joe does not need a loan without being accepted into college, and he needs to decline the college acceptance should he not get a loan. The two activities form an atomic transaction, where either all or none of the activities succeed.

27.5.3.1 Coordination

WS-Coordination [Cabrera et al., 2002b] addresses the problem of coordinating multiple services. It is a general framework defining common mechanisms that can be used to implement different coordination models. The framework is comparable to distributed object frameworks for implementing extended transactions, such as the J2EE Activity service [Houston et al., 2001]. Using WS-Coordination, a specific coordination model is represented as a coordination type supporting a set of coordination protocols. A coordination protocol is the set of well-defined messages that are exchanged between the services that are the coordination participants. Coordination protocols include completion protocols, synchronization protocols, and outcome notification protocols. The WS-Transaction specification (described below) exemplifies the use of WS-Coordination by defining two coordination types for distributed transactions.
In order for a set of distributed services to be coordinated, a common execution context needs to exist. WS-Coordination defines such a coordination context, which can be extended for a specific coordination type. A middleware system implementing WS-Coordination can then be used to attach the context to application messages so that the context is propagated to the distributed coordination participants. WS-Coordination further defines two generic coordination services and introduces the notion of a coordinator. The two generic services are the Activation service and the Registration service. A coordinator groups these two services as well as services that represent specific coordination protocols. The Activation service can be used by applications wishing to create a coordination context. The context contains a global identifier, expiration data, and coordination type-specific information, including the endpoint reference for the Registration service. The endpoint reference is a WSDL definition type that is used to identify an individual port; it consists of the URI of the target port as well as other contextual information such as service-specific instance data. A coordination participant can register with the Registration service for a coordination protocol (using the endpoint reference obtained from the context). The participant may also choose to use its own coordinator for this purpose. Figure 27.2 illustrates the sequence of a WS-Coordination activation, service invocation with context propagation, and protocol registration, using two coordinators.

27.5.3.2 Distributed Transactions

Transactions are program executions that transform the shared state of a system from one consistent state into another consistent state. Two principal kinds of transactions exist: short-running transactions, where locks on data resources can be held for the duration of the transaction, and long-running business transactions, where resources cannot be held. WS-Transaction [Cabrera et al., 2002a] leverages WS-Coordination by defining two coordination types for Web services transactions. Atomic Transaction (AT) is the coordination type supporting short-running transactions, and Business Activity (BA) is the coordination type supporting long-running business transactions.

ATs bring the well-known and widely used distributed transaction model of traditional middleware and databases to Web services. A set of coordination protocols supporting atomicity of Web services execution, including the two-phase commit protocol, are defined. ATs can be used to coordinate Web services within an enterprise, when resources can be held. ATs may also be used across enterprises if a tight service coupling for transactional coordination is desired. BAs model potentially long-lived activities. They do not require resources to be held, but do require business logic to handle exceptions. Participants here are viewed as business tasks that are children of the BA for which they register. Compared to ATs, BAs suggest a more loosely-coupled coordination model in that, for example, participants may choose to leave a transaction or declare their processing outcome before being solicited to do so.
FIGURE 27.2 Coordinating services: (1) an application creates coordination context c1 through the Activation Service (AS1) of Coordinator 1; (2) it sends an application message containing c1 to a second application; (3) the receiving application creates its own context c2 from c1 through Coordinator 2's Activation Service (AS2); (4, 5) the participants register for the coordination protocol with the Registration Services (RS1, RS2), identifying Protocol Service PS2 as the participant's protocol endpoint; (6) coordination protocol P then runs between the Protocol Services PS1 and PS2.
In our example, Joe would use middleware that supports the WS-Transaction Atomic Transaction coordination type to create a coordination context. He would also require the two services, the loan approval service and the college application service, to support the AT protocols of the WS-Transaction specification. Joe would invoke the two services and have the middleware propagate the coordination context with the messages; the two services being invoked will then register their resources as described. Should either of the two services fail, the transaction will abort (neither of the two services will commit). An abort can be triggered by a system or network failure, or by the application (Joe) interpreting the results of the invocations.

The WS-Transaction AT and BA coordination types are two models for Web services transactions which can be implemented using the WS-Coordination framework. Other related specifications have been published in the area of Web services transactions. These include the Business Transaction Protocol (BTP) [Ceponkus et al., 2002] and the Web services Composite Application Framework (WS-CAF) [Bunting et al., 2003], the latter consisting of a context management framework, a coordination framework, and a set of transaction models. The BTP and the WS-CAF transaction protocols could be implemented using WS-Coordination; conversely, the BTP and the WS-Transaction protocols could be implemented using the WS-CAF context and coordination framework. In general, there is significant overlap between these specifications defining models, protocols, and mechanisms for context-based transactional coordination.
27.6 Description

Service descriptions are central to two core aspects of the SOC model. First, the ability of applications to access services that are owned and managed by third party organizations relies not only on the availability of interoperability protocols, but also on the assumption that all relevant information needed to access the service is published in an explicit, machine readable format. In addition, the ability to perform automatic service discovery and binding at runtime relies on selecting services and adapting to their requirements based on those service descriptions. This section describes the Web services languages that can be used to encode service descriptions, and the discovery mechanisms that can be built on top of those descriptions.
27.6.1 Functional Definition of a Web Service

The functional description of a Web service is provided by the Web Services Description Language (WSDL) [Christensen et al., 2001]. A complete WSDL description provides two pieces of information: an application-level description of the service (which we will also call the "abstract interface") and the specific protocol-dependent details that need to be followed to access the service at concrete service endpoints. This separation of the abstract from the concrete enables the definition of a single abstract component that is implemented by multiple code artifacts and deployed using different communication protocols and programming models.

The abstract interface in WSDL consists of the operations supported by a service and the definition of their input/output messages. A WSDL message is a collection of named parts whose structure is formally described through the use of an abstract type system, usually XML Schema. An operation is simply a combination of messages labeled input, output, or fault. Once the messages and operations have been defined, a portType element is used to group the operations supported by the endpoint. For example, the "request" operation that processes the loan application in our example could be defined along the lines of the loan approval sample in [Curbera et al., 2002b]:
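A minimal sketch of such a portType (the namespace prefix and message names here are illustrative, not the exact ones used in the published sample):
<portType name="loanServicePT">
   <operation name="request">
      <input message="tns:creditInformationMessage"/>
      <output message="tns:approvalMessage"/>
   </operation>
</portType>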
where, for example, the approval message is defined to contain one part that is of the type defined by XML Schema’s string:
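Such a message definition might look like the following (again, the message and part names are illustrative):
<message name="approvalMessage">
   <part name="accept" type="xsd:string"/>
</message>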
The abstract definition of a service, as defined by WSDL, provides all of the information necessary for a user of the service to program against it (assuming that their middleware is capable of dealing with the transport/protocol details). The second portion of a WSDL description contains the concrete, implementation-specific aspects of a service: what protocols may be used to communicate with the service, how a user should interact with the service over the specified protocols, and where an artifact implementing the service's interface may be found. This information is defined in the binding, port, and service elements. A binding element maps the operations and messages in a portType (abstract) to a specific protocol and data encoding format (concrete). Bindings are extensible, enabling one to define access to services over multiple protocols in addition to SOAP. A pluggable framework such as that defined in [Mukhi et al., 2002] may then be used for multiprotocol Web services invocations. A port element provides the location of a physical endpoint that implements a specific portType using a specific binding. A service is a collection of ports. An example is shown below of LoanCo's offering of an implementation of the "loanServicePT" port type over a SOAP binding. Notice how the abstract "request" operation is mapped to a SOAP-encoded rpc style invocation using SOAP over HTTP:
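A binding along these lines (the binding name is illustrative; the soap:binding element uses WSDL's standard SOAP binding extension):
<binding name="loanServiceSOAPBinding" type="tns:loanServicePT">
   <soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http"/>
   <operation name="request">
      <soap:operation soapAction=""/>
      <input>
         <soap:body use="encoded" encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"/>
      </input>
      <output>
         <soap:body use="encoded" encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"/>
      </output>
   </operation>
</binding>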
Finally, the LoanService service element contains a port at which an endpoint is located that can be communicated with using the information in the associated binding (and implementing the portType associated with that binding).
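A sketch of such a service element (the endpoint address is illustrative):
<service name="LoanService">
   <documentation>Loan Service</documentation>
   <port name="loanServiceSOAPPort" binding="tns:loanServiceSOAPBinding">
      <soap:address location="http://www.loanco.com/loan"/>
   </port>
</service>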
27.6.2 A Framework for Defining Quality of Service

As we have seen, WSDL provides application-specific descriptions of the abstract functionality and concrete bindings and ports of Web services. The quality-of-service aspects of a Web service, however, are not directly expressed in WSDL. QoS characteristics may comprise reliable messaging, security, transaction, and other capabilities, requirements, or usage preferences of a service. In our example, the loan approval service may wish to declare that authentication is required before using the service, or that the service supports a specific transaction protocol and allows applications to coordinate the service according to that protocol. The Web Services Policy Framework (WS-Policy) [Box et al., 2002b] provides a general-purpose model to describe and communicate such quality-of-service information. WS-Policy is a domain-neutral framework which is used to express domain-specific policies such as those for reliable messaging, security, and transactions. WS-Policy provides the grammar to express and compose both simple declarative assertions and conditional expressions. For example, a security policy along the lines of the example in the WS-Policy specification can describe that two types of security authentication, Kerberos and X509, are supported by a service, and that Kerberos is preferred over X509 authentication.
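A sketch of such a policy (the security token assertions and preference values are shown schematically; namespace declarations are omitted):
<wsp:Policy>
   <wsp:ExactlyOne>
      <wsse:SecurityToken wsp:Usage="wsp:Required" wsp:Preference="100">
         <wsse:TokenType>wsse:Kerberosv5TGT</wsse:TokenType>
      </wsse:SecurityToken>
      <wsse:SecurityToken wsp:Usage="wsp:Required" wsp:Preference="1">
         <wsse:TokenType>wsse:X509v3</wsse:TokenType>
      </wsse:SecurityToken>
   </wsp:ExactlyOne>
</wsp:Policy>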
The example illustrates the three basic components of the WS-Policy grammar: the top level <wsp:Policy> container element, policy operators (here: <wsp:ExactlyOne>) to group statements, and attributes to distinguish usage. Different policy operators and values for the usage attributes are defined in WS-Policy, allowing all kinds of policy statements to be made. Policies can be associated with services in a flexible manner and may be used by both clients and the services themselves. Specific attachment mechanisms are defined in the WS-PolicyAttachment specification [Box et al., 2002a], including the association of policies with WSDL definitions and UDDI entities. The Web services policy framework also provides common policy assertions that can be used within a policy specification: for example, assertions for encoding textual data and for defining supported versions of specifications. These common assertions are defined in the Web Services Policy Assertions Language (WS-PolicyAssertions) specification [Box et al., 2002c].
27.6.3 Service Discovery

To allow developers and applications to use a service, its description must be published in a way that enables easy discovery and retrieval, either manually at development time or automatically at runtime. Two of the specifications which facilitate the location of service information for potential users are the Universal Description, Discovery, and Integration (UDDI) specification [Bellwood et al., 2002] and WS-Inspection [Ballinger et al., 2001].

27.6.3.1 UDDI

UDDI is cited as the main Web service query and classification mechanism. From an architectural perspective, UDDI takes the form of a network of queryable business/service information registries, which may or may not share information with one another. The UDDI specification defines the structure of data that may be stored in UDDI repositories and the APIs which may be used to interact with a repository, as well as the guidelines under which UDDI nodes operate with each other.
The UDDI consortium, uddi.org, manages an instance of UDDI called the UDDI Business Registry (UBR), which functions as a globally known repository at which businesses can register and discover Web services. Service seekers can use the UBR to discover service providers in a unified and systematic way, either directly through a browser or using UDDI's SOAP APIs for querying and updating registries. A variety of "private" UDDI registries have been created by companies and industry groups to provide an implementation of the functionality for their own internal services.

UDDI encodes three types of information about Web services: "white pages" information such as name and contact data; "yellow pages" or categorization information about businesses and services; and the so-called "green pages" information, which includes technical data about the services. A service provider is represented in UDDI as a "businessEntity" element, uniquely identifiable by a business key, containing identifying information about the provider and a list of its services. Each such "businessService" contains one or more binding templates that contain the technical information needed for accessing different endpoints of that service that possibly have different technical characteristics. The most interesting field in a binding template is "tModelInstanceDetails," which is where the technical description of the service is provided in a list of references to technical specifications, known as "tModels," that the service complies with. The tModels represent a technical specification, such as a WSDL document, that has already been registered in the directory and assigned a unique key. They enable service descriptions to contain arbitrary external information that is not defined by UDDI itself. Taxonomical systems can be registered in UDDI as tModels to enable categorized searching. Three standard taxonomies have been preregistered in the UBR: an industry classification (NAICS), a classification of products and services (UNSPSC), and a geographical identification system (ISO 3166).

In our example, banks and credit agencies will have published information about their services in UDDI so that it may be easily retrieved by new or existing customers. When LoanCo needs their services, it may query a UDDI registry to discover the necessary description information. For example, it may submit a query to find a bank using an NAICS [Bureau, 2002] code as the search key and may submit another query to find a credit agency using an existing WSDL interface as the search key. UDDI may return any number of matching entries in the responses, allowing LoanCo to pick the one that best fulfills its requirements. Once a service record is located within UDDI, any registered information, such as WSDL interfaces or WS-Policies, may be retrieved and consumed.

27.6.3.2 WS-Inspection

The WS-Inspection specification provides a completely decentralized mechanism for locating service-related information. WS-Inspection operates in much the same way as the Web; XML-based WS-Inspection documents are published at well-known locations and then retrieved through some common protocol such as HTTP. WS-Inspection documents are used to aggregate service information, and contain "pointers" to service descriptions, such as WSDL documents or entries in UDDI repositories. In our example, LoanCo may publish a WS-Inspection document like the following on its Website to advertise its services to any interested clients.
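A minimal WS-Inspection document along these lines (the WSDL location is illustrative):
<?xml version="1.0"?>
<inspection xmlns="http://schemas.xmlsoap.org/ws/2001/10/inspection/">
   <service>
      <abstract>LoanCo loan approval service</abstract>
      <description referencedNamespace="http://schemas.xmlsoap.org/wsdl/"
                   location="http://www.loanco.com/loan.wsdl"/>
   </service>
</inspection>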
If Joe knows that he is interested in working with LoanCo, he may search its Website for a WS-Inspection document, which would tell him what services are available. In fact, if Joe were using a WS-Inspection-aware Web browser, it might simply present him with an interface for directly interacting with the service.
27.7 Composition

In a service-oriented architecture, application development is closely tied to the ability to compose services. A collection of services can be composed into an aggregate service, which is amenable to further composition. Alternatively, a set of services may be orchestrated by specifying their interactions upfront. In the latter case the aggregate entity may not define a composable service, but can be defined using the same mechanisms as a service composition. The two most often cited use cases for service composition are Enterprise Application Integration (EAI), in which it defines the interactions between applications residing within the same enterprise, and Business Process Integration (BPI), in which it does so for applications spread across enterprises. In both of these cases, the fundamental problem is that of integration of heterogeneous software systems. Service composition offers hope in attacking these issues since, once each application is offered as a service described in a standard fashion, integration reduces to the orchestration of services or the creation of aggregate services through composition operations.
27.7.1 Choreography

The Business Process Execution Language for Web Services [Curbera et al., 2002b], or BPEL for short, is a language for specifying service compositions/business processes. BPEL lets process designers create two kinds of processes using the same language, barring a small set of language features whose use depends on the kind of process being defined:
1. Abstract processes, which are used to define protocols between interacting services usually controlled by different parties. Such processes cannot be interpreted directly as they do not specify a complete description of business logic and service interactions. Rather, each party involved may use the process description to verify that his or her own executable business logic matches the agreed-upon protocol.
2. Executable processes, which specify a complete set of business rules governing service interactions and represent the actual behavior of each party; these can be interpreted, and such a process is similar to a program written in some high level programming language to define a composition.

A key feature of BPEL is that every process has a WSDL description, and this description can in fact be derived from the process definition itself. This is significant since the BPEL process then becomes nothing but the specification of an implementation of a Web service described using WSDL, which may consequently be composed recursively: the service defined through the process may be used in another process definition.

BPEL derives its inspiration from the workflow model for defining business processes. The process definition can be viewed as a flowchart-like expression of an algorithm. Each step in the process is called an activity. There are two sets of activities in BPEL:
1. Primitive activities, such as the invocation of a service (specified using the invoke activity), receiving a message (the receive activity), responding to a receive (the reply activity), signaling a fault (the throw activity), termination of the process (the terminate activity), etc.
2. Structured activities, which combine other activities into complex control structures. Some of these are the sequence activity, which specifies that all the contained activities must execute in order; the flow activity, wherein contained activities run in parallel, with control dependencies between them specified using links; and the scope activity, which defines a unit of fault handling and compensation handling.

The services being composed through a BPEL process are referred to as partners of the process. The relationship between the partner and the process is defined using a service link type. This specifies the functionality (in terms of WSDL port types supported) that the process and the partner promise to provide.
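As a rough sketch, a service link type declaring that a partner plays the role of loan approver might look like this (the names are illustrative and the element structure loosely follows the BPEL4WS service link type syntax):
<slnk:serviceLinkType name="loanApprovalLinkType">
   <slnk:role name="approver">
      <slnk:portType name="tns:loanServicePT"/>
   </slnk:role>
</slnk:serviceLinkType>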
Consider an example of a BPEL process definition that defines how LoanCo's process may be implemented, illustrated in Figure 27.3. This service is in fact implemented through the composition of two other services, which may be offered by third parties: a credit agency that determines the suitability of an applicant for a loan, and a bank that is capable of making a decision for high-risk applications, such as those with a high loan amount or a requester with a bad credit history. This process views the credit agency and bank as partners whose functionality it makes use of. Additionally, the user of the composite service itself is also a partner. Viewing each interacting entity as a partner in this way defines a model in which each service is a peer.

The process logic is straightforward, defined by grouping activities in a flow construct and specifying control flow through the use of links with transition conditions that determine whether a particular link is to be followed or not. The incoming loan application is processed by a receive activity. For loan requests of less than $10,000, an invoke activity hands off the application to the credit agency service for a risk assessment. If that assessment comes back as high, or if the amount had been large to start with, then the bank service is invoked. On the other hand, if the credit agency had determined that the applicant was low risk, an assign activity creates a message approving the applicant. Finally, a reply activity sends a message back to the applicant, which at this point contains either the bank's decision or the positive approval created in the assign. The full BPEL definition of this flow is available in [Curbera et al., 2002b] and with the samples distributed with the prototype runtime BPWS4J [Curbera et al., 2002a]. The user, of course, views the process merely through its WSDL interface. It sees that the service offers a request-response operation for loan approval, sends the loan application in the required format, and receives an answer in return. Each request-response operation on the WSDL is matched at runtime to a receive and reply activity pair.

BPEL processes may be stateless, but in most practical cases a separate stateful process instance needs to be created to manage the interaction with a particular set of partners. In our example, each loan applicant needs to use his own instance of the loan approval process; otherwise we might have Joe and Bob making simultaneous loan applications and receiving counter-intuitive responses. BPEL does not have explicit lifecycle control; instead, processes are created implicitly when messages are received. Data contained within the exchanged messages are used to identify the particular stateful interaction. In BPEL, sets of these key data fields are known as correlation sets. A loan application generally contains a set of business data, such as the user's Social Security number, which serve as useful correlation fields for a process. A process definition allows specification of sets of such correlation fields and also specifies how they can be mapped to business data contained within application messages. Thus, correlation data is not an opaque middleware-generated token but instead consists of a set of fields contained in the application messages.
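A schematic skeleton of such a flow is sketched below. This is an outline only: variable and partner declarations, several required attributes, and other details of the exact BPEL4WS syntax are omitted, the transition conditions are written informally, and all names are illustrative.
<process name="loanApprovalProcess">
   <flow>
      <links>
         <link name="receive-to-assess"/>
         <link name="receive-to-approve"/>
         <link name="assess-to-approve"/>
         <link name="assess-to-setMessage"/>
         <link name="approve-to-reply"/>
         <link name="setMessage-to-reply"/>
      </links>
      <receive partner="customer" operation="request" createInstance="yes">
         <source linkName="receive-to-assess" transitionCondition="amount &lt; 10000"/>
         <source linkName="receive-to-approve" transitionCondition="amount &gt;= 10000"/>
      </receive>
      <invoke partner="assessor" operation="check">
         <target linkName="receive-to-assess"/>
         <source linkName="assess-to-setMessage" transitionCondition="risk = 'low'"/>
         <source linkName="assess-to-approve" transitionCondition="risk = 'high'"/>
      </invoke>
      <invoke partner="approver" operation="approve">
         <target linkName="receive-to-approve"/>
         <target linkName="assess-to-approve"/>
         <source linkName="approve-to-reply"/>
      </invoke>
      <assign>
         <target linkName="assess-to-setMessage"/>
         <source linkName="setMessage-to-reply"/>
      </assign>
      <reply partner="customer" operation="request">
         <target linkName="approve-to-reply"/>
         <target linkName="setMessage-to-reply"/>
      </reply>
   </flow>
</process>
The links carry the transition conditions on the loan amount and the assessed risk, so that exactly one of the bank invocation or the approval assignment feeds the final reply.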
FIGURE 27.3 Sample loan approval flow using two services and exposed as a Web service.
BPEL also provides facilities for specifying fault handling, similar to the exception handling facility in Java, and allows for the triggering of compensation and the specification of compensation handlers. Compensation is used to undo the effects of earlier actions; it is used to implement the rollback of a transaction or activities of a similar nature.

Now that we have some understanding of how compositions are created using BPEL, let's take a step back and see why BPEL is necessary. One might immediately conclude that it is possible to specify such compositions directly in one's favorite programming language, such as Java. Although this is true, the key here is that BPEL allows standard specification of such a composition, portable across Web service containers, and it is a recognition of the fact that programming at the service level is a separate issue from programming at the object level. It is not meant to replace someone's favorite language. The latter has its place in defining fine-grained logic, and BPEL's features in this space do not match up. However, it is in wiring together services, each of which may implement fine-grained logic, where BPEL's capabilities are meant to be harnessed. Complex business applications typically involve interactions between multiple independent parties, and such interactions are usually transactional in nature. For this purpose, one may use WS-Coordination and WS-Transaction in tandem with BPEL4WS, as described in [Curbera et al., 2003].
27.8 Summary

Web services aim to support highly dynamic integration of applications in natively inter-organizational environments. Web services also aim at achieving universal application-to-application interoperability based on open Internet standards such as XML. Recognizing the intrinsic heterogeneity of the existing computing infrastructure, Web services pursue a platform independent integration model that should allow new and existing applications created on proprietary systems to seamlessly integrate with each other.

In this chapter, we have reviewed the motivating factors behind the Web services effort and the architectural principles that guide the design of the Web services stack of specifications. We have reviewed the main specifications of the stack and illustrated them by providing a practical description of key specifications for each area. These areas define how services may communicate, define and adhere to quality of service requirements, provide machine readable descriptions to requesting applications, and be composed into business processes. An example is discussed throughout the chapter to show how each specification may be used in a simple scenario. Figure 27.4 illustrates this scenario and summarizes how the different pieces of the technology fit together.

A large number of resources are available, especially online, where further information may be found about Web services, including the specifications proposed, their status in the standardization process, implementation strategies, and available implementations. Dedicated Web sites with articles and developer information include IBM's "developerWorks" (http://www.ibm.com/developerworks), Microsoft's "gotdotnet" (http://www.gotdotnet.com), and third-party pages like "webservices.org" (http://www.webservices.org). Additionally, a number of academic conferences have been specifically addressing this space, including the relation between Web services and related Internet technologies such as the Semantic Web and Grid Computing.

Service Oriented Computing is still in its youth. The specifications presented in this chapter are the building blocks leading to a complete standards-based framework to support service orientation. With its extensible, modular design, the Web services stack is seeing new specifications fill in the remaining gaps and industry support consolidate behind a set of basic standards. Over the next few years, we will likely see the deployment and adoption of the full SOC model by business and scientific communities. As SOC evolves, we believe the way to stay in step is to design projects with the SOC principles in mind. As the technology matures and develops, systems designed in such a manner will be more agile and able to adopt emerging specifications along the way.
[Figure 27.4 (diagram): Joe interacts with the LoanCo WS; the LoanCo WS in turn uses the Bank WS, the Credit Agency WS, and the College WS. Services are described by WSDL (with WS-Policy attachments), exchange SOAP messages carrying WS-Reliability and WS-Security headers, coordinate via WS-C/WS-Tx messages, are composed using BPEL4WS, and are advertised through UDDI and WS-Inspection (at LoanCo).]
FIGURE 27.4 Example: services are exposed using WSDL (with possible WS-Policy attachments defining QoS), communicated with using SOAP/HTTP, coordinated into an atomic transaction using WS-Coordination and WS-Transaction, and discovered using UDDI and WS-Inspection.
References

Atkinson, Bob, Giovanni Della-Libera, Satoshi Hada, Maryann Hondo, Phillip Hallam-Baker, Chris Kaler, Johannes Klein, Brian LaMacchia, Paul Leach, John Manferdelli, Hiroshi Maruyama, Anthony Nadalin, Nataraj Nagaratnam, Hemma Prafullchandra, John Shewchuk, and Dan Simon. Web Services Security (WS-Security) 1.0. Published online by IBM, Microsoft, and VeriSign at http://www-106.ibm.com/developerworks/library/ws-secure, 2002.
Ballinger, Keith, Peter Brittenham, Ashok Malhotra, William A. Nagy, and Stefan Pharies. Web Services Inspection Language (WS-Inspection) 1.0. Published on the World Wide Web by IBM Corp. and Microsoft Corp. at http://www.ibm.com/developerworks/webservices/library/ws-wsilspec.html, November 2001.
Bartel, Mark, John Boyer, Donald Eastlake, Barb Fox, Brian LaMacchia, Joseph Reagle, Ed Simon, and David Solo. XML-Signature Syntax and Processing. W3C Recommendation, published online at http://www.w3.org/TR/xmldsig-core/, 2002.
Bellwood, Tom, Luc Clement, David Ehnebuske, Andrew Hately, Maryann Hondo, Yin Leng Husband, Karsten Januszewski, Sam Lee, Barbara McKee, Joel Munter, and Claus von Riegen. The Universal Description, Discovery and Integration (UDDI) protocol. Published on the World Wide Web at http://www.uddi.org, 2002.
Bilorusets, Ruslan, Adam Bosworth, Don Box, Felipe Cabrera, Derek Collison, Jon Dart, Donald Ferguson, Christopher Ferris, Tom Freund, Mary Ann Hondo, John Ibbotson, Chris Kaler, David Langworthy, Amelia Lewis, Rodney Limprecht, Steve Lucco, Don Mullen, Anthony Nadalin, Mark Nottingham, David Orchard, John Shewchuk, and Tony Storey. Web Services Reliable Messaging Protocol (WS-ReliableMessaging). Published on the World Wide Web by IBM, Microsoft, BEA, and TIBCO at http://www-106.ibm.com/developerworks/webservices/library/ws-rm/, 2003.
Box, Don, Francisco Curbera, Maryann Hondo, Chris Kaler, Hiroshi Maruyama, Anthony Nadalin, David Orchard, Claus von Riegen, and John Shewchuk. Web Services Policy Attachment (WS-PolicyAttachment). Published online by IBM, BEA, Microsoft, and SAP at http://www-106.ibm.com/developerworks/webservices/library/ws-polatt, 2002a.
Box, Don, Francisco Curbera, Dave Langworthy, Anthony Nadalin, Nataraj Nagaratnam, Mark Nottingham, Claus von Riegen, and John Shewchuk. Web Services Policy Framework (WS-PolicyFramework). Published online by IBM, BEA, and Microsoft at http://www-106.ibm.com/developerworks/webservices/library/ws-polfram, 2002b.
Box, Don, Maryann Hondo, Chris Kaler, Hiroshi Maruyama, Anthony Nadalin, Nataraj Nagaratnam, Paul Patrick, and Claus von Riegen. Web Services Policy Assertions (WS-PolicyAssertions). Published online by IBM, BEA, Microsoft, and SAP at http://www-106.ibm.com/developerworks/webservices/library/ws-polas, 2002c.
Bunting, Doug, Martin Chapman, Oisin Hurley, Mark Little, Jeff Mischkinsky, Eric Newcomer, Jim Webber, and Keith Swenson. Web Services Composite Application Framework (WS-CAF) version 1.0. Published online by Arjuna, Fujitsu, IONA, Oracle, and Sun at http://developers.sun.com/techtopics/webservices/wscaf/index.html, 2003.
Cabrera, Felipe, George Copeland, Bill Cox, Tom Freund, Johannes Klein, Tony Storey, and Satish Thatte. Web Services Transactions (WS-Transaction) 1.0. Published online by IBM, BEA, and Microsoft at http://www-106.ibm.com/developerworks/library/ws-transpec, 2002a.
Cabrera, Felipe, George Copeland, Tom Freund, Johannes Klein, David Langworthy, David Orchard, John Shewchuk, and Tony Storey. Web Services Coordination (WS-Coordination) 1.0. Published online by IBM, BEA, and Microsoft at http://www-106.ibm.com/developerworks/library/ws-coor, 2002b.
Ceponkus, Alex, Sanjay Dalal, Tony Fletcher, Peter Furniss, Alastair Green, and Bill Pope. Business Transaction Protocol. Published on the World Wide Web at http://www.oasis-open.org, 2002.
Christensen, Erik, Francisco Curbera, Greg Meredith, and Sanjiva Weerawarana. Web Services Description Language (WSDL) 1.1. Published on the World Wide Web by W3C at http://www.w3.org/TR/wsdl, March 2001.
Curbera, Francisco, Matthew Duftler, Rania Khalaf, Nirmal Mukhi, William Nagy, and Sanjiva Weerawarana. BPWS4J. Published on the World Wide Web by IBM at http://www.alphaworks.ibm.com/tech/bpws4j, August 2002a.
Curbera, Francisco, Yaron Goland, Johannes Klein, Frank Leymann, Dieter Roller, Satish Thatte, and Sanjiva Weerawarana. Business Process Execution Language for Web Services (BPEL4WS) 1.0. Published on the World Wide Web by BEA, IBM, and Microsoft at http://www.ibm.com/developerworks/library/ws-bpel, August 2002b.
Curbera, Francisco, Rania Khalaf, Nirmal Mukhi, Stefan Tai, and Sanjiva Weerawarana. Web Services, the next step: robust service composition. Communications of the ACM: Service Oriented Computing, 46(10), 2003.
Dillaway, Blair, Donald Eastlake, Takeshi Imamura, Joseph Reagle, and Ed Simon. XML Encryption Syntax and Processing. W3C Recommendation, published online at http://www.w3.org/TR/xmlenc-core/, 2002.
Fallside, D.C. XML Schema Part 0: Primer. W3C Recommendation, published online at http://www.w3.org/TR/xmlschema-0/, 2001.
Foster, Ian, Carl Kesselman, and Steven Tuecke. The anatomy of the grid: enabling scalable virtual organizations. International Journal of Supercomputing Applications, 15(3), 2001.
Gudgin, Martin, Marc Hadley, Noah Mendelsohn, Jean-Jacques Moreau, and Henrik Frystyk Nielsen. SOAP Version 1.2. W3C Proposed Recommendation, published online at http://www.w3.org/2000/xp/Group/, 2003.
Hallam-Baker, Phillip, Eve Maler, Stephen Farrell, Irving Reid, David Orchard, Krishna Sankar, Simon Godik, Hal Lockhart, Carlisle Adams, Tim Moses, Nigel Edwards, Joe Pato, Marc Chanliau, Chris McLaren, Prateek Mishra, Charles Knouse, Scott Cantor, Darren Platt, Jeff Hodges, Bob Blakley, Marlena Erdos, and R.L. "Bob" Morgan. Assertions and Protocol for the OASIS Security Assertion Markup Language (SAML). Published on the World Wide Web at http://www.oasis-open.org, 2002.
Houston, Iain, Mark C. Little, Ian Robinson, Santosh K. Shrivastava, and Stuart M. Wheater. The CORBA Activity Service Framework for Supporting Extended Transactions. In Proceedings of Middleware 2001, number 2218 in LNCS, pages 197+. Springer-Verlag, 2001.
Orlowska, M., S. Weerawarana, M. Papazoglou, and J. Yang. Proceedings of the First International Conference on Service-Oriented Computing (ICSOC 2003). Springer-Verlag, LNCS 2910, December 2003.
Mukhi, Nirmal, Rania Khalaf, and Paul Fremantle. Multi-protocol Web Services for Enterprises and the Grid. In Proceedings of the EuroWeb Conference, December 2002.
Papazoglou, Mike P. and Dimitri Georgakopoulos, Eds. Communications of the ACM: Service-Oriented Computing, 46(10), 2003.
Steiner, J., C. Neuman, and J. Schiller. Kerberos: An Authentication Service for Open Network Systems. In Usenix Conference Proceedings, Dallas, TX, February 1988.
U.S. Census Bureau, editor. North American Industry Classification System (NAICS). U.S. Government, 2002.
Yates, Simon, Charles Rutstein, and Christopher Voce. Demystifying B2B integration. The Forrester Report, September 2000.
28 Mediators for Querying Heterogeneous Data
Tore Risch

CONTENTS
Abstract
28.1 Introduction
28.2 Mediator Architectures
 28.2.1 Wrappers
 28.2.2 Reconciliation
 28.2.3 Composable Mediators
28.3 The Amos II Approach to Composable Mediation
 28.3.1 The Functional Data Model of Amos II
 28.3.2 Composed Functional Mediation
 28.3.3 Implementing Wrappers
28.4 Conclusions
References
Abstract

The mediator approach to integrating heterogeneous sources introduces a virtual middleware mediator database system between different kinds of wrapped data sources and applications. The mediator layer provides a view over the data from the underlying heterogeneous sources, which the applications can access using standard query-based database APIs. The sources can be not only conventional database servers available over the Internet but also Web documents, search engines, or any data-producing system. The architecture of mediator systems is first overviewed. An example then illustrates how to define mediators in an object-oriented setting. Finally, some general guidelines are outlined for how to define mediators.
28.1 Introduction

The mediator architecture was originally proposed by Wiederhold [31] as an architecture for integrating heterogeneous data sources. The general idea was that mediators are relatively simple distributed software modules that transparently encode domain-specific knowledge about data and share abstractions of that data with higher layers of mediators or applications. A mediator module thus contains rules for semantic integration of its sources, i.e., for how to resolve semantic similarities and conflicts. Larger networks of mediators can then be defined through these primitive mediators by logically composing new mediators in terms of other mediators and data sources. It is an often overlooked fact that a mediator in Wiederhold's sense [31] was a relatively simple knowledge module in the data integration architecture, and that mediators could be combined to integrate many sources. There also needs to be a distinction between mediator modules and the system interpreting these modules, the mediator engine. Different mediator modules may actually be interpreted by different kinds of mediator engines.
[Figure 28.1 (diagram): Applications 1-3 access, through application interfaces (e.g., JDBC), a mediator (e.g., an extensible relational DBMS), which in turn accesses Data sources 1-3 through Wrappers 1-3.]
FIGURE 28.1 Central mediator architecture.
Many systems have since then been developed based on the mediator approach. [4, 1, 6, 8, 13, 29] Most of these systems regard the mediator as a central system with interfaces to different data sources called wrappers. We will call this a central mediator. Often the mediator engines are relational database systems extended with mechanisms for accessing other databases and sources. The central mediators provide (e.g., SQL) queries to a mediator schema that includes data from external sources. Important design aspects are performance and scalability over the amounts of data retrieved. One problem with the central mediator approach is that a universal global schema is difficult to define, in particular when there are many sources. Another important issue when integrating data from different sources is the choice of common data model1 (CDM) for the mediator. The CDM provides the language in which the mediating views are expressed. The mediator engine interprets the CDM, and queries are expressed in terms of it. If the CDM is less expressive than some of the sources, semantics will be lost. For example, if some sources have object-oriented (OO) abstractions, a mediator based on the relational model will result in many tables where the OO semantics is hidden behind some conventions of how to map OO abstractions to less expressive tabular representations. We will describe in Section 28.2 the architecture of central mediators and wrappers, followed by a description of composable mediators, where mediators may wrap other mediators, in Section 28.2.3. As an example of a composable mediator system, in Section 28.3 we give an overview of the Amos II mediator engine and show a simple example of how to mediate heterogeneous data using Amos II. Amos II is based on a distributed mediator architecture and a functional data model that permit simple and powerful mediation of both relational and object-oriented sources. Our example illustrates how data from a relational database can be mediated with data from an XML-based Web source using a functional and object-oriented common data model.
28.2 Mediator Architectures

The central mediator/wrapper architecture is illustrated in Figure 28.1. A central mediator engine is interfaced to a number of data sources through a number of wrappers. The engine is often a relational database manager. The central mediator contains a universal mediator schema that presents to users and applications a transparent view of the integrated data. The mediator schema must further contain metainformation of how to reconcile differences and similarities between the wrapped data sources.2
1 We use the term data model to mean the language used for defining a schema.
2 Sometimes the term ontology is used for such semantically enriched schemas.
SQL queries posed to the mediator in terms of the universal schema are translated to data access calls to the source data managers. Applications interact with the mediator manager using standard interfaces for the kind of database management system used (e.g., JDBC or ODBC for relational databases). Most mediator systems so far are based on the relational database model, but other data models are conceivable too. Being a database manager of its own, the mediator will contain its own data tables. These tables normally constitute a large operational database. One can regard the central mediator architecture as being a conventional database server extended with possibilities to efficiently access external data sources. Wrappers are interfaces between the mediator engine and various kinds of data sources. The wrappers implement functionality to translate SQL queries to the mediator into query fragments (subqueries) or interface calls to the sources. If a mediator has many sources, there will be many possible query execution strategies for calling the sources and combining their results. One important task of the mediator engine is to perform such query decomposition so as to utilize, in an optimized way, the query capabilities of the sources.

The problem of updates in mediators has not gained much attention. One reason for this is that mediators normally wrap autonomous sources without having update rights to them. Another reason is that mediator updates are problematic because mediators are essentially views of wrapped data. Therefore, updates in mediators are similar to the problem of updating views in relational databases, which is possible only in special cases. [2, 16] Vidal and Loscio [30] propose some view update rules for mediators.

Notice here that the mediator approach is different from the data warehouse approach to integrating data. The idea of integrating data in data warehouses is to import them into a central, very large relational database for subsequent data analysis using SQL and OLAP (Online Analytical Processing) tools. The data importation is done by regular database applications that convert external data to tabular data inserted into the data warehouse. Such data importations are run offline regularly, e.g., once a day. In contrast, the mediator approach retains the data in the sources. Queries to the central mediator schema are dynamically translated by the query processor of the mediator engine into queries or subroutine calls retrieving data from the sources at query time. The data integration in a mediator system is thus online. This makes data, and decisions based on data, current. From an implementation point of view the mediator approach is more challenging because it may be difficult to efficiently process dynamically provided data. In the data warehouse solution, efficiency can rather easily be obtained by careful physical design of the central relational database tables. A mediator engine, in contrast, must dynamically access external data in real time.
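To illustrate the application side of this architecture, the following is a minimal Java sketch of a client querying a central mediator through plain JDBC, exactly as it would query a single relational database. The driver URL, credentials, and the mediated view flights_and_hotels are invented for the example and assume that a JDBC driver for the mediator is available; they are not taken from any particular product.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MediatorClient {
    public static void main(String[] args) throws Exception {
        // Hypothetical JDBC URL of a central mediator server; to the client it
        // looks like one ordinary relational database.
        try (Connection con = DriverManager.getConnection(
                 "jdbc:mediator://example.org/travel", "user", "secret");
             Statement stmt = con.createStatement();
             // flights_and_hotels is assumed to be a view in the mediator schema
             // defined over several wrapped sources.
             ResultSet rs = stmt.executeQuery(
                 "select destination, price from flights_and_hotels where price < 500")) {
            while (rs.next()) {
                System.out.println(rs.getString("destination") + " " + rs.getInt("price"));
            }
        }
    }
}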
28.2.1 Wrappers

Figure 28.2 illustrates the architecture of a general wrapper component of a mediator system. A wrapper may contain both physical interfaces to a source and rules or code for translating the data of each source to the schema of the mediator represented by its CDM. The different layers do the following tasks:

1. On the lowest layer there is a physical interface to the data source (a minimal sketch of this layer follows the list). For example, if relational databases are accessed, there need to be interfaces for connecting to the sources, sending SQL queries to the sources as strings, and iterating over the result tuples. For relational databases this can be implemented using the standardized JDBC/ODBC APIs that are based on sending SQL strings to the database server for evaluation. SQL Management of External Data, SQL/MED, [23] is an ISO standard that provides interface primitives for wrapping external data sources from an extensible relational database system. SQL/MED provides wrapper-specific interface primitives regarding external data as foreign tables. Using SQL/MED, the wrapper implementor provides tabular wrapper abstractions by internally calling data-source-specific physical interfaces to obtain the information required for the foreign table abstractions. The physical interface layer can therefore be seen as hidden inside the interface to a wrapped source. SQL/MED is supported by IBM's mediation product DB2 Information Integrator,3 which has SQL/MED-based wrappers implemented for all major relational database systems, XML documents, Web search engines, etc.
3 http://www.ibm.com/software/data/integration/db2ii/
[Figure 28.2 (diagram): the wrapper layers Query Optimization, Data Model Mapping, and Physical Interface between the mediator and the Data source.]
FIGURE 28.2 Wrapper layers.
In some systems the term wrapper is used to mean simply grammar rules for parsing, e.g., HTML documents. [7] This would here mean a physical interface to HTML sources.

2. A data source may use a different data model than the CDM of the mediator engine. The wrapper must therefore contain code expressing how to map schema constructs between different data models, here called data model mappings. For example, if a source is an object-oriented database and the mediator engine uses a relational data model, the object-oriented concepts used by the source need to be translated to relational (tabular) data. In such a case the wrapper for that kind of source will contain general application-independent knowledge of how to map objects to relations. In the SQL/MED standard [23] this is achieved by delivering accessed data as foreign relational tables to the central mediator through external relations that are implemented as functions in, e.g., C, that deliver result table rows tuple-wise.

3. Some sources may require source-specific wrapper query optimization methods. Wrapper query optimization is needed, e.g., if one wants to access a special storage manager indexing data of a particular kind, such as free-text indexing. The query optimizer of the mediator engine will then have to be extended with new query optimization rules and algorithms dealing with the kind of query operators the storage manager knows how to index. For example, for text retrieval there might be special optimization rules for phrase-matching query functions. The optimizations may have to be extended with costing information, which is code to estimate how expensive a query fragment to a source is to execute, how selective it is, and other properties. Furthermore, source-specific query transformation rules may be needed that generate optimized query fragments for the source. These query fragments are expressed in terms of the source's query language, e.g., some text retrieval language for an Internet search engine.

Once a wrapper is defined, it is possible to make queries to the wrapped data source in terms of queries of the mediator query language. For example, if the common data model is a relational database, SQL can be used for querying the wrapped data source as an external table.
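The following is a minimal Java sketch of the physical interface layer (layer 1) for a relational source: open a connection, ship an SQL string produced by the upper layers, and iterate over the result tuples. The class and method names are invented for illustration and do not correspond to SQL/MED or to any particular mediator engine; only the JDBC calls are standard.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical physical-interface layer of a relational wrapper. */
public class RelationalSourceInterface {
    private final Connection con;

    public RelationalSourceInterface(String jdbcUrl, String user, String pw) throws Exception {
        this.con = DriverManager.getConnection(jdbcUrl, user, pw);  // connect to the source
    }

    /** Ship a translated SQL string to the source and return its result tuples. */
    public List<Object[]> execute(String sql) throws Exception {
        List<Object[]> tuples = new ArrayList<>();
        try (Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            ResultSetMetaData md = rs.getMetaData();
            while (rs.next()) {                       // iterate over the result tuples
                Object[] row = new Object[md.getColumnCount()];
                for (int i = 0; i < row.length; i++) {
                    row[i] = rs.getObject(i + 1);
                }
                tuples.add(row);
            }
        }
        return tuples;
    }
}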
28.2.2 Reconciliation

Different sources normally represent the same or similar information differently from how it is represented in the integrated schema. The schema of the mediator therefore must include view definitions describing how to map the schema of a particular source into the integrated mediator schema. Whereas the data model mapping rules are source and domain independent, these schema mapping rules contain knowledge of how the schema of a particular wrapped data source is mapped into the mediator's schema. The most common method to specify the schema mapping rules is global as view. [6, 8, 11, 22, 29] With global as view, the mediator schema is defined in terms of a number of views that map wrapped
source data (external relations) to the mediator's schema representations. Thus views in the integrated schema are defined by matching and transforming data from the source schemas. In defining these views, common keys and data transformations need to be part of the view in order to reconcile similarities and differences between source data. For relational mediators, SQL can be used for defining these views. However, data integration often involves rather sophisticated matching and reconciliation of data from the different sources. The same or similar information may be present in more than one source, and the mediator needs to handle cases where conflicting and overlapping data are retrieved from the sources. Such reconciliation operations are often difficult or impossible to express in basic SQL. For example, rather complex string matching may be needed to identify equivalent text retrieved from the Web, and advanced use of outer joins may be needed for dealing with missing information. Therefore, in a mediator based on a relational database engine, such functionality will often have to be performed by user-defined functions (UDFs) plugged into the relational database engine. The schema mapping rules can then be expressed by view definitions containing calls to these UDFs.

With global as view, whenever a new source is to be integrated, the global mediator view definitions have to be extended accordingly. This can be problematic when there are many similar sources to integrate. With local as view [19], there is a common fixed mediator schema. Whenever a new source is to be integrated, one defines how to map data from the global schema to the new source without altering the global schema. One thus, so to speak, includes new sources by defining an inverse top-down mapping from the mediator schema to the source schema. Local as view has the advantage that it is simple to add new sources. There are, however, problems with how to reconcile differences when there are conflicts and overlaps between sources. Usually, local as view provides some default reconciliation based on accessing the "best" source, e.g., the one best covering the data needed for a user query. If one needs careful reconciliation management, local as view does not provide good mechanisms for that; local as view is more suited for "fuzzy" matching such as for retrieving documents.

There are also tools to semiautomatically generate schema mappings, e.g., the Clio system. [24] Clio uses some general heuristics to automatically generate schema mappings as view definitions, and these mappings can be overridden by the user if they are incorrect.
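As an illustration of the kind of user-defined function mentioned above for string-based reconciliation, the Java sketch below decides whether two name strings from different sources probably denote the same entity by normalizing them and comparing their edit distance. It is an invented example rather than code from any mediator product, and the threshold is arbitrary; real engines each have their own UDF registration mechanisms into which such logic would be plugged.

/** Hypothetical reconciliation UDF: fuzzy equality of names from two sources. */
public class NameMatchUdf {

    /** Returns true if the two names likely denote the same entity. */
    public static boolean sameName(String a, String b) {
        String x = normalize(a);
        String y = normalize(b);
        // Tolerate small spelling differences between the sources.
        return editDistance(x, y) <= Math.max(1, Math.min(x.length(), y.length()) / 5);
    }

    private static String normalize(String s) {
        return s == null ? "" : s.trim().toLowerCase().replaceAll("\\s+", " ");
    }

    /** Classic dynamic-programming Levenshtein distance. */
    private static int editDistance(String s, String t) {
        int[][] d = new int[s.length() + 1][t.length() + 1];
        for (int i = 0; i <= s.length(); i++) d[i][0] = i;
        for (int j = 0; j <= t.length(); j++) d[0][j] = j;
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 1; j <= t.length(); j++) {
                int subst = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + subst);
            }
        }
        return d[s.length()][t.length()];
    }
}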
28.2.3 Composable Mediators

The central mediator architecture is well suited for accessing external data sources of different kinds from an extensible relational database server. This provides relational abstractions of all accessed data, and these abstractions can be made available to applications and users through regular SQL APIs such as JDBC/ODBC. One can predict that different information providers will set up many such information integration servers on the Internet. Each information provider provides transparent views over its mediated data. It may even be so that the same data source is transparently mediated by different mediator servers. When many such mediator servers are available, there will be a need for mediating the mediators too, i.e., to define mediator servers that access other mediator servers along with other data sources. This we call composable mediators; it is illustrated in Figure 28.3. There is another reason that next-generation mediators should be composable: to be able to scale the data integration process by modularizing the data integration of large numbers of data sources. Each single mediator should be a module that contains the knowledge of how to integrate just a small number of sources and provide view abstractions of these sources to higher-level mediators and applications. Rather than trying to integrate all sources through one mediator as in the central mediator approach, compositions are defined in terms of other mediator compositions. Composable mediators reduce the complexity of defining mediators over many sources because the data integration is modularized and logically composed. This is important in particular when integrating the large numbers of different kinds of sources available on the Internet. It is possible to achieve similar effects by view compositions inside a central mediator server too, but it is not always realistic to have one mediator server integrating all data.
[Figure 28.3 (diagram): Applications 1-3 access a top-level mediator through its APIs; that mediator uses wrappers (W1, W2) to access two lower-level mediators, which in turn use wrappers (W1, W3, W4, W5, W6) to access data sources DS1-DS5.]
FIGURE 28.3 Composed mediator architecture.
Furthermore, in many cases it seems less natural to pass through the same large relational database server whenever data integration is required; rather, one would like to have compositions of many more mediator servers, not always based on tabular data. Even though composable mediators provide a possible solution to large-scale data integration design, the approach also poses some implementation challenges. For example, if each mediator is regarded as a database view, the regular multilevel view expansion used in relational DBMSs produces very large queries. Because the cost-based query optimization used in relational databases is NP-complete over the size of the query, it will be impossible to optimize large compositions of mediator views with cost-based query optimization. This has been investigated in [14], where incomplete view expansion for composable mediators is proposed. There it is shown that view expansion is favorable in particular when there are common sources hidden inside the views. Therefore, knowledge of what sources a mediator view depends on can be used in deciding whether or not to expand it. Another problem is the optimization of queries in a composable mediation framework that contains more or less autonomous mediator peers having their own query processors. In such an environment the different mediator query optimizers will have to cooperate to compile query fragments, as investigated in Josifovski et al. [9, 12]. In the DIOM project [22] there is a similar framework for integration of relational data sources with a centrally performed compilation process.
28.3 The Amos II Approach to Composable Mediation

As an example of a mediator system we give an overview of the Amos II system [25, 26], followed by a simple example of how to mediate using Amos II. The example illustrates some of the problems involved in defining views over distributed and heterogeneous data. Amos II is based on a functional data model having its roots in the Daplex data model. [28] One of the original purposes of Daplex was data integration, and similar functional data models have been used for data integration in several other systems, including Multibase, [3] Pegasus, [4] and PIFDM. [17] Most mediator systems use a relational data model. However, a functional model turns out to be a very well-suited data model for integrating data because it is both more expressive and simpler than OO and relational models, and it has in Amos II been extended with mediator reconciliation primitives. [10, 11]
Amos II is a distributed system where several mediator peers communicate over the Internet. Each mediator peer appears as a virtual functional database layer. We say therefore that Amos II is a peer mediator system. [14] Functional views provide transparent access to data sources from clients and other mediator peers. Conflicts and overlaps between similar real-world entities being modeled differently in different data sources are reconciled through the mediation primitives [10, 11] of the multi-mediator query language AmosQL. The mediation services allow transparent access to similar data structures represented differently in different data sources. Applications access data from distributed data sources through queries to views in some mediator peer. Logical composition of mediators is achieved when multidatabase views in mediators are defined as Amos II functions in terms of functions in other Amos II peers or data sources. The functional multidatabase views make the mediator peers appear to the user as a single virtual database containing a set of function definitions that transparently combine underlying data. General queries can be specified over these mediator functions using the functional query language AmosQL. A distributed query optimizer thereby translates query expressions into optimized query fragments over the sources of a mediator while respecting the autonomy of sources and mediator peers. In order to access data from external data sources rather than other Amos II peers, one or several wrappers can be defined inside an Amos II mediator. An Amos II wrapper for a particular kind of data source consists of an interface to physically access such sources and a translator for wrapper query optimization. The wrappers are defined by a set of functions that map the data source's data into the schema of the mediator; i.e., global as view is used. Wrappers have been defined for ODBC-based access to relational databases, [5] access to XML documents, [20] CAD systems, [18] and Internet search engines. [15] External Amos II peers known to a mediator can also be regarded as external data sources, and there is a special wrapper for accessing other Amos II peers. However, among the Amos II peers, special query optimization methods are used that take into account the distribution, capabilities, costs, etc., of the different peers. [12, 14, 9]
28.3.1 The Functional Data Model of Amos II

The basic concepts of the data model are objects, types, and functions. Each mediator database schema is defined in terms of these basic concepts. In order to wrap a non-Amos II data source inside an Amos II mediator database, the system has mechanisms for defining user-defined functions and types along with query processing rules over these. Furthermore, in order to compose views of data in other mediator peers, the basic concepts are orthogonally extended with proxy objects that are placeholders for corresponding objects in other mediators.

28.3.1.1 Objects

Objects model all entities in a mediator database. The system is reflective in the sense that everything in Amos II is represented as objects managed by the system, both system and user-defined objects. There are two main kinds of representations of objects: literals and surrogates. The surrogates have associated object identifiers (OIDs) that are explicitly created and deleted by the user or the system. Examples of surrogates are objects representing real-world entities such as persons, meta-objects such as functions, or even Amos II mediators as meta-mediator objects stored in some Amos II mediator database. The literal objects are self-described system-maintained objects that do not have explicit OIDs. Examples of literal objects are numbers and strings. Objects can also be collections, representing collections of other objects. The system-supported collections are bags (unordered sets with duplicates) and vectors (order-preserving collections). Literal objects and collections are automatically deleted by an incremental garbage collector when they are no longer referenced in the database. Proxy objects in a mediator peer are local OIDs having associated descriptions of corresponding surrogate objects stored in other mediators or data sources. They provide a general mechanism to define references to remote surrogate objects in other mediators or in data sources wrapped by a mediator.
Proxy objects are created implicitly by the system whenever OIDs need to be exchanged with other mediators. They are garbage collected by the system when no longer needed.

28.3.1.2 Types

Objects are classified into types, making each object an instance of one or several types. The set of all instances of a type is called the extent of the type. The types are organized in a multiple-inheritance, supertype/subtype hierarchy. If an object is an instance of a type, then it is also an instance of all the supertypes of that type, and the extent of a type is a subset of all extents of the supertypes of that type (extent-subset semantics). For example, if the type Student is a subtype of type Person, the extent of type Student is also a subset of the extent of type Person. The extent of a type that is multiple inherited from other types is a subset of the intersection of the extents of its supertypes. There are two kinds of types, stored and derived types:
• A stored type is a type whose extent is explicitly stored in the local store of a mediator. Stored types are used for representing metadata about data retrieved from other mediators. Amos II can also be used as a stand-alone database server and then the stored types represent stored database objects.
• In contrast, a derived type is a virtual type whose extent is defined through a query over the (virtual) database in a mediator. The derived types are used for combining and reconciling differences between data retrieved from heterogeneous schemas in different mediators, as will be explained later.
Stored types are defined and stored in an Amos II peer through the create type statement. For example, assume we have a database named Uppsala with these types:

create type Person;
create type Employee under Person;
create type Teacher under Employee;
create type Student under Person;
create type Course;
create type Attendance;
The above statements define in a mediator a schema with six new types; the extent of type Person is the union of all objects of types Person, Employee, Student, and Teacher. The types themselves are represented as instances of a system type named Type. For defining types stored in other mediators and sources, the system internally uses proxy types, which are proxy objects for objects of type Type in other mediators. A proxy object is an instance of some proxy type or types, and the extent of a proxy type is a set of proxy objects. Proxy types are defined implicitly by the system when the user references types in other mediators. For example, the following statement defines in a mediator a derived type UppsalaStudent that inherits its contents from the type named Student in a mediator named Uppsala: create derived type UppsalaStudent under Student@Uppsala;
The mediator where UppsalaStudent is defined will internally use a proxy-type object to reference the external type Student@Uppsala. The extent of type UppsalaStudent is a set of proxy objects for the extent of type Student@Uppsala. This is a primitive example of integration of data from other mediators. One may then ask for all objects of type UppsalaStudent by the query: select s from UppsalaStudent s;
The query will return proxy objects for all objects of type Student in peer Uppsala.

28.3.1.3 Functions

Functions model properties of objects, computations over objects, and relationships between objects. Functions are basic primitives in AmosQL queries and views. A function consists of two parts, the
signature and the implementation: The signature defines the types and optional names of the argument or arguments and the result of a function. For example, the function modeling the attribute name of type Person has the signature: name(Person)->Charstring;
The implementation of a function specifies how to compute its result given a tuple of argument values. For example, the function name could obtain the name of a person by accessing a wrapped data source. The implementation of a function is normally nonprocedural, i.e., a function only computes result values for given arguments and does not have any side effects. Furthermore, Amos II functions are often multidirectional, meaning that the system is able to inversely compute one or several argument values if the expected result value is known. Inverses of multidirectional functions can be used in database queries and are important for specifying general queries with function calls over the database. For example, in the following query that finds the age of the person named “Tore,” if there is an index on the result of function name, the system will use the inverse of function name to avoid iterating over the entire extent of type Person: select age(p) from Person p where name(p) = ‘Tore’;
Depending on their implementation, the basic functions can be classified into stored, derived, and foreign functions.
• Stored functions represent properties of objects (attributes) locally stored in an Amos II mediator. Stored functions correspond to attributes in object-oriented databases and tables in relational databases. In a mediator, stored functions are used for representing metadata about data in sources, private data, or data materialized in the mediator by the mediator engine.
• Derived functions are functions defined in terms of functional queries over other Amos II functions. Derived functions cannot have side effects, and the query optimizer is applied when they are defined. Derived functions correspond to side-effect-free methods in object-oriented models and views in relational databases. AmosQL uses an SQL-like select query statement for defining derived functions.
• Foreign functions (user-defined functions) provide the basic interfaces for wrapping external systems from Amos II. For example, data structures stored in external storage managers can be manipulated through foreign functions, as can basic interfaces to external query systems, e.g., JDBC and ODBC interfaces. Foreign functions can also be defined for updating external data structures, but foreign functions used in queries must be side-effect free. Foreign functions correspond to methods in object-oriented databases. Amos II provides a possibility to associate several implementations of inverses of a given foreign function (multidirectional foreign functions), which declares to the query optimizer that there are several access paths implemented for the function. To help the query processor, each associated access path implementation may have associated cost and selectivity functions. The multidirectional foreign functions provide access to external storage structures similar to data "blades," "cartridges," or "extenders" in object-relational databases. [27] The basis for the multidirectional foreign function was developed in Litwin and Risch, [21] where the mechanisms are further described.
Amos II functions can furthermore be overloaded, meaning that they can have different implementations, called resolvents, depending on the type or types of their arguments. For example, the salary may be computed differently for types Student and Teacher. Resolvents can be any of the basic function types. Examples of functions in the previous Amos II database schema are:

create function ssn(Person) -> Integer; /* Stored function */
create function name(Person) -> Charstring;
create function pay(Employee) -> Integer;
create function subject(Course) -> Charstring;
create function teacher(Course) -> Teacher;
create function student(Attendance) -> Student;
create function course(Attendance) -> Course;
create function score(Attendance) -> Integer;
create function courses(Student s) -> Bag of Course c /* Derived function */
  as select c from Attendance a
     where student(a) = s and course(a) = c;
create function score(Student s, Course c) -> Integer sc
  as select score(a) from Attendance a
     where student(a) = s and course(a) = c;
create function teaches(Teacher t) -> Bag of Course c /* Inverse of teacher */
  as select c where teacher(c) = t;
The function name is overloaded on types Person and Course. The functions courses and score are derived functions that use the inverse of function course. The function courses returns a set (bag) of values. If “Bag of ” is declared for the value of a function, it means that the result of the function is a bag (multiset), e.g., function courses. Functions (attributes) are inherited, so the above statement will make objects of type Teacher have the attributes name, ssn, dept, pay, and teaches. As for types, function definitions are also system objects belonging to a system type named Function. Functions in one mediator can be referenced from other mediators by defining proxy functions whose OIDs are proxy objects for functions in other mediators. The creation of a proxy function is made implicitly when a mediator function is defined in terms of a function in another mediator. For example: create function uppsala_students() -> Bag of Charstring as select name(s) from Student@Uppsala s;
returns the names of all objects of type Student in peer Uppsala. The system will internally generate a proxy type for Student@Uppsala and a proxy function for the function name in Uppsala. Proxy objects can be used in combination with local objects. This allows for general multidatabase queries over several mediator peers. The result of such queries may be literals (as in the example), proxy objects, or local objects. The system stores internally information about the origin of each proxy object so it can be identified properly. Each local OID has a locally unique OID number, and two proxy objects are considered equal if they represent objects created in the same mediator or source with equal OID numbers. Proxy types can be used in function definitions as any other type. In the example, one can define a derived function returning the teacher proxy object of a named teacher in peer Uppsala: create function uppsala_teacher_named(Charstring nm) -> Teacher@Uppsala as select t from Teacher@Uppsala t where name(t) = nm;
In this case the local function Uppsala_teacher_named will return a proxy object instance of the type Teacher in mediator named Uppsala for which it holds that the value of function name in Uppsala
returns nm. The function can be used freely in local queries and function definitions and as proxy functions in multidatabase queries from other mediator peers. For example, this query returns proxy objects for the students of a course taught by a person named Carl in peer Uppsala: select students(teaches(Uppsala_teacher_named(‘Carl’)));
The so-called Daplex-semantics is used for function composition, meaning that bag-valued function calls are automatically unnested. This can also be seen as a form of extended path expressions through functional notation.4
28.3.2 Composed Functional Mediation

The multidatabase primitives of AmosQL provide the basis for defining derived types and functions in mediators that combine and reconcile data from several sources. Types, functions, and queries provide powerful primitives for mediating both relational and object-oriented data sources without losing data semantics. As an example, assume we have two sources:

1. A relational database in Stockholm stores details about students and teachers at Stockholm University and the names and social security numbers of all Swedish residents. It has the relations
residents(ssn, firstname, lastname, address)
course(cid, name, teacherssn)
takes(ssn, cid, score)
2. An XML-based database in Uppsala stores details about students at Uppsala University. It is stored as an XML document on the Web wrapped by Amos II. The XML file is loaded into the system and there represented by the Uppsala schema given above. The wrapper imports and converts the data by reading the XML document and stores it in the mediator's database using database update statements. The details of how to translate XML documents to Amos II are not covered here; basic XML primitives can be mapped to Amos II data elements automatically, [20] or some XML wrapping tool (e.g., Xerces2 [32]) can read the source based on an XML-Schema definition. The alternative of retaining the data in the XML source is also possible if the source is managed by some XQuery tool.5 In that case the wrapper will be more complex and needs to translate mediator queries into XQuery statements.

Now we define a mediator named StudMed to be used by students attending both Uppsala and Stockholm universities. It will access data wrapped by the two Amos II peers named Uppsala and Stockholm, both of which are assumed to be set up as autonomous mediator peers. In our scenario we furthermore semantically enrich the wrapped relational data in Stockholm by providing an object-oriented view of some of the tabular data. This is done by defining a derived type Student along with object navigation functions that connect instances through object references rather than foreign key references. We begin by showing how to define a mediator that wraps and semantically enriches the relational database in Stockholm, and then we show how to define the mediator StudMed, fusing data from both wrapped sources.

28.3.2.1 Wrapping the Relational Database

Entities from external sources are linked to a mediator by calling the system function access(source, entities), which calls the wrapper to import the data source schema. If the source is a wrapped relational
Copyright 2005 by CRC Press LLC
C3812_C28.fm Page 12 Wednesday, August 4, 2004 8:53 AM
28-12
The Practical Handbook of Internet Computing
database, the specified source relations become proxy types in Amos Il and the columns become proxy functions. The relational database is accessed from the Stockholm mediator through these commands: set :a = jdbc(“ibds”, “interbase.interclient.Driver”); connect(:a, “jdbc:interbase://localhost/stockholm.gdb”, “SYSDBA”, “masterkey”); access(:a, {“resident”, “course”, “teaches”, “takes”});
Here the Interbase Interclient JDBC driver6* is used to connect to the database stockholm.gdb running on the same host as the mediator peer Stockholm. After the above commands, the following types and functions are available in the mediator Stockholm: type Resident function ssn(Resident) -> Integer function firstname(Resident) -> Charstring function lastname(Resident) -> Charstring function address(Resident) -> Charstring type Course function cid(Course) -> Integer function name(Course) -> Charstring type Teaches function ssn(Teaches) -> Integer function cid(Teaches) -> Course type Takes function ssn(Takes) -> Integer function cid(Takes) -> Integer function score(Takes) -> Integer
Because there is no explicit type Student in the wrapper, we define it as a derived type: create function student?(Resident p) -> boolean as select true where some(select true from takes t where ssn(t)=ssn(p)); create derived type Student under Resident p where student?(p);
A resident is thus a student if he takes some course. This is an example of a schema mapping. We also need functions to represent the relationship between a course enrollment and its student: create function student(Takes t) -> Student s as select s where ssn(t) = ssn(s); create function course(Takes t) -> Course c as select c where cid(c) = cid(t);
These two functions allow direct object references between objects of types Takes,Student, and Course. At this point we can set up Stockholm as a mediator peer on the Internet. Other peers and applications can access it using AmosQL.
6 http://prdownloads.sourceforge.net/firebird.
28.3.2.2 Composing Mediators

The mediator StudMed provides a composed view of some combined data from mediators Stockholm and Uppsala. The mediator schema has the following types and function signatures:

type Student
function id(Person) -> Integer
function name(Person) -> Charstring
type Course
function subject(Course) -> Charstring
type Takes
function student(Takes) -> Student
function course(Takes) -> Course
function score(Takes) -> Integer
All these types and functions are proxies for data in one or both of the other peers. We begin by accessing the desired types and functions of the other mediators by a system call:

access("Uppsala", {"Student", "Course", "Teacher", "Attendance"});
access("Stockholm", {"Student", "Course", "Takes"});
Here the call to access defines a proxy type Student@Stockholm with proxy functions ssn, name, address (inherited). Analogously for the other types accessed from the mediator peers. The data of the type Student in mediator StudMed is derived from the derived type Student in mediator Stockholm and the stored type Student in mediator Uppsala. In StudMed we wish to model the union of the students in Uppsala and Stockholm along with their properties. This is modeled in Amos II as a derived supertype of the proxy types Student@Stockholm and Student@Uppsala. In order to reconcile corresponding students, we need a key for students in Uppsala and Stockholm, and the social security number (SSN) provides this. Since various properties of students are computed differently in different sources, we also need to define how to compute equivalent properties from different sources and how to reconcile differences, conflicts, and overlaps. A special derived-type syntax called IUT (Integration Union Type) [11] provides the primitives to do the reconciliation:

create derived type Student
  key Integer id                    /* Common key = SSN */
  supertype of
    Student@Stockholm s: ssn(s),    /* Key mapping */
    Student@Uppsala u: ssn(u)       /* Key mapping */
  functions (name Charstring)       /* Mediator view function */
    case s: name = firstname(s) + " " + lastname(s); /* Concatenated names */
  end functions;
Because function name is not directly available in Stockholm, it must be reconciled through string concatenation (case s) when a student attends only Stockholm classes. If a student attends classes from both cities, the name function from Uppsala is used. The integrated type Course can similarly be defined using the following definition:

create derived type Course
  key Charstring subject
  supertype of
    Course@Stockholm st: name(st),
    Course@Uppsala u: subject(u);
Finally, we need to specify the derived type Takes that links students to courses. Inheritance between objects from different mediators provides a convenient way to reference objects of types Student and Course. First we need to define two utility functions that compute the composite key for the two different sources as a vector ({} notation):

create function takesKey(Takes@Stockholm s) -> Vector
  as select {ssn(s), name(c)}
     from Course@Stockholm c
     where cid(s) = cid(c);
create function takesKey(Attendance@Uppsala u) -> Vector
  as select {ssn(student(u)), title(course(u))};
Now we can define the IUT for Takes simply as:

create derived type Takes
  key Vector
  supertype of
    Takes@Stockholm s: takesKey(s),
    Attendance@Uppsala u: takesKey(u)
  functions (student Student, course Course, score Integer)
    case s,u:
      student = student(u);
      course = course(u);
      score = min(score(u), score(s));
  end functions;
The definition becomes this simple because there are already corresponding functions student, course, and score in both mediator peers Uppsala and Stockholm. The functions student in Stockholm and Uppsala return objects of types Student@Stockholm and Student@Uppsala, respectively, which are inherited from type Student in mediator StudMed. Analogously, the function course returns type Course, also inherited between the mediators.

28.3.2.3 Querying the Mediator

From the user's point of view, the mediator looks like any other Amos II database. It can be freely queried using AmosQL. In our example, we can ask general queries such as the names and SSNs of all students with a score on some course larger than 5:

select distinct name(s), id(s)
from Student s, Takes t
where score(t)>5 and student(t)=s;
The distributed mediator query optimizer translates each query into interacting query optimization plans in the different involved mediators and sources using a set of distributed query optimization and transformation techniques. [9, 12, 14] Since types and functions are specified declaratively in AmosQL through functional queries, the system utilizes knowledge about type definitions to eliminate overlaps and simplify the queries before generating the distributed query execution plans. [10, 11] The plans are generated through interactions between the involved mediators. Global query decomposition strategies are used for obtaining an efficient global execution plan. [9, 12]
28.3.3 Implementing Wrappers

The physical wrapper interface is implemented as a set of foreign AmosQL functions, while the data model mapping and wrapper query optimization are implemented through a set of source-specific rewrite rules that transform general AmosQL query fragments into calls to the interface foreign functions. Finally, the schema mappings are defined through derived functions and types, as in the example discussed earlier.

28.3.3.1 Foreign Functions

As a very simple example of how to wrap a data source using a multidirectional foreign function, assume we have an external disk-based hash table on strings to be accessed from Amos II. We can then implement it as follows:

create function get_string(Charstring x) -> Charstring r
  as foreign "JAVA:Foreign/get_hash";
Here the foreign function get_string is implemented as a Java method get_hash of the public Java class Foreign. The Java code is dynamically loaded when the function is defined or the mediator initialized. The Java Virtual Machine is interfaced with the Amos II kernel through the Java Native Interface to C. Multidirectional foreign functions include declarations of inverse foreign function implementations. For example, our hash table can not only be accessed by keys but also scanned, allowing queries to find all the keys and values stored in the table. We can generalize it by defining: create function get_string(Charstring x)->Charstring y as multidirectional (“bf” foreign “JAVA:Foreign/get_hash” cost (100,1)) (“ff” foreign “JAVA:Foreign/scan_hash” cost “scan_cost”);
Here, the Java method scan_hash implements scanning of the external hash table. Scanning will be used, e.g., in queries retrieving the hash key for a given hash value. The binding patterns, bf and ff, indicate whether the argument or result of the function must be bound (b) or free (f) when the external method is called. The cost of accessing an external data source through an external method can vary heavily depending on, e.g., the binding pattern, and to help the query optimizer, a foreign function can have associated costing information defined as user functions. The cost specifications estimate both execution costs in internal cost units and result sizes (fanouts) for a given method invocation. In the example, the cost specifications are constant for get_hash and computed through the Amos II function scan_cost for scan_hash. For relational sources there is a primitive foreign access function to send parameterized query strings to a relational database source: create function sql Relational s, Charstring query, Vector params) -> bag of Vector as foreign “PrepareAndExecute”;
An example call:

  sql(:s, “select c.cid from course c where c.name = ?”, {“Programming”});
For a given relational data source s, the function executes the specified query, with each parameter marker ? substituted by the corresponding value in params. The function sql is implemented as an overloaded foreign function calling ODBC or JDBC, depending on the type of the source.
28.3.3.2 Defining Translator Rules

The wrapping of an external hash table requires no further data model mapping or query optimization. However, for more advanced sources, such as relational databases, the wrapper must also include rewrite rules that translate functional query expressions into optimized source queries [5]. It is the task of the translator part of a wrapper to transform AmosQL queries sent to the wrapper into calls to primitive functions like sql. Fahl and Risch [5] explain the rewrites needed for relational databases.

To provide a convenient way to integrate new kinds of data sources, we have developed a general mechanism for defining wrapper query optimization through translation rules for different kinds of sources. The system includes mechanisms to define the capabilities of different kinds of sources, where a capability basically specifies which kinds of query expressions a particular source can translate. The capabilities are specified by a set of rewrite rules associated with the wrapped source. These rules identify which query expressions a particular source can handle and transform them into primitive calls to interface functions for the source. For wrappers of relational databases, the transformation rules identify connected subqueries for a given source and then translate the resulting query graph into SQL query strings. Cartesian products and calls to functions not executable as SQL are processed as queries in the wrapping mediator. Thus the rewrite rules specify which parts of an AmosQL query to a wrapped source can be translated into source queries and which parts need to be handled by Amos II.
28.4 Conclusions

Mediators provide a general framework for specifying queries and views over combinations of heterogeneous data sources. Each kind of data source must have a wrapper, which is a program module transforming source data into the common data model used by the mediator engine. A given wrapper provides the basic mechanisms for accessing any source of the wrapped kind. The mediator will contain views defining mappings between each source and the schema of the mediator. We discussed various overall architectures of mediator systems and wrappers. As an example of a mediator system we gave an overview of the Amos II mediator system and an example of how it integrates wrapped heterogeneous data. In summary, setting up a mediator framework involves the following:
• Classify kinds of data sources. First one needs to investigate what kinds of sources are involved in the mediation. Are wrappers already defined for some of the sources? If not, the available APIs to the different kinds of sources are investigated in order to design new wrappers.
• Implement wrappers. Wrappers need to be designed and implemented that translate queries in the mediator query language into queries of the source. The results from the source queries are translated back from the source representation into the data abstractions of the mediator model. This is the most challenging task in the data integration process. For example, if an Internet search engine is to be wrapped and a relational data model is used in the mediator, SQL queries to a source need to be translated into query search strings of the particular search engine. The wrapper will pass the translated query strings to the search engine using some API of the source permitting this. In the same way, the results passed back from the search engine need to be passed to the relational mediator as rows in relations [15]. The wrappers need to translate API calls used by the mediator engine into API calls of the source, e.g., adhering to the SQL/MED standard if that is used by the mediator system.
• Define source schemas. The particular data sources to be mediated need to be identified and analyzed. The structure of the data to access needs to be investigated and their schemas defined. For example, if a relational database model is used, the sources are modeled as a number of external source relations.
• Define mediator schema. The schema of the mediator needs to be defined in terms of the source schemas. Views in the integrated schema are defined by matching data from the source schemas.
In our example, we illustrated the reconciliation using a functional data model. In the relational data model, reconciliation means defining views that join source relations. In defining these views, common keys and data transformations need to be specified in order to reconcile similarities and differences between source data. A problem here can be that SQL provides no direct support for reconciliation, and therefore the mediating view may become complex, involving user-defined functions (UDFs) to handle some matchings and transformations.

Most present mediator frameworks are centralized in that a single mediator schema is used for integrating a number of sources in a two-tier mediator framework. The composable mediator framework [9, 14] generalizes this by allowing transparent definitions of autonomous mediators in terms of other mediators, without knowledge of the internals of the source mediators. Thus a multi-tier network of interconnected mediators can be created, where higher-level mediators do not know that lower-level mediators in their turn access other mediators. Such a peer mediator architecture poses several challenges, e.g., for query optimization [14]. We have already illustrated how to compose mediators using the composable mediator system Amos II.
References

[1] O. Bukhres and A. Elmagarmid (Eds.): Object-Oriented Multidatabase Systems. Prentice Hall, Englewood Cliffs, NJ, 1996.
[2] U. Dayal and P.A. Bernstein: On the Correct Translation of Update Operations on Relational Views. Transactions on Database Systems, 7(3), 381–416, 1981.
[3] U. Dayal and H-Y. Hwang: View Definition and Generalization for Database Integration in a Multidatabase System. IEEE Transactions on Software Engineering, 10(6), 628–645, 1984.
[4] W. Du and M. Shan: Query Processing in Pegasus. In O. Bukhres and A. Elmagarmid (Eds.): Object-Oriented Multidatabase Systems. Prentice Hall, Englewood Cliffs, NJ, 449–471, 1996.
[5] G. Fahl and T. Risch: Query Processing over Object Views of Relational Data. The VLDB Journal, 6(4), 261–281, 1997.
[6] H. Garcia-Molina, Y. Papakonstantinou, D. Quass, A. Rajaraman, Y. Sagiv, J. Ullman, V. Vassalos, and J. Widom: The TSIMMIS approach to mediation: Data models and languages. Journal of Intelligent Information Systems (JIIS), 8(2), 117–132, 1997.
[7] J-R. Gruser, L. Raschid, M.E. Vidal, and L. Bright: Wrapper Generation for Web Accessible Data Sources. 3rd Conference on Cooperative Information Systems (CoopIS’98), 1998.
[8] L. Haas, D. Kossmann, E. Wimmers, and J. Yang: Optimizing Queries across Diverse Data Sources. Proceedings of the International Conference on Very Large Data Bases (VLDB’97), pp. 276–285, Athens, 1997.
[9] V. Josifovski, T. Katchaounov, and T. Risch: Optimizing Queries in Distributed and Composable Mediators. 4th Conference on Cooperative Information Systems (CoopIS’99), pp. 291–302, 1999.
[10] V. Josifovski and T. Risch: Functional Query Optimization over Object-Oriented Views for Data Integration. Journal of Intelligent Information Systems (JIIS), 12(2–3), 165–190, 1999.
[11] V. Josifovski and T. Risch: Integrating Heterogeneous Overlapping Databases through Object-Oriented Transformations. 25th Conference on Very Large Databases (VLDB’99), 435–446, 1999.
[12] V. Josifovski and T. Risch: Query Decomposition for a Distributed Object-Oriented Mediator System. Distributed and Parallel Databases, 11(3), 307–336, May 2001.
[13] V. Josifovski, P. Schwarz, L. Haas, and E. Lin: Garlic: A New Flavor of Federated Query Processing for DB2. ACM SIGMOD Conference, 2002.
[14] T. Katchaounov, V. Josifovski, and T. Risch: Scalable View Expansion in a Peer Mediator System. Proceedings of the 8th International Conference on Database Systems for Advanced Applications (DASFAA 2003), Kyoto, Japan, March 2003.
[15] T. Katchaounov, T. Risch, and S. Zürcher: Object-Oriented Mediator Queries to Internet Search Engines. International Workshop on Efficient Web-based Information Systems (EWIS), Montpellier, France, September 2, 2002.
[16] A.M. Keller: The role of semantics in translating view updates. IEEE Computer, 19(1), 63–73, 1986.
[17] G.J.L. Kemp, J.J. Iriarte, and P.M.D. Gray: Efficient Access to FDM Objects Stored in a Relational Database. Directions in Databases, Proceedings of the 12th British National Conference on Databases (BNCOD 12), pp. 170–186, Guildford, U.K., 1994.
[18] M. Koparanova and T. Risch: Completing CAD Data Queries for Visualization. International Database Engineering and Applications Symposium (IDEAS’2002), Edmonton, Alberta, Canada, July 17–19, 2002.
[19] A.Y. Levy, A. Rajaraman, and J.J. Ordille: Querying heterogeneous information sources using source descriptions. In Proceedings of the International Conference on Very Large Databases (VLDB’96), Mumbai, India, 1996.
[20] H. Lin, T. Risch, and T. Katchaounov: Adaptive data mediation over XML data. In special issue on “Web Information Systems Applications” of Journal of Applied System Studies (JASS), 3(2), 2002.
[21] W. Litwin and T. Risch: Main memory oriented optimization of OO queries using typed datalog with foreign predicates. IEEE Transactions on Knowledge and Data Engineering, 4(6), 517–528, 1992.
[22] L. Liu and C. Pu: An adaptive object-oriented approach to integration and access of heterogeneous information sources. Distributed and Parallel Databases, 5(2), 167–205, 1997.
[23] J. Melton, J. Michels, V. Josifovski, K. Kulkarni, P. Schwarz, and K. Zeidenstein: SQL and management of external data. SIGMOD Record, 30(1), 70–77, March 2001.
[24] L. Popa, Y. Velegrakis, M. Hernandez, R.J. Miller, and R. Fagin: Translating Web Data. 28th International Conference on Very Large Databases (VLDB 2002), Hong Kong, August 2002.
[25] T. Risch and V. Josifovski: Distributed Data Integration by Object-Oriented Mediator Servers. Concurrency and Computation: Practice and Experience, 13(11), September 2001.
[26] T. Risch, V. Josifovski, and T. Katchaounov: Functional data integration in a distributed mediator system. In P. Gray, L. Kerschberg, P. King, and A. Poulovassilis (Eds.): Functional Approach to Computing with Data, Springer-Verlag, New York, 2003.
[27] M. Stonebraker and P. Brown: Object-Relational DBMSs: Tracking the Next Great Wave. Morgan Kaufmann, San Francisco, CA, 1999.
[28] D. Shipman: The functional data model and the data language DAPLEX. ACM Transactions on Database Systems, 6(1), 140–173, 1981.
[29] A. Tomasic, L. Raschid, and P. Valduriez: Scaling access to heterogeneous data sources with DISCO. IEEE Transactions on Knowledge and Data Engineering, 10(5), 808–823, 1998.
[30] V.M.P. Vidal and B.F. Loscio: Solving the Problem of Semantic Heterogeneity in Defining Mediator Update Translations. Proceedings of ER ’99, 18th International Conference on Conceptual Modeling, Lecture Notes in Computer Science 1728, 1999.
[31] G. Wiederhold: Mediators in the architecture of future information systems. IEEE Computer, 25(3), 38–49, 1992.
[32] Xerces2 Java Parser, http://Xml.apache.org/Xerces2-j/, 2002.
29
Introduction to Web Semantics

Munindar P. Singh

CONTENTS
Abstract
29.1 Introduction
29.2 Historical Remarks
29.3 Background and Rationale
  29.3.1 Understanding Information on the Web
  29.3.2 Creating Information on the Web
  29.3.3 Sharing Information over the Web
29.4 Ontologies
29.5 Key Ontology Languages
  29.5.1 RDF and RDF Schema
  29.5.2 OWL
29.6 Discussion
  29.6.1 Domain-Specific Ontologies
  29.6.2 Semantic Web Services and Processes
  29.6.3 Methodologies and Tools
29.7 Summary
References
Abstract

The Web as it exists currently is limited in that, although it captures content, it does so without any explicit representation of the meaning of that content. The current approach for the Web may be adequate as long as humans are intended to be the direct consumers of the information on the Web. However, involving humans as direct consumers restricts the scale of several applications. It is difficult for unassisted humans to keep up with the complexity of the information that is shared over the Web. The research program of Web semantics seeks to encode the meaning of the information on the Web explicitly so as to enable automation in the software tools that create and access the information. Such automation would enable a richer variety of powerful applications than have previously been possible.
29.1 Introduction

The Web we know and love today has evolved into a practically ubiquitous presence in modern life. The Web can be thought of as a set of abstractions over data communication (in the nature of the hypertext transfer protocol, better known as HTTP) and information markup (in the nature of the hypertext markup language or HTML).
The Web has clearly been successful or we would not be talking about it here. However, its success has also exposed its limitations. For one, it is difficult to produce and consume the content on today's Web, because a human must be involved in order to assign meaning to the content: the data structures used for that content are weak and concentrate on presentation details. Thus the meaning, if any, is confined to text or images, which must be read and interpreted by humans.

Researchers have realized almost since the inception of the Web that it is limited in terms of its representation of meaning. This has led to the vision of the Semantic Web, in which the meaning of the content would be captured explicitly in a declarative manner and be reasoned about by appropriate software tools. The main advantage of such an encoding of Web semantics is that it would enable greater functionality to be shifted from humans to software. Software tools would be able to better exploit the information on the Web. They would be able to find and aggregate the right information to better serve the needs of the users and possibly to produce information for other tools to consume. Further, the availability of information with explicitly represented semantics would also enable negotiation among the various parties regarding the content. This vision of the Semantic Web was first promulgated by Tim Berners-Lee. We use the term Semantic Web to refer to Berners-Lee's project and the term Web semantics to refer to semantics as dealing with the Web in general.

Web semantics owes much of its intellectual basis to the study of knowledge modeling in artificial intelligence, databases, and software engineering. The arrival of the Web has given a major impetus to knowledge modeling simply because of the Web's scale and complexity, which severely limit the effectiveness of ad hoc methods. Web semantics has expanded to become a leading subarea of Internet computing with a large number of active researchers and practitioners. Over the past few years, there has been much scientific activity in this area. Recently, a conference, the International Conference on the Semantic Web, and an academic journal, the Journal on Web Semantics, have been launched in this area.

This chapter provides a high-level introduction to Web semantics. It deals with the key concepts and some of the techniques for developing semantically rich representations. Some other chapters carry additional relevant topics. Specifically, Kashyap [this volume] discusses ontologies and metadata; Brusilovsky and Nejdl [this volume] introduce adaptive hypermedia; Fisher and Sheth [this volume] present enterprise portals as an application of semantic technologies; and Arroyo et al. [this volume] describe semantic Web services, which in simple terms involve an application of semantic techniques to the modeling of Web services.
29.2 Historical Remarks

In simple terms, the history of the Web can be understood in terms of the increasing explicitness of what is represented. The earliest Web was cast in terms of HTML. HTML provides a predetermined set of tags, which are primarily focused on the presentation of content. An inability to express the structure of the content proves to be a serious limitation when the purpose is to mark up content in a general enough manner that it can be processed based on meaning.

The work on HTML was predated by several years by work on markup techniques for text. The information retrieval community had developed the standard generalized markup language (SGML) as a powerful approach to create arbitrary markups. Unfortunately, SGML proved too powerful. It was arcane and cumbersome and notoriously difficult to work with. Consequently, robust tools for creating and parsing SGML did not come into existence. Thus, whereas the expressiveness and flexibility of SGML were attractive, its complexity was discouraging. This combination led to the design of a new language that was simple yet sufficiently expressive and, most importantly, extensible to accommodate novel applications. This language was formalized as the extensible markup language (XML) [Wilde, this volume].

The move to XML in conjunction with stylesheets separates the presentation from the intrinsic structure of the content. XML provides an ability to expand the set of tags based on what an application needs. Thus, it enables a richer variety of content structure to be specified in a manner that potentially respects the needs of the given applications. Additional stylesheets, also expressible in XML, enable the rendering of
the content specified in XML in a manner that can be processed by existing browsers. Stylesheets also have another important function, which is to transform XML content from one form into another.

XML provides structure in terms of syntax, meaning that an XML document corresponds to a unique parse tree, which can be traversed in a suitable manner by the given application. However, XML does not nail down the structure that an application may give to the content that is specified. Consequently, there can be lots of nonstandard means to express the same information in XML. This led to a series of languages that capture content at a higher level of abstraction than XML. This chapter introduces the most established of these languages, namely, Resource Description Framework (RDF), RDF Schema (RDFS), and Web Ontology Language (OWL). In principle, these higher layers do not have to be mapped into XML, but usually they are, so as to best exploit the tools that exist for XML. But occasionally, especially when the verbosity of XML is a consideration, other representations can be used.

The above development can be understood as a series of layers where the lower layers provide the basic syntax and the upper layers provide increasing shades of meaning. Berners-Lee presented his vision for the layers as what is known as the Layer Cake.
29.3 Background and Rationale

Let's consider some major use cases for the Web, which motivate the need for Web semantics.
29.3.1 Understanding Information on the Web

The Web is designed for human consumption. The pages are marked up in a manner that is interpreted by the popular Web browsers to display a page so that a human viewing that page would be able to parse and understand its contents. This is what HTML is about. HTML provides primitives to capture the visual or, more generally, the presentational aspects of a page. These pages do not directly reflect the structure of the content of the page. For example, whereas HTML captures the recommended typefaces and type sizes of the text in a document, it does not capture the sections or subsections of the document.

An even more telling example is when you access a form over the Web. The form may ask for various inputs, giving slots where a user can enter some data. The only clue as to the information required is in the names of the fields that are given in the adjacent text. For example, a field that appears next to a label “first name” would be understood as asking for the user's first name. However, there are two potential shortcomings of such an approach. First, the user needs to assign meaning to words based on his or her tacit knowledge of what the given application may require. A software application can work with such a form only based on some rough and ready heuristics — for example, that the words “first name” indicate the first name. Second, when the meaning is subtle, it is unwieldy for both humans and software, because there is no easy way for the correct interpretation to be specified via ad hoc label names. For example, if the form were really meant to request certain relevant information, but the information to be requested was not known in advance, there would be no way for an application to guess the correct meaning.

More concretely, assume you are using a software application that tries to order some medical supplies for your hospital. Would this application be able to correctly fill out the forms at a relevant site that it visits? Only to the extent that you can hard-code the forms. Say it knows about shipping addresses. Would it be able to reason that the given site needs a physical address rather than a post office box? It could, but only if the knowledge were appropriately captured.
29.3.2 Creating Information on the Web

The Web is designed for the creation of information by humans. Information is gathered up on Web pages by humans and its markup is created, in essence, by hand. Tools can help in gathering the information and preparing the markup, but the key decisions must be made by humans.

It is true that Web pages can be generated by software applications, either from documents produced by hand or from databases based on further reasoning. However, the structure that is given even to such
dynamically generated Web pages is determined in an ad hoc manner by the programmers of the applications that produce such pages. It would be great if the schema for a Web page were designed or customized on the fly, based on the needs of the user for whom it was being prepared. The contents of the page could then automatically be generated based on the concepts needed to populate the page.
29.3.3 Sharing Information over the Web

More generally, consider the problem of two or more parties wishing to share information. For example, these parties could be independent enterprises that are engaged in e-business. Clearly, the information to be shared must be transmitted in some appropriate fashion from one to the other. The information must be parsed in an unambiguous manner. Next it must be interpreted in the same way by the parties involved.

Imagine the software applications that drive the interactions between the parties. The applications interpret the information that is exchanged. The interpretation can be in the nature of the data structures (say, the object structures) that the applications serialize or stream into the information that they send over the wire, and the structures that they materialize or construct based on the information that they receive. Web semantics as understood here is about declaratively specifying the meaning of the information that is exchanged. Naturally, this meaning takes the form of describing the object structures to which the exchanged information corresponds.

An important special case of the above is configuring Web applications. Often information is not accessed directly but is used indirectly through specialized tools and applications, e.g., for capturing information and presenting it in a suitable manner to users. For example, when an enterprise resource planning (ERP) system is deployed in a particular enterprise, it must be instantiated in an appropriate manner for that specific enterprise. A hospital billing system, for instance, must deal with the hospital's human resources (payroll) systems, with insurance companies, and with local government agencies. Thus installing a new billing system can be cumbersome. Even maintaining an existing system in the light of external changes, say, to government regulations, is difficult. The same holds for Web applications involving e-business. For example, to operate a supply chain may involve configuring suitable software applications at the interacting companies in such a manner as to respect the common information model that the companies have agreed upon.

Traditionally, the knowledge models underlying such applications are implicit in sections of the procedural code for the given applications. In such cases, when an application is to be deployed in a new installation (e.g., at a new enterprise), it must be painstakingly tuned through a long process involving expensive consultants. However, if the application is modeled appropriately, it becomes a simple matter of refreshing its models for the particular enterprise or business context where it is being deployed.
29.4 Ontologies

Although the current Web has its strong points, it is notorious for its lack of meaning. Since the content of the Web is captured simply in terms of the natural language text that is embedded in HTML markup, there is only a little that we can do with it. For example, the best that current search engines can do is, upon crawling Web pages, to index the words that occur on those pages. Users can search based on the words. And, when a user conducts a search, the engine can produce pages that include the given words. However, such searches miss out on the meaning of interest to the user.

Capturing the meaning of the pages is an example of what Web semantics is about. Given an explicit representation of the meaning of a given page, a search engine would index the pages based on the meanings captured by pages rather than just the words that happen to occur on a page. In that manner, the engine would be able to support meaningful searches for its users.
We would not be able to capture the meaning for each page — or, more generally, information resource — in a piecemeal manner. Instead, we must model the knowledge with which the meaning of the given resource can be captured. That is, the knowledge of the domain of interest (i.e., the universe of discourse) would provide a basis for the semantics underlying the information, even across disparate resources.

To put the above discussion of ontologies into a practical perspective, let us consider a series of examples from a practical domain involving a business transaction that might serve as one step of a supply chain. Consider a simple setting where some medical parts are ordered by one enterprise from another. First we consider a fragment of an order placed in XML.
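A minimal sketch of what such a fragment might look like; the element names (order, part, quantity, deliveryMode, urgent) and the quantity are illustrative assumptions, with “central” and “yes” as the delivery mode and urgency values:

  <order>
    <part>catheter</part>
    <quantity>100</quantity>
    <deliveryMode>central</deliveryMode>
    <urgent>yes</urgent>
  </order>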
This could be alternatively expressed as follows:
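For instance, the same information might instead be carried in attributes rather than nested elements (again, the names are assumptions):

  <order part="catheter" quantity="100" deliveryMode="central" urgent="yes"/>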
Or even as the following fragment:
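Or the fields might be renamed and regrouped altogether, for example (purely hypothetical names):

  <purchase>
    <item name="catheter" count="100"/>
    <delivery mode="central" rush="yes"/>
  </purchase>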
How can such varied forms be understood? One idea is to define transforms among these variations so that messages written in one form can be morphed into a second form and thus understood by software that is designed only to accommodate the latter form. Such transforms are introduced elsewhere [Wilde, this volume]. The question of interest here is how we can justify such syntactic transformations. Clearly, this must be through the meanings of the terms used; hence the need for ontologies.

An ontology is a formal representation of the conceptualization of a domain of interest. From the computer science perspective, there are three main kinds of things that constitute an ontology:
• An ontology expresses classes or concepts. For example, a medical ontology may have concepts such as catheter, procedure, catheter insertion, and angioplasty. An ontology captures taxonomic relationships among its concepts. For example, we may have that angioplasty is a kind of catheter insertion, and catheter insertion is a kind of procedure. Similarly, we can define blood vessel, artery, vein, jugular vein, and carotid artery with the obvious taxonomic relationships among them. Such taxonomic relationships are treated specially because they are at the very core of the space of concepts expressed by an ontology.
• An ontology expresses relationships among the concepts. For example, we may have a relationship usedIn in our medical ontology, which relates catheters to catheter insertion. Cardinality constraints may be stated over these relationships. For example, we may require that at least one catheter be used in a catheter insertion.
• An ontology may express additional constraints.

Ontologies can be represented in various ways. Well-known approaches include the following.
• Frame systems. Each frame is quite like a class in an object-oriented programming language. The connection is not coincidental; frame systems predate object-oriented languages. Thus each frame corresponds to a concept. Relationships among concepts are captured as the slots of various frames. For example, we may have a frame called catheter insertion, which has a slot called cathetersUsed. This slot is filled with a set of catheters. And we may have another slot called intoVessel to capture the blood vessel into which the insertion takes place.
• Description logics. In these approaches, concepts are defined through formal expressions or descriptions that refer to other concepts. Based on the formal semantics for the language, it is possible to determine whether one description subsumes another, i.e., refers to a larger class than the other. The language chosen determines the complexity of this computation. For example, we may define
a new procedure called venal catheter insertion as a kind of catheter insertion whose intoVessel relationship must refer to a vein.
• Rules, which capture constraints on the taxonomic and other relationships that apply. In recent work, rules are used mainly to capture constraints on data values, and taxonomic relationships are left to one of the above approaches. An example of a rule might be that for credit card payments by new customers or above a certain amount, the shipping address must be the same as the billing address of the credit card.

The subsumption hierarchy computed in description logics recalls the explicit class hierarchy captured by frame systems. However, in description logics, the hierarchy is derived from the definitions of the classes, whereas the hierarchy is simply given in frame systems. Frame systems have a certain convenience and naturalness, whereas description logics have a rigorous formal basis but can be unintuitive for untrained users. The past several years have seen a convergence of the two approaches. This research, with support from the U.S. Defense Advanced Research Projects Agency (DARPA), led to the DARPA Agent Markup Language (DAML), and with support from the European Union to the Ontology Interchange Language (OIL). OIL was sometimes referred to as the Ontology Inference Language. DAML and OIL were combined into DAML+OIL, which has evolved into a W3C draft known as the Web Ontology Language (OWL) [McGuiness and van Harmelen, 2003]. This chapter considers only OWL, since that is the current direction.
29.5 Key Ontology Languages

This section introduces the main ontology languages used for Web semantics. Extensive literature is available on these languages describing their formal syntax and semantics. Such details are beyond the scope of this chapter; instead, this chapter seeks only to introduce the languages at a conceptual level.

In principle, ontologies can be represented in a variety of ways. Consider the ontology dealing with medical terms that was informally described above. We can certainly encode this in XML. For example, we may come up with the following simple, if somewhat contrived, solution:
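A sketch of such an ad hoc XML encoding might look as follows; the tag and attribute names (concept, isA, relationship) are purely illustrative assumptions:

  <ontology>
    <concept name="Procedure"/>
    <concept name="CatheterInsertion" isA="Procedure"/>
    <concept name="Angioplasty" isA="CatheterInsertion"/>
    <concept name="Catheter"/>
    <relationship name="usedIn" from="Catheter" to="CatheterInsertion"/>
  </ontology>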
Before even considering the merits of the above approach, we can see that it has immediate shortcomings along the lines of the shortcomings of XML described above. A multiplicity of such ontology representations is possible. Unless we nail down the representation, we cannot exchange the ontologies and we cannot create tools that would process such representations. For this reason, it became clear that a standardized representation was needed in which ontologies could be captured and which lay above the level of XML.
29.5.1 RDF and RDF Schema

The first such representation is the Resource Description Framework (RDF), which provides a standard approach to express graphs [Decker et al., 2000]. RDF is not exclusively tied to XML, but can be rendered into a variety of concrete syntaxes, including a standard syntax that is based on XML. An RDF specification describes a simple information model and corresponds to a particular knowledge representation that the given model can have. RDF is general enough to capture any knowledge representations that correspond
to graphs. For example, RDF versions of taxonomies such as those conventionally used in software class diagrams can be readily built.

The basic concept of RDF is that it enables one to encode statements. Following simplified natural language, each statement has an object, a subject, and a predicate. For example, given the English sentence, “catheters are used in angioplasty,” we would say that “catheters” is the subject, “angioplasty” is the object, and “used” is the predicate. This analysis is quite naive in that we do not consider any of the subtleties of natural language, such as tense or active vs. passive voice. However, this analysis provides an elegant starting point for computational representation.

RDF enables us to encode information in the form of triples, each of which consists of a subject, an object, and a property or predicate. Subjects of RDF statements must be resources, i.e., entities. These entities must have an identity given via a URI. As usual, the main point about URIs is that they are unique; there is no assumption that the URI corresponds to a physical network address or that the entity so named is a physical entity. The objects of RDF statements must be resources or literals, which are based on general data types such as integers. If they are resources, they can be described further by making them subjects of other statements; if they are literals, the description would bottom out with them. Properties must also be resources.

By linking statements via the subjects or objects that are common to them, we can construct general graphs easily. The vertices of a graph would correspond to resources that feature as subjects or objects. The edges of a graph would correspond to statements — with the origin of an edge being the subject, the target of the edge being the object, and the label of the edge being the property.

Because each RDF statement has exactly three parts — its subject, object, and property — it is essential to have some additional mechanisms by which more complex structures can be encoded in RDF. One of them is the use of certain containers. RDF defines Bag (unordered collection with duplicates allowed), Seq (ordered collection, also with duplicates allowed), and Alt (a collection of alternatives). Members of containers are asserted via rdf:li. RDF schematically defines properties to indicate membership in the containers. These are written _1, _2, and so on.

The second important mechanism in RDF is reification. That is, RDF statements can be reified, meaning that they can be referred to from other statements. In other words, statements can be treated as resources and can be subjects or objects of other statements. Thus complex graphs can be readily encoded in RDF. For example, we can encode a statement that asserts (as above) that angioplasty is a kind of catheter insertion, another statement which asserts that the first statement is false, and yet another statement which asserts that the first statement is true except for neonatal patients. The essence of this example is that when statements are reified, we can assert further properties of them in a natural manner.

The following is a description of our example ontology in RDF. Using XML namespaces, RDF is associated with a standard namespace in which several general RDF terms are defined. By convention, the namespace prefix rdf is used to identify this namespace. The primitives in rdf include rdf:type, which is a property that states the class of a given resource.
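One way such statements might be sketched in the RDF/XML syntax is shown below; the namespace URI and resource identifiers are assumptions, and rdf:type is used here to relate each narrower concept to its broader one (the RDF Schema version later in this section uses rdfs:subClassOf instead):

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="http://example.org/medical#Angioplasty">
      <rdf:type rdf:resource="http://example.org/medical#CatheterInsertion"/>
    </rdf:Description>
    <rdf:Description rdf:about="http://example.org/medical#CatheterInsertion">
      <rdf:type rdf:resource="http://example.org/medical#Procedure"/>
    </rdf:Description>
  </rdf:RDF>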
To support reification, RDF includes a type called rdf:Statement; all statements are resources and are of rdf:type rdf:Statement. Each rdf:Statement has three main properties, namely, rdf:subject, rdf:object, and rdf:predicate, which can be thought of as accessor functions for the three components of a statement. Other useful primitives are rdf:Description and the attributes rdf:resource, rdf:ID, and rdf:about. The primitive rdf:Description creates the main element about which additional properties are stated.

However, RDF, too, leaves several key terms to be defined by those who use RDF. For example, one person may build an RDF model of a taxonomy using a term subCategory, and another person may use the term subset for the same purpose. When such arbitrary terms are selected by each modeler, the models cannot be related, compared, or merged without human intervention. To prevent this problem, the RDF schema language (RDFS) specifies a canonical set of terms with which simple taxonomies can be unambiguously defined. In more general terms, RDFS is best understood as a system for defining application-specific RDF vocabularies. RDFS defines a standard set of predicates that enable simple semantic relationships to be
captured by interpreting vertices as classes and edges as relationships. RDFS standardizes a namespace, which is conventionally abbreviated as rdfs. In simple terms, RDFS defines primitives that build on rdf and impose standard interpretations. The main primitives include rdfs:Class (a set of instances, each of which has rdf:type equal to the given class); rdfs:subClassOf (a property indicating that instances of its subject class are also instances of its object class, thereby defining the taxonomy); and rdfs:Resource, which is the set of resources. Next, for properties, RDFS includes rdf:Property, the class of properties, which is defined as an instance of rdfs:Class, and rdfs:subPropertyOf, which forms a taxonomy over properties. The properties rdfs:range and rdfs:domain apply to properties and take classes as objects. Multiple domains and ranges are allowed for a single property and are interpreted as conjunctions of the given domains and ranges. Further, rdfs:Resource is an instance of rdfs:Class; rdfs:Literal is the class of literals, i.e., strings and integers; rdfs:Datatype is the class of data types; each data type is a subclass of rdfs:Literal.

The following listing gives an RDF rendition of the above example ontology. This formulation uses the RDF Schema primitives introduced above, so it is a standard formulation in that respect.
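A sketch of how such an RDF Schema formulation might look (identifiers such as CatheterInsertion and usedIn are assumptions):

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
    <rdfs:Class rdf:ID="Procedure"/>
    <rdfs:Class rdf:ID="CatheterInsertion">
      <rdfs:subClassOf rdf:resource="#Procedure"/>
    </rdfs:Class>
    <rdfs:Class rdf:ID="Angioplasty">
      <rdfs:subClassOf rdf:resource="#CatheterInsertion"/>
    </rdfs:Class>
    <rdfs:Class rdf:ID="Catheter"/>
    <rdf:Property rdf:ID="usedIn">
      <rdfs:domain rdf:resource="#Catheter"/>
      <rdfs:range rdf:resource="#CatheterInsertion"/>
    </rdf:Property>
  </rdf:RDF>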
29.5.2 OWL

Although RDFS provides the key primitives with which to capture taxonomies, there are often several other kinds of refinements of meaning that must be specified to enable the unambiguous capture of knowledge about information resources. These are captured through the more complete ontology languages, the leading one of which is the Web Ontology Language (OWL) [McGuiness and van Harmelen, 2003]. Specifications in OWL involve classes and properties that are defined in terms of various constraints and for which the taxonomic structure (e.g., the subclass relationship) can be inferred from the stated definitions. Such specifications include constraints on cardinality and participation that are lacking from RDFS. Of course, such constraints could be syntactically encoded in XML (as, indeed, they are). However, what OWL does, in addition, is give them a standard interpretation. Any compliant implementation of OWL is then required to process such specifications in the standard manner.

OWL is defined as a set of three dialects of increasing expressivity, named OWL Lite, OWL DL (for description logic), and OWL Full, respectively. In this introduction, we will simply describe the main features of OWL without regard to the dialect. OWL includes the class and property primitives derived from RDF and RDF Schema. These are enhanced with a number of new primitives. For classes, we have equivalentClass and disjointWith, among others. Booleans such as intersection, union, and complementation can also be asserted. For properties, we can declare properties as transitive and symmetric, and whether they are functional, meaning that each domain element is mapped to at most one range element. Cardinality restrictions can also
be stated about the properties. The most interesting are the property type restrictions. The primitive allValuesFrom restricts a property, as applied on a class (which should be a subclass of its domain), to take values only from a specified class (which should be a subclass of its range). The primitive someValuesFrom works analogously.

The following listing gives an OWL representation for our example ontology. For most of its entries, this listing resembles the RDF Schema version given above. However, it offers more expressiveness when we consider the cardinality and the property restrictions.
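One possible shape for such an OWL listing, with an ontology header carrying a version comment and the label “An Ontology for Surgery,” and with classes labeled Catheter, Procedure, Catheter Insertion, and Angioplasty (the identifiers and the usedIn property are assumptions):

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
           xmlns:owl="http://www.w3.org/2002/07/owl#">
    <owl:Ontology rdf:about="">
      <rdfs:comment>$Id: Surgery.owl,v 1.0 2003/11/01$</rdfs:comment>
      <rdfs:label>An Ontology for Surgery</rdfs:label>
    </owl:Ontology>
    <owl:Class rdf:ID="Catheter">
      <rdfs:label>Catheter</rdfs:label>
    </owl:Class>
    <owl:Class rdf:ID="Procedure">
      <rdfs:label>Procedure</rdfs:label>
    </owl:Class>
    <owl:Class rdf:ID="CatheterInsertion">
      <rdfs:label>Catheter Insertion</rdfs:label>
      <rdfs:subClassOf rdf:resource="#Procedure"/>
    </owl:Class>
    <owl:Class rdf:ID="Angioplasty">
      <rdfs:label>Angioplasty</rdfs:label>
      <rdfs:subClassOf rdf:resource="#CatheterInsertion"/>
    </owl:Class>
    <owl:ObjectProperty rdf:ID="usedIn">
      <rdfs:domain rdf:resource="#Catheter"/>
      <rdfs:range rdf:resource="#CatheterInsertion"/>
    </owl:ObjectProperty>
  </rdf:RDF>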
The above is largely self-explanatory. However, OWL enables further structure. The simplest enhancement is to assert constraints about cardinality, an example of which follows.
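For instance, the requirement that a catheter insertion use at least one catheter might be sketched as a minimum-cardinality restriction of 1; the property uses, taken here as the inverse of usedIn, is an assumption:

  <owl:Class rdf:about="#CatheterInsertion">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#uses"/>
        <owl:minCardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">1</owl:minCardinality>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>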
The power of description logics is most apparent when we capture more subtle kinds of ontological constraints. Let us consider the example about venal catheter insertion, which is defined as a catheter insertion that operates on a vein. To illustrate this, we also define arterial catheter insertion. Also note that veins and arteries are defined to be disjoint kinds of blood vessels. Now a description logic reasoner can infer that venal catheter insertion must be disjoint with arterial catheter insertion.
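A sketch of how these definitions might be written in OWL, assuming the property intoVessel and treating Vein and Artery as disjoint subclasses of a class labeled Blood Vessel (the identifiers are illustrative):

  <owl:Class rdf:ID="BloodVessel">
    <rdfs:label>Blood Vessel</rdfs:label>
  </owl:Class>
  <owl:Class rdf:ID="Vein">
    <rdfs:subClassOf rdf:resource="#BloodVessel"/>
    <owl:disjointWith rdf:resource="#Artery"/>
  </owl:Class>
  <owl:Class rdf:ID="Artery">
    <rdfs:subClassOf rdf:resource="#BloodVessel"/>
  </owl:Class>
  <owl:Class rdf:ID="VenalCatheterInsertion">
    <owl:intersectionOf rdf:parseType="Collection">
      <owl:Class rdf:about="#CatheterInsertion"/>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#intoVessel"/>
        <owl:allValuesFrom rdf:resource="#Vein"/>
      </owl:Restriction>
    </owl:intersectionOf>
  </owl:Class>
  <owl:Class rdf:ID="ArterialCatheterInsertion">
    <owl:intersectionOf rdf:parseType="Collection">
      <owl:Class rdf:about="#CatheterInsertion"/>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#intoVessel"/>
        <owl:allValuesFrom rdf:resource="#Artery"/>
      </owl:Restriction>
    </owl:intersectionOf>
  </owl:Class>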
29.6 Discussion

The above development was about the representation of domain knowledge in standard representations. In some cases, an ontology that is agreed upon by interacting parties would be sufficient for them to work together. For example, an ontology of surgical components might form the basis of a surgical supplies catalog and the basis for billing for medical components and procedures. In many other cases, such an ontology would merely be the starting point for capturing additional application-specific knowledge. Such knowledge would be used to mark up information resources in a manner that can be comprehended and processed by others across the Web.
A number of tools now exist that offer a variety of key functionality for developing ontologies. This is, however, a fast-changing area. Jena is an open-source RDF and RDFS toolkit, which includes support for querying and inference over RDF knowledge bases [McBride, 2002]. Protégé [Protégé, 2000] and OilEd [Bechhofer et al., 2001] are well-known tools for building and maintaining ontologies.

Ontology tools can require fairly sophisticated reasoning, which can easily prove intractable. Although the formal foundations are well understood now, work is still ongoing to identify useful sublanguages that are tractable. The work of Horrocks and colleagues is an important contribution in this regard, e.g., [Horrocks et al., 1999].
29.6.1 Domain-Specific Ontologies

OWL gives us the essence of Web semantics, at least as far as the basic information is concerned. However, a lot of work based on OWL is required to make ontologies practical. The first tasks deal with specifying standardized ontologies for different application domains. These enable narrower knowledge models that are built on top of them to be comprehended by the various parties. For example, a medical (surgery) ontology would define concepts such as “incision” and “catheter” in a manner that is acceptable to surgeons given their current practice. Thus, someone could use that ontology to describe a particular cardiac procedure in which an incision is made and a catheter inserted into an artery. Then another surgery tool would be able to interpret the new cardiac procedure or at least relate it to existing cardiac procedures.

Clearly, these are details that are specific to the given application domain. Equally clearly, if there were no agreement about these terms (say, among surgeons), it would be difficult for the given procedures to be understood the same way (even if everyone used OWL). Understanding the key terms is crucial so that their meanings can be combined — or, more precisely, so that if the meanings are automatically combined, then the result is comprehensible and sensible to the participants in the domain. A challenge that large-scale vocabularies open up is ensuring their quality [Schulz et al., 1998]. Ideally a vocabulary would have no more than one term for the same concept, but even such a simple constraint can be difficult to enforce.
29.6.2 Semantic Web Services and Processes

Another natural extension to the above deals with information processing. In other words, whereas the above deals with information as it is represented on the Web, we must supplement it with representations of how the information may be modified. Such representations would enable the processes to interoperate and thereby lead to superior distributed processes. Semantic Web services apply the techniques of Web semantics to the modeling and execution of Web services. In simple terms, semantic Web services approaches apply semantics in the following ways:
• Modeling the information that is input to or output from Web services
• Describing ontologies for the nonfunctional attributes of Web services, such as their performance, reliability, and the quality of their results, among others
• Describing ontologies that capture the process structure of Web services and the conversations that they support

The above lead into formalizations of more general models of processes and protocols, as well as of contract languages and policies, e.g., Grosof and Poon [2003]. The motivation for these formalizations is similar to that for domain-specific ontologies in that they streamline the design of tools and applications that deal with information processing.
29.6.3 Methodologies and Tools

Ontology development is a major challenge. It is sometimes said to suffer from the “two Ph.D.s” problem, meaning that a knowledge modeler must be a specialist both in the domain of interest and in the
knowledge modeling profession. Consequently, a lot of the work on ontologies is about making this task simpler and scalable.

A conventional view is that knowledge modeling is an effort that is separate from and precedes knowledge use. Clearly, when an ontology exists in a given domain, it should be used. However, an existing ontology would often not be adequate. In such a case, it would need to be extended while it is being used. We suggest that such extensions would and should be attempted primarily on demand; otherwise, the modeling effort will be unmotivated and expensive and will only tend to be put off.

Ontology management has drawn a lot of attention in the literature. Ontology management refers to a number of functionalities related to creating, updating, maintaining, and versioning ontologies [Klein, 2001]. These challenges have been around since the earliest days of ontologies and have been especially well studied in the context of clinical terminology development. For example, the Galapagos project considered the challenges in merging terminologies developed in a distributed manner [Campbell et al., 1996]. Galapagos encountered the challenges of version management through locking, as well as the semantic problems of resolving distinctions across terminologies. The key aspects of ontology management studied in the recent literature include how mismatches occur among ontologies — language, concept, paradigm (e.g., how time is modeled by different people), and so on — and how ontology versions are created and maintained. This is especially important because distributed development over the Web is now the norm [Heflin and Hendler, 2000]. A recent evaluation of existing tools by Das et al. [2001] reveals that while the knowledge representation capabilities of these tools surpass the requirements, their versioning capabilities remain below par. A number of heuristics have been defined to resolve ontology mismatches, e.g., see Noy and Musen [2002]. Potential matches are suggested to a user, who can decide if the matches are appropriate.

Under some reasonable assumptions, the knowledge modeling task can be made considerably more tractable for tools:
• The various models are derived from a common model, which could have been proposed, say, by a standardization group in the domain of the given ontology. When the upper parts of the model are fixed, changes in the rest of it are easier to accommodate.
• Editing changes made to the models are available, so that several changes can be reapplied to the original model when a derived model is to be consolidated in it. If immutable internal identifiers are used for the terms, that facilitates applying the editing changes unambiguously.
• Simple granular locking can be applied whenever the tasks of developing a model are parceled out to members of a domain community. The people granted the locks have an advantage in that their suggested changes would propagate by default. Other users should restrict their changes in the components where there is a potential conflict. This is clearly pessimistic, but can still be effective in practice.
29.7 Summary

This chapter provided a brief introduction to Web semantics. The main take-away message is that Web semantics is here to stay. Further, Web semantics pervades Internet computing. It applies not only to traditional Web applications such as Web browsing but to all aspects of information management over the Web. Web semantics promises not only improved functionality to users but also enhanced productivity for programmers and others who manage information resources. However, important challenges remain, the handling of which will determine how quickly Web semantics propagates into applications of broad appeal, but already efforts are under way to address those challenges.
References

Arroyo, Sinuhe, Ruben Lara, Juan Miguel Gomez, David Berka, Ying Ding, and Dieter Fensel. Semantic aspects of Web services. In The Practical Handbook of Internet Computing, Munindar P. Singh, Ed., CRC Press, Boca Raton, FL, 2005.
Bechhofer, Sean, Ian Horrocks, Carole Goble, and Robert Stevens. OilEd: A reasonable ontology editor for the semantic Web. In Proceedings of KI-2001, Joint German/Austrian Conference on Artificial Intelligence, volume 2174 of Lecture Notes in Computer Science, pages 396–408. Springer-Verlag, Berlin, September 2001.
Brusilovsky, Peter and Wolfgang Nejdl. Adaptive hypermedia and adaptive Web. In The Practical Handbook of Internet Computing, Vol. 2, Munindar P. Singh, Ed., CRC Press, Boca Raton, FL, 2005.
Campbell, Keith E., Simon P. Cohn, Christopher G. Chute, Glenn D. Rennels, and Edward H. Shortliffe. Gálapagos: Computer-based support for evolution of a convergent medical terminology. In Proceedings of the AMIA Annual Fall Symposium, Washington, D.C., pages 269–273, 1996.
Das, Aseem, Wei Wu, and Deborah L. McGuinness. Industrial strength ontology management. In Proceedings of the International Semantic Web Working Symposium (SWWS), Palo Alto, CA, pages 17–37, 2001.
Decker, Stefan, Prasenjit Mitra, and Sergey Melnik. Framework for the semantic Web: An RDF tutorial. IEEE Internet Computing, 4(6): 68–73, November 2000.
Fisher, Mark and Amit Sheth. Semantic enterprise content management. In The Practical Handbook of Internet Computing, Vol. 2, Munindar P. Singh, Ed., CRC Press, Boca Raton, FL, in press.
Grosof, Benjamin N. and Terrence C. Poon. SweetDeal: Representing agent contracts with exceptions using XML rules, ontologies, and process descriptions. In Proceedings of the 12th International Conference on the World Wide Web, Budapest, Hungary, pages 340–349, 2003.
Heflin, Jeff and James A. Hendler. Dynamic ontologies on the Web. In Proceedings of the American Association for Artificial Intelligence Conference (AAAI), Cape Cod, MA, pages 443–449, 2000.
Horrocks, Ian, Ulrike Sattler, and Stephan Tobies. Practical reasoning for expressive description logics. In Proceedings of the 6th International Conference on Logic for Programming and Automated Reasoning (LPAR), Tbilisi, pages 161–180, 1999.
Kashyap, Vipul. Information modeling on the Web. In The Practical Handbook of Internet Computing, Vol. 2, Munindar P. Singh, Ed., CRC Press, Boca Raton, FL, 2005.
Klein, Michel. Combining and relating ontologies: An analysis of problems and solutions. In Proceedings of the IJCAI Workshop on Ontologies and Information Sharing, Seattle, WA, 2001.
McBride, Brian. Jena: A semantic Web toolkit. IEEE Internet Computing, 6(6): 55–59, November 2002.
McGuiness, Deborah L. and Frank van Harmelen. Web Ontology Language (OWL): Overview. www.w3.org/TR/2003/WD-owl-features-20030210/, W3C working draft, February 2003.
Noy, Natalya Fridman and Mark A. Musen. PROMPTDIFF: A fixed-point algorithm for comparing ontology versions. In Proceedings of the National Conference on Artificial Intelligence (AAAI), Edmonton, Alberta, Canada, pages 744–750, 2002.
Protégé. The Protégé ontology editor and knowledge acquisition system. http://protege.stanford.edu, 2000.
Schulz, Erich B., James W. Barrett, and Colin Price. Read code quality assurance: From simple syntax to semantic stability. Journal of the American Medical Informatics Association, 5: 337–346, 1998.
Wilde, Erik. XML core technologies. In The Practical Handbook of Internet Computing, Vol. 2, Munindar P. Singh, Ed., CRC Press, Boca Raton, FL, 2005.
30
Information Modeling on the Web: The Role of Metadata, Semantics, and Ontologies

Vipul Kashyap

CONTENTS
30.1 Introduction
30.2 What is Metadata?
  30.2.1 Metadata Usage in Various Applications
  30.2.2 Metadata: A Means for Modeling Information
30.3 Metadata Expressions: Modeling Information Content
  30.3.1 The InfoHarness System: Metadata-Based Object Model for Digital Content
  30.3.2 Metadata-Based Logical Semantic Webs
  30.3.3 Modeling Languages and Markup Standards
30.4 Ontology: Vocabularies and Reference Terms for Metadata
  30.4.1 Terminological Commitments: Constructing an Ontology
  30.4.2 Controlled Vocabulary for Digital Media
  30.4.3 Ontology-Guided Metadata Extraction
  30.4.4 Medical Vocabularies and Terminologies: The UMLS Project
  30.4.5 Expanding Terminological Commitments across Multiple Ontologies
30.5 Conclusions
References
The Web consists of huge amounts of data available in a variety of digital forms stored in thousands of repositories. Approaches that use the semantics of information, captured in metadata extracted from the underlying data, are viewed as appealing, especially in the context of the Semantic Web effort. In this chapter we discuss approaches adopted for metadata-based information modeling on the Web. Various types of metadata developed by researchers for different media are reviewed and classified with respect to the extent to which they model data or information content. The reference terms and ontologies used in the metadata are classified with respect to their dependence on the application domain. We discuss approaches for using metadata to represent the context of an information request and the interrelationships between various pieces of data, and for exploiting metadata in search, browsing, and querying of information. Issues related to the use of terminologies and ontologies, such as establishing and maintaining terminological commitments, and their role in metadata design and extraction, are also discussed.
Modeling languages and formats, including the most recent ones such as the Resource Description Framework (RDF) and the DARPA Agent Markup Language (DAML+OIL), are also discussed in this context.
30.1 Introduction
The World Wide Web [Berners-Lee et al., 1992] consists of huge amounts of digital data in a variety of structured, unstructured (e.g., image), and sequential (e.g., audio, video) formats that are either stored as Web data directly manipulated by Web servers, or retrieved from underlying database and content management systems and served as dynamically generated Web content. Whereas content management systems support creation, storage, and access functions for the content they manage, there is a need to support correlation across different types of digital formats in a media-independent, content-based manner. Information relevant to a user or application need may be stored in multiple forms (e.g., structured data, text, image, audio, and video) in different repositories and Websites. Responding to a user's information request typically requires correlation of such information across multiple forms and representations. There is a need to associate various pieces of data, either by preanalysis performed by software programs or by dynamic correlation of information in response to an information request. Common to both approaches is the ability to describe the semantics of the information represented by the underlying data. The use of semantic information to support correlation of heterogeneous representations of information is one of the aims of the current Semantic Web effort [Berners-Lee et al., 2001]. This capability of modeling information at a semantic level, both across different types of structured data (e.g., in data warehouses) and across different types of multimedia content, is missing on the current Web and has been referred to as the "semantic bottleneck" [Jain, 1994].
Machine-understandable metadata and standardized representations thereof form the foundation of the Semantic Web. Specifications based on the Resource Description Framework (RDF) [Lassila and Swick] and XML [Bray et al.] are currently being developed in an effort to standardize the formats for representing metadata. It is proposed that the vocabulary terms used to create the metadata will be chosen from third-party ontologies available on the Web. Standardized specifications for representing ontologies include XML and RDF schemas, the DARPA Agent Markup Language (DAML+OIL) [DAML+OIL], and the Web Ontology Language (OWL) [OWL].
In this chapter, we present issues related to the use of metadata, semantics, and ontologies for modeling information on the Web, organized in a three-level framework (Figure 30.1):
• The middle level represents the metadata component, involving the use of metadata descriptions to capture the information content of data stored in Websites and repositories. Intensional descriptions constructed from metadata are used to abstract from the structure and organization of data and to specify relationships across pieces of data of interest.
FIGURE 30.1 Key issues for information modeling. Representation (data of heterogeneous types and media) is abstracted into information content (intensional metadata descriptions), which in turn uses a vocabulary of ontological terms (domain- and application-specific).
• The top level represents the ontology component, involving terms (concepts, roles) in domain-specific ontologies used to characterize metadata descriptions. These terms capture pieces of domain knowledge that describe relationships between data items (via association with the terms) across multiple repositories, enabling semantic interoperability.
The organization of this chapter is as follows. In Section 30.2, we discuss a definition of metadata, with various examples. A classification of metadata based on the information content captured is presented, along with its role in modeling information. In Section 30.3, we discuss how metadata expressions can be used to model interrelationships between various pieces of information within a dataset and across multiple datasets. We also present an account of various modeling and markup languages that may be used to model the information represented in the data. Finally, in Section 30.4, we present issues related to the use of reference terms and ontological concepts for creating metadata descriptions. Section 30.5 presents the conclusions.
30.2 What is Metadata?
Metadata in its most general sense is defined as data or information about data. For structured databases, the most common example of metadata is the schema of the database. However, with the proliferation of various types of multimedia data on the Web, we shall refer to an expanded notion of metadata, of which the schema of structured databases is a (small) part. Metadata may be used to store derived properties of media useful in information access or retrieval. Metadata may describe, or be a summary of, the information content of the data, expressed in an intensional manner. Metadata may also be used to represent properties of, or relationships between, individual objects of heterogeneous types and media.
Figure 30.1 illustrates the components for modeling information on the Web. Metadata is the pivotal idea on which both components (ontology and metadata) depend. The function of the metadata descriptions is twofold:
• To enable the abstraction of representational details, such as the format and organization of data, and to capture the information content of the underlying data independent of those details. These expressions may be used to represent useful relationships between various pieces of data within a repository or Website.
• To enable representation of domain knowledge describing the information domain to which the underlying data belongs. This knowledge may then be used to make inferences about the underlying data, to determine relevance, and to identify relationships across data stored in different repositories and Websites.
We now discuss issues related to metadata from the two perspectives identified in Boll et al. [1998]: the usage of metadata in various applications and the information content captured by the metadata.
30.2.1 Metadata Usage in Various Applications
We discuss a set of application scenarios, relevant to the Web, that require functionality for the manipulation and retrieval of digital content. The role of metadata, especially in the context of modeling information to support this functionality, is discussed.
30.2.1.1 Navigation, Browsing, and Retrieval from Image Collections
An increasing number of applications, such as those in healthcare, maintain large collections of images. There is a need for semantic, content-based navigation, browsing, and retrieval of images. An important issue is to associate a user's semantic impression with images, e.g., an image of a brain tumor. This requires knowledge of the spatial content of the image and the way it changes or evolves over time, which can be represented as metadata annotations.
30.2.1.2 Video
In many applications relevant to news agencies, there exist collections of video footage that need to be searched based on semantic content, e.g., videos containing goals scored in a soccer game. This gives rise to the same set of issues as described above, such as the change in the spatial positions of various objects in the video images (spatial evolution). However, there is a temporal aspect to videos that was not captured above. Sophisticated time-stamp-based schemes can be represented as a part of the metadata annotations.
30.2.1.3 Audio and Speech
Radio stations collect many, if not all, of their important and informative programs, such as radio news, in archives. Parts of such programs are often reused in other radio broadcasts. However, to efficiently retrieve parts of radio programs, it is necessary to have the right metadata generated from, and associated with, the audio recordings. An important issue here is capturing in text the essence of the audio, in which vocabulary plays a central role. Domain-specific vocabularies can drive the metadata extraction process, making it more efficient.
30.2.1.4 Structured Document Management
As the publishing paradigm shifts from desktop publishing to database-driven, Web-based publishing, processing of structured documents becomes more and more important. Particular document information models, such as SGML [SGML] and XML, introduce structure- and content-based metadata. Efficient retrieval is achieved by exploiting document structure, as the metadata can be used for indexing, which is essential for quick response times. Thus, queries asking for documents with a title containing "Computer Science" can be easily optimized.
30.2.1.5 Geographic and Environmental Information Systems
These systems have a wide variety of users with very specific information needs. Information integration is a key requirement, which is supported by the provision of descriptive information to end users and information systems. This involves capturing descriptions as metadata and reconciling the different vocabularies used by the different information systems in interpreting the descriptions.
30.2.1.6 Digital Libraries
Digital libraries offer a wide range of services and collections of digital documents, and constitute a challenging application area for the development and implementation of metadata frameworks. These frameworks are geared toward the description of collections of digital materials such as text documents, spatially referenced datasets, audio, and video. Some frameworks follow the traditional library paradigm, with metadata such as subject headings [Nelson et al., 2001] and thesauri [Lindbergh et al., 1993].
30.2.1.7 Mixed-Media Access
This is an approach that allows queries to be specified independent of the underlying media types. Data corresponding to the query may be retrieved from different media such as text and images and "fused" appropriately before being presented to the user. Symbolic metadata descriptions may be used to describe information from different media types in a uniform manner.
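As a concrete illustration of the structured-document scenario in Section 30.2.1.4, the following minimal Python sketch treats element structure as metadata and selects documents whose title contains a given phrase. The document markup and collection are hypothetical, and a real system would index such metadata rather than parse documents at query time.

    # Minimal sketch: structure-based metadata retrieval over XML documents.
    # The <doc>/<title>/<body> markup and the document collection are hypothetical.
    import xml.etree.ElementTree as ET

    docs = [
        "<doc><title>Computer Science Curricula</title><body>...</body></doc>",
        "<doc><title>Gardening Basics</title><body>...</body></doc>",
    ]

    def titles_matching(documents, phrase):
        """Return the titles whose text contains the given phrase."""
        hits = []
        for raw in documents:
            root = ET.fromstring(raw)
            title = root.findtext("title") or ""
            if phrase in title:
                hits.append(title)
        return hits

    print(titles_matching(docs, "Computer Science"))  # ['Computer Science Curricula']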
30.2.2 Metadata: A Means for Modeling Information
We now characterize different types of metadata based on the amount of information content they capture and present a classification of metadata types used by various researchers (Table 30.1).
30.2.2.1 Content Independent Metadata
This type of metadata captures information that does not depend on the content of the document with which it is associated. Examples of this type of metadata are the location and modification date of a document, and the type of sensor used to record a photographic image. No information content is captured by these metadata, but they might still be useful for retrieving documents from their actual physical locations and for checking whether the information is current. This type of metadata helps to encapsulate information into units of interest and to organize their representation within an object model.
TABLE 30.1 Metadata for Digital Media
Metadata                           Media / Metadata Type
Q-Features                         Image, video / Domain specific
R-Features                         Image, video / Domain independent
Impression vector                  Image / Content descriptive
NDVI, Spatial registration         Image / Domain specific
Speech feature index               Audio / Direct content-based
Topic change indices               Audio / Direct content-based
Document vectors                   Text / Direct content-based
Inverted indices                   Text / Direct content-based
Content classification metadata    Multimedia / Domain specific
Document composition metadata      Multimedia / Domain independent
Metadata templates                 Media independent / Domain specific
Land-cover, relief                 Media independent / Domain specific
Parent-child relationships         Text / Domain independent
Contexts                           Structured databases / Domain specific
Concepts from Cyc                  Structured databases / Domain specific
User's data attributes             Text, Structured databases / Domain specific
Medical subject headings           Text databases / Domain specific
Domain-specific ontologies         Media independent / Domain specific
30.2.2.2 Content Dependent Metadata
This type of metadata depends on the content of the document it is associated with. Examples of content-dependent metadata are the size of a document, maxcolors, and the number of rows and columns of an image. These types of metadata typically capture representational and structural information and provide support for browsing and navigation of the underlying data. Content-dependent metadata can be further subdivided as follows:
30.2.2.2.1 Direct Content-Based Metadata
This type of metadata is based directly on the contents of a document. A popular example is full-text indices built over the document text. Inverted indices and document vectors are examples of this type of metadata. Media-specific metadata such as color, shape, and texture are typically direct content-based metadata.
30.2.2.2.2 Content-Descriptive Metadata
This type of metadata describes information in a document without directly utilizing its contents. An example of this metadata is textual annotations describing the contents of an image. This metadata comes in two flavors:
30.2.2.2.2.1 Domain-Independent Metadata. These metadata capture information present in the document independent of the application or subject domain of the information, and are primarily structural in nature. They often form the basis of indexing the document collection to enable faster retrieval. Examples of these are C/C++ parse trees and HTML/SGML document type definitions. Indexing a document collection based on domain-independent metadata may be used to improve retrieval efficiency.
30.2.2.2.2.2 Domain-Specific Metadata. Metadata of this type is described in a manner specific to the application or subject domain of the information. Issues of vocabulary become very important in this case, as the metadata terms have to be chosen in a domain-specific manner. This type of metadata helps abstract out representational details and captures information meaningful to a particular application or subject domain. Examples of such metadata are relief and land-cover from the geographical information domain, and medical subject headings (MeSH) from the medical domain.
In the case of structured data, the database schema is an example of domain-specific metadata, which can be further categorized as:
Intra-Domain-Specific Metadata. These types of metadata capture relationships and associations between data within the context of the same information domain. For example, the relationship between a CEO and his or her corporation is captured within a common information domain, such as the business domain.
Inter-Domain-Specific Metadata. These types of metadata capture relationships and associations between data across information domains. For example, the relationship between a (medical) instrument and a (legal) instrument spans the medical and legal information domains.
30.2.2.2.2.3 Vocabulary for Information Content Characterization. Domain-specific metadata can be constructed from terms in a controlled vocabulary of terms and concepts, e.g., the biomedical vocabularies available in the Unified Medical Language System (UMLS) [Lindbergh et al., 1993], or from a domain-specific ontology describing information in an application or subject domain. Thus, we view ontologies as metadata, which can themselves be viewed as a vocabulary of terms for the construction of more domain-specific metadata descriptions.
30.2.2.2.2.4 Crisp vs. Fuzzy Metadata. This is an orthogonal dimension for categorization. Some of the metadata referred to above are fuzzy in nature and are modeled using statistical methods, e.g., document vectors. Other metadata annotations might be of a crisp nature, e.g., author name.
In Table 30.1 we have surveyed different types of metadata used by various researchers. Q-Features and R-Features were used for modeling image and video data [Jain and Hampapur, 1994]. Impression vectors were generated from text descriptions of images [Kiyoki et al., 1994]. NDVI and spatial registration metadata were used to model geospatial maps, primarily of different types of vegetation [Anderson and Stonebraker, 1994]. Interesting examples of mixed-media access are the speech feature index [Glavitsch et al., 1994] and topic change indices [Chen et al., 1994]. Metadata capturing information about documents are document vectors [Deerwester et al., 1990], inverted indices [Kahle and Medlar, 1991], document classification and composition metadata [Bohm and Rakow, 1994], and parent-child relationships (based on document structure) [Shklar et al., 1995c]. Metadata templates [Ordille and Miller, 1993] have been used for information resource discovery. Semantic metadata such as contexts [Sciore et al., 1992; Kashyap and Sheth, 1994], land-cover and relief [Sheth and Kashyap, 1996], Cyc concepts [Collet et al., 1991], and concepts from domain ontologies [Mena et al., 1996] have been constructed from well-defined and standardized vocabularies and ontologies. Medical Subject Headings [Nelson et al., 2001] are used to annotate biomedical research articles in MEDLINE [MEDLINE]; these are constructed from biomedical vocabularies available in the UMLS [Lindbergh et al., 1993]. An attempt at modeling user attributes is presented in Shoens et al. [1993].
The above discussion suggests that domain-specific metadata capture information that is more meaningful with respect to a specific application or domain. The information captured by other types of metadata primarily reflects the format and organization of the underlying data.
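To make the classification concrete, the following sketch records the different metadata types discussed above for a single (hypothetical) research article; the attribute names and values are illustrative only.

    # Hypothetical metadata record for one document, organized by the
    # classification discussed above.
    article_metadata = {
        # Content-independent: says nothing about what the document contains.
        "content_independent": {
            "location": "http://example.org/articles/123",
            "modification_date": "2004-08-04",
        },
        # Direct content-based: derived directly from the document text.
        "direct_content_based": {
            "index_terms": ["mumps", "pancreatitis", "pediatric"],
            "document_vector": [0.12, 0.0, 0.87],  # toy term weights
        },
        # Content-descriptive, domain-independent: structural description.
        "domain_independent": {
            "doctype": "journal-article",
            "sections": ["abstract", "methods", "results"],
        },
        # Content-descriptive, domain-specific: terms from a controlled
        # vocabulary (here, MeSH-like subject headings).
        "domain_specific": {
            "subject_headings": ["Mumps", "Pancreatitis"],
        },
    }

    # A retrieval system can filter on any level, e.g., domain-specific terms:
    assert "Pancreatitis" in article_metadata["domain_specific"]["subject_headings"]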
30.3 Metadata Expressions: Modeling Information Content
We presented in the previous section different types of metadata that capture information content to different extents. Metadata has been used by a wide variety of researchers in various contexts for different functionality relating to retrieval and manipulation of digital content. We now discuss approaches for combining metadata to create information models based on the underlying data. There are two broad approaches:
• Use of content- and domain-independent metadata to encapsulate digital content within an infrastructural object model.
• Use of domain-specific metadata to specify existing relationships within the same content collection or across collections.
We present both these approaches, followed by a brief survey of modeling and markup languages that have been used.
30.3.1 The InfoHarness System: Metadata-Based Object Model for Digital Content
We now discuss the InfoHarness system [Shklar et al., 1994, 1995a, 1995b, 1995c; Sheth et al., 1995], which has been the basis of many successful research projects and commercial products. The main goal of InfoHarness is to provide uniform access to information independent of the formats, location, and organization of the information in the individual information sources. We discuss how content-independent metadata (e.g., type, location, access rights, owner, creation date) may be used to encapsulate the underlying data and media heterogeneity and to represent information in an object model. We then discuss how the information spaces might be logically structured and present an approach based on an interpreted modeling language.
30.3.1.1 Metadata for Encapsulation of Information
Representational details are abstracted out of the underlying data, and metadata is used to capture information content. This is achieved by encapsulation of the underlying data into units of interest called information units, and extraction of metadata describing information of interest. The object representation is illustrated in Figure 30.2 and is discussed below.
A metadata entity that is associated with the lowest level of granularity of information available to InfoHarness is called an information unit (IU). An IU may be associated with a file (e.g., a Unix man page or help file, a Usenet news item), a portion of a file (e.g., a C function or a database table), a set of files (e.g., a collection of related bitmaps), or any request for the retrieval of data from an external source (e.g., a database query). An InfoHarness Object (IHO) may be one of the following:
1. A single information unit
2. A collection of InfoHarness objects (either indexed or nonindexed)
3. A single information unit and a nonindexed collection of InfoHarness objects
Each IHO has a unique object identifier that is recognized and maintained by the system. An IHO that encapsulates an IU contains information about the location of the data, the retrieval method, and any parameters needed by the method to extract the relevant portion of information. For example, an IHO associated with a C function will contain the path information for the .c file that contains the function, the name and location of the program that knows how to extract a function from a .c file, and the name of the function to be passed to this program as a parameter. In addition, each IHO may contain an arbitrary number of attribute-value pairs for attribute-based access to the information.
An InfoHarness Repository (IHR) is a collection of IHOs. Each IHO (known as the parent) that encapsulates a collection of IHOs stores the unique object identifiers of the members of the collection. We refer to these members as children of the IHO. IHOs that encapsulate indexed collections store information about the location of both the index and the query method.
FIGURE 30.2 Metadata encapsulation in InfoHarness. Metadata extraction maps raw data (a text file or portion thereof, a bitmap, an e-mail message, a man page, or a directory of man pages) into an InfoHarness Object (IHO) consisting of an information unit (type, location, other attributes) and a list of collections.
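The encapsulation shown in Figure 30.2 can be summarized with a small object model. The following Python sketch is an illustrative reconstruction of the IU/IHO/IHR concepts described above; the class and attribute names are our own and do not reflect the actual InfoHarness implementation.

    # Illustrative object model for metadata-based encapsulation (IU, IHO, IHR).
    from dataclasses import dataclass, field
    from typing import Callable, Dict, List, Optional

    @dataclass
    class InformationUnit:
        """Lowest granularity: where the data lives and how to extract it."""
        data_type: str                       # e.g., "man-page", "c-function"
        location: str                        # file path, URL, or query string
        retrieval_method: Optional[Callable[[], str]] = None
        parameters: Dict[str, str] = field(default_factory=dict)

    @dataclass
    class InfoHarnessObject:
        """An IU, a collection of IHOs, or both; carries attribute metadata."""
        oid: str
        unit: Optional[InformationUnit] = None
        children: List["InfoHarnessObject"] = field(default_factory=list)
        attributes: Dict[str, str] = field(default_factory=dict)
        index_location: Optional[str] = None    # set only for indexed collections

    class InfoHarnessRepository:
        """A collection of IHOs keyed by object identifier."""
        def __init__(self) -> None:
            self.objects: Dict[str, InfoHarnessObject] = {}

        def add(self, iho: InfoHarnessObject) -> None:
            self.objects[iho.oid] = iho

    # Example: a C function encapsulated inside a C-file collection.
    func = InfoHarnessObject("iho-1", InformationUnit("c-function", "/src/file1.c",
                                                      parameters={"name": "parse"}))
    cfile = InfoHarnessObject("iho-2", children=[func], attributes={"language": "C"})
    repo = InfoHarnessRepository()
    repo.add(func)
    repo.add(cfile)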
30.3.1.2 Logical Structuring of the Information Space
We now discuss the various types of logical structure that can be imposed on the content, in the context of the functionality enabled by such structuring. This structuring is enabled by the extraction of the different kinds of metadata discussed above.
Consider the scenario illustrated in Figure 30.3. Case I depicts the actual physical distribution of the various types of documents required in a large software design project. The different documents are spread all over the file system as a result of different members of the project putting the files where they deem appropriate. Appropriate metadata extractors preprocess these documents, store important information such as type and location, and establish appropriate parent-child relationships. Case II illustrates the desired logical view seen by the user. Information can be browsed according to units of interest as opposed to browsing the information according to its physical organization in the underlying data repositories.
A key capability enabled by the logical structuring is the ability to seamlessly plug in third-party indexing technologies to index document collections. This is illustrated in Figure 30.3, Case II, where the same set of documents is indexed using different third-party indexing technologies. Each of the document collections so indexed can now be queried using a keyword-based query without the user having to worry about the details of the underlying indexing technology.
Attribute-based access provides a powerful complementary or alternative search mechanism to traditional content-based search and access [Sheth et al., 1995]. While attribute-based access can provide better precision [Salton, 1989], it can be more complex, as it requires that appropriate attributes have been identified and the corresponding metadata instantiated before accessing data. In Figure 30.4 we illustrate an example of attribute-based access in InfoHarness. Attribute-based queries by the user result in SQL queries to the metadata repository and retrieval of the news items that satisfy the conditions specified.
FIGURE 30.3 Logical structuring of the information space. Case I shows the actual physical structure: the project's requirements, testing, source code, figures, and man pages are scattered across directories under /usr, /u, and individual user accounts. Case II shows the logical user view: collections for Requirements/Testing, Source Code (C files and functions), Figures, InfoHarness man pages, and system man pages, with selected collections indexed by third-party indexing technologies such as LSI and WAIS.
FIGURE 30.4 Attribute-based access in InfoHarness. An attribute-based query (author = "Ted Koppel", title = "Dole and Clinton", date > 01/01/97) is translated into a SQL query against the metadata repository (select IHO from Metadata_Table where title like '%Dole%' and date > 01 01 97) and retrieves the matching news items, e.g., "Dole leads Clinton in Georgia," "Clinton wins over Dole in Arizona," and "Dole and Clinton neck to neck in Delaware."
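The translation sketched in Figure 30.4 can be illustrated as follows. This is a minimal example rather than the actual InfoHarness code: the Metadata_Table schema and its columns are hypothetical, an in-memory SQLite database stands in for the metadata repository, and dates are stored as ISO strings so that simple comparisons work.

    # Minimal sketch of attribute-based access: an attribute query is
    # translated into SQL over a metadata repository (here, SQLite in memory).
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Metadata_Table (iho TEXT, author TEXT,"
                 " title TEXT, pub_date TEXT)")
    conn.executemany(
        "INSERT INTO Metadata_Table VALUES (?, ?, ?, ?)",
        [("iho-101", "Ted Koppel", "Dole leads Clinton in Georgia", "1997-01-05"),
         ("iho-102", "Ted Koppel", "Weather report", "1996-12-30")])

    def attribute_query(author, title_keyword, after_date):
        """Return IHO identifiers whose metadata satisfies all constraints."""
        sql = ("SELECT iho FROM Metadata_Table "
               "WHERE author = ? AND title LIKE ? AND pub_date > ?")
        rows = conn.execute(sql, (author, "%" + title_keyword + "%", after_date))
        return [r[0] for r in rows]

    print(attribute_query("Ted Koppel", "Dole", "1997-01-01"))  # ['iho-101']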
The advantages of attribute-based access are:
• Enhanced semantics of the keywords. When a user presents a keyword (e.g., "Ted Koppel") as the value of an attribute (e.g., author), there are more constraints on the keyword than when it appears by itself in a keyword-based query. This improves the precision of the answer.
• Typed attributes. Attributes can have associated types; the attribute submission date, for example, could have values of type date. Simple comparison operators (e.g., <, >, =) can then be used for specifying constraints.
• Querying content-independent information. One cannot query content-independent information, such as modification date, using keyword-based access, as such information will never be available from an analysis of the content of the document.
30.3.1.3 IRDL: A Modeling Language for Generating the Object Model
The creation of an IHR amounts to the generation of metadata objects that represent IHOs and the indexing of physical information encapsulated by members of indexed collections. The IHR can either be generated manually by writing metadata extractors or created automatically by interpreting IRDL (InfoHarness Repository Definition Language) statements. A detailed discussion of IRDL can be found in Shklar et al. [1995a], and its use in modeling heterogeneous information is discussed in Shklar et al. [1995b]. There are three main IRDL commands:
Encapsulate. This command takes as input the type and location of physical data and returns a set of IHOs, each of which encapsulates a piece of data. Boundaries of these pieces are determined by the type. For example, in the case of e-mail, a set of IHOs, each of which is associated with a separate mail message, is returned.
Group. This command generates an IHO associated with a collection and establishes parent-child relationships between the collection IHO and the member IHOs. If a parameter indicating the indexing technology is specified, an index on the physical data associated with the member IHOs is created.
Merge. This command takes as input an IHO and associated references and creates a composite IHO.
We explain the model generation process by discussing an example for C programs (Figure 30.5). The steps that generate the model displayed in Figure 30.5, Case I, are as follows:
1. For each C file do the following:
   (a) Create simple IHOs that encapsulate the individual functions that occur in this file.
   (b) Create a composite IHO that encapsulates the file and points to the IHOs created in step 1(a).
2. Create an indexed collection of the composite IHOs created in step 1, using LSI for indexing the physical data.
The IRDL statements that generate the model discussed above are:

    BEGIN
      COLLTYPE LSI;
      DATATYPE TXT, C;
      VAR IHO: File_IHO, LSI_Collection;
      VAR SET IHO: File_IHO_SET, Function_IHO_SET;
      File_IHO_SET = ENCAP TXT "/usr/local/test/src";
      FORALL File_IHO IN File_IHO_SET {
        Function_IHO_SET = ENCAP C File_IHO;
        File_IHO = COMBINE IHO Function_IHO_SET;
        WRITE File_IHO, Function_IHO_SET;
      }
      LSI_Collection = INDEX LSI File_IHO_SET "/usr/local/db/c";
      WRITE LSI_Collection;
    END
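To clarify how such statements drive model generation, here is a rough Python analogue of the Encapsulate/Merge/Group steps for the C-program example. It is our own sketch, not an IRDL interpreter: plain dictionaries stand in for IHOs, and a crude regular expression stands in for a real C parser.

    # Rough analogue of the IRDL example: encapsulate the functions in each C
    # file, combine each file with its functions, then group the files into an
    # indexed collection. Dictionaries stand in for IHOs.
    import re

    def encap_c(path, source_text):
        """ENCAP C: one IHO per function definition found in a C file."""
        names = re.findall(r"^\w[\w\s\*]*?(\w+)\s*\([^;{]*\)\s*\{",
                           source_text, re.MULTILINE)
        return [{"type": "c-function", "file": path, "name": n} for n in names]

    def combine(file_path, function_ihos):
        """COMBINE: a composite IHO for the file, pointing at its functions."""
        return {"type": "c-file", "location": file_path, "children": function_ihos}

    def index_collection(file_ihos, index_path):
        """INDEX: an indexed collection IHO over the composite file IHOs."""
        return {"type": "collection", "index": index_path, "children": file_ihos}

    sources = {"/usr/local/test/src/File1.c": "int a(int x) { return x; }\n"
                                              "void e(void) { }\n"}
    files = [combine(p, encap_c(p, text)) for p, text in sources.items()]
    model = index_collection(files, "/usr/local/db/c")
    print([f["name"] for f in model["children"][0]["children"]])  # ['a', 'e']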
FIGURE 30.5 Object model generation for a C program. Raw data (File1.c, File2.c, File3.c, each containing several C functions) is encapsulated into information units and IHOs. In Case I, the directory /usr/local/test/src is an indexed collection of composite C-file IHOs, each of which is a collection of C-function IHOs. In Case II, the same directory is an indexed collection of C-function IHOs (e.g., function d, function e, function f), each of which is a collection of the C-file IHOs in which that function appears.
This is another example of logical structuring using parent–child relationships to set up different logical views of the same underlying physical space. Case I (Figure 30.5) illustrates the case where a directory containing C code is viewed as a collection of C files, each of which is a collection of C functions. Case II (Figure 30.5), on the other hand, illustrates the case where the directory is viewed as a collection of C functions, each of which is a collection of the C files in which it appears.
30.3.2 Metadata-Based Logical Semantic Webs
The Web as it exists today is a graph of information artifacts and resources, in which the nodes are the artifacts and the edges are represented by embedded HREF tags. These tags enable linking of related (or unrelated) Web artifacts. This Web is very suitable for browsing but provides little or no direct help for searching, information gathering, or analysis. Web crawlers and search engines try to impose some sort of order by building indices on top of Web artifacts, which are primarily textual. Thus a keyword query may be viewed as imposing a correlation (logical relationship), at a very basic and limited level, between the artifacts that make up the result set for that query. For example, let us say a search query "Bill Clinton" (Q) retrieves http://www.billclinton.com (Resource1) and http://www.whitehouse.gov/billclinton.html (Resource2). An interesting viewpoint is that Resource1 and Resource2 are "correlated" with each other through the query Q. This may be represented graphically with Resource1 and Resource2 as nodes linked by an edge labeled by the string corresponding to Q. Metadata is the key to this correlation. The keyword index (used to process keyword queries) may be conceptually viewed as content-dependent metadata, and the keywords in the query as specific resource descriptors for the index, the evaluation of which results in a set of linked or correlated resources.
We discussed in the previous section the role played by metadata in encapsulating digital content into an object model. An approach using an interpreted modeling language for metadata extraction and generation of the object model was presented. We now present a discussion on how a metadata-based
formalism, the metadata reference link (MREF) [Sheth and Kashyap, 1996; Shah and Sheth, 1998], can be used to enable semantic linking and correlation, an important prerequisite for building logical Semantic Webs. MREF is a generalization of the anchor construct used by the current Web to specify links and takes two forms:
• <A HREF = "URL"> Document Description </A>, the physical link of the current Web, and
• <A MREF = "metadata expression"> Document Description </A>, in which a metadata expression, rather than a physical address, identifies the related information.
Different types of correlation are enabled based on the type of metadata that is used. We now present examples of correlation.
30.3.2.1 Content-Independent Correlation
This type of correlation arises when content-independent metadata (e.g., the location expressed as a URL) is used to establish the correlation. The correlation is typically media independent, as content-independent metadata typically do not depend on media characteristics. In this case, the correlation is done by the designer of the document, as illustrated in the following example:
A Scenic Sunset at Lake Tahoe
Lake Tahoe is a very popular tourist spot and some interesting facts are available here. The scenic beauty of Lake Tahoe can be viewed in this photograph:
The correlation is achieved by using physical links and without using any higher-level specification mechanism. This is predominantly the type of correlation found in the HTML documents on the World Wide Web [Berners-Lee et al., 1992].
30.3.2.2 Correlation Using Direct Content-Based Metadata
We present in the following text an example based on a query in Ogle and Stonebraker [1995] to demonstrate a correlation involving attribute-based metadata. One of the attributes is color, which is a media-specific attribute. Hence we view this interesting case of correlation as media-specific correlation.
Scenic waterfalls
Some interesting information on scenic waterfalls is available here.
30.3.2.3 Correlation Using Content-Descriptive Metadata
In Kiyoki et al. [1994], keywords are associated with images, and a full-text index is created on the keyword descriptions. Because the keywords describe the contents of an image, we consider these as content-descriptive metadata. Correlation can now be achieved by querying the collection of image documents and text documents using the same set of keywords, as illustrated in this example:
Scenic Natural Sights
Some interesting information on Lake Tahoe is available here.
This type of correlation is more meaningful than content-independent correlation. Also the user has more control over the correlation, as he or she may be allowed to change the thresholds and the keywords. The keywords used to describe the image are media independent and hence correlation is achieved in a media-independent manner.
30.3.2.4 Domain-Specific Correlation
To better handle the information overload on the fast-growing global information infrastructure, there needs to be support for correlation of information at a higher level of abstraction, independent of the medium of representation of the information [Jain, 1994]. Domain-specific metadata, which is necessarily media independent, needs to be modeled. Let us consider the domain of a site location and planning application supported by a Geographic Information System, and the correlation query illustrated in the following example:
Site Location and Planning
To identify potential locations for a future shopping mall, all regions having a population greater than 500 and an area greater than 50 acres, having an urban land cover and moderate relief, can be viewed here.
The processing of the preceding query results in the structured information (area, population) and a map of the regions satisfying the above constraints being included in the HTML document. The query processing system will have to map these attributes to image-processing and other SQL-based routines to retrieve and present the results.
30.3.2.5 Example: RDF Representation of MREF
These notions of metadata-based modeling are fundamental to the notion of the emerging Semantic Web [Berners-Lee et al., 2001]. Semantic Web researchers have focused on markup languages for representing machine-understandable metadata. We now present a representation of the example listed above using the RDF markup language, in which properties such as population (a number greater than 500), area, land cover, and relief qualify the document description: "To identify potential locations for a future shopping mall, all regions having a population greater than 500 and an area greater than 50 acres, having an urban land cover and moderate relief, can be viewed here."
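Because such markup can be rendered in more than one concrete syntax, the following is only an illustrative sketch of how domain-specific metadata of this kind might be stated as RDF, using the Python rdflib library; the namespace and property names (population, area, landCover, relief) are hypothetical.

    # Illustrative only: expressing the site-location metadata as RDF triples
    # with rdflib. The namespace and property names are hypothetical.
    from rdflib import Graph, Literal, Namespace, URIRef

    GEO = Namespace("http://example.org/gis#")
    region = URIRef("http://example.org/regions/candidate-mall-sites")

    g = Graph()
    g.bind("geo", GEO)
    g.add((region, GEO.population, Literal(500)))    # lower bound on population
    g.add((region, GEO.area, Literal(50)))           # lower bound on area (acres)
    g.add((region, GEO.landCover, Literal("urban")))
    g.add((region, GEO.relief, Literal("moderate")))

    print(g.serialize(format="xml"))   # an RDF/XML rendering of the description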
30.3.3 Modeling Languages and Markup Standards
The concept of a simple, declarative language to support modeling is not new. Although modeling languages borrow from the classical hierarchical, relational, and network approaches, a number of them incorporate and extend the relational model. The languages examined in the following text may be categorized as:
Algebraic model formulation generators. AMPL [Fourer et al., 1987], GAMS [Kendrick and Meeraus, 1987], and GEML [Neustadter, 1994] belong to this group.
Graphical model generators. GOOD [Gyssens et al., 1994] and GYNGEN [Forster and Mevert, 1994] belong to this group.
Hybrid/compositional model generators. These languages have an underlying representation based on mathematical and symbolic properties, e.g., CML [Falkenhainer et al., 1994] and SHSML [Taylor, 1993].
GOOD attempts to provide ease of high-level conceptualization and manipulation of data. Sharing similarities with GOOD, GYNGEN focuses on process modeling by capturing the semantics underlying planning problems. CML and SHSML facilitate the modeling of dynamic processes. GEML is a language based on sets and has both primitive and derived data types. Primitive types may be defined by the user or built-in scalars. Derived types are recursive applications of operations such as the cartesian product or subtyping. GOOD, GYNGEN, SHSML, and CML all employ graphs for defining structures. For the individual languages, variations arise in determining the role of nodes and edges as representations of the underlying concepts, and in composing and interconnecting them to produce meaningful representations. GOOD relies on the operations of node addition/deletion, edge addition/deletion, and abstraction to build directed graphs. SHSML and CML are designed specifically to handle data dependencies arising from dynamic processes with time-varying properties. Structure in CML is domain-theory dependent, defined by a set of top-level forms. Domain theories are composed from components, processes, interaction phenomena, logical relations, etc. The types in this language include symbols, lists, terms composed of lists, sequences, and sets of sequences. The language promotes reuse of existing domain theories to model processes under a variety of conditions.
A host of initiatives with much in common with the modeling languages listed above have been proposed by the W3C. The effort has been to standardize the various features across a wide variety of potential applications on the Web and to specify markup formats for them. These markup formats include:
XML. XML is a markup language for documents containing structured information. It is a metalanguage for describing markup languages, i.e., it provides a facility to define tags and the structural relationships between them. Because there is no predefined tag set, there cannot be any preconceived semantics. All of the semantics of an XML document are defined by specialized instantiations, by applications that process XML specifications, or by stylesheets. The vocabulary that makes up the tags and associated values may be obtained from ontologies and thesauri possibly available on the Web.
XSLT and XPath. The Extensible Stylesheet Language Transformations (XSLT) and the XML Path Language (XPath) are essentially languages that support transformation of XML specifications from one language to another.
XML Schema. The XML Schema definition language is a markup language that describes and constrains the content of an XML document.
It is analogous to the database schema for relational databases and is a generalization of DTDs.
XQuery. The XQuery language is a powerful language for processing and querying XML data. It is analogous to the structured query language (SQL) used in the context of relational databases.
RDF. The Resource Description Framework (RDF) is a format for representing machine-understandable metadata on the Web. It has a graph-based data model with resources as nodes, properties as labeled edges, and values as nodes.
RDF Schema. Though RDF specifies a data model, it does not specify the vocabulary of the metadata description (e.g., what properties need to be represented). These vocabularies (ontologies) are represented using RDF Schema expressions and can be used to constrain the underlying RDF statements.
DAML+OIL. The DARPA Agent Markup Language (DAML+OIL) is a more sophisticated specification (compared to RDF Schema) used to capture semantic constraints that might be available in an ontology or vocabulary.
Topic Maps. Topic Maps share with RDF the goal of representing relationships among data items of interest. A topic map is essentially a collection of topics used to describe key concepts in the underlying data repositories (text and relational databases). Relationships among topics are represented using links, called associations. Links that associate a given topic with the information sources in which it appears are called occurrences. Topics are related to one another independently of what is said about them in the information being indexed. A topic map defines a multidimensional topic space, a space in which the locations are topics and in which the distance between topics is measurable in terms of the number of intervening topics that must be visited in order to get from one topic to another. It also includes the kinds of relationships that define the path from one topic to another, if any, through the intervening topics, if any.
Web Services. Web Services are computations available on the Web that can be invoked via standardized XML messages. The Web Services Description Language (WSDL) describes these services, which are advertised in a registry, the Universal Description, Discovery, and Integration (UDDI) service, and invoked using the Simple Object Access Protocol (SOAP).
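As a small illustration of the relationship between RDF and RDF Schema described above, the following sketch declares a tiny vocabulary with RDF Schema and then uses it to type and describe a resource; the namespace, class, and property names are hypothetical, and rdflib is again used only for illustration.

    # Illustrative sketch: an RDF Schema vocabulary (a class and a property)
    # and an RDF statement that uses it. Names and URIs are hypothetical.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF, RDFS

    EX = Namespace("http://example.org/vocab#")
    g = Graph()

    # Vocabulary (RDF Schema): a Region class with a landCover property.
    g.add((EX.Region, RDF.type, RDFS.Class))
    g.add((EX.landCover, RDF.type, RDF.Property))
    g.add((EX.landCover, RDFS.domain, EX.Region))

    # Metadata (RDF): a particular resource described with that vocabulary.
    tahoe = URIRef("http://example.org/regions/lake-tahoe")
    g.add((tahoe, RDF.type, EX.Region))
    g.add((tahoe, EX.landCover, Literal("forest")))

    print(len(g))   # 5 triples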
30.4 Ontology: Vocabularies and Reference Terms for Metadata
We discussed in the previous sections how metadata-based descriptions can be an important tool for modeling information on the Web. The degree of semantics depends on the nature of these descriptions, i.e., whether they are domain specific. A crucial aspect of creating metadata descriptions is the vocabulary used to create them. The key to utilizing the knowledge of an application domain is to identify the basic vocabulary, consisting of the terms or concepts of interest to a typical user in the application domain, and the interrelationships among the concepts in the ontology.
In the course of collecting a vocabulary or constructing an ontology for information represented in a particular media type, some concepts or terms may be independent of the application domain. Some of them may be media specific while others may be media independent. There might be some application-specific concepts for which interrelationships may be represented; these are typically independent of the media of representation. Information represented using different media types can be modeled with application-specific concepts.
30.4.1 Terminological Commitments: Constructing an Ontology
An ontology may be defined as the specification of a representational vocabulary for a shared domain of discourse that may include definitions of classes, relations, functions, and other objects [Gruber, 1993]. A crucial concept in creating an ontology is the notion of terminological commitment, which requires that subscribers to a given ontology agree on the semantics of any term in that ontology. This makes it incumbent upon content providers subscribing to a particular ontology to ensure that the information stored in their repositories is somehow mapped to the terms in the ontology. Content users, on the other hand, need to specify their information requests by using terms from the same ontology. A terminological commitment may be achieved via various means, such as alignment with a dominant standard or ontology, or via a negotiation process. Terminological commitments act as a bridge between various content providers and users. This is crucial, as the terminological commitment then carries forward to the metadata descriptions constructed from the ontological concepts. However, in some cases, content providers and subscribers may subscribe to different ontologies, in which case terminological commitments need to be expanded to multiple ontologies, a situation we discuss later in this chapter.
FIGURE 30.6 Hierarchies describing a domain vocabulary.
A classification using a generalization hierarchy: Geological-Region, specialized into Urban (Residential, Industrial, Commercial), Forest Land (Evergreen, Deciduous, Mixed), and Water (Lakes, Reservoirs, Streams and Canals).
A classification using an aggregation hierarchy: Geological-Region, composed of State, County, City, Rural Area, Tract, Block Group, and Block.
We view terminological commitments as a very important requirement for capturing the semantics of domain-specific terms. For the purposes of this chapter, we assume that media types presenting related information share the same domain of discourse. Typically, there may be other terms in the vocabulary that are not dependent on the domain and may be media specific. Further, it may be necessary to translate between descriptive vocabularies, which may involve approximating, abstracting, or eliminating terms as a part of the negotiated agreement reached by the various content managers. It may also be important to translate domain-specific terms to domain-independent, media-specific terms by using techniques specialized to that media type. An example of a classification that can serve as a vocabulary for constructing metadata is illustrated in Figure 30.6. In the process of construction, we view the ontology along the following two dimensions:
1. Data-Driven vs. Application-Driven Dimension
Data-driven perspective. This refers to concepts and relationships designed by interactive identification of objects in the digital content corresponding to different media types.
Application-driven perspective. This refers to concepts and relationships inspired by the class of queries for which the related information in the various media types is processed. The concept Rural Area in Figure 30.6 is one such example.
2. Domain-Dependent vs. Domain-Independent Dimension
Domain-dependent perspective. This represents the concepts that are closely tied to the application domain we wish to model. These are likely to be identified using the application-driven approach.
Domain-independent perspective. This represents concepts required by various media types (e.g., color, shape, and texture for images, such as R-Features [Jain and Hampapur, 1994]) to identify
the domain-specific concepts. These are typically independent of the application domain and are generated by using a data-driven approach.
30.4.2 Controlled Vocabulary for Digital Media
In this section we survey the terminology and vocabulary identified by various researchers for characterizing multimedia content and relate the various terms used to the perspectives discussed earlier.
Jain and Hampapur [1994] have used domain models to assign a qualitative label to a feature (such as pass, dribble, and dunk in basketball); these are called Q-Features. Features that rely on low-level, domain-independent models, such as object trajectories, are called R-Features. Q-Features may be considered as an example of the domain-dependent, application-driven perspective, whereas R-Features may be associated with the domain-independent, data-driven perspective.
Kiyoki et al. [1994] have used basic words from the General Basic English Dictionary as features that are then associated with the images. These features may be considered as examples of the domain-dependent, data-driven perspective. Color names defined by the ISCC (Inter-Society Color Council) and NBS (National Bureau of Standards) are used as features and may be considered as examples of the domain-independent, data-driven perspective.
Anderson and Stonebraker [1994] model some features that are primarily based on the measurements of Advanced Very High Resolution Radiometer (AVHRR) channels. Other features refer to spatial (latitude/longitude) and temporal (begin date, end date) information. These may be considered as examples of the domain-independent, data-driven perspective. However, there are features such as the normalized difference vegetation index (NDVI) that are derived from different channels and may be considered as an example of the domain-dependent, data-driven perspective.
Glavitsch et al. [1994] have determined from experiments that good indexing features lie between phonemes and words. They have selected three special types of subword units: VCV-, CV-, and VC-. The letter V stands for a maximum sequence of vowels and C for a maximum sequence of consonants. They process a set of speech and text documents to determine a vocabulary for the domain. The same vocabulary is used for both speech and text media types and may be considered as an example of the domain-dependent, data-driven perspective.
Chen et al. [1994] use the keywords identified in text and speech documents as their vocabulary. Issues of restricted vs. unrestricted vocabulary are very important here. These may be considered as examples of the domain-dependent, data- and application-driven perspectives. A summary of the above discussion is presented in Table 30.2.
TABLE 30.2 Controlled Vocabulary for Digital Media
Vocabulary Feature                                  Media Type           Domain Dependent or Independent   Application or Data Driven
Q Features [Jain and Hampapur, 1994]                Video, Image         Domain Dependent                  Application Driven
R Features [Jain and Hampapur, 1994]                Video, Image         Domain Independent                Data Driven
English Words [Kiyoki et al., 1994]                 Image                Domain Dependent                  Data Driven
ISCC and NBS colors [Kiyoki et al., 1994]           Image                Domain Independent                Data Driven
AVHRR features [Anderson and Stonebraker, 1994]     Image                Domain Independent                Data Driven
NDVI [Anderson and Stonebraker, 1994]               Image                Domain Dependent                  Data Driven
Subword units [Glavitsch et al., 1994]              Audio, Text          Domain Dependent                  Data Driven
Keywords [Chen et al., 1994]                        Image, Audio, Text   Domain Dependent                  Application and Data Driven
30.4.3 Ontology-Guided Metadata Extraction
The extraction of metadata from the information in various media types can be primarily guided by the domain-specific ontology, though it may also involve terms in the domain-independent ontology. Kiyoki et al. [1994] describe the automatic extraction of impression vectors based on English words or ISCC and NBS colors. The users, when querying an image database, use English words to query the system. One way of guiding the users could be to display the list of English words used to construct the metadata in the first place. Glavitsch et al. [1994] describe the construction of a speech feature index for both text and audio documents based on a common vocabulary consisting of subword units. Chen et al. [1994] describe the construction of keyword indices, topic change indices, and layout indices. These typically depend on the content of the documents, and the vocabulary is dependent on the keywords present in the documents.
In the above cases, the vocabulary is not predefined and depends on the content of the documents in the collection. Also, interrelationships between the terms in the ontology are not identified. A controlled vocabulary with terms and their interrelationships can be exploited to create metadata that model domain-dependent relationships, as illustrated by the GIS example discussed in Kashyap and Sheth [1997].
Example: Consider a decision-support query across multiple data repositories possibly representing data in multiple media: Get all regions having a population greater than 500, an area greater than 50 acres, an urban land-cover, and moderate relief. The metadata (referred to as the m-context) can be represented as:
    (AND region (population > 500) (area > 50) (= land-cover "urban") (= relief "moderate"))
Suppose the ontology from which the metadata description is constructed supports complex relationships. Furthermore, let:
    CrowdedRegion ≡ (AND region (population > 200))
Inferences supported by the ontology enable the determination that the regions required by the query metadata discussed earlier are instances of CrowdedRegion. Thus the metadata description (now referred to as the c-context) can be rewritten as:
    (AND CrowdedRegion (population > 500) (area > 50) (= land-cover "urban") (= relief "moderate"))
The above example illustrates how metadata expressions, when constructed using ontological concepts, can take advantage of ontological inferences to support metadata computation.
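The rewriting step in the example can also be sketched programmatically. The following toy illustration uses our own encoding of the concepts rather than an actual description-logic reasoner: the definition of CrowdedRegion is checked against the query metadata, and the m-context is relabeled with the more specific concept.

    # Toy sketch of ontology-guided rewriting: if the metadata expression is
    # subsumed by the definition of CrowdedRegion, label it with that concept.
    # The concept definition and thresholds come from the example above.
    ontology = {
        "CrowdedRegion": {"base": "region", "min": {"population": 200}},
    }

    def subsumed_by(metadata, definition):
        """True if the metadata expression is subsumed by the definition."""
        if metadata["concept"] != definition["base"]:
            return False
        return all(metadata["constraints"].get(attr, float("-inf")) >= bound
                   for attr, bound in definition["min"].items())

    query_metadata = {
        "concept": "region",
        "constraints": {"population": 500, "area": 50},
        "equals": {"land-cover": "urban", "relief": "moderate"},
    }

    for name, definition in ontology.items():
        if subsumed_by(query_metadata, definition):
            query_metadata["concept"] = name   # rewrite the m-context into a c-context

    print(query_metadata["concept"])   # CrowdedRegion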
30.4.4 Medical Vocabularies and Terminologies: The UMLS Project
Metadata descriptions constructed from controlled vocabularies have been used extensively to index and search for information in the medical research literature. In particular, articles in the MEDLINE® bibliographic database have long been annotated with terms obtained from the MeSH vocabulary. Besides this, there is a wide variety of controlled vocabularies in medicine used to capture information related to diseases, drugs, laboratory tests, etc. Efforts have been made to integrate the various perspectives by creating a "Meta" thesaurus or vocabulary that links these vocabularies together. This was the goal of the UMLS project, initiated in 1986 by the U.S. National Library of Medicine (NLM) [Lindbergh et al., 1993]. The UMLS consists of biomedical concepts and associated strings (the Metathesaurus), a semantic network, and a collection of lexical tools, and it has been used in a large variety of applications. The three main Knowledge Sources in the UMLS are:
1. The UMLS Metathesaurus provides a common structure for more than 95 source biomedical vocabularies, organized by concept or meaning. A concept is defined as a cluster of terms (one or more words representing a distinct concept) representing the same meaning (e.g., synonyms, lexical variants, translations). The 2002 version of the Metathesaurus contains 871,584 concepts named by 2.1 million terms. Interconcept relationships across multiple vocabularies, concept categorization, and information on concept co-occurrence in MEDLINE are also included [McCray and Nelson, 1995].
FIGURE 30.7 Biomedical vocabularies and the Unified Medical Language System. (The figure relates source vocabularies such as MeSH, ICD, LOINC, SNOMED, and CPT to Metathesaurus concepts such as Heart, Heart Valves, Fetal Heart, Angina Pectoris, Cardiotonic Agents, Mediastinum, and Tissue Donors, and to Semantic Network types such as Anatomical Structure, Fully Formed Anatomical Structure, Embryonic Structure, Body Part, Organ or Organ Component, Disease or Syndrome, Population Group, and Pharmacologic Substance.)
2. The UMLS Semantic Network categorizes Metathesaurus concepts through semantic types and relationships [McCray and Nelson, 1995].

3. The SPECIALIST lexicon contains over 30,000 English words, including many biomedical terms. Information for each entry, including base form, spelling variants, syntactic category, inflectional variation of nouns, and conjugation of verbs, is used by the lexical tools [McCray et al., 1994]. There are over 163,000 records in the 2002 SPECIALIST lexicon, representing over 268,000 distinct strings.

Some of the prominent medical vocabularies are as follows:

Medical Subject Headings (MeSH). The Medical Subject Headings (MeSH) [Nelson et al., 2001] have been produced by the NLM since 1960. The MeSH thesaurus is NLM's controlled vocabulary for subject indexing and searching of journal articles in PubMed, and of books, journal titles, and nonprint materials in NLM's catalog. Translated into many different languages, MeSH is widely used in indexing and cataloging by libraries and other institutions around the world. An example of the MeSH expression used to index and search for the concept "Mumps pancreatitis" is illustrated in Figure 30.8.

International Classification of Diseases (ICD). The World Health Organization's International Classification of Diseases, 9th Revision (ICD-9) [ICD] is designed for the classification of morbidity and mortality information for statistical purposes, for the indexing of hospital records by disease and operations, and for data storage and retrieval. ICD-9-CM is a clinical modification of ICD-9. The term "clinical" is used to emphasize the modification's intent: to serve as a useful tool for the classification of morbidity data for indexing of medical records, medical care review, and ambulatory and other medical care programs, as well as for basic health statistics. To describe the clinical picture of the patient, the codes must be more precise than those needed only for statistical groupings and trend analysis.

Systematized Nomenclature of Medicine (SNOMED). The SNOMED vocabulary [Snomed] was designed to address the need for a detailed and specific nomenclature to accurately reflect, in computer-readable form, the complexity and diversity of information found in a patient record. The design ensures clarity of meaning, consistency in aggregation, and ease of messaging. SNOMED is compositional in nature, i.e., new concepts can be created as compositions of existing ones, and it has a systematized hierarchical structure. Its unique design allows for the full integration of electronic medical record information into a single data structure. Overall, SNOMED has contributed to the improvement of patient care, the reduction of errors inherent in data coding, the facilitation of research, and the support of compatibility across software applications.

Current Procedural Terminology (CPT). The Current Procedural Terminology (CPT) codes [CPT] are used to describe services in electronic transactions. CPT was developed by the American Medical Association (AMA) in the 1960s and soon became part of the standard code set for Medicare and Medicaid.
FIGURE 30.8 A MeSH descriptor for information retrieval (the main heading Mumps with the subheading complications, combined by AND with the main heading Pancreatitis with the subheading etiology).
In subsequent decades, CPT was also adopted by private insurance carriers and managed care companies, and it has now become the de facto standard for reporting healthcare services.

Logical Observation Identifier Names and Codes (LOINC). The purpose of the Logical Observation Identifier Names and Codes (LOINC) database [LOINC] is to facilitate the exchange and pooling of results, such as blood hemoglobin, serum potassium, or vital signs, for clinical care, outcomes management, and research. It identifies observations in electronic messages such as Health Level Seven (HL7) [HL7] observation messages, so that when hospitals, health maintenance organizations, pharmaceutical manufacturers, researchers, and public health departments receive such messages from multiple sources, they can automatically file the results in the right slots of their medical records, research, and public health systems.
30.4.5 Expanding Terminological Commitments across Multiple Ontologies

At the beginning of this section we discussed the desirability of expanding the process of achieving terminological commitments across multiple ontologies. The UMLS system described earlier may be viewed as an attempt to establish terminological commitments across a multitude of biomedical vocabularies, and the UMLS Metathesaurus may be viewed as a repository of intervocabulary relationships. Establishing terminological commitments across users of the various biomedical vocabularies would require using the relationships represented in the UMLS Metathesaurus to provide translations from a term in a source vocabulary to a term or expression of terms in a target vocabulary. This requires the ability to integrate the two vocabularies into a common graph structure and to navigate that structure to obtain a suitable translation. This is illustrated in an abstract manner in Figure 30.9 and is being investigated in the context of the Semantic Vocabulary Interoperation Project at the NLM [SVIP].
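A minimal sketch of the translation step just described is given below, assuming the Metathesaurus is abstracted as a graph whose nodes are (vocabulary, term) pairs and whose edges are intervocabulary relationships. The mapping entries and codes shown are illustrative placeholders rather than verified UMLS content.

# Hedged sketch: translate a term from a source vocabulary to a target
# vocabulary by breadth-first search over hypothetical intervocabulary links.
from collections import deque

MAPPINGS = {
    ("MeSH", "Myocardial Infarction"): [("UMLS", "CUI-0001")],
    ("UMLS", "CUI-0001"): [("ICD-9-CM", "410"), ("SNOMED", "MI-concept")],
}

def translate(term, source, target, mappings=MAPPINGS):
    """Return all target-vocabulary terms reachable from (source, term)."""
    seen, results = set(), []
    queue = deque([(source, term)])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        vocab, t = node
        if vocab == target:
            results.append(t)
        queue.extend(mappings.get(node, []))   # follow intervocabulary edges
    return results

print(translate("Myocardial Infarction", "MeSH", "ICD-9-CM"))  # ['410']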
30.5 Conclusions

The success of the World Wide Web has led to the availability of tremendous amounts of heterogeneous digital content. However, it has also led to concerns about scalability and information loss (e.g., loss in precision/recall). Information modeling is viewed as an approach for enabling scalable development of the Web and access to information in an information-preserving manner. The creation and extraction of machine-understandable metadata is a critical component of the Semantic Web effort, which aims at enhancing the current Web with the "semantics" of information. In this chapter we presented a discussion of metadata, its use in various applications relevant to the Web, and a classification of various metadata types capturing different levels of information content. We discussed approaches that use metadata descriptions for creating information models and spaces, and various ways by which the semantics of information embedded in the data can be captured.
FIGURE 30.9 Expanding terminological commitments by integration of ontologies (two ontologies, with terms A1 through A6 and B1 through B4, are merged into a single integrated structure).
In this context, we also discussed the role played by controlled vocabularies and ontologies in providing reference terms and concepts for constructing metadata descriptions. Examples from the domain of biomedical information were presented, and issues related to the establishment of terminological commitments across multiple user communities were also discussed. The role played by metadata and ontologies is crucial in modeling information and semantics, and this chapter provides an introduction to these technologies from that perspective.
References

Anderson, J. and M. Stonebraker. Sequoia 2000 metadata schema for satellite images. In W. Klaus and A. Sheth, Eds., SIGMOD Record, special issue on Metadata for Digital Media, 1994.
Berners-Lee, T. et al. World Wide Web: the information universe. Electronic Networking: Research, Applications, and Policy, 1(2), 1992.
Berners-Lee, T., James Hendler, and Ora Lassila. The Semantic Web. Scientific American, May 2001.
Bohm, K. and T. Rakow. Metadata for multimedia documents. In W. Klaus and A. Sheth, Eds., SIGMOD Record, special issue on Metadata for Digital Media, 1994.
Boll, S., W. Klas, and A. Sheth. Overview on using metadata to manage multimedia data. In A. Sheth and W. Klas, Eds., Multimedia Data Management. McGraw-Hill, New York, 1998.
Bray, T., J. Paoli, and C. M. Sperberg-McQueen. Extensible Markup Language (XML) 1.0. http://www.w3.org/TR/REC-xml.
Chen, F., M. Hearst, J. Kupiec, J. Pederson, and L. Wilcox. Metadata for mixed-media access. In W. Klaus and A. Sheth, Eds., SIGMOD Record, special issue on Metadata for Digital Media, 1994.
Collet, C., M. Huhns, and W. Shen. Resource integration using a large knowledge base in Carnot. IEEE Computer, December 1991.
CPT. Current Procedural Terminology. http://www.ama-assn.org/ama/pub/category/3113.html.
DAML+OIL. The DARPA Agent Markup Language. http://www.daml.org/.
Deerwester, S., S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 1990.
Falkenhainer, B. et al. CML: A Compositional Modeling Language, 1994. Draft.
Forster, M. and P. Mevert. A tool for network modeling. European Journal of Operational Research, 72, 1994.
Fourer, R., D. M. Gay, and B. W. Kernighan. AMPL: A Mathematical Programming Language. Technical Report 87-03, Department of Industrial Engineering and Management Sciences, Northwestern University, Chicago, IL, 1987.
Glavitsch, U., P. Schauble, and M. Wechsler. Metadata for integrating speech documents in a text retrieval system. In W. Klaus and A. Sheth, Eds., SIGMOD Record, special issue on Metadata for Digital Media, 1994.
Gruber, T. A translation approach to portable ontology specifications. Knowledge Acquisition: International Journal of Knowledge Acquisition for Knowledge-Based Systems, 5(2), June 1993.
Gyssens, M. et al. A graph-oriented object database model. IEEE Transactions on Knowledge and Data Engineering, 6(4), 1994.
HL7. The Health Level Seven Standard. http://www.hl7.org.
ICD. The International Classification of Diseases, 9th Revision, Clinical Modification. http://www.cdc.gov/nchs/about/otheract/icd9/abticd9.htm.
Jain, R. Semantics in multimedia systems. IEEE Multimedia, 1(2), 1994.
Jain, R. and A. Hampapur. Representations of video databases. In W. Klaus and A. Sheth, Eds., SIGMOD Record, special issue on Metadata for Digital Media, 1994.
Kahle, B. and A. Medlar. An information system for corporate users: wide area information servers. Connexions — The Interoperability Report, 5(11), November 1991.
Kashyap, V. and A. Sheth. Semantics-based Information Brokering. In Proceedings of the Third International Conference on Information and Knowledge Management (CIKM), November 1994.
Kashyap, V. and A. Sheth. Semantic heterogeneity: role of metadata, context, and ontologies. In M. Papazoglou and G. Schlageter, Eds., Cooperative Information Systems: Current Trends and Directions. Academic Press, San Diego, CA, 1997.
Kendrick, D. and A. Meeraus. GAMS: An Introduction. Technical Report, Development Research Department, The World Bank, 1987.
Kiyoki, Y., T. Kitagawa, and T. Hayama. A meta-database system for semantic image search by a mathematical model of meaning. In W. Klaus and A. Sheth, Eds., SIGMOD Record, special issue on Metadata for Digital Media, 1994.
Klaus, W. and A. Sheth. Metadata for digital media. SIGMOD Record, special issue on Metadata for Digital Media, W. Klaus and A. Sheth, Eds., 23(4), December 1994.
Lassila, O. and R. R. Swick. Resource Description Framework (RDF) model and syntax specification. http://www.w3.org/TR/REC-rdf-syntax/.
Lindbergh, D., B. Humphreys, and A. McCray. The Unified Medical Language System. Methods of Information in Medicine, 32(4), 1993. http://umlsks.nlm.nih.gov.
LOINC. The Logical Observation Identifiers Names and Codes database. http://www.loinc.org.
McCray, A. and S. Nelson. The representation of meaning in the UMLS. Methods of Information in Medicine, 34(1–2): 193–201, 1995.
McCray, A., S. Srinivasan, and A. Browne. Lexical Methods for Managing Variation in Biomedical Terminologies. In Proceedings of the Annual Symposium on Computers in Applied Medical Care, 1994.
MEDLINE. The PubMed MEDLINE system. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed.
Mena, E., V. Kashyap, A. Sheth, and A. Illarramendi. OBSERVER: An Approach for Query Processing in Global Information Systems Based on Interoperation Across Pre-existing Ontologies. In Proceedings of the First IFCIS International Conference on Cooperative Information Systems (CoopIS '96), June 1996.
Nelson, S.J., W.D. Johnston, and B.L. Humphreys. Relationships in medical subject headings (MeSH). In C.A. Bean and R. Green, Eds., Relationships in the Organization of Knowledge. Kluwer Academic, Dordrecht, The Netherlands, 2001.
Neustadter, L. A Formalization of Expression Semantics for an Executable Modeling Language. In Proceedings of the Twenty-Seventh Annual Hawaii International Conference on System Sciences, 1994.
Ogle, V. and M. Stonebraker. Chabot: retrieval from a relational database of images. IEEE Computer, special issue on Content-Based Image Retrieval Systems, 28(9), 1995.
Ordille, J. and B. Miller. Distributed Active Catalogs and Meta-Data Caching in Descriptive Name Services. In Proceedings of the 13th International Conference on Distributed Computing Systems, May 1993.
OWL. The Web Ontology Language. http://www.w3.org/TR/owl_guide/.
Salton, G. Automatic Text Processing. Addison-Wesley, Reading, MA, 1989.
Sciore, E., M. Siegel, and A. Rosenthal. Context interchange using meta-attributes. In Proceedings of the CIKM, 1992.
SGML. The Standard Generalized Markup Language. http://www.w3.org/MarkUp/SGML/.
Shah, K. and A. Sheth. Logical information modeling of web-accessible heterogeneous digital assets. In Proceedings of the IEEE Advances in Digital Libraries (ADL) Conference, April 1998.
Sheth, A. and V. Kashyap. Media-independent correlation of information. What? How? In Proceedings of the First IEEE Metadata Conference, April 1996. http://www.computer.org/conferences/meta96/sheth/index.html.
Sheth, A., V. Kashyap, and W. LeBlanc. Attribute-based Access of Heterogeneous Digital Data. In Proceedings of the Workshop on Web Access to Legacy Data, Fourth International WWW Conference, December 1995.
Shklar, L., K. Shah, and C. Basu. The InfoHarness repository definition language. In Proceedings of the Third International WWW Conference, May 1995a.
Shklar, L., K. Shah, C. Basu, and V. Kashyap. Modelling Heterogeneous Information. In Proceedings of the Second International Workshop on Next Generation Information Technologies (NGITS '95), June 1995b.
Shklar, L., A. Sheth, V. Kashyap, and K. Shah. InfoHarness: Use of Automatically Generated Metadata for Search and Retrieval of Heterogeneous Information. In Proceedings of CAiSE '95, Lecture Notes in Computer Science #932, June 1995c.
Shklar, L., S. Thatte, H. Marcus, and A. Sheth. The InfoHarness Information Integration Platform. In Proceedings of the Second International WWW Conference, October 1994.
Shoens, K., A. Luniewski, P. Schwartz, J. Stamos, and J. Thomas. The Rufus System: Information organization for semi-structured data. In Proceedings of the 19th VLDB Conference, September 1993.
Snomed. The Systematized Nomenclature of Medicine. http://www.snomed.org.
SVIP. The Semantic Vocabulary Interoperation Project. http://cgsb2.nlm.nih.gov/kashyap/projects/SVIP/.
Taylor, J.H. Towards a Modeling Language Standard for Hybrid Dynamical Systems. In Proceedings of the 32nd Conference on Decision and Control, 1993.
31 Semantic Aspects of Web Services

Sinuhé Arroyo, Rubén Lara, Juan Miguel Gómez, David Berka, Ying Ding, and Dieter Fensel

CONTENTS
Abstract
31.1 Introduction
31.2 Semantic Web
  31.2.1 Related Projects
31.3 Semantic Web Services
31.4 Relevant Frameworks
  31.4.1 WSMF
  31.4.2 WS-CAF
  31.4.3 Frameworks Comparison
31.5 Epistemological Ontologies for Describing Services
  31.5.1 DAML-S and OWL-S
  31.5.2 DAML-S Elements
  31.5.3 Limitations
31.6 Summary
Acknowledgements
References
Abstract

Semantics promise to lift the Web to its full potential. The combination of machine-processable semantics provided by the Semantic Web with current Web Service technologies has coined the term Semantic Web Services. Semantic Web Services offer the means to achieve a higher order level of value-added services by automating the task-driven assembly of interorganization business logics, thus making the Internet a global, common platform where agents communicate with each other. This chapter provides an introduction to the Semantic Web and Semantic Web Services, paying special attention to their automation support. It details current initiatives in the E.U. and U.S., presents the most relevant frameworks towards the realization of a common platform for the automatic task-driven composition of Web Services, sketches a comparison among them to point out their weaknesses and strengths, and finally introduces the most relevant technologies used to describe services, along with their limitations.
31.1 Introduction

Current Web technology exploits very little of the capabilities of modern computers. Computers are used solely as information-rendering devices, which present content in a human-understandable format. Incorporating semantics is essential in order to exploit all of the computational capabilities of computers for information processing and information exchange. Ontologies enable the combination of data and information with semantics, and they represent the backbone of a new technology called the Semantic Web.
By using ontologies, the Semantic Web will lift the current WWW to a new level of functionality where computers can query each other, respond appropriately, and manage semistructured information in order to perform a given task.

If ontologies are the backbone of the Semantic Web, Web Services are its arms and legs. Web Services are self-contained, self-describing, modular applications that can be published, located, and invoked over the Web [Tidwell, 2000]. In a nutshell, Web Services are nothing but distributed pieces of software that can be accessed via the Web, roughly just another implementation of RPC. The potential behind such a simple concept resides in the possibility of assembling them ad hoc to perform tasks or execute business processes. Web Services will significantly further the development of the Web by enabling automated program communication. Basically, they will allow the deployment of new complex value-added services.

Web Services constitute the right means to accomplish the objectives of the Semantic Web. They not only facilitate access to semantically enriched information; their assembly and combination possibilities, enhanced with semantic descriptions of their functionality, will also provide the higher order functionality that lifts the Web to its full potential. The combination of the Semantic Web and Web Service technology has been named Semantic Web Services. Semantic Web Services may be the killer application of this emerging Web.

The Semantic Web will provide the means to automate the use of Web Services, namely discovery, composition, and execution [McIlraith et al., 2001]. The automation of these processes will make the task-driven assembly of interorganization business logics a reality. Such automation will transform the Internet into a global, common platform where agents (organizations, individuals, and software) communicate with each other to carry out various commercial activities, providing a higher order level of value-added services. The effects of this new technological development will expand areas such as Knowledge Management, Enterprise Application Integration, and e-Business. In a nutshell, the Semantic Web promises a revolution in human information access similar to the one caused by the telephone and comparable to the invention of the steam engine. Such a revolution will count on ontologies and Web Services as its most important champions. The way computers are seen and the WWW is understood will change completely, and the effects will be felt in every aspect of our daily life.

This chapter provides an overview and detailed analysis of Semantic Web Services and related technologies. The contents are organized as follows: Section 31.2 presents an overview of the Semantic Web, a little bit of history, what the Semantic Web is, and a prospective view of its possibilities; Section 31.3 introduces Semantic Web Services, their fundamentals, the actual state of development, and future trends and directions; Section 31.4 presents the most important initiatives towards the development of frameworks for Semantic Web Services, why frameworks are necessary, and what benefits they provide; Section 31.5 includes an overview of the most relevant upper ontologies developed to describe Semantic Web Services from a functional and nonfunctional point of view; and finally Section 31.6 provides a summary together with a view of the future direction the Semantic Web Services technology will take.
31.2 Semantic Web

The Semantic Web is the next generation of the WWW, in which information has machine-processable and machine-understandable semantics. This technology will bring structure to the meaningful content of Web pages; it is not a separate Web but an augmentation of the current one, in which information is given a well-defined meaning. The Semantic Web will include millions of small and specialized reasoning services that will provide support for the automated achievement of tasks based on accessible information. One of the first to come up with the idea of the Semantic Web was the inventor of the current WWW, Tim Berners-Lee. He envisioned a Web in which knowledge about the meaning or content of Web resources is stored through the use of machine-processable metadata. The Semantic Web can be defined as "an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in co-operation" [Berners-Lee et al., 2001].
The core concept behind the Semantic Web is the representation of data in a machine-interpretable way. Ontologies facilitate the means to realize such a representation. Ontologies are formal and consensual specifications of conceptualizations, which provide a shared and common understanding of a domain as machine-processable semantics of data and information that can be communicated among agents (organizations, individuals, and software) [Fensel, 2001]. Many definitions of ontology have been given in recent years. One that really fits its application in the computer science field is the one given by Gruber [1993]: "An ontology is a formal, explicit specification of a shared conceptualization."

Ontologies bring together two essential aspects that help to push the Web to its full potential. On the one hand, they provide (a) machine processability by defining formal semantics for information, making computers able to process it; on the other, they allow (b) machine-human understanding, due to their ability to specify real-world semantics that link machine-processable content with human meaning, using a consensual terminology as the connecting element [Fensel, 2001].

Regarding Web Services, the machine processability enabled by ontologies represents their most valuable contribution. Ontologies offer the necessary means to describe the capabilities of a concrete Web Service in a shared vocabulary that can be understood by every service requester. Such a shared functionality description is the key element towards task-driven automatic Web Service discovery, composition, and execution of interorganization business logics. It makes it possible to (1) locate different services that, solely or in combination, provide the means to solve a given task, (2) combine services to achieve a goal, and (3) facilitate the replacement of such services by equivalent ones that, solely or in combination, can realize the same functionality, e.g., in case of failure during execution. As ontologies are built (as models of how things work), it will be possible to use them as common languages to describe Web Services and the payloads they contain in much more detail [Daconta et al., 2003].
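To make the role of the shared vocabulary concrete, the sketch below shows how a requested capability could be matched against advertised ones by walking is-a links in a tiny common ontology. It is an illustration only: the concept names, service identifiers, and the flat dictionary encoding are invented for this example.

# Illustrative sketch: a shared is-a hierarchy used to decide whether an
# advertised capability satisfies a requested one via subsumption.

IS_A = {  # child concept -> parent concept in the shared ontology
    "EquationSolving": "MathService",
    "LinearEquationSolving": "EquationSolving",
    "TicketBooking": "ECommerceService",
}

def subsumed_by(concept, ancestor, is_a=IS_A):
    """True if 'concept' is the ancestor or a (transitive) specialization of it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = is_a.get(concept)
    return False

advertised = {"solver-42": "LinearEquationSolving", "shop-7": "TicketBooking"}
requested = "EquationSolving"
matches = [s for s, capability in advertised.items()
           if subsumed_by(capability, requested)]
print(matches)  # ['solver-42']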
31.2.1 Related Projects

Currently there are many initiatives, private and public, trying to bring the Berners-Lee vision of the Semantic Web to a plausible reality [Ding et al., 2003]. Among the most relevant ones financed by the European Commission are projects such as Onto-Knowledge [Ontoknowledge], DIP [DIP], SEKT [SEKT], SWWS [SWWS], Esperonto [Esperonto], Knowledge Web [KnowledgeWeb], and OntoWeb [OntoWeb]. In the U.S., one of the most relevant initiatives is the DARPA Agent Markup Language [DAML], supported by the research funding agency of the U.S. Department of Defense. Other relevant DARPA-funded initiatives are High Performance Knowledge Bases [HPKB], which is now completed, and its follow-up, Rapid Knowledge Formation [RKF]. The National Science Foundation (NSF) has also sponsored some Semantic Web efforts. On its side, the W3C [W3C] has made an important effort towards the standardization of Semantic Web related technologies. European and U.S. researchers have been collaborating actively on standardization efforts [Ding et al., 2003]; a result of this collaboration is the joint Web Ontology Language, OWL. Some of the initiatives presented in this section count among their main partners major software vendors such as BT or HP, providing a valuable business point of view that allows aligning the technology with market needs, and proving the interest that exists towards its development.
31.3 Semantic Web Services

In simple terms, Web Services are pieces of software accessible via the Web. A service can be understood as any type of functionality that software can deliver, ranging from mere information providers (such as stock quotes, weather forecasts, or news aggregation) to more elaborate ones that may have some impact in the real world (such as booksellers, plane-ticket sellers, or e-banking); basically, any functionality offered by the current Web can be envisioned as a Web Service. The big issue about Web Services resides in their capability to change the Web from a static collection of information into a dynamic place where different pieces of software can be assembled on the fly to accomplish users' goals (expressed,
perhaps, in any natural language). This is a very ambitious goal, which at the moment cannot be achieved with the current state-of-the-art technologies around Web Services. UDDI [Bellwood et al., 2002], WSDL [Christensen et al., 2001], and SOAP [Box et al., 2000] facilitate the means to advertise, describe, and invoke services using semiformal natural-language terms, but they do not say anything about what services can do, nor how they do it, in a machine-understandable and machine-processable way. An alternative to UDDI, WSDL, and SOAP is ebXML (electronic business XML), which permits exchanging and transporting business documents over the Web using XML. It converges with the previously cited technologies, differing from them in the degree to which it recognizes business process modeling as a core feature [Newcomer, 2002]. All these initiatives lack proper support for semantics, and therefore human intervention is needed to actually discover, combine, and execute Web Services. The goal is to minimize any human intervention, so that the assembly of Web Services can be done in a task-driven, automatic way.

The Semantic Web in general, and ontologies in particular, are the right means to bridge the gap and actually realize a dynamic Web. They provide the machine-processable semantics that, added on top of current Web Services, actualize the potential of Semantic Web Services. Semantic Web Services combine Semantic Web and Web Service technology. Semantic Web Services are defined as self-contained, self-describing, semantically marked-up software resources that can be published, discovered, composed, and executed across the Web in a task-driven, automatic way. What really makes a difference with respect to traditional Web Services is that they are semantically marked up.

Such enhancement enables Automatic Web Service discovery: the location of services corresponding to the service requester's specification for a concrete task. Ideally, a task will be expressed using any natural language and then translated to the ontology vocabulary suitable for the concrete application domain. Because there will most likely be thousands of different ontologies for a concrete domain (different service providers will express their business logics using different terms and conventions, and different service requesters will express their requirements using different vocabularies), appropriate support for ontology merging and alignment must be available. Semantically marked-up services will be published in semantically enhanced service repositories where they can be easily located and their capabilities matched against the user's requirements. In a nutshell, these repositories are traditional UDDI registries augmented with a semantic layer on top that provides machine-processable semantics for the registered services. As an example, a service requester might say, "Locate all the services that can solve mathematical equations." An automatic service discovery engine will then search all available repositories for services that fulfill the given task. The list of available services will probably be huge, so the user might impose some limitations (functional, i.e., how the service is provided, and nonfunctional, such as execution time and cost) in order to get an accurate and precise set of services.

Automatic Web Service composition is the assembly of services, based on their functional specifications, in order to achieve a given task and provide a higher order of functionality.
Once a list of available Web Services has been retrieved, it could happen that none of the available services completely fulfils the proposed task by itself, or that other cheaper, faster, or vendor-dependent combinations of services are preferred. In this case, some of the services would need to be assembled, using programmatic conventions, to accomplish the desired task. During this stage, Web Services are organized in different possible ways based on functional requirements (preconditions: conditions that must hold before the service is executed; and postconditions: conditions that hold after the service execution) and nonfunctional requirements (processing time and cost). As an example, a service requester might say, "Compose available services to solve the following set of mathematical operations [((a × b) + c) − d]." During the discovery phase, different multiplication, addition, and subtraction services might have been found, each one of them with its particular functional and nonfunctional attributes. Let us suppose that some multiplication services have been located, but they are all too slow and expensive, so we are not interested in using them. Instead we want to use additions to perform the multiplication. Such knowledge, the fact that repeated addition can realize multiplication, should be stated in some domain knowledge (a characterization of relevant information for a specific area) in a way that the service composer can understand and is able to present all the different
possibilities to solve our set of operations. Different alternatives that lead towards the task's accomplishment are then presented to the service requester, who chooses among them the composition path that suits its needs best.

Automatic Web Service execution is the invocation of a concrete set of services, arranged in a particular way following programmatic conventions, that realizes a given task. When the available services have been composed and the service requester has chosen the execution path that best suits its needs according to the nonfunctional requirements, the set of Web Services is executed. Each Web Service specifies one or more APIs that allow its execution. The semantic markup provides all the information regarding the inputs required for service execution and the outputs returned once it has finished. As an example, a service requester might say, "Solve the following mathematical operations using the execution path that contains no multiplication services [((3 × 3) + 4) − 1]." So what the user is actually saying is: solve [((3 + 3 + 3) + 4) − 1]. The addition service that realizes the multiplication will be put into a loop that adds 3 three times; the result will be added to 4, and then 1 will be subtracted to obtain the result, as stated in the equation. Prior to execution, the service may impose some limitations on the service requester; such restrictions are expressed in terms of assumptions (conditions about the state of the world). A service provider may state that the service requester has to have a certain amount of money in the bank prior to the execution of the service, or whatever other requirements are considered necessary as part of a concrete business logic. In case any of the execution's constituents fails to accomplish its goal (e.g., the network breaks down or the service provider's server breaks down), a recovery procedure must be applied to replace the failed service with another service or set of services capable of finishing the work. Once the composed service has been successfully executed, it might be registered in any of the available semantically enhanced repositories. By doing so, a new level of functionality is achieved, making already composed and successfully executed services available for reuse.
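The composition and execution steps of the running example can be pictured with the following Python sketch. The service stubs, the composer's domain-knowledge rule, and the fixed execution plan are all invented for illustration; in a real setting the "services" would be remote invocations selected and arranged by the composer.

# Sketch of the running example: no multiplication service is used, and the
# domain knowledge that repeated addition realizes multiplication lets the
# composer build an execution path for ((3 x 3) + 4) - 1.

def add_service(a, b):       # stand-in for a remote addition Web Service
    return a + b

def subtract_service(a, b):  # stand-in for a remote subtraction Web Service
    return a - b

def multiply_by_addition(a, b):
    """Domain knowledge: a x b realized as b invocations of the addition service."""
    total = 0
    for _ in range(b):
        total = add_service(total, a)
    return total

def execute_plan():
    step1 = multiply_by_addition(3, 3)   # 3 x 3 without a multiplication service
    step2 = add_service(step1, 4)        # (3 x 3) + 4
    return subtract_service(step2, 1)    # ((3 x 3) + 4) - 1

print(execute_plan())  # 12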
Current technology does not fully realize any of the parts of the Web Services' automation process. Among the reasons are:
• Lack of fully developed markup languages
• Lack of marked-up content and services
• Lack of semantically enhanced repositories
• Lack of frameworks that facilitate discovery, composition, and execution
• Lack of tools and platforms that allow semantic enrichment of current Web content
Essentially, the technology is not yet mature enough, and there is a long path to walk in order to make it a reality. Both academic and private initiatives are very interested in developing these technologies and are already making strong, active efforts to advance them. Among them, Sycara et al. [1999] presented a Web Service discovery initiative using matchmaking based on a representation for annotating agent capabilities so that they can be located and brokered. Sirin et al. [2003] have developed a prototype that guides the user in the dynamic composition of Web Services. The Web Service Modeling Ontology (WSMO), an initiative carried out by the next Web generation research group at Innsbruck University, aims at providing a conceptual model for describing the various aspects of Semantic Web Services [WSMO].
31.4 Relevant Frameworks Various initiatives aim to provide a full-fledged modeling framework for Web Services that allows their automatic discovery, composition, and execution. Among them, the most relevant ones are WSMF and WS-CAF. The benefits that can be derived from these developments will constitute a significant evolution in the way the Web and e-business are understood, providing the appropriate conceptual model for developing and describing Web Services and their assembly.
31.4.1 WSMF

The Web Service Modeling Framework (WSMF) is an initiative towards the realization of a full-fledged modeling framework for Semantic Web Services that counts on ontologies as its most important constituents. WSMF aims at providing a comprehensive platform for automatic Web Service discovery, selection, mediation, and composition of complex services; that is, to make Semantic Web Services a reality and to exploit their capabilities. The WSMF description given in this section is based on Fensel and Bussler [2002]. The WSMF specification is currently evolving, but none of the key elements presented here is likely to undergo a major change.

31.4.1.1 WSMF Objectives

The WSMF objectives are as follows. Automated discovery includes the means to mechanize the task of finding and comparing different vendors and their offers by using machine-processable semantics. Data mediation involves ontologies to facilitate better mappings among the enormous variety of data standards. Process mediation provides mechanized support to enable partners to cooperate despite differences in business logics, which are numerous and heterogeneous.

31.4.1.2 WSMF Principles

WSMF revolves around two complementary principles, namely (1) strong decoupling, where the different components that realize an e-commerce application should be as disaggregated as possible, hiding internal business intelligence from public access, allowing the composition of processes based on their publicly available interfaces, and carrying out communication among processes by means of public message exchange protocols; and (2) strong mediation, which will enable scalable communication, allowing anybody to speak with everybody. To allow this m:m communication style, terminologies should be aligned and interaction styles should be mediated. In order to achieve such principles, a mapping among different business logics, together with the ability to establish the difference between the public processes and the private processes of complex Web Services, are key characteristics the framework should support. Mediators provide such mapping functionality and allow expressing the difference between publicly visible workflows and internal business logics.

31.4.1.3 WSMF Elements

WSMF consists of four main elements: (1) ontologies that provide the terminology used by the other elements, (2) goal repositories that define the tasks to be solved by Web Services, (3) Web Services as descriptions of functional and nonfunctional characteristics, and (4) mediators that bypass interoperability problems. A more detailed explanation of each element is given below.

Ontologies. They interweave human understanding with machine processability. In WSMF, ontologies provide a common vocabulary used by the other elements in the framework. They enable reuse of terminology, as well as interoperability between components referring to the same or linked terminology.

Goal repositories. These specify possible objectives a service requester may have when consulting a repository of services. A goal specification consists of two elements:
• Preconditions. Conditions that must hold prior to the service execution and enable it to provide the service.
• Postconditions. Conditions that hold after the service execution and that describe what happens when a service is invoked.
Because a Web Service can actually achieve different goals (e.g., Amazon can be used to buy books, but also as an information broker for bibliographic information about books), Web Service descriptions and the goals they are able to achieve are kept separate, allowing an n:m mapping among services and goals. Conversely, a goal can be achieved by different and possibly competing Web Services. Keeping goal specifications separate from Web Service descriptions enhances the discovery phase, as it enables goal-based search for Web Services instead of functionality-based search.
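A minimal sketch of this separation is shown below: goals, with their pre- and postconditions, live in their own repository, and an n:m table links them to services. The goal names, condition strings, and service identifiers are invented for the sketch.

# Illustrative sketch: goal repository kept separate from service descriptions,
# with an n:m mapping used for goal-based discovery.

GOALS = {
    "BuyBook":        {"pre": ["has_credit_card"], "post": ["book_shipped"]},
    "GetBiblioInfo":  {"pre": [],                  "post": ["biblio_record_returned"]},
}

# One service can achieve several goals; one goal may be achievable by
# several (competing) services.
SERVICE_GOALS = {
    "amazon-ws":  ["BuyBook", "GetBiblioInfo"],
    "library-ws": ["GetBiblioInfo"],
}

def services_for(goal):
    """Goal-based search: return every service advertised as achieving the goal."""
    return [s for s, goals in SERVICE_GOALS.items() if goal in goals]

print(services_for("GetBiblioInfo"))  # ['amazon-ws', 'library-ws']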
Web Services. In WSMF the complexity of a Web Service is measured in terms of its externally visible description, contrary to the approach followed by most description languages, which differentiate services based on the complexity of their functionality and on whether they can be broken into different pieces or not. Under traditional conventions, a complex piece of software such as an inference engine with a rather simplistic interface can be defined as elementary, whereas a much simpler software product such as an e-banking aggregator that can be broken down into several Web Services is considered complex. This reformulation of the definition, which may look trivial, has some relevant consequences:
• Web Services are not described themselves; rather, their interfaces, which can be accessed via a network, are described. By these means service providers can hide their business logic, which usually is reflected in the services they offer.
• The complexity of a Web Service description provides a scale of complexity, which begins with some basic description elements and gradually increases the description density by adding further means to portray various aspects of the service.

As can be inferred from the previous paragraph, the framework describes services as black boxes, i.e., it does not model the internal aspects of how a service is achieved, hiding all business logic aspects. The black box description used by WSMF consists of the following main elements:
• Web Service name. Unique service identifier used to refer to it.
• Goal reference. Description of the objective, which can be stored in a goal repository.
• Input and output data. Description of the data structures required to provide the service.
• Error data. Indicates problems or error states.
• Message exchange protocol. Provides the means to deal with different types of networks and their properties, providing an abstraction layer.
• Nonfunctional parameters. Parameters that describe the service, such as execution time, price, location, or vendor.
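As a concrete, non-normative illustration, the elements just listed can be captured in a simple data structure. The Python rendering below is a sketch; the field names and the example service are assumptions of this sketch rather than part of the WSMF specification.

# Minimal sketch of a WSMF-style black-box service description as a data record.
from dataclasses import dataclass, field

@dataclass
class BlackBoxServiceDescription:
    name: str                                            # unique service identifier
    goal_reference: str                                   # objective stored in a goal repository
    inputs: dict = field(default_factory=dict)            # required input data structures
    outputs: dict = field(default_factory=dict)           # produced output data structures
    error_data: list = field(default_factory=list)        # problem / error states
    message_exchange_protocol: str = "unspecified"        # network abstraction layer
    non_functional: dict = field(default_factory=dict)    # price, execution time, vendor, ...

solver = BlackBoxServiceDescription(
    name="equation-solver-ws",
    goal_reference="SolveEquation",
    inputs={"equation": "string"},
    outputs={"solution": "string"},
    non_functional={"price": 0.05, "max_response_ms": 200},
)
print(solver.goal_reference)  # SolveEquation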
In addition to this basic Web Service description, WSMF considers other properties of the service, such as:
1. Failure. When an error occurs affecting one of the invoked elements of a service and recovery is not possible, information about the reason for the failure must be provided.
2. Concurrent execution. If necessary, it should allow the parallel execution of different Web Services as a realization of the functionality of a particular one.
3. Concurrent data input and output. In case input data is not available while a service is executing, it enables providing such input at a later stage and, if required, passing it from one invoker to another until reaching the actual requester.
4. Dynamic service binding. In case a service is required to invoke others to provide its service, a new proxy call is declared; the proxy allows referring to a service without knowing in advance to which concrete service it is bound.
5. Data flow. Refers to the concrete proxy ports to which data has to be forwarded.
6. Control flow. Defines the correct execution sequence among two or more services.
7. Exception handling. Upon failure, services may return exception codes; if this is the case, the means to handle a concrete exception must be defined.
8. Compensation. Upon failure of an invoked Web Service, a compensation strategy that specifies what to do can be defined.

Mediators. These deal with service requester and provider heterogeneity, performing the necessary operations to enable full interoperation among them. A mediator uses a peer-to-peer approach by means of a third party, facilitating in this way a higher degree of transparency and better scalability for both requester and provider. Mediators facilitate coping with the inherent heterogeneity of Web-based computing environments, which are flexible and open by nature. This heterogeneity refers to:
• Data mediation. In terms of data representation, data types, and data structures.
• Business logics mediation. Compensates for mismatches in business logics.
• Message protocols mediation. Deals with protocol heterogeneity.
• Dynamic service invocation mediation. In terms of cascading Web Service invocation; it can be done in a hard-wired way, but it can also be made more flexible by referring to certain (sub)goals.

These main elements of the framework, together with message-understanding and message-exchange-protocol layers, are put together to provide automatic Web Service discovery, selection, mediation, and composition of complex services.
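The data-mediation role in particular can be pictured with a very small sketch: a mediator that maps the field names a requester uses onto those a provider expects. The field names and mapping below are hypothetical and stand in for the ontology-backed mappings WSMF envisions.

# Illustrative sketch of data mediation: translate a requester's record into
# the representation a provider advertises, via a third-party mapping table.

FIELD_MAP = {"family_name": "surname", "given_name": "first_name"}

def mediate(requester_record, field_map=FIELD_MAP):
    """Rename fields so the provider receives the structure it expects."""
    return {field_map.get(key, key): value for key, value in requester_record.items()}

print(mediate({"family_name": "Doe", "given_name": "Jane"}))
# {'surname': 'Doe', 'first_name': 'Jane'}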
31.4.2 WS-CAF

The Web Service Composite Application Framework (WS-CAF) [Bunting et al., 2003] represents another initiative that tries to address the application-composition problem with the development of a framework that provides the means to coordinate long-running business processes in an architecture- and transaction-model-independent way. WS-CAF has been designed with the aim of solving the problems that derive from the combined use of Web Services to support information sharing and transaction processing. The discussion of WS-CAF is based on a draft of the specification made available on July 28, 2003, by Arjuna Technologies, Fujitsu Software, IONA Technologies, Oracle, and Sun Microsystems. Due to the draft nature of the specification, it is very likely that changes will occur, even though the main elements may remain stable.

31.4.2.1 WS-CAF Objectives

The main objectives of WS-CAF are: (1) interoperability, to support various transaction models across different architectures; (2) complementarity, to accompany and support current state-of-the-art business process description language standards such as BPEL [Andrews et al., 2002], WSCI [Arkin et al., 2002], or BPML [Arkin, 2002] to compose Web Services; (3) compatibility, to make the framework capable of working with existing Web Service standards such as UDDI [Bellwood et al., 2002], WSDL [Christensen et al., 2001], or WS-Security [Atkinson et al., 2002]; and (4) flexibility, based on a stack architecture, to support the specific level of service required by a combination of Web Services.

31.4.2.2 WS-CAF Principles and Terminology

The framework definition is based on two principles, namely: (1) interrelation, or how participants share information and coordinate their efforts to achieve predictable results despite failure; and (2) cooperation to accomplish shared purposes. Web Services cooperation can range from performing operations over a shared resource to their own execution in a predefined sequence. The WS-CAF terminology makes use of the concepts of participant, context, outcome, and coordinator throughout the specification:
• Participants. Cooperating Web Services that take part in the achievement of a shared purpose.
• Context. Allows storing and sharing information relevant to the participants in a composite process to enable work correlation. This data structure includes information such as the identification of shared resources, the collection of results, and common security information.
• Outcome. Summary of the results obtained by the execution of the cooperating Web Services.
• Coordinator. Responsible for reporting to participants about the outcome of the process, for context management, and for persisting participants' outcomes.

31.4.2.3 WS-CAF Elements

The framework consists of three main elements: (1) Web Services Context (WS-CTX) represents the basic processing unit of the framework, allowing multiple cooperating Web Services to share a common context; (2) the Web Services Coordination Framework (WS-CF) builds on top of WS-CTX and distributes organizational information relevant to the activity to participants, allowing them to customize the coordination protocol that best suits their needs; and (3) Web Services Transaction Management (WS-TXM) represents
a layer on top of WS-CTX and WS-CF, defining transaction models for the different types of B2B interaction [Bunting et al., 2003].

31.4.2.4 WS-CTX

WS-CTX is a lightweight mechanism that allows multiple Web Services participating in an activity to share a common context. It defines the links among Web Services through their association with a Web Service Context Service, which manages the shared context for the group. WS-CTX defines the context, the scope of the context sharing, and basic rules for context management [Bunting et al., 2003a]. WS-CTX allows stating starting and ending points for activities, provides registry facilities to control which Web Services are taking part in a concrete activity, and disseminates context information. WS-CTX is composed of three main elements, namely:
• Context Service: Defines the scope of an activity and how information about the context can be referenced and propagated.
• Context: Defines basic information about the activity structure. It contains information on how to relate multiple Web Services to a particular activity. The maintenance of contexts and their association with execution environments is carried out by the Context Service, which keeps a repository of contexts.
• The Activity Lifecycle Service (ALS): An extension of the Context Service, which facilitates the activity's enhancement with higher-level interfaces. Whenever a context is required for an activity and it does not exactly suit the necessities of the particular application domain, the Context Service issues a call to the registered ALS, which provides the required addition to the context.

Essentially, WS-CTX allows the definition of activities with regard to Web Services, provides the means to relate Web Services to one another with respect to a particular activity, and defines Web Services mappings onto the environment.

31.4.2.5 WS-CF

WS-CF is a sharable mechanism that allows management of lifecycles and context augmentation, guarantees message delivery, and coordinates, by means of outcome messages, the interactions of the Web Services that participate in a particular transaction [Bunting et al., 2003b]. WS-CF allows the definition of starting and ending points for coordinated activities, the definition of points where coordination should take place, the registration of participants for a concrete activity, and the propagation of coordination information to activity participants. The WS-CF specification has three main architectural components, namely:
• Coordinator: Provides the means to register participants for a concrete activity.
• Participant: Specifies the operation(s) performed as part of the coordination sequence processing.
• Coordination Service: Determines a processing pattern used to define the behavior of a particular coordination model.

In a nutshell, WS-CF defines the core infrastructure for the Web Services Coordination Service, provides means to define Web Services mappings onto the environment, delineates infrastructure support, and finally allows concretizing the responsibilities of the different WS-CF subcomponents.

31.4.2.6 WS-TXM

WS-TXM comprises protocols that facilitate support for various transaction processing models, providing interoperability across multiple transaction managers by means of different recovery protocols [Bunting et al., 2003c].
WS-TXM provides the core infrastructure for the Web Services Transaction Service, facilitates the means to define Web Services mappings onto the environment, defines an infrastructure to support an event communication mechanism, and finally establishes the roles and responsibilities of the WS-TXM components.
Roughly speaking, WS-CAF enables the sharing of a Web Service's context viewed as an independent resource; provides a neutral and abstract transaction protocol to map to existing implementations; presents an application transaction level dependent upon application needs; includes a layered architecture that allows applications to use the level of service needed, not imposing more functionality than required; and offers a great degree of interoperability, thanks to the use of vendor-neutral protocols.
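The participant/context/outcome/coordinator terminology of Section 31.4.2.2 can be illustrated with the toy Python classes below. They are not the WS-CAF interfaces; the class and method names are invented, and the "participants" are local stubs standing in for remote Web Services.

# Toy illustration of shared context, registered participants, and a
# coordinator that collects and reports outcomes.

class Context:
    def __init__(self, activity_id):
        self.activity_id = activity_id
        self.shared = {}                 # shared resources, security info, ...

class Participant:
    def __init__(self, name):
        self.name = name
    def finish(self, context):
        return "done within activity " + context.activity_id

class Coordinator:
    def __init__(self, context):
        self.context = context
        self.participants = []           # cooperating Web Services
        self.outcomes = {}
    def register(self, participant):
        self.participants.append(participant)
    def complete(self):
        for p in self.participants:
            self.outcomes[p.name] = p.finish(self.context)
        return self.outcomes             # summary reported back to participants

ctx = Context("order-4711")
coord = Coordinator(ctx)
coord.register(Participant("inventory-ws"))
coord.register(Participant("payment-ws"))
print(coord.complete())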
31.4.3 Frameworks Comparison

Both of the initiatives introduced so far present a solution for composing applications out of multiple services, using different approaches and currently at different stages of development. Whereas WSMF adopts a more formal philosophy that focuses on how Web Services should be described to achieve composition based on the paradigms of the Semantic Web (providing extensive coverage of the different aspects of the field), WS-CAF takes a more hands-on approach to the same problem from the point of view of service execution and its requirements. WS-CAF puts special emphasis on dealing with failure, context management, transaction support, and effort coordination, whereas WSMF uses ontologies as the pivotal element to support its aim of scalable discovery, selection, mediation, and composition. WS-CAF already offers a layered architecture for service execution, which reflects a more mature state of development, whereas WSMF efforts have focused on establishing the pillars of what a complete framework to model Semantic Web Services should look like, providing a broader coverage of the subject. Both initiatives count on the support of major software vendors, which will result in the development of a solid technology with a clear business- and user-driven aim.

Table 31.1 summarizes the functionality provided by both frameworks regarding their automation support. As a comment, mediation must account for process, protocol, data, and service invocation mediation, whereas execution must take care of failure, context management, transaction support, effort coordination, concurrent execution, concurrent input and output, exception handling, and compensation strategies in case of failure.

The Semantic Web is the future of the Web. It counts on ontologies as a key element to describe services and provide a common understanding of a domain, and Semantic Web Services are its killer application. The development foreseen for this technology presents WSMF as the stronger candidate from an impact and visibility point of view, as well as in the long term, due to its use of the paradigms of the Semantic Web. WS-CAF does not use ontologies in its approach, but puts strong emphasis on the coordination of multiple and possibly incompatible transaction processing models across different architectures, which, in the short term, will bridge a technology gap and enable the cooperation of both frameworks in the near future.
TABLE 31.1 Summary of Intended Purpose of WSMF and WS-CAF

Framework    Automation Support
WSMF         Discovery, Selection, Mediation, Composition, Execution
WS-CAF       Mediation, Composition, Execution
31.5 Epistemological Ontologies for Describing Services

Upper-level ontologies supply the means to describe the content of online information sources. In particular, regarding Web Services, they provide the means to mark services up, describing their capabilities and properties in an unambiguous, computer-interpretable form. In the coming sections, a description of DAML-S and its relation to OWL-S is presented.
31.5.1 DAML-S and OWL-S

The Defense Advanced Research Projects Agency (DARPA) Agent Markup Language (DAML) for Services (DAML-S) [DAML-S 2003] is a collaborative effort by BBN Technologies, Carnegie Mellon University, Nokia, Stanford University, SRI International, and Yale University to define an ontology for the semantic markup of Web Services. DAML-S sits on top of WSDL at the application level and allows the description of knowledge about a service in terms of what the service does, as well as why and how it functions, beyond the messages exchanged across the wire between service participants [DAML-S 2003]. The aim of DAML-S is to make Web Services computer-interpretable, enabling their automated use. DAML-S releases up to 0.9 have been built upon DAML+OIL, but to ensure a smooth transition to OWL (Web Ontology Language), the 0.9 release and subsequent ones are also based on OWL. Roughly speaking, DAML-S refers to the ontology built upon DAML+OIL, whereas OWL-S refers to that built upon OWL.
31.5.2 DAML-S Elements DAML-S ontology allows the definition of knowledge that states what the service requires from agents, what it provides them with, and how. To answer such questions, it uses three different elements: (1) the Service Profile, which facilitates information about the service and its provider to enable its discovery, (2) the Service Model, which makes information about how to use the service available, and (3) Service Grounding, which specifies how communications among participants are to be carried on and how the service will be invoked. Service Profile. The Service Profile plays a dual role; service providers use it to advertise the services they offer, whereas service requesters can use it to specify their needs. It presents a public high-level description of the service that states its intended purpose in terms of: (1) the service description, information presented to the user, browsing service registries, about the service and its provider that helps to clarify whether the service meets concrete needs and constraints such as security, quality requirements, and locality; (2) functional behavior, description of duties of the service; and (3) functional attributes, additional service information such as time response, accuracy, cost, or classification of the service. Service Model. This allows a more detailed analysis of the matching among service functionalities and user needs, enabling service composition, activity coordination and execution monitoring. It permits the description of the functionalities of a service as a process, detailing control and data flow structures. The Service Model includes two main elements, namely: (1) Process Ontology, which describes the service in terms of inputs (information necessary for process execution), outputs (information that the process provides), preconditions (conditions that must hold prior the process execution), effects (changes in the world as a result of the execution of the service) and, if necessary, component subprocess; and (2) Process Control Ontology that describes process in terms of its state (activation, execution and completion). Processes can have any number of inputs, outputs, preconditions, and effects. Both outputs and effects can have associated conditions. The process ontology allows the definition of atomic, simple, and composite processes:
• Atomic processes. They are directly invocable, have no subprocesses, and execute in a single step from the requester's perspective.
• Simple processes. They are not directly invocable and represent a single step of execution. They are intended as elements of abstraction that simplify the representation of composite processes or allow different views of an atomic process.
• Composite processes. They decompose into other processes, either composite or noncomposite, by means of control constructs.

Service Grounding. The Service Grounding specifies details regarding how to invoke the service (protocol, message format, serialization, transport, and addressing). The grounding is defined as a mapping from the abstract service description to its concrete realization, expressing the inputs and outputs of atomic processes by means of WSDL and SOAP; it shows how the inputs and outputs of an atomic process are realized as messages that carry them.

In brief, DAML-S is an upper ontology used to describe Web Services, and it includes elements intended to provide automated support for the tasks of Semantic Web Services. Table 31.2 summarizes the facilities covered by each upper-level concept in the DAML-S ontology.

The following example (adapted from Ankolekar et al. [2002]) shows how to use the DAML-S ontology to describe the pieces of software that make up a service and how to define the grounding of each of these basic processes. The example presents a Web aggregation service. Given the user's login, password, and the area of interest for which the aggregation is to be performed, it consolidates information disseminated over different Web sources (i.e., banks in which the user has an account, telephone companies the user works with, and news services to which the user has subscribed), and presents it without requiring the user to log in to and browse each of the sources in search of the desired piece of information. First, the service must be described in terms of the different constituents that comprise it, specifying the type of each process (atomic, simple, composite). In this case, a description of the news aggregation service as an atomic process is provided.
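Such a declaration can be sketched as follows; the entity reference &process; and the identifiers used here are illustrative assumptions rather than the exact markup of the published example:

<!-- Sketch only: declares the news aggregation step as an atomic DAML-S process.
     The entity reference &process; (the DAML-S process ontology) and the class
     name NewsAggregation are assumptions for illustration. -->
<daml:Class rdf:ID="NewsAggregation">
  <rdfs:comment>Consolidates news from the sources the user has subscribed to</rdfs:comment>
  <rdfs:subClassOf rdf:resource="&process;#AtomicProcess"/>
</daml:Class>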
Then, the set of properties associated with each of the programs of the service must be defined. An input for the news aggregation service could be the language in which the gathered news is written.
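In the DAML-S 0.9 release, such an input is modeled as a subproperty of the process ontology's input property. A sketch of the corresponding declaration follows; the property name, domain, and range are assumptions for illustration:

<!-- Sketch only: a "news language" input attached to the atomic process above. -->
<rdf:Property rdf:ID="newsLanguage">
  <rdfs:subPropertyOf rdf:resource="&process;#input"/>
  <rdfs:domain rdf:resource="#NewsAggregation"/>
  <rdfs:range rdf:resource="&xsd;#string"/>
</rdf:Property>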
TABLE 31.2 Summary of Intended Purpose of DAML-S Upper Level Concepts

Upper level concept   Automation Support
Profile               Discovery
Model                 Planning, Composition, Interoperation, Execution monitoring
Grounding             Invocation
Next, the grounding must be defined and related to the service constituent. For this purpose a restriction is used, establishing that the NewsAggregation program has a grounding identified by the name NewsAggregationGrounding. Basically, we are stating that every instance of the class has a hasGrounding property whose value is NewsAggregationGrounding.
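Expressed in DAML+OIL, such a restriction can be sketched as follows; the namespace entity &service; is an assumption, while the class, property, and instance names follow the example:

<!-- Sketch only: every NewsAggregation instance has the grounding
     NewsAggregationGrounding as the value of its hasGrounding property. -->
<daml:Class rdf:about="#NewsAggregation">
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="&service;#hasGrounding"/>
      <daml:hasValue rdf:resource="#NewsAggregationGrounding"/>
    </daml:Restriction>
  </rdfs:subClassOf>
</daml:Class>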
Finally, an example of a DAML-S grounding instance is presented. It is important to notice that URIs such as #ConsolidateNews and #NewsAggregationInput correspond to constructs in a WSDL document (http://service.com/aggregation/newsAggregation.wsdl), which is not shown here; the grounding also references the WSDL specification (http://www.w3.org/TR/2001/NOTE-wsdl-20010315) and the SOAP binding namespaces (http://schemas.xmlsoap.org/wsdl/soap/ and http://schemas.xmlsoap.org/soap/http/).
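The skeleton of such an instance can be sketched roughly as follows. The grounding property names are modeled on the DAML-S 0.9 release and the identifiers on the WSDL document mentioned above, so the exact markup should be read as an assumption rather than as the published listing:

<!-- Sketch only: a grounding instance tying the atomic process to WSDL constructs.
     The referenced WSDL document follows the WSDL 1.1 note and uses the SOAP/HTTP
     binding namespaces listed above. -->
<grounding:WsdlGrounding rdf:ID="NewsAggregationGrounding">
  <grounding:wsdlDocument rdf:resource="http://service.com/aggregation/newsAggregation.wsdl"/>
  <grounding:wsdlOperation rdf:resource="#ConsolidateNews"/>
  <grounding:wsdlInputMessage rdf:resource="#NewsAggregationInput"/>
  <grounding:wsdlInputMessageParts>
    <!-- ... message map elements relating DAML-S inputs to WSDL message parts ... -->
  </grounding:wsdlInputMessageParts>
  <grounding:wsdlOutputMessage rdf:resource="#NewsAggregationOutput"/>
  <grounding:wsdlOutputMessageParts>
    <!-- ... similar to wsdlInputMessageParts ... -->
  </grounding:wsdlOutputMessageParts>
</grounding:WsdlGrounding>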
31.5.3 Limitations

In its early beta stage, DAML-S and OWL-S can be seen as a Gruyere cheese with lots of holes to fill [Payne, 2003]. The ontology has numerous limitations that remain to be overcome:

• One-to-one mapping between Profiles and Service Models. This prevents the reuse of profiles and n:m mappings between profiles and concrete services.
• No separation between public and private business logic. The internals of the service cannot be hidden, exposing its business logic to requesters.
• Lack of a conversation interface definition. The service model could be used as the conversational interface if the service is published as a composite service, but this is not stated as the intended purpose of the service model.
• Interface overloading. The defined WSDL grounding does not support publishing one single service interface for a service accepting different inputs.
• Preconditions and effects. No means to describe preconditions and effects are given, and the definitions of these elements are not clear enough.
• Constructs. The constructs defined for the process model may not be sufficient.

The path followed by this standard seems to be the correct one. Much effort is being put into overcoming the current problems and limitations, which will undoubtedly be solved in the near future, providing a solid standard to consistently describe Semantic Web Services.
31.6 Summary

The Semantic Web is here to stay. It represents the next natural step in the evolution of the current Web. It will have a direct impact on areas such as e-Business, Enterprise Application Integration, and Knowledge Management, and an indirect impact on many other applications affecting our daily life. It will help create emerging fields where knowledge is the most precious value, and will help to further the development of existing ones. The Semantic Web will create a completely new concept of the Web by extending the current one, alleviating the information-overload problem, and bringing computers back to their intended use as computational devices rather than mere information-rendering gear.

Ontologies are the backbone of this revolution due to their potential for interweaving human understanding of symbols with machine processability [Fensel, 2002]. Ontologies will enable the semantic enhancement of Web Services, providing the means that facilitate the task-driven automatic discovery, composition, and execution of interorganizational business logic, thus making Semantic Web Services the killer application of the Semantic Web. A shared functionality description is the key element towards this goal. It will permit (1) the location of different services that, alone or in combination, provide the means to solve a given task; (2) the combination of services to achieve a goal; and (3) the replacement of such services by equivalent ones that, alone or in combination, can realize the same functionality, for example in case of failure during execution.

Many business areas are giving a warm welcome to the Semantic Web and Semantic Web Services. Early adopters in the fields of biotechnology and medicine are already aware of its potential to organize knowledge and infer conclusions from available data. The juridical field has already realized its benefits for organizing and managing large amounts of knowledge in a structured and coherent way; the interest of professionals from this sector in the Semantic Web is rapidly gaining momentum, and it is forecast as an area where the Semantic Web will make a difference. Engineers, scientists, and basically anyone dealing with information will soon realize the benefits of Semantic Web Services due to the new business paradigm they provide. Business services will be published on the Web and marked up with machine-processable semantics to allow their discovery and composition, providing a higher level of functionality,
adding increasingly complex layers of services, and relating the business logic of different companies in a simple and effective way.

The hardest problem Semantic Web Services must overcome is their early stage of development. Today, the technology is not mature enough to fulfill its promises. Initiatives in Europe and the U.S. are gaining momentum, and the appropriate infrastructure, technology, and frameworks are currently being developed. Roughly speaking, there is a gap between the current Web and the Semantic Web in terms of annotations: pages and Web resources must be semantically enriched in order to allow automatic Web Service interoperation. Once this is solved, the Semantic Web and Semantic Web Services will become a plausible reality and the Web will be lifted to its full potential, causing a revolution in human information access. The way computers are perceived and the Web is understood will be completely changed, and the effects will be felt in every detail of our daily life.
Acknowledgements

The authors would like to thank SungKook Han, Holger Lausen, Michael Stolberg, Jos de Brujin, and Anna Zhdanova for the many hours spent discussing the different issues, which greatly helped to make this work possible. Thanks also to Alexander Bielowski for his feedback on the English writing.
References

[Andrews et al., 2002] Tony Andrews, Francisco Curbera, Hitesh Dholakia, Yaron Goland, Johannes Klein, Frank Leymann, Kevin Liu, Dieter Roller, Doug Smith, Ivana Trickovic, and Sanjiva Weerawarana. Business Process Execution Language for Web Services (BPEL4WS) 1.1. http://www.ibm.com/developerworks/library/ws-bpel, August 2002.
[Ankolekar et al., 2002] Anupriya Ankolekar, Mark Burstein, Jerry R. Hobbs, Ora Lassila, David Martin, Drew McDermott, Sheila A. McIlraith, Srini Narayanan, Massimo Paolucci, Terry Payne, and Katya Sycara. DAML-S: Web Service Description for the Semantic Web. International Semantic Web Conference (ISWC 2002), 2002.
[Arkin, 2002] Assaf Arkin. Business Process Modelling Language. http://www.bpmi.org/, 2002.
[Arkin et al., 2002] Assaf Arkin, Sid Askary, Scott Fordin, Wolfgang Jekeli, Kohsuke Kawaguchi, David Orchard, Stefano Pogliani, Karsten Riemer, Susan Struble, Pal Takacsi-Nagy, Ivana Trickovic, and Sinisa Zimek. Web Service Choreography Interface 1.0. http://wwws.sun.com/software/xml/developers/wsci/wsci-spec-10.pdf, 2002.
[Atkinson et al., 2002] Bob Atkinson, Giovanni Della-Libera, Satoshi Hada, Maryann Hondo, Phillip Hallam-Baker, Johannes Klein, Brian LaMacchia, Paul Leach, John Manferdelli, Hiroshi Maruyama, Anthony Nadalin, Nataraj Nagaratnam, Hemma Prafullchandra, John Shewchuk, and Dan Simon. Web Services Security (WS-Security) 1.0. http://www-106.ibm.com/developerworks/library/wssecure, 2002.
[Bellwood et al., 2002] Tom Bellwood, Luc Clément, David Ehnebuske, Andrew Hately, Maryann Hondo, Yin Leng Husband, Karsten Januszewski, Sam Lee, Barbara McKee, Joel Munter, and Claus von Riegen. UDDI Version 3.0. Published Specification. http://uddi.org/pubs/uddi-v3.00-published20020719.htm, 2002.
[Berners-Lee et al., 2001] Tim Berners-Lee, James Hendler, and Ora Lassila. The Semantic Web. Scientific American, 284(5): 34–43, 2001.
[Box et al., 2000] Don Box, David Ehnebuske, Gopal Kakivaya, Andrew Layman, Noah Mendelsohn, Henrik F. Nielsen, Satish Thatte, and Dave Winer. Simple Object Access Protocol (SOAP) 1.1. http://www.w3.org/TR/SOAP/, 2000.
[Bunting et al., 2003] Doug Bunting, Martin Chapman, Oisin Hurley, Mark Little, Jeff Mischkinsky, Eric Newcomer, Jim Webber, and Keith Swenson. Web Service Composite Application Framework (WS-CAF). http://developers.sun.com/techtopics/Webservices/wscaf/primer.pdf, 2003.
[Bunting et al., 2003a] Doug Bunting, Martin Chapman, Oisin Hurley, Mark Little, Jeff Mischkinsky, Eric Newcomer, Jim Webber, and Keith Swenson. Web Service Context (WS-Context). http://developers.sun.com/techtopics/Webservices/wscaf/wsctx.pdf, 2003.
[Bunting et al., 2003b] Doug Bunting, Martin Chapman, Oisin Hurley, Mark Little, Jeff Mischkinsky, Eric Newcomer, Jim Webber, and Keith Swenson. Web Service Coordination Framework (WS-CF). http://developers.sun.com/techtopics/Webservices/wscaf/wscf.pdf, 2003.
[Bunting et al., 2003c] Doug Bunting, Martin Chapman, Oisin Hurley, Mark Little, Jeff Mischkinsky, Eric Newcomer, Jim Webber, and Keith Swenson. Web Service Transaction Management (WS-TXM). http://developers.sun.com/techtopics/Webservices/wscaf/wstxm.pdf, 2003.
[Christensen et al., 2001] Erik Christensen, Francisco Curbera, Greg Meredith, and Sanjiva Weerawarana. Web Services Description Language (WSDL) 1.1. http://www.w3.org/TR/wsdl, 2001.
[Daconta et al., 2003] Michael C. Daconta, Leo J. Obrst, and Kevin T. Smith. The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management. John Wiley and Sons, Indianapolis, IN, 2003.
[DAML-S, 2003] The DAML Services Coalition. DAML-S: Semantic Markup for Web Services (version 0.9). http://www.daml.org/services/daml-s/0.9/daml-s.pdf, 2003.
[DAML] DARPA Agent Markup Language (DAML). www.daml.org.
[Ding et al., 2003] Ying Ding, Dieter Fensel, and Hans-Georg Stork. The Semantic Web: from concept to percept. Austrian Artificial Intelligence Journal (OGAI), 2003, in press.
[DIP] Data, Information and Process Integration with Semantic Web Services. http://dip.semanticweb.org/.
[ebXML] Electronic Business XML (ebXML). www.ebxml.org/specs.
[Esperonto] Esperonto. esperonto.semanticweb.org.
[Fensel, 2001] Dieter Fensel. Ontologies: Silver Bullet for Knowledge Management and Electronic Commerce. Springer-Verlag, Berlin, 2001.
[Fensel, 2002] Dieter Fensel. Semantic Enabled Web Services. XML-Web Services ONE Conference, June 7, 2002.
[Fensel and Bussler, 2002] Dieter Fensel and Christoph Bussler. The Web Service Modeling Framework WSMF. Electronic Commerce Research and Applications, 1(2), 2002.
[Gruber, 1993] Thomas R. Gruber. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, 5: 199–220, 1993.
[HPKB] High Performance Knowledge Bases (HPKB). reliant.teknowledge.com/HPKB.
[KnowledgeWeb] Knowledge Web. knowledgeweb.semanticweb.org.
[Martin, 2001] James Martin. Web Services: The Next Big Thing. XML Journal, 2, 2001. http://www.syscon.com/xml/archivesbad.cfm.
[McIlraith et al., 2001] Sheila A. McIlraith, Tran C. Son, and Honglei Zeng. Semantic Web Services. IEEE Intelligent Systems, 16(2), March/April 2001.
[Newcomer, 2002] Eric Newcomer. Understanding Web Services: XML, WSDL, SOAP, UDDI. Addison Wesley, Reading, MA, 2002.
[OntoKnowledge] OntoKnowledge. www.ontoknowledge.org.
[OntoWeb] OntoWeb. www.ontoweb.org.
[Payne, 2003] Terry Payne. The First European Summer School on Ontological Engineering and the Semantic Web. http://minsky.dia.fi.upm.es/sssw03, Cercedilla, Spain, July 21–26, 2003.
[Peer, 2002] Joachim Peer. Bringing Together Semantic Web and Web Services. First International Semantic Web Conference, Sardinia, Italy, June 2002.
[RKF] Rapid Knowledge Formation (RKF). reliant.teknowledge.com/RKF.
[SEKT] Semantic Knowledge Technologies. sekt.semanticweb.org.
[Sirin et al., 2003] Evren Sirin, James Hendler, and Bijan Parsia. Semi-automatic Composition of Web Services using Semantic Descriptions. Proceedings of the 1st Workshop on Web Services: Modeling, Architecture and Infrastructure (WSMAI-2003), in conjunction with ICEIS 2003, Angers, France, pp. 17–24, April 2003.
[Sycara et al., 1999] Katya Sycara, Matthias Klusch, and Seth Widoff. Dynamic service matchmaking among agents in open information environments. ACM SIGMOD Record, 28(1): 47–53, March 1999.
[SWWS] Semantic Web Enabled Web Services. swws.semanticweb.org.
[WSMO] Web Service Modeling Ontology. http://www.wsmo.org/.
[Tidwell, 2000] Doug Tidwell. Web Services: the Web's Next Revolution. http://www-106.ibm.com/developerworks/edu/ws-dw-wsbasics-i.html.
[W3C] World Wide Web Consortium (W3C). www.w3.org/2001/sw.
32 Business Process: Concepts, Systems, and Protocols

Fabio Casati
Akhil Sahai

CONTENTS
32.1 Introduction
    32.1.1 The Need for Business Process Automation
    32.1.2 Workflow Management Systems Overview
    32.1.3 Scheduling
    32.1.4 Resource Assignment
    32.1.5 Data Management
    32.1.6 Failure and Exception Handling
    32.1.7 WfMS Architectures
32.2 Web Services and Business Processes
    32.2.1 Web Services
    32.2.2 Web Services and Business Process Orchestration
32.3 Conclusion
References
32.1 Introduction

Companies and organizations are constantly engaged in the effort of providing better products and services at lower costs, reducing the time to market, improving and customizing their relationships with customers and, ultimately, increasing customer satisfaction and the company's profits. These objectives push companies and organizations towards continuously improving the business processes that are performed in order to provide services or produce goods.

The term business process denotes a set of tasks, executed by human or automated resources according to some ordering criteria, that collectively achieves a certain business goal. For example, the set of steps required to process a travel reimbursement request constitutes a business process. A new execution of the process begins when an employee files a reimbursement request. At that point, an approving manager is selected, and the request is routed to him or her. If the manager approves the request, then the funds are transferred to the employee's bank account; otherwise a notification is sent to the employee informing him or her about the causes of the rejection. This process is graphically depicted in Figure 32.1. Of course, the process shown here is much simplified with respect to the way expense reimbursement requests are actually handled, but this example suffices to introduce the main concepts.

Many different business processes are executed within a company. Requesting quotes, procuring goods, processing payments, and hiring or firing employees are all examples of business processes. The term "business" denotes that these are processes that perform some business function, and this distinguishes them from operating system processes.
FIGURE 32.1 The travel expense reimbursement process.
32.1.1 The Need for Business Process Automation

Business processes are really at the heart of what a company does (Figure 32.2). Every activity of a company can actually be seen as a business process. This means that improving the quality and reducing the cost of the business processes is key to the success of a company. For example, if a company can have better and more effective supply chain operations, this will typically result in both higher revenues and higher profits.

FIGURE 32.2 Business processes are at the heart of what a company does.

From an information technology perspective, business process improvement and optimization are closely related to business process automation. In fact, if the different steps of the process (and even its overall execution) can be enacted and supervised in an automated fashion, there is a reduced need for human involvement (i.e., reduced costs), and the execution is faster and more accurate.

Business process automation can be achieved at many different levels, which also correspond to the historical evolution of the technology within this domain. At the basic level, the different enterprise functions can be automated. This corresponds to automating the individual steps in a business process. For example, in the expense reimbursement process, fund transfers can be performed through banking transactions (e.g., SWIFT wire transfer) and by accessing ERP systems for internal accounting purposes. Automating the individual steps provides great benefit to companies because it generates significant improvement in terms of cost and quality: the execution of each individual step is now faster and more accurate. However, this in itself does not suffice to reach the goal of a streamlined process. Indeed, it leads to islands of automation, where the individual functions are performed very efficiently, but the overall process still requires much manual work for transferring data among the different applications and for synchronizing and controlling their execution. For example, in the expense reimbursement process, even if the individual steps are automated, there is still a lot of manual work involved in entering data into a system (e.g., the human resource database), getting the desired output (e.g., the name and email address of the approving manager), providing the data for the next step (sending an email to the manager), collecting once again the results (i.e., the approval or rejection) and, based on them,
interacting with the funds-transfer system or sending an email to the employee notifying him or her of the rejection. Even in such a simple process, someone must take care of interacting with many different systems, entering the same data many times, making sure that the steps are performed in the correct order and that nothing is skipped, etc. All this is clearly prone to errors. Furthermore, exception detection and handling is also left to the human. For example, the administrative employees in charge of executing the process need to continuously monitor each case to make sure that the approving manager sends a response in due time, otherwise a reminder should be sent or the request should be routed to a different manager. Another problem is that of process tracking. It happens frequently that, once a request is made, the employee calls back at a later time to know the status of the request, or a customer enquires about the progress of an order. If only the individual process steps are automated, but the overall process logic is executed manually, it will be difficult to be able to assess the stage in which a request is. The discussion above shows just a few of the reasons why automating the individual steps, and not the overall process, is not sufficient to achieve the quality and efficiency goals required by today’s corporations. The obvious next step, therefore, consists in automating the process logic by developing an application that goes through the different steps of the process, which automatically transfers data from one step to the next and manages exceptions. This is conceptually simple, but the implementation can prove to be very challenging. The reason is simple: the systems that support the execution of each individual step are very different from each other. They have different interfaces, different interaction paradigms, different security requirements, different transport protocols, different data formats, and more. Therefore, if we take the simplistic approach of coding the process logic in some third generation language without any specific process automation support, then not only do we have to code the sequencing of the steps, the transfer of data between one step and the next, and the exception handling, but we also have to deal with the heterogeneity of the invoked applications. The amount of coding needed is usually quite large, and it is one of the hardest tasks in process automation. The problem gets even worse if the invoked applications belong to a different company (e.g., a product database maintained by a vendor), as this requires crossing firewalls and trust domains. A solution frequently adopted to deal with this problem consists in adding a layer that shields the programmers from the heterogeneity and makes all systems look alike from the perspective of the
developer who has to code the process logic. The middleware tools that support this effort are typically called Enterprise Application Integration (EAI) platforms. Essentially, their functionality consists in providing a single data model and interaction mechanism to the developers (e.g., Java objects and JMS messages), and in converting these homogeneous models and mechanisms into the ones supported by each of the invoked applications, by means of adapters that perform appropriate mappings. Many commercial EAI platforms exist, supporting integration within and across enterprises. Some examples are IBM WebSphere MQ [1], Tibco Rendez-Vous and related products [2], WebMethods Enterprise Integrator [3], and Vitria BusinessWare [4]. EAI platforms therefore aim at solving one part of the problem: that of managing heterogeneity. To achieve process automation, the “only” remaining step consists in encoding the process logic. This is actually what most people refer to when speaking about process automation. It is also the most interesting problem from a conceptual and research perspective. In the following we, therefore, focus on this aspect of business process automation, examine the challenges in this domain, and show what has been done in the industry and in academia to address them.
32.1.2 Workflow Management Systems Overview

In order to support the definition and execution of business processes, several tools have been developed within the last decade. These tools are collectively known as Workflow Management Systems (WfMSs). In WfMS terminology, a workflow is a formal representation of a business process, suitable for being executed by an automated tool. Specifications of a process written in a workflow language are called workflow schemas, and each execution of a workflow schema (e.g., each travel expense reimbursement) is called a workflow instance. A WfMS is then a platform that supports the specification of workflow schemas and the execution of workflow instances. In particular, a WfMS supports the following functionality:

• Schedule activities for execution, in the appropriate order, as defined by the workflow schema. For example, in the expense reimbursement process, it schedules the manager selection, then the notification of the request to the manager, and then either the employee notification or the fund transfer, depending on the manager's approval.
• Assign activities to resources. In fact, a different person or component can execute each task in a process. The WfMS identifies the appropriate resource, based on the workflow definition and on resource assignment criteria.
• Transfer data from one activity to the next, thereby avoiding the problem of repeated data entry, one of the most labor-intensive and error-prone aspects of manual process execution.
• Manage exceptional situations that can occur during process execution. In the travel expense reimbursement process, these can include situations in which the manager does not send the approval, the Human Resources database system is not available, or the employee cancels the reimbursement request.
• Efficiently execute high process volumes. In workflows such as payroll or order management, the number of instances to be executed can be on the order of tens of thousands per day. Therefore, it is essential that the WfMS be able to support volumes of this magnitude.

There are currently hundreds of tools on the market that support these functionalities, developed by small and large vendors. Some examples are HP Process Manager, IBM MQ Series Workflow [5] (now called WebSphere Workflow), Tibco Process Management [6], and Microsoft BizTalk Orchestration [7]. Although these systems differ in the details, they are essentially based on similar concepts, and all of them try to address the issues described above. We now examine each of these issues in more detail, also showing several possible alternatives for approaching each issue. The reader interested in more details is referred to [8].
32.1.3 Scheduling

Scheduling in a WfMS is typically specified by means of a flow chart, just like the one shown in Figure 32.1. More specifically, a workflow is described by a directed graph that has four different kinds of nodes:

• Work nodes represent the invocation of activities (also called services), assigned for execution to a human or automated resource. They are represented by rounded boxes in Figure 32.1.
• Route nodes are decision points that route the execution flow among nodes based on an associated routing rule. The diamond of Figure 32.1 is an example of a routing node that defines conditional execution: one of the activities connected in output is selected, based on some runtime branching condition. In general, more than one output task can be started, thereby allowing for parallel execution.
• Start nodes denote the entry points to the process.
• Completion nodes denote termination points.

Arcs in the graph denote execution dependencies among nodes: when a work node execution is completed, the output arc is "fired," and the node connected to this arc is activated. Arcs in output to route nodes are, instead, fired based on the evaluation of the routing rules.

The model described above is analogous in spirit to activity diagrams [9]. Indeed, many models follow a similar approach because this is the most natural way for developers to think of a process. It is also analogous to how programmers are used to coding applications, essentially involving the definition of a set of procedure calls, and of the order and conditions under which these procedures should be invoked. A variation on the same theme consists in using Petri nets as a workflow-modeling paradigm, although modified to make them suitable for this purpose. Petri-net-based approaches are discussed in detail in [10].

Other techniques have also been proposed, although they are mostly used as internal representations of a workflow schema and, therefore, are not exposed to the designers. For example, one such modeling technique consists in specifying the workflow as a set of Event-Condition-Action (ECA) rules that define which step should be activated as soon as another step is completed. For example, the following rule can be defined to specify part of the process logic for the travel expense reimbursement process:

    WHEN COMPLETED(SEND_REQUEST_TO_MANAGER)
    IF RESPONSE = "APPROVED"
    THEN START(TRANSFER_FUNDS_TO_EMPLOYEE)
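For instance, the remaining routing logic of the reimbursement process of Figure 32.1 might be captured by a handful of additional rules of the same form (a sketch in the same illustrative syntax; the node and variable names are assumptions):

    WHEN STARTED(TRAVEL_REIMBURSEMENT)
    THEN START(SELECT_APPROVING_MANAGER)

    WHEN COMPLETED(SELECT_APPROVING_MANAGER)
    THEN START(SEND_REQUEST_TO_MANAGER)

    WHEN COMPLETED(SEND_REQUEST_TO_MANAGER)
    IF RESPONSE <> "APPROVED"
    THEN START(NOTIFY_REASON_FOR_REJECTION)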
The entire process can then be specified by a set of such rules. Examples of rule-based approaches can be found in [11, 12, 13]. In the late 1990s, an industry-wide effort was started with the goal of standardizing scheduling languages and more generally, standardizing workflow languages. The consortium supervising the standardization is called Workflow Management Coalition [14]. Despite initial enthusiasm, the efforts of the coalition did not manage to achieve consensus among the major players, and therefore few vendors support the workflow language proposed by this consortium. As a final remark on process modeling we observe that, until very recently, processes were designed to work in isolation, that is, each process was designed independently of others, and it was very hard to model interactions among them (e.g., points at which processes should synchronize or should exchange data). WfMSs provided no support for this, neither in the modeling language nor in the infrastructure, so that interoperability among processes had to be implemented in ad hoc ways by the developer. The only form of interaction supported by the early system was that a process was able to start another process. More recently, approaches to support interoperability were proposed both in academia (see, e.g., [15, 16]) and in industry, especially in the context of service composition (discussed later in this chapter). These approaches were mainly based on extending workflow models with the capability of publishing and subscribing to messages and of specifying points in the flow where a message was to be sent or
received. In this way, requests for messages acted both as a synchronization mechanism (because such requests are blocking) and as a means to receive data from other processes.
32.1.4 Resource Assignment

Once a task has been selected for execution, the next step the WfMS must perform consists in determining the best resource that can execute it. For example, in the travel expense reimbursement process, once the WfMS has scheduled the task send request to manager for execution, it needs to determine the manager to whom the request should be routed. The ability to dynamically assign work nodes to different resources requires the possibility of defining resource rules, i.e., specifications that encode the logic necessary for identifying the appropriate resource to be assigned to each work node execution, based on instance-specific data. WfMSs typically adopt one of the two following approaches for resource selection:

1. In one approach, the workflow model includes a resource model that allows system administrators to specify resources and their properties. The resource rule then selects a resource based on the workflow instance data and the attributes of the resource. For example, a workflow model may allow the definition of resources characterized by a name and a set of roles, describing the capabilities or the authorizations of a resource. A role may be played by multiple resources, and the same resource can have several roles. For example, the resource John Smith may play the roles of IT manager and Evaluator of supply chain projects. Once resources and roles have been defined, work node assignment is performed by stating the roles that the resource to which the work is assigned must have. For example, an assignment rule can be:

    if %EMPLOYEE_DEPT = "IT" then role = "IT manager" else role = "manager"
where EMPLOYEE_DEPT is the name of a workflow data item (discussed next). In this case, the rule states that the node with which the rule is associated should be assigned to a resource playing role IT manager if the requesting employee is in the IT department, and to a (generic) manager otherwise. Roles can typically be organized into a specialization hierarchy, with the meaning that if role A is an ancestor of role B, then a resource playing the specialized role B can also play the more generic role A. For example, role IT manager may be a child of role manager. Role hierarchies simplify the specification of resource rules, as it is possible to assign nodes to a super-role (such as manager), rather than explicitly listing all the subroles for which manager is the ancestor. This is particularly useful if roles change over time (e.g., a new subrole of manager is introduced), because the existing resource rules that assign work items to role manager do not need to be modified to take the new role into account.

2. In an alternative approach, the workflow model does not include a resource model. Instead, work nodes are assigned to resources by contacting a resource broker each time a work node needs to be executed. In this case, the resource rule language is determined by the resource broker, not by the WfMS. The rule is simply passed to the resource broker for execution. The broker is expected to return the names or identifiers of one or more resources to which the work node should be assigned.

These two approaches present both advantages and disadvantages. When an internal resource model is present, the assignment is typically faster, as it does not require the invocation of the resource broker. In addition, it is more practical if the resources are limited in number and change slowly over time, as there is no need to install and configure an external broker, and it is possible to specify assignments with the simple, point-and-click interface provided by the workflow design tool. However, the internal resource model presents several drawbacks. The main problem is that often the resources are actually controlled by a component that is external with respect to the WfMS. This is a frequent situation in large organizations. For example, work nodes may need to be executed by employees
who have certain attributes (e.g., who work in a specific location). The database listing the active employees, along with their roles and permissions, is maintained externally, typically by the Human Resources (HR) department, and its contents may change daily. It is therefore impractical to keep changing the workflow resource definition to keep it synchronized with the HR database. In addition, resource rules may need to be based on a variety of attributes that go beyond the resource name or role (e.g., the location). Therefore, in such cases, WfMSs offering an external broker are more flexible, because users can plug in the broker they need in order to contact an external resource directory and query for the appropriate resource, based on a broker-specific language.
32.1.5 Data Management

Once the WfMS has identified the activity to be executed and the resource that will execute it, the next step consists in preparing the data necessary for the task execution. For example, invoking the send request to manager activity essentially involves sending an email to a manager, passing along the relevant information (e.g., the employee name, the travel data, the reimbursement amount, and other information). In general, this data is derived from workflow invocation parameters (provided as a new instance is created) or from the output of a previously executed activity. An important aspect of a workflow execution therefore involves transferring data from one activity to the next, so that the proper data is made available to the resource.

Data transfers between activities are typically specified as part of a workflow schema, by means of workflow variables. Basically, each workflow schema includes the declaration of a set of variables, just as is done in conventional programming languages. Variables are typed, and the data types can be the "usual" integer, real, or string, or they can be more complex types, ranging from vectors to XML schemas. The variables act as data containers that store the output of the executed activities as well as the workflow invocation data passed to the WfMS as a new instance is started. For example, data about the employee reimbursement request, provided as a new instance is started, can be stored in variables employeeName, employeeID, travelDestination, and requestedAmount. The value of workflow variables can then be used to determine the data to be passed to the resource once the activity is invoked. For example, the input for the activity send request to manager can be specified as being constituted by variables employeeName, travelDestination, and requestedAmount. Variables can also be used to describe how the execution flow should be routed among activities, as they are typically used as part of branch conditions. For example, Figure 32.1 shows that the variable response is used in a branch condition to determine whether the workflow should proceed by refunding the employee or by notifying him or her of the rejection. This approach is the one followed by most workflow models, including for example BPEL4WS [17] or Tibco [6]. It is also analogous in spirit to how programming languages work.

An alternative approach consists in performing data transfer by directly linking the output of a node A with the input of a subsequent node B, with the meaning that the output of A (or a subset of it) is used as the input of B. This is called the data flow approach, and it is followed by a few models, in particular those specified by IBM, including MQ Series Workflow [5] and WSFL [18].
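As an illustration of the variable-based approach described above, a BPEL4WS-style sketch might declare the data containers and copy data between them as follows; the variable, message type, and part names are assumptions and do not come from an actual process definition:

<!-- Sketch only: declare typed variables acting as data containers. -->
<variables>
  <variable name="request"  messageType="lns:reimbursementRequestMsg"/>
  <variable name="approval" messageType="lns:approvalRequestMsg"/>
  <variable name="decision" messageType="lns:approvalDecisionMsg"/>
</variables>

<!-- Copy the employee name and amount from the incoming request into the
     message that will be sent to the approving manager. -->
<assign>
  <copy>
    <from variable="request" part="employeeName"/>
    <to variable="approval" part="employeeName"/>
  </copy>
  <copy>
    <from variable="request" part="requestedAmount"/>
    <to variable="approval" part="requestedAmount"/>
  </copy>
</assign>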
32.1.6 Failure and Exception Handling

Although research in workflow management has been very active for several years, and the need for modeling exceptions in information systems has been widely recognized (see, e.g., [19, 20]), only recently has the workflow community tackled the problem of exception handling. One of the first contributions came from Eder and Liebhart [21], who analyzed the different types of exceptions that may occur during workflow execution and provided a classification of such exceptions. They divided exceptional situations into basic failures, corresponding to failures at the system level (e.g., DBMS, operating system, or network failure), application failures, corresponding to failures of the applications invoked by the WfMS in order to execute a given task, expected exceptions, corresponding to predictable deviations from the normal
behavior of a process, and unexpected exceptions, corresponding to inconsistencies between the business process in the real world and its corresponding workflow description.

Basic failures are not specific to business processes and workflow management; approaches to failure handling have in fact been developed in several different contexts, particularly in the area of transaction processing. Therefore, WfMSs may (and in fact do) handle failures by relying on existing concepts and technology. For instance, basic failures are handled at the system level, by relying on the capability of the underlying DBMS to maintain a persistent and consistent state, thus supporting forward recovery.

A generic approach to handling application failures involves the integration of workflow models with advanced transaction models [22]. In fact, if the workflow model provides "traditional" transaction capabilities such as the partial and global rollback of a process, application failures can be handled by rolling back the process execution until a decision (split) point in the process is reached, from which forward execution can be resumed along a different path. This model is supported by several systems; for instance, ConTracts provide an execution and failure model for workflow applications [23]. A ConTract is a transaction composed of steps. Isolation between steps is relaxed, so that the results of completed steps are visible to other steps. In order to guarantee semantic atomicity, each step is associated with a compensating step that (semantically) undoes its effect. When a step is unable to fulfill its goal, backward recovery is performed by compensating completed steps, typically in the reverse order of their forward execution, up to a point from which forward execution can be resumed along a different path. WAMO [21, 24] and Crew [25] extend this approach by providing more flexible and expressive models, more suitable for workflow applications. Recently, commercial systems (such as the above-mentioned BizTalk) have started to provide this kind of functionality.

Expected exceptions are predictable deviations from the normal behavior of the process. Examples of expected exceptions are:

• In a travel reservation process, the customer cancels the travel reservation request.
• In a proposal presentation process, the deadline for the presentation has expired.
• In a car rental process, an accident occurs to a rented car, making it unavailable for subsequent rentals.

Unlike basic and application failures, expected exceptions are strictly related to the workflow domain: they are part of the semantics of the process, and it should be possible to model them within the process, although they are not part of its "normal" behavior. Different models offer different approaches to defining how expected exceptions are handled:

• A modeling paradigm often proposed in the literature but rarely adopted in commercial systems consists in modeling the exception by means of an ECA rule, where the event describes the occurrence of a potentially exceptional situation (e.g., the customer cancels the order), the condition verifies that the occurred event actually corresponds to an exceptional situation that must be managed, and the action reacts to the exception (e.g., aborts the workflow instance) [11, 12, 26].
Although this approach is conceptually feasible, it did not gain popularity, due to the complexity of rule-based modeling (viable only if there are very few rules) and to the need to support two different languages (one for defining the normal flow and the other for defining the exceptional flow). A sketch of such a rule is shown after this list.
• Another viable paradigm consists in mimicking the exception-handling scheme followed by programming languages such as Java: a part of the schema is enclosed in a logical unit, and exception-handling code is attached to the unit to catch and handle the exception, much like a pair of try/catch statements [27].
• Another possible approach consists in defining a specialized type of node, called an event node, that "listens" for an event explicitly raised by a user or by the WfMS and, if the event is detected, activates an exception-handling portion of the flow, specified using the same techniques used for defining the normal schema. This approach is, for example, discussed in [28, 29].
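As an illustration of the first, rule-based option, the expected exception of a customer canceling a travel reservation might be captured by a rule of the following form (a sketch in an illustrative syntax; the event, condition, and action names are assumptions):

    WHEN RECEIVED(CANCEL_RESERVATION_REQUEST)
    IF STATE(INSTANCE) <> "COMPLETED"
    THEN TERMINATE(INSTANCE)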
The common aspect of all these techniques is that they allow capturing events that occur asynchronously with respect to the workflow execution, i.e., that can occur at any point in a workflow instance (or in a logical unit, for the Java-like approach) and not just in correspondence with the execution of a certain task. This is important as exceptions typically have this asynchronous nature. Finally, unexpected exceptions correspond to errors in modeling the process. Therefore, the only (obvious) way to handle them consists in modifying the process definition, and hence we do not discuss them further.
32.1.7 WfMS Architectures

After having presented the different aspects of a workflow model, we now briefly discuss workflow architectures. A WfMS is typically characterized by the following components (Figure 32.3):

• The workflow designer, a lightweight client-side tool that allows users to define and deploy new workflows as well as modify existing ones. It typically provides a graphical user interface through which designers can quickly and easily draw a schema. Definitions are then translated and saved in a textual format (typically some XML representation).
• The workflow definition manager receives process definitions from the designer and stores them in a file repository or a relational database. In addition, it deploys the workflows so that they can be instantiated, manages versions, and controls concurrent access, for example preventing multiple users from modifying the same workflow at the same time.
• The workflow engine executes workflow instances, as discussed earlier in the section. The engine accesses both the workflow definition and the workflow instance databases in order to determine the nodes to be executed.
• The resource broker determines the resources that are capable of and authorized to execute a work item.
• The worklist handler receives work items from the engine and delivers them to users (by pushing work to the users or by delivering it upon request). Human resources typically access the worklist handler via a browser, whereas automated resources access the worklist through an API, typically a Java or C++ API. Note that the worklist handler hides the heterogeneity of the resources from the WfMS. In fact, a WfMS simply places tasks in the resources' worklists. It is up to the resource (or to a suitably defined adapter) to access the worklist and deliver the job to the resource. For example, if the resource is a SAP system, then the adapter will have to access the worklist corresponding to the SAP workflow resource and, based on the task, invoke one of the SAP interface operations.
• The workflow analyzer allows users (either through a browser or through a programmatic API) to access both status information about active processes and aggregate statistics on completed processes, such as the average duration of a process or of a node.
• The interoperability layer enables interaction among the different WfMS components. Typically, this layer is a CORBA broker or a message-oriented middleware (MOM) implementation.

FIGURE 32.3 A typical WfMS architecture.

This architecture allows the efficient execution of thousands of concurrent workflow instances. As with TP monitors, having a single operating system process (the workflow engine) manage all workflow instances avoids overloading the execution platform with a large number of operating system processes. In addition, when multiple machines are available, the WfMS components can be distributed across them, thereby allowing the workflow engine (which is often the bottleneck) to have more processing power at its disposal, in the form of a dedicated machine. WfMSs may also include a load-balancing component to allow the deployment of multiple workflow engines over different machines within the same WfMS. Modern WfMSs can typically schedule hundreds of thousands of work nodes per hour, even when running on a single workstation.

We observe here that, recently, WfMSs based on a completely different architecture have appeared, especially in the context of Web service composition. One of the novelties that Web services bring is standardization, so that each component supports the same interface definition language and interaction protocol (or, at least, heterogeneity is considerably reduced with respect to traditional integration problems). Uniformity in the components assembled by the workflow makes integration easier and, in particular, makes it possible to push the work item directly to the resource (a Web service, in this case), without requiring the presence of a worklist handler. In fact, when a workflow composes Web services rather than generic applications, the steps in the workflow correspond to invocations of Web service operations. Because the underlying assumption is that the operation can be invoked by means of standard protocols (typically SOAP on top of HTTP), the engine can directly invoke the operation through the middleware (Figure 32.3). There is no need for an intermediate stage and intermediate format, where the engine deposits the work items and the resource picks up the work to do.

Web services bring another important differentiation: until now, the business processes inside the enterprise could collaborate with other processes in a mutually agreed upon but ad hoc manner. Web services provide standardized means through which business processes can be composed with other processes. This enables dynamic composition (Figure 32.4). In fact, not only is it possible to develop complex applications by composing Web services through a workflow, but it is also possible to expose the composition (the workflow) as yet another Web service, more complex and at a higher level of abstraction.
These newly created services (called composite services) can be dynamically composed (just like basic Web services are composed), thereby enabling the definition of complex interactions among business processes.
32.2 Web Services and Business Processes

32.2.1 Web Services

Web services are described as distributed services that are identified by URIs, whose interfaces and bindings can be defined, described, and discovered by means of XML artifacts, and that support direct XML message-based interactions with other software applications via Internet-based protocols. Web services have to be described in such a way that other Web services may use them.
FIGURE 32.4 Services can be iteratively composed into more complex (composite) services.
Interface definition languages (IDLs) have been used in traditional software environments. An IDL describes the interfaces that an object exposes; these interfaces are implemented by the object and are useful for interoperation between objects. The Web Services Description Language (WSDL) is an attempt to use similar concepts for Web services, enabling dynamic interaction and interoperation between them. WSDL describes not only the interfaces but also the corresponding bindings involved. In WSDL a service is described through a number of end points, an end point is composed of a set of operations, and an operation is described in terms of the messages received or sent out by the Web service. The main WSDL elements are the following (a sketch of a WSDL document combining them is shown after the list):

• Message — an abstract definition of the data being communicated, consisting of message parts.
• Operation — a unit of task supported by the service. Operations are of the following types: one-way, request–response, solicit–response, and notification.
• Port type — an abstract collection of operations that may be supported by one or more end points.
• Binding — a concrete protocol and data format specification for a particular port type.
• Port — a single end point, defined as a combination of a binding and a network address.
• Service — a collection of related end points.
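As a minimal sketch of how these elements fit together, a WSDL 1.1 description of a single request–response approval operation might look as follows; the service, operation, message, and address names are illustrative assumptions and do not come from a published WSDL document:

<definitions name="ReimbursementService"
    targetNamespace="http://example.com/reimbursement.wsdl"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:tns="http://example.com/reimbursement.wsdl"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Messages: abstract definitions of the data being exchanged. -->
  <message name="ApprovalRequest">
    <part name="employeeName" type="xsd:string"/>
    <part name="requestedAmount" type="xsd:double"/>
  </message>
  <message name="ApprovalResponse">
    <part name="response" type="xsd:string"/>
  </message>

  <!-- Port type: an abstract request-response operation. -->
  <portType name="ApprovalPortType">
    <operation name="requestApproval">
      <input message="tns:ApprovalRequest"/>
      <output message="tns:ApprovalResponse"/>
    </operation>
  </portType>

  <!-- Binding: SOAP over HTTP for the port type above. -->
  <binding name="ApprovalSoapBinding" type="tns:ApprovalPortType">
    <soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http"/>
    <operation name="requestApproval">
      <soap:operation soapAction="http://example.com/requestApproval"/>
      <input><soap:body use="literal"/></input>
      <output><soap:body use="literal"/></output>
    </operation>
  </binding>

  <!-- Service: a collection of ports (end points). -->
  <service name="ReimbursementApprovalService">
    <port name="ApprovalPort" binding="tns:ApprovalSoapBinding">
      <soap:address location="http://example.com/services/approval"/>
    </port>
  </service>
</definitions>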
32.2.2 Web Services and Business Process Orchestration

Web services usually expose a subset of enterprise business processes and activities to the external world through a set of operations defined in their WSDL. The enterprise business processes have to be defined, and some of their activities have to be linked to the WSDL operations. This requires modeling of the Web service's back-end business processes. In addition, Web services need to interact with other Web services by exchanging sequences of messages. A sequence of message exchanges is termed a conversation. Conversations can be described independently of the internal flows of the Web services, simply by sequencing the exposed operations and messages of the Web services (as declared in their WSDLs). However, such an inter-Web service interaction automatically leads to coupling of the
internal business processes of the Web services to form what is called a global process. The participating Web services may or may not be aware of the whole global process, depending on their understanding with each other and the internal information they are willing to expose to each other.

32.2.2.1 Business Process Languages

Business process languages such as XLANG, WSFL, and BPEL4WS are used to model business processes. The Web Services Flow Language (WSFL) introduces the notions of flows and activities. XLANG is another technology, from Microsoft, that provides a mechanism for defining business processes and global flow coordination.
WSFL [18] models business processes as a set of activities and links. An activity is a unit of useful work. The links can be data links, where data is fed into one activity from another, or control links, where decisions are made to follow one activity or another. These activities are made available through one or more operations that are grouped into endpoints (as defined in WSDL). A service is made up of a set of endpoints, and a service provider can provide one or more services. Just like internal flows, global flows can be defined. A global flow consists of plug links that connect operations of two service providers. This allows complex services to be created and recursively defined.
XLANG defines services by extending WSDL; the extension elements describe the behavioral aspects. A behavior spans multiple operations and has a header and a body. An action is an atomic component of a behavior. Action elements can be an operation, a delay element, or a raise element. The delay elements, namely delayFor and delayUntil, introduce delays in the execution of the process, either to wait for something to happen (for example, a timeout) or to wait until an absolute date–time has been reached, respectively. The raise construct is used to create exceptions; it handles the exceptions by calling the handlers registered with the raise definition. Processes put actions together in useful ways: a process form can be a sequence, switch, while, all, pick, context, compensate, or empty.
Business Process Execution Language for Web Services (BPEL4WS) [30] combines the WSFL and XLANG capabilities and is an attempt to standardize a business process language. A single BPEL4WS process describes the global process that links multiple Web services. Entry points are defined in the BPEL4WS specification of a global process; these entry points consume the incoming messages of WSDL input-only or input–output operations. BPEL4WS utilizes only the input-only and input–output (request–response) operations of WSDL; it does not require or support the output-only (notification) and output–input (solicit–response) operations. The BPEL4WS process itself is composed of activities. There is a collection of primitive activities: invoking an operation on a Web service (invoke), waiting to receive a message addressed to an operation of the service (receive), creating a response to an input–output operation (reply), waiting for some time without doing anything (wait), indicating an error (throw), copying data from one place to another (assign), closing the entire service instance down (terminate), or doing nothing (empty). These basic primitives may be combined in sequence (sequence), branched (switch), looped (while), used to execute one of several chosen paths (pick), or executed in parallel (flow). Within activities executing in parallel, one can indicate execution-order constraints by using links.
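As a hypothetical illustration of these constructs, the fragment below sketches a BPEL4WS 1.1 process that receives an order, invokes a supplier service, and replies to the caller. The partner link types, message types, port types, and the lns namespace are assumed here and would normally be declared in accompanying WSDL documents.

  <process name="OrderProcess"
      targetNamespace="urn:example:order-process"
      xmlns="http://schemas.xmlsoap.org/ws/2003/03/business-process/"
      xmlns:lns="urn:example:order-definitions">
    <partnerLinks>
      <partnerLink name="customer" partnerLinkType="lns:customerLT" myRole="orderService"/>
      <partnerLink name="supplier" partnerLinkType="lns:supplierLT" partnerRole="supplierService"/>
    </partnerLinks>
    <variables>
      <variable name="order" messageType="lns:orderMessage"/>
      <variable name="confirmation" messageType="lns:confirmationMessage"/>
    </variables>
    <!-- sequence: execute the contained activities one after another -->
    <sequence>
      <!-- receive: entry point; a new process instance is created for each order -->
      <receive partnerLink="customer" portType="lns:orderPT" operation="submitOrder"
               variable="order" createInstance="yes"/>
      <!-- invoke: call a request-response operation on the supplier service -->
      <invoke partnerLink="supplier" portType="lns:supplierPT" operation="placeOrder"
              inputVariable="order" outputVariable="confirmation"/>
      <!-- reply: answer the customer's original request-response operation -->
      <reply partnerLink="customer" portType="lns:orderPT" operation="submitOrder"
             variable="confirmation"/>
    </sequence>
  </process>

Branching (switch), loops (while), event selection (pick), and parallel execution with links (flow) would wrap or replace parts of this sequence in a richer process.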
BPEL4WS thus provides the capability to combine activities into complex algorithms that may represent Web service to Web service interactions.

32.2.2.2 e-Business Languages

Web service to Web service interactions need to follow certain business protocols in order to actually undertake business on the Web. X-EDI, ebXML, BTP, TPA-ML, cXML, and CBL are some of the B2B technologies that have been proposed to enable this paradigm with Web services.
In ebXML [31], parties register their Collaboration Protocol Profiles (CPPs) at ebXML registries. Each CPP is assigned a GUID by the ebXML registry. Once a party discovers another party's CPP and the two decide to do business, they form a Collaboration Protocol Agreement (CPA). CPAs are formed after negotiation between the parties. The intent of the CPA is not to expose the internal business processes of the
parties, but to expose the part of the process that is visible and that involves interactions between the parties. The messages exchanged between the involved parties or business partners may utilize the ebXML Messaging Service (ebMS). A conversation between parties is defined through the CPA and the business process specification document it references. Such a conversation involves one or more business transactions, and a business transaction may involve an exchange of messages as requests and replies. The CPA may refer to multiple business process specification documents; any one conversation, however, will involve only a single process specification document. Conceptually, the B2B server at each party is responsible for managing the CPAs and for keeping track of the conversations. It also interfaces the functions defined in the CPA with the internal business processes. The CPP contains the following:
• Process specification layer: details the business transactions that form the collaboration, as well as their relative order.
• Delivery channel: describes a party's characteristics for receiving and sending messages. A specification can contain more than one delivery channel.
• Document exchange layer: describes the processing of business documents, such as encryption, reliable delivery, and digital signatures.
• Transport layer: identifies the transport protocols to be used and the endpoint addresses, along with other properties of the transport layer. Possible protocols are SMTP, HTTP, and FTP.
Web services can thus be used to enable business process composition, with Web service to Web service interactions happening through message-based conversations. However, in order to undertake business on the Web, it is essential that enterprises provide guarantees to each other on the performance, response times, reliability, and availability of these business processes and Web services. The guarantees are usually specified through service level agreements (SLAs). The e-business operations have to be semantically analyzed [32], SLA violations have to be monitored [33], and the guarantees have to be assured.
32.3 Conclusion

Enterprises have used business processes and workflows to automate their tasks. Business processes have traditionally been managed by using workflow management systems. Web services and the ensuing standardization process enable dynamic composition of business processes. Many issues remain unresolved, however, including conversation definition and the negotiation, specification, and assurance of guarantees.
References
1. IBM. WebSphere MQ Integrator Broker: Introduction and Planning. June 2002. Available from www.ibm.com.
2. TIBCO Software. TIBCO Enterprise Application Integration Solutions. 2002. Available from www.tibco.com.
3. WebMethods. WebMethods Enterprise Integrator: User's Guide. 2002. Available from www.webmethods.com.
4. Vitria. BusinessWare: The Leading Integration Platform. 2002. Available from www.vitria.com.
5. IBM. MQ Series Workflow for Business Integration. 1999. Available from www.ibm.com.
6. TIBCO Software. TIBCO Business Process Management Solutions. 2002. Available from www.tibco.com.
7. Microsoft. Microsoft BizTalk Server 2002 Enterprise Edition. 2002. Available from www.microsoft.com.
8. F. Leymann and D. Roller. Production Workflow: Concepts and Techniques. Prentice-Hall, New York, 1999.
9. OMG. Unified Modeling Language Specification. Version 1.3. 1999.
10. W. van der Aalst and Kees van Hee. Workflow Management: Models, Methods, and Systems. MIT Press, Cambridge, MA, 2001.
11. D. Chiu, Q. Li, and Kamalakar Karlapalem. ADOME-WfMS: Towards cooperative handling of workflow exceptions. In A. Romanovsky, C. Dony, J. L. Knudsen, and A. Tripathi, Eds., Advances in Exception Handling Techniques. Springer, New York, 2000.
12. G. Kappel, P. Lang, S. Rausch-Schott, and W. Retschitzegger. Workflow management based on objects, rules, and roles. IEEE Data Engineering Bulletin, 18(1): 11–18, 1995.
13. F. Casati, S. Ceri, B. Pernici, and G. Pozzi. Deriving active rules for workflow enactment. Proceedings of Database and Expert Systems Applications (DEXA'96), Zurich, Switzerland, 94–115, 1996.
14. The Workflow Management Coalition. Process Definition Interchange. Document number WfMC-TC-1025. 2002.
15. F. Casati and A. Discenza. Supporting workflow cooperation within and across organizations. Symposium on Applied Computing (SAC'2000), Como, Italy, 196–202, June 2000.
16. W. van der Aalst and M. Weske. The P2P approach to interorganizational workflows. Proceedings of the Conference on Advanced Information Systems Engineering (CAiSE'01), Interlaken, Switzerland, 140–156, June 2001.
17. T. Andrews, F. Curbera, H. Dholakia, Y. Goland, J. Klein, F. Leymann, K. Liu, D. Roller, D. Smith, S. Thatte, I. Trickovic, and S. Weerawarana. Business Process Execution Language for Web Services. Version 1.1. 2003.
18. Frank Leymann. Web Services Flow Language. Version 1.0. 2001.
19. A. Borgida. Language features for flexible handling of exceptions in information systems. ACM Transactions on Database Systems, 10(4): 565–603, 1985.
20. H. Saastamoinen. On the Handling of Exceptions in Information Systems. PhD thesis, University of Jyväskylä, Finland, 1995.
21. J. Eder and W. Liebhart. The workflow activity model WAMO. Proceedings of the 3rd International Conference on Cooperative Information Systems (CoopIS'95), Vienna, Austria, 87–98, 1995.
22. D. Worah and A. Sheth. Transactions in transactional workflows. In S. Jajodia and L. Kerschberg, Eds., Advanced Transaction Models and Architectures. Kluwer Academic, New York, 1997.
23. A. Reuter, K. Schneider, and F. Schwenkreis. ConTracts revisited. In S. Jajodia and L. Kerschberg, Eds., Advanced Transaction Models and Architectures. Kluwer Academic, New York, 1997.
24. J. Eder and W. Liebhart. Contributions to exception handling in workflow management. Proceedings of the EDBT Workshop on Workflow Management Systems, Valencia, Spain, 3–10, 1998.
25. M. Kamath and K. Ramamritham. Failure handling and coordinated execution of concurrent workflows. Proceedings of the 14th International Conference on Data Engineering (ICDE'98), Orlando, FL, 334–341, 1998.
26. Fabio Casati, Stefano Ceri, Stefano Paraboschi, and Giuseppe Pozzi. Specification and implementation of exceptions in workflow management systems. ACM Transactions on Database Systems, 24(3): 405–451, 1999.
27. Alexander Wise. Little-JIL 1.0: Language Report. University of Massachusetts, Amherst, MA. Document number UM-CS-1998-024. 1998.
28. C. Hagen and G. Alonso. Flexible exception handling in the OPERA process support system. Proceedings of the 18th International Conference on Distributed Computing Systems (ICDCS'98), Amsterdam, The Netherlands, 526–533, May 1998.
29. Fabio Casati and Giuseppe Pozzi. Modeling and managing exceptional behaviors in workflow management systems. Proceedings of Cooperative Information Systems (CoopIS'99), Edinburgh, Scotland, 127–138, 1999.
30. Business Process Execution Language for Web Services (BPEL4WS). http://www.ibm.com/developerworks/library/ws-bpel/.
31. ebXML. http://www.ebxml.org.
32. M. Sayal, A. Sahai, V. Machiraju, and F. Casati. Semantic analysis of e-business operations. Journal of Network and Systems Management, March 2003 (special issue on E-Business Management).
33. Akhil Sahai, Vijay Machiraju, Mehmet Sayal, Aad van Moorsel, Fabio Casati, and Li Jie Jin. Automated SLA monitoring for Web services. IEEE/IFIP Distributed Systems: Operations and Management (DSOM 2002), Montreal, Canada, 28–41, 2002.
34. Gustavo Alonso, Divyakant Agrawal, Amr El Abbadi, Mohan Kamath, Roger Günthör, and C. Mohan. Advanced transaction models in workflow contexts. Proceedings of the 12th International Conference on Data Engineering (ICDE'96), New Orleans, LA, 574–581, February 1996.
33 Information Systems

Eric Yu

CONTENTS
Abstract
33.1 Introduction
33.2 Information Systems Before the Advent of the Internet
33.2.1 Processes
33.2.2 Products
33.2.3 Nonfunctional Qualities
33.2.4 Social Structures
33.2.5 Automation
33.3 The World As Seen by Information Systems
33.3.1 Processes and Products
33.3.2 Nonfunctional Qualities
33.3.3 Social Structures
33.3.4 Automation
33.4 What Is New about Internet Computing?
33.5 Information Systems Challenges in the Internet Age
33.5.1 Products and Processes
33.5.2 Nonfunctional Qualities
33.5.3 Social Structures
33.5.4 Automation
33.6 Conceptual Abstractions for Information Systems
33.6.1 Conceptualizing “What” and “When”
33.6.2 Conceptualizing “Where”
33.6.3 Conceptualizing “How” and “Why”
33.6.4 Conceptualizing “Who”
33.7 Summary and Conclusions
Acknowledgements
References
Abstract

Internet computing is changing the nature and scope of information systems (IS). Most IS methods and techniques were invented before the advent of the Internet. What will the world of IS practice be like in the age of the Internet? What methods and techniques will be relevant? We review the world of information systems in terms of processes and products, qualities, social structures, and the role of automation. Given the rapid adoption of Internet thinking, not only among technical professionals but also by the public, we outline the prospects and challenges for information systems in the emerging landscape. In particular, we highlight the need for richer modeling abstractions to support the diversity of services and modes of operation that are required in the new age of worldwide, open network information systems.
33.1 Introduction

How will Internet computing change the world of information systems? Following the widespread commercial availability of computing technologies, IS has been the dominant application area of computing. Organizations large and small, private and public, have come to rely on IS for their day-to-day operation, planning, and decision-making. Effective use of information technologies has become a critical success factor in modern society. Yet, success is not easily achieved. Many of the failures occur not in the technology, but in how technology is used in the context of the application domain and setting [Lyytinen, 1987; Standish, 1995]. Over the years, many methods and techniques have been developed to overcome the challenges to building effective information systems.
For many segments of society, the Internet has already changed how people work, communicate, or even socialize. Many of the changes can be attributed to information systems that now operate widely over the Internet. Internet computing is changing the scope and nature of information systems and of IS work. What opportunities, problems, and challenges does Internet computing present to the IS practitioner? What makes the new environment different? Which existing techniques continue to be applicable, and what adaptations are necessary? What new IS methods and techniques are needed in the Internet world?
IS is a multifaceted field and requires multidisciplinary perspectives. In this chapter, we will only be able to explore some of the issues from a particular perspective — primarily that of IS engineering, with an emphasis on the interplay between the technical world of system developers and programmers on the one hand, and the application or problem-domain world of users, customers, and stakeholders on the other. This perspective highlights some of the key IS issues as the bridge between raw technology and the application domain.
The chapter is organized as follows. Section 33.2 considers the world of IS practice before the advent of the Internet. In Section 33.3, we ask how the users and applications are seen through the eyes of the IS practitioner, pre-Internet. Section 33.4 focuses on the new environment for information systems, brought about by Internet computing. Section 33.5 considers the implications and challenges of the Internet age for IS practice and research. As conceptual abstractions are at the heart of IS engineering, we focus in Section 33.6 on the kinds of abstractions that will be needed in the Internet age. We close in Section 33.7 with a summary and conclusions.
33.2 Information Systems Before the Advent of the Internet Let us first consider the world of IS practice, focusing on methods and techniques used before the advent of the Internet. What kinds of tasks and processes do IS professionals engage in? What products do the processes produce? What quality concerns drive their daily work and improvement initiatives? How is the division of work organized among professional specialties, and within and across organizations and industry sectors? Which areas of work can be automated, and which are to be retained as human tasks?
33.2.1 Processes The overarching organizing concept in most IS curricula is that of the system development life cycle [Gorgone et al., 2002]. The overall process of creating and deploying an information system is broken down into a number of well-defined interdependent processes. These typically include planning, requirements elicitation, analysis, specification, design, implementation, operations and support, maintenance, and evolution. Verification and validation, including testing, is another set of activities that needs to be carried out in parallel with the main production processes. Some of the life-cycle activities involve participation of users and stakeholders. For example, technical feasibility, and business priorities and risks are reviewed at predefined checkpoints. When externally provided components or subsystems are involved, there are processes for procurement and integration. Processes are also needed to manage the information content
— during system development (e.g., defining the schemas) and during operation (e.g., ensuring information quality) [Vassiliadis et al., 2001].
A systematic process methodology is therefore a central concept in the field, imported initially from practices in large-scale engineering projects. The systematic approach is used to control budget, schedule, resources, and opportunities to change course, e.g., to reduce scope or to realign priorities. Nevertheless, lack of a systematic process methodology continues to be a concern, as a contributing factor to poor quality or failure of software and information systems. Substantial efforts are made to institutionalize good practices in processes, through standards, assessment, and certification, and through process improvement initiatives, e.g., Capability Maturity Model Integration (CMMI) [Chrissi et al., 2003] and ISO 9000 [ISO, 1992]. Many IS projects adopt methodologies offered by vendors or consulting companies, which prescribe detailed processes supported by associated tools.
Prescriptive processes provide guidance and structure to the tasks of system development. They may differ in the stages and steps defined, the products output at each step, and how the steps may overlap or iterate (e.g., the waterfall model [Royce, 1970], the spiral model [Boehm, 1988], and the Rational Unified Process [Kruchten, 2000]). Although prescriptive processes aim to create order out of chaos, they are sometimes felt to be overly restrictive or to require too much effort and time. Alternative approaches that have developed over the years include rapid prototyping, Joint Application Development (JAD) [Wood and Silver, 1995], Rapid Application Development (RAD) [McConnell, 1996], and, more recently, agile development [Cockburn, 2001]. All of these make use of a higher degree of human interaction among developers, users, and stakeholders.
33.2.2 Products Complementary to and intertwined with processes are the products that they output. These include products and artifacts that are visible to the end user such as executable code, documentation, and training material, as well as intermediate products that are internal to the system development organization. When more than one organization is involved in the creation and maintenance of a system, there are intermediate products that are shared among or flow across them. Most of the products are informational — plans, requirements, specifications, test plans, designs, budgets and schedules, work breakdowns and allocations, architectural diagrams and descriptions, and so on. Some products are meant for long-term reference and record keeping, whereas others are more ephemeral and for short-term coordination and communication. These informational products are encoded using a variety of modeling schemes, languages, and notations. Information-modeling techniques continue to be a central area of research [Brodie et al., 1984; Webster, 1988; Loucopoulos and Zicari, 1992; Boman et al., 1997; Mylopoulos, 1998]. Widely used techniques include Entity-Relationships (ER) modeling [Chen, 1976], Integrated Definition for Function Modelling (IDEF0) [NIST, 1993] (based on the Structured Analysis and Design Technique (SADT) [Ross and Shoman, 1977]), and the Unified Modelling Language (UML) [Rumbaugh et al., 1999]. Large system projects involve many kinds of processes, producing a great many types of information products related to each other in complex ways. Metamodeling and repository technologies [Brinkkemper and Joosten, 1996; Jarke et al., 1998; Bernstein et al., 1999; Bernstein, 2001] are often used to manage the large amounts and variety of information produced in a project. These technologies support retrieval, update, and coordination of project information among project team members. Metamodels define the types of processes and products and their interrelationships. Traceability from one project artifact or activity to another is one of the desired benefits of systematic project information management [Ramesh and Jarke, 2001].
33.2.3 Nonfunctional Qualities Although processes and products constitute the most tangible aspects of IS work, less tangible issues of quality are nevertheless crucial for system success. Customers and users want systems that not only
provide the desired functionalities, but also a whole host of nonfunctional requirements that are often conflicting — performance, costs, delivery schedules, reliability, safety, accuracy, usability, and so on. Meeting competing quality requirements has been and remains a formidable challenge for software and IS professionals [Boehm and In, 1996]. Not only are system developers not able to guarantee the correctness of large systems, they frequently fail to meet nonfunctional requirements as well. Many of the issues collectively identified as the software crisis years ago are still with us today [Gibbs, 1994]. Research subspecialties have arisen with specific techniques to address each of the many identified areas of quality or nonfunctional requirements — performance, reliability, and so forth. However, many quality attributes are hard to characterize, e.g., evolvability and reusability. When multiple requirements need to be traded off against one another, systematic techniques are needed to deal with the synergistic and conflicting interactions among them. Goal-oriented approaches [Chung et al., 2000] have recently been introduced to support the systematic refinement, interaction analysis, and operationalization of nonfunctional requirements. On the project management level, institutionalized software process improvement programs (such as CMMI) target overall project quality improvements. Quality improvements need to be measured, with results fed back into new initiatives [Basili and Caldiera, 1995].
33.2.4 Social Structures Most information systems require teams of people to develop and maintain them. The structuring of projects into process steps and artifacts implies a social organization among the people performing the work, with significant degrees of task specialization. Some tasks require great familiarity with the application domain, whereas others require deep knowledge about specific technologies and platforms. Some require meticulous attention to detail, whereas others require insight and vision. Interpersonal skills are as important as technical capabilities for project success [Weinberg, 1998; DeMarco and Lister, 1999]. Every product requires time and effort to create, so the quality depends on motivation, reward structures, and priorities, as well as on personnel capabilities. Yet the social organization is often implicit in how processes and products are structured, rather than explicitly designed, as there are few aids beyond generic project management tools. Processes are judged to be too heavy (excessive regimentation) or too light (chaotic) based on the perceived need for human creativity, initiative, and flexibility for the task at hand. Factors influencing the determination of social structure include project and team size, familiarity with the application domain, and maturity of the technologies, as well as sociocultural and economic factors. Industry categories and structures (e.g., Enterprise Resource Planning [ERP] vendors vs. ERP implementers) and human resource categories (database designers vs. database administrators) are larger social structures that specific projects must operate within. The social nature of IS work implies that its structure is a result of conflicting as well as complementary goals and interests. Individuals and groups come together and cooperate to achieve common objectives, but they also compete for resources, pursue private goals, and have different visions and values. Processes and products that appear to be objectively defined are in fact animated by actors with initiatives, aspirations, and skills. The human intellectual capital perspective [Nonaka and Takeuchi, 1995] highlights the importance of human knowledge and ingenuity in systems development. Although considerable knowledge is manifested in the structure of processes and products, a great deal of knowledge remains implicit in human practices and expertise. There are limits on how much and what kinds of knowledge can be made explicit, encoded in some language or models, and systematically managed. In reflecting on IS practices and software development as professional disciplines, authors acknowledge the human challenges of the field [Banville and Landry, 1992; Humphrey, 1995].
33.2.5 Automation

The quest for higher degrees of automation has been a constant theme in information systems and in software engineering. The large amounts of complex information content and the numerous, complicated
relationships, the need for meticulous detail and accuracy, the difficulty of managing large teams, and the desire for ever quicker delivery and higher productivity — all call for more and better automated tools. Numerous tools to support various stages and aspects of IS work have been offered — from Computer Aided Software Engineering (CASE) tools that support modelling and analysis, to code generators, test tools, simulation tools, repositories, and so on. They have met with varying degrees of success in adoption and acceptance among practitioners. Automation relies on the formalization of processes and products. Those areas that are more amenable to mathematical models and semantic characterization have been more successful in achieving automated tool support. Thus, despite great efforts and many advances, IS work remains labour-intensive and requires social collaboration. Many issues are sociotechnical, e.g., requirements elicitation, reuse, agile development, and process improvement. The difficulties encountered with automation in the developer’s world may be contrasted with that in the user’s world, where automation is the mandate and expectation of the IS practitioner.
33.3 The World As Seen by Information Systems Information systems convey and manipulate information about the world. The kind of world (the application setting and the problem domain) that is perceived by the IS analyst is filtered through presuppositions of what the technology of the day can support. In the preceding section, we reflected upon the world of the IS practitioner in terms of processes and products, quality, social structures, and automation. Let us now use the same categories to consider how IS practitioners treat the world that they serve — the world that users and stakeholders inhabit.
33.3.1 Processes and Products The predominant conceptualization of the world as seen by IS analysts is that of processes and products. The main benefit of computers was thought to be the ability to process and store large amounts of encoded information at high speeds and with great accuracy. In early applications, information systems were used to replace humans in routine, repetitive information processing tasks, e.g., census data processing and business transaction processing. The processes automate the steps that humans would otherwise perform. Processes produce information artifacts that are fed into other processes. The same concept can be applied to systems that deal with less routine work, e.g., management information systems, decision support systems, executive information systems, and strategic information systems. Models and notations, usually graphical with boxes and arrows, were devised to help describe and understand what processes are used to transform what kinds of inputs into what kinds of outputs, and state transitions. Data Flow Diagrams (DFD) [DeMarco, 1979], SADT [Ross and Shoman, 1977], ER modelling [Chen, 1976], and UML [Rumbaugh et al., 1999] are in common use. These kinds of models shape and constrain how IS analysts perceive the application domain [Curtis et al., 1992]. We note that processes and products in the developer’s world are treated somewhat differently than those in the user’s world. In the latter, attention is focused on those that are potentially automatable. In the former, there is an understanding that a large part of the processes and products will be worked on by humans, with limited degrees of automation. We will return to this point in Section 33.4.
33.3.2 Nonfunctional Qualities

Most projects aim to achieve some improvement or change in qualitative aspects of the world — faster processing, fewer delays, information that is more accurate and up-to-date, lower costs, and so forth. In Section 33.2.3, we considered the pursuit of quality during a system development project. Here, we are concerned with the quality attributes of processes and products in the application domain in which the
target system is to function. Many of the same considerations apply, except that now the IS professional is helping to achieve quality objectives in the client’s world. Quality issues may be prominent when making the business case for a project, and may be documented in the project charter or mandate. However, the connection of these high level objectives to the eventual definition of the system in terms of processes and products may be tenuous. Quality attributes are not easily expressible in the models that are used to define systems, as the latter are defined in terms of processes and products. Quality concerns may appear as annotations or comments accompanying the text (e.g., a bottleneck or missing information flow). Furthermore, a model typically describes only one situation at a time, e.g., the current system as it exists or a proposed design. Comparisons and alternatives are hard to express, as are pros and cons and justifications of decisions. These kinds of information, if recorded at all, are recorded outside of the modelling notations. Some quality attributes can be quantified, but many cannot. Specialized models can be used for certain quality areas (e.g., economic models and logistical models), but analyzing cross-impacts and making tradeoffs among them is difficult, as noted in Section 33.2.3. Design reasoning is therefore hard to maintain and keep up to date when changes occur.
33.3.3 Social Structures

Information systems change the social structures of the environment in which they operate. In performing some aspects of work that would otherwise be performed by people, they change how work is divided and coordinated. Bank tellers take on broader responsibilities as customer service representatives, phone inquiries are funnelled into centralized call centers, and data entry tasks are moved from clerical pools to end users and even to customers. Each time a system is introduced or modified, responsibilities and relationships are reallocated, and possibly contested and renegotiated. Reporting structures, and other channels of influence and control, are realigned. The nature of daily work and social interactions is altered. Reward structures and job evaluation criteria need to be readjusted.
The importance of social factors in information systems has long been recognized (e.g., [Kling, 1996; Lyytinen, 1987]). Many systems fail or fall into disuse not because of technical failure, but due to a failure in how the technology is matched to the social environment. Alternative methodologies have been proposed that pay attention to the broader context of information systems, e.g., Soft Systems Methodology [Checkland, 1981], ethnographic studies of work practices [Goguen and Jirotka, 1994], Participatory Design [Muller and Kuhn, 1993], Contextual Design [Holtzblatt and Beyer, 1995], and so on. Each has developed a following and has produced success stories. Workplace democracy approaches have a long history in Scandinavian countries [Ehn, 1988]. Nevertheless, despite the availability of these alternative methods, social issues are not taken into account in depth in most projects. When an information system operates within an organizational context, the corporate agenda of the target system dominates, e.g., to improve productivity and profitability. Users, who are employees, are expected to fit their work practices to the new system. Although users and other stakeholders may be given opportunities, in varying degrees, to participate in and influence the direction of system development, their initiatives are typically limited.
Existing modelling techniques, most of which focus on process-and-product, are geared primarily to achieving the functionalities of the system, deferring or side-stepping quality or social concerns. For example, in the Structured Analysis paradigm, people and roles that appear in “physical” DFDs are abstracted away in going to the “logical” DFD, which is then used as the main analysis and design vehicle [DeMarco, 1979]. Actors in UML Use Case Diagrams [Rumbaugh et al., 1999] are modelled in terms of their interactions with the system, but not with each other. Given the lack of representational constructs for describing social relationships and analyzing their implications, IS practitioners are hard pressed to take people issues into account when considering technical alternatives. Conversely, stakeholders and users cannot participate effectively in decision making when the significance and implications of complex design alternatives are not accessible to them. It is hard for technical developers and application domain personnel to explore, analyze, and understand the space of possibilities together.
33.3.4 Automation The responsibility of the IS professional is to produce automated information systems that meet the needs of the client. Although the success of the system depends a great deal on the environment, the mandate of the IS professional typically does not extend much beyond the automated system. In the early 1990s, the concept of business process reengineering (BPR) overturned the narrow focus of traditional IS projects. Information systems are now seen as enablers for transforming work processes, not just to automate them in their existing forms [Hammer, 1990; Davenport and Short, 1990]. The transformation may involve radical and fundamental change. Process steps and intermediate products judged to be unnecessary are eliminated, together with the associated human roles, in order to achieve dramatic efficiency improvements and cost reductions. IS therefore has been given a more prominent role in the redesign of organizations and work processes. Yet, IS professionals do not have good techniques and tools for taking on this larger mandate. Many BPR efforts failed due to inadequate attention to social and human issues and concerns. A common problem was that implicit knowledge among experienced personnel is frequently responsible for sustaining work processes, even though they are not formally recognized. Existing IS modelling techniques, based primarily on a mechanistic view of work, are not helpful when one needs a sociotechnical perspective to determine what processes can be automated, eliminated or reconfigured.
33.4 What Is New about Internet Computing?

Why cannot IS practice carry on as before, i.e., as outlined in the preceding two sections? What parameters have changed as a result of Internet computing?
From a technology perspective, the Internet revolution can be viewed, simplistically, as one in connectivity, built upon a core set of protocols and languages: TCP/IP, HTTP, and HTML or XML. With their widespread adoption through open standards and successful business models (e.g., affordable connection fees and free browsers), the result, from the user's point of view, is a worldwide, borderless infrastructure for accessibility to information content and services — information of all types (as long as they are in digital format), regardless of what “system” or organization they originate from. Digital connectivity enabled all kinds of information services to coexist on a common, interoperable network infrastructure. Service providers have ready access to a critical mass of users, through the network effect of Metcalfe's Law [Gilder, 1993]. Automated services can access, invoke, and interact with each other. Universal connectivity at the technology level makes feasible universal accessibility at the information content and services level.
Internet computing is, therefore, triggering and stimulating the removal of technology-induced barriers in the flow and sharing of information. Previously compartmentalized information services and user communities are now reaching out to the rest of the world. Information systems, with Internet computing, find themselves broadening in scope with regard to content types, system capabilities, and organizational boundaries in the following ways:
1. Information systems have traditionally focused on structured data. The Internet, which gained momentum by offering information for the general public, unleashed an enormous appetite for unstructured information, especially text and images, but also multimedia in general. Corporations and other organizations quickly realized that their IS capabilities must address the full range of information content, to serve their public as well as streamline their internal workings. They can do this relatively easily, by embracing the same Internet technology for internal use as intranets.
2. Users working with information do not want to have to deal with many separate systems, each with its own technical idiosyncrasies. Internet computing, by offering higher level platforms for application building, makes it possible for diverse technical capabilities to appear to the user as a single system, as in the concept of portals. Thus, Internet computing vastly expands what a user may expect of a system.
3. Most information systems in the past had an internal focus and operated within the boundaries of an organization, typically using proprietary technologies from a small number of selected
vendors. Internet computing is inverting that, both from a technological viewpoint, and from an IS viewpoint. Technologically, the momentum and economics of Internet computing are such that corporate internal computing infrastructures are being converted to open Internet standards [IETF; W3C; OpenGroup]. At the information services level, organizations are realizing that much can be gained by opening up their information systems to the outside world — to customers and constituents, to suppliers, partners and collaborators, as in Business to Business (B2B) e-commerce and virtual enterprises [Mowshowitz, 1997]. The boundaries of organizations have become porous and increasingly fluid, defined by the shifting ownership and control of information and information flow, rather than by physical locations or assets.
33.5 Information Systems Challenges in the Internet Age With the apparently simple premise of universal connectivity and accessibility, Internet computing is changing IS fundamentally. It is redrawing the map of information systems. As barriers to connectivity are removed, products and processes are being redefined. Quality criteria are shifting. New social structures are emerging around systems, both in the user’s world and in the developer’s world. People’s conceptions of what computers can do, and what they can be trusted to do, are evolving.
33.5.1 Products and Processes Let us first consider the impact of Internet computing on processes and products in the IS user’s world. Over the years, a large organization would have deployed dozens or hundreds of information systems to meet their various business and organizational needs. Each system automates its own area of work processes and products, with databases, forms, reports, and screens for input and output. Soon, it was realized that these independently developed systems should be interacting with each other directly and automatically. Thus, long before the Internet, numerous approaches emerged for extending the reach of information processes and products beyond the confines of a single system. For example, information in separate databases often represents different aspects of the same entity in the world. A customer, a purchase, an insurance policy, or a hospitalization — each of these has many aspects that may end up in many databases in the respective organizations. Database integration techniques were introduced to make use of data across multiple databases. Data warehousing provided powerful tools for understanding trends by enabling multidimensional analysis of data collected from the numerous operational databases in an organization. Data mining and knowledge discovery techniques enhanced these analyses. Enterprise-wide information integration has also been motivated by the process perspective. BPR stimulated cross-functional linking of previously stand-alone “stove pipe” systems. Workflow management systems and document management systems were used to implement end-to-end business processes that cut across functional, departmental lines. Different approaches were used to achieve integration or interoperability at various levels. Middleware technologies provided interprocess communication at a low level, requiring handcrafting of the interactions on an application-to-application basis. Enterprise application integration (EAI) products offered application-level interoperability. ERP systems offered integrated package solutions for many standard back office business processes. Integration is achieved at the business process level by adopting process blueprints from a single vendor [Curran and Keller, 1997]. When systems had disparate conceptual models of the world, metamodelling techniques were used to map across them. Internet computing technologies come as a boon to the mishmash of technologies and approaches that have proliferated in the IS world. By offering a common network computing and information infrastructure that is readily accessible to everyone — regardless of organizational and other boundaries — the integration and interoperability challenges that organizations had been confronting individually at an enterprise level are now being addressed collectively on a worldwide scale [Yang and Papazoglou, 2000]. Organizations that had already been opening up their operations to the external world through
IS-enabled concepts such as supply chain management, customer relationships management (CRM), and virtual enterprise now have the momentum of the whole world behind them. Interorganizational interoperability initiatives (also known as B2B e-commerce) no longer need to begin from scratch between partner and partner, but are undertaken by entire industries and sectors through consortiums that set standards for business application level protocols, e.g., Rosettanet, ebXML, HL7, UN/CEFACT, OASIS, and BPMI. Once the interaction protocols are set up, processes in one organization can invoke automated services in another without human intervention (WebServices). In an open world, anyone (individuals or organizations, and their information systems) has potential access to the full range of products and services offered on the open network, in contrast to the closed, proprietary nature of pre-Internet interactions. End-to-end process redesign can now be done, not only from one end of an organization to another, but across multiple organizations through to the customer and back. To support flexible, open interactions, products and services increasingly need to be accompanied by rich metadata, e.g., by using XML and its semantic extensions [Berners-Lee et al., 2001]. Catalogs and directories are needed for locating desired products and services. Brokers, translators, and other intermediaries are also needed [Wiederhold and Genesereth, 1997]. Internet computing is stimulating coordinated use of multimedia and multichannel user interactions. The same user — a sales representative, a student, or a community services counsellor — may be drawing on material that combines text, images, voice, music, and video on a desktop, laptop, PDA, mobile phone, or other device. There will be increasing demands to enable higher level automated processing of digital information in all formats. The semantic web initiative, for example, aims to enhance semantic processing of web content through formal definitions of meanings (ontologies) for various subject domains and communities [Gomez-Peres and Corcho, 2002; SemanticWeb]. In terms of products and processes, the challenges brought about by Internet computing can be summarized as one of diversity. Standardization is one way to overcome the excessive proliferation of diversity. Yet, in an open world, the capacity to cope with diversity must recognize the inherent need to differentiate, and not inhibit innovation. So the great challenge is to have processes that can interoperate seamlessly, and products that are intelligible and useful to their intended users. Given these recent transformations in the user’s world, the character of work in the developer’s world has seen rapid changes in the past decade or so. Development work that used to be organized vertically (from requirements to design to implementation) is now dealing increasingly with horizontal interactions, coordination, and negotiations. Each layer in the developer’s world, from business process analysis to architectural design to implementation platforms, must address interaction with peers, coping with diversity and interoperability at that level [Bussler, 2002]. As a result, each level is working with new kinds of information artifacts (e.g., using new languages and metamodels [Mylopoulos et al., 1990; Yu et al. 1996]) and new development processes (e.g., understanding and negotiating peer-level protocols and interactions) [Isakowitz et al., 1998].
33.5.2 Nonfunctional Qualities In the Internet world, when we are pursuing quality goals such as faster processing, greater accuracy and reliability, better usability, and so forth, we are dealing with processes that cut across many systems and organizations, and with information products from many sources. Unlike in the traditional world of closed systems, Internet computing implies that one may need to rely on many processes and products over which one has limited control or influence. Achieving quality in an open network environment requires new techniques not in common use in the traditional environment. For example, if the product or service is commodity-like, one can switch to an alternate supplier when the supplier is unsatisfactory. This presupposes efficient market mechanisms, with low transaction costs. Accurate descriptions of functionalities as well as quality attributes are required, using metrics that allow meaningful comparison by automated search engines and shopbots. This may involve third-party assessors and certifiers of quality, and regulatory protection and legal recourse when obligations are not met.
The situation is complicated by the dynamic nature of Internet collaborations, where automated processes can come together for one transaction fleetingly, then in the next moment go their own way to participate in new associations. When market mechanisms fail, one would need to establish more stable associations among players based on past experiences of trust [Rosenbloom, 2000; Falcone et al., 2001]. As for developers, due to the open network environment, one can expect special emphasis on certain nonfunctional requirements such as scalability, reliability, usability, security (including availability, integrity, and confidentiality), time-to-market, costs, and performance. Design tradeoffs may be more challenging, as the designer attempts to cater to market-based, dynamically changing clientele, as well as to stakeholders in more stable long-term relationships. Design techniques have traditionally been weak in dealing with quality or nonfunctional requirements. With Internet computing, there is the added need to support the more complex decision making involving competing demands from multiple dynamically configured stakeholders.
33.5.3 Social Structures Traditional information systems that are function-specific and narrowly focused imply that there are well-defined human roles and responsibilities associated with each function, e.g., planning vs. execution, or product lines vs. geographic regions. Internet computing, by facilitating ready access to a wide range of information system capabilities, enables much greater flexibility in social organizational arrangements. When a common platform is used, learning curves are reduced, and movement across roles and positions is eased. For example, as more routine tasks are automated, the same personnel can monitor a wider range of activities, respond to problems and exceptions, and engage in process improvement and redesign. More fundamental changes are occurring at the boundaries of organizations. The Internet has made the online consumer/citizen a reality. Many transactions (e.g., catalogue browsing and ordering, banking and investments, tax filing, and proposal submissions) are now handled online, with the user directly interacting with automated information systems. The organization is effectively pushing some of its processes to the customer’s side. Similar boundary renegotiations are taking place among suppliers and partners. These shifts in boundaries are changing internal organizational structures as well as broad industry structures. New business models are devised to take advantage of newly created opportunities [Timmers, 2000]. Disintermediation and reintermediation are occurring in various sectors of society and business. Organizations are experimenting with different kinds of decentralization, recentralization, and market orientation mechanisms, as well as internal coordination mechanisms. Citizens groups are organizing their activities differently, using chat rooms and other web-based media. Many of the social organizational relationships are being shifted into the automated realm, as software agents act on behalf of their human counterparts, as referred to in the preceding sections. New partners may be found via automated directory services, e.g., Universal Description, Discovery, and Integration (UDDI). Social dynamics is therefore becoming important in the analysis and design of information systems in the Internet age. Unfortunately, there are few techniques in the information system practitioner’s toolbox that take social structures and dynamics into account [deMichelis et al., 1998]. The social organization of system development organizations is also rapidly changing, most directly resulting from changes in development processes and the types of artifacts they produce. New professional categories arise, as specialized knowledge and skills are sought. New dependencies and relationships among teams and team members need to be identified and negotiated. Education and training, upgrading, and obsolescence — these and other labour-market and human-resource issues are often critical for project success. Larger changes, analogous to those happening in the user’s world, are also happening in the developer’s world. Industry structures are changing, and technology vendors are specializing or consolidating. Component creators and service providers are springing up to take advantage of the Internet computing platform. Outsourcing or insourcing, proprietary or open, commercial off-the-shelf (COTS) systems,
and open source development — all these alter the dynamics of IS work. The adoption of particular system architectures has direct significance for the social structures around it. As in the user’s world, some processes in the developer’s world will be carried out by software agents, with the social dynamics carried over into the automated realm. Again, there is little theoretical or practical support for the IS practitioner facing these issues.
33.5.4 Automation With Internet computing, the broad range of IS capabilities is now accessible to the user on a single, consistent platform. The ability of these functions and capabilities to interoperate creates a powerful synergistic effect because they can make use of information that is already in digital form and that is machine processable. Automated functions can invoke each other at electronic speeds. For example, programmed trading of commodities and financial instruments has been operating for some time. It is feasible to have medical test results sent to and responded to by one’s family physician, specialist, pharmacist, and insurer within seconds rather than days or weeks, if all the processing is automated. Governments can potentially collect electronic dossiers on the activities and movements of citizens for tax collection and law enforcement. Almost all knowledge work in organizations will be conducted through computers, as the technological support for searching, indexing, cross-referencing, multimedia presentation, and so forth, becomes a matter of routine expectation. More and more documents and other information content are “born digital” and will remain digital for most of their life cycle. In the past, what was to become automated was decided for each system within a well-defined context of use. Significant investments and efforts were required for each application system because each system required its own underlying computing support (vertical technology stack) and operational procedures (including data entry and output). Cost–benefits analysis led to automation only in selected areas or processes, typically based on economic and efficiency criteria. This was typically done by system analysts at the early stages of system definition, with the application system as the focal point and unit of analysis. The Internet has turned the tide in automation. We are witnessing that the concept of an isolated application system is dissolving. Information content — public or private — may pass through numerous systems on the network, invoking processing services from many operators and developed by many system vendors (e.g., via web services). Because of synergy and the network effect, it will be irresistible in economic or efficiency terms to automate [Smith and Fingar, 2002]. The investments have already been made, the technology infrastructure is there, and the content is already in machine-processable form. It will take a conscious effort to decide what not to automate. The decision as to what to automate requires difficult analysis and decision making, but is crucial for the success and sustainability of systems. There will often be a clash of competing interests among stakeholders involving issues of trust, privacy, security, reliability, vulnerability, risks, and payoffs. Even economic and speed advantages that are the usual benefits are not necessarily realizable in the face of the potential downsides. One needs to understand broad implications and long-term consequences. Heavily interconnected networks imply many far-reaching effects that are not immediately discernable. With the digital connectivity infrastructure in place, one has to take decisions on the degrees of automation. Information processing can range from the minimal (e.g., message transmission and representation at the destination, with no processing in between) to the sophisticated (extracting meaning and intent, and acting upon those interpretations). 
But even in messaging services, traffic patterns can be monitored and analyzed. So the analysis of what a system or service should and should not do is much more complicated than in the pre-Internet world, and will involve complex human and machine processes as well as the conflicting interests and perspectives of many parties. The same factors apply to the developer's world. The increased demand arising from the great variety of IS capabilities will lead to pressure for more automation. When automation is raised to such a level that technical details can be hidden from the user, the entire development process can be pushed into the user's world.
Much human knowledge and experience cannot be made explicit and codified symbolically. Where and how implicit knowledge and human judgement interact and combine with automated machine processes remains a difficult design challenge. IS practitioners have few tools that can support the analysis of these issues and help make these important decisions for users, service providers, and society.
33.6 Conceptual Abstractions for Information Systems

IS practice is based on conceptual abstractions with well-defined properties. Abstractions focus attention on aspects of the world that are relevant for information systems development. As we saw in Section 33.3, in order to develop information systems that can serve in some application setting, the complexity of that setting needs to be reduced through a set of modelling abstractions. The user's world needs to be expressed in terms of models that can be analyzed, leading to decisions about what aspects of that world will become the responsibility of the intended system. During systems development (Section 33.2), the models are translated stepwise through a different set of abstractions, from ones that describe the user's world (e.g., travel plans and bookings) to ones that describe the machine's world (data and operations in computers that store those plans and execute those bookings) [Jarke et al., 1992]. At each stage or level of translation, analyses are performed to understand the situation, and decisions are made on how elements at the current level should translate into or correspond to elements at the next level.

Notations are important to help communication between the worlds of stakeholders and users on the one hand, and system designers on the other. They need to have sufficient expressiveness to convey the desired needs and requirements. Yet the notations need to be simple and concise enough for widespread adoption and standardization. Furthermore, they need to support analysis and inference, preferably automated, so as to be scalable.

A necessary consequence of using modelling is a restriction on what can be said about the world. Aspects of the world that cannot be expressed tend to be left out, and will no longer be the focus of attention during system development. Therefore, the design of notations requires a difficult balance [Potts and Newstetter, 1997]. With too much detail, one can get bogged down; with too little, one can get a wrong system that does not do what is needed or intended. Whatever is chosen, it is the modelling techniques used that shape the analyst's perception of the world.

As we surveyed the user's and developer's worlds, we noted that not all the relevant aspects are equally well supported by existing modelling abstractions. Processes and products are the mainstay of most existing modelling techniques, but quality and social structure are not well captured. Therefore, those issues are not systematically dealt with in mainstream methodologies. In the Internet age, it will be especially important to have conceptualizations and abstractions that relate concerns about social relationships and human interests to the technical alternatives in systems design, and vice versa. The preceding sections revealed that, with Internet computing, information systems are now expected to deal with a much wider range of conceptualizations than before. We will consider the abstractions for expressing what, when, where, how and why, and who.
33.6.1 Conceptualizing "What" and "When"

The "what" refers to things that exist, events that occur, and properties and relationships that hold. These aspects of the world are most heavily addressed in existing modelling schemes. An online transaction needs to identify products bought and when payments take effect. Patient records need to distinguish different kinds of diseases and symptoms and document the nature and timing of treatments. Product and process proliferation triggered by Internet computing will test the limits of these modelling techniques. Current work in ontologies [Guarino and Welty, 2002] is revealing subtleties and limitations in earlier work. Knowledge structuring mechanisms such as classification, generalization, and aggregation
[Greenspan et al., 1994] that have been used in object-oriented modelling will be utilized extensively. The global reach of Internet computing is likely to push each classification and specialization scheme to the limits of its applicability, for example, in organizing the types and features of financial instruments that may be transacted electronically and that are becoming available in the global investment marketplace. Metamodelling techniques are especially relevant for working with conceptual structures spanning multiple domains or contexts [Nissen and Jarke, 1999]; an example is the classification of medical conditions by physicians as opposed to insurance companies, or in the context of one country or culture compared to another.

The "what" and "when" cover the static and dynamic aspects of the world. Time is not always explicitly represented in dynamic models, but may appear as sequence or precedence relationships, e.g., in coordinating multistep financial transactions that traverse many systems, countries, time zones, and organizations. Internet computing brings more complex temporal issues into play. Multiple systems cooperating on the network may operate on different time scales and interact synchronously or asynchronously over different periods. They will have different development and evolution lifecycles that require coordination. Conventional modelling techniques typically deal with only first-order dynamics, as in process execution and interaction. Second- or higher-order dynamics, such as change management, are usually not well integrated in the same modelling framework. When Internet computing is relied upon as a platform for long-term continuity, there will be processes whose time horizons extend into years and decades (e.g., interorganizational workflow, managing the impacts of legislative change). Over the long term, process execution and process change will have human and automated components, involving users and developers. Similarly, the long-term presentation and preservation of information content over generations of information systems will be significant issues (e.g., identification and referencing of objects, and how to make objects interpretable by humans and machines in future generations) [GAO, 2002].
33.6.2 Conceptualizing "Where"

It should be no surprise that the Internet challenges conventional conceptions of geographic space. On the one hand, it enables users to transcend physical space, reaching out to others wherever they are. On the other hand, worldwide coverage means that users do come from many different geographic regions and locales, and that these differences are significant or can be taken advantage of in many applications. People's preferences and interests, linguistic and cultural characteristics, legal frameworks and social values — all of these can be correlated to physical locations. Mobile and ubiquitous computing, silent commerce, and intelligent buildings can make use of fine-grained location awareness. Modelling techniques developed in geographical information systems (GIS) can be expected to find wider application stimulated by Internet computing, e.g., to offer location-sensitive services to users in vehicles, to help visitors navigate unfamiliar territory, or to track material goods in transit. Physical locations will often need to be mapped to jurisdictional territories, e.g., in enforcing building security.
33.6.3 Conceptualizing "How" and "Why"

The distinction between "what" and "how" is often made within software engineering and in systems design. Requirements are supposed to state the "what" without specifying the "how." Here the "what" refers to essential characteristics, whereas the "hows" used to achieve the "what" reflect incidental characteristics that may be peculiar to the implementation medium or mechanisms. This is one of the core principles of abstraction in dealing with large systems. The what and how distinction can be applied at multiple levels or layers, each time focusing on features and issues relevant to that level, while hiding "details" that can be deferred. Structured analysis techniques (e.g., SADT, DFD) rely heavily on a layered, hierarchical structure for the gradual revealing of details. Although the vertical layering of processes or functions can be viewed as embodying the how (downwards) and why (upwards) for understanding the structure of a system, much of the reasoning leading to the structure is not captured. There is almost always more than one
answer when considering how to accomplish a task. Yet most modelling techniques admit only one possible refinement in elaborating on the "how." Alternatives, their pros and cons, and why one of them is chosen typically cannot be described and analyzed within the notation and methodology. The lack or loss of this information in system descriptions makes system evolution difficult and problematic.

Understanding how and why will be critical in the Internet environment, where systems can be much more dynamic and contingent. Systems are typically no longer conceived as a single coherent whole with a top-level overview that can then be decomposed into constituent elements to be designed and constructed by the same project team. Instead, systems could arise from network elements that come together in real time to participate in a cooperative venture, then dissolve and later participate in some other configuration. There will be many ways to assemble a system from a network of potential participants, with components coming together in real time for short periods to form a cooperative venture. Designers, or the system components themselves, must have ways of identifying possible solutions (the hows) and ways of judging which ones would work, and work well, according to quality criteria and goals (the whys). Reasoning about how and why is needed at all levels in systems development, e.g., at the application service level (a navigation system recommends an alternate route based on traffic conditions and user preferences) and at the systems and networks level (a failure triggers diagnostics that lead to system recovery). Representing how and why has been addressed in the areas of goal-oriented requirements engineering [Mylopoulos et al., 1999; van Lamsweerde, 2000], design rationales [Lee, 1997], the quality movement (e.g., [Hauser and Clausing, 1988]), and partly in requirements traceability [Ramesh and Jarke, 2001].
33.6.4 Conceptualizing "Who"

The most underdeveloped aspect is the conceptualization of "who" to support systems analysis. The Internet environment will bring many actors into contact with each other. There will be individuals, groups, organizations, and units within organizations such as teams, task forces, and so forth. They will be acting in many different capacities and roles, with varying degrees of sustained identity. They will have capabilities, authorities, and responsibilities. Information system entities (e.g., software agents) may be acting on behalf of human actors, taking on some of their rights and obligations.

Traditional information systems tend to exist in closed environments, e.g., within the authority structure of a single organization. Social structures are more easily defined and instituted there, e.g., as used in role-based access control techniques in computer security [SACMAT, 2003]. With Internet computing, there can be much greater numbers of participants and roles (both in types and instances), with dynamic and evolving configurations of relationships. Consider, for example, healthcare information systems that connect patients at home to hospitals and physicians, later expanding to community care centers, and eventually to insurers and government regulatory agencies and registries. New configurations can arise from time to time due to innovation (e.g., in business models) or regulatory change. There are complex issues of reliability, trust, privacy, and security, as well as operational responsibilities. There are difficult analyses and tradeoffs arising from the complex social relationships.

The virtuality of the Internet creates many new issues for notions of "who," e.g., identity and personae, influence and control, authority and power, ownership and sharing [Mulligan, 2003]. All of these are crucial in analyzing new organizational forms (centralized vs. decentralized, internal vs. external) and social relationships. Notions of community are important in knowledge management and in managing meaning in conceptual models. These issues are not well addressed in traditional information systems techniques. Some of them are beginning to be studied in agent-oriented approaches to software systems and information systems [Papazoglou, 2001; Huhns and Singh, 1998]. However, much of the work on agent-oriented software engineering is currently focused on the design of software agents [Giunchiglia et al., 2003]. For information systems, more attention needs to be paid to modelling and analyzing conceptions of "who" as applied to complex relationships involving human as well as software agents [Yu, 2002].
33.7 Summary and Conclusions

Internet computing is changing the world of information systems. Information systems started historically as computer applications designed specifically for a well-defined usage setting. They implemented a narrow range of repetitive processes, producing predetermined types of information products. Most often, these were automated versions of manual processes. Systems development was primarily "vertically" oriented; the main activities or processes were to convert or translate a vision of a new system into functioning procedures (executable code) and populated databases. A system project involved significant investments and lead time because it typically required its own technology infrastructure, including networking.

With Internet computing, information systems projects will become more and more "horizontal." The larger proportion of the effort will be to coordinate interactions with other system and information resources that already exist or may exist in the future. Systems will potentially interact with a much wider range of users, with different quality expectations and offerings, and evolving usage patterns. Development work can be more incremental, as new systems are built from ever higher-level platforms and components.

These developments will enable information systems professionals to concentrate on the application level, helping users and stakeholders to formulate and understand their problems and aspirations in ways that can take advantage of information technology solutions. Given the broad spectrum of technological capabilities that are now available on a common infrastructure with ever higher-level interoperability, information systems are coming into their own as embodying and realizing the wishes and visions of the user's world, instead of reflecting the limitations and inherent structures of the underlying technologies. The chief limitations in this regard are those imposed by the modelling techniques of the day.

As we have reviewed in this chapter, traditional IS techniques have focused on those aspects that lead most directly to the computerization or automation of existing information processes and products, as these are perceived to be the most tangible results of the project investment. The compartmentalized, vertical system development perspective means that the perception and conceptualization of the world is filtered through preconceived notions of what can be automated, based on the technological implementation capabilities of the day. Hence traditional techniques have focused on the modelling and analysis of processes and products, activities and entities, objects and behaviors. Ontologies for analysis and design are well developed for dealing with the static and dynamic dimensions of the world. Much less attention has been paid to the quality and social aspects, even though these are known to be important success factors and to have contributed to many failures.

In the horizontal world brought about by massive networking, quality and social dimensions will come to the fore. Modelling techniques must cover the full range of expressiveness needed to reason about the what, where, when, how, and why, and especially the who of information systems in their usage and development contexts. Refined characterizations of the notion of who will be crucial for tackling the human and social issues that will increasingly dominate systems analysis and design.
Privacy and security, trust and risk, ownership and access, rights and obligations: these issues will be contested, possibly down to the level of individual transactions and data elements, by a complex array of stakeholders in the open networked world of Internet computing.

Development processes and organizations are benefiting from the same advances experienced in user organizations. Systems development work is taking advantage of support tools that are in effect specialized information systems for its own work domain. As the level of representation and analysis is raised closer to, and becomes more reflective of, the user's world and its language and conceptual models, the developer's world blends in with the user's world, providing faster and tighter change cycles and achieving more effective information systems.

Despite connectivity and potential accessibility, the networked world will not have uniform characteristics, nor will it be without barriers. There will continue to be differentiation and heterogeneity in technical capabilities as well as great diversity in information content and services. Internet computing allows information systems to transcend many unwanted technological barriers, yet it must allow user
communities to create, maintain, and manage boundaries and identities that reflect their needs for locality and autonomy. Developing techniques that support the management of homogeneity within a locality and heterogeneity across localities will be a crucial challenge in the Internet age.
Acknowledgements

The author is grateful to the editor and Julio Leite for many useful comments and suggestions.
References

Banville, Claude and Maurice Landry (1992). Can the Field of MIS be Disciplined? In: Galliers, R., Ed., Information Systems Research: Issues, Methods and Practical Guidelines. Blackwell, London, 61–88.
Basili, Victor R. and G. Caldiera (1995). Improve software quality by reusing knowledge and experience. Sloan Management Review, 37(1): 55–64, Fall.
Berners-Lee, Tim, Jim Hendler, and Ora Lassila (2001). The semantic web. Scientific American, May 2001.
Bernstein, Philip A. (2001). Generic model management — a database infrastructure for schema management. Proceedings of the 9th International Conference on Cooperative Information Systems (CoopIS 01), Trento, Italy, LNCS 2172, Springer, New York, 1–6.
Bernstein, Philip A., T. Bergstraesser, J. Carlson, S. Pal, P. Sanders, and D. Shutt (1999). Microsoft repository version 2 and the open information model. Information Systems, 24(2): 71–98.
Boehm, Barry (1981). Software Engineering Economics. Prentice Hall, Englewood Cliffs, NJ.
Boehm, Barry and Hoh In (1996). Identifying Quality-Requirement Conflicts. IEEE Software, March, 25–35.
Boman, Magnus, Janis Bubenko, Paul Johannesson, and Benkt Wangler (1997). Conceptual Modeling. Prentice Hall, Englewood Cliffs, NJ.
BPMI. The Business Process Management Initiative. www.bpmi.org.
Brinkkemper, Sjaak and S. Joosten (1996). Method engineering and meta-modelling: editorial. Information and Software Technology, 38(4): 259.
Bussler, Christoph (2002). P2P in B2BI. Proceedings of the 35th Hawaii International Conference on System Sciences, IEEE Computer Society, Vol. 9: 302–311.
Checkland, Peter B. (1981). Systems Thinking, Systems Practice. John Wiley and Sons, Chichester, U.K.
Chen, Peter (1976). The Entity-Relationship Model: towards a unified view of data. ACM Transactions on Database Systems, 1(1): 9–36.
Chrissis, Marybeth, Mike Konrad, and Sandy Shrum (2003). CMMI: Guidelines for Process Integration and Product Improvement. Addison-Wesley, Reading, MA.
Chung, Lawrence, Brian Nixon, Eric Yu, and John Mylopoulos (2000). Non-Functional Requirements in Software Engineering. Kluwer Academic, Dordrecht, The Netherlands.
Cockburn, Alistair (2001). Agile Software Development. Addison-Wesley, Reading, MA.
Curran, Thomas and Gerhard Keller (1997). SAP R/3 Business Blueprint: Understanding the Business Process Reference Model. Pearson Education, Upper Saddle River, NJ.
Curtis, Bill, Mark Kellner, and James Over (1992). Process modelling. Communications of the ACM, 35(9): 75–90, September.
Davenport, Thomas H. and J.E. Short (1990). The new industrial engineering: information technology and business process redesign. Sloan Management Review, 31(4): 11–27.
DeMarco, Thomas (1979). Structured Analysis and System Specification. Prentice Hall, Englewood Cliffs, NJ.
DeMarco, Thomas and T. Lister (1999). Peopleware, 2nd ed. Dorset House, New York.
deMichelis, Giorgio, Eric Dubois, Matthias Jarke, Florian Matthes, John Mylopoulos, Michael Papazoglou, Joachim W. Schmidt, Carson Woo, and Eric Yu (1998). A three-faceted view of information systems: the challenge of change. Communications of the ACM, 41(12): 64–70.
ebXML. www.ebxml.org.
Ehn, Pelle (1988). Work-Oriented Development of Software Artifacts. Arbetslivscentrum, Stockholm.
Falcone, Rino, Munindar P. Singh, and Yao-Hua Tan (2001). Trust in Cyber-Societies: Integrating the Human and Artificial Perspectives. Lecture Notes in Artificial Intelligence 2246. Springer, New York.
General Accounting Office (GAO) (2002). Information Management: Challenges in Managing and Preserving Electronic Records. Report number GAO-02-586. GAO, Washington, D.C.
Gibbs, W. Wayt (1994). Software's chronic crisis. Scientific American (International edition), September 1994, 72–81.
Gilder, George (1993). Metcalfe's Law and Legacy. Forbes ASAP, September 13, 1993. http://www.gildertech.com/public/telecosm_series/metcalf.html.
Giunchiglia, Fausto, James Odell, and Gerhard Weiss, Eds. (2003). Agent-Oriented Software Engineering III, Third International Workshop, Bologna, Italy, July 15, 2002. Lecture Notes in Computer Science 2585. Springer, New York.
Goguen, Joseph and Marina Jirotka, Eds. (1994). Requirements Engineering: Social and Technical Issues. Academic Press, London.
Gomez-Perez, A. and O. Corcho (2002). Ontology languages for the semantic web. IEEE Intelligent Systems, 17(1): 54–60.
Gorgone, John T. et al. (2002). IS 2002 — Model Curriculum and Guidelines for Undergraduate Degree Programs in Information Systems. Association for Computing Machinery (ACM), Association for Information Systems (AIS), Association of Information Technology Professionals (AITP). http://www.acm.org/education/is2002.pdf.
Greenspan, Sol J., John Mylopoulos, and Alexander Borgida (1994). On Formal Requirements Modeling Languages: RML Revisited. International Conference on Software Engineering. ACM Press, New York, 135–147.
Guarino, Nicola and Christopher A. Welty (2002). Evaluating ontological decisions with OntoClean. Communications of the ACM, 45(2): 61–65.
Hammer, Michael (1990). Reengineering work: don't automate, obliterate. Harvard Business Review, July, 104–112.
Hauser, J.R. and D. Clausing (1988). The house of quality. Harvard Business Review, (3), May, 63–73.
HL7. www.hl7.org.
Holtzblatt, Karen and H.R. Beyer (1995). Requirements gathering: the human factor. Communications of the ACM, 38(5): 30–32.
Huhns, Michael and Munindar P. Singh (1998). Readings in Agents. Morgan Kaufmann, San Francisco, CA.
Humphrey, Watts (1995). A Discipline for Software Engineering. Addison-Wesley, Reading, MA.
IETF. The Internet Engineering Task Force. http://www.ietf.org/.
Isakowitz, Tomas, Michael Bieber, and Fabio Vitali (1998). Web information systems. Communications of the ACM, 41(7): 78–80.
ISO (1992). ISO 9000 International Standards for Quality Management. International Organization for Standardization, Geneva.
Jarke, Matthias, John Mylopoulos, Joachim Schmidt, and Yannis Vassiliou (1992). DAIDA: an environment for evolving information systems. ACM Transactions on Information Systems, 10(1): 1–50.
Jarke, Matthias, K. Pohl, K. Weidenhaupt, K. Lyytinen, P. Martiin, J.-P. Tolvanen, and M. Papazoglou (1998). Meta modeling: a formal basis for interoperability and adaptability. In B. Krämer, M. Papazoglou, and H.-W. Schmidt, Eds., Information Systems Interoperability. John Wiley and Sons, New York, 229–263.
Kling, Rob (1996). Computerization and Controversy: Value Conflicts and Social Choices, 2nd ed. Academic Press, San Diego, CA.
Kruchten, Philippe (2000). The Rational Unified Process: An Introduction, 2nd ed. Addison-Wesley, Reading, MA.
Lamsweerde, Axel van (2000). Requirements Engineering in the Year 00: A Research Perspective. 22nd International Conference on Software Engineering, Limerick, Ireland, ACM Press, New York.
Lee, Jintae (1997). Design rationale systems: understanding the issues. IEEE Expert, 12(3): 78–85.
Lyytinen, Kalle (1987). Different perspectives on information systems: problems and their solutions. ACM Computing Surveys, 19(1): 5–44.
Loucopoulos, Pericles and R. Zicari, Eds. (1992). Conceptual Modeling, Databases and CASE: An Integrated View of Information System Development. John Wiley and Sons, New York.
McConnell, Steve (1996). Rapid Development: Taming Wild Software Schedules. Microsoft Press.
Mowshowitz, Abbe (1997). Virtual Organization — introduction to the special section. Communications of the ACM, 40(9): 30–37.
Muller, Michael J. and Sarah Kuhn (1993). Participatory design. Communications of the ACM, 36(6): 24–28, June.
Mulligan, Deirdre K. (2003). Digital rights management and fair use by design. Special issue. Communications of the ACM, 46(4): 30–33.
Mylopoulos, John (1998). Information modeling in the time of the revolution. Invited review. Information Systems, 23(3–4): 127–155.
Mylopoulos, John, Alex Borgida, Matthias Jarke, and Manolis Koubarakis (1990). Telos: representing knowledge about information systems. ACM Transactions on Information Systems, 8(4): 325–362.
Mylopoulos, John, K. Lawrence Chung, and Eric Yu (1999). From object-oriented to goal-oriented analysis. Communications of the ACM, 42(1): 31–37, January.
Nissen, Hans W. and Matthias Jarke (1999). Repository support for multi-perspective requirements engineering. Information Systems, 24(2): 131–158.
NIST (1993). Integrated Definition for Function Modeling (IDEF0). National Institute of Standards and Technology.
Nonaka, Ikujiro and Hirotaka Takeuchi (1995). The Knowledge-Creating Company. Oxford University Press, Oxford, U.K.
OASIS. Organization for the Advancement of Structured Information Standards. http://www.oasis-open.org/.
OpenGroup. The Open Group. http://www.opengroup.org/.
Papazoglou, Michael (2001). Agent-oriented technology in support of e-business. Communications of the ACM, 44(4): 71–77.
Potts, Colin and Wendy Newstetter (1997). Naturalistic Inquiry and Requirements Engineering: Reconciling Their Theoretical Foundations. Proceedings of the 3rd IEEE International Symposium on Requirements Engineering, Annapolis, MD, January 1997, 118–127.
Ramesh, Bala and Matthias Jarke (2001). Towards reference models for requirements traceability. IEEE Transactions on Software Engineering, 27(1): 58–93.
Rosenbloom, Andrew (2000). Trusting technology: introduction to special issue. Communications of the ACM, 43(12), December.
Ross, Douglas T. and D. Schoman (1977). Structured analysis for requirements definition. IEEE Transactions on Software Engineering, 3(1): 6–15. Special Issue on Requirements Analysis, January.
Royce, W.W. (1970). Managing the Development of Large Software Systems: Concepts and Techniques. Proceedings of WESCON, IEEE Computer Society Press, Los Alamitos, CA.
Rumbaugh, James, Ivar Jacobson, and Grady Booch (1999). The Unified Modeling Language Reference Manual. Addison-Wesley, Reading, MA.
SACMAT (2003). 8th ACM Symposium on Access Control Models and Technologies, June 2–3, 2003, Villa Gallia, Como, Italy, Proceedings.
SemanticWeb. The Semantic Web community portal. http://semanticweb.org/.
Smith, Howard and Peter Fingar (2002). Business Process Management: The Third Wave. Meghan-Kiffer Press, Tampa, FL.
Standish Group (1995). Software Chaos. http://www.standishgroup.com/chaos.html.
Timmers, Paul (2000). Electronic Commerce: Strategies and Models for Business-to-Business Trading. John Wiley and Sons, New York.
UDDI. www.uddi.org.
UN/CEFACT. United Nations Centre for Trade Facilitation and Electronic Business. http://www.unece.org/cefact/.
Vassiliadis, Panos, Christoph Quix, Yannis Vassiliou, and Matthias Jarke (2001). Data warehouse process management. Information Systems, 26(3): 205–236.
W3C. The World Wide Web Consortium. http://www.w3.org/.
WebServices. WebServices.org portal. http://www.webservices.org/.
Webster, Dallas E. (1988). Mapping the Design Information Representation Terrain. IEEE Computer, 21(12): 8–23.
Weinberg, Jerry (1998). The Psychology of Computer Programming, Silver Anniversary Edition. Dorset House, New York.
Wiederhold, Gio and Michael R. Genesereth (1997). The conceptual basis for mediation services. IEEE Expert, 12(5): 38–47.
Wood, Jane and Denise Silver (1995). Joint Application Development, 2nd ed. John Wiley and Sons, New York.
Yang, Jian and Mike P. Papazoglou (2000). Interoperation support for electronic business. Communications of the ACM, 43(6): 39–47.
Yu, Eric (1997). Towards Modeling and Reasoning Support for Early-Phase Requirements Engineering. Proceedings of the IEEE International Symposium on Requirements Engineering, Annapolis, MD, January 1997, 226–235.
Yu, Eric and John Mylopoulos (1994). Understanding "Why" in Software Process Modeling, Analysis and Design. Proceedings of the 16th International Conference on Software Engineering, Sorrento, Italy.
Yu, Eric and John Mylopoulos (1994). From E–R to A–R — Modeling Strategic Actor Relationships for Business Process Reengineering. Proceedings of the 13th International Conference on the Entity-Relationship Approach, Manchester, U.K., December 1994; P. Loucopoulos, Ed., Lecture Notes in Computer Science 881, Springer-Verlag, New York, 548–565.
Yu, Eric, John Mylopoulos, and Yves Lespérance (1996). AI models for business process reengineering. IEEE Expert, 11(4).
Yu, Eric S. K. (2002). Agent-oriented modelling: software versus the world. In Michael Wooldridge, Gerhard Weiss, and Paolo Ciancarini, Eds., Agent-Oriented Software Engineering II, Lecture Notes in Computer Science 2222, Springer, New York, 206–225.
Part 4 Systems and Utilities
34
Internet Directory Services Using the Lightweight Directory Access Protocol

Greg Lavender
Mark Wahl

CONTENTS
Abstract
34.1 Introduction
34.2 The Evolution of LDAP
  34.2.1 The Past, Present, and Future Generations of LDAP Directories
  34.2.2 First- and Second-Generation Directory Services
  34.2.3 Next-Generation Directory Services
34.3 The LDAP Naming Model
  34.3.1 The X.500 Naming Model
  34.3.2 Limitations of the X.500 Naming Model
  34.3.3 Early Alternatives to the X.500 Naming Model
  34.3.4 Internet Domain-Based Naming
  34.3.5 Naming Entries within an Organization
34.4 The LDAP Schema Model
  34.4.1 Attribute-Type Definitions
  34.4.2 Object-Class Definitions
  34.4.3 Object Classes for Entries Representing People
  34.4.4 Other Typical Object Classes
34.5 LDAP Directory Services
  34.5.1 Basic Directory Services
  34.5.2 High Availability Directory Services
  34.5.3 Master–Slave Replication
  34.5.4 LDAP Proxy Server
  34.5.5 Multimaster Replication
  34.5.6 Replication Standardization
34.6 LDAP Protocol and C Language Client API
  34.6.1 LDAPv3 Protocol Exchange
  34.6.2 General Result Handling
  34.6.3 Bind
  34.6.4 Unbind
  34.6.5 Extended Request
  34.6.6 Searching
  34.6.7 Search Responses
  34.6.8 Abandoning an Operation
  34.6.9 Compare Request
  34.6.10 Add, Delete, Modify, and ModifyDN Operations
34.11 Conclusion
Acknowledgments
References
Author Bios
Abstract

We survey the history, development, and usage of directory services based on the Lightweight Directory Access Protocol (LDAP). We present a summary of the naming model, the schema model, the principal service models, and the main protocol interactions in terms of a C language application programming interface.
34.1 Introduction

The landscape of network-based directory technology is fascinating because of the evolution of distributed systems ideas and Internet protocol technologies that have contributed to the success of the Internet as a collection of loosely coordinated, interoperable network-based systems. The success of open-systems directory technology based on the Lightweight Directory Access Protocol (LDAP) is attributed to the persistence of many people in academia, industry, and international standards organizations. Today, LDAP-based technology is widely used within national and multinational intranets, wired and wireless service provider value-added networks, and the public Internet. This success is due to an Internet community process that worked to define and evolve practical X.500 and LDAP directory specifications and technologies towards a more ubiquitous Internet directory service.

There are many different types of directory services, each providing useful capabilities for users and applications in different network-based settings. We are primarily concerned with directories that have a structured data model upon which well-defined operations are performed, most notably search and update. Directories service a much higher proportion of authentication operations (technically called "bind" operations) and search operations than update operations, which allows them to optimize for reading rather than writing information. However, as directories become increasingly authoritative for much of the information that enables a wider range of Web services as part of private intranet and public Internet infrastructures, they are increasingly required to provide the kind of update rates one expects of a transactional database management system.

Directories in general can be characterized by a hierarchical tree-based naming system that offers numerous distributed system advantages:

• Names are uniquely determined by concatenating hierarchical naming components starting at a distinguished root node (e.g., "com")
• An object-oriented schema and data model supporting very fast search operations
• Direct navigation or key-based querying using fully qualified or partially specified names
• Distribution and replication based on named sub-tree partitions
• Delegated or autonomous administration based on named sub-tree partitions

In addition, Internet directories add various authentication services and fine-grained access control mechanisms to ensure that access to information contained in the directory is granted only to authorized users. The type of information contained in a directory can be either general purpose and extensible, or specialized for a particular optimized directory service. In the following section, we briefly distinguish a general-purpose directory service based on LDAP from more specialized directory services, all of which typically coexist as part of an organizational information service, each providing an important component of a multitiered networked information service.

The most widely known directory service on the Internet today is the Domain Name System (DNS). Others that have appeared and are either lightly used or no longer used include whois, whois++, and WAIS. The primary purpose of DNS is to provide name-to-address lookups, where a name is a uniquely
determined hierarchical Internet domain name and an address is an IP or other network layer protocol address, resulting in a very specialized name-to-address mapping service. DNS has a hierarchical name structure, it is widely replicated, and it is autonomously administered. However, we regard DNS as a specialized directory service because its information model and query mechanism are specialized to the purpose of providing very specific host addressing information in response to direct or reverse lookup queries. There have been various extensions to DNS that extend its rather limited data model, notably service (SRV) records, but such records do not significantly enhance the main DNS hostname-to-address lookup service.

Another class of directories provides various specialized naming services for network operating systems, called NOS directories. Popular NOS directories include NIS for UNIX® systems, NDS for Novell Netware™, and Active Directory for Microsoft Windows™. NOS directories are typically based on proprietary protocols and services that are tightly integrated with the operating system, but may include some features and services adopted from X.500 and LDAP directory specifications in order to permit a basic level of interoperability with client applications that are written to such open-systems specifications. NOS directories are very well suited to local area network environments in support of workgroup applications (e.g., file, print, and LAN email address book services), but have historically failed to satisfy other business-critical applications requiring Internet-scale levels of service and reliability.

Network-based file systems, such as NFS or AFS, may also be considered specialized directory services in that they support an RPC-based query language that uses hierarchical file names as keys to access files for the purposes of reading and writing those files across a network. There have been attempts to create Internet-scale distributed file systems, but most file systems are highly specialized for efficient reading and writing of large amounts of file-oriented data on high-bandwidth local area networks or as part of a storage area network. Network file systems are not typically intended for use as general-purpose directory services distributed or replicated across wide-area networks, although various attempts to define Internet-scale file systems are underway.

Various ad hoc directory services have also been constructed using custom database systems (e.g., using sequentially accessed flat files or keyed record access) or relational database management systems. For example, a common application of a directory service is a simple "white pages" address book, which allows lookup queries based on search criteria such as people's names, email addresses, or other searchable identity attributes (a minimal sketch of such a lookup appears at the end of this section). Given that RDBMS products are widely deployed and provide robust information storage, retrieval, and transactional update services, it is straightforward to implement a basic white pages directory service on top of an RDBMS. However, there can be complications: most directory services are defined in terms of a hierarchical data model, and the mapping of this hierarchical data model into a relational data model is often less than satisfactory for network-based directory-enabled applications that require ultrafast search access and cannot tolerate the inherent overhead of mapping between the two data models.
In addition, network-based authentication and access control models may require an extra abstraction layer on top of the RDBMS. This mapping may result in less functionality in these areas as well as potential inefficiencies.

Finally, numerous "yellow pages" directories have arisen on the Internet that provide a way to find information. While called "directories" because of the way information is organized hierarchically and presented via a Web browser, such directories are more properly text-based information search and retrieval systems that utilize Web spiders and some form of semantic qualification to group textual information into categories that meet various search criteria. These Web directories are highly useful for searching and accessing very large information spaces that are otherwise difficult to navigate and search without the assistance of some form of browsable navigation directory (e.g., Yahoo! and Google). While powerful, such directories differ from LDAP-based directories, which rely on an extensible but consistent data model that facilitates a more highly structured search mechanism. Some yellow pages directories actually utilize LDAP directories to augment their generalized text search mechanisms by creating structured repositories of metainformation, which is used to guide future searches based on stored topological information and/or historical search results.
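To make the structured, hierarchical style of lookup discussed above concrete, the following is a minimal sketch of a white pages search written against the LDAP C client API, assuming an OpenLDAP-style client library (an assumption, not this chapter's own example). The server name, search base, filter, and attribute list are hypothetical and would differ in any real deployment.

/* Minimal white pages lookup over LDAP (sketch; hypothetical names). */
#include <stdio.h>
#include <ldap.h>

int main(void)
{
    LDAP *ld;
    LDAPMessage *result, *entry;
    char *attrs[] = { "cn", "mail", "telephoneNumber", NULL };
    int rc;

    /* Open a connection to a (hypothetical) directory server. */
    ld = ldap_init("ldap.example.com", LDAP_PORT);
    if (ld == NULL)
        return 1;

    /* Anonymous search of the subtree below a hierarchically named base
     * entry, using a substring filter on the common name attribute. */
    rc = ldap_search_s(ld, "ou=People,dc=example,dc=com", LDAP_SCOPE_SUBTREE,
                       "(cn=*smith*)", attrs, 0, &result);
    if (rc != LDAP_SUCCESS) {
        fprintf(stderr, "search failed: %s\n", ldap_err2string(rc));
        ldap_unbind(ld);
        return 1;
    }

    /* Print the distinguished name of each matching entry; a DN is the
     * concatenation of naming components up to the root, for example
     * "cn=Jane Smith,ou=People,dc=example,dc=com". */
    for (entry = ldap_first_entry(ld, result); entry != NULL;
         entry = ldap_next_entry(ld, entry)) {
        char *dn = ldap_get_dn(ld, entry);
        printf("%s\n", dn);
        ldap_memfree(dn);
    }

    ldap_msgfree(result);
    ldap_unbind(ld);
    return 0;
}

The same search could be issued against any LDAPv3 server; only the base DN and attribute names, which depend on the deployed schema, would change.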
34.2 The Evolution of LDAP

The LDAPv3 protocol and modern directory servers are the result of many years of research, international standardization activities, commercial product developments, Internet pilot projects, and an international community of software engineers and network administrators operating a globally interconnected directory infrastructure. LDAP directories originated from projects and products that originally implemented directories based on the 1988 and 1993 International Telecommunications Union (ITU) X.500 series of international standards [ITU X.500, 1993; ITU X.501, 1993; ITU X.511, 1993; ITU X.518–X.521, 1993; ITU X.525, 1993], under the assumption that electronic messaging based on the ITU X.400 series of standards would eventually become the dominant e-mail system in the international business community. For several technical, economic, and sociopolitical reasons, X.400 has not become the dominant e-mail transport system that was envisioned, but X.500 has survived, and many LDAP protocol features as well as the schema and data model are derived from the X.500 specifications and from the early work on deploying practical Internet directories by the IETF OSI-DS working group [Barker and Kille, 1991].

Before the emergence of the Internet as a ubiquitous worldwide public network, there was an assumption that the world's data networks, and the messaging and directory infrastructures deployed on them, would be operated much as the world's voice communications networks were managed in the 1980s. These networks were managed by large telecommunications companies that had international bilateral agreements for handling such international communication services. The emergence of the worldwide Internet as a viable commercial network undercut many of the service model assumptions that participants in the International Telecommunications Union standards organizations had built into these technologies. As universities, industrial R&D organizations, and technology companies increasingly embraced the Internet as a model for doing business, they built technologies that effectively created a value-added network on top of bandwidth leased from the telcos. This new style of networking created new demand for innovations that would better fit the Internet style of networked computing, in which there is a very high degree of local autonomy with regard to deploying and managing network services as part of an enterprise's core IT function, and in which an enterprise leases only bandwidth from a telecommunications company rather than expensive application services.
34.2.1 The Past, Present, and Future Generations of LDAP Directories

Since the mid-1990s, directory servers based on LDAP have become a significant part of the network infrastructure of corporate intranets, business extranets, and service providers in support of many different kinds of mission-critical networked applications. The success of LDAP within the infrastructure is due to the gradual adoption of directory servers based on the LDAPv3 protocol. The use of LDAP as a client access protocol to X.500 servers via an LDAP-to-X.500 gateway has been replaced by pure LDAPv3 directory servers. In this section, we briefly review the technological evolution that led to the current adoption of LDAPv3 as part of the core network infrastructure.

During the 1980s, both X.400 and X.500 technologies were under active development by computer technology vendors, as well as by researchers working in networking and distributed systems. Experimental X.400/X.500 pilots were being run on the public research Internet based on open source technology called the ISO Development Environment (ISODE), which utilized a convergence protocol allowing OSI layer 7 applications to operate over TCP/IP networks through a clever mapping of the OSI class 0 transport protocol (TP0) onto TCP/IP. TP0 was originally designed for use with a reliable, connection-oriented network service, such as that provided by the X.25 protocol. However, the principal inventor of ISODE, Marshall Rose (who was to go on to develop SNMP), recognized that the reliable, connection-oriented transport service provided by TCP/IP could effectively (and surprisingly efficiently) masquerade as a reliable connection-oriented network service underneath TP0. By defining the simple (5-byte header) convergence protocol specified in RFC 1006 [Rose and Cass, 1987], and implementing an embedding of IP addresses into OSI presentation addresses, the OSI session and presentation layers could be directly mapped onto TCP. Hence any OSI layer 7 application requiring the upper-layer OSI protocols could
instantly be made available on the public Internet. Given the lack of public OSI networks at the time, this pragmatic innovation enabled the rapid evolution of X.500-based directory technology through real-world deployments across the Internet at universities, industrial and government R&D labs, and forward-looking companies working to commercially exploit emerging Internet technologies. The lack of any real competitive technology led to the rapid adoption of network-based directory services on the Internet (unlike X.400, which failed to displace the already well-established SMTP as the primary Internet protocol for email).

LDAP arose primarily in response to the need for a fast and simple way to write directory client applications for use on desktop computers with limited memory capacity (< 16 MB) and processing power (< 100 MHz). The emergence of desktop workstations and the PC was driving demand for increasingly sophisticated client applications. In order for X.500 to succeed independently of X.400, client applications were needed that could run on desktop machines that were, by today's standards, rather limited. A server is only as good as the service offered to client applications, and one of the major inhibitors to the adoption of X.500 as a server technology, outside its use as an address routing service for X.400, was the lack of sophisticated client applications. The X.500 client protocol, DAP, was used principally by X.400 to access information required to route X.400 messages. Since X.400 was another server application based on the full OSI stack, the complexity of using the DAP protocol for access to X.500 was "in the noise" given the overall complexity and computing resources required by a commercial-grade X.400 system. Many early X.400/X.500 vendors failed to grasp that simply specified protocols, and the "good enough" services they enable, were a key driver in the growth of the Internet and the resulting market demand for server software. As a result, X.400 and X.500 infrastructures were not as widely deployed commercially as anticipated, except in selected markets (e.g., some military and governmental organizations) where a high degree of complexity is not necessarily an inherent disadvantage.

The simplest application for a directory is a network-based white pages service, which requires only a rather simple client that issues a search using a string corresponding to a person's name, or a substring expression that will match a set of names, and returns the list of matching entries in the directory. This type of simple white pages client application was the original motivation for defining LDAP as a "lightweight" version of DAP. Some people like to claim that the "L" in LDAP no longer stands for "lightweight" because LDAP is now used in servers to implement a full-blown distributed directory service, not just a simple client-access protocol. However, the original motivation for making LDAP a lightweight version of DAP was to eliminate the requirement for the OSI association control service element (ACSE), the remote operations service element (ROSE), the presentation service, and the rather complicated session protocol over an OSI transport service (e.g., TP4). Even with the convergence protocol defined in RFC 1006, the upper-layer OSI protocols required a rather large in-memory code and data footprint to execute efficiently.
LDAP was originally considered "lightweight" precisely because it operated directly over TCP (eliminating all of the OSI layers except for the use of ASN.1 and BER [ITU X.681, 1994; ITU X.690, 1994]), had a much smaller binary footprint (which was critical for clients on small-memory desktop PCs of the time), and had a much simpler API than DAP. In this context, the term lightweight meant a small memory footprint for client applications, fewer bits on the wire, and a simpler programming model, not lightweight functionality.

The LDAPv2 specification [Yeong et al., 1995] was the first published version of the lightweight client directory access protocol. While supporting DAP-like search and update operations, the LDAPv2 interface was greatly simplified in terms of the information required to establish an association with an X.500 server, via an LDAP-to-DAP gateway. The introduction of this application-level protocol gateway mapped the client operations to the full DAP and OSI protocol stack. So, in this sense, LDAPv2 was a proper subset of the services offered by DAP, and no changes were required to an X.500 server to support these lightweight client applications, such as address book services as part of a desktop email application. LDAPv2 enabled rapid development of client applications that could then take advantage of what was expected to be a global X.500-based directory system. As client applications began to be developed with LDAPv2, some operational shortcomings manifested themselves. The most notable was the lack of a strong authentication mechanism: LDAPv2 supports only anonymous and simple
password-based authentication (note that Yeong et al. [1995] predated the emergence of SSL/TLS). Such security concerns, the mapping of the X.500 geopolitical naming model to Internet domain names, the need for referrals to support distributed directory data, the need for an Internet standard schema (e.g., inetOrgPerson), and the desire for a mechanism for defining extensions led to the formation of the IETF LDAPEXT working group, which began defining a richer set of services based on LDAPv2 that became the series of specifications for LDAPv3 (RFCs 2251–2256 [Wahl, 1997; Wahl et al., 1997a; 1997b; 1997c; Howes, 1997; Howes and Smith, 1997]). This process of gradually realizing that a simpler solution will likely satisfy the majority of user and application requirements is a recurring theme in today's technology markets. In addition, the emergence of business-critical Web-based network services and the widespread adoption of LDAPv3-based technologies as part of the infrastructure enabling Web services have created a sustainable market for LDAP directories and ensured that LDAP directories will someday be regarded as entrenched legacy systems to be accommodated by future infrastructure initiatives.
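The security improvements mentioned above are visible even at the client API level. The following is a minimal sketch, again assuming an OpenLDAP-style C client library (an assumption, not this chapter's own example), of how an LDAPv3 client typically requests protocol version 3 and protects a simple bind with StartTLS; the host name, bind DN, and password are hypothetical.

/* Sketch: LDAPv3 session setup with a TLS-protected simple bind. */
#include <stdio.h>
#include <ldap.h>

int main(void)
{
    LDAP *ld;
    int version = LDAP_VERSION3;
    int rc;

    /* Connect to a (hypothetical) directory server. */
    ld = ldap_init("ldap.example.com", LDAP_PORT);
    if (ld == NULL)
        return 1;

    /* Ask for LDAPv3; many client libraries of this era otherwise
     * defaulted to LDAPv2. */
    ldap_set_option(ld, LDAP_OPT_PROTOCOL_VERSION, &version);

    /* Upgrade the connection to TLS before sending any credentials,
     * addressing the clear-text password weakness of LDAPv2. */
    rc = ldap_start_tls_s(ld, NULL, NULL);
    if (rc != LDAP_SUCCESS) {
        fprintf(stderr, "StartTLS failed: %s\n", ldap_err2string(rc));
        ldap_unbind(ld);
        return 1;
    }

    /* Authenticate with a simple bind as a (hypothetical) directory user. */
    rc = ldap_simple_bind_s(ld, "cn=Directory Manager,dc=example,dc=com",
                            "secret");
    if (rc != LDAP_SUCCESS) {
        fprintf(stderr, "bind failed: %s\n", ldap_err2string(rc));
        ldap_unbind(ld);
        return 1;
    }

    printf("bound over LDAPv3 with TLS protection\n");
    ldap_unbind(ld);
    return 0;
}

Stronger SASL-based authentication and server-enforced access controls build on the same bind step; the point of the sketch is only that LDAPv3, unlike LDAPv2, gives clients standard hooks for negotiating such protections.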
34.2.2 First- and Second-Generation Directory Services

As just discussed, LDAP originated as a simplification of the X.500 DAP protocol to facilitate client development on small machines. The first LDAPv2 client applications were used with an X.500 server called "Quipu,"¹ which was developed as a collaboration among various European and American universities, research organizations, and academic network providers and administrators. Quipu was based on the 1988 X.500 standards specifications and, as such, implemented DAP as its client access protocol, requiring either a full OSI protocol stack or, as discussed previously, the OSI upper layers over RFC 1006. Quipu was deployed on the research Internet in the late 1980s and gained substantial exposure as an early directory service on the Internet at relatively small scale (e.g., 100k directory entries was considered a large directory at the time). Quipu was deployed at a number of universities as part of the Paradise directory project, which was administered from University College London, where Quipu was primarily developed. Through the cooperation of researchers at the University of Michigan (Tim Howes), individuals at the Internet service provider PSI (Marshall Rose and Wengyik Yeong), researchers at University College London (Steve Kille), and others, LDAPv2 emerged, and an application layer gateway called the LDAP daemon (ldapd) was developed at the University of Michigan that mapped LDAPv2 operations to DAP operations that were then forwarded to an X.500 server, such as Quipu. As a result of LDAPv2 and this LDAP-to-DAP gateway, lightweight client applications that could run on Windows and Macintosh PCs were rapidly developed.

¹ A Quipu (pronounced key-poo) is a series of colored strings attached to a base rope and knotted so as to encode information. This device was used by peoples of the ancient Inca empire to store encoded information.

The success of Quipu as an early prototype X.500 directory on the Internet, and of LDAPv2 as a client, led to further innovation. One of the main advantages of Quipu was that it was extremely fast in responding to search operations. This was due to its internal task-switching (co-routine) architecture, which predated POSIX threads on most UNIX® systems, and to the fact that on startup it cached all of the directory entries in memory. This feature also severely limited the scalability of Quipu because of the expense and limitations of physical memory on 32-bit server machines and the potentially long startup time required to build the in-memory cache.

Work was begun in 1992, both at the University of Michigan and at the ISODE Consortium, to produce a more scalable and robust directory server. The ISODE Consortium was an early open source organization that was a spinout of University College London and the Microelectronics and Computer Technology Corporation (MCC) in Austin, Texas. The University of Michigan team first developed a server that exploited X.500 chaining to create an alternative back-end server process for Quipu that utilized a disk-based database built from the UNIX® dbm package as the underlying data store. Client requests were first sent to the main Quipu server instance, which maintained topology information in its cache that allowed it to chain the requests to the back-end server. Effectively, Quipu was turned into a "routing proxy," and scalability was achieved with one or
more back-end servers hosting the data in its disk-based database, using caching for fast access. This approach proved the viability of a disk-based database for Quipu, but without integrating the disk-based database into the core of the Quipu server. This approach was taken for simplicity, and also because POSIX threads were finally viable in UNIX® and the new back-end server was based on a POSIX threading model instead of a task-based co-routine model. However, it suffered from the drawback of now having two separate server processes between the LDAP client and the actual data. An LDAP client request had to first go through the LDAP-to-DAP gateway, then through the Quipu server, then to the back-end server over the X.500 DSP protocol [ITU X.525, 1993], then back to the client. During this same period, the ISODE Consortium began work on a new X.500 server that was essentially a rewrite of much of Quipu. Based on the promising performance and scalability results from the University of Michigan’s back-end server implementation, the goal was to integrate the disk-based backend and POSIX threading model into a new, single-process directory server that could scale well and deliver search performance comparable to that of Quipu. This result was achieved with a new directory server from ISODE in 1995, based on the 1993 X.500 ITU standards. The protocol front-end was also redesigned so that additional protocol planes could be added as needed to accommodate additional server protocols. At the time, there was serious consideration given to implementing a DNS protocol plane, so that the directory server could be used to provide a DNS service in addition to both the LDAP and X.500 directory services. However, this work was never done. Instead, work was done to provide an integrated LDAPv3 protocol plane alongside of the X.500 DAP and DSP protocols, resulting in the first dual-protocol X.500+LDAP directory server. The work at the University of Michigan continued in parallel and it became obvious that one could do away with the LDAP-to-DAP protocol gateway and the routing Quipu server, and simply map LDAPv2 directly to the new disk-based back-end server. All that was required was to implement the LDAPv2 protocol as a part of the back-end server and eliminate the X.500 DSP protocol. This was the key observation that led to the first pure directory server based on LDAPv2 with some extensions, called the standalone LDAP daemon (slapd). These two separate architectural efforts led to the definition of the LDAPv3 protocol within the IETF, which was jointly defined and authored by individuals at the University of Michigan (Tim Howes and Mark Smith) and at the ISODE Consortium (Steve Kille and Mark Wahl). Both the slapd and ISODE server were the first directory servers to implement LDAPv3 as a native protocol directly, and validated its utility for implementing a directory service, not just as a client access protocol to an X.500 directory. In addition, both servers adopted the Berkeley DB b-tree library package as the basis for the disk-based backend, which added to the scalability, performance, and eventual robustness of both servers. In 1996, Netscape hired the principal inventors of the University of Michigan slapd server, which became the Netscape Directory Server that was widely adopted as an enterprise-scale LDAPv3 directory server. 
In 1995, the ISODE Consortium converted from a not-for-profit open-source organization to a for-profit OEM technology licensing company and shipped an integrated X.500 and LDAPv3 server. Also in 1996, Critical Angle was formed by former ISODE Consortium engineers and they developed a carrier-grade LDAPv3 directory server for telecommunications providers and ISPs. This was the first pure LDAPv3 server to implement chaining via LDAPv3, and also the first server to have multimaster replication using LDAPv3. In addition, Critical Angle developed the first LDAP Proxy Server, which provided automatic referral following for LDAPv2 and LDAPv3 clients, as well as LDAP firewall, load balancing, and failover capability, which are critical features for large-scale directory service deployments. With the emerging market success of LDAPv3 as both a client and a server technology on the Internet and corporate intranets, vendors of previously proprietary LAN-based directory servers launched LDAPv3-based products, most notably Microsoft Active Directory and Novell eDirectory. Microsoft had originally committed to X.500 as the basis for its Windows™ directory server, but adopted LDAP as part of its Internet strategy. IBM built its SecureWay LDAP directory product using the University of Michigan slapd open-source code and designed a mapping onto DB2 as the database backend. Like IBM, Oracle implemented an LDAP gateway onto its relational database product. However, it is generally the case that mapping of the
hierarchical data model of LDAP and X.500 onto a relational data model has inherent limitations. For some directory service deployments, the overhead of the mapping is a performance hindrance. Most X.500 vendors continue to provide an LDAP-to-DAP gateway as part of their product offerings, but their marketing does not usually mention either the gateway or X.500, and instead calls the X.500 server an LDAP server. The Critical Angle LDAP products were acquired in 1998 by Innosoft International. In 1999, AOL acquired Netscape. Sun Microsystems and AOL/Netscape entered into a joint marketing and technology alliance called iPlanet shortly thereafter. In March 2000, Sun acquired Innosoft and consolidated its directory server expertise by combining the Netscape technology, the Innosoft/Critical Angle technology, and its own Solaris LDAP-based directory technology initiatives into a single directory product line under the Sun ONE™ brand.
34.2.3 Next-Generation Directory Services Now that LDAPv3 directory servers are widely deployed, whether as native LDAP implementations or as LDAP gateways mapping operations onto X.500 servers or relational database systems, new types of directory-based services are being deployed. The most recent of these are identity management systems that provide authentication, authorization, and policy-based services using information stored in the directory. The most common type of identity management service is Web single sign-on. With the proliferation of network-based systems, the need for a common authentication mechanism has dictated that a higher-level service be deployed that abstracts over the various login and authentication mechanisms of different Web-based services. Identity servers built on top of directory services are providing this functionality. At present, such services are primarily being deployed within an enterprise, but there are efforts underway to define standards for federating identity information across the Internet. It will take some time before these standards activities and the technologies that implement them are deployed, but the foundation on which most of them are being built is a directory service based on LDAPv3. Another area where LDAP directories are gaining widespread usage is among wireless carriers and service providers. Next-generation wireless services are providing more sophisticated handheld devices with the opportunity to interact with a more functional server infrastructure. Directories are being deployed to provide network-based personal address books and calendars for mobile phones and PDAs that can be synchronized with the handheld devices, a laptop computer, and a desktop computer. Directory services are being deployed as part of the ubiquitous network infrastructure that is supporting the management of personal contact and scheduling information for hundreds of millions of subscribers. Fortunately, LDAP directory technology has matured to the point at which it is capable of providing the performance, scalability, and reliability required to support this “always on” service in a globally connected world. In order to simplify the integration of directory data access into Web service development environments, new directory access protocols building on LDAP are being defined. Instead of ASN.1 and TCP, these protocols use XML [Bray et al., 2000] as the encoding syntax and the Simple Object Access Protocol (SOAP) [Gudgin et al., 2003] as a session protocol overlying HTTP or persistent message bus protocols based on Java Message Service (JMS) APIs. The standards body where these protocols are being developed is OASIS, the Organization for the Advancement of Structured Information Standards. One group within OASIS in particular, the Directory Services working group, has published version 2 of the Directory Services Markup Language (DSMLv2), which retains the directory semantics of LDAP (a hierarchical arrangement of entries consisting of attributes) but expresses the LDAP operations in SOAP. Just as X.500 servers added support for LDAP either natively or through an LDAP-to-X.500 gateway, there are implementations of DSMLv2 both as native protocol responders within an LDAP server and as DSMLv2-to-LDAP gateways. Other working groups have already defined, or are in the process of defining, protocols for more specialized directory access, such as Universal Description, Discovery, and Integration of Web Services (UDDI)
[Bellwood, 2002], ebXML Registry Services [OASIS, 2002], and Service Provisioning Markup Language (SPML) [Rolls, 2003]. In the future, market dynamics may favor the adoption of one or more of these XML-based protocols to augment and eventually supplant LDAP as the primary client access protocol for directory repositories in the Web services environment.
34.3 The LDAP Naming Model The primary contents of most directory services are entries that represent people, but entries may also represent organizations, groups, facilities, devices, applications, access control rules, and any other information object. The directory service requires that every entry have a unique name assigned when the entry is created, and most services have names for users that are based on attributes that do not change frequently and are human readable. Entries in the directory are arranged in a single-rooted hierarchy. The Distinguished Name (DN) of an entry consists of the list of one or more distinguished attribute values chosen from the entry itself, followed by attributes from that entry’s parent, and so on up the tree to the root. In most deployments, only a single attribute value is chosen from each entry. The most widely used naming attributes are defined in the following table.
dc    domainComponent: one element of a DNS domain name, e.g., dc=sun, dc=com
uid   userid: a person’s account name, e.g., uid=jbloggs
cn    commonName: the full name of a person, group, device, etc., e.g., cn=Joe Bloggs
l     localityName: the name of a geographic region, e.g., l=Europe
st    stateOrProvinceName: used in the United States and Canada
o     organizationName: the name of an organization, e.g., o=Sun Microsystems
ou    organizationalUnitName: the name of a part of an organization, e.g., ou=Engineering
c     countryName: the two-letter ISO 3166 code for a country, e.g., c=US
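To make the model concrete, a Distinguished Name is formed by taking the entry’s own naming attribute value and appending those of its ancestors up to the root. For example, an entry named cn=Joe Bloggs whose parent is ou=Engineering, under o=Sun Microsystems, under c=US, has the DN cn=Joe Bloggs,ou=Engineering,o=Sun Microsystems,c=US (the specific names here are purely illustrative).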
The hierarchy of entries allows for delegated naming models in which the organization managing the name space near the root of the tree agrees on the name for a particular entry with an organization to manage that entry, and delegates to that organization the ability to construct additional entries below that one. Several naming models have been proposed for LDAP.
34.3.1 The X.500 Naming Model The original X.500 specifications assumed a single, global directory service, operating based on interconnections between national service providers. In the X.500 naming model, the top levels of the hierarchy were to have been structured along political and geographic boundaries. Immediately below the root would have been one entry for each country, and the entries below each country entry would have been managed by the telecommunications operator for that country. (In countries where there were multiple operators, the operators would have been required to coordinate the management of this level of the tree.) As there is no one international telecommunications operator, an additional set of protocol definitions was necessary to define how the country entries at the very top of the tree were to be managed. The telecommunications operator for each country would be able to define the structure of entries below the country entry. The X.500 documents suggested that organizations that had a national registration could be represented by organization entries immediately below the country entry. All other organizations would be located below entries that represented that country’s internal administrative regions, based on where that organization had been chartered or registered. In the U.S. and Canada, for example, there would have been intermediate entries for each state and province, as states and provinces operate as registrars for corporations. For example, a corporation that had been chartered as “A Inc.” in the state of California might have been represented in the X.500 naming model as an entry with the name o=A Inc.,st=California,c=US, where “o” is the attribute for the organization name that was registered within the state, “st” for state or province
name within the country, and “c” the attribute for the country code. It should be noted that some certificate authorities that register organizations in order to issue them X.509 public key certificates, e.g., for use with SSL or secure email, assume this model for naming organizations. The entries for people who were categorized as part of an organization (e.g., that organization’s employees) would be represented as entries below the organization’s entry. However, X.500 did not suggest a naming model for entries representing residential subscribers.
34.3.2 Limitations of the X.500 Naming Model The first limitation is that there is no well-defined place in the X.500 model to represent international and multinational organizations. Organizations such as NATO and agencies of the United Nations and the European Community, as well as multinational corporations, were some of the first to attempt to pilot standards-based directory services, yet ran into difficulties as there was no obvious place for the organization’s entry to be located in a name space that requires a root based on a national geopolitical naming structure. A related problem is that a corporation typically operates in additional locales beyond the one where it is legally incorporated or registered. In the U.S., for example, many corporations are registered in Delaware for legal and tax reasons, but may have no operating business presence in that state beyond a proxy address. A naming structure that has the organization based in Delaware may hinder users searching the directory, who might not anticipate this location as the most likely place to find the directory information for the corporation. In some cases, organizations preferred to have an entry created for them in a logically appropriate place in the X.500 hierarchy, yet the telecommunications operator implied by the naming model as being authoritative for that region of the directory tree may have had no plans to operate X.500. Conversely, some parts of the directory tree had conflicting registration authorities as a result of political turf wars and legal disputes, not unlike those that plagued the Internet Assigned Numbers Authority (IANA) and the administrators of the root DNS servers. For use within the U.S. and Canada, the North American Directory Forum (NADF) proposed a set of attribute and server extensions to address the problems of overlapping registration authorities creating entries for individual and business subscribers; however, these extensions were not implemented, and no X.500 service saw significant deployment in these countries.
34.3.3 Early Alternatives to the X.500 Naming Model Many LAN-based directory services predating LDAP suggested a simpler naming model. Unlike the complete interconnection in a single, global directory service, these models assumed that interconnection only occurred between pairs or small groups of cooperating organizations, and that relatively few organizations worldwide would interconnect. Instead of the geographic divisions of X.500, this naming model is based on a single registration authority that would assign names to organizations immediately below the root of the tree, e.g., o=Example, resulting in a flattened namespace. This model did not readily accommodate conflicts in names between organizations and relied on one registration authority to ensure uniqueness. As the Internet became more widely used by organizations for electronic mail, a variant of the flat namespace model was to register the organization’s Internet domain name as the value for the organizationName attribute, e.g., o=example.com. By relying on an external naming authority for managing the actual assignment of names to organizations, potential conflicts would be resolved before they reached the directory name registration authority, and the use of the hierarchical domain name space would allow for multiple organizations with the same name that were registered in different countries or regions, e.g., o=example.austin.tx.us and o=example.ca.
34.3.4 Internet Domain-Based Naming The single-component organization naming model described above addresses the difficulty that organizations have when faced with getting started using the X.500 model, but this approach suffers from a
serious limitation. While domain names themselves are hierarchical, placing the full domain name as a string into the organizationName attribute prevented the hierarchical structure of the directory from being used. In particular, it was not defined in that approach how an organization that was itself structured and represented internally with multiple domain names, e.g., east.example.com and west.example.com, would be able to manage these as part of a hierarchy below the organization entry. These limitations were removed in the mapping defined in RFC 2247 [Kille, 1998] between Internet domain names and LDAP distinguished names. In this approach, each domain-name component is mapped into its own value of the “dc” attribute to form the distinguished name of the entry (typically an organization) to which the domain corresponds. For example, the domain name cs.ucl.ac.uk would be transformed into dc=cs,dc=ucl,dc=ac,dc=uk, and the domain name example.com into dc=example,dc=com. Follow-on documents have proposed how DNS SRV records can be used to specify the public directory servers that an organization provides, in a similar manner to the MX records for specifying the mail servers for that organization. In combination with RFC 2247 naming, a directory client that knows only a domain name can use these techniques to locate the LDAP server to contact and construct the search base to use for the entry that corresponds to that domain. RFC 2247 assumes that when performing operations on the directory entries for an organization, the organization’s domain name is already known to the client so that it can be automatically translated into a distinguished name to be used as an LDAP search base. RFC 2247 does not address how to programmatically locate an organization when the organization’s domain name is not known; this is currently an unsolved problem in the Internet.
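The RFC 2247 mapping is mechanical enough to implement in a few lines of code. The following C fragment is a minimal sketch (the function name domain_to_dn is ours, and error handling is deliberately abbreviated) that converts a DNS domain name into a dc-style distinguished name:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Map a DNS domain name such as "cs.ucl.ac.uk" to the RFC 2247-style
 * distinguished name "dc=cs,dc=ucl,dc=ac,dc=uk".
 * The caller is responsible for free()ing the returned string. */
char *domain_to_dn(const char *domain)
{
    char *copy = strdup(domain);               /* working copy for strtok_r */
    char *dn = malloc(strlen(domain) * 4 + 1); /* generous upper bound */
    char *label, *save = NULL;

    dn[0] = '\0';
    for (label = strtok_r(copy, ".", &save); label != NULL;
         label = strtok_r(NULL, ".", &save)) {
        if (dn[0] != '\0')
            strcat(dn, ",");    /* separate the dc components with commas */
        strcat(dn, "dc=");
        strcat(dn, label);
    }
    free(copy);
    return dn;
}

int main(void)
{
    char *dn = domain_to_dn("example.com");
    printf("%s\n", dn);   /* prints: dc=example,dc=com */
    free(dn);
    return 0;
}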
34.3.5 Naming Entries within an Organization There are currently no widely adopted Internet standards for naming entries representing people within an organization. Initial deployments of LDAP made extensive use of organizationalUnit entries to construct a directory tree that mirrored the internal divisions of the organization and used the “cn” attribute as the distinguished attribute for the person’s entry, as in the following:
cn=Joe Bloggs, ou=Western, ou=Sales, ou=People, dc=example, dc=com
That approach, however, resulted in organizations needing to frequently restructure their directory tree as the organization’s internal structure changed, and even placing users within organizational units did not eliminate the potential for name conflicts between entries representing people with the same full name. Currently, the most common approach for directory deployments, in particular those used to enable authentication services, is to minimize the use of organizationalUnit entries and to name users by their login name in the uid attribute. Many deployments now have only a single organizationalUnit entry: ou=People. Some multinational organizations use an organizationalUnit for each internal geographic or operating division, in particular when there are different provisioning systems in use for each division, or when it is necessary to partition the directory along geographic lines in order to comply with privacy regulations. For example, an organization that has two operating subsidiaries X and Y might have entries in its directory named as follows:
uid=jbloggs, ou=X, ou=People, dc=example, dc=com
uid=jsmith, ou=France, ou=Y, ou=People, dc=example, dc=com
For service provider directories or directories that offer a hosted directory service for different organizational entities, DNS domain name component naming is most often used to organize information in the directory naming tree, as follows:
uid=jbloggs, ou=X, dc=companyA, dc=com
uid=jsmith, ou=Y, dc=companyA, dc=com
uid=jwilliams, ou=A, dc=companyB, dc=com
uid=mjones, ou=B, dc=companyB, dc=com
In a typical service provider or hosted directory environment, the directory data for different organizations is physically partitioned, and an LDAP proxy server is used to direct queries to the directory server that holds the data for the appropriate naming context.
34.4 The LDAP Schema Model Directory servers based on LDAP implement an extensible object-oriented schema model that is derived from the X.500 schema model, but with a number of simplifications as specified in RFC 2256 [Wahl, 1997]. Two additional schema definition documents are RFC 2798, which defines the inetOrgPerson object class, and RFC 2247, which defines the dcObject and domain object classes. The schema model as implemented in most LDAP servers consists of two types of schema elements: attribute types and object classes. Object classes govern the number and type of attributes that an entry stored in the directory may contain, and the attribute types govern the type of values that an attribute may have. Unlike many database schema models, the LDAP schema model has the notion of multivalued attributes, allowing a given attribute to have multiple values. The types of the values that may be associated with a given attribute are defined by the attribute type definition. The most common attribute type is a UTF-8 string, but many other types occur, such as integer, international telephone number, email address, URL, and a reference type that contains one or more distinguished names, representing a pointer to another entry in the directory. A directory entry may have multiple object classes that define the attributes that are required to be present or that may optionally be present. Directory servers publish their internal schema as an entry in the directory. It can be retrieved by LDAP clients performing a baseObject search on a special entry that is defined by the directory server to publish schema information (e.g., cn=schema), with the attributes attributeTypes and objectClasses specified as part of the search criteria. This schema entry maintains the schema definitions that are active for a given directory server instance. The format of these two attributes is defined in RFC 2252 [Wahl et al., 1997b].
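As a rough sketch of how a client reads that published schema using the classic LDAP C API (the host name is hypothetical, an anonymous bind is assumed, and some servers publish the subschema entry under a different name, typically advertised by the subschemaSubentry attribute of the root DSE):

#include <stdio.h>
#include <ldap.h>

int main(void)
{
    LDAP *ld = ldap_init("ldap.example.com", LDAP_PORT);
    LDAPMessage *res, *entry;
    char *attrs[] = { "attributeTypes", "objectClasses", NULL };
    char **vals;
    int i;

    if (ld == NULL)
        return 1;
    ldap_simple_bind_s(ld, NULL, NULL);   /* anonymous bind */

    /* baseObject search against the schema entry, requesting only the
     * attributeTypes and objectClasses attributes. */
    if (ldap_search_s(ld, "cn=schema", LDAP_SCOPE_BASE, "(objectClass=*)",
                      attrs, 0, &res) == LDAP_SUCCESS) {
        entry = ldap_first_entry(ld, res);
        if (entry != NULL &&
            (vals = ldap_get_values(ld, entry, "objectClasses")) != NULL) {
            for (i = 0; vals[i] != NULL; i++)
                printf("%s\n", vals[i]);   /* one class definition per value */
            ldap_value_free(vals);
        }
        ldap_msgfree(res);
    }
    ldap_unbind(ld);
    return 0;
}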
34.4.1 Attribute-Type Definitions An attribute-type definition specifies the syntax of the attribute’s values, whether the attribute is restricted to having at most one value, and the rules that the server will use for comparing values. Most directory attributes have the Directory String syntax, allowing any UTF-8 encoded Unicode character, and use matching rules that ignore letter case and duplicate white space characters. User attributes can have any legal value that the client provides in the add or modify request, but a few attributes are defined as operational, meaning that they are managed or used by the directory server itself and may not be added or changed directly by most LDAP clients. Examples of operational attributes include createTimestamp and modifyTimestamp.
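For illustration, an attributeTypes value in the RFC 2252 style looks roughly like the following; this sketch is modeled on the standard uid attribute, but the DESC string and length limit are illustrative rather than quoted from any particular server’s published schema:

( 0.9.2342.19200300.100.1.1 NAME 'uid'
  DESC 'login name of the user'
  EQUALITY caseIgnoreMatch
  SUBSTR caseIgnoreSubstringsMatch
  SYNTAX 1.3.6.1.4.1.1466.115.121.1.15{256} )

An operational attribute such as createTimestamp additionally carries markers along the lines of SINGLE-VALUE, NO-USER-MODIFICATION, and USAGE directoryOperation, identifying it as a single-valued, server-maintained attribute.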
34.4.2 Object-Class Definitions Each entry has one or more object classes, which specifies the real-world or information object that the entry represents, as well as the mandatory and permitted attributes defined in the entry. Object classes come in one of three kinds: abstract, structural, or auxiliary. There are only two abstract classes. The top class is present in every entry, and it requires that the objectClass attribute be present. The other abstract class is alias, and it requires that the aliasedObjectName attribute be present. Structural object classes define what the entry represents, and every entry must contain at least one structural object class; for example: organization, device, or person. A structural object class inherits either from top or from another structural object class, and all the structural object classes in an entry must form a single “chain” leading back to top. For example, the object class organizationalPerson
inherits from person and permits additional attributes to be present in the user’s entry that describe the person within an organization, such as title. It is permitted for an entry to be of object classes top, person and organizationalPerson, but an entry cannot be of object classes top, person and device, because device does not inherit from person, nor person from device. Auxiliary classes allow additional attributes to be present in a user’s entry, but do not imply a change in what the entry represents. For example, the object class strongAuthenticationUser allows the attribute userCertificate;binary to be present, but this class could be used in an entry with object class person, device, or some other structural class.
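As a sketch of how an auxiliary class is added in practice, the following LDIF change record (RFC 2849 change syntax; the entry name and certificate file are hypothetical) extends an existing person entry with the strongAuthenticationUser class and a certificate value:

dn: uid=jbloggs,ou=People,dc=example,dc=com
changetype: modify
add: objectClass
objectClass: strongAuthenticationUser
-
add: userCertificate;binary
userCertificate;binary:< file:///tmp/jbloggs-cert.der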
34.4.3 Object Classes for Entries Representing People The person structural class requires the attributes “cn” (short for commonName) and “sn” (short for surname). This class is subclassed by the organizationalPerson class, and the organizationalPerson class is subclassed by the inetOrgPerson class. Most directory servers for enterprise and service provider applications use inetOrgPerson, or a private subclass of this class, as the structural class for representing users. In addition to the mandatory attributes cn, sn, and objectClass, the following attributes are typically used in entries of the inetOrgPerson object class: departmentNumber description displayName employeeNumber employeeType facsimileTelephoneNumber givenName homePhone homePostalAddress jpegPhoto labeledURI mail manager mobile ou pager postalAddress roomNumber secretary surname (sn) telephoneNumber title uid userPassword
a numeric or alphanumeric code a single line description of the person within the organization name of the user as it should be displayed by applications unique employee number a descriptive text string, such as “Employee” or “Contractor” fax number, in international dialing format (e.g., +1 999 222 5555) first or given name home phone number, in international dialing format home mailing address, with “$” inserted between lines photograph in JPEG format the URI for a Web home page Internet email address distinguished name of the entry for a manager mobile phone number in international dialing format (e.g., +1 999 222 4444) organizational Unit, if different from department pager phone number and codes, if any mailing address, with “$” inserted between lines office room number distinguished name of the entry for a secretary Last or surname telephone number, in international dialing format Job title user id, typically part of the person's distinguished name user password compared against during LDAP authentication
For example, the following definition is an LDAP Data Interchange Format (LDIF) text representation of a typical user entry in an LDAP directory, as specified in RFC 2849 [Good, 2000]. The order of appearance of most attribute and value pairs does not imply any specific storage requirements, but it is conventional to present the objectClass attribute first, after the distinguished name. Several additional attributes are permitted in entries of this object class but are no longer widely used. For further details consult RFC 2256 [Wahl, 1997] and RFC 2798.
dn: uid=jbloggs,ou=people,dc=example,dc=com
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
cn: Joe Bloggs
sn: Bloggs
departmentNumber: 15
description: an employee
displayName: Joe Bloggs
employeeNumber: 655321
employeeType: EMPLOYEE
facsimileTelephoneNumber: +1 408 555 1212
givenName: Joe
homePhone: +1 408 555 1212
homePostalAddress: Joe Bloggs $ 1 Mulberry Street $ Anytown AN 12345
labeledURI: http://eng.example.com/~jbloggs
mail: jbloggs@example.com
manager: uid=jsmith,ou=people,dc=example,dc=com
mobile: +1 408 555 1212
ou: Internet Server Engineering
pager: +1 408 555 1212
postalAddress: Joe Bloggs $ 1 Main Street $ Anytown AN 12345
roomNumber: 2114
telephoneNumber: +1 408 555 1212 x12
title: Engineering Manager
uid: jbloggs
userPassword: secret
34.4.4 Other Typical Object Classes The organization structural class requires the “o” attribute (short for organizationName) to be present in the entry, and permits many attributes to optionally also be present, such as telephoneNumber, facsimileTelephoneNumber, postalAddress, and description. This class is normally used to represent corporations, but could also represent other organizations that have employees, participants, or members and a registered name. The organizationalUnit structural class requires that the “ou” attribute (short for organizationalUnitName) be present, and permits the same list of optional attributes as the organization class. This class is normally used to represent internal structures of an organization, such as departments, divisions, or major groupings of entries (e.g., ou=People). The domain structural class requires that the “dc” (domainComponent) attribute be present. This class is used to represent objects that have been created with domain component naming, but about which no other information is known. The dcObject auxiliary class is used to permit the dc attribute to be present in entries of the organization or organizationalUnit classes, typically so that the dc attribute can be used for naming the entry, although the o or ou attribute is still required to be present. The groupOfNames structural class uses cn for naming the group; group membership is represented by the member attribute, which requires one or more values, each containing the distinguished name of another entry that is a member of the group. A similar and more widely used class, groupOfUniqueNames, uses the uniqueMember attribute.
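For example, a group entry using the groupOfUniqueNames class might be represented in LDIF as follows (the group name, container, and members are illustrative):

dn: cn=Directory Administrators,ou=Groups,dc=example,dc=com
objectClass: top
objectClass: groupOfUniqueNames
cn: Directory Administrators
uniqueMember: uid=jbloggs,ou=People,dc=example,dc=com
uniqueMember: uid=jsmith,ou=People,dc=example,dc=com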
34.5 LDAP Directory Services There are several architectural applications of LDAP in today’s Internet: email address book services, Web-based white pages lookup services, Web authentication/authorization services, email server routing and address list expansion services, and literally hundreds of uses that are generally categorized as a network-based repository for application-specific information (e.g., application configuration information, directory-enabled networking such as router tables, and policy-based user authentication and access
authorization rules). In other words, basic LDAP directory services have become a critical part of the network infrastructure for many applications, just as DNS, FTP, SMTP, and HTTP are core infrastructure services. LDAP very often operates behind the scenes of many end-user applications and is embedded in a number of other services that are not end-user visible. One of the most common questions that arises in corporate directory services deployments is the following: “Why not just use a relational database system rather than a new kind of database?” The answer to this question is often as varied as the application services in support of which an LDAP directory is being considered. In some cases, such as a simple Web-based white pages service, there is no real compelling advantage over using an RDBMS that may already contain all the information about people and their contact details. Directories are a distinct type of distributed database and are best suited to a new generation of network-based applications whose data access and service availability requirements do not require a relational data model or a SQL-based transactional database. An important distinction that networking novices often fail to make is the distinction between a protocol and the service implemented in terms of that protocol. A protocol defines the communication pattern and the data exchanges between two endpoints in a distributed system. Typically, one endpoint is a client and the other a server, but both endpoints could be peers. The semantics offered by a server often extend beyond the information exchange rules that are specified by the protocol. In other words, the server may require additional features to implement a reliable, maintainable, highly available service that transcend the basic information exchange implied by a protocol. For example, LDAPv3 has the concept of extended operations and special controls, some of which are standardized and some of which are not. The result is that vendors have created extensions to the core protocol specifications to enable additional services such as configuration and management of the server over LDAP without ever having to shut down the server, to ensure high availability. This is not necessarily a bad thing, because extended operations and controls are useful from an administrative perspective, enabling network-based management of an LDAP service using the LDAP protocol, or special server-to-server communication enabling a distributed directory service. In a competitive market, where technology vendors compete with one another by enabling proprietary client-visible features, complete interoperability between clients and servers may be broken. This situation is typically avoided by having periodic interoperability testing forums where competing vendors demonstrate interoperability. As long as the core protocol and basic service model are not violated, client interoperability is maintained. Another confusing aspect of standards-based network services is the difference between standards conformance and demonstrated interoperability. Within some standards organizations, the emphasis has been on demonstration of static conformance to a written specification. Static conformance is often achieved through the successful demonstration that a server passes some set of conformance tests. Conformance testing is useful to the vendor, but what is most useful to a user is interoperability testing, i.e., does the server from vendor A work with clients from vendors B, C, and D?
Conformance is easier to achieve than interoperability. LDAPv3 has been shown to be a highly interoperable protocol, and most clients work without complication with most servers. Vendors of LDAPv3 client and server products regularly meet at interoperability testing forums, sponsored by The Open Group, to ensure that their products work together. In the remainder of this section we discuss the basic and advanced modes of operation for common LDAP-based directory services.
34.5.1 Basic Directory Services The LDAP service model is a basic request/response client-server model. A client issues a bind, search, update, unbind, or abandon request to the server. All protocol operations may either be initiated directly over a TCP connection or encrypted via an SSL/TLS session. The bind operation is used to pass authentication credentials to the server, if required. LDAP also supports anonymous search, subject to access control restrictions enforced by the server. Bind credentials consist of the user’s distinguished name along with authentication information such as a password. A distinguished name typically identifies a user corresponding to a logical node in the directory information tree (DIT). For
example: uid=jbloggs, ou=People, dc=sun, dc=com. Authentication credentials may be a simple clear-text password (optionally over an SSL or TLS session), information obtained from a digital certificate required for strong authentication, or other encrypted or hashed password authentication mechanisms enabled by a particular server. There are three principal models of distributed operation: a simple client-server model, a referral model, and a chaining model. The simple client-server interaction is not depicted, but operates as one would expect: a client performs a bind operation against the server (either anonymously or with bind credentials), issues an LDAP operation (e.g., search), obtains a result (possibly empty or an error), and then either issues other operations or unbinds from the server. In this mode of operation, an LDAP server will typically handle hundreds to thousands of operations per second on behalf of various types of LDAP-enabled clients and applications. In some deployments, most notably those on the public Internet or in government, university, and enterprise directory service environments where anonymous clients may connect and search a forest of directories, a referral model may be appropriate. The referral model assumes that either (1) all clients will bind anonymously, or (2) authentication information is replicated among the set of directory servers, or (3) there is some mechanism for proxy authentication at one server on behalf of a network of trusted directory servers that will accept authentication credentials via a proxy [Weltman, 2003]. Figure 34.1 depicts the most common referral model situation and assumes that the client is anonymous, such that it can bind to and search either of the two directory servers depicted. In the referral model, if a client requests an operation against one directory server (e.g., a search operation) and that directory
server does not hold the entry or entries that satisfy that operation, then a referral may be returned to the client. A referral is a list of LDAP URLs that point to other directory servers to which the original server is configured to refer queries. A referring server may have out-of-date information, and the referral may not succeed. Referral processing is the responsibility of the client application and is most often handled transparently as part of the LDAP API from which the client application is built. The referral model is most often appropriate in directory service deployments where there are no stringent requirements on authentication, because servers may be configured to accept unauthenticated anonymous operations, such as searches. In fact, one of the major disadvantages of the referral model is that it facilitates trawling of a large distributed directory service and allows a snooping client application to probe and discover a directory service’s topology. This may be undesirable even in a public Internet environment, such as a university network. For this reason, LDAP proxy servers were invented to provide an additional level of control in these more open network environments.
FIGURE 34.1 LDAP referral model: the client searches Server-A (holding ou=People, dc=sun, dc=com), receives a referral to Server-B (holding ou=Engineering, ou=People, dc=sun, dc=com), reissues the search to Server-B, and receives the result.
The chaining model is similar to the referral model, but provides a higher degree of security and administrative control. LDAP chaining is similar to chaining in the X.500 directory service model, but it is done without requiring an additional server-to-server protocol as in the case of X.500. Figure 34.2 illustrates the chaining model. In this case, a client issues a search request to Server A, which uses its knowledge of which server holds the subordinate naming context and chains the request to Server B. Chaining assumes an implied trust model between Server A and Server B, because typically Server A will authenticate to Server B as itself, not as the client. For efficiency in the chaining model, it is typical for Server A to maintain a persistent open network connection to Server B to eliminate the overhead of binding for each chained operation. In some cases, such as anonymous clients, there is no authentication information to proxy, so Server A will often maintain a separate, unauthenticated connection to Server B for such requests. A proxied authentication model could also be used, in which case the client credentials are passed along as part of the chained operation, requiring both Server A and Server B to authenticate or proxy the authentication of the client.
FIGURE 34.2 LDAP chaining model: the client searches Server-A, which chains the request to Server-B (the holder of the subordinate naming context) and returns the result to the client.
34.5.2 High Availability Directory Services First-generation X.500 and LDAP directory servers focused primarily on implementing as much of the protocol specifications as possible, providing flexible and extensible schema mechanisms, and ensuring very fast search performance. During the early adoption phase of a new technology, these are the critical elements to get right, and rapid feedback from real-world deployments is necessary for the technology to evolve. However, as LDAP server technology has become more central to the network infrastructure backing mission-critical business operations (e.g., as a network user authentication service), security, reliability, scalability, performance, and high availability of the service have become the dominant operational requirements. In the remainder of this section, we briefly discuss high availability features and issues related to the deployment of LDAP directory services in support of business-critical applications.
34.5.3 Master–Slave Replication The X.500 specifications define a replication protocol called DISP (Directory Information Shadowing Protocol) that provides a simple master–slave replication service. Technically, directory replication is based on a weakly consistent supplier–consumer model, but the master–slave terminology has become dominant. In this weakly consistent replication model, one directory server acts as a supplier (master) of data to another directory server that is the consumer (slave). At any given time a replica may be inconsistent with the authoritative master, and so a client accessing data at the replica might not see the latest modifications to the data. If this situation is likely to cause problems for the client applications, there are deployment techniques (e.g., using a proxy server) that ensure that a client application will be connected with an authoritative master server so that it may obtain up-to-date information. Within the X.500 model it is possible either for the supplier to initiate a replication data flow or for the consumer to request a replication data flow from the supplier. Replication can occur as soon as a change is detected in a supplier, called on-change, or according to some periodic schedule. Supplier-initiated replication works best when network links between the supplier and the consumer are reliable, with low latency and reasonably high bandwidth. Consumer-initiated replication is most often used in situations where the consumer might frequently become disconnected from the network, either due to unscheduled network outages or due to high-latency, low-bandwidth networks that require potentially large replication data exchanges to be done during off-peak times as determined by the consumer (e.g., a consumer on a computer in a submarine). Replication typically requires that not only directory entries be replicated from a supplier to a consumer but also schema and access control information. If not done automatically via a protocol, manual configuration is required for each consumer, and there could be thousands of consumers in a large distributed directory system (e.g., a directory consumer in every airport in the world holding flight schedule information). Some directory servers do not implement schema and access control such that the information can be replicated, so manual configuration or some other out-of-band technique is used to replicate this type of operational information. While master–slave replication provides high availability via geographic distribution of information, facilitating scalability for search operations, it does not provide high availability for modify operations. In order to achieve write failover, it is necessary to employ systems engineering techniques to enable either a cluster system running the single master or a hot standby mode, so that in the event of the failure of the single master the hot standby server can be brought online without delay. Another
technique is to allow a slave server to become a master server through a process of promotion, which involves special control logic to allow a slave to begin receiving updates and to notify other slaves that it is now the authoritative master. Many directory deployments only allow write operations to the master server and route all search operations, using DNS or an LDAP proxy server, to one of the slaves so that the load on the master server is restricted to modify operations. A common scenario employed in single-master, multiple-slave directory deployments is to deploy a small set of replica hubs, each being a read-only replica from which other slaves may obtain updates. In the event a master fails, a hub is easily promoted. In this model, depicted in Figure 34.3, the replica hubs are both consumers and suppliers because they consume their updates from an authoritative master server but also supply updates to other slaves. This scenario is most useful when a large number of slave servers would put unnecessary load on a single master, and so a hierarchy of servers is established to distribute the replication update load from the master to the hubs; otherwise, the master might spend all of its time updating consumers.
FIGURE 34.3 Master–slave replication: a directory master (supplier) in a data center feeds two directory hubs (consumer/suppliers), each of which in turn feeds directory slaves (consumers) in remote data centers.
34.5.4 LDAP Proxy Server An LDAP proxy server is an OSI layer 7 application gateway that does LDAP protocol data unit forwarding, possibly requiring examination, and possibly on-the-fly modification, of the LDAP protocol message to determine application-layer semantic actions. The objective of the LDAP proxy server is to provide an administrative point of control in front of a set of deployed directories, but does not answer LDAP queries itself, instead chaining the queries when appropriate to the directory servers. The proxy allows a degree of transparency to the directory services, in conjunction with DNS maps pointing LDAP clients to one or more proxy services instead of at the actual directory servers. The LDAP proxy provides a number of useful functions that are best done outside of the core directory server. These functions include: 1. LDAP schema rewriting to map server schema to client schema in the cases where client schema is either hard-coded or for historical reasons does not match the extensible schema of the directory server. Once thousands of clients are deployed, it is difficult to correct the problem, and so server applications must often adapt for the sake of seamless interoperability. 2. Automatic LDAP referral following on behalf of both LDAPv2 and LDAPv3 clients. The LDAPv2 protocol did not define a referral mechanism, but a proxy server can map an LDAPv2 client request into an LDAPv3 client request so that referrals can be used with a mixed set of LDAPv2 and LDAPv3 clients. 3. An LDAP firewall that provides numerous control functions to detect malicious behavior on the part of clients, such as probing, trawling, and denial-of-service attacks. The firewall functions include rate limiting, host and TCP/IP-based filters similar to the TCP wrappers package, domain access control rules, and a number of LDAP-specific features, including operations blocking, size limits, time limits, and attribute filters. The rate-limiting feature allows a statistical back-off capability using TCP flow control so that clients attempting to overload the directory are quenched. 4. The proxy provides a control point for automatic load balancing and failover/failback capability. It may also be able to maintain state information about load on a set of directory servers and redirect LDAP traffic based on load criteria or other semantic criteria at the LDAP protocol level. In addition, it can detect the failure of a directory server and rebalance the incoming load to other directory servers, and detect when a failed directory server rejoins the group. 5. The proxy also provides the point-of-entry advertised to clients in the DNS to the directory service, thereby providing a level of indirection to client applications that facilitates maintenance, upgrades, migrations, and other server administrative tasks done on the back-end directory servers, in a manner that is transparent to clients so that a highly available directory service is delivered. An LDAP proxy is unlike an IP firewall in that it does application-layer protocol security analysis. The proxy is also unlike an HTTP proxy in that it does not do caching of directory data because it is unable to apply directory access rules to clients. The LDAP proxy is also unlike an IP load balancer in that it is
able to make application-level load balancing decisions based on knowledge of directory server topology, query rates, query types, and load and availability metrics published by the directory servers. Figure 34.4 is an illustration of a typical deployment of a pair of LDAP proxy servers that sit behind an IP firewall and accept LDAP connection requests and operations on port 389.
FIGURE 34.4 LDAP proxy servers: a pair of LDAP proxy servers sits behind an IP firewall (with TCP port 389 open) and fronts a set of replicated directory servers, accepting LDAP connections from clients on the WAN.
34.5.5 Multimaster Replication Multimaster replication with high-speed RAID storage, combined with multiple LDAP proxy servers and dynamic DNS, provides a very high availability directory service. In some cases, a clustered operating system platform may provide additional availability guarantees. There are different techniques for implementing multimaster replication, but they all share in common the goal of maintaining write availability of a distributed directory service by ensuring that more than one master server is available and reachable for modifications that will eventually be synchronized with all of the other masters. As discussed previously, LDAP replication is based on a weakly consistent distributed data model, and so any given master may be in a state of not having processed all updates seen at other master servers. This feature of an LDAP directory service is sometimes criticized as a weakness of the LDAP service model, but in well-designed directory service deployments with high-bandwidth, low-latency LANs and WANs, it is possible to have weak consistency and still provide a very high service level for most application environments. In loosely coupled distributed systems over WANs, global consistency is very difficult to achieve economically. For those environments that require total consistency at any point in time, another network-based distributed data service is probably more appropriate. Ideally, any directory server in a deployment of a directory service could be a master for some or all of the data that it holds, thereby providing n-way multimaster replication. In practice, however, the masters are typically placed in a controlled environment like a data center, where the server’s authoritative database can be properly administered, regularly backed up, etc. In the case of geographically dispersed data centers, each data center may contain one or more master servers that are interconnected by a high-speed LAN and that stay in close synchronization, whereas masters in another data center are connected by a slower WAN and might often be out of synchronization. Different high availability goals will dictate how the masters are deployed, how the data is partitioned, and how proxy servers are deployed to help enable client applications to get at replicas or the masters themselves. Whatever the choice of topology, multimaster replication combined with hub and slave replicas and proxy servers offers a highly available directory service. In this scenario, proxy servers provide a critical piece of functionality because replicas and hubs will often be configured to return referrals to one or more master servers for a client that requests to do an update on a replica. Alternatively, if the directory server used as a replica offers chaining, then it may be able to chain the operation to the master. For modifications, it is often desirable to have the client application authenticate to the master with its own credentials for auditing purposes, rather than have a replica proxy the modification on behalf of the client application.
34.5.6 Replication Standardization There is no official standard for how replication is to be done with LDAP. Each LDAP directory server vendor has implemented a specialized mechanism for replication. There are many reasons why no LDAP replication standard was reached within the IETF, but the principal reason was that no consensus could be reached on a standard access control mechanism. A common access control mechanism, or a consistent mapping between mechanisms, is required before LDAP replication interoperability can be achieved between servers from different vendors. In addition, market politics inhibited successful standardization, as the access control model and other semantic features implemented as part of a directory service, independent of the LDAP directory protocol, were viewed as competitive elements of various vendor products. The question is often asked why the LDAP standards working group did not simply adopt the X.500 access control model and replication protocol, as they were already standardized. The reasons are complicated, but there is an important fact about X.500 replication that is often not well understood. The X.500 replication protocol suffers from numerous flaws itself, and most X.500 vendors implement proprietary workarounds in their products to enable replication to work at a scale beyond a couple of hundred thousand directory entries. Briefly, DISP requires a replication data unit to be sent as a single protocol data unit, both for the total update during initialization and for incremental updates of changes. Using the OSI upper layer protocols, the DISP protocol is defined in terms of a remote procedure call service and a session layer service that was not well designed to take procedure arguments that could be on the order of several megabytes to several gigabytes. Hence, implementations of DISP that use the remote operation service element (ROSE) of the OSI stack are most often severely limited in their ability to perform replication updates of any appreciable size. The alternative is to implement DISP in terms of the reliable transfer service
element (RTSE), which is used by X.400, but no two X.500 vendors convincingly demonstrated interoperability at scale. As a result, X.500 total update replication is, as a practical matter, neither scalable nor interoperable between any two X.500 servers from different vendors. The vendors had no choice but to make changes to the protocol as implemented in their own products to achieve practical replication between their own servers, and those changes usually make all but the most basic level of interoperability unachievable in practice. Fully interoperable and scalable replication between disparate directory servers, whether X.500 or LDAP, has not yet been achieved. The replication problem remains an active area of research, especially with respect to performance and scalability over WANs, and to the manageability of topologies and of the potentially hundreds to thousands of master–master and master–slave replication agreements.
34.6 LDAP Protocol and C Language Client API

The LDAPv3 protocol is defined in RFC 2251 [Wahl et al., 1997a]. Closely related specifications are RFC 2222 [Myers, 1997] and RFC 2252–2256 [Wahl, 1997; Wahl et al., 1997b; 1997c; Howes, 1997]. Many Internet application protocols, such as SMTP and HTTP, are defined as text-based request-response interactions; however, LDAPv3 (like SNMP) is defined using Abstract Syntax Notation One (ASN.1), so that application protocol data units (PDUs) are strongly typed, in that structural type information is sent along with user data in the form of encoded type tags. Clients and servers implementing LDAPv3 use the Basic Encoding Rules (BER) to encode protocol data units as compact network byte order binary strings before transmission via TCP. This encoding/decoding process introduces slight computational overhead to protocol processing, but processing of LDAP operations is less computationally intensive than for other ASN.1-represented protocols such as DAP, for XML-encoded protocols, and even for some text-based protocols that require a lot of string parsing and end-of-message handling. This efficiency is due to the restricted use of only the basic and efficiently encoded ASN.1 data types in defining the LDAP protocol structure. Most data elements are represented as strings that are easily encoded using BER. This optimization allows very compact and efficient encoders/decoders to be written.
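To make the tag-length-value structure produced by BER concrete, the following minimal C sketch (not taken from any LDAP library; the function name and buffer handling are illustrative assumptions) encodes a small non-negative ASN.1 INTEGER, such as a messageID, in definite-length form:

#include <stddef.h>

/* Encode an ASN.1 INTEGER in the range 0..127 as a BER
 * tag-length-value triple: tag 0x02, length 0x01, one content byte.
 * Returns the number of bytes written, or 0 if the value or buffer
 * does not fit this simple short form. */
static size_t ber_encode_small_int(unsigned char value,
                                   unsigned char *buf, size_t buflen)
{
    if (value > 127 || buflen < 3)
        return 0;
    buf[0] = 0x02;   /* universal tag for INTEGER */
    buf[1] = 0x01;   /* definite length: one content byte */
    buf[2] = value;  /* positive two's-complement content */
    return 3;
}

A real LDAP encoder composes such triples recursively: the LDAPMessage itself is a SEQUENCE whose length field covers all nested elements, which is what lets a receiver determine the full PDU size from the first few bytes it reads.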
34.6.1 LDAPv3 Protocol Exchange

As with typical client-server protocols, the LDAPv3 protocol exchange is initiated by a client application requesting a TCP connection to a server, typically on the reserved TCP port 389. Either endpoint of the connection can send an LDAPMessage PDU to the other endpoint at any time, although only request forms of the LDAPMessage are sent by the client and only response forms by the server. Typically, a client will initiate a bind, operation1, …, operationN, unbind sequence, where each operation is either a search, a compare, or one of a family of update operations. A client may also choose to abandon an operation before unbinding and closing (or aborting) the TCP connection, or it may choose to rebind on an existing TCP connection with new credentials. The LDAPMessage is defined in ASN.1 as follows:

LDAPMessage ::= SEQUENCE {
    messageID   MessageID,
    protocolOp  CHOICE {
        bindRequest      BindRequest,
        bindResponse     BindResponse,
        unbindRequest    UnbindRequest,
        searchRequest    SearchRequest,
        searchResEntry   SearchResultEntry,
        searchResDone    SearchResultDone,
        searchResRef     SearchResultReference,
        modifyRequest    ModifyRequest,
        modifyResponse   ModifyResponse,
        addRequest       AddRequest,
        addResponse      AddResponse,
        delRequest       DelRequest,
        delResponse      DelResponse,
        modDNRequest     ModifyDNRequest,
        modDNResponse    ModifyDNResponse,
        compareRequest   CompareRequest,
        compareResponse  CompareResponse,
        abandonRequest   AbandonRequest,
        extendedReq      ExtendedRequest,
        extendedResp     ExtendedResponse },
    controls    [0] Controls OPTIONAL }

An LDAPMessage is converted to bytes using a BER encoding, and the resulting series of bytes is sent on the TCP connection. In LDAP, only the BER definite length fields are used, so the receiver of a PDU knows how long the PDU will be as soon as the type tag and length of the outermost LDAPMessage SEQUENCE have been read from the network. The use of definite length encodings allows LDAP PDU processing on the server side to be done very efficiently because knowing the full length of the incoming PDU from reading the first few bytes leads to efficient memory management and minimizes data copying while reading bytes from the network. The messageID field is an INTEGER. When the client sends a request, it chooses a value for the messageID that is distinct from that of any other message the client has recently sent on that connection, typically by incrementing a counter for each message. Any LDAPMessage PDUs returned by the server in response to that request will use the same messageID field. This enables a client to send multiple
requests consecutively on the same connection, and servers that can process operations in parallel (for example, if they are multithreaded) will return the results to each operation as it is completed. Normally, the server will not send any LDAPMessage to a client except in response to one of the above requests. The only exception is the unsolicited notification, which is represented by an extendedResp form of LDAPMessage with the messageID set to 0. The notice of disconnection allows the server to inform the client that it is about to abruptly close the connection. However, not all servers implement the notice of disconnection, and it is more typical that a connection is closed due to problems with the network or the server system becoming unavailable.

The controls field allows the client to attach additional information to the request, and the server to attach data to the response. Controls have been defined to describe server-side sorting, paging, and scrolling of results, and other features that are specific to particular server implementations.

In the C API, an application indicates that it wishes to establish a connection using the ldap_init call.

LDAP *ldap_init (const char *host, int port);

The host argument is either the host name of a particular server or a space-separated list of one or more host names. The default reserved TCP port for LDAP is 389. If a space-separated list of host names is provided, a TCP port can be specified for each host, separated from the host name by a colon, as in "server-a:41389 server-b:42389." The TCP connection will be established when the client makes the first request call.
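As a usage illustration (the host names and the nonstandard port are hypothetical), a client might initialize a handle with a fallback server list and the default port:

#include <ldap.h>
#include <stdio.h>

int main(void)
{
    /* Try server-a first; fall back to server-b on its nonstandard port. */
    LDAP *ld = ldap_init("server-a server-b:42389", 389);
    if (ld == NULL) {
        fprintf(stderr, "ldap_init failed\n");
        return 1;
    }
    /* No connection exists yet; it is established by the first request. */
    return 0;
}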
34.6.2 General Result Handling

The result PDU for most requests that have a result (all but Abandon and Unbind) is based on the following ASN.1 data type:

LDAPResult ::= SEQUENCE {
    resultCode    ENUMERATED,
    matchedDN     LDAPDN,
    errorMessage  LDAPString,
    referral      [3] Referral OPTIONAL }

The resultCode will take the value zero for a successfully completed operation, except for the compare operation. Other values indicate that the operation could not be performed, or could only partially be performed. Only LDAP resultCode values between 0 and 79 are used in the protocol, and most indicate
error conditions; for example, noSuchObject, indicating that the requested object does not exist in the directory. The LDAP C API uses resultCode values between 80 and 97 to indicate errors detected by the client library (e.g., out of memory). In the remainder of this section, the C API will be described using the LDAP synchronous calls that block until a result, if required by the operation, is returned from the directory server. These API calls are defined by standard convention to have the suffix “_s” appended to the procedure names. Client applications that need to multiplex several operations on a single connection, or to obtain entries from a search result as they are returned asynchronously by the directory server, will use the corresponding asynchronous API calls. The synchronous and asynchronous calls generate identical LDAP messages and hence are indistinguishable to the server. It is up to the client application to define either a synchronous or asynchronous model of interaction with the directory server.
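As a small illustration of these ranges (the helper below is not part of the standard API; only LDAP_SUCCESS is a standard constant), a client might classify a result code before deciding how to report it:

#include <ldap.h>

/* Classify a result code using the ranges described above:
 * 0..79 are protocol result codes returned by the server,
 * 80..97 are reserved for errors detected by the client library. */
static const char *result_origin(int rc)
{
    if (rc == LDAP_SUCCESS)
        return "success";
    if (rc >= 0 && rc <= 79)
        return "server-reported error";
    if (rc >= 80 && rc <= 97)
        return "client-library error";
    return "outside the documented ranges";
}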
34.6.3 Bind

The first request that a client typically sends on a connection is the bind request to authenticate the client to the directory server. The bind request is represented within the LDAPMessage as follows:

BindRequest ::= [APPLICATION 0] SEQUENCE {
    version         INTEGER (1 .. 127),
    name            LDAPDN,
    authentication  AuthenticationChoice }

AuthenticationChoice ::= CHOICE {
    simple  [0] OCTET STRING,
    sasl    [3] SaslCredentials }

There are two forms of authentication: simple password-based authentication and SASL (Simple Authentication and Security Layer). The SASL framework is defined in RFC 2222 [Myers, 1997]. A common SASL authentication mechanism is DIGEST-MD5 as defined in RFC 2831 [Leach and Newman, 2000]. Most LDAP clients use the simple authentication choice. The client provides the user's distinguished name in the name field, and the password in the simple field. The SaslCredentials field allows the client to specify a SASL security mechanism to authenticate the user to the server without revealing a password on the network, or by using a non-password-based authentication service, and optionally to authenticate the server as well.

SaslCredentials ::= SEQUENCE {
    mechanism    LDAPString,
    credentials  OCTET STRING OPTIONAL }
Some SASL mechanisms require multiple interactions between the client and the server on a connection to complete the authentication process. In these mechanisms the server will provide data back to the client in the serverSaslCreds field of the bind response.

BindResponse ::= [APPLICATION 1] SEQUENCE {
    COMPONENTS OF LDAPResult,
    serverSaslCreds  [7] OCTET STRING OPTIONAL }
The client will use the server's credential to compute the credentials to send to the server in a subsequent bind request. In the C API, an application can perform a simple bind and block waiting for the result using the ldap_simple_bind_s call.

int ldap_simple_bind_s (LDAP *ld, const char *dn, const char *password);
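For example (the distinguished name and password are placeholders), a client could bind with simple authentication and check the outcome, using the C API's ldap_err2string routine to obtain a printable message:

#include <ldap.h>
#include <stdio.h>

int bind_example(LDAP *ld)
{
    int rc = ldap_simple_bind_s(ld,
                                "cn=Directory Manager,dc=example,dc=com",
                                "secret");
    if (rc != LDAP_SUCCESS) {
        fprintf(stderr, "bind failed: %s\n", ldap_err2string(rc));
        return -1;
    }
    return 0;
}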
34.6.4 Unbind

The client indicates to the server that it intends to close the connection by sending an unbind request. There is no response from the server.

UnbindRequest ::= [APPLICATION 2] NULL

In the C API, an application can send an unbind request and close the connection using the ldap_unbind call.

int ldap_unbind (LDAP *ld);
34.6.5 Extended Request

The extended request enables the client to request operations that are not part of the LDAP core protocol definition. Most extended operations are specific to a particular server's implementation.

ExtendedRequest ::= [APPLICATION 23] SEQUENCE {
    requestName   [0] LDAPOID,
    requestValue  [1] OCTET STRING OPTIONAL }

ExtendedResponse ::= [APPLICATION 24] SEQUENCE {
    COMPONENTS OF LDAPResult,
    responseName  [10] LDAPOID OPTIONAL,
    response      [11] OCTET STRING OPTIONAL }
34.6.6 Searching

The search request is defined as follows:

SearchRequest ::= [APPLICATION 3] SEQUENCE {
    baseObject    LDAPDN,
    scope         ENUMERATED {
        baseObject   (0),
        singleLevel  (1),
        wholeSubtree (2) },
    derefAliases  ENUMERATED,
    sizeLimit     INTEGER (0 .. maxInt),
    timeLimit     INTEGER (0 .. maxInt),
    typesOnly     BOOLEAN,
    filter        Filter,
    attributes    AttributeDescriptionList }

The baseObject DN and scope determine which entries will be considered to locate a match. If the scope is baseObject, only the entry named by the distinguished name in the baseObject field will be searched. If the scope is singleLevel, only the entries immediately below the baseObject entry will be searched. If the scope is wholeSubtree, then the entry named by baseObject and all entries in the tree below it are searched.
The derefAliases field specifies whether the client requests special processing when an alias entry is encountered. Alias entries contain an attribute with a DN value that is the name of another entry, similar in concept to a symbolic link in a UNIX® file system. Alias entries are not supported by all directory servers and many deployments do not contain any alias entries.

The sizeLimit indicates the maximum number of entries to be returned in the search result, and the timeLimit the number of seconds that the server should spend processing the search. The client can provide the value 0 for either to specify "no limit." The attributes field contains a list of the attribute types that the client requests be included from each of the entries in the search result. If this field contains an empty list, then the server will return all attributes of general interest from the entries. The client may also request that only types be returned, and not values.

The LDAP Filter is specified in the protocol encoding using ASN.1; however, most client APIs allow a simple text encoding of the filter to be used by applications. This textual encoding is defined in RFC 2254 [Howes, 1997]. In LDAP search processing, a filter, when tested against a particular entry, can evaluate to TRUE, FALSE, or Undefined. If the filter evaluates to FALSE or Undefined, then that entry is not returned in the search result set. Each filter is grouped by parentheses, and the most common types of filter are the "present," "equalityMatch," "substrings," "and," and "or" filter predicates.

The "present" filter evaluates to TRUE if an attribute of a specified type is present in the entry. It is represented by following the type of the attribute with "=*," as in (telephoneNumber=*). The "equalityMatch" filter evaluates to TRUE if an attribute in the entry is of a matching type and value to that of the filter. It is represented as the type of the attribute, followed by "=," then the value, as in (cn=John Smith). The "substring" filter evaluates to TRUE by comparing the values of a specified attribute in the entry to the pattern in the filter. It is represented as the type of the attribute, followed by "=," and then any of the following, separated by "*" characters:
• A substring that must occur at the beginning of the value
• Substrings that occur anywhere in the value
• A substring that must occur at the end of the value

For example, a filter (cn=John*) would match entries that have a commonName (cn) attribute beginning with the string "John." A filter (cn=*J*Smith) matches entries that have a cn value that contains the letter "J" and ends with "Smith." Many servers have restrictions on the substring searches that can be performed. It is typical for servers to impose a minimum substring length.

An "and" filter consists of a set of included filter conditions, all of which must evaluate to TRUE if an entry is to match the "and" filter. This is represented using the character "&" followed by the set of included filters, as in (&(objectClass=person)(sn=smith)(cn=John*)). An "or" filter consists of a set of included filter conditions, at least one of which must evaluate to TRUE for an entry to match the "or" filter. This is represented using the character "|" followed by the set of included filters, as in (|(sn=smith)(sn=smythe)). The "not" filter consists of a single included filter whose sense is inverted: TRUE becomes FALSE, FALSE becomes TRUE, and Undefined remains Undefined.
The negation filter is represented using the character “!” followed by the included filter, as in (!(objectClass=device)). Note that the negation filter applies to a single search filter component, which may be compound. Most LDAP servers cannot efficiently process the “not” filter, so it should be avoided where possible. Other filters include the approxMatch filter, the greaterOrEqual, the lessOrEqual, and the extensible filter, which are not widely used. The approximate matching filter allows for string matching based on algorithms for determining phonetic matches, such as soundex, metaphone, and others implemented by the directory server.
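The textual filter syntax composes naturally in client code. A minimal sketch follows (the attribute values are placeholders; a production client would also escape special characters such as "*" and "(" in user-supplied input, as required by RFC 2254):

#include <stdio.h>

/* Build a compound filter such as
 * (&(objectClass=person)(sn=smith)(cn=John*))
 * into caller-supplied storage. Returns 0 on success, -1 on truncation. */
static int build_person_filter(const char *sn, const char *cn_prefix,
                               char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen,
                     "(&(objectClass=person)(sn=%s)(cn=%s*))",
                     sn, cn_prefix);
    return (n > 0 && (size_t)n < buflen) ? 0 : -1;
}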
34.6.7 Search Responses

The server will respond to the search with any number of LDAPMessage PDUs with the SearchResultEntry choice, one for each entry that matched the search, as well as any number of LDAPMessage PDUs with the SearchResultReference choice, followed by an LDAPMessage with the SearchResultDone choice.

SearchResultEntry ::= [APPLICATION 4] SEQUENCE {
    objectName  LDAPDN,
    attributes  PartialAttributeList }

SearchResultReference ::= [APPLICATION 19] SEQUENCE OF LDAPURL

SearchResultDone ::= [APPLICATION 5] LDAPResult

The SearchResultReference is returned by servers that do not perform chaining, to indicate to the client that it must progress the operation itself by contacting other servers. For example, if server S1 holds dc=example,dc=com, servers S2 and S3 hold ou=People,dc=example,dc=com, and server S4 holds ou=Groups,dc=example,dc=com, then a wholeSubtree search sent to server S1 would result in two LDAPMessage PDUs containing SearchResultReference being returned, one with the URLs:

ldap://S2/ou=People,dc=example,dc=com
ldap://S3/ou=People,dc=example,dc=com

and the other with the URL:

ldap://S4/ou=Groups,dc=example,dc=com

followed by a SearchResultDone. Invoking a search request in the C API, and blocking for the results, uses the ldap_search_s call.

int ldap_search_s (
    LDAP            *ld,
    const char      *basedn,
    int              scope,
    const char      *filter,
    char           **attrs,
    int              attrsonly,
    LDAPControl    **serverctrls,
    LDAPControl    **clientctrls,
    struct timeval  *timeout,
    int              sizelimit,
    LDAPMessage    **res);
The scope parameter can be one of LDAP_SCOPE_BASE, LDAP_SCOPE_ONELEVEL, or LDAP_SCOPE_SUBTREE.
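Putting the pieces together, a client could issue a subtree search and walk the returned entries. The sketch below follows the prototype shown above and uses the entry-iteration calls of the C API (ldap_first_entry, ldap_next_entry, ldap_get_dn); the base DN and filter are placeholders.

#include <ldap.h>
#include <stdio.h>

int search_example(LDAP *ld)
{
    char *attrs[] = { "cn", "mail", NULL };   /* attribute types to return */
    LDAPMessage *res = NULL, *entry;
    int rc = ldap_search_s(ld, "dc=example,dc=com", LDAP_SCOPE_SUBTREE,
                           "(&(objectClass=person)(cn=John*))",
                           attrs, 0,           /* return values, not types only */
                           NULL, NULL,         /* no server or client controls */
                           NULL, 0,            /* no client timeout, no size limit */
                           &res);
    if (rc != LDAP_SUCCESS) {
        fprintf(stderr, "search failed: %s\n", ldap_err2string(rc));
        return -1;
    }
    for (entry = ldap_first_entry(ld, res); entry != NULL;
         entry = ldap_next_entry(ld, entry)) {
        char *dn = ldap_get_dn(ld, entry);
        printf("matched: %s\n", dn);
        ldap_memfree(dn);
    }
    ldap_msgfree(res);
    return 0;
}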
34.6.8 Abandoning an Operation

While the server is processing a search operation, the client can indicate that it is no longer interested in the results by sending an abandon request, containing the messageID of the original search request.

AbandonRequest ::= [APPLICATION 16] MessageID

The server does not reply to an abandon request, and no further results for the abandoned operation are sent.
In the C API, the client requests that an operation it invoked on that connection with an earlier asynchronous call be abandoned using the ldap_abandon call.

int ldap_abandon (LDAP *ld, int msgid);
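For instance, assuming the asynchronous ldap_search call of the C API (which is not shown in this chapter; it returns a message ID immediately rather than blocking, and a negative value on error), a client that loses interest in a long-running search could abandon it:

#include <ldap.h>

int abandon_example(LDAP *ld)
{
    char *attrs[] = { "cn", NULL };
    /* Asynchronous search: returns immediately with the messageID. */
    int msgid = ldap_search(ld, "dc=example,dc=com", LDAP_SCOPE_SUBTREE,
                            "(objectClass=person)", attrs, 0);
    if (msgid < 0)
        return -1;
    /* ... decide the results are no longer needed ... */
    return ldap_abandon(ld, msgid);
}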
34.6.9 Compare Request

The compare operation allows a client to determine whether an entry contains an attribute with a specific value. The typical server responses will be the compareFalse or compareTrue result codes, indicating that the comparison failed or succeeded. In practice, few client applications use the compare operation.

CompareRequest ::= [APPLICATION 14] SEQUENCE {
    entry  LDAPDN,
    ava    AttributeValueAssertion }

AttributeValueAssertion ::= SEQUENCE {
    attributeDesc   AttributeDescription,
    assertionValue  OCTET STRING }

In the C API, the client requests a comparison on an attribute with a string syntax using the ldap_compare_s call. Note that a successful comparison is expressed with the result code LDAP_COMPARE_TRUE rather than LDAP_SUCCESS.

int ldap_compare_s (LDAP *ld, const char *dn, const char *type, const char *value);
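For example (the DN and value are placeholders), checking whether an entry carries a particular mail value:

#include <ldap.h>

/* Returns 1 if the entry has the given mail value, 0 if not, -1 on error. */
int has_mail_value(LDAP *ld, const char *dn, const char *mail)
{
    int rc = ldap_compare_s(ld, dn, "mail", mail);
    if (rc == LDAP_COMPARE_TRUE)
        return 1;
    if (rc == LDAP_COMPARE_FALSE)
        return 0;
    return -1;   /* some other result code: report as an error */
}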
34.6.10 Add, Delete, Modify, and ModifyDN Operations

The Add, Delete, Modify, and ModifyDN operations operate on individual entries in the directory tree.

ModifyRequest ::= [APPLICATION 6] SEQUENCE {
    object        LDAPDN,
    modification  SEQUENCE OF SEQUENCE {
        operation     ENUMERATED {
            add     (0),
            delete  (1),
            replace (2) },
        modification  AttributeTypeAndValues } }

In the C API, the client can invoke the modify operation using the ldap_modify_s call.

int ldap_modify_s (LDAP *ld, const char *dn, LDAPMod **mods);

typedef struct LDAPMod {
    int mod_op;
#define LDAP_MOD_ADD      0x0
#define LDAP_MOD_DELETE   0x1
#define LDAP_MOD_REPLACE  0x2
#define LDAP_MOD_BVALUES  0x80
    char *mod_type;
    union mod_vals_u {
        char **modv_strvals;
#define mod_values mod_vals.modv_strvals
        struct berval **modv_bvals;
#define mod_bvalues mod_vals.modv_bvals
    } mod_vals;
} LDAPMod;
The mod_op field is set to one of the values LDAP_MOD_ADD, LDAP_MOD_DELETE, or LDAP_MOD_REPLACE.

The Add operation creates a new entry in the directory tree.

AddRequest ::= [APPLICATION 8] SEQUENCE {
    entry       LDAPDN,
    attributes  SEQUENCE OF AttributeTypeAndValues }

In the C API, the client can invoke the add operation using the ldap_add_s call.

int ldap_add_s (LDAP *ld, const char *dn, LDAPMod **attrs);

The Delete operation removes a single entry from the directory.

DelRequest ::= [APPLICATION 10] LDAPDN

In the C API, the client can invoke the delete operation using the ldap_delete_s call.

int ldap_delete_s (LDAP *ld, const char *dn);

The ModifyDN operation can be used to rename or move an entry or an entire branch of the directory tree. The entry parameter specifies the DN of the entry at the base of the tree to be moved. The newrdn parameter specifies the new relative distinguished name (RDN) for that entry. The deleteoldrdn parameter controls whether the previous RDN should be removed from the entry or just be converted by the server into ordinary attribute values. The newSuperior field, if present, specifies the name of the entry that should become the parent of the entry to be moved.

ModifyDNRequest ::= [APPLICATION 12] SEQUENCE {
    entry         LDAPDN,
    newrdn        RelativeLDAPDN,
    deleteoldrdn  BOOLEAN,
    newSuperior   [0] LDAPDN OPTIONAL }

Many directory servers do not support the full range of capabilities implied by the ModifyDN operation (e.g., subtree rename), so this operation is not frequently used by clients.
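As an illustration (the DN, attribute, and value are placeholders), replacing an entry's telephoneNumber attribute with a single new value using the structures shown above:

#include <ldap.h>
#include <stdio.h>

int replace_phone(LDAP *ld, const char *dn, const char *new_number)
{
    char *vals[] = { (char *)new_number, NULL };  /* NULL-terminated values */
    LDAPMod mod;
    LDAPMod *mods[] = { &mod, NULL };             /* NULL-terminated mod list */

    mod.mod_op = LDAP_MOD_REPLACE;                /* replace any existing values */
    mod.mod_type = "telephoneNumber";
    mod.mod_values = vals;

    int rc = ldap_modify_s(ld, dn, mods);
    if (rc != LDAP_SUCCESS)
        fprintf(stderr, "modify failed: %s\n", ldap_err2string(rc));
    return rc;
}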
34.7 Conclusion

The history of the evolution of LDAP technology is indeed a unique and fascinating case study in the evolution of a key Internet protocol and client-server technology used worldwide. There are many excellent client and server products based on LDAP available from several companies, each with its own advantages and disadvantages. However, it is fair to say that it was through the diligent efforts of a few dedicated individuals that the emergence of the Internet as a commercially viable technology was accomplished, and it was the financial investment of several research organizations and corporations in this technology that has made directories based on the Lightweight Directory Access Protocol critical components of the worldwide public Internet, most corporate and organizational wired and wireless intranets, and the global wireless phone network.
Acknowledgments

The authors wish to thank K. C. Francis, Steve Kille, Scott Page, Stephen Shoaff, Kenneth Suter, Nicki Turman, and Neil Wilson for their insightful suggestions and editorial comments on earlier drafts of this article.
References

Barker, Paul and Steve Kille. The COSINE and Internet X.500 Schema. Internet RFC 1274, November 1991.
Bellwood, Tom (Ed.). UDDI Version 2.04 API Specification. July 2002.
Bray, Tim, Jean Paoli, C. M. Sperberg-McQueen, and Eve Maler (Eds.). Extensible markup language (XML) 1.0 (second edition). W3C, October 2000.
Howes, Tim. The String Representation of LDAP Search Filters. Internet RFC 2254, December 1997.
Howes, Tim and Mark Smith. The LDAP URL Format. Internet RFC 2255, December 1997.
Good, Gordon. The LDAP Data Interchange Format (LDIF) — Technical Specification. Internet RFC 2849, June 2000.
Gudgin, Martin, Marc Hadley, Noah Mendelsohn, Jean-Jacques Moreau, and Henrik Nielsen. SOAP version 1.2 part 1: Messaging framework. W3C, June 2003.
Kille, Steve et al. Using Domains in LDAP/X.500 Distinguished Names. Internet RFC 2247, January 1998.
International Telecommunications Union (ITU). Information technology — Open Systems Interconnection Recommendation X.500 — The Directory: Overview of concepts, models, and services, 1993.
International Telecommunications Union (ITU). Information technology — Open Systems Interconnection Recommendation X.501 — The Directory: Models, 1993.
International Telecommunications Union (ITU). Information technology — Open Systems Interconnection Recommendation X.511 — The Directory: Abstract service definition, 1993.
International Telecommunications Union (ITU). Information technology — Open Systems Interconnection Recommendation X.518 — The Directory: Procedures for distributed operation, 1993.
International Telecommunications Union (ITU). Information technology — Open Systems Interconnection Recommendation X.519 — The Directory: Protocol specifications, 1993.
International Telecommunications Union (ITU). Information technology — Open Systems Interconnection Recommendation X.520 — The Directory: Selected attribute types, 1993.
International Telecommunications Union (ITU). Information technology — Open Systems Interconnection Recommendation X.521 — The Directory: Selected object classes, 1993.
International Telecommunications Union (ITU). Information technology — Open Systems Interconnection Recommendation X.525 — The Directory: Replication, 1993.
International Telecommunications Union (ITU). Information technology Recommendation X.681 — Abstract Syntax Notation One (ASN.1): Specification of basic notation, 1994.
International Telecommunications Union (ITU). Information technology Recommendation X.690 — ASN.1 encoding rules: Specification of Basic Encoding Rules (BER), Canonical Encoding Rules (CER) and Distinguished Encoding Rules (DER), 1994.
Leach, Paul and Chris Newman. Using Digest Authentication as a SASL Mechanism. Internet RFC 2831, May 2000.
Myers, John. Simple Authentication and Security Layer (SASL). Internet RFC 2222, October 1997.
OASIS (Organization for the Advancement of Structured Information Standards). Directory Services Markup Language v2.0, December 2001.
OASIS (Organization for the Advancement of Structured Information Standards). OASIS/ebXML Registry Services Specification v2.0, April 2002.
Rolls, Darren (Ed.). Service Provisioning Markup Language (SPML) Version 1.0. OASIS Technical Committee Specification, June 2003.
Rose, Marshall T. and Dwight E. Cass. ISO Transport Services on top of TCP: Version 3. Internet RFC 1006, May 1987.
Smith, Mark. Definition of the inetOrgPerson LDAP Object Class. Internet RFC 2798, April 2000.
Smith, Mark, Andrew Herron, Tim Howes, Mark Wahl, and Anoop Anantha. The C LDAP Application Program Interface. Internet draft-ietf-ldapext-ldap-c-api-05.txt, November 2001.
Wahl, Mark. A Summary of X.500(96) User Schema for use with LDAPv3. Internet RFC 2256, December 1997.
Wahl, Mark, Tim Howes, and Steve Kille. Lightweight Directory Access Protocol (v3). Internet RFC 2251, December 1997a.
Wahl, Mark, Andrew Coulbeck, Tim Howes, and Steve Kille. Lightweight Directory Access Protocol (v3): Attribute Syntax Definitions. Internet RFC 2252, December 1997b.
Wahl, Mark, Steve Kille, and Tim Howes. Lightweight Directory Access Protocol (v3): UTF-8 String Representation of Distinguished Names. Internet RFC 2253, December 1997c.
Weltman, Rob. LDAP Proxied Authorization Control. Internet draft-weltman-ldapv3-proxy-12.txt, April 2003.
Yeong, Wengyik, Tim Howes, and Steve Kille. Lightweight Directory Access Protocol. Internet RFC 1777, March 1995.
Yergeau, Frank. UTF-8, a transformation format of Unicode and ISO 10646. Internet RFC 2044, October 1996.
Authors

Greg Lavender is currently a director of software engineering and CTO for Internet directory, network identity, communications, and portal software products at Sun Microsystems. He was formerly Vice President of Technology at Innosoft International, co-founder and Chief Scientist of Critical Angle, an LDAP directory technology startup company, and co-founder and Chief Scientist of the ISODE Consortium, which was an open-source, not-for-profit, R&D consortium that pioneered both X.500 and LDAP directory technology. He is also an adjunct associate professor in the Department of Computer Sciences at the University of Texas at Austin.

Mark Wahl is currently a senior staff engineer and Principal Directory Architect at Sun Microsystems. He was previously Senior Directory Architect at Innosoft International, co-founder and President of Critical Angle, and lead directory server engineer at the ISODE Consortium. He was the co-chair of the IETF Working Group on LDAPv3 Extensions, and co-author and editor-in-chief of the primary LDAPv3 RFCs within the IETF.
35 Peer-to-Peer Systems

Karl Aberer
Manfred Hauswirth

CONTENTS
Abstract
35.1 Introduction
35.2 Fundamental Concepts
  35.2.1 Principles of P2P Architectures
  35.2.2 Classification of P2P Systems
  35.2.3 Emergent Phenomena in P2P Systems
35.3 Resource Location in P2P Systems
  35.3.1 Properties and Categories of P2P Resource Location Systems
  35.3.2 Unstructured P2P Systems
  35.3.3 Hierarchical P2P Systems
  35.3.4 Structured P2P Systems
35.4 Comparative Evaluation of P2P Systems
  35.4.1 Performance
  35.4.2 Functional and Qualitative Properties
35.5 Conclusions
References
Abstract

Peer-to-peer (P2P) systems offer a new architectural alternative for global-scale distributed information systems and applications. By taking advantage of the principle of resource sharing, i.e., integrating the resources available at end-user computers into a larger system, it is possible to build applications that scale to a global size. Peer-to-peer systems are decentralized systems in which each participant can act as a client and as a server and can freely join and leave the system. This autonomy avoids single points of failure and provides scalability, but implies considerably higher complexity of algorithms and security policies. This chapter gives an overview of the current state in P2P research. We present the fundamental concepts that underlie P2P systems and offer a short, but to-the-point, comparative evaluation of current systems to enable the reader to understand their performance implications and resource consumption issues.
35.1 Introduction

The Internet has enabled the provision of global-scale distributed applications, and some of the most well-known Internet companies' success stories are based on providing such applications. Notable examples are eBay, Yahoo!, and Google. Although these applications rely on the Internet's networking infrastructure to reach a global user community, their architecture is strictly centralized, following the client-server paradigm. This has a significant impact on the computing resources required to provide the services. For example, Google centrally collects information available on the Web by crawling Web sites, indexing the retrieved data, and providing a user interface to query this index. Recent numbers reported
by Google (http://www.google.com/) show that their search engine service requires a workstation cluster of about 15,000 Linux servers. From this observation one might be tempted to conclude that providing a global-scale application necessarily implies a major development, infrastructure, and administration investment. However, this conclusion has been shown to be inaccurate by a new class of applications, initially developed for the purpose of information sharing (e.g., music files, recipes, etc.). These systems are commonly denoted as P2P file sharing systems. The essential insight on which this new type of system builds is to take advantage of the principle of resource sharing: The Internet makes plenty of resources available at the end-user computers, or at its "edges," as is frequently expressed. By integrating these into a larger system it is possible to build applications at a global scale without experiencing the "investment bottleneck" mentioned above.

From a more architectural point of view, client-server systems are asymmetric whereas P2P systems are symmetric. In the client-server approach to system construction, only the clients request data or functionality from a server, and consequently client-server systems are inherently centralized. To deal with the possibly very large number of clients, replication of servers and load balancing techniques are applied. Still, this centralized virtual server is a single point of failure and a network bandwidth bottleneck because all network traffic uses the same Internet connection; if the server crashes or is not reachable, the system cannot operate at all. On the other hand, this approach offers a more stringent control of the system that facilitates the use of simple yet efficient algorithms and security policies.

In contrast, P2P systems are decentralized systems. There no longer exists a distinction between clients and servers; each peer can act both as a client and as a server, depending on which goal needs to be accomplished. The functionality of a P2P system comes into existence by the cooperation of the individual peers. This approach avoids the problems of client-server systems, i.e., the single point of failure, limited scalability, and "hot spots" of network traffic, and allows the participants to remain autonomous in many of their decisions. However, these advantages come at the expense of a considerably higher complexity of algorithms and more complicated security policies. Nevertheless, autonomy, scalability, load-sharing, and fault-tolerance are of such premier importance in global-scale distributed systems that the P2P approach is of major interest.

Napster was the first and most famous proponent of this new class of systems. By the above definition, Napster is not a pure P2P system but a hybrid one: Coupling of resources was facilitated through a central directory server, where clients (Napster terminology!) interested in sharing music files logged in and registered their files. Each client could send search requests to the Napster server, which searched its database of currently registered files and returned a list of matches to the requesting client (client-server style). The requester then could choose from this list and download the files directly from an offering computer (P2P style). The centralized database of Napster provided efficient search functionality and data consistency but limited the scalability of the system.
Nevertheless, through its P2P approach Napster took advantage of a number of resources made available by the community of cooperating peers:

Storage and bandwidth: Because the most resource-intensive operations in music file sharing, i.e., the storage and the exchange of usually large data files, were not provided by a central server but by the participating peers, it was possible to build a global-scale application with approximately 100 servers running at the Napster site. This is lower than Google's requirement by orders of magnitude, even if the resource consumption of the system is of the same scale. At the time of its top popularity in February 2001, Napster had 1.57 million online users, who shared 220 files on average, and 2.79 billion files were downloaded during that month.

Knowledge: Users annotate their files (author, title, rating, etc.) before registering at the Napster server. In total this amounts to a substantial investment in terms of human resources, which facilitates making searches in Napster more precise. This effort can be compared to the one invested by Yahoo! for annotating Web sites.
Ownership: The key success factor for the fast adoption of the Napster system was the possibility to obtain music files for free. This can be viewed as "sharing of ownership." Naturally this large-scale copyright infringement provoked a heavy reaction from the music industry, which eventually led to the shutdown of Napster. The interpretation of the underlying economic processes is not the subject of this chapter, but they are quite interesting and a topic of vivid debates (see, e.g., http://www.dklevine.com/general/intellectual/napster.htm).

In parallel, another music file sharing system that was a pure P2P system entered the stage: Gnutella [Clip2, 2001]. Though Gnutella provides essentially the same functionality as Napster, no central directory server is required. Instead, in Gnutella each peer randomly chooses a small number of "neighbors" with whom it keeps permanent connections. This results in a connection graph in which each peer forwards its own or other peers' search requests. Each peer that receives a query and finds a matching data item in its store sends an answer back to the requester. After a certain number of hops, queries are no longer forwarded to avoid unlimited distribution, but the probability of finding a peer that is able to satisfy a query is still high due to certain properties of the Gnutella graph, which will be discussed in detail later. A more technical and detailed discussion of Gnutella is provided in Section 35.3.2.1.

Gnutella's approach to search, which is commonly denoted as constrained flooding or constrained broadcast, involves no distinguished component that is required for its operation. It is a fully decentralized system. Strictly speaking, Gnutella is not a system, but an open protocol that is implemented by Gnutella software clients. As Gnutella can run without a central server, there is no single point of failure, which makes attacks (legal, economic, or malicious ones) very difficult. The downside of Gnutella's distributed search algorithm, which involves no coordination at all, is high network bandwidth consumption. This severely limits the throughput of the system and of the network, which must support a plenitude of other services as well. So a key issue in current research is to find less bandwidth-intensive solutions: Do systems exist that offer similar functional properties as Gnutella, or is decentralization necessarily paid for with poor performance and high bandwidth consumption? In addition, the performance of message flooding with respect to search latency depends critically on the global network structure that emerges from the aggregate local behaviors of the Gnutella peers. In other words, the global network structure is the result of a self-organization process. This leads to the interesting question of how it is possible to construct large self-organizing P2P systems such as Gnutella, yet with predictable behavior and quality-of-service guarantees.

In the following we will describe the current state of the art in research and real-world systems to shed more light on these questions. We will present the concepts of P2P computing in general and discuss the existing P2P approaches.
35.2 Fundamental Concepts

35.2.1 Principles of P2P Architectures

The fundamental difference between a P2P architecture and a client–server architecture is that there no longer exists a clear distinction between the clients consuming a service and the servers providing a service. This implies partial or complete symmetry of roles regarding the system architecture and the interaction patterns. To clearly denote this, often the term servent is used for peers. The P2P architecture has two benefits: It enables resource sharing and avoids bottlenecks. Pure P2P systems such as Gnutella, which feature completely symmetric roles, no longer require a coordinating entity or globally available knowledge to be used for orchestrating the evolution of the system. None of the peers in a pure P2P system maintains the global state of the system and all interactions of the peers are strictly local. The peers interact with only a very limited subset of other peers, which is frequently called "neighborhood" in the literature to emphasize its restricted character. Therefore, the control of the system is truly decentralized.
Decentralization is beneficial in many situations, but its foremost advantage is that it enables scalability, which is a key requirement for designing large-scale, distributed applications: Because the individual components (peers) of a decentralized system perform local interactions only, the number of interactions remains low compared to the size of the system. As no component has a global view of the state, the resource consumption in terms of information storage and exchange also stays low compared to the size of the system. Therefore these systems scale well to very large system sizes. However, the development of efficient distributed algorithms for performing tasks such as search or data placement in a decentralized system is more complex than for the centralized case.

Even though no central entity exists, pure, decentralized P2P systems develop certain global structures that emerge from the collective behavior of the system's components (peers) and that are important for the proper operation of the system. For example, in the case of Gnutella, the global network structure emerges from the local pair-wise interactions that Gnutella is based on. Such global structures emerge without any external or central control. Systems with distributed control that evolve emergent structures are called self-organizing systems (decentralization implies self-organization). A significant advantage of self-organizing systems is their failure resilience, which — in conjunction with the property of scalability — makes such systems particularly attractive for building large-scale, distributed applications. Understanding the global behavior of P2P systems based on these principles is not trivial. To aid the comprehension of these phenomena and their interplay we will provide further details and discussion in Section 35.2.3.
35.2.2 Classification of P2P Systems

The P2P architectural paradigm is not new to computer science. In particular, the Internet's infrastructure and services exploit the P2P approach in many places at different levels of abstraction — for example, in routing, the domain name service (DNS), or in Usenet News. These mechanisms share a common goal: locating or disseminating resources in a network. What is new with respect to the recent developments in P2P systems, and more generally in Web computing, is that the P2P paradigm also appears increasingly at other system layers. We can distinguish the following layers for which we observe this development:

Networking layer: basic services to route requests over the physical networks to a network address in an application-independent way

Data access layer: management of resource membership to specific applications; search and update of resources using application-specific identifiers in a distributed environment

Service layer: combination and enhancement of data access layer functionalities to provide higher-level abstractions and services ranging from simple data exchange, such as file sharing, to complex business processes

User layer: interactions of users belonging to user communities, using system services for community management and information exchange

It is interesting to observe that the P2P paradigm can appear at each of these layers independently. Analyzing to what degree and at which layers the P2P paradigm is implemented in a concrete system facilitates a more precise characterization and classification of the different types of P2P systems. Examples of how the P2P paradigm is used at the different layers are given in Figure 35.1.
Layer          Application domain    Service                              Example system
Network        Internet              Routing                              TCP/IP, DNS
Data access    Overlay networks      Resource location                    Gnutella, Freenet
Service        P2P applications      Messaging, distributed processing    Napster, Seti, Groove
User           User communities      Collaboration                        eBay, Ciao

FIGURE 35.1 The P2P paradigm at different layers.
For the networking layer of the Internet we have already pointed out that it relies strongly on P2P principles, in particular to achieve failure resilience and scalability. After all, the original design goal of Arpanet, the ancestor of today's Internet, was to build a scalable and failure-resilient networking infrastructure. An important aspect in recent P2P systems, such as Freenet and Gnutella, is resource location. These systems construct so-called overlay networks over the physical network. In principle, applications could use the services provided by the networking layer to locate their resources of interest. However, having a separate, application-specific overlay network has the advantage of supporting application-specific identifiers and semantic routing, and it offers the possibility to provide additional services for supporting network maintenance, authentication, trust, etc., all of which would be hard to integrate into and support at the networking layer. The introduction of overlay networks is probably the essential innovation of P2P systems.

The P2P architecture can also be exploited at the service layer. In fact, with Napster we already discussed a system in which only this layer is organized in a P2P way (i.e., the download of files), whereas resource location is centralized. Also the state-of-the-art architectures for Web services such as J2EE and .NET build on a similar, directory-based architecture: Services are registered at a directory (e.g., a UDDI directory), where they can be looked up. The service invocation then takes place directly between the service provider and the requester. The main difference, though, between Napster and Web services is that Napster deals with only one type of service. Another well-known example of the P2P paradigm used at the service layer is SETI@Home, which exploits idle processing time on desktop computers to analyze radio signals for signs of extraterrestrial intelligence.

Because social and economic systems are typically organized in a P2P style and are also typical examples of self-organizing systems, it is natural that this is reflected in applications that support social and economic interactions. Examples are eBay or recommender systems such as Ciao (www.ciao.com). From a systems perspective these systems are centralized both at the resource location and the service layer, but the user interactions are P2P.

Another classification of P2P approaches can be given with respect to the level of generality at which they are applicable. Generally, research and development on P2P systems is carried out at three levels of generalization:

P2P applications: This includes the various file sharing systems, such as Gnutella, Napster, or Kazaa.

P2P platforms: Here, most notably, SUN's JXTA platform [Gong, 2001] is an example of a generic architecture standardizing the functional architecture, the component interfaces, and the standard services of P2P systems.

P2P algorithms: This area deals with the development of scalable, distributed, and decentralized algorithms for efficient P2P systems. The current focus in research is on algorithms for efficient resource location.
35.2.3 Emergent Phenomena in P2P Systems

A fair share of the fascination of P2P systems, but probably also of their major challenges, stems from the emergent phenomena that play a main role at all system layers. These phenomena arise from the fact that the systems operate without central control, and thus it is not always predictable how they will evolve. Self-organizing systems are well known in many scientific disciplines, in particular in physics and biology. Prominent examples are crystallization processes or insect colonies. Self-organization is basically a process of evolution of a complex system with only local interaction of system components, where the effect of the environment is minimal. Self-organization is driven by randomized variation processes — movements of molecules in the case of crystallization, movements of individual insects in insect colonies, or queries and data insertions in the case of P2P systems. These "fluctuations" or "noise," as they are also called, lead to a continuous perturbation of the system and allow the system to explore a global state space until it finds stable (dynamic) equilibrium states. These states correspond to the global, emergent structures [Heylighen, 1997].
In the area of computer science, self-organization and the resulting phenomena have been studied in particular in the field of artificial intelligence for some time (agent systems, distributed decision making, etc.), but with today's P2P systems large-scale, self-organizing applications have become reality for the first time. Having P2P systems widely deployed offers an unforeseen opportunity for studying these phenomena in concrete large-scale systems. Theoretical work on self-organization can now be verified in the real world. Conversely, insights gained from self-organizing P2P systems may and do also contribute to the advancement of theory.

An implicit advantage of self-organizing systems that makes them so well applicable for constructing large-scale distributed applications is their inherent failure resilience: The global behavior of self-organizing systems is insensitive to local perturbations such as local component failures or local overloads. Other components can take over in these cases. Thus self-organizing systems tend to be extremely robust against failures. Put differently, in self-organizing systems random processes drive the exploration of the global state space of the system in order to "detect" the stable subspaces that correspond to the stable, emergent structures. Thus failures simply add to the randomization and thus to the system's evolution.

A number of experimental studies have been performed in order to elicit the emergent properties of P2P systems. In particular, Gnutella has been the subject of a number of exemplary studies, due to its openness and simple accessibility. We cite here some of these studies that demonstrate the types of phenomena that are likely to play a role in any P2P system and more generally on the Web. Studies of Gnutella's network structure (which emerges from the local strategies used by peers to maintain their neighborhood) have revealed two characteristic properties of the resulting global Gnutella network [Ripeanu and Foster, 2002]:

Power-law distribution of node connectivity: Only a few Gnutella peers have a large number of links to other peers, whereas most peers have a very low number of links.

Small diameter: The Gnutella network graph has a relatively short diameter of approximately seven hops. This property ensures that a message flooding approach for search works with a relatively low time-to-live.

The power-law distribution of peer connectivity is explained as a result of a process of preferential attachment, i.e., nodes arriving at the network or changing their connectivity attach with higher probability to already well connected nodes [Barabási and Albert, 1999], whereas the low diameter is usually related to the small-world phenomenon [Kleinberg, 2000], which was first observed for social networks. Small-world graphs have been identified as a class of graphs that combine the property of having a short diameter, as found in random graphs, with a high degree of local clustering typical for regular graphs [Watts and Strogatz, 1998]. In addition, they enable the efficient discovery of short connections between any nodes [Kleinberg, 2000]. Examples of small-world graphs are typically constructed from existing regular graphs by locally rewiring the graph structures, very much as they would result from a self-organization process. An example of such a process has been provided in the Freenet P2P system [Clarke et al., 2001, 2002]. Self-organization processes also play a role at the user layer in P2P systems.
As P2P systems can also be viewed as social networks, they face similar problems. For example, "free riding" [Adar and Huberman, 2000] has become a serious problem. As in the real world, people prefer to consume resources without offering similar amounts of their own resources in exchange. This effect is also well known from other types of economies. Most Gnutella users are free-riders, i.e., they do not provide files to share, and if sharing happens, only a very limited number of files is of interest at all. Adar and Huberman show that 66% of Gnutella users share no files and nearly 47% of all responses are returned by the top 1% of the sharing hosts. This starts to transform Gnutella into a client–server-like system with a backbone structure, which soon may exhibit the same problems as centralized systems. Another economic issue that is becoming highly relevant for P2P systems is reputation-building [Aberer and Despotovic, 2001]. Simple reputation mechanisms are already in place in some P2P file sharing systems such as Kazaa (http://www.kazaa.com/). The full potential of reputation management can be seen best from online citation indices such as CiteSeer [Flake et al., 2002].
In the future we may see many other emergent phenomena. In particular, the possibility not only to have emergent structures, such as the network structure, but also emergent behaviors, e.g., behaviors resulting from evolutionary processes or a decentralized coordination mechanism, such as swarm intelligence [Bonabeau et al., 1999], appears to be a promising and exciting development. A merit of P2P architectures is that they have introduced the principle of self-organization into the domain of distributed application architecture on a broad scale. We have seen that self-organization exhibits exactly the properties of scalability and failure resilience that mainly contributed to the success of P2P systems. This has become especially important recently, as the dramatic growth of the Internet has clearly shown the limits of the standard client–server approach and thus many new application domains inherently are P2P. For example, mobile ad-hoc networking, customer-to-customer e-commerce systems, or dynamic service discovery and workflow composition, are areas where self-organizing systems are about to become reality as the next generation of distributed systems. However, it can be seen that the P2P architectural principle and self-organization can appear in many different ways. We explore this in the following sections in greater detail.
35.3 Resource Location in P2P Systems

The fundamental problem of P2P systems is resource location. In fact, P2P systems in the narrow sense, i.e., P2P information sharing systems such as Gnutella or Napster, are resource location systems. Therefore we provide an overview of the fundamental issues and approaches of P2P resource location in this section.
35.3.1 Properties and Categories of P2P Resource Location Systems

The problem of resource location can be stated as follows: A group G of peers with given addresses p ∈ P holds resources R(p) ⊆ R. The resources can be, for example, media files, documents, or services. Each resource r ∈ R(p) is identified by a resource key k ∈ K. The resource keys can be numbers, names, or metadata annotations. Each peer identified by address p is thus associated with a set of keys K(p) ⊆ K identifying the resources it holds. The problem of resource location is to find, given a resource identifier k, or more generally a predicate on a resource identifier k, a peer with address p that holds that resource, i.e., k ∈ K(p). In other words, the task is managing and accessing the binary relation I = {(p, k) | k ∈ K(p)} ⊆ P × K, the index information. Generally, each peer will hold only a subset I(p) ⊆ I of the index information. Thus, in general it will not be able to answer all requests for locating resources itself. In that case, a peer p can contact another peer in its neighborhood N(p) ⊆ G. For interacting with other peers, two basic protocols need to be supported (a C-style sketch of these two interfaces follows below):

Network maintenance protocol: This protocol enables a node to join and leave a group G of peers. In order to join, a peer p needs to know a current member p′ ∈ G to which it can send a join message: p′ → join(p′). This may cause a number of protocol-specific messages to reorganize the neighborhood and the index information kept at different peers of the group. Similarly, a peer can announce that it departs from a group by sending a message p′ → leave(p′). A peer p joining a group G may already belong to another group G′, and thus different networks may merge and split as a result of local join-and-leave interactions. Whereas joining usually is done explicitly in all protocols, leaving typically is implicit (network separation, peer failure, etc.) and the system has to account for that.

Data management protocol: This protocol allows nodes to search, insert, and delete resources, once they belong to a group. The corresponding messages are p′ → search(k), which returns one or more peers holding the resource or the resource itself, p′ → insert(k, r) for inserting a resource, and p′ → delete(k) for deleting it. Peers receiving search, insert, or delete messages may forward them to other peers they know in case they cannot satisfy them. Updates as in database systems are usually not considered because P2P resource location systems usually support the publication of information rather than online data management, and strong consistency of data is not required.
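To summarize the two protocols in one place, the following C-style interface sketch is entirely illustrative; the type and function names are assumptions, not part of any of the systems discussed. It lists the operations a contacted peer p′ must accept:

/* Illustrative interface for the two protocols a peer exposes. */
typedef struct peer     peer_t;      /* handle for a remote peer p'          */
typedef struct resource resource_t;  /* an item r identified by a key k      */

/* Network maintenance protocol */
int peer_join  (peer_t *contact, const char *joining_address);  /* join      */
int peer_leave (peer_t *contact, const char *leaving_address);  /* leave     */

/* Data management protocol */
int peer_search(peer_t *contact, const char *key,
                peer_t **holders, int max_holders);             /* search(k) */
int peer_insert(peer_t *contact, const char *key,
                const resource_t *r);                           /* insert(k, r) */
int peer_delete(peer_t *contact, const char *key);              /* delete(k) */

Which peer actually answers a search, and how requests are forwarded when the contacted peer cannot satisfy them, is exactly what distinguishes the system categories discussed next.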
These protocols must be supported by all kinds of P2P systems discussed below, although the degree and strategies may vary according to the specific approach. The distribution of the index information, the selection of the neighborhood, the additional information kept at peers about the neighborhood, and the specific types of protocols supported define the variants of resource location mechanisms. A direct comparison of resource location approaches is difficult for various reasons: (1) the problem is a very general one, with many functional and performance properties of the systems to be considered simultaneously, (2) the environments in which the systems operate are complex, with many parameters defining their characteristics, and (3) the approaches are very heterogeneous because they are designed for rather different target applications. Nevertheless, it is possible to identify certain basic categories of approaches. We may distinguish them along the following three dimensions:

Unstructured vs. structured P2P systems: In unstructured P2P systems no information is kept about other nodes in terms of which resources they hold, i.e., the index information corresponding to a resource k ∈ K(p) is kept only at peer p itself, and no other information related to resources is kept for peers in its neighborhood N(p). This is Gnutella's approach, for example. The main advantage of unstructured P2P systems is the high degree of independence among the nodes, which yields high flexibility and failure resilience. Structured P2P systems, on the other hand, can perform search operations more efficiently by exploiting additional information that is kept about other nodes, which enables directing search requests in a goal-oriented manner.

Flat vs. hierarchical P2P systems: In flat P2P systems all nodes are equivalent, i.e., there exists no distinction in the roles that the nodes play in the network. Again, Gnutella follows this approach. In hierarchical P2P systems distinctive roles exist, e.g., only specific nodes can support certain operations, such as search. An extreme example is Napster, where only a single node supports search. The main advantage of hierarchical P2P systems is improved performance, in particular for search, which comes at the expense of giving up some benefits of a "pure" P2P architecture, such as failure resilience.

Loosely coupled vs. tightly coupled P2P systems: In tightly coupled P2P systems there exists only one peer group at a time, and only a single peer may join or leave the group at a time. Upon joining the group, the peer obtains a static, logical identifier that constrains the possible behavior of the peer with respect to its role in the group, e.g., the type of resources it keeps and the way it processes messages. Typically, the key spaces for logical peer identifiers and for resources are identical. In loosely coupled P2P systems different peer groups can evolve, merge, or split. Though peers may play a specific role at a given time, this role, and thus the logical peer address, may change over time dynamically. Gnutella is an example of a loosely coupled P2P system, whereas Napster is an (extreme) case of a tightly coupled P2P system.

In the following we overview the main representatives of current P2P resource location systems in each of these categories and describe their functional properties. Following that, we provide a comparative evaluation of important functional properties and performance criteria for these systems.
35.3.2 Unstructured P2P Systems

All current unstructured P2P systems are flat and loosely coupled. The canonical representative of this class of systems is Gnutella.

35.3.2.1 Gnutella

Gnutella is a decentralized file-sharing system that was originally developed in a 14-day "quick hack" by Nullsoft (Winamp) and was intended to support the exchange of cooking recipes. Its underlying protocol has never been published but was reverse-engineered from the original software. The Gnutella protocol consists of five message types that support network maintenance (Ping, Pong) and data management (Query, QueryHit, Push). Messages are distributed using a simple constrained broadcast mechanism: Upon receipt of a message a peer decrements the message's time-to-live (TTL) field. If the TTL is greater than 0 and it has not seen the message's identifier before (loop detection),
it resends the message to the other peers to which it has an open connection. Additionally, the peer checks whether it should respond to the message, e.g., send a QueryHit in response to a Query message if it can satisfy the query. Responses are routed along the same path, i.e., via the same peers, as the originating message. To join a Gnutella network a peer must connect to a known Gnutella peer and send a Ping message that announces its availability and probes for other peers. To obtain peer addresses to start with, dedicated servers return lists of peers; this is outside the Gnutella protocol specification. Every peer receiving the Ping message can cache the new peer's address and can respond with a Pong message holding its IP address, port number, and the total size of the files it shares. This way a peer obtains many peer addresses that it caches (QueryHit and Push messages also contain IP address/port pairs, which can additionally be used to fill a peer's address cache). Out of the returned addresses it selects C neighbors (typically C = 4) and opens permanent connections to those. This defines its position in the Gnutella network graph. If one of these connections is dropped, the peer can retry or choose another peer from its cache. To locate a file a peer issues a Query message to all its permanently connected peers. The message defines the minimum speed and the search criteria. The search criteria can be any text string, and its interpretation is up to the receivers of the message. Though this could be used as a container for arbitrary structured search requests, the standard use is simple keyword search. If a peer can satisfy the search criteria, it returns a QueryHit message listing all its matching entries. The originator of the query can then use this information to download the file via a simplified HTTP GET interaction. In case the peer that sent the QueryHit message is behind a firewall, the requester may send a Push message (along the same path as it received the QueryHit) to the firewalled peer. The Push message specifies where the firewalled servent can contact the requesting peer to run a "passive" GET session. If both peers are behind firewalls, the download is impossible. Gnutella is a simple yet effective protocol: Hit rates for search queries are reasonably high, it is fault-tolerant toward failures of servents, and it adapts well to dynamically changing peer populations. However, from a networking perspective, this comes at the price of very high bandwidth consumption: Search requests are broadcast over the network, and each node receiving a search request scans its local database for possible hits. For example, assuming a typical TTL of seven and an average of C = 4 connections per peer (i.e., each peer forwards messages to three other peers), the maximum possible number of messages originating from a single Gnutella message is 2 · Σ_{i=0}^{TTL} C · (C − 1)^i = 26,240.

35.3.2.2 Improvements of Message Flooding

The high bandwidth consumption of Gnutella is a major drawback. Thus better ways to control the number of messages have been devised [Lv et al., 2002]. Expanding ring search starts a Gnutella-like search with a small TTL (e.g., TTL = 1) and, if there is no success, iteratively increases the TTL (e.g., TTL = TTL + 2) up to a certain limit. A more radical reduction of the message overhead is achieved by the random walker approach: To start a search, a peer sends out k random walkers.
In contrast to Gnutella, each peer receiving the request (a random walker) forwards it to only one neighbor, but search messages have a substantially higher TTL. The random walker periodically checks back with the originally requesting peer to stop the search in case another random walker has found the result. Simulation studies and analytical results show that this model reduces message bandwidth substantially at the expense of increased search latency. To further improve the performance, replication schemes are applied. Recent work shows that the random walker model can be further improved by using percolation-based search [Sarshar et al., 2003]. Using results from random graph theory, it is shown that searches based on probabilistic broadcast, i.e., searches in which messages are forwarded to neighbors only with a probability q < 1, will be successfully answered while traversing only qkN links, where k is the average degree of nodes in the graph and q is fairly small (e.g., q = 0.02 in a network with N = 20,000 nodes).
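The strategies just described (constrained flooding, expanding ring search, and random walkers) can be contrasted with the following simplified Python simulation. The callbacks `neighbors` and `matches` are hypothetical hooks supplied by the caller, the simulation is centralized rather than message-based, and loop detection is reduced to a per-peer "seen" set, so this is an illustration of the ideas rather than the Gnutella protocol:

```python
import random

def constrained_broadcast(start, query, ttl, neighbors, matches):
    """Gnutella-style flooding, simulated centrally: forward to all neighbors,
    decrement the TTL at every hop, and never revisit a peer (loop detection
    simplified to a per-peer 'seen' set instead of message identifiers)."""
    seen, hits, frontier = set(), [], [(start, ttl)]
    while frontier:
        peer, remaining = frontier.pop()
        if peer in seen:
            continue
        seen.add(peer)
        if matches(peer, query):
            hits.append(peer)
        if remaining > 0:
            frontier.extend((n, remaining - 1) for n in neighbors(peer))
    return hits

def expanding_ring(start, query, neighbors, matches, max_ttl=7):
    """Retry the flood with a growing TTL (1, 3, 5, ...) until a hit is found."""
    for ttl in range(1, max_ttl + 1, 2):
        hits = constrained_broadcast(start, query, ttl, neighbors, matches)
        if hits:
            return hits
    return []

def random_walkers(start, query, neighbors, matches, k=16, max_hops=500):
    """Send k walkers, each forwarding to a single random neighbor per hop.
    (A real implementation runs them concurrently and lets them check back
    with the requester; this sequential loop only illustrates the idea.)"""
    for _ in range(k):
        peer = start
        for _ in range(max_hops):
            if matches(peer, query):
                return [peer]
            candidates = list(neighbors(peer))
            if not candidates:
                break
            peer = random.choice(candidates)
    return []
```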
35.3.3 Hierarchical P2P Systems

Hierarchical P2P systems store index information at some dedicated peers or servers in order to improve search performance. Simple hierarchical P2P systems were already discussed in detail at the beginning of this chapter, using Napster as the best-known example of this class of systems. Thus, the following discussion focuses on advanced hierarchical systems that are known under the term super-peer architecture.

35.3.3.1 Super-Peer Architectures

The super-peer approach was devised to combine some of the advantages of Napster with the robustness of Gnutella. In this class of systems three types of peers are distinguished:
1. A super-super-peer that serves as the entry point, provides lists of super-peers to peers at startup, and coordinates super-peers.
2. Super-peers that maintain the index information for a group of peers connected to them and interact with other super-peers through a Gnutella-like message forwarding protocol. Multiple super-peers can be associated with the same peer group.
3. Ordinary peers that contact super-peers in order to register their resources and to obtain index information, just as in Napster. The file exchanges are then performed directly with the peers holding the resources.

A certain number of super-peers (small compared to the number of peers) are selected dynamically from among the ordinary peers. An important criterion for selecting a super-peer is its available physical resources, such as bandwidth and storage space. Experimental studies [Yang and Garcia-Molina, 2002] show that the use of redundant super-peers is advantageous and that super-peers should have a substantially higher number of outgoing connections than in Gnutella (e.g., > 20) to minimize the TTL. In this way the performance of this approach can come close to Napster's with respect to search latency. Most prominently, the super-peer approach has been implemented by FastTrack (http://www.fasttrack.nu/), which is used in media sharing applications such as Kazaa (http://www.kazaa.com/). The approach has also been implemented in the now defunct Clip2 Reflector and the JXTA search implementation.
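The division of labor between ordinary peers and super-peers can be sketched as below. This is a simplified single-process illustration (direct method calls instead of network messages, duplicate suppression reduced to a TTL) and not the FastTrack, Clip2, or JXTA protocol:

```python
from collections import defaultdict

class SuperPeer:
    """Holds the index for the ordinary peers registered with it and gossips
    unanswered queries to other super-peers, Gnutella-style, bounded by a TTL."""

    def __init__(self):
        self.index = defaultdict(set)   # resource key -> addresses of ordinary peers
        self.other_super_peers = []     # super-peers this one forwards queries to

    def register(self, peer, keys):
        """Called by an ordinary peer when it connects and announces its resources."""
        for k in keys:
            self.index[k].add(peer)

    def unregister(self, peer):
        """Called (or triggered by a timeout) when an ordinary peer disappears."""
        for holders in self.index.values():
            holders.discard(peer)

    def search(self, key, ttl=2):
        """Answer from the local index; otherwise forward to other super-peers."""
        hits = set(self.index.get(key, ()))
        if not hits and ttl > 0:
            for sp in self.other_super_peers:
                hits |= sp.search(key, ttl - 1)
        return hits
```

The actual file transfer then happens directly between the requesting peer and one of the returned addresses, exactly as in the Napster case described earlier.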
35.3.4 Structured P2P Systems

Structured P2P systems distribute index information among all participating peers to improve search performance. In contrast to hierarchical P2P systems, this uniform distribution of index information avoids potential bottlenecks and asymmetries in the architecture. The difficult issue in this approach is that not only the index information but also the data access structure to this index information needs to be distributed, as illustrated in Figure 35.2. These problems do not appear in the hierarchical approaches, where the data access structure (such as a hash table or a search tree) is only constructed locally. All structured P2P systems share the property that each peer stores a routing table that contains part of the global data access structure, and searches are performed by forwarding messages selectively to peers using the routing table. The routing tables are constructed in a way that their sizes scale gracefully
FIGURE 35.2 Centralized vs. distributed index.
and searches are fast (e.g., both table size and search latency are logarithmic in the total number of peers). Using routing tables to direct search messages reduces the total bandwidth consumed by searches, and therefore these systems also scale well with respect to throughput. This advantage distinguishes them from unstructured P2P networks. There are various strategies to organize the distributed data access structure, i.e., how the routing tables of the peers are maintained:
• Ad hoc caching and clustering of index information along search paths (Freenet [Clarke et al., 2001, 2002])
• Prefix/suffix routing (P-Grid [Aberer et al., 2003], Chord [Dabek et al., 2001], Pastry [Rowstron and Druschel, 2001], and Tapestry [Zhao et al., 2004])
• Routing in a d-dimensional space (CAN [Ratnasamy et al., 2001])

Besides the applied routing approach, the main differences among the systems are found in their functional flexibility regarding issues such as replication, load balancing, and supported search predicates, and in the properties of the network maintenance protocol. Loosely coupled structured P2P systems such as Freenet and P-Grid share the flexibility of network evolution with unstructured P2P networks, whereas tightly coupled structured P2P systems, such as CAN, Chord, Tapestry, and Pastry, impose stricter control on the global properties and structure of the network for the sake of controllable performance. In the following text we discuss the corresponding systems and some of their technical details for each of the three strategies.

35.3.4.1 Freenet

Freenet [Clarke et al., 2001, 2002] is a P2P system for the publication, replication, and retrieval of data files. Its central goal is to provide an infrastructure that protects the anonymity of authors and readers of the data. It is designed in a way that makes it infeasible to determine the origin of files or the destination of requests. It is also difficult for a node to determine what it stores, as the files are encrypted when they are stored and sent over the network. Thus, following the reasoning of the designers of Freenet, nobody can be legally challenged, even if he or she stores and distributes illegal content, for example. Freenet has an adaptive routing scheme for efficiently routing requests to the physical locations where matching resources are most likely to be stored. The routing tables are continually updated as searches and insertions of data occur. Additionally, Freenet uses dynamic replication of files to replicate them along search paths. Thus, search hits may occur earlier in the search process, which further improves search efficiency. When a peer joins a Freenet network, it has to know some existing node in the network. By interacting with the network it will fill its routing table, which is initially empty, and the Freenet network structure will evolve. Figure 35.3 shows a sample Freenet routing table. The routing tables in Freenet store the addresses of the neighboring peers and additionally the keys of the data files that each such peer stores, along with the corresponding data. When a search request arrives, it may be that the peer stores the data in its table and can immediately answer the request. Otherwise, it has to forward the request to another peer. This is done by selecting the peer that has the most similar key in terms of lexicographic distance. When an answer arrives, the peer stores the answer in its data store.
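A minimal sketch of this closest-key forwarding and answer caching is given below. `lexicographic_distance` is a stand-in for Freenet's key similarity measure, `send_search` is a messaging stub supplied by the caller, and the bounded LRU data store anticipates the eviction policy described next; this is an illustration, not Freenet's actual implementation.

```python
from collections import OrderedDict

def lexicographic_distance(a, b):
    """Hypothetical similarity measure: smaller means more similar keys."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

class FreenetLikeNode:
    def __init__(self, capacity=100):
        self.routing = {}               # known key -> address of the peer holding it
        self.store = OrderedDict()      # bounded LRU data store: key -> data
        self.capacity = capacity

    def handle_search(self, key, ttl, send_search):
        """Answer locally if possible; otherwise forward to the peer whose known
        key is closest to the requested one, and cache any answer that comes back."""
        if key in self.store:
            self.store.move_to_end(key)   # mark as recently used
            return self.store[key]
        if ttl == 0 or not self.routing:
            return None
        closest = min(self.routing, key=lambda k: lexicographic_distance(k, key))
        answer = send_search(self.routing[closest], key, ttl - 1)
        if answer is not None:
            self.cache(key, answer)       # replicate along the return path
        return answer

    def cache(self, key, data):
        self.store[key] = data
        self.store.move_to_end(key)
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict the least recently used entry
```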
If the data store is already full, this might require evicting other entries using an LRU (least recently used) strategy. As can be seen in Figure 35.3, the peer may also decide to evict only the data corresponding to a key before it evicts the address of the peer, in order to save space while maintaining the routing information.

Key                      Data                    Address
8e4768isdd0932uje89      ZT38we01h02hdhgdzu      tcp/125.45.12.56:6474
456r5wero04d903iksd0     Rhweui12340jhd091230    tcp/67.12.4.65:4711
712345jb89b8nbopledh                             tcp/40.56.123.234:1111

FIGURE 35.3 Sample routing table in Freenet.
Because peers route search requests only to the peer with the closest key, Freenet implements a depth-first search strategy rather than the breadth-first strategy of Gnutella. Therefore, the time-to-live of messages is also substantially longer, typically 500. As in Gnutella, Freenet messages carry identifiers in order to detect cycles. Figure 35.4 shows Freenet's search mechanism and network reorganization strategy. Peer A sends a search request for file X.mp3 to B. As it does not have the requested data, B forwards the request to C, which has the closest key in its routing table. Because the TTL is 2 and C does not hold the data, this request fails. Therefore, B next forwards the request to D (the next-most-similar key), where the data is found. While the response containing X.mp3 is sent back, the file is cached at all peers along the path, i.e., at B and A. In addition, A learns about a new node D, and a new connection is created in the network. Adding new information to Freenet, i.e., adding a file, is done in a sophisticated manner that tries to avoid key collisions: First, a key is calculated, which is then sent out as a proposal in an insert message with a time-to-live value. The routing of insert messages uses the same key similarity measure as searching. Every receiving peer checks whether the proposed key is already present in its local store. If yes, it returns the stored file, and the original requester must propose a new key. If no, it routes the insert message to the next peer for further checking. The message is forwarded until the time-to-live is 0 or a collision occurs. If the time-to-live is 0 and no collision was detected, the file is inserted along the path established by the initial insert message. As the caching strategy for the routing tables is designed in a way that tends to cluster similar data (keys) at nodes in the network, the assumption is that nodes get more and more specialized over time. This assumption turns out to hold in simulations of Freenet, which show that the median path length converges to logarithmic length in the size of the network. An explanation of why the search performance improves so dramatically is found in the properties of the graph structure of the network. Analyses of the resulting Freenet networks reveal that they have the characteristics of small-world graphs.

35.3.4.2 Prefix/Suffix Routing

The idea of prefix/suffix routing is usually credited to Plaxton [Plaxton et al., 1997], and in the meantime a number of variants of this approach have been introduced. We describe here only the underlying principle in a simplified form, to illustrate the strategy for constructing a scalable, distributed data access structure. Without constraining generality, assume that the keys K for identifying resources are binary. Then we may construct a binary tree, where each level of the tree corresponds to one bit of a key. The edges to the left are marked 0, whereas the edges to the right are marked 1. Following some path of such a tree starting at the root produces every possible key (more precisely, the structure described is a binary trie).
FIGURE 35.4 Searching in Freenet.
FIGURE 35.5 Prefix routing.
For simplicity we assume that the tree is balanced and has depth d. The upper part of Figure 35.5 shows such a tree for d = 2. Now we associate each leaf of the tree with one peer and make the peer responsible for holding index information on those resources whose keys have the path leading from the root to the leaf (peer) as a prefix. As we cannot store the tree centrally, the key idea for distributing the tree (i.e., the data access structure) is that a peer associated with a specific leaf stores information on all nodes in the path from the root of the tree to its own leaf. More precisely, for each node along the path (each level) it stores the address(es) of some peer(s) that can be reached by following the alternative branch of the tree at the respective level. This information constitutes its routing information. The lower part of Figure 35.5 shows how the tree is decomposed into routing tables by this strategy and gives two sample routing table entries for peer 3.
By this decomposition we achieve the following:
• Searches for resource keys can be started from any node, either by successively continuing the search along the peer's own path in case the key matches at that level, or by forwarding the search to a peer whose address is in the routing table at the corresponding level.
• As the depth of the tree is logarithmic in the number of peers, the sizes of the routing tables and the number of steps, and thus messages, for a search are logarithmic. Therefore, storage, search latency, and message bandwidth scale well.

The most important criteria distinguishing the different variations of prefix/suffix routing used in practice are the following:
• The method used to assign peers to their position in the tree and to construct the routing tables
• Whether all leaves are associated with peers, or some are not and neighboring peers take over their responsibility
• N-ary trees instead of binary trees, which reduces the tree depth and search latency substantially
• Multiple routing entries at each level to increase failure resilience
• The use of unbalanced trees for storage load balancing
• Replication of index information, either by assigning multiple peers to the same leaf or by sharing index information among neighboring leaves
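The simplified binary scheme described above can be sketched as follows. The peer identities, routing-table layout, and method names are our own, routing tables are assumed to be already populated, and real systems such as Chord, Pastry, or P-Grid differ in many of the details just listed:

```python
import random

class PrefixPeer:
    """Peer responsible for all keys that have `path` (a bit string) as prefix.
    routing[level] lists peers reachable via the complementary bit at that level,
    i.e., one (or more) entries per node on the path from the root to this leaf."""

    def __init__(self, path):
        self.path = path          # e.g., "10"
        self.routing = {}         # level -> list of PrefixPeer objects
        self.index = {}           # locally held index information

    def search(self, key):
        for level, bit in enumerate(self.path):
            if key[level] != bit:
                # The key diverges from our path at this level: forward to a
                # peer that covers the complementary subtree.
                return random.choice(self.routing[level]).search(key)
        # The key has our path as prefix, so the answer is held locally.
        return self.index.get(key)
```

For example, with four peers at paths 00, 01, 10, and 11, the peer with path 10 would keep a level-0 entry pointing into the 0-subtree and a level-1 entry pointing to the peer with path 11; a search for a key starting with 0 is forwarded at level 0, while keys starting with 10 are answered locally.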
We will go into some of these differences in the following when introducing several structured P2P systems.

35.3.4.3 Distributed Hash Table Approaches

Distributed hash table (DHT) approaches, such as Chord [Dabek et al., 2001], Pastry [Rowstron and Druschel, 2001], and Tapestry [Zhao et al., 2004], hash both peer addresses and resource keys into the same key space underlying the construction of the prefix/suffix routing scheme. This key space consists of binary or n-ary keys of fixed length. The key determines the position of the node in the tree and the index information it has to store. Often, it is assumed that a peer also stores the associated resources. Typically a peer stores all resources with keys numerically closest to it, e.g., in Pastry, or with keys in the interval starting from its own key to a neighbor's key, as in Chord. When a peer wants to join a network, it first has to obtain a globally unique key, determined from the key space typically by applying a hash function to the node's physical (IP) address. Upon entry, it has to contact a peer already in the network, which it has to know by some out-of-band means, as in the other P2P approaches. After contacting this peer, the new peer starts a search for its own key. At each step of this search, i.e., when traversing down the tree level by level, it will encounter peers that share a common prefix with its key. This allows the new peer to obtain entries from the routing tables of those peers that are necessary to fill its own routing table properly. It also allows the current peers to learn about the new peer and enter its address into their routing tables. After the new peer has located the peer that is currently responsible for storing index information related to its own key, it can take over from this peer the part of the index information that it is now responsible for. Similarly, the network is reorganized upon the departure of a peer. Given this approach for joining and leaving a peer network, different subnetworks cannot develop independently and be merged later (as in Gnutella or Freenet), which explains why DHT approaches are tightly coupled.

35.3.4.4 P-Grid

P-Grid [Aberer et al., 2003] is a loosely coupled peer-to-peer resource location system combining the flexibility of unstructured P2P networks with the performance of a prefix routing scheme. It differs from DHT approaches in how it constructs the routing tables. P-Grid constructs the routing tables by exploring the network through randomized interactions among peers. The random meetings are initiated either by the peers themselves, similarly to how Gnutella uses Ping messages, or by exploiting queries and data insertions, as Freenet does. Initially each peer covers the complete key space (or, in other words, is associated with the root of the search tree). When
FIGURE 35.6 Example P-Grid: peers with their routing tables and data stores; a query for key 100 is resolved via query(6, 100), query(5, 100), and query(4, 100), where it is found.
two peers meet that cover the same key space, they can split the space into two parts, and each peer associates itself with one of the two parts, i.e., one of the new paths in the search tree. The two peers reference each other in order to construct their routing tables at the newly created level of the search tree. Such splits are only performed if a sufficiently large number of data items is stored at the peers for the newly created subspaces. Otherwise, the peers do not split but replicate their data, i.e., they stay responsible for the same index information and become replicas of each other. Figure 35.6 shows an example P-Grid. If two peers meet that do not cover the same key space, they use their already existing routing structure to search for further candidates they could contact, and then exchange routing information with them to improve their routing tables. Through this adaptive method of path specialization, P-Grid's distributed tree adapts its shape to the data distribution, which makes it particularly suitable for skewed data distributions. This may result in unbalanced trees, but the expected number of messages to perform a search remains logarithmic [Aberer, 2002]. By virtue of this construction process, peers adopt their index responsibility incrementally within the P-Grid routing infrastructure (in terms of the data keys for which they are responsible), and different networks can develop independently and be merged.

35.3.4.5 Topological Routing

Topological routing, as introduced in CAN [Ratnasamy et al., 2001], uses a model different from prefix routing for organizing routing tables. As in DHT approaches, peer addresses and resource keys are hashed into a common key space. For topological routing these keys are taken from a d-dimensional space, i.e., the keys consist of a d-dimensional vector of simple keys, e.g., binary keys of fixed length. More precisely, the d-dimensional space is a d-dimensional torus because all calculations on keys are performed modulo a fixed base. The dimension d is usually low, e.g., d = 2 … 10. Each peer is responsible for a d-dimensional, rectangular subvolume of this space. An example for dimension d = 2 is shown in Figure 35.7. Peers are only aware of other peers that are responsible for neighboring subvolumes. Thus the routing tables contain the neighborhood information together with the direction of the neighbor. Searches are processed by routing the request to the peer that manages the subvolume with coordinates closer to the search key than the requester's coordinates. When a peer joins the network it selects (or obtains) a random coordinate of the d-dimensional space. Then it searches for this coordinate and will encounter
FIGURE 35.7 Example CAN for d = 2: before a new peer p6 joins, neighbors(p1) = {p2, p3, p4, p5}; after the join, neighbors(p1) = {p2, p3, p4, p6} and neighbors(p6) = {p1, p2, p4, p5}.
the peer responsible for the subvolume holding the coordinate. The subvolume is then split in two along one dimension, and the routing tables containing the neighbors are reorganized. The search-and-join costs in CAN depend not only on the network size but in particular on the choice of the dimension d. Searches in CAN have an expected cost of O(d · n^(1/d)), and the insertion of a node incurs an additional constant cost of O(d) for reorganizing the neighborhood. CAN is a tightly coupled P2P network, as the identity of peers is fixed when they join the network.
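The greedy routing step underlying CAN can be sketched as follows. This is our own simplified illustration (zone splitting, neighbor updates, and the hashing of keys to coordinates are omitted), not the CAN protocol itself:

```python
def torus_distance(a, b, modulus):
    """Coordinate-wise wrap-around distance on the d-dimensional torus."""
    return sum(min(abs(x - y), modulus - abs(x - y)) for x, y in zip(a, b))

class CanNode:
    def __init__(self, zone, modulus=2 ** 16):
        self.zone = zone         # ((lo_1, hi_1), ..., (lo_d, hi_d))
        self.neighbors = []      # CanNode objects managing adjacent subvolumes
        self.modulus = modulus
        self.data = {}           # resources whose key coordinates fall into this zone

    def owns(self, point):
        return all(lo <= x < hi for x, (lo, hi) in zip(point, self.zone))

    def center(self):
        return tuple((lo + hi) / 2 for lo, hi in self.zone)

    def route(self, point):
        """Greedy routing: hand the request to the neighbor whose zone center
        lies closest (on the torus) to the target coordinate."""
        if self.owns(point):
            return self
        nxt = min(self.neighbors,
                  key=lambda n: torus_distance(n.center(), point, self.modulus))
        return nxt.route(point)
```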
35.4 Comparative Evaluation of P2P Systems

This section compares the various flavors of P2P networks with respect to a set of criteria. Due to space limitations we discuss only those functional and nonfunctional properties that we consider most important or that are not discussed in related work. For more detailed comparisons we refer the reader to Milojicic et al. [2002].
35.4.1 Performance

The key factors for comparing P2P systems are the search and maintenance costs. With respect to search cost, from a user's perspective, search latency should be low, whereas from a system's perspective the total communication costs, i.e., the number and size of messages and the number of connections (permanent and on-demand), are relevant. With respect to maintenance costs, which are only relevant from a system's perspective, we can distinguish communication costs, both for updates of data and for changes in the network structure, and the additional storage costs for supporting search, in terms of the routing and indexing information required to establish the overlay network. As storage costs are likely to become less and less critical, it even seems possible that peers might become powerful enough to store indices for complete P2P networks. However, this simplistic argument does not apply, as keeping the index information consistent in such systems would incur unacceptably high costs. Thus distribution of routing and indexing information is not just a matter of reducing storage cost, but in particular of reducing the maintenance costs for this information. Figure 35.8 informally summarizes the performance characteristics of the important classes of P2P systems that have been discussed earlier. We have excluded the random walker approach because no analytical results are available yet, and have included the approach of fully replicating the index information for comparison, i.e., every peer would act as a Napster server. The following notation is used: n denotes the number of peers in a network. n, log(n), d·n^(1/d), 1, etc. are used as shorthand for the O() notation, i.e., within a constant factor these functions provide an upper bound. We do not distinguish different bases of logarithms, as these are specific to choices of parameters in the systems.
Approach                     Latency      Messages     Update cost   Storage
Gossiping (Gnutella)         log(n)       n            1             1
Directory server (Napster)   1            1            1             n (max), 1 (avg)
Full replication             1            1            n             n
Super-peers                  log(C)       C            1             C (max), 1 (avg)
Prefix routing               log(n)       log(n)       log(n)        log(n)
Topological routing (CAN)    d·n^(1/d)    d·n^(1/d)    d·n^(1/d)     d

FIGURE 35.8 Performance comparison of P2P approaches.
d and C are constants that are used to parameterize the systems. Where necessary we distinguish average from maximum bounds. For Gnutella we can see that search latency is low due to the structure of its network graph. However, network bandwidth consumption is high and grows linearly in n. More precisely, if c is the number of outgoing links of a node, cn is an upper bound for the number of messages generated. On the other hand, update and storage costs are constant because no data dependencies exist. On the other end of the spectrum we find Napster and full replication: They exhibit constant search costs but high update and storage costs. Super-peers trade a modest increase of search costs (assuming they use the same gossiping scheme as Gnutella) for a reduction of the storage load on the server peers. C is the number of super-peers in the system. As we can see, structured networks balance all costs, namely search latency, bandwidth consumption, update costs, and storage costs. This balancing is what makes these approaches so attractive as the foundation of the next generation of P2P systems. The schemes based on some variation of prefix routing incur logarithmic costs for all of these measures. Freenet also exhibits similar behavior, but as analytical results are lacking (only simulations exist so far) we cannot include it in the comparison. Topological routing deviates from the cost distribution scheme of the other structured networks, but balanced cost distributions can be achieved by proper choices of the dimension parameter d. A more detailed comparison of the approaches would have to include the costs for different replication schemes and the costs incurred for network maintenance, e.g., for joining the network or repairing the network after failures. Other parameters, such as failure rates of nodes or query and data distributions, also influence the relative performance. However, a complete comparison at this level of detail is beyond the scope of this overview and also still part of active research.
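For readers who want to plug concrete numbers into Figure 35.8, the following sketch (a convenience helper of our own, ignoring constant factors and using the maximum storage bounds) evaluates the entries for given n, C, and d:

```python
import math

def estimated_costs(n, C=20, d=4):
    """Order-of-magnitude estimates for the entries of Figure 35.8.
    n = number of peers, C = number of super-peers, d = CAN dimension."""
    log_n, can = math.log2(n), d * n ** (1 / d)
    return {
        "gossiping (Gnutella)":       dict(latency=log_n, messages=n,     update=1,     storage=1),
        "directory server (Napster)": dict(latency=1,     messages=1,     update=1,     storage=n),
        "full replication":           dict(latency=1,     messages=1,     update=n,     storage=n),
        "super-peers":                dict(latency=math.log2(C), messages=C, update=1,  storage=C),
        "prefix routing":             dict(latency=log_n, messages=log_n, update=log_n, storage=log_n),
        "topological routing (CAN)":  dict(latency=can,   messages=can,   update=can,   storage=d),
    }

# Example: estimated_costs(10**6) shows that prefix routing keeps every cost
# around 20, whereas flooding generates on the order of a million messages.
```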
35.4.2 Functional and Qualitative Properties

35.4.2.1 Search Predicates

Besides performance, an important distinction among P2P approaches concerns the support of different types of search predicates. A substantial advantage of unstructured and hierarchical P2P systems is the potential support of arbitrary predicates, as their use is not constrained by the resource location infrastructure. In both of these classes of P2P systems, search predicates are evaluated only locally, which enables the use of classical database query or information retrieval processing techniques. In part, this explains the success of these systems despite their potentially less advantageous global performance characteristics. As soon as the resource location infrastructure exploits properties of the keys in order to structure the search space and to decide on the forwarding of search requests that cannot be answered locally, the support of search predicates is constrained. A number of approaches, including Chord, CAN, Freenet, and Pastry, intentionally hash search keys, either for security or load balancing purposes, and thus lose all potential semantics contained in the search keys that could be used to express search predicates. Therefore, they support exclusively the search for keys identical to the search key, or in other words they support only the equality predicate for search.
More complex predicates, such as range or similarity, cannot be applied in a meaningful way. Nevertheless, these approaches have the potential to support more complex predicates. The prefix-routing-based approaches implement a distributed trie structure that can support prefix queries and thus range queries. For P-Grid, prefix-preserving hashing of keys has been applied to exploit this property. Freenet uses routing based on lexicographic similarity; thus, if the search keys were not hashed (which is done for security purposes to support anonymity in Freenet), queries could also find similar keys and not only identical ones. CAN uses a multidimensional space for keys, and thus data keys that, for example, bear spatial semantics might be mapped into this key space while preserving spatial relationships, so spatial neighborhood searches would be possible. Beyond the support of more complex atomic predicates, the ability to support value-based joins among data items, similar to relational databases, is also of relevance and the subject of current research [Harren et al., 2002].

35.4.2.2 Replication

Because the peers in a P2P network are assumed to be unreliable and frequently offline or unreachable, most resource location systems support replication mechanisms to increase failure resilience. Replication exists in two flavors in P2P systems: at the data level and possibly at the index level. Replication of data objects is applied to increase the availability of the data objects in the peer network. Hierarchical and structured P2P networks additionally replicate index information to enhance the probability of successfully routing search requests. For data replication we can distinguish four different methods, depending on the mechanism that initiates the replication:

Owner replication: A data object is replicated to the peer that has successfully located it through a query. This form of replication occurs naturally in P2P file sharing systems such as Gnutella (unstructured), Napster (hierarchical), and Kazaa (super-peers), because peers implicitly make available to other users the data that they have found and downloaded (though this feature can be turned off by the user).

Path replication: A data object is replicated along the search path that is traversed as part of a search. This form of replication is used in Freenet, which routes results back to the requester along the search path in order to achieve a data clustering effect that accelerates future searches. This strategy would also be applicable to unstructured P2P networks in order to replicate data more aggressively.

Random replication: A data object is replicated as part of a randomized process. In P-Grid, random replication is part of the construction of the P2P network. If peers do not find enough data to justify a further refinement of their routing tables, they replicate each other's data. For unstructured networks it has been shown that random replication, initiated by searches and implemented by selecting random nodes visited during the search process, is superior to owner and path replication [Lv et al., 2002].

Controlled replication: Here data objects are actively replicated a pre-specified number of times when they are inserted into the network. This approach is used in tightly coupled P2P networks such as Chord, CAN, Tapestry, and Pastry.
We can distinguish two principal approaches: either a fixed number of structured networks is constructed in parallel, or multiple peers are associated with the same or overlapping parts of the data key space. Index replication is applied in structured and in hierarchical P2P networks. For the super-peer approach it has been shown that having multiple replicated super-peers maintaining the same index information increases system performance [Yang and Garcia-Molina, 2002]. Structured P2P networks typically maintain multiple entries for the same routing path in order to have alternative routing paths at hand in case a referenced node fails or is offline.

35.4.2.3 Security

As in any distributed system, security plays a vital role in P2P systems. However, only a limited amount of work has been dedicated to this issue so far. At the moment, trust and reputation management, anonymity vs. identification, and denial-of-service (DOS) attacks seem to be the most relevant security aspects related to P2P systems:
Trust and reputation management: P2P systems depend on the cooperation of the participants in the system. Phrased differently, this means that each participant trusts the other participants it interacts with in terms of proper routing, exchange of index information, and provision of proper (noncorrupted) data or, more generally, quality of service. For example, a peer could return false hits that hold advertisements instead of the content the requester was originally looking for, or a quality-of-service guarantee given by a peer might not be fulfilled. These are just two examples, but in fact trust and reputation management are crucial to make P2P systems a viable architectural alternative for systems beyond the mere sharing of free files. The main requirements of trust and reputation management in a P2P setting are (full) decentralization of the trust information and robustness against false positive/negative feedback and collusions. Several approaches have already been proposed, for example for Gnutella [Damiani et al., 2003] and for P-Grid [Aberer and Despotovic, 2001], but they have mostly been applied in experimental settings and have not yet found their way into generally available software distributions.

Anonymity vs. identification: Anonymity and identification in P2P systems serve conflicting purposes: Anonymity tries to protect "free speech" [Clarke et al., 2002] in the broadest sense, whereas identification is mandatory in commercial systems to provide properties such as trust, non-repudiation, and accountability. So far, anonymity and identification issues have been addressed in only some P2P systems. For example, Freenet uses an approach that makes it impossible to find out the origin of data or what data a peer stores (caches) [Clarke et al., 2002]. Thus nobody can be legally challenged even if illegal content is stored or distributed. For identification purposes, existing public key infrastructures (PKI) could be used. However, this would introduce a form of centralization and may harm scalability. P-Grid proposes a decentralized yet probabilistically secure identification approach [Aberer et al., 2004] that can also provide PKI functionality [Aberer et al., to be published]. The other P2P systems discussed in this chapter do not address anonymity or identification explicitly.

DOS attacks: At the moment, the proper functioning of P2P systems depends on the well-behaved cooperation of the participants. However, as P2P systems are large-scale distributed systems, the possibilities for DOS attacks are numerous. For example, in systems that use a distributed index, such as Chord, Freenet, or P-Grid, the provision of false routing information would be disastrous, and query flooding in unstructured systems such as Gnutella could easily overload the network. Currently only little work exists on the prevention of attacks in P2P systems (although more work exists on the analysis of attack scenarios [see Daswani and Garcia-Molina, 2002]): Pastry provides an approach to secure routing [Castro et al., 2002] which, however, is rather costly in terms of bandwidth consumption, and Freenet [Clarke et al., 2002] can secure data so that it can only be changed by its owner. Otherwise the P2P systems discussed in this chapter do not address this issue.

35.4.2.4 Autonomy

A P2P system is composed of autonomous peers by definition, i.e., peers that belong to different users and organizations who have no or only limited authority to influence their operation or behavior.
In technical terms, autonomy means that peers can decide independently on their role and behavior in the system, which, if done properly, is a key factor in providing scalability, robustness, and flexibility in P2P systems. Consequently, a higher degree of peer autonomy also implies that a higher degree of self-organization is required to provide the system with meaningful behavior. Though all P2P systems claim their peers to be autonomous, a closer look at existing systems reveals that this statement is true only to a varying degree. Hierarchical systems such as Napster and Kazaa limit the autonomy of peers to a considerable degree by their inherent centralization. This makes them less scalable, which must be compensated by considerable investments into their centralized infrastructures. Robustness is also harder to achieve because special points of failure (super-peers) exist. On the other hand, overall management is simpler than in systems with greater autonomy. Unstructured systems like Gnutella offer the highest degree of autonomy. Such systems are very robust at scale but pay for these advantages with considerable resource consumption.
Structured systems, as the third architectural alternative, balance the advantages of autonomy with resource consumption. Within this family many degrees of autonomy exist: Freenet offers a degree of autonomy that nearly reaches Gnutella's. However, the applied mechanisms have so far inhibited the development of an analytical model for Freenet, and thus its properties have only been evaluated by simulations. P-Grid offers peer autonomy at a level similar to Freenet's, but additionally provides a mathematical model that enables quantitative statements about the system and its behavior. Freenet and P-Grid are loosely coupled and thus share the flexibility of network evolution with unstructured systems like Gnutella, i.e., peer communities can develop independently, merge, and split. Tightly coupled systems such as CAN, Chord, Tapestry, and Pastry impose stricter control on the peers in terms of routing table entries, responsibilities, and global knowledge. This offers some advantages but limits the flexibility of the systems; for example, splitting and merging independent peer communities is impossible.
35.5 Conclusions

In this chapter we have tried to provide a concise overview of the current state of research in P2P systems. Our goal was to communicate the fundamental concepts that underlie P2P systems, to provide further insights through a detailed presentation of the problem of resource location in P2P environments, and to give a short but to-the-point comparative evaluation of state-of-the-art systems, enabling the reader to understand the performance implications and resource consumption issues of the various systems. We have, however, omitted some areas from the discussion due to space limitations. For example, we did not discuss advanced functionalities beyond resource location, such as support for updates [Datta et al., 2003] or applications in information retrieval [Aberer and Wu, 2003], which would broaden the functionality of P2P systems and in turn increase the applicability of the P2P paradigm to domains beyond mere file sharing. Some interesting fields of study are the application of economic principles to P2P systems [Golle et al., 2001], implications of the "social behavior" of the peers in a system [Adar and Huberman, 2000], and trust and reputation management in P2P systems [Aberer and Despotovic, 2001]. Interdisciplinary research with other disciplines studying complex systems, such as economics, biology, and sociology, will definitely be a direction attracting growing interest in the future. There are a number of other interesting topics related to P2P that we could not address. However, we believe that we have provided the interested reader with sufficient basic know-how to conduct further studies.
References

Aberer, Karl. Scalable Data Access in P2P Systems Using Unbalanced Search Trees. In Proceedings of the Workshop on Distributed Data and Structures (WDAS-2002), Paris, 2002.
Aberer, Karl and Zoran Despotovic. Managing Trust in a Peer-2-Peer Information System. In Proceedings of the 10th International Conference on Information and Knowledge Management (2001 ACM CIKM), pages 310–317. ACM Press, New York, 2001.
Aberer, Karl, Philippe Cudré-Mauroux, Anwitaman Datta, Zoran Despotovic, Manfred Hauswirth, Magdalena Punceva, and Roman Schmidt. P-Grid: A Self-organizing Structured P2P System. SIGMOD Record, 32(3), September 2003.
Aberer, Karl, Anwitaman Datta, and Manfred Hauswirth. Efficient, self-contained handling of identity in peer-to-peer systems. IEEE Transactions on Knowledge and Data Engineering, 16(7): 858–869, July 2004.
Aberer, Karl, Anwitaman Datta, and Manfred Hauswirth. A decentralized public key infrastructure for customer-to-customer e-commerce. International Journal of Business Process Integration and Management. In press.
Aberer, Karl and Jie Wu. A Framework for Decentralized Ranking in Web Information Retrieval. In Proceedings of the Fifth Asia Pacific Web Conference (APWeb 2003), number 2642 in Lecture Notes in Computer Science, pages 213–226, 2003.
Adar, Eytan and Bernardo A. Huberman. Free Riding on Gnutella. First Monday, 5(10), 2000. http://firstmonday.org/issues/issue5-10/adar/index.html.
Barabási, Albert-László and Réka Albert. Emergence of scaling in random networks. Science, 286: 509–512, 1999.
Bonabeau, Eric, Marco Dorigo, and Guy Theraulaz. Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, Oxford, U.K., 1999.
Castro, Miguel, Peter Druschel, Ayalvadi Ganesh, Antony Rowstron, and Dan S. Wallach. Secure routing for structured peer-to-peer overlay networks. In Proceedings of Operating Systems Design and Implementation (OSDI), 2002.
Clarke, Ian, Scott G. Miller, Theodore W. Hong, Oskar Sandberg, and Brandon Wiley. Protecting free expression online with Freenet. IEEE Internet Computing, 6(1): 40–49, January/February 2002.
Clarke, Ian, Oskar Sandberg, Brandon Wiley, and Theodore W. Hong. Freenet: A Distributed Anonymous Information Storage and Retrieval System. In Designing Privacy Enhancing Technologies: International Workshop on Design Issues in Anonymity and Unobservability, number 2009 in Lecture Notes in Computer Science, 2001.
Clip2. The Gnutella Protocol Specification v0.4 (Document Revision 1.2), June 2001. http://www9.limewire.com/developer/gnutella-protocol-0.4.pdf.
Dabek, Frank, Emma Brunskill, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, and Hari Balakrishnan. Building Peer-to-Peer Systems with Chord, a Distributed Lookup Service. In Proceedings of the 8th Workshop on Hot Topics in Operating Systems (HotOS-VIII), pages 81–86, 2001.
Damiani, Ernesto, Sabrina De Capitani di Vimercati, Stefano Paraboschi, and Pierangela Samarati. Managing and sharing servents' reputations in P2P systems. IEEE Transactions on Knowledge and Data Engineering, 15(4): 840–854, July/August 2003.
Daswani, Neil and Hector Garcia-Molina. Query-flood DoS attacks in Gnutella. In Proceedings of the 9th ACM Conference on Computer and Communications Security (CCS), pages 181–192, 2002.
Datta, Anwitaman, Manfred Hauswirth, and Karl Aberer. Updates in Highly Unreliable, Replicated Peer-to-Peer Systems. In Proceedings of the 23rd International Conference on Distributed Computing Systems (ICDCS '03), pages 76–87, 2003.
Flake, Gary William, Steve Lawrence, C. Lee Giles, and Frans Coetzee. Self-organization and identification of Web communities. IEEE Computer, 35(3): 66–71, 2002.
Golle, Philippe, Kevin Leyton-Brown, and Ilya Mironov. Incentives for sharing in peer-to-peer networks. In Proceedings of the Second International Workshop on Electronic Commerce (WELCOM 2001), number 2232 in Lecture Notes in Computer Science, pages 75–87, 2001.
Gong, Li. JXTA: A Network Programming Environment. IEEE Internet Computing, 5(3): 88–95, May/June 2001.
Harren, Matthew, Joseph M. Hellerstein, Ryan Huebsch, Boon Thau Loo, Scott Shenker, and Ion Stoica. Complex Queries in DHT-based Peer-to-Peer Networks. In Proceedings of the 1st International Workshop on Peer-to-Peer Systems (IPTPS '02), volume 2429 of Lecture Notes in Computer Science, pages 242–259, 2002.
Heylighen, Francis. Self-organization. Principia Cybernetica Web, January 1997. http://pespmc1.vub.ac.be/SELFORG.html.
Kleinberg, Jon. The Small-World Phenomenon: An Algorithmic Perspective. In Proceedings of the 32nd ACM Symposium on Theory of Computing, pages 163–170, 2000.
Lv, Qin, Pei Cao, Edith Cohen, Kai Li, and Scott Shenker. Search and Replication in Unstructured Peer-to-Peer Networks. In Proceedings of the 2002 International Conference on Supercomputing, pages 84–95, 2002.
Milojicic, Dejan S., Vana Kalogeraki, Rajan Lukose, Kiran Nagaraja, Jim Pruyne, Bruno Richard, Sami Rollins, and Zhichen Xu. Peer-to-Peer Computing. Technical Report HPL-2002-57, HP Laboratories, Palo Alto, CA, March 2002. http://www.hpl.hp.com/techreports/2002/HPL-2002-57.pdf.
Plaxton, C. Greg, Rajmohan Rajaraman, and Andréa W. Richa. Accessing Nearby Copies of Replicated Objects in a Distributed Environment. In Proceedings of the 9th Annual Symposium on Parallel Algorithms and Architectures, pages 311–320, 1997.
Ratnasamy, Sylvia, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker. A Scalable Content-Addressable Network. In Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM), pages 161–172, 2001.
Ripeanu, Matei and Ian Foster. Mapping the Gnutella Network: Macroscopic Properties of Large-Scale Peer-to-Peer Systems. In Proceedings of the 1st International Workshop on Peer-to-Peer Systems (IPTPS '02), volume 2429 of Lecture Notes in Computer Science, pages 85–93, 2002.
Rowstron, Antony and Peter Druschel. Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems. In IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), number 2218 in Lecture Notes in Computer Science, pages 329–350, 2001.
Sarshar, N., V. Roychowdhury, and P. Oscar Boykin. Percolation-Based Search on Unstructured Peer-to-Peer Networks, 2003. http://www.ee.ucla.edu/~nima/Publications/search_ITPTS.pdf.
Watts, Duncan and Steven Strogatz. Collective dynamics of small-world networks. Nature, 393, 1998.
Yang, Beverly and Hector Garcia-Molina. Improving Search in Peer-to-Peer Networks. In Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS '02), pages 5–14, 2002.
Zhao, Ben Y., Ling Huang, Jeremy Stribling, Sean C. Rhea, Anthony D. Joseph, and John Kubiatowicz. Tapestry: A Resilient Global-scale Overlay for Service Deployment. IEEE Journal on Selected Areas in Communications, 22(1): 41–53, January 2004.
36 Data and Services for Mobile Computing

Sasikanth Avancha, Dipanjan Chakraborty, Filip Perich, and Anupam Joshi

CONTENTS
Abstract
36.1 Introduction
36.2 Mobile Computing vs. Wired-Network Computing
36.3 M-Services Application Architectures
36.4 Mobile Computing Application Framework
  36.4.1 Communications Layer
  36.4.2 Discovery Layer
  36.4.3 Location Management Layer
  36.4.4 Data Management Layer
  36.4.5 Service Management Layer
  36.4.6 Security Plane
  36.4.7 System Management Plane
36.5 Conclusions
Acknowledgments
References
Abstract

The advent and phenomenal growth of low-cost, lightweight, portable computers, concomitant with that of the Internet, have led to the concept of mobile computing. Protocols and mechanisms used in Internet computing are being modified and enhanced to adapt to "mobile" computers. New protocols and standards are also being developed to enable mobile computers to connect to each other and to the Internet through both wired and wireless interfaces. The primary goal of the mobile computing paradigm is to enable mobile computers to accomplish tasks using all possible resources, i.e., data and services available in the network, anywhere, anytime. In this chapter we survey the state of the art of mobile computing and its progress toward these goals. We also present a comprehensive, flexible framework for developing mobile computing applications. The framework consists of protocols, techniques, and mechanisms that enable applications to discover and manage data and services in wired, infrastructure-supported wireless, and mobile ad hoc networks.
36.1 Introduction

The term computing device or computer usually evokes the image of a big, powerful machine, located in an office or home, that is always on and possibly connected to the Internet. The rapid growth of lightweight devices that are easily and constantly available, even when one is on the move, has dramatically altered this image. Coupled with the potential for easy network access, the growth of these
mobile devices has tremendously increased our capability to take computing services with us wherever we go. The combination of device mobility and computing power has resulted in the mobile computing paradigm. In this paradigm, computing power is constantly at hand, irrespective of whether the mobile device is connected to the Internet or not. The smaller the devices, the greater their portability and mobility, but the lesser their computing capability. It is important to understand that the ultimate goal of the mobile computing paradigm is to enable people to accomplish tasks using computing devices anytime, anywhere. To achieve this goal, network connectivity must become an essential part of mobile computing devices. The underlying network connectivity in mobile computing is typically wireless. Portable computing is a variant of mobile computing that includes the use of wired interfaces (e.g., a telephone modem) on mobile devices. For instance, a laptop equipped with both a wireless and a wired interface connects via the former when the user is walking down a hallway (mobile computing), but switches to the latter when the user is in the office (portable computing). The benefits of mobility afforded by computing devices are greatly reduced, if not completely eliminated, if devices can depend only on a wired interface (e.g., a telephone or network jack) for their network connectivity. It is more useful for a mobile computing device to use wireless interfaces for network connectivity when required. Additionally, networked sources of information may themselves become mobile; this leads to a related area of research called ubiquitous computing.

Let us now discuss the hardware characteristics of current-generation mobile computing devices. The emphasis in designing mobile devices is to conserve energy and storage space. These requirements are evident in the following characteristics, which are of particular interest in mobile computing:

Size, form factor, and weight. Mobile devices, with the exception of high-end laptops, are hand-helds (e.g., cell phones, PDAs, pen computers, tablet PCs). They are lightweight and portable. Greater storage capacity and higher processing capability are traded off for the mobility and portability of these devices.

Microprocessor. Most current-generation mobile devices use low-power microprocessors, such as the ARM and XScale families, in order to conserve energy. Thus, high performance is traded off for lower energy consumption, because the former is not as crucial to mobile devices as the latter.

Memory size and type. Primary storage sizes in mobile devices range anywhere between 8 and 64 MB. Mobile devices may additionally employ flash ROMs for secondary storage. Higher-end mobile devices, such as pen computers, use hard drives with sizes of the order of gigabytes. In mid-range devices, such as the iPAQ or Palm, approximately half of the primary memory is used by the kernel and operating system, leaving the remaining memory for applications. This limited capacity is again a case of trading off better performance for lower energy consumption.

Screen size and type. The use of LCD technology and viewable screen diagonal lengths between 2 and 10 in. are common characteristics of mobile devices. The CRT technology used in desktop monitors typically consumes approximately 120 W, whereas the LCD technology used in PDAs consumes only between 300 and 500 mW. As with the other characteristics, higher screen resolution is traded off for lower power consumption; however, future improvements in LCD technology may provide better resolution with little or no increase in power consumption.

Input mechanisms. The most common input mechanisms for mobile devices are built-in keypads, pens, and touch-screen interfaces. Usually, PDAs contain software keyboards; newer PDAs may also support external keyboards. Some devices also use voice as an input mechanism. Mobility and portability of devices are primary factors in the design of these traditional interfaces for cell phones, PDAs, and pen computers. Human–computer interaction (HCI) is a topic of considerable research and impacts the marketability of a mobile device. For example, a cell phone that can also be used as a PDA should not require user input via keys or buttons in the PDA mode; rather, it should accept voice input.

Communication interfaces. As discussed above, mobile devices can support both wired and wireless communication interfaces, depending on their capabilities. We shall concentrate on wireless interfaces in this context. As far as mobile devices are concerned, wireless communication is either short range or long range. Short-range wireless technologies include infrared (IR), IEEE 802.11a/b/g, and Bluetooth. IR, which is part of the optical spectrum, requires line-of-sight communication, whereas IEEE 802.11 and Bluetooth, which are part of the radio spectrum, can function
as long as the two devices are in radio range; they do not require line of sight. Long-range wireless technologies include cellular systems such as the Global System for Mobile Communications (GSM) and satellite communications, which are also part of the radio spectrum. Although wireless interfaces provide network connectivity to mobile devices, they pose some serious challenges when compared to wired interfaces: frequent disconnections, low and variable bandwidth, and, most importantly, increased security risks.

The discussion thus far clearly suggests that mobile computing is not limited to the technical challenges of reducing the size of the computer and adding a wireless interface to it. It encompasses the problems and solutions associated with enabling people to use the computing power of their devices anytime, anywhere, possibly with network connectivity.
36.2 Mobile Computing vs. Wired-Network Computing

We now compare mobile computing and wired-network computing from the network perspective. For the purposes of this discussion, we consider only the wireless networking aspect of mobile computing. We shall also use the terms wired-network computing and wired computing interchangeably. We shall compare mobile computing and wired computing based on layers 1 through 4 of the standard 7-layer Open Systems Interconnection (OSI) stack. Figure 36.1 shows the Physical, Data Link (comprising the Link Management and Medium Access Control sub-layers), Network, and Transport layers of the two stacks.

The Physical layer: In the network stack for mobile computing, the physical layer consists of two primary media: the radio spectrum and the optical spectrum. The radio spectrum is divided into licensed and unlicensed frequency bands. Cellular phone technologies use the licensed bands, whereas technologies such as Bluetooth and IEEE 802.11b use the unlicensed band. The optical spectrum is mainly used by infrared devices. The network stack for wired computing builds on cable technologies such as coaxial cable and optical fiber.

The Medium Access Control (MAC) sub-layer: The most frequently adopted MAC mechanism in the wired computing network stack is the well-known Carrier Sense Multiple Access with Collision Detection (CSMA/CD). It is also well known that CSMA/CD cannot be directly applied to the mobile computing stack, because in a wireless network collisions occur at the receiver rather than at the sender, and therefore cannot be detected by the transmitter's carrier sensing. To prevent this situation, researchers and practitioners have designed different mechanisms based on collision avoidance and on synchronizing transmissions. CSMA with Collision Avoidance (CSMA/CA) helps transmitters determine whether other devices around them are also preparing to transmit and, if so, avoid collisions by deferring transmission. Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Code Division Multiple Access (CDMA), and Digital Sense Multiple Access with Collision Detection (DSMA/CD) are other popular MAC protocols used by mobile device network stacks to coordinate transmissions.

The Link Management sub-layer: This layer is present in only a few network stacks of mobile devices. For example, the IEEE 802.11b standard describes only the Physical and MAC layers as part of the specification. Some of the link management protocols on mobile device network stacks are required to handle voice connections (usually connection-oriented links), in addition to primarily connection-less data links. The Logical Link Control and Adaptation Protocol (L2CAP) in Bluetooth is an example of such a protocol. GSM uses a variant of the well-known Link Access Protocol D-channel (LAPD) called LAPDm. High-level Data Link Control (HDLC), Point-to-Point Protocol (PPP), and Asynchronous Transfer Mode (ATM) are the most popular data link protocols used in wired networks.

The Network layer: Mobility of devices introduces a new dimension to routing protocols, which reside in the network layer of the OSI stack. Routing protocols for mobile networks, both ad hoc and infrastructure-supported, need to be aware of mobile device characteristics such as mobility and energy consumption. Unlike static devices, mobile devices cannot always depend on a static address, such as an IP address, because they need to be able to attach to different points in the network, public or private.
Protocols such as Mobile IP [Perkins, 1997] enable devices to dynamically obtain an IP address and connect to any IP-based network while they are on the move.
FIGURE 36.1 Network stack comparison of mobile and wired-network computing.

Layer     | Mobile computing stack                                                        | Wired-network computing stack
TRANSPORT | TCP and its variants, Wireless Transaction Protocol                           | TCP, TP4
NETWORK   | IP, Mobile IP, routing protocols for MANETs                                   | IP (with IPSec), CLNP
LINK      | L2CAP (Bluetooth), LAPDm (GSM)                                                | HDLC, PPP (IP), ATM
MAC       | CSMA/CA (802.11b), TDMA (Bluetooth), TD-FDMA (GSM), CDMA                      | CSMA/CD (Ethernet), Token Ring, FDDI
PHYSICAL  | Radio transceiver (802.11b, Bluetooth, GSM), optical transceiver (IR, laser)  | Coaxial cable, optical fiber
This solution requires the existence of a central network (i.e., the home network) that tracks the mobile device and knows its current destination network.

Routers are the linchpins of the Internet; they decide how to route incoming traffic based on addresses carried by the data packets. In ad hoc networks, no static routers exist. Many nodes in the network may have to perform the routing function because the routers themselves may be mobile, and thus move in and out of range of senders and receivers. All of these considerations have focused research on developing efficient routing protocols for mobile ad hoc networks (MANETs).

The Transport layer: TCP has been the protocol of choice for the Internet. TCP performs very well on wired networks, which have high bandwidth and low delay. However, research on TCP performance over wireless networks has shown that it typically fails if non-congestion losses (losses due to wireless channel errors or client mobility) occur on the wireless link. This is because TCP implicitly assumes that all losses are due to congestion and reduces the congestion window at the sender. If the losses are not due to congestion, TCP unnecessarily reduces throughput, leading to poor performance. Solutions to this problem include designing new transport protocols, such as CentaurusComm [Avancha et al., 2002b], that are more mobile-aware, and modifying TCP to make it more mobile-aware. Modified versions of TCP [Bakre and Badrinath, 1995; Brown and Singh, 1997; Goff et al., 2000] are well known in the research community. The Wireless Transaction Protocol (WTP) is part of the well-known Wireless Application Protocol (WAP) stack and provides reliable data transmission using retransmission, segmentation, and reassembly, as required.

Both academia and industry have contributed significantly to mobile-computing research aimed at designing the best possible network stack that takes into account the challenges of reduced computer size and computing power, energy conservation, and low-bandwidth, high-delay wireless interfaces. The discussion in this section provides a glimpse of the solutions applied to the most significant layers of the wired-network stack in order to address these challenges.
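To make the effect described above concrete, the following minimal sketch simulates an AIMD-style sender that halves its window on every loss and compares it with one that is told which losses are wireless (non-congestion) errors. This is only an illustration of the argument, not any particular TCP implementation; the loss rates and window arithmetic are invented for the example.

```python
import random

def simulate(rounds, wireless_loss_rate, congestion_loss_rate, loss_aware):
    """Toy AIMD sender: +1 segment per round, halve the window on a 'congestion' loss.

    If loss_aware is True, the sender ignores losses flagged as wireless errors
    instead of shrinking its window -- the idea behind the mobile-aware TCP
    variants mentioned in the text."""
    cwnd, delivered = 1.0, 0.0
    for _ in range(rounds):
        delivered += cwnd
        if random.random() < congestion_loss_rate:
            cwnd = max(1.0, cwnd / 2)          # genuine congestion: back off
        elif random.random() < wireless_loss_rate:
            if not loss_aware:
                cwnd = max(1.0, cwnd / 2)      # wireless error misread as congestion
            # a loss-aware sender simply retransmits without shrinking cwnd
        else:
            cwnd += 1.0                        # additive increase
    return delivered

random.seed(42)
blind = simulate(1000, wireless_loss_rate=0.05, congestion_loss_rate=0.01, loss_aware=False)
aware = simulate(1000, wireless_loss_rate=0.05, congestion_loss_rate=0.01, loss_aware=True)
print(f"segments delivered, standard sender:   {blind:.0f}")
print(f"segments delivered, loss-aware sender: {aware:.0f}")
```

Under these invented rates, the loss-aware sender delivers noticeably more data, which is the intuition behind the modified TCP variants cited above.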
36.3 M-Services Application Architectures

Mobile computing applications can be classified into three categories, depending on the interaction model: client-server, client-proxy-server, and peer-to-peer. The evolution of mobile applications started from common distributed object-oriented systems such as CORBA and DCOM [Sessions, 1997], which primarily follow the client-server architecture. The emergence of heterogeneous mobile devices with varying capabilities has subsequently popularized the client-proxy-server architecture. The increasing computational capabilities of mobile devices and the emergence of ad hoc networks are leading to a rapid growth of peer-to-peer architectures, similar to Gnutella on the Internet.
In the client-server architecture, a large number of mobile devices can connect to a small number of servers residing on the wired network, organized as a cluster. The servers are powerful machines with high-bandwidth, wired-network connectivity and the capability to connect to wireless devices. Primary data and services reside on and are managed by the servers, whereas clients locate servers and issue requests. Servers are also responsible for handling lower-level networking details such as disconnection and retransmission. The advantages of this architecture are the simplicity of the client design and straightforward cooperation among cluster servers. The main drawback is the prohibitively large overhead on servers in handling each mobile client separately, in terms of transcoding and connection handling, which severely affects system scalability.

In the client-proxy-server architecture, a proxy is introduced between the client and the server, typically on the edge of the wired network. The logical end-to-end connection between each server and client is split into two physical connections, server-to-proxy and proxy-to-client. This architecture increases overall system scalability because servers interact only with a fixed number of proxies, which handle transcoding and wireless connections to the clients. There has been substantial research and industry effort [Brooks et al., 1995; Zenel, 1995; Bharadvaj et al., 1998; Joshi et al., 1996] in developing client-proxy-server architectures. Additionally, intelligent proxies [Pullela et al., 2000] may act as computational platforms for processing queries on behalf of resource-limited mobile clients. Transcoding, i.e., conversion of data and image formats to suit target systems, is an important problem introduced by client-server and client-proxy-server architectures. Servers and proxies are powerful machines that, unlike mobile devices, can handle data formats of any type and image formats of high resolution. Therefore, data on the wired network must be transcoded to suit different mobile devices, and it is important for the server or proxy to recognize the characteristics of a client device. Standard transcoding techniques, such as those included in the WAP stack, include XSLT [Muench and Scardina, 2001] and Fourier transformation. The W3C CC/PP standard [Klyne et al., 2001] enables clients to specify their characteristics, using profiles, when connecting to HTTP servers.

In the peer-to-peer architecture, all devices, mobile and static, are peers. Mobile devices may act as both servers and clients. Ad hoc network technologies such as Bluetooth allow mobile devices to utilize peer resources in their vicinity in addition to accessing servers on the wired network. Server mobility may be an issue in this architecture, so the set of services available to a client is not fixed. This may require mobile devices to implement service discovery [Rekesh, 1999; Chakraborty et al., 2002a], collaboration, and composition [Chakraborty et al., 2002b; Mao et al., 2001]. The advantage of this architecture is that each device may have access to more up-to-date, location-dependent information and may interact with peers without infrastructure support. The disadvantage is the burden placed on the mobile devices in terms of energy consumption and network-traffic handling.

Client-server and client-proxy-server architectures remain the most popular models in practical use, from both commercial and non-commercial perspectives.
Both these architectures provide users with certain guarantees, such as connectivity, fixed bandwidth, and security, because of the inherent power of the proxies and servers. From a commercial perspective, they guarantee increased revenues to infrastructure and service providers, as the number of wireless users increases. Peer-to-peer architectures, which truly reflect the goal of anytime, anywhere computing, are largely confined to academia, but possess the potential to revolutionize mobile computing in the decades to come.
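The split-connection and transcoding ideas behind the client-proxy-server model can be sketched in a few lines. The sketch below is purely illustrative: it is not a real HTTP proxy, and the content store, profile fields, and thresholds are invented; a real deployment would terminate actual client connections and consult a CC/PP profile.

```python
# Minimal sketch of a client-proxy-server proxy: terminate the client leg,
# fetch from the server leg, and transcode according to the device profile.
SERVER_CONTENT = {"/news": {"text": "Top stories ... " * 50, "image_width": 1600}}

def fetch_from_server(path):
    """Stand-in for the server-to-proxy half of the split connection."""
    return dict(SERVER_CONTENT[path])

def transcode(response, client_profile):
    """Adapt the response to the device described by the (hypothetical) profile."""
    adapted = dict(response)
    adapted["image_width"] = min(adapted["image_width"],
                                 client_profile.get("screen_width", 1600))
    if client_profile.get("bandwidth_kbps", 1000) < 64:
        adapted["text"] = adapted["text"][:200] + "..."   # trim text for slow links
    return adapted

def proxy_request(path, client_profile):
    """Proxy-side handler for the proxy-to-client half of the connection."""
    return transcode(fetch_from_server(path), client_profile)

pda_profile = {"screen_width": 240, "bandwidth_kbps": 32}
print(proxy_request("/news", pda_profile)["image_width"])   # 240
```

Because the server only ever talks to the proxy, it can remain oblivious to the capabilities and mobility of individual clients, which is the scalability argument made above.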
36.4 Mobile Computing Application Framework

In this section, we describe a comprehensive framework for developing a mobile application using one of the three architectures described above. Figure 36.2 depicts the different components of the framework. Depending on the selected model, some of the components may not be required to build a complete mobile application. However, other components, such as the communications layer, form an intrinsic part of any mobile application. The design of this framework takes such issues into consideration. We describe the different layers and components of the framework in the next few subsections.
FIGURE 36.2 Mobile computing application framework. From bottom to top, the stack comprises the Communications layer (transport, routing, link, physical); the Discovery, Location Management, Data Management, and Service Management layers; and the Application-Specific Logic (API, user interface, transcoding, logic). The Security and System Management planes cut across all layers.
36.4.1 Communications Layer

The communications layer in this framework encompasses the physical, MAC, link, network, and transport layers of the mobile computing stack illustrated in Figure 36.1. This layer is responsible for establishing and maintaining logical end-to-end connections between two devices, and for data transmission and reception.

The physical and MAC layers are primarily responsible for node discovery and for the establishment and maintenance of physical connections between two or more wireless entities. These functions are implemented in different ways in different technologies. For example, in Bluetooth, node discovery is accomplished through the use of the inquiry command by the baseband (MAC) layer. In IEEE 802.11b, the MAC layer employs the RTS-CTS (Request-To-Send and Clear-To-Send) mechanism to enable nodes to discover each other when they are operating in ad hoc mode. When IEEE 802.11b nodes are operating in infrastructure mode, the base station broadcasts beacons, which the nodes use to discover the base station and establish physical connections with it. The establishment of physical connections is a process in which the nodes exchange operational parameters such as baud rate, connection mode (e.g., full-duplex or half-duplex), power mode (e.g., low-power or high-power), and timing information for synchronization, if required. To maintain the connection, some or all of these parameters are periodically refreshed by the nodes.

The link layer may not be part of the specifications of all wireless technologies. Some, such as IEEE 802.11b, use existing link layer protocols such as HDLC or PPP (for point-to-point connections) to establish data or voice links between the nodes. Bluetooth, on the other hand, uses a proprietary protocol, L2CAP, for establishing and maintaining links. This protocol is also responsible for other common link-layer functions such as framing, error correction, and quality of service. The task of the link layer is more difficult in wireless networks than in wired networks because of the high probability of errors either during or after transmission. Thus, error correction at the link layer must be robust enough to withstand the high bit-error rate of wireless transmissions.

The network layer in mobile computing stacks must deal with device mobility, which may cause existing routes to break or become invalid with no change in other network parameters. Device mobility may also cause packet loss. For example, if the destination device, to which a packet is already en route, moves out of range of the network, then the packet must be dropped. Thus, both route establishment and route maintenance are important problems that the network layer must tackle. As the mobility of a network increases, so do route failures and packet losses. Thus, the routing protocol must be robust
enough to either prevent route failures or recover from them as quickly as possible. In particular, routing protocols for mobile ad hoc networks have received considerable attention in the recent past. Many routing protocols for MANETs have been developed, primarily for research purposes; these include the Ad hoc On-Demand Distance Vector (AODV) routing protocol, Dynamic Source Routing (DSR), and the Destination-Sequenced Distance Vector (DSDV) routing protocol. However, for most applications that use some variant of the client-server model, the standard Internet Protocol (IP) is quite sufficient.

Mobile applications, unlike wired-network applications, tend to generate or require small amounts of data (of the order of hundreds or at most thousands of bytes). Thus, protocols at the transport layer should be aware of the short message sizes, packet delays due to device mobility, and non-congestion packet losses. TCP is ill-suited for wireless networks. Numerous variations of TCP, as well as transport protocols designed exclusively for wireless networks, ensure that both ends of a connection agree that packet loss has occurred before the source retransmits the packet. Additionally, some of these protocols choose to defer packet transmission if they detect that current network conditions are unsuitable.

The functionality of the communications layer in this framework is usually provided by the operating system running on the mobile device. Therefore, the mobile application can directly invoke the lower-level system functions via appropriate interfaces.
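The two transport-layer behaviors just mentioned (deferring transmission under poor conditions, and retransmitting only after both ends agree a loss occurred) can be sketched as follows. This is a toy illustration under stated assumptions, not CentaurusComm or any published protocol; the SimulatedLink class and its quality, transmit, and report methods are hypothetical placeholders.

```python
import random
import time

class SimulatedLink:
    """Hypothetical stand-in for a wireless link with a quality probe,
    a transmit primitive, and an explicit per-packet delivery report."""
    def quality(self):
        return random.random()                    # 0.0 (unusable) .. 1.0 (excellent)
    def transmit(self, packet_id, payload):
        self.delivered = random.random() > 0.3    # 30% chance the packet is lost
    def wait_for_report(self, packet_id, timeout):
        return "received" if self.delivered else "missing"

def send_reliably(packet_id, payload, link, max_attempts=5, min_quality=0.3):
    """Toy mobile-aware sender: defer while the link is poor, and retransmit
    only after the receiver explicitly reports the packet as missing."""
    for _ in range(max_attempts):
        while link.quality() < min_quality:       # defer: conditions unsuitable
            time.sleep(0.01)
        link.transmit(packet_id, payload)
        if link.wait_for_report(packet_id, timeout=2.0) == "received":
            return True
        # "missing" means both ends agree the packet was lost: retransmit
    return False

random.seed(1)
print(send_reliably(1, b"sensor reading", SimulatedLink()))
```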
36.4.2 Discovery Layer

The discovery layer helps a mobile application discover data, services, and computation sources. These may reside in the vicinity of the mobile device or on the Internet. Due to resource constraints and mobility, mobile devices may not have complete information about all currently available sources. The discovery layer assumes that the underlying network layer can establish a logical end-to-end connection with other entities in the network. The discovery layer provides upper layers with the knowledge and context of available sources. There has been considerable research and industry effort in service discovery in the context of wired and wireless networks. Two important aspects of service discovery are the discovery architecture and the service matching mechanism. Discovery architectures are primarily of two types: lookup-registry-based and peer-to-peer.

Lookup-registry-based discovery protocols work by registering information about the source at a registry. Clients query this registry to obtain knowledge about the source (such as its location, how to invoke it, etc.). This type of architecture can be further subdivided into two categories: centralized registry-based and federated (or distributed) registry-based architectures. A centralized registry-based architecture contains one monolithic centralized registry, whereas a federated registry-based architecture consists of multiple registries distributed across the network. Protocols such as Jini [Arnold et al., 1999], Salutation and Salutation-lite, UPnP [Rekesh, 1999], UDDI, and the Service Location Protocol [Veizades et al., 1997] are examples of the lookup-registry-based architecture.

Peer-to-peer discovery protocols query each node in the network to discover available services on that node. These protocols treat each node in the environment equally in terms of functional characteristics. Broadcasting requests and advertisements to peers is a simple, albeit inefficient, service-discovery technique in peer-to-peer environments. Chakraborty et al. [2002a] describe a distributed, peer-to-peer service discovery protocol using caching that significantly reduces the need to broadcast requests and advertisements. The Bluetooth Service Discovery Protocol (SDP) is another example of a peer-to-peer service discovery protocol. In SDP, services are represented using 128-bit unique identifiers. SDP does not provide any information on how to invoke the service; it only provides information on the availability of the service on a specific device.

The service discovery protocols discussed in this section use simple interface-, attribute-, or unique identifier-based matching techniques to locate appropriate sources. Jini uses interface matching and SDP uses identifier matching, whereas the Service Location Protocol and the Ninja Secure Service Discovery System discover services using attribute-based matching. The drawbacks of these techniques include lack of rich representation of services, inability to specify constraints on service descriptions, lack of
inexact matching of service attributes, and lack of ontology support [Chakraborty et al., 2001]. Semantic matching is an alternative technique that addresses these issues. DReggie [Chakraborty et al., 2001] and Bluetooth Semantic Service Discovery Protocol (SeSDP) [Avancha et al., 2002a] both use a semantically rich language called DARPA Agent Markup Language (DAML) to describe and match both services and data. Semantic descriptions of services and data allow greater flexibility in obtaining a match between the query and the available information. Matching can now be inexact. This means that parameters such as functional characteristics, and hardware and device characteristics of the service provider may be used in addition to service or data attributes to determine whether a match can occur.
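The difference between exact attribute matching and the more flexible, inexact matching argued for above can be illustrated with a toy lookup registry. The sketch below is not Jini, SLP, SDP, or DReggie; the registry contents, attribute names, and scoring rule are invented for illustration only.

```python
# Toy lookup registry contrasting exact attribute matching with a looser,
# score-based "inexact" match.
REGISTRY = [
    {"type": "printer", "color": True,  "pages_per_min": 20},
    {"type": "printer", "color": False, "pages_per_min": 35},
    {"type": "camera",  "resolution_mp": 2},
]

def exact_match(query):
    """Attribute-based matching: every requested attribute must be equal."""
    return [s for s in REGISTRY
            if all(s.get(k) == v for k, v in query.items())]

def inexact_match(query, min_score=0.5):
    """Score each service by the fraction of requested attributes it satisfies,
    allowing 'close enough' answers when no exact match exists."""
    results = []
    for s in REGISTRY:
        score = sum(1 for k, v in query.items() if s.get(k) == v) / len(query)
        if score >= min_score:
            results.append((score, s))
    return sorted(results, key=lambda r: -r[0])

query = {"type": "printer", "color": True, "pages_per_min": 35}
print(exact_match(query))     # [] -- no service satisfies everything
print(inexact_match(query))   # both printers, ranked by how well they fit
```

A semantic matcher such as DReggie generalizes this idea by reasoning over ontology-based service descriptions rather than flat attribute lists, so that, for example, a "laser printer" can satisfy a request for a "printer".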
36.4.3 Location Management Layer

The location management layer deals with providing location information to a mobile device. Location information changes dynamically with the mobility of the device and is one of the components of context awareness. It can be used by upper layers to filter location-sensitive information and obtain location-specific answers to queries, e.g., the weather in a certain area or the traffic conditions on a road.

The current location of a device relative to other devices in its vicinity can be determined using the discovery layer or the underlying communications layer. Common technologies use methods such as triangulation and signal-strength measurements for location determination. GPS [Hofmann-Wellenhof et al., 1997] is a well-known example of the use of triangulation based on data received from four different satellites. Cell phones use cell tower information to triangulate their position. On the other hand, systems such as RADAR [Bahl and Padmanabhan, 2000], used for indoor location tracking, work as follows. Using a set of fixed IEEE 802.11b base stations, the entire area is mapped. The map contains (x, y) coordinates and the corresponding signal strength of each base station at that coordinate. This map is loaded onto the mobile device. Now, as the user moves about the area, the signal strength from each base station is measured. The pattern of signal strengths from the stored map that most closely matches the pattern of measured signal strengths is chosen, and the location of the user is the one corresponding to the (x, y) coordinates associated with that stored pattern. Outdoor location management technologies have achieved technical maturity and have been deployed in vehicular and other industrial navigational systems. Location management, indoor and outdoor, remains a strong research field with the rising popularity of technologies such as IEEE 802.11b and Bluetooth.

The notion of location can be dealt with at multiple scales. Most "location determination" techniques actually deal with position determination, with respect to some global (latitude or longitude) or local (distances from the "corner" of a room) grid. Many applications are not interested in the absolute position as much as they are in higher-order location concepts (inside or outside a facility, inside or outside some jurisdictional boundary, distance from some known place, at a mountaintop, in a rain forest region, etc.). Absolute position determinations can be combined with GIS-type data to infer locations at other levels of granularity.

Expanding the notion of location further leads us to consider the notion of context. Context is any information that can be used to characterize the situation of a person or a computing entity [Dey and Abowd, 2000]. Context thus covers data such as location, device type, connection speed, and direction of movement. Arguably, context even involves a user's mental state (beliefs, desires, intentions, etc.). This information can be used by the layers described next for data and service management. However, the privacy issues involved are quite complex. It is not clear who should be allowed to gather such information, under what circumstances it should be revealed, and to whom. For instance, a user may not want his or her GPS chip to reveal his or her current location, except to emergency response personnel. Some of these issues, specifically related to presence and availability, are being discussed in the PAM working group of Parlay.
A more general formulation of such issues can be found in the recent work of Chen et al. [2003] who are developing OWL-based policies and a Decision-Logic-based reasoner to specify and reason about a user’s privacy preferences as related to context information.
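The RADAR-style fingerprinting described above amounts to a nearest-neighbor search in signal space. The sketch below illustrates that idea only; the radio-map coordinates and signal strengths are invented, and this is not the actual RADAR implementation.

```python
import math

# Toy signal-strength "radio map": (x, y) -> RSSI (dBm) from three base stations.
# A real RADAR-style deployment would build this map from site measurements.
RADIO_MAP = {
    (0, 0): (-40, -70, -80),
    (0, 5): (-55, -60, -75),
    (5, 0): (-60, -75, -55),
    (5, 5): (-70, -58, -52),
}

def locate(measured):
    """Return the map coordinate whose stored signal pattern is closest
    (Euclidean distance in signal space) to the measured pattern."""
    return min(RADIO_MAP, key=lambda xy: math.dist(RADIO_MAP[xy], measured))

# A device measuring these strengths is placed at the closest fingerprint.
print(locate((-68, -59, -54)))   # -> (5, 5)
```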
36.4.4 Data Management Layer

The data management layer deals with access, storage, monitoring, and manipulation of data. Data may reside locally and also on remote devices. As with data management in traditional Internet computing, this layer is essential in enabling a device to interact and exchange data with other devices located in its vicinity and elsewhere on the network. The core difference is that this layer must also deal with mobile computing devices. Such devices have limited battery power and other resources in comparison to their desktop counterparts. The devices also communicate over wireless logical links that have limited bandwidth and are prone to frequent failures. Consequently, the data management layer often attempts to extend data management solutions for Internet computing by primarily addressing the mobility and disconnection of a mobile computing device.

Work on data management can be classified along four orthogonal axes [Ozsu and Valduriez, 1999; Dunham and Helal, 1995]: autonomy, distribution, heterogeneity, and mobility. We can apply this classification to compare the three architecture models adopted by existing data management solutions.

The client-server model is a two-level architecture with data distributed among servers. Servers are responsible for data replication, storage, and update. They are often fixed and reside on the wired infrastructure. Clients have no autonomy, as they are fully dependent on servers, and may or may not be mobile and heterogeneous. This model was the earliest adopted approach for distributed file and database systems because it simplifies data management logic and supports rapid deployment [Satyanarayanan et al., 1990]. The model delegates all data management responsibility to only a small subset of devices, the servers. Additionally, the model addresses the mobility problem by simply not dealing with it or by using traditional time-out methods.

The client-proxy-server model extends the previous approach by introducing an additional level in the hierarchy. Data remains distributed on servers residing on a wired infrastructure. Clients still depend on servers, and may or may not be mobile and heterogeneous. However, a proxy, residing on the wired infrastructure, is placed between clients and servers. The proxy takes on a subset of server responsibilities, including disconnection management, caching, and transcoding. Consequently, servers no longer differentiate between mobile and fixed clients; they can treat all clients uniformly because they communicate only with devices on the wired infrastructure. Proxy devices are then responsible for delivering data to clients and for maintaining sessions when clients change locations [Dunham et al., 1997].

The peer-to-peer model takes a completely different approach from the other two models. This model is highly autonomous, as each computing device must be able to operate independently. There is no distinction between servers and clients, or between their responsibilities. The model also lies at the extreme of the other three axes because data may reside on any device, and each device can be heterogeneous and mobile. In this model, any two devices may interact with each other [Perich et al., 2002]. Additionally, unlike client-server-based approaches, the model is open in that there is no strict set of requirements that each device must follow. This may cause the data management layer to be implemented differently on each device.
Consequently, each peer must address both local and global data-management issues; the latter are handled by servers or proxies in client-server-based models.

Local data management, logically operating at the end-user level, is responsible for managing degrees of disconnection and for query processing. The least degree of disconnection encourages the device to interact constantly with other devices in the environment; the highest degree represents the state in which the device utilizes only its local resources. The mobility of a device can affect both the types of queries and the optimization techniques that can be applied. Traditional query-processing approaches advocate location transparency and consider only aspects of data transfer and processing for query optimization. In the mobile computing environment, on the other hand, query-processing approaches promote location awareness [Kottkamp and Zukunft, 1998]. For example, a mobile device can ask for the location of the closest Greek restaurant, and the server should understand that the starting point of the search is the current position of the device [Perich et al., 2002; Ratsimor et al., 2001].
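The restaurant example can be made concrete with a small location-aware query sketch, in which the device's current position (supplied by the location management layer) becomes the implicit origin of the search. The restaurant names, coordinates, and flat-plane distance metric are invented for illustration only.

```python
import math

# Toy location-aware query processing: "closest Greek restaurant" is answered
# relative to the querying device's current position.
RESTAURANTS = [
    {"name": "Olympia Grill", "cuisine": "Greek",   "pos": (2.0, 8.5)},
    {"name": "Santorini",     "cuisine": "Greek",   "pos": (6.5, 1.0)},
    {"name": "Pasta Palace",  "cuisine": "Italian", "pos": (1.0, 1.0)},
]

def closest(cuisine, device_position):
    """Pick the nearest matching entry, using the device position as the origin."""
    candidates = [r for r in RESTAURANTS if r["cuisine"] == cuisine]
    return min(candidates, key=lambda r: math.dist(r["pos"], device_position))

# The location management layer would supply device_position at query time.
print(closest("Greek", device_position=(5.0, 2.0))["name"])   # Santorini
```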
Global data management, logically operating at the architecture level, deals with data addressing, caching, dissemination, replication, and transaction support. As devices move from one location to another or become disconnected, it is necessary to provide a naming strategy to locate a mobile station and its data. There have traditionally been three approaches to data addressing: location-dependent, location-transparent, and location-independent [Pinkerton et al., 1990; Sandberg et al., 1985].

To allow devices to operate while disconnected, they must be able to cache data locally. This requirement introduces two challenges: data selection and data update. Data selection can be explicit [Satyanarayanan et al., 1990] or proactively inferred [Perich et al., 2002]. In the former approach, a user explicitly selects the files or data that must be cached; the latter approach automatically predicts and proactively caches the required information. Data update of local replicas usually requires a weaker notion of consistency, as the mobile device may have to operate on stale data without the knowledge that the primary copy was altered. This is especially the case when devices become disconnected from the network and cannot validate the consistency of their data. Either subscription-based callbacks [Satyanarayanan et al., 1990] or latency- and recency-based refreshing [Bright and Raschid, 2002] can address this issue. In subscription-based approaches, a client requests the server to notify it (the client) when a particular datum is modified. In turn, when a server modifies its data, it attempts to inform all clients subscribed to that data. In the latter approaches, a client or proxy uses timestamp information to compare its local replicas with remote copies in order to determine when to refresh its copy. Data dissemination models are concerned with read-only transactions in which mobile clients can "pull" information from sources, or the sources can "push" data to them automatically [Acharya et al., 1995]. The latter is applicable when a group of clients share the same sources and can benefit from accepting responses addressed to other peers.

To provide consistent and reliable computing support, the data management layer must support transaction and replica control. A transaction consists of a sequence of database operations executed as an atomic action [Ozsu and Valduriez, 1999]. This definition encompasses the four important properties of a transaction: atomicity, consistency, isolation, and durability (the ACID properties). Another important property of a transaction is that it always terminates, either by committing the changes or by aborting all updates. The principal concurrency-control technique used in traditional transaction management relies on locking [Ozsu and Valduriez, 1999; Eswaran et al., 1976]. In this approach, devices may enter states in which they wait for messages from one another. Because mobile devices may become involuntarily disconnected, this technique raises serious problems such as blocking of transaction termination and reduced availability of data. Current-generation solutions to the mobile transaction management problem often relax the ACID properties [Walborn and Chrysanthis, 1997] or propose completely different transaction processing techniques [Dunham et al., 1997]. Having relaxed the ACID properties, one can no longer guarantee that all replicas are synchronized; consequently, the data management layer must address this issue.
Traditional replica control protocols, based on voting or locking principles [Ellis and Floyd, 1983], assume that all replica holders are always reachable. This assumption is often invalid in mobile environments and may limit the ability to synchronize the replicas located on mobile devices. Approaches addressing this issue include dividing data into volume groups and using versions for pessimistic [Demers et al., 1994] or optimistic updates [Satyanarayanan et al., 1990; Guy et al., 1998]. Pessimistic approaches require epidemic or voting protocols that first modify the primary copy before other replicas can be updated and their holders can operate on them. Optimistic replication, on the other hand, allows devices to operate on their replicas immediately, which may result in a conflict that requires a reconciliation mechanism [Holliday et al., 2000]. Alternatively, the conflict can be avoided by calculating a voting quorum [Keleher and Cetintemel, 1999] for distributed data objects. Each replica can obtain a quorum by gathering weighted votes from other replicas in the system and by providing its vote to others. Once a replica obtains a voting quorum, it is assured that a majority of the replicas agree with the changes, and it can commit its proposed updates.
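A weighted-vote quorum check of the kind described above can be sketched in a few lines. This is only an illustration of the idea, not the actual Deno protocol; the replica names and vote weights are invented for the example.

```python
# Toy weighted-vote quorum: a proposed update commits only if the replicas that
# agree hold a strict majority of the total vote weight.
VOTE_WEIGHTS = {"pda": 1, "laptop": 2, "office_pc": 2, "home_pc": 1}

def has_quorum(agreeing_replicas, weights=VOTE_WEIGHTS):
    """Return True if the agreeing replicas hold more than half of all votes."""
    total = sum(weights.values())
    gathered = sum(weights[r] for r in agreeing_replicas)
    return gathered * 2 > total

# The PDA proposes an update and gathers votes from whichever peers are reachable.
print(has_quorum({"pda", "laptop"}))              # 3 of 6 votes -> False
print(has_quorum({"pda", "laptop", "home_pc"}))   # 4 of 6 votes -> True
```

Because a majority of the weight must agree before a commit, two disconnected partitions cannot both commit conflicting updates, which is what makes the approach attractive when replica holders are intermittently reachable.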
36.4.5 Service Management Layer

Service management forms another important component in the development of a mobile application. It consists of service discovery monitoring, service invocation, execution management, and service fault management. The service management layer performs different functions depending on the type of mobile application architecture. In the client-server architecture, most of the management (e.g., service execution state maintenance, computation distribution, etc.) is done by the server side of the application. Clients mostly manage the appropriate service invocation, notifications, alerts, and monitoring of the local resources needed to execute a query. In the client-proxy-server architecture, most of the management (session maintenance, leasing, and registration) is done at the proxy or the lookup server. Disconnections are usually managed by tracking the state of execution of a service (mostly at the server side) and by retransmitting data once the connection is reestablished.

One very important function of this layer is to manage the integration and execution of the multiple services that might be required to satisfy a request from a client; this is referred to as service composition. Such requests usually require the interaction of multiple services to produce a reply. Most of the existing service management platforms [Mao et al., 2001; Mennie and Pagurek, 2000] for composite queries are centralized and oriented toward services in the fixed wired infrastructure. Distributed broker-based architectures for service discovery, management, and composition in wireless ad hoc environments are current research topics [Chakraborty et al., 2002b]. Fault tolerance and scalability are other important considerations, especially in environments with many short-lived services. The management platform should degrade gracefully as more services become unavailable. Solutions for managing services have been incorporated into service discovery protocols designed for wired networks, but not for mobile environments.

36.4.5.1 Service Transaction Management

This sub-layer deals with the management of transactions associated with m-services, i.e., services applicable to mobile computing environments. We discuss service transaction management as applied to client-server, client-proxy-server, and peer-to-peer architectures. Service transaction management in mobile computing environments is based on the same principles used by e-commerce transaction managers on the Internet. These principles are usually part of a transaction protocol such as the Contract Net Protocol [FIPA, 2001]. A Contract Net Protocol involves two entities, the buyer (also known as the manager) and the seller (also known as the contractor), who are interested in conducting a transaction. The two entities execute actions as specified in the protocol at each step of the transaction. Examples of these actions include Call for Proposal (CFP), Refuse, Propose, Reject-Proposal, Accept-Proposal, Failure, and Inform-Done. In the wired computing environment, the two entities execute all actions explicitly. In a mobile computing environment, complete execution of the protocol may be infeasible due to memory and computational constraints. For example, the Refuse action, performed by a seller who declines the CFP, is implicit if the seller does not respond to the CFP. Thus, service transaction managers on mobile computing devices use simplified versions of transaction protocols designed for the Internet [Avancha et al., 2003].
In mobile computing environments using the client-server or client-proxy-server architecture, the service transaction manager would choose to use the services available on the Internet to complete the transaction successfully. For example, if a person is buying an airline ticket at the airport using her PDA, she could invoke the airline software's payment service and specify her bank account as the source of payment. On the other hand, in a peer-to-peer environment, there is no guarantee of a robust, online payment mechanism. In such situations, the transaction manager may choose other options, such as micropayments. For example, if a person were buying a music video clip from another person for $1, he might pay for it using digital cash. Both industry and academia have engaged in core research in the area of micropayments in the past few years [Cox et al., 1995; Choi et al., 1997]. Three of the most important e-service transactional features that must be applied to m-services are identification, authentication, and confidentiality. Every entity in a mobile environment must be able to
uniquely and clearly identify itself to the other entities with which it wishes to transact. Unlike devices on the Internet, a mobile device may not be able to use its IP address as a unique identifier. Every mobile device must also be able to authenticate the transaction messages it receives from others; in a mobile environment, where air is the primary medium of communication, anybody in radio range can eavesdrop and mount man-in-the-middle attacks. Confidentiality in a mobile environment is achieved through encryption: messages containing payment and goods information must be encrypted to prevent theft of the data. However, the limited computational and memory capacities of mobile devices make the expensive computations involved in traditional encryption mechanisms difficult. Technologies such as smartcards [Hansmann et al., 2000] can help offload the computational burden from the mobile device, at the cost of higher energy consumption.
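The simplified contract-net exchange described above (a buyer broadcasts a CFP, sellers either propose or stay silent, and silence counts as an implicit Refuse) can be sketched as follows. The message format, seller behavior, and price table are invented; this is a minimal illustration, not the full FIPA Contract Net Protocol.

```python
# Minimal sketch of a simplified contract-net exchange for m-services.
def seller(name, price_table):
    """Return a handler that proposes a price if it offers the item, else None."""
    def handle_cfp(item):
        if item in price_table:
            return {"performative": "propose", "seller": name,
                    "item": item, "price": price_table[item]}
        return None          # silence == implicit Refuse on a constrained device
    return handle_cfp

def buyer_run_cfp(item, sellers):
    """Issue a CFP, collect proposals, and accept the cheapest one."""
    proposals = [p for handle in sellers if (p := handle(item)) is not None]
    if not proposals:
        return None
    best = min(proposals, key=lambda p: p["price"])
    return {"performative": "accept-proposal", "seller": best["seller"],
            "item": item, "price": best["price"]}

sellers = [seller("kiosk-A", {"video_clip": 1.00}),
           seller("kiosk-B", {"video_clip": 0.75}),
           seller("kiosk-C", {"ring_tone": 0.50})]
print(buyer_run_cfp("video_clip", sellers))   # accepts kiosk-B's proposal
```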
36.4.6 Security Plane

Security has greater significance in a mobile environment than in a wired environment. The two main reasons for this are the lack of any security on the transmission medium and the real possibility of theft of a user's mobile device. Despite the increased need for security in mobile environments, the inherent constraints of mobile devices have hampered large-scale research and development of secure protocols. Lightweight versions of Internet security protocols are likely to fail because they ignore or minimize certain crucial aspects of the protocols in order to save computation and memory. The travails of the Wired Equivalent Privacy (WEP) protocol designed for IEEE 802.11b are well known [Walker, 2000]. The IEEE 802.11 working group has since released WEP2 for the entire class of 802.11 protocols. Bluetooth also provides a link-layer security protocol consisting of a pairing procedure that accepts a user-supplied passkey to generate an initialization key. The initialization key is used to calculate a link key, which, after being exchanged, is used in a challenge-response sequence. The current Bluetooth security protocol uses procedures that have low computational complexity, which makes them susceptible to attacks.

To secure data at the routing layer in client-server and client-proxy-server architectures, IPSec [Kent and Atkinson, 1998] is used in conjunction with Mobile IP. Research in securing routing protocols for networks using peer-to-peer architectures has resulted in interesting protocols such as Ariadne [Hu et al., 2002] and Security-Aware Ad hoc Routing [Yi et al., 2001]. The Wireless Transport Layer Security (WTLS) protocol, part of the WAP stack, is the only known protocol for securing transport-layer data in mobile networks. WTLS is a close relative of the Secure Sockets Layer (SSL) protocol that is the de facto standard for securing data on the Internet. Transaction and application layer security implementations are also based on SSL.
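The passkey-to-link-key-to-challenge-response flow described above can be illustrated with a generic HMAC-based sketch. This is emphatically not Bluetooth's actual pairing algorithms; the key derivation and message fields are invented solely to show the shape of the exchange.

```python
import hashlib
import hmac
import os

def derive_link_key(passkey: str, device_a: str, device_b: str) -> bytes:
    """Both devices derive the same key from the shared passkey and their addresses."""
    material = (passkey + device_a + device_b).encode()
    return hashlib.sha256(material).digest()

def respond(link_key: bytes, challenge: bytes) -> bytes:
    """Prove knowledge of the link key without revealing it."""
    return hmac.new(link_key, challenge, hashlib.sha256).digest()

# Pairing: each side derives the link key from the user-supplied passkey.
key_phone = derive_link_key("872319", "phone", "headset")
key_headset = derive_link_key("872319", "phone", "headset")

# Verification: the phone challenges the headset and checks the response.
challenge = os.urandom(16)
expected = respond(key_phone, challenge)
received = respond(key_headset, challenge)
print("authenticated:", hmac.compare_digest(expected, received))   # True
```

The weakness the text alludes to is visible even in this sketch: if the passkey is short and the derivation is cheap, an eavesdropper who records the exchange can try passkeys offline until the responses match.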
36.4.7 System Management Plane

The system management plane provides interfaces so that any layer of the stack in Figure 36.2 can access system-level information. System-level information includes data such as the current memory level, battery power, and the various device characteristics. For example, the routing layer might need to determine whether the current link layer in use is IEEE 802.11b or Bluetooth in order to decide packet sizes. Transaction managers will use memory information to decide whether to respond to incoming transaction requests or to prevent the user from sending out any more transaction requests. The application logic will acquire device characteristics from the system management plane to inform the other end (server, proxy, or peer) of the device's screen resolution, size, and other related information. The service discovery layer might use system-level information to decide whether to use semantic matching or simple matching in discovering services.
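A tiny sketch of the kind of interface the system management plane could expose to the other layers appears below. The class, method names, and thresholds are all invented for illustration; a real implementation would query the device's operating system.

```python
class SystemManagementPlane:
    """Hypothetical system-information interface consulted by other layers."""
    def __init__(self, link_type, battery_pct, free_memory_kb, screen_res):
        self.link_type = link_type            # e.g., "802.11b" or "bluetooth"
        self.battery_pct = battery_pct
        self.free_memory_kb = free_memory_kb
        self.screen_res = screen_res          # (width, height)

    def preferred_packet_size(self):
        """Routing layer: pick a packet size based on the active link layer."""
        return 1400 if self.link_type == "802.11b" else 300

    def accept_transactions(self):
        """Transaction manager: refuse new work when memory is scarce."""
        return self.free_memory_kb > 512

    def device_profile(self):
        """Application logic: characteristics reported to the server, proxy, or peer."""
        return {"screen": self.screen_res, "battery_pct": self.battery_pct}

smp = SystemManagementPlane("bluetooth", battery_pct=40,
                            free_memory_kb=256, screen_res=(240, 320))
print(smp.preferred_packet_size(), smp.accept_transactions(), smp.device_profile())
```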
36.5 Conclusions

Mobile devices are becoming popular in every aspect of our everyday life. Users expect to use them for multiple purposes, including calendaring, scheduling, checking e-mail, and browsing the Web. Current-
generation mobile devices, such as iPAQs, are powerful enough to support more versatile applications, including those that already exist on the Internet. However, applications developed for the wired Internet cannot be directly ported onto mobile devices, because some of the common assumptions made in building Internet applications, such as the presence of high-bandwidth, disconnection-free network connections and of resource-rich, tethered computation platforms, are not valid in mobile environments. Mobile applications must take these issues into consideration.

In this chapter, we have discussed the modifications to each layer of the OSI stack that are required to enable mobile devices to communicate with wired networks and with other mobile devices. We have also discussed three popular application architectures, i.e., client-server, client-proxy-server, and peer-to-peer, that form an integral part of any mobile application. Finally, we have presented a general framework that mobile applications should use in order to be functionally complete, flexible, and robust in mobile environments. The framework consists of an abstracted network layer, a discovery layer, location management, data management, service management, transaction management, and application-specific logic. Depending on the architecture requirements, each application may use only a subset of the described layers. Moreover, depending on the type of architecture, different solutions apply for the different layers.

In conclusion, we have presented a sketch of the layered architecture and technologies that make up the state of the art of mobile computing and mobile applications. Most of these have seen significant academic research and, more recently, commercial deployment. Many other technologies are maturing as well, and will move from academic and research labs into products. We feel that the increasing use of wireless local and personal area networks (WLANs and WPANs), higher-bandwidth wireless telephony, and continued performance/price improvements in handheld and wearable devices will lead to a significant increase in the deployment of mobile computing applications in the near future, even though not all of the underlying problems will have complete solutions in the short term.
Acknowledgments

This work was supported in part by NSF awards IIS 9875433, IIS 0209001, and CCR 0070802, the DARPA DAML program, IBM, and Fujitsu Labs of America, Inc.
References

Acharya, Swarup, Rafael Alonso, Michael Franklin, and Stanley Zdonik. Broadcast Disks: Data Management for Asymmetric Communication Environments. In Michael J. Carey and Donovan A. Schneider, Eds., ACM SIGMOD International Conference on Management of Data, pp. 199–210, San Jose, CA, June 1995. ACM Press.
Arnold, Ken, Bryan O'Sullivan, Robert W. Scheifler, Jim Waldo, and Ann Wollrath. The Jini Specification (The Jini Technology). Addison-Wesley, Reading, MA, June 1999.
Avancha, Sasikanth, Pravin D'Souza, Filip Perich, Anupam Joshi, and Yelena Yesha. P2P M-Commerce in pervasive environments. ACM SIGecom Exchanges, 3(4): 1–9, January 2003.
Avancha, Sasikanth, Anupam Joshi, and Tim Finin. Enhanced service discovery in Bluetooth. IEEE Computer, 35(6): 96–99, June 2002.
Avancha, Sasikanth, Vladimir Korolev, Anupam Joshi, Timothy Finin, and Y. Yesha. On experiments with a transport protocol for pervasive computing environments. Computer Networks, 40(4): 515–535, November 2002.
Bahl, Paramvir and Venkata N. Padmanabhan. RADAR: An in-building RF-based user location and tracking system. In IEEE INFOCOM, Vol. 2, pp. 775–784, Tel Aviv, Israel, March 2000.
Bakre, Ajay V. and B. R. Badrinath. I-TCP: Indirect TCP for Mobile Hosts. In 15th International Conference on Distributed Computing Systems, pp. 136–143, Vancouver, BC, Canada, June 1995. IEEE Computer Society Press.
Bharadvaj, Harini, Anupam Joshi, and Sansanee Auephanwiriyakyl. An Active Transcoding Proxy to Support Mobile Web Access. In 17th IEEE Symposium on Reliable Distributed Systems (SRDS), pp. 118–123, West Lafayette, IN, October 1998.
Brooks, Charles, Murray S. Mazer, Scott Meeks, and Jim Miller. Application-Specific Proxy Servers as HTTP Stream Transducers. In 4th International World Wide Web Conference, pp. 539–548, Boston, MA, December 1995.
Bright, Laura and Louiqa Raschid. Using Latency-Recency Profiles for Data Delivery on the Web. In International Conference on Very Large Data Bases (VLDB), pp. 550–561, Morgan Kaufmann, Kowloon Shangri-La Hotel, Hong Kong, China, August 2002.
Brown, Kevin and Suresh Singh. M-TCP: TCP for mobile cellular networks. ACM Computer Communications Review, 27(5): 19–43, October 1997.
Chakraborty, Dipanjan, Anupam Joshi, Tim Finin, and Yelena Yesha. GSD: A novel group-based service discovery protocol for MANETs. In 4th IEEE Conference on Mobile and Wireless Communications Networks (MWCN), pp. 301–306, Stockholm, Sweden, September 2002a.
Chakraborty, Dipanjan, Filip Perich, Sasikanth Avancha, and Anupam Joshi. DReggie: A Smart Service Discovery Technique for E-Commerce Applications. In Workshop at 20th Symposium on Reliable Distributed Systems, October 2001.
Chakraborty, Dipanjan, Filip Perich, Anupam Joshi, Tim Finin, and Yelena Yesha. A Reactive Service Composition Architecture for Pervasive Computing Environments. In 7th Personal Wireless Communications Conference (PWC), pp. 53–62, Singapore, October 2002b.
Chen, Harry, Tim Finin, and Anupam Joshi. Semantic Web in a Pervasive Context-Aware Architecture. In Artificial Intelligence in Mobile System at UBICOMP, Seattle, WA, October 2003.
Choi, Soon-Yong, Dale O. Stahl, and Andrew B. Whinston. Cyberpayments and the Future of Electronic Commerce. In International Conference on Electronic Commerce, Cyberpayments Area, 1997.
Cox, Benjamin, Doug Tygar, and Marvin Sirbu. NetBill Security and Transaction Protocol. In 1st USENIX Workshop on Electronic Commerce, pp. 77–88, New York, July 1995.
Demers, Alan, Karin Petersen, Mike Spreitzer, Douglas Terry, Marvin Theimer, and Brent Welch. The Bayou Architecture: Support for Data Sharing among Mobile Users. In Proceedings IEEE Workshop on Mobile Computing Systems and Applications, pp. 2–7, Santa Cruz, CA, December 8–9, 1994.
Dey, Anind K. and Gregory D. Abowd. Towards a Better Understanding of Context and Context-Awareness. In Proceedings of the CHI 2000, The Hague, Netherlands, April 2000. Also in GVU Technical Report GIT-99-22, College of Computing, Georgia Institute of Technology, Atlanta, GA.
Dunham, Margaret H. and Abdelsalam (Sumi) Helal. Mobile Computing and Databases: Anything New? In ACM SIGMOD Record, pp. 5–9. ACM Press, New York, December 1995.
Dunham, Margaret, Abdelsalam Helal, and Santosh Balakrishnan. A mobile transaction model that captures both the data movement and behavior. ACM/Baltzer Journal of Mobile Networks and Applications, 2(2): 149–162, 1997.
Ellis, Carla Schlatter and Richard A. Floyd. The Roe File System. In 3rd Symposium on Reliability in Distributed Software and Database Systems, pp. 175–181, Clearwater Beach, FL, October 1983. IEEE.
Eswaran, Kapali P., Jim Gray, Raymond A. Lorie, and Irving L. Traiger. The notion of consistency and predicate locks in a database system. Communications of the ACM, 19(11): 624–633, December 1976.
World Wide Web, http://www.fipa.org/specs/ fipa00029/XC00029F.pdf,2001. Goff, Tom, James Moronski, Dhananjay S. Phatak, and Vipul Gupta. Freeze-TCP: A True End-to-End TCP Enhancement Mechanism for Mobile Environments. In INFOCOM, Vol. 3, pp. 1537–1545, Tel Aviv, Israel, March 2000. Guy, Richard, Peter Reiher, David Ratner, Michial Gunter, Wilkie Ma, and Gerald Popek. Rumor: Mobile Data Access through Optimistic Peer-to-Peer Replication. In Workshop on Mobile Data Access in conjunction with 17th International Conference on Conceptual Modeling (ER), pp. 254–265, Singapore, November 1998. World Scientific.
Copyright 2005 by CRC Press LLC
C3812_C36.fm Page 15 Wednesday, August 4, 2004 9:12 AM
Data and Services for Mobile Computing
36-15
Hansmann, Uwe, Martin S. Nicklous, Thomas Schack, and Frank Seliger. Smart Card Application Development using Java. Springer-Verlag, New York, 2000. Hofmann-Wellenhof, Bernhard, Herbert Lichtenegger, and James Collins. Global Positioning System: Theory and Practice, 4th ed., Springer-Verlag, New York, May 1997. Holliday, JoAnne, Divyakant Agrawal, and Amr El Abbadi. Database Replication Using Epidemic Communication. In Arndt Bode, Thomas Ludwig II, Wolfgang Karl, and Ronal Wism, Eds., 6th EuroPar-Conference, Vol. 1900, pp. 427–434, Munich, Germany, September 2000. Springer. Hu, Yih-Chun, Adrian Perrig, and David B. Johnson. Ariadne: A Secure On-Demand Routing Protocol for Ad Hoc Networks. In 8th ACM International Conference on Mobile Computing and Networking, pp. 12–23, Atlanta, GA, September 2002. ACM Press. Joshi, Anupam, Ranjeewa Weerasinghe, Sean P. McDermott, Bun K. Tan, Gregory Bernhardt, and Sanjiva Weerawarana. Mowser: Mobile Platforms and Web Browsers. Bulletin of the Technical Committee on Operating Systems and Application Environments (TCOS), 8(1), 1996. Keleher, Peter J. and Ugur Cetintemel. Consistency management in Deno. ACM Mobile Networks and Applications, 5: 299–309, 1999. Kent, Stephen and Randall Atkinson. IP Encapsulating Security Payload. World Wide Web, http:// www.ietf.org/rfc/rfc2406.txt, November 1998. Klyne, Graham, Franklin Raynolds, and Chris Woodrow. Composite Capabilities/Preference Profiles (CC/ PP): Structure and Vocabularies. World Wide Web, http://www.w3.org/TR/CCPP-struct-vocab/, March 2001. Kottkamp, Hans-Erich and Olaf Zukunft. Location-aware query processing in mobile database systems. In ACM Symposium on Applied Computing, pp. 416–423, Atlanta, GA, February 1998. Mao, Zhuoqing Morley, Eric A. Brewer, and Randy H. Katz. Fault-tolerant, Scalable, Wide-Area Internet Service Composition. Technical report, CS Division, EECS Department, University of California, Berkeley, January 2001. Mennie, David and Bernard Pagurek. An Architecture to Support Dynamic Composition of Service Components. In 5th International Worshop on Component-Oriented Programming, June 2000. Muench, Steve and Mark Scardina. XSLT Requirements. World Wide Web, http://www.w3.org/TR/ xslt20req, February 2001. Ozsu, M. Tamer and Patrick Valduriez. Principles of Distributed Database Systems, 2nd ed., Prentice Hall, Hillsdale, NJ, 1999. Perich, Filip, Sasikanth Avancha, Dipanjan Chakraborty, Anupam Joshi, and Yelena Yesha. Profile Driven Data Management for Pervasive Environments. In 3rd International Conference on Database and Expert Systems Applications (DEXA), pp. 361–370, Aix en Provence, France, September 2002. Perkins, Charles E. Mobile IP Design Principles and Practices. Wireless Communication Series, AddisonWesley, Reading, MA, 1997. Pinkerton, C. Brian, Edward D. Lazowska, David Notkin, and John Zahorjan. A Heterogeneous Distributed File System. In 10th International Conference on Distributed Computing Systems, pp. 424–431, May 1990. Pullela, Chaitanya, Liang Xu, Dipanjan Chakraborty, and Anupam Joshi. A Component based Architecture for Mobile Information Access. In Workshop in conjunction with International Conference on Parallel Processing, pp. 65–72, August 2000. Ratsimor, Olga, Vladimir Korolev, Anupam Joshi, and Timothy Finin. Agents2Go: An Infrastructure for Location-Dependent Service Discovery in the Mobile Electronic Commerce Environment. In ACM Mobile Commerce Workshop in Conjunction with MobiCom, pp. 31–37, Rome, Italy, July 2001. Rekesh, John. 
UPnP, Jini and Salutation — A look at some popular coordination frameworks for future network devices. Technical report, California Software Labs, 1999. URL http://www.cswl.com/ whitepaper/tech/upnp.html. Sandberg, Russel, David Goldberg, Steve Kleiman, Dan Walsh, and Bob Lyon. Design and Implementation of the Sun Network Filesystem. In Summer USENIX Conference, pp. 119–130, Portland, OR, 1985.
Copyright 2005 by CRC Press LLC
C3812_C36.fm Page 16 Wednesday, August 4, 2004 9:12 AM
36-16
The Practical Handbook of Internet Computing
Satyanarayanan, Mahadev, James J. Kistler, Puneet Kumar, Maria E. Okasaki, Ellen H. Siegel, and David C. Steere. Coda: A Highly Available File System for a Distributed Workstation Environment. IEEE Transactions on Computers, 39(4): 447–459, 1990. Sessions, Roger. COM and DOOM: Microsoft’s Vision for Distributed Objects. John Wiley & Sons, New York, October 1997. Veizades, John, Erik Guttman, Charles E. Perkins, and Scott Kaplan. RFC 2165: Service location protocol, June 1997. Walborn, Gary D. and Panos K. Chrysanthis. PRO-MOTION: Management of Mobile Transactions. In ACM Annual Symposium on Applied Computing, pp. 101–108, San Jose, CA, February 1997. Walker, Jesse R. Unsafe at any key size; An analysis of the WEP encapsulation. IEEE Document 802.11–00/ 362, October 2000. Yi, Seung, Prasad Naldurg and Robin Kravets. Security-aware ad hoc Routing for Wireless Networks. In 2nd RCM Symposium on Mobile Ad Hoc Networking and Computing, pp. 299–302, Long Beach. California. USA., October 2001. Zenel, Bruce. A Proxy Based Filtering Mechanism for The Mobile Environment. Ph.D. thesis, Department of Computer Science, Columbia University, New York, December 1995.
Copyright 2005 by CRC Press LLC
C3812_C37.fm Page 1 Wednesday, August 4, 2004 9:14 AM
37 Pervasive Computing

Sumi Helal
Choonhwa Lee

CONTENTS
37.1 The Vision of Pervasive Computing
37.2 Pervasive Computing Technologies
    37.2.1 Device Technology
    37.2.2 Network Technology
    37.2.3 Environment Technology
    37.2.4 Software Technology
    37.2.5 Information Access Technology
    37.2.6 User Interface Technology
37.3 Ubiquitous Computing Systems
    37.3.1 Active Bat
    37.3.2 Classroom 2000
    37.3.3 Lancaster Guide System
    37.3.4 Matilda's Smart House
37.4 Conclusion
References
37.1 The Vision of Pervasive Computing In the early 1990s, Mark Weiser called for a paradigm shift to ubiquitous computing in his seminal article [Weiser, 1991] opening with “The most profound technologies are those that disappear. They weave themselves into the fabric of everyday life until they are indistinguishable from it.” This insight was influential enough to reshape computer research, and since then the research community has put forth enormous efforts to pursue his vision. Writing and electricity are examples of such ubiquitous technologies “disappearing” into the background of our daily activities [Weiser and Brown, 1996]. Writing, perhaps our first information technology, is found in every corner of life in the “civilized” world: dashboards, road signs, toothbrushes, clothes, and even candy wrappers. As a part of physical objects to help us use them better, writing remains almost unnoticeable until for some specific reason we need to call them to the center of our consciousness. In other words, our everyday practices are not interfered with by their surrounding presence, but their meanings are readily available so that we can catch the conveyed message when necessary. For example, little attention is given to exit signs on a highway until we approach our destination. Also, dashboard speedometers on jammed downtown roads are rarely given much focus. Electricity permeates every aspect of our world, and has become so inseparable from modern life that we tend to forget about it and take it for granted. Likewise, computation will be available everywhere, but unobtrusively, in the future ubiquitous computing world. It will be embedded in physical objects such as clothing, coffee cups, tabletops, walls, floors, doorknobs, roadways, and so forth. For example, while reading a morning newspaper, our coffee placed on a tabletop will be kept warm at our favorite temperature. A door will open automatically by sensing the current in our hand; another door, more intelligently, will detect our fingerprints or sense our
approach. This seamless computation will be invisibly integrated into physical artifacts within the environment, and will always be ready to serve us without distracting us from our daily practices. In a future world saturated with computation [Satyanarayanan, 2001], time-saving, nondistracting ease of use requiring minimum human attention will be a prime goal of ubiquitous computing and an inspiration to innovation as well. Eventually, computers will be everywhere and “nowhere.” The computer research community responded to Mark Weiser’s call for a paradigm shift to ubiquitous computing by eagerly investing enormous resources to investigate enabling technologies. Although his view has been enthusiastically followed by numerous ensuing research projects, it often has been given different terms: pervasive, invisible, calm, augmented, proactive, ambient, and so forth to reflect their scope or emphasis on particular areas. Among them, the term pervasive computing gained popularity in the mid-1990s, and is now used with ubiquitous computing interchangeably. Pervasive computing calls for interdisciplinary efforts, embracing nearly all sub-disciplines within computer science. Included are hardware, software, network, human–computer interface, and information technologies. Also, it subsumes distributed and mobile computing and builds on what has been achieved [Satyanarayanan, 2001]. Presented in the following section is an illustrative scenario depicting the future pervasive computing world to identify enabling technologies of pervasive computing. Next, we present a detailed look into each technology through a sampling of representative current developments and future challenges.
37.2 Pervasive Computing Technologies First, a visionary scenario is presented that could commonly happen in the pervasive computing norm of the future. The scenario serves as a basis for identifying enabling technologies of pervasive computing and to support further discussions. Bob from headquarters is visiting a branch office of his company for a meeting on the West Coast. At the reception desk in the lobby, he downloads a tag application into his cellular phone through a wireless connection, instead of wearing a visitor tag. This allows him access to the building without an escort. Moreover, he is constantly being tracked by the building location system, so his location may be pinpointed when necessary. Because this is Bob’s first time in the branch, his secretary agent on the phone guides him to a reserved meeting room by contacting his scheduler on his office computer and referring him to a floor plan from the branch location system. Alice had called for this meeting. The meeting room was automatically reserved when she marked her schedule one week previously. In addition, the time was consulted and confirmed among scheduling agents of all the intended attendees. Her scheduling agent notified the building receptionist of expected visitors, including Bob. As Alice greets Bob in the meeting room, the room “knows” a meeting is about to begin and “notices” another attendee from the branch side, Charlie, is missing. The room sends a reminder to a display that Charlie has been staring at for hours while finalizing his departmental budget. With the deadline for the next year’s budget approaching, Charlie has completely forgotten about the meeting. While rushing to get there, he tries to enter the wrong room where he is redirected by a display on the door that provides directions to lead him to the appropriate meeting room. Using his phone, Bob instructs a projector to download his presentation slides from his office computer. The projector has been personalized to know what cues he has been using for his presentation; for example, he snaps his finger to signify the next slide and waives his left hand for the previous slide. The voice and video annotation are automatically indexed during the entire meeting and stored for later retrieval by both headquarters and branch computer systems. This scenario involves six categories of various enabling technologies of pervasive computing that include device, network, environment, software, information access, and user interface technology, as shown in Table 37.1. We will take a closer look at the enabling technologies of pervasive computing by discussing key subtechnologies and research issues of each category along with representative developments in the area.
TABLE 37.1 Pervasive Computing Technologies

Category            Constituent Technologies and Issues
Device              Processing capability, form factors, power efficiency, and universal information appliance
Network             Wireless communication, mobility support, automatic configuration, resource discovery, and spontaneous interaction
Environment         Location awareness, context sensing, sensor network, security, and privacy
Software            Context information handling, adaptation, dynamic service composition, and partitioning
Information access  Ubiquitous data access, agent, collaboration, and knowledge access
User interface      Perceptual technologies, biometrics, context-aware computing, multiple modality, implicit input, and output device coordination
It is followed by more futuristic trials and innovative approaches the ubiquitous computing research community is taking to cope with the challenges. This discussion will aid in understanding the direction ubiquitous computing is headed, as well as where it has come from.
37.2.1 Device Technology

The last decade has seen dramatic improvements in device technology. The technology available in the early 1990s fell short of the ubiquitous computing vision; the Xerox ParcTab had just 128K of memory, a 128 × 64 monochrome LCD display, and IR support [Weiser, 1993]. Today's typical PDAs are armed with powerful processing capability, 32M RAM, a 320 × 240 transflective color TFT, and IEEE 802.11b support and/or Bluetooth. The most remarkable advances are found in processing, storage, and display capabilities [Want et al., 2002]. For more than 35 years, microprocessor technology has been accelerating in accordance with the self-fulfilling Moore's law, which states that transistor density on a microprocessor doubles every 18 months. The smaller the processor, the higher the performance, as it can be driven by a faster clock. Also implied is lower power consumption, which is another key issue for pervasive computing. Thanks to progress in chip technology, today's PDAs are equipped with processing capability equivalent to early to mid-1990s desktop computers [Estrin et al., 2002]. This advance is matched by improvements in storage capability. Common today are high-end PDAs equipped with up to 64M RAM. Also, a matchbook-size CompactFlash card (i.e., small removable mass storage) provides up to 1G of storage. Yet another area of drastic improvement is display technology: TFT-LCD and PDP. Screen sizes recently crossed the 60-in. point, and prototype costs continue to drop.

Predictions were made that in the post-PC era we would carry multiple gadgets such as cellular phones, PDAs, and handhelds at all times. The need for multiple devices sparked the integration of specialized devices into a multifunctional device. For example, phone-enabled PDAs are already seen on the market, and many cellular phones have gained Internet browsing capability and the personal organizer functionality traditionally performed by PDAs. Along with these developments, a more radical approach is being explored. Rather than building multiple hardware functionalities into a single device, software that runs on general-purpose hardware can transform it into a universal information appliance, emulating a variety of gadgets including a cellular phone, a PDA, a digital camera, a bar-code scanner, and an AM/FM radio. The MIT SpectrumWare project has already built a prototype of a software radio out of a general PC. All signal processing except for the antenna and sampling parts is handled by software, enabling the machine to perform as any device once loaded with the appropriate software modules in RAM [Guttag, 1999]. The feasibility of this multifunctional device is further backed by MIT Raw chip technology. By exposing raw hardware to software at the level of logic gates, a new opportunity opens up in which a chip itself can be dynamically customized to fit the particular needs of desired devices. More specifically, a software compiler can customize wirings on the chip to direct and store signals in a logic-wire-optimal way for target applications. This approach yields unprecedented performance, energy efficiency, and cost-effectiveness in performing the function of desired gadgets [Agarwal, 1999].
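As a rough back-of-the-envelope illustration of the 18-month doubling rule mentioned above (the starting transistor count is an arbitrary assumption, not a figure from the text), the compound growth can be computed directly:

    # Illustrative only: project transistor counts under an 18-month doubling period.
    def projected_transistors(start_count, years, doubling_period_months=18):
        return start_count * 2 ** (years * 12 / doubling_period_months)

    # A hypothetical early-1990s chip with 3 million transistors, projected 10 years out:
    print(f"{projected_transistors(3e6, 10):.2e}")  # roughly 3e8, about a 100x increase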
37.2.2 Network Technology

Wireless communication technology is another area with huge successes in technical advancement and wide deployment since the mid-1990s. According to their coverage, wireless communication networks can be classified into short-range, local area, and wide area networks. Initially, Bluetooth was developed as a means of cable replacement, thereby providing short-range coverage (in-room coverage, typically 10 m). Communicating mobile devices form a small cell called a piconet, which is made up of one master and up to seven slave nodes and supports a data rate of about 1 Mbps. Smaller coverage means lower power consumption, which is an indispensable feature for the tiny devices being targeted. Infrared is another local connectivity protocol that was developed well before Bluetooth, yet failed to gain popularity. The disadvantage of its line-of-sight requirement is ironically useful in some pervasive computing applications, such as an orientation detection system or a location system that can detect physical proximity in terms of physical obstacles or containment. Next, as a midrange wireless protocol covering hundreds of meters, 11-Mbps IEEE 802.11b was widely deployed in the past few years. It is no longer a surprise to see wireless-LAN-covered streets, airport lounges, and restaurants. More recently, 54-Mbps IEEE 802.11a has started its deployment. Finally, we have seen explosive cellular market growth; as of early 2001, one out of 10 people in the world (680 million) owned cellular phones [Parry, 2002]. Moreover, next-generation digital cellular networks such as 2.5G and 3G networks will bring communication capability closer to what the ubiquitous computing vision demands. For instance, 3G networks support a data rate of 2 Mbps for stationary users and 144 Kbps for mobile users. The wireless network technologies with different coverage complement one another through vertical handoff, by which a device switches to the best network available in a given environment [Stemm and Katz, 1998]. While moving around within a network or across several networks, seamless connectivity can be supported by IP mobility protocols [Perkins, 1996] [Campbell and Gomez-Castellanos, 2001].

Aside from the basic capability of communication, network technology must be able to foster spontaneous interaction, which encompasses mobility support, network resource discovery and automatic configuration, dynamic service discovery, and subsequent spontaneous interactions. Some of these issues are addressed by the IETF Zero Configuration Working Group [Hattig, 2001]. A device discovers network resources and configures its network interface using stateful DHCP or stateless automatic configuration [Cheshire et al., 2002] [Thomson and Narten, 1996]. Once the base communication protocol is configured and enabled, users may locate a service to meet their needs, specified in the particular query language of a service discovery protocol [Guttman et al., 1999] [Sun Microsystems, 1999] [UPnP Forum, 2000]. The service discovery problem brings with it the ontology issue: how can the functionality of services we have never encountered before be described?
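As a concrete illustration of dynamic service discovery, the sketch below issues a UPnP-style SSDP multicast search using Python's standard socket module and prints whatever devices answer. It is a minimal sketch, not a complete UPnP implementation, and it assumes a UPnP device is present on the local network.

    import socket

    MSEARCH = (
        "M-SEARCH * HTTP/1.1\r\n"
        "HOST: 239.255.255.250:1900\r\n"
        'MAN: "ssdp:discover"\r\n'
        "MX: 2\r\n"
        "ST: ssdp:all\r\n\r\n"
    )

    def discover(timeout=3.0):
        # Send the multicast search and collect responses until the timeout expires.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        sock.settimeout(timeout)
        sock.sendto(MSEARCH.encode(), ("239.255.255.250", 1900))
        responses = []
        try:
            while True:
                data, addr = sock.recvfrom(65507)
                responses.append((addr, data.decode(errors="replace")))
        except socket.timeout:
            pass
        return responses

    if __name__ == "__main__":
        for addr, resp in discover():
            print(addr, resp.splitlines()[0])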
37.2.3 Environment Technology

Smart environments instrumented with a variety of sensors and actuators enable sensing and actuating the physical world, thereby allowing tight coupling of the physical and virtual computing worlds. Environmental technology is unique to and inherent in ubiquitous computing, bringing with it new challenges that do not have a place in distributed and mobile computing. Early work in the beginning of the 1990s focused on office environments with location awareness [Want et al., 1992] [Want et al., 1995]. The first indoor location system seems to be the Active Badge project at the Olivetti Research Lab from 1989 to 1992. This system is able to locate people through networked sensors that pick up IR beacons emitted by wearable badges. Later, it evolved into the Active Bat, the most fine-grained 3D location system [Addlesee et al., 2001], which can locate the 3D position of objects to within 3 cm for 95% of the readings. Users carry Active Bats measuring 8.0 cm × 4.1 cm × 1.8 cm, synchronized with a ceiling sensor grid using a wireless radio network. A particular Bat emits an ultrasonic pulse when instructed by an RF beacon, and the system measures the time-of-flight of the ultrasonic pulse to the ceiling receivers. Measured arrival times are conveyed to a central computer, which calculates the position of the Bat. The fine-grained accuracy enables interesting location-aware applications such as location maps of people in an office complex,
automatic call routing to the nearest phone, and the Follow-Me desktop application. The Follow-Me desktop follows a user in that the user's VNC desktop can be displayed on the nearest computer screen by clicking a button on the Bat.

Location systems can be classified into either positioning or tracking systems. With positioning systems, users receive location information from the environment and calculate their own position, whereas in tracking systems user locations are continuously tracked by a central computer. Various techniques are used to detect user locations, including infrared, ultrasound, radio frequency, physical contact, and even camera vision. Without requiring dedicated infrastructure to be deployed throughout the environment, some systems exploit cellular proximity, i.e., the signal strength and limited coverage of existing wireless communication infrastructure [Bahl and Padmanabhan, 2000]. For outdoor positioning, GPS can determine locations to within about 10 m in most areas, which is more accurate than most indoor location systems considering the large scale of outdoor features on the earth's surface. A comprehensive survey of location systems was conducted by Hightower and Borriello [2001].

Until now, most smart space trials have focused on the instrumentation of spatially limited office or home environments, resulting in systems of moderate scale. (The largest deployment reported is the Active Bat system, consisting of 200 Bats and 750 receiver units covering three floors totaling 10,000 ft² [Addlesee et al., 2001].) The environments were spatially limited and relatively static, so the infrastructure deployments were well planned via presurvey and offline calibration, with an emphasis on location determination. Because future environments may be unpredictable, unknown, and span heterogeneous, wide areas, they will be better sensed by taking various metrics collectively. To cope with this uncertainty, researchers are investigating issues such as massive-scale, autonomous, and distributed sensor networks that sense rich environmental information rather than simply location. The metrics may include light, temperature, acceleration, magnetic field, humidity, pressure, and acoustics [Estrin et al., 2002]. The possibility of new environmental technologies is being raised by recent advances in microelectromechanical systems (MEMS) and miniaturization. Researchers have been able to build a 1-in.-scale system that integrates processing, communication, and sensors, and expect to break the barrier of a 1-mm³ computer within 10 years [Warneke et al., 2001]. The new wireless sensor networks will find applications and uses that were previously not possible, e.g., being sprinkled in an ad hoc manner to monitor the weather, forests, and wildfires, as well as civil infrastructure including buildings, roads, and bridges. The most challenging issues of sensor networks are scalability and autonomy [Estrin et al., 2002]; unlike the early systems in which sensed information was collected and processed by a central computer, information in sensor networks ought to be preprocessed by the source or intermediate node(s) because of energy concerns and the uncertainty and dynamism of the environment. Because individual nodes have limited processing power and a limited view of the environment, the intermediate data may be sent to another node for further processing.
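To make the time-of-flight idea concrete, the following sketch estimates a 3D position from ultrasonic flight times to ceiling receivers at known positions by linearizing the range equations and solving them in a least-squares sense. The receiver layout and flight times are invented for illustration; the real Active Bat processing pipeline is considerably more involved.

    import numpy as np

    SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature

    def locate(receivers, times_of_flight):
        """Illustrative multilateration from ultrasonic times of flight to
        ceiling receivers mounted at a common height (coordinates in metres)."""
        p = np.asarray(receivers, dtype=float)            # shape (n, 3), equal z
        d = SPEED_OF_SOUND * np.asarray(times_of_flight)  # ranges in metres
        # Subtracting range equations cancels the quadratic terms, leaving a
        # linear system in x and y (z drops out because the receivers are coplanar).
        A = 2.0 * (p[1:, :2] - p[0, :2])
        b = d[0]**2 - d[1:]**2 + np.sum(p[1:, :2]**2, axis=1) - np.sum(p[0, :2]**2)
        xy, *_ = np.linalg.lstsq(A, b, rcond=None)
        # Recover the height from one range equation, taking the root below the ceiling.
        z = p[0, 2] - np.sqrt(max(d[0]**2 - np.sum((xy - p[0, :2])**2), 0.0))
        return np.append(xy, z)

    receivers = [(0, 0, 3), (4, 0, 3), (0, 4, 3), (4, 4, 3)]  # invented layout
    tofs = [0.0075, 0.0105, 0.0095, 0.0120]                   # invented flight times
    print(locate(receivers, tofs))                            # approx. [1.21, 1.50, 1.29]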
37.2.4 Software Technology

Early mobile and ubiquitous computing applications focused primarily on service mobility (i.e., device-independent service access) and context information processing. First, service mobility means services on any device, i.e., services must be able to be invoked on whatever devices happen to be with users at the moment. Services are negotiated and further adapted or transformed to the devices' capabilities [Klyne et al., 2002; Gimson et al., 2001]. In addition to device capability context information, ubiquitous computing applications must deal with various kinds of context information provided by a variety of sensors. Even for the same type of context, an application may have to switch to a different source. For example, an application will have to switch from a GPS location service to an indoor location service when a user enters a building. Thus, an important issue becomes achieving context awareness while separating the context-sensing portion from the rest of the application. The context widget [Dey et al., 1999] provides abstraction components that pass logical information to applications while hiding unnecessary specifics of raw information from physical sensors. This eliminates applications' dependency on a particular sensor type.
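A minimal sketch of this widget idea follows, with invented class names and placeholder sensor readings: the application asks an abstract widget for a location and can be rebound to an indoor source when the user enters a building, without touching its own location-handling logic.

    from abc import ABC, abstractmethod

    class LocationWidget(ABC):
        """Hides a particular location sensor behind a uniform interface."""
        @abstractmethod
        def current_location(self) -> tuple: ...

    class GPSWidget(LocationWidget):
        def current_location(self) -> tuple:
            # Placeholder for a real GPS driver call.
            return (29.6516, -82.3248)          # latitude, longitude

    class IndoorWidget(LocationWidget):
        def current_location(self) -> tuple:
            # Placeholder for an indoor (e.g., ultrasonic) location service.
            return ("Building A", "Room 312")

    def location_source(indoors: bool) -> LocationWidget:
        # The application swaps sources on entering a building without
        # changing any of the code that consumes location information.
        return IndoorWidget() if indoors else GPSWidget()

    print(location_source(indoors=True).current_location())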
Pervasive computing vision requires more revolutionary changes to the notion of applications, i.e., a new model of applications. An application should not be designed with regard to a rigid decomposition of interactions with users. Rather a task (i.e., an application component) must be described in high-level logic in order to be instantiated later on according to the logic. More specifically, considering the application will run in a dynamic and unknown environment, the task (and constituent subtasks) logic must be described to abstract away any specific details relating to user device capability, the environments, and available services [Banavar et al., 2000]. The problem of how to describe abstracted task interfaces relates to the service description issue discussed in the network technology subsection. An abstract description of application components (i.e., services) facilitates dynamic service discovery and composition. Components compatible according to the application’s task interface description and appropriate in the given context are integrated to synthesize an application. But the new model does not require all components to be loaded on a single device. The constituent modules spread through the ubiquitous environments, and an application is instantiated in just-in-time fashion (i.e., when needed). Thus the model of the distributed application components is characterized as “disappearing software,” application functionality partitioning over multiple devices that facilitates recomposition, migration, and adaptation [Want et al., 2002]. The real power of a ubiquitous computing application does not come solely from the application itself. Rather, it comes from an orchestration of the application, supporting services, and environments as a cooperative whole. The new model has numerous advantages. First, it allows for adaptation to dynamically changing environments and fault-tolerance. Applications autonomously respond to changes and failures by migrating affected components only. Second, it enables gradual evolution of the environments and an application itself without disrupting the function as a whole. Introduction of new devices does not affect the entire application directly, and better services can be incorporated into the application as they become available. Therefore, no need exists for global upgrade or the associated downtime. Finally, this approach enhances the scalability of software infrastructure by allowing it to be shared among multiple application instances [Banavar and Bernstein, 2002].
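A toy sketch of this just-in-time composition idea is shown below; the service names and task format are invented for illustration. A task lists the capabilities it needs in abstract terms, and whichever matching services the current environment happens to offer are bound when the task is instantiated.

    # Toy just-in-time composition: an abstract task names required capabilities,
    # and matching services from the current environment are bound at run time.
    available_services = {
        "display": lambda content: print(f"[wall display] {content}"),
        "speech":  lambda content: print(f"[speaker] {content}"),
    }

    task = {"step": "notify user", "needs": ["display", "speech"]}

    def instantiate(task, services):
        bound = [services[c] for c in task["needs"] if c in services]
        if not bound:
            raise RuntimeError("no suitable service in this environment")
        return bound

    for deliver in instantiate(task, available_services):
        deliver("Meeting starts in 5 minutes")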
37.2.5 Information Access Technology

In addition to the challenges brought forth by the ubiquity of computing, we face new requirements raised by the omnipresence of information as well. Ubiquitous information access means more than just ensuring that information is readily available under any condition; anytime, anywhere, any-device computing has been more or less a motto of distributed and mobile computing research for the past 30 years. Built on predecessors' achievements, ubiquitous information research topics include remote information access, fault tolerance, replication, failure recovery, mobile information access [Kistler and Satyanarayanan, 1992], adapted access such as proxies and transcoding [Noble et al., 1997], CSCW (Computer Supported Cooperative Work), software agents, data mining, and knowledge representation [Berners-Lee et al., 2001]. However, ubiquitous information access requires a new level of sophistication atop distributed and mobile computing: personalized, context-aware information systems equipped with intelligence to infer users' implicit intentions by capturing relevant contextual cues.

The whole process of information handling, i.e., information acquisition, indexing, and retrieval, should be augmented to handle implicit user needs. Acquired information ought to be organized and indexed according to relevant contextual information. For example, a smart meeting room gathers and stores all information regarding ongoing meetings from various sources, including meeting agendas and times, presentation slides, attendees, and relevant materials such as Web pages accessed during the meeting. To enable ubiquitous information access, the information access system first needs to establish useful contexts by sensing environments and monitoring users' activities. It may consider users' personal information and past access patterns, nearby people and objects, and the inferred social activities they are presently engaged in. This context metadata becomes an inherent part of the information set, enabling personalized information access.
For information retrieval, a user's query needs to be refined according to relevant context, and so does its result. Given the search query "pervasive computing," the system will return information relating to ubiquitous information access technology if it can infer that the user has recently been working on a mobile file system. It may, however, return information regarding wireless sensor nodes if the individual is a hardware engineer.
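The query-refinement example can be made concrete with a toy re-ranking step; the documents, tags, and user-context sets below are invented for illustration only.

    # Toy sketch: re-rank hits for "pervasive computing" using user context.
    DOCS = [
        {"title": "Mobile file systems for ubiquitous information access",
         "tags": {"information access", "mobile"}},
        {"title": "Wireless sensor node hardware design",
         "tags": {"hardware", "sensor networks"}},
    ]

    def contextual_rank(query_hits, user_context):
        # Boost documents whose tags overlap the user's recent activity.
        return sorted(query_hits,
                      key=lambda d: len(d["tags"] & user_context),
                      reverse=True)

    software_user = {"information access", "mobile"}
    hardware_user = {"hardware"}
    print(contextual_rank(DOCS, software_user)[0]["title"])
    print(contextual_rank(DOCS, hardware_user)[0]["title"])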
37.2.6 User Interface Technology

"Disappearing technology" implies the need for new modalities concerning input and output. The traditional WIMP (Windows, Icons, Menus, Pointing devices) user interface requires constant attention and often turns out to be an annoying experience. It will not allow computing to disappear from our consciousness. One enlightening example of a new modality is the "dangling string" [Weiser and Brown, 1996]. This string hangs from a ceiling and is somehow electrically connected to an Ethernet cable. More specifically, a motor rotates the string at different speeds proportional to traffic volume on the network. When traffic is light, a gentle waving movement is noted; the string whirls fast under heavy traffic, with a characteristic noise indicating the traffic volume. This aesthetically designed interface conveys certain information without moving into the foreground of our consciousness and overwhelming our attention, which may be opportune in certain situations. This interface can be compared to a fast-scrolling, dizzying screen that quickly dumps the details of traffic monitoring and analysis.

The primary challenges of the ubiquitous computing user interface are implicit input and distributed output, which are based on diverse modalities, as pointed out by Abowd et al. [2002]. Unobtrusive interfaces that employ appropriate modalities rather than a plain keyboard and display allow for natural, comfortable interactions with computers in everyday practices. This makes forgetting their existence possible, hence invisible computing. Input technology has been advancing toward modalities more natural for humans, going from text input to the desktop metaphor of GUI-based windowing systems and on to perceptual technologies such as handwriting, speech, and gesture recognition. Further advances in recognition technologies include face, gait, and biometric recognition such as fingerprint and iris scanning. Along with environmental sensing technology (including user identity, location, and other surrounding objects), the development of various input modalities raises possibilities of natural interactions to catalyze the disappearance of computing. For example, when we say "Open it" in front of a door or while pointing to it, our intention is inferred from multiple input sources; a primary interpretation of user intention via speech recognition can be refined by physical proximity sensing or image recognition techniques.

Three form factor scales, i.e., the inch, foot, and yard scale, were prototyped by the early Xerox ParcTab project [Weiser, 1993], and were called ParcTab, MPad, and Liveboard, respectively. These form factors remain prevalent in today's computing devices; the ParcTab is similar to today's PDAs in size, whereas current laptops and tablet PCs are at the MPad scale. Also, we can easily find high-resolution wall displays (the Liveboard-class devices) in an office environment or public place. Aside from the classical display interface, there are several trials of innovative output modes like the "dangling string." A true challenge is to coordinate multiple output devices to enhance user experiences with minimal user attention and cognitive effort. For instance, a presentation system can orchestrate appropriate output devices among wall displays, projectors, stereo speaker systems, and microphones at the right time.
While giving a presentation, if the presenter points to a slide containing a pie chart representing this year’s sales, a big table of detailed numbers, categorized by product classes, will be displayed on a wall display. An example of output coordination across multiple devices can be found in WebSplitter [Han et al., 2000].
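Returning to the "Open it" example above, a toy fusion step might resolve the ambiguous referent by intersecting the recognized spoken intent with proximity-sensed objects. The object names, actions, and distances below are invented for illustration.

    # Toy fusion of an ambiguous spoken command with proximity sensing.
    def resolve_target(spoken_intent, nearby_objects):
        """Pick the referent of a deictic command such as "Open it"."""
        candidates = [o for o in nearby_objects
                      if spoken_intent["verb"] in o["actions"]]
        # Prefer the object the user is closest to (or pointing at).
        return min(candidates, key=lambda o: o["distance_m"], default=None)

    intent = {"verb": "open", "object": "it"}
    objects = [
        {"name": "door-312",  "actions": {"open", "close"}, "distance_m": 0.8},
        {"name": "window-3",  "actions": {"open", "close"}, "distance_m": 4.2},
        {"name": "display-7", "actions": {"show"},          "distance_m": 1.1},
    ]
    print(resolve_target(intent, objects)["name"])   # door-312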
37.3 Ubiquitous Computing Systems Despite all efforts in the last decade to reach for the compelling vision, there are very few ubiquitous computing systems that have undergone significant field trials. The examples are Active Bat [Addlesee et al., 2001], Classroom 2000 [Abowd, 1999], Lancaster Guide System [Davies et al., 2001], and Matilda’s
Smart House [Helal et al., 2003]. The system descriptions below not only give insight into the essential and unique features of ubiquitous computing systems, but help identify gaps between state-of-the-art technologies and reality. As we will see in the descriptions, these systems focus on certain aspects of ubiquitous computing to dig into relevant technologies among the classification discussed in the previous section.
37.3.1 Active Bat

The first indoor location system, the Active Badge project (1989 to 1992), later evolved into the Active Bat at AT&T Laboratories Cambridge, the most fine-grained 3D location system. Users carry the Active Bat, measuring 8.0 cm × 4.1 cm × 1.8 cm, which consists of a radio transceiver, controlling logic, and an ultrasonic transducer. Ultrasonic receiver units mounted on the ceiling are connected by a wired daisy-chain network. A particular Bat emits an ultrasonic pulse when informed by an RF beacon from the cellular network base stations. Simultaneously, the covered ceiling-mounted receivers are reset via the wired network. The system then measures the time-of-flight of the ultrasonic pulse to the ceiling receivers. The measured arrival times are conveyed to a central computer, which calculates the 3D position of the Bat. In addition, the Bat has two buttons for input and a buzzer and two LEDs for output. The control button messages are sent to the system over the wireless cellular network.

The interaction with the environment is based on a model of the world constructed and updated from sensor data. Changes in real-world objects are immediately reflected by the corresponding objects of the model. Thus the model, i.e., a view of the environment shared by users and computers, allows applications to control and query the environment. For example, Figure 37.1 is a visualization of an office model that mirrors a user's perception of the environment. It displays users, workstations, telephones with an extension number, and furniture in the office. The map allows users to easily locate a colleague. They can also locate the nearest phone not in use and place a call by simply clicking the phone.

FIGURE 37.1 A model of an office environment. [From Addlesee, M., R. Curwen, S. Hodges, J. Newman, P. Steggles, A. Ward, and A. Hopper. Implementing a sentient computing system. IEEE Computer, Vol. 34, No. 8, August 2001, pp. 50–56.]

Another interesting application enabled by the fine-grained 3D location sensing is a virtual mouse panel. An information model sets a mouse panel space around several 50-in. display objects in the office building. When a Bat enters the space, its position is projected onto the panel and translated into mouse-pointer readings on the imaginary pad. The project concentrated on the environment and device categories of the technology classification in Section 37.2, which include environmental instrumentation and accurate 3D location sensing, the model of the real world shared by users and applications, and the formalization of spatial relationships
between the objects and 2D spaces around them. In addition, the project dealt with some issues of the software and user interface categories. For example, its programming support for location events based on the spatial relationship formalization enabled several interesting applications. Also, it was demonstrated that novel user interfaces such as the virtual mouse panel can be enabled when the Bat's limited processing and input/output capabilities are supplemented by environmental support.
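A sketch of the virtual mouse panel geometry, under simplifying assumptions, is shown below; the panel size, orientation vectors, and sample Bat position are invented for illustration. The Bat's 3D position is projected onto the plane of the display and scaled to pixel coordinates.

    import numpy as np

    def bat_to_pixels(bat_pos, panel_origin, panel_u, panel_v, resolution):
        """Project a 3D Bat position onto a display's virtual mouse panel.
        panel_u / panel_v are orthogonal unit vectors spanning the panel plane."""
        rel = np.asarray(bat_pos, float) - np.asarray(panel_origin, float)
        u = float(np.dot(rel, panel_u))   # metres across the panel
        v = float(np.dot(rel, panel_v))   # metres down the panel
        w, h = 1.0, 0.75                  # assumed panel size in metres
        px = int(np.clip(u / w, 0, 1) * (resolution[0] - 1))
        py = int(np.clip(v / h, 0, 1) * (resolution[1] - 1))
        return px, py

    print(bat_to_pixels((2.3, 1.1, 1.5), (2.0, 1.0, 1.8),
                        (1, 0, 0), (0, 0, -1), (1280, 1024)))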
37.3.2 Classroom 2000

The Classroom 2000 project at Georgia Institute of Technology captures the traditional lecture experience (teaching and learning) in an instrumented classroom environment. The Classroom 2000 system supports the automated capture, integration, and access of a multimedia record of a lecture. For example, to facilitate student reviews after a lecture, class presentation slides are annotated with automatically captured multimedia information such as audio, video, and URLs visited during the class. A student's personal notes can also be incorporated. In addition to whole-class playback, the system supports advanced access capabilities, e.g., clicking on a particular handwritten note gives access to the lecture at the time it was written. This project was a long-term, large-scale experiment that started in 1995. Its more than three-year project lifetime includes extensive trials of over 60 courses at Georgia Institute of Technology, nine courses at Kennesaw State University, GA, and a few others.

The classroom is instrumented with microphones and video cameras mounted to the ceiling. An electronic whiteboard (a 72-in. diagonal upright pen-based computer used to capture the instructor's presentation slides and notes) and two ceiling-mounted LCD projectors are connected to the classroom network reaching the seats of 40 students. After class, the instructor makes the lectures available for review by the students using any networked computer. Also, students may use a hand-held tablet computer in the classroom to take their private notes, which will be automatically consolidated into the lecture during the integration phase.

Figure 37.2 is an example of class notes taken by the Classroom 2000 system. The class progression is indicated on the left by a time line decorated with activities during the class. The left decoration frame includes covered slides and URLs visited in the lecture, providing an overview of the lecture to facilitate easy browsing. Clicking on a decoration causes either the slide to be displayed in the right frame or the URL page to be opened in a separate browser. The figure also shows a presentation slide annotated with the instructor's handwritten notes on the electronic whiteboard.

FIGURE 37.2 Access tool user interface. [From Abowd, G. D. Classroom 2000: an experiment with the instrumentation of a living educational environment. IBM Systems Journal, Vol. 38, No. 4, 1999, pp. 508–530.]

With great emphasis given to the information access and user interface technology categories, the project used existing technologies for the device, network, environment, and software categories. The classroom was instrumented with standard classroom equipment, an electronic whiteboard, and ordinary client and server machines connected to the classroom network. The primary concern was facilitation of automatic capture and easy access later on, i.e., whether all significant activities were captured with time-stamp information and whether the captured multimedia were integrated and indexed based on the time stamps for later class reviews. For example, speech recognition software was used to generate a time-stamped transcript of the lecture, to be used for keyword search. The search result pointed to the moment of the lecture when the keyword was spoken.
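The time-stamp-based access model can be illustrated with a toy keyword lookup over a transcript (the transcript entries are invented); each hit's timestamp becomes an entry point into the recorded lecture.

    # Toy keyword lookup over a time-stamped transcript: hit times (in seconds)
    # become entry points into the recorded lecture.
    transcript = [
        (125.4, "today we introduce binary search trees"),
        (980.2, "rotations keep an AVL tree balanced"),
        (1410.7, "next lecture covers hashing"),
    ]

    def seek_points(keyword, transcript):
        return [t for t, text in transcript if keyword.lower() in text.lower()]

    print(seek_points("tree", transcript))   # [125.4, 980.2] -> jump playback here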
37.3.3 Lancaster Guide System

The goal of the Lancaster Guide project, which started in 1997, was to develop and deploy a context-sensitive tourist guide for visitors to the city of Lancaster, U.K. In 1999, a field trial involving the general public was conducted to gain practical experience and a further understanding of the requirements of ubiquitous computing applications. The Lancaster Guide system provides a customized tour guide for visitors considering their current location, preferences, and environmental context. For example, an architect may be more interested in old buildings and castles than in a stroll along a river bank, whereas second-time visitors will probably be interested in the places that they missed during their last visit.
The system also considers the visitors' languages, financial budgets, time constraints, and local weather conditions. Approximate user location is determined by a cell of the 802.11 wireless network deployed over the city. A cell consists of a cell server machine and several base stations. As the end-user system, a tablet-based PC (Fujitsu TeamPad 7600 with a transflective 800 × 600 pixel display) equipped with a 2-Mbps WaveLAN card communicates with the cell server via the base stations. The cell server periodically broadcasts location beacons containing a location identifier, and also caches Web pages frequently accessed by users in the cell. The cached pages are periodically broadcast to the cell and are in turn cached by the end-user systems. When the cell server is informed of a cache miss on the client devices, it will add the missed page to its broadcast schedule for the next dissemination cycle. The information broadcast and caching reduce information-access latency within the cell.

The Guide system's geographical information model associates a location with a set of Web pages. The Web pages presented to the visitors are dynamically created to reflect their context and preferences. Special Guide tags are used to indicate their context and preferences in the template HTML pages. In other words, the special tags (i.e., hooks between the HTML pages and the information model) are expanded into personalized information pieces. Figure 37.3 shows a dynamically created tour guide page listing attractions based on proximity to a user's current location and the attractions' business hours.

FIGURE 37.3 Lancaster Guide System. [From Davies, N., K. Cheverst, K. Mitchell, and A. Efrat. Using and determining location in a context-sensitive tour guide. IEEE Computer, Vol. 34, No. 8, August 2001, pp. 35–41.]

To provide context-sensitive tour information to visitors, the project focused on the information access, context-aware user interface, and environment technology categories with regard to the technology classification. The 802.11 base stations over the city point to a location that is associated with a set of Web pages linked to the geographical information model. Based on the location information along with
other context information, tailor-made HTML guide pages are dynamically created and presented to tourists. However, the system's ability to recommend a custom tour depends on the preference information a visitor enters when picking up a user unit at the tour information center.
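A minimal sketch of the Guide-tag idea follows, with invented tag names, template, and context data rather than the project's actual markup: special tags in a template page are replaced with personalized content derived from the visitor's context.

    import re

    TEMPLATE = ("<p>Welcome to <G-LOCATION/>. "
                "Nearby attractions open now: <G-ATTRACTIONS/></p>")

    def expand(template, context):
        # Expand each hypothetical Guide tag from the visitor's current context.
        tags = {
            "G-LOCATION": context["location"],
            "G-ATTRACTIONS": ", ".join(
                a["name"] for a in context["attractions"]
                if a["open"] and a["distance_m"] < 300),
        }
        return re.sub(r"<(G-[A-Z]+)/>", lambda m: tags[m.group(1)], template)

    ctx = {"location": "Lancaster Castle",
           "attractions": [
               {"name": "Priory Church",   "open": True,  "distance_m": 150},
               {"name": "Maritime Museum", "open": False, "distance_m": 900}]}
    print(expand(TEMPLATE, ctx))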
37.3.4 Matilda’s Smart House Matilda’s Smart House project is a relatively recent effort launched in 2002 to innovate pervasive applications and environments specifically designed to support the elderly, i.e., Matilda, in the RERC — Pervasive Computing Laboratory at the University of Florida. The project explores the use of emerging smart phone and other wireless technologies to create “magic wands”, which enable elder persons with disabilities to interact with, monitor, and control their surroundings. By integrating the smart phone with smart environments, elders are able to turn appliances on and off, check and change the status of locks on doors and windows, aid in grocery shopping, and find other devices such as car keys or a TV remote. It also explores the use of smart phones as devices that can proactively provide advice such as reminders to take medications or call in prescription drug refills automatically. Figure 37.4 is the top view of Matilda’s Smart House consisting of a kitchen, living room, bathroom, and bedroom from left to right. Comparing the smart environment instrumentation with the early ubiquitous computing systems reveals computing technology advances. For the Xerox ParcTab project, every piece of the system needed to be hand-crafted from scratch, including computing devices, sensors, network, and user interface, because they were not available at that time. In contrast, many commercial off-the-shelf products readily available on the market are used for the Matilda’s Smart House project. Included are J2ME smart phones as user devices, ultrasonic receivers in the four corners of the mockup house, X-10 controlled devices (door, mailbox, curtain, lamp, and radio), and networked devices
(microwave, fridge, LCD displays on the wall, and cameras).

FIGURE 37.4 Matilda's Smart House.

The OSGi (Open Service Gateway initiative) framework is adopted as the base software infrastructure facilitating the smart house resident's interactions with the smart environments. It also supports remote monitoring and administration by family members and caregivers. Building the smart space from COTS components freed the project team to focus on the integration of the smart phone with the smart environments and on various pervasive computing applications [Helal et al., 2003]. For example, medication reminders are provided on the LCD display Matilda is facing, with her orientation and location sensed by the smart house. An audio warning is issued if she picks up the wrong medicine bottle, which is detected by a barcode scanner attached to her phone. Also, Matilda can use her phone to open the door for the delivery of automatically requested prescription drug refills.

The project spans the environment, device, network, and user interface technology categories. The smart environments are built using various networked appliances, X-10 devices, sensors, and an ultrasonic location system. J2ME smart phones are utilized as the user device to control and query the environment. Some user interface issues such as multiple modality and output device coordination are also addressed. Several pervasive computing applications enabled through the project involve service discovery and interaction issues.
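A toy version of the wrong-bottle check described above is sketched below; the barcodes, schedule, and messages are invented for illustration. The scanned barcode is compared against the medication due at the current time, and a warning is raised on a mismatch.

    import datetime

    SCHEDULE = {  # hour of day -> barcode of the medication due at that time
        8:  "036000291452",   # hypothetical morning medication
        20: "041220576123",   # hypothetical evening medication
    }

    def check_scan(barcode, now=None):
        now = now or datetime.datetime.now()
        due = SCHEDULE.get(now.hour)
        if due is None:
            return "no medication due now"
        return "ok, take it" if barcode == due else "WARNING: wrong bottle"

    print(check_scan("041220576123", datetime.datetime(2004, 8, 4, 20, 15)))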
37.4 Conclusion

Five major university projects have been exploring pervasive computing issues, including the Oxygen project at MIT (http://oxygen.lcs.mit.edu); the Endeavour project at the University of California, Berkeley (http://endeavour.cs.berkeley.edu); the Aura project at Carnegie Mellon University (http://www.cs.cmu.edu/~aura); the Portolano project at the University of Washington (http://portolano.cs.washington.edu); and the Infosphere project at Georgia Tech and Oregon Graduate Institute (http://www.cc.gatech.edu/projects/infosphere). These large-scale projects are investigating comprehensive issues of pervasive computing, so their scope is much broader than industry projects such as PIMA at
IBM Research (http://www.research.ibm.com/PIMA), Cooltown at HP (http://www.cooltown.hp.com), and the Easy Living Project at Microsoft Research (http://research.microsoft.com/easyliving). Despite its compelling vision and enormous efforts from academia and industry during the past 10 years, the pervasive computing world does not seem close to us yet. Only a few ubiquitous computing systems were deployed and evaluated in the real world on a realistic scale. The changes and impacts pervasive computing systems will bring are not yet thoroughly understood. There were privacy and security controversies sparked by early deployment of pervasive computing systems, and user groups were reluctant to use those systems. The more information the system knows about an individual, the better he or she will be served. To what extent should personal information be allowed to enter the system and how can it prevent the information from being abused? These are questions that need to be answered for wide acceptance of the pervasive computing technologies. Besides, an introduction of new technologies may cause unexpected responses and confusions as seen in “The Pied Piper of Concourse C” behavior [Jessup and Robey, 2002]. People need time to develop notions of appropriate behaviors and new practices to adapt themselves for new technologies. In order to gain wide acceptance of pervasive computing systems, nontechnical factors such as social, economical, and legal issues should not be underestimated. A deeper understanding of these issues can be developed through real field trials out of laboratories, and practical experiences and feedback from the experiments will redefine pervasive computing systems.
References

Abowd, G. D. Classroom 2000: an experiment with the instrumentation of a living educational environment. IBM Systems Journal, Vol. 38, No. 4, 1999, pp. 508–530.
Abowd, G. D., E. D. Mynatt, and T. Rodden. The human experience. IEEE Pervasive Computing, Vol. 1, No. 1, January–March 2002, pp. 48–57.
Addlesee, M., R. Curwen, S. Hodges, J. Newman, P. Steggles, A. Ward, and A. Hopper. Implementing a sentient computing system. IEEE Computer, Vol. 34, No. 8, August 2001, pp. 50–56.
Agarwal, A. Raw computation. Scientific American, Vol. 281, No. 2, August 1999.
Bahl, P. and V. Padmanabhan. RADAR: An In-Building RF-based User Location and Tracking System. In Proceedings of the IEEE Infocom 2000, March 2000, pp. 775–784.
Banavar, G., J. Beck, E. Gluzberg, J. Munson, J. Sussman, and D. Zukowski. Challenges: An Application Model for Pervasive Computing. In Proceedings of the 6th ACM/IEEE International Conference on Mobile Computing and Networks (MobiCom'00), August 2000, pp. 266–274.
Banavar, G. and A. Bernstein. Software infrastructure and design challenges for ubiquitous computing applications. Communications of the ACM, Vol. 45, No. 12, December 2002, pp. 92–96.
Berners-Lee, T., J. Hendler, and O. Lassila. The semantic web. Scientific American, Vol. 284, No. 5, May 2001.
Campbell, A. T. and J. Gomez-Castellanos. IP micro-mobility protocols. ACM SIGMOBILE Mobile Computing and Communications Review (MC2R), Vol. 4, No. 4, October 2001, pp. 42–53.
Cheshire, S., B. Aboba, and E. Guttman. Dynamic Configuration of IPv4 Link-Local Addresses. IETF Internet Draft, August 2002.
Davies, N., K. Cheverst, K. Mitchell, and A. Efrat. Using and determining location in a context-sensitive tour guide. IEEE Computer, Vol. 34, No. 8, August 2001, pp. 35–41.
Dey, A. K., G. D. Abowd, and D. Salber. A Context-Based Infrastructure for Smart Environments. In Proceedings of the 1st International Workshop on Managing Interactions in Smart Environments (MANSE'99), December 1999, pp. 114–128.
Estrin, D., D. Culler, K. Pister, and G. Sukhatme. Connecting the physical world with pervasive networks. IEEE Pervasive Computing, Vol. 1, No. 1, January–March 2002, pp. 59–69.
Gimson, R., S. R. Finkelstein, S. Maes, and L. Suryanarayana. Device Independence Principles. http://www.w3.org/TR/di-princ/, September 2001.
Guttag, J. V. Communication chameleons. Scientific American, Vol. 281, No. 2, August 1999.
Guttman, E., C. Perkins, J. Veizades, and M. Day. Service Location Protocol, Version 2. IETF RFC 2608, June 1999.
Han, R., V. Perret, and M. Naghshineh. WebSplitter: A Unified XML Framework for Multi-Device Collaborative Web Browsing. In Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work (CSCW 2000), December 2000, pp. 221–230.
Hattig, M. Zeroconf Requirements. IETF Internet Draft, March 2001.
Helal, S., B. Winkler, C. Lee, Y. Kaddoura, L. Ran, C. Giraldo, S. Kuchibhotla, and W. Mann. Enabling Location-aware Pervasive Computing Applications for the Elderly. In Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom 2003), March 2003, pp. 531–536.
Hightower, J. and G. Borriello. Location systems for ubiquitous computing. IEEE Computer, Vol. 34, No. 8, August 2001, pp. 57–66.
Intanagonwiwat, C., R. Govindan, and D. Estrin. Directed Diffusion: A Scalable and Robust Communication Paradigm for Sensor Networks. In Proceedings of the 6th ACM/IEEE International Conference on Mobile Computing and Networks (MobiCom'00), August 2000, pp. 56–67.
Jessup, L. M. and D. Robey. The relevance of social issues in ubiquitous computing environments. Communications of the ACM, Vol. 45, No. 12, December 2002, pp. 88–91.
Kistler, J. J. and M. Satyanarayanan. Disconnected operation in the Coda File System. ACM Transactions on Computer Systems, Vol. 10, No. 1, February 1992, pp. 3–25.
Klyne, G., F. Reynolds, C. Woodrow, H. Ohto, and M. H. Butler. Composite Capabilities/Preference Profiles: Structure and Vocabularies. http://www.w3.org/TR/2002/WD-CCPP-struct-vocab-20021108, November 2002.
Noble, B. D., M. Satyanarayanan, D. Narayanan, J. E. Tilton, J. Flinn, and K. R. Walker. Agile Application-Aware Adaptation for Mobility. In Proceedings of the 16th ACM Symposium on Operating Systems Principles, October 1997, pp. 276–287.
Parry, R. Overlooking 3G. IEEE Potentials, Vol. 21, No. 4, October/November 2002, pp. 6–9.
Perkins, C. E. IP Mobility Support. IETF RFC 2002, October 1996.
Satyanarayanan, M. Pervasive computing: vision and challenges. IEEE Personal Communications, Vol. 8, No. 4, August 2001, pp. 10–17.
Stemm, M. and R. H. Katz. Vertical handoffs in wireless overlay networks. ACM Mobile Networking (MONET), Vol. 3, No. 4, 1998, pp. 335–350.
Sun Microsystems. Jini Technology Architectural Overview. http://www.sun.com/jini/whitepapers/architecture.html, January 1999.
Thomson, S. and T. Narten. IPv6 Stateless Address Autoconfiguration. IETF RFC 1971, August 1996.
UPnP Forum. Universal Plug and Play Device Architecture. http://www.upnp.org/download/UPnPDA10_20000613.htm, June 2000.
Want, R., A. Hopper, V. Falcao, and J. Gibbons. The active badge location system. ACM Transactions on Information Systems, Vol. 10, No. 1, January 1992, pp. 91–102.
Want, R., T. Pering, G. Borriello, and K. I. Farkas. Disappearing hardware. IEEE Pervasive Computing, Vol. 1, No. 1, January–March 2002, pp. 36–47.
Want, R., B. N. Schilit, N. I. Adams, R. Gold, K. Petersen, D. Goldberg, J. R. Ellis, and M. Weiser. An overview of the ParcTab ubiquitous computing experiment. IEEE Personal Communications, Vol. 2, No. 6, December 1995, pp. 28–43.
Warneke, B., M. Last, B. Liebowitz, and K. S. J. Pister. Smart dust: communicating with a cubic-millimeter computer. IEEE Computer, Vol. 34, No. 1, January 2001, pp. 44–51.
Weiser, M. The computer for the 21st century. Scientific American, Vol. 265, No. 3, September 1991, pp. 94–104.
Weiser, M. Some computer science issues in ubiquitous computing. Communications of the ACM, Vol. 36, No. 7, July 1993, pp. 75–84.
Weiser, M. and J. S. Brown. The coming age of calm technology. PowerGrid Journal, Version 1.01, July 1996.
38 Worldwide Computing Middleware
Gul A. Agha and Carlos A. Varela

CONTENTS
Abstract
38.1 Middleware
  38.1.1 Asynchronous Communication
  38.1.2 Higher-Level Services
  38.1.3 Virtual Machines
  38.1.4 Adaptability and Reflection
38.2 Worldwide Computing
  38.2.1 Actor Model
  38.2.2 Language and Middleware Infrastructure
  38.2.3 Universal Actor Model and Implementation
  38.2.4 Middleware Services
  38.2.5 Universal Naming
  38.2.6 Remote Communication and Mobility
  38.2.7 Reflection
38.3 Related Work
  38.3.1 Worldwide Computing
  38.3.2 Languages for Distributed and Mobile Computation
  38.3.3 Naming Middleware
  38.3.4 Remote Communication and Migration Middleware
  38.3.5 Adaptive and Reflective Middleware
38.4 Research Issues and Summary
38.5 Further Information
38.6 Glossary
Acknowledgments
References
Abstract The Internet provides the potential for utilizing enormous computational resources that are globally distributed. Such worldwide computing can be facilitated by middleware — software layers that deal with distribution and coordination, such as naming, mobility, security, load balancing, and fault tolerance. This chapter describes the World-Wide Computer, a worldwide computing infrastructure that enables distribution and coordination. We argue that the World-Wide Computer enables application developers to concentrate on their domain of expertise, reducing code and complexity by orders of magnitude.
38.1 Middleware The wide variety of networks, devices, operating systems, and applications in today's computing environment creates the need for abstraction layers to help developers manage the complexity of engineering
distributed software. A number of models, tools, and architectures have evolved to address the composition of objects into larger systems. Some of the widely used middleware, ranging in support from basic communication infrastructure to higher-level services, includes CORBA [Object Management Group, 1997], DCOM [Brown and Kindel, 1996], Java RMI [Sun Microsystems and JavaSoft, 1996], and more recently Web Services [Curbera et al., 2002]. Middleware abstracts over operating systems, data representations, and distribution issues, enabling developers to program distributed heterogeneous systems largely as though they were programming a homogeneous environment. Many middleware systems accomplish this transparency by enabling heterogeneous objects to communicate with each other. Because the communication model for object-oriented systems is synchronous method invocation, middleware typically attempts to give programmers the illusion of local method invocation when they invoke remote objects. The middleware layers are in charge of low-level operations, such as marshaling and unmarshaling arguments to deal with heterogeneity, and managing separate threads for network communication. Middleware toolkits provide compilers capable of creating code for the client (aka stub) and server (aka skeleton) components of objects providing remote application services, given their network-unaware implementation. Intermediate brokers help establish interobject communication and provide higher-level services, such as naming, event, and life-cycle services.
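To make the stub-and-skeleton pattern concrete, the following minimal sketch uses Java RMI, one of the middleware systems named above. The Quote interface, the QuoteService name, and the local registry are illustrative assumptions only; a real deployment would add security policies and error handling.

    import java.rmi.Naming;
    import java.rmi.Remote;
    import java.rmi.RemoteException;
    import java.rmi.registry.LocateRegistry;
    import java.rmi.server.UnicastRemoteObject;

    // Remote interface shared by client and server; the middleware-generated
    // stub implements it on the client side and forwards calls over the network.
    interface Quote extends Remote {
        String price(String symbol) throws RemoteException;
    }

    // Server side ("skeleton" side): the implementation is network unaware.
    class QuoteServer implements Quote {
        public String price(String symbol) { return symbol + ": 42.0"; }

        public static void main(String[] args) throws Exception {
            LocateRegistry.createRegistry(1099);                    // simple naming service
            Quote stub = (Quote) UnicastRemoteObject.exportObject(new QuoteServer(), 0);
            Naming.rebind("rmi://localhost/QuoteService", stub);    // register the service
        }
    }

    // Client side: obtains a stub from the naming service and invokes it as if local;
    // marshaling, unmarshaling, and network threads are handled by the middleware.
    class QuoteClient {
        public static void main(String[] args) throws Exception {
            Quote q = (Quote) Naming.lookup("rmi://localhost/QuoteService");
            System.out.println(q.price("ACME"));
        }
    }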
38.1.1 Asynchronous Communication In order to give the illusion of local method invocation when invoking remote objects, a calling object is blocked — waiting for a return value from a remote procedure or method call. When the value returns, the object resumes execution (see Figure 38.1). This style of communication is called remote procedure call, or RPC. Users of middleware systems realized early on that network latencies make RPC inherently much slower than local communication, and that extensive use of RPC can be prohibitively expensive for overall application performance. Transparency of communication may thus be a misleading design principle [Waldo et al., 1997]. Asynchronous (aka event-based) communication services enable objects to communicate in much more flexible ways [Agha, 1986]. For example, the result of invoking a method may be redirected to a third party or customer, rather than going back to the original method caller. Moreover, the target object need not synchronize with the sender in order to receive the message, thus retaining greater scheduling flexibility and reducing the possibilities of deadlocks. Because asynchronous communication creates the need for intermediate buffers, higher-level communication mechanisms can be defined without significant additional overhead. For example, one can define a communication mechanism that enables objects to communicate with peers without knowing in advance the specific target for a given message. One such model is a shared memory abstraction used
FIGURE 38.1 Synchronous communication semantics requires the sender of a message to block and wait until the receiver has processed the message and returned a value. In this example, the message processing takes only 7 time units, whereas the network communication takes 10 time units.
in Linda [Carriero and Gelernter, 1990]. Linda uses a shared tuple-space from which different processes (active objects) read and write. Another communication model is ActorSpaces [Callsen and Agha, 1994]; in ActorSpaces, actors use name patterns for directing messages to groups of objects (or a representative of a group). This enables secure communication that is transparent for applications. A more open but restrictive mechanism, called publish-and-subscribe [Banavar et al., 1999], has been used more recently. In publish-and-subscribe, set membership can be explicitly modified by application objects without pattern matching by an ActorSpace manager.
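As a rough illustration of the publish-and-subscribe style just described, the sketch below shows a toy in-memory broker in Java. It is not the ActorSpaces or Gryphon API; the Broker and Subscriber names are assumptions, and a real service would add persistence, filtering, and delivery guarantees.

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.CopyOnWriteArrayList;

    // Subscribers register interest in a topic; publishers need not know who,
    // or how many, will receive a message.
    class Broker {
        interface Subscriber { void receive(String topic, Object message); }

        private final Map<String, List<Subscriber>> topics = new ConcurrentHashMap<>();

        void subscribe(String topic, Subscriber s) {
            topics.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(s);
        }

        void unsubscribe(String topic, Subscriber s) {
            topics.getOrDefault(topic, new CopyOnWriteArrayList<>()).remove(s);
        }

        // Delivery is asynchronous from the publisher's point of view: the call
        // returns without waiting for subscribers to process the message.
        void publish(String topic, Object message) {
            for (Subscriber s : topics.getOrDefault(topic, new CopyOnWriteArrayList<>()))
                new Thread(() -> s.receive(topic, message)).start();
        }
    }

Note how set membership is modified explicitly through subscribe and unsubscribe calls by the application objects themselves, which is the distinction the text draws between publish-and-subscribe and pattern-matched ActorSpaces.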
38.1.2 Higher-Level Services Besides communication, middleware systems provide high-level services to application objects. Such high-level services include, for example, object naming, lifecycle, concurrency, persistence, transactional behavior, replication, querying, and grouping [Object Management Group, 1997]. We describe these services to illustrate what services middleware may be used to provide. A naming service is in charge of providing object name uniqueness, allocation, resolution, and location transparency. Uniqueness is a critical condition for names so that objects can be uniquely found, given their name. This is often accomplished using a name context. Object names should be object location independent so that objects can move, preserving their name. A global naming context supports a universal naming space, in which context-free names are still unique. The implementation of a naming service can be centralized or distributed; distributed implementations are more fault-tolerant but create additional overhead. A life-cycle service is in charge of creating new objects, activating them on demand, moving them, and disposing of them, based on request patterns. Objects consume resources and therefore cannot be kept on systems forever — in particular, memory is often a scarce shared resource. Life-cycle services can create objects when new resources become available, can deactivate an object — storing its state temporarily in secondary memory — when it is not being actively used and its resources are required by other objects or applications, and can also reactivate the object, migrate it, or can dispose of (garbage collect) it if there are no more references to it. A concurrency service provides limited forms of protection against multiple threads sharing resources by means of lock management. The service may enable application threads to request exclusive access to an object’s state, read-only access, access to a potentially dirty state, and so on, depending on concurrency policies. Programming models such as actors provide higher-level support for concurrency management, preventing common errors, such as corrupted state or deadlocks, which can result from the use of a concurrency service. A persistence or externalization service enables applications to store an object’s state in secondary memory for future use, e.g., to provide limited support for transient server failures. This service, even though high level, can be used by other services such as the life-cycle service described above. A transactional service enables programming groups of operations with atomicity, consistency, isolation, and durability guarantees. Advanced transactional services may contain support for various forms of transactions such as nested transactions and long-lived transactions. A replication service improves locality of access for objects by creating multiple copies at different locations. In the case of objects with mutable state, a master replica is often used to ensure consistency with secondary replicas. In the case of immutable objects, cloning in multiple servers is virtually unrestricted. A query service enables manipulating databases with object interfaces using highly declarative languages such as SQL or OQL. Alternative query services may provide support for querying semistructured data such as XML repositories. A grouping service supports the creation of interrelated collections of objects, with different ordering and uniqueness properties, such as sets and lists. 
Different object collections provided by programming language libraries have similar functionality, albeit restricted to a specific programming language.
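A naming service of the kind described above can be summarized by a small contract. The sketch below is an assumption-laden illustration, not a CORBA or Jini interface: the method names and the in-memory implementation are hypothetical, and a production service would be replicated or distributed for fault tolerance, as the text notes.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Location-independent name -> current locator. Rebinding supports mobility:
    // the name stays fixed while the location changes.
    interface NamingService {
        void bind(String name, String locator);
        void rebind(String name, String locator);   // used when an object moves
        String lookup(String name);                 // resolve a name to its current locator
        void unbind(String name);
    }

    // A centralized, in-memory implementation of the contract above.
    class InMemoryNamingService implements NamingService {
        private final Map<String, String> table = new ConcurrentHashMap<>();
        public void bind(String name, String locator)   { table.putIfAbsent(name, locator); }
        public void rebind(String name, String locator) { table.put(name, locator); }
        public String lookup(String name)               { return table.get(name); }
        public void unbind(String name)                 { table.remove(name); }
    }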
38.1.3 Virtual Machines Although CORBA’s approach to heterogeneity is to specify interactions among object request brokers to deal with different data representations and object services, an alternative approach is to hide hardware and operating system heterogeneity under a uniform virtual machine layer (e.g., see Lindholm and Yellin [1997]). The virtual machine approach provides certain benefits but also has its limitations [Agha et al., 1998]. The main benefit of a virtual machine is platform independence, which enables safe remote code execution and dynamic program reconfiguration through bytecode verification and run-time object migration [Varela and Agha, 2001]. In principle, the virtual machine approach is programming language independent because it is possible to create bytecode from different high-level programming languages. In practice, however, bytecode verification and language safety features may prevent compiling arbitrary code in unsafe languages, such as C and C++, into Java bytecodes without using loopholes such as the Java native interface, which break the virtual machine abstraction. The main limitations of the pure virtual machine approach are bytecode interpretation overhead in program execution and the inability to control heterogeneous resources as required in embedded and real-time systems. Research on just-in-time and dynamic compilation strategies has helped overcome the virtual machine bytecode execution performance limitations [Krall, 1998]. Open and extensible virtual machine specifications attempt to enable the development of portable real-time systems satisfying hard scheduling constraints and embedded systems with control loops for actuation [OVM Consortium, 2002; Schmidt et al., 1997; Bollela et al., 2000].
38.1.4 Adaptability and Reflection Next-generation distributed systems will need to satisfy varying levels of quality of service, dynamically adapt to different execution environments, provide well-founded failure semantics, have stringent security requirements, and be assembled on-the-fly from heterogeneous components developed by multiple service providers. Adaptive middleware [Agha, 2002] will likely prove to be a fundamental stepping stone to building next-generation distributed systems. Dynamic run-time customization can be supported by a reflective architecture. A reflective middleware provides a representation of its different components to the applications running on top of it. An application can inspect this representation and modify it. The modified services can be installed and immediately mirrored in further execution of the application (see Figure 38.2). We will describe the reflective model of actors in the next section.
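The reify/reflect cycle of Figure 38.2 can be sketched in a few lines. The method names below mirror the figure rather than any real middleware API, and the Component interface is a stand-in for whatever internal service (scheduler, transport, and so on) an application might want to customize.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // A replaceable middleware component, e.g., a message scheduler.
    interface Component { void handle(Object event); }

    // Reflective middleware: applications can inspect the installed components
    // (reify) and install modified ones (reflect); later execution uses the new ones.
    class ReflectiveMiddleware {
        private final Map<String, Component> components = new ConcurrentHashMap<>();

        Map<String, Component> reify() {                   // expose a self-representation
            return Map.copyOf(components);
        }
        void reflect(String name, Component replacement) { // customize behavior at run time
            components.put(name, replacement);
        }
        void dispatch(String name, Object event) {         // normal execution path
            components.get(name).handle(event);
        }
    }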
38.2 Worldwide Computing Worldwide computing research addresses problems in viewing dynamic networked distributed resources as a coordinated global computing infrastructure. We have developed a specific actor-based worldwide computing infrastructure, the World-Wide Computer (WWC), that provides naming, mobility, and coordination middleware, to facilitate building widely distributed computing systems over the Internet. Worldwide computing applications view the Internet as an execution environment. Because Internet nodes can join and leave a computation at runtime, the middleware infrastructure needs to provide dynamic reconfiguration capabilities: In other words, an application needs to be able to decompose and recompose itself while running, potentially moving its subcomponents to different network locations.
38.2.1 Actor Model In traditional object-oriented systems, the interrelationship between objects — as state containers, as threads, and as process abstractions — is highly intertwined. For example, in Java [Gosling et al., 1996], multiple threads may be concurrently accessing an object, creating the potential for state corruption. A class can declare all its member variables to be private and all its methods to be synchronized to prevent
FIGURE 38.2 Using reflection, an application can inspect and modify middleware components. (The figure shows the application obtaining an image of the middleware components by reifying them, and installing modified components by reflecting them back into the middleware.)
state corruption because of multiple concurrent thread accesses. However, this practice is inefficient and creates potential for deadlocks (see, e.g., [Varela and Agha, 1998]). Other languages, such as C++, do not even have a concurrency model built in, requiring developers to use thread libraries. Such passive object computation models severely limit application reconfigurability. Moving an object in a running application to a different computer requires guaranteeing that active threads within the object remain consistent after object migration and also requires very complex invocation stack migration, ensuring that references remain consistent and that any locks held by the thread are safely released. The actor model of computation is a more natural approach to application reconfigurability because an actor is an autonomous unit abstracting over state encapsulation and state processing. Actors can only communicate through asynchronous message passing and do not share any memory. As a consequence, actors provide a very natural unit of mobility and application reconfigurability. Actors also provide a unit of concurrency by processing one message at a time. Migrating an actor is then as simple as migrating its encapsulated state along with any buffered unprocessed messages. Reconfiguring an application composed of multiple actors is as simple as migrating a subset of the actors to another computer. Because communication is asynchronous and buffered, the application semantics remains the same as long as actor names can be guaranteed to be unique across the Internet. The universal actor model is an extension of the actor model, providing actors with a specific structure for universal names.
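A minimal actor along the lines just described can be sketched in Java: encapsulated state, a mailbox, and a single loop that processes one message at a time, so that migration amounts to serializing the state together with the buffered messages. This is an illustrative sketch, not the SALSA runtime; the class and method names are assumptions.

    import java.io.Serializable;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // An actor encapsulates its state and a mailbox; only its own loop thread
    // touches the state, so no locks are needed and migration is straightforward.
    abstract class Actor implements Serializable {
        private final BlockingQueue<Object> mailbox = new LinkedBlockingQueue<>();
        private transient Thread loop;                        // threads are never serialized

        void send(Object message) { mailbox.offer(message); } // asynchronous, buffered

        void start() {
            loop = new Thread(() -> {
                try {
                    while (true) process(mailbox.take());      // one message at a time
                } catch (InterruptedException e) { /* actor stopped, e.g., for migration */ }
            });
            loop.start();
        }

        void stopForMigration() { loop.interrupt(); }          // state + unprocessed mailbox travel together

        protected abstract void process(Object message);
    }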
38.2.2 Language and Middleware Infrastructure Several libraries that support the Actor model of computation have been implemented in different object-oriented languages. Three examples of these are the Actor Foundry [Open Systems Lab, 1998], Actalk [Briot, 1989], and Broadway [Sturman, 1996]. Such libraries essentially provide high-level middleware services, such as universal naming, communication, scheduling, and migration.
An alternative is to support distributed objects in a language rich enough to enable coordination across networks. Several actor languages have also been proposed and implemented to date, including ABCL [Yonezawa, 1990], Concurrent Aggregates [Chien, 1993], Rosette [Tomlinson et al., 1989], and Thal [Kim, 1997]. An actor language can also be used to provide interoperability between different object systems; this is accomplished by wrapping traditional objects in actors and using the actor system to provide the necessary services. There are several advantages associated with directly using an actor programming language, as compared to using a library to support actors:
• Semantic constraints: Certain semantic properties can be guaranteed at the language level. For example, an important property is to provide complete encapsulation of data and processing within an actor. Ensuring that there is no shared memory or multiple active threads within an otherwise passive object is very important to guarantee safety and efficient actor migration.
• API evolution: By generating code from an actor language, it is possible to ensure that proper interfaces are always used to communicate with and create actors. In other words, programmers cannot incorrectly use the host language. Furthermore, evolutionary changes to an actor API need not affect actor code.
• Programmability: Using an actor language improves the readability of the programs developed. Often, writing actor programs using a framework involves using language-level features (e.g., method invocation) to simulate primitive actor operations (e.g., actor creation or message sending). The need for a permanent semantic translation, unnatural for programmers, is a very common source of errors.
Our experience suggests that an active object-oriented programming language — one providing encapsulation of state and a thread manipulating that state — is more appropriate than a passive object-oriented programming language (even with an actor library) for implementing concurrent and distributed systems to be executed on the Internet.
38.2.3 Universal Actor Model and Implementation The universal actor model extends the actor model [Agha, 1986] by providing actors with universal names, location awareness, remote communication, migration, and limited coordination capabilities [Varela, 2001]. We describe Simple Actor Language System and Architecture (SALSA), an actor language and system that has been developed to provide support for worldwide computing on the Internet. Associated with SALSA is a runtime system that provides the necessary services at the middleware level [Varela and Agha, 2001]. By using SALSA, developers can program at a higher level of abstraction. SALSA programs are compiled into Java bytecode, to take advantage of Java virtual machine implementations on most existing operating systems and hardware platforms, and can be executed stand-alone or on the World-Wide Computer infrastructure. SALSA-generated Java programs use middleware libraries implementing protocols for universal actor naming, mobility, and coordination in the World-Wide Computer. Table 38.1 relates different concepts in the World Wide Web to analogous concepts in the World-Wide Computer.
TABLE 38.1 Comparison of WWW and WWC Concepts
Concept | World Wide Web | World-Wide Computer
Entities | Hypertext documents | Universal actors
Transport protocol | HTTP | RMSP/UANP
Language | HTML/MIME types | Java bytecode
Resource naming | URL | UAN/UAL
Linking | Hypertext anchors | Actor references
Run-time support | Web browsers/servers | Theaters/Naming servers
FIGURE 38.3 Core services include actor creation, transportation, and persistence. Higher-level services include actor communication, naming, migration, and coordination. (The figure layers the coordination, replication, migration, messaging, naming, split-and-merge, and life-cycle services above the core actor creation, transport, and persistence services.)
38.2.4 Middleware Services Services implemented in middleware to support the execution of SALSA programs over the World-Wide Computer can be divided into two groups: core services and higher-level services, as depicted in Figure 38.3.
38.2.4.1 Core Services An actor creation service supports the creation of new actors (each consisting of an initial state, a thread of execution, and a mailbox) with specific behaviors. Every created actor is in a continuous loop, sequentially getting messages from its mailbox and processing them. Concurrency is a consequence of the fact that multiple actors in a given program may execute in parallel. A transport service supports reliable delivery of data from one computer to another. The transport service is used by the higher-level remote communication, migration, and naming services. A persistence service supports saving an actor's state and mailbox into secondary memory, whether for checkpointing, fault tolerance, or improved resource (memory, processing, power) consumption.
38.2.4.2 Higher-Level Services A messaging service supports reliable asynchronous message delivery between peer actors. A message in SALSA is modeled as a potential Java method invocation. The message along with optional arguments is placed in the target actor's mailbox for future processing. A message-sending expression returns immediately after delivering the message — not after the message is processed as in traditional method invocation semantics. Section 38.2.6 discusses this service in more detail. A naming service supports universal actor naming. A universal naming model enables developers to uniquely name resources worldwide in a location-independent manner. Location independence is important when resources are mobile. Section 38.2.5 describes the universal actor naming model and protocol used by this service. A life-cycle service can deactivate an actor into persistent storage for improved resource utilization. It can also reactivate the actor on demand. Additionally, it performs distributed garbage collection.
A migration service enables actor mobility, preserving universal actor names and updating universal actor locations. Migration can be triggered by the programmer using SALSA messages, or it may be triggered by higher-level services such as load balancing and coordination. Section 38.2.6 provides more details on the actor migration service. A replication service can be used to improve locality and access times for actors with immutable state. It can also be used for improving concurrency in parallel computations when additional processing resources become available. A split-and-merge service can be used to fine-tune the granularity of homogeneous actors doing parallel computations to improve overall system throughput. Coordination services are meant to provide the highest level of services to applications, including those requiring reflection and adaptation. For example, a load-balancing service can profile resource utilization and automatically trigger actor migration, replication, and splitting and merging behaviors for coordinated actors [Desell et al., 2004].
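The migration path can be pictured with a short sketch that reuses the Actor and NamingService sketches given earlier: the actor (state plus buffered messages) is turned into bytes, shipped by the transport service, revived at the destination theater, and the naming service is told about the new locator. The class and method names are assumptions, and a real implementation would wait for the message currently being processed to finish before packing.

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;

    // Sketch of a migration service built on Java object serialization.
    class MigrationService {
        byte[] pack(Actor a) throws IOException {
            a.stopForMigration();                               // stop processing; keep unprocessed messages
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(a);                             // state + mailbox travel together
            }
            return bytes.toByteArray();
        }

        Actor unpack(byte[] data, NamingService naming, String uan, String newLocator)
                throws IOException, ClassNotFoundException {
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
                Actor a = (Actor) in.readObject();
                naming.rebind(uan, newLocator);                 // the UAN is unchanged; only the UAL moves
                a.start();                                      // resume with the messages it carried along
                return a;
            }
        }
    }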
38.2.5 Universal Naming Because universal actors are mobile (their location can change arbitrarily) it is critical to provide a universal naming system that guarantees that references remain consistent upon migration. Universal Actor Names (UANs) are identifiers that represent an actor during its life time in a locationindependent manner. An actor’s UAN is mapped by a naming service into a Universal Actor Locator (UAL), which provides access to an actor in a specific location. When an actor migrates, its UAN remains the same, and the mapping to a new locator is updated in the naming system. As universal actors refer to their peers by their names, references remain consistent upon migration. 38.2.5.1 Universal Actor Names A UAN refers to an actor during its life time in a location-independent manner. The main requirements on universal actor names are location independence, worldwide uniqueness, human readability, and scalability. We use the Internet’s Domain Name System (DNS) [Mockapetris, 1987] to hierarchically guarantee name uniqueness over the Internet in a scalable manner. More specifically, we use Uniform Resource Identifiers (URI) [Berners-Lee et al., 1998] to represent UANs. This approach does not require actor names to have a specific naming context because we build on unique Internet domain names. The universal actor name for a sample address book actor is: uan://wwc.yp.com/~smith/addressbook/ The protocol component in the name is uan. The DNS server name represents an actor’s home. An optional port number represents the listening port of the naming service — by default 3030. The remaining name component, the relative UAN, is managed locally at the home name server to guarantee uniqueness. 38.2.5.2 Universal Actor Locators An actor’s UAN is mapped by a naming service into a UAL, which provides access to an actor in a specific location. For simplicity and consistency, we also use URIs to represent UALs. Two UALs for the address book actor above are: rmsp://wwc.yp.com/~smith/addressbook/ and rmsp://smith.pda.com:4040/addressbook/ The protocol component in the locator is rmsp, which stands for the Remote Message Sending Protocol. The optional port number represents the listening port of the actor’s current theater, or single-node runtime system — by default 4040. The remaining locator component, the relative UAL is managed locally at the theater to guarantee uniqueness.
Although the address book actor can migrate from the user's laptop to his or her personal digital assistant (PDA) or cellular phone, the actor's UAN remains the same, and only the actor's locator changes. The naming service is in charge of keeping track of the actor's current locator.
38.2.5.3 Universal Actor Naming Protocol When an actor migrates, its UAN remains the same, and the mapping to a new locator is updated in the naming system. The Universal Actor Naming Protocol (UANP) defines the communication between an actor's theater and an actor's home during its lifetime, which involves creation and initial binding, migration, and garbage collection. UANP is a text-based protocol resembling HTTP, with methods to create a UAN-to-UAL mapping, to retrieve a UAL given the UAN, to update a UAN's UAL, and to delete the mapping from the naming system. The following table shows the different UANP methods:

Method | Parameters | Action
PUT | Relative UAN, UAL | Creates a new entry in the database
GET | Relative UAN | Returns the UAL entry in the database
DELETE | Relative UAN | Deletes the entry in the database
UPDATE | Relative UAN, UAL | Updates the UAL entry in the database
A distributed naming service implementation can use consistent hashing to replicate UAN-to-UAL mappings in a ring of hosts, providing scalability and a reasonable level of fault tolerance. The logarithmic lookup time can further be reduced to a constant lookup time in most cases [Tolman, 2003].
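A minimal consistent-hashing ring, in the spirit of the scheme cited above, can be written with a sorted map: each naming host owns the arc of the ring that precedes it, and a UAN is assigned to the first host at or after its hash. The class below is a sketch under the assumption that at least one host has been added; a real naming service would use a stronger hash function, virtual nodes, and replication to neighboring hosts.

    import java.util.SortedMap;
    import java.util.TreeMap;

    class NamingRing {
        private final SortedMap<Integer, String> ring = new TreeMap<>();

        void addHost(String host)    { ring.put(hash(host), host); }
        void removeHost(String host) { ring.remove(hash(host)); }

        // Hash the UAN and walk clockwise to the first host at or after it.
        String hostFor(String uan) {
            SortedMap<Integer, String> tail = ring.tailMap(hash(uan));
            return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
        }

        private int hash(String s) {
            return s.hashCode() & 0x7fffffff;   // illustrative only; use a stronger hash in practice
        }
    }

Adding or removing a host only remaps the names on the neighboring arc, which is what makes the approach attractive when theaters and naming hosts join and leave the ring dynamically.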
38.2.6 Remote Communication and Mobility The underlying middleware used by SALSA-generated Java code uses an extended version of Java-object serialization for both remote communication and actor migration.
behavior AddressBook {
  String getEmail(String name){ ... }

  void act(String[] args){
    try {
      AddressBook addressBook = new AddressBook() at
        ("uan://wwc.yp.com/~smith/addressbook/",
         "rmsp://wwc.yp.com/~smith/addressbook/");
    } catch (Exception e){
      standardOutput<-println("Error: " + e);
    }
  }
}
FIGURE 54.5 Evolution of Internet NICs: the NIC function passes from DCA to DISA; ISI denominates Postel's DARPA role as "IANA" (1988); the DISA NIC goes to Boeing (1997), acquired by SAIC (1999); the IANA function is transferred to DOC, with the outsourcing switched from ISI to ICANN (2000).
54.6.1 Legacy Standards and Administrative Forums During the 1970s and 1980s, Internet standards were the province of the DARPA-sponsored committees that produced the specifications in the form of Requests for Comment (RFC). This activity and the standards were formalized by the U.S. Department of Defense in 1982 and published by DARPA and the Defense Communications Agency (now DISA). The standards development activity became institutionalized in the form of the IETF, maintained through an IETF Secretariat under the aegis of the Corporation for National Research Initiatives (CNRI). The IETF itself has become associated with the Internet Society. This configuration remains today, and the authoritative standards are published by the IETF Secretariat on its Website. The IETF work is managed through an Internet Engineering Steering Group (IESG) and an Internet Architecture Board (IAB) — also supported by the IETF Secretariat. During the 1970s, the USC Information Sciences Institute (ISI) in Marina del Rey, CA, in cooperation with the Menlo Park, CA, NIC, began to provide some of the administrative functions necessary to implement the Internet standards. The ISI activity subsequently became institutionalized in the late 1980s as the Internet Assigned Numbers Authority (IANA). The evolution of the IP address and DNS components of this function is depicted in Figure 54.5. There were scores of other functions, however, that remain with the IANA, which is maintained as an outsourced contractor activity by the U.S. Department of Commerce's National Institute of Standards and Technology (NIST).
54.6.2 The Universe of Internet Standards and Administrative Forums As the Internet grew, so did the standards and administrative forums surrounding it. There are now more than 100 different bodies and forums of various kinds, far too numerous to describe here. Table 54A.1 — Internet Standards Forums — lists most of them (see Appendix). Some of these forums operate essentially independently of each other. Many serve specialized technologies, applications, or constituencies.
54.7 Emerging Trends Like all ecosystems, those for Internet policy and governance continue to evolve to accommodate the needs of their constituents. The inherently autonomous, self-organizing characteristics of the Internet will no doubt continue indefinitely to stress governmental attempts to encourage beneficial actor conduct and punish undesirable behavior, which is what policy and governance mechanisms are meant to accomplish.
54.7.1 Security The most obvious emerging trends revolve around two kinds of protective and security-related needs. One is proactive, involving actions to reduce the vulnerability of Internet resources, including users subject to adverse behavior. The other is reactive, involving a need to identify bad Internet actors and to acquire evidence for subsequent legal proceedings. Almost all new, successful infrastructure technologies have these same steady-state needs. These needs have grown dramatically post 2001 as governments worldwide have witnessed dramatic increases in malevolent Internet use. The needs seem unlikely to abate. An almost certain result will be to impose user authentication requirements and the maintenance of usage records. Accountability cannot otherwise exist. At the same time, encryption as a means of both protecting sensitive information and verifying content will expand.
54.7.2 Diversity The Internet, because of its growing ubiquity, seems destined to support an increasing diversity of uses, both in terms of an expanding number of transport options and in terms of increasing numbers of users and services. This "hourglass effect" of the Internet protocol becomes ever more attractive as a universal glue between transport options and applications, especially with expanded address options supported by IP version 6. On the other hand, single infrastructures create their own vulnerabilities, and because of increasing concerns regarding security and survivability, the all-encompassing expansion of the Internet is likely self-limiting.
54.7.3 Assimilation Like all of the precedent technologies before it, the Internet has moved into a mass-market assimilation phase where its identity has substantially merged into a common infrastructure together with a vast array of "always on" access devices, networks, and services. The price of success, however, is the adaptation and adoption of the infrastructure and the emergence of vulnerabilities as it becomes a vehicle for unintentional or intentional harm with profound adverse consequences for people, commerce, and society. The vulnerabilities exist for any significant infrastructure, whether communications, power, or transport. Going forward, the challenges faced with this larger infrastructure will not be those of innovation and growth alone, but will include, ever more prominently, the imposition of policies and requirements that lessen infrastructure vulnerabilities.
Appendix
TABLE 54A.1 Internet Standards and Administrative Forums
Name | Acronym | URL | Type | Focus
3RD Generation Partnership Project | 3GPP | www.3gpp.org | standards | telecom
Accredited Standards Committee (ASC) X12 | | www.x12.org | standards | data exchange
Aim, Inc. | | www.aimglobal.org | standards | identifiers
Alliance for Telecommunications Industry Solutions | ATIS | www.atis.org | standards | telecom
American Library Association | | www.ala.org | standards | library
American National Standards Institute | ANSI | www.ansi.org | standards | diverse
American Registry for Internet Numbers | ARIN | www.arin.net | operations | internet
American Society for Information Science and Technology | ASIS | www.asis.org | standards | general
ANSI X9 | | www.x9.org | standards | financial
Asia Pacific Networking Group | APNG | www.apng.org | operations | internet
Asia-Pacific Telecommunity Standardization Program | ASTAP | www.aptsec.org/astap/ | standards | telecom
Association for Information and Image Management International | AIIM | www.aiim.org | standards | imaging
Bluetooth Consortium | | www.bluetooth.com | standards | wireless
Cable Labs | | www.cablelabs.org | standards | telecom
Computer Emergency Response Team | CERT | www.cert.org | operations | security
Content Reference Forum | CRF | www.crforum.org | standards | digital content
Critical Infrastructure Assurance Office | CIAO | www.ciao.gov | government | security
Cross Industry Working Team | XIWT | www.xiwt.org | standards | internet
Data Interchange Standards Association | DISA | www.disa.org | standards | application
Department of Justice | DOJ | www.doj.gov | government | security
Digital Library Federation | DLF | www.diglib.org | standards | library
Digital Video Broadcasting Consortium | DVB | www.dvb.org | standards | broadcasting
Directory Services Markup Language Initiative Group | DSML | www.dsml.org | standards | directory
Distributed Management Task Force | DMTF | www.dmtf.org | standards | management
DOI Foundation | | www.doi.org | standards | application
ebXML | | www.ebxml.org | standards | application
EC Diffuse Project | | www.diffuse.org/fora.html | standards | reference
Electronic Payments Forum | EPF | www.epf.org | standards | financial
Electronics Industry Data Exchange Association | EIDX | www.eidx.org | standards | data exchange
Enterprise Computer Telephony Forum | ECTF | www.ectf.org | standards | telecom
ENUM Forum | ENUM | www.enum-forum.org | standards | telecom
EPCGlobal | | www.epcglobalinc.com | standards | identity/RFID
European Commission | EC | europa.eu.int/comm/index_en.htm | government | telecom
European Committee for Electrotechnical Standardization | CENELEC | www.cenelec.org | standards | general
European Committee for Standardization | CEN | www.cenorm.be | standards | general
European Computer Manufacturers Association | ECMA | www.ecma.ch | standards | telecom
European Forum for Implementers of Library Automation | EFILA | www.efila.dk | standards | classification
European Telecommunications Standards Institute | ETSI | www.etsi.org | standards | telecom
European Umbrella Organisation for Geographic Information | EUROGI | www.eurogi.org | standards | location
Federal Communications Commission | FCC | www.fcc.gov | government | telecom
Federal Trade Commission | FTC | www.ftc.gov | government | diverse
FidoNet Technical Standards Committee | FSTC | www.ftsc.org | standards | network
Financial Information eXchange (FIX) protocol | | www.fixprotocol.org | standards | financial
Financial products Markup Language Group | | www.fpml.org | standards | financial
Financial services industry | | www.x9.org | standards | financial
Financial Services Technology Consortium | FSTC | www.fstc.org | standards | financial
Forum for metadata schema implementers | | www.schemas-forum.org | standards | application
Forum of Incident Response and Security Teams | FIRST | www.first.org | operations | security
Global Billing Association | | www.globalbilling.org | standards |
Global Standards Collaboration | GSC | www.gsc.etsi.org | standards | telecom
Group on Electronic Document Interchange | GEDI | lib.ua.ac.be/MAN/T02/t51.html | standards | classification
GSM Association | GSM | www.gsmworld.com | standards | telecom
IEEE Standards Association | | standards.ieee.org | standards | diverse
IMAP Consortium | | www.impa.org | standards | application
Information and Communications Technologies Board | ICTSB | www.ict.etsi.org | standards | authentication
Infraguard Alliance | | www.infraguard.net | government | security
Institute of Electrical and Electronic Engineers | IEEE 802.11 | www.ieee.org | standards |
Interactive Financial eXchange (IFX) Forum | | www.ifxforum.org | standards | financial
International Confederation of Societies of Authors and Composers | CISAC | www.cisac.org | standards | classification
International Digital Enterprise Alliance | IDEA | www.idealliance.org/ | standards | metadata
International Federation for Information Processing | IFIP | www.ifip.or.at | standards | application
International Federation of Library Associations | IFLA | www.ifla.org | standards | classification
International Imaging Industry Association | | www.i3a.org | standards | imaging
International Multimedia Telecommunications Forum | IMTC | www.imtc.org | standards | telecom
International Organization for Standardization | ISO | www.iso.ch | standards | diverse
International Telecommunication Union | ITU | www.itu.org | standards | telecom
International Telecommunication Union | ITU | www.itu.int | government | telecom
International Telecommunications Advisory Committee | ITAC | www.state.gov/www/issues/economic/cip/itac.html | standards | telecom
International Webcasting Association | IWA | www.iwa.org | standards | broadcasting
Internet Architecture Board | IAB | www.iab.org | standards | internet
Internet Corporation for Names and Numbers | ICANN | www.icann.org | |
Internet Engineering Task Force | IETF | www.ietf.org | standards | network
Internet Mail Consortium | IMC | www.imc.org | standards | application
Internet Security Alliance | ISA | www.isalliance.org | operations | security
IPDR (Internet Protocol Detail Record) Organization, Inc | IPDR | www.ipdr.org | standards | telecom
IPV6 Forum | | www.ipv6.org | standards | internet
ISO/TC211 | | www.isotc211.org | standards | location
Java APIs for Integrated Networks | JAIN | jcp.org/jsr/detail/035.jsp | standards | telecom
Java Community | | java.sun.com | standards | application
Liberty Alliance | | www.projectliberty.net | standards |
Library of Congress | | www.loc.gov/standards/ | standards | classification
Localisation Industry Standard Association | LISA | www.lisa.org | standards | application
Mobile Games Interoperability Forum | MGIF | www.mgif.org | standards | games
Mobile Payment Forum | | www.mobilepaymentforum.org | standards | financial
Mobile Wireless Internet Forum | MWIF | www.mwif.org | standards | wireless
Multiservice Switch Forum | MSF | www.msforum.org | standards | telecom
National Association of Regulatory and Utility Commissioners | NARUC | www.naruc.org | government | telecom
National Automated Clearing House Association | NACHA | www.nacha.org | |
National Committee for Information Technology Standards | NCITS | www.ncits.org | standards | security
National Communications System | NCS | www.ncs.gov/ncs/html/NCSProjects.html | standards | telecom
National Emergency Number Association | NENA | www.nena.org | standards | telecom
National Exchange Carriers Association | NECA | www.neca.org | government | telecom
National Genealogical Society | | www.ngsgenealogy.org/comstandards.htm | standards | application
National Information Assurance Partnership | NIAP | niap.nist.gov | standards | security
National Information Standards Organization | NISO | www.niso.org | standards | security
National Infrastructure Protection Center | NIPC | www.nipc.gov | government | security
National Institute for Standards and Technology | NIST | www.nist.gov | government | security
National Security Agency | NSA | www.nsa.org | government | security
National Standards System Network | NSSN | www.nssn.org/developer.html | standards | reference
National Telecommunications and Information Administration | NTIA | www.ntia.doc.gov | government | telecom
Network Applications Consortium | NAC | www.netapps.org | standards | application
Network Reliability & Interoperability Council | NRIC | | operations | telecom
NIMA Geospatial and Imagery Standards Management Committee | NIMA GSMC/ISMC | http://164.214.2.51/ | standards | location
NIST Computer Security Resource Center | CSRC | csrc.nist.gov | standards | security
North American Numbering Council | NANC | www.fcc.gov/ccb/Nanc/ | operations | telecom
North American Operators Group | NANOG | www.nanog.org | operations | internet
Object Management Group | OMG | www.omg.org | standards | general
Online Computer Library Center | Dublin Core | www.oclc.org | standards | metadata
Ontology.org | | www.ontology.org | standards | metadata
Open Applications Group | OAGI | www.openapplications.org | standards | application
Open Archives Forum | OAF | edoc.hu-berlin.de/oaf | standards | archive
Open Bioinformatics Foundation | | www.open-bio.org | standards | application
Open Directory Project | | www.dmoz.org | standards | directory
Open GIS Consortium | OGC | www.opengis.org | standards | location
Open H323 Forum | | www.openh323.org | standards | multimedia
Open LS | | www.openls.org | standards | location
Open Mobile Alliance | | www.openmobilealliance.org | standards | wireless
Open Services Gateway Initiative | OSGi | www.osgi.org | standards | application
Organization for Economic Cooperation and Development | OECD | www.oecd.gov | government | political
Organization for the Advancement of Structured Information Standards | OASIS | www.oasis-open.org | standards |
PKI Forum | PKIForum | www.pkiforum.org | standards | security
Presence and Availability Management Forum | PAM Forum | www.pamforum.org | standards | wireless
Project MESA | | www.projectmesa.org | standards | wireless
Réseaux IP Européens | RIPE | www.ripe.net | operations | internet
Security Industry Association | SIA | www.siaonline.org | standards | security
SIP Forum | | www.sipforum.com/ | standards | telecom
Smart Card Alliance | SCA | www.smartcardalliance.org/ | standards | identifiers
Society of Motion Picture and Television Engineers | SMPTE | www.smpte.org | standards | imaging
Softswitch Consortium | | www.softswitch.org/ | standards | telecom
Speech Application Language Tags | SALT | www.saltforum.org | standards | application
SyncML Initiative, Ltd | SyncML | www.syncml.org | standards | wireless
Telecommunications Industry Association | TIA | www.tiaonline.org | standards | standards
TeleManagement Forum | | www.tmforum.org | standards | telecom
The Alliance for Technology Access | ATA | www.ataccess.org | standards | handicapped
The Electronic Payments Association | NACHA | www.nacha.org | standards | financial
The European Forum for Electronic Business | EEMA | www.eema.org | standards | financial
The Open Group | | www.opengroup.org | standards | general
The PARLAY Group | PARLAY | www.parlay.org | standards | telecom
The Portable Application Standards Committee | | www.pasc.org | standards | application
TruSecure | | www.trusecure.com | standards | security
Trusted Computing Group | TCG | www.trustedcomputinggroup.org | standards | security
UMTS Forum | UMTS | www.umts-forum.org | standards | wireless
Unicode Consortium | | www.unicode.org | standards | identifiers
Uniform Code Council | EAN-UCC | www.uc-council.org/ | standards | identifiers
Universal Description, Discovery and Integration Community | UDDI | www.uddi.org | standards | application
Universal Plug and Play Forum | UPnP | www.upnp.org | standards | network
Universal Wireless Communications Consortium | UWC | www.uwcc.org | standards | wireless
Value Added Services Alliance | VASA | www.vasaforum.org | standards | telecom
Voice XML Initiative | | www.voicexml.org | standards | wireless
Web Services Interoperability Organization | WS-I | www.ws-i.org | standards |
Web3d | | www.web3d.org | standards | games
WiFi Alliance | | www.wirelessethernet.org | standards | wireless
Wireless LAN Association | WLANA | www.wlana.org | standards | wireless
Wireless Location Industry Association | WLIA | www.sliaonline.com | standards | location
World Wide Web Consortium | W3C | www.w3.org | standards |
XML Forum | | www.xml.org | standards | application
XML/EDI Group | | www.xmledi-group.org | standards | data exchange
55 Human Implications of Technology
L. Jean Camp and Ka-Ping Yee

CONTENTS
Abstract
55.1 Overview
55.2 Technological Determinism
55.3 Social Determinism
55.4 Values in Design
55.5 Human Implications of the Security Debates
55.6 Conceptual Approaches to Building Trustworthy Systems
  55.6.1 Trusted Computing Base
  55.6.2 Next Generation Secure Computing Platform
  55.6.3 Human-Centered Trusted Systems Design
  55.6.4 Identity Examples
  55.6.5 Data Protection vs. Privacy
55.7 Network Protocols as Social Systems
55.8 Open vs. Closed Code
55.9 Conclusions
Additional Resources
References
Abstract The relationship between technology and society is characterized by feedback. Technological determinist and social determinist perspectives offer informative but narrow insights into both sides of the process. Innovative individuals can generate new technologies that alter the lives, practices, and even ways of thinking in a society. Societies alter technologies as they adopt them, often yielding results far from the hopes or fears of the original designers. Design for values, also called value-sensitive design, consists of methods that explicitly address the human element. As the example of usability in security illustrates, a designer who is cognizant of human implications of a design can produce a more effective technological solution.
55.1 Overview The human implications of a technology, especially for communications and information technology, begin in the design stage. Yet the human implications of technology are most often considered only after the widespread adoption of the technology. Automobiles cause pollution; televisions may cause violence in children. Social values can be embedded at any stage in the development process: invention, adoption, diffusion, and iterative improvement. A hammer wielded by Habitat for Humanity and a hammer wielded in a
violent assault cannot be said to have the same human value at the moment of use, yet the design value of increasing the efficacy of human force applies in both situations. In the case of the hammer, the laws of physics limit the designer. Increasingly the only limit to a designer in the virtual world is one of imagination. Thus, designs in search engines, browsers, and even network protocols are created from a previously inconceivably vast range of alternatives. How important are the choices of the designer? There are two basic schools, one which privileges technical design as the driver of the human condition, and one which argues that technologies are the embodiment of social forces beyond the designers' control. After introducing these boundary cases, and identifying the emerging middle, this chapter focuses on a series of studies of particular technical designs. The most considered case is that of security and trust. The chapter closes with a pointer to the significant debates on open or closed code, and the potential of network protocols themselves to embody value-laden choices. The final word is one of caution to the designer that the single reliable principle of responsible design is full disclosure, as any obfuscation implicitly assumes omnipotence and correctness about the potential human implications of the designers' technical choice.
55.2 Technological Determinism Technological determinism argues that the technologically possible will inevitably be developed and the characteristics of the newly developed technologies will alter society as the technology is adopted [Winner, 1986] [Eisenstein, 1987]. Some find optimism from such a viewpoint, arguing that technology will free us from the human condition [Negroponte, 1995; Pool, 1983]. Others find such a scenario to be the source of nightmares, arguing that information and communications technologies (ICT) have "laid waste the theories on which schools, families, political parties, religion, nationhood itself" and have created a moral crisis in liberal democracy [Postman, 1996]. Marx has been identified as perhaps the most famous technological determinist in his descriptions of the manner in which the industrial revolution led to the mass exploitation of human labor. Yet technological determinism is not aligned exclusively with any particular political viewpoint. Technological determinism may be overarching, as with Marx and Winner. In this case, both observed that large complex technologies require large complex organizational systems for their management. The state was required by the structure of large capitalist institutions, which were required by the technologies of the factory and the railroad. Even government was a function of the organization of capital through the state, as Engels noted, "the proletariat needs the state, not in the interests of freedom but in order to hold down its adversaries, and as soon as it becomes possible to speak of freedom the state as such ceases to exist" [Tucker, 1978]. Thus when means and methods of production changed hands the state would no longer be needed. Urban industrialization created that moment in time, at which workers (previously peasants) would be empowered to organize and rise up, overthrowing their oppressors and thereby removing the burden of oppression and government simultaneously. An echo of this determinism came from the libertarian viewpoint with the publication of the Declaration of Cyberspace. At the peak of the technologically deterministic embrace of the then-emerging digital network, this declaration argued that past technologies had created an invasive oppressive state. Once again the new technologies created under the old state would assure its downfall. "You claim there are problems among us that you need to solve. You use this claim as an excuse to invade our precincts. Many of these problems do not exist. Where there are real conflicts, where there are wrongs, we will identify them and address them by our means. We are forming our own Social Contract. This governance will arise according to the conditions of our world, not yours. Our world is different." [Barlow, 1996] Both of these are reductionist political arguments that the state and the society were a function of previous technologies. The changes in technologies would therefore yield radical changes — in both cases, the destruction of the state. Technological determinism is reductionist. The essential approach is to consider two points in time that differ in the technology available. For example, the stirrup enabled the creation of larger towns by
making the care of far-flung fields feasible. Larger towns created certain class and social practices. Therefore determinism would say that the stirrup created social and class realignments and sharpened class distinctions. In the case of technological determinism of communications the argument is more subtle. Essentially the ICT concept of technological determinism is that media is an extension of the senses in the same way transport is an extension of human locomotion. Thus, communications technologies frame the personal and cultural perspective for each participant, as expressed most famously, in “The medium is the message.” [McLuhan, 1962] In the case of communications and determinism, Marshall McLuhan framed the discourse in terms of the tribal, literate, print, and electronic ages. For McLuhan each technology — the phonetic alphabet, the printing press, and the telegraph — created a new world view. Tribal society was based on stories and magic. Phonetic societies were based on myth and the preservation of cultural heritage. Print societies were based on rational construction and the scientific method. Text reaches for the logical mind; radio and television call out to the most primitive emotional responses. The new hypertext society will be something entirely different, both more reasoned and less rational. A related debate is geographical determinism [Diamond, 1999] versus cultural determinism [Landes, 1999]. In this argument the distance from the equator and the availability of natural resources enabled technological development, with the role of culture in innovation being hotly contested. It is agreed that technological development then determined the relative power of the nations of the world. In McLuhan’s view the media technology determined our modern world, with a powerful rational scientific North and a colonized South. In the view of the geographical determinist the geography determined the technology, and technology then determined social outcomes.
55.3 Social Determinism
A competing thesis holds that technology is a product of society. This second view is called social construction [Bijker et al., 2001]. Technologies are physical representations of political interests. Social implications are embedded by the stakeholders, including inventors and governments, on the basis of their own social values. Some proponents of this view hold that users are the only critical stakeholders. This implies that adoption is innovation, and thus the users define technology [Fischer, 1992]. As technical determinists look at two points in time and explain all changes as resulting from technology as a social driver, social constructionists look at two points in technological evolution and use social differences as the sole technical driver.

One example of how society drives innovation is in telephony. Both automated switching and low-end telephones were invented for identifiable, explicit social reasons. Telephones were initially envisioned as business technology and social uses were discouraged by company policy and advertising campaigns. As long as patents in the U.S. protected the technology there was no rural market, as the technology was also assumed by the Bell Company to be inherently urban. When the patents expired, farmers used their wire fences for telephony and provided part-time low-cost service. The existence of the service, not the quality of the service, was the critical variable. Once intellectual property protections were removed, families adopted phones and social uses of phones came to dominate. Thus, telephones were developed that were cheap, less reliable, and of lower quality as opposed to the high-end systems developed by the Bell Company. The design specifications of the telephones were socially determined, but the overall function was technically determined.

In the case of automatic switching, the overall function was socially determined. The goal of an automatic switch was to remove the human switchboard operator. An undertaker, believing that the telephone operator was connecting the newly bereaved to her brother (the competing town undertaker), invented automated switching [Pierce and Noll, 1990]. His design goal was socially determined yet the implementation and specifications for the switch were technical.
Social determinism reflects the obvious fact that identical technologies find different responses in different societies. The printing press created a scientific revolution in Western Europe, but coincided with a decline of science and education in Persian and Arab regions. The printing press had had little effect on the practice of science and literacy in China by 1600, despite the fact that paper and the movable-type press had been invented there centuries earlier.

Yet just as Barlow illustrates the extreme of technological determinism, social determinism also produces extremes. Robert Fogel received a Nobel Prize in economics for "The Argument for Wagons and Canals" in which he declared that the railroad was of no importance as canals would have provided the same benefit. The sheer impossibility of building a canal over the Sierra Mountains or across Death Valley was not an element of his thesis, which assumed that canals were feasible within 40 mi of any river — including the Rio Grande and various arroyos. A second thesis of Fogel's was that internal combustion would have developed more quickly without the railroad. This thesis ignores the contributions of the railroad engine in the technological innovation and the investments of railroads in combustion innovation. Most important, this does not acknowledge the dynamics of the emerging engineering profession, as the education and creation of human capital created by the railroad industry were important in internal combustion innovations. Such innovations are less likely to have arisen from those educated to dig canals. This finding requires a reductionist perspective that eliminates all technical distinction from the consideration. In fact, an appeal of this paper was the innovation of treating very different technologies as replaceable "black boxes" while maintaining a mathematically consistent model for their comparison. As is the case with technical determinism, social determinism is reductionist: where technical determinism ignores the ubiquitous social environs, social determinism ignores technical fundamentals.
55.4 Values in Design
Clearly, the social implications of technology cannot be understood by ignoring either the technologies or the social environment. Inclusive perspectives applicable to information and communications technologies have emerged in human-centered design and design for values. These perspectives recognize that technologies can have biases and attempt to address the values in an ethical manner in the design process. Further, the perspectives explicitly address technologies as biased by the fundamentals of the technology itself as well as through the adoption process.

Social determinism argues for designers either as unimportant cogs in controlling destiny or as trivial sideshows to the economic and social questions. Technical determinists view designers as oblivious to their power or omniscient. The mad scientist is the poster child of dystopian technological determinism. The design for values or human-centered design schools conceive of designers as active participants yet acknowledge the limits of technological determinism [Friedman and Nissenbaum, 2001]. From this design-for-values perspective, designers offer technological systems and products that bundle functions and values. If the functions are sufficiently useful, the system may be adopted despite undesirable values. If the functions and the values align with the needs of the potential adopters, then widespread adoption is inevitable.

While this description sounds clear, determining the values embedded in a system is not trivial. The examination of values in communications technology is enhanced by past research in the classification of specific values [Spinello, 1996]. In privacy there are definitions of private based on system purpose and use (for example the American Code of Fair Information Practice [Federal Trade Commission, 2000]) or only on system use (as with the European Directive on Data Protection [European Union, 1995]). In both cases, there are guidelines intended for users of technology that can be useful for designers of technology.

Complicating the lack of examination of inherent and emergent (but theoretically economically predictable) biases is the reality that once adopted, technical standards are difficult to replace. Biases in communications technologies result from the omission of variables in the design stage (e.g., packet-based networks are survivable and, incidentally, quality of service is difficult). Some decisions that are in reality values-determinant are framed by economics only. For example, backward compatibility for nth generation wireless systems is a determinant of cost and therefore of accessibility. Backward compatibility enables the adoption of obsolete technology for regions
that have less capital. In this case a choice on the basis of expense and compatibility can reasonably be said to result in a system which weighs the marginal first-world consumer against the infrastructure needs of the third-world consumer. Yet the decision would be made based on the expectations of migration of users of earlier generation technology. Such economic biases are inherent in engineering and design decisions.

In other cases values are embedded through the technical assumptions of the designers, a case well made in a study of universal service and QoS. Similarly, privacy risks created in e-commerce are arguably based on the assumptions of middle-class designers about the existence of identity-linked accounts. Identity-linked accounts are not available to much of the population. Those without banks or credit cards obviously cannot use identity-linked e-commerce mechanisms. An awareness of privacy would have resulted in technologies usable by those in the cash economy. The plethora of e-cash designs illustrates that such assumptions can be avoided. Design for computer security often requires developing mechanisms to allow and refuse trust, and thus it may be necessary to embed social assumptions. The issue of human/computer interaction further complicates the design for values system. Any interface must make some assumption about the nature of simplification, and about the suitable metaphors for the interface (e.g., why does a button make sense rather than a switch or path?). Yet choosing against a simplifying interface is itself a values-laden choice. (See the section on human-computer interaction and security, for example.)

Even when technologists set out to design for a specific value, the result is not always as intended [Herkert, 1999]. For example, the Platform for Privacy Preferences has been described by Computer Professionals for Social Responsibility as a mechanism for ensuring that customer data are freely available to merchants, while its designers assert that the goal was customer empowerment. Similarly, PICS has been described as a technology for human autonomy [Resnick, 1997] and as "the devil" [Lessig, 1997] for its ability to enhance the capabilities of censorship regimes worldwide. In both of these cases (not incidentally developed by the World Wide Web Consortium) the disagreement about values is a result of assumptions of the relative power of all the participants in an interaction — commercial or political. If the designers' assumptions of fundamental bargaining equality are correct then these are indeed "technologies of freedom" [Pool, 1983]. On the other hand, the critics of these technologies are evaluating the implementation of these technologies in a world marked by differences in autonomy, ranging from those seeking Amnesty International to the clients of the Savoy.

There is no single rule to avoid unwanted implications for values in design, and no single requirement that will embed values into a specific design. However, the following examples are presented in order to provide insights on how values are embedded in specific designs.
55.5 Human Implications of the Security Debates
As computer security is inherently the control over the human uses of information, it is a particularly rich area for considering the human implications of technology. The technologically determinant, socially constructed, and design for values models of technological development all have strong parallels in the causes of failures in computer security. Privacy losses result from inherent characteristics of the technology, elements of specific product design, and implementation environments, as well as the interaction of these three.

Security failures are defined as coding errors, implementation errors, user errors, or so-called human engineering [Landwehr et al., 1994]. These categories correspond loosely to the technically determinant (accidental), the socially determinant, and the iterative embedding of security values in the code. Coding errors, which can be further delineated into either logical flaws in the high-level code or simple buffer overruns, are technologically determinant causes of security failures. The flaw is embedded in the implementation of the technology.
Implementation faults result from unforeseen interactions between multiple programs. This corresponds to the evolutionary perspective, as flaws emerge during adaptation of multiple systems over time. Adoption and use of software in unanticipated technical environments, as opposed to assumptions about the human environment, is the distinguishing factor between this case and the previous one.

User errors are security vulnerabilities that result from the individual failing to interact in the manner prescribed by the system; for example, users selecting weak passwords. These flaws could be addressed by design for values or human-centered design. Assumptions about how people should be (e.g., valuable sources of entropy) as opposed to how they are (organic creatures of habit) are a core cause of these errors. Alternatively, organizational assumptions can create security failures, as models of control of information flow create a systemic requirement to undermine security. Another cause is a lack of knowledge about how people will react to an interface. For example, the SSL lock icon informs the user that there is confidentiality during the connection. Yet the interface has no mechanism to distinguish between confidentiality on the connection and security on the server.

Finally, human engineering means that the attacker obtains the trust of the authorized user and convinces that person to use his or her authorization unwisely. This is a case of a socially determined security flaw. As long as people have autonomy, people will err. The obvious corollary is that people should be removed from the security loop, and security should be made an unalterable default. It follows that users must be managed and prevented from harming themselves and others on the network. In this case the options of users must be decreased and mechanisms of automated user control are required. Yet this corollary ignores the fallible human involved in the design of security and fails to consider issues of autonomy. Thus, enabling users to be security managers requires educating users to make choices based on valid information. Designs should inform the user and be informed by principles of human-computer interaction.

This section begins with a historical approach to the development of trustworthy systems, beginning with the classic approach of building a secure foundation and ending with the recognition that people interact through interfaces, not in raw streams of bits. Both perspectives are presented, with systems that implement both ideas included in the examples. Computer security is the creation of a trustworthy system. A secure system is trustworthy in the narrow sense that no data are altered or accessed without authorization. Yet such a definition of trustworthy requires perfect authorization policies and practice. Trustworthy protocols can provide certainty in the face of network failures, memory losses, and electronic adversaries. An untrusted electronic commerce system cannot distinguish a failure in a human to comply with implicit assumptions from an attack; in either case, transactions are prevented or unauthorized access is allowed. When such failures can be used for profit then certainly such attacks will occur. Trust and security are interdependent. A trusted system that is not compatible with normal human behavior can be subverted using human engineering. Thus, the system was indeed trusted, but it was not secure. Trust requires security to provide authentication, integrity, and irrefutability.
Yet trust is not security; nor does security guarantee trust. Ideal trustworthy systems inherently recognize the human element in system design. Yet traditional secure systems focus entirely upon the security of a system without considering its human and social context [Computer Science and Telecommunications Board, 1999]. Currently there is an active debate expressed in the legal and technical communities about the fundamental nature of a trustworthy system [Anderson, 2003; Camp, 2003a; Clark and Blumenthal, 2000]. This debate can be summed up as follows: Trusted by whom?
55.6 Conceptual Approaches to Building Trustworthy Systems
55.6.1 Trusted Computing Base
The fundamental concept of secure computing is the creation of a secure core and the logical construction of provably secure assertions built upon that secure base. The base can be a secure kernel that prevents unauthorized hardware access or secure special-purpose hardware.
The concept of the trusted computing base (TCB) was formalized with the development of the Trusted Computer System Evaluation Criteria (TCSEC) by the Department of Defense [Department of Defense, 1985]. The trusted computing base model sets a series of standards for creating machines, and grades machines from A1 to C2 according to the design and production of the machine. (There is a D rating, which means that the system meets none of the requirements. No company has applied to be certified at the D level.) Each grade, with C2 being the lowest, has increasingly high requirements for security beginning with the existence of discretionary access control, meaning that a user can set constraints on the files. The C level also requires auditing and authentication. The higher grade, B, requires that users be able to set security constraints on their own resources and be unable to alter the security constraints on documents owned by others. This is called mandatory access control.

The TCB model as implemented in the Trusted Computer System Evaluation Criteria is ideal for special-purpose hardware systems. The TCB model becomes decreasingly applicable as the computing device becomes increasingly general purpose. A general-purpose machine, by definition, can be altered to implement different functions for different purposes. The TCSEC model addresses this by making functionality and implementation distinct. Yet logging requirements, concepts of document ownership, and the appropriate level of hardening in the software change over time and with altered requirements. While multiple systems have sought and obtained C level certification under the TCSEC, the certification is not widely used in the commercial sector.

Military emphasis on information control differs from civilian priorities in three fundamental ways. First, in military systems it is better that information be destroyed than exposed. In civilian systems the reverse is true: bank records, medical records, and other personally identifiable information are private but critical. Better a public medical decision than a flawed diagnosis based on faulty information. Second, the military is not sensitive to security costs. Security is the reason for the existence of the military; for others it is an added cost to doing business. Third, the Department of Defense is unique in its interactions with its employees and members; it is uniquely tightly aligned with its individual participants. There is no issue of trust between a soldier and commander. If the Department determines its policies then the computer can implement those policies. Civilians, businesses, families, and volunteer organizations obviously have very different organizational dynamics.

One goal of the TCB is to allow a centralized authority to regulate information system use and access. Thus, the trusted computing base may be trusted by someone other than the user to report upon the user. In a defense context, given the critical nature of the information, users are often under surveillance to prevent information leakage or espionage. In contrast, a home user may want to be secure against the Internet Service Provider as well as remote malicious users. Microsoft's Next Generation Secure Computing Platform is built on the trusted computing base paradigm.
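The distinction drawn above between discretionary and mandatory access control can be made concrete with a short sketch. This is a minimal illustration under assumed labels and names, not the TCSEC specification: the level ordering, user names, and helper functions below are hypothetical.

```python
# Minimal sketch contrasting discretionary and mandatory access control.
# The label ordering, users, and documents are illustrative assumptions.

LEVELS = {"unclassified": 0, "confidential": 1, "secret": 2, "top_secret": 3}

class Document:
    def __init__(self, owner, label, acl=None):
        self.owner = owner          # user who created the document
        self.label = label          # mandatory sensitivity label
        self.acl = acl or {owner}   # discretionary access control list

def dac_allows(user, doc):
    """Discretionary control: the owner decides who is on the ACL."""
    return user in doc.acl

def mac_allows(user_clearance, doc):
    """Mandatory control: no reading above one's clearance, regardless of the ACL."""
    return LEVELS[user_clearance] >= LEVELS[doc.label]

def can_read(user, user_clearance, doc):
    # A B-level system enforces both checks; the owner cannot waive the
    # mandatory one by editing the ACL.
    return dac_allows(user, doc) and mac_allows(user_clearance, doc)

if __name__ == "__main__":
    memo = Document(owner="alice", label="secret", acl={"alice", "bob"})
    print(can_read("bob", "confidential", memo))   # False: on ACL, insufficient clearance
    print(can_read("bob", "top_secret", memo))     # True: on ACL and cleared
    print(can_read("carol", "top_secret", memo))   # False: cleared but not on ACL
```

The point of the sketch is the design choice it encodes: under mandatory access control, the document owner's discretionary decisions can never weaken the label-based check imposed by the centralized authority.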
55.6.2 Next Generation Secure Computing Platform
Formerly known as Palladium, then the Trusted Computing Base, then the Next Generation Secure Computing Platform, the topic of this section is now called Trusted Computing (TC). Regardless of the name, this is a hotly contested security design grounded in the work of Microsoft. TC requires a secure coprocessor that does not rely on the larger operating system for its calculations. In theory, TC can even prevent the operating system from booting if the lowest level system initiation (the BIOS or basic input/output system which loads the operating system) is determined by TC to be insecure. TC must include at least storage for previously calculated values and the capacity for fast cryptographic operations.

A primary function of TC is to enable owners of digital content to control that content in digital form [Anderson, 2003]. TC binds a particular data object to a particular bit of software. Owners of high-value commodity content have observed the widespread sharing of audio content enabled by overlay networks, increased bandwidth, and effective compression. By implementing players in "trusted" mode, video and audio players can prevent copying, and enforce arbitrary licensing requirements by allowing only trusted
players. (In this case the players are trusted by the content owners, not the computer user.) This is not limited to multimedia formats, as TC can be used for arbitrary document management. For example, the Adobe eBook had encryption to prevent the copying of the book from one device to another, to prohibit audio output of the text, and to enforce expiration dates on the content [Adobe, 2002]. The Advanced eBook Processor enabled reading as well as copying, with copying inherently preventing deletion at the end of the licensed term of use. Yet, with TC an Adobe eBook could only be played with an Adobe eBook player, so the existence of the Advanced eBook Processor would not threaten the Adobe licensing terms.

In physical markets, encryption has been used to bind future purchases to past purchases, in particular to require consumers of a durable good to purchase related supplies or services from the manufacturer. Encryption protects ink cartridges for printers to prevent the use of third-party cartridges. Patents, trade secrets, and encryption link video games to consoles. Encryption connects batteries to cellular phones, again to prevent third-party manufacturers from providing components. Encryption enables automobile manufacturers to prevent mechanics from understanding diagnostic codes and, if all is working smoothly, from turning off warning lights after an evaluation [Prickler, 2002].

Given the history of Microsoft and its use of tying, the power of TC for constraining consumer choices is worth consideration. It is because of the potential to limit the actions of users that the Free Software Foundation refers to TC as "treacherous computing" [Stallman, 2002]. In 1999, a U.S. federal court found Microsoft guilty of distorting competition by binding its Explorer browser to its ubiquitous Windows operating system, and in 2004, the European Union ruled that Microsoft abused its market power and broke competition laws by tying Windows Media Player to the operating system. The attestation feature of TC, which enables a remote party to determine whether a user is using software of the remote party's choice, could be used to coerce users into using Explorer and Windows Media Player even if they were not bundled with the operating system. Microsoft also holds a monopoly position in word processing, spreadsheet, and presentation software, with proprietary interests in the corresponding Word document (.doc), Excel spreadsheet (.xls), and PowerPoint presentation (.ppt) file formats. Currently, Microsoft is facing competition from open code, including StarOffice and GNU/Linux. The encrypted storage feature of TC could enforce Microsoft lock-in. That is, TC could refuse to decrypt documents to any application other than one that could attest to being a legal copy of a Microsoft Office application running on Windows. Were competitors to reverse-engineer the encryption to enable users to read their own documents using other applications or on non-Windows operating systems, their actions would be felonious under the Digital Millennium Copyright Act. Initially, Palladium explicitly included the ability to disable the hardware and software components, so that the machine could run in "untrusted" mode. The recent integration of document control features in the MS Office suite requires that TC be enabled for any manipulation of the MS Office documents (reading, saving, altering). TC centralizes power and trust. The centralization of trust is a technical decision.
Using IBM implementations of TC, it is possible to load Linux. IBM has made the driver code for the TCPA-compatible chip (which they distinguish from Palladium) available over the Internet with an open license, and offers the product with Linux. TC makes it possible to remove final (or root) authority from the machine owner or to allow the owner to control her own machine more effectively. Thus, the design leverages and concentrates power in a particular market and legal environment. The design for values perspective argues that TC is valuable only if it provides root access and final authority to the end user. Yet TC is built in order to facilitate removal of owner control. TC offers two-party authorization — an operator who is assumed to have physical access to the machine and an owner with remote access. The remote owner is specifically enabled in its ability to limit the software run by the operator. A design goal of TC is that the operator cannot reject alterations made by the owner, in that the typical operator cannot return the machine to a previous state after an owner's update. In short, TC is designed to remove control from the end user (or operator) and place that control with a remote owner. If the operator is also the owner, TC has the potential to increase user autonomy
by increasing system security. If the owner is in opposition to the operator then the operator has lost autonomy by virtue of the increased security. The machine is more secure and less trustworthy from the perspective of the operator.
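A rough sketch of the sealed-storage idea behind TC may clarify how data can be bound to a particular software stack. This is an illustrative model only, under assumed names and a placeholder cipher; it is not Microsoft's or the TCPA's actual protocol.

```python
# Illustrative model of sealed storage: data are released only when the
# currently measured software stack matches the stack present at sealing time.
# The device secret, measurement scheme, and stream cipher are assumptions.
import hashlib
import hmac

DEVICE_SECRET = b"per-device secret held by the secure coprocessor"  # hypothetical

def measure(software_stack):
    """Hash the loaded components (BIOS, loader, OS, application) in order."""
    h = hashlib.sha256()
    for component in software_stack:
        h.update(hashlib.sha256(component).digest())
    return h.digest()

def _keystream(key, length):
    # Placeholder stream cipher for the sketch; a real design would use an
    # authenticated cipher inside tamper-resistant hardware.
    out, counter = b"", 0
    while len(out) < length:
        out += hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]

def seal(data, software_stack):
    m = measure(software_stack)
    key = hmac.new(DEVICE_SECRET, m, hashlib.sha256).digest()
    cipher = bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))
    return {"measurement": m, "ciphertext": cipher}

def unseal(sealed, software_stack):
    m = measure(software_stack)
    if m != sealed["measurement"]:
        raise PermissionError("software stack changed; data not released")
    key = hmac.new(DEVICE_SECRET, m, hashlib.sha256).digest()
    ct = sealed["ciphertext"]
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, len(ct))))

if __name__ == "__main__":
    stack = [b"bios-v1", b"loader-v1", b"os-v1", b"player-v1"]
    box = seal(b"licensed media key", stack)
    print(unseal(box, stack))                     # released: stack unchanged
    try:
        unseal(box, [b"bios-v1", b"loader-v1", b"os-v1", b"other-player"])
    except PermissionError as e:
        print(e)                                  # refused: a different player
```

The design question discussed in the text is visible in the sketch: whoever controls the reference measurement — remote owner or local operator — decides which software can ever recover the sealed data.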
55.6.3 Human-Centered Trusted Systems Design
Technical systems, as explained above, embody assumptions about human responses [Camp et al., 2001]. That humans are a poor source of randomness is well documented, and the problems of "social engineering" are well known [Anderson, 2002]. Yet the consideration of human behavior has not been included in classic axiomatic tests [Aslam et al., 1996; Anderson, 1994]. For example, designers of secure systems often make assumptions about the moral trust of humans, which is a psychological state, and strategic trust of machines [Shneiderman, 2000; Friedman et al., 2000]. Yet user differentiation between technical failures and purposeful human acts of malfeasance has never been tested. Despite the fact that the current software engineering process fails to create trustworthy software [Viega et al., 2001], much work on trust-based systems assumes only purposeful betrayals or simply declares that the user should differentiate [Friedman et al., 2000].

The inclusion of human factors as a key concern in the design of secure systems is a significant move forward in human-centered design. Human-centered design attempts to empower users to make rational trust decisions by offering information in an effective manner. "Psychological acceptability" was recognized as a security design principle over a quarter century ago [Saltzer and Schroeder, 1975], and users and user behavior are commonly cited as the "weak link" in computer security. Passwords and other forms of authentication are among the more obvious ways that security features appear as part of the human-computer interface. But the relevance of computer-human interaction to computer security extends far beyond the authentication problem, because the expectations of humans are an essential part of the definition of security. For example, Garfinkel and Spafford suggested the definition: "A computer is secure if you can depend on it and its software to behave as you expect" [1996]. Since goals and expectations vary from situation to situation and change over time in the real world, a practical approach to computer security should also take into account how those expectations are expressed, interpreted, and upheld.

Although the security and usability communities each have a long history of research extending back to the 1960s, only more recently have there been formal investigations into the interaction between these two sets of concerns. Some usability studies of security systems were conducted as early as 1989 [Karat, 1989; Mosteller and Ballas, 1989]. However, with the advent of home networking in the late 1990s, the study of computer-mediated trust has significantly expanded.

55.6.3.1 General Challenges
It is fairly well known that usability problems can render security systems ineffective or even motivate users to compromise security measures. While HCI principles and studies can help to inform the design of usable security systems, merely applying established HCI techniques to design more powerful, convenient, or lucid interfaces is not sufficient to solve the problem; the challenge of usable security is uniquely difficult. There are at least six special characteristics of the usable security problem that differentiate it from the problem of usability in general [Whitten and Tygar, 1999; Sasse, 2003]:
1. The barn door property. Once access has been inadvertently allowed, even for a short time, there is no way to be sure that an attacker has not already abused that access.
2. The weakest link property.
When designing user interfaces, in most contexts a deficiency in one area of an interface does not compromise the entire interface. However, a security context is less forgiving. The security of a networked computer is only as strong as its weakest component, so special care needs to be taken to avoid dangerous mistakes.
3. The unmotivated user property. Security is usually a secondary goal, not the primary purpose for which people use their computers. This can lead users to ignore security concerns or even subvert them when security tasks appear to interfere with the achievement of their primary goal.
4. The abstraction property. Security policies are systems of abstract rules, which may be alien and unintuitive to typical computer users. The consequences of making a small change to a policy may be far-reaching and nonobvious.
5. The lack of feedback property. Clear and informative user feedback is necessary in order to prevent dangerous errors, but security configurations are usually complex and difficult to summarize.
6. The conflicting interest property. Security, by its very nature, deals with conflicting interests, such as the interests of the user against the interests of an attacker or the interests of a company against the interests of its own employees. HCI research typically aims to optimize interfaces to meet the needs of a single user or a set of cooperating users, and is ill-equipped to handle the possibility of active adversaries.
Because computer security involves human beings, their motivations, and conflicts among different groups of people, security is a complex socio-technical system.

55.6.3.2 Authentication
Since user authentication is a very commonly encountered task and a highly visible part of computer security, much of the attention in usable security research has been devoted to this problem. The most common authentication technique, of course, is the password. Yet password authentication mechanisms fail to acknowledge even well-known HCI constraints and design principles [Sasse et al., 2001]. Cognition research has established that human memory decays over time, that nonmeaningful items are more difficult to recall than meaningful items, that unaided recall is more difficult than cued recall, and that similar items in memory compete and interfere with each other during retrieval. Password authentication requires perfect unaided recall of nonmeaningful items. Furthermore, many users have a proliferation of passwords for various systems or have periodically changing passwords, which forces them to select the correct password from a set of several remembered passwords. Consequently, people often forget their passwords and rely on secondary mechanisms to deal with forgotten passwords.

One solution is to provide a way to recover a forgotten password or to reset the password to a randomly generated string. The user authorizes recovery or reset by demonstrating knowledge of a previously registered secret or by calling a helpdesk. There are many design choices to make when providing a challenge-based recovery mechanism [Just, 2003]. Another common user response to the problem of forgetting passwords is to write down passwords or to choose simpler passwords that are easier to remember, thereby weakening the security of the system. One study of 14,000 UNIX passwords found that nearly 25% of the passwords could be cracked by trying variations on usernames, personal information, and words from a dictionary of less than 63,000 carefully selected words [Klein, 1990].

In response to all the shortcomings of user-selected string passwords, several password alternatives have been proposed. Jermyn et al. [1999] have examined the possibility of using hand-drawn designs as passwords. Others have looked at recognition-based techniques, in which users are presented with a set of options and are asked to select the correct one, rather than performing unaided recall. Brostoff and Sasse [2000] studied the effectiveness of images of human faces in this manner, and Dhamija and Perrig [2000] studied the use of abstract computer-generated images.
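Klein's result suggests a simple countermeasure: proactively reject candidate passwords that are derived from the username, personal information, or a dictionary. The checker below is a minimal sketch of that idea; the variation rules and the tiny word list are assumptions for illustration, not Klein's actual procedure or any deployed checker.

```python
# Minimal proactive password checker in the spirit of Klein [1990]:
# reject passwords that are simple variations on the username, personal
# information, or dictionary words. Rules and word list are illustrative.

COMMON_WORDS = {"password", "dragon", "welcome", "letmein", "monkey"}  # stand-in for a real word list

def variations(word):
    word = word.lower()
    yield word
    yield word[::-1]                      # reversed
    yield word + "1"                      # common suffix
    yield word.capitalize()
    yield word.replace("o", "0").replace("i", "1").replace("e", "3")  # leetspeak

def is_weak(password, username="", personal_info=()):
    candidates = set(COMMON_WORDS) | {username} | set(personal_info)
    guesses = {v for word in candidates if word for v in variations(word)}
    return password.lower() in {g.lower() for g in guesses} or len(password) < 8

if __name__ == "__main__":
    print(is_weak("Alice1", username="alice"))            # True: variation on the username
    print(is_weak("drag0n", username="alice"))            # True: leetspeak dictionary word
    print(is_weak("correct horse battery staple"))        # False under these simple rules
```

Such a checker addresses only the weak-password symptom; as the surrounding text argues, it does nothing about the underlying mismatch between unaided recall of nonmeaningful strings and how human memory actually works.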
In contexts where it is feasible to use additional hardware, other solutions are possible. Users can carry smart cards that generate password tokens or that produce responses to challenges from a server. Paul et al. [2003] describe a technique called "visual cryptography" in which the user overlays a uniquely coded transparency over the computer screen to decrypt a graphical challenge, thereby proving possession of the transparency. There is considerable interest in using biometrics for user authentication. A variety of measurable features can be used, such as fingerprints, voiceprints, hand geometry, faces, or iris scans. Each of the various methods has its own advantages and disadvantages. Biometrics offer the potential for users to authenticate without having to remember any secrets or carry tokens. However, biometrics have the fundamental drawback that they cannot be reissued. A biometric is a password that can never be changed. Once compromised, a biometric is compromised forever. Biometrics raise significant concerns about the creation of centralized databases of biometric information, as a biometric (unless hashed) creates a
universal identifier. Biometrics also have value implications in that biometric systems most often fail for minorities [Woodward et al., 2003]. Biometrics present class issues as well; for example, biometric records for recipients of government aid are already stored in the clear in California. The storage of raw biometric data makes compromise trivial and thus security uncertain.

55.6.3.3 User Perceptions and Trust
User perceptions of security systems are crucial to their success in two different ways. First, the perceived level of reliability or trustworthiness of a system can affect the decision of whether to use the system at all; second, the perceived level of security or risk associated with various choices can affect the user's choice of actions. Studies [Cheskin, 1999; Turner et al., 2001] have provided considerable evidence that user perception of security on e-commerce Web sites is primarily a function of visual presentation, brand reputation, and third-party recommendations. Although sufficiently knowledgeable experts could obtain technical information about a site's security, for ordinary consumers, "feelings about a site's security were for the most part not influenced by the site's visible use of security technology" [Turner et al., 2001].

With regard to the question of establishing trust, however, perceived security is not the whole story. Fogg conducted a large study of over 1400 people to find out what factors contributed to a Web site's credibility [2001]. The most significant factors were those related to "real-world feel" (conveying the real-world nature of the organization, such as by providing a physical address and showing employee photographs), "ease of use," and "expertise" (displaying credentials and listing references). There is always a response to presenting photographs of people on Web sites, but the effect is not always positive [Riegelsberger, 2003].

In order to make properly informed decisions, users must be aware of the potential risks and benefits of their choices. It is clear that much more work is needed in this area. For example, a recent study showed that many users, even those from a high-technology community, had an inaccurate understanding of the meaning of a secure connection in their Web browser and frequently evaluated connections as secure when they were not or vice versa [Friedman et al., 2002]. A recent ethnographic study [Dourish et al., 2003] investigated users' mental models of computer security. The study revealed that users tend to perceive unsolicited e-mail, unauthorized access, and computer viruses as aspects of the same problem, and envision security as a barrier for keeping out these unwanted things. The study participants blended privacy concerns into the discussion, perceiving and handling marketers as threats in much the same way as hackers. However, there seemed to be an "overwhelming sense of futility" in people's encounters with technology. The perception that there will always be cleverer adversaries and new threats leads people to talk of security in terms of perpetual vigilance.

In order to make properly designed systems, designers must be aware of human practices with respect to trust. Studies of trust in philosophy and social science argue that humans trust readily and in fact have a need to trust. Further, humans implement trust by aggregating rather than differentiating. That is, humans sort entities into trustworthy and untrustworthy groups.
Thus, when people become users of computers, they may aggregate all computers into a single class, and thus become increasingly trusting rather than increasingly differentiating over time [Sproull and Kiesler, 1992]. Finally, when using computers humans may or may not differentiate between malicious behavior and technical incompetence. Spam is clearly malicious, whereas privacy violations may be a result of an inability to secure a Web site or a database. Competence in Web design may indicate a general technical competence, thus mitigating concerns about competence. However, competence in Web design may also indicate efficacy in obtaining users' trust for malicious purposes. Only the ability to discern the technical actions and understand the implications can provide users with the ability to manage trust effectively on the network.

55.6.3.4 Interaction Design
A number of studies have shown the potential for problems in interaction design to seriously undermine security mechanisms. Whitten and Tygar [1999] demonstrated that design problems with PGP made it very difficult for even technically knowledgeable users to safely use e-mail encryption; a study by Good
and Krekelberg [2003] identifies problems in the user interface for KaZaA that can lead to users unknowingly exposing sensitive files on their computer. Carl Ellison has suggested that each mouse click required to use encryption will cut the base of users in half [Ellison, 2002].

Results of such studies and personal experiences with security systems have led researchers to propose a variety of recommendations for interaction design in secure systems. Yee [2002] has proposed ten principles for user interaction design in secure systems. At a higher level are recommendations to apply established HCI techniques to the design process itself. Karat [1989] described the benefits of applying rapid prototyping techniques to enable an iterative process involving several rounds of field tests and design improvements. Zurko and Simon [1996] suggest applying user-centered design to security — that is, beginning with user needs as a primary motivator when defining the security model, interface, or features of a system. Grinter and Smetters [2003] suggested beginning the design process with a user-centered threat model and a determination of the user's security-related expectations. Techniques such as contextual design [Wixon et al., 1990] and discount usability testing [Nielsen, 1989] are also applicable to interaction design for secure systems.

Some have suggested that the best way to prevent users from making incorrect security decisions is to avoid involving users in security at all. Others argue that only the user really knows what they want to do, and knowledge of the user's primary goal is essential for determining the correct security action. It is clear that forcing users to perform security tasks irrelevant to their main purpose is likely to bring about the perception that security interferes with real work. Yee [2002] has suggested the principle of the path of least resistance, which recommends that the most natural way to perform a task should also be the safest way. Sasse [2003] highlighted the importance of designing security as an integral part of the system to support the user's particular work activity. Grinter and Smetters [2003] have proposed a design principle called implicit security, in which the system infers the security-related operations necessary to accomplish the user's primary task in a safe fashion. The perspectives are similar, and all recognize the necessity of harmonizing security and usability goals rather than pitting them against each other.

In order for users to manage a computer system effectively, there must be a communication channel between the user and the system that is safe in both directions. The channel should protect against masquerading by attackers pretending to be authorized users, and protect users from being fooled by attackers spoofing messages from the system. This design goal was identified by Saltzer and Schroeder as the trusted path [1975]. Managing complexity is a key challenge for user interfaces in secure systems. Grinter and Smetters [2003] and Yee [2002] have identified the need to make the security state visible to the user so that the user can be adequately informed about the risks, benefits, and consequences of decisions. However, a literal representation of all security relationships would be overwhelming.
Cranor [2003] applied three strategies to deal with this problem in designing a policy configuration interface: (1) reducing the level of detail in the policy specification, (2) replacing jargon with less formal wording, and (3) providing the option to use prepackaged bundles of settings. Whitten and Tygar [2003] have suggested a technique called safe staging, in which users are guided through a sequence of stages of system use to increase understanding and avoid errors. Earlier stages offer simpler, more conservative policy options for maintaining security; then as the user moves to later stages, she gains progressively greater flexibility to manipulate security policy while receiving guidance on potential new risks and benefits. Whitten and Tygar's usability study of PGP [1999] showed strong evidence that designing a user interface according to traditional interface design goals alone was not sufficient to achieve usable security. Whitten recommends that usable security applications should not only be easy to use but also should teach users about security and grow in sophistication as the user demonstrates increased understanding. Ackerman and Cranor [1999] proposed "critics" — intelligent agents that offer advice or warnings to users, but do not take action on the user's behalf. Such agents could inform users of nonobvious consequences of their actions or warn when a user's decision seems unusual compared to decisions previously made in similar situations.
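Cranor's third strategy — offering prepackaged bundles of settings rather than exposing every flag — can be illustrated in a few lines. The bundle names and the individual flags below are hypothetical examples, not the actual P3P configuration interface.

```python
# Illustrative "prepackaged bundle" approach to policy configuration: the user
# picks one coarse level; detailed flags are derived from it. Bundle names and
# flags are hypothetical.

BUNDLES = {
    "high":   {"accept_third_party_cookies": False, "send_referrer": False,
               "allow_profile_release": False},
    "medium": {"accept_third_party_cookies": False, "send_referrer": True,
               "allow_profile_release": False},
    "low":    {"accept_third_party_cookies": True,  "send_referrer": True,
               "allow_profile_release": True},
}

def configure(level, overrides=None):
    """Start from a bundle; advanced users may still override individual flags."""
    settings = dict(BUNDLES[level])
    settings.update(overrides or {})
    return settings

if __name__ == "__main__":
    print(configure("medium"))
    print(configure("high", overrides={"send_referrer": True}))
```

The design trade-off is the one the text identifies: the bundle hides complexity from the unmotivated user, while the override path preserves flexibility for users who do want to see the abstract rules.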
55.6.4 Identity Examples
Systems for identity management are inherently social systems. These systems implement controls that place risk, control data flows, and implement authentication. The strong link between control and computer security, and between identity and privacy, makes these ideal examples for considering social implications of technology.

55.6.4.1 PKI
Public key signatures create bindings between identifiable cryptographic keys and specific (signed) documents. Public key infrastructures serve to link knowledge of a particular key to a particular attribute. Usually that attribute is a name, but there are significant problems with using a name as a unique identifier [Ellison and Camp, 2003]. The phrase "public key infrastructure" has come to refer to a hierarchy of authority. There is a single root or a set of root keys. The root keys are used to sign documents (usually quite short, called certificates) that attest to the binding between a key and an attribute. Since that attribute is so often identity, the remainder of the section assumes it is indeed identity. Thus each binding between a key and identity is based on a cryptographic verification from some other higher-level key. At the base is a key that is self-certified.

Standard PKI implements a hierarchy with the assumption of a single point from which all authority and trust emanates. The owner of the root key is a certificate authority. The current public key infrastructure market and browser implementation (with default acceptable roots) create a concentration of trust. Matt Blaze [2003] argues that SSL protects you from any institution that refuses to give Verisign money. The cryptographer Blaze arguably has an accurate assessment, as the purchaser of the cryptographic verification determines Verisign's level of investigation into the true identity of the certificate holder. Public key infrastructures centralize trust by creating a single entity that signs and validates others. The centralization of trust is further implemented by the selection of a set of keys that are trusted by default, in a market that is a duopoly or monopoly.

55.6.4.2 PGP
Confidentiality in communications was the primary design goal of Pretty Good Privacy. PGP was conceived as a privacy enhancing technology as opposed to a technology for data protection [Garfinkel, 1999]. This fundamental distinction in design arguably explains the distinct trust models in the two technologies. PGP allows any person to assert an identity and public key binding. It is then the responsibility of the user to prove the value of that binding to another. In order to prove the binding the user presents the key and the associated identity claim to others. Other individuals who are willing to assert that the key/identity binding is correct sign the binding with their own keys. This creates a network of signatures, in which any user may or may not have a trusted connection.

PGP utilizes social networks. If PKI can be said to model an authoritarian approach, PGP is libertarian. PKI has roots that are privileged by default. Each PGP user selects parties that are trusted not only for their assertions about their own binding of key and identity but also for their judgment in verifying the linkage of others. Those who have trusted attestations are called introducers. These introducers serve as linking points between the social networks formed by the original user and other social networks. PGP has monotonically increasing trust.
When an introducer is selected, that introducer remains trusted over an infinite period of time unless manually removed. If an introducer is manually removed, all the introduced parties remain trusted, as the paths to a trusted entity are not recorded after introduction. PGP increases trust as an increasing number of introducers attest to another entity. PGP does not decrease trust if one introducer declares a lack of knowledge, regardless of the properties of the social network.
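The introducer behavior just described — trust that only accumulates, and introduced keys that stay trusted even after the introducer is removed — can be sketched as follows. The data structures and the single-signature threshold are assumptions for illustration; real PGP trust computation is more elaborate.

```python
# Sketch of the monotonic introducer behavior described above. The threshold
# and data structures are illustrative; PGP's actual trust model differs.

class Keyring:
    def __init__(self, signatures_required=1):
        self.introducers = set()       # keys trusted to vouch for others
        self.valid_keys = set()        # key/identity bindings accepted so far
        self.signatures_required = signatures_required

    def add_introducer(self, key):
        self.introducers.add(key)

    def remove_introducer(self, key):
        # Removing an introducer does not revoke bindings it already vouched
        # for: the introduction paths are not recorded.
        self.introducers.discard(key)

    def consider(self, key, signers):
        vouchers = self.introducers & set(signers)
        if len(vouchers) >= self.signatures_required:
            self.valid_keys.add(key)   # trust only ever increases
        return key in self.valid_keys

if __name__ == "__main__":
    ring = Keyring()
    ring.add_introducer("carol")
    print(ring.consider("dave", signers=["carol"]))   # True: introduced by carol
    ring.remove_introducer("carol")
    print("dave" in ring.valid_keys)                  # still True: monotonic trust
```

The sketch makes the values choice concrete: final authority over who counts as an introducer, and therefore over which bindings are accepted, rests entirely with the individual user rather than with a root authority.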
PGP was designed to enable secure email. Secure email provides both integrity of the content and authentication of the sender. PGP enables confidential email by providing the endpoints with the capacity to encrypt. PGP places final trust in the hands of users, and allows users to implement their own social network by creating introducers. The same underlying cryptography is used by PGP and PKI but the values choices embedded are distinct.
55.6.5 Data Protection vs. Privacy
Discussions of the human implications of design have focused on trust and trusted systems. However, privacy as well as security is an element of trust. Data surveillance, privacy violations, or abuse of data (depending on the jurisdiction and action) can be both ubiquitous and transparent to the user. Computers may transmit information without the users' knowledge; collection, compilation, and analysis of data are tremendously simplified by the use of networked information systems. Because of these facts, the balance between consumers and citizens who use services and those that offer digital services cannot be maintained by simply moving services on-line.

A consideration of social implications of technology should include the dominant privacy technologies. The two most widely used (and implemented) privacy enhancing technologies are the anonymizer and P3P. The anonymizer implements privacy while P3P implements a data protection regime. Data protection and privacy have more commonalities than differences. The differences have philosophical as well as technical design implications. Technical decisions determine the party most capable of preventing a loss of security; policy decisions can motivate those most capable. Current policy does not reflect the technical reality. End users are least technically capable and most legally responsible for data loss. For the vast majority of users on the Internet there are programs beyond their understanding and protocols foreign to their experience. Users of the Internet know that their information is sometimes transmitted across the globe. Yet there is no way for any but the most technically savvy consumers to determine the data leakage that results from Internet use.

There is a comprehensive and developed argument for data protection. The privacy argument for data protection is essentially that when the data are protected privacy is inherently addressed. One argument against privacy is that it lacks a comprehensive, consistent underlying theory. There are competing theories [Camp, 2003b; Trublow, 1991; Kennedy, 1995], yet the existence of multiple, complete, but disjoint theories does illustrate the point that there is limited agreement. Data protection regimes address problems of privacy via prohibition of data reuse and constraints on compilation. By focusing on practical data standards, data protection sidesteps difficult questions of autonomy and democracy that are inherent in privacy. Data protection standards constrain the use of personally identifiable information in different dimensions according to the context of the compilation of the information and the information itself.

Unlike data protection, privacy has a direct technical implementation: anonymity. For both privacy and data protection, one significant technical requirement is enabling users to make informed choices. Given the role of security technology in enhancing privacy, human-computer design for security can enhance both privacy and data protection. Anonymity provides privacy protection by preventing data from being personally identifiable. Anonymity provides the strongest protection possible for privacy, and anonymity provides uniform protection for privacy across all circumstances. Anonymity is of limited value in situations where there is a need to link repeated transactions. In such cases pseudonyms are needed.
Pseudonyms can have no link to other roles or true identity; for example, a pseudonym may be a name used in an imagined community such as a role-playing game. Pseudonyms allow for personalization without privacy violations. Repeated use of a pseudonym in multiple transactions with the same entities leaks little information. Use of a pseudonym in multiple contexts (for example, with multiple companies) causes the pseudonym to converge with the identity of the user. Privacy enhancing technologies include technologies for obscuring the source and destination of a message (onion routing) and preventing information leakage while browsing (the anonymizer).
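The observation that reusing one pseudonym across contexts erodes its privacy value suggests deriving a separate pseudonym per context. The construction below — an HMAC of a user-held secret and a context label — is a common approach offered here only as an illustrative sketch, not a description of any particular deployed system.

```python
# Illustrative per-context pseudonyms: one user secret yields unlinkable names
# in different contexts, but a stable name within a single context.
import hashlib
import hmac

def pseudonym(user_secret: bytes, context: str) -> str:
    return hmac.new(user_secret, context.encode(), hashlib.sha256).hexdigest()[:16]

if __name__ == "__main__":
    secret = b"user-held secret, never revealed"
    print(pseudonym(secret, "game-forum"))   # stable across visits to the forum
    print(pseudonym(secret, "bookshop"))     # different name, not linkable to the forum
    print(pseudonym(secret, "game-forum"))   # same as the first line
```

Within one context the pseudonym supports personalization and repeated transactions; across contexts the names cannot be linked without the user's secret, which is exactly the property lost when a single pseudonym is reused everywhere.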
Onion routing encrypts messages per hop, using an overlay network of routers with public keys. At each router, the message provides the address of the next router and a message encrypted with the public key of the next router. Thus, each router knows the previous hop and the next hop, but not the original source nor the final destination. However, the router records could be combined to trace a message across the network. Yet even with the combined records of the routers, the confidentiality of the message would remain.

The anonymizer is a widely used privacy-enhancing proxy. The anonymizer implements privacy by functioning as an intermediary so that direct connections between the server and the browser are prevented. The anonymizer detects Web bugs. Web bugs are 1 × 1 invisible images embedded into pages to allow entities other than the serving page to track usage. Since Web bugs are placed by a server other than the one perceived by the user, Web bugs allow for placement of cookies from the server originating the bug. This subverts user attempts to limit the use of third-party cookies. Note that browsers have a setting that allows users to reject cookies from any server other than one providing the page — so-called third-party cookies. Web bugs enable placement of third-party cookies. The anonymizer also limits JavaScript and prevents the use of ftp calls to obtain user email addresses. The anonymizer cannot be used in conjunction with purchasing; it is limited to browsing.

In contrast, data protection encourages protection based on policy and practice. The Platform for Privacy Preferences is a technology that implements the data protection approach [Cranor and Reagle, 1998]. The Platform for Privacy Preferences was designed to create a technical solution to the problem of data sharing. Ironically, of all privacy-enhancing technologies, it depends on regulatory enforcement of contracts as opposed to offering technical guarantees [Hochheiser, 2003]. The Platform for Privacy Preferences includes a schema (or language) for expressing privacy preferences. P3P allows the user to select a set of possible privacy policies by selecting a number on a sliding scale. P3P also has a mechanism for a service provider to express its privacy practices. If there is agreement between the server policy and user preference then user data are transmitted to the server. Otherwise, no data are sent. P3P also includes profiles where the user enters data. The inclusion of profiles has been a significant source of criticism as it removes from the user the ability to provide incorrect data. By including profiles, P3P removed from the user a common defense against data re-use (obfuscation, lying, or misdirection) by automating the transmission of correct data. P3P, therefore, had an enforcement role with respect to user data; either the user was consistently dishonest or consistently honest. P3P had no corresponding enforcement mechanism for the server. The server attests to its own privacy policy and the protocol assumes the server implements its own policy. The most recent version of P3P removes profiles; however, the MS Explorer implementation still maintains profiles.
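The per-hop structure described at the start of this subsection — each router learning only the previous and the next hop — can be sketched with layered encryption. For readability the sketch simulates encryption with a simple key check rather than the public-key encryption the text describes; it illustrates the layering only, and the names are hypothetical.

```python
# Sketch of onion layering: each layer names the next hop and hides the rest.
# encrypt/decrypt are stand-ins for public-key encryption to each router.

def encrypt(key, payload):
    return {"locked_with": key, "payload": payload}   # stand-in for real encryption

def decrypt(key, box):
    if box["locked_with"] != key:
        raise ValueError("wrong router key")
    return box["payload"]

def build_onion(hops, message):
    """hops: [(router_name, router_key), ...] in forwarding order."""
    inner, next_hop = message, "destination"
    for name, key in reversed(hops):       # wrap from the inside out
        inner = encrypt(key, {"next_hop": next_hop, "inner": inner})
        next_hop = name
    return inner                           # handed by the sender to the first hop

if __name__ == "__main__":
    hops = [("router_a", "key_a"), ("router_b", "key_b"), ("router_c", "key_c")]
    keys = dict(hops)
    onion, current = build_onion(hops, "hello"), "router_a"
    while current != "destination":
        layer = decrypt(keys[current], onion)   # a hop learns only the next hop
        print(current, "forwards to", layer["next_hop"])
        onion, current = layer["inner"], layer["next_hop"]
    print("delivered:", onion)
```

As the text notes, even if all the routers pooled their records they could reconstruct the path but not the innermost content, since each hop can peel only its own layer.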
55.7 Network Protocols as Social Systems

Network protocols for reservation of system resources in order to assure quality of service represent a technical area of research. However, even quality-of-service designs have significant impact on the economics, and therefore the social results, of such systems.

There is no clearer example of politics in design than the change in the design of Cisco routers to simplify so-called "digital wiretaps." Wiretaps refer to aural surveillance of a particular set of twisted-pair lines coming from a local switching office [IEEE, 1997]. This simple model was often abused by federal authorities in the U.S., and use of aural surveillance against dissidents, activists, and criminals has been widespread across the globe [Diffie and Landau, 1997]. "Digital telephony" is a phrase used by law enforcement to map the concept of wiretaps to the idea of data surveillance of an individual on the network. If the observation of digital activity is conceived as a simple digital mapping, then the risks can be argued to be the same. Yet the ease of compilation and correlation of digital information argues that there is not a direct mapping.

Cisco implemented a feature for automatically duplicating traffic from one IP address to a distinct location — a feature desired by law-abiding consumers of the product and by law enforcement. However, Cisco included no controls or reporting on this feature. Therefore Cisco altered the balance of power
between those under surveillance and those implementing surveillance by lowering the work factor for the latter. Additionally, the invisibility of surveillance at the router level further empowers those who implement surveillance. The ability to distinguish the flow from one machine across the network and to duplicate that flow for purposes of surveillance is now hard-wired into the routers. Yet the oversight necessary to prevent abuse of that feature is not an element of the design, not even as an option that must be deliberately disabled. An alternative configuration would require selection of a default email address to which notifications of all active taps would be sent. The address could be set according to the jurisdiction of the purchaser, thereby implementing oversight.

A more subtle question is the interaction of quality-of-service mechanisms and universal service. The experience of the telephone (described in a previous section) illustrates how high-quality service may limit the range of available service. Universal service may require a high degree of service sharing and certainly requires an easy-to-understand pricing method. Ubiquitous service reservation, and the resulting premium pricing, can undermine the potential of best-effort service to provide always-on connections at a flat price [Camp and Tsang, 2002].
55.8 Open vs. Closed Code

There exists a strong debate on how the availability of code alters society as digital systems are adopted. The initiator of this dialogue was Richard Stallman, who foresaw the corporate closing of code [Stallman, 1984]. Other technical pioneers contributed both to the code base and to the theory of free and open code [Oram, 1999; Raymond, 1999]. Legal pioneers [Branscomb, 1994] clearly saw the problem of closing information in the failure of property regimes to match the economic reality of digital networked information. By 2000 [Lessig, 1999] there was widespread concern about the social implications of the market for code.

Code can be distributed in a number of forms that range from completely open to completely closed. Code is governed by licenses as well as law; yet the law is sufficiently slow to adapt that gradations of and innovations in openness are provided by licenses.

Computer code exists along a continuum. At one end is source code. Source code is optimized for human readability and malleability. Source code is high-level code, meaning that it is several degrees removed from the physical or hardware level. Markup languages are an example of inherently human-readable code. There are technical means to prohibit the trivially easy reading and viewing of a document source, and methods for writing increasingly obtuse source code are proliferating. For example, documents from popular Web-authoring tools use unnecessary Java calls to confuse the reading. At the most extreme is converting HTML-based Web pages to Shockwave formats, which cannot be read. Markup languages and scripting languages such as JavaScript and CGI scripts are designed to be readable. Such scripts are read (thus the name), and then the listed commands are played in the order received.

Between the highest level and the physical addresses required by the machine, there is assembly language. Assembly is the original coding language. Grace Hopper (who found the original computer bug, a moth in the machine at Harvard in 1945) implemented programs in assembly. Assembly requires that humans translate programs into the binary language that machines understand. For example, adding two numbers in assembly takes many lines of code. The computer must be instructed to read the first number, bit by bit, and store it in an appropriate location. Then the computer must read the second number. Then the numbers must be placed at the input to the arithmetic logic unit, then added, and the result placed in an output register. Grace Hopper invented the breakthrough of the assembler, the forerunner to the compiler.

The earliest code was all binary, of course, and thus clearly the most basic binary codes can be read. In these early binary machines, the commands were implemented by women who physically linked nodes to create the binary "1" of the commands. For each mathematician creating a code there existed a large machine and a score of women to implement the commands by connecting two relays, thus programming a "1." Current programs are vastly more complex than those implemented in hand-wired binary. The breaking of the Enigma code was a vast enterprise. (Alan Turing was honored for this achievement with a statue in the U.K.) Today, the same endeavor is an advanced undergraduate homework assignment. Thus, machine (sometimes called binary) code for today's programs is unreadable.
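The gap between readable source and the lower-level instructions a machine executes can be made concrete with Python's standard dis module. This is only an illustrative stand-in for the assembly discussion above; the small function shown is hypothetical.

```python
# One readable line of source expands into several machine-level steps:
# load the first operand, load the second, add, and return the result.
import dis

def add(a, b):
    return a + b

dis.dis(add)
# Typical output lists low-level opcodes such as LOAD_FAST a, LOAD_FAST b,
# BINARY_ADD (or BINARY_OP, depending on the Python version), RETURN_VALUE.
```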
It is the ability to read code that makes it open or closed. Code that can be read can be evaluated by the user or by a representative of the user. Code that is closed and cannot be read requires trust in the producer of the code. Returning to the social forces that influence technology, this is particularly problematic. A user wants a secure machine. The producer of commercial code has an incentive to create code as fast as possible, meaning that security is not a priority. The user wants control over his or her personal information. The producer of commercial code may want information about the user, particularly to enforce intellectual property rights [Anderson, 2003] or to implement price discrimination [Odlyzko, 2003].

The ability to read code grants the practical ability to modify it. Of course, the existence of closed code does not prevent modifications. This can be seen most clearly in the modifications of the Content Scrambling System (CSS). The decryption of the Content Scrambling System enabled users to watch digital video disks from any region on any operating system. CSS implements marketing plans, specifically the regional price discrimination that is the traditional and profitable practice of the owners of mass-produced high-value video content.

Open code encourages innovation by highly distributed end users by widening the opportunities for innovation. Closed code encourages innovation by increasing the rewards to the fewer centralized innovators. Thus open and closed code implement different visions of innovation in society. Open code offers transparency; closed code is not transparent. If code is law, then the ability to view and understand law is the essence of freedom [Lessig, 1999; Stallman, 1984; Syme and Camp, 2001]. The inability to examine law is a mark of a totalitarian state [Solzhenitsyn, 1975].
55.9 Conclusions

Can the assumption of values be prevented by the application of superior engineering? Or are assumptions about values and humans an integral part of the problem-solving process? Arguably both cases exist in communications and information technologies. The two cases cannot be systematically and cleanly distinguished, so the designer cannot always know when the guidance of philosophy or the social sciences is most needed.

When is design political? One argument is that politics inevitably intrudes when assumptions about trust and power are embedded in the design. Yet trust assumptions may be as subtle as reservation of router resources or as obvious as closed code. Technologies embed changes in power relationships by increasing the leverage of applied force. Yet the social implications of amplifying one voice or one force cannot be consistently or reliably predicted.

Designers who turn to the study of computer ethics find the field not yet mature. Some who study ethics argue that computers create no new ethical problems, but rather create new instantiations of previous ethical problems [Johnson, 2001]. Others argue that digital networked information creates new classes or cases of ethical conundrums [Maner, 1996; Moor, 1985]. Ethicists from all perspectives have worked on the relevant professional codes. Thus, the professional can be guided by the ACM/IEEE-CS Software Engineering Code of Ethics and Professional Practice. Integrity and transparency are the highest calling. No engineer should implement undocumented features, and all designers should document their technical choices. The most risk-averse principle of a design scientist may be "Do no harm"; however, following that principle may result in inaction. Arguably, inaction at the beginning of a technological revolution is the least ethical choice of all, as it denies society the opportunity to make any choices, however technically framed.
Additional Resources

The IEEE Society on Social Implications of Technology:
http://radburn.rutgers.edu/andrews/projects/ssit/default.htm

ACM SIGCAS: Special Interest Group on Computers and Society:
http://www.acm.org/sigcas/

An extended bibliography on technology and society, developed by a reference librarian:
http://www.an.psu.edu/library/guides/sts151s/stsbib.html

A listing of technology and society groups, and electronic civil liberties organizations:
http://www.ljean.org/eciv.html
References

Ackerman, Michael and Lorrie Cranor. Privacy Critics: UI Components to Safeguard Users' Privacy. Proceedings of CHI 1999.
Adobe Corporation. Adobe eBook FAQ, 2002. http://www.adobe.com/support/ebookrdrfaq.html.
Alderman, Ellen and Caroline Kennedy. The Right to Privacy. Alfred A. Knopf, New York, 1995.
Anderson, Ross. Why cryptosystems fail. Communications of the ACM, 37(11): 32–40, November 1994.
Anderson, Ross. Cryptography and Competition Policy — Issues with Trusted Computing. 2nd Annual Workshop on Economics and Information Security (May 29–30, 2003, Robert H. Smith School of Business, University of Maryland).
Anderson, Ross. Security Engineering: A Guide to Building Dependable Distributed Systems. John Wiley & Sons, New York, 2002.
Aslam, Taimur, Ivan Krsul, and Eugene Spafford. A Taxonomy of Security Vulnerabilities. Proceedings of the 19th National Information Systems Security Conference (October 6, 1996, Baltimore, MD), 551–560.
Barlow, John. A Declaration of Independence of Cyberspace. http://www.eff.org/~barlow/DeclarationFinal.html, 1996 (last viewed September, 2003).
Bijker, Wiebe, Thomas P. Hughes, and Trevor Pinch. The Social Construction of Technological Systems. MIT Press, Cambridge, MA, 2001.
Blaze, Matt. Quotes, August 31, 2003. http://world.std.com/~cme/html/quotes.html (last viewed September, 2003).
Branscomb, Anne W. Who Owns Information? HarperCollins, New York, 1994.
Brostoff, Saacha and M. Angela Sasse. Are Passfaces More Usable than Passwords? A Field Trial Investigation. Proceedings of HCI 2000 (September 5–8, Sunderland, U.K.), pp. 405–424. Springer-Verlag, 2000.
Camp, L. Jean. First principles for copyright for DRM design. IEEE Internet Computing, 7(3): 59–65, 2003a.
Camp, L. Jean. Design for Trust. In Trust, Reputation, and Security: Theories and Practice. Rino Falcone, Ed., Springer-Verlag, New York, 2003b.
Camp, L. Jean, Cathleen McGrath, and Helen Nissenbaum. Trust: A Collision of Paradigms. Proceedings of Financial Cryptography 2001, 91–105. Springer-Verlag, 2001.
Camp, L. Jean and Rose Tsang. Universal service in a ubiquitous digital network. Journal of Ethics and Information Technology, 2(4): 211–221, 2001.
Cheskin and Studio Archetype/Sapient. eCommerce Trust Study. January 1999.
Clark, David and Marjory Blumenthal. Rethinking the design of the Internet: The end to end arguments vs. the brave new world. Telecommunications Policy Research Conference, Washington, D.C., September 2000.
Computer Science and Telecommunications Board. Trust in Cyberspace. National Academy Press, Washington, D.C., 1999.
Cranor, Lorrie. Designing a Privacy Preference Specification Interface: A Case Study. CHI 2003 Workshop on HCI and Security. April 2003.
Cranor, Lorrie and Joseph Reagle. Designing a Social Protocol: Lessons Learned from the Platform for Privacy Preferences. In Telephony, the Internet, and the Media. Jeffrey K. MacKie-Mason and David Waterman, Eds., Lawrence Erlbaum, Hillsdale, NJ, 1998.
Department of Defense. Department of Defense Trusted Computer System Evaluation Criteria. National Computer Security Center, 1985.
Dhamija, Rachna and Adrian Perrig. Déjà Vu: A User Study Using Images for Authentication. Proceedings of the 9th USENIX Security Symposium, August 2000.
Diamond, Jared. Guns, Germs, and Steel: The Fates of Human Societies. W. W. Norton & Company, New York, 1999.
Diffie, Whit and Susan Landau. Privacy on the Line. MIT Press, Cambridge, MA, 1997.
Dourish, Paul, Jessica Delgado de la Flor, and Melissa Joseph. Security as a Practical Problem: Some Preliminary Observations of Everyday Mental Models. CHI 2003 Workshop on HCI and Security, April 2003.
Eisenstein, Elizabeth L. The Printing Press as an Agent of Change. Cambridge University Press, Cambridge, U.K., 1979.
Ellison, Carl. Improvements on Conventional PKI Wisdom. 1st Annual PKI Research Workshop, Dartmouth, NH, April 2002.
Ellison, Carl and L. Jean Camp. Implications with Identity in PKI. http://www.ksg.harvard.edu/digitalcenter/conference/references.htm (last viewed September 2003).
European Union. Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995. Official Journal of the European Communities, L. 281: 31, 23 November 1995.
Evans, Nathanael, Avi Rubin, and Dan Wallach. Authentication for Remote Voting. CHI 2003 Workshop on HCI and Security. April 2003.
Federal Trade Commission. Privacy Online: Fair Information Practices in the Electronic Marketplace. Federal Trade Commission Report to Congress, 2000.
Fischer, Charles. America Calling: A Social History of the Telephone to 1940. University of California Press, Berkeley, CA, 1992.
Fogg, B.J., Nicholas Fang, Jyoti Paul, Akshay Rangnekar, John Shon, Preeti Swani, and Marissa Treinen. What Makes A Web Site Credible? A Report on a Large Quantitative Study. Proceedings of ACM CHI 2001 Conference on Human Factors in Computing Systems, pp. 61–68. ACM Press, New York, 2001.
Friedman, Batya, Ed. Human Values and the Design of Computer Technology. CSLI Publications, Stanford, CA, 2001.
Friedman, Batya, David Hurley, Daniel C. Howe, Edward Felten, and Helen Nissenbaum. Users' Conceptions of Web Security: A Comparative Study. Extended Abstracts of the ACM CHI 2002 Conference on Human Factors in Computing Systems, pp. 746–747. ACM Press, New York, 2002.
Friedman, Batya, Peter H. Kahn, Jr., and Daniel C. Howe. Trust online. Communications of the ACM, 43(12): 34–40, December 2000.
Friedman, Batya and Lynette Millett. Reasoning About Computers as Moral Agents. Human Values and the Design of Computer Technology. B. Friedman, Ed., CSLI Publications, Stanford, CA, 2001.
Garfinkel, Simson. Pretty Good Privacy. O'Reilly, Sebastopol, CA, 1999.
Garfinkel, Simson and Gene Spafford. Practical UNIX and Internet Security, 2nd ed. O'Reilly, Sebastopol, CA, 1996.
Good, Nathaniel and Aaron Krekelberg. Usability and Privacy: A Study of Kazaa P2P File-Sharing. Proceedings of the ACM CHI 2003 Conference on Human Factors in Computing Systems, pp. 137–144. ACM Press, New York, 2003.
Grinter, Rebecca E. and Diane Smetters. Three Challenges for Embedding Security into Applications. CHI 2003 Workshop on HCI and Security. April 2003.
Herkert, Joseph R., Ed. Social, Ethical, and Policy Implications of Engineering: Selected Readings. IEEE Wiley, New York, 1999.
Hochheiser, Harry. Privacy, policy, and pragmatics: An examination of P3P's Role in the Discussion of Privacy Policy. Draft, 2003.
IEEE United States Activities Board. Position Statement on Encryption Policy. In The Electronic Privacy Papers, 543. B. Schneier and D. Banisar, Eds. John Wiley & Sons, New York, 1997.
Jermyn, Ian, Alain Mayer, Fabian Monrose, Michael K. Reiter, and Aviel D. Rubin. The Design and Analysis of Graphical Passwords. Proceedings of the 8th USENIX Security Symposium, August 1999.
Johnson, Deborah. Computer Ethics, 3rd ed. Prentice Hall, Upper Saddle River, NJ, 2001.
Just, Mike. Designing Secure Yet Usable Credential Recovery Systems With Challenge Questions. CHI 2003 Workshop on HCI and Security. April 2003.
Karat, Clare-Marie. Iterative Usability Testing of a Security Application. Proceedings of the Human Factors Society 33rd Annual Meeting, pp. 273–277, 1989.
Klein, Daniel V. Foiling the Cracker — A Survey of, and Improvements to, Password Security. Proceedings of the 2nd USENIX Workshop on Security, pp. 5–14, 1990.
Landes, David S. The Wealth and Poverty of Nations: Why Some Are So Rich and Some So Poor. W. W. Norton & Company, New York, 1999.
Landwehr, Carl E., A. R. Bull, J. P. McDermott, and W. S. Choi. A taxonomy of computer program security flaws, with examples. ACM Computing Surveys, 26(3): 211–254, September 1994.
Lessig, Larry. Tyranny in the Infrastructure. Wired, 5(7), 1997.
Lessig, Larry. Code and Other Laws of Cyberspace. Basic Books, New York, 1999.
Maner, Walter. Unique ethical problems in information technology. Science and Engineering Ethics, 2(2): 137–154, February 1996.
McLuhan, Marshall. The Gutenberg Galaxy: The Making of Typographic Man. University of Toronto Press, Toronto, 1962.
Moor, James H. What is computer ethics? Metaphilosophy, 16(4): 266–275, October 1985.
Mosteller, William and James Ballas. Usability Analysis of Messages from a Security System. Proceedings of the Human Factors Society 33rd Annual Meeting, 1989.
Negroponte, Nicholas. Being Digital — The Road Map for Survival on the Information Superhighway. Alfred A. Knopf, New York, 1995.
Nielsen, Jakob. Usability engineering at a discount. Designing and Using Human-Computer Interfaces and Knowledge Based Systems, G. Salvendy and M. J. Smith, Eds., Elsevier Science, Amsterdam, 1989, pp. 394–401.
Odlyzko, Andrew M. Privacy, Economics, and Price Discrimination on the Internet. Proceedings of ICEC '03. ACM Press, New York, 2003, in press.
Oram, Andy. Open Sources: Voices from the Revolution. O'Reilly, Sebastopol, CA, 1999.
Pierce, John and Michael Noll. Signals: The Science of Telecommunications. Scientific American Press, New York, 1990.
Pool, Ithiel De Sola. Technologies of Freedom. Harvard University Press, Cambridge, MA, 1983.
Postman, Neil. Technopoly: The Surrender of Culture to Technology. Vintage Books, New York, 1996.
Prickler, Nedra. Mechanics Struggle with Diagnostics. AP Wire, 24 June 2002.
Raymond, Eric. The Cathedral and the Bazaar. O'Reilly, Sebastopol, CA, 1999.
Resnick, Paul. A Response to "Is PICS the Devil?" Wired, 5(7), July 1997.
Riegelsberger, Jens, M. Angela Sasse, and John McCarthy. Shiny Happy People Building Trust? Photos on e-Commerce Websites and Consumer Trust. Proceedings of the ACM CHI 2003 Conference on Human Factors in Computing Systems, April 5–10, Ft. Lauderdale, FL, pp. 121–128. ACM Press, New York, 2003.
Saltzer, Jerome H. and Michael D. Schroeder. The protection of information in computer systems. Proceedings of the IEEE, 63(9): 1278–1308, 1975.
Sasse, M. Angela. Computer Security: Anatomy of a Usability Disaster, and a Plan for Recovery. CHI 2003 Workshop on HCI and Security. April 2003.
Sasse, M. Angela, Sacha Brostoff, and Dirk Weirich. Transforming the weakest link — a human/computer interaction approach to usable and effective security. BT Technology Journal, 19(3): 122–131, July 2001.
Shamir, Adi. How to share a secret. Communications of the ACM, 22(11): 612–613, July 1979.
Shneiderman, Ben. Designing trust into online experiences. Communications of the ACM, 43(12): 57–59, December 2000.
Solzhenitsyn, Alexander. The Law Becomes A Man. In Gulag Archipelago. Little, Brown and Company, New York, 1975 (English translation).
Spinello, Richard A., Ed. Case Studies in Information and Computer Ethics. Prentice Hall, Upper Saddle River, NJ, 1996.
Sproull, Lee and Sara Kiesler. Connections. MIT Press, Cambridge, MA, 1992.
Stallman, Richard. The GNU Manifesto. http://www.fsf.org/gnu/manifesto.html, 1984 (last viewed September 2003).
Stallman, Richard. Can You Trust Your Computer? http://www.gnu.org/philosophy/no-word-attachments.html, posted October 2002 (last viewed September 2003).
Syme, Serena and L. Jean Camp. Open land and UCITA land. ACM Computers and Society, 32(3): 86–101.
Trublow, George. Privacy Law and Practice. Times Mirror Books, Los Angeles, CA, 1991.
Tucker, Robert C., Ed. Marx and Engels to Babel, Liebknecht, Branke, and Others. In Marx-Engels Reader. W. W. Norton, New York, 1978, pp. 549–555.
Turner, Carl, Merrill Zavod, and William Yurcik. Factors That Affect The Perception of Security and Privacy of E-Commerce Web Sites. Proceedings of the 4th International Conference on Electronic Commerce Research, pp. 628–636, November 2001, Dallas, TX.
Viega, John, Tadayoshi Kohno, and Bruce Potter. Trust (and mistrust) in secure applications. Communications of the ACM, 44(2): 31–36, February 2001.
Whitten, Alma and J. Douglas Tygar. Why Johnny Can't Encrypt: A Usability Evaluation of PGP 5.0. Proceedings of the 8th USENIX Security Symposium, 1999.
Whitten, Alma and J. Douglas Tygar. Safe Staging for Computer Security. CHI 2003 Workshop on HCI and Security. April 2003.
Winner, Langdon. The Whale and the Reactor: A Search for Limits in an Age of High Technology. Chicago University Press, Chicago, IL, 1986.
Wixon, Dennis, Karen Holtzblatt, and Stephen Knox. Contextual Design: An Emergent View of System Design. Proceedings of the ACM CHI 1990 Conference on Human Factors in Computing Systems, pp. 329–336, 1990.
Woodward, John D., Katherine W. Webb, Elaine M. Newton et al. Appendix A, Biometrics: A Technical Primer. Army Biometric Applications: Identifying and Addressing Sociocultural Concerns, RAND/MR-1237-A, RAND, 2001.
Yee, Ka-Ping. User Interaction Design for Secure Systems. Proceedings of the 4th International Conference on Information and Communications Security, December 2002, Singapore.
Zurko, Mary Ellen and Richard T. Simon. User-Centered Security. Proceedings of the UCLA Conference on New Security Paradigms, Lake Arrowhead, CA, pp. 27–33, September 17–20, 1996.
56
The Geographical Diffusion of the Internet in the United States

Shane Greenstein
Jeff Prince

CONTENTS
Abstract
56.1 Introduction
56.2 Brief History of the Internet
56.3 The Diffusion Process
  56.3.1 Standard Diffusion Analysis
  56.3.2 Demand for Business Purposes
  56.3.3 Supply by Private Firms
  56.3.4 Supply by Regulated Telephone Firms
56.4 Mapping the Internet's Dispersion
  56.4.1 Backbone
  56.4.2 Domain Name Registrations
  56.4.3 Hosts, Internet Service Providers, and Points of Presence
  56.4.4 Content and E-Commerce
56.5 Diffusion of Advanced Internet Access
  56.5.1 Provision and Adoption
  56.5.2 Rural vs. Urban Divides
56.6 Overview
  56.6.1 What Happened during the First Wave of Diffusion?
  56.6.2 Open Questions
References
Abstract

This chapter analyzes the rapid diffusion of the Internet across the U.S. over the past decade for both households and companies. The analysis explains why dialup connection has reached the saturation point while high-speed connection is far from it. Specifically, we see a geographic digital divide for high-speed access. We put the Internet's diffusion into the context of general diffusion theory, where we consider costs and benefits on the demand and supply side. We also discuss several pictures of the Internet's current physical presence using some of the main techniques for Internet measurement to date. Through this analysis we draw general lessons about how other innovative aspects of the Internet diffuse.
1. Northwestern University, Department of Management and Strategy, Kellogg School of Management, and Department of Economics, respectively. We thank the Kellogg School of Management for financial support. All errors contained here are our responsibility.
56.1 Introduction

The National Science Foundation (NSF) began to commercialize the Internet in 1992. Within a few years there was an explosion of commercial investment in Internet infrastructure in the U.S. By September 2001, 53.9 million homes (50.5%) in the U.S. had Internet connections (National Telecommunications and Information Administration [NTIA, 2002]).

The diffusion of the Internet has thus far proceeded in two waves. To be connected to the Internet is to have access at any speed. Yet there is a clear difference between low-speed/dialup connection and high-speed/hardwire connection. In the early 1990s, those with dialup connection were considered on the frontier, but by the turn of the millennium, dialup connection had clearly become a nonfrontier technology, with the new frontier consisting of high-speed connections, mainly through xDSL and cable.

As with any new technology, the diffusion of the Internet and its related technologies follows predictable regularities. It always takes time to move a frontier technology from a small cadre of enthusiastic first users to a larger majority of potential users. This process displays systemic patterns and can be analyzed. Furthermore, the patterns found through analysis of the early diffusion of the Internet are general. These patterns provide insight about the processes shaping the diffusion of other advanced technologies today, such as Wi-Fi, Bluetooth, XML, and supply-chain standards.

Accordingly, this essay has two goals: to tell a specific story and to communicate general lessons. We provide a survey of the literature concerning the diffusion of the Internet in the U.S. This is a specific story told about a specific technology in a particular time period. Throughout the essay we use this story to understand broader questions about the workings of diffusion processes. We also discuss why these broader processes are likely to continue or not in the future. For the specific story we focus on a few key questions:

1. How has the Internet diffused over geographic space? How and why does location matter for adoption and provision of frontier and nonfrontier Internet technology?
2. Do we see differences in Internet access and use between rural and urban areas, as well as within urban areas?
3. How long will it take the commercial Internet to cover most geographic areas, and, as a result, realize the promise of reducing the importance of distance?

The remainder of this chapter will place the diffusion of the Internet in the context of general diffusion theory, analyze the costs and benefits for providers and adopters, and discuss various measurements of Internet presence in order to address these questions.
56.2 Brief History of the Internet

The Internet is a malleable technology whose form is not fixed across time and location. To create value, the Internet must be embedded in investments at firms that employ a suite of communication technologies, Transmission Control Protocol/Internet Protocol (TCP/IP) protocols, and standards for networking between computers. Often, organizational processes also must change to take advantage of the new capabilities.

What became the Internet began in the late 1960s as ARPANET, a research project of the Advanced Research Projects Agency of the U.S. Defense Department. From these origins sprang the building blocks of a new communications network. By the mid-1980s, the entire Internet used TCP/IP packet-switching technology to connect universities and defense contractors. Management for large parts of the Internet was transferred to the National Science Foundation (NSF) in the mid-1980s. Through NSFNET, the NSF was able to provide connection to its supercomputer centers and a high-speed backbone from which to develop the Internet. Since use of NSFNET was limited to only academic and research locations, Alternet, PSInet, and SprintLink developed their own private backbones for corporations looking to connect their systems with TCP/IP (Kahn, 1995).
By the early 1990s the NSF developed a plan to transfer ownership of the Internet out of government hands and into the private sector. When NSFNET was shut down in 1995, only for-profit organizations were left running the commercial backbone. Thus, with the Internet virtually completely privatized, its diffusion path within the U.S. was dependent on economic market forces (Greenstein, 2003).
56.3 The Diffusion Process

General diffusion theory is an effective guide for analyzing the geographical diffusion of the Internet. According to general diffusion theory, new products are adopted over time, not just suddenly upon their entrance into the market. In addition, the rate of adoption of a new technology is jointly determined by consumers' willingness to pay for the new product and suppliers' profitability from entering the new market. We consider each of these factors in turn.
56.3.1 Standard Diffusion Analysis

We begin our analysis with simple definitions. Any entity (household, individual, or firm) is considered connected to the Internet if it has the capability of communicating with other entities (information in and information out) via the physical structure of the Internet. We will defer discussion about connections coming at different speeds (56K dialup vs. broadband) and from different types of suppliers (AOL vs. a telephone company).

With regard to consumers, it is the heterogeneity of adopters that generally explains differences in the timing of adoption (Rogers, 1995). In this case, a good deal of heterogeneity is the direct result of another technology's diffusion — that of personal computers (PCs). The Internet is a "nested innovation" in that heterogeneity among its potential adopters depends heavily on the diffusion process of PCs (Jimeniz and Greenstein, 1998). Then, on top of this nesting, within the class of PC users, there are also differences in their willingness to experiment and the intensity of their use.2

The following five attributes of a new technology are widely considered the most influential for adoption speed across different types of users: relative advantage, compatibility, complexity, trialability, and observability. Any increase in the relative advantage over the previous technology, the compatibility of the new technology with the needs of potential adopters, the ability of adopters to experiment with the new technology, or the ability of users to observe the new technology will speed up the diffusion process. Similarly, any decrease in technological complexity will also speed up the diffusion process (Rogers, 1995). The Internet has relative advantages along many dimensions. It provides written communication faster than postal mail, allows for purchases online without driving to the store, and dramatically increases the speed of information gathering. The Internet is also easy to try (perhaps on a friend's PC or at work), easy to observe, and compatible with many consumer needs (information gathering, fast communication); and its complexity has been decreasing consistently. All of these attributes have contributed toward increasing Americans' propensity to adopt.

The above attributes hold across the U.S., but the degree to which they hold is not geographically uniform. Specifically, we see differences between rural and urban areas. For example, people living in rural areas might find greater relative advantage because their next-best communication is not as effective as that of their urban counterparts. Also, they might find the Internet more difficult to try or observe and possibly more complex if they have less exposure or experience with PCs.

Beyond differences in the levels of the five major attributes, there are differences in types of adopters across regions. Generic diffusion theory points to five categories of adopters: innovators, early adopters, early majority, late majority, and laggards. When a technology first enters on the frontier, the group of innovators adopts first, and over time, the technology moves down the hierarchy. If these groups are not evenly dispersed geographically, there will be an uneven rate of adoption across regions of the country.

2. For more on the diffusion of PCs, see Goolsbee and Klenow (1999), United States Department of Agriculture (2000), or NTIA (1995, 1997, 1998, 2002).
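The adoption-over-time pattern described above is commonly summarized by an S-shaped (logistic) curve, with the adopter categories occupying successive portions of the curve. The following sketch is purely illustrative and is not drawn from the chapter's data; the saturation level, midpoint year, and rate parameter are hypothetical.

```python
import math

def cumulative_adoption(t, saturation=0.505, midpoint=1997.0, rate=0.9):
    """Logistic share of households connected by year t.

    saturation: long-run share that ever adopts (hypothetical)
    midpoint:   year at which adoption reaches half of saturation (hypothetical)
    rate:       steepness of the S-curve (hypothetical)
    """
    return saturation / (1.0 + math.exp(-rate * (t - midpoint)))

for year in range(1992, 2003):
    print(f"{year}: {cumulative_adoption(year):5.1%}")
# The early, flat part of the curve picks up the innovators and early adopters;
# the steep middle corresponds to the early and late majority; growth then
# flattens as only the laggards remain.
```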
56.3.1.1 Cost–Benefit Framework

Within general diffusion theory there can be much dispute as to why the adoption of a new technology is actualized in a specific way. Here, we apply a general-purpose technology (GPT) framework to our study of the diffusion of the Internet. According to the GPT framework, some consumers use a new technology in its generic form, but for the technology to spread it must be customized for different subsets of users. This customization is why there is a delayed pattern of adoption. Bresnahan and Trajtenberg (1995) define a GPT as "a capability whose adaptation to a variety of circumstances raises the marginal returns to inventive activity in each of these circumstances." General-purpose technologies are often associated with high fixed costs and low marginal costs of use. The invention of the Internet follows this pattern, in the sense that the technology was largely invented and refined by the early 1990s (Bresnahan and Greenstein, 2001).

The GPT framework further predicts that additional benefit from the technology comes from "co-invention" of additional applications. Co-invention costs are the costs affiliated with customizing a technology to particular needs in specific locations at a point in time. These costs can be quite low or high, depending on the idiosyncrasy and complexity of the applications, as well as on economies of scale within locations. Provision of the Internet in a region involves high fixed costs of operating switches, lines, and servers. We expect to see firms wishing to minimize fixed costs or exploit economies of scale by serving large markets. Also, the cost of "last mile" connection (e.g., xDSL or cable) in rural areas is far greater due to their longer distance from the backbone.

This basic prediction frames much of the research on the diffusion of the Internet. There will necessarily be a margin between those who adopt and those who do not. What factors are correlated with the observed margin? We can divide these factors into those associated with raising or lowering the costs of supply or the intensity of demand.

56.3.1.2 Demand by Households

Households will pay for Internet connection when the benefits outweigh the costs. The Internet literature points to several household characteristics that strongly correlate with Internet usage, namely income, employment status, education, age, and location. We address each of these characteristics in turn.

According to the NTIA (2002) study, as of 2001, approximately 56.5% of American homes owned a PC, and Internet participation rates stood at 50.5%. By 2001, Internet usage correlated with higher household income, employment status, and educational attainment. With regard to age, the highest participation rates were among teenagers, while Americans in their prime working ages (20–50 years of age) were also well connected (about 70%) (NTIA, 2002). Although there did not appear to be a gender gap in Internet usage, there did appear to be a significant gap in usage between two widely defined racial groups: (1) whites, Asian Americans, and Pacific Islanders (approximately 70%), and (2) Blacks and Hispanics (less than 40%) (NTIA, 2002). Much of this disparity in Internet usage can be attributed to observable differences in education and income. For example, at the highest levels of income and education there are no significant differences in adoption and use across ethnicities.
A great deal of literature points to a digital divide between rural and urban areas, contending that rural residents are less connected to the Internet than urban ones. Some argue that rural citizens are less prone to using computers and digital networks because of compounding factors: lower income and lower levels of education and technological skills (on average) compared to those living in the city. The evidence for this hypothesis is mixed, however, with many rural farms using the Internet at high rates (U.S. Department of Agriculture, 2000). In addition, over the 2-year span from 1998 to 2000, Internet access went up from 27.5 to 42.3% in urban areas, 24.5 to 37.7% in central cities, and 22.2 to 38.9% in rural areas. Thus, there was at least a narrowing of the gap in participation rates between rural and urban areas, and there certainly was no evidence of the gap widening on any front.3
3. For the full historical trend, see also NTIA (1995, 1997, 1998).
Furthermore, when we divide American geography into three sections — rural, inner-city urban, and urban (not inner city) — we see lower participation in the first two categories, with inner-city participation likely low due to a greater percentage of citizens with lower income and education levels. The higher concentration of Blacks and Hispanics in the inner city then produces the correlation between ethnicity and education and income. As we previously stated, ethnicity is not the cause of lower adoption rates; instead, lower education and income levels, which in turn are caused by socioeconomic factors, create lower adoption rates in the inner cities (Strover, 2001).

It has been argued that the benefits of adoption are greater for rural areas because rural residents can use the Internet to compensate for their distance from other activities. Adopting the Internet improves their retail choices, information sources, education options, and job availability more than those of urban residents (Hindman, 2000). However, these benefits may or may not be translated into actual demand.
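The adopt-when-benefits-exceed-costs margin from the cost–benefit framework can be made concrete with a small sketch that compares hypothetical household benefits against location-dependent costs. All numbers, names, and the form of the benefit function below are illustrative assumptions, not estimates from the chapter.

```python
# A household adopts when its perceived benefit exceeds the cost of access.
# Benefits are assumed to rise with income and education; rural users gain a
# "distance bonus" but face higher last-mile costs. Every figure is hypothetical.
def household_benefit(income, years_education, rural):
    base = 0.004 * income + 12.0 * years_education
    distance_bonus = 60.0 if rural else 0.0   # remote users gain more from access
    return base + distance_bonus

def annual_access_cost(rural):
    return 420.0 if rural else 300.0          # assumed yearly price of service

def adopts(income, years_education, rural):
    return household_benefit(income, years_education, rural) > annual_access_cost(rural)

print(adopts(income=60000, years_education=16, rural=False))  # True
print(adopts(income=18000, years_education=10, rural=True))   # False
```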
56.3.2 Demand for Business Purposes

Business adoption of the Internet came in a variety of forms. Implementation for minimal applications, such as email, was rather straightforward by the late 1990s. It involved a PC, a modem, a contract with an Internet Service Provider, and some appropriate software. Investment in the use of the Internet for an application module in a suite of Enterprise Resource Planning software, for example, was anything but routine during the latter half of the 1990s. Such an implementation included technical challenges beyond the Internet's core technologies, such as security, privacy, and dynamic communication between browsers and servers. Organizational procedures usually also changed.

Businesses adopt different aspects of Internet technology when anticipated benefits outweigh the costs. In the standard framework for analyzing diffusion, the decision to adopt or reject the Internet falls under three categories: (1) optional, where the decision is made by the individual, (2) collective, where it is made by consensus among members, or (3) authoritative, where it is made by a few people with authority (Rogers, 1995). For businesses, the decision process falls under one of the latter two categories. Either a consensus of members of the organization or top-level management will assess whether adopting the Internet is expected to improve the overall profitability of the company and then proceed accordingly. As in individual adoption decisions, the five key attributes of the new technology will again be important. So there is every reason to expect basic Internet use in business to be as common as that found in households.

A further motivating factor will shape business adoption: competitive pressure. As Porter (2000) argues, there are two types of competitive motives behind Internet adoption. First, the level of "table stakes" may vary by region or industry. That is, there may be a minimal level of investment necessary just to be in business. Second, there may be investments in the Internet that confer competitive advantage vis-à-vis rivals. Once again, these will vary by location, industry, and even the strategic positioning of a firm (e.g., price leader, high service provider) within those competitive communities. The key insight is that such comparative factors shape competitive pressure.

Several recent studies look empirically at determining factors for Internet adoption by firms and the possible existence of a digital divide among them. Premkumar and Roberts (1999) test the former by measuring the relevance of 10 information technology attributes for the adoption rate of small rural businesses. The 10 attributes are: relative advantage, compatibility, complexity, cost-effectiveness, top-management support, information technology expertise, size, competitive pressure, vertical linkages, and external support. They find that relative advantage, cost-effectiveness, top-management support, competitive pressure, and vertical linkages were significant determinants for Internet adoption decisions.

Forman (2002) examines the early adoption of Internet technologies at 20,000 commercial establishments from a few select industries. He concentrates on a few industries with a history of adoption of frontier Internet technology and studies the microeconomic processes shaping adoption.
He finds that rural establishments were as likely as their urban counterparts to participate in the Internet and to employ advanced Internet technologies in their computing facilities for purposes of enhancement of these
facilities. He attributes this to the higher benefits received by remote establishments, which otherwise had no access to private fixed lines for transferring data. Forman, Goldfarb, and Greenstein (2002, 2003) measure national Internet adoption rates for medium and large establishments from all industries.4 They distinguish between two purposes for adopting, one simple and the other complex. The first purpose, labeled participation, relates to activities such as email and Web browsing. This represents minimal use of the Internet for basic communications. The second purpose, labeled enhancement, relates to investment in frontier Internet technologies linked to computing facilities. These latter applications are often known as e-commerce and involve complementary changes to internal business computing processes. The economic costs and benefits of these activities are also quite distinct; yet, casual analysis in the trade press tends to blur the lines between the two. Forman, Goldfarb, and Greenstein examine business establishments with 100 or more employees in the last quarter of 2000. They show that adoption of the Internet for purposes of participation is near saturation in most industries. With only a few exceptional laggard industries, the Internet is everywhere in medium to large businesses establishments. Their findings for enhancement contrast sharply with their findings for participation. There is a strong urban bias to the adoption of advanced Internet applications. The study concludes, however, that location, per se, does not handicap adoption decisions. Rather, the industries that “lead” in advanced use of the Internet tend to disproportionately locate in urban areas. They conclude that a large determinant of the location of the Internet in e-commerce was the preexisting distribution of industrial establishments across cities and regions. This conclusion highlights that some industries are more information intensive than others, and, accordingly, make more intensive use of new developments in information technologies, such as the Internet, in the production of final goods and services. Heavy Internet technology users have historically been banking and finance, utilities, electronic equipment, insurance, motor vehicles, petroleum refining, petroleum pipeline transport, printing and publishing, pulp and paper, railroads, steel, telephone communications, and tires (Cortada, 1996).
56.3.3 Supply by Private Firms

Since the Internet's privatization in 1995, private incentives have driven the supply side of Internet access. Internet Service Providers (ISPs) are divided into four classes: (1) transit backbone ISPs, (2) downstream ISPs, (3) online service providers (e.g., AOL), and (4) firms that specialize in Website hosting. Provision incentives are profit based, and for a technology with significant economies of scale, profits will likely be higher in markets with high sales quantity. Thus, we see high numbers of ISPs in regions with high population concentrations (Downes and Greenstein, 1998, 2002).

The ISPs also decide on the services they provide (e.g., value-added services) and the price at which they provide them. Greenstein (2000a, 2000b) highlights two types of activities other than basic access in which ISPs partake — high-bandwidth applications and services that are complementary to basic access. He notes that differences in firm choices are due to "different demand conditions, different quality of local infrastructure, different labor markets for talent," or differing qualities of inherited firm assets. Geography plays a role in these differences and can explain much of the variation in quality of access. We expect the quality of local infrastructure, and hence of ISP service, to be higher in urban areas. Additionally, rural ISPs often face little competition (they may be the only provider in their area) and thus have less incentive to enhance their service.
56.3.4 Supply by Regulated Telephone Firms

Every city in the U.S. has at least one incumbent local telephone provider. The deregulation of local telephony has been proceeding in many parts of the U.S. since the AT&T divestiture in the early 1980s.
4. See, also, Atrostic and Nguyen (2002), who look at establishments in manufacturing. To the extent that they examine adoption, their study emphasizes how the size of establishments shapes the motives to adopt networking for productivity purposes.
This movement is an attempt to increase the number of potential providers of local voice services beyond this monopoly incumbent, and in so doing, to increase the competitiveness of markets for a variety of voice and data services. This form of deregulation became linked to the growth of broadband because the rules affecting telephony shaped the price of providing broadband. Deregulation had an impact on the Internet’s deployment because it altered the organization of the supply of local data services, primarily in urban areas.5 Prior to the commercialization of the Internet, decades of debate in telephony had already clarified many regulatory rules for interconnection with the public switch network, thereby eliminating some potential local delays in implementing this technology on a small scale. By treating ISPs as an enhanced service and not as competitive telephone companies, the Federal Communication Commission (FCC) did not pass on access charges to them, which effectively made it cheaper and administratively easier to be an ISP (Oxman, 1999; Cannon 2001). 6 The new competitor for the deregulated network is called a Competitive Local Exchange Company (CLEC). No matter how it is deployed, CLECs have something in common: each offers phone service and related data carrier services that interconnect with the network resources offered by the incumbent provider (e.g., lines, central switches, local switches). In spite of such commonalities, there are many claims in the contemporary press and in CLEC marketing literature that these differences produce value for end users. In particular, CLECs and incumbent phone companies offer competing versions of (sometimes comparable) DSL services and networking services. Something akin to CLECs existed prior to the 1996 Telecommunications Act, the watershed federal bill for furthering deregulation across the country. These firms focused on providing high-bandwidth data services to business. After its passage, however, CLECs grew even more. And they quickly became substantial players in local networks, accounting for over $20 billion a year in revenue in 2000.7 More to the point, CLECs became the center of focus of the deregulatory movement. Many CLECs grew rapidly and often took the lead in providing solutions to issues about providing the last mile of broadband, particularly to businesses and targeted households. In addition, many CLECs already were providing direct line (e.g., T-1) services to businesses (as was the incumbent local phone company). The incumbent delivered services over the switch and so did CLECs. In recognition of the mixed incentives of incumbents, regulators set rules for governing the conduct of the transactions. As directed by the 1996 Telecommunications Act, this included setting the prices for renting elements of the incumbent’s network, such as the loops that carried the DSL line.8 For our purposes here, the key question is: Did the change in regulations shape the geographic diffusion of Internet access across the U.S.? The answer is almost certainly, yes, at least in the short run. The answer, however, is more ambiguous in the long run. By the end of the millennium the largest cities in the U.S. had dozens of potential and actual competitive suppliers of local telephone service that interconnected with the local incumbent. By the end of 2000, over 500 cities in the U.S. 
had experience with at least a few competitive suppliers of local telephony, many of them focused on providing related Internet and networking services to local businesses, in addition to telephone service (New Paradigm Research Group, 2000). This opportunity extended to virtually all cities with a population of more than 250,000, as well as to many cities with a population under 100,000. Very few rural cities, however, had this opportunity except in the few states that encouraged it. So, at the outset, if there were any effects at all, the entry of CLECs only moderately increased broadband supply in just the urban locations.

5. For a comprehensive review of the literature, see Woroch (2001).
6. The FCC's decision was made many years earlier for many reasons and extended to ISPs in the mid-1990s, with little notice at the time, because most insiders did not anticipate the extent of the growth that would arise. As ISPs grew in geographic coverage and revenues threatened to become competitive voice carriers, these interconnection regulations came under more scrutiny (Werbach, 1997; Kende, 2000; Weinberg, 1999).
7. See Crandall and Alleman (2002).
8. For review of the determinants of pricing within states, see Rosston and Wimmer (2001).
Due to the uneven availability of the Internet in some locations, local public government authorities also have intervened to speed deployment. Local governments act as an agent for underserved demanders by motivating broadband deployment in some neighborhoods through select subsidies or granting of right-of-ways (Strover and Berquist, 2001). There also often is help for public libraries, where the presence of a federal subsidy enables even the poorest rural libraries to have Internet access at subsidized rates (Bertot and McClure, 2000).
56.4 Mapping the Internet's Dispersion

A number of alternative methods have been devised for measuring the Internet's presence or its adoption in a location. None is clearly superior, as they are all valid ways of measuring the diffusion of the technology across geographic regions.
56.4.1 Backbone

The commercial Internet comprises hubs, routers, high-speed switches, points of presence (POPs), and high-speed, high-capacity pipes that transmit data. These pipes and supporting equipment are sometimes called backbone for short. Backbone comprises mostly fiberoptic lines of various speeds and capacities. However, no vendor can point to a specific piece of fiber and call it "backbone." This label is a fiction, but a convenient one. Every major vendor has a network with lines that go from one point to another, but it is too much trouble to refer to it as "transmission capacity devoted primarily to carrying traffic from many sources to many sources."

One common theme in almost every article addressing the Internet's backbone is the following: a handful of cities in the U.S. dominate in backbone capacity and, by extension, dominate first use of new Internet technology. Specifically, San Francisco/Silicon Valley, Washington, D.C., Chicago, New York, Dallas, Los Angeles, and Atlanta contain the vast majority of backbone capacity (Moss and Townsend, 2000a, 2000b). As of 1997, these seven cities accounted for 64.6% of total capacity, and the gap between this group and the rest remained even during the intense deployment of new networks and capacity between 1997 and 1999. By 1999, even though network capacity quintupled over the previous 2 years, the top seven still accounted for 58.8% of total capacity.

The distribution of backbone capacity does not perfectly mimic population distribution because metropolitan regions such as Seattle, Austin, and Boston have a disproportionately large number of connections (relative to their populations), whereas larger cities such as Philadelphia and Detroit have disproportionately fewer connections (Townsend, 2001a, 2001b). In addition, the largest metropolitan areas are well served by the backbone, whereas areas such as the rural South have few connections (Warf, 2001).
56.4.2 Domain Name Registrations

Domain names are used to help map intuitive names (such as www.northwestern.edu) to the numeric addresses computers use to find each other on the network (Townsend, 2001). This address system was established in the mid-1990s and diffused rapidly along with the commercial Internet.

The leaders in total domain names are New York and Los Angeles; however, Chicago — normally considered along with New York and Los Angeles as a global city — ranks only a distant fifth, far behind the two leaders. Furthermore, when ranking metropolitan areas according to domain names per 1000 persons, of these three cities, only Los Angeles ranks among the top twenty (17th). The full ranking of domain name density indicates that medium-sized metropolitan areas dominate, whereas global cities remain competitive and small metropolitan areas show very low levels of Internet activity (Townsend, 2001a).

Moss and Townsend (1998) look at the growth rate for domain name registrations between 1994 and 1997. They distinguish between global information centers and global cities and find that global
centers such as Manhattan and San Francisco grew at a pace six times the national average. In contrast, global cities such as New York, Los Angeles, and Chicago grew at only approximately one to two times the national average. Kolko (2000) examines domain names in the context of questioning whether the Internet enhances the economic centrality of major cities in comparison to geographically isolated cities (see also Kolko, 2002). He argues, provocatively, that reducing the “tyranny of distance” between cities does not necessarily lead to proportional economic activity between them. That is, a reduction of communications costs between locations has ambiguous predictions about the location of economic activity in the periphery or the center. Lower costs can reduce the costs of economic activity in isolated locations, but they can also enhance the benefits of locating coordinative activity in the central location. As with other researchers, Kolko presumes that coordinative activity is easier in a central city where face-to-face communications take place. Kolko (2000) documents a heavy concentration of domain name registrations in a few major cities. He also documents extraordinary per capita registrations in isolated medium-sized cities. He argues that the evidence supports the hypothesis that the Internet is a complement to, not a substitute for, face-to-face communications in central cities. He also argues that the evidence supports the hypothesis that lowering communication costs helps business in remote cities of sufficient size (i.e., medium-sized, but not too small).
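As a concrete illustration of the name-to-address mapping described at the start of this subsection, the short sketch below resolves a domain name into the numeric address that computers actually use to reach one another. It is illustrative only; the hostname is simply the example cited in the text, and the result depends on the live DNS at the time the lookup is run.

    # Illustrative only: resolve a human-readable domain name to the numeric (IP)
    # address that networked computers use to find each other.
    import socket

    address = socket.gethostbyname("www.northwestern.edu")
    print(address)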
56.4.3 Hosts, Internet Service Providers, and Points of Presence
Measurements of host sites, ISPs, and POPs also have been used to measure the Internet’s diffusion. Indeed, the growth of the Internet can be followed directly in successive years of Boardwatch Magazine (Boardwatch Magazine, various years). The earliest advertisements for ISPs in Boardwatch Magazine appear in late 1993, growing slowly until mid-1995, at which point the magazine began to organize its presentation of pricing and basic offerings. There was an explosion of entry in 1995, with thousands of ISPs present for the next few years. Growth only diminished after 2001. Internet hosts are defined as computers connected to the Internet on a full-time basis. Host-site counting may be a suspect measurement technique due to its inability to differentiate between various types of equipment and to the common practice by firms of not physically housing Internet-accessible information at their physical location. Nevertheless, we do see results similar to those found with other measurement techniques: as of 1999, five states (California, Texas, Virginia, New York, and Massachusetts) contained half of all Internet hosts in the U.S. (Warf, 2001). For ISPs, Downes and Greenstein (2002) analyze their presence throughout the U.S. Their results show that, while low entry into a county is largely a rural phenomenon, more than 92% of the U.S. population had access by a short local phone call to seven or more ISPs as of 1997. Strover, Oden, and Inagaki (2002) look directly at ISP presence in areas that have traditionally been underserved by communications technologies (e.g., the Appalachian region). They examine areas in the states of Iowa, Texas, Louisiana, and West Virginia. They determine the availability and nature of Internet services from ISPs for each county and find that rural areas suffer significant disadvantages for Internet service (see also Strover, 2001). Measurements of POPs help to identify “urban and economic factors spurring telecommunication infrastructure growth and investment” (Grubesic and O’Kelly, 2002). POPs are locations where Internet service providers maintain telecommunication equipment for network access; specifically, this is often a switch or router that allows Internet traffic to enter or proceed on commercial Internet backbones. Through POP measurement, Grubesic and O’Kelly derive results similar to those concerning the backbone, namely that the top seven cities (Chicago, New York, Washington, D.C., Los Angeles, Dallas, Atlanta, and San Francisco) provide the most POPs. Furthermore, cities such as Boston and Seattle are emerging Internet leaders. Grubesic and O’Kelly (2002) use POPs to measure which metropolitan areas are growing the fastest. Their data indicate that areas such as Milwaukee, Tucson, Nashville, and Portland saw major surges in
POPs at the end of the 1990s. They provide several explanations for these surges: (1) proximity to major telecommunication centers (Tucson and Milwaukee), (2) intermediation between larger cities with high Internet activity (Portland), and (3) centralized location (Nashville).
56.4.4 Content and E-Commerce
Zook (2000, 2001) proposes two additional methods for measuring the presence of the Internet. The first measures the Internet by content production across the U.S. Zook defines the content business as “enterprises involved in the creation, organization, and dissemination of informational products to a global marketplace where a significant portion of the business is conducted via the Internet.” He finds the location of each firm with a dot-com Internet address and plots it: San Francisco, New York, and Los Angeles are the leading centers for Internet content in the U.S. with regard to both absolute size and degree of specialization. (Degree of specialization is measured by relating the number of .com domains in a region relative to the total number of firms in that region to the number of .com domains in the U.S. relative to the total number of firms in the U.S. [Zook, 2000].) The second method looks at the locations of the dominant firms in e-commerce. Again, Zook (2000, 2001) identifies the top Internet companies based on electronically generated sales and other measures, and maps their locations. His analysis shows San Francisco, New York, and Los Angeles as dominant in e-commerce, with Boston and Seattle beating out the remainder of the top seven. When measured on a scale relative to the number of Fortune 1000 companies located in the region, his results indicate greater activity on the coasts (especially the West Coast), with many Midwestern cities such as Detroit, Omaha, Cincinnati, and Pittsburgh lagging.
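Zook’s specialization measure, as defined in the parenthetical note above, is essentially a location quotient. A minimal LaTeX rendering is given below; the symbols d, f, and S are our own shorthand for domain counts, firm counts, and the resulting specialization index, not notation from the source.

    % Location-quotient form of Zook's (2000) specialization measure.
    % d_r, f_r: .com domains and firms in region r; d_US, f_US: the national totals.
    \[
      S_r = \frac{d_r / f_r}{d_{\mathrm{US}} / f_{\mathrm{US}}}
    \]
    % S_r > 1 indicates that region r hosts more .com domains per firm than the
    % U.S. as a whole, i.e., the region is relatively specialized in Internet content.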
56.5 Diffusion of Advanced Internet Access
Internet connection generally comes in two forms: (1) dialup (technology now off the frontier) and (2) broadband (the new frontier technology).
56.5.1 Provision and Adoption
While dialup connection has moved past the frontier stage and is approaching the saturation point in the U.S., broadband access is still on the frontier and far from ubiquitous. (Broadband is defined by the FCC as “the capability of supporting at least 200 kbps in the consumer’s connection to the network,” both upstream and downstream [Grubesic and Murray, 2002].) However, as the volume and complexity of traffic on the Internet increases dramatically each year, the value of universal “always-on” broadband service is constantly increasing. Furthermore, broadband access will enable providers to offer a wider range of bundled communications services (e.g., telephone, email, Internet video, etc.) as well as promote more competition between physical infrastructure providers already in place. The diffusion of Internet access has been very much supply-driven, in the sense that supply-side issues are the main determinants of Internet adoption. Because ISPs face higher fixed costs for broadband due to the lack of preexisting infrastructure, the spread of broadband service has been much slower and less evenly distributed than that of dialup service. In particular, ISPs find that highly populated areas are more profitable due to economies of scale and lower last-mile expenses. As of September 2001, 19.1% of Internet users possessed a high-speed connection; the dominant types of broadband access were cable modems and xDSL. The national broadband penetration of 19.1% can be partitioned into central city, urban, and rural rates of 22, 21.2, and 12.2%, respectively. We note that the rate of 22% for central cities is likely biased upward due to the presence of universities in the centers of many cities. Consistent with the supply-side issues, the FCC estimates that high-speed subscribers were present in 97% of the most densely populated zip codes by the end of 2000, whereas they were present in only 45% of the zip codes with the lowest population density (NTIA, 2002).
Augereau and Greenstein (2001) analyze the evolution of broadband provision and adoption by looking at the determinants of upgrade decisions for ISPs. Although their analysis only looks at upgrades from dialup service to 56K modem or ISDN service occurring by 1997, it addresses issues related to the provision of high-speed service and warrants mention here. In their model, they look for firm-specific factors and location-specific factors that affect firms’ choices to offer more advanced Internet services. Their main finding is that “the ISPs with the highest propensity to upgrade are the firms with more capital equipment and the firms with propitious locations.” The most expansive ISPs locate in urban areas. They further argue that this could lead to inequality in the quality of supply between ISPs in high-density and low-density areas. Grubesic and Murray (2002) look at differences in xDSL access for different regions in Columbus, Ohio. They point out that xDSL access can be inhibited for some consumers due to infrastructure and distance requirements. The maximum coverage radius for xDSL is approximately 18,000 ft from a central switching office (CO), which is a large, expensive building. Furthermore, the radius is closer to 12,000 ft for high-quality, low-interruption service. Therefore, those living outside this radius from all the COs already built before xDSL was available will more likely suffer from a lack of service. As a counterintuitive result, affluent areas such as those in Franklin County, Ohio, might lack high-speed access, which is contrary to the usual notion of a socioeconomic digital divide (Grubesic and Murray, 2002). However, this does give more insight into why many rural residents (those living in places with more dispersed population) might also lack high-speed access. Lehr and Gillett (2000) compile a database of communities in the U.S. where cable modem service is offered and link it to county-level demographic data. They find that broadband access is not universal: only 43% of the population lives in counties with available cable modem service. (They point out that this share is actually closer to 27%, as stated by Kinetic Strategies, but explain that their data are not fine-grained enough to show this.) Broadband access is typically available in counties with large populations, high per capita income, and high population density; and there is a notable difference in strategy among cable operators, with some being more aggressive than others. In a very data-intensive study, Gabel and Kwan (2001) examined deployment of DSL services at central switches throughout the country and provided a thorough census of upgrade activity at switches. (Their data cover wire centers; data on DSL and cable modem service availability were collected via Websites and by calling service providers, supplemented with U.S. Census data.) They examined providers’ choice to deploy advanced technology to make broadband services available to different segments of the population. The crucial factors that affect the decision to offer service are (1) the cost of supplying the service, (2) the potential size of the market, (3) the cost of reaching the Internet backbone, and (4) the regulations imposed on Regional Bell Operating Companies. They found that advanced telecommunications service is not being deployed in low-income and rural areas. In summary, the spread of broadband service has been much slower and much less evenly distributed than that of dialup. This is not a surprise once the basic economics are analyzed. Broadband ISPs find highly dense areas more profitable due to economies of scale in distribution and lower expenses in build-out. Moreover, the build-out and retrofit activities for broadband are much more involved and expensive than what was required for the build-out of the dialup networks.
So, within urban areas, there is uneven availability. Thus, even before considering the impact of geographic dispersion in demand, the cost-of-supply issues guarantee that the diffusion process will take longer than it did for dialup. Until a low-cost wireless solution for providing high-bandwidth applications emerges, these economic factors are unlikely to change.
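To put the xDSL distance constraint discussed earlier in this subsection in perspective, the 18,000 ft nominal radius versus the roughly 12,000 ft high-quality radius implies that only about 44% of a central office’s nominal footprint receives high-quality service. The sketch below is a back-of-the-envelope illustration under our own simplifying assumption of idealized circular coverage around the CO.

    # Back-of-the-envelope sketch (assumes idealized circular coverage around a CO):
    # share of the nominal xDSL footprint (18,000 ft radius) that also lies within
    # the high-quality radius (about 12,000 ft), per the figures cited in the text.
    import math

    NOMINAL_RADIUS_FT = 18_000
    HIGH_QUALITY_RADIUS_FT = 12_000

    nominal_area = math.pi * NOMINAL_RADIUS_FT ** 2
    high_quality_area = math.pi * HIGH_QUALITY_RADIUS_FT ** 2

    print(f"High-quality share of nominal footprint: {high_quality_area / nominal_area:.0%}")
    # Prints roughly 44%, i.e., (12,000 / 18,000) squared.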
56.5.2 Rural vs. Urban Divides
To date, there has been a significant amount of research concerning the digital divide. Many researchers emphasize this divide along socioeconomic lines, such as wealth or race. Many others focus on a
geographical divide, either contrasting rural vs. urban or rural vs. urban (center city) vs. urban (noncentral city). We can make several key observations concerning a geographical divide. First, the divide for basic Internet services is generally nonexistent. Due to the preexisting infrastructure from telephone service, the cost of provision is relatively low; thus, we see over 92% of households just a local call away from Internet connection. Furthermore, as of 2001, 52.9% of rural residents were using the Internet, not far below the national average of 57.4% (NTIA, 2002). Businesses participate at high rates, over 90%. While we do see lower basic participation rates in rural areas, this essentially is due to the type of industries we find there (i.e., industries deriving less relative benefit from Internet connection). Thus, in this particular case, we see that it is not necessarily availability of Internet access but largely the private incentives of the adopters (commercial businesses) that determine the adoption rate. Augereau and Greenstein (2001) warn of the possibility of the divide in availability worsening as large firms in large cities continue to upgrade their services rapidly while smaller firms in smaller cities move forward more slowly. As basic service has almost entirely saturated the country, the real issue of concern is the geographic evolution of quality of service, as well as value per dollar. Several authors warn that we may be headed down a road of bifurcation where large urban areas get better service at a faster pace while smaller cities and rural areas fall behind. Greenstein (2000a) suggests that urban areas get more new services due to two factors: “(1) increased exposure to national ISPs, who expand their services more often; and (2) the local firms in urban areas possess features that lead them to offer services with propensities similar to the national firms.” By a different line of argument, Strover (2001) arrives at a comparatively pessimistic assessment, one shared by many observers (see also Garcia, 1996; Parker, 2000; Hindman, 2000). She points out that the cost structure for ISPs is unfavorable because of their dependence on commercial telecommunications infrastructure providers, which are reluctant to invest in rural areas due to the high costs necessary to reach what often are relatively few customers. Furthermore, a lack of competition in rural areas among telephone service providers serves to exacerbate the low incentives. Even though we have suggested that high levels of participation by firms are largely determined by industry and not geography, the fact that industries heavily reliant on advanced telecommunications are concentrated in major cities provides an incentive for ISPs to place their main focus there. Furthermore, the fact that the economics of small cities are governed more by the private sector than by government initiatives makes small cities less prone to initiating plans to develop telecommunications (Alles et al., 1994). Many studies place a much greater emphasis on other variables along which they find the divide is much more pronounced. Hindman (2000) suggests that there is no strong evidence of a widening gap between urban and rural residents’ use of information technologies, but that predictors such as income, education, and age have become even more powerful in predicting usage over the years (specifically from 1995 to 1998). Forman, Goldfarb, and Greenstein (2002) find that, as of December 2000, 12.6% of establishments engage in some form of Internet enhancement activities.
Furthermore, they find much higher enhancement adoption rates in large cities (consolidated metropolitan statistical areas); the top 10 ranged from Denver at 18.3% to Portland at 15.1%. In addition, the enhancement adoption rate in large urban counties (metropolitan statistical areas) is 14.7%, while that of small counties is only 9.9% on average. However, they also find that the industries of “management of companies and enterprises” (NAICS 55) and “media, telecommunications, and data processing” (NAICS 51) had enhancement adoption rates of 27.9 and 26.8%, respectively — rates far exceeding all other industries. This strongly points to the idea that geographical differences may largely be explained by the preexisting geographical distribution of industries.
56.6 Overview
The geographic diffusion of Internet infrastructure was shaped by the fact that the equipment enabling high-speed Internet access initially appeared difficult to deploy and use. Technically difficult technologies favor urban areas, where there are thicker labor markets for specialized engineering talent. Similarly, close proximity to thick technical labor markets facilitates the development of complementary service markets for maintenance and engineering services. Labor markets for technical talent are relevant to the diffusion of new technologies in the Internet. As with many high-tech services, areas with complementary technical and knowledge resources are favored during the early use of a technology. This process will favor growth in a few locations, such as Silicon Valley, the Boston area, or Manhattan — for a time, at least, particularly when technologies are young. But will it persist? This depends on how fast the technology matures into something standardized that can be operated at low cost in areas with a thin supply of technical talent.
56.6.1 What Happened during the First Wave of Diffusion?
It was unclear at the outset which of several potential maturation processes would occur after commercialization. If advancing Internet infrastructure stayed exotic and difficult to use, then its geographic distribution would depend on the location of the users most willing to pay for infrastructure. If advancing Internet infrastructure became less exotic to a greater number of users and vendors, then commercial maturation would produce geographic dispersion over time, away from the areas of early experimentation. Similarly, as advanced technology becomes more standardized, it is also more easily serviced in outlying areas, again contributing to its geographic dispersion. As it turned out, the first wave of the diffusion of the Internet (from 1995 to 2000) did not follow the most pessimistic predictions. The Internet did not disproportionately diffuse to urban areas with their complementary technical and knowledge resources. The location of experiments was necessarily temporary, an artifact of the lack of maturity of the applications. As this service matured — as it became more reliable and declined in price so that wider distribution became economically feasible — the geographic areas that were early leaders in technology lost their comparative lead or ceased to be leaders. As such, basic ISP technology diffused widely and comparatively rapidly after commercialization. Open questions remain as the next wave proceeds. There is little experience with uncoordinated commercial forces developing a high-speed communication network with end-to-end architecture. This applies to the many facets that make up advanced telecommunications services for packet switching, such as switching using frame relay or Asynchronous Transfer Mode, as well as Synchronous Optical Network equipment or Optical Carrier services of various numerical levels (Noam, 2001). To be sure, the spread of broadband service has been much slower and much less evenly distributed than that of dialup. This is not a surprise once the basic economics are analyzed. The broadband ISPs find highly dense areas more profitable due to economies of scale in distribution and lower expenses in build-out. Moreover, the build-out and retrofit activities for broadband are much more involved and expensive than what was required for the build-out of the dialup networks. So, within urban areas, there is uneven availability. This situation is unlikely to change until a low-cost wireless solution for providing high-bandwidth applications emerges.
56.6.2 Open Questions
Within this survey we have drawn a picture of where the Internet is today and discussed the main forces behind how it got there. While the geographical divide has all but vanished for basic connection, it still exists for advanced connection, and it remains to be seen whether this divide will narrow or widen. Many interesting questions for this field still remain. The bilateral relationship between geography and the Internet still has many properties to be explained. Will Internet connection via satellite emerge as the connection of choice, and if so, how much would this dampen the argument that location matters?
Will another fixed wireless solution emerge for delivery of high-speed data services, and will it exhibit low enough economies of scale to spread to suburban areas? As the majority of American homes become hardwired, how drastic will the effect be on local media, such as local newspapers (see, e.g., Chyi and Sylvie, 2001)? If individuals can access any radio station in the country at any time, can all these stations possibly stay in existence? We can also ask how the spread of the Internet will affect the diffusion of other new products. In other words, are inventions spreading faster now than in the past because everything is connected? For example, use of some peer-to-peer technologies, such as ICQ and Napster, spread very fast worldwide. These were nested within the broader use of the Internet at the time. Was their speed of adoption exceptional, a byproduct of the early state of the commercial Internet, or something we should expect to see frequently? There are related questions about the spread of new technologies supporting improvements in the delivery of Internet services. Will the diffusion of IPv6 occur quickly because its use is nested within the structure of existing facilities? Will various versions of XML spread quickly or slowly due to the interrelatedness of all points on the Internet? What about standards supporting IP telephony? Will 802.11b (aka Wi-Fi) diffuse to multiple locations because it is such a small-scale technology, or will its small scale interfere with a coordinated diffusion? As we speculate about future technologies, two overriding lessons from the past shape our thinking. First, when uncertainty is irreducible, it is better to rely on private incentives to develop mass-market services at a local level. Once the technology was commercialized, private firms tailored it in multiple locations in ways that nobody foresaw. Indeed, the eventual shape, speed, growth, and use of the commercial Internet were not foreseen within government circles (at NSF), despite (comparatively) good intentions and benign motives on the part of government overseers, and despite advice from the best technical experts in the world. If markets result in a desirable outcome in this set of circumstances in the U.S., then markets are likely to result in better outcomes most of the time. Second, Internet infrastructure grew because it is malleable, not because it was technically perfect. It is better thought of as a cheap retrofit on top of the existing communications infrastructure. No single solution was right for every situation, but a TCP/IP solution could be found in most places. The U.S. telephone system provided fertile ground because backbone used existing infrastructure when possible but interconnected with new lines when built. CLECs and commercial ISPs provided Internet access to homes when local telephone companies moved slowly, but incumbent telephone companies and cable companies proved to be agile providers in some situations. Ultimately, however, it is as much fun to speculate as it is to watch the uncertain future unfold. The answers to these questions will become more approachable and important as the Internet continually improves and spreads across the U.S. at an extraordinary pace.
References
Alles, P., A. Esparza, and S. Lucas. 1994. Telecommunications and the Large City–Small City Divide: Evidence from Indiana cities. Professional Geographer, 46: 307–16.
Atrostic, Barbara K. and Sang V. Nguyen. 2002. Computer Networks and U.S. Manufacturing Plant Productivity: New Evidence from the CNUS Data. Working Paper #02–01, Center for Economic Studies, U.S. Census Bureau, Washington, D.C.
Augereau, A. and S. Greenstein. 2001. The need for speed in emerging communications markets: Upgrades to advanced technology at Internet service providers. International Journal of Industrial Organization, 19: 1085–1102.
Bertot, John and Charles McClure. 2000. Public Libraries and the Internet, 2000. Reports prepared for the National Commission on Libraries and Information Science, Washington, D.C. http://www.nclis.gov/statsurv/2000plo.pdf.
Boardwatch Magazine. Various years. Directory of Internet Service Providers. Littleton, CO.
Bresnahan, T. and Shane Greenstein. 2001. The economic contribution of information technology: Towards comparative and user studies. Evolutionary Economics, 11: 95–118.
Bresnahan, T. and Manuel Trajtenberg. 1995. General purpose technologies: Engines of growth? Journal of Econometrics, 65(1): 83–108.
Cannon, Robert. 2001. Where Internet Service Providers and Telephone Companies Compete: A Guide to the Computer Inquiries, Enhanced Service Providers, and Information Service Providers. In Communications Policy in Transition: The Internet and Beyond, Benjamin Compaine and Shane Greenstein, Eds., Cambridge, MA: MIT Press.
Chyi, H. I. and Sylvie, G. 2001. The medium is global, the content is not: The role of geography in online newspaper markets. Journal of Media Economics, 14(4): 231–48.
Cortada, James W. 1996. Information Technology as Business History: Issues in the History and Management of Computers. Westport, CT: Greenwood Press.
Crandall, Robert W. and James H. Alleman. 2002. Broadband: Should We Regulate High-Speed Internet Access? AEI–Brookings Joint Center for Regulatory Studies, Washington, D.C.
Downes, Tom and Shane Greenstein. 1998. Do commercial ISPs provide universal access? In Competition, Regulation and Convergence: Current Trends in Telecommunications Policy Research, Sharon Gillett and Ingo Vogelsang, Eds., pp. 195–212. Hillsdale, NJ: Lawrence Erlbaum Associates.
Downes, Tom and Shane Greenstein. 2002. Universal access and local Internet markets in the U.S. Research Policy, 31: 1035–1052.
Forman, Chris. 2002. The Corporate Digital Divide: Determinants of Internet Adoption. Working Paper, Graduate School of Industrial Administration, Carnegie Mellon University. Available at http://www.andrew.cmu.edu/~cforman/research/corp_divide.pdf.
Forman, Chris, A. Goldfarb, and S. Greenstein. 2003a. Digital Dispersion: An Industrial and Geographic Census of Commercial Internet Use. Working Paper, NBER, Cambridge, MA.
Forman, Chris, A. Goldfarb, and S. Greenstein. 2003b. The geographic dispersion of commercial Internet use. In Rethinking Rights and Regulations: Institutional Responses to New Communication Technologies, Steve Wildman and Lorrie Cranor, Eds., in press. Cambridge, MA: MIT Press.
Gabel, D. and F. Kwan. 2001. Accessibility of Broadband Communication Services by Various Segments of the American Population. In Communications Policy in Transition: The Internet and Beyond, Benjamin Compaine and Shane Greenstein, Eds., Cambridge, MA: MIT Press.
Garcia, D. L. 1996. Who? What? Where? A look at Internet deployment in rural America. Rural Telecommunications, November/December: 25–29.
Goolsbee, Austan and Peter Klenow. 1999. Evidence on Learning and Network Externalities in the Diffusion of Home Computers. Working Paper #7329, NBER, Cambridge, MA.
Gorman, S. P. and E. J. Malecki. 2000. The networks of the Internet: An analysis of provider networks in the USA. Telecommunications Policy, 24(2): 113–34.
Greenstein, Shane. 2000a. Building and delivering the virtual world: Commercializing services for Internet access. The Journal of Industrial Economics, 48(4): 391–411.
Greenstein, Shane. 2000b. Empirical Evidence on Commercial Internet Access Providers’ Propensity to Offer New Services. In The Internet Upheaval: Raising Questions, Seeking Answers in Communications Policy, Benjamin Compaine and Ingo Vogelsang, Eds., Cambridge, MA: MIT Press.
Greenstein, Shane. 2003. The economic geography of Internet infrastructure in the United States. In Handbook of Telecommunications Economics, Vol. 2, Martin Cave, Sumit Majumdar, and Ingo Vogelsang, Eds., Amsterdam: Elsevier.
Grubesic, T. H. and A. T. Murray. 2002. Constructing the divide: Spatial disparities in broadband access. Papers in Regional Science, 81(2): 197–221.
Grubesic, T. H. and M. E. O’Kelly. 2002. Using points of presence to measure accessibility to the commercial Internet. Professional Geographer, 54(2): 259–78.
Hindman, D. B. 2000. The rural-urban digital divide. Journalism and Mass Communication Quarterly, 77(3): 549–60.
Jimeniz, Ed and Shane Greenstein. 1998. The emerging Internet retailing market as a nested diffusion process. International Journal of Innovation Management, 2(3).
Kahn, Robert. 1995. The role of government in the evolution of the Internet. In Revolution in the U.S. Information Infrastructure, National Academy of Engineering, Washington, D.C.: National Academy Press.
Kende, Michael. 2000. The Digital Handshake: Connecting Internet Backbones. Working Paper No. 32, Federal Communications Commission, Office of Planning and Policy, Washington, D.C.
Kolko, Jed. 2000. The Death of Cities? The Death of Distance? Evidence from the Geography of Commercial Internet Usage. In The Internet Upheaval: Raising Questions, Seeking Answers in Communications Policy, Ingo Vogelsang and Benjamin Compaine, Eds., Cambridge, MA: MIT Press.
Kolko, Jed. 2002. Silicon Mountains, Silicon Molehills: Geographic Concentration and Convergence of Internet Industries in the U.S. Economics of Information and Policy.
Lehr, William and Sharon Gillett. 2000. Availability of Broadband Internet Access: Empirical Evidence. Workshop on Advanced Communication Access Technologies, Harvard Information Infrastructure Project, Kennedy School of Government, Harvard University. Available at www.ksg.harvard.edu/iip/access/program.html/.
Moss, M. L. and A. M. Townsend. 1998. The Role of the Real City in Cyberspace: Understanding Regional Variations in Internet Accessibility and Utilization. Paper presented at the Project Varenius Meeting on Measuring and Representing Accessibility in the Information Age, Pacific Grove, CA.
Moss, M. L. and A. M. Townsend. 2000a. The Internet backbone and the American metropolis. Information Society, 16(1): 35–47.
Moss, M. L. and A. M. Townsend. 2000b. The Role of the Real City in Cyberspace: Measuring and Representing Regional Variations in Internet Accessibility. In Information, Place, and Cyberspace, Donald Janelle and David Hodge, Eds., Berlin: Springer-Verlag.
Mowery, D. C. and T. S. Simcoe. 2002. The Origins and Evolution of the Internet. In Technological Innovation and Economic Performance, R. Nelson, B. Steil, and D. Victor, Eds., Princeton, NJ: Princeton University Press.
NTIA, National Telecommunications and Information Administration. 1995. Falling Through the Net: A Survey of the “Have Nots” in Rural and Urban America. http://www.ntia.doc.gov/reports.html.
NTIA, National Telecommunications and Information Administration. 1997. Falling Through the Net: Defining the Digital Divide. http://www.ntia.doc.gov/reports.html.
NTIA, National Telecommunications and Information Administration. 1998. Falling Through the Net II: New Data on the Digital Divide. http://www.ntia.doc.gov/reports.html.
NTIA, National Telecommunications and Information Administration. 2002. A Nation Online: How Americans Are Expanding Their Use of the Internet. http://www.ntia.doc.gov/reports.html.
New Paradigm Resources Group. 2000. CLEC Report. Chicago, IL.
Noam, Eli. 2001. Interconnecting the Network of Networks. Cambridge, MA: MIT Press.
Oxman, Jason. 1999. The FCC and the Unregulation of the Internet. Working Paper 31, Federal Communications Commission, Office of Planning and Policy, Washington, D.C.
Parker, E. B. 2000. Closing the digital divide in rural America. Telecommunications Policy, 24(4): 281–90.
Porter, Michael. 2000. Strategy and the Internet. Harvard Business Review, 79(March): 63–78.
Premkumar, G. and M. Roberts. 1999. Adoption of new information technologies in rural small businesses. Omega–International Journal of Management Science, 27(4): 467–484.
Rogers, Everett M. 1995. Diffusion of Innovations. New York: Free Press.
Rosston, Greg and Brad Wimmer. 2001. “From C to Shining C”: Competition and Cross-Subsidy in Communications. In Communications Policy in Transition: The Internet and Beyond, Benjamin Compaine and Shane Greenstein, Eds., Cambridge, MA: MIT Press.
Strover, Sharon. 2001. Rural Internet connectivity. Telecommunications Policy, 25(5): 331–47.
Strover, Sharon and Lon Berquist. 2001. Ping telecommunications infrastructure: State and Local Policy Collisions. In Communications Policy in Transition: The Internet and Beyond, Benjamin Compaine and Shane Greenstein, Eds., Cambridge, MA: MIT Press.
Strover, Sharon, Michael Oden, and Nobuya Inagaki. 2002. Telecommunications and rural economies: Findings from the Appalachian region. In Communication Policy and Information Technology: Promises, Problems, Prospects, Lorrie Faith Cranor and Shane Greenstein, Eds., Cambridge, MA: MIT Press.
Townsend, A. M. 2001a. Network cities and the global structure of the Internet. American Behavioral Scientist, 44(10): 1697–1716.
Townsend, A. M. 2001b. The Internet and the rise of the new network cities, 1969–1999. Environment and Planning B: Planning and Design, 28(1): 39–58.
U.S. Department of Agriculture. 2000. Advanced Telecommunications in Rural America: The Challenge of Bringing Broadband Communications to All of America. http://www.ntia.doc.gov/reports/ruralbb42600.pdf.
Warf, B. 2001. Segueways into cyberspace: Multiple geographies of the digital divide. Environment and Planning B: Planning and Design, 28(1): 3–19.
Weinberg, Jonathan. 1999. The Internet and telecommunications services: Access charges, universal service mechanisms, and other flotsam of the regulatory system. Yale Journal of Regulation, 16(2).
Werbach, Kevin. 1997. A Digital Tornado: The Internet and Telecommunications Policy. Working Paper 29, Federal Communications Commission, Office of Planning and Policy, Washington, D.C.
Woroch, Glenn. 2001. Local Network Competition. In Handbook of Telecommunications Economics, Martin Cave, Sumit Majumdar, and Ingo Vogelsang, Eds., Amsterdam: Elsevier.
Zook, M. A. 2000. The Web of production: The economic geography of commercial Internet content production in the United States. Environment and Planning A, 32(3): 411–426.
Zook, M. A. 2001. Old hierarchies or new networks of centrality? The global geography of the Internet content market. American Behavioral Scientist, 44(10): 1679–1696.
57 Intellectual Property, Liability, and Contract

Jacqueline Schwerzmann

CONTENTS
Abstract
57.1 Cyberlaw
57.2 Copyright
57.2.1 Basics of Copyright
57.2.2 Registration of Copyright
57.2.3 Copying in a Digital Medium
57.2.4 Websites
57.2.5 Databases
57.2.6 Software
57.2.7 Open Source Software
57.2.8 Digital Rights Management Systems
57.3 Trademarks and Service Marks
57.3.1 Domains
57.3.2 Cybersquatting — Reverse Domain Name Hijacking
57.4 Patents
57.4.1 Software
57.4.2 Business Methods
57.5 Liability
57.5.1 Linking
57.5.2 Providers
57.6 E-Commerce
57.6.1 E-Contract
57.6.2 Taxes
References
Abstract
Most principles of law apply to the digital world just as they do to the physical one. But some new rules have emerged with the rise of digital technology and the Internet: copyright protection of new types of work such as software, databases, and Websites; patenting of business methods; and new forms of trademark infringement such as cybersquatting, as well as liability for linking. New forms of contract, concluded digitally without the physical presence of the parties, are challenging the legal community. This chapter provides a legal overview of these special cyberlaw problems.
57.1 Cyberlaw
In the 1990s, with the increasing awareness of the sociological and economic impact of the Internet and the World Wide Web, legal scholars around the world created the terms cyberlaw and Internet law, implying
the dawn of a new category of law. Now, cyberlaw summarizes a loose collection of special rules regarding digital media, and most principles of law remain the same; a contract is still based on offer and acceptance, whether concluded electronically (e.g., via e-mail) or physically. What has changed the manner of doing business more dramatically is the absence of national borders within the environment of the Internet. Internet-based transactions raise many international legal questions concerning potential jurisdiction conflicts. At the forefront of Internet business law is the use of forum clauses (contract clauses that determine the court in which legal questions have to be presented) and the choice of law (the selection of national legal rules that are applicable). These two clauses are essential parts of every contract in e-commerce. (For short overviews of Internet law, see Spindler and Boerner [2002] and Isenberg [2002].)
57.2 Copyright

57.2.1 Basics of Copyright
Almost every creative work, whether published or unpublished, digital or nondigital, is protected by copyright. These protected works of authorship include literary works; musical works; dramatic works; pantomimes and choreographic works; pictorial, graphic, and sculptural works; motion pictures and other audiovisual works; sound recordings; and architectural works. It is essential to understand that the product does not have to be a piece of art; everyday work is also protected as intellectual property. Copyright protection exists automatically upon creation, without any formalities, and generally lasts until 70 years after the death of the creator in Europe and the U.S. Recently the Sonny Bono Copyright Term Extension Act extended the term of copyright for works created on or after January 1, 1978, from 50 to 70 years after the creator’s death (17 U.S.C. 302). This preferential treatment of creators and producers drew heavy criticism in the scholarly community (see, e.g., http://cyber.law.harvard.edu/openlaw/eldredvashcroft/). But in its decision in Eldred v. Ashcroft, the U.S. Supreme Court ruled that Congress was within its rights when it passed legislation to extend copyright terms, rejecting an argument by independent online booksellers and other groups who believed repeated extensions are unconstitutional. To be covered by copyright, a work must be original and in a concrete “medium of expression” or a tangible form (e.g., a book or a CD; courts have even recognized that a computer program is protected when it exists in the RAM of a computer). A simple procedure or method cannot be protected as a concrete work; however, patent registration might be possible. The copyright owner is usually the creator of a work, or the employer or contracting body in the case of work made for hire by an employee or a contractor. In Europe the creator is always the initial copyright owner but can pass the right to make use of the work to the employer; this is usually part of labor contracts. A creator has several rights over reproduction, distribution, etc., which can be transferred to others. If the rights are transferred exclusively to somebody else, a written and signed agreement is mandatory. Single rights can be licensed separately, and partial transfers to others are possible that last only for a certain period of time, for a specific geographical region, or for a part of the work. Copyrights arise automatically only within the borders of the country in which the creator is native or works or in which the work is first published. There is no “international copyright” that will automatically protect an author’s work throughout the entire world, in contrast to international patents. However, multilateral or bilateral international treaties guarantee a similar level of copyright protection in most of the industrial countries and facilitate registration, up to the point where a country’s specific national laws require special formalities for complete protection.
57.2.2 Registration of Copyright
Copyrights can be registered in the U.S. upon creation on a voluntary basis. Later, registration is necessary for works of U.S. origin before an infringement suit may be filed in court; and if registration is made within 3 months after publication of the work or prior to an infringement of the work, pursuit of statutory
damages and attorney’s fees in court is an option for the copyright owner. Otherwise, only the award of actual damages and profits is available to the copyright owner. For documentation purposes, the U.S. Copyright Act establishes a mandatory deposit requirement for works published in the U.S. within 3 months: two copies or phonorecords for the Library of Congress. In contrast, copyright registration is not available in European countries. It is generally wise to publish a work with a copyright notice, even if it is not required by law. The public then knows to whom the rights belong and how long they will be valid. In case of infringement, an infringer cannot argue that he did not realize that the work was protected. A copyright notice has to contain the letter C in a circle (or, for phonorecords, the letter P in a circle), the word “Copyright,” or the abbreviation “Copr.”; the year of first publication; and the name of the copyright owner. A copyright notice is also not mandatory in European countries, but can be used to prove the bad faith of an infringer.
57.2.3 Copying in a Digital Medium
What is copying in a digital medium? Printing or storing of a digital document on a hard drive is a form of copying and not allowed without permission. Caching is another specific form of digital copying. The Digital Millennium Copyright Act (DMCA), which is an amendment to the U.S. Copyright Act as a response to new legal issues regarding digital media, allows proxy caching by creating an exception to specifically exempt this technically necessary process. However, there are elaborate provisions in the DMCA requiring that the system providers comply with “rules concerning the refreshing, reloading, or other updating of the material when specified by the person making the material available online in accordance with a generally accepted industry standard data communications protocol for the system or network.” Whether client caching is legal copying as well is not clearly addressed in the DMCA, although most legal scholars agree that client caching is allowed. Several court decisions support this interpretation and suggest that client caching, at least, is fair use. The “fair use” doctrine allows certain limited exemptions to the strict copyright interdiction for personal, noncommercial use. In the Betamax case (Sony Corp. of America v. Universal City Studios, 464 U.S. 417, 455 [1984]), the Supreme Court held that “time-shifting” of copyrighted television programs with VCRs constitutes fair use under the Copyright Act and therefore is not an infringement of copyright. In Recording Industry Association of America v. Diamond Multimedia Systems (Court of Appeals (9th Cir.), Case No. 98-56727, June 1999), involving the use of portable digital music recorders for downloading (possibly illegal) MP3 files from the Internet, the U.S. Court of Appeals for the Ninth Circuit observed: “The Rio [the name of the player was Rio] merely makes copies in order to render portable, or ‘spaceshift,’ those files that already reside on a user’s hard drive ... Such copying is paradigmatic noncommercial personal use entirely consistent with the purposes of the Act.”
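The “generally accepted industry standard data communications protocol” that the DMCA’s proxy-caching exemption refers to corresponds, in practice, to something like the caching rules built into HTTP. The sketch below is illustrative only (the URL is a placeholder, not from the source); it simply prints the freshness headers a publisher sets, which tell proxies and clients how long a cached copy may be kept and when it must be refreshed.

    # Illustrative sketch: inspect the HTTP response headers that govern how long
    # proxies and clients may keep a cached copy and when they must refresh it.
    # The URL below is a placeholder.
    import urllib.request

    with urllib.request.urlopen("https://example.com/") as response:
        for header in ("Cache-Control", "Expires", "Last-Modified", "ETag"):
            print(header, "->", response.headers.get(header))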
57.2.4 Websites
Websites are normally multimedia works, which are copyright protected as a whole or in their different parts separately (texts, pictures, tables, and graphics can be trademarks as well). The use of copyrighted material belonging to a third party for creating a Webpage demands the permission of the copyright owner. It can be difficult to find the address or even the name of a copyright owner in order to obtain permission (a license); however, if the work is registered, the Copyright Office may have this information. A Website created partly with licensed material still owes its origin as a whole to the new developer, though not the licensed material itself. Alteration and reuse of third-party material without permission is copyright infringement, although “fair use” exemptions may allow use for purposes such as criticism, comment, news reporting, teaching, scholarship, or research. Fair use is a complex legal issue with no adequate general definition. The courts usually seek to verify the noncommercial purpose and character of the use, the nature of the copyrighted work, the amount of the portion used in relation to the copyrighted work as a whole so that no more than was necessary
was utilized, and the effect of the use on the potential market for the copyrighted work. The extent of the U.S. fair use exemption is determined by the economic effects and possible damages incurred by unauthorized use. (For further information about fair use, see Patry [1996]).
57.2.5 Databases
Databases are protected by copyright if the compilation has the quality of an original work of authorship. A collection of hyperlinks, such as the results of a search engine, could be protected if it is original enough, with a minimum of creativity on the part of the collector as a necessary precondition. A database must therefore be original in its selection, coordination, and arrangement of data to have copyright protection, as the mere collection of the data (“sweat of the brow”) is not sufficiently original for protection. The same principles apply in Europe. This is why commercial database owners often protect their work through additional contracts to ensure financial compensation if the data are used against their will. Even if the database is not protected by copyright law, users can be liable for breaching the contract, though not for copyright infringement. An example is ProCD, Inc. v. Zeidenberg (86 F.3d 1447, 7th Cir. Wis. 1996). ProCD sold a CD-ROM telephone database that was not original enough to be copyright protected, and defendant–user Zeidenberg was held liable for making portions of the CD’s data available over the Internet in breach of a shrinkwrap license, not for infringing the copyright (see Section 57.6.1).
57.2.6 Software
Software is protected by copyright law in all its formats: source code, object code, or microcode. If software is simply copied without authorization, copyright infringement is obvious. More difficult is determining whether the development of new software may present copyright infringement when it is, in some way, modeled after existing software to which the developer has been exposed. In Computer Associates International v. Altai, Inc. (982 F.2d 693, 1992) the U.S. Court of Appeals for the Second Circuit developed a three-part test for determining whether new software infringes the copyright of an existing computer program. The test is called an abstraction/filtration/comparison test and is widely but not uniformly accepted by the courts and not yet confirmed by a Supreme Court decision.
1. Abstraction: The computer program is broken down into its structural parts. The idea is to retrace and map each of the designer’s steps — in the opposite order from that in which they were taken during the program’s creation.
2. Filtration: This level separates protected expression from nonprotected material. This process entails examining the structural components of a given program at each level of abstraction to determine whether their particular inclusion at that level was “idea” or was dictated by considerations of efficiency, so as to be necessarily incidental to that idea; required by factors external to the program itself; or taken from the public domain and hence is nonprotectable expression:
   • Efficiency means that when there is essentially only one way to express an idea, because no other way is as efficient, the idea and its expression are inseparable and copyright is no bar to copying that expression. The more efficient a set of modules is, the more closely it approximates the idea or process embodied in that particular aspect of the program’s structure and should not be protected by copyright.
   • External factors can be mechanical specifications of the computer on which a particular program is intended to run; compatibility requirements of other programs with which a program is designed to operate in conjunction; computer manufacturers’ design standards; demands of the industry being serviced; and widely accepted programming practices within the computer industry.
   • Public domain material can be an expression that is, if not standard, then commonplace in the computer software industry.
3. Comparison: The remaining elements are compared with those of the allegedly infringing software; there must be substantial similarities. The inquiry focuses on whether the defendant copied any aspect of the protected expression, as well as an assessment of the copied portion’s relative importance with respect to the plaintiff’s overall program.
57.2.7 Open Source Software
Open Source Software is software that carries its source code with it or requires that this source code be kept open for others; it comes with a special open license. The Open Source Definition (OSD) specifies the license rules under which Open Source Software shall be distributed. The license (often the GNU General Public License, but there are several others that have to be approved by the Open Source Initiative) has to guarantee free (but not necessarily cost-free) redistribution and openness of the source code, and further has to allow modifications and derived works. The license must not discriminate against any person or group of persons and must not restrict anyone from making use of the program in a specific field of endeavor. It also must be technology-neutral. The rights attached to the program must apply to all to whom the program is redistributed. Open Source Software is copyright protected like any proprietary software and is not in the public domain, where no copyright protection applies. But the license agreements are more open than those normally used and allow wider reuse rights. A developer still enjoys copyright protection and can earn money with the development, but copying, changing, and redistributing the work has to remain allowed for third parties. Linux is the most successful example of an open-source platform. In March 2003, the SCO Group, which succeeded Novell Networks as inheritor of the intellectual property for the Unix operating system, sued IBM, one of the most popular Linux distributors, for $3 billion, alleging that IBM had used parts of Unix or AIX (IBM’s derivative of Unix) in its contribution to the development of Linux (see CNET News.com [2003a]). IBM licensed Unix from initial Unix owner AT&T in the 1980s and was permitted to build on that Unix technology, but SCO argues that IBM violated its contract by transferring some of those modifications to Linux. The SCO Group claims that its Unix System V code is showing up directly, line by line, inside Linux (see CNET News.com [2003b]). SCO accuses IBM of misappropriation of trade secrets, unfair competition, breach of contract, and tortious interference with SCO’s business. Further, SCO sent out letters to 1,500 of the largest companies around the world to let them know that by using Linux they violate intellectual property rights of SCO. A court decision has not yet been made.
57.2.8 Digital Rights Management Systems
As a result of the ease with which digital material can be duplicated, copyright owners try to protect their content from being copied and distributed, using both legal and technical means. Technical solutions are often called digital rights management systems. In a legal context, the term covers a lot of different technical measures, such as watermarking, copy protection (hardware- and software-based), and more complex e-commerce systems which implement elements of duplication protection, use control, or online payment to allow the safe distribution of digital media like e-books and e-newspapers. International treaties (the World Intellectual Property Organization’s [WIPO] Copyright and Phonograms Treaties) and the Digital Millennium Copyright Act (DMCA) in the U.S. protect copyright protection systems and the integrity of copyright management information against circumvention and alteration. An act similar to the DMCA exists in the European Union (Directive 2001/29/EC on the harmonization of certain aspects of copyright and related rights in the information society), and several countries have adopted these anticircumvention rules. It is forbidden to circumvent access- or copyright-protection technology or to distribute tools and technologies used for circumvention. Copyright management information (identification of the work; name of the author, copyright owner, or performer of the work; terms and conditions for the use of the work) is protected by law against removal or alteration. Civil remedies and criminal charges
carrying up to $1 million in fines or imprisonment of up to 10 years apply in case of an infringement. There are also several exceptions, which permit reverse engineering to achieve interoperability of computer programs, as well as encryption research and security testing. The rules of the Digital Millennium Copyright Act concerning the protection of copyright mechanisms are controversial. Critics argue that they shift the copyright balance towards the copyright owner and are against consumer interests, because fair use exemptions can be eliminated, security or encryption research will be made difficult, and a cartelization of media publishers could develop. Several cases in which the DMCA has been applied are already public (for further information, see http://www.eff.org/IP/DMCA/). Following are two: In September 2000, the Secure Digital Music Initiative (SDMI), a multi-industry conglomerate, encouraged encryption specialists to crack certain watermarking technologies intended to protect digital music. Princeton Professor Edward Felten and a team of researchers at Princeton, Rice, and Xerox successfully removed the watermarks, and they were kept from publishing their results by SDMI representatives, as allowed by the DMCA. In December 2002 a jury in San Jose acquitted the Russian software company ElcomSoft of all charges under the DMCA. ElcomSoft had distributed over the Internet a software program called Advanced e-Book Processor, which allowed converting Adobe’s protected e-book format into ordinary PDF files. The company faced charges under the DMCA for designing and marketing software that could be used to crack e-book copyright protections. ElcomSoft was acquitted only because the jury believed the company did not violate the DMCA’s circumvention rules intentionally. One year prior to this litigation, ElcomSoft’s Russian programmer Dmitry Sklyarov was jailed for several weeks while attending a conference in the U.S., also under reference to the DMCA. In Lexmark International, Inc. v. Static Control Components, Inc. (SCC), the printer company sued SCC for violation of the DMCA because SCC produced Smartek microchips that allowed third-party toner cartridges to work in several Lexmark printers, circumventing Lexmark’s software programs. Lexmark’s complaint alleged that the Smartek microchips were sold by Static Control to defeat Lexmark’s technological controls, which guarantee that only Lexmark’s printer cartridges can be used. The U.S. District Court for the Eastern District of Kentucky ordered SCC to cease production and distribution of the Smartek microchips.
57.3 Trademarks and Service Marks

Trademarks and service marks are used to identify goods or services and to distinguish them from those made or sold by others. Trademarks can be words, phrases, sounds, symbols, or designs, or a combination of these, as well as the color of an item or the shape of a product or container (for example, the shape of a Coca-Cola bottle). Trademark rights in the U.S. arise from legitimate use of the mark, and registration is not required. This is different from most European countries, where registration is mandatory. However, registration is common and has several advantages, such as a legal presumption of the registrant's ownership of the mark, the ability to litigate the use of a mark in federal court, and the use of the U.S. registration as a basis to obtain registration in foreign countries or an international registration. A mark registered with the federal government should be marked with the symbol ® (an R in a circle). Unregistered trademarks should be marked with a "tm," while unregistered service marks should be marked with an "sm." Trademarks are protected against infringement and dilution. Only famous marks are protected against dilution, for example, when the mark is used in connection with inferior products in a way that weakens its distinctive quality.
57.3.1 Domains

A domain name can be registered in the U.S. as a trademark, but only if it functions as a source identifier. The domain must not serve merely as an address used to access a Website. Printed on an article for sale,
the domain should indicate to the purchaser the name of the product or company and not the address where information is found.
57.3.2 Cybersquatting and Reverse Domain Name Hijacking

Cybersquatting is the practice of registering and using a well-known name or trademark as a domain in order to keep it away from its owner or to extort a substantial profit from the owner in exchange for the return of the name. This sort of "business" prospered in the initial phase of e-commerce and has now become difficult because of the following legal actions available to an individual or a trademark owner:
• The Anticybersquatting Consumer Protection Act (ACPA) of 1999, an amendment to the U.S. trademark laws, protects trademark owners and living individuals against unauthorized domain name use by allowing them to take over, or to enforce the cancellation of, domain names that are confusingly similar or identical to their names or valid trademarks, and by making the cybersquatter liable for damages. To succeed, however, it must be proven that the cybersquatter acted in bad faith.
• As an alternative to pursuing a domain name dispute through the courts, it is possible to use the administrative domain name dispute policies developed by the organizations that assign domain names. This administrative procedure is often faster and cheaper. The Internet Corporation for Assigned Names and Numbers (ICANN) created a Uniform Domain Name Dispute Resolution Policy (UDRP). The World Intellectual Property Organization (WIPO) in Geneva is the leading ICANN-accredited domain name dispute resolution service provider. As of the end of 2001, some 60% of all the cases filed under the UDRP were filed with the WIPO. Generally, the WIPO is known to favor the complainant in its decisions.

Reverse domain name hijacking, in contrast, refers to the bad faith attempt of a trademark owner to use the dispute resolution process to deprive a registered domain name holder of the domain. In such cases a complaint in a dispute will not be successful. Trademark owners therefore have to be careful not to be accused, in reverse, of domain name hijacking. This can happen if confusion between a registered domain and the trademark is very unlikely, or if a complaint was brought despite knowledge that the domain name holder has a right or legitimate interest in the domain name or that the name was registered in good faith.
57.4 Patents

A patent is an exclusive right given by law to enable inventors to control the use of their inventions for a limited period of time. Under U.S. law, the inventor must submit a patent application to the U.S. Patent and Trademark Office (USPTO), where it is reviewed by an examiner to determine whether the invention is patentable. U.S. patents are valid only within the territory of the U.S. The Patent Cooperation Treaty (PCT) allows a single international application designating selected member countries or all of them, and Europe has been working toward a single Community Patent for the E.U. A patent is a form of monopoly right for the inventor and excludes others from making, using, and selling the protected invention. There are three kinds of patents protecting different kinds of innovations, each with different requirements:
• The most common type of patent is the utility patent. It covers innovations that have to be useful, new, and nonobvious. The innovation itself has to be a process, a machine, an article of manufacture, or a composition of matter, or a combination of these. Patenting of abstract ideas, laws of nature, or natural phenomena is prohibited, while applications of such discoveries are patentable. The innovation has to have a practical application, which limits patent protection to inventions that possess a certain level of real-world value. Furthermore, the innovation must be novel and different from existing patents or standard technology, with no public disclosures of the invention having been made before. If two persons work independently and come to a
result at roughly the same time, the first inventor gets patent protection, not the person who first filed an application. That is in contrast to European countries. "Nonobvious" means that the innovation would not have been obvious to a person skilled in that particular field. U.S. patents generally expire 20 years after the filing date of the application. Many Internet-related inventions are protected by utility patents, such as data compression and encryption techniques, communications protocols, etc. IBM is the most prolific patent generator in information technology, topping the list of corporate patent awards for the last 10 years. The company was granted 3,288 patents in 2002, bringing its total over the past 10 years to more than 22,000.
• Design patents protect ornamental designs. A design patent protects only the appearance of an article. If a design is utilitarian in nature as well as ornamental, a design patent will not protect it; such combination inventions (both ornamental and utilitarian) can only be protected by a utility patent. A design patent has a term of 14 years from the date of issuance. For design patents, the law requires that the design be novel, nonobvious, and nonfunctional. Design patents are used in the Internet industry to protect the look of hardware (monitor, mouse, modem, etc.), while icons, such as the Apple or Windows logos, can be protected either by design patents or as trademarks.
• Plant patents protect new varieties of asexually reproducing plants.
57.4.1 Software

In contrast to Europe, where the E.U. is currently discussing a revision of its practice, software is generally patentable in the U.S., where the Supreme Court's 1981 opinion in Diamond v. Diehr (450 U.S. 175 [1981]) opened the way for patent protection for computer software, in addition to the copyright protection discussed above. Computer programs can be claimed as part of a manufacture or machine and are then legally part of that invention. A computer program can also be patentable by itself, if it is used in a computerized process where the computer executes the instructions set forth in the program. In that case, the software has to reside on a computer-readable medium needed to realize the program's functionality; the interaction between the medium and the machine makes it a process and the software patentable. The innovation still has to have a practical application, and it cannot be a simple mathematical operation. For example, a computer process that simply calculates a mathematical algorithm that models noise is not patentable, but a process for digitally filtering noise that employs the mathematical algorithm can be patented.

The number of software patents issued in recent years is immense. Examples are patents on graphical user interface software (e.g., Apple's Multiple Theme Engine patent), audio software and file formats (e.g., the Fraunhofer MP3 patents), Internet search engines (e.g., Google's patent for a method of determining the relevance of Web pages in relation to search queries), and Web standards (e.g., Microsoft's Style Sheet patent). One of the reasons for extensive software patenting is the common practice of cross-licensing. Among companies with patent portfolios, it is a common defense strategy to cross-license one or more of their own patents when accused of infringing a patent belonging to another company. Instead of paying damages, suits are settled by granting mutual licenses that offer advantages to both sides. Cross-licenses between two parties give each company the right to use the patents of its adversary, with or without additional compensation.

Software is protected by copyright or patent law, and choosing which course to pursue is sometimes difficult. Proving that third-party software infringes copyright can be hard. On the other hand, protecting software through patents is expensive and time-consuming. In general, patent protection is broader than copyright protection: if someone independently develops a computer program similar to yours, copyright law does not forbid this, but patent rules do. Consequently, patent protection is often the first choice, especially for financially strong companies.
57.4.2 Business Methods

Processes involving business methods are patentable in the U.S. but not in Europe. This U.S. practice has been in place since July 1998, when a federal court upheld a patent for a method of calculating the net asset value of mutual funds (State Street Bank & Trust Co. v. Signature Financial Group, Inc., 149 F.3d 1368 [Fed. Cir. 1998], cert. denied 119 S. Ct. 851, 1999). One of the most famous patents in this field is Amazon's 1-Click technology [Hartman et al., 1999], which allows customers to skip several steps in the checkout process by presetting credit card and shipping information. A long-running patent infringement suit between Barnes & Noble and Amazon ended with a settlement, the details of which were not disclosed. Examples of patented business methods include Priceline.com's patent for reverse auctions [Walker et al., 1998]; Open Market's patents related to secure online credit-card payments and online shopping carts [Payne et al., 1998]; NetZero's patent covering the use of pop-up advertising windows [Itakura et al., 2000]; and a business model for recovery of missing goods, persons, or fugitive or disbursements of unclaimed goods using the Internet [Frankel et al., 2002].

In the initial phase of electronic commerce development, the U.S. Patent and Trademark Office (USPTO) granted a wide range of patents, which often affected the fundamental technologies behind e-commerce. In the following years, lawsuits between patent owners and e-commerce companies that allegedly infringed those patents became numerous. The USPTO is now more cautious in granting business method patents, but this form of patent still has critics.
57.5 Liability

57.5.1 Linking

Links are functional core elements of the World Wide Web and in most cases connect one document to another. They consist of an HTML element and the address of the linked document (the markup sketch at the end of Section 57.5.1.1 illustrates the constructs discussed here). URIs and links are considered facts and are not copyrightable. Linking is normally legal. Judge Harry L. Hupp of the Central District of California decided in Ticketmaster Corp. v. Tickets.com (99-7654) that "Hypertext linking does not itself involve a violation of the Copyright Act ... since no copying is involved." But the linking of documents and the way links are used can nevertheless be legally problematic, despite the technical ethos of "free linking." In certain situations linking can violate rules of unfair competition, intellectual property, or trademark law. Since 1996, when the first linking cases were filed, a number of court decisions have been handed down, but clear legal guidelines are still not available (for an overview of court decisions, see http://www.linksandlaw.com/linkingcases.htm and Sableman [2001] or Sableman [2002]).

57.5.1.1 Framing and Inlining

Framing and inlining are likely to be copyright infringement if the source is integrated into the new Website without permission of the copyright owner and no fair use excuse applies.
• In the Dilbert case, programmer Dan Wallach used inline links to display United Media's Dilbert cartoon on his Website, where he created more and differently arranged comics. He received a cease-and-desist letter from the company and removed the links, as he had infringed the rights of the original author by making derivative works.
• Several major news organizations, including CNN, the Washington Post, and the Wall Street Journal, sued TotalNews for copyright and trademark infringement for its practice of framing their content within its own site. TotalNews retained its own frames, complete with advertisements, and put the other site or story within its main frame. The case was settled: TotalNews ceased to frame the external content, and a linking agreement guaranteed that TotalNews would link to Websites only via hyperlinks consisting of the names of the linked sites in plain text, in a way that would not lead to consumer confusion.
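The following minimal HTML sketch illustrates the constructs at issue in these linking, inlining, and framing disputes. All URLs and file names are hypothetical and are shown only to make the mechanisms concrete.

    <!-- Ordinary hyperlink: an anchor element plus the address of the linked document -->
    <a href="http://www.example-news.com/">Example News</a>

    <!-- Deep link: points past the homepage to an interior page of the target site -->
    <a href="http://www.example-news.com/stories/story123.html">Read the full story</a>

    <!-- Inlining: the browser fetches the image from the third-party server and displays it in place -->
    <img src="http://www.example-comics.com/strips/today.gif" alt="Today's comic strip">

    <!-- Framing: the external page is rendered inside the framing site's own layout -->
    <frameset cols="20%,80%">
      <frame src="navigation.html">
      <frame src="http://www.example-news.com/stories/story123.html">
    </frameset>

In each case the user's browser retrieves the linked or embedded material directly from the original server rather than from a copy stored by the linking site, which is why the disputes above turn on how the material is presented rather than on literal copying by the linker.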
57.5.1.2 Deep Linking

Deep linking is still a controversial subject. Many commercial sites do not want to be deep linked because they fear financial losses if pages with advertisements are bypassed or fewer hits on their Web pages are recorded. Some claim trademark infringement if the link description uses their trademarked logos or protected words. Other claims are unfair competition (if consumers are confused or trade practices are unfair) or a violation of trespass rules if spiders crawl servers excessively for linking information. Contract breaches are also claimed if the terms of service do not allow deep links. In Europe the protection of databases has to be taken into consideration, as there are court decisions denying the right to deep link as an infringement of the database copyright. Whether deep links are legal therefore depends on the concrete circumstances. Noncommercial sites are probably more often allowed to deep link to other sites than commercial sites are. Commercial sites have to make it obvious that the user will enter a new Website, and third-party trademark signs cannot be used. Deep linking to competitors is an especially delicate issue. In controversial linking situations it is wise to reach a linking agreement, although the practice of commercial sites of including a linking license in their terms-of-use policy generally does not bind a user. Many legal conflicts about deep linking in the U.S. have been settled out of court, which is why strict rules about legal or illegal deep links are still missing.
• Shetland Times Ltd. v. Shetland News: The Shetland Times newspaper and its online site filed a copyright lawsuit against the digitally produced Shetland News for linking to the Times' online headlines. The court decided that headlines are copyright protected and that using the headlines of the Shetland Times as descriptions for the links was copyright infringement on the part of the Shetland News. The case was settled before a final decision; the Shetland News was granted permission to link to the Times' headlines but must label individual articles as "A Shetland Times Story." Near such stories, the Shetland News also promised to feature a button with the Times' masthead logo that links to the newspaper's home page.
• The results of a search engine query consist of links. Until now, no major search engine has been sued in connection with deep linking. In Kelly v. Arriba Soft Corp., the U.S. Court of Appeals for the 9th Circuit decided that the display of copyrighted thumbnail pictures as the search results of an image search engine is not a copyright infringement, as long as the original image URL is linked from the thumbnails. The display of the thumbnails qualified as fair use.
• Tickets.com was an online provider of entertainment, sports, and travel tickets and made available to its customers information about events as well as hypertext links to ticket sellers for tickets not available at Tickets.com. Tickets.com prefaced each such link with the statement "These tickets are sold by another ticketing company." Ticketmaster sued Tickets.com claiming, among other issues, copyright infringement (because the information about the events was extracted from Ticketmaster's Website) and unfair competition. The court decided in favor of Tickets.com, arguing that the information Tickets.com copied was only factual (date, location, etc.) and not copyrightable.
Furthermore, the deep linking to Ticketmaster's Website, where customers could order tickets, was not unfair competition because the whole situation was made transparent to customers, and Ticketmaster's advertisement banners were displayed on the event page.
• In StepStone v. OFIR (two online recruitment companies), OFIR listed both its "own" vacancies and those gleaned from other online recruitment agencies on its Website in such a way that this was not obvious to visitors at first glance. The Cologne County Court in Germany dismissed OFIR's contention and upheld StepStone's arguments, agreeing that the list of job vacancies did constitute a database that warranted protection and that the bypassing of StepStone's welcome page did undermine the company's advertising revenue.
• In Danish Newspaper Publishers Association (DDF) v. Newsbooster, a news search service providing users with relevant headlines and deep links to articles was found by a Danish court to infringe copyright. The content of the electronic newspapers was protected as a database. Newsbooster had its own commercial interest in the news search service and charged an annual subscription
fee from its subscribers. The court also decided that Newsbooster used unfair marketing practices. Meanwhile, the Danish search company has begun to offer its Danish clients a version of its Newsbooster service that operates in a fashion similar to decentralized file-sharing networks like Kazaa and Gnutella.
• Dallas Morning News: Belo, the parent corporation of the Dallas Morning News, sent a letter to the Website BarkingDogs.org demanding that it stop deep linking to specific news articles within the paper's site rather than linking to the homepage. Belo's lawyers justified this demand by citing a loss of advertising revenue, consumer confusion, and the paper's terms of service, which allowed linking only to the homepage. In the end, Belo did not pursue its claims.
• Intensively crawling Websites to obtain information and collect descriptions can be illegal. In eBay v. Bidder's Edge, eBay sued Bidder's Edge for crawling the eBay Website for auction listing information. The court concluded that by continuing to crawl eBay's site after being warned to stop, Bidder's Edge was trespassing on eBay's Web servers. The "trespass theory," which treats this kind of interference with property as similar to physical trespassing on land, is also used in connection with spamming and unsolicited emails. A substantial amount of eBay's server capacity was consumed by the Bidder's Edge robots.

57.5.1.3 Search Engines

Mark Nutritionals (MNI), the trademark owner of Body Solutions, filed suits against the search engines AltaVista, FindWhat, Kanoodle, and Overture for their practice of selling keywords (called "paid placement"). If a user typed the plaintiff's trademark "Body Solutions" into the search bar, the Websites of advertisers who had bought this keyword were shown more prominently than the plaintiff's Web page. MNI claimed trademark infringement and unfair competition.

57.5.1.4 Metatags

Metatags are dangerous if a competing business's trademarks are used. This often results in liability under the U.S. Lanham Act, which protects trademarks. Adding third-party trademarks as metatags can also be a violation of competition law. In SNA v. Array (51 F. Supp. 2d 554, 555, E.D. Pa. 1999) the court ruled that the defendants had intentionally used the plaintiff's trademark "to lure Internet users to their site instead of the official site." The defendants repeatedly inserted the word "Seawind," a trademark of the plaintiff, as a metatag. The trademark owner manufactured aircraft kits under the name Seawind, and the defendant produced turbine engines.

57.5.1.5 Links to Illegal Content

Linking to illegal or prohibited content can itself be illegal and can create derivative liability. Linking can be interpreted as contributory copyright infringement if copyright infringement is involved in the Website to which the link points. Linking can also be illegal when it points to Websites or information that breach the rules of the Digital Millennium Copyright Act (for example, circumvention technology). Until now, courts have decided only cases in which the dissemination of illegal content was pursued actively, by promoting the banned content. Whether links posted for strictly informational purposes are illegal as well is doubtful. Also unanswered is the question of where the limits of contributory infringement lie: How many link levels still create liability?
The popular search engine Google has changed its policy and removes offending links when a third party informs Google about potential infringements; a notice then informs users that a link has been removed. Google feared liability under the DMCA (see Section 57.5.2). Other link issues have included:
• 2600.com. This was one of several Websites that, near the end of 1999, began posting DeCSS and links to sites where the code was available for download. DeCSS is a program that allows the copying of DVDs protected by the CSS (Content Scramble System) technology. Several members of the Motion Picture Association filed a lawsuit, and the U.S. Court of Appeals for the Second Circuit in Manhattan ruled in favor of the Motion Picture Association of America. The injunction barred Eric Corley and his company, 2600 Enterprises, Inc., from posting the software code designed to
crack DVD-movie copy protection on their Website and from knowingly linking their Website to any other site at which the DeCSS software was posted. DeCSS was seen as circumvention technology, illegal under the Digital Millennium Copyright Act. The appeals court upheld a linking test to determine whether those responsible for the link (1) know at the relevant time that the offending material is on the linked-to site, (2) know that it is circumvention technology that may not lawfully be offered, and (3) create or maintain the link for the purpose of disseminating that technology.
• In Bernstein v. J.C. Penney, Inc., the department store J.C. Penney and the cosmetics company Elizabeth Arden were sued by photographer Gary Bernstein because of an unauthorized reproduction of one of his photographs. The picture was three clicks away from the defendants' Website. The case did not reach a final decision, but the district court denied the motion for a preliminary injunction.
57.5.2 Providers

The Digital Millennium Copyright Act sets out rules for when online service providers are responsible for the content on their networks:
1. Service providers are not liable for "transitory digital network communications" that they simply forward "through an automatic technical process without selection of the material by the service provider" from a customer to its intended destination.
2. If a user posts information on the system or network run by the service provider without the service provider's knowledge that the information is infringing, then the service provider is generally not liable. If notified of an infringement, the service provider must respond "expeditiously to remove, or disable access to, the material that is claimed to be infringing." To benefit from this rule, however, a service provider first has to register a designated agent with the U.S. Copyright Office to receive such notifications.
3. Service providers are not liable for information location tools like hyperlinks, directories, and search engines that connect customers to infringing material, if they do not know about the infringing nature of the material and do not receive a financial benefit from the link. Nevertheless, they have to take down or block access to the material upon receiving notice of a claimed infringement.
4. If a copyright owner requests a subpoena from the appropriate court for identification of an alleged infringer, a service provider is required to reveal the identity of any of its subscribers accused of violating the copyright.
57.6 E-Commerce

57.6.1 E-Contract

The purpose of commercial contract law is to facilitate and support commerce. Therefore, concluding a contract has to be as easy as possible. Most contracts in the physical world are valid without any formalities (e.g., written form, signature), simply by offer and acceptance. Internet business benefits from this principle: digital contracts can be concluded by automated offer-and-acceptance procedures and still be valid.
• Click-wrap agreements, concluded with a click of the mouse on an agreement button, are valid, whether a consumer buys merchandise, information, or software, or just says yes to the terms and conditions of using a Website or mail service. Even if a purchaser has not read the terms and conditions, they usually become part of the contract. But the terms of agreement have to be available to the consumer before shopping, and they have to be understandable and reasonable. To be on the safe side, conditions should be displayed to the consumer during the shopping procedure and should be accepted with a formal click (a minimal markup sketch of such a step follows this list).
• Browse-wrap agreements are agreements on a Website that a viewer may read, but no affirmative action, like clicking on a button, is necessary. The agreements can be placed in a link at the bottom of a Web page or even displayed, without forcing the consumer to accept by clicking on an icon.
At least one court has denied the validity of such browse-wrap agreements: in Specht v. Netscape, a software license agreement asked users to "review and agree to the agreement," but they could download the software without doing so. The agreement was held to be unenforceable.
• Shrink-wrap agreements are not digital but printed agreements inside a product box, and they apply mainly to software purchases. The term "shrink-wrap" refers to the act of opening a box that is sealed in plastic or cellophane; by breaking the cellophane, the consumer is supposed to be bound to the terms and agreements inside. Shrink-wrap agreements are controversial because consumers normally do not see the agreement in detail until after the purchase is complete, even if they are informed outside the box that such terms exist. Some critics argue that consumers never really accept terms this way. U.S. courts generally rule that shrink-wrap agreements are enforceable; the most famous case is ProCD, Inc. v. Zeidenberg, concerning the purchase of a CD-ROM telephone database (see Section 57.2.5). The proposed UCITA (Uniform Computer Information Transactions Act), a model state law, aims to make such adhesion contracts enforceable. In Europe shrink-wrap agreements are mostly regarded as invalid.
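The following minimal markup sketch shows one way the "formal click" recommended above might be implemented. The form fields, file names, and URL are hypothetical, and a real shop would also have to verify the acceptance on the server and keep a record of it.

    <!-- Hypothetical click-wrap step: terms are shown before purchase and explicit assent is required -->
    <form action="/checkout" method="post">
      <p>Please review the <a href="/terms.html">Terms and Conditions</a> before placing your order.</p>
      <p>
        <input type="checkbox" name="accept_terms" value="yes">
        I have read and agree to the Terms and Conditions.
      </p>
      <input type="submit" value="Place order">
    </form>

Because the consumer must act affirmatively before the order is accepted, such an arrangement resembles the click-wrap agreements that U.S. courts have generally enforced, rather than the browse-wrap notice at issue in Specht v. Netscape.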
57.6.2 Taxes

In November 2001, the U.S. Congress extended a ban on Internet taxes in the U.S. for three years (Internet Tax Freedom Act). The idea is to give e-commerce a good chance to grow. But the Internet is not entirely tax-free. States can charge sales tax on e-commerce transactions between companies and consumers within their borders (Seattle-based Amazon.com's Washington state customers, for instance, must pay Washington's sales tax), and laws enacted previously by several states and local governments remain valid. Further, some companies collect sales taxes in states where they have a physical store. The problem is to find a tax system that is simple enough to be used in everyday transactions; Amazon and other companies note that there are thousands of tax jurisdictions nationwide, each with its own definition of taxable goods and its own rate schedule.

In the E.U., online sales have been taxed since July 1, 2003, when a new E.U. directive went into effect requiring all Internet companies to account for value-added tax (VAT) on digital sales. The levy adds 15% to 25% to selected Internet transactions such as software and music downloads, monthly subscriptions to an Internet service provider, and products purchased through online auctions (at a 20% rate, for example, a 10-euro music download costs the consumer 12 euros). Digital goods have to be taxed in the country where they are consumed: the directive therefore requires a U.S. company that sells an MP3 file or an e-book to an E.U.-based customer to collect the E.U. value-added tax on digital goods sold in the E.U.
References

CNET News.com. SCO sues Big Blue over Unix, Linux, March 2003a. http://news.com.com/2100-1016_3991464.html?tag=rn.
CNET News.com. Why SCO decided to take IBM to court, June 2003b. http://news.com.com/20081082_3-1017308.html?tag=rn.
Frankel, Fred et al. Business model for recovery of missing goods, persons, or fugitive or disbursements of unclaimed goods using the internet, September 2002. U.S. Patent 6,157,946.
Hartman, Peri et al. Method and system for placing a purchase order via a communications network, September 1999. U.S. Patent 5,960,411.
Isenberg, Doug. The GigaLaw Guide to Internet Law. Random House, Washington, D.C., 2002.
Itakura, Yuichiro et al. Communication system capable of providing user with picture meeting characteristics of user and terminal equipment and information providing device used for the same, December 2000. U.S. Patent 6,157,946.
Patry, William. The Fair Use Privilege in Copyright Law. BNA Books, 2nd ed., Washington, D.C., 1996.
Payne, Andrew C. et al. Network sales system, February 1998. U.S. Patent 5,715,314.
Sableman, Mark. Link law revisited: Internet linking law at five years. Berkeley Technology Law Journal, 3(16): 1237, 2001.
Sableman, Mark. Link law revisited: Internet linking law at five years. July 2002 Supplement, http://www.thompsoncoburn.com/pubs/MS005.pdf.
Spindler, Gerald and Fritjof Boerner, Eds. E-Commerce Law in Europe and the USA. Springer, Berlin, 2002.
Walker, Jay S. et al. Method and apparatus for a cryptographically assisted commercial network system designed to facilitate buyer-driven conditional purchase offers, August 1998. U.S. Patent 5,794,207.