Emerging Technologies for Semantic Work Environments: Techniques, Methods, and Applications
Jörg Rech, Fraunhofer Institute for Experimental Software Engineering, Germany
Björn Decker, empolis GmbH – Part of Arvato: A Bertelsmann Company, Germany
Eric Ras, Fraunhofer Institute for Experimental Software Engineering, Germany
Information Science Reference
Hershey • New York
Acquisitions Editor: Kristin Klinger
Development Editor: Kristin Roth
Senior Managing Editor: Jennifer Neidig
Managing Editor: Jamie Snavely
Assistant Managing Editor: Carole Coulson
Copy Editor: Jeannie Porter
Typesetter: Michael Brehm
Cover Design: Lisa Tosheff
Printed at: Yurchak Printing Inc.
Published in the United States of America by
Information Science Reference (an imprint of IGI Global)
701 E. Chocolate Avenue, Suite 200, Hershey PA 17033
Tel: 717-533-8845; Fax: 717-533-8661
E-mail: [email protected]
Web site: http://www.igi-global.com

and in the United Kingdom by
Information Science Reference (an imprint of IGI Global)
3 Henrietta Street, Covent Garden, London WC2E 8LU
Tel: 44 20 7240 0856; Fax: 44 20 7379 0609
Web site: http://www.eurospanbookstore.com

Copyright © 2008 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher.

Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.

Library of Congress Cataloging-in-Publication Data

Emerging technologies for semantic work environments : techniques, methods, and applications / Jorg Rech, Bjorn Decker and Eric Ras, editors.
p. cm.
Summary: "This book describes an overview of the emerging field of Semantic Work Environments by combining various research studies and underlining the similarities between different processes, issues and approaches in order to provide the reader with techniques, methods, and applications of the study"--Provided by publisher.
ISBN-13: 978-1-59904-877-2 (hbk.)
ISBN-13: 978-1-59904-878-9 (e-book)
1. Semantic Web. 2. Semantic networks (Information theory) 3. Information technology--Management. I. Rech, Jorg. II. Decker, Bjorn. III. Ras, Eric.
TK5105.88815.E44 2008
658.4'038--dc22
2007042680

British Cataloguing in Publication Data
A Cataloguing in Publication record for this book is available from the British Library.

All work contributed to this book set is original material. The views expressed in this book are those of the authors, but not necessarily of the publisher.

If a library purchased a print copy of this publication, please go to http://www.igi-global.com/agreement for information on activating the library's complimentary electronic access to this publication.
Table of Contents
Foreword .......... xiv
Preface .......... xvi
Acknowledgment .......... xxiii

Section I
Introduction

Chapter I
Enabling Social Semantic Collaboration: Bridging the Gap Between Web 2.0 and the Semantic Web .......... 1
Sören Auer, University of Pennsylvania, USA
Zachary G. Ives, University of Pennsylvania, USA

Chapter II
Communication Systems for Semantic Work Environments .......... 16
Thomas Franz, University of Koblenz-Landau, Germany
Sergej Sizov, University of Koblenz-Landau, Germany

Chapter III
Semantic Social Software: Semantically Enabled Social Software or Socially Enabled Semantic Web? .......... 33
Sebastian Schaffert, Salzburg Research Forschungsgesellschaft, Austria

Section II
Semantic Work Environment Tools

Chapter IV
SWiM: A Semantic Wiki for Mathematical Knowledge Management .......... 47
Christoph Lange, Jacobs University Bremen, Germany
Michael Kohlhase, Jacobs University Bremen, Germany

Chapter V
CoolWikNews: More than Meet the Eye in the 21st Century Journalism .......... 69
Damaris Fuentes-Lorenzo, University Carlos III of Madrid, Spain
Juan Miguel Gómez, University Carlos III of Madrid, Spain
Ángel García Crespo, University Carlos III of Madrid, Spain

Chapter VI
Improved Experience Transfer by Semantic Work Support .......... 84
Roar Fjellheim, Computas AS, Norway
David Norheim, Computas AS, Norway

Chapter VII
A Semi-Automatic Semantic Annotation and Authoring Tool for a Library Help Desk Service .......... 100
Antti Vehviläinen, Helsinki University of Technology (TKK), Finland
Eero Hyvönen, Helsinki University of Technology (TKK) and University of Helsinki, Finland
Olli Alm, Helsinki University of Technology (TKK), Finland

Chapter VIII
A Wiki on the Semantic Web .......... 115
Michel Buffa, Mainline, I3S Lab, France
Guillaume Erétéo, Edelweiss, INRIA, France
Fabien Gandon, Edelweiss, INRIA, France

Chapter IX
Personal Knowledge Management with Semantic Technologies .......... 138
Max Völkel, Forschungszentrum Informatik (FZI) Karlsruhe, Germany
Sebastian Schaffert, Salzburg Research Forschungsgesellschaft mbH, Austria
Eyal Oren, Digital Enterprise Research Institute (DERI), Ireland

Chapter X
DeepaMehta: Another Computer is Possible .......... 154
Jörg Richter, DeepaMehta Company, Germany
Jurij Poelchau, fx-Institute, Germany

Section III
Methods for Semantic Work Environments

Chapter XI
Added-Value: Getting People into Semantic Work Environments .......... 181
Andrea Kohlhase, Jacobs University Bremen and DFKI Bremen, Germany
Normen Müller, Jacobs University Bremen, Germany

Chapter XII
Enabling Learning on Demand in Semantic Work Environments: The Learning in Process Approach .......... 202
Andreas Schmidt, FZI Research Center for Information Technologies, Germany

Section IV
Techniques for Semantic Work Environments

Chapter XIII
Automatic Acquisition of Semantics from Text for Semantic Work Environments .......... 217
Maria Ruiz-Casado, Universidad Autonoma de Madrid, Spain
Enrique Alfonseca, Universidad Autonoma de Madrid, Spain
Pablo Castells, Universidad Autonoma de Madrid, Spain

Chapter XIV
Technologies for Semantic Project-Driven Work Environments .......... 245
Bernhard Schandl, University of Vienna, Austria
Ross King, Austrian Research Centers GmbH (ARC) Research Studios, Austria
Niko Popitsch, Austrian Research Centers GmbH (ARC) Research Studios, Austria
Brigitte Rauter, P.Solutions Informationstechnologie GmbH, Austria
Martin Povazay, P.Solutions Informationstechnologie GmbH, Austria

Chapter XV
An Integrated Formal Approach to Semantic Work Environments Design .......... 262
Hai H. Wang, University of Southampton, UK
Jin Song Dong, National University of Singapore, Singapore
Jing Sun, University of Auckland, New Zealand
Terry R. Payne, University of Southampton, UK
Nicholas Gibbins, University of Southampton, UK
Yuan Fang Li, National University of Singapore, Singapore
Jeff Pan, University of Aberdeen, UK

Chapter XVI
Lightweight Data Modeling in RDF .......... 281
Axel Rauschmayer, University of Munich, Germany
Malte Kiesel, DFKI, Germany

Compilation of References .......... 313
About the Contributors .......... 337
Index .......... 346
Detailed Table of Contents
Foreword .......... xiv
Preface .......... xvi
Acknowledgment .......... xxiii
Section I
Introduction

This section will help the reader to learn about the most common technologies and to classify them. In addition, the reader will get a better understanding of why certain decisions about the usage of technologies have been made in the chapters of the subsequent sections. These chapters give an introduction to technologies that can be used to develop semantic work environments (SWE) and present several R&D projects in which different technologies and related tools have been developed. The authors compare these technologies using characteristics such as collaboration, communication, and so forth, and provide the reader with an overview of fundamental building blocks as well as requirements for SWE development.

Chapter I
Enabling Social Semantic Collaboration: Bridging the Gap Between Web 2.0 and the Semantic Web .......... 1
Sören Auer, University of Pennsylvania, USA
Zachary G. Ives, University of Pennsylvania, USA

Sören Auer and Zachary Ives introduce the interrelation between two trends that semantic work environments rely on: Web 2.0 and the Semantic Web. Both approaches aim at integrating distributed data and information to provide enhanced search, ranking, browsing, and navigation facilities for SWEs. They present several research projects to show how both fields can lead to synergies for developing knowledge bases for the Semantic Web.
Chapter II
Communication Systems for Semantic Work Environments .......... 16
Thomas Franz, University of Koblenz-Landau, Germany
Sergej Sizov, University of Koblenz-Landau, Germany

Thomas Franz and Sergej Sizov point out that communication is one of the main tasks of a knowledge worker, as it denotes the exchange of information and the transfer of knowledge, making it vital for any collaborative human work. The authors introduce different communication systems to indicate their different utilization and role in knowledge work. They present requirements on communication for SWEs and compare conventional communication tools and channels against these requirements. After presenting research work that contributes to the communication of knowledge work, they conclude with a visionary scenario about communication tools for future SWEs.

Chapter III
Semantic Social Software: Semantically Enabled Social Software or Socially Enabled Semantic Web? .......... 33
Sebastian Schaffert, Salzburg Research Forschungsgesellschaft, Austria

Sebastian Schaffert continues the discussion of the synergies between Web 2.0/social web and the Semantic Web. He introduces two perspectives on how Semantic Social Software can be reached: One perspective is semantically enabled social software, that is, the usage of semantic metadata to enhance existing social software. The other perspective is a socially enabled Semantic Web, which means the usage of Social Software to create semantic metadata. Three exemplary applications of semantic social software (i.e., Semantic Wikis, Semantic Weblogs, and e-portfolios) are provided by the author for deriving outstanding aspects of Semantic Social Software.
Section II
Semantic Work Environment Tools

This section provides seven chapters related to concrete realizations of SWEs—tools developed to support work environments and personal activities using semantic technologies. These tools come from very different application domains such as oil drilling, journalism, and library help desk services, and motivate many application scenarios that exist for semantic work environments. The chapters further extend the overview of technologies already provided in Section I. Concrete architectures and platforms are presented for developing SWEs such as Semantic Wikis, Semantic Personal Knowledge Management systems, and Semantic Desktops. Several chapters also elaborate on the topics of authoring and annotating content, refer to inference technologies such as case-based reasoning, or present visualization approaches to support the tagging, linking, or presentation of content in SWEs.
Chapter IV
SWiM: A Semantic Wiki for Mathematical Knowledge Management .......... 47
Christoph Lange, Jacobs University Bremen, Germany
Michael Kohlhase, Jacobs University Bremen, Germany

Christoph Lange and Michael Kohlhase present SWiM, a semantic Wiki for collaboratively building, editing, and browsing mathematical knowledge. In this Wiki, the regular Wiki markup is replaced by a markup format and ontology language for mathematical documents. SWiM represents a social semantic work environment, which facilitates the creation of a shared collection of mathematical knowledge.

Chapter V
CoolWikNews: More than Meet the Eye in the 21st Century Journalism .......... 69
Damaris Fuentes-Lorenzo, University Carlos III of Madrid, Spain
Juan Miguel Gómez, University Carlos III of Madrid, Spain
Ángel García Crespo, University Carlos III of Madrid, Spain

Damaris Fuentes-Lorenzo, Juan Miguel Gómez, and Ángel García Crespo describe a semantic work environment for the collaborative creation of news articles, thus building a basis for citizen journalism. Articles within this Wiki can be annotated using ontological metadata. This metadata is then used to reward users in terms of advanced browsing and searching of the newspapers and newspaper archives, in particular finding similar articles. Faceted metadata and graphical visualizations help the user to find more accurate information and semantically related data when it is needed. The authors state that the Wiki architecture is domain-independent and can be used for other domains apart from news publishing.

Chapter VI
Improved Experience Transfer by Semantic Work Support .......... 84
Roar Fjellheim, Computas AS, Norway
David Norheim, Computas AS, Norway

Roar Fjellheim and David Norheim describe the Active Knowledge Support for Integrated Operations (AKSIO) system that supports experience transfer in the operation of offshore oilfields. AKSIO is an example of an SWE that provides information in a timely and context-aware manner. Experience reports are processed and annotated by experts and linked to various resources and specialized knowledge networks. The authors demonstrate how Semantic Web technology is an effective enabler of improved knowledge management processes in corporate environments.

Chapter VII
A Semi-Automatic Semantic Annotation and Authoring Tool for a Library Help Desk Service .......... 100
Antti Vehviläinen, Helsinki University of Technology (TKK), Finland
Eero Hyvönen, Helsinki University of Technology (TKK) and University of Helsinki, Finland
Olli Alm, Helsinki University of Technology (TKK), Finland
Antti Vehviläinen, Eero Hyvönen, and Olli Alm discuss how knowledge technologies can be utilized in creating help desk services on the Semantic Web. The authors focus on support for the semi-automatic annotation of natural language text for annotating question-answer pairs, and on case-based reasoning techniques for finding similar questions. To provide answers matching the content indexer's and end-user's information needs, methods for combining case-based reasoning with semantic search, linking, and authoring are proposed. The system itself is used as a help-desk application in Finnish libraries to answer questions asked by library users.

Chapter VIII
A Wiki on the Semantic Web .......... 115
Michel Buffa, Mainline, I3S Lab, France
Guillaume Erétéo, Edelweiss, INRIA, France
Fabien Gandon, Edelweiss, INRIA, France

Michel Buffa, Guillaume Erétéo, and Fabien Gandon present a semantic Wiki called SweetWiki that addresses several social and usability problems of conventional Wikis by combining a WYSIWYG editor and semantic annotations. SweetWiki makes use of Semantic Web concepts and languages and demonstrates how the use of such paradigms can improve navigation, search, and usability while preserving the essence of a Wiki: simplicity and social dimension. In their chapter, they also provide an overview of several other semantic Wikis.

Chapter IX
Personal Knowledge Management with Semantic Technologies .......... 138
Max Völkel, Forschungszentrum Informatik (FZI) Karlsruhe, Germany
Sebastian Schaffert, Salzburg Research Forschungsgesellschaft mbH, Austria
Eyal Oren, Digital Enterprise Research Institute (DERI), Ireland

Max Völkel, Sebastian Schaffert, and Eyal Oren present how to use semantic technologies for improving one's personal knowledge management. Requirements on personal knowledge management based on a literature survey are provided. Current nonsemantic as well as semantically enhanced personal knowledge management tools are investigated, and the reader is provided with an overview of existing tools. To overcome the drawbacks of the current systems, semantic Wikis are presented as the best implementation of the semantically enhanced personal knowledge management vision—even if they do not perfectly fulfill all the stated requirements.

Chapter X
DeepaMehta: Another Computer is Possible .......... 154
Jörg Richter, DeepaMehta Company, Germany
Jurij Poelchau, fx-Institute, Germany

Jörg Richter and Jurij Poelchau present the DeepaMehta platform as a semantic work environment. This platform replaces the traditional desktop by a semantic desktop. The authors explain the multilayered distributed architecture of DeepaMehta, which provides native support for topic maps to visualize the underlying semantics of knowledge. Two exemplary applications of the DeepaMehta platform are presented that implement semantic work environments. The authors conclude their chapter with interesting future research directions and open questions that reflect future applications of SWEs.
Section III
Methods for Semantic Work Environments

Besides defining the requirements and choosing the right building blocks for developing an SWE, the success of such an environment still depends, first of all, on how the system motivates people to participate and use it, and second, on how information is structured and presented to the user. Hence, this section describes methods for better involving people in Semantic Work Environments and for enabling so-called context-steered learning in these environments.

Chapter XI
Added-Value: Getting People into Semantic Work Environments .......... 181
Andrea Kohlhase, Jacobs University Bremen and DFKI Bremen, Germany
Normen Müller, Jacobs University Bremen, Germany

Andrea Kohlhase and Normen Müller analyze the motivational aspect of why people are not using semantic work environments. They argue that the underlying motivational problem between vast semantic potential and extra personal investment can be analyzed in terms of the "Semantic Prisoner's Dilemma." Based on these considerations, they describe their approach of an added-value analysis as a design method for involving people in Semantic Work Environments. In addition, they provide an overview of other software design methods that can be used to develop SWEs and present two application examples of this analysis approach.

Chapter XII
Enabling Learning on Demand in Semantic Work Environments: The Learning in Process Approach .......... 202
Andreas Schmidt, FZI Research Center for Information Technologies, Germany

Andreas Schmidt presents a method for building individual e-learning material that can be presented in SWEs. The cornerstone of this approach is the context-steered learning method, which uses the context of users and ontologically enriched learning material to build tailored e-learning material. Context-steered learning implements pedagogical guidance and thus goes beyond simple information delivery. It considers not only the current learning needs, but also the prerequisites for understanding the provided resources and a limited form of meaningful order (in the pedagogical sense). The author uses an architecture of loosely coupled services for implementing context-steered learning. This chapter is a contribution towards the challenge of presenting and structuring information so that it supports short-term problem solving as well as long-term competence development.
Section IV
Techniques for Semantic Work Environments

In order to realize Semantic Work Environments, information has to be collected, structured, and processed. This section describes specific techniques for supporting these activities, which might be helpful when building one's own semantic-based tools. These techniques enhance available techniques and therefore provide better solutions for the challenges of extracting semantics, managing information from various distributed sources, and developing interfaces to quickly manage, annotate, and retrieve information.

Chapter XIII
Automatic Acquisition of Semantics from Text for Semantic Work Environments .......... 217
Maria Ruiz-Casado, Universidad Autonoma de Madrid, Spain
Enrique Alfonseca, Universidad Autonoma de Madrid, Spain
Pablo Castells, Universidad Autonoma de Madrid, Spain

Maria Ruiz-Casado, Enrique Alfonseca, and Pablo Castells provide an overview of techniques for semiautomatically extracting semantics from natural language text documents. These techniques can be used to support the semantic enrichment of plain information, since the manual tagging of huge amounts of content is very costly. They describe how natural language processing works in general and present methods for tackling the problem of "Word Sense Disambiguation." The authors provide a set of techniques for information and relationship extraction. This chapter gives a comprehensive overview of semantic acquisition techniques for SWEs, which reduce the cost of manually annotating preexisting information.

Chapter XIV
Technologies for Semantic Project-Driven Work Environments .......... 245
Bernhard Schandl, University of Vienna, Austria
Ross King, Austrian Research Centers GmbH (ARC) Research Studios, Austria
Niko Popitsch, Austrian Research Centers GmbH (ARC) Research Studios, Austria
Brigitte Rauter, P.Solutions Informationstechnologie GmbH, Austria
Martin Povazay, P.Solutions Informationstechnologie GmbH, Austria

Bernhard Schandl, Ross King, Niko Popitsch, Brigitte Rauter, and Martin Povazay state that capturing the semantics of documents and their interrelations supports finding, exploring, reusing, and exchanging digital documents. They believe that the process of capturing semantics must take place when the system users have maximum knowledge about a certain document (i.e., when the document is created or updated) and should interfere with a user's normal workflow as little as possible. Therefore, they present METIS, a framework for the management of multimedia data and metadata from various distributed sources; Ylvi, a semantic Wiki platform with a high-level, collaborative user interface built on top of METIS for rapid knowledge exchange and management; and SemDAV, a Semantic-Web-based protocol that allows integrating personal information and sharing semantic information. SemDAV provides interfaces to quickly manage, annotate, and retrieve information.
Chapter XV
An Integrated Formal Approach to Semantic Work Environments Design .......... 262
Hai H. Wang, University of Southampton, UK
Jin Song Dong, National University of Singapore, Singapore
Jing Sun, University of Auckland, New Zealand
Terry R. Payne, University of Southampton, UK
Nicholas Gibbins, University of Southampton, UK
Yuan Fang Li, National University of Singapore, Singapore
Jeff Pan, University of Aberdeen, UK

The authors state that the services found in SWEs may have intricate data states, complex process behaviors, and concurrent interactions. They propose TCOZ (Timed Communicating Object-Z), a high-level design technique, as an effective way of modeling such complex SWE applications. Tools for mapping those models, for example, to the Unified Modeling Language (UML) or to several other formats, have been developed. In this chapter, the authors explain TCOZ and use it to formally specify the functionalities of an exemplary application (a talk discovery system). They present tools for automatically extracting an OWL web ontology used by software services, as well as the semantic markup for software services, from the TCOZ design model.

Chapter XVI
Lightweight Data Modeling in RDF .......... 281
Axel Rauschmayer, University of Munich, Germany
Malte Kiesel, DFKI, Germany

Axel Rauschmayer and Malte Kiesel state that the RDF standard is, in fact, suitable for lightweight data modeling, but that it lacks clearly defined standards to completely support it. They present the Editing MetaModel (EMM), which provides standards and techniques for implementing RDF editing: It defines an RDF vocabulary for editing and clearly specifies the semantics of this vocabulary. The authors describe the EMM constructs and its three layers (i.e., schema, presentation, and editing). The schema defines the structure of the data, the presentation layer selects what data to display, and the editing layer uses projections to encode, visualize, and apply changes to RDF data. Particular focus is given to a formal description of the EMM and to the potential implementation of this model in the GUI of a semantic work environment. At the end of the chapter, they provide a set of related technologies for modeling semantics for SWEs. They think that EMM is useful for developers of data-centric (as opposed to ontology-centric) editors and can serve as a contribution to the ongoing discussion about simpler versions of OWL.
Compilation of References .......... 313
About the Contributors .......... 337
Index .......... 346
Foreword
Since the dawn of the Semantic Web, we have been working on developing techniques that use the data, metadata, and links available on the World Wide Web (WWW) for inferring additional services. These services aim at supporting our work and lives with technologies such as the resource description framework (RDF) and, most recently, the Web ontology language (OWL). Several of these technologies enable or use semantic data and also enable further technologies that exploit the wealth of information on the WWW.

This book, edited by Jörg Rech, Eric Ras, and Björn Decker, deals with another interesting and important problem, namely, integrating semantic technologies into work environments. It looks at ways of creating semantically richer applications that intelligently assist the user with additional information. A richer representation enables new services for people and enables further technologies that exploit this semantic information.

Today, semantic technologies increasingly find their way into collaborative tools such as Wikis, Desktops, or Web-based platforms. In the context of corporate settings, these semantic-based collaborative applications represent enhanced tools that intelligently and autonomously support the knowledge worker with relevant information on time. Semantic work environments such as Semantic Wikis, Semantic Desktops, or Web-based semantic platforms are information systems that use semantic technologies to enhance the content in these systems for presentation, querying, reporting, or analysis purposes. Besides the information available on the WWW, these environments gather and exploit the more specific information available throughout company networks that is ripe to be integrated into new services. Furthermore, most employees of these companies like to share their knowledge and use these systems for documenting, storing, and disseminating their knowledge. To integrate the data into company networks, several systems have been developed that integrate semantic technologies—many of them are presented in this book.

The first part of this book (sections one and two) is an interesting collection of chapters dealing with integrating semantic technologies and metadata into work environments. While the first three chapters investigate how semantic collaboration can be enabled and fostered, the other chapters describe real-world semantic work environments such as:

• SWiM: A Semantic Wiki for collaboratively building, editing, and browsing mathematical knowledge in order to support knowledge management for mathematicians.
• CoolWikNews: A Semantic Wiki devoted to news publishing in order to support knowledge management for journalists.
• AKSIO: An active socio-technical system for knowledge transfer between drilling projects, using documented experiences, best practices, and expert references.
• Opas: A semi-automatic annotation and authoring tool to support librarians via specialized help desk services.
• SweetWiki: A Semantic Wiki that integrates several semantic technologies to provide a Semantic Web application platform for everyone.
• SemperWiki: A Semantic Wiki that is targeted to support personal knowledge management with semantic technologies.
• DeepaMehta: A platform designed to provide knowledge workers with additional information that supports their work, thoughts, and collaborations with colleagues.
• Ylvi: A Semantic Wiki that enables and supports the creation of semantic information during normal project work.
• OntoWiki: A Semantic Wiki aimed at supporting social and semantic collaboration.
In order to enable and keep these semantic work environments alive, we need several technologies and methodologies. Standard data modeling formats and methods are necessary for promoting interoperability and for integrating users into these systems. This issue of using techniques and methods for semantic work environments is addressed in the second part (sections three and four) of this book. The six chapters address the following questions:

• How can we integrate people into semantic work environments and show them the added value these systems offer?
• How can we enable and foster learning during work activities and on demand in semantic work environments?
• How can we automatically acquire semantic information from previously existing sources for semantic work environments?
• How can we integrate the various existing technologies for semantic work environments to support project-driven work?
• How can we model the data, metadata, and relations used in semantic work environments?
In summary, the editors have selected a very interesting collection of chapters that present the current state of the art in semantic work environments. The primary objective of this book is to mobilize researchers and practitioners to develop and improve today's work environments using semantic technologies. It raises awareness in the research community of the great potential of SWE research. All in all, this book is a significant collection of contributions on the progress in semantic work environments and their use in various application domains. These contributions constitute a remarkable reference for researchers on new topics concerning the design and operation as well as the technical, managerial, behavioral, and organizational aspects of semantic work environments.
Prof. Dr. Klaus-Dieter Althoff
Intelligent Information Systems
University of Hildesheim, Germany
September 2007
Klaus-Dieter Althoff is full professor at the University of Hildesheim and is directing a research group on intelligent information systems. He studied mathematics with a focus on expert systems at the University of Technology at Aachen. In 1992 he finished his doctoral dissertation on an architecture for knowledge-based technical diagnosis at the University of Kaiserslautern, where he also received the postdoctoral degree (Habilitation) with a thesis on the evaluation of case-based reasoning systems in 1997. He worked at the Fraunhofer Institute for Experimental Software Engineering as group leader and department head until he went to Hildesheim in April 2004. His main interests include techniques, methods and tools for developing, operating, evaluating, and maintaining knowledge-based systems, with a focus on case-based reasoning, agent technology, experience management, and machine learning.
Preface
In many companies, technical work environments integrate information systems aimed at supporting their long-term organizational strategy and at providing efficient support to their core business processes. Supporting the knowledge worker by integrating these information systems is a complex task that requires the participation of various groups of people and technical systems. With the rise of semantic technologies, more and more information gets enriched with semantic metadata, which makes the information ready for harvesting. In the Web 2.0 (Murugesan, 2007) and Web 3.0 (Lassila & Hendler, 2007) movement, we experience this phenomenon through so-called "mashups" (Ankolekar, Krötzsch, Tran, & Vrandecic, 2007) of existing information sources such as search engines (e.g., Google Search), geographical map servers (e.g., Google Maps), collaborative encyclopedias (e.g., Wikipedia), or open picture repositories (e.g., Flickr). In order to map this phenomenon to the work environments in companies, we have to integrate the different information sources available in and near organizations.

Semantic Work Environments (SWE) such as Semantic Wikis (Semantic Wikis, 2005; Völkel, Schaffert, Pasaru-Bontas, & Auer, 2006) or Semantic Desktops (Decker, Park, Quan, & Sauermann, 2005) are aimed at exploiting this wealth of information in order to intelligently assist our daily work. Ideally, they are built to collect data for deriving our current information needs in a specific situation and to provide processed and improved information that can be integrated into the task at hand. Furthermore, as the usage of this information is tightly integrated into our daily work, we not only take part in the (re)use but also in the creation and sharing of information. This continuous flow of information, experience, and knowledge helps to keep us up-to-date in our area of expertise and enables us to integrate the experience of our colleagues into our own work. Hence, semantic work environments will also address the challenge of life-long learning because they provide easy and fast access to information that fits our current working situation. This means, on the one hand, that such systems help us to solve short-term problems, and on the other hand, that they enhance long-term competence development.

Semantic Work Environments combine the strengths of Semantic Web technologies, workplace applications, and collaborative working—typically for a specific application domain such as research or journalism—and represent the "Semantic Web in the small." Instead of making all content on the Internet machine-readable (i.e., "Semantic Web in the large"), the SWE approach tackles the problem on a smaller, more focused scale. Take Semantic Wikis as an example: Wikis are enhanced by the simple annotation of Wiki content with additional machine-readable metadata and by tools that support authors during the writing of new or the changing of existing content (e.g., via self-explaining templates). This approach of building up the Semantic Web in the small is in line with current developments in the area of the Semantic Web. One prominent example is the definition of so-called "microformats" (Ayers, 2006; Khare, 2006): Based on standard Web technology, they allow embedding small information chunks like contact information into Web sites.
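To make this style of annotation concrete, the following sketch shows how a single machine-readable wiki statement could be represented as RDF. It is an illustration only, not taken from any chapter of this book: the wiki namespace, page URIs, and the capitalOf property are invented for the example, and the code assumes the Python rdflib library.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

# Hypothetical namespace for a company-internal semantic wiki.
WIKI = Namespace("http://wiki.example.org/ontology#")

g = Graph()
g.bind("wiki", WIKI)

# In a semantic wiki, lightweight markup such as [[capital of::Germany]]
# on the page "Berlin" would be stored as machine-readable triples:
berlin = URIRef("http://wiki.example.org/page/Berlin")
germany = URIRef("http://wiki.example.org/page/Germany")

g.add((berlin, RDF.type, WIKI.City))            # the page describes a city
g.add((berlin, RDFS.label, Literal("Berlin")))  # human-readable label
g.add((berlin, WIKI.capitalOf, germany))        # typed link between pages

# Serialized as Turtle, the annotations become data that any RDF tool
# (query engines, reasoners, mashups) can aggregate and process.
print(g.serialize(format="turtle"))
```

The same information could equally well be embedded directly in a Web page as a microformat or RDFa; the essential point is that a small annotation made during normal editing yields metadata that machines can harvest.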
We believe that semantic work environments are the first step towards achieving the vision of the Semantic Web, for several reasons: they are lightweight, goal-oriented, and more likely to exploit synergies. Semantic work environments are lightweight, since they support a specific problem and, therefore, require only the features relevant for this task. They do not intend to solve a general, somewhat unfocused and fuzzy problem, but have a certain application domain that imposes specific problem types to be solved. Therefore, requirements elicitation and implementation of semantic work environments can be performed in a goal-oriented way and can be related to a set of working situations with specific tasks, technical work applications, and networks of people. Since they operate within a defined organizational boundary or community, reaching a consensus about the needed concepts and their meaning (e.g., by codifying that consensus in an ontology) can be achieved more easily than in general Semantic Web applications. In addition, due to this focus, a quick return on investment is more likely. The focus of SWEs is also the basis for synergies that arise from embedding them tightly into the business processes and workflows within an organization. These business processes provide relevant information for classifying and organizing the information created and reused. This information can later be exploited by inference techniques to improve reuse by people operating in similar contexts. A second aspect of synergies is overcoming the dichotomy between the need for information and the often insufficient willingness to make information available to others. SWEs will play an important role for information storage, acquisition, and processing in specific application domains during knowledge work. In the future, they will enable the widespread use of automated inference mechanisms or software agents on top of the semantic information. The semantic enrichment of work environments will help participants in their daily work to avoid risks and project failures that are frequently encountered in traditional projects.
Challenges

A commonly accepted fact is the ever-increasing amount of information we have to cope with during our daily work. While a century ago most countries were based on manual-labor cultures, we are currently living in a world of knowledge workers. And the rise of computers and their integration into our daily work environments increases this flood of information even more. Or, to quote John Naisbitt: "We are drowning in information but starved for knowledge" (Naisbitt, 1984). Therefore, we need approaches to reduce the amount of information and to optimize access to important information and the way it is presented to the user—anywhere and anytime. Approaches such as Wikis are important; however, there is still much work to be done to integrate them into our daily working environments. Attempts to construct semantic work environments have to adequately deal with the challenges that exist in the new millennium. Such challenges can be classified into several categories:

• Challenge 1: Enabling the collaboration of work communities for exchanging information and using semantic work environments.
• Challenge 2: Building semantic work environments to support social collaboration, information integration, and automated inference.
• Challenge 3: Starting semantic work environments and keeping them alive.
• Challenge 4: Adequately presenting information to a user so that it supports the two extremes of short-term problem solving and long-term competence development.
Table 1. Chapters and approached challenges (a matrix relating Chapters I–XVI to Challenges 1–8)
• Challenge 5: Coping with the plethora of overlapping and similar Semantic Web technologies, that is, how to select the right building blocks for the development of semantic work environments.
• Challenge 6: Coping with quick innovation cycles and the resulting time pressure that drives us away from classical search to context-sensitive and pro-active information offerings.
• Challenge 7: Obtaining the needed information in a timely manner.
• Challenge 8: Building architectures of such environments with different APIs, data structures, and business processes. In order to deal with the complexity of developing such tools, adequate methodologies, technologies, and ontologies are mandatory.
As in the case of Chapter X, most chapters in this book do not approach only one challenge, but tackle several of them (see Table 1).
Solutions/Background

Today, members from multiple disciplines work on SWEs and collaborate to provide highly integrated services by integrating the ever-increasing amount of information. Based on collaborative technologies such as Wikis and using semantic technologies such as OWL, collaborative semantic work environments
can be created that are more efficient and effective than the sum of their parts and support the work of their users. However, this requires coping with different APIs, data structures, business and learning processes, as well as with the complexity of developing such tools, methodologies, technologies, and ontologies. Fortunately, SWEs do not need to be built from scratch. Modern information technologies as well as developments in knowledge management provide a substantial basis for developing SWEs. In particular, the vision of the Semantic Web (Berners-Lee, 1998) provides the basis for SWEs: Documents understandable by humans are augmented with machine-processable metadata. The Semantic Web provides standards such as the resource description framework (RDF) (Decker, Melnik et al., 2000; Decker, Mitra, & Melnik, 2000) or the Web ontology language (OWL) (Dean et al., 2002). Based on these standard languages, ontologies—that is, formal descriptions of concepts and their relations—allow inferring further facts and hypotheses. Examples of such ontologies are the document description ontology Dublin Core (McClelland, 2003) or upper-level ontologies like SUMO (Bouras, Gouvas, & Mentzas, 2007; Pease, 2003) or DOLCE (Oberle et al., 2007). These standards as well as the tools using these standards are the technical building blocks for semantic work environments. Besides the usage of such technologies, we have to think about how such systems provide information to the user. How should the information be structured? How should it be presented? What kind of navigation support should be offered? Information might be gathered from very different sources, different domains, and communities. The semantic annotation of information will help us to select relevant information and to put these information chunks in relation, thus giving a meaning to the information set. Solutions for making information more understandable, transferable to a new situation, and more learnable can be found in the domain of e-learning and knowledge management systems, (educational) adaptive hypermedia systems, instructional design literature, and so forth.
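As a small, concrete illustration of these building blocks, the sketch below annotates two documents with Dublin Core metadata and uses an RDFS subclass statement so that a query for reports also finds the more specific experience report. The document URIs and the doctypes vocabulary are invented for the example, and the code assumes the Python rdflib library; it is a sketch of the idea, not an implementation from any chapter of this book.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DC, RDF, RDFS

# Hypothetical document-type ontology for an organizational memory.
EX = Namespace("http://example.org/doctypes#")

g = Graph()
# A tiny ontology: an ExperienceReport is a special kind of Report.
g.add((EX.ExperienceReport, RDFS.subClassOf, EX.Report))

# Two documents annotated with Dublin Core metadata and a type.
doc1 = URIRef("http://example.org/docs/incident-42")
g.add((doc1, RDF.type, EX.ExperienceReport))
g.add((doc1, DC.title, Literal("Lessons learned from incident 42")))
g.add((doc1, DC.creator, Literal("Operations Team North")))

doc2 = URIRef("http://example.org/docs/quarterly-summary")
g.add((doc2, RDF.type, EX.Report))
g.add((doc2, DC.title, Literal("Quarterly operations summary")))

# A SPARQL 1.1 property path gives a lightweight form of subclass
# inference: both documents match, although only doc2 is typed as
# ex:Report directly.
query = """
    SELECT ?doc ?title WHERE {
        ?doc rdf:type/rdfs:subClassOf* ex:Report .
        ?doc dc:title ?title .
    }
"""
for row in g.query(query, initNs={"rdf": RDF, "rdfs": RDFS,
                                  "dc": DC, "ex": EX}):
    print(row.doc, "-", row.title)
```

Full ontology languages such as OWL add far richer constructs (class axioms, property characteristics, rules), but even this minimal schema shows how shared vocabularies let independently created annotations be queried together.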
Book Content The objective of this book is to provide an overview of the field of semantic work environments by bringing together various research studies from different subfields and underlining the similarities between the different processes, issues, and approaches. The idea is also to show that many different application areas can benefit from the exploitation of already existing information sources. In order to present the solutions that address the challenge of creating semantic work environments by developing adequate methodologies, technologies, and ontologies, we structured the book into the four sections Introduction, Tools, Methods, and Techniques. The introduction section provides approaches that enable collaborative semantic work environments while the tools section gives an overview of currently implemented technologies with concrete results from field applications. The methods section provides insights into how to set up and run semantic work environments, and the techniques section describes base technologies to be used within semantic work environments. The introduction section starts with Chapter I, “Enabling Social Semantic Collaboration: Bridging the Gap between Web 2.0 and the Semantic Web” by Sören Auer and Zachary Ives. This chapter describes the interrelation between two trends that semantic work environments rely on in order to process existing and develop new knowledge: Web 2.0 as the base technology for human collaboration and the Semantic Web as the approach to add machine-processable descriptions to this knowledge. The technical realization is performed using the example of the tool OntoWiki. Chapter II, “Communication Systems for Semantic Work Environments,” by Thomas Franz and Sergej Sizov, points out how different means
of communication are used within knowledge work. Common means of communication like e-mail or groupware are analyzed for "semantic gaps," which are then refined into requirements for semantically enabled communication. Chapter III, "Semantic Social Software: Semantically Enabled Social Software or Socially Enabled Semantic Web?" by Sebastian Schaffert continues the discussion of the synergies between Web 2.0/social web and the Semantic Web. The author describes two ways of how semantic social software can be implemented: One possibility is semantically enabled social software, that is, Web 2.0 applications that are enriched with semantics. The other possibility is a Socially Enabled Semantic Web, which means involving communities in the build-up of ontologies. Three applications provide examples of semantic social software.

The tools section provides an overview of current applications that can be a part of semantic work environments. This section comprises chapters four to ten. Chapter IV, "SWiM – A Semantic Wiki for Mathematical Knowledge Management," by Christoph Lange and Michael Kohlhase, presents a semantic Wiki to share mathematical knowledge. In this Wiki, the regular Wiki markup is enhanced with additional mathematical markup, which integrates a mathematical ontology. Chapter V, "CoolWikNews: More than Meet the Eye in the XXI Century Journalism," by Damaris Fuentes Lorenzo, Juan Miguel Gómez, and Ángel García Crespo, is about a semantic work environment for the collaborative creation of news articles, thus building a basis for citizen journalism. Articles in this Wiki can be annotated using ontological metadata. This metadata is then used to support navigation within articles, in particular for finding further relevant articles. Chapter VI, "Improved Experience Transfer by Semantic Work Support," by Roar Fjellheim and David Norheim, describes the Active Knowledge Support for Integrated Operations (AKSIO) system, which supports the experience management of oil drilling activities. It enables collaborative knowledge creation and annotation by linking practitioners and experts. Chapter VII, "A Semi-Automatic Semantic Annotation and Authoring Tool for a Library Help Desk Service," by Antti Vehviläinen, Eero Hyvönen, and Olli Alm, provides a help desk system that allows annotating natural language question-answer pairs with additional semantic information. To support this annotation, the system suggests potential annotations. Case-based reasoning is then used on this semantic information to retrieve the best-fitting answers to a certain problem. The system itself is used in a help-desk application run by Finnish libraries to answer questions asked by library users. Chapter VIII, "A Wiki on the Semantic Web," by Michel Buffa, Guillaume Erétéo, and Fabien Gandon, is about the SweetWiki system. This system combines a WYSIWYG editor and semantic annotations, creating a Wiki system with improved usability. The semantic annotation feature can use previously uploaded ontologies. In their article, they also provide an overview of several other semantic Wikis. Chapter IX, "Personal Knowledge Management with Semantic Technologies," by Max Völkel, Sebastian Schaffert, and Eyal Oren, presents how to use semantic technologies to improve one's personal knowledge management. Requirements on personal knowledge management based on a study are described. Current personal knowledge management tools are investigated concerning their drawbacks.
To overcome these drawbacks, the usage of semantic Wikis for personal knowledge management is suggested. Chapter X, "DeepaMehta – Another Computer is Possible," by Jörg Richter and Jurij Poelchau, presents the DeepaMehta platform, which can be used to build up semantic work environments. This platform provides native support for topic maps to visualize the underlying semantics of knowledge. Two examples of the application of the DeepaMehta platform show implementations of semantic work environments.

Methods for Semantic Work Environments, as the third section of this book, presents approaches on how to build up and run semantic work environments. Chapter XI, "Added Value: Getting People into Semantic Work Environments," by Andrea Kohlhase and Normen Müller, analyzes the motivational aspect of why people are not using semantic work environments, based on the "prisoner's dilemma." Based on these considerations, they describe their approach of added-value analysis. Two application examples
of this analysis approach are presented. Chapter XII, "Enabling Learning on Demand in Semantic Work Environments: The Learning in Process Approach," by Andreas Schmidt, presents a method for building individual learning material. The cornerstone of this approach is the Context-Steered Learning method, which uses the context of the user and ontologically enriched learning material to build tailored e-learning material.

Base techniques for building Semantic Work Environments are presented in the final section. Chapter XIII, "Automatic Acquisition of Semantics from Text for Semantic Work Environments," by Maria Ruiz-Casado, Enrique Alfonseca, and Pablo Castells, provides an overview of techniques for extracting semantics from text. These techniques can be used to support the semantic enrichment of previously non-annotated documents. Chapter XIV, "Technologies for Semantic Project-Driven Work Environments," by Bernhard Schandl, Ross King, Niko Popitsch, Brigitte Rauter, and Martin Povazay, is about METIS, a framework for managing multimedia data and metadata—an approach to support project management and execution by semantic work environments. Particular focus is placed on semantically enriched multimedia content. Based on METIS, the semantic Wiki Ylvi is used to build up organizational memories. Furthermore, the SemDAV protocol is used for semantic data exchange. Chapter XV, "An Integrated Formal Approach to Semantic Work Environments Design," by Hai H. Wang, Jin Song Dong, and Jing Sun, provides an ontology for defining Semantic Web services to build up flexible semantic work environments. An online talk discovery system is used as an example of their approach. Finally, Chapter XVI, "Lightweight Data Modeling in RDF," by Axel Rauschmayer and Malte Kiesel, presents the Editing MetaModel (EMM), which supports editing within semantic work environments. Particular focus is given to a formal description of the Editing MetaModel and to the potential implementation of this model in the GUI of a semantic work environment.
References

Ankolekar, A., Krötzsch, M., Tran, T., & Vrandecic, D. (2007). The two cultures: Mashing up Web 2.0 and the Semantic Web. Banff, Alberta, Canada: ACM Press.

Ayers, D. (2006). The shortest path to the future Web. IEEE Internet Computing, 10(6), 76-79.

Berners-Lee, T. (1998). Semantic Web roadmap. Retrieved March 14, 2008, from http://www.w3.org/DesignIssues/Semantic.html

Bouras, A., Gouvas, P., & Mentzas, G. (2007). ENIO: An enterprise application integration ontology. Paper presented at the 18th International Conference on Database and Expert Systems Applications (DEXA '07).

Dean, M., Connolly, D., Harmelen, F. v., Hendler, J., Horrocks, I., McGuinness, D. L., et al. (2002). OWL Web ontology language 1.0 reference. Retrieved March 13, 2008, from http://www.w3.org/TR/owl-ref/

Decker, S., Melnik, S., van Harmelen, F., Fensel, D., Klein, M., Broekstra, J., et al. (2000). The Semantic Web: The roles of XML and RDF. IEEE Internet Computing, 4(5), 63-73.

Decker, S., Mitra, P., & Melnik, S. (2000). Framework for the Semantic Web: An RDF tutorial. IEEE Internet Computing, 4(6), 68-73.

Decker, S., Park, J., Quan, D., & Sauermann, L. (2005, November 6). The semantic desktop: Next generation information management and collaboration infrastructure. Paper presented at the International Semantic Web Conference (ISWC 2005), Galway, Ireland.

Khare, R. (2006). Microformats: The next (small) thing on the Semantic Web? IEEE Internet Computing, 10(1), 68-75.

Lassila, O., & Hendler, J. (2007). Embracing "Web 3.0." IEEE Internet Computing, 11(3), 90-93.

McClelland, M. (2003). Metadata standards for educational resources. Computer, 36(11), 107-109.

Murugesan, S. (2007). Understanding Web 2.0. IT Professional, 9(4), 34-41.

Naisbitt, J. (1984). Megatrends: Ten new directions transforming our lives. New York: Warner Books.

Oberle, D., Ankolekar, A., Hitzler, P., Cimiano, P., Sintek, M., Kiesel, M., et al. (2007). DOLCE ergo SUMO: On foundational and domain models in the SmartWeb integrated ontology (SWIntO). Web Semantics, 5(3), 156-174.

Pease, A. (2003). SUMO: A sharable knowledge resource with linguistic inter-operability. Paper presented at the 2003 International Conference on Natural Language Processing and Knowledge Engineering.

Semantic Wikis. (2005). Semantic Wiki overview. Retrieved March 13, 2008, from http://c2.com/cgi/wiki?SemanticWikiWikiWeb

Völkel, M., Schaffert, S., Pasaru-Bontas, E., & Auer, S. (2006). Wiki-based knowledge engineering: Second workshop on Semantic Wikis. Odense, Denmark: ACM Press.
Acknowledgment
Our vision for this book was to gather information about methods, techniques, and applications from the domain of semantic work environments, to share this information within the community, and to distribute this information across projects and organizational boundaries. During the course of realizing this vision, we received much support from people who spent a huge amount of effort on the creation and review process of the book. We would like to express our appreciation to all the projects and people involved in researching semantic work environments. We are especially grateful to the authors who provided us with deep insights into their projects and related results. Furthermore, we are also indebted to the publishing team at IGI Global for their continuing support throughout the whole publication process. Deep appreciation and gratitude is due to Jessica Thompson, Assistant Managing Development Editor at IGI Global, who supported us and kept the project on schedule. Most of the authors of chapters included in this book also served as reviewers for chapters written by other authors. Thanks go to all those who provided constructive and comprehensive reviews. Last but not least, thanks also go to the technical staff at Fraunhofer IESE and especially to Sonnhild Namingha for proofreading parts of the book. The Editors, Jörg Rech, Eric Ras, Björn Decker Kaiserslautern, Germany September 2007
Section I
Introduction
Chapter I
Enabling Social Semantic Collaboration:
Bridging the Gap Between Web 2.0 and the Semantic Web

Sören Auer, University of Pennsylvania, USA
Zachary G. Ives, University of Pennsylvania, USA
Introduction

The concepts Social Software and Web 2.0 were coined to characterize a variety of (sometimes minimalist) services on the Web, which rely on social interactions to determine additions, annotations, or corrections from a multitude of potentially minor user contributions. Nonprofit, collaboration-centered projects such as the free encyclopedia Wikipedia belong to this class of services, as well as commercial applications that enable users to publish, classify, rate, and review objects of a certain content type. Examples of this class of content-centered Web 2.0 projects are del.icio.us (for Web links), Digg.com (for news), Flickr (for images), and YouTube (for movies). Communication-centered services such as MySpace or XING enable individual communication and search for and within spatially distributed communities. So-called Web 2.0 mashups integrate and visualize the collected data and information in novel ways, unforeseen by the original content providers. The most prominent examples of mashups are based on Google Maps and overlay external content on a map. All these developments have a common approach of collecting metadata
by making participation and contribution as easy and rewarding as possible. Even before Social Software and Web 2.0 applications emerged, prior attempts had been made to enable the rapid assembly of data on the Web into more informative content: the most well-known such project is the Semantic Web, although researchers had been working on "information integration for the Web" for many years prior (Mediators, TSIMMIS, Ariadne), with very different methodologies but a similar end goal. The Semantic Web is conceived as an extension of the existing Web to enable machine reasoning and inference; a prerequisite to this is that "information is given well-defined meaning" (Berners-Lee, Hendler, & Lassila, 2001). This approach is based on a standardized description model, the Resource Description Framework (RDF) (Lassila & Swick, 1999), and semantic layers on top of it for semantic nets and taxonomies (RDF Schema) as well as ontologies, logic axioms, and rules (OWL and SWRL). However, the Semantic Web is not ubiquitous to this point, in part because of the high level of effort involved in annotating data and developing knowledge bases to support the Semantic Web. The Web 2.0 and Semantic Web efforts, which have largely gone on simultaneously, pose an interesting study in contrasting methods to achieve a similar goal. Both approaches aim at integrating dispersed data and information to provide enhanced search, ranking, browsing, and navigation facilities for the Web. However, Web 2.0 mainly relies on aggregate human interpretation (the collaborative "ant" intelligence of community members) as the basis of its metadata creation, conflict resolution, ranking, and refinement; the Semantic Web relies on complex, sophisticated knowledge representation languages and machine inference (Table 1). A natural question to ask is whether the different approaches can be combined in a way that leads to synergies. We discuss in this chapter how this question is being answered in the affirmative by a number of promising research projects. The main goal of these projects is to support collaborative knowledge engineering in social networks, with high reward and little effort. After presenting fundamental communication and collaboration patterns of Social Software, we exhibit the tool OntoWiki for social, semantic collaboration. In subsequent sections, we suggest strategies for employing Social Software and Web 2.0 methods to support the creation of knowledge bases for the Semantic Web. We give an overview of further and related work and conclude with remarks concerning future challenges.

Table 1. Similarities and differences between social software and the Semantic Web

Shared characteristics: collaboration and integration focused; based on the Web; provide enhanced means for search and navigation

Social Software & Web 2.0     | Semantic Web
End-user and business centred | Technology centred
Community intelligence        | Artificial intelligence
Post-encoding of semantics    | Pre-encoding of semantics
Opaque, homogeneous content   | Complex, heterogeneous content
Light-weight S&T              | Heavy-weight S&T
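To make the contrast concrete, the following minimal sketch expresses a Semantic Web statement with "well-defined meaning" in the RDF model, using the Python library rdflib; the namespace and the conference resource are invented for illustration and are not part of the chapter's own examples.

# A minimal sketch of the RDF data model, using the Python rdflib library.
# The example namespace and the conference resource are illustrative only.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/kb/")

g = Graph()
g.bind("ex", EX)

# Each statement is a (subject, predicate, object) triple.
g.add((EX.ISWC2005, RDF.type, EX.Conference))
g.add((EX.ISWC2005, RDFS.label, Literal("International Semantic Web Conference 2005")))
g.add((EX.ISWC2005, EX.takesPlaceIn, Literal("Galway, Ireland")))

print(g.serialize(format="turtle"))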
Social Software and Web 2.0

The concepts social software (Webb, 2004) and Web 2.0 (O'Reilly, 2005) were recently conceived to explain the phenomenon that computers and technology are becoming more and more important for human communication and collaboration. In particular, the following aspects are important with respect to software enabling social collaboration: (1) usability, (2) community and participation, (3) economic aspects, (4) standardisation, and (5) reusability and convergence. In addition, a precise delimitation of the concept social software is difficult due to the heterogeneity of applications, users, and application domains. Shirky (2003) proposed to define the concept of social software not just with respect
Table 2. Typical communication patterns for social software

Pattern        | Partner | Example
Point-to-point | 1:1     | E-mail, SMS/MMS
Bidirectional  | 1:1     | IM, VoIP
Star-like      | 1:n     | Web pages, blogs, podcasts
Net-like       | n:m     | Wikis, content communities
to characteristics of a certain software, but also with regard to the communication patterns that lead to the formation of a virtual community. Typical communication patterns of Social Software are depicted in Table 2. On the technological side, the popularity of social software is related to the development and use of the software development and communication paradigms AJAX (Asynchronous JavaScript and XML), REST (Representational State Transfer), and JSON (JavaScript Object Notation). In comparison to their counterparts Web services, RPC, or remote desktop, these light-weight technologies enable completely new adaptive and interactive application architectures and services. Based on these technologies, a number of methods for user interaction became established, which encourage and simplify spontaneous contributions, help to organize a multiplicity of contributions, and syndicate and mutually integrate the gained data. These include:
•	Folksonomies: Content annotation by means of tags (i.e., self-describing attributes attached to content objects) enables the fuzzy but intuitive organization of comprehensive content bases (Golder et al., 2006). Tag clouds visualize tags to support navigation and filtering: tags that are jointly used are co-located in the tag cloud, and tags are emphasized differently to stress their usage frequency (see the sketch after this list).
•	Architecture of participation: The mere usage of an application already creates added value. For example, added value can be generated by interactively evaluating usage statistics to determine popular content objects, or by collecting ratings from users to classify content with respect to quality.
•	Instant gratification: Active users are rewarded with enhanced functionality, and their reputation in the user community is visibly increased. This promotes contributions and helps to establish a collaboration culture.
•	Mashups and feeds: The content collected in the system is syndicated for other services (e.g., RSS feeds, JSON exports, or public APIs). This allows seamless integration of different data and transforms the Web into a Service Oriented Architecture.
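As an illustration of the folksonomy mechanism described in the first item of this list, the following toy sketch computes tag-cloud weights from a set of (content object, tag) pairs; the tag data and the size buckets are invented for illustration.

# A toy sketch of folksonomy-style tag weighting for a tag cloud: tags are
# counted across content objects and mapped to a few display sizes
# proportional to usage frequency. All data here is invented.
from collections import Counter

taggings = [  # (content object, tag) pairs as they might accrue in a community
    ("page1", "semantic-web"), ("page1", "rdf"),
    ("page2", "semantic-web"), ("page2", "wiki"),
    ("page3", "semantic-web"), ("page3", "rdf"), ("page3", "web2.0"),
]

counts = Counter(tag for _, tag in taggings)
max_count = max(counts.values())

SIZES = ["small", "medium", "large", "x-large"]

def cloud_size(count: int) -> str:
    """Map a tag's frequency to one of a few font-size buckets."""
    index = round((count / max_count) * (len(SIZES) - 1))
    return SIZES[index]

for tag, count in counts.most_common():
    print(f"{tag}: used {count}x -> {cloud_size(count)}")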
In the remainder of this chapter, we suggest approaches for how these Web 2.0 and Social Software methods can be adopted to support semantic collaboration scenarios.
Social Semantic Work Environments

Recently, a number of strategies, approaches, and applications have emerged that aim at employing elements of the Web 2.0 and Social Software for
semantic collaboration on the Web. Examples are the approaches to integrate semantics into wikis, to bring semantics to users' desktops, or to weave social networks by means of semantic technologies such as FOAF. The application domain for semantic collaboration scenarios can often be characterized in the following way:

•	A single, precise usage scenario of the envisioned knowledge bases is initially not known or not easily definable.
•	A possibly large number of involved actors is spatially separated.
•	The collaboration is not a business in itself but a means to an end.
•	Only a small amount of human and financial resources is available.
•	Application of reasoning services is (initially) not mission critical.
•	The collaboration environment is Web-centric.

Some concrete examples of the growing number of usage scenarios characterized in this way for Social Semantic Work Environments (SSWE) are summarized in Table 3. In order to organize the collaboration in SSWEs, we collect in the remainder of this section some requirements for SSWE tool support. The main goal is to simplify and speed up the acquisition, presentation, and syndication of semantically structured information (e.g., instance data) from and for end users. This can be achieved by regarding knowledge bases as "information maps." Each node of the information map is represented visually and intuitively for end users in a generic but configurable way, and is interlinked with related digital resources. Users should be enabled to enhance the knowledge schema incrementally, as well as to contribute instance data agreeing with it, as easily as possible, in order to provide more detailed descriptions and modelings. More specifically, the following components should be realized in SSWEs that follow the star-like communication pattern:

•	Intuitive display and editing of instance data should be provided in generic ways, yet enabling means for domain-specific extensions.
•	Semantic views allow the generation of different views and aggregations of the knowledge base.
•	Versioning and evolution provides the opportunity to track, review, and selectively roll back any changes made.
•	Semantic search facilitates keyword searches on all information; search results can be filtered and sorted using semantic relations.
•	Community support enables discussions about small information chunks. Users are encouraged to vote about distinct facts or prospective changes.
•	Online statistics interactively measure the popularity of content and the activity of users.
•	Semantic syndication supports the distribution of information and its integration into other services and applications (a minimal syndication sketch follows Table 3).
Table 3. Example SSWE application scenarios

Aim of Semantic Collaboration                | Example Domain        | Example Application
Creation of shared/common terminologies     | Biomedicine           | Open Biomedical Ontologies (OBO)
Integration of dispersed information sources | Virtual organizations | Web sites of research networks, or social and charitable organizations
Content creation within online communities  | Science & Technology  | Conference and publication knowledge bases
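As a hedged illustration of the semantic syndication requirement listed above, the following sketch exports the instances of one class from an RDF knowledge base (held in rdflib) as a minimal RSS 2.0 feed; the namespace and instance data are invented.

# A sketch of semantic syndication: instances of one class are exported as a
# minimal RSS 2.0 feed. Namespace and data are invented for illustration.
import xml.etree.ElementTree as ET
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/kb/")

g = Graph()
g.add((EX.ISWC2005, RDF.type, EX.Conference))
g.add((EX.ISWC2005, RDFS.label, Literal("ISWC 2005")))
g.add((EX.ESWC2005, RDF.type, EX.Conference))
g.add((EX.ESWC2005, RDFS.label, Literal("ESWC 2005")))

rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "New conferences in the knowledge base"

# One feed item per instance of ex:Conference.
for subject in g.subjects(RDF.type, EX.Conference):
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = str(g.value(subject, RDFS.label))
    ET.SubElement(item, "link").text = str(subject)  # the "information map" node

print(ET.tostring(rss, encoding="unicode"))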
In the next sections we propose strategies on how to put these requirements into effect in real systems and provide some examples of a prototypical implementation of an SSWE called OntoWiki.
Visual Representation of Semantic Content

The intuitive visual representation of highly structured and interlinked content is a major challenge on the Semantic Web. The possibilities for adopting strategies from Social Software are limited here, due to the more heterogeneous and complex content on the Semantic Web. However, a commonly seen strategy in Social Software such as wikis and blogs is to visually represent content bases to users in the shape of "information maps." Each node of the information map, that is, each blog or wiki article, is represented as a Web-accessible page and interlinked with related nodes. Wiki or blog article titles are used to create intuitive and recognizable Web addresses to ease navigation in the information map. A similar strategy can be applied for the generic visual representation of Semantic Web knowledge bases—a Web page can be rendered for each knowledge base object, compiling all information available about the object and interlinking it with related content.
Different Views on Instance Data

In addition to regarding knowledge bases on the Semantic Web as interlinked "information maps," the intuitive visual representation can be facilitated by providing different views on instance data. Such views can be either domain-specific or generic. Domain-specific views can be seen in analogy to Web 2.0 mashups and have to be implemented specifically for a certain application scenario. Generic views, on the other hand, provide visual representations of instance data according to certain property types. We give some examples.
List Views

List views present a selection of several instances in a combined view. The selection of instances to display can be based either on class membership (i.e., according to an rdf:type property) or on the result of a selection by a facet or a full-text search. List views can be made additionally configurable by enabling users to toggle the display of commonly used properties. Furthermore, each list element representing an individual instance should be linked to an individual view of that instance containing all related information.
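A list view selection by class membership can, for instance, be realized with a SPARQL query. The following sketch, using rdflib, selects all instances of an invented class ex:Conference together with one commonly used property; it is an illustration, not OntoWiki's actual implementation.

# A sketch of a list view selection by rdf:type via SPARQL over rdflib.
# Class and property names are illustrative only.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/kb/")

g = Graph()
g.add((EX.ISWC2005, RDF.type, EX.Conference))
g.add((EX.ISWC2005, EX.location, Literal("Galway")))
g.add((EX.ESWC2005, RDF.type, EX.Conference))
g.add((EX.ESWC2005, EX.location, Literal("Heraklion")))

query = """
    SELECT ?instance ?location WHERE {
        ?instance a ex:Conference .
        OPTIONAL { ?instance ex:location ?location . }
    }
"""
for row in g.query(query, initNs={"ex": EX}):
    # Each row becomes one list element, linked to the individual view.
    print(row.instance, "-", row.location)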
Individual Views

Individual views combine all the information related to a certain node in the knowledge base, that is, all properties and their values attached to a particular instance. Property values pointing to other individuals are (according to the information map metaphor) rendered as HTML links to the corresponding individual view. Alternatively, to get information about the referenced individual without having to load its complete individual view, users can be enabled to expand a short summary (loaded via AJAX) right where the reference is shown.
Map View

One building block of the Web 2.0 is the availability of public APIs, callable from embedded JavaScript, thus enabling the integration of different data. Several APIs are, for example, available for embedding maps. Hence, if instance data in a knowledge base contains property values representing geographical information (i.e., addresses or longitudes/latitudes), map views can provide information about the geographical location of the selected data (see Figure 1). Depending on the extensibility of the API, the integration can be realized bidirectionally, such that objects displayed on the map can be expanded and instance details
Figure 1. Map view (left) and calendar view (right) of instance data about scientific conferences in OntoWiki
are dynamically fetched from the knowledge base and displayed directly within the map view.
Calendar View

Instances having property values with the associated datatype xsd:date can be displayed in a calendar view (see Figure 1). As for the map view, the selection of instances displayed in the calendar view can be the result of a full-text search or of facet-based filtering. Each item displayed can be linked to the individual view of the corresponding instance. To be able to integrate the calendar data with other Web services or desktop applications, a link can be offered to export calendar items in iCal format.
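The iCal export mentioned above can be kept very simple. The following sketch serializes calendar view entries (a label plus an xsd:date value) as a minimal iCalendar document; the event data is invented.

# A minimal sketch of exporting xsd:date values as iCalendar items so that
# calendar view entries can be consumed by desktop applications.
from datetime import date

events = [  # (label, xsd:date value) pairs as selected for the calendar view
    ("ISWC 2005", date(2005, 11, 6)),
    ("ESWC 2005", date(2005, 5, 29)),
]

lines = ["BEGIN:VCALENDAR", "VERSION:2.0"]
for label, day in events:
    lines += [
        "BEGIN:VEVENT",
        f"SUMMARY:{label}",
        f"DTSTART;VALUE=DATE:{day.strftime('%Y%m%d')}",
        "END:VEVENT",
    ]
lines.append("END:VCALENDAR")

print("\r\n".join(lines))  # iCal prescribes CRLF line endings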
Collaborative Authoring

Since Social Software and Web 2.0 applications are mainly focused on a specific content type, content authoring functionality is mostly realized in an application-specific way. A common element, however, is tagging functionality for individually annotating content objects. To enable users to author information within a Semantic Web application in a generic, application-independent way, we see two complementary edit strategies.

Inline Editing

The smallest possible information chunks (i.e., RDF statements) presented on the user interface of the Semantic Web application are editable for users. For example, all information originating from statements and presented on the user interface can be equipped with small edit and add buttons (see Figure 2). On activation of the buttons, a suitable editing widget can be loaded into the currently displayed page, and the corresponding statement object can be edited or similar content added. This strategy can be seen as analogous to WYSIWYG (What You See Is What You Get) text editing, since information is edited in the same environment in which it is presented to users.

View Editing

Common combinations of information are editable in one single step. This requires the generation of comprehensive editing forms based on the view to be edited. The same technique as for generating the view can be applied for generating a suitable form, if the display of property values is replaced with appropriate widgets for editing these values. Examples of editable views are forms to add or edit (a) all information related to a specific instance or (b) values of a specific property across several instances. The latter simplifies the addition of information after a set of instances was initially created.
Table 4. Editing widgets for the construction of edit forms

Semantic widgets                                                               | Datatype widgets
Statements: allow editing of subject, predicate, and object.                  | Text editing: includes restricted configurations for e-mail, numbers, and so forth.
Nodes: enable editing of either literals or resources.                        | WYSIWYG HTML editor: edits HTML fragments.
Resources: search and select for/from existing resources.                     | Dates: selects dates from a calendar.
Literals: edit literal data in conjunction with a datatype/language identifier. | File widget: uploading of files to the Semantic Web application.
Both editing strategies are founded on the idea that users edit content exactly where it is displayed. This facilitates incremental additions as well as ease of use, and promotes user contributions in constructing a knowledge base.
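The following sketch illustrates the statement-level edit operation underlying both strategies: the editing widget submits the old and the new value of one RDF statement, and the statement is replaced only if the old value is still present (a simple guard against concurrent edits). It is a minimal illustration using rdflib, not OntoWiki's actual code; all names are invented.

# A sketch of a statement-level edit: replace one RDF triple atomically,
# refusing the edit if another user changed the value in the meantime.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/kb/")

g = Graph()
g.add((EX.ISWC2005, EX.location, Literal("Dublin")))  # a mistaken value

def edit_statement(graph, subject, predicate, old_object, new_object):
    """Replace one statement; refuse the edit if the old value is gone."""
    if (subject, predicate, old_object) not in graph:
        raise ValueError("statement was modified concurrently")
    graph.remove((subject, predicate, old_object))
    graph.add((subject, predicate, new_object))

edit_statement(g, EX.ISWC2005, EX.location, Literal("Dublin"), Literal("Galway"))
print(g.serialize(format="turtle"))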
Editing Widgets

The implementation of both strategies can be grounded on a library of editing widgets, thus simplifying extensions for new datatypes and domain-specific enhancements. Such widgets can be implemented in a server-side programming language; they generate HTML fragments together with appropriate CSS (Cascading Style Sheet) definitions and, optionally, JavaScript code. They may be customized for usage in specific contexts. In Table 4, we propose some semantic and datatype-specific widget types.
Concept Identification and Reuse

Knowledge bases become increasingly advantageous if once-defined concepts (e.g., classes, properties, or instances) are reused and interlinked as much as possible. This especially eases the task of rearranging, extracting, and aggregating knowledge. To become part of the daily routine
Figure 2. OntoWiki instance display with statement edit buttons (left). Statement editor with interactive search for predefined individuals based on AJAX technology (right)
also for inexperienced and occasional users, already defined concepts should be suggested to the user whenever he or she is requested to contribute new information. In a Web-based environment and for highly scalable knowledge bases, conventional Web technologies used to be the major obstacle here, since they do not allow large data sets to be handled on the client (browser) side. The Web 2.0 technologies AJAX and JSON help to overcome this limitation by making it possible to interactively propose already defined concepts while the user types in new information to be added to the knowledge base (see Figure 2).
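A minimal sketch of the server side of such an interactive proposal mechanism is given below: the client sends the typed prefix via AJAX and receives a small JSON list of matching concepts. The concept labels and the function name are invented for illustration.

# A sketch of a JSON autocompletion service for concept reuse: concepts
# whose labels match the typed prefix are returned to the AJAX client.
import json

# Labels of already defined concepts, as they might be indexed server-side.
concept_labels = {
    "http://example.org/kb/Conference": "Conference",
    "http://example.org/kb/ConferencePaper": "Conference Paper",
    "http://example.org/kb/Country": "Country",
}

def suggest(prefix: str, limit: int = 10) -> str:
    """Return a JSON payload of concepts whose label starts with prefix."""
    matches = [
        {"uri": uri, "label": label}
        for uri, label in concept_labels.items()
        if label.lower().startswith(prefix.lower())
    ]
    return json.dumps(matches[:limit])

print(suggest("conf"))  # what the client would receive while typing "conf"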
Social Collaboration Aspects

A major aim of SSWEs is to foster and employ social interactions for the development of knowledge bases. This drastically eases the structured exchange of meta-information about the knowledge base and promotes collaboration scenarios where face-to-face communication is hard. Making social interactions as easy as possible furthermore contributes to creating an "architecture of participation" that allows users to add value to the system as they use it. In the following, we elaborate on some examples of how social interactions can be specifically supported by SSWEs.
Change Tracking

All changes applied to a knowledge base should be tracked. An SSWE should enable the review of changes on different levels of detail, for example, the RDF statement level, the taxonomy/class hierarchy level, the logic/ontology level, or the domain level. The review of changes should be restrictable to a specific context or perspective on the knowledge base, such as changes on a specific instance, changes on instances of a certain class, or changes made by a distinct user or user group. In addition to presenting such change sets on the Web, users should be able to subscribe to information about the most recent changes on objects of their interest by e-mail or RSS/Atom feeds. This again promotes the participation of users.
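One possible shape of such change tracking at the RDF statement level is sketched below: every edit is recorded as a change set carrying author, time, and the added and removed statements, so that reviews can later be restricted to a chosen perspective. The data structure is an assumption for illustration, not a prescribed format.

# A sketch of statement-level change tracking: each edit is logged as a
# change set and can be reviewed under a chosen perspective.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ChangeSet:
    author: str
    added: list = field(default_factory=list)    # triples added
    removed: list = field(default_factory=list)  # triples removed
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

log: list[ChangeSet] = []

def record(author, added=(), removed=()):
    log.append(ChangeSet(author, list(added), list(removed)))

record("auer", added=[("ex:ISWC2005", "ex:location", '"Galway"')],
               removed=[("ex:ISWC2005", "ex:location", '"Dublin"')])

# Review restricted to one perspective, e.g., changes touching one instance:
for change in log:
    touched = {t[0] for t in change.added + change.removed}
    if "ex:ISWC2005" in touched:
        print(change.author, change.timestamp, change.added, change.removed)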
Commenting

Ideally, an SSWE allows comments to be added to all information in the knowledge base. Commenting (and other tasks in general) can be promoted by reducing the number of clicks and the wait time in between as much as possible. This enables community-driven discussions, for example, about the validity of certain statements. Technically, this can be implemented on the basis of RDF reifications, which allow making statements about statements. Small icons attached to the object of a statement within the user interface can indicate that such reifications exist (see Figure 3). Positioning the mouse pointer on such an icon immediately shows a tool tip with the most recent annotations; clicking on the icon displays them all.
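The following sketch shows how such a comment can be represented with RDF reification in rdflib: a resource of type rdf:Statement identifies the commented statement, and the comment is attached to it. The use of rdfs:comment for the annotation and all resource names are illustrative choices.

# A sketch of commenting via RDF reification: a resource of type
# rdf:Statement describes the commented triple; the comment hangs off it.
from rdflib import BNode, Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/kb/")

g = Graph()
g.add((EX.ISWC2005, EX.location, Literal("Galway")))

# Reify the statement so that we can talk about it.
stmt = BNode()
g.add((stmt, RDF.type, RDF.Statement))
g.add((stmt, RDF.subject, EX.ISWC2005))
g.add((stmt, RDF.predicate, EX.location))
g.add((stmt, RDF.object, Literal("Galway")))

# Attach a comment to the reified statement.
g.add((stmt, RDFS.comment, Literal("Confirmed against the conference Web site.")))

print(g.serialize(format="turtle"))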
Rating

Like many Social Software applications, SSWEs should allow users to rate instances according to a certain scale. Content-specific ratings could, for example, be implemented by means of special annotation properties which allow the creation of rating categories and scales with respect to a
Figure 3. Comments attached to statements.
certain class. Instances of the class can then be rated according to these categories, thus allowing, for example, instances of a class publication to be rated according to the categories originality, quality, and presentation.
Popularity

All accesses to the knowledge base can be logged, thus making it possible to arrange views on the content based on popularity. As with ratings or user activity, the popularity of content can be measured with respect to a certain knowledge base or a fragment of it (e.g., popularity with respect to class membership) and with respect to a certain time period (e.g., last hour, day, week, month, etc.). This enables users to add value to the system as they use it.
Activity/Provenance

The system keeps a record of what was contributed by whom. This includes contributions to the ontology schema, additions of instance data, ratings, and commenting. This information can be used to honour active users in the context of the overall system, a specific knowledge base, or a fragment of it (e.g., instance additions to some class). This way, it contributes to instantly gratifying users for their efforts and helps to build a community related to certain semantic content.
Semantic Search

To enable users of SSWEs to efficiently search for information in the knowledge base, the semantic structuring and representation of content should be employed to enhance the retrieval of information. We present two complementary strategies to achieve this goal.
Facet-Based Browsing

Taxonomic structures give users exactly one way to access information. Furthermore, the development of appropriate taxonomic structures (whether, e.g., class or SKOS keyword hierarchies) requires significant initial effort. As a pay-as-you-go strategy, facet-based browsing reduces the effort needed for a priori knowledge structuring, while still offering efficient means to retrieve information. Thereby, facet-based browsing methods can provide functionality for semantically rich knowledge bases similar to what tagging systems and tag clouds provide for Social Software applications. Facet-based browsing was first implemented for RDF data by the Longwell browser.1 To enable users to select objects according to certain facets, all property values (facets) of a set of selected instances are analyzed. If, for a certain property, the instances have only a limited set of values, those values are offered to further restrict the instance selection. Hence, this way of navigating through the data will never lead to empty results. Analyzing property values, though, can be very resource-demanding and time-consuming. To still enable fast response times, it can be beneficial to cache the results of a property value analysis and to selectively invalidate cache objects on updates of the respective property values.
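The facet analysis described above can be sketched in a few lines: for a set of selected instances, collect the distinct values per property and offer a property as a facet only if its value set stays small. The instance data and the threshold are invented for illustration.

# A sketch of facet computation: properties with a limited set of distinct
# values over the selected instances are offered as facets. Data is invented.
from collections import defaultdict

instances = {  # instance -> {property: value}
    "ex:ISWC2005": {"ex:location": "Galway", "ex:series": "ISWC"},
    "ex:ISWC2004": {"ex:location": "Hiroshima", "ex:series": "ISWC"},
    "ex:ESWC2005": {"ex:location": "Heraklion", "ex:series": "ESWC"},
}

MAX_FACET_VALUES = 5  # only offer a facet if the value set stays small

values_per_property = defaultdict(set)
for properties in instances.values():
    for prop, value in properties.items():
        values_per_property[prop].add(value)

for prop, values in values_per_property.items():
    if len(values) <= MAX_FACET_VALUES:
        # Every offered value leads to a non-empty result by construction.
        print(f"facet {prop}: {sorted(values)}")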
Semantically Enhanced Full-Text Search

OntoWiki provides a full-text search for one or multiple keywords occurring in literal property values. Since several property values of a single individual can contain the search string, the results should be grouped by instance and ordered by the frequency of occurrence of the search string. Search results may be filtered to contain only individuals which are instances of a distinct class, or which are described by the literal only in conjunction with a distinct property (see Figure 4).
Figure 4. User interface for the semantically enhanced search in OntoWiki. After searching for "York," it is suggested to refine the search to instances with one of the properties swrc:address, swrc:booktitle, or swrc:name
A semantic search has significant advantages compared to conventional full-text searches. By detecting the classes and properties that contain matching instances, the semantic search delivers important feedback to the user on how the search may be successfully refined.
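A simplified sketch of this search behavior is given below: keyword matches in literal property values are grouped by instance, ranked by the number of matching values, and the matching properties are reported as refinement suggestions, mirroring the "York" example of Figure 4. The literal data is invented.

# A sketch of semantically enhanced full-text search: matches are grouped by
# instance and the matching properties suggest refinements. Data is invented.
from collections import defaultdict

literals = [  # (instance, property, literal value)
    ("ex:paper12", "swrc:booktitle", "Proceedings of WWW, New York"),
    ("ex:paper12", "swrc:address", "New York, USA"),
    ("ex:person7", "swrc:name", "New, York A."),
]

def search(keyword: str):
    hits = defaultdict(list)  # instance -> [(property, value), ...]
    for instance, prop, value in literals:
        if keyword.lower() in value.lower():
            hits[instance].append((prop, value))
    # Group by instance, order by frequency of occurrence of the keyword.
    ranked = sorted(hits.items(), key=lambda item: len(item[1]), reverse=True)
    for instance, matches in ranked:
        props = {prop for prop, _ in matches}
        print(instance, "matched in", sorted(props))

search("York")  # suggests refining by swrc:address, swrc:booktitle, swrc:name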
Data Integration for SSWEs

In this section, we give a short overview of approaches from the data integration realm, which can form a basis for the implementation of SSWEs. Since the early days of the Web, one of the most compelling visions has been to develop new systems that exploit the wealth of available information to answer new questions—high-level, semantic queries—that are not directly answerable using a single document or data item. The Semantic Web represents the most widely promoted, modern version of this vision, but actually, as Tim Berners-Lee himself observes in Berners-Lee (2003), the Semantic Web is a particular methodology for attempting to achieve data integration—a topic studied previously in
the AI agents community (Arens, Chee, Hsu, & Knoblock, 1993; Friedman & Weld, 1997) and the data integration field (Chawathe et al., 1994; Levy, Rajaraman, & Ordille, 1996). Data integration is an incredibly challenging problem: data is both physically encoded and logically represented in many different ways, and “meaning” or “semantics” is highly dependent on a frame of reference. Over time, many of the physical encoding issues have been largely addressed through standardization; likewise for basic protocols for fetching data. Today, most data sources provide access in one of a handful of physical formats: a serialization hidden underneath JDBC drivers, HTML, comma-separated values, RDF, or XML. Data is often fetched using HTTP, perhaps layered over with Web Service protocols, as we see with many services from Amazon, Google, and so on. This leaves the hardest problems in the areas of semantics: obtaining semantic annotations for data items; determining a unified output representation for the data when it is combined; and converting the data from source to output. Each of these areas was originally addressed by asking the content creator or service builder to encode everything; but in recent years there has
been a trend towards incorporating Web 2.0-like ideas with a human in the loop.
Semantic Annotation

Early work on information integration was divided into two classes of approaches: integration of databases focused on settings where the data was already stored in relational tables or in objects, which had a considerable amount of metadata describing semantics; integration of Web sources focused on developing wrapper induction strategies (Kushmerick, Weld, & Doorenbos, 1997) for learning how to extract (and hence annotate) certain data items from Web pages. The Semantic Web has adopted a strategy that greatly resembles that of database integration: data items are annotated with a very rich set of metadata, including class relationships that are based on knowledge representation formalisms. In certain settings, however, we are beginning to see the emergence of Web 2.0-like ideas, such as collaborative tagging and instant gratification, in this space. The Mangrove project (Halevy, Etzioni, Doan, & Ives, 2003) focused on encouraging Web contributors to semantically annotate their pages (using a small set of predefined tags) by providing useful "out of the box" services that exploited these tags, such as organizational event calendars. To a significant extent, today's tag-based Web sites follow a similar model, although to the best of our knowledge there has not yet been a focus on integrating tag-based services with query services that go beyond basic keyword querying.
Developing a Unified Output Representation

Most approaches to integrating data are instances of the so-called mediator architecture (Wiederhold, 1992): they build an "umbrella" layer and representation over disparate data sources. A challenge lies in developing the necessary unified schema or ontology to represent the combined
knowledge. In some cases, this is fairly straightforward as concepts align closely, but in other cases this can be a major endeavor. In recent years, there has been a focus on "lightweight" integration into commonly understood data representations: overlaid maps, timelines, itemized lists, simple visualizations of one or two attributes. Such target representations are limited in the questions they can answer, but for the right questions, their integrated views are very useful. The most common example of this approach is the plethora of map mashups available over Google Maps. However, recent research projects like SIMILE (Huynh et al., 2005) exploit very similar ideas using Semantic Web technologies. A complementary approach is to allow collaborative authoring of the target knowledge base. One prime example of this is Google Base, which has an extensible set of item types and attributes, but which suggests a starting schema for the most common categories of items. This establishes some regularity across all items of a related type. The Open Directory Project (dmoz.org) adopts a slightly different strategy, with a single Web site classification taxonomy that can be edited by any contributor.
Converting the Data

Probably the most challenging aspect of data integration is the process of actually combining the data. As described previously, some of the lower levels of this process have been largely alleviated, due to common data encodings and Web Service or HTTP interfaces. This leaves most of the emphasis on combining, restructuring, and translating attributes and classes from the source representation to the target. Traditional data integration achieves this using declarative query languages. The Semantic Web approach relies on expressing equivalence or subsumption relationships between classes in different knowledge bases; this works for many situations but cannot perform, for instance, mathematical
transformations such as unit conversion. For both of these cases, a variety of tools have been built to perform semi-automated schema matching (Aumueller, 2005; Doan, Domingos, & Halevy, 2001; Gal, Modica, & Jamil, 2004; Jian, Hu, Cheng, & Qu, 2005; Madhavan, Bernstein, & Rahm, 2001; McGuinness, Fikes, Rice, & Wilder, 2000; Noy & Musen, 2000); unfortunately, this is a very challenging problem, and hence such tools are somewhat prone to mistakes. Web 2.0 generally relies on mashups built using AJAX—Asynchronous JavaScript and XML—which express the transformations using custom procedural code over XML data. Most mashups are highly effective, in part because they focus on simple, lightweight conversions with well-understood data. They are developed grass-roots-style, by third-party programmers who want functionality not available elsewhere. The MOBS system (McCann et al., 2005) takes the schema matching approach and infuses it with community-based ideas, to great effect. MOBS relies on a large community of users to correct and refine schema mappings between sources, and it focuses on providing reward mechanisms for these users.
Related Work

In addition to the references given in the preceding sections, we would like to highlight some work on extending Social Software with semantic enhancements, in light of the social communication patterns mentioned earlier, that is, point-to-point, bidirectional, star-like, and net-like. Supplementary to these categories, local approaches that can be subsumed under the Semantic Desktop concept play a crucial role for the integrated support of SSWEs. The project SemperWiki (Oren, 2005), for example, created a simple wiki, combined with an RDF triple store for the desktop, which can function as a local basis
for content to be exchanged and distributed in a social, semantic collaboration network. Other examples are semantically enhanced file systems such as SemDAV (Schandl, 2006) and TagFS (Bloehdorn et al., 2006). The latter extends the file system metaphor with tags, while SemDAV's aim is a deeper integration of Semantic Web technologies. Social Software founded on the point-to-point or bidirectional communication patterns is enhanced, for example, in McDowell et al. (2004), which provides multiple semantic enhancements for the traditional e-mail service. Franz and Staab (2005) show how semantic enhancements can be integrated into instant messaging. The group of star-like semantic social software is dominated by technologies that enrich textual information in news feeds and traditional Web pages with machine-interpretable semantics (e.g., for RSS (Hammersley, 2003) and Atom (Nottingham & Sayre, 2005), or GRDDL (Hazaël-Massieux & Connolly, 2005), RDFa (Adida & Birbeck, 2006), and Microformats (Khare & Çelik, 2006) for Web pages). By far the largest group are net-like approaches, which can be categorized into (a) semantic enhancements for decentralized technologies like wikis, blogs, and P2P networks (e.g., Haase et al., 2004; Karger & Quan, 2004; Souzis, 2005), (b) ontologies for the description of content in social networks (e.g., FOAF (Brickley & Miller, 2004) and SIOC (Breslin et al., 2006)), and (c) centralized services for specialized collaboration communities (e.g., flickr.com, del.icio.us, and last.fm).
Future Research Directions

As we mentioned earlier, a pivotal Social Software concept is the collaborative tagging of content, leading to folksonomies (i.e., taxonomies created by folks) (Golder et al., 2006). A folksonomy here is a system of weighted keywords that emerges from a multiplicity of individually chosen keyword attachments (i.e., tags) to content objects.
Applied to the sphere of the Semantic Web, the challenge is to employ human "swarm intelligence" in the spirit of tagging to create knowledge bases that are not just comprehensive but also consistent. Consistent here is meant less from a logical point of view than from the perspective of achieving agreement in the user community. When knowledge bases are collaboratively developed by loosely coupled communities, one way to improve consistency is the development of methodologies for moderation and decision processes within SSWEs. A field of future research is possible indicators for the degree of consistency. A different approach to tackling the consistency problem is represented by policies and access control mechanisms for accessing, editing, and annotating content. Often, knowledge bases start as simple semantic networks, become increasingly rich, resulting in taxonomies and class hierarchies, and are finally enriched with logical axioms and definitions. Due to the variety of possible expressivity to be considered, the formulation of policy models and access control strategies turns out to be difficult. In addition, policies should be adequate for a spectrum of knowledge bases with a varying degree of semantic richness. Another challenge, lying less in the scientific than in the software engineering field, is to increase the flexibility and robustness of storage backends, libraries, and frameworks for the development of Semantic Web applications. In addition, standards for semantic widgets and user interface elements for SSWEs can support user acceptance and interoperability. Last but not least, economic aspects play a crucial role in making SSWEs a success. Since semantic collaboration is in many cases not a direct business in itself, specific business models are needed that focus on services and products supporting the generation and curation of semantic content by communities.
References

Adida, B., & Birbeck, M. (2006). RDFa primer 1.0 (Working draft). World Wide Web Consortium (W3C).

Arens, Y., Chee, C., Hsu, C.-N., & Knoblock, C. (1993). Retrieving and integrating data from multiple information sources. International Journal on Intelligent and Cooperative Information Systems, 2(2), 127-158.

Auer, S. (2005, May 30). Powl – a Web based platform for collaborative Semantic Web development. In Scripting for the Semantic Web, CEUR Workshop Proceedings, CEUR-WS (p. 135).

Auer, S., & Herre, H. (2006). A versioning and evolution framework for RDF knowledge bases. In Proceedings of the Ershov Memorial Conference.

Aumüller, D. (2005a). SHAWN: Structure helps a wiki navigate. In Proceedings of the BTW-Workshop WebDB Meets IR.

Aumüller, D. (2005b). Semantic authoring and retrieval within a wiki (WikSAR). In Demo Session at the Second European Semantic Web Conference (ESWC 2005). Retrieved March 2, 2008, from http://wiksar.sf.net

Aumueller, D., Do, H. H., Massmann, S., & Rahm, E. (2005). Schema and ontology matching with COMA++. In SIGMOD Conference (pp. 906-908).

Bächle, M. (2006). Social Software. Informatik Spektrum, 29(2), 121-124.

Berners-Lee, T. (2003, May 20-24). Keynote address: Fitting it all together. In 12th World Wide Web Conference, Budapest, Hungary.

Berners-Lee, T., Hendler, J., & Lassila, O. (2001, August). Mein Computer versteht mich. Spektrum der Wissenschaft, 8, 42-49.

Bizer, C., Lee, R., & Pietriga, E. (2005, November). Fresnel – a browser-independent presentation vocabulary for RDF. In End User Semantic Web Interaction Workshop at ISWC 2005.

Bloehdorn, S., Görlitz, O., Schenk, S., & Völkel, M. (2006). TagFS – tag semantics for hierarchical file systems.

Brickley, D., & Miller, L. (2004). FOAF vocabulary specification. FOAF Project.

Chawathe, S. S., Garcia-Molina, H., Hammer, J., Ireland, K., Papakonstantinou, Y., Ullman, J. D., et al. (1994). The TSIMMIS project: Integration of heterogeneous information sources. IPSJ, 7-18.

Dietzold, S., & Auer, S. (2006, June). Access control on RDF triple stores from a semantic wiki perspective. In Scripting for the Semantic Web, CEUR Workshop Proceedings (p. 183). ISSN 1613-0073.

Doan, A., Domingos, P., & Halevy, A. (2001). Reconciling schemas of disparate data sources: A machine-learning approach. In SIGMOD Conference (pp. 509-520).

Franz, T., & Staab, S. (2005). SAM: Semantics aware instant messaging for the networked semantic desktop. In Semantic Desktop Workshop at the ISWC 2005, Galway, Ireland.

Friedman, M., & Weld, D. (1997). Efficiently executing information-gathering plans. IJCAI, 1, 785-791.

Gal, A., Modica, G., & Jamil, H. (2004). OntoBuilder: Fully automatic extraction and consolidation of ontologies from Web sources. ICDE, 853.

Haase, P., Broekstra, J., Ehrig, M., Menken, M., Mika, P., Olko, M., et al. (2004). Bibster – a semantics-based bibliographic peer-to-peer system. In The Semantic Web – ISWC 2004 (pp. 122-136). Springer. LNCS 3298.

Halevy, A., Etzioni, O., Doan, A., Ives, Z., Madhavan, J., McDowell, L., & Tatarinov, I. (2003). Crossing the structure chasm. CIDR.

Hammersley, B. (2003). Content syndication with RSS. O'Reilly & Associates.

Hazaël-Massieux, D., & Connolly, D. (2005). Gleaning resource descriptions from dialects of languages (GRDDL). World Wide Web Consortium (W3C).

Jian, N., Hu, W., Cheng, G., & Qu, Y. (2005). Falcon-AO: Aligning ontologies with Falcon. In Integrating Ontologies.

Khare, R., & Çelik, T. (2006). Microformats: A pragmatic path to the Semantic Web. WWW '06 (pp. 865-866). ACM Press.

Kushmerick, N., Weld, D., & Doorenbos, R. (1997). Wrapper induction for information extraction. IJCAI, 1, 729-737.

Lassila, O., & Swick, R. R. (1999, February 22). Resource Description Framework (RDF) model and syntax specification. W3C Recommendation, World Wide Web Consortium (W3C).

Levy, A., Rajaraman, A., & Ordille, J. (1996). Querying heterogeneous information sources using source descriptions. VLDB, 251-262.

Madhavan, J., Bernstein, P., & Rahm, E. (2001). Generic schema matching with Cupid. VLDB, 49-58.

McGuinness, D., Fikes, R., Rice, J., & Wilder, S. (2000). The Chimaera ontology environment. AAAI/IAAI, 1123-1124.

Nottingham, M., & Sayre, R. (2005). The Atom syndication format. The Internet Engineering Task Force (IETF).

Noy, N., & Musen, M. (2000). PROMPT: Algorithm and tool for automated ontology merging and alignment. AAAI/IAAI, 450-455.

Oren, E. (2005). SemperWiki: A semantic personal wiki. In Semantic Desktop Workshop at the ISWC 2005, Galway, Ireland.

Schandl, B. (2006). SemDAV: A file exchange protocol for the semantic desktop. CEUR-WS, 202.

Additional Reading

Breslin, J., Decker, S., Harth, A., & Bojars, U. (2006). SIOC: An approach to connect Web-based communities. The International Journal of Web-Based Communities, 2, 133-142.

Golder, S., & Huberman, B. H. (2006). Usage patterns of collaborative tagging systems. Journal of Information Science, 32(2), 198-208.

Karger, D., & Quan, D. (2004). What would it mean to blog on the Semantic Web? In The Semantic Web – ISWC 2004 (pp. 214-228). Springer. LNCS 3298.

Kroetzsch, M., Vrandecic, D., & Völkel, M. (2005). Wikipedia and the Semantic Web – the missing links. Wikimania, Frankfurt, Germany.

Leuf, B., & Cunningham, W. (2001). The wiki way. Amsterdam: Addison-Wesley Longman.

McCann, R., Kramnik, A., Shen, W., Varadarajan, V., Sobulo, O., & Doan, A. (2005). Integrating data from disparate sources: A mass collaboration approach. ICDE, 487-488.

McDowell, L., Etzioni, O., & Halevy, A. (2004). Semantic e-mail: Theory and applications. Journal of Web Semantics, 2, 153-183.

O'Reilly, T. (2005). What is Web 2.0 – design patterns and business models for the next generation of software. Retrieved March 2, 2008, from http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html

Shirky, C. (2003, April). A group is its own worst enemy. Talk at the ETech Conference. Retrieved March 2, 2008, from http://www.shirky.com/writings/group_enemy.html

Souzis, A. (2005). Building a semantic wiki. IEEE Intelligent Systems, 20(5), 87-91.
Endnote

1 http://simile.mit.edu/longwell/
Chapter II
Communication Systems for Semantic Work Environments

Thomas Franz, University of Koblenz-Landau, Germany
Sergej Sizov, University of Koblenz-Landau, Germany
Introduction

The paradigm of a flexible environment that supports the user in producing, organizing, and browsing knowledge originated in the early 1940s, a long time before the first personal computers and new communication tools like the Internet became available. The conceptual design of Vannevar Bush's memex (Bush, 1945) (an acronym for Memory Extender) is probably the most cited (e.g., Gemmell, Bell, Leuder, Drucker, & Wong, 2002) and criticized (e.g., Buckland,
1992) representative of such early conceptual work. In his article, Bush described an integrated work environment that was electronically linked to a repository of microfilms and able to display stored contents and automatically follow references from one document to another. A number of visionary ideas from this early conceptual work can be recognized in state-of-the-art information systems (cross-references between documents, browsing, keyword-based annotation of documents using the personal "codebook," automatic generation of associative trails for content summarization, etc.).
Today we have entered the knowledge age, where economic power or wealth is based on the ownership of knowledge (Haag, Cummings, & McCubbrey, 2002) and the ability to utilize knowledge for the improvement of services or products. In highly developed economic systems, knowledge workers already outnumber any other kind of worker. Their work is characterized by complex human-centric or personal work processes (Kogan & Muller, 2006) describing information-intensive and complex tasks comprising the creation, retrieval, digestion, filtering, and sharing of (large amounts of) knowledge. We understand a Semantic Work Environment (SWE) as an environment implemented by information technology (IT) that supports such complex knowledge work by exploiting the semantics of information. Accordingly, an SWE is based on methods and tools that allow the extraction, representation, and utilization of the semantics of information in order to provide enhanced IT support for the knowledge worker. To a certain extent, the memex can be considered a microfilm-based precursor to SWEs. Communication is one of the tasks of the knowledge worker, as it denotes the exchange of information and the transfer of knowledge; it is thus vital for any collaborative human work, for example, for coordinating work, reporting on work progress, or discussing solutions and problems. An efficient communication infrastructure is probably one of the key differences between modern SWEs and the isolated memex-style solutions of the past. As an example, we may consider the common scenario of publishing research results. Basically, the preparation steps for a new scientific publication have not changed much since the memex. Both Vannevar Bush and (roughly 60 years later) the authors of this chapter had to exchange ideas for new proposals with colleagues, discuss the proposed solution, sketch the outline of the publication, make responsibility assignments for content delivery, request reviews, and iteratively improve
the proposal, collaborate with publishers and other author groups, and finally make the contribution public. In the following sections, we will use this sample scenario for the comparison and illustration of various communication mechanisms. The Internet and the World Wide Web (WWW) provide an infrastructure on top of which a variety of communication channels have been established, such as e-mail, chat, Web logs (blogs), wikis, and instant messaging. By a communication channel, we refer to a communication protocol together with the infrastructure on which the protocol is implemented. For the human utilization of a communication channel, communication clients or tools are employed. We refer to the combination of both as a communication system in the following. Communication systems—predominantly those enabling asynchronous textual communication (e.g., e-mail)—are widespread today and have complemented former ways of communicating (e.g., postal mail). Obvious advantages of such systems are the rate at which messages are delivered between communication partners and the low costs of delivery. These advantages foster distributed work and virtual communities consisting of collaborators located across the planet. However, IT systems are employed for communication not only by collaborators who are physically or temporally separated; colleagues located on the same floor and working for the same departmental group use them as well. As we show, communication systems also take part in different personal information management strategies rather than just being used for communicating. Noting the importance of communication for knowledge work, this chapter deals with the role of communication systems with respect to knowledge work. At first, we introduce different communication systems to indicate their different utilization and role within knowledge work. Next, we develop requirements of communication systems for SWEs by discussing how SWEs support the knowledge worker, and confront conventional communication tools and channels
with these requirements. We then point to several research works that contribute to enhanced communication systems that better support knowledge work. We accommodate such research and give an outlook by a visionary description of the scenario introduced before, and end the chapter with a conclusion.
Communication Systems

In this section, we give an overview of currently available computer-based communication systems and point out their utilization for work. For the comparison of the different communication systems, we consider the following aspects:

•	Symmetry of exchange: This category denotes the directionality of communication. Asymmetric means that there are the roles of an addresser and an addressee, where the latter cannot communicate back, or only in a limited way; for example, one can reply to a Web log (1) by posting a comment that is, however, less visible and prominent on the Web page, (2) by writing one's own Web log that refers (links) to the initial Web log, or (3) by contacting the blog author using another communication channel.
•	Synchrony: Synchrony relates to the time that passes until a message is perceived by the addressee. Asynchronous communication is characterized by longer delays between utterances or messages, while in synchronous and real-time communication, messages are immediately perceived by the addressee; for example, in Internet telephony, the words of a sentence are perceived even before the sentence is closed.
•	Communication media: Different media types such as text, image, audio, and video are supported by different communication channels.
While the communication channel defines communication properties such as synchrony or media support, communication tools build upon channels to provide means to convey information via a communication channel, but also means to deal with communicated information. We pay particular attention to the latter, as the information management features and data models of communication tools are relevant for the analysis of their role within knowledge work as it is to be supported by SWEs.
Electronic Mail

Electronic mail (e-mail) enables asynchronous, primarily textual communication and is a well-analyzed communication system: e-mail is not only employed as a pure communication means but also for time and task management (Mackay, 1998). E-mail overload (Whittaker & Sidner, 1996) summarizes the difficulties in organizing increasingly many e-mails and various kinds of e-mails, such as private, work-related, spam, and important ones. It has further been examined how the different working backgrounds of people influence the utilization of e-mail, observing that there are significant differences between user groups (Danis, Kellogg, Lau, Dredze, Stylos, & Kushmerick, 2005), for example, in the way different workers structure and classify their e-mails. As a commonly acknowledged communication means, e-mail became a source of knowledge and takes part in the creation of new knowledge, making it an important knowledge management tool (Lichtenstein, 2004). In our previously introduced sample scenario, e-mail communication would cover a wide range of possible uses: thematically focused discussions, tracking of document evolution, content sharing, and delivery. As e-mails are also used for the coordination of tasks or work, they even carry process knowledge.
Mailing Lists and News Groups

Mailing lists share the infrastructure of e-mail and allow group communication via e-mails addressed to a mailing list. A mailing list server allows for further administrative functions such as membership control. Furthermore, mailing lists can be moderated by users, that is, some human acts as a filter for posted messages and decides on the delivery of postings. News groups are based on a different technical infrastructure than mailing lists and have fewer administrative features. However, they closely resemble mailing lists as they also enable group communication by postings to particular news groups that are organized by a topic hierarchy. In our sample scenario, mailing lists and news groups could be used for the sharing of contents (e.g., a new update of the paper draft becomes available) and multicast notifications of user groups (all authors are notified about the new submission deadline). In unmoderated cases, both mailing lists and news groups are open to flaming, that is, the posting of violent or rude message content (Kim & Raja, 1991; Mabry, 1998). A benefit of such openness is that it enables lurking, that is, a passive, read-only participation in conversations, for example, to be aware of current topics and issues addressed by a particular group (Nonnecke & Preece, 2000).
Instant Messaging

Instant messaging (IM) enables near-synchronous, textual communication where users see messages nearly immediately after they have been submitted by communication partners. Next to information exchange, which in IM is often characterized by informal communication, IM is also used for coordinative tasks such as preparing media switches, for example, initiating a phone call or a face-to-face meeting (Nardi, Whittaker, & Bradner, 2000). In work environments, IM
is primarily used for work-related and complex conversation (Isaacs, Walendowski, Whittaker, Schiano, & Kamm, 2002). For instance, in our sample scenario, instant messaging could be used by two authors for collaborative online writing of the publication and discussion of its scientific issues. Two main types of instant messaging users have been identified in Isaacs et al. (2002): heavy users, who employ IM for work and work-related conversation, and light users, who rather use IM for coordinating and whose conversations are shorter and less complex.
Chat Systems

Chat systems such as Internet relay chat (IRC) are closely related to IM concerning synchrony and exchange symmetry. While both IM and chat systems allow one-to-one conversation as well as group conversation, chat systems are focused on group conversation (e.g., a pre-review meeting of the authors and editors of a scientific journal). Similarly to IM, studies about the utilization of a chat system indicate that in working environments, chat systems are primarily used for work-related conversation (69%) and for awareness of presence (Handel & Herbsleb, 2002). In contrast to IM clients, chat clients are usually not as intrusive; for example, they do not attract user attention on every new message as IM clients usually do. Following a chat conversation, however, may become difficult since related posts (e.g., question-answer) may be disrupted by interleaving messages (Herring, 1999).
Web Logs

Web logs (blogs) are frequently updated Web pages that contain chronologically ordered sections written by the authors of a Web log. Thus, they represent an asynchronous and asymmetric communication system. The content and purpose of blogs vary significantly, from personal diary-like entries to
Figure 1. Types of blogs (after Krishnamurthy, 2002): two axes, personal vs. topical content and individual vs. community authorship, yield four quadrants: Quadrant I, online diaries (personal, individual); Quadrant II, support group (personal, community); Quadrant III, enhanced column (topical, individual); Quadrant IV, collaborative content creation (topical, community)
column-like ones. In Krishnamurthy (2002), that diversity is illustrated by four quadrants of blog types (Figure 1). One distinction is made along the content of blogs, ranging from blogs dealing primarily with personal issues to those that deal with topics of general interest, for example, politics. Another axis distinguishes the multiplicity of authors within a blog, ranging from individually maintained blogs to community blogs that contain entries from many different authors. Most blogs are individualistic and self-expressive rather than interactive and dedicated to general topics (Herring, Scheidt, Wright, & Bonus, 2005). Obviously, Web logs are an efficient means for Web publishing, enabling one-to-many communication. As pointed out in Efimova and de Moor (2005), however, many-to-many conversations can be established by interlinked Web logs that reference each other and have been written by different authors. Beyond the borders of personal information management, blogs are used by many software development portals, research projects, and product vendors for "newsticker"-like publishing and sharing of announcements, new product releases,
0
or major events. In this sense, they can be also considered as a special case of thematically focused notification mechanism.
Wikis

Wikis are linked Web pages (wiki pages) that are provided by a wiki system offering user management, a Web-based editor for wiki pages, and versioning of pages. In contrast to Web logs, wiki pages are commonly edited by multiple users. Thus wikis also allow collaborators to communicate by editing a wiki page, that is, by either reading the changes and additions of others or making changes oneself. Due to their simplicity of use and content creation, problem-oriented wikis are the state-of-the-art means for capturing and sharing knowledge in thematically focused communities and complex work environments. In the context of our application example, wikis are widely used in practice for collaborative work on scientific publications (including discussion threads on research issues and the collection of technical knowledge such as formatting questions and format conversions). Due to that collaborative nature, wikis are regarded as a powerful knowledge management tool (Wagner, 2004).
Audio and Video Based Communication Systems

Internet telephony applications enable synchronous, symmetric audio communication over the Internet based on voice over IP (VoIP) protocols. Trust building occurs faster in communication over rich media such as audio and video conferencing than over other media (Bos, Olson, Gergle, Olson, & Wright, 2002), for example, text-based chat. Besides one-to-one communication, conference modes that enable many-to-many communication are also possible and provide a further means to support group collaboration that can lower group dependability (Kane & Luz, 2006). Internet phone calls and phone conferences can be recorded, and such recordings are sometimes published (e.g., the track of a joint phone conference between authors and publishers of the scientific journal). Such publication is often called audio blogging, owing to the symmetry of content exchange, which is one-to-many as in regular Web logs. Therefore, audio blogging also resembles Internet radio as another audio-based one-to-many communication channel. Besides audio communication, there is also audio-visual communication where communication partners can see and listen to each other. Prior studies have shown that video conferencing does not increase the quality of work compared to phone and e-mail based communication. For the negotiation of meaning, however, video conferencing is assumed to be beneficial for particular user groups; for example, non-native speakers benefit from it (Veinott, Olson, Olson, & Fu, 1999). Recently, video blogging became popular through Web sites like YouTube1 that allow users to upload videos to a server to share them with others. Like audio blogging, video blogging allows the commonly synchronous medium of video conferencing to be used in an asynchronous manner.
Table 1. Comparison of communication channels

Channel | Symmetry of Exchange | Synchrony | Media
E-mail | Symmetric | Asynchronous | Text, Image (Video, Audio)
Mailing Lists and News Groups | Symmetric | Asynchronous | Text, Image (Video, Audio)
Instant Messaging | Symmetric | Synchronous | Text (Image, Video, Audio)
Chat Systems | Symmetric | Synchronous | Text (Image, Video, Audio)
Web Log | Asymmetric | Asynchronous | Text, Image, Video, Audio
Internet Telephony | Symmetric | Synchronous | Audio (Text, Image, Video)
Audio Blogging | Asymmetric | Asynchronous | Audio, Text, Image
Video Conferencing | Symmetric | Synchronous | Video, Audio
Video Blogging | Asymmetric | Asynchronous | Video, Audio, Image, Text
Internet Radio | Asymmetric | Asynchronous | Audio
Wiki | Symmetric | Asynchronous | Text, Image (Video, Audio)
Discussion

The previous sections indicate the varying suitability of channels for different tasks such as work coordination and conversation about work, but also for tasks not directly related to information exchange, such as the preparation of a media change and the perception of presence and awareness. Moreover, the relation between communicated information and work strategies has been indicated, for example, by identifying that e-mails are often used as to-do items and that the e-mail inbox is often employed for task and to-do tracking. In Table 1, we summarize the analysis given in this section. The media column lists the media types supported by a channel, ordered by their frequency of use in the channel. Media that can be transported via a channel but is not supported by conventional communication tools, that is, media that is not directly presented by a communication client, is put in brackets.
Towards Semantic Work Environments

In this section we briefly introduce the notion of personal work processes and complex tasks as they are to be supported by SWEs. We then develop requirements for communication tools based on those notions and confront current tools with these requirements.
Personal Work Processes and Implications for SWEs

Personal knowledge work is characterized by personal work processes which describe complex, information-intensive work. The successful execution of a personal work process results in the achievement of a task. We refer to such a task as a complex task as it commonly comprises several subtasks, for example, information search, filtering, selection, and combination of information, reservation of resources, communication, and creation and conveyance of information. SWEs are dedicated to supporting the knowledge worker and thus the personal processes that describe their complex tasks. In the following, we elaborate on the implications for communication systems that follow from the requirements on SWEs. An example of a common complex task is the publication of research results such as introduced in the beginning of this chapter. In the context of research publication, particular contextual views are adopted; for example, people are seen as producers of research papers, which have a lifecycle and are developed and manipulated collectively. In another example of a complex task—the planning of a conference—people are seen as meeting participants, and research papers are associated with time slots and organized into conference sessions. The example indicates that the same information is used for different purposes and has different roles depending on the working context, that is, the context constituted by a complex task. Accordingly, communication systems of SWEs should enable such cross-contextual utilization of information, meaning that communicated information should be available just like any other information. Users should be able to associate it with other information, reuse it, and exploit it for different tasks, not only in the context of communication. SWEs need to support flexible information classification schemes that allow knowledge workers to file and retrieve information with respect to their working context. Moreover, classification issues such as those pointed out in the context of e-mail overload (Whittaker & Sidner, 1996) call for enhanced support and guidance in the classification of communicated information, for example, automatic e-mail classification. Such classifications cannot be limited to a particular application but need to be reusable and available across applications, as knowledge workers need to organize information that they receive from different sources or have created themselves.
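How automatic e-mail classification might look in practice can be pictured with standard text categorisation techniques. The following minimal sketch is purely illustrative and not part of any system discussed in this chapter; the categories and training messages are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented training messages, each labelled with a working context.
train_texts = [
    "Please send the revised evaluation section by Friday",
    "The deadline for the camera-ready version is next week",
    "Shall we meet for the project review on Monday at 10?",
    "Room booking confirmed for the steering committee meeting",
    "Attached are the LaTeX formatting guidelines for authors",
    "Conversion of the figures to EPS failed, see attached log",
]
train_labels = [
    "publication", "publication",
    "meeting", "meeting",
    "formatting", "formatting",
]

# Bag-of-words features feeding a naive Bayes classifier.
classify = make_pipeline(TfidfVectorizer(), MultinomialNB())
classify.fit(train_texts, train_labels)

# File an incoming message by its predicted working context; to be
# useful across applications, such labels must be stored in a format
# other tools can read as well.
incoming = "Reviewer 2 asks for an extended evaluation section"
print(classify.predict([incoming])[0])
```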
Different communication channels support different media differently and are advantageous for particular communication-related tasks. SWEs should allow the knowledge worker to focus on what to communicate to whom in the most effective way. The choice of the channel should be the result of what is communicated and to whom it shall be communicated. Accordingly, a cross-channel communication environment is required that allows for conversations that utilize multiple channels. The complexity of personal work processes and the number of such processes that knowledge workers deal with in parallel demand guidance in the conduct of such processes. The knowledge worker needs to know what has been done within the different processes, which steps are pending, and what the associated timelines are. Accordingly, communication systems need to support personal work processes, for example, by integrating with tools that allow a user to manage and track them. Any information about process steps needs to be available for later review and reuse, for example, to review the basis of a decision made in the past. Communication tools need to make communicated information persistent for both retrieval and archiving purposes. Moreover, previous exchanges may also provide contextual information about conversations that span large time intervals of several weeks or months. The single knowledge worker will certainly collaborate in teams with multiple coworkers. Accordingly, the implications outlined in the prior paragraphs need to be fulfilled with respect to collaborative work in a networked environment.
Current Shortcomings

Conventional communication tools and channels do not fulfill many of the requirements pointed out before. Many tools support only a single communication channel and prevent more effective, cross-channel conversations, for example, the answering of an e-mail by a quick follow-up explanation using chat or instant messaging. If such cross-channel conversations cannot be tracked with communication tools, conversational threads are cut, which complicates later recall of conversations. Recently, tools became available that integrate some communication channels: for instance, a merging of instant messaging and Internet telephony tools can be noticed, as many instant messengers support telephony today while phone tools often support instant messaging (e.g., Skype). There are also tools that integrate instant messaging and e-mail, for example, Google mail and talk as well as the Skype plugin for the Thunderbird mail client. Today's communication tools focus on the communication task, that is, they are tailored for information exchange. Thus they are not well suited for all the work-related tasks and processes for which knowledge workers might like to employ them. As they also do not provide the communicated information in a standardized format that allows further utilization, they ignore the known role of communicated information for work processes and time and task management. Limited information reuse denotes a further shortcoming, caused by the missing interoperability of communication clients and other work tools. Communicated information is often not easy to reuse without the loss of context; for example, saving an e-mail attachment to the file system results in the loss of the relation to the e-mail. Analogous to the previous example, copying parts of a message into the clipboard provided by a computer desktop and then pasting it into another application also cuts the relation between the copied text and its origin. Such information, however, can be used for information browsing and information search, for example, to present a user with the origin of some piece of text or with the e-mail that was associated with the attachment that has been saved to the file system for editing.
Many communication tools lack means to classify communicated information. Most e-mail clients allow a user to organize e-mails into folder hierarchies. These, however, allow the user to classify e-mails with respect to only one criterion. Back in our example, the lead author of a paper may classify e-mails by the person who sent them. To keep track of e-mails related to the production of the research paper, the lead author may want to classify e-mails beyond the sender, by their topic. Such multiclassification, which allows the user to organize information to suit different tasks, is, however, only weakly supported by current communication clients. Some clients allow the user to flag e-mails with tags such as to-do, but they lack efficient retrieval support based on such tags or prevent the user from associating multiple such tags with a single e-mail. Moreover, classification schemes reside within the application and cannot easily be shared outside, for example, to classify information dealt with in further applications. Still, many communication tools omit automatic storage of communicated information and thus also miss support for the retrieval and archiving of prior work. Such features are clearly valuable for the knowledge worker who needs to track what has been done as part of which complex task; for example, the author of a research paper wants to recall the conversation where coauthors decided on the inclusion or deletion of a particular paragraph. Table 2 summarizes the discussion of this section by listing current communication channels and tools with respect to the work-relevant properties mentioned here.
Table 2. Comparison of communication tools and channels

Channel/Tool | User-Defined Classification | Retrieval | Reuse and Referencing
E-mail, Mailing Lists, News Groups | Folder Hierarchies, Query Folders | Search, Folder Browsing, Ordering by Property | Forward, Inline Reply, File System Export, Clipboard, To Do (MS Outlook™)
Instant Messaging | Often Predefined Classification by Sender or Time | Search, Ordering by Property | File System Export, Clipboard
Chat Systems | [1] | [2] | File System Export
Weblogs | Browser Bookmarking, Web Log Tagging (e.g., Technorati) | Web Search, Bookmark and Tag-Based Browsing | URL Referencing, File System Export, Clipboard
Internet Telephony | [1] | [2] | File System Export
Video Conferencing | [1] | [2] | File System Export, Clipboard
Audio and Video Blogging | Tagging (e.g., YouTube) | Browsing by Tag/User/Date/Genre, Text Search | URL Referencing
Wiki | Arbitrary Classification, e.g., by Setup of a Link Page | Wiki Search, Web Search, Browsing | URL Referencing, File System Export, Clipboard

[1] Classification only possible outside the client, e.g., through file system export.
[2] Commonly no built-in retrieval support; retrieval possible through file system search and browsing.

Research Enhancing Work Support of Communication Systems

The analysis of the previous section identified shortcomings of current communication systems and stated requirements for enhanced work support. This section gives brief pointers to research contributing to overcoming the shortcomings of current communication systems. It starts with a section pointing at research towards enhanced management of communication data. The next section deals with research contributing to enhanced support for personal processes and is followed by a section that indicates different research tools for communication that enable the creation of semantic metadata. We then point out different vocabularies to represent such metadata, present research about trust-based communication, and overview work on peer-to-peer technology towards communication support.
Management of Communicated Information

Cole and Stumme have been working on a Conceptual E-Mail Manager (CEM) (Cole & Stumme, 2000). CEM addresses the issue of e-mail classification we pointed out. The main difference from common e-mail management lies in a multihierarchy storage, in contrast to today's commonly used tree structuring. Each e-mail is seen in a formal context that assigns a set of catchwords to each e-mail. Additionally, a hierarchy for the set of catchwords exists that is used to order the information. Conceptual scales are provided as a mechanism to group related information so that different views of messages can be created (a toy sketch of this multihierarchy idea follows at the end of this subsection). Their implementation is able to visualize concept lattices (Cole, Eklund, & Stumme, 2000) for any chosen scale. An example of a further research prototype that also addresses the issue of e-mail overload is ContactMap (Whittaker et al., 2004). In contrast to CEM, ContactMap is an approach to simplify classification and retrieval of messages by a different user interface design. ContactMap provides a user interface to contact data where each contact is presented by an image. Such contact images are presented in a two-dimensional area and can be grouped, marked, and arranged within this area. As a result, awareness of groups and contacts is increased, and access to messages via contacts is simplified through the additional visual cues provided by the system. The Small World Instant Messenger (SWIM) developed by Zhang and Van Alstyne (2004) describes an approach to integrate interpersonal aspects into communication systems. SWIM is based on previous work by Watts, Dodds, and Newman (2002) dealing with the searchability of social networks on the one hand, as well as work by Isaacs et al. (2002) presenting an empirical study of over 21,000 IM conversations by 437 users. SWIM builds a user profile based on the user's homepage and/or bookmarks to conduct information search on social networks. We pointed out that some communication systems lack persistency support for communicated messages. The communication system developed by Watt, Walther, and Nowak (2002) is an approach to asynchronous communication with video as the communication medium, thus also achieving persistency for video-mediated communication. Their system aims at combining the advantages of multimodal, face-to-face-like communication with the advantages of asynchronous communication. The system allows communication via video messages that are made persistent for later retrieval and viewing by an addressee, who can also answer asynchronously with further video messages.
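The multihierarchy idea behind CEM can be pictured with a small, self-contained sketch. This is a toy model for illustration only, not CEM's actual mechanism (which is based on formal concept analysis): each e-mail carries a set of catchwords, and a "view" is simply the set of e-mails whose catchwords include a chosen combination, so one e-mail can appear in many views rather than in a single folder.

```python
# Each e-mail is assigned a set of catchwords rather than one folder.
emails = {
    "mail-01": {"publication", "deadline"},
    "mail-02": {"publication", "review", "coauthor"},
    "mail-03": {"meeting", "coauthor"},
    "mail-04": {"publication", "meeting", "deadline"},
}

def view(*catchwords):
    """All e-mails carrying every given catchword; the same e-mail may
    appear in many such views, unlike in a tree of folders."""
    wanted = set(catchwords)
    return sorted(m for m, tags in emails.items() if wanted <= tags)

print(view("publication"))              # ['mail-01', 'mail-02', 'mail-04']
print(view("publication", "deadline"))  # ['mail-01', 'mail-04']
print(view("coauthor"))                 # ['mail-02', 'mail-03']
```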
Process Support

The issue of reuse of communicated information for time and task management is treated by Taskmaster (Bellotti, Ducheneaut, Howard, & Smith, 2003), a research tool that provides a solution for aligning e-mail as a communication means with its exploitation for task management. Bellotti et al. (2003) developed Taskmaster as an application that is based on so-called thrasks. A thrask summarizes a task and its associated e-mails and attachments. The developed application represents e-mails within the context of their related tasks and lets users directly view and order tasks together with their associated e-mails and attachments. Khoussainov and Kushmerick (2005) also address time and task management issues with respect to e-mail management, however, with a very different approach compared to Taskmaster.
They present how machine learning techniques are applied to automatically classify e-mails by tasks. They also worked on a formalization of e-mail activities and the application of machine learning to identify the structure of such activities (Kushmerick & Lau, 2005). McDowell, Etzioni, and Halevy (2004b, 2004a) focus on Semantic E-mail Processes (SEP) to provide a solution to the challenge of missing process support in current e-mail tools. They present a template-based approach to defining SEPs, which allows the user to specify more or less complex tasks such as event planning with far fewer lines of code/data than implementing the process in a programming language. Once such a process is defined, it can be executed semi-automatically; for example, answers to a meeting invitation are interpreted automatically, recognizing rejections and positive replies, which are then used to execute further actions (a toy sketch of this idea follows at the end of this subsection). QuickML (Masui & Takabayashi, 2003) is an example of enhanced process support and group communication. QuickML provides a service that allows a user to initiate and maintain a mailing list in an ad-hoc fashion simply by sending e-mails to the service. QuickML limits administrative overhead and thus fosters group collaboration through e-mail. The Loops system (Erickson et al., 2006) is a chat system for group collaboration providing—next to a chat communication channel—several enhancements such as a so-called social proxy, a visual user interface that provides presence information about people. Unlike in conventional chat systems, chat messages are persistent in Loops so that previous conversations can be tracked and asynchronous communication is possible. Loops also allows for multichannel communication by including a bulletin board that resembles one-to-many communication and allows the user to make certain information more prominent than would be possible with persistent chat messages. The Buddyspace (Vogiazou, Eisenstadt, Dzbor, & Komzak, 2005) instant messenger is a research prototype that offers enhanced awareness and presence features, for example, the representation of a buddy list as a map that shows the location and presence status of buddies. Furthermore, Buddyspace allows for more sophisticated presence information, allowing the user to present different presence information to different groups of buddies; for example, a user can set the status online for the group of coworkers while simultaneously appearing as not available to another group.
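To picture the semi-automatic execution of a semantic e-mail process, consider the following toy sketch. It loosely illustrates the idea of McDowell et al., but the reply format and class design are invented for this example and do not reflect the actual SEP templates.

```python
from dataclasses import dataclass, field

@dataclass
class MeetingInvitation:
    """Toy semantic e-mail process: interpret structured replies to a
    meeting invitation and trigger follow-up actions automatically."""
    topic: str
    accepted: list = field(default_factory=list)
    declined: list = field(default_factory=list)

    def handle_reply(self, sender: str, body: str) -> None:
        verdict = body.strip().lower()
        if verdict == "accept":
            self.accepted.append(sender)
        elif verdict == "decline":
            self.declined.append(sender)
        else:
            # Free-text replies fall back to manual handling.
            print(f"forwarding unparsed reply from {sender} to the organiser")

    def quorum_reached(self, minimum: int) -> bool:
        return len(self.accepted) >= minimum

invite = MeetingInvitation(topic="discussion of the evaluation section")
invite.handle_reply("mary", "accept")
invite.handle_reply("bob", "decline")
invite.handle_reply("john", "can we move it to Friday?")
print(invite.quorum_reached(minimum=2))  # False: no confirmation sent yet
```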
Creating Semantic Communication Metadata

Information reuse and retrieval for information communicated by Web logs is improved by semantic blogging, as dealt with in Möller, Bojars, and Breslin (2006). Semantic blogging denotes an enhancement of conventional blogging that allows the user to integrate semantic metadata about things mentioned in a blog entry, for example, information about a person or an event. Such semantic metadata can be automatically extracted by appropriate blog readers so that the contained information can be immediately reused by the reader, who would usually be required to manually enter given event information into his calendar, or some person's contact information into his address book. Similar to semantic blogs, semantic wikis (Fischer, Gantner, Rendle, Stritt, & Schmidt-Thieme, 2006; Oren, Breslin, & Decker, 2006; Völkel, Krötzsch, Vrandecic, Haller, & Studer, 2006; Wagner, 2004) augment standard wiki text with semantic metadata. While wikis are a tool for collective information publishing, the underlying main concept of interlinked content in the form of wiki pages makes wikis also a tool for various PIM tasks. Exploitations of semantic wikis are similar to those of semantic blogs, allowing tighter integration with desktop applications such as address books or calendars. Beyond that, semantic metadata in wikis also improves data management features, allowing for sophisticated search functionalities and the dynamic generation of aggregation pages that, in conventional wikis, need to be edited manually and are static. For instant messaging communication, two different approaches are given by the semantics aware messenger (SAM) (Franz & Staab, 2005) and the instant messaging server Nabu (Osterfeld, Kiesel, & Schwarz, 2005). While SAM intercepts conversations at the instant messaging client to store semantic metadata about the conversations, Nabu extracts similar semantic metadata on the server side. Both approaches generate semantic metadata to enable enhanced management and reuse of communication data. Approaches like sTeX2 and OntoOffice3 stand for similar solutions dedicated to supporting metadata creation in the context of text processing. The tools provided by the associated research projects enable users to add semantic metadata to LaTeX documents and Microsoft Office documents at the time of their creation.
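Once conversations are archived with semantic metadata, as SAM and Nabu do, the archive can be queried with standard Semantic Web tooling. The following sketch uses the rdflib library; the ex: vocabulary is invented for illustration and is not the schema used by either system.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

EX = Namespace("http://example.org/chat#")  # illustrative vocabulary only
g = Graph()

# A single archived instant message with semantic metadata.
msg = URIRef("http://example.org/chat/msg-42")
g.add((msg, RDF.type, EX.InstantMessage))
g.add((msg, EX.sender, Literal("mary")))
g.add((msg, EX.topic, Literal("evaluation section")))
g.add((msg, EX.text, Literal("I suggest we drop paragraph three.")))

# Retrieve archived messages by topic, across sessions and clients.
query = """
    SELECT ?sender ?text WHERE {
        ?m a ex:InstantMessage ;
           ex:topic "evaluation section" ;
           ex:sender ?sender ;
           ex:text ?text .
    }"""
for sender, text in g.query(query, initNs={"ex": EX}):
    print(sender, "-", text)
```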
Representing Communication

Next to the development of the methods and tools presented above, there is also research on different vocabularies to semantically represent metadata relevant to communication. The Friend-of-a-Friend (FOAF)4 vocabulary is one such vocabulary, defining classes and properties to represent persons, their contact details, affiliations, and social relationships. Further examples of similar vocabularies that allow the user to represent persons and their contact details are the vCard ontology5 and the W3C PIM ontology.6 SIOC (Breslin, Harth, Bojars, & Decker, 2005) provides a vocabulary and tools concentrating on representing public communication via mailing lists or blogs and its relation to communities and topics. Among others, SIOC includes an ontology for representing data produced by conversations via different communication modes such as blogs, forums, and mailing lists. The framework for cross-context semantic information management (X-COSIM) (Franz, Staab, & Arndt, 2007) includes the X-COSIM reference ontology (X-COSIMO) to align information expressed with different domain ontologies such as those described before. Based on a context-independent and formally consistent representation of information in X-COSIMO, information can be reused across desktop applications to support personal processes as described.
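As a small illustration of how such vocabularies combine, the following sketch builds a graph describing a person with FOAF and one of her posts with SIOC. The class and property names come from the published vocabularies; the resource URIs and data are invented.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF

SIOC = Namespace("http://rdfs.org/sioc/ns#")
g = Graph()

mary = URIRef("http://example.org/people/mary")
post = URIRef("http://example.org/forum/post-7")

# FOAF describes the person ...
g.add((mary, RDF.type, FOAF.Person))
g.add((mary, FOAF.name, Literal("Mary")))

# ... while SIOC describes her contribution to an online community.
g.add((post, RDF.type, SIOC.Post))
g.add((post, SIOC.has_creator, mary))
g.add((post, SIOC.content, Literal("Draft of the evaluation section.")))

print(g.serialize(format="turtle"))
```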
Security and Trust

Communication channels offer an efficient means for certain types of communication. They raise, however, also new issues concerning trust and security. Decisions about whether someone should be allowed to access some information are partly based on how much we trust a particular person, or even how much we trust a particular person in a specific context. Depending on the expertise of a person in a particular field, we may trust that person with respect to that field while distrusting the same person in other fields; for example, we might trust a computer scientist when she talks about programming languages but distrust her when it comes to cooking advice. Golbeck and Hendler (2005) developed algorithms for inferring trust relationships between individuals that are not directly connected in a social network. They also provide an extension for the FOAF vocabulary to represent trust values between persons and have developed an e-mail client called TrustMail, which adds trust ratings to e-mails for sorting and filtering them based on such ratings.
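A simple way to picture trust inference over a social network is to propagate ratings along acquaintance chains, discounting by the trust placed in each intermediary. The following sketch is a deliberately naive illustration of the idea, not Golbeck and Hendler's actual algorithms.

```python
# Direct trust ratings in [0, 1]; absence of an edge means "unknown".
trust = {
    ("john", "mary"): 0.9,
    ("mary", "bob"): 0.8,
    ("john", "eve"): 0.4,
    ("eve", "bob"): 0.9,
}

def inferred_trust(source, target, seen=frozenset()):
    """Best discounted trust along any acquaintance chain: multiply
    ratings along a path and take the maximum over all paths."""
    if (source, target) in trust:
        return trust[(source, target)]
    best = 0.0
    for (a, b), rating in trust.items():
        if a == source and b not in seen:
            best = max(best, rating * inferred_trust(b, target, seen | {a}))
    return best

# John has no direct rating for Bob, but two indirect chains exist.
print(inferred_trust("john", "bob"))  # 0.72: 0.9 * 0.8 via Mary
```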
Peer-to-Peer

The research on decentralized, self-organizing infrastructures has been motivated by the increasing popularity of peer-to-peer (P2P) frameworks and applications. Recent P2P information systems usually build on top of structured overlay networks like Chord (Stoica, Morris, Karger, Kaashoek, & Balakrishnan, 2001), CAN (Ratnasamy, Francis, Handley, Karp, & Schenker, 2001), Pastry (Rowstron & Druschel, 2001), or P-Grid (Aberer, Punceva, Hauswirth, & Schmidt, 2002) and provide scalable mechanisms for knowledge sharing and information dissemination. Almost all of the mentioned systems maintain distributed indexes that are based on various forms of distributed hash tables (DHTs) and support mapping from keys associated with managed objects (e.g., files, words, concepts, topics of interest) to corresponding network locations (e.g., peers subscribed to a topic of interest, or nodes that contain matching files). In particular, this infrastructure has been adopted for reliable publish/subscribe (e.g., Rowstron, Kermarrec, Castro, & Druschel, 2001) and multicast (e.g., El-Ansary, Alima, Brand, & Haridi, 2003; Merz & Gorunova, 2005) communication in self-organizing, decentralized environments.
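The key-to-node mapping underlying such DHTs can be sketched in a few lines. The following is a toy version of Chord-style consistent hashing, omitting routing, finger tables, and churn handling.

```python
import hashlib
from bisect import bisect_right

RING = 2 ** 16  # small identifier space for the example

def ring_id(name: str) -> int:
    """Hash names (node addresses, topics, file keys) onto one ring."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % RING

nodes = sorted(ring_id(f"peer-{i}") for i in range(8))

def responsible_node(key: str) -> int:
    """Chord-style successor: the first node id clockwise from the key."""
    k = ring_id(key)
    idx = bisect_right(nodes, k)
    return nodes[idx % len(nodes)]  # wrap around the ring

# Topics of interest map deterministically to peers, so any peer can
# locate the one responsible for a subscription without a central index.
for topic in ("semantic-wikis", "p2p-multicast"):
    print(topic, "->", responsible_node(topic))
```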
Communication Support in Future SWEs

Communication tools of future SWEs will consolidate multiple research works by exploiting machine-interpretable semantics of communicated information to serve several work-supporting features. We exemplify them based on a visionary version of the research publication scenario used throughout the chapter. Let John be a scientist who prepares a publication together with his colleague Mary. First, John creates the rough draft of the paper structure.
The annotated paper outline is used by his SWE to derive a new major task "publication" and a number of corresponding "subsection" subtasks. The work environment creates a new semantic wiki for collecting publication-specific details and resources. In the next step, the environment uses John's semantic e-mail archive and earlier task descriptions in order to generate suggestions for co-authors who work on the same topic and have collaborated with John in scientific writing in the past. The proposed recommendations are ordered by estimated relevance and presented to John in the task management section. John assigns the writing of the evaluation section to Bob and asks Mary to be the internal reviewer of the proposal. The environment automatically notifies Mary and Bob about their new responsibilities using annotated e-mail that also contains further details and references specified by John. When the communication clients of Bob and Mary receive the notification, they automatically insert new tasks into their personal to-do trackers. Furthermore, the SWE generates a thematically focused work environment for Bob and automatically includes related publications, formatting guidelines and templates, locally stored relevant research results, and the calendar with deadlines in this view. Bob uses this information in order to fulfill the task. During writing, he adds further resources (charts, figures, text) to the personal work environment. These resources are seamlessly shared between partners through the associated semantic wiki. Once finished, Bob marks the task completed. A notification is automatically sent to the others. The next day, Bob and Mary use cross-channel group communication tools (chat, phone, and shared whiteboard) to discuss the evaluation section. The captured records are automatically summarized, annotated with significant keywords and phrases, and persistently added to the semantic wiki of the paper draft for later reference. John exploits the track of conversations by person, topic, or task to retrieve messages relevant to his work.
Over the next days, John, Mary, and Bob add a number of further corrections and improvements to the evaluation section. After each update, detailed provenance information is generated and persistently stored. This allows John to keep track of authorship and the reasons for all changes made. After submission of the publication, a similar work environment is used by conference organizers, reviewers, and authors in order to exchange notifications, request further explanations and clarifications during the review process, give reasons for further improvements, and finally prepare the camera-ready release of the accepted publication. Our sample scenario indicates that a significant requirement for SWEs is that tools are interoperable, that is, that information created and processed by one tool (or person) needs to be available for further processing and reuse with potentially different applications and user groups. For this reason, the corresponding communication systems constitute an important building block of future SWEs.
Conclusion

In this chapter, we presented a view into communication systems with respect to work environments, in particular SWEs. We have indicated that sole information transfer denotes only one of the many utilizations of IT-based communication systems. Notably, communication systems play an important role in work processes. Conventional communication systems lack sufficient support for such utilizations, and we discussed what the requirements for improved communication systems are and what shortcomings conventional systems have. We gave examples of ongoing research from different research disciplines, ranging from machine learning to user interface design and Semantic Web research, that all work towards an improvement of communication systems. The diversity of research disciplines that contribute to enhancements of communication systems shows that diverse aspects need to be considered in the design of better communication tools. The discussion also revealed that supporting knowledge work means taking a cross-tool perspective, as information reuse and work-dependent conceptual views on information occur frequently, for example, when some information is regarded as a message in a communication context and as a task in a process management context. Research related to the social semantic desktop (Decker, 2006) takes such a cross-tool perspective and summarizes research towards enhanced support for personal knowledge work.

Acknowledgment

This work was funded by the X-Media project (www.x-media-project.org) sponsored by the European Commission as part of the Information Society Technologies (IST) programme under EC grant number IST-FP6-026978.

References

Aberer, K., Punceva, M., Hauswirth, M., & Schmidt, R. (2002). Improving data access in P2P systems. IEEE Internet Computing, 6(1), 58-67.

Bellotti, V., Ducheneaut, N., Howard, M., & Smith, I. (2003). Taking e-mail to task: The design and evaluation of a task management centered e-mail tool. In CHI (pp. 345-352).

Bos, N., Olson, J. S., Gergle, D., Olson, G. M., & Wright, Z. (2002). Effects of four computer-mediated communications channels on trust development. In CHI (pp. 135-140).

Breslin, J. G., Harth, A., Bojars, U., & Decker, S. (2005). Towards semantically-interlinked online communities. In ESWC (pp. 500-514).
Buckland, M. K. (1992). Emanuel Goldberg, electronic document retrieval, and Vannevar Bush's memex. JASIS, 43(4), 284-294.

Bush, V. (1945). As we may think. The Atlantic Monthly, 176(1), 101-108.

Cole, R. J., Eklund, P. W., & Stumme, G. (2000). CEM: Visualisation and discovery in e-mail. In Principles of Data Mining and Knowledge Discovery (pp. 367-374).

Cole, R., & Stumme, G. (2000). CEM: A conceptual e-mail manager. In ICCS (pp. 438-452).

Danis, C., Kellogg, W. A., Lau, T., Dredze, M., Stylos, J., & Kushmerick, N. (2005). Managers' e-mail: Beyond tasks and to-dos. In CHI '05: CHI '05 Extended Abstracts on Human Factors in Computing Systems, New York, NY (pp. 1324-1327). ACM Press.

Decker, S. (2006). The social semantic desktop: Next generation collaboration infrastructure [Special issue: APE2006 academic publishing in Europe: The role of information in science and society]. Information Services and Use, 26(2), 139-144.

Efimova, L., & de Moor, A. (2005). Beyond personal Webpublishing: An exploratory study of conversational blogging practices. In HICSS (p. 107a).

El-Ansary, S., Alima, L., Brand, P., & Haridi, S. (2003). Efficient broadcast in structured peer-to-peer networks. In 2nd International Workshop on Peer-to-Peer Systems (IPTPS), Berkeley, CA (pp. 304-314).

Erickson, T., Kellogg, W. A., Laff, M., Sussman, J. B., Wolf, T. V., Halverson, C. A. et al. (2006). A persistent chat space for work groups: The design, evaluation and deployment of Loops. In Conference on Designing Interactive Systems (pp. 331-340).
Fischer, J., Gantner, Z., Rendle, S., Stritt, M., & Schmidt-Thieme, L. (2006). Ideas and improvements for semantic wikis. In ESWC (pp. 650-663).

Franz, T., Staab, S., & Arndt, R. (2007). The X-COSIM integration framework for a seamless semantic desktop. In Proceedings of the Fourth International ACM Conference on Knowledge Capture (K-CAP 2007) (pp. 143-150).

Franz, T., & Staab, S. (2005). SAM: Semantics aware instant messaging for the networked semantic desktop. In Proceedings of the 1st Workshop on The Semantic Desktop, 4th International Semantic Web Conference (pp. 167-181). Galway, Ireland.

Gemmell, J., Bell, G., Lueder, R., Drucker, S. M., & Wong, C. (2002). MyLifeBits: Fulfilling the Memex vision. In ACM Multimedia (pp. 235-238).

Golbeck, J., & Hendler, J. (2005). Inferring trust relationships in Web-based social networks. ACM Transactions on Internet Technology.

Haag, S., Cummings, M., & McCubbrey, D. J. (2002). Management information systems for the information age (3rd ed.). Irwin McGraw-Hill.

Handel, M., & Herbsleb, J. D. (2002). What is chat doing in the workplace? In CSCW (pp. 1-10).

Herring, S. C. (1999). Interactional coherence in CMC. In HICSS '99: Proceedings of the Thirty-Second Annual Hawaii International Conference on System Sciences, Washington, DC (Vol. 2, p. 2022). IEEE Computer Society.

Herring, S. C., Scheidt, L. A., Wright, E., & Bonus, S. (2005). Weblogs as a bridging genre. Information Technology & People, 18(2), 142-171.

Isaacs, E., Walendowski, A., Whittaker, S., Schiano, D. J., & Kamm, C. A. (2002). The character, functions, and styles of instant messaging in the workplace. In CSCW (pp. 11-20).
Kane, B., & Luz, S. (2006). Multidisciplinary medical team meetings: An analysis of collaborative working with special attention to timing and teleconferencing. Computer Supported Cooperative Work (CSCW), 15(5), 501-535.

Khoussainov, R., & Kushmerick, N. (2005). E-mail task management: An iterative relational learning approach. In Second Conference on E-mail and Anti-Spam, CEAS.

Kim, M.-S., & Raja, N. S. (1991). Verbal aggression and self-disclosure on computer bulletin boards. Annual Meeting of the International Communication Association.

Kogan, S. L., & Muller, M. J. (2006). Ethnographic study of collaborative knowledge work. IBM Systems Journal, 45(4), 759.

Krishnamurthy, S. (2002). The multidimensionality of blog conversations: The virtual enactment of September 11. In Internet Research 3.0, Maastricht, The Netherlands.

Kushmerick, N., & Lau, T. (2005). Automated e-mail activity management: An unsupervised learning approach. In IUI '05: Proceedings of the 10th International Conference on Intelligent User Interfaces, New York, NY (pp. 67-74). ACM Press.

Lichtenstein, S. (2004). Knowledge development and creation in e-mail. In HICSS.

Mabry, E. A. (1998). Frames and flames: The structure of argumentative messages on the net (pp. 13-26).

Mackay, W. E. (1988). More than just a communication system: Diversity in the use of electronic mail. In CSCW (pp. 344-353).

Masui, T., & Takabayashi, S. (2003). Instant group communication with QuickML. In GROUP (pp. 268-273).
McDowell, L., Etzioni, O., & Halevy, A. Y. (2004a). Semantic e-mail: Theory and applications. Journal of Web Semantics, 2(2), 153-183.

McDowell, L., Etzioni, O., Halevy, A. Y., & Levy, H. M. (2004b). Semantic e-mail. In WWW (pp. 244-254).

Merz, P., & Gorunova, K. (2005). Reliable multicast and its probabilistic model for job submission in peer-to-peer grids. In 6th International Conference on Web Information Systems Engineering (WISE), New York (pp. 504-511).

Möller, K., Bojars, U., & Breslin, J. G. (2006). Using semantics to enhance the blogging experience. In ESWC (pp. 679-696).

Nardi, B. A., Whittaker, S., & Bradner, E. (2000). Interaction and outeraction: Instant messaging in action. In Conference on Computer Supported Cooperative Work (CSCW) (pp. 79-88).

Nonnecke, B., & Preece, J. (2000). Persistence and lurkers in discussion lists: A pilot study. In HICSS '00: Proceedings of the 33rd Hawaii International Conference on System Sciences, Washington, DC (Vol. 3, p. 3031). IEEE Computer Society.

Oren, E., Breslin, J. G., & Decker, S. (2006). How semantics make better wikis. In WWW (pp. 1071-1072).

Osterfeld, F., Kiesel, M., & Schwarz, S. (2005). Nabu: A semantic archive for XMPP instant messaging. In S. Decker, J. Park, D. Quan & L. Sauermann (Eds.), Proceedings of the 1st Workshop on The Semantic Desktop, 4th International Semantic Web Conference (pp. 159-166). Galway, Ireland.

Ratnasamy, S., Francis, P., Handley, M., Karp, R., & Schenker, S. (2001). A scalable content-addressable network. In ACM SIGCOMM Conference, San Diego, CA (pp. 161-172).

Rowstron, A., & Druschel, P. (2001). Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. In IFIP/ACM International Conference on Distributed Systems Platforms, Heidelberg, Germany (pp. 329-350).

Rowstron, A., Kermarrec, A., Castro, M., & Druschel, P. (2001). SCRIBE: The design of a large-scale event notification infrastructure. In Proceedings of the 3rd International Workshop on Networked Group Communication, London, UK (pp. 30-43).

Stoica, I., Morris, R., Karger, D., Kaashoek, M., & Balakrishnan, H. (2001). Chord: A scalable peer-to-peer lookup service for internet applications. In ACM SIGCOMM Conference, San Diego, CA (pp. 149-160).

Veinott, E. S., Olson, J., Olson, G. M., & Fu, X. (1999). Video helps remote work: Speakers who need to negotiate common ground benefit from seeing each other. In CHI '99: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, New York, NY (pp. 302-309). ACM Press.

Vogiazou, Y., Eisenstadt, M., Dzbor, M., & Komzak, J. (2005). From Buddyspace to CitiTag: Large-scale symbolic presence for community building and spontaneous play. In ACM Symposium on Applied Computing (SAC) (pp. 1600-1606).

Völkel, M., Krötzsch, M., Vrandecic, D., Haller, H., & Studer, R. (2006). Semantic wikipedia. In WWW (pp. 585-594).
Wagner, C. (2004). Wiki: A technology for conversational knowledge management and group collaboration. Communications of the Association for Information Systems, 13, 265-289.

Watts, D. J., Dodds, P. S., & Newman, M. E. (2002). Identity and search in social networks. Science, 296, 1302-1305.

Watt, J., Walther, J., & Nowak, K. (2002). Asynchronous videoconferencing: A hybrid communication prototype. In HICSS (Vol. 1, p. 13).

Whittaker, S., Jones, Q., Nardi, B. A., Creech, M., Terveen, L. G., Isaacs, E. et al. (2004). ContactMap: Organizing communication in a social desktop. ACM Transactions on Computer-Human Interaction, 11(4), 445-471.

Whittaker, S., & Sidner, C. L. (1996). E-mail overload: Exploring personal information management of e-mail. In CHI (pp. 276-283).

Zhang, J., & Van Alstyne, M. W. (2004). SWIM: Fostering social network based information search. In CHI Extended Abstracts (p. 1568).
Endnotes

1 http://youtube.com/
2 http://kwarc.eecs.iu-bremen.de/projects/stex/index.html
3 http://www.ontoprise.de/
4 http://www.foaf-project.org/
5 http://www.w3.org/2006/vcard/ns
6 http://www.w3.org/2000/10/swap/pim/contact
Chapter III
Semantic Social Software:
Semantically Enabled Social Software or Socially Enabled Semantic Web? Sebastian Schaffert Salzburg Research Forschungsgesellschaft, Austria
Introduction
Recently, the combination of Social Software with Semantic Web technology has been gaining significant attention in the Semantic Web community. This is exemplified by the surprisingly high number of submissions and attendees of the 1st Workshop on Semantic Wikis (Völkel & Schaffert, 2006) that took place at the European Semantic Web Conference 2006 (ESWC06) last year, as well as the ESWC06 best poster award for the Semantic Wikipedia (Völkel, Krötzsch, Vrandecic, Haller & Studer, 2006). This chapter describes what I believe makes Social Software attractive for the Semantic Web community, and what makes the Semantic Web attractive for the Social Software community. It also derives challenges for the Semantic Web community to address that seem relevant to us based on our experience with Social Software and the Semantic Web. In the remainder of this introduction, I briefly introduce Social Software, the Semantic Web, and the combination of both, which I call Semantic Social Software.
Social Software

According to Wikipedia, Social Software is software that "enables people to rendezvous, connect or collaborate through computer-mediated communication and to form online communities."1 Although this definition in principle also includes technologies that have existed for a long time (like e-mail or Usenet), the term Social Software usually only comprises more recent developments like wikis, Weblogs, instant messaging (e.g., AIM, ICQ), social bookmarking (e.g., del.icio.us), media sharing (e.g., Flickr, YouTube), and social networking systems (e.g., MySpace, OpenBC). Today, huge amounts of content are available in Social Software systems. The free Web encyclopedia Wikipedia now hosts over 4 million entries of partly astonishing quality. The social networking site MySpace is one of the most popular Web sites overall (ranked number 4 by Alexa, following closely after Google, Yahoo, and MSN). According to the Web log index Technorati,2 there are currently about 40 million blogs with a doubling time of about 6 months and around 1.2 million blog posts every day. What makes Social Software interesting is not only the huge amount of content but that it considerably changes the way content is created and consumed, maybe even more so than the Web did some 15 years ago: where the traditional process of publishing content was expensive and time consuming, Social Software allows virtually everyone to publish on a mouse click. To speak in market terms, with Social Software, consumers become prosumers. Because of these radical changes in content production, I consider Social Software a disruptive technology.
Semantic Web

The vision of the Semantic Web is to move from "dumb" content that is suitable for presentation only to "smart" content that may be processed by machines and used in different settings. It is also to move from application-centric systems to data-centric systems, and from a Web focused on information to a Web focused on relations between things. According to W3C founder and chair Tim Berners-Lee, the Semantic Web will be the next big thing. The current Semantic Web approach may be briefly described as enriching the existing Web with meta-data and (meta-)data processing so as to provide Web-based systems with advanced (so-called intelligent) capabilities, in particular with context-awareness and decision support. What distinguishes the Semantic Web from previous AI approaches is that it assumes a distributed but strongly connected web of small pieces of formal knowledge rather than big, centralised knowledge bases.

Figure 1. Nova Spivack's "Metaweb"

Semantic Social Software

Semantic Social Software is the combination of Social Software with Semantic Web technologies. Its basic ideas are on the one hand to improve usage of Social Software by adding metadata and on the other hand to improve the process of
creating Semantic Web metadata by using Social Software. These two different perspectives are discussed extensively in the next sections. The combination of Social Software and Semantic Web technologies seems to fit well. This may be due to the remarkable similarities between Social Software—which is about small but strongly connected pieces of content from different sources with differing opinions—and the Semantic Web, which is about small but strongly connected pieces of formal knowledge from different sources, with different levels of precision and trustworthiness, maybe even inconsistencies. The difference is in this sense only in the level of abstraction: where Social Software mostly deals with social connections and human readable content, the Semantic Web mostly deals with formal connections and formal content. In a way, the step from traditional AI to the Semantic Web is thus very similar to the step from traditional content production to Social Software. This perspective is also in line with a number of recent predictions for the future of the Web: for example, technology evangelist Nova Spivack outlines in his 2004 articles3,4 what he calls “The Metaweb” (Figure 1). The Metaweb is essentially about using social connections to form information connections and vice versa. As can be seen in Figure 1, Nova Spivack is convinced that “The Metaweb is emerging from the convergence of the Web, Social Software and the Semantic Web,” connecting human and machine intelligence and moving from “just a bunch of interacting parts” to “a new higher-order whole.” This chapter continues as follows: the following section describes two different perspectives one can take on Semantic Social Software, namely the “Semantically Enabled Social Software” perspective (meaning the usage of semantic metadata to enhance existing social software) and the “Socially Enabled Semantic Web” perspective (meaning the usage of Social Software to create semantic metadata). Then, I describe
three different kinds of Social Software whose combination with Semantic Web technology appears promising: Semantic Wikis, Semantic Weblogs, and e-portfolios. Out of these three applications, I derive salient aspects of Semantic Social Software in the following section. I then present the major challenges mentioned above for the Semantic Web community, and conclude with an overview of related work and perspectives.
“semantiCally enaBled soCial software” or “soCially enaBled semantiC weB”? Semantic Social Software is traditionally approached from two directions. The first, which I call “Semantically Enabled Social Software,” makes use of Semantic Web technology to enhance existing Social Software. The second, which I call “Socially Enabled Semantic Web” makes use of the structures (e.g., links between pieces of content or between people) in Social Software to form Semantic Web data. In the following, I describe these two approaches and show that they are really just two sides of the same story, because the difference is primarily in the use of applications and not in the applications themselves.
Semantically Enabled Social Software

As has been outlined in the introduction, massive amounts of digital content are nowadays available in Social Software systems. Although content in Social Software systems is highly connected via hyperlinks and/or social networks, finding relevant content is becoming increasingly difficult, because the existing structure is used merely for presentation purposes. With diverging applications it is furthermore often hard to exchange content between different systems.
“Semantically Enabled Social Software” tries to overcome these issues by applying Semantic Web technology to Social Software. The existing informal or semiformal structures like hyperlinks are augmented by machine-readable formal descriptions (“metadata”) that make explicit the actual meaning behind a connection. Such metadata allows for more sophisticated services, like improved search and navigation (e.g., queries on the structure, context visualisation, derived knowledge), personalised presentation of content (e.g., based on personal preferences), and improved interoperability between systems (e.g., to integrate several applications in a company process, or to support smart agents). For example, a hyperlink from an article in Alice’s Web log to an article in Bob’s Web log could be annotated with “agrees with” or “disagrees with.” Articles in a Web log could furthermore be associated with certain topics like “EU constitution” (as a subtopic of “EU politics”). This would give readers the opportunity to search not only for articles relevant to a certain topic but also to find supporting or diverging opinions on this topic. This approach is, among others, followed by the Semantic Wikipedia project (Völkel et al., 2006), which aims to enhance the existing Wikipedia by semantic annotations to facilitate search and navigation, and by the semiBlog project (Möller, Bojars & Breslin, 2006), which aims to simplify search, connectivity, browsing, and management of a Web log by using semantic annotations.
Socially Enabled Semantic Web

The "Socially Enabled Semantic Web" sees Semantic Social Software as a tool that simplifies and thus improves the creation of metadata on the Semantic Web, in the same manner as Social Software simplified and improved the creation of data on the normal Web. This approach is founded in the observation that creating formal, machine-readable content on the Semantic Web is hard, which is arguably still one of the most significant barriers to the wide-spread adoption of the Semantic Web. The reason for this is primarily that creating formal metadata currently requires significant expertise in the modelled domain (e.g., biology), as well as in the use and intricacies of the formal languages employed (e.g., RDF and OWL). The latter issue could be significantly mitigated by appropriate tools, but existing tools are insufficient in the sense that they are much too complicated for non-technical users. As a consequence, formalised content on the Semantic Web is still rare and limited to some selected domains. Semantic Social Software has the potential to overcome some of these problems. It supports the creation of metadata in a number of ways. First, it builds upon existing structures, where connections reflect real-world relationships that are "natural" to the people using the software. Second, it supports the collaboration of people with different backgrounds and expertise, allowing, for example, a biologist and a computer engineer to work on the same knowledge model, augmenting each other. And third, Semantic Social Software provides instant gratification: every bit of formal knowledge contributed by a user is immediately usable. For instance, a common ontology engineering process supported by a Semantic Wiki (as a special kind of Semantic Social Software) could be to start with a collection of normal Wiki pages (as, e.g., found on Wikipedia) that make up the domain to be modeled and then augment the existing hyperlink structure between Wiki pages with machine-readable annotations. The first task (writing informal Wiki content) could be easily achieved by a biologist, the second task (formalising the hyperlink structure) by a knowledge engineer. Both could contribute their expertise and collaborate on the creation of the knowledge model. This approach is, for example, followed by Peter Mika (2005) in his work about complementing ontologies with social networks and by our own work on collaborative knowledge engineering with the Semantic Wiki IkeWiki (Schaffert, 2006; Schaffert, Gruber, & Westenthaler, 2005).
Two Sides of the Same Story

Although the two perspectives described above have originally developed separately and with different application scenarios in mind, the actual software used in both perspectives shares many properties, even to the extent of being actually the same tool used in different settings. This is best exemplified by the history of our own Semantic Wiki system IkeWiki (Schaffert, 2006; Schaffert, Westenthaler, & Gruber, 2006; Schaffert et al., 2005): originally developed to support non-technical domain experts in the process of formalising their knowledge (i.e., "Socially Enabled Semantic Web"), it is now primarily used in a number of settings focussed on enhancing the functionality of Social Software (i.e., "Semantically Enabled Social Software"), like our own group's knowledge base or the conference wiki of the Social Skills durch Social Software conference. Similar developments can be seen in other projects. I therefore argue that, although the two research directions have different goals, they are actually only two sides of the same story, namely "Semantic Social Software." This joint perspective allows us to investigate Semantic Social Software as a kind of software that is—based on its properties—helpful in many different settings, rather than looking at it merely as a tool designed for a single purpose. This chapter investigates different applications of Semantic Social Software and discusses them under both the "Semantically Enabled Social Software" and the "Socially Enabled Semantic Web" perspectives.
Three Applications of Semantic Social Software

Social Software is very diverse and there exist numerous different applications. In this chapter, I introduce three applications where a combination with Semantic Web technology is or might be fruitful: Semantic Wikis and Semantic Web logs as existing applications, and e-portfolios as an emerging application. The convergence of the two different approaches to Semantic Social Software can be seen in all three applications.
semantic wikis: Community-Authored Knowledge models A wiki is essentially a collection of Web sites connected via hyperlinks. Many different wiki systems exist, but they commonly have a straightforward interface for editing content with a simplified syntax that makes it very easy to set hyperlinks to other pages within the wiki. Therefore, content in a wiki is usually strongly connected via hyperlinks. Furthermore, editing of content in wiki systems is Web-based and access is often unrestricted or at least hardly restricted. Most wiki systems also provide a rollback mechanism for reverting back to previous versions in case of accidental or undesired changes. Wikis are used in many areas, like encyclopedia systems (e.g., Wikipedia), as personal or group knowledge management tools, as collaboration tools, or in collaborative learning environments. The main idea of a Semantic Wiki is to annotate the inherent hyperlink structure of a Wiki with symbols that describe the meaning of a link in a machine readable fashion. A link from Mozart to Salzburg could, for example, be annotated with lived in or born in. Such an annotation can then be used, for example, enhanced presentation by displaying contextual information, enhanced navigation by giving easy access to relevant related information, and enhanced “semantic” search that allows to query the context in addition to the content. Semantic Wikis are also excellent tools for collaborative creation of knowledge models. Based on (existing or emerging) natural language
Figure 2. Semantic annotation in IkeWiki
descriptions of concepts and individuals, formal knowledge models can be created successively. Natural language descriptions can be created primarily by domain experts (e.g., biologists) and then formalised in collaboration with knowledge engineers. For advanced knowledge engineering tasks, Semantic Wikis offer the possibility to export the knowledge model as RDF or OWL, which can then be loaded into more sophisticated and complicated tools like the OWL ontology editor Protégé.5 Semantic Wikis exist in many different flavours (e.g., Semantic MediaWiki, SemWiki, IkeWiki, SemperWiki, PlatypusWiki). Some systems are still primarily focused on the page content and see annotations as optional "added value"; they follow more the "Semantically Enabled Social Software" approach (e.g., Semantic MediaWiki). For others, the semantic annotations are in the foreground and sometimes even more important than the actual content; they follow the "Socially
Enabled Semantic Web" approach (e.g., IkeWiki and PlatypusWiki). Figure 2 shows how the page about the Bilberry is presented in IkeWiki, the Semantic Wiki developed at Salzburg Research. It includes several kinds of semantic metadata: (1) type information is shown below the page title; (2) links to (semantically) related pages are displayed in a separate "references box" on the right-hand side; and (3) links can be typed interactively using AJAX technology, making it simple for users to add metadata to existing structures.
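To make the idea concrete, the annotated link from the example above could be exposed as an RDF triple along the following lines. This is a minimal sketch: the namespace, the property name, and the page URIs are illustrative placeholders, not the vocabulary of any particular wiki system.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rel="http://example.org/wiki-relations#">
  <!-- The wiki page about Mozart; the plain hyperlink to the
       Salzburg page becomes a machine-readable "born in" statement. -->
  <rdf:Description rdf:about="http://example.org/wiki/Mozart">
    <rel:bornIn rdf:resource="http://example.org/wiki/Salzburg"/>
  </rdf:Description>
</rdf:RDF>

A semantic search could then match pages by such relations (e.g., "everyone born in Salzburg") rather than by full-text occurrences of the word "Salzburg."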
Semantic Weblogs: Formalising Discourse

A weblog or blog is a personal, Web-based journal, very similar to a digital diary that everyone can read. It is usually organised as a list of short articles ordered by publication date. A very significant difference to a wiki is that content can usually only
be authored by the owner of the weblog. Visitors, however, are usually allowed to leave comments about articles, which are shown separately. Also, the software used by weblogs is often aware of other weblogs linking to one of the articles and shows this information in a so-called trackback section as part of the article. Furthermore, weblogs can integrate listings of other weblogs in their navigation (the so-called blogroll). In this way, weblogs often become part of a strongly networked community, the so-called blogosphere. Karger and Quan (2004) summarise this "essence of blogging" under three key concepts:

• Publishing information in small, discrete notes, as opposed to large, carefully organised Web sites
• Decentralised, per-user publication
• Exposing machine-readable listings
A Semantic Weblog uses Semantic Web technology for improved searching and navigation in the blogosphere. Firstly, weblog posts can be associated with formal metadata describing their content, thus improving exchange, search, and retrieval of content. Secondly, the cross-linking between weblog posts in different weblogs can be used for representing the discourse about a certain subject. In Möller, Bojars, and Breslin (2006), the first kind of metadata use is called
Figure 3. Structural relations in the blogosphere
content metadata, while the second kind is called structure metadata. Figure 3 (Möller et al., 2006) shows some of the possible structural relations between weblogs. A particularly interesting property of Semantic Weblogs is the formal representation of discourse about different opinions on a certain topic and of the corresponding social networks. This gives readers the possibility not only to search and navigate by topic, but also to follow a discussion across different weblogs and to search specifically for certain opinions and connections between weblogs. A salient aspect of discourse representation is the representation of second/higher-order metadata (or "metadata about metadata"): where normal blog posts may state opinions about the content of other posts, Semantic Weblogs may make formal statements about the formal content of other Semantic Weblogs, requiring more expressive representation formalisms. This could, for example, also be used to form "networks of trust" concerning formal metadata, touching one of the core remaining open issues of the Semantic Web.
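As a rough illustration of the distinction, the following sketch records both kinds of metadata for a single post. The disc: vocabulary is a hypothetical stand-in for discourse vocabularies of the kind discussed by Möller et al. (2006); all URIs are invented.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:disc="http://example.org/discourse#">
  <!-- A post in weblog B, formally recorded as a reply to,
       and a disagreement with, a post in weblog A. -->
  <rdf:Description rdf:about="http://blog-b.example.org/posts/42">
    <!-- content metadata -->
    <dc:subject>Semantic Web</dc:subject>
    <!-- structure metadata -->
    <disc:replyOf rdf:resource="http://blog-a.example.org/posts/17"/>
    <disc:disagreesWith rdf:resource="http://blog-a.example.org/posts/17"/>
  </rdf:Description>
</rdf:RDF>

Given such triples, a reader could follow a disagreement across weblogs, or query for all posts that argue against a given one.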
E-Portfolios: Collecting Learning Artefacts

The e-portfolio method (Barrett, 2005) is an educational method for supporting life-long learning. The basic idea is to collect, in a digital portfolio, all kinds of (digital) artifacts that document a personal learning process. Such artifacts may be as coarse as certificates or as fine-grained as every-day documentation of one's learning progress, as in a diary, including all relevant material, for example, test results, articles, presentations, group work, homework, contributions to an online discussion, self-written Wikipedia articles, and so forth. In other contexts, e-portfolios are for this reason also called Lifelog or Lifeblog. The goal of this collection is to be able to reflect on one's own learning progress, to be able to extract parts of one's personal portfolio
for external presentation, to design a personal development plan (PDP), to validate and assess the learning outcomes against this plan, and to share learning artefacts with others. Artefacts in such a digital portfolio are strongly connected, both within the personal portfolio and with artefacts in the portfolios of other learners. Obviously, the e-portfolio method requires software support. An early e-portfolio system is Elgg (Tosh & Werdmuller, 2004).6 In Elgg, users can sign up and create their own digital portfolio where they can describe their personal skills and interests, publish articles in their personal weblog, collect links to other Web sites, upload files that they consider relevant, and integrate newsfeeds from others' portfolios and weblogs. This profile can then, similar to social networking services, be used to find other people with similar interests, or to find communities with interesting topics. In this sense, e-portfolio systems can be seen as next-generation Social Software. Other existing e-portfolio systems, like OSP7 or Blackboard's e-portfolio tool, put more emphasis on the integration into learning and course management systems, but the social networking aspect is always present to a certain degree.

To the best of my knowledge, a Semantic e-portfolio does not yet exist (beyond the social tagging used by Elgg). However, as we argue in Hilzensauer, Hornung-Prähauser, and Schaffert (2006), such software could significantly benefit from Semantic Web technology. First, the collection of artifacts from different systems (like weblogs, wikis, online journals) requires a degree of interoperability that does not yet exist. Where current e-portfolio systems can only include raw file content or content created in the systems themselves, a Semantic e-portfolio could go much beyond that and integrate content from different sources and different systems, encoded in different media formats. Second, semantic metadata about the artifacts in a digital portfolio could be utilised to more easily create custom-tailored presentations out of the personal portfolio, for
example, for documenting one's learning process for the purpose of evaluation, or for creating a tailored resume for an application for a specific job. Third, semantically annotated artifacts could more easily—semi-automatically—be matched against one's personal development plan, allowing for better reflection by the learner. And fourth, semantic annotations allow for easier navigation and retrieval in an inevitably huge collection of "life artifacts."

E-portfolios also provide an interesting and challenging application for Semantic Web technologies. As the e-portfolio aims to represent a learner's learning process, and thus his or her skills and knowledge, the semantic metadata in an e-portfolio could mirror the learner's knowledge and interactions with the environment. Such data, if used with the necessary care for privacy, can be useful in many application areas, ranging from automatic creation of personal timetables and development plans, over finding relevant jobs in an online job database, to assembling a team with complementary skill sets for a certain task in a company.
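To sketch what such annotations might look like, consider the following description of an artifact, using Dublin Core plus an entirely hypothetical e-portfolio vocabulary (the ep: namespace and all URIs are made up for illustration):

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:ep="http://example.org/eportfolio#">
  <!-- A learning artifact annotated so that it can later be matched
       against a personal development plan or assembled into a
       custom-tailored presentation. -->
  <rdf:Description rdf:about="http://portfolio.example.org/anna/essay-17">
    <dc:creator>Anna Example</dc:creator>
    <dc:date>2007-05-02</dc:date>
    <dc:type>essay</dc:type>
    <ep:demonstratesSkill rdf:resource="http://example.org/skills#TechnicalWriting"/>
    <ep:partOfPlan rdf:resource="http://portfolio.example.org/anna/pdp-2007"/>
  </rdf:Description>
</rdf:RDF>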
Salient Aspects of Semantic Social Software

Semantic Social Software has a number of properties that make it interesting as a research topic besides the two major "sides" described above. In the following, I give an incomplete overview of what I consider salient aspects of Semantic Social Software.
Testbed for Semantic Web Technology

Semantic Social Software is software that can be developed quickly and easily, and can build upon existing applications and principles. Augmenting the existing social and hyperlink structures with formal annotations is rather intuitive and a natural
extension of existing applications. At the same time, Semantic Social Software exhibits many of the promises, and also of the problems, of the big "Semantic Web vision." Examples are improved searching and navigation, personalisation and content adaptation, interoperability, the open world assumption, the coupling of data and metadata, evolving knowledge models, inconsistencies in real-world data (as many authors work on the knowledge model), ontology alignment (as content from different sources is integrated), and so forth. Some of these aspects are further elaborated below. I therefore argue that Semantic Social Software can be an ideal testbed for Semantic Web technology: if problems with the technology arise in Semantic Social Software, they will likely also arise on the Semantic Web at large; if the technology works properly in Semantic Social Software, there is a high chance that it will also work on the Semantic Web at large.
Coupling of Data and Metadata

An aspect that is often overlooked in Semantic Web research is that metadata rarely stands on its own. Instead, metadata is in most cases only relevant with respect to the data it describes. Combined consideration of data and metadata therefore leads to more realistic settings and "connects human and machine intelligence." In Semantic Social Software, the combination of data and metadata is quite natural, as existing Social Software with existing content is merely augmented (and not replaced) by metadata. Combined access to metadata and data in Semantic Social Software could, for example, mean enhanced search and navigation, or context-dependent presentation of human-readable data. A combination of data and metadata requires query and reasoning languages that are capable of processing both data and metadata. An example of such a language is Xcerpt (Schaffert, 2004), a rule-based query
language suitable for any kind of semi-structured data, including RDF and HTML. A further issue that often becomes apparent in Semantic Social Software is the so-called URI crisis, that is, the question of whether a URI used in the metadata refers to a Web page (wiki page, weblog entry) or to the concept described on that page (e.g., "elephant"). Another interesting aspect of coupling data and metadata is the semi-automatic extraction of metadata from the actual data using, for example, natural language processing techniques. For example, a page in a Semantic Wiki could be annotated automatically based on the content of the page. As automatic metadata extraction is currently not perfectly reliable, it would make sense to mark such annotations as "automatic" and give users the opportunity to revise them if needed.
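One possible way to mark an annotation as automatic is standard RDF reification, which makes a statement about the statement itself. The following is only a sketch (a production system might prefer named graphs or a dedicated provenance vocabulary; the prov: and rel: namespaces here are invented):

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rel="http://example.org/wiki-relations#"
         xmlns:prov="http://example.org/provenance#">
  <!-- The automatically extracted annotation itself. -->
  <rdf:Description rdf:about="http://example.org/wiki/Bilberry">
    <rel:instanceOf rdf:resource="http://example.org/wiki/Plant"/>
  </rdf:Description>
  <!-- A reified copy of the same statement, flagged as automatically
       generated so that users can review and revise it. -->
  <rdf:Statement rdf:about="http://example.org/annotations/1">
    <rdf:subject rdf:resource="http://example.org/wiki/Bilberry"/>
    <rdf:predicate rdf:resource="http://example.org/wiki-relations#instanceOf"/>
    <rdf:object rdf:resource="http://example.org/wiki/Plant"/>
    <prov:createdBy>automatic-annotator</prov:createdBy>
  </rdf:Statement>
</rdf:RDF>

Note that this is itself an instance of the "metadata about metadata" discussed above for Semantic Weblogs.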
Emerging and Evolving Knowledge Models

A very interesting property of Semantic Social Software is that knowledge models do not appear in discrete "releases," but slowly and dynamically evolve over time. A knowledge model in, for example, a Semantic Wiki may grow from a small set of annotations for a single page into a full-fledged ontology. Also, some parts of the knowledge model will usually be more formalised than others, reflecting areas with more imminent needs or higher community interest. Such evolving knowledge models raise a number of interesting questions that are also relevant for the Semantic Web as a whole (which will inevitably also be an evolving knowledge model). Primarily, evolving systems will be full of inaccuracies and even inconsistencies, demanding more tolerant formal languages than those proposed today. Also, trust, versioning, and merging/alignment are issues that will need to be addressed appropriately.
Self-Organising Communities Around Emerging Topics

In many Social Software systems, communities organise themselves around certain emerging topics that are relevant at some point in time. In Semantic Social Software, such emerging topics are likely to be formalised more precisely than others, and could be identified automatically by applying appropriate reasoning. Such information could be used to provide readers with information about "what is relevant," similar to the "recent changes" listings in current systems.
Challenges for the Semantic Web Community

In our work with, and considerations about, different kinds of Semantic Social Software, a number of challenges have arisen frequently that should be taken into consideration by the Semantic Web community if the Semantic Web is to be adopted by a wide community. I present some of these challenges in the following.
"Keep It Simple, Stupid!"

Ultimately, Semantic Web technology will only "take off" if it is simple enough to use, that is, if the additional effort is on average significantly less than the actual benefit. Systems that require users to learn complicated formal languages imposing (for "outsiders") arbitrary-looking restrictions will very likely fail. Often, as Jim Hendler said,8 "a little semantics goes a long way." In addition, we need tools that hide most of the complexity of the language from the user, for example, domain-specific interfaces for creating knowledge and visual interfaces for composing queries. Semantic Social Software goes in this direction, but fails when the underlying languages become too expressive (e.g., number restrictions).
Don't Forget the Data in Metadata

As outlined above, metadata is "data about data," and only rarely useful on its own. Considering data and metadata separately therefore makes little sense. Instead, it is necessary to develop formalisms, query languages, and reasoning languages that can handle both data and metadata. Steps in this direction have been made (e.g., in Xcerpt), but they are not in the "mainstream," and there is still a long way to go.
Be Tolerant About Inconsistencies

The world is full of inaccuracies and inconsistencies, particularly when different people with different viewpoints work on the same knowledge model—as is the case in Semantic Social Software, and as will be the case in the Semantic Web. Yet the Semantic Web, like many or most of its predecessors in AI, builds upon classical logic, which is intolerant of inconsistencies by the "ex falso quodlibet" rule, according to which anything follows from a contradiction. What the Semantic Web, as a web of inconsistencies, errors, and inaccuracies, needs are formalisms that do not break and that can work around problematic parts of the knowledge model (e.g., Schaffert et al., 2005). Various such formalisms have been proposed in the pre-Semantic Web era, for example, paraconsistent logics or fuzzy logics.
There Is No "One Size Fits All"

Different applications require different representation formalisms. Whereas languages like OWL are good at representing conceptual hierarchies, they are incredibly bad at representing presentational structures (HTML and CSS do this just fine). What the Semantic Web needs is a flexible and extensible set of formal languages, including formalisms for temporal knowledge, location, rules, situations, events, and so forth. Even
higher-order languages will be useful in many applications, despite their known undecidability.
Reasoning Needs to Be Efficient

Many applications work in close interaction with human users. In such situations, it is unacceptable to have complex reasoning systems that take several minutes even for small data sets. It is necessary to improve the performance of existing systems, and to give developers more fine-grained control over the expressivity and correctness of the reasoner. Often, it is more important to get an almost correct answer within a few milliseconds than to wait several minutes for a completely correct answer.
Reasoning Needs Truth Maintenance

A significant issue with reasoning systems is that their reasoning is not traceable for users. As a consequence, users often get the feeling of being patronised by the system when it makes a decision based on reasoning. For Semantic Web applications, it would thus be desirable to allow the user to see all justifications for a decision. A typical example is when the system infers the type of some resource (e.g., a wiki page) based on the type of a relation to another resource, and the user does not agree with this inference but cannot remove the type directly. In such situations, users should have the option to intervene and modify some of the justifications as needed.
Metadata Needs Versioning

In evolving knowledge models, it is important and interesting to keep track of the modifications made not only to the data but also to the metadata. For this purpose, it is necessary to define what makes up a "unit" or "transaction," as a single modification in the application most likely has effects at many different places in the knowledge model.
Related Work

Many people within the Semantic Web community are investigating Semantic Social Software. Most noteworthy is the Semantic Wiki movement, which has gained a lot of attention, culminating in last year's 1st Semantic Wiki Workshop (Völkel & Schaffert, 2006), colocated with the 3rd European Semantic Web Conference (ESWC06). Other researchers have also worked in the field. An interesting work is the article "Ontologies Are Us" by Peter Mika (2005), which discusses the relation between social networks and semantics on the Semantic Web. Other related works have been mentioned throughout this chapter, most notably various Semantic Social Software systems and Nova Spivack's "Metaweb." Related to the Semantic Web challenges presented in the previous section is Frank van Harmelen's invited talk entitled Where does it Break? or: Why the Semantic Web is not just "Research as Usual" at ESWC06, where he presents similar challenges for the Semantic Web community.
Conclusion and Perspectives

Semantic Social Software, as the combination of Social Software and Semantic Web technologies, has great potential for Social Software developers as well as for Semantic Web researchers. In this chapter, I presented the two different perspectives on Semantic Social Software and gave three example applications. From our experience with developing and working with Semantic Social Software, I derived challenging research issues for the Semantic Web. These research issues need—in my opinion—to be investigated to make Semantic Web technology successful.

In future work, our group will develop a common framework for implementing Semantic Social Software (and other kinds of semantic software), called the "knowledge-based content management system." The aim of this system is to
provide a robust foundation for storing, accessing, and processing content as well as metadata. The system will, hopefully, address some of the issues mentioned in the previous section. The combination of Semantic Web technology and Social Software has also been chosen as the next research focus of Salzburg NewMediaLab, Austria's industrial competence centre for research on digital content.
Acknowledgment

This work has been partly funded by Salzburg NewMediaLab, a competence centre in the Kind action line funded by the Austrian Federal Ministry of Economics and Labor (BMWA) and the state of Salzburg.
References

Barrett, H.C. (2005). Researching electronic portfolios and learner engagement (Tech. Rep.). The REFLECT Initiative.

Hilzensauer, W., Hornung-Prähauser, V., & Schaffert, S. (2006, September). Requirements for personal development planning in e-portfolios supported by Semantic Web technology. In 6th International Conference on Knowledge Management (I-KNOW06), Graz, Austria.

Karger, D.R., & Quan, D. (2004). What would it mean to blog on the Semantic Web? In Proceedings of the 3rd International Semantic Web Conference (ISWC04), Hiroshima, Japan. Springer-Verlag.

Krötzsch, M., Vrandecic, D., & Völkel, M. (2005). Wikipedia and the Semantic Web – the missing links. In Proceedings of WikiMania 2005.

Mika, P. (2005, November). Ontologies are us: A unified model of social networks and semantics. In Proceedings of the 4th International Semantic
Web Conference (ISWC 2005), Galway, Ireland (LNCS 3729). Springer-Verlag.

Möller, K., Bojars, U., & Breslin, J.G. (2006, June). Using semantics to enhance the blogging experience. In Proceedings of the 3rd European Semantic Web Conference (ESWC06), Budva, Montenegro (LNCS 4011). Springer-Verlag.

Oren, E. (2005). SemperWiki: A semantic personal wiki. In 1st Workshop on The Semantic Desktop, colocated with ISWC05, Galway, Ireland.

Schaffert, S. (2004, October). Xcerpt: A rule-based query and transformation language for the Web. PhD thesis, University of Munich.

Schaffert, S. (2006, June). IkeWiki: A semantic wiki for collaborative knowledge management. In 1st International Workshop on Semantic Technologies in Collaborative Applications (STICA'06), Manchester, UK.

Schaffert, S., Bry, F., Besnard, P., Decker, H., Decker, S., Enguix, C., et al. (2005, November). Paraconsistent reasoning for the Semantic Web (Position paper). In Workshop on Uncertainty Reasoning for the Semantic Web (URSW05) at ISWC05, Galway, Ireland.

Schaffert, S., Gruber, A., & Westenthaler, R. (2005, November). A semantic wiki for collaborative knowledge formation. In Semantics 2005, Vienna, Austria. Trauner Verlag.

Schaffert, S., Westenthaler, R., & Gruber, A. (2006, June). IkeWiki: A user-friendly semantic wiki. In Proceedings of the 3rd European Semantic Web Conference (ESWC06) – Demonstrations Track, Budva, Montenegro.

Tazzoli, R., Castagna, P., & Campanini, S.E. (2004). Towards a semantic wiki wiki Web. In 3rd International Semantic Web Conference (ISWC2004), Hiroshima, Japan.

Tosh, D., & Werdmuller, B. (2004). Creation of a learning landscape: Weblogging and social
networking in the context of e-portfolios (Tech. Rep.). University of Edinburgh.

Völkel, M., Krötzsch, M., Vrandecic, D., Haller, H., & Studer, R. (2006). Semantic Wikipedia. In Proceedings of the 3rd European Semantic Web Conference (ESWC06) – Poster Track, Budva, Montenegro.

Völkel, M., & Oren, E. (2006). Personal knowledge management with semantic wikis.

Völkel, M., & Schaffert, S. (Eds.). (2006). 1st Workshop "From Wiki to Semantics" (SemWiki'06), colocated with ESWC'06, Budva, Montenegro.
Endnotes

1 http://en.wikipedia.org/w/index.php?title=Social_software&oldid=82649096
2 http://www.technorati.com
3 http://novaspivack.typepad.com/nova_spivacks_weblog/2004/03/from_applicatio.html
4 http://novaspivack.typepad.com/nova_spivacks_weblog/2004/04/new_version_of_.html
5 http://protege.stanford.edu/
6 http://www.elgg.org
7 http://www.osportfolio.org/
8 As conference chair in the opening speech of the 2003 International Semantic Web Conference, Sanibel Island, Florida, USA, October 2003.
Section II
Semantic Work Environment Tools
Chapter IV
SWiM: A Semantic Wiki for Mathematical Knowledge Management

Christoph Lange, Jacobs University, Germany
Michael Kohlhase, Jacobs University, Germany
Abstract

In this chapter, we present the SWiM system, a prototype semantic wiki for collaboratively building, editing, and browsing mathematical knowledge. SWiM is based on the semantic wiki IkeWiki, but replaces the wiki text with OMDoc, a markup format and ontology language for mathematical documents, as the underlying knowledge representation format. Our long-term objective is to evolve SWiM into an integrated platform for ontology-based added-value services. As a social semantic work environment, it will facilitate the creation of a shared, public collection of mathematical knowledge (e.g., for education) and serve scientists as a tool for the collaborative development of new theories. We discuss the architecture of the SWiM system, focusing on its conceptual base, the OMDoc system ontology. In contrast to other semantic wikis, SWiM uses the system ontology to operationalize the fragments and relations of the underlying representation format, not only the domain ontology, that is, the relations between the represented objects themselves. We present the prototype implementation of the SWiM system and propose its further evolution into a service platform for science and technology.
Introduction

The Internet plays an ever-increasing role in our everyday life, and science is no exception. The way we do (conceive, develop, communicate about, and publish) scientific or mathematical knowledge will change considerably in the next 10 years. In particular, most of these activities will be supported by scientific services, that is, software systems connected by a commonly accepted distribution architecture. It is a crucial but obvious insight that true cooperation of such services is only feasible if they have access to a joint corpus of knowledge. A central prerequisite for this is a technology capable of creating, maintaining, and deploying content-oriented (a.k.a. "semantic") libraries of science on the Web that make the structure of scientific knowledge explicit, so that it can serve as a context for the objects of science. As mathematics is a discipline that has always been especially aware of the syntax/semantics distinction and has treated its theories as first-class objects, we will pursue this ambitious goal for mathematical knowledge first and extend the methods and technologies to the (natural) sciences and engineering later.

If we extend Tim Berners-Lee's vision of a "Semantic Web" as a Web of data for applications to mathematics, many services come to mind: cutting and pasting mathematical text from a search engine into a computer algebra system; automated checking or explanation of proofs published on the Web, if they are sufficiently detailed and structured; semantic search or data mining for mathematical concepts ("Are there any objects with the group property out there?"); classification (given a concrete mathematical structure, is there a general theory for it?); and many more. All of these services can currently only be performed by humans, limiting the accessibility and thus the potential value of the information. On the other hand, the content-oriented mathematical libraries can only be generated by
humans, as has been proved by the successful PlanetMath project (Krowne, 2003; “PlanetMath,” 2007), which features free, collaboratively created entries on more than 10,000 mathematical concepts. PlanetMath, however, is not completely machine-understandable. There is a fixed set of metadata associated with each article, including its type (definition, theorem, etc.), parent topic, Mathematics Subject Classification, synonyms, and keywords, but the content itself is written in LaTeX and can only be searched in full-text mode.
Semantic Mathematical Knowledge Markup with OMDoc

We make use of structural/semantic markup approaches using formats such as OpenMath (Buswell, Caprotti, Carlisle, Dewar, Gaetano, & Kohlhase, 2004), Content MathML (Carlisle, Ion, Miner, & Poppelier, 2003), and OMDoc (Open Mathematical Documents; Kohlhase, 2006), the latter of which embeds and extends the former ones. These formats, constituting the state of the art for representing mathematical knowledge, are now used in a large set of projects in automated theorem proving (Müller, 2006a), e-learning (Melis et al., 2006), e-publishing, semantic search (Kohlhase & Şucan, 2006), and in formal digital libraries. OMDoc builds on OpenMath or Content MathML for mathematical formulae and extends them by an infrastructure for context and domain models from "formal methods." In contrast to those, a structural/semantic approach does not require the full formalization of mathematical knowledge, but only the explicit markup of important structural properties. For instance, a statement will already be considered "true" if there is a proof object that has certain structural properties, not only if there is a formally verifiable proof for it. This allows OMDoc to be used, for example, as a development/migration format,
which offers "structural" services while the content is still "false." Since the structural properties are logic-independent, a commitment to a particular logical system can be avoided without losing the automatic knowledge management, which is missing for semantically unannotated documents. Work on OMDoc and SWiM shows that many added-value services in knowledge management do not need tedious formalization, but can be based on the structural/semantic level. OMDoc assumes a three-layered structure model for semantic representation formalisms:

• Object Level: represents objects such as complex numbers, differential equations, finite groups, and so forth. Semantic representation formats specify the logical structure of objects rather than their presentation. This avoids ambiguities, which would otherwise arise from domain-specific representations.

• Statement Level: the sciences (natural/social/technological) are concerned with modeling our environment, more precisely with statements about the objects in it. We can distinguish different types of statements: model assumptions, their consequences, hypotheses, and measurement results. All of them state relationships between scientific objects and have to be verified or falsified in theories or experiments. Moreover, all these statements have a conventionalized structure, such as Exercise, Definition, Theorem, Proof, and a standardized set of relations among each other, both of which are part of the system ontology. For instance, a model is fully determined by its assumptions (also called axioms); all consequences are deductively derived from them (via theorems and proofs), and therefore their experimental falsification uncovers false assumptions of the model.

• Theory/Context Level: representations always depend on the ontological context; even the meaning of a single symbol—for example, the glyph h as the height of a triangle or as Planck's quantum of action—is determined by its context, and, depending on the current assumptions, a statement can be true or false. Therefore the sciences (with mathematics leading the way) have formed the habit of fixing and describing the situation of a statement. OMDoc makes this structure explicit. In mathematical logic, a theory is the deductive closure of a set of axioms, that is, the (in general infinite) set of logical consequences of the model assumptions. Even though this fully explains the phenomenon of context in theory, important aspects like the re-use of theories, knowledge inheritance, and the management of theory changes are disregarded completely. Therefore, formalisms with a context level use elaborate inheritance structures for theories, for example, in the form of ontologies in the Semantic Web or in the form of "algebraic specifications" in program verification.

An important trait of the three-layer language architecture is the inherent dependency loop between the object and theory levels, mediated by the statement level: the objects obtain their meaning from the theories their functional components are at home in, and the theories are constituted by special statements and, in particular, the objects that are contained in them. Making these structures explicit enables the mechanization and automation of knowledge management and the unambiguous, flexible communication of mathematical objects and knowledge that is needed for meaningful interoperability of software systems in science.

In fact, we have experimented with extending this three-level architecture pioneered by the OMDoc format beyond mathematics. For physics, we only had to introduce three new statement-level constructs: observables, systems, and experiments (Hilf, Kohlhase, & Stamerjohanns, 2006); for chemistry and computer science, only one at the statement level each, and one each at the object level (molecules and code fragments). In particular, the OMDoc theory level did not change at all; it seems to constitute a description of what is sometimes called "the scientific method." Therefore, we are confident that the conceptual core of the SWiM system presented here can be extended to cover all STEM (Science, Technology, Engineering, and Mathematics) disciplines.
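The following schematic fragment illustrates how the three levels interlock in markup. It is modeled on OMDoc 1.2 elements (theory, imports, symbol, definition, assertion), but it is simplified and abbreviated, and the identifiers and the referenced document are invented for illustration:

<!-- Theory/context level: a theory that imports another theory. -->
<theory xml:id="continuity">
  <imports from="real-functions.omdoc#functions"/>
  <!-- A symbol introduced by this theory; inside statements, formulae
       about it would be written at the object level in OpenMath or
       Content MathML. -->
  <symbol name="continuous"/>
  <!-- Statement level: a definition and an assertion. -->
  <definition xml:id="continuous-def" for="continuous">
    <CMP>A function f is continuous if ... (the usual epsilon/delta definition).</CMP>
  </definition>
  <assertion xml:id="diff-cont" type="theorem">
    <CMP>All differentiable functions are continuous.</CMP>
  </assertion>
</theory>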
Cross-Fertilization of Mathematical Knowledge Management and Semantic Wikis

Even though the work reported here was initially motivated by solving the semantic prisoner's dilemma in mathematical knowledge management (MKM) described later, we contend that the new application area MKM can also contribute to the development of semantic wikis in particular and semantic work environments in general.
Semantic Wikis: State of the Art and Future Trends

A semantic wiki is a wiki with an "underlying model of the knowledge described in its pages" (Wikimedia Foundation, 2006b). The concept of a semantic wiki is not definitive as of 2007, for semantic wikis did not emerge until late 2004. For an overview, see Lange (2007, section 1.4). Common semantic wikis, most of which are still research prototypes, combine the wiki principle with Semantic Web technologies. Most of them utilize formal languages like RDF (Lassila & Swick, 1999) or OWL (McGuinness & Harmelen, 2004) for annotating pages and links with semantic information. SweetWiki (see Chapter VIII), for example, features a wiki system ontology with classes like "page" or "user" and lets users tag pages with concepts from several OWL domain ontologies. Other well-known semantic wikis
include Semantic MediaWiki (Krötzsch, 2008; Völkel, Krötzsch, Vrandečić, Haller, & Studer, 2006), an attempt to integrate the Wikipedia into the Semantic Web, and IkeWiki (Schaffert, 2006), which is intended as a social tool for knowledge engineering. The first well-known semantic wikis are Platypus (Tazzoli, Castagna, & Campanini, 2004) and WikSAR (Aumüller & Auer, 2005), but they are not under active development any more. In many semantic wikis, including SWiM, IkeWiki, and Semantic MediaWiki, one page describes one real-world concept. The page itself and its links to other pages are typed using terms from ontologies. Importing and exporting RDF from and to the Semantic Web is possible in most semantic wikis, and the knowledge base can be queried. Some semantic wikis employ a reasoning engine to infer additional knowledge from the explicit annotations that users made. Recent trends in wikis in general include:

• Improved user experience through WYSIWYG editing—as, for example, in SweetWiki and IkeWiki (see Chapter III),
• Integration with other social software like weblogs—SnipSnap ("SnipSnap," 2007) is an example of a wiki with blogging functionality and some semantic features ("Labels"),
• A standardized Web service interface and data exchange format for automated clients and distributed systems (Völkel et al., 2006), as well as
• Better spam and vandalism protection (especially on large, public wiki sites like Wikipedia).
Another important aspect is bringing semantic wikis to nonscientists. The by far most successful software in this regard is Semantic MediaWiki, which, despite its intention, is not yet used by Wikipedia (mainly for performance and scalability reasons), but by numerous other sites (Semantic
Wiki Interest Group, 2007b). The trends mentioned for general wikis also prevail in the area of semantic wikis, with the addition of:

• Incorporation of multimedia (metadata/ontologies/database management systems) (Nixon & Simperl, 2006; Popitsch, Schandl, Amiri, Leitich, & Jochum, 2006),
• Visualization of the RDF graph of the knowledge inside the wiki (Eronen & Röning, 2006) and incorporation of visual mapping techniques (Haller, Kugel, & Völkel, 2006), and
• Formalizing the knowledge in a semantic wiki more stringently—from ad-hoc database queries to DL reasoning (Vrandečić & Krötzsch, 2006; Krötzsch, Vrandečić, Page, et al., 2007).
Most semantic wikis serve general purposes; as of January 2007, there is only one other semantic wiki dedicated to MKM besides SWiM: se(ma)2wi (Zinn, 2006), an installation of Semantic MediaWiki. OMDoc content from the ActiveMath learning environment (Melis et al., 2006) has been automatically converted to MediaWiki's syntax via XSL transformations and imported into the wiki. The pages are categorized by their OMDoc statement types (e.g., definition) and annotated with learning metadata from ActiveMath. All other semantic information from OMDoc is lost in the conversion: the formulae are given in presentational-only LaTeX, and the links between wiki pages that represent mathematical statements, for example a link from a theorem to its proof, are not typed, as they are in SWiM.
Benefits of a Semantic Wiki for MKM

SWiM is intended to encourage collaboration: Nonmathematicians can jointly create a "Wikipedia of mathematics" by compiling the knowledge available so far, while scientists can collaboratively develop new theories. However, to encourage users to contribute, wiki-like openness to anybody does not suffice. Unlike the text formats used by common semantic wikis, the OMDoc format makes the fine-grained semantic structure implicit in the text explicit in the markup, making it tedious to author by hand. Moreover, only after a substantial initial investment (writing, annotating, and linking) on the author's part can the community benefit from the added-value services supported by the format—for example, the creation of customized textbooks with ActiveMath. If the author and beneficiary of such services were different persons, though, only few persons would be willing to contribute to a knowledge base. This "semantic prisoner's dilemma" (see Chapter XI) can be overcome when the authors themselves are rewarded for their contributions by being offered special added-value services, which improve immediately the more annotations and cross-references the users contribute. In semantic wikis, there are even core services that instantly gratify users for their contribution to the knowledge graph: dynamic navigation links visualizing part of the semantic annotations in the current context directly depend on the page contents editable by the user and thus instantly open up new views on related topics to the user (Aumüller, 2005, section 3.2).

Figure 1. The Semantic Web authoring challenge

In terms of Chapter III, SWiM is both semantically enhanced social software and a socially
enabled semantic Web application: first, it enhances a wiki by ontology-based added-value services and uses a semantic markup format, and second, its added-value services lead to a better connection between contributions of different users (consider, for example, linking assistance by auto-completion).
An Alternative Semantic Web

Most semantic work environments are based on ideas and techniques from Berners-Lee's Semantic Web. The Semantic Web uses RDF triples to describe resources and the background knowledge in ontologies to draw inferences about their content. While, however, in most areas of the Semantic Web, resources (anything with a URI, for example fragments of XML documents) and statements about them (the ABox, "assertional box") are separated from their context (a vocabulary and background knowledge encoded in an ontology, called the TBox, "terminological box"), the content resp. context markup language OMDoc can express both of them, using the above-mentioned three levels of formalization: all OMDoc statements must be made in proper context. The context is provided by theories and, syntactically, indicated by certain content markup. Consider the statements "all differentiable functions are continuous" (background knowledge) and "the derivative of the sine function, evaluated at 0, is 1" (assertion about resources): both can be expressed as OMDoc statements. This turns collections of OMDoc documents into referentially closed systems (all the knowledge referred to, that is, both ABox and TBox, can be expressed in the system itself), which in turn allows ontological bootstrapping: the ontologies needed to draw inferences can be built up as we build up the data.

Note that only part of the mathematical knowledge embedded in mathematical documents can be exploited for ontological reasoning, as it cannot faithfully be expressed in first-order logic (much
less so in DL or RDF without reification). For the sake of this argument, we will use the term Web ontology language synonymously with "description logics" (DL), as in OWL-DL; if we pass to more expressive logics like first-order logic, we lose decidability and thus the raison d'être for Web ontologies. Consider, for instance, the theorem that all differentiable functions are continuous (C1(R,R) ⊆ C0(R,R)). This subset relation can easily be expressed in DL (C1(R,R) ⊑ C0(R,R)), if both C1(R,R) and C0(R,R) are declared as classes, but its justification via ε/δ arguments and the definition of Ck(R,R) cannot. Thus, any Web ontology that deals with objects such as the ones above will necessarily have to approximate the underlying mathematical knowledge. Generally in science, knowledge comes in documents and constitutes the context, whereas DL ontologies only reference and approximate the knowledge in a document. Therefore, with OMDoc we propose an alternative vision for a "Semantic Web for science and technology" where the ontologies necessary for drawing inferences are views derived from normative documents.

Taxonomic relations are, however, only implicitly expressed in OMDoc: one can write statements (definitions, assertions, and so forth) about mathematical concepts, which, for example, state that one concept (e.g., the set C1(R,R)) is a subset of another one (e.g., the set C0(R,R)). Where ontological fragments cannot be derived automatically (an interesting research problem in itself), they can be embedded into OMDoc-encoded documents as OWL, and later simply extracted. Thus OMDoc, as a document format with embedded taxonomic information, serves as its own ontology language.

Thus, for the time being, we restrict ourselves to the semantic information that can explicitly be expressed in OMDoc: that from the theory and statement levels, for example, the import relation between theories. A subset of OMDoc's system ontology covering these levels is presented in Figure 2. In particular, the enhancements of the data model that semantic wikis bring along—compared to traditional wikis—are already present in the OMDoc format, so that an OMDoc-based wiki only needs to operationalize their underlying meaning. For example, typed links, which are implemented via an extension to the wiki syntax in Semantic MediaWiki or are editable through a separate editor in IkeWiki, are realized in OMDoc by attributes such as the for attribute on statements. It remains for the wiki to make them easily editable and to visualize them adequately. More than a general semantic wiki, one targeted at mathematics must ensure that dependencies between concepts are preserved. Results in this area will be interesting for nonmathematical semantic wikis as well, especially when they support higher levels of formalization such as ontologies.
Figure 2.
Design of SWiM

Before we can go into the design of SWiM, its user interaction, and its added-value services, we will concern ourselves with its information model: what a wiki page should comprise, what semantic information can be inferred from the OMDoc documents and the user interaction logs, and finally how this can be utilized.

Data Model

The smallest unit in a wiki that can be displayed, edited, linked to, or archived is a page. Editing of and linking to page sections is possible in some wikis, such as MediaWiki, but only in a limited way. In a non-semantic wiki, one page is a document with arbitrary contents. In a semantic wiki, one page usually describes one concept; for SWiM, we have to consider the following aspects:
• Ontology: Which OMDoc elements can be considered as concepts? This should be decided in consistence with the OMDoc system ontology.
• Usability: Small pages generally improve wiki usability, but it should not be too hard for users to split up a document into small pages.
• Validity: Wiki pages should stay valid OMDoc XML documents to ensure higher quality.
OMDoc groups closely related concepts into "theories" and advises following a "little theories approach" (Farmer, Guttman, & Thayer, 1992), where theories introduce as few new concepts as possible. A theory may introduce more than one concept if they are interdependent; for example, to introduce the natural numbers via the Peano axioms, we need to introduce the set of natural numbers, the number zero, and the successor function at the same time. We follow this intuition and recommend that users restrict the pages of SWiM to single (little) theories to keep them manageable. Moreover, OMDoc distinguishes the knowledge elements in theories into constitutive
ones, like symbols, definitions, and axioms (these are indispensable for the meaning of the theory), and nonconstitutive ones, such as assertions, their proofs, alternative definitions of concepts already defined, and examples. OMDoc supports rolling out the latter into separate documents, provided they reference their "home theory." So does SWiM, and it additionally allows constitutive elements to be rolled out. Small pages improve the effectiveness of wiki usage, as they facilitate editing and re-use by linking, and allow for a better overview through lists of recent changes and other automatically generated index pages. To facilitate the import of large documents (e.g., courseware), SWiM does not force users to create small pages; actually, any valid OMDoc document can be a SWiM page. The user is, however, rewarded for giving each statement its own page, as most semantic annotations (e.g., page and link type) are only extracted per page. As the presentation module supports the inclusion of the contents of one document fragment into another document, the reader, or learner, can still coherently view a whole theory on one Web page.
The System Ontology, the Core of SWiM

In order to facilitate making statements (here, RDF triples) about OMDoc concepts and, in particular, to help Semantic Web applications re-use the knowledge contained in OMDoc documents, a common vocabulary, that is, an ontology, has to be fixed. As this ontology "express[es] the document structure [. . .] as an ontology," we will use the term system ontology coined in Krieg-Brückner, Lindow, Lüth, Mahnke, and Russell (2004, p. 280) for it. In Müller (2006b, p. 5), it is more precisely defined as "an ontology describing the data model of a system or the representation language the system and its applications are based on independently of their respective syntactical realization."
SWiM differs from other semantic wikis in that it comes with a prebundled system ontology of a domain-specific data format. Most general-purpose semantic wikis allow for importing or modeling arbitrary ontologies for arbitrary domains, but due to the great differences between those ontologies, they cannot utilize them further than for displaying navigation links, some editing assistance, and searching. As the OMDoc system ontology contains uniform terms for semantic relations between mathematical statements and theories, like "dependency" or "containment," added-value services built on top of SWiM can be formally specified in terms of the system ontology. The most fundamental concepts in OMDoc, which constitute documents or parts of them, are theories and statements. The fundamental concepts in a wiki are pages and users. Basic relations are provided by the individual theories and statements. Then, there are basic relations given by the user interaction logs. Further, inferable relations can be defined as transitive closures of the former and as unions of OMDoc and wiki relations. Finally, there are other useful relations that the authors have to provide by manual annotations.
OMDoc Types and Relations

Relations between OMDoc concepts include the import of one theory by another, the containment relation between a statement and its home theory, and the further specification of one statement through another one, for example, a definition defining a symbol, an example exemplifying a statement, or a proof proving an assertion. All of the former can be subsumed by a generic "concept–depends-on–concept" relation, which is transitive. So far, SWiM does not extract semantic relations from OMDoc's object level. The latter would be suitable for a future integration of computer algebra systems or automated theorem provers.
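In OWL-DL, such a generic dependency relation can be declared transitive, with the more specific relations as subproperties. The following is a minimal sketch, with names chosen for illustration rather than taken from the actual ontology files:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xml:base="http://example.org/omdoc-system-ontology">
  <!-- The generic "concept depends on concept" relation; declaring it
       transitive lets a DL reasoner compute the dependency closure. -->
  <owl:ObjectProperty rdf:ID="dependsOn">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#TransitiveProperty"/>
  </owl:ObjectProperty>
  <!-- More specific relations, such as theory import, specialize it. -->
  <owl:ObjectProperty rdf:ID="imports">
    <rdfs:subPropertyOf rdf:resource="#dependsOn"/>
  </owl:ObjectProperty>
</rdf:RDF>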
Relations Given by User Interaction (to be implemented)

The basic relation given by user interaction is "Who edited which page when?" This information is available for free in a wiki; it can be logged when a page is saved. Accordingly, a relation stating that a user read a page could be defined—this is, however, hard to determine because of HTTP caching. Further relations can be defined by user feedback on navigation choices proposed by the wiki.
Inferable Relations (to be implemented)

Further relations can be inferred from those introduced so far. One example is a metric estimating the degree of difficulty of a concept, calculated by counting the questions on its discussion page—as soon as discussion pages (as known from MediaWiki and widely used in Wikipedia) and additional OMDoc elements for structuring them (questions, explanations, opinions) have been implemented. From the user interaction log, sets of related pages can be identified that are not already related through dependency. For this purpose, a notion of transaction must be introduced, that is, edits carried out by the same user in short succession. Just as for products bought in online shops, two theories are considered "related" when many users edited them in the same transactions. Even more sophisticated relations can be inferred from both the OMDoc and the SWiM interaction relations. The software could, for example, track how many examples for a theory users read, and improve the difficulty estimation by including those statistics.
Metadata

Furthermore, SWiM should allow the user to search and enter metadata associated with any wiki page. OMDoc allows for annotating documents and fragments of them with Dublin Core metadata
and Creative Commons licensing information (Kohlhase, 2006, Chapter XII). Some of these, such as the last editor and the last editing date of a page, are recorded by any wiki, while others, such as dc:description, will be made editable via a separate form. IkeWiki, on which SWiM is based, supports metadata about pages, so integrating OMDoc metadata into IkeWiki's user interface is an important to-do item.
OWL-DL Implementation

So far, a subset of the OMDoc system ontology has been modeled in OWL-DL, covering most of the ST module (statements; Kohlhase, 2006, Chapter XV) and a small part of the PF module (proofs; Kohlhase, 2006, Chapter XVII). Other parts of OMDoc still need to be modeled, for example, abstract data types (module ADT, Chapter XVI) and complex theories and development graphs (modules CTH and DG, Chapter XVIII). The concrete implementation of the ontology has been designed in a modular, extensible way using owl:imports. One main file includes the files that model the respective OMDoc modules, and each of them inherits the generic base classes and relations defined in another shared document. It is being continuously expanded and adjusted to the needs of SWiM and other OMDoc applications. Figure 2 shows the most important concepts (in OWL: classes) and relations between them (in OWL: object properties) that have been modeled.
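Structurally, such a modular ontology might look roughly as follows; the file URIs are hypothetical stand-ins for the actual ontology documents:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <!-- Main file: pulls in one OWL document per OMDoc module; each of
       those in turn imports the shared base classes and relations. -->
  <owl:Ontology rdf:about="http://example.org/omdoc-ontology/main">
    <owl:imports rdf:resource="http://example.org/omdoc-ontology/base"/>
    <owl:imports rdf:resource="http://example.org/omdoc-ontology/st"/>
    <owl:imports rdf:resource="http://example.org/omdoc-ontology/pf"/>
  </owl:Ontology>
</rdf:RDF>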
Extracting Semantics from OMDoc

XML per se does not carry any explicit semantics, so ways must be found of extracting the implicit semantics—as, for example, specified in a human-readable documentation of the respective XML document format—and making it explicit as an RDF ABox (possibly backed by an RDFS or OWL TBox). An informal solution to that problem
is writing an ad-hoc XSL transformation that is specialized to a specific XML input format (OMDoc in our case) and outputs an RDF/XML ABox using a specific ontology (the OMDoc system ontology in our case), as proposed by the GRDDL (Gleaning Resource Descriptions from Dialects of Languages; World Wide Web Consortium, 2006) W3C standard. SWiM cannot use RDF/XML directly; an in-memory representation of an RDF graph as a result of the ABox extraction would be preferred. So far, this is hard-coded in a mixture of procedural and declarative style, but we are looking for a more formal solution that is also more scalable. WEESA (Reif, Gall, & Jazayeri, 2005), a powerful declarative language that defines a mapping from XML to RDF using classes and properties from a given OWL ontology and XPath expressions for extraction, looks promising and will be evaluated for integration into SWiM.
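A GRDDL-style extraction can be sketched as an XSL transformation like the following. This is purely illustrative and not the actual SWiM code; in particular, the OMDoc namespace URI and the ontology URIs are assumptions:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:omdoc="http://www.mathweb.org/omdoc"
    xmlns:sys="http://example.org/omdoc-ontology#">
  <xsl:template match="/">
    <rdf:RDF>
      <xsl:apply-templates select="//omdoc:theory"/>
    </rdf:RDF>
  </xsl:template>
  <!-- For every OMDoc theory, emit an ABox individual typed with the
       system ontology's Theory class. -->
  <xsl:template match="omdoc:theory">
    <rdf:Description rdf:about="#{@xml:id}">
      <rdf:type rdf:resource="http://example.org/omdoc-ontology#Theory"/>
      <xsl:apply-templates select="omdoc:imports"/>
    </rdf:Description>
  </xsl:template>
  <!-- Each theory import becomes a sys:imports triple. -->
  <xsl:template match="omdoc:imports">
    <sys:imports rdf:resource="{@from}"/>
  </xsl:template>
</xsl:stylesheet>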
User Interface

Rendering

Pages are presented to the user in a human-readable form (XHTML plus Presentation MathML) generated by a multi-pass XSL transformation (Kohlhase, 2006, Chapter XXV). The XHTML contains inline hyperlinks where appropriate, for instance, from a symbol in a formula to its definition. As OMDoc documents, however, need not contain any human-readable sections or comments—after all, the knowledge base might be used to support a theorem prover, not to create a textbook!—there is also a source code view with lines indented, keywords highlighted, and URIs displayed as hyperlinks. An intermediate view mode that displays mathematical objects in the source code as formulae, using MathML or TeX-generated images, is planned.
Dynamic Navigation Links

Displaying dynamic links to concepts related to the concept described on the current page—such as import links on a theory page—improves usability by answering the questions "Where am I?" and "Where can I go?" (Aumüller, 2005). If the knowledge base contains RDF triples with the current page as subject or object, these triples are displayed on a navigation bar (labeled "references"), grouped into incoming and outgoing links and by their type (i.e., the predicate; see Figure 3).
Editing

While wiki pages can be edited in a WYSIWYG editor in IkeWiki, this is not yet possible for OMDoc pages in SWiM, as OMDoc is much more complex than wiki markup. Currently, OMDoc pages can only be edited in a slightly simplified raw XML representation. Following the wiki spirit, there is a short-hand syntax for links: a statement on a theory page can be referenced as theory#statement instead of by the full URI reference. More comprehensive measures to make editing more user-friendly are planned.
Implementation Notes

SWiM, as presented in this chapter, is currently in a prototype stage under active development; it is available under the GNU General Public License from http://kwarc.info/projects/swim/. We have based our system on IkeWiki as a development platform because of the modular design of its back-end and GUI, its rich Semantic Web infrastructure—including the Jena Semantic Web framework ("Jena," 2007), the Pellet OWL-DL reasoner (Sirin, Parsia, Grau, Kalyanpur, & Katz, 2006), and RDF import/export—and its user assistance for annotation. IkeWiki is implemented
Figure 3. A theory page with navigation links
in Java, using Java Server Pages. The wiki pages are stored in a PostgreSQL database, which also hosts Jena's RDF triple store. The triple store not only contains page metadata and semantic links extracted from the pages, but also RDF representations of the ontologies used by the system—including Dublin Core for the metadata, as well as the OMDoc system ontology. Some parts of SWiM are, however, very different from IkeWiki's operating principles and hence required substantial amounts of refactoring and rewriting; for example:

• The presentation view of an OMDoc page cannot be generated by a single-pass XSL transformation from OMDoc to XHTML+MathML; instead, the multi-pass OMDoc presentation workflow (Kohlhase, 2006, sec. 25) had to be adopted.
• In contrast to wiki text pages, which can (optionally) be edited in a WYSIWYG editor, OMDoc pages can so far only be edited via the source editing interface.
• The semantic relations between OMDoc theories are not exclusively stored as RDF triples, as is the case with semantic relations between IkeWiki pages; instead, SWiM has to keep the annotations in OMDoc synchronized with the knowledge base, which is still used for reasoning.
We did not integrate OMDoc into IkeWiki by replacing the code handling traditional wiki text by new, OMDoc-specific code. Instead, we extracted generic code to base classes and created subclasses both for wiki text and for OMDoc. This peaceful coexistence of two formats can easily be extended to other semantic formats, such as future OMDoc derivatives for other sciences. The prototype (see Lange (2007) for details) still has many limitations, which we plan to overcome in the future (see https://trac.kwarc.info/swim/ for bug tracking and feature plans).
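Sketched in the spirit of Figure 4 (all class and method names are our own illustrations, not IkeWiki's actual code), the refactoring amounts to pulling generic page handling into a base class and letting formats coexist as subclasses:

    /** Generic page handling lives in the base class; wiki text and OMDoc
        coexist as subclasses, and further formats could be added the same way. */
    abstract class Page {
        final String name;
        Page(String name) { this.name = name; }
        abstract String render(String source);                        // presentation view
        abstract java.util.List<String> extractLinks(String source);  // feeds the triple store
    }

    class WikiTextPage extends Page {
        WikiTextPage(String name) { super(name); }
        String render(String src) { return "<p>" + src + "</p>"; }    // single-pass rendering
        java.util.List<String> extractLinks(String src) { return java.util.List.of(); }
    }

    class OmdocPage extends Page {
        OmdocPage(String name) { super(name); }
        String render(String src) { return multiPassOmdocPresentation(src); }
        java.util.List<String> extractLinks(String src) { return java.util.List.of(); }
        // Stub standing in for the multi-pass OMDoc presentation workflow.
        private String multiPassOmdocPresentation(String src) { return src; }
    }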
Figure 4. Refactoring of wiki classes before adding OMDoc functionality
• Feedback in case of invalid XML and semantic inconsistencies needs improvement.
• The ABox extraction (i.e., the mapping from XML elements to ontological concepts) should be formalized instead of being hard-coded.
• The metadata editor and storage facilities of IkeWiki and the metadata capabilities of OMDoc must be integrated.
• The rendering must be improved w.r.t. readability (rendered formulae in the source view) and navigation (inline links not only in the navigation box, but also in the document view).
• Exporting the ABox to other Semantic Work Environments as RDF/XML is possible, but importing knowledge must be implemented: importing RDF is possible but not yet accessible through wiki pages, and importing OMDoc is not yet supported.
• So far, we have not optimized SWiM for performance. The rendering of pages in presentation mode will receive major optimizations, for example, by caching.
Future Work: SWiM as a Service Platform

Over the next two years, we plan to evolve SWiM into an integrated platform for added-value services for technology and science.
The base system of SWiM—that is, IkeWiki with OMDoc as its page format—is almost complete, and its advantages and shortcomings have been evaluated (Lange, 2007, section 5). Realizing the features for lowering the burden of contributing to a SWiM site, for rewarding users for their contributions, and for exploiting the contributed knowledge will not, however, be possible through simple hacks or ad-hoc development. Furthermore, it is crucial that the system and the services be independent of a particular scientific domain and of the content entered into the system—otherwise, the system itself would have to be ported to a new domain, not only the disciplinary content exchanged. So to arrive at a sensible level of integration, we need a service interface that only builds on the OMDoc system ontology. Furthermore, we aim to make the system as independent of the concrete OMDoc language as possible (to make it applicable to other sciences); therefore, we plan to abstract the system ontology to an "upper system ontology"1 (which models the information about "containment" or "dependency" needed by the services), of which the OMDoc system ontology is a specialization. This will allow application of certain services to all specializations of this upper system ontology, that is, to other scientific markup languages. We will not restrict the system ontology to the language part, but model the whole system (i.e., SWiM) in the system ontology, which is part of the system architecture and not editable by the user. Note that the wiki-specific part of the ontology is already available in IkeWiki. The SWiM prototype does not yet utilize more of it than IkeWiki already does, but some of the planned interactive services will. Basing part of the system's behavior on an explicitly represented ontology rather than hard-coding it is beneficial, since it supports conceptual clarity and rapid prototyping. Including the SWiM components in the ontology allows including, for example, user interactions in the system model. To determine the requirements for the service API, we will formally specify some of the planned services with regard to the following aspects:

• What kind of added value does it provide to individual users and to the community, respectively?
• How much knowledge does it use? Just the text of the current page, semantic information about the current page, or (explicit or inferred) information about relations between multiple documents?
• Does it only depend on the system ontology behind the data format used (e.g., OMDoc), or does it rely on a specific notation (e.g., XML)?
• Does it offer a one-step workflow, or does it require additional user feedback before further steps are executed?
• Should it be entirely implemented as a part of SWiM, or does it make sense to integrate an external tool instead?

Navigating the Dependency Graph
Figure 5.
The user shall not only be able to navigate along the dependency graph given by the transitive closure of the depends relation using dynamic navigation links; he shall also be able to explore dependencies interactively. Suppose that the user is currently reading a page about the theory ring, which depends (via import) on group and monoid, with group depending on monoid and monoid in turn depending on semigroup. In this case, the wiki shall not only display navigation links to the direct dependencies group and monoid, but it shall also provide unobtrusive buttons that allow the user to give one of the following acknowledgments:
• No, thanks! "I already know group and monoid; please let me just read about ring."
• Explain "Please show me group and monoid so that I can learn about ring's prerequisites." Group and monoid will be displayed.
• Explore "Please show me all prerequisites for ring." In our example, these are group, monoid, and semigroup, which could be opened in separate windows or serialized into one page.
• Suspend "I want to know about group and monoid, but only later." The system keeps a notice in the user's profile that he wants to read group and monoid sometime. Reminder links to suspended concepts are shown on a separate navigation bar.
Figure 6.
Not only the last case should be recorded—the others are interesting as well for social bookmarking. For example, if many users requested a concept c to be explained, the system could default to displaying not only the direct dependencies but also the level-two dependencies, since c seems to be too difficult to be explained only shallowly. Furthermore, the system will not only keep track of which concepts the user wants to be explained, but also of which concepts the user has already learned. For each concept, a button will be offered for telling the system "I have learned this." Links to concepts learned can then be disabled or displayed in a more unobtrusive color than links to concepts that are new to the user. Concretely, this service comprises two steps:

1. The first step (making suggestions, presenting the action buttons) requires as input the currently viewed concept c, a reference to the dependency graph, and, possibly later, the user's preferences and background knowledge. Its output is a tree t of concepts that c depends on (a sketch of this step follows below). DL reasoning is needed in this step.
2. The second step (displaying wanted pages, bookmarking) requires as input c, the tree t from step 1, and the information which button the user pressed. It returns the desired pages to the user and keeps a bookmarking record in the knowledge base.
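A sketch of the dependency computation behind step 1, using the ring/group/monoid/semigroup example above. A plain in-memory graph stands in for the RDF knowledge base, and the DL reasoning mentioned above is not shown; "Explain" corresponds to depth 1, "Explore" to the full transitive closure:

    import java.util.*;

    public class DependencyExplorer {
        /** Collects the prerequisites of a concept up to the given depth. */
        public static Set<String> prerequisites(Map<String, List<String>> dependsOn,
                                                String concept, int maxDepth) {
            Set<String> result = new LinkedHashSet<>();
            collect(dependsOn, concept, maxDepth, result);
            return result;
        }

        private static void collect(Map<String, List<String>> g, String c,
                                    int depth, Set<String> out) {
            if (depth == 0) return;
            for (String dep : g.getOrDefault(c, List.of())) {
                if (out.add(dep))                     // avoid cycles and duplicates
                    collect(g, dep, depth - 1, out);
            }
        }

        public static void main(String[] args) {
            Map<String, List<String>> g = Map.of(
                "ring", List.of("group", "monoid"),
                "group", List.of("monoid"),
                "monoid", List.of("semigroup"));
            System.out.println(prerequisites(g, "ring", 1));                 // [group, monoid]
            System.out.println(prerequisites(g, "ring", Integer.MAX_VALUE)); // [group, monoid, semigroup]
        }
    }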
Another idea for improving navigation is presenting the details of theory imports together with the respective navigation links. If inheritance via renaming is used, for example, when fixing the multiplicative structure of a ring by importing the monoid theory, then the morphism that maps the monoid's ◦ operator to the ring's multiplication operator · and renames the identity element from e to 1 could be displayed on request. It remains to be investigated whether theory morphisms can be adequately modeled in the OWL-DL system ontology, that is, whether this service can be implemented on top of the ontology part of the service interface. For example, in OWL-DL it is not possible to annotate an import link between two theories with an additional property, that is, with the morphisms used for importing.
Change Management Assistance

So far, there has not been any approach to preserving dependencies between pages in a semantic wiki. Tracking dependencies and reasoning about them is an integral part of mathematical practice and hence cannot be neglected in a semantic work environment for mathematics. Known results are often generalized to find the "real" dependencies; mathematical theories and books are rewritten to make dependencies minimal. In the special case of OMDoc, where dependencies need not be formally verified as long as they have sufficient structural properties, a dependency could formally be broken but nevertheless seem intact to the system. If, for example, a theory t depends on a theory t', which can be edited independently from t and is still under development, modifying t' might break t, because some axiom in t' required by t might have been changed. A lightweight, simple-minded solution to this problem is based on the evident approach of having all hyperlinks point not to concepts in general but to specific versions of them. (Recall that old versions of a page are available for free in a wiki!) When an author enters a link to a theorem thm, for example, this reference will be stored internally as thm/v_latest. Editors of documents with outdated links will be offered to update them:

1. For each concept d_i linked from the currently edited concept c, the version number v_i in each link is checked for up-to-dateness. All pairs (d_i, v_i^latest) with v_i^latest > v_i are returned (see the sketch below).
2. The editing box on the user interface shows indicators next to each updatable link d_i. If the user clicks on such an indicator, a pop-up dialog appears with links to d_i/v_i (i.e., the old version of page d_i), d_i/v_i^latest, and a comparison page diff(d_i/v_i, d_i/v_i^latest) showing an XML diff, probably formatted by a diff viewer2, as well as two buttons labeled "keep" and "update". An alternative user interface could allow the user to select multiple links with check boxes and have all of them updated at once.
3. If the user wants to have the link d_i/v_i updated, its source code is replaced by d_i/v_i^latest.
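A sketch of the up-to-dateness check of step 1; the types and the version bookkeeping are illustrative assumptions:

    import java.util.*;

    public class LinkChecker {
        /** A link as stored in a page: its target concept and the version
            the link was created against. */
        record VersionedLink(String target, int version) {}

        /** Returns, for each outdated link of the edited concept, the pair
            (target, latest version). */
        static Map<String, Integer> outdatedLinks(List<VersionedLink> links,
                                                  Map<String, Integer> latestVersion) {
            Map<String, Integer> updatable = new LinkedHashMap<>();
            for (VersionedLink link : links) {
                int latest = latestVersion.getOrDefault(link.target(), link.version());
                if (latest > link.version())
                    updatable.put(link.target(), latest);  // shown as an indicator in step 2
            }
            return updatable;
        }
    }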
Note that this specification is independent of the definition of a "dependency link" in the system ontology; it does, however, make sense to restrict the algorithm to those links that actually denote dependencies. Furthermore, manual confirmation of an update should only be demanded when the semantics of d_i has changed. As this is nontrivial to decide, editors should have the possibility to classify their changes. Many wikis already distinguish between "major" and "minor" changes. More sophisticated classifications of changes in the context of ontologies are discussed in Klein, Fensel, Kiryakov, and Ognyanov (2002), Müller (2006b), and Plessers and Troyer (2005).

A second service could warn the user, upon changing a concept e on which many other concepts f_i depend, that changing the semantics of e might affect the integrity of the f_i: the service queries the knowledge base for a list of all concepts {f_i} depending (via the transitive closure of the dependency relation) on the currently edited concept e. To the user, this list can be presented as a warning like "n concepts depend on the one you are editing" (a query sketch follows at the end of this section). If the user makes a major change (see the classification mentioned above) that affects many dependents, SWiM could offer copying e to a new concept e', leaving e intact for the dependents to reference.

So far, we have described a simple-minded assistant that helps human editors to prevent common mistakes in change management. Real change management solutions, both from the domain of structured data formats and from that of ontologies, will be evaluated in the context of SWiM: the locutor project (Müller, 2006b), developed in our group, aims at realizing a practicable ontology-based change management for OMDoc and other structured document formats on top of informal document engineering processes. Change management for generic structured documents with semantic interrelations has been realized for MMiSSLaTeX (Krieg-Brückner et al., 2004), a structured LaTeX-based format whose system ontology served as an inspiring example for OMDoc's system ontology. Change management for mathematical and logic-based documents with strong dependencies modeled by development graphs, which are also supported by OMDoc (Kohlhase, 2006, Chapter 18.5) but not yet regarded by the OWL-DL implementation of the system ontology, has been formally investigated in Hutter (2004). Change management for OWL ontologies is described in Klein et al. (2002). We anticipate that an integration of locutor can turn the change management assistant introduced above into an automated service that requires less human reasoning and is therefore tractable for larger document collections. locutor allows users to classify their changes according to a "taxonomy of change relations" and will be able to find out whether one change actually affects other documents by computing "long-range effects" of changes (Müller, 2006b). It will either be able to automatically make the adaptations required after a change, or it will at least pinpoint all effects of changes, so that a human editor knows exactly what to fix. A locutor prototype is currently being implemented as a modular extension of the version management system Subversion ("Locutor," 2007); thus, it can be integrated into SWiM as an alternative back end.
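The query behind the warning service sketched above could, for instance, be phrased with a SPARQL 1.1 property path, which yields the transitive closure of the dependency relation; the dependsOn property URI is an assumption for illustration:

    import org.apache.jena.query.*;
    import org.apache.jena.rdf.model.Model;

    public class DependentsWarning {
        /** Counts all concepts that depend, directly or transitively,
            on the concept currently being edited. */
        public static long countDependents(Model kb, String conceptUri) {
            String q = "PREFIX sys: <http://example.org/omdoc-ontology#> " +
                       "SELECT (COUNT(DISTINCT ?f) AS ?n) " +
                       "WHERE { ?f sys:dependsOn+ <" + conceptUri + "> }";
            try (QueryExecution qe = QueryExecutionFactory.create(q, kb)) {
                return qe.execSelect().next().getLiteral("n").getLong();
            }
        }
    }

The result feeds directly into the warning "n concepts depend on the one you are editing."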
User-Friendly Editing

In contrast to usual wiki text, OMDoc's XML syntax is hard to edit because of its verbosity and the XML validity constraints. Therefore, SWiM must offer some assistance to authors. We are currently investigating several approaches: a very simple form of authoring assistance is to split documents into smaller parts to enhance manageability. For the Dublin Core metadata in the OMDoc documents, we can utilize the form-based editor from IkeWiki. Generally, we want to integrate external OMDoc editors that offer assistance on their part, and to develop a simplified syntax for text fragments that do not use the full representational power of OMDoc. Note that all of these measures can be combined. Two other editing assistance measures warrant a closer look and a more thorough discussion.
Section-Wise "Edit in Place"

Users can manually split large OMDoc documents into their individual concepts in the SWiM prototype, but in some cases even a single concept can be too large to edit—consider a complex proof with many steps or case distinctions. Some wikis, such as MediaWiki, allow for editing individual paragraphs of pages (Wikimedia Foundation, 2006a). The Edit-in-place interface (The Connexions Team, 2006) of the Connexions system (see also Chapter XI) is even more user-friendly; it is particularly suitable for editing or inserting sections in large XML documents with shallow structures. The documents are displayed in a near-WYSIWYG view, but clicking on a section replaces its view by a text area containing its XML source. The section can be deleted, or edited and saved by an asynchronous request to the server. The user interface to the edit-in-place service will be embedded into the presentation mode: if the user has permission to edit a page, an "edit" button is added to every section. The user interacts with the service in three steps:

1. In the first step, the current page is structured section-wise. This is implemented as part of the transformation of the page for presentation. To determine which XML elements denote sections, the service first queries the system ontology of the language of the current page for subclasses of the "section" class from the upper system ontology; then it queries the ontology-to-XML mapping for those XML elements that represent them. The output is a presentational view of the page, with each section marked (e.g., by a frame) and prepared to be edited upon the user's request via embedded JavaScript code.
2. The user triggers the second step by selecting a section for editing. The service is provided an identifier of the section (e.g., its xml:id or an XPath expression). The presentation view of the respective section is replaced by an editing box with the source code of the section's text and buttons for saving and canceling the edit as well as deleting the section. "Save" and "Delete" requests are sent to the server, while "Cancel" can be handled on the client by just replacing the editing box with the cached presentation view.
3. If the user requests the section to be saved or deleted, the third step is triggered. The source code of the page, with the respective section replaced or deleted, is stored to the page database (a minimal sketch of this replacement step follows below). RDF triples whose subjects are fragments within the section are re-generated. While it is desirable to have only the replaced section re-rendered for presentation—instead of the whole page—it is open whether that is possible using the available transformations of the respective scientific markup language to XHTML.
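A sketch of the core of step 3 on the server side: replacing one section, identified by its xml:id, in the page's DOM. Re-rendering and RDF re-generation are omitted, and the code assumes a non-namespace-aware DOM where "xml:id" can be matched literally:

    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    public class EditInPlace {
        /** Replaces the element carrying the given xml:id by the newly
            edited section; the modified document is then stored back
            to the page database. */
        public static void replaceSection(Document page, String sectionId, Node newSection) {
            NodeList all = page.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Element e = (Element) all.item(i);
                if (sectionId.equals(e.getAttribute("xml:id"))) {
                    Node imported = page.importNode(newSection, true);
                    e.getParentNode().replaceChild(imported, e);
                    return;
                }
            }
            throw new IllegalArgumentException("no section with xml:id=" + sectionId);
        }
    }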
For mathematical formulae, edit-in-place must be augmented with specialized facilities that cater to their deeply nested structure and the theory-driven visibility of symbols. This can be done either by short-hand languages for mathematical formulae like LaTeX or even QMath (Palomo, 2006a), using a batch processor that transforms these to OMDoc. Or SWiM could employ specialized OMDoc editor plug-ins at the formula or even page level and make use of the existing OMDoc editing tools, for example, an OMDoc mode for the popular Emacs editor or Sentido (Palomo, 2006b), a Mozilla extension for near-WYSIWYG editing and browsing of OMDoc documents.
Auto-Completion

When the user enters a link like …

… C_E + C_R. Figure 1 shows the general personal knowledge management dilemma: we never know if our costs of authoring and retrieval are worth the effort.
Cost of Authoring

A rather simple form of authoring is writing plain text. As a next step, a user can add formatting to the text, which gives it a structure; for example, different levels of headlines imply a tree structure. Upon retrieval, one can then more easily navigate to the relevant part of a document. Furthermore, a user can add cross-references to other parts of the same text or to other documents. In academic works, such cross-links enhance the chance for successful retrieval; for example, a citation to another work makes the other work more likely to be found. Each step of externalisation described so far lowers the cost of retrieval (C_R) but comes at the expense of higher externalisation costs (C_E). Articulating semantic statements about concepts or semantic links between concepts has high costs in many tools (e.g., ontology editors such as Protégé), since the user must change his familiar user interface paradigm (document and text editing) for a different paradigm (ontology editing) and must fully formalise all content or lose the relation between the concepts in the document and the concepts in the ontology.
The Cost Gap

To sum up the situation, current tools offer two primary choices. On the one hand, one can make very "cheap" notes, which have little structure and thus have a low chance of being found other than through a keyword match on the full text. This low-cost strategy (comparable to, for example, post-it stickers) does not scale if many documents about the same topic exist. More formally, the cost of externalisation (C_E) is low, but the costs of retrieval (C_R) are high, as one needs many searches before finding a note again—if it is found at all. On the other hand, one can use a custom database application or an ontology editor with high costs of externalisation to lower the search costs. These costs, due to the paradigm switch and the strict constraints of the editing environment, are often too high for everyday information management, as one cannot sit down and design a database or ontology when trying to externalise some fact obtained, for example, from an e-mail.

This PKM cost gap can be bridged in two ways. One can lower the cost of search on collections of loosely structured notes. Desktop search engines do this by offering fast full-text search; other approaches like ontology learning try to extract structures from unstructured data. Alternatively, one can lower the cost of authoring information with explicit structure and/or semantics. This will also lead to lower search costs, as more structured data leads to higher precision (Kando, 1997). The second approach, lowering the cost of authoring structured knowledge, is the one we pursue in this chapter.
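The dilemma can be condensed into a rough cost inequality. Note that this formalisation, including the benefit B and the number of retrievals n, is our own reading of the C_E/C_R trade-off, not a formula given in the chapter:

    % A note is worth writing down only if its expected benefit exceeds
    % the one-time externalisation cost plus all accumulated retrieval costs.
    B \;>\; C_E + \sum_{i=1}^{n} C_R^{(i)}

Cheap note taking keeps C_E near zero but drives up every C_R^{(i)}; structured authoring inverts the balance, and the distance between the two regimes is exactly the cost gap described above.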
Outline

In the remainder of this chapter, we present general requirements for PKM obtained from a literature study. Then existing "classic" (nonsemantic) PKM tools are reviewed. Next, the notion of a semantically enhanced PKM tool (SPKM; see Figure 3) is introduced as a novel approach to the PKM problem. The following section reviews existing prototypes of SPKM tools and lists their drawbacks. We then present personal semantic wikis as a best-of-breed implementation of the SPKM vision; in this approach, we do not use wiki technology as a community platform but as a personal authoring environment. The final section discusses future trends and concludes the chapter.
Requirements for Personal Knowledge Management

We live in a knowledge society in which people have access to tremendous amounts of information and communicate with others all over the world. Although most of this information and communication is digital, people still predominantly use traditional analogue techniques: several studies (Blandford & Green, 2001; Campbell & Maglio, 2003; Jones & Thomas, 1997) show that simple to-do lists, paper calendars, address books, and diaries are the most commonly used tools, all of which are analogue, paper-based tools. Although e-mail is increasingly used for some personal information tasks (Bellotti, Dalal, Good, Flynn, Bobrow, & Ducheneaut, 2004; Whittaker, Bellotti, & Gwizdka, 2006), it is mostly unstructured (Dai, Lutters, & Bower, 2005).
Requirement 1: Acceptable Costs

As argued, a working PKM solution must have acceptable costs both for externalisation and for search. Based on an exhaustive literature analysis of information management and knowledge work (Oren, 2006),1 we derive six additional requirements for personal knowledge work.
Requirement 2: Focus on the Individual

Many works on information and knowledge management indicate that "knowledge is ultimately created by individuals" (Polanyi, 1958). This motivates our conceptual shift from organisational knowledge management to personal knowledge management. Individuals are personally committed to knowledge creation (Nonaka, 1994), but this commitment relies on their intentions (the understanding and actions of individuals) and on their autonomy (self-motivation and the freedom to pursue their intentions). Personal autonomy is thus crucial, and individuality and freedom should be supported by tools. Traditionally, three types of office workers can be distinguished (Kidd, 1994): the knowledge worker (who creates new knowledge), the communication worker (who amplifies information and connects people), and the clerical worker (who manages documents). All three of them perform administrative tasks, which are repetitive, structured, and document-driven, and research tasks, which are flexible, unstructured, and information-driven, but they do so in different proportions: clerical workers perform mostly administrative tasks, while knowledge workers mostly do research tasks. These research tasks usually have no structured procedures, and workers have no clear idea of their next steps. The unstructured nature of their work makes it difficult to support knowledge workers in general, and highlights the need for their personal autonomy and freedom.
Requirement 3: Forget Rigid Classification

Strict hierarchical classification does not work for many individuals (Hearst, 2006); therefore, we see a rise of more flexible structures: tagging and ontologies. Many people categorise information mainly by context (Lansdale, 1988) and retrieve information based on contextual clues (associations). In contrast, most classifications in the computer are hierarchical (e.g., file folders), which leads to usability problems (Randall, Bowker, & Leigh Star, 2001). Although research has shown that people prefer category-based search over keyword-based search (Yee, Swearingen, Li, & Hearst, 2003), using strict hierarchies of categorisation worsens the results, since they do not work well for most people. The activities of knowledge workers are highly unstructured; the intended use of documents (relevant for later retrieval) is often unclear at the time of filing (Bondarenko & Janssen, 2005; Kwaśnik, 1983).
Requirement 4: Follow the Links

Although desktop search greatly helps individuals in retrieving their own files, users actually prefer to find information not by searching and jumping to it but by orienteering and browsing between related items (Teevan, Alvarado, Ackerman, & Karger, 2004). Such exploration techniques are not only preferred in an unfamiliar domain (or when browsing somebody else's data) but also when investigating personal information spaces (Cutrell & Dumais, 2006; Schraefel, Wilson, Russell, & Smith, 2006). This preference stems from the fact that people often have difficulties unambiguously specifying what they are looking for and thus prefer browsing over searching (Barreau & Nardi, 1995). Furthermore, human memory can greatly benefit from context information, which is enabled by exploratory browsing between related items. Users clearly benefit from being able to follow the links in their data and browse their information space. The ability to automatically identify and present links between relevant data requires the data to be interlinked and semantic relations between data elements to be identified.
Requirement 5: Remember the Context

Contextual information is crucial in human information retrieval. People categorise information based on its context, and recall therefore also benefits greatly from contextual information (Lansdale, 1988). In a personal classification system, the classification of a document is strongly affected by its intended use and context (Kwaśnik, 1983); that means that we should capture such document context to facilitate retrieval. Dumais, Cutrell, Cadiz, Jancke, Sarin, and Robbins (2003) report that rich contextual information such as people's names and dates, that is, the personal and temporal dimensions, provides beneficial cues for retrieval in a personal information system and seems even more important than standard ranking functions. Which context information to use in knowledge management is still an open question. Often-used dimensions are the five "who, what, when, where, how" or the "who, what, when, where, why" dimensions (Ranganathan, 1962).
Requirement 6: Value the Power of Paper

Despite all the progress of our PC desktops, physical paper remains a crucial tool for most people. Even the advent of the Internet has only increased the amount of printed paper (Sellen & Harper, 2003, p. 7). This seeming paradox (a continuous rise of paper usage together with an increased use of digital media) is understandable since (i) the Internet gives people access to more and more information, which often ends up printed, (ii) printing technology is evolving, making printing easier, faster, and better, and (iii) paper is extremely well-suited for authoring, reading, and reviewing documents (Sellen & Harper, 2003, pp. 13-15). Research has shown that reading stimulates writing (Adler, Gujar, Harrison, O'Hara, & Sellen, 1998) and that paper is very well-suited for combined reading and writing because of its possibilities for flexible, spontaneous annotation (draw or write with a pen), quick navigation between documents (simply flip a page), and spatial layout to form a mental model of documents (O'Hara & Sellen, 1997). A recent study showed that 85% of reading activities were paper-based, although almost all information was available to people digitally, on their computer. It seems that digital technologies, rather than replacing paper, shift the point where paper is used: from publishers to the individuals who download, print, and read documents. Also, paper fits the requirements for individual autonomy and freedom very well: compared to digital media, paper puts almost no restrictions on the content that can be written on it (sketches, text, or both).
Requirement 7: Keep it Simple

Perhaps the most important requirement shown in the literature is the simplicity of tools. The adoption rate of digital technology for personal information (even in today's age of ubiquitous mobile phones and PDAs) is quite low (Jones & Thomas, 1997). This relatively low adoption may be explained by most users' unawareness of the features of their existing tools; most people, for example, are even unaware of long-existent features such as automatic e-mail filtering (Bondarenko & Janssen, 2005). Research has also shown that people use many tools quite differently than expected; for example, people use folders on their computer not only to organise their files but also to remind them of tasks, to decompose problems, and to plan their work (Barreau & Nardi, 1995; Jones & Thomas, 2005). Simple tools do not constrain user workflows but support their autonomy and are easy to understand and use.
Requirement 8: Keep the Flow

As an additional requirement from the authors' own observations, support for personal work processes is crucial: one must allow the information and structures to travel from one application in a workflow to the next. We illustrate this requirement with an article-writing scenario. Writing is a central activity for all kinds of knowledge workers. They have to compile reports or write articles for scientific journals. Writing an article includes reading related work—and often also taking personal notes on it. Then an outline or other kind of high-level structure is usually created (Esselborn-Krumbiegel, 2002). This structure is then refined to the argument level and fleshed out into a linear text. Next, the text is partially rewritten, and references to figures, other sections, and literature are added. Eventually the text is typeset and thus the final layout is created.
State of the Art in "Classic" PKM Approaches and Their Problems

Several different approaches for personal knowledge management exist. In the following, we concentrate on popular tools used for this task: Weblogs, wikis, mind-mapping, and personal information management (PIM) tools. We also discuss common problems for personal knowledge management in these tools.
Weblogs

A Web log is a specific kind of personal Web site where content is published in the form of small articles and displayed in reverse-chronological order (latest article first), similar to a personal but publicly readable diary. Readers can leave comments on articles. A particularly interesting feature of Weblogs is the so-called Trackback function, which allows a Web log to become aware of other Weblogs that refer to its articles, and the so-called Blogroll, which lists or even syndicates other Weblogs that are interesting to the owner. These two features together allow forming the so-called Blogosphere, the network formed between personal Weblogs by hyperlinks. Weblogs are primarily interesting not because of their technology but rather because of the social interactions they enable by commenting on and referring to each other's Web log articles. A Web log unfolds its true value only when connected with others. In personal knowledge management, Weblogs are mainly used to record ideas and other knowledge bits quickly. Since articles are usually publicly readable, external users can comment on them and thus help improve their quality. On the other hand, private, confidential, or not yet very elaborate ideas should often not be publicly available; some Weblogs support this by restricting the group of people that can read articles. In Weblogs, access to knowledge is primarily oriented on the time axis, that is, browsing through articles primarily by publication time. Some Weblogs allow users to associate articles with additional categories that support more selective browsing through the articles.
Wikis

A wiki (Leuf & Cunningham, 2001) is essentially a collection of Web pages connected via hyperlinks. Many different wiki systems exist, but they commonly have a straightforward interface for editing content, with a simplified syntax that makes it very easy to set hyperlinks to other pages within the wiki. Therefore, content in a wiki is usually strongly connected via hyperlinks. Furthermore, editing of content in wiki systems is Web-based, and access is often unrestricted or at least hardly restricted. Most wiki systems also provide a rollback mechanism for reverting to previous versions in case of accidental or undesired changes. Wikis are used in many areas, like encyclopaedia systems (e.g., Wikipedia), as personal or group knowledge management tools, as collaboration tools, or in collaborative learning environments.

Concerning personal knowledge management, wikis can be used as very flexible and powerful tools (Wagner & Bolloju, 2005). The simple wiki syntax allows users to easily create content with a low technical barrier, and since it allows entering free text, the user is free to choose his own process and workflow. Easy hyperlinking allows connecting knowledge for navigation purposes and improves the retrieval of relevant knowledge. Cleverly used links and back-links also allow for hierarchies and categorisation. The plug-ins available for many wiki systems provide additional means of entering and visualising content, for example, as a calendar, as a task list, and so forth. Content can easily be made publicly available, and most wikis allow restricting access to content to specific user groups. It is also possible and useful to install a wiki on the user's personal computer; there even exist projects that aim to install a personal wiki on a USB stick.

The main disadvantage of wikis with regard to knowledge management is that whereas creating content is easy, retrieval usually is not. Users are often limited to full-text search and hyperlinks, which often does not suffice to find the relevant content, for example, when synonyms are used or when the hyperlinks are not set properly. Also, wikis that are not maintained very well tend to be rather chaotic, which further aggravates the problem. For note-taking, wikis typically do not allow users to indicate the context of notes (notes are usually related to projects, tasks, responsibilities, e-mail communication, etc.). Therefore, users cannot query their personal notes, for example, to find all notes about a certain e-mail discussion, or navigate between two notes related to the same project. All navigation links have to be added explicitly, because current applications do not support the personal meaning (semantics) of the notes.
Mind-Mapping

Mind-mapping and concept mapping are methods for structuring ideas, words, topics, or other items using graph representations. Centred around a key concept, further items are arranged around the centre concept. Mind maps are used in many different areas, for example, brainstorming and collection of ideas, note taking, learning, summarising, structuring, and so forth (Haller, 2002). Mind-mapping is not bound to computers; the method is much older and can equally or even superiorly be used on paper. Mind-mapping is a traditional tool for personal knowledge management. Its main advantages lie in the structuring of content and the quick overview one gets when looking at a mind-map. The disadvantages are the limited scalability (a sheet of paper/the computer screen only suffices for the visualisation of relatively few concepts at once) and the fact that a mind-map is always centred on a single concept.

Personal Information Management (PIM) Tools

Personal information management tools usually offer ways to manage a calendar, appointments, tasks, and addresses. Often, e-mail and simple note taking are also included. Popular PIM applications are MS Outlook, KDE Kontact, and Gnome Evolution. Recently gaining importance is the combination of the Google services Google Mail, Google Calendar, and Google Docs. The main advantage of PIM systems is that they integrate the different knowledge sources of office workers into a single interface. Unfortunately, most PIM tools are simply a collection of different systems, and the actual information integration is not very deep. Neither is it possible to query the information, nor are the relations between information items explicitly represented.
Summary of Existing Approaches

Weblogs are meant as a tool for the individual (Req. 2) for publishing and do not work well for restructuring knowledge. Strengths of Weblogs and wikis are the handling of links (Req. 4) and simplicity (Req. 7), but Weblogs lack acceptable search costs (Req. 1), context of notes (Req. 5), and import/export support (Req. 8). A comparison table can be found in Figure 2. A wiki is great for keeping a big repository of linked snippets, but it is hard to get an overview of the stored content. Wikis focus mainly on collaborative usage (Req. 2); their interlinked nature allows finding content by associative browsing (Req. 1).
Figure 2. An overview of existing PKM and SPKM tools and requirements
Legend: + = feature present, ~ = feature partially present
Figure 3. Semantic personal knowledge management supporting externalisation (authoring) and internalisation (learning) of personal knowledge
Mind-map tools are probably the only tools which have a paper-like work style (Req. 6). They are simple to use (Req. 7) and often have export functions from mind-map to textual outline (Req. 8); however, usually additional links and icons are lost, and re-import is impossible. PIM tools are good for the narrow domains they are designed for, but fall short when a set of structured notes has to be managed (Req. 1). At least they can often export structured content in usable formats (appointments, contact data, etc.). Overall, acceptable search costs and integration into workflows are the weakest points of existing PKM tools.
Introducing Semantic Personal Knowledge Management (SPKM)

We define an SPKM tool as a PKM tool that supports personal knowledge management by managing additional explicit semantic representations. An SPKM tool allows a user to maintain a personal explicit representation of his world view. Instead of large document collections, a knowledge worker can centre his work on a single personal knowledge model (Völkel, 2007). For an SPKM tool to be useful, it should offer a good ratio between the effort it takes to externalise and (partially) formalise knowledge and the benefit gained by enhanced query answering and browsing abilities.

Figure 3 depicts the knowledge processes described in Nonaka and Takeuchi (1995) as they happen in PKM. Socialisation (direct face-to-face) becomes rare in an always-online world. As most communication with oneself (PKM) and with others (collaboration) happens through digital channels, efficient externalisation and internalisation are becoming the PKM bottleneck. By mapping structural features of a document to semantic statements of the underlying knowledge model, the computer could assist the user much better. Tables are a particular example of structures with implicit semantics, which are quite difficult to formalise (Pivk, Cimiano, & Sure, 2005). Existing tools do not handle the semantics of structures: for example, John might write a document and mark some parts in red; these markings mean "statements to be discussed with Alice." If he exchanges the document, he needs to communicate these ad-hoc semantics (discuss with Alice) implied by structures (red) as well. Currently, he would do so separately, by e-mail or phone. Making the semantics of structures explicit allows for better search ("what has to be discussed?") and more efficient knowledge exchange (a technical sketch follows at the end of this section).

A truly usable SPKM system should allow the user to use the different editing paradigms (wiki, document, mind-map) at will—on the same data. Existing PIM objects (tasks, addresses, appointments) and desktop resources (files, folders) have to become first-class citizens, so that they can be annotated and interlinked as well. The importance of flexible authoring methods has been recognised in knowledge engineering (Fensel et al., 1994). An SPKM tool supports the user in these processes. Each individual uses his own SPKM tool as a personal knowledge repository; he benefits personally from this system by having better retrieval of his knowledge. His personal SPKM tool is connected to other applications and, in a loose peer-to-peer fashion, to other SPKM tools. This network allows individuals to combine their knowledge through sharing and exchanging. If designed correctly, SPKM tools are easy to use and cognitively adequate for modelling and refactoring knowledge; they enable information sharing and reuse within the personal knowledge space, with other knowledge workers, and with existing information systems. And they enable structured access to personal and collaborative knowledge through queries, categorisation, and associative browsing.
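As a sketch of what "making the semantics of structures explicit" could amount to technically, the implicit convention "red text = discuss with Alice" becomes an explicit, queryable RDF statement; all URIs and the vocabulary are hypothetical:

    import org.apache.jena.rdf.model.*;

    public class ExplicitStructure {
        public static void main(String[] args) {
            Model m = ModelFactory.createDefaultModel();
            String ns = "http://example.org/pkm#";
            // The document fragment that John marked in red.
            Resource fragment = m.createResource("http://example.org/doc/report#para-42");
            Property discussWith = m.createProperty(ns, "toDiscussWith");
            Resource alice = m.createResource(ns + "Alice");
            // "What has to be discussed?" is now answerable by a query.
            fragment.addProperty(discussWith, alice);
            m.write(System.out, "TURTLE");
        }
    }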
Existing SPKM Tools and Their Drawbacks

Currently, not many SPKM tools exist. A brief overview is presented in this section. iMeMex (Dittrich, 2006) calls itself "a platform for personal dataspace management" and is centred around a data model, iDM (Dittrich & Vaz Salles, 2006), for the semantic desktop. iDM's core goal is to cross the boundaries between files and resources in files. All items, be it folders, files, sections, or paragraphs, are mapped into a single model. iDM emphasises performance, expressivity, lazy data sets, and data streams. There is no publicly available end-user tool, but first prototypes focusing on search will be available soon. However, authoring of structured data seems not to be within the scope of iMeMex.

Another existing SPKM system is Artificial Memory (Ludwig, O'Sullivan, & Zhou, 2004), which uses a tree-like structure in a Web interface to interact with the user. It represents semantics with icons and allows the user to add new statements. However, unlike a semantic wiki, it does not give the user the freedom to leave his writings unstructured.

The European project NEPOMUK aims to bring forward the state of the art by developing a social semantic desktop. Within NEPOMUK, a data model for structuring knowledge in explicit yet vague relations is developed as Conceptual Data Structures (CDS) (Völkel & Haller, 2006). NEPOMUK builds on Gnowsis (Sauermann, 2005), which included a semantic wiki from the start to allow a user to make arbitrary statements. In Gnowsis, a central ontology could be browsed and edited in parallel, using the same concepts as those of the semantic wiki. Haystack (Quan, Huynh, & Karger, 2003) is another semantic desktop, which emphasises browsing through and interacting with a heterogeneous collection of RDF resources. However, Haystack does not foresee creating completely unstructured items first, and thus places the burden on the user to formalise her knowledge in one big step.

Summary: Although all these tools focus well on the individual (Req. 2; Figure 2) and offer plenty of structuring abilities (Reqs. 3, 4, and 5), they are neither simple to use (Req. 7) nor do they have acceptable costs for externalisation (Req. 1).
Personal Semantic Wikis (PSW)

Semantic wikis enhance classical wikis with the ability to author and use formal semantics: in the same wiki style of free-text editing, semantic statements can be added. They can describe the page or parts thereof semantically. A semantic wiki offers a uniform way to work with all levels of formality (text, structured text, and formal statements). As shown in Figure 4, semantic wikis allow users to structure and annotate their data, but they do not force them to do so. Using enhanced wiki syntax (plain text with a few mark-up commands and a few semantic annotation commands) has several benefits: (a) most users are used to text typing and avoid having to familiarise themselves with yet another user interface; (b) existing skills for text manipulation (e.g., copy and paste of text blocks) are leveraged to edit a document structure; (c) users refine the input interactively until the result matches the intended structure; (d) wikis allow soft transitions between knowledge layers, including free text, so no knowledge of any syntax is required to start authoring; (e) wiki syntax has few layout options, forcing the user to focus on the structure and the content; (f) text is in general a faster method of entering semi-structured information than graphical approaches.

Figure 4. Semantic wikis handle a continuous spectrum of knowledge types

Figure 5. Semantic wikis can author in different paradigms simultaneously

Several semantic wikis exist today; a full-day workshop on semantic wikis was held in 2006 (Völkel & Schaffert, 2006). By using the authoring capabilities of semantic wikis for personal knowledge management—ignoring wiki community aspects—we get a personal semantic wiki (PSW). One system of particular interest is SemperWiki (Oren, Völkel, Breslin, & Decker, 2006), which has been designed for personal use and allows the user to integrate simple wiki texts with formal semantic annotations. SemperWiki can be used as a traditional personal wiki for note-taking, but can also be augmented with simple semantic annotations. These annotations can be shared with others (using RDF, the standard Semantic Web language) but, more importantly, enable improved navigation to related items and intelligent queries (a query sketch follows at the end of this section). Arbitrary annotations can be made, and the tool can be customised for usage in a certain domain that uses specific terms and ontologies. Such domain ontologies can contain background knowledge (e.g., in the biology domain possibly a taxonomy of biological species and their relations) that can be used in conjunction with the manually entered wiki annotations. SemperWiki integrates with the Gnome (Linux) desktop: it is available across the whole desktop with a single keypress, and it automatically saves all changes and updates all navigation links instantly.

PSWs allow users to explicitly indicate the semantics of information items and their relations to other information objects. Linking to desktop objects is often still limited or difficult to use. Semantic relations between information objects allow for contextual navigation even between pages that have not explicitly been related to each other but that can be related based on their meaning, such as two notes about the same project. The flexibility and freedom of wikis is maintained: users can annotate notes with their own terms and can later add background knowledge that relates their terms to those used by others in their working environments, enabling those others to reuse personal annotations without requiring a-priori agreement on terms or classifications. However, even semantic wikis do not fully exploit the opportunities for supporting the user. They currently have a severe lack of refactoring tools, a feature found in most advanced IDEs today, which allow renaming or moving items while maintaining all references. Second, semantic wikis do not exploit the structural parts of a page and hence cannot return, for example, short sections relevant to a query, but instead return only full documents.
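A sketch of the kind of query such annotations enable, here phrased with Jena and SPARQL: find all notes attached to the same project as a given note. The pkm vocabulary is an illustrative assumption, not SemperWiki's actual schema:

    import org.apache.jena.query.*;
    import org.apache.jena.rdf.model.Model;

    public class RelatedNotes {
        /** Prints all notes that share a project with the given note. */
        public static void printRelated(Model notes, String noteUri) {
            String q = "PREFIX pkm: <http://example.org/pkm#> " +
                       "SELECT ?other WHERE { " +
                       "  <" + noteUri + "> pkm:project ?p . " +
                       "  ?other pkm:project ?p . " +
                       "  FILTER(?other != <" + noteUri + ">) }";
            try (QueryExecution qe = QueryExecutionFactory.create(q, notes)) {
                qe.execSelect().forEachRemaining(
                        row -> System.out.println(row.getResource("other")));
            }
        }
    }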
Scenario: Writing an Article Using a PSW

Writing is a central activity for all kinds of knowledge workers. They have to compile reports or write articles for scientific journals. Writing an article includes reading related work—and often also taking personal notes on it. An outline or other kind of high-level structure is usually created. This structure is then refined to the argument level and fleshed out into a linear text. Next, the text is partially rewritten, and references to figures, other sections, and literature are added. Eventually the text is typeset and thus the final layout is created.

First, the related work can be managed in a PSW. Different from standard reference management tools, a PSW allows annotating and relating references. Second, personal notes on the topic at hand can be managed, structured, refined, and related in the same environment. The benefit of using a PSW is the unified management of heterogeneous artefacts in one environment with one language. Here, language refers to the names used for relation and entity types. A consistently used vocabulary with clear semantics—even if at first only known to the user—is of great value in information processing, for example, for posing queries. Such queries can then return a heterogeneous item collection. A second benefit is that a PSW allows users to externalise the semantics of relation and entity types. Once a relation has, for example, been defined as transitive or symmetric, further relations can be computed by inference, leading to better search results (see the sketch at the end of this section).

The usual top-down refinement process in article writing is well supported by PSWs: as entities are supposed to be fine-grained and referenceable, items can be re-used. An outline evolving into a draft document can thus be represented by a hierarchy of interlinked items. Smaller items allow for better re-use in other contexts, and semantic annotations help to find the items needed. In collaborative work, multiple users are able to work together explicitly on argument and text structures, something rarely possible with classic tools; that is, only a few dedicated argumentation management tools exist. Another possibility for closely collaborating parties is the direct exchange of model parts from one PSW to another, without the need to create a document at all. The other user can then import and link the other's knowledge model. A more detailed description of how and why a cultural shift from a document-centred to a model-centred society could happen is given in Völkel (2007).
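A sketch of the inference benefit just described, using Jena's OWL reasoner: once a (hypothetical) partOf relation is declared transitive, indirect containment is found although only direct links were authored:

    import org.apache.jena.rdf.model.*;
    import org.apache.jena.reasoner.ReasonerRegistry;
    import org.apache.jena.vocabulary.OWL;
    import org.apache.jena.vocabulary.RDF;

    public class TransitiveInference {
        public static void main(String[] args) {
            Model base = ModelFactory.createDefaultModel();
            String ns = "http://example.org/pkm#";
            Property partOf = base.createProperty(ns, "partOf");
            base.add(partOf, RDF.type, OWL.TransitiveProperty);
            Resource article = base.createResource(ns + "article");
            Resource section = base.createResource(ns + "section1");
            Resource paragraph = base.createResource(ns + "paragraph3");
            base.add(section, partOf, article);     // only direct links are authored
            base.add(paragraph, partOf, section);

            InfModel inf = ModelFactory.createInfModel(
                    ReasonerRegistry.getOWLMicroReasoner(), base);
            // The inferred triple (paragraph3 partOf article) is now present:
            System.out.println(inf.contains(paragraph, partOf, article));  // true
        }
    }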
Evaluating Personal Semantic Wikis

The main advantage of PSWs compared to other PKM and SPKM approaches is the use of stepwise, gradual formalisation, leading to acceptable costs (Req. 1) both for externalisation and for search. PSWs offer the same support for classification and links (Reqs. 3 and 4). Until PSWs are integrated with semantic desktops, their ability to track context is still limited (Req. 5). PSWs are simple to use (Req. 7), but have no clear way of working together with paper (Req. 6). As the resulting knowledge base is partially structured, export to and import from existing formats is partially possible (Req. 8).
Conclusion

In this chapter we have introduced the concept of personal knowledge management enhanced with semantic technologies (SPKM). We presented results from an exhaustive literature study on requirements for information management. Then we analysed existing approaches for PKM and evaluated them with respect to the requirements. We found that existing tools lead to a high degree of information fragmentation: structures made explicit in one tool become implicit again in the next tool. There are currently no tools allowing the user to use the different editing paradigms (wiki, document, mind-map) at will—on the same data. Existing PIM objects (tasks, addresses, appointments) and desktop resources (files, folders) can currently not be annotated and linked.
Outlook

A semantic wiki by itself does not address two of the requirements, detection of user context and the importance of ordinary paper; these are open research issues that are actively worked on, for example, in ubiquitous computing and electronic paper. Both are likely to narrow the gap between the real and the digital world. Advances in scanner technology, optical character recognition (OCR), and natural language processing (NLP) are also likely to integrate paper more tightly into the digital world. For privacy and cultural adoption, it seems, the only way is to educate people in the same manner as we teach them reading, writing, and calculating. On future curricula we will probably also find the subjects of semantic modelling and data privacy.
Future Research Trends

An SPKM tool should analyse and cluster information to find related items. All information should be stored in a content repository that can apply rules with background knowledge, and the user should be supported in reusing existing terminology. SPKM systems should be connected to each other, forming a network of knowledge; this network should allow sharing knowledge with others. The interface of the SPKM system should be personalisable and adapt to each user's preferences. Importing from and exporting to existing and future nonsemantic tools is an issue. Also, the integration between different semantic applications—although easier—is still a research issue. Another concern is privacy: if people continue to publish a lot of their semantic data, they will be surprised to learn how easily such information can be aggregated later on. Nontechnical issues such as semantic data handling as a cultural technique, the gap from paper to the digital world, and simplicity of use are still open.

A critical success factor of SPKM solutions will be the soft introduction of formalisation: a user must, for example, always have the option not to formalise at all. A gradual, step-wise formalisation of the existing knowledge structures should let the user decide how much effort (C_E) he is willing to spend. At any point in time, more formalisation effort has to be rewarded with better querying and browsing capabilities (lower C_R). The Achilles heel of semantic tools will be the authoring part: only if the end-user has an easy-to-use environment to create semantic structures on the fly and explicitly can semantic technologies improve searching and browsing. Of course, this approach should be accompanied by the automatic generation of semantic data from existing content.
Acknowledgment

Parts of this work have been funded by the European Commission 6th Framework Programme in the context of the Knowledge Web Network of Excellence Project, FP6-507482, and the EU IST NEPOMUK project, the Social Semantic Desktop, FP6-027705. Parts of this material are based upon works supported by the Science Foundation Ireland under Grants No. SFI/02/CE1/I131 and SFI/04/BR/CS0694, and parts of this work have been funded by Salzburg Research, Austria.
References

Adler, A., Gujar, A., Harrison, B. L., O'Hara, K., & Sellen, A. (1998). A diary study of work-related reading: Design implications for digital reading devices. In CHI (pp. 241-248).

Barreau, D., & Nardi, B. A. (1995). Finding and reminding: File organization from the desktop. SIGCHI Bulletin, 27(3), 39-43.

Bellotti, V., Dalal, B., Good, N., Flynn, P., Bobrow, D. G., & Ducheneaut, N. (2004). What a to-do: Studies of task management towards the design of a personal task list manager. In CHI (pp. 735-742).

Blandford, A., & Green, T. R. G. (2001). Group and individual time management tools: What you get is not what you need. Personal and Ubiquitous Computing, 5(4), 213-230.

Bondarenko, O., & Janssen, R. (2005). Documents at hand: Learning from paper to improve digital technologies. In CHI (pp. 121-130).

Braganza, A., & Mollenkramer, G. J. (2002). Anatomy of a failed knowledge management initiative: Lessons from PharmaCorp's experiences. Knowledge and Process Management, 9(1), 23-33.

Campbell, C., & Maglio, P. (2003). Supporting notable information in office work. In CHI (pp. 902-903).

Cutrell, E., & Dumais, S. T. (2006). Exploring personal information management. Communications of the ACM, 49(1), 50-51.

Dai, L., Lutters, W. G., & Bower, C. (2005). Why use memo for all? Restructuring mobile applications to support informal note taking. In CHI (pp. 1320-1323).

Dittrich, J.-P. (2006). iMeMex: A platform for personal dataspace management. In SIGIR PIM.

Dittrich, J.-P., & Vaz Salles, M. A. (2006). iDM: A unified and versatile data model for personal dataspace management. In VLDB, Seoul, Korea (pp. 367-378).
Personal Knowledge Management with Semantic Technologies
Dumais, S., Cutrell, E., Cadiz, J. J., Jancke, G., Sarin, R., & Robbins, D. C. (2003). Stuff I’ve seen: A system for personal information retrieval and re-use. In SIGIR (pp. 72-79).
Ludwig, L., O’Sullivan, D., & Zhou, X. (2004). Artificial memory prototype for personal semantic subdocument knowledge management (PS-KM). ISWC demo.
Esselborn-Krumbiegel, H. (2002). Von der Idee zum Text. Eine Anleitung zum wissenschaftlichen Schreiben., Utb.
Nonaka, I. (1994, February). A dynamic theory of organizational knowledge creation. Organization Science, 5(1), 14-37.
Fensel, D. et al. (1994). Integrating semiformal and formal methods in knowledge-based systems development. In Proceedings of the Japanese Knowledge Acquisition Workshop (JKAW-94), Hitachi, Japan (pp. 73-89).
Nonaka, I., & Takeuchi, H. (1995). The knowledge-creating company. New York: Oxford University Press.
Haller, H. (2002). Mappingverfahren zur Wissensorganisation. Diploma thesis at Freie Universität Berlin. Hearst, M. A. (2006). Clustering versus faceted categories for information exploration. Communications of the ACM, 46(4). Jones, S. R., & Thomas, P. J. (1997). Empirical assessment of individuals’ personal information management systems. Beh. & Inf. Techn., 16(3), 158-160. Kando, N. (1997). Text-level structure of research papers: Implications for text-based information processing systems. In BCS-IRSG Annual Colloquium on IR Research, Workshops in Computing, BCS.
O’Hara, K., & Sellen, A. (1997). A comparison of reading paper and online documents. In CHI (pp. 335-342). Oren, E. (2005, November). SemperWiki: A semantic personal wiki. In Proceedings of the ISWC Workshop on the Semantic Desktop. Oren, E. (2006, November). An overview of information management and knowledge work studies: Lessons for the semantic desktop. In Proceedings of the ISWC Workshop on the Semantic Desktop. Oren, E., Völkel, M., Breslin, J. G., & Decker, S. (2006, September). Semantic wikis for personal knowledge management. In Proceedings of the International Conference on Database and Expert Systems Applications (DEXA).
Kidd, A. (1994). The marks are on the knowledge worker. In CHI (pp. 186-191).
Polanyi, M. (1958). Personal knowledge: Towards a post-critical philosophy. London: Routledge & Kegan Paul Ltd.
Kwaśnik, B. H. (1983). How a personal document’s intended use or purpose affects its classification in an office. In SIGIR (pp. 207-210).
Pivk, A., Cimiano, P., & Sure, Y. (2005, October). From tables to frames. Journal of Web Semantics, 3(2), 132-146.
Lansdale, M. (1988). The psychology of personal information management. Applied Ergonomics, 19(1), 55-66.
Quan, D., Huynh, D., & Karger, D. R. (2003). Haystack: A platform for authoring end user Semantic Web applications. In International Semantic Web Conference (pp. 738-753).
Leuf, B., & Cunningham, W. (2001). The wiki way: Collaboration and sharing on the Internet. Addison-Wesley.
Randall, D. W., Bowker, G., & Leigh Star, S. (2001). Sorting things out: Classification and
Personal Knowledge Management with Semantic Technologies
its consequences - review. Computer Supported Cooperative Work, 10(1), 147-153. Ranganathan, S. R. (1962). Elements of library classification. Bombay: Asia Publishing House. Sauermann, L. (2005). The gnowsis semantic desktop for information integration. In Wissensmanagement (pp. 39-42). Schraefel, M. C., Wilson, M., Russell, A., & Smith, D. A. (2006). M-space: Improving information access to multimedia domains with multimodal exploratory search. Communications of the ACM, 49(4), 47-49. Sellen, A. J., & Harper, R. H. R. (2003). The myth of the paperless office. Cambridge, MA: MIT Press. Teevan, J., Alvarado, C., Ackerman, M. S., & Karger, D. R. (2004). The perfect search engine is not enough: A study of orienteering behavior in directed search. In CHI (pp. 415-422). Völkel, M. (2007). From documents to knowledge models. In Proceedings of ProKW workshop, Konferenz Professionelles Wissensmanagement, Potsdam, Germany. Völkel, M., & Haller, H. (2006a). Conceptual data structures (CDS) - towards an ontology for semi-formal articulation of personal knowledge. In Proceedings of the 14th International Conference on Conceptual Structure, Aalborg University – Denmark. Völkel, M., & Schaffert, S. (Eds.). (2006). Proceedings of the First Workshop on Semantic Wikis - From Wiki to Semantics, ESWC. Wagner, C., & Bolloju, N. (2005, April-June). Knowledge management with conversational technologies: Discussion forums, weblogs, and wikis. Journal of Database Management, 16(2), i-viii.
Wenger, E., McDermott, R., & Snyder, W. M. (2002, March). Cultivating communities of practice. Harvard Business School Press. Whittaker, S., Bellotti, V., & Gwizdka, J. (2006). E-mail in personal information management. Communications of the ACM, 49(1), 68-73. Yee, K.-P., Swearingen, K., Li, K., & Hearst, M. (2003). Faceted metadata for image search and browsing. In CHI.
additional reading Bardini, T. (2000). Bootstrapping: Douglas Engelbart, coevolution, and the origins of personal computing (Writing Science). Stanford University Press. Davenport, T. H. (2005). Thinking for a living: How to get better performances and results from knowledge workers. Harvard Business School Press. Engelbart, D. (1963). A conceptual framework for the augmentation of mans intellect (Tech. Rep.). London, Washington DC: Spartan Books (pp. 1-29). Friedewald, M. (2000). Der Computer als Werkzeug und Medium. Die geistigen und technischen Wurzeln des Personalcomputers (Taschenbuch). Goodman, D. (1988). The complete hypercard handbook. Random House Information Group. Lethbridge, T. C. (1994). Practical techniques for organizing and measuring knowledge. Ph.D. thesis, University of Ottawa. Rheingold, H. (2000). Tools for thought: The history and future of mind-expanding technology. The MIT Press. Sowa, J. F. (1999). Knowledge representation: Logical, philosophical, and computational foundations. Course Technology.
Personal Knowledge Management with Semantic Technologies
Endnote

1. The literature survey started in November 2006 from recent issues of relevant venues (the ESWC, ISWC, CHI, and SIGIR conferences and the CACM journal) and followed both backward and forward citations.
Chapter X
DeepaMehta:
Another Computer is Possible

Jörg Richter, DeepaMehta Company, Germany
Jurij Poelchau, fx-Institute, Germany
Introduction

Machine Dreams

A crucial experience during my time at university—studying computer science (with a focus on AI) and linguistics—was the documentary "Maschinenträume" ("Machine Dreams," 1988) by Peter Krieg. It features the long-term AI project Cyc, in which Doug Lenat and his team try to represent common sense knowledge in a computer. When Cyc started, in 1984, it was already known that many AI projects had failed due to the machine's lack of common sense knowledge. Common sense knowledge includes,
for example, that two things cannot be in the same place at the same time, or that people die, or what happens at a children's birthday party. During the night, while the researchers are sleeping, Cyc tries to create new knowledge from its programmed facts and rules. One morning the researchers were surprised by one of Cyc's new findings: "Most people are famous." Well, this was simply a result of the researchers having entered, besides themselves, only celebrities such as Einstein, Gandhi, and the U.S. presidents. The machine-dreaming researchers, however, were in no way despondent about this obviously wrong finding, because they figured they would only
have to enter the rest of the population, too. The principle underlying this thought is that it is possible to model the whole world in the form of ontologies; the meaning of the world can be captured in its entirety in the computer. From that moment on, the computer knows everything that humans know and can produce unlimited new insights. At the end of the film, Peter Krieg nevertheless asks: "If one day the knowledge of the whole world is represented in a machine, what can humans do with it, the machine having never seen the world?"
The Potential of Ontologies

In order to create computer applications that suit the human way of thinking and working, one needs to think about ontologies. There are ontologies, for example, for molecular biologists, for news editors, for tourist agents, or for connoisseurs of wine. By developing an ontology, people agree on the meaning (semantics) of certain computer codes. Every community of interest is free to create its own ontology or to collaborate with like-minded people on the development of a shared ontology. This development process—the ontogenesis—is at least as important as the resulting ontology itself. An ontology of the entire world will never exist, because the meaning of terms depends on their usage context. A wine merchant probably has different priorities than a wine connoisseur, which will be reflected in their ontologies. When developing an ontology, local or global communities of interest focus on those parts of reality that are relevant to their intended usage context. The authors see the primary potential of ontologies not in creating a one-to-one representation of reality in order to enable automatic reasoning, but in "learn[ing] to think together by building shared structures of meaning" (Murray, 2003, p. 11). That is, not in automation, but in collaboration. We can accept that invitation not only to think about application domains and usage contexts
but also to rethink the underlying concepts of the computer itself, especially the human-computer interface. It is interesting to bring to mind that ontologies were firmly molded into every software system long before the Semantic Web effort. These ontologies are made up of the concepts and relations defined by the software architect, according to which the user is supposed to think when using the software. So the concepts of, for example, "presentation," "slide," and "master slide" and their relations are molded into the software PowerPoint as firmly as the concepts "application," "window," "file," and "folder" are molded into the operating system's user interface. It would be significantly easier for the computer user if the user could work semantically not only in a single application, but if the entire computer were designed as a semantic work environment. In such an environment, the user would no longer work in applications, or store files in folders, but would be confronted directly with the terms of his/her own way of thinking and working. At the lowest level, such an environment could provide the concepts "more or less structured information unit," "relation between information units," "view of an information context," "shared workspace," and "private sphere." At the next higher level, more specific concepts of still general usage like "contact info," "appointment," "task," and "project" could be provided. And at the highest level, the user would find the concepts of his/her individual work and leisure domains; for example, a molecular biologist and movie fan would work with the concepts "chromosome," "gene," and "protein," as well as "movie," "actor," and "cinema."
The Trouble with Computers and Their Solutions

When learning how to use a computer nowadays, one understands quickly that for every purpose one needs a specific application. To write a text, one uses a word processor. E-mails are dealt with using an e-mail application. To surf the Internet
one needs a Web browser, and one's contact information is stored in the address book. Launching an application causes a window to open, showing the application controls as well as the actual content (texts, e-mails, Web pages, contact details). When writing a text, one needs to save it into a file. This file is located in a certain folder on the hard disc. When one wants to access one's text later on, one has to use another application: a file browser, to locate and open the respective file. Within the first weeks of learning, every computer user experiences the loss of his/her text (image, etc.)—either because the user forgot to save it or because the user cannot find it anymore. Working with applications, files, and folders, and the importance of saving, is surely not difficult to learn, and may even—as soon as one has internalized the logic of the computer—seem reasonable. However, if we turn away from the machine logic to human work situations, it becomes apparent how unnatural and inefficient this mode of operation is. A typical work situation involves a number of heterogeneous content objects. Take, for example, the production of this book, "SWE." There are about 20 chapters addressing different topics. Each chapter is written collaboratively by several authors. For each author there are several contact details (e-mail address, homepage, telephone number). Each chapter refers to a number of further information resources (projects, Web sites, other articles). During the review phase, authors review each others' chapters, leading to further chapter revisions. Eventually the editors assemble the accepted chapters into the final book, ready for the print shop. By means of this work scenario, two fundamental flaws of the computer and the resulting problems can be made clear:
Missing Relations

The meaningful relations between information resources are not represented in the computer.
Thus, they can neither be displayed on-screen nor exploited for navigation. There is no relation between, for example, the author of a Word document and that author's entry in the address book application. The user has to switch to the address book and search for the contact details manually. Even within a single application, relations are not easy to represent: if an article as well as its reviews are created with, for example, Microsoft Word, it is not possible to represent the relations between these files within Word, but only indirectly in the file system structure, if at all. Because the relations are not represented in the computer, they cannot be shared with other users. This makes collaboration inefficient, because every coworker has to reconstruct the relations individually. To represent the relations between, for example, the articles and their reviews, one could store every article in its own folder and its reviews in respective subfolders. If a review is sent to a coworker as a mail attachment, the relation to the main article gets lost, and the receiver has to reconstruct it in his or her own way. This additional effort increases with the number of coworkers; a lack of synchronization and various misunderstandings between the coworkers are thus preordained. The problem of missing relations can be solved by releasing the contents (texts, e-mails, Web pages, contact details) from their applications and storing them in a corporate memory using a neutral data format. Within the corporate memory, arbitrary content objects can be set into relation, and external resources can be integrated by storing reference objects in the corporate memory. The problem of inefficient collaboration can be solved by letting all coworkers operate on a central (server-side) corporate memory instead of their local hard discs. For both problems, the suggested solutions are already in use in specific applications (e.g., Document Management, Content Management, Personal Information Management, Workflow Management, Groupware, and Wikis), but to the
knowledge of the authors, there is no computer platform that operates system-wide according to this paradigm.
Missing Work Context

The greatest problem, which makes using a computer so cumbersome, is that the user's work context is not represented on-screen. This applies especially to the knowledge worker who deals with a number of applications and windows simultaneously. In the work context of the "SWE Book," for example, a number of heterogeneous content objects (texts, e-mails, Web pages, contact details) are involved. A book author will probably use the concept of "folders" to set up his computer as follows: on the hard disc there is a folder "SWE
Book” to hold the article and further assets. In the Web browser there is a folder “SWE Book” to hold the bookmarks of the project’s Web site and other Web sites. In the e-mail application there is a folder “SWE Book” to hold the mails of the book editors and other authors. And in the address book application there is a folder “SWE Book” to hold the contact details of the project’s members. Conclusion: in the user’s mind there is one project, “SWE Book,” whereas in the computer the project is chopped into pieces and fragmented to at least four different work environments. The user has to switch between several applications on a regular basis. Every switch changes the display as well the usage rules abruptly. What is displayed in one moment has vanished (resp. is obscured) in the next, the usage rules that are ap-
Figure 1. Current window-based user interface with four open applications: Web browser, file browser, e-mail client, and address book.
Problem 1: the meaningful relations are not displayed. Information belonging to one project (here: "SWE Book") is fragmented across several work environments. Problem 2: the user's work context is not displayed. The user must switch between applications frequently and has to cope with permanent context changes.
The user can, of course, recover his/her "SWE Book" folder in each application, but probably along with another 100 folders which are not related to the current work context in any way. If the content objects belonging to the current work situation—and only these—were visible together on-screen, the cognitive load of the user would be reduced significantly. This would free the user's mind to concentrate on the work instead of on the workings of the machine. Within the scope of the current user interface paradigm—every application opens another window, and the application's content objects are bound to that window—the problem of the missing work context cannot be solved. The kind of navigation that results from that paradigm is called here "Go-To Navigation." To look up, for example, an e-mail address while being in a Web browser, the user must go to that e-mail address (that is, switch to the address book application) and cope with a complete context switch. This situation cannot be remedied by the development of ever further applications, because this only leads to an even more fragmented user experience. The problem of the missing work context can only be solved by establishing a new user interface paradigm in combination with a new application model. This new interface paradigm can realize a kind of navigation that is called here "Bring-To-Me Navigation." To look up, for example, an e-mail address while reading a Web page, the e-mail address is brought to the user; that is, it is searched for and displayed within the current work context. The on-screen work context remains stable. Heterogeneous content objects (texts, e-mails, Web pages, contact details) and their meaningful relations (problem 1) are displayed within the same window. The aim of Bring-To-Me Navigation is to let the interface itself disappear, and to let the display become a "visual cache" (Canfield Smith, Irby, Kimball, Verplank, & Harslem, 1982) of the user's state of mind.
In Richter, Völkel, and Haller (2005), the terms "Stable Views" and "Constructive Browsing" are used to describe this new interface paradigm; here the term "Bring-To-Me Navigation" is used to express that information is brought to a single window, and to contrast it with the traditional Go-To Navigation. In order for content objects from different applications to share the same display context, the application logic must be separated from the display. The application as such will no longer be visible to the user, only the content objects. Because there is no application-specific user interface anymore, the content objects must provide the application-specific operations themselves; for example, an e-mail object must provide a "send" operation to the user. This requires a new application model. The new application model will be realized as an application framework that allows application developers to bind operations to content objects and to interact with the shared display context. To our knowledge, there is no computer platform that offers stable on-screen work contexts system-wide.
The Vision of an Integrated Work Environment

The vision of the DeepaMehta platform is to provide the knowledge worker with an integrated environment that supports his/her work, thought, and collaboration process. The DeepaMehta platform replaces the traditional computer desktop with a semantic desktop that "push[es] the user interface farther out from the computer itself, deeper into the user's work environment" (Grudin, 1990). DeepaMehta envisions:

• Removing machine concepts (applications, document formats, and so forth) from the user interface and thus from the user's mind; confronting the user only with the concepts of daily work and thought.
• Allowing the user to fully control the display in order to represent the work context on-screen; proving that "a well-designed computer system can actually improve the quality of your thinking" (Canfield Smith et al., 1982).
• Breaking down the "application prison" (Nelson, 2001) and freeing the data from their applications in order to let the user navigate associatively across application borders; allowing the user to reuse information objects in different work contexts.
• Designing a user interface that allows the user to apply natural faculties like a sense of orientation ("I'm located at a certain place within a stable environment") and that accommodates the individual learning process.
• Designing a software architecture which allows software developers to contribute domain-specific application logic without confronting the user with applications.
• Supporting collaboration by allowing workgroups to operate on a shared information repository and to build shared structures of meaning, while at the same time providing the user with protected spaces to encourage creativity.
• Designing a work environment that supports the whole process, from (a) content creation, to (b) enriching content with structure, to (c) enriching structured content with application logic, and that allows constant changes at every stage.
• Delivering computers that boot directly into DeepaMehta as the standard user interface and work environment.
The developers of DeepaMehta strive to identify generic operations that are common to all applications, for example, editing, searching, displaying, and navigating, and to build them right into the platform. Domain-specific operations may be implemented in the scope of the DeepaMehta application framework.
Figure 2. Old world—New world
The traditional computer user is bothered with operating systems and applications, which are built around document formats and network protocols (left side). In contrast, the DeepaMehta user is confronted only with content and performs operations on the content directly—all on a semantic foundation (right side).
DeepaMehta applications run on the server side and are out of the user's sight.
The DeepaMehta Platform

The DeepaMehta software platform for collaboration and knowledge management is a comprehensive, integrative concept that removes existing barriers that make current computer usage so cumbersome. DeepaMehta combines concepts and research findings from the fields of software engineering, information visualization, human-computer interaction, the Semantic Web, and creativity techniques in an innovative manner. DeepaMehta has the potential to be a great integrator:

• Mind Maps/Concept Maps: The DeepaMehta semantic desktop combines the cognitive virtues of mind maps with the computational virtues of concept maps and semantic networks.
• Visualization/Workspace: In DeepaMehta there is no separation between the graphic visualization of content structures and the actual work environment. All kinds of content are created and edited in place. There is no separation between the file level and the application level.
• Brainstorming/Structuring/Processing: DeepaMehta supports the whole information handling process: from creating information in a brainstorming mode, to structuring information and building models, to processing information by implemented logic—all performed in one environment.
• File Level/Application Level: DeepaMehta steps away from the file-application dichotomy and establishes a new paradigm of content and operations. Typed content objects provide the manipulative operations themselves; generic operations are provided by the DeepaMehta platform.
• Network/Local Machine: DeepaMehta removes the barrier between the local desktop machine and the network by establishing a uniform user interface for personal content and shared content. Working with local content and remote content results in the same user experience.
• Topic Maps/RDF: The DeepaMehta data model is inspired by ISO 13250 Topic Maps but has extensions to incorporate RDF-like features. DeepaMehta has the potential to marry the ISO Topic Maps standard and the W3C RDF standard.
Topic Maps Frontend

The DeepaMehta desktop is split into two areas. On the left-hand side, a topic map consisting of typed topics and typed associations is displayed. On the right-hand side, detail information about the currently selected topic, association, or the topic map itself is shown. This panel is called the property panel. If, for example, an e-mail topic is selected, the e-mail (with its "From," "To," "Subject," and "Text" properties) is shown in the property panel. If a Web page topic is selected, the page (with its "URL," "Title," and "Content" properties) is rendered in the property panel. The user performs basic text editing and image-manipulation operations directly in the property panel, without the use of external applications. To perform operations on a topic, an association, or the topic map itself, the user utilizes context menus. Generic operations such as "Hide," "Retype," "Delete," "What's Related?" and "Google Search" are provided by every topic. Depending on their type, topics may provide more specific operations; for example, an e-mail topic provides a "Send" operation, and a Web page topic provides a "Reload" operation. Topics and associations can be created manually or programmatically.
Figure 3. The “DeepaMehta” networked semantic desktop
The left side displays information objects of different kinds and origins, as well as their meaningful relations, as a topic map that represents the current work context of the user. The right side displays detail information about the selected topic, for example, an e-mail or a Web page. The information objects are edited and manipulated in-place; for example, e-mails are written and sent straight from the desktop.
Topics and associations can result from a variety of sources such as mail servers, Web servers, databases, and Web services. Associations can be created between any two topics, regardless of their origin. Topic maps are not like documents containing contents; they are always personal views of contents existing elsewhere. All topics and associations are stored in a server-side repository, called the corporate memory. The corporate memory is accessible to various users and workgroups simultaneously—subject to an access control mechanism, of course. A topic map is an individual view of extracts of the corporate memory that are relevant in a particular working context. It is up to the user to decide which topics and associations to retrieve, and where to show them.
Topic maps are of unlimited size and are moved by mouse dragging. Topic maps that serve as common views within workgroups are published into shared workspaces. A query to the corporate memory is visualized graphically, and the result set is displayed as a topic in the form of a ton. In order to avoid cognitive overload, DeepaMehta does not reveal a search result at once if it comprises more than seven topics (Miller, 1956). If the result does not comprise more than 150 entries, single topics may be revealed via a popup menu. Large result sets may be reduced by applying a narrower filter. A topic query may involve the topic name, topic type, topic properties, and the associations the topic is involved in. The result may be sorted. Result sets in DeepaMehta can be regarded as "smart folders": the user can retrigger the query that underlies a ton by double-clicking it. A ton may represent, for example, the query "All e-mails I have received from Jurij in the last 30 days." The user interface is fully personalized. After logging in, the user finds himself directly on the DeepaMehta desktop, exactly as it was left in the previous session.
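To make the ton concept concrete, the query underlying such a ton might be expressed along the following lines. This is a purely illustrative sketch: DeepaMehta's actual query interface is not documented in this chapter, so all class and method names used here (CorporateMemory, TopicQuery, and so on) are hypothetical.

// Illustrative sketch only; the classes below are hypothetical stand-ins
// for DeepaMehta's query facility, which is not documented in this chapter.
TopicQuery query = new TopicQuery()
    .topicType("E-mail")                          // restrict by topic type
    .property("Date", Range.lastDays(30))         // restrict by property value
    .associatedWith("Jurij", "From");             // restrict by association

List<Topic> result = corporateMemory.search(query);
if (result.size() > 7) {
    // Too many to reveal at once: shown as a ton, and single topics
    // are revealed via a popup menu (Miller, 1956).
    desktop.showAsTon(query, result);
} else {
    desktop.showTopics(result);
}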
Figure 4. The context menu of the person topic “Douglas C. Engelbart”
By the means of the “What’s Related?” command associated topics are retrieved from the corporate memory and displayed in the current topic map. Here, the submenu reveals Douglas C. Engelbart is associated with 2 other persons, 1 computer, 2 articles and so on.
Figure 5. In DeepaMehta large amounts of data are displayed as a ton
A ton represents a query to the corporate memory and the result set at the same time. In this example an unspecific person search results in 267 topics — too many to reveal at once. Applying the filter "kay" yields a result set small enough to be revealed immediately.

The DeepaMehta platform is based on ISO 13250 Topic Maps. Topic maps consist of topics and associations, and both are typed. Types are topics too. The user can define new types by deriving them from existing types, creating additional properties, and defining relations to other types. New types can be used immediately, also in shared workspaces. Every topic can be retyped at any time without losing content. All this provides the basis for supporting the dynamic collaborative process of building new ontologies.

System Architecture

Features of the DeepaMehta software architecture:

• Java-based application server
• Multi-layer architecture
• Supported Semantic Web standards: ISO 13250 Topic Maps, RDF
• DB-neutral storage layer, currently MySQL and HSQL (pure Java DB)
• Variable frontends: Topic Maps GUI (thin client), Web browser, PDA, mobile phone
• Object-oriented framework for application developers
• SOA (Service-Oriented Architecture)
• Access to external Web services via SOAP
• Access to external datasources: SQL, LDAP
• XSLT-based publishing engine for dynamic SVG, PDF, and XHTML generation
• Integrated mail and Web support: SMTP, POP3, IMAP, HTTP
The DeepaMehta system architecture comprises all the layers of an IT system: the storage layer, the application layer, and the presentation layer. DeepaMehta is a distributed architecture, so that every layer can reside on a different machine.
Figure 6. DeepaMehta is a multilayered distributed software architecture
The heart of the DeepaMehta software architecture is an application server. DeepaMehta applications are developed in the scope of the DeepaMehta application framework and run on the server side. Applications can access a variety of datasources and can be served to a variety of frontends.
The heart of the DeepaMehta software architecture consists of an application server. DeepaMehta defines a unique application model and provides a framework for application developers. A DeepaMehta application is actually a collection of topic types and association types that may be assigned to a workspace. Every type can be attached to a Java class that implements the behavior of the instances of that type. The topic type “Postal Address,” for example, may be attached to an implementation that fetches a map from Google Maps for a particular address. New applications can be aggregated from existing types. Thus every application that deals with postal addresses will benefit from the Google Maps behavior. Types can be assigned to shared workspaces. The user “deploys” an application just by joining a shared workspace. Once a user joins a shared workspace, the user gets access to the assigned types, and thus to their functionality.
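In code, the "Postal Address" example could look roughly as follows. This is a hedged sketch: only the base class name, LiveTopic, is taken from this chapter (it is introduced in the application framework section below); the hook name and helper calls are illustrative assumptions, not a documented API.

// Illustrative sketch: a behavior class attached to the topic type "Postal Address".
// "LiveTopic" is the framework base class named in this chapter; the hook name
// propertiesChanged() and the TopicProperties helper are assumptions for illustration.
public class PostalAddressTopic extends LiveTopic {

    // Hypothetical hook: called by the framework after the topic's properties change.
    public void propertiesChanged(TopicProperties props) {
        String address = props.get("Street") + ", " + props.get("City");
        // Attach a map for this address; every application that reuses the
        // "Postal Address" type inherits this behavior automatically.
        props.set("Map", "http://maps.google.com/?q=" + address.replace(' ', '+'));
    }
}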
Data Model

The DeepaMehta data model is inspired by ISO 13250 Topic Maps. On the one hand, the Topic Maps standard is not fully implemented; on the other hand, there are extensions to incorporate RDF/RDFS-like features. There are many proposals for the integration of the ISO Topic Maps standard and the W3C RDF/RDFS standard (Pepper, Vitali, Garshol, Gessa, & Presutti, 2006), but DeepaMehta chose its own ad-hoc approach because when the project started, in early 2000, these proposals did not yet exist. Furthermore, DeepaMehta defines higher-level concepts like Query, Datasource, Workspace, and User.
Topic Maps concepts that are not realized, or are modified, in DeepaMehta:

• n-ary Associations: For the sake of simplicity, DeepaMehta supports only binary associations. This is not a serious lack, because an n-ary association can be emulated by n binary associations.
• Association Role, Association Role Type: Because DeepaMehta supports binary associations only, and associations are explicitly directed from node 1 to node 2, there is no need to realize the concepts of Association Role and Association Role Type. The meaning of the involved nodes is represented in the direction of the association.
• Occurrence, Occurrence Role, Occurrence Role Type: Because DeepaMehta pursues an integrated GUI approach, occurrences are not realized explicitly. External resources are represented as topics; for example, a file is represented as a topic of type Document, and a Web page as a topic of type Web Page.
• Facet: Because occurrences are represented as topics, and because DeepaMehta provides the concept of properties, there is no need to realize the concept of facets.
• Scope: Not yet realized. To a great extent, ambiguity may be diminished by typing topics. In DeepaMehta there would be, for example, two topics "Tosca," one of type Opera, the other of type Character. Furthermore, DeepaMehta provides the concepts of Workspaces and Users: to workspaces, a number of types are assigned; users are members of workspaces and have access to the respective types.
• Topic Maps: In DeepaMehta, topic maps are topics too. Thus, topic maps can act as containers for other topic maps. (This is exploited by the GUI to realize the personal and shared workspaces, which are in fact topic maps that serve as storage spaces for other topic maps.)
DeepaMehta Topic Maps Extensions to Incorporate RDF/RDFS-like Features

• Property, Property Value: Part of the type system. To a topic type or association type, a number of properties can be assigned. For example, the type Person has the properties "Birthday" and "Gender." Property values are an enumeration of predefined values; for example, for the property "Gender" the values "Male" and "Female" are predefined.
• Relation: Defines a relation between types; part of the type system. Part of the definition is the cardinality and the association type to be used at instance level. For example, to model "City of birth," a relation between the types "Person" and "City" may be defined; the cardinality would be set to "One" and the association type to "City of birth."
Higher-Level DeepaMehta Concepts

• Query: A query to the corporate memory and the corresponding result set of topics is a topic itself (and is presented in the GUI as a ton). A query can involve the topic name, topic type, property values (at instance level), and the associations a topic is involved in.
• Datasource: External datasources like SQL databases or LDAP repositories are represented as topics too. A datasource is specified via its connection string (a URL starting with jdbc or ldap, for example) and the assignment of topic types whose instances are created from datasource entities.
• Workspace: A shared workspace with a number of members (users). A workspace is an exchange place for topic maps. Furthermore, topic types and association types are assigned to a workspace to hold the concepts used in it. Also, the access control mechanism is based on workspace memberships. Workspaces are topics, too.
• User: A user of the platform. A user is a member of a number of workspaces and has access to all the types assigned to those workspaces, as well as to private types. A user has a private sphere (a topic map itself) to store private topic maps. A user is a topic too (derived from Person).
The Application Framework

The DeepaMehta platform defines its own application model and provides an application framework. By means of the DeepaMehta application framework, software developers create applications that run on the DeepaMehta platform. A DeepaMehta application consists of a collection of types: topic types and association types. To create, for example, a calendar application, one could define the topic types "Calendar," "Event," "Location," and "Person," and the association type "Participant."

Figure 7. By choosing from the list of widgets, one can specify how to render a property in the property panel.

DeepaMehta applications are not visible to the user as such; only their topic types and associations appear in the context menus. Types can be assigned to workspaces; the user obtains access to types by joining a workspace. The application's data is modeled as type definitions, and the application's behavior is realized by attaching a Java class to a type. A type definition comprises a set of properties, a derivation from a basis type, and relations to other types (see the sketch after this list):

• A Property models an untyped single-value data field of a topic type or association type. The topic type "Event," for example, could have the properties "Begin Date" and "Begin Time," and the topic type "Person" could have the properties "First Name," "Last Name," and "Gender." For a property, a list of possible values can be predefined; for the "Gender" property, for example, the values "Male" and "Female" could be defined.
• By means of Derivation, a type definition can be built by specialization of an existing type. The derived type inherits all the properties and behavior of the basis type.
• A Relation models the relation between two topic types. In order to model, for example, the place of birth, one could define a relation between the topic types "Person" and "City." A relation includes the specification of the cardinality—one or many—and the association type to be used at instance level.
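Restated in code, the three building blocks could be assembled for the calendar application along the following lines. In DeepaMehta itself, type definitions are created graphically as topic maps (see below); this builder-style Java sketch is purely illustrative, and none of its names belong to a documented API.

// Purely illustrative builder sketch; DeepaMehta types are actually defined
// graphically as topic maps. All names here are hypothetical.
TopicType person = TopicType.derive("Topic", "Person")   // derivation from a basis type
    .property("First Name")
    .property("Last Name")
    .property("Gender", "Male", "Female");                // property with predefined values

TopicType event = TopicType.derive("Topic", "Event")
    .property("Begin Date")
    .property("Begin Time");

// Relation: an event has many participants; at instance level the
// connection is made by associations of type "Participant".
event.relate(person, Cardinality.MANY, "Participant");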
All the building blocks of the data model—topic types, association types, properties, property values, derivations, relations—are topics and associations themselves. Thus, the data model of a DeepaMehta application can be created directly in DeepaMehta as a topic map. As soon as the data model is defined, the basis functions of the application can be used immediately. To create, for example, an event and its participants, the user creates an Event-topic and connects it via Participant-associations with Person-topics.
Figure 8. The elements of the data model—topic types, association types, properties, and property values—are topics themselves and are handled as topic maps as well.
Here, the topic types of a calendar application are displayed. The topic types "Calendar," "Event," and "Person" derive basis properties from the "Topic" base class and define their own properties. To realize application-specific behavior, every type can be attached to a Java class.
Further basis functions like changing, deleting, or searching for events (e.g., all my upcoming events), and also collaborative calendar usage, are possible immediately, because standard information processing and communication functions are built right into the DeepaMehta platform. An application consists not only of a data model and generic data manipulation operations but also of domain-specific application logic. The calendar application, for example, could notify the participants of an event in due time via e-mail or instant messaging. This is possible in DeepaMehta because topics and associations are not just information carriers but also provide active behavior. Every type can be attached to a Java class which implements the behavior of the instances of the respective type. The Java class
must be compliant with the DeepaMehta application framework; that is, it must be derived directly or indirectly from a certain base class ("LiveTopic") and override specific methods (hooks) to let the topic instance react to certain events. The DeepaMehta application framework is primarily event driven. It defines a number of events to which a topic or an association can react. A topic, for example, can react when:

• It is clicked: Most topics react by displaying their properties in the property panel (right side).
• It is right-clicked: Most topics react by dynamically assembling and displaying a context menu. An "E-mail" topic, for example, can provide a "Send" command.
• A command has been chosen from its context menu: The base class provides the standard behavior for handling generic topic commands like "Hide," "Retype," "Delete," "What's Related?" and "Google Search."
• One of its properties has been changed: An "Event" topic, for example, could notify all the participants once an event is postponed. Furthermore, a topic can check the new property value and possibly reject it, or prohibit a property change a priori.
• It is associated with another topic: An "Event" topic, for example, could notify a newly assigned participant. Furthermore, both of the associated topics get the chance to specify a certain association type or to reject the association.
• It is moved within a topic map: This can be useful, for example, if the topic map acts as a time or geo grid. Furthermore, a topic can lock itself to prohibit any movement.
• The topic map in which it is contained has been published in a shared workspace: A "Document" topic, for example, uploads its assigned file to the server-side file repository.

Figure 9. This class diagram shows a selection of the LiveTopic subclasses that come with the DeepaMehta platform.

The platform's core features like "User Management & Collaboration" (grey areas) as well as the standard applications like "Personal Information Management" (blue area) are implemented as LiveTopics. The base class "LiveTopic" provides the hooks to be overridden by application-specific topic classes, in order to react to certain events.
The preceding list provides just an overview of the most important events; the DeepaMehta application framework defines about 40 events. The event handler hooks are ordinary Java methods. Within the hooks, all Java APIs that are
available on the server can be used. Furthermore, the DeepaMehta application framework provides a number of service calls, for example to navigate the corporate memory. All hooks are informed by the framework about the user who triggered the event. Thus the topic can impose access restrictions, for example by disabling the menu commands for which the current user doesn't have the required credentials.
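Put together, a minimal application-specific topic class might look as follows. The base class LiveTopic and the general hook mechanism are taken from the text above; the concrete hook signatures and helper calls (related(), sendEmail(), and so on) are assumptions for illustration, since the roughly 40 framework events are not listed here.

// Illustrative sketch of a LiveTopic subclass for the calendar application.
// Hook signatures and helpers are hypothetical; only the base class name
// and the notification scenario are taken from the chapter text.
public class EventTopic extends LiveTopic {

    // Hook: one of the topic's properties has been changed.
    public void propertyChanged(String property, String newValue, User user) {
        if ("Begin Date".equals(property)) {
            // Notify all participants once the event is postponed.
            for (Topic participant : related("Participant")) {
                sendEmail(participant, "Event postponed to " + newValue);
            }
        }
    }

    // Hook: the topic has been associated with another topic.
    public void associated(Topic other, User user) {
        if (other.isOfType("Person")) {
            sendEmail(other, "You have been assigned as a participant.");
        }
    }
}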
The DeepaMehta Unified Process

The development process is divided into three stages, and traditionally, at every stage another tool is used. At the brainstorming stage one creates text or graphic content, for example by using text editors or dedicated brainstorming tools like MindManager. To be able to process the contents later on by means of application logic, a model must be built, for example by creating database tables. Finally, the application logic must be coded by a programmer. At each of the three stages, different persons with different skills are involved and different work environments are used, each with a different user interface.
Figure 10. Processes (left side) and artifacts (right side) as involved in every IT project
The DeepaMehta Unified Process accommodates constant changes at the content, structure, and logic levels. The DeepaMehta platform supports the brainstorming/data acquisition, modeling, and coding processes within one user interface.

This makes for a slow and cumbersome development process. Content must be migrated from one stage to another, and changes in one stage may require changes in the other stages. Communication between the involved persons is difficult because of their different jargons. The DeepaMehta Unified Process enables smooth transitions between all three levels (content, structure, and logic). Content can be created before any structure exists, and unstructured content can later be turned into structured content. No content will be lost, even if the structure changes later. Structured content, as well as unstructured content, and even the structure itself, may be the subject of later brainstorming sessions. Standard logic for, for example, navigating, searching, editing, and displaying content is built right into the DeepaMehta platform. Structured content may be processed by implementing custom logic. In order to support the communication process between the involved persons, collaboration features are built into the platform. To build contents and structure collectively, the DeepaMehta platform offers shared workspaces. For the negotiation about the meaning of structures, every shared workspace provides a forum and chat as standard communication tools. The DeepaMehta Unified Process provides a significantly more efficient development process than existing processes, because the DeepaMehta platform simultaneously copes with visual, verbal, and virtual modalities.

Customer Solution Examples

Kiezatlas: A Geographical CMS
The first commercially deployed DeepaMehta application was "Kiezatlas," a geographical content management system (CMS). It was contracted by the Verband für sozial-kulturelle Arbeit (www.stadtteilzentren.de), the German umbrella organization of settlements and neighborhood centers.
Since 2004, Kiezatlas has been successfully deployed to publish city maps of socially relevant institutions on the Web site www.kiezatlas.de.
The amina Knowledge Platform

The amina knowledge platform was contracted in 2006 by the German amina foundation (www.amina-initiative.de). The amina initiative aims to promote corporate responsibility (CR) projects by establishing a dialog between companies, universities, and avant-garde thinkers. The amina knowledge platform is built on the basis of DeepaMehta. An interactive public demo is available (in German) at www.amina-wissensplattform.de. DeepaMehta is also utilized as a live-mapping tool during the various amina events.
Outlook

Extended Support for Semantic Technologies

Currently DeepaMehta can import and export topic maps in a modified XTM format. Future versions of DeepaMehta will support further semantic technologies and domain-specific applications:
• DeepaMehta type definitions will be built from RDF Schema, or vice versa.
• All topics and associations will be annotated with a Subject Identifier. This allows, for example, merging or interconnecting topic maps or whole corporate memories with each other while preserving their semantics.
Figure 11. The public Web frontend of the Kiezatlas geographical content management system
The left side displays locations of stores and institutions in a city map. The right side shows detail information about the selected institution. Institutions can also be found by category or by entering a search term.
Figure 12. The Kiezatlas administrators use the DeepaMehta topic map frontend to define the underlying data models of city maps.
In the upper area, DeepaMehta standard topic types and properties are visible, for example, "Person," "Institution," and "Topic Map." In the lower area one can see how the Kiezatlas-specific topic types and properties are derived from (blue associations) and set into relation with (purple associations) the DeepaMehta standard types. The topic type "Stadtplan" (city map), for example, derives its properties and behavior from "Topic Map" and adds its own property (green association). Furthermore, the search criteria for the city map contents are defined via derivations and relations. Also part of the topic map are users, workgroups, and memberships (orange associations), which define access control (turquoise associations), that is, the responsibilities of the various (sub)administrators. The public Web site (Figure 11) as well as the editor backend for the institution owners (Figure 13) are generated directly from this data model; for example, if a new property or search criterion is added to the model, the Web site and the editor backend are updated automatically.
• DeepaMehta will import semantically enriched domain-specific data and transform and visualize it as a topic map (see the sketch below). For example, when dropping an RDF-enabled business card (involving the "foaf," "contact," and "geo" ontologies) onto the DeepaMehta desktop, DeepaMehta will visualize the contents as a topic map, showing the person with all relations to friends and projects, and a city map retrieved from Google Maps.
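To illustrate what such a semantically enriched business card could carry, the following sketch builds one with the Jena RDF library, using the FOAF and W3C WGS84 geo vocabularies (the "contact" ontology is omitted for brevity). The card data is made up, and the subsequent mapping to topics and associations is the envisioned future feature, not existing DeepaMehta code.

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Resource;

// Builds an RDF business card of the kind that could be dropped onto the
// DeepaMehta desktop. Names and coordinates are made-up example values.
public class BusinessCardExample {
    static final String FOAF = "http://xmlns.com/foaf/0.1/";
    static final String GEO  = "http://www.w3.org/2003/01/geo/wgs84_pos#";

    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        Resource friend = model.createResource()
            .addProperty(model.createProperty(FOAF, "name"), "Jörg Richter");
        model.createResource("http://example.org/card#me")
            .addProperty(model.createProperty(FOAF, "name"), "Jurij Poelchau")
            .addProperty(model.createProperty(FOAF, "knows"), friend)
            .addProperty(model.createProperty(GEO, "lat"), "52.52")
            .addProperty(model.createProperty(GEO, "long"), "13.40");
        model.write(System.out, "RDF/XML");  // the data DeepaMehta would visualize
    }
}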
Extended Collaboration Features

Essential features of every collaboration environment are (a) versioned data storage, (b) a change history, (c) notification, and (d) access control.
• Versioning is a good method to cope with competing change requests without establishing a locking mechanism. Currently, DeepaMehta contents are not versioned. Future versions of DeepaMehta will support versioning at four levels: topic contents, association contents, topic map geometry/visibility, and file contents.
Figure 13. The Kiezatlas editor backend is a form-based Web application by which institution owners can update institution information on their own.
The institution form is generated dynamically by the DeepaMehta Web engine, based on complex type definitions (Figure 12). The contact person of an institution, for example, is modeled as a relation between the "Institution" topic type and the "Person" topic type. The DeepaMehta Web form generator embeds the Person form (with "First Name," "Last Name," and "Gender" fields) inside the Institution form. To the editor, the form looks like an ordinary HTML form, but through the DeepaMehta Web engine the user input is stored as a semantic network in the corporate memory. The contact person of an institution, for example, is represented as a topic of type "Person" that is associated with the "Institution" topic.
• A change history helps to keep track of content changes performed by users: what was changed, by whom, and when? Changes can be presented visually, as in Microsoft Word's change view or in wikis, and certain changes can be reverted. Currently, DeepaMehta provides no change history.
• A notification mechanism informs users of relevant actions performed by other users: what has been done by my colleagues since my last login? A crucial concern is granularity: who should be notified, and in what detail? Currently, DeepaMehta's notification mechanism is quite limited: members of shared workspaces are informed via messaging and e-mail once a topic map is updated, but not about what has been changed.
• An access control mechanism is crucial to protect confidential information and privacy and to enforce user hierarchies. Currently, DeepaMehta provides an access control mechanism, but it is not flexible enough: basically it works at the type level and has the concept of a topic owner. Future versions of DeepaMehta will provide access control at the instance level and even at the property level.
Figure 14. A topic map of the amina network
Relations between amina corporate responsibility topics (blue balls) and suitable university courses are shown in pink. Amina agents and their affiliations with corporations and universities are displayed alongside their mentorships (blue associations) for the amina topics. Every amina topic is related to a shared workspace for the students' and mentors' collaborative work (not visible here).
Extended Standard Applications
The DeepaMehta platform comes with standard applications for Personal Information Management (PIM), Document Management, and Web browsing/searching. PIM includes handling e-mails, contact information, appointments, and notes. All these functions are at prototype level and cannot yet replace the traditional applications. The rendering of HTML pages, in particular, is currently too limited. Future versions of DeepaMehta will be able to replace the traditional e-mail, Web, text, and image applications and will import their data. The feature set that is used by 90% of the users will be provided. In high demand is the integration of a Web engine like Gecko. Furthermore, audio and video contents will be supported.

Architectural Concerns

Currently DeepaMehta is a client-server system. The graphic DeepaMehta client is connected to one DeepaMehta server at a time, and one server instance serves one corporate memory at a time. In order to fully exploit the network effect, future versions of DeepaMehta will allow one client to be connected to multiple servers and corporate memories at once. The requirements of associating topics from different corporate memories and of synchronizing certain parts of different corporate memories are apparent. A matching approach for topics is required. The Topic Maps concepts of Subject Indicators and Subject Identifiers would help here, but there is no obvious solution to these requirements yet. In order to support the growth of a large-scale network of DeepaMehta servers, one could replace
the client-server architecture with a peer-to-peer architecture. This opens up a lot of questions and requires in-depth discussion. In order to integrate the DeepaMehta application server with other (non-Java) back-end applications and services, the DeepaMehta server should provide the DeepaMehta applications as Web services. Because the layered DeepaMehta architecture clearly separates the application layer from the storage and presentation layers, it is already very close to a Service-Oriented Architecture (SOA). For the moment there is no language-independent interface to the application layer. Future versions of DeepaMehta will provide a SOAP interface to the DeepaMehta applications.
Related Work

Semantic Desktop

There are other semantic desktop environments, for example, "IRIS" (www.openiris.org), "Haystack" (http://groups.csail.mit.edu/haystack/), "Gnowsis" (www.gnowsis.org), and "Nepomuk" (http://nepomuk.semanticdesktop.org). But none
of them solves the problem of the missing work context in a cognitively adequate fashion (see "The Trouble with Computers and Their Solutions").
Graph-Based Information Visualization

There are other popular applications, for example, "TheBrain" and "Inxight StarTree," that deploy tree- or graph-based visualization to navigate information spaces. With "TheBrain" the user can organize notes and files into a tree structure. The application displays an extract of the tree as nodes and edges. The node currently focused by the user is displayed in the middle of the screen. Around the focused node, the application places all neighboring nodes, that is, all the nodes that have a direct connection to the focused node. All other nodes are not displayed. To navigate the tree structure, the user clicks on a visible node. The application then moves the clicked node to the middle of the screen and again displays its direct neighborhood. The tree layout is done automatically, according to rules not transparent to the user. Therefore "TheBrain" is not cognitively adequate, as it does not satisfy the criterion of free positioning.
Figure 15. “The Brain” displays an extract of a tree structure, with the focused node in the middle of the screen. Setting another focus causes a complete rearrangement of the display. www.thebrain.com
Figure 16. “Inxight StarTree” deploys a hyperbolic projection to display different hierarchy levels at once. While navigating, the spatial neighbor-hoods of the nodes are not preserved. www.inxight.com/products/sdks/st/
Haller (2003) points this out in his thesis "Mappingverfahren zur Wissensorganisation" (mapping techniques for knowledge organization): The main advantage of [visual] mapping approaches is the possibility to organize information spatially, according to the user's Cognitive Map (Chen & Czerwinski, 1998; Dillon, McKnight, & Richardson, 1993). So it is important to let the user position the nodes freely, in order to adapt the map to his/her internal spatial model. The nodes should keep their positions (at least relative to their neighborhood) if the map is modified; otherwise, orientation in information space by means of Cognitive Maps is hindered.

"Inxight StarTree" allows the visualization of extensive tree and network structures on a limited display space. Like "TheBrain," the node focused by the user is displayed near the middle of the screen. But unlike "TheBrain," not only the direct neighborhood is displayed, but also deeper levels of the node hierarchy. This is possible because "Inxight StarTree" deploys a projection
of a hyperbolic plane to a region of the screen. Effectively, an increasing scaling factor is used towards the screen borders. The tree or network structure is navigated quasi by moving the screen region within the hyperbolic plane. Due to the hyperbolic projection, it can happen that one minute node A is above node B and the next minute it is the other way around. This visualization approach is also cognitively inadequate as it does not allow the user to orientate himself within the information space. Both, “TheBrain” and “Inxight StarTree” allow the user to navigate large information spaces in an associative manner. But their automatic layout approaches are not cognitively adequate because they do not support the process of knowledge acquisition—learning—of the user. The visualization approach does not consider the mental state of the learner. A work environment for knowledge workers can be regarded as cognitively adequate if it supports associative navigation and a visualization approach that enables the user to arrange, and thus create, the display.
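For concreteness, the kind of projection described can be illustrated with the Poincaré disk model of the hyperbolic plane (a representative choice; the exact mapping used by "Inxight StarTree" is not documented here). A node at hyperbolic distance d from the focused node is drawn at Euclidean radius

    \[ r \,=\, \tanh\left(\frac{d}{2}\right), \qquad 0 \le r < 1, \]

so equal steps in hyperbolic distance are rendered ever smaller towards the border of the display region. This is the increasing scaling factor mentioned above, and also the reason why relative node positions are not preserved while navigating: re-centering the projection on another node changes all drawn positions nonlinearly.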
Future Research Directions

Until now, artificial intelligence and the Semantic Web have brought no significant benefit to the computer user. This situation will not change as long as computer scientists adhere to the rationalist tradition in AI and its reduced, mechanical understanding of our environment and of human nature itself. The authors propose to reanimate the principles of soft cybernetics and to examine the images that underlie current computer research and system designs.
Reanimating Soft Cybernetics The modern computer age is in its infancy and our understanding of what computers do and how their functioning is related to human language,
thought, and action is very limited. This is evident when software engineers use terms like "intelligence," "recognition," "meaning," "learning," "reasoning," "knowledge," and "understanding" to describe their systems. Computer science is practiced mainly by engineers. The authors propose interdisciplinary collaboration between computer scientists and humanists: psychologists, linguists, philosophers, and epistemologists, as well as biologists and artists. To the knowledge of the authors, the last crucial effort at such collaboration was the series of Macy conferences held from 1946 to 1953. During the following years, computer research split into two schools of thought: hard cybernetics, which later became the AI movement, and soft cybernetics. It is the school of hard cyberneticists and AI advocates that increasingly dominates computer science research. They receive the major funding and set the research agenda, while the soft cyberneticists seem to have disappeared from the scene.
The central points of these two schools of thought are:

• The hard cyberneticists and AI advocates believe that all things and processes, including language, understanding, learning, and social interaction, are describable by means of logic. The meaning of a thing exists independently of the person who uses it; for example, the meaning of a text is hidden in the text itself and is independent of the reader. Cognitive aspects play no role in this belief, because cognition is a phenomenon that can be simulated by a mechanical-mathematical model.
• The soft cyberneticists believe that intelligence is an evolutionary result of the environment in which it has emerged. Meaning emerges only in the moment of action. Cognitive systems are closed systems which emerge from themselves, in a process that can be called "Autopoiesis" (Maturana & Pörksen, 2002).
Figure 17. Feedback cycle of the human-machine-system
Human beings have intentions and negotiate/cooperate with each other about how to use the computer in a certain usage context. They build a mathematical model with which the machine can compute. The results are interpreted by human beings at two levels: (1) What to do with the result in the usage context? and (2) meta-interpretation: How did the result come about? Is the model correct?
Intelligence is inseparable from cognition, and information is inseparable from its use. Computers cannot be intelligent as long as they are not closed systems but remain dependent on a programmer. The authors suggest a redirection of attention away from the school of hard cybernetics and AI towards the school of soft cybernetics, and a study of the human-machine system and of the relation of cognition and computers. A soft cyberneticist does not see the computer as an entity that performs a dialog with the user, or that cooperates with the user, or that has any intention, because a computer is not a cognitive system. Even if the property of having intentions or the process of interpretation is included in a mathematical model, no new quality is established in the machine; the cycle is just delayed. In the end, it remains up to the human being to interpret the value of the generated results with respect to his/her usage context. Human beings and machines belong to different semantic realms, and the dividing line cannot be removed. Interpretation cannot be automated. Interdisciplinary collaboration would help to gain a better understanding of the human-machine system and to build computers with a greater benefit for the user.
Examining Underlying Images

For the direction of future computer research, one needs to examine the underlying images that computer scientists and system designers have of the machine. One would foresee completely different application domains and create different user interfaces depending on whether one sees the computer, for example, as an assistant or as a tool:

• With the image of an assistant in mind, the computer is regarded as an intelligent agent, one who performs a dialog with the user and understands the user's intentions. This is evident, for example, when the word processor asks "A sentence must begin with a capital. Shall I do it for you?", or when Semantic Web researchers try to delegate the process of semantic annotation to the machine, as if the machine could read and understand Web pages on one's behalf.
• With the image of a tool in mind, the computer is regarded as a thing. The human being performs tasks and uses the tool to augment his or her faculties.

Figure 18. The underlying image of the machine determines the research direction and the resulting computer designs. If one sees the computer as an intelligent assistant (left side), one tries to build a machine with cognitive faculties. If one sees the computer as a tool (right side), one tries to design it to meet the demands of human beings in their environment.
Outside the world of computers, the building of tools has a very long tradition. We are far from harnessing digital power the way we harnessed, for example, the hammer or the pencil. But if we ask the right questions about how digital tools should be built, we are on the right track. The authors propose that researchers and engineers adopt the tool image when formulating research issues and designing computer systems.
Research Issues

Computer Science/Human-Computer Interaction

• What are the implications for the Semantic Web if semantics is not hidden in texts but only emerges when the text is used by human beings in a certain usage context? (Meaning as use, as described in Wittgenstein, 1953)
• What are the implications for artificial intelligence and the Semantic Web if "information and its use are inseparable and actually form a single process"? (Jerzy Konorski, cited in Foerster, 1993)
• How can a collaborative semantic work environment be designed to support constant changes of content, structure, and logic?
• What images support a sustainable technology development?

Cognitive Science

• How can the process of knowledge creation be supported by a computer?
• How can a user interface be designed to exploit the inherent human cognitive faculties?

Communication Science/Linguistics

• How can the process of semantic emergence be supported by a computer?

Philosophy

• Do computers act?

Cultural Studies

• What comes after postmodernism? What could be the features of post-digitalism?
References

Canfield Smith, D., Irby, C., Kimball, R., Verplank, B., & Harslem, E. (1982). Designing the Star user interface. Byte, 4, 242-282.

Chen, C., & Czerwinski, M. (1998, June). From latent semantics to spatial hypermedia—an integrated approach. In Proceedings of the 9th ACM Conference on Hypertext (Hypertext '98), Pittsburgh, PA. Retrieved March 9, 2008, from www.pages.drexel.edu/~cc345/papers/ht98.pdf

Dillon, A., McKnight, C., & Richardson, J. (1993). Space—the final chapter or why physical representations are not semantic intentions. In C. McKnight, A. Dillon, & J. Richardson (Eds.), Hypertext: A psychological perspective (pp. 169-191). Chichester: Ellis Horwood.

Foerster, H. v. (1993). Wissen und Gewissen. Versuch einer Brücke. Frankfurt am Main, Germany: Suhrkamp.
Grudin, J. (1990). Interface. In Proceedings of the 1990 ACM Conference on Computer Supported Cooperative Work (pp. 269-278). Los Angeles, CA: ACM Press.

Haller, H. (2003). Mappingverfahren zur Wissensorganisation. Thesis. Published at KnowledgeBoard Europe. Retrieved March 9, 2008, from www.heikohaller.de/literatur/diplomarbeit/mapping_wissorg_haller.pdf

Maturana, H. R., & Pörksen, B. (2002). Vom Sein zum Tun. Die Ursprünge der Biologie des Erkennens. Heidelberg, Germany: Carl-Auer-Systeme Verlag.

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81-97.

Murray, J. H. (2003). Inventing the medium. In N. Wardrip-Fruin & N. Montford (Eds.), The new media reader (pp. 3-11). Cambridge, MA: The MIT Press.

Nelson, T. (2001). Talk at the ACM Conference on Hypertext (Hypertext 2001). Retrieved March 9, 2008, from http://asi-www.informatik.uni-hamburg.de/personen/obendorf/download/2003/nelson_ht01.avi.bz2

Pepper, S., Vitali, F., Garshol, L. M., Gessa, N., & Presutti, V. (2006, February 10). A survey of RDF/topic maps interoperability proposals (W3C Working Group Note). Retrieved March 9, 2008, from www.w3.org/TR/rdftm-survey/

Richter, J., Völkel, M., & Haller, H. (2005, November). DeepaMehta—a semantic desktop. In S. Decker, J. Park, D. Quan, & L. Sauermann (Eds.), Proceedings of the 1st Workshop on The Semantic Desktop, 4th International Semantic Web Conference, Galway, Ireland (Vol. 175, CEUR-WS).

Wittgenstein, L. (1953). Philosophische Untersuchungen. Frankfurt am Main, Germany: Suhrkamp (2003).
Additional Reading

Semantic Web/Semantic Desktop

Decker, S., & Frank, M. (2004). The networked semantic desktop. In Workshop on Application Design, Development and Implementation Issues in the Semantic Web at WWW2004, New York, USA. Retrieved March 9, 2008, from http://triple.semanticweb.org/svn/stefan/DeckerFrank.pdf

Manola, F., & Miller, E. (2004, February 10). RDF primer (W3C Recommendation). Retrieved March 9, 2008, from www.w3.org/TR/rdf-primer/

Pepper, S. (2000). The TAO of topic maps. Finding the way in the age of infoglut. Retrieved March 9, 2008, from www.ontopia.net/topicmaps/materials/tao.html

Pepper, S., & Moore, G. (2001). XML topic maps (XTM) 1.0 (TopicMaps.Org Specification). Retrieved March 9, 2008, from www.topicmaps.org/xtm/1.0/

Computer System Design

Landauer, T. K. (1996). The trouble with computers: Usefulness, usability, and productivity. Cambridge, MA: The MIT Press.

Norman, D. A. (1986). Cognitive engineering. In D. A. Norman & S. Draper (Eds.), User centered system design (pp. 31-61). NJ: Erlbaum.

Suchman, L. A. (1987). Plans and situated actions: The problem of human-machine communication (Learning in doing: Social, cognitive & computational perspectives). Cambridge, UK: Cambridge University Press.

Winograd, T., & Flores, F. (1986). Understanding computers and cognition. A new foundation for design. Addison-Wesley Professional.
Philosophy of Language/Semantics

Austin, J. L. (1962). How to do things with words. Cambridge, MA: Harvard University Press (2005).

Searle, J. (1969). Speech acts. Cambridge, UK: Cambridge University Press.

Cognition

Foerster, H. v., & Pörksen, B. (2006). Wahrheit ist die Erfindung eines Lügners - Gespräche für Skeptiker. Heidelberg, Germany: Carl-Auer-Systeme Verlag.

Maturana, H. R., & Varela, F. J. (1992). The tree of knowledge. The biological roots of human understanding. Boston, MA: Shambhala.

Miscellaneous

Weizenbaum, J. (1978). Die Macht der Computer und die Ohnmacht der Vernunft. Frankfurt am Main, Germany: Suhrkamp (2003).
Section III
Methods for Semantic Work Environments
Chapter XI
Added-Value: Getting People into Semantic Work Environments

Andrea Kohlhase, Jacobs University Bremen and DFKI Bremen, Germany
Normen Müller, Jacobs University Bremen, Germany
Abstract

In this chapter we look at users' processes of taking action in Semantic Work Environments. We argue that the underlying motivational problem between vast semantic potential and extra personal investment can be considered a "Semantic Prisoner's Dilemma" that builds on two competing value perspectives: the micro- and the macroperspective. The former informs a user's decision for action, whereas the latter informs a designer's decision for offering services. An in-depth analysis of the term "Added-Value" reveals its double relativity, which allows a sophisticated evaluation of such services from a microperspective. We use this property of double relativity to suggest the "Added-Value Analysis" as a design method for getting people into Semantic Work Environments, showcasing its strength with a description of CPoint and Connexions.
Introduction

In 1998, WWW inventor and W3C director Tim Berners-Lee proposed a rather gigantic project now called the "Semantic Web." His road map is based on his understanding of the Semantic Web as "a Web in which machine reasoning will be ubiquitous and devastatingly powerful"
(Berners-Lee, 1998, p. 50), assuming that its data exist in machine-processable documents. Even though he was aware of the fact that "instead of asking machines to understand people's language, it involves asking people to make the extra effort" (Berners-Lee, 1998, p. 50), he was so enamoured of the possibility of managing the Web's data (see also Berners-Lee & Fischetti, 1999;
Berners-Lee, Hendler, & Lassila, 2001) that he neglected the cost-benefit ratio involved for its users. Although the grand idea of a Semantic Web is well established by now, it has not come into real life yet. To save the idea on a smaller scale and to lighten the burden involved, Semantic Work Environments (SWE) have been proposed, but they face similar difficulties in motivating users into (voluntary) action. This is because formalizing or explicating semantics is per se unnecessary for humans, with their complex interpretation capability. If they do it anyway, they do it so that a machine will understand them in the long run: Is that worth the effort? Note that this is a problem inherent to semantic technology, not to collaborative or social technology like "Wikipedia." We argue that the underlying motivational problem between vast semantic potential and extra personal investment can be analyzed in terms of the well-known nonzero-sum game "Prisoner's Dilemma." This "Semantic Prisoner's Dilemma" consists in two competing perspectives on taking action: the micro- and the macroperspective. The latter is a view from without, having an overview of the value landscape, whereas the former is a view from within, where not all value information is available. As only the user can take action, the user's microperspective is decisive for getting her into an SWE. In contrast, an SWE design standpoint tentatively takes the macroperspective and expects users to go for the global optimum in the value landscape. We will show why a rational user can only go for the local optimum. In order to tip the scale of taking action towards getting people into SWEs, we propose to change the values in the decision landscape, so that from a microperspective the local gains for the decision with the global optimum increase above the critical threshold for acting. In particular, if we enrich an SWE with elaborate semantic services for the user's concrete situation, then it is more probable that the scale tips in the desired direction.
But how can such elaborate semantic services be designed? We suggest the "Added-Value Analysis" (AVA) as a design method. We will explicate the double relativity of the term "Added-Value," which allows us to take various microperspectives into account in the design process. Even though the AVA process may feel familiar, and may then seem to "just" explicate formerly implicit design processes, it lifts these good practices to a conceptual, reflective level. For instance, the awareness of easy-to-fall-for catches (even for professional designers) makes it possible to systematically avoid their consequences. Moreover, by its formalization, such a practice can be replicated by newcomers as well as by all people involved in a software design process. Since the concept of "Added-Value" is often misinterpreted, we will elaborate on its history and usage, particularly because "the way in which Added-Value is defined effectively determines its role" (de Chernatony, Henry, & Riley, 2000, p. 43). We illustrate the technique by applying it to the SWEs CPoint and Connexions. Both are based on strong semantic representation formats for course materials, which makes them comparable but complementary in their semantic services. CPoint (Kohlhase, 2005b; Kohlhase, 2005a; Kohlhase & Kohlhase, 2004) is largely a personal SWE for managing and flexibly delivering large collections of MS PowerPoint (PPT) presentations. It is an open-source Semantic Work Environment from within PPT that facilitates semantic classification and annotation of PPT objects like images or text boxes. CPoint was implemented by one of the authors within the Course Capsules project at Carnegie Mellon University, Pittsburgh, PA. The system can convert semantically annotated PPT documents into the OMDoc format (Kohlhase, 2006). In contrast, Connexions (Baraniuk et al., 2002; CNX, 2006) is a community-centered, Web-based content management system for advanced scholarly materials.
These "scholarly modules" are managed in a way that enables instructors to rapidly build and share courses, and that assists students in self-steered learning and in exploring the interrelations between various disciplines. The system has been developed at Rice University, Houston, Texas; our working group "Knowledge Adaptation and Reasoning for Content" (KWARC) at Jacobs University Bremen, Germany, hosts the European Connexions Centre, supported by ONCE-CS (2005).
the semantiC prisoner’s dilemma As a user’s motivation for taking action crucially depends on the trade-off between the (potential) benefit of and the investment needed for achieving it, we will now scrutinize the cost-benefits ratios induced by semantic work environments. All SWEs share the same problem: Semantically structured data are hard to come by. This problem is known as the “Authoring Problem” and invites two approaches: Either to let highly motivated or paid workers produce the semantic content, or to involve the user as a producer herself. As semantic technology realizes its potential only on large data sets, either the motivation or the money runs out before the potential could be realized. Therefore, the scheme of the “user as consumer and producer,” currently revolutionizing the Internet as Web 2.0 (Downes, 2005; O’Reilly, 2005; Weinberger, 2002), is also very attractive for developers of semantic technology, particularly SWEs. Hence, the responsibility for producing structured information is often given to the users. Their collaborative effort is expected to bring the potential of semantic technology to full bloom. But can the expectation for semantic potential (unspecific as it is) be enough to motivate users into taking action? If an action is considered to be a subjective, intentional activity, then the execution of actions are dependent on their execu-
According to Heid (2004, p. 146), this appraisal is guided by three questions: Who influences the value of an action's outcome? Whose interests are implemented in that value? And who assigns responsibility? In the semantic technology frame, the answers include the user, but are dominated by the other collaborators (including the technology). Therefore, the user needs to decide whether to take action or not in a cooperation scenario with unspecific expectations about the will to collaborate. This situation has been researched in depth in psychology under the name "Prisoner's Dilemma," which we introduce next.
the prisoner’s dilemma The well-known nonzero-sum game “Prisoner’s Dilemma” (Axelrod, 1984) is often used for analyzing short-term decision-making processes in cooperation scenarios, where the actors do not have any specific expectations about future interactions or collaborations. Concretely, two players, say famous Bonnie and Clyde, are imagined in a prison scenario, where they are independently confronted with cooperation offers by a public prosecutor, the sheriff. They can choose between two moves, either “cooperate” or “defect” (see Figure 1). The idea is that each player gains when both cooperate (i.e., if Bonnie and Clyde do not betray each other, then both will get only a few prison
Figure 1. Bonnie’s value landscape
Added-Value
terms), but if only one of them cooperates, the other one, who defects, will gain more. In particular, if Bonnie decided not to turn Clyde in, then her prison terms depend on his behaviour: If he cooperates with Bonnie, then she will serve 2 terms, but if he decides otherwise, she will serve 10 terms. If both defect, both lose (that is, if Bonnie as well as Clyde blow the whistle on the other, then both will serve 5 terms in prison), but not as much as the cheated cooperator, whose cooperation is not returned. The analogy of the “Prisoner’s Dilemma” to the SWE author’s situation is apparent since the potential of semantic technology does not represent specific expectations about future interactions. For instance, when a semantic author creates a content unit, then it can be fed into semantic systems (thereby fulfilling some of the potential), but the value of the invested knowledge did not increase by this indirect publication process. The addressee in this process is the machine, the objective that the machine can now deal with it. The author can only guess what happens to the content unit later on. In contrast, the aim of publishing a Wikipedia article consists in information collaboration. There, the author’s expectation is rather specific: Distribution of information and publication to fellow humans. Indeed, spelling out the consequences of the semantic interaction process in terms of the Prisoner’s Dilemma sounds all too familiar: If the author decides to invest her time and effort and others contribute as well, everyone profits tremendously from this synergy of cooperation. On the other hand, if just the author works on semantic mark-up, then she will gain nothing in the short run, but some in the long run (see Figure 2). We call this situation the “Semantic Prisoner’s Dilemma.” Since this scenario is a dilemma, a natural question to ask now is: What decision should the SWE author sensibly make?
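To make Bonnie's value landscape concrete, the following minimal sketch encodes the prison terms given above as a payoff matrix and checks which move is individually rational. The 1-term payoff for a lone defector is our assumption for illustration; the text above only states that the defector "will gain more":

    // Bonnie's prison terms, indexed by [her move][Clyde's move];
    // 0 = cooperate (stay silent), 1 = defect (betray). Fewer terms are better.
    public class PrisonersDilemma {
        static final int C = 0, D = 1;
        static final int[][] TERMS = {
                { 2, 10 },  // Bonnie cooperates: 2 if Clyde cooperates, 10 if he defects
                { 1,  5 },  // Bonnie defects: 1 (assumed) if Clyde cooperates, 5 if he defects
        };

        public static void main(String[] args) {
            // Whatever Clyde does, defecting yields Bonnie fewer prison terms:
            for (int clyde : new int[] { C, D }) {
                System.out.printf("Clyde %s: cooperate=%d terms, defect=%d terms%n",
                        clyde == C ? "cooperates" : "defects",
                        TERMS[C][clyde], TERMS[D][clyde]);
            }
            // Output shows 1 < 2 and 5 < 10, so "defect" strictly dominates --
            // yet mutual defection (5,5) is worse than mutual cooperation (2,2).
        }
    }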
The Microperspectives in the Semantic Prisoner's Dilemma

For a user of semantic material, the motivation for preferring semantically rich data is simple: explicit document structure supports enhanced navigation and search, and semantic mark-up yields context and search by content. Furthermore, the higher the degree of semantic structure, the more services can feed on the material, and the higher the benefit for the user. But this is only a standpoint from without: a macroperspective in which all benefits and sacrifices are known and the global decision optimum can be perceived (see Figure 2). The entirety of "all" microperspectives informs the macroperspective on the conceptual level towards the global optimum; therefore, the (macroperspective of a) designer is really just concerned with this optimum. From within the situation though, that is, from a microperspective, there is also motivation against taking action: the cost of creating a document is proportional to the depth of the mark-up involved. However, once the mark-up quality passes a certain threshold that supports flexible reuse of fragments, content creation costs may actually go down, as they are then dominated by the cost of finding suitable (already existing) knowledge elements. Thus, the author is interested in a high reuse ratio, provided that retrieval costs are not prohibitive. The benefits seem obvious for the author who has the opportunity to reuse her own content modules frequently, but the real payoff comes when she is part of a group of individuals that share content objects and knowledge structures freely.
Figure 2. The semantic prisoner’s dilemma
In the Prisoner's Dilemma, if the decision-makers were purely rational, they would never cooperate (without at-hand incentives), as each should make the decision that is best for her individually. Suppose the other one defects: then it is rational to defect yourself, for you will not gain much, but if you do not defect, you will have all the work. On the other hand, suppose the other one cooperates: then you will gain (especially in the long run) whatever you decide, but you will gain more if you do not cooperate (as you do not have to invest your time and effort). So here, too, the rational choice is to defect. The problem is that if all content authors are rational, all will decide to defect, and none of them will gain anything. In particular, based on the individual's microperspective, the sensible decision for an SWE author is non-cooperation. Therefore, users' expectations may not solely depend on an SWE's semantic long-term potential, as each user's realities dominate the decision process. The Semantic Prisoner's Dilemma is based on two competing perspectives on the situational value distribution. The macroperspective allows the consideration of all values and hence enforces the decision for the global optimum for all, whereas the microperspective lets the user interpret the known values according to her local optimum. The latter is decisive for the process of taking action. Therefore, our approach in this chapter to tip the taking-action scale towards getting people into SWEs consists in manipulating the values in the decision landscape, so that from a microperspective the gains increase above the critical threshold for acting. In particular, if we enrich an SWE with elaborate semantic Added-Value services for the concrete situation the user is in, then it is more probable that the scale tips in the desired direction. But Added-Value services are often misunderstood; therefore we start off with the underlying concepts and develop an SWE design strategy for "Added-Values."
Added-Value Analysis as a Design Method

The term "Added-Value" has often been used as an umbrella term, up to the point of actually losing any serious meaning. In the following we try to reestablish its meaning and its relevance for SWE design by disclosing its history (and marketing aspects), pitfalls, and potential. As there are many (almost) equivalent formulations of the term (see the list further below), we stress the underlying concept character by using the capitalized diction "Added-Value." The story of "Added-Value" in science started, we surmise (despite occasional earlier occurrences), in 1997 with Don O'Neill's statement: "The value add of software to the national economy is not well understood" (O'Neill, 1997, p. 11) and his subsequent description of a study (supported by the American National Software Council) that aimed to understand this "value add," at the level of the national economy, of using software vs. traditional means. The central question was how people can be motivated to replace their formalizable procedures by software processes. O'Neill tried to analyze and then to "sell" the advantages of software to the public. Note that the term "Added-Value" is strongly connected to marketing issues here. Subsequently, the term was used in phrases like "added value" (Mana-Lopez, 2004; Nielsen, Overgaard, Pedersen, Stage, & Stenild, 2006), "adding value" (Acken, 1998; Cosley, Frankoswki, Terveen, & Riedl, 2006; Dow, Hackbarth, & Wong, 2006; O'Reilly, 2005), "value-added" (Berghel, 1999; Lindgaard, 2004), "value add" (O'Neill, 1997), "value-based" (Boehm, 2003), and so forth in papers in computer science, almost always to convince people of the "real" value of the respective product or to request value considerations. An online survey about semantic Added-Value services conducted by the authors confirmed the marketing character of the term. For instance, someone associated the well-known
phrase "It's not a bug, it's a feature" with Added-Value services; another was reminded of Added-Value discussions in the UMTS debate, with the "connotation that this would make people pay for weird data services." The reason for the difficulty of using the term is the subjectivity of the "value" concept and the use of emphases such as "more" or "added": all of them are relative concepts and do not have any absolute or objective meaning. Therefore, we speak of the "double relativity of Added-Value." This double relativity is based on the microperspective, which we are mainly interested in here. We will analyze and discuss the term "Added-Value" from the macro- as well as the microperspective to mark off their respective properties. We will start out by recalling an analysis of the term within marketing science by de Chernatony et al. (2000) and combining it with semantic technology design attitudes. From a literature review, de Chernatony et al. (2000) discerned three subdomains with distinct value definitions: the Pricing, Consumer Behaviour, and Strategy literature.
Strategy

In the Strategy articles the macroperspective on value was the strongest and the most undifferentiated: a value was simply described as "what buyers are willing to pay" (de Chernatony et al., 2000, p. 41), that is, the price of a product or service. This subdomain is only interested in setting the general frame, so it explicitly takes the view from without. It understands value in terms of suppliers, not of buyers (even though the word is used); therefore it uses the monetary phrasing. This can be compared with scientific visions of semantic technology like Berners-Lee's "Semantic Web," which set the frame in terms of offered potential.
Consumer Behaviour

The value definition in the Consumer Behaviour subdomain differs from the one in the Strategy domain, as value is here "defined in terms of customer needs and what is desirable" (de Chernatony et al., 2000, p. 40). A consumption value is considered to have five components: functional, social, emotional, epistemic, and conditional (Sheth, Newman, & Gross, 1991). The product is seen as a "complex cluster of value satisfactions" (de Chernatony et al., 2000, p. 41, citing Levitt). Surprisingly, we see the macroperspective at work here, even though the consumer has the central role in these definitions. But modelling the consumption value is clearly a view from without, which makes the customer more comprehensible. Analogously, positing the product as a mixture of value satisfactions does not imply a customer perspective, because only the satisfactions at the end, and not the process, seem to be of value.
Pricing

In contrast, we find that the Pricing articles take the microperspective with respect to "value." Here, value has been defined as "the trade-off between customer's perceptions of benefits received and sacrifices incurred" (de Chernatony et al., 2000, p. 40). The Pricing subdomain takes into account that the most rational price might not be the one that the user is actually willing to pay. The optimal value depends on the customer's perception, that is, it is contingent on the here-and-now of the individual. We consider this value definition the most suitable one for our purposes, and with it we can approach the definition of "Added-Value." To contrast the latter with "value," Grönroos distinguishes "an offering's core value [. . . to be] the core solution and its added value as additional services" (Grönroos, 1997, p. 408). It is striking that for the definition of Added-Value, we have to settle on what the core solution is, that is, we need to know
the core problem first. In particular, we have to specify the core problem (possibly comprising several subproblems) to know what the core solution is, and then we can look at this solution with our value term (henceforth called core value); see Figure 3. Therefore, the main task in determining Added-Value consists in establishing the (perceived) benefits and sacrifices of the core solution with respect to the core problem. Quite often, this core problem is kept rather veiled. For instance, in the online questionnaire mentioned above, the core problem was frequently seen to be the argumentation for semantic technology against traditional technology. But if we accept that, then the core solution is the concrete implementation satisfying a traditional task (other tasks can by definition not be solved by traditional technology), and all advantages of the underlying semantics could be used for producing Added-Values. In contrast, if we define the core problem as making use of machine-understandable data, or equivalently as the question "How can computers help us to rationalize our processes (at work or at home)?", then semantic services deliver core solutions. Therefore, Added-Value services can here only be non-semantic ones, for example, reducing the sacrifice of semantic mark-up. As we can see, different core problem settings allow different Added-Value services (AVS), that is, services that are evaluated, from a microperspective, with a specific core problem and solution in mind. Now, how can Added-Value services with their inherent double relativity be of use in lifting the values in the decision situation towards taking action?
The Added-Value Analysis Method

The hard question in design (especially user-centred design) is how to understand what the user needs for situated actions. Here, we suggest exploiting the double relativity of Added-Values. Elaborating on Added-Values, particularly with respect to an SWE, allows us to think about the value scheme differently, thus enabling the discovery of unexplored needs, of separate research tracks, and even the improvement of communication channels. The values are the ones that make the software interesting as a product, thereby setting the expectations, whereas the Added-Values represent a user's microperspective. In this sense Added-Values resemble attitudes more than objects. They make the software interesting as a process, either by starting the process or by enriching the user's experience, thereby exceeding expectations. In particular, if a designer fixes the core problem and thereby constrains the space of potential microperspectives, then she can explicate the trade-off scale by listing the (then perceivable) benefits and sacrifices of the core solution relative to the core problem. The core problem together with the core solution allows an evaluation from the microperspective, and moreover, it indicates (potential) Added-Value services. As we vary the core problem, other services may become Added-Value services or core solutions.
Figure 3. Action or no action? Identification of Added-Value services
The important point is that a designer can get a feeling for the microperspectives and design accordingly. Concretely, we propose the following Added-Value Analysis (AVA) as a method for SWE design (see Figure 3): First, we identify a core problem and one (or more) adequate solutions. Then we look at the benefits and sacrifices of each solution from the microperspective, voluntarily neglecting the macroperspective on the overall situation. This micro-analysis enables the designer to pinpoint Added-Values. For instance, reuse of content is almost by default a benefit of semantic technology wherever content management is the core problem. Here, content sharing or content repurposing may emerge as Added-Values. Each point in the benefits/sacrifices list can be searched this way for possibilities of increasing benefits or reducing sacrifices, or for new associated values that could be supported by an Added-Value service. In order to document this procedure, we build up a table-like list with the AVA that contains columns for the trigger, the core problem, the considered solution, and its evaluation in the form of lists of benefits and sacrifices (see Table 1). Originally, a LaTeX style file for the AVA was used; see "ava.sty" in Kohlhase (2007). If a sacrifice is imposed on the user in Step 1, this can be taken as another core problem for which a solution either exists or is needed; here, the sacrifice in Step 1 can be considered a trigger for the resulting problem/solution pair in Step 2. In the first case, we can consider the solution an Added-Value; in the second, it becomes a potential Added-Value to the previous problem, which is marked as such (with a "P") in the second column of the table. If a row is triggered by another row in the AVA table, we label
this process either by color or by a reference like "→2". The color flow represents a bottom-up approach, whereas the reference describes the use process top-down. For convenience, we use the abbreviation "NN" for "nomen nescio" (Latin for "I do not know") to indicate that we either could not think of an item or that we intentionally do not follow that line of thought for the moment. When varying the core problem, the possible Added-Value services vary as well: former Added-Values might become values and vice versa. We can no longer think in simple value chains, but have to consider rather complex value constellations (Normann & Ramirez, 1998). In particular, an object's value may change in the process of using it. In Kohlhase and Kohlhase (2007), several interesting AVA catches were pointed out that give valuable design hints; they are cited in a nutshell to stress the AVA's analytical strength:

0. The quest for objectivity through the AVA is vain. The wanted "view of microperspectives" for a better understanding of the processuality of use actions rather is an organized quest for subjectivity.
1. Knowing the answer before the question. This sometimes makes it very hard to find the real problem, since downsizing from the macro- to the microperspective is not easily achieved, which is especially true for the initial core problem, which tends to be "Saving the World."
2. Solutions are the benefits. This is usually a sign that the AVA analyst is not sufficiently non-partisan with respect to the software author's motivation.
Table 1.

# | P | Trigger | Core Problem | Solution | Benefits | Sacrifices
1 |   |         |              |          | • "Demo" →2 | •
2 | P | "Demo"  |              |          | •           | •
3. Problems are rhetoric questions. If that is the case, we have arrived at a dead end for the AVA, possibly because of a too specific problem description.
4. Values on too low levels. Benefits and sacrifices from a higher level should not be handed down to the lower levels, as they are already dealt with.
5. The unfinishedness of the AVA. Once the AVA is started, one realizes very soon that more and more problems and ideas tend to pop up in the analysis process, which is good on a creativity (or cognition) scale but rather bad on a satisfaction scale.
In our experience, many of these catches are prone to happen, and adjusting for them procures considerable insights. The combination of as many microperspectives as possible (by iterating the process) allows a designer to develop an intuition for the users' perceived value constellations vs. the global value constellation (Normann & Ramirez, 1998), thereby enabling "thoughtful Interaction Design" (Löwgren & Stolterman, 2004) and architecture.
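To make the table-building procedure above concrete, here is a minimal sketch of how a single AVA row could be captured in machine-readable form; the type and field names are ours and not part of the published method, and the sample values are taken from row 13 of Table 2 below:

    import java.util.List;

    // Illustrative data model for one AVA row: "potential" corresponds to the
    // "P" column, triggeredBy records the "->n" references between rows, and
    // the string "NN" would stand for entries intentionally left open.
    record AvaRow(int id, boolean potential, String trigger, String coreProblem,
                  String solution, List<String> benefits, List<String> sacrifices,
                  List<Integer> triggeredBy) {}

    class AvaDemo {
        public static void main(String[] args) {
            AvaRow row13 = new AvaRow(13, false, "To produce it!",
                    "Reducing pain and suffering", "Invasive Technology",
                    List.of("Usability: same look-and-feel", "Known handling",
                            "Access to all PPT data"),
                    List.of("Framing (e.g., access rights)"),
                    List.of(1)); // triggered by the sacrifice in row 1
            System.out.println(row13);
        }
    }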
Related Software Design Methods

Software design comprises various broad scientific fields, ranging from, for example, psychology, technosociology, and media pedagogy to computer science. Even within the latter there are numerous autonomous research areas concerned with design, like software ergonomics, human-computer interaction, interface design, or emotional design (with interconnections everywhere). Concluding from the Prisoner's Dilemma (as pointed out above), the special requirements of semantic software design are summarized in: "Most important of all is the understanding of how the user wishes to use the product, in what ways, and to what ends" (Cooper & Reimann, 2003, Chapter I, p. 11). To fit the Added-Value Analysis method into this abundance, we restrict our overview to existing design methods that share the focus on the "here-and-now" of a user in the process of using the software. As a method of analysis, we compare them with the AVA along the axes of micro- and macroperspective. We will distinguish between methods that are concerned with the user and her use of the designed artefact on a general scale, those that deal with capturing (organizational) workflows, and those that set the users' values to be the most relevant aspect of software design.
Focusing on Users

Human-Centered Design (also called "User-Centered Design") starts by scrutinizing the intended users of the designed device and their behaviour towards it, using rapid prototyping procedures: design, mock-up, test, and iteration (see, e.g., Norman, 2002; Norman & Draper, 1986). Even though the user is indeed in the center, she is so from a macroperspective, which clearly separates this method from the microperspective approach of the AVA. Human-Centered Design has its roots in the "Participatory Design" method (Bodker, Kensing, & Simonsen, 2004), though without the political connotations with which it was introduced in Scandinavia in the 1960s, that is, under trade unions' guidance to improve working conditions (Nygaard, 1979). Löwgren and Stolterman (2004) describe it as a "process of mutual learning, where designers and users learn from and about each other" (Löwgren & Stolterman, 2004, p. 152). Even though specific users are involved in a codesign process (and therefore their microperspectives are subsequently considered in the design), this is a "design-by-committee" process and raises problems for all those users not involved in the process. For instance, even though the chosen users are experts in their daily routines,
experience shows that they often are not aware of these routines and therefore do not reflect on them. Hence, they do not communicate all of the real underlying steps. The process of "mutual learning" indicates an exchange of perspectives, so it stands to reason that an informal analysis of microperspectives is assumed. We therefore suggest adding the AVA to this method. First, the AVA can be applied by a user to her work practice as a reflection tool. Then the designer can take the results into account, informing her of potential microperspectives and lifting them to a conceptual level. By using the AVA method herself, she then knows better what to look for and where to look for hidden practices to be revealed. In "Goal-Directed Design" (Cooper & Reimann, 2003), "Personas" are used as user archetypes in the design phases of research, modelling, definition of requirements, framework setting, and refinement through iteration. Like the AVA, it aims for the microperspectives of users on a theoretical basis. In contrast to the AVA, though, these microperspectives stay disconnected unless they are tied in by the designer. The latter can be assumed, but it is typically done in an informal and implicit process, which can be made explicit, and hence supported, by the AVA. The objectives (e.g., "customer empathy") of the "Contextual Design" method (Beyer & Holtzblatt, 1997) are the same as those of the AVA, but its implementation is based on contextual inquiry, whereas the AVA is based on thought trajectories. As with the other methods, the "last" step in the process (in which the designer has to put the gathered information about microperspectives together into one design for all) stays implicit, and the AVA could be used to support it.
Focusing on Workflows

Participatory Design is also used in Professional Knowledge Management to get a new understanding of the workflow itself.
Here, a change of perspective happens: originally, microperspectives were collected by the designer and combined with her macroperspective in order to adapt the design to the users' needs. The workflow focus implies an interest in those microperspectives only insofar as they concern the workflow from the macroperspective. The goal consists in transforming work practices, that is, "informal but nonetheless routine mechanisms by which these processes are put into practice and managed in the face of everyday contingencies" (Dourish, 2003, p. 62), into work processes, that is, "regularized procedures by which work is conducted" (Dourish, 2003, p. 62), that can be handled or supported by software applications. The designer uses the microperspectives to turn users' implicit knowledge of work practices into explicit knowledge. With the AVA we keep the microperspective and therefore view the "workflow" as a "useflow." For instance, "Task-Oriented Design" (Shneiderman, 1998, p. 63) focuses on tasks (from the macroperspective), which are analysed into ever-shrinking subtasks that are easier to work on (again from the macroperspective). Note that a macroperspective does not per se conflict with microperspectives (and vice versa). But the AVA follows a completely microperspective approach, where tasks appear as a consequence of here-and-now requirements. Another approach, called "Activity-Based Computing" (Bardram, Bunde-Pedersen, & Soegaard, 2006), takes users' activities within the workflow into account. Here, the emphasis lies on the human capability of multi-tasking. Even though activities by themselves are a valuable research object, with the AVA we are more interested in the context of the activities from a user's standpoint.
Focusing on Values

In 1996 Batya Friedman coined the term "Value-Sensitive Design" (Friedman, 1996), with which
human values re-enter Interaction Design as a relevant asset. She argues for the reintegration of "autonomy" into design and replaces the statement "Humans adapt to technology" with "Technology adapts to humans." By singling out one specific aspect, the macroperspective is taken. The AVA incorporates the Value-Sensitive Design method by looking at the subjective values for the process of "decision-taking for action or non-action" from the micro- and not only from the macroperspective. In particular, a human is considered an autonomous being whose decision-taking process can be influenced, but not controlled. With the AVA we take the position that in Interaction Design the understanding of this process (respectively of the totality of all these processes) needs to be enhanced. In the "Humanistic Research Strategy" (Oulasvirta, 2004), human values are considered as well, but the underlying reasoning for action is restricted to experience and culture, whereas the AVA takes the processuality of action into account. The design method most closely related to the AVA was proposed by Donald Schön (Schön, 1983) and is described as "Reflection-In-and-On-Action" (Löwgren & Stolterman, 2004, p. 23). It consists in a conversation between the designer and the "envisioned situation" (Löwgren & Stolterman, 2004, p. 23). Even though the pair (problem, solution) is also considered and understood out of the concrete process (i.e., from a microperspective), this method does not take the relativity of the solution with respect to the problem into account. In particular, the user's value constellation is not analyzed further. Even though the AVA was conceived independently, we can consider it an extension of the Reflection-In-and-On-Action design method. To fortify our intuition with respect to Added-Values, we will now revisit the semantic work environments CPoint and Connexions with an Added-Value analysis of their semantic services.
Semantic Services in CPoint and Connexions

For the CPoint system as well as for the Connexions system, we will first give an overview; then we launch the Added-Value Analysis with an introductory run-through of a very basic core problem, so that the partial formal analysis in the following AVA tables can be reconstructed. The respective core solutions are listed in more detail farther down, for a better comprehension and appreciation of CPoint and Connexions.
CPoint ("Content in PowerPoint")

Overview

In 2002, development of the CPoint system was started within the Course Capsules project (CCAPS) at Carnegie Mellon University (CMU), USA. Its initial aims were to recover the "treasure" of about 2000 MS PowerPoint (PPT) slides (painstakingly collected over the years and covering various computer science lectures at CMU), and to convert them into reusable and searchable content units. The solution consisted in making their hidden content explicit and thereby machine-understandable, so that even automatic services could be built for them; see Kohlhase et al. (2002). An early case study revealed great barriers to use, necessitating a new, prototypic design approach as suggested by the AVA; therefore the system is still in a prototypical state. In a nutshell, CPoint can be considered an SWE for PPT slide management. For a general introduction, see Kohlhase (2005b), Kohlhase (2005a), and Kohlhase and Kohlhase (2004).
Introductory Run-Through

The initial core problem for CPoint was the handling of PPT legacy data under the constraint that these data should become reusable and searchable content units.
A first approach via editing PPT-generated XML files soon turned out to be too tedious. The sacrifice "To produce it!" was just too big. By reducing this sacrifice we could create Added-Value. The concept of "Invasive Technology" (Kohlhase, 2005b) and the implementation of a semantic editor add-in that did not require a user to leave the familiar environment became a solution for the tedium problem. Moreover, the input of semantic information could be much more precise if the semantic context was already known. To this end, CPoint differentiates between a general Categorize form (in which a user can assign a predefined semantic category to an object) and category-dependent forms. For instance, if an image is categorized as "Example," then a user can determine in the Example form for which other object this object is an example. Now, one benefit of the Categorize form was that the category information was fixed and therefore could be handled by software. But this is also a sacrifice, as individuals might not want to be restricted this way. This became a trigger for a new core problem: personal semantic information. A solution could be to reinterpret the name field of each object as a "Social Tag" as in, for example, del.icio.us (2006) or flickr (2007). This is not yet implemented as an Added-Value service, but it is a potential one. Looking at the benefit "Access to all formal PPT data" of our Invasive Technology led to another Added-Value service called CWord, since data are principally very similar in all MS Office applications. This way, we come back to a much more elaborate concept of reuse than before, as we can now imagine sharing such content units between the applications of the MS Office suite.
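The Categorize/category-dependent form distinction can be illustrated with a small sketch of the underlying annotation data; the types below are ours for illustration (CPoint itself runs as an add-in inside PowerPoint), and only the "Example" category is taken from the text:

    import java.util.Map;

    // Illustrative model, not CPoint code. The general Categorize form fixes
    // one of a set of predefined semantic categories for a PPT object; the
    // categories other than EXAMPLE are made up for this sketch.
    enum Category { DEFINITION, THEOREM, EXAMPLE }

    record Annotation(String pptObjectId, Category category,
                      Map<String, String> categoryFields) {}

    class CategorizeDemo {
        public static void main(String[] args) {
            // The category-dependent Example form then records for which
            // other object this image is an example ("for" is illustrative).
            Annotation a = new Annotation("slide12:image3", Category.EXAMPLE,
                    Map.of("for", "slide11:def-monoid"));
            System.out.println(a);
        }
    }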
CPoint Solutions

• Invasive Technology: Architectural design as an invasive SWE (Kohlhase, 2005b), so that, for example, a lecturer who is using PPT as presentation editor can stay with her editor of choice within an SWE.
• Categorize Form/Category-Dependent Forms: Different interface types (e.g., PPT menu bar, PPT context menu, or interactive panel).
• CPointGraphs: Visualization of semantic information in graph format for semantic structural analysis on different levels.
• CPointStudent: Student-specific support, for example, visualization of notes in post-it format that attach to PPT objects. A lecturer, for instance, could have inserted such a note on a literature reference, enhancing it as a particularly valuable review of learning items covered in a subsequent exam.
• Category-Dependent Layout: Generation of already categorized text boxes with CSS-based (Bos, Lie, Lilley, & Jacobs, 1998), category-dependent, and user-specific layout.
• CPointAuthor: Author-specific support, for example, category-dependent layout or PPT functionality extensions like chopping a list box into several boxes without layout change. It is designed as a panel that shows the basic available semantic information for a selected object.
• CPointMath: Semantic support for mathematical representations and notations (an extension of an invasive LaTeX editor from within PPT; Kohlhase, 2004; Necula, 2003).
• GoTo/GoBack: The "GoTo" is a navigation interface for PPT slide collections (e.g., during course composition) that provides three independent constraining search methods and jump-and-back-again facilities.
• OMDoc Conversion: The integration with the open-source world is realized by CPoint's conversion function. A semantically annotated PPT presentation can be converted into other formats, specifically OMDoc. OMDoc (Kohlhase, 2006) attempts the tightrope dance between a formal representation format and an informal one that can be exploited by machines and understood by humans.
• Presentational OMDoc/OMDoc Import: Presentational OMDoc is a special OMDoc subformat that includes more presentational information. An OMDoc file can be transformed (from within PPT) into a semantically annotated PPT show. If it was originally generated from a PPT presentation as "presentational OMDoc," then almost all information can be preserved in the conversion process.
• CWord: An invasive, semantic editor extension for MS Word geared towards the OMDoc format, with the same look-and-feel as CPoint.
• Extension Tagging: Each PPT object can be given a name. This can be reinterpreted as a tagging feature.
• ActiveMath Guided Tour Button: ActiveMath is a Semantic Work Environment for learning math that is based on ActiveMath OMDoc, a subformat of the OMDoc format. With the CPoint conversion function, a PPT presentation can be converted into an ActiveMath OMDoc file that can be read by ActiveMath. The ActiveMath Guided Tour button appears on theory objects within the PPT presentation and connects to the respective ActiveMath Guided Tour for this object in ActiveMath.
Table 2. #
P
Trigger
Core Problem
Benefits
Solution
Sacrifices
1
Initial
Handling of PPT Legacy Data
SWE cpOinT
• Reuse→2,→7 • Search
• To produce it!→13,→18
2
Reuse
Sharing
OMDoc Conversion, OMDoc Import
• Publishing (Recognition→3) • Accessing content • Open content
• Publishing (Exposure→4) • Loss of information (due to format change) →6
3
P
Recognition
Publicity
NN
• NN
• NN
4
P
Exposure
Rights management
NN
• NN
• NN
5
Open content
Use of OMDoc Application
ActiveMath Guided Tour button
• Use of ActiveMath’ strength
• Dependency on ActiveMath implementation and API
6
Loss of information
Save representational information
Presentational OMDoc
• Reuse of essential presentational information
• New format
7
Reuse
Semantic visualization
CPoint-Graphs
• Visualization on various levels • Structural understanding→8
• Restriction to available levels
8
Structural understanding
Metalevel on content
CPoint-Student
• Content becomes Learning Object • Communication channel (from teacher to student) →9
• NN
Communication channel
Semantic handout
CPoint-Student’s Flashcards
• NN
• NN
10
Search
Semantic search
GoTo
• Keyword search in semantic data • Subjectoriented restriction of search space • Easy find→11
• Complex user interface
11
Easy find
Navigation
GoTo’s Go
• Select and act!
• Leaving current object→12
12
Leaving current object
Jumping back and forth
GoBack
• Flexibility in maneuvering
• NN
13
To produce it!
Reducing pain and suffering
Invasive Technology
• Usability: Same lookand-feel • Known handling • Access to all PPT data→14
• Framing (e.g., access rights)
Categorize form
• Machine-understandable categories • Category specific forms
• Framing (e.g., access rights) • Category restriction →15
Simplifying procedures
• Distinction between newcomers and experts
• Less formalization
9
P
continued on following page
Added-Value
Table 2. continued #
P
Trigger
Core Problem
Benefits
Solution
Sacrifices
14
Access to all PPT data
MS intra services
cWOrD
• Easier migration • Sharing of semantic data between applications
• Proprietary format
15
Category restriction
Personal semantic information
Tagging
• Personal search →16, →17
• Invalid or not serious content
16
Personal Search
Keyword Search in Title
GoTo
• Memory trigger
• NN
Personal Search
Social search
Social Tagging features
• NN
• NN
To Produce it!
Distinct roles
CPoint-Author
• Author-specific tasks→19 • User as consumer and producer
• Limitation to a single role
CPoint-Student
• Student-specific tasks
• Limitation to a single role
CMath
• Math-UI with math macros • MathML/OMDoc/LaTeX conversion of math • Inline math • Personal math symbols
• To produce it!
Category-dependent object-layout
• Consistent layout
• To produce it!
17 18
19
•
•
•
•
P
Author-specific tasks
Input of math
Presentational OMDoc/OMDoc Import: Presentational OMDoc is a special OMDoc subformat that includes more presentational information. An OMDoc file can be transformed (from within PPT) into a semantically annotated PPT show. If it was originally generated from a PPT presentation as “presentational OMDoc,” then almost all information can be saved in the conversion process. CWord: An invasive, semantic editor extension for MS Word geared towards the OMDoc format with the same look-and-feel as cpOinT. Extension Tagging: Each PPT object can be given a name. This can be reinterpreted as a tagging feature. ActiveMath Guided Tour Button: ActiveMath is a Semantic Work Environment for learning math that is based on ActiveMath
OMDoc, a subformat of the OMDoc format. With the cpOinT conversion function PPT presentation can be converted into an ActiveMath OMDoc file that can be read by ActiveMath. The ActiveMath Guided Tour Button appears on theory objects within the PPT presentation and connects to the respective ActiveMath Guided Tour for this object in ActiveMath.
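To make the conversion idea more concrete, the following sketch shows how one semantically annotated slide object might be serialized into a small OMDoc-like XML fragment. This is a hypothetical illustration only: the function, the chosen element names, and the flat object model are simplifying assumptions and do not reproduce the actual CPoint implementation or the full OMDoc schema (Kohlhase, 2006).

```python
# Hypothetical sketch of the PPT-to-OMDoc conversion step. The object model and
# element names are simplified assumptions, not the actual CPoint code or the
# complete OMDoc schema.
import xml.etree.ElementTree as ET

def slide_object_to_omdoc(obj_id: str, category: str, title: str, text: str) -> str:
    """Serialize one annotated PPT object into an OMDoc-like XML fragment."""
    root = ET.Element("omdoc")
    # The semantic category assigned in PPT (e.g., "definition", "example")
    # becomes the element type of the exported knowledge item.
    item = ET.SubElement(root, category, {"xml:id": obj_id})
    meta = ET.SubElement(item, "metadata")
    ET.SubElement(meta, "Title").text = title
    # The CMP child carries the informal, human-readable text of the item.
    ET.SubElement(item, "CMP").text = text
    return ET.tostring(root, encoding="unicode")

print(slide_object_to_omdoc("slide3.obj1", "definition", "Semantic annotation",
                            "A semantic annotation attaches a category to a PPT object."))
```

A presentational OMDoc export would additionally carry layout information along with each item, which is what allows the round trip back into an annotated PPT show with almost no loss of information.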
Connexions

Overview

The Connexions project was started in the fall of 1999 to move teaching and learning from a static, linear progression of topics to a dynamic "ecosystem" of shared knowledge. Here, communities of authors, instructors, and learners continually update and relate content units to provide a
comprehensive perspective on how topics across disciplines interrelate; see, for example, Henry (2004). At the moment it offers around 4,000 course modules, and "the Connexions servers handle over 16 million hits per month representing over 600,000 visitors from 196 countries" (Thierstein & Baraniuk, 2007). Connexions offers an SWE to facilitate shared authoring, publishing, exploration, course composition, and personal customization of knowledge resources; for a general introduction, see Baraniuk et al. (2002) and CNX (2006).

Table 3.

| # | P | Trigger | Core Problem | Solution | Benefits | Sacrifices |
|---|---|---------|--------------|----------|----------|------------|
| 1 | | Initial | Rapid knowledge exchange | SWE Connexions | • Living content →2, →3 • Community building →5 • Dynamic ecosystem →13 • Web commons →15, →16, →17 | • To produce it! →12 |
| 2 | | Living content | Individualized views | Lenses | • High quality modules →4 | • Trust |
| | | | | Roadmap | • Individualized navigation | • Complex GUI • To produce it! →12 |
| 3 | | Living content | Development tracking | Versioning | • History tracking • Backup • Parallel authoring process | • NN |
| 4 | P | High quality modules | Pre-review | Lenses | • Post-review | • Trust |
| 5 | | Community building | Communication | Work areas | • Group-oriented publishing • Sustainability • Efficiency | • Public participation |
| | | | | Versioning | • History browsing • Parallel authoring process • Backup | • Public participation • Version management overhead |
| | | | | Module structure | • Fine granularity →6, →7 • Rapid dissemination of current research results | • Slicing |
| 6 | | Fine granularity | Dependency structure | CNXML | • Interdisciplinary browsing →11 • Semantic search →8 | • To produce it! →12 |
| 7 | P | Fine granularity | Graph visualization of dependency structure | NN | • NN | • NN |
| 8 | P | Semantic search | Semantic search engine | Search interface | • Sophisticated metadata and formulae search →9 | • Complex UI |
| 9 | P | Sophisticated metadata and formulae search | Ambiguity of formulae representation | Integration of MathWebSearch | • "Deeper" result sets | • Complex UI • MathML |
| 10 | | Interdisciplinary browsing | Context affiliation | Roadmap | • Visualization of dependencies | • NN |
| 11 | | Interdisciplinary browsing • Reuse in different contexts | Different notation | CNXML | • Separation between content and presentation | • Content mark-up |
| 12 | | To produce it! | CNXML | Authoring interface | • Edit-In-Place | • Get used to it! |
| 13 | | Dynamic ecosystem | Customized courses | Course composer | • Customization • Web content →14 • Content aggregation | • NN |
| 14 | | Web content | Various output formats | Converter | • NN | • NN |
| 15 | | Web commons | Content rights management | Content Commons | • Fine-granular rights management • Reuse • Content adaptation • Free, interconnected educational material →18 | • Exposure • Trust |
| 16 | | Web commons | Public announcement | Open access | • Publicity • Return by general Web searches • New form of community building • Additional level of peer review | • NN |
| 17 | | Web commons | Dissemination | Decentralization | • Publicity • Growing user base | • NN |
| 18 | | Free, interconnected educational material | Self-control of standard of knowledge | Assessment tool | • Immediate feedback | • Willing to learn! |
| 19 | | Reuse | Different presentations | Annotation interfaces | • Reuse in different contexts →11 • Disconnection between content and presentation | • To produce it! →12 • Parallel authoring process →20 |
| 20 | P | Parallel authoring process | Management of change | locutor | • Efficiency • Less conflicts • Improved versioning | • Fine-granular classification of changes |
Introductory Run Through

The core problem solved by the SWE Connexions can be stated as the search for technical support for rapid knowledge exchange. In particular, the solution should comprise knowledge documents that are openly accessible, can be shared easily, and can be worked on together. Therefore, the benefits of the solution can be listed as a dynamic ecosystem with living content, the provision of a Web commons, and thereby the initiation of community building processes. Taking a closer look at the gained benefits, we recognize that, for example, the benefit of building up new communities triggers a new core problem: communication. User subgroups might want to communicate with each other, not all of the users are up-to-date, and they want to work on different parts of a course at the same time without disrupting the others' work. Here, Connexions offers solutions in the form of work areas, a version control system, and a fine-granular module structure. Now, if we consider the latter as a core problem, the Connexions-specific format CNXML presents itself as a solution. As it is an XML language, we can directly make out a sacrifice for the user: the user has to produce it! Rethinking this as yet another core problem, which is solved by the "Edit-In-Place" feature, we can discover more and more microperspectives and advice for the design of potential Added-Value Services.
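The run-through above can be read as walking a directed graph over the rows of Table 3: each benefit or sacrifice may trigger a new core problem, that is, a new row. The sketch below models this reading with a small excerpt of the table data; the data structure and the traversal helper are illustrative assumptions, not part of the published Added-Value Analysis method.

```python
# Illustrative sketch of the Added-Value Analysis chain: each row turns a trigger
# into a core problem, offers a solution, and yields benefits and sacrifices that
# may trigger further rows. The row data is a small excerpt of Table 3.
from dataclasses import dataclass, field

@dataclass
class AVARow:
    number: int
    trigger: str
    core_problem: str
    solution: str
    benefits: dict    # benefit text -> list of row numbers it triggers
    sacrifices: dict = field(default_factory=dict)

rows = {
    1: AVARow(1, "Initial", "Rapid knowledge exchange", "SWE Connexions",
              {"Living content": [2, 3], "Community building": [5]},
              {"To produce it!": [12]}),
    2: AVARow(2, "Living content", "Individualized views", "Lenses",
              {"High quality modules": [4]}, {"Trust": []}),
}

def follow(start, seen=None):
    """Depth-first walk along the arrows, printing each microperspective."""
    if seen is None:
        seen = set()
    if start in seen or start not in rows:
        return
    seen.add(start)
    r = rows[start]
    print(f"Row {r.number}: {r.core_problem} -> {r.solution}")
    for targets in list(r.benefits.values()) + list(r.sacrifices.values()):
        for t in targets:
            follow(t, seen)

follow(1)
```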
Connexions Solutions

• Content Commons: The Content Commons is a repository of interrelated modules under the Creative Commons license that are ready for (re-)use and modification.
• CNXML: Connexions' representation format is the CNXML format, a special XML application for semantic mark-up. Its strength lies in its structural relations within and between modules.
• Module Structure: A module is a standalone information unit and as such the basic building block of a Connexions course. Modules are written in the CNXML format and stored in the Content Commons.
• Roadmap: Using the roadmap as a navigation tool with personal annotations and presentations, a student is able to go back and forth within a course. It can even be seen as an "anchor" within a course when exploring external resources.
• Course Composer/Instructor Interface: The Course Composer and the Instructor Interface are part of Connexions' knowledge factory (Baraniuk et al., 2002). They facilitate individual as well as collaborative working methods to set up courses. The instructor's course configuration is then presented in a roadmap to the students in order to ease the navigation through the system.
• Versioning: Each time a module is modified, Connexions saves the previous version of the module in its history. By default the latest version of the module is used for viewing, but because version numbers are assigned to each module revision, instructors can also select which version of each module to use in a course.
• Work Areas: There are two types of work areas inside the Connexions platform: "My Workspace" and "Workgroups." They differ in their accessibility to other authors. The former is a private work area only accessible by the author herself, whereas the latter is set up for community building. After having set up a workgroup, authors can assign workgroup-dependent roles (author, maintainer, or copyright holder) to indicate each one's responsibility in the joint work.
• Authoring Interface: One of the main goals was not to bother the content author with writing CNXML code; therefore, Connexions offers the Edit-In-Place (EIP) editor as a default editor that handles all the XML tagging. Connexions enables authors to work within their own workspace and to create workgroups to participate in contributions of other authors.
• Semantic Search: Authors, instructors, and students can use sophisticated metadata search. Additionally, the interface offers a semantic formulae search to overcome ambiguity in formulae presentation (Kohlhase & Sucan, 2006).
• Open Access: The Content Commons is available to all Web search engines, and not only the metadata of the modules can be searched, but the modules themselves as well.
• Converter: The Connexions Converter is used to produce various output formats of a single module or an entire course. It even considers different presentation styles according to the output format. Thus the printed version, like a PDF, looks different from the online version.
• Assessment: To give learners immediate feedback on their learning experiences, Connexions has incorporated an assessment tool. This tool allows learners to test their comprehension right away.
• Extension locutor: Ontology-driven management of change (Müller, 2006). This system will manage modifications as well as relations between documents to avoid inefficiencies, conflicts, and delays in the content authoring process. locutor will be integrated into the open-source system Rhaptos on which Connexions is based. Such an incorporated management of change will improve the versioning and the roadmap tool and thus encourage collaboration in the nonlinear module structure of Connexions. The relations between and within modules will be handled explicitly so that reuse, the set-up of module variants, and the parallel authoring process will improve, and consequently the rapid knowledge exchange of the entire system.
• Extension Decentralization: The Connexions team is developing a new architecture and tools for distributed storage and maintenance of the Connexions Content Commons. Different servers will host Content Commons that are synchronized with each other.
• Extension Lenses: The Connexions team is pursuing the approach of a post-review process. The system will incorporate recommendations of trusted third parties or authorized Connexions members to provide the user with filters, so-called "lenses," on the Content Commons. Each lens will have its own focus, which can be used to parametrize the roadmap to set up individualized views. In addition, the results of searches executed within a customized roadmap will be weighted based on such lenses and endorsements.
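As a small illustration of the versioning behavior just described (every edit archives the previous revision, readers see the latest version by default, and instructors can pin a specific revision into a course), consider the following sketch. The class and its methods are assumptions made for illustration; they do not mirror the actual Rhaptos code base.

```python
# Minimal sketch of Connexions-style module versioning: each edit archives the
# previous revision, readers see the latest by default, and a course can pin a
# specific version. Names are illustrative, not the actual Rhaptos API.
class Module:
    def __init__(self, module_id, content):
        self.module_id = module_id
        self.history = [content]          # history[0] is version 1

    def edit(self, new_content):
        self.history.append(new_content)  # previous versions stay retrievable
        return len(self.history)          # the new version number

    def view(self, version=None):
        """Latest version by default; a specific revision when requested."""
        return self.history[-1] if version is None else self.history[version - 1]

m = Module("m1001", "Intro to DSP, draft")
m.edit("Intro to DSP, revised")
course_pin = {"m1001": 1}                 # an instructor pins version 1 for a course
assert m.view() == "Intro to DSP, revised"
assert m.view(course_pin["m1001"]) == "Intro to DSP, draft"
```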
Conclusion

Semantic Work Environments need content-authoring users. As SWE use is frequently a collaboration scenario with unspecific expectations for a user, this user will often be caught in the Semantic Prisoner's Dilemma. At its heart we found the competition between the micro- and the macroperspective: the former informs a user's decision to act, whereas the latter informs an SWE designer's decision about which services to offer. Getting people into SWEs means that designers of SWEs have to take microperspectives into account. We proposed the Added-Value Analysis (AVA) as an appropriate method, as the double relativity of "Added-Value" provides the designer with a thoughtful value constellation. To conclude, our analysis hinges on a scenario with high potential as well as high investment for a user. We must acknowledge that the perception of adding value (like the recognition of potential) and actualizing it as Added-Value (as scrutinized here) are not necessarily automatic. But in order to support users in taking action, it is necessary for the designer "to have a solid understanding of the complexity involved in being rational" (Löwgren & Stolterman, 2004, p. 50).
References

Acken, J. M. (1998). How watermarking adds value to digital content. Communications of the ACM, 41(7), 74-77.

Axelrod, R. (1984). The evolution of cooperation. New York: Basic Books.

Baraniuk, R., Burrus, C., Hendricks, B., Henry, G., III, A. H., Johnson, D., Jones, D., et al. (2002). ConneXions: DSP education for a networked world. In Acoustics, Speech, and Signal Processing, 2002. Proceedings (ICASSP '02), IEEE International Conference (Vol. 4, pp. 4144-4147).

Bardram, J. E., Bunde-Pedersen, J., & Soegaard, M. (2006). Support for activity-based computing in a personal computing operating system. In CHI 2006 (pp. 211-220). Montreal, Quebec, Canada.

Berghel, H. (1999). Digital village: Digital publishing. Communications of the ACM, 42(1), 19-23.

Berners-Lee, T. (1998). What the Semantic Web can represent. Retrieved March 9, 2008, from http://www.w3.org/DesignIssues/RDFnot.html

Berners-Lee, T., & Fischetti, M. (1999). Weaving the Web: The original design and ultimate destiny of the World Wide Web, by its inventor. San Francisco, CA: Harper.

Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The Semantic Web: A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities. Scientific American Online. Retrieved April 23, 2008, from http://www.sciam.com/article.cfm?id=the-semantic-web

Beyer, H., & Holtzblatt, K. (1997). Contextual design: Defining customer-centered systems. Morgan Kaufmann.

Bodker, K., Kensing, F., & Simonsen, J. (2004). Participatory IT design. B&T.

Boehm, B. (2003). Value-based software engineering. ACM SIGSOFT Software Engineering Notes, 28(2), 1-7.

Bos, B., Lie, H. W., Lilley, C., & Jacobs, I. (1998). Cascading style sheets, level 2: CSS2 specification. W3C recommendation, World Wide Web Consortium (W3C). Retrieved March 9, 2008, from http://www.w3.org/TR/1998/REC-CSS2-19980512

CNX. (2006). Connexions. Retrieved March 9, 2008, from http://www.cnx.org

Cooper, A., & Reimann, R. (2003). About face 2.0: The essentials of interaction design. John Wiley and Sons.

Cosley, D., Frankowski, D., Terveen, L., & Riedl, J. (2006). Using intelligent task routing and contribution review to help communities build artifacts of lasting value. In CHI 2006 (pp. 32-41). Montreal, Quebec, Canada.

de Chernatony, L., Harris, F., & Riley, F. D. (2000). Added value: Its nature, roles and sustainability. European Journal of Marketing, 34(1/2), 39-56.

del.icio.us. (2006). Retrieved March 9, 2008, from http://del.icio.us/

Dourish, P. (2003). Where the action is: The foundations of embodied interaction. MIT Press.

Dow, K. E., Hackbarth, G., & Wong, J. (2006). Enhancing customer value through IT investments: A NEBIC perspective. The DATA BASE for Advances in Information Systems, 37(2/3), 167-175.

Downes, S. (2005). E-learning 2.0. eLearn Magazine. Retrieved March 9, 2008, from http://elearnmag.org

Flickr. (2007). Project home page. Retrieved March 9, 2008, from http://www.flickr.com

Friedman, B. (1996). Value-sensitive design. Interactions, ACM, 3(6), 16-23.

Grönroos, C. (1997). Value-driven relational marketing: From products to resources and competencies. Journal of Marketing Management, 13, 407-419.

Heid, H. (2004). Kann man zur Verantwortlichkeit erziehen? Über Bedingungen der Möglichkeit verantwortlichen Handelns. In J. H. M. Winkler (Ed.), Die aufgegebene Aufklärung: Experimente pädagogischer Vernunft, Beiträge zur pädagogischen Grundlagenforschung (pp. 145-154). Weinheim und München: Juventa Verlag.

Henry, G. (2004). ConneXions: An alternative approach to publishing. In ECDL 2004, European Conference on Digital Libraries (pp. 421-431). University of Bath, UK.

Kohlhase, A. (2004). CPoint's mathematical user interface. In Workshop on Math User Interfaces at MKM04. Retrieved March 9, 2008, from http://www.activemath.org/paul/MathUI/proceedings/CPoint/CPoint.MathUI04.pdf

Kohlhase, A. (2005a). CPoint documentation. Retrieved March 9, 2008, from http://kwarc.eecs.iu-bremen.de/projects/CPoint/

Kohlhase, A. (2005b). Overcoming proprietary hurdles: CPoint as invasive editor. In F. de Vries, G. Attwell, R. Elferink, & A. Toedt (Eds.), Open source for education in Europe: Research and practise (pp. 51-56). Heerlen: Open Universiteit of the Netherlands.

Kohlhase, A., & Kohlhase, M. (2004). CPoint: Dissolving the author's dilemma. In A. Asperti, G. Bancerek, & A. Trybulec (Eds.), Mathematical knowledge management, MKM'04 (LNAI No. 3119, pp. 175-189). Springer Verlag.

Kohlhase, A., & Kohlhase, M. (2007). Reexamining the MKM value proposition: From math Web search to math Web research. In M. Kauers, M. Kerber, R. Miner, & W. Windsteiger (Eds.), MKM/Calculemus 2007 (LNAI No. 4573, pp. 266-279). Springer Verlag.

Kohlhase, M. (2006). OMDoc: An open mark-up format for mathematical documents. Springer Verlag. ISBN 3-540-37897-9.

Kohlhase, M. (2007). A LaTeX style for the Added-Value Analysis. Retrieved March 9, 2008, from http://kwarc.info/projects/latex

Kohlhase, M., & Sucan, I. (2006). A search engine for mathematical formulae. In T. Ida, J. Calmet, & D. Wang (Eds.), Proceedings of Artificial Intelligence and Symbolic Computation, AISC'2006 (LNAI No. 4120, pp. 241-253). Springer Verlag.

Kohlhase, M., Sutner, K., Jansen, P., Kohlhase, A., Lee, P., Scott, D., et al. (2002). Acquisition of math content in an academic setting. In Second International Conference on MathML and Technologies for Math on the Web, Chicago, IL. Retrieved April 23, 2008, from http://www.mathmlconference.org/2002/presentations/kohlhase

Lindgaard, G. (2004). Making the business our business: One path to value-added HCI. Interactions, ACM, 11(2), 12-17.

Löwgren, J., & Stolterman, E. (2004). Thoughtful interaction design: A design perspective on information technology. The MIT Press.

Mana-Lopez, M. J. (2004). Multidocument summarization: An added value to clustering in interactive retrieval. Transactions on Information Systems, 22(2), 215-241.

Müller, N. (2006). An ontology-driven management of change. In Wissens- und Erfahrungsmanagement, LWA (Lernen, Wissensentdeckung, Aktivität) Conference Proceedings (pp. 186-193).

Necula, G. (2003). What is TexPoint? Retrieved March 9, 2008, from http://raw.cs.berkeley.edu/texpoint/TexPoint.html

Nielsen, C. M., Overgaard, M., Pedersen, M. B., Stage, J., & Stenild, S. (2006). It's worth the hassle! The added value of evaluating the usability of mobile systems in the field. In Changing Roles, NordiCHI, ACM (pp. 272-280). Oslo, Norway.

Norman, D. A. (2002). The design of everyday things. B&T.

Norman, D., & Draper, S. (Eds.). (1986). User centered system design: New perspectives on human-computer interaction. Lawrence Erlbaum Associates.

Normann, R., & Ramirez, R. (1998). Designing interactive strategy: From value chain to value constellation. Wiley & Sons.

Nygaard, K. (1979). The iron and metal project: Trade union participation. In A. Sandberg (Ed.), Computers dividing man and work (pp. 94-107). Malmö: Swedish Center for Working Life.

ONCE-CS. (2005). Open network of centres of excellence in complex systems. Retrieved March 9, 2008, from http://complexsystems.lri.fr/Portal/tiki-index.php

O'Neill, D. (1997). Software value add study. ACM SIGSOFT Software Engineering Notes, 22(4), 11-12.

O'Reilly, T. (2005). What is Web 2.0: Design patterns and business models for the next generation of software. Retrieved March 9, 2008, from http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html

Oulasvirta, A. (2004). Finding meaningful uses for context-aware technologies: The humanistic research strategy. In Late Breaking Result Papers. ISBN 1-58113-703-6.

Schön, D. A. (1983). The reflective practitioner: How professionals think in action. B&T.

Sheth, J., Newman, B., & Gross, B. (1991). Why we buy what we buy: A theory of consumption values. Journal of Business Research, 22, 159-170.

Shneiderman, B. (1998). Designing the user interface: Strategies for effective human-computer interaction (3rd ed.). Addison Wesley Longman.

Thierstein, J., & Baraniuk, R. G. (2007). Actual facts about ConneXions. Retrieved March 9, 2008, from http://cnx.org/aboutus/publications/paper.2006-07-12.361906064

Weinberger, D. (2002). Small pieces loosely joined: A unified theory of the Web. Basic Books.
Chapter XII
Enabling Learning on Demand in Semantic Work Environments: The Learning in Process Approach

Andreas Schmidt, FZI Research Center for Information Technologies, Germany
Introduction

The new flexibility of workers and work environments makes traditional conceptions of training (in advance, in rather large units, and separate from work activities) more and more obsolete. It is not only the problem of inert knowledge (i.e., knowledge that can be reproduced, but not applied; Bereiter & Scardamalia, 1985), but also the degree of individualization of learning paths that these traditional methods cannot cope with. What we actually need is learning on demand: embedded into work processes, responding both to requirements from the work situation and to employee interests, a form of learning crossing the boundaries of e-learning, knowledge management, and performance support (Schmidt, 2005). Many see self-steered learning as the salvation for this new paradigm (in contrast to course-steered learning activities), but this ignores the fact that guidance is essential, both for the learner (reducing the cognitive load) and for the company (enabling the manageability of learning processes). As a consequence, we have elaborated a concept in between: context-steered learning, in which learners get contextualized recommendations of learning opportunities. Implementing such a method requires a semantic work environment infrastructure that allows computer systems to get hold of work situations and the learning needs arising out of them. Especially crucial in such a setting are a semantic model of human resource development at just the right level of complexity (not simplifying too much, but still manageable), a set of services, and a user context management component for capturing and maintaining information about what the user is currently doing and what her state is.
Background and State of the Art

The idea of learning on demand at the workplace has been very popular for the last three years.
However, this has not resulted in a well-defined (research) community investigating the means to support it; the work is rather scattered among various disciplines. There is not even an agreed term for this form of learning: terms range from "embedded learning" (Straub, 2005) via "work-integrated learning" (Lindstaedt, 2006) to "workflow learning" (Cross & O'Driscoll, 2005). In our research, we coined the term "context-aware workplace learning support" (Schmidt & Braun, 2006) to denote any automated means of learning support based on the situation of the user in her work processes. It builds upon experiences and results in many different fields, the most important of which are summarized in the following:

• Business-process-oriented knowledge management (BPOKM; e.g., Abecker, 2004): BPOKM has realized the importance of the process context for context-aware delivery and storage of knowledge assets. Recently, the approach was further developed towards informal learning techniques, for example, in Farmer (2004). While it is true that business processes are an important element of the work context, they are definitely too narrow, although there are some approaches extending them, like Hädrich and Priebe (2005). Furthermore, BPOKM has so far ignored the concept of pedagogical guidance completely, viewing the problem mainly as a retrieval problem of finding the right content.

• Just-in-time information retrieval: This approach is similar to BPOKM, with the difference that it does not particularly focus on business situations, but rather on a general task context (Rhodes & Maes, 2002). By its generic nature, it allows only for a shallow consideration of context, usually only keyword-based query generation.

• Macroadaptive e-learning approaches: Approaches like Woelk and Agarwal (2002) or Davis et al. (2005) mainly adapt to the learner in terms of delivery. They filter the learning content based on the learner's competencies and the knowledge requirements of the current position or business process context. While this is an important step in the direction of context-aware learning support, they only consider rather static elements of the context, which does not allow for a deeper integration of working and learning processes. Interesting developments go in the direction of context-aware recommendations, as in Lemire, Boley, McGrath, and Ball (2005), but they still have a notion of context too limited for holistic workplace learning support.

• Microadaptive e-learning approaches: These and adaptive hypermedia approaches are probably the area of research with the longest history and highest activity (Brusilovsky, 2001; Park & Lee, 2004). They focus primarily on the learning object behavior itself and how to adapt it to the learner and her characteristics. Recent approaches include capturing the learner's context with the help of eye-tracking (Gütl et al., 2005). The main problem of current adaptive e-learning approaches is that they do not consider learning in a work context, but rather set up artificial contexts in learning labs. They allow for a deep contextualization on the personal level, but neglect the organizational context completely.

• Intelligent tutoring systems (ITS): These rely on AI techniques to provide complex adaptive behaviour. Such systems have mostly focused on supporting and scaffolding problem solving in learning (Brooks et al., 2006). In contrast to the microadaptive approaches in the previous paragraph, their adaptive behaviour is based on rich knowledge representations, and they use cognitive diagnosis and user modelling techniques to respond to the needs of the learners. The key problem of ITS is that they usually require a closed domain for which the system is built and thus do not integrate well into working processes.
All these approaches tackle the problem from a certain perspective, but so far, an integrated perspective with a holistic notion of context and with a clear understanding of learning-related problems is still missing. In the following section, we will present a method framework for learning on demand (context-steered learning) before investigating the technical means to realize it (model and services). Finally, the problem of user context management is briefly sketched. To round off the picture, a walkthrough of the system is given before the chapter concludes with a summary and future research directions.

Method: Context-Steered Learning

Context-steered learning draws from experiences both in e-learning research and in information behaviour research (e.g., Kuhlthau, 2004; Niedzwiedzka, 2002) and tries to (a) lower the barriers to learning activities and (b) avoid frustrating learners with subjectively irrelevant learning offers (offers that do not fit the task at hand or that cannot be understood at the learner's current competency level). Here, the system observes the (potential) learner's work activities while the learner interacts with everyday applications. From its domain knowledge and its knowledge about the learner, the system deduces potential knowledge gaps. For these gaps, the system can compile small learning programs from available learning resources and recommend them to the learner, who can decide whether to learn now, to postpone it, or to discard the recommendation completely. The system can also recommend other colleagues (e.g., colleagues having been in a similar situation recently or being experts for the current competency gap) for informal learning. Context-steered learning can be visualized as a process cycle, which appears as an on-demand "detour" from the working processes and can be broken down into the following system primitives (see Figure 1; Schmidt & Braun, 2006):

Figure 1. Context-steered learning and system behavior primitives (Schmidt, 2006)
• Initiate: In the first phase, the system detects, based on observations of the work context and background knowledge, whether there is a learning opportunity. This functionality concerns the timing (when) and modality (how) of interventions. The latter is important in order to avoid the negative effects of interrupting.

• Select: Appropriate learning resources are selected that help to satisfy the learner's knowledge need and that fit the learner's requirements.

• Deliver: It may seem that recommending learning objects (or other documents) already implies that we have determined what to deliver. But this is only partially true. Certain resources cannot be understood by the learner because she does not meet the prerequisites, so it is often necessary to compile longer learning programs that incorporate the prerequisites.

• Adapt: This is the domain of classical microadaptivity in e-learning. It incorporates the adaptation of navigation (between different content parts of one learning object or between learning objects) and of the presentation and behavior of (active) learning content. The latter refers to complex and very specific forms of adaptation of individual learning objects (so-called "context-aware learning objects"), like reflecting the actual work situation in a simulation.

• Record: One often neglected aspect in the business context of classical formal training is the certificates that can be obtained after successfully attending training activities. As a replacement in more informal contexts where no certificates exist, electronic portfolios can take this role. They can record both the results of assessments (where available) and the learner's reflections.

After completion of this micro learning process, the learner returns to her working process and has the possibility to apply the newly acquired competencies, and to return to the learning process if it turns out that learning was not as successful as expected.
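Read as control flow, the five primitives wrap a micro learning episode around the ongoing work process. The skeleton below is a hedged sketch of that loop; every function name is a placeholder standing in for the behaviors described above, not a component of the actual system.

```python
# Hedged sketch of the context-steered learning cycle. All callables are
# placeholders for the primitives described above, not actual system components.
def context_steered_learning(observe, select, deliver, adapt, record):
    while True:
        opportunity = observe()          # Initiate: is there a learning opportunity?
        if opportunity is None:
            break                        # no intervention needed; work goes on
        resources = select(opportunity)  # Select: resources fitting need and learner
        program = deliver(resources)     # Deliver: compile a small learning program
        for unit in program:
            adapt(unit, opportunity)     # Adapt: navigation, presentation, behavior
        record(opportunity, program)     # Record: assessment results and reflections

# Trivial dry run with stub behaviors:
context_steered_learning(
    observe=iter([{"gap": "EU Project Reporting"}, None]).__next__,
    select=lambda opp: ["intro unit", "reporting unit"],
    deliver=lambda resources: resources,
    adapt=lambda unit, opp: None,
    record=lambda opp, program: print("recorded:", opp["gap"], program),
)
```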
Model: Conceptualizing Learning on Demand

In order to actually provide learning support for the sketched context-steered learning method, the system needs to know (a) about the learner and her competencies, (b) about the characteristics of the situation the learner is in, and (c) about available learning opportunities. This requires a deep understanding of the domain of the user. Ontologies have proven their usefulness for representing domain semantics in a formal way so that they are usable in a machine-processable form. For conceptualizing context-aware workplace learning support, we have developed the Professional Learning Ontology1 (Schmidt & Kunzmann, 2006) based on the LIP ontology (Schmidt, 2005). It is structured into three main parts (see Figure 2):

• Learning opportunities: These encompass both traditional, pedagogically prepared learning material like small learning objects and more informal learning opportunities like colleagues or documents used within work processes. These opportunities have to be described with respect to their contributions to the development of the learner's competencies. These contributions differ depending on the type of learning opportunity: while we cannot assume much more than that colleagues or documents help on a certain topic, pedagogically prepared learning objects are expected to have a learning objective expressed as a competency. The acquisition of these competencies can also be assessed using specific assessment objects. Describing objectives (or topics) is usually not sufficient (except when communicating with a colleague, where prerequisites can be explained in the course of the interaction), but has to be complemented by prerequisites that have to be met in order to understand the learning opportunity.

• Users: The second part of the ontology is directly concerned with the users (both in the role of an employee in work processes and of a learner) and their respective context. This context can be divided into a personal context (containing competencies, learning preferences, and mid- and long-term learning goals, among others), a social context (containing relationships to others), an organizational context (describing the user in terms of organizational entities like department, roles, business processes, and tasks), and a technical context with the capabilities of the technical equipment.

• The domain ontology: This provides the glue between the user (and her situation) and the learning opportunities. The domain ontology consists of a competency catalog (specifying competencies with differentiated competency levels and their relationships like containment and subsumption), an organizational model (consisting of processes and organizational units), and requirement profiles attached to elements of the organizational model.

Figure 2. Ontology-based conceptual model for learning on demand

For recommending learning opportunities, this ontology can be used as follows (see Figure 3): as soon as the system knows about the context of the user, it can derive the required competencies from the user's context entities (like task or process context) via the attached requirement profiles. After comparing them with the user's current competencies, a competency gap can be identified. This takes into account competency relationships like part-of and is-a relationships, as well as subsuming competency levels. Finally, the competency gap is matched with learning opportunities in order to compile a sequence of learning opportunities (a so-called learning program), taking into account prerequisites and learning preferences as well as simple pedagogical templates (like instruction, practice, assessment).
Figure 3. Using the ontology for context-steered learning
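The following sketch condenses the recommendation pipeline of Figure 3 into code: requirement profiles yield required competencies, the comparison with the user's competencies yields the gap, and the gap is matched against learning opportunities, with unmet prerequisites prepended. Modeling competency levels as plain integers, as well as all names and data below, are illustrative assumptions, not the Professional Learning Ontology itself.

```python
# Hedged sketch of the competency gap analysis described above. Competency
# levels are modeled as integers; profiles, names, and data are assumptions.
def competency_gap(required, current):
    """Competencies where the required level exceeds the user's current level."""
    return {c: lvl for c, lvl in required.items() if current.get(c, 0) < lvl}

def compile_program(gap, opportunities, current):
    """Pick opportunities covering the gap; prepend unmet prerequisites first."""
    program = []
    for comp, lvl in gap.items():
        for opp in opportunities:
            if opp["objective"] == (comp, lvl):
                for pre_comp, pre_lvl in opp["prerequisites"]:
                    if current.get(pre_comp, 0) < pre_lvl:
                        program.append(f"prerequisite unit for {pre_comp} (level {pre_lvl})")
                program.append(opp["title"])
    return program

required = {"EU Project Reporting": 2, "Resource Planning": 2}   # from requirement profiles
current = {"Resource Planning": 2, "EU Projects": 0}             # the user's competencies
gap = competency_gap(required, current)                          # -> {"EU Project Reporting": 2}
opps = [{"title": "EU Financial Reporting",
         "objective": ("EU Project Reporting", 2),
         "prerequisites": [("EU Projects", 1)]}]
print(compile_program(gap, opps, current))
```

A full implementation would additionally resolve part-of and is-a relationships between competencies and order the resulting program along pedagogical templates, as described above.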
Technology: A Service-Oriented Infrastructure

The implementation is based on a hierarchically structured service-oriented architecture, decomposing the system functionality into loosely coupled components that allow for an easy exchange of selected components. This is required both for an integration into real-world corporate environments and for evolving research on top of a common platform. The top-level services, organized into three layers, are given in Figure 4:

• User-oriented services: On the topmost level, there are services directly visible to the user/learner. This comprises a learning environment for interacting with learning objects and programs, for example, an extended SCORM-based LMS that provides context information to learning objects. The second class of services are communication services like e-mail and instant messaging, but also discussion forums. The proactive system functionality is the responsibility of a learning assistant, which recommends relevant learning opportunities (both content and people) or learning programs embedded into the user's working environment (either as a tray application or directly embedded in applications).

• Added-value learning services: These are a family of services responsible for detecting competency gaps based on information from the enabling services layer, retrieving relevant learning objects or colleagues, compiling learning objects into personalized learning programs, and negotiating interests between learners and informal teachers. These services are used by the user-oriented services to adapt to the context.

• Enabling services: This layer consists of a learning object repository (which holds all content, formal and nonformal, and its metadata) and a user context manager responsible for collecting information about the user's current context from various sources (context sensors). Because of the sensitivity of the information, the user context manager is often not implemented as a central service, but rather distributed on the individual users' machines. The user context manager offers both query-response and publish-subscribe interaction patterns. Finally, the ontology service keeps background knowledge about the domain, which comprises, among other things, a competency catalog, organizational structures like business processes, organizational units, and tasks, and their respective competency requirements.

Figure 4. Service-oriented, context-aware infrastructure for learning on demand (Schmidt, 2005)
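The two interaction patterns of the user context manager can be sketched as follows; the interface shown is an illustrative assumption rather than the project's actual service contract.

```python
# Illustrative sketch of the user context manager's two interaction patterns:
# query-response (pull) and publish-subscribe (push). Not the actual service API.
from collections import defaultdict

class UserContextManager:
    def __init__(self):
        self.facts = {}                           # (user, feature) -> value
        self.subscribers = defaultdict(list)      # feature -> callbacks

    def publish(self, user, feature, value):      # a context sensor pushes a fact
        self.facts[(user, feature)] = value
        for callback in self.subscribers[feature]:
            callback(user, value)

    def query(self, user, feature):               # a learning service pulls on demand
        return self.facts.get((user, feature))

    def subscribe(self, feature, callback):       # e.g., the learning assistant
        self.subscribers[feature].append(callback)

ucm = UserContextManager()
ucm.subscribe("current_task", lambda u, v: print(f"{u} switched task to {v}"))
ucm.publish("esther", "current_task", "EU Cost Statements")
print(ucm.query("esther", "current_task"))
```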
User Context Management as an Enabling Technology

What was presented so far depends for its credibility and technical realizability on the availability of user context information: what the user knows, what she does, and what the constraints for the appropriateness of resources are. Closer inspection (Schmidt, 2006) reveals fundamental challenges. Most of them can be traced back to the problem that the system cannot sense the usage situation (the subset of the state of the real world relevant to the interaction with the system) directly, but has to rely on the usage context as a model of that situation. This model is the result of a mapping that is highly imperfect in its nature: it is incomplete, uncertain, and possibly inconsistent. This is aggravated by the problem of dynamics. On the one side, it is often not possible to determine context on demand; rather, the system has to collect its pieces in advance (asynchronicity of acquisition and usage). But on the other side, some parts of the context change quite quickly, while others are rather stable (variability in the rate of change). This demands a user context management service that transparently handles the uncertainty and dynamics. Our context management infrastructure is divided into four layers to keep the complexity of the task manageable. The lowermost layer consists of the context sources, which can either push the relevant context information into the management service or from which the user context management service can pull information by querying. Examples are ERP systems or IDEs (for organizational context, depending on the environment), or personal information managers like Outlook or instant messaging clients (for personal and social context).

Figure 5. Architecture of the user context management service (Schmidt, 2006)

On top is the internal layer, which provides a temporal database of context facts. These facts take the form user X has-context-feature Y, annotated with a timestamp, a validity interval, and a confidence. Key to solving the problems of managing context information on this layer is the introduction of the concept of aging into user context management, which combines the aspects of uncertainty and dynamics. Aging of context facts means that the confidence in acquired context facts decreases over time, starting from the initial confidence assigned by the context source. This results in a controlled forgetting of context facts. It should be obvious that aging has to be specific to different features of the context: while the current task may change quite quickly, other features like the role, but also personal preferences, are slow to change. So the context schema is annotated with aging profiles.

The logical layer takes care of providing a uniform and consistent view on the context of a user. If we deduce the current task of a user heuristically from a variety of sources, this usually does not yield a consistent result. Additionally, the competencies of a user also have to be determined via indirect methods based on observations of the user's behaviour, so that on one occasion the system has a fact that a user has a competency, while at another instant in time the system comes to the conclusion that she does not have it. The logical layer takes care of these conflicts, that is, contradictions between context facts (positive and negative facts, or multiple values for a single-valued feature). This can be done using the temporal and confidence information. The system can be configured to use different strategies for conflict resolution, also based on the context feature.

The external layer, finally, provides mappings into application-specific context schemas. That way, the same context management infrastructure can be used for different applications. Within our prototypes (Schmidt, 2005; Schmidt & Braun, 2006), we have connected a wide range of context sources on two platforms (Windows and Mac OS X): browser plugins for Internet Explorer and Mozilla for browsing activities (and Windows Explorer actions on the file system), a Microsoft Office plugin for information about active documents, a Microsoft Outlook plugin for access to calendar and contacts, a Jabber-based instant messenger plugin for presence information, a plugin for the Apple iCal application for tasks and appointments, an extractor for the Apple Address Book application to extract social relationships, environmental sensors like door, phone, and speech sensors, and mouse and keyboard activity sensors, among others. Low-level sensor data was, where required, aggregated to higher-level context information with the help of Bayesian networks.
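Here is a minimal sketch of the aging mechanism described for the internal layer, assuming an exponential decay of confidence; the chapter only states that confidence decreases over time at feature-specific rates, so the functional form and the rates below are assumptions.

```python
# Hedged sketch of aging context facts: confidence decays over time at a
# feature-specific rate (the "aging profile"). Exponential decay is an assumed
# functional form; the chapter only specifies that confidence decreases from
# the source's initial value.
import math
import time

AGING_PROFILES = {"current_task": 0.01, "role": 1e-7}   # decay rates per second (assumed)

class ContextFact:
    def __init__(self, user, feature, value, confidence, timestamp=None):
        self.user, self.feature, self.value = user, feature, value
        self.initial_confidence = confidence
        self.timestamp = timestamp if timestamp is not None else time.time()

    def confidence(self, now=None):
        """Aged confidence: starts at the source's value and decays over time."""
        age = (now if now is not None else time.time()) - self.timestamp
        return self.initial_confidence * math.exp(-AGING_PROFILES[self.feature] * age)

task = ContextFact("esther", "current_task", "EU Cost Statements", 0.8, timestamp=0.0)
print(round(task.confidence(now=60.0), 2))    # task confidence fades within minutes
role = ContextFact("esther", "role", "project manager", 0.9, timestamp=0.0)
print(round(role.confidence(now=1e6), 2))     # role stays confident much longer
```

On the logical layer, a simple conflict resolution strategy could then, for instance, prefer among contradicting facts the one whose aged confidence is currently highest, in line with the statement above that conflicts can be resolved using the temporal and confidence information.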
A Brief Walkthrough

In order to give a clearer picture of how the sketched learning-on-demand system actually works, let us take the example of Esther, a project manager for small- and medium-sized projects. Yesterday, she was informed that she is supposed to take over a running European research project as well. When she logs into the project management system and switches to the research project, the system detects (via adapters to the project management system) her new project context. She then opens Microsoft Project to get an overview of the project's work plan, and then an Excel sheet for EU cost statements. Based on the opened applications, the context management infrastructure concludes that Esther is working on the task EU Cost Statements with a confidence of 80%. The system also has some evidence that she deals with resource planning, but this has lower confidence and is therefore not taken into account. In the domain ontology, it has been modeled that this task requires competence in EU Project Reporting and intermediate experience in Resource Planning. From Esther's competencies (see also Figure 6), the system determines the gap: she needs to learn about EU Project Reporting. There is a learning object that intends to deliver intermediate knowledge of the subject, but it requires the learner to already be competent in EU projects, so she gets a learning program that consists of (1) an introduction to EU projects, (2) a learning object on EU Financial Reporting, and (3) an assessment for the previous learning objects; this learning program is presented to her by the Learning Assistant in the system tray. She works through this SCORM-compliant learning program, and the outcome of the assessment as well as her personal notes are recorded in her portfolio. When trying to start with the actual work, she discovers that the learning program was not sufficient to cope with the task in an adequate manner, so the system also offers colleagues who did the same task recently (based on their context history) and some templates for doing calculations and asking partners for necessary data (based on the topic assignment of these documents). After asking a colleague and having a look at the documents, she makes some notes in her portfolio and then continues with her task successfully.

Figure 6. Example competence structure
Conclusion

Learning on demand is an important element of a modern work environment for knowledge workers. It allows for flexible learning embedded into work processes that combines self-directed learning with pedagogical and organizational guidance. In the light of current trends towards personal learning environments replacing organization-driven approaches, this seems to be an important element of a future learning ecology, reconciling individual and organizational perspectives: it does not constrain the flexibility of the individual, while the organization still has the possibility to influence employees' learning processes, that is, to implement organizational guidance by specifying requirements for organizational entities like business process activities or roles.

Methods for learning on demand like context-steered learning rely on automated techniques to keep the flexibility manageable. Context-steered learning implements pedagogical guidance (and thus goes beyond simple information delivery) by considering not only the current information/learning needs, but also the prerequisites for understanding the provided resources, as well as a limited form of meaningful (in the pedagogical sense) ordering. This loose form of guidance represents a compromise between strong (and thus also expensive) structural guidance (in courses) or even personal guidance (in blended learning settings) and the almost complete absence of guidance in pure information retrieval settings. On the technological level, semantic modelling in the form of ontologies has demonstrated that it allows for (a) representing complex domains and (b) providing a sound basis for the architecture of loosely coupled service-oriented infrastructures. Combined with the crucial issue of how the system actually finds out what the user does and needs, semantic work environments (like Semantic Desktop infrastructures) offer the potential of providing an infrastructure that already supplies much of the semantic information needed for learning on demand (Braun, Schmidt, & Hentschel, 2007). The presented service-oriented architecture can easily be integrated into a framework with a larger scope.
Future Research Directions

First user evaluations have shown that context-aware workplace learning support is a useful concept for learning support in semantic work environments. But the concept still has a long way to go to widespread adoption. This is mainly due to the effort needed to set up the required semantic infrastructure. Usually, standards are the vehicle that allows for reducing the effort of adapting systems to the respective environment. This has been the main focus of e-learning developments in the last years. Standards like SCORM and the related IEEE LOM metadata standard have, however, focused (a) on the content itself and (b) on describing decontextualized individual learning, rather than trying to incorporate the work context. If these standards are to facilitate context-aware workplace learning, they have to:

• Closely align with standardization activities in the area of competencies (like the IEEE Reusable Competency Definition)
• Incorporate organizational metadata like the organizational context presented above
• Extend their coverage to more informal and less mature learning opportunities
Moreover, it is essential that a reference ontology for describing context is developed so that the modelling effort is substantially reduced. The Professional Learning Ontology can be a starting point for these developments.

Another current trend is constituted by Personal Learning Environments (PLEs) (Attwell, 2007), which are an emerging paradigm for supporting the active side of learning (in contrast to "consuming" learning resources). These PLEs are currently mainly understood as a collection of communication, collaboration, and knowledge structuring tools and do not consider context-awareness at all. However, it seems natural to complement the PLE idea with context-steered learning in the future.

As far as the very concept of context-awareness is concerned: although we have significantly broadened the notion of context for workplace learning support, we still see remaining challenges in the following areas, which will be the subject of our further research (especially within the scope of the FP7 EU Integrating Project MATURE, www.mature-ip.eu):

• Socially-aware learning support: In our approach of context-aware mediation of communication, we have started to address the social dimension of informal teaching. However, we see big potential in exploring this issue further by developing a more differentiated social ontology, building on prior work like Masolo et al. (2004) and Matsuo, Hamasaki, Mori, Takeda, and Hasida (2004), and by crafting more advanced elicitation methods (e.g., from e-mail communication) for gaining this information. This approach must not be confused with social network analysis, which represents a macro perspective on social relationships (and often introduces difficult ethical issues), whereas we favor an approach focusing solely on the individual's rating of social relationships.

• Maturity-aware learning support: In contrast to school and university education, workplace learning has always been pragmatic with respect to the resources used for learning, ranging from pedagogically sound learning objects to casual documents and highly contextualized entries in discussion forums. Although the motto is that anything helps more than insisting on a high level of quality, it should also be obvious that not everything is appropriate for everyone. In Schmidt (2005b) we have presented the notion of knowledge maturity, which has a direct impact on teachability. Immature resources like discussion entries are only useful if there is a high contextual overlap, whereas learning objects or even courses are useful for (almost) anyone. Our context-steered learning method should thus be extended to capture and make use of maturity information about resources and also relate it to the competency levels of employees (see Schmidt, 2007).

• Continuous evolution of the underlying semantic models: Most approaches based on semantic technologies (including the one presented here) rely on explicit models of the real world (in most cases in the form of ontologies). In order to ensure the sustainability of these approaches, these ontologies (like competency or process ontologies) need to be continuously updated. However, current methods requiring specialized knowledge engineers introduce a considerable time lag until relevant changes in reality are reflected in the ontology (Hepp, 2007). To cope with this challenge, we need to rethink ontology engineering as a bottom-up and work-integrated activity coupled with learning processes. A promising approach is ontology maturing (Braun, Schmidt, Walter, Nagypal, & Zacharias, 2007), comprising a general phase model and lightweight work-integrated engineering tools.
References

Abecker, A. (2004). Business process oriented knowledge management: Concepts, methods, and tools. Doctoral dissertation, University of Karlsruhe, Germany.

Attwell, G. (2007). The personal learning environments - the future of e-learning? eLearning Papers, 2(1). ISSN 1887-1542.

Bereiter, C., & Scardamalia, M. (1985). Cognitive coping strategies and the problem of inert knowledge. In S. Chipman, J. Seagal, & R. Glaser (Eds.), Thinking and learning skills. LEA.

Braun, S., Schmidt, A., & Hentschel, C. (2007). Semantic desktop systems for context awareness - requirements and architectural implications. In 1st Workshop on Architecture, Design, and Implementation of the Semantic Desktop (SemDesk Design), 4th European Semantic Web Conference (ESWC 2007), Innsbruck, Austria.

Braun, S., Schmidt, A., Walter, A., Nagypal, G., & Zacharias, V. (2007). Ontology maturing: A collaborative Web 2.0 approach to ontology engineering. In Proceedings of the Workshop on Social and Collaborative Construction of Structured Knowledge at the 16th International World Wide Web Conference (WWW 07), Banff, Canada.

Brusilovsky, P. (2001). Adaptive hypermedia. User Modeling and User-Adapted Interaction, 11(1-2).

Cross, J., & O'Driscoll, T. (2005). Workflow learning gets real. Training Magazine, (2).

Davis, J., Kay, J., Kummerfeld, B., Poon, J., Quigley, A., Saunders, G., et al. (2005). Using workflow, user modeling and tutoring strategies for just-in-time document delivery. Journal of Interactive Learning, 4, 131-148.

Farmer, J., Lindstaedt, S., Droschl, G., & Luttenberger, P. (2004, April 2-3). AD-HOC - work-integrated technology-supported teaching and learning. In 5th International Conference on Organisational Knowledge, Learning, and Capabilities, Innsbruck.

Gütl, C., Pivec, M., Trummer, C., Garcia-Barrios, V. M., Mödritscher, F., Pripfl, J., et al. (2005). AdeLE (adaptive e-learning with eye-tracking): Theoretical background, system architecture and application scenarios. European Journal of Open, Distance and E-Learning.

Hädrich, T., & Priebe, T. (2005). Supporting knowledge work with knowledge stance-oriented integrative portals. In European Conference on Information Systems.

Lemire, D., Boley, H., McGrath, S., & Ball, M. (2005). Collaborative filtering and inference rules for context-aware learning object recommendation. International Journal of Interactive Technology and Smart Education, 2.

Masolo, C., Vieu, L., Bottazzi, E., Catenacci, C., Ferrario, R., Gangemi, A., et al. (2004). Social roles and their descriptions. In Proceedings of the Ninth International Conference on the Principles of Knowledge Representation and Reasoning, Whistler, Canada.

Matsuo, Y., Hamasaki, M., Mori, J., Takeda, H., & Hasida, K. (2004). Ontological consideration on human relationship vocabulary for FOAF. In Proceedings of the 1st Workshop on Friend of a Friend, Social Networking and the Semantic Web.

Niedzwiedzka, B. (2002). A proposed general model of information behaviour. Information Research, 9.

Park, O. C., & Lee, J. (2004). Adaptive instructional systems. In D. H. Jonassen (Ed.), Handbook of research for educational communications and technology (2nd ed., pp. 651-684). Mahwah, NJ: Lawrence Erlbaum.

Rhodes, B., & Maes, P. (2002). Just-in-time information retrieval agents. IBM Systems Journal, 39, 685-704.

Schmidt, A. (2005a). Bridging the gap between knowledge management and e-learning with context-aware corporate learning solutions. In K. Althoff, A. Dengel, R. Bergmann, M. Nick, & T. R. Berghofer (Eds.), Professional knowledge management: Third Biennial Conference, WM 2005, Kaiserslautern, Germany, April 2005, Revised Selected Papers (pp. 203-213). Springer.

Schmidt, A. (2005b). Knowledge maturing and the continuity of context as a unifying concept for knowledge management and e-learning. In Proceedings of I-KNOW 05, Graz, Austria.

Schmidt, A. (2006). Ontology-based user context management: The challenges of dynamics and imperfection. In R. Meersman & Z. Tahiri (Eds.), On the move to meaningful Internet systems 2006: CoopIS, DOA, GADA, and ODBASE, Part I (pp. 995-1011). Springer.

Schmidt, A. (2007). Microlearning and the knowledge maturing process: Towards conceptual foundations for work-integrated microlearning support. In T. Hug, M. Lindner, & P. A. Bruck (Eds.), Proceedings of Microlearning 2007, Innsbruck, Austria.

Schmidt, A., & Braun, S. (2006). Context-aware workplace learning support: Concept, experiences, and remaining challenges. In W. Nejdl & K. Tochtermann (Eds.), Innovative approaches for learning and knowledge sharing: First European Conference on Technology-Enhanced Learning (EC-TEL 2006), Crete, Greece (pp. 518-524). Springer.

Schmidt, A., & Kunzmann, C. (2006). Towards a human resource development ontology for combining competence management and technology-enhanced workplace learning. In R. Meersman, Z. Tahiri, & P. Herero (Eds.), On the move to meaningful Internet systems 2006: OTM 2006 workshops, Part I (pp. 1078-1087). Springer.

Straub, R. (2005). Lernen im Kontext: Dynamischen Lernkonzepten gehört die Zukunft. Competence-Site, 26(1).

Ulbrich, A., Scheir, P., Lindstaedt, S. N., & Görtz, M. (2006). A context-model for supporting work-integrated learning. In W. Nejdl & K. Tochtermann (Eds.), Innovative approaches for learning and knowledge sharing: First European Conference on Technology-Enhanced Learning (EC-TEL 2006), Crete, Greece (pp. 525-530). Springer.

Woelk, D., & Agarwal, S. (2002). Integration of e-learning and knowledge management. In World Conference on E-Learning in Corporate, Government, Health Institutions, and Higher Education (Vol. 1, pp. 1035-1042).
additional reading Attwell, G. (Ed.). (2007). Searching, lurking, and the zone of proximal development. E-learning in small and medium enterprises in Europe. Navreme: Pontybridd/Bremen. Maier, R. (2007). Knowledge management systems. Information and communication technologies for knowledge management (3rd ed.). Berlin, Germany: Springer. Rosenberg, M. (2006). Beyond e-learning. Approaches and technologies to enhance organizational knowledge, learning, and performance. Hoboken, NJ: Wiley & Sons. Siemens, G. (2006). Knowing knowledge. London: Lulu.
endnote 1
Available at http://www.professional-learning.eu
Section IV
Techniques for Semantic Work Environments
Chapter XIII
Automatic Acquisition of Semantics from Text for Semantic Work Environments

Maria Ruiz-Casado, Universidad Autonoma de Madrid, Spain
Enrique Alfonseca, Universidad Autonoma de Madrid, Spain
Pablo Castells, Universidad Autonoma de Madrid, Spain
Abstract

This chapter presents an overview of techniques for semi-automatic extraction of semantics from text, to be integrated in a Semantic Work Environment. A focus is placed on the disambiguation of polysemous words, the automatic identification of entities of interest, and the annotation of relationships between them. The application of these topics falls into the intersection of three dynamic fields: the Semantic Web, Wiki and other work environments, and Information Extraction from text.
Introduction

Nonsemantic wikis and information desktops manage shared or personal information. The success of these platforms is mainly due to the interest they have aroused among potential contributors, who are eager to participate because of their particular involvement in the domain under discussion. As stated by Pressuti and Missim (this volume), this success can be attributed to several features of these systems, such as the ease of publishing and reviewing content on the Web through the Web browser, the few restrictions on what a wiki contributor can write, and the user-friendly interface (both in wikis and personal information desktops). All these constitute a very flexible mechanism to create, publish, and annotate content. Semantic Work Environments (SWEs) propose a blend of Semantic Web technologies and individual or collaborative work environments.
The Semantic Web (SW) constitutes an initiative to extend the Web with machine-readable content, primarily by making the content explicit through semantic annotations. This enables easy automated processing when searching and retrieving information. SWEs can also benefit from the tagging of relevant concepts and the relations held among them, in such a way that the search and retrieval of information is greatly enhanced when semantic annotation is present. SWEs maintain explicit marks for these concepts and relations, so the tags can be easily detected and used by the computer to retrieve from the texts the information required by a user’s query. The addition of semantic annotations to documents can be achieved following the wiki philosophy of lowering the technical barriers, as simple labels attached to the hyperlinks. In fact, wiki communities have already proved able to produce vast information repositories collaboratively and at low cost. Next, internal tools may be provided to transform these annotations into a SW markup language. Through this semantic augmentation, wikis and other semantic work environments benefit from the automation of management, searching, and retrieval pursued by the SW. But the semantic tagging of work environments faces the well-known bottleneck of Semantic Web applications: placing tags in a large amount of existing, and sometimes rapidly evolving, content can be too costly if it has to be done manually. Moreover, wiki contributors or semantic desktop users may feel discouraged from reviewing the full database (containing pre-existing information) in order to place semantic tags, or may hesitate about where and how to place them. This is the case when annotating large repositories of information like news databases, e-mails, or other personal and company-owned databases. In these cases, the number of users and information managers may be small and the amount of data very large. Even in a collaborative environment where the cost is shared among several contributors,
the tagging bottleneck is present. For instance, among the existing Wikipediae, as of March 2006, the English version has more than one million articles, and the German, Spanish, French, Italian, Japanese, Dutch, Polish, Portuguese, and Swedish Wikipediae are above one hundred thousand articles each. If all these entries were to be extended with semantic annotations manually in a reasonable amount of time, the cost of labeling them would be enormous, if not unfeasible. In answer to this need, the use of semi-automatic annotation procedures has been the object of extensive research for many purposes, including tackling the SW tagging bottleneck. In general, semantic tags provide the same information that a human reader can elicit (sometimes unconsciously) from an untagged natural-language text: names of locations, people, organisations, and other entities mentioned in the text, and events and relationships involving them, for instance, that a particular person works in a specific company. Moreover, a human can usually discriminate the sense intended in a text for a polysemous word, for example, whether the word pipe is mentioned referring to smoking, plumbing, or music. Finally, from the way in which terms are used in a context, it is sometimes possible to infer their meaning to a certain degree. Motivated by the large amounts of textual information, different areas of Natural Language Processing (NLP) try to create algorithms to perform all these deductions automatically. The subfield of NLP that studies the automatic annotation of entities, events, and relationships in unrestricted texts is called Information Extraction (IE). Text Mining focuses on the discovery of previously unknown information (Hearst, 2003), for example, extracting patterns from text that express known relationships or events and using those same patterns to extract new knowledge. Finally, Word Sense Disambiguation (WSD) tries to identify the correct sense with which a word is being used in a given context.
A simple system for the acquisition and annotation of semantics is depicted in Figure 1. It departs from a textual unannotated source (a) that is processed with some NLP tools in order to obtain processed documents. The texts at stage (b) are annotated with linguistic tags; for example, token and sentence boundaries are identified, words are stemmed or reduced to their canonical form, the syntactic structure of the text is identified, and some semantic analysis is performed. The level up to which the text has to be analyzed depends on the techniques that will be selected for the next step, the Semantic Annotation module (c). In this stage, polysemous words are disambiguated, entities and relations are identified, and the original texts are semantically annotated, including explicit references to selected entity types, the results of the disambiguation, and the selected relations that hold between the entities. The collection of documents is ready to be used in a semantic environment (d). Additionally, it is possible to enrich the initial knowledge base with
the new extracted information (e). If the automatic annotation system is integrated with the SWE, it is also possible to enrich the knowledge source with the (evolving) manual annotations done by the users (f). Note that the automation of labelling natural language texts can be tackled in simpler ways when the documents contain structural information, such as tables, lists, or LaTeX, HTML, or XML tags. Wrappers (Florescu, Levy, & Mendelzon, 1998; Kushmerick, Weld, & Doorenbos, 1997) take advantage of this structure in tagging portions of the documents. However, they are not applicable to any document that does not follow the same guidelines, and they are left outside the scope of this review. Since the development of Semantic Web technologies, several systems for ontology population and automatic annotation have been developed (Contreras et al., 2003; Uren et al., 2006). Many of them are editing tools to assist the manual annotation of texts, though some use NLP technologies
and work in a quite automatic way. The results are still limited and, especially for relation extraction, further research is needed. Semantic Work Environment technologies are an emerging trend nowadays, and a new field where most NLP technologies can be successfully applied. The approaches reviewed in this chapter aim at exploiting and opening new research opportunities in this direction. This chapter is organised as follows: the next section provides a brief introduction to NLP and some typical problems that are present when a computer processes natural language. The following sections review existing work on Named Entity Recognition and Classification, Word-Sense Disambiguation, and Relationship Extraction. The final section concludes the chapter.

[Figure 1. Acquisition of semantics for SWEs: unannotated documents from the Web, newspapers, or other corpora (a) pass through NLP processing to become processed documents (b); an acquisition-of-semantics module (c), performing Information Extraction (named entities, events), Word Sense Disambiguation, and Relation Extraction, feeds the semantic work environment (d); the extracted information enriches the knowledge base (e), which also receives the users' manual annotations (f).]

Natural Language Processing

There are many kinds of NLP applications, including dialogue interfaces, Question-Answering systems, Machine Translation, Natural Language Generation, Information Extraction, or Automatic Text Summarisation, amongst many others. Although some of them require contributions from other disciplines, such as signal processing for speech recognition in dialogue interfaces, all of them need to solve, to a certain extent, some common problems, corresponding to different levels of analysis of the natural language samples. Figure 2 represents a classical NLP system, which is usually decomposed into the levels of analysis described below (Engels & Bremdal, 2000; Mitkov, 2003). As shown in Figure 2, the analysis is performed from the bottom up, and many NLP systems are implemented as a pipeline of processors, each of which adds annotations to the documents provided by the previous one. Note that not every level included will be necessary for every possible application of NLP.

[Figure 2. Levels of linguistic analysis: a bottom-up pyramid from pre-textual phonology, through tokenisation (words), morphology, syntax (sentences), and semantics, up to discourse and pragmatics.]

Analysis Levels

Phonological Analysis

The phonological level of analysis deals with the processing of the speech signal by means of speech recognition techniques to transform it into a sequence of textual words (Lemmetty, 1999). This level is not necessary if the input information is already provided as textual data, which will be the case, in general, in Semantic Work Environments such as semantic wikis.
Text Segmentation

Text segmentation basically consists of two main tasks: tokenising the text into a sequence of words or tokens, and identifying sentence boundaries. Already at this level of analysis, ambiguity is present. Token-boundary identification may be complicated in languages in which words are not separated by blank spaces (such as Chinese), but it may even pose a challenge in languages such as English, when the system is supposed to recognise complex and diverse tokens such as URLs, e-mail addresses, or gene and protein names, which
may include hyphens and dots inside the token. Concerning sentence boundary identification (Mikheev, 2002), the presence of a dot at the end of a word does not always indicate a sentence boundary, as it may be indicating the end of an abbreviation or a complex token, or both things at the same time. For instance, in example (1), there are six words that end with a dot, four of which are abbreviations, another is a sentence ending, and Inc. is both at the same time.

(1) Mr. J. Smith is supposed to arrive next Mon. to Company Inc. He tried to book the flight on Tue. but it was not possible.
For those applications where the input to the NLP system is a speech sequence, in general, the speech recogniser outputs a sequence of words, and therefore tokenisation may not be necessary.
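To make the boundary ambiguity concrete, the following is a minimal, illustrative sentence splitter; the abbreviation list and the capitalisation heuristic are simplifying assumptions of this sketch, not a complete solution to the problem just described.

ABBREVIATIONS = {"mr.", "mrs.", "dr.", "inc.", "co.", "mon.", "tue."}

def split_sentences(text: str) -> list:
    tokens = text.split()
    sentences, current = [], []
    for i, token in enumerate(tokens):
        current.append(token)
        if token.endswith("."):
            # Single letters with a dot ("J.") are taken to be initials.
            is_abbrev = (token.lower() in ABBREVIATIONS
                         or (len(token) == 2 and token[0].isalpha()))
            at_end = i + 1 == len(tokens)
            next_cap = not at_end and tokens[i + 1][0].isupper()
            # A dot on a non-abbreviation followed by a capitalised word is
            # taken as a boundary. Note that this fails on "Inc. He" in
            # example (1), where the token is an abbreviation AND an ending.
            if not is_abbrev and (next_cap or at_end):
                sentences.append(" ".join(current))
                current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

print(split_sentences("Mr. J. Smith is supposed to arrive next Mon. to "
                      "Company Inc. He tried to book the flight on Tue. "
                      "but it was not possible."))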
Morphology

The morphological level of analysis (Trost, 2003) consists in the processing of words to obtain their inner structure (lemma and affixes). A stemmer is a tool that performs morphological analysis to elicit the stem of a word from its affixes. A morphological analyser reduces word forms to a common canonical form, usually the infinitive form for verbs and the nominative masculine singular for nouns. It can produce additional information such as the number, gender, case, verb conjugation, and so forth. Some analysers also handle derivation and compounding of words, for instance, indicating that the noun creation is derived from the verb create. This is, in general, a more difficult task than processing inflections, as both derivation and compounding are processes that create new words, usually with a different grammatical category than the original ones, and the morphological rules used in derivation are usually more varied. At this level, we may again find ambiguities that are usually resolved by context. For example, lay may be the present or infinitive form of the transitive verb to lay, or the past form of the intransitive verb to lie, apart from a common noun and an adjective. Usually, programs known as part-of-speech taggers label every token with a tag indicating whether it belongs to some category of words (e.g., nouns, verbs, adjectives, adverbs, prepositions, etc.), which helps in their disambiguation, although some morphological ambiguity remains even after the use of a part-of-speech tagger.
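As an illustration, the following sketch runs a part-of-speech tagger and a lemmatiser over the lay example; NLTK is an assumption of this sketch (any equivalent toolkit would do), and it requires NLTK's tokeniser, tagger, and WordNet data packages to be installed.

import nltk
from nltk.stem import WordNetLemmatizer

tokens = nltk.word_tokenize("The ducks lay eggs near the river.")
print(nltk.pos_tag(tokens))  # 'lay' should receive a verb tag in this context

lemmatizer = WordNetLemmatizer()
# The lemmatiser needs the (already disambiguated) word class.
print(lemmatizer.lemmatize("ducks", pos="n"))  # -> duck
print(lemmatizer.lemmatize("lay", pos="v"))    # -> lay (the verb to lay)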
Syntax

Syntactic analysis, or parsing, consists in examining sentences to analyse the grammatical arrangement of words inside them (Kaplan, 2003). It usually consists of identifying constituents or phrases, and the syntactic dependencies between them. Constituents can be defined as groups of words that share grammatical properties: constituents can be exchanged, and used in similar places inside sentences, without altering the grammatical correctness of a text. Examples of syntactic dependencies are predicate-argument relationships (such as the relationship that holds between a verb and its subject), or modifier and adjunct relationships (such as relationships between adjectives or prepositional phrases and nouns). These relationships can be used to represent the structure of a sentence as a tree, called a parse tree, where the root represents the whole sentence. The parsing of a sentence can be performed completely, by generating the complete parse tree (deep parsing), or it can be attempted in a shallower (simpler) way. A shallow parser recognises certain constituents (e.g., nonrecursive noun phrases) or selected syntactic dependencies (e.g., between a verb and its direct object, or between a verb and its subject). A kind of partial parser that is widely used in NLP applications is the NP chunker, a tool that identifies the non-overlapping, nonrecursive noun phrases found in a text.
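A shallow NP chunker of the kind just described can be written as a single rule over part-of-speech tags; the sketch below uses NLTK's regular-expression chunker (an assumption of this example) on a hand-tagged sentence.

import nltk

sentence = [("the", "DT"), ("happy", "JJ"), ("dog", "NN"),
            ("wagged", "VBD"), ("its", "PRP$"), ("tail", "NN")]

# One rule: an optional determiner or possessive, any number of
# adjectives, then one or more nouns, form a nonrecursive noun phrase.
chunker = nltk.RegexpParser(r"NP: {<DT|PRP\$>?<JJ>*<NN.*>+}")
print(chunker.parse(sentence))
# -> (S (NP the/DT happy/JJ dog/NN) wagged/VBD (NP its/PRP$ tail/NN))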
A typical example of ambiguity in syntactic analysis is that of Prepositional Phrase (PP) attachment: which is the constituent modified by a particular PP? In sentence (2), for instance, the PP with a telescope may be indicating a feature of the man that I see, or may be an instrument that I am using to see.

(2) I see a man with a telescope
Semantics

Semantics is the study of meaning. As in the case of syntax, the semantic analysis can be performed in a shallower or in a deeper way, depending on the final application. A deep semantic analysis generally consists in translating a text written in human language into a nonambiguous output format. This process is also called semantic interpretation. Nevertheless, many applications obtain good results with a simpler semantic analysis, such as the following:

• Identifying the meaning with which a word is used in some context, for example, bank in sentence (3).

(3) I sat at the bank of the river

• Identifying the concept from an ontology of which a particular term is an instance, for example, discovering that Zalacaín in sentence (4) is an instance of a restaurant.

(4) We had dinner at Zalacaín

• Identifying the role that a constituent performs in a sentence; for example, in sentence (5), I is the agent of the action open, and the scissors are an instrument.

(5) I opened the parcel with my scissors

Some kind of semantic analysis is usually needed in Question Answering systems, that is, tools to automatically find the answer to a given question inside a document collection (Harabagiu & Moldovan, 2003), in some dialogue interfaces (Androutsopoulos & Aretoulaki, 2003), and in systems to assess free-text answers written by students (Burstein, Leacock, & Swartz, 2001).

Discourse
A discourse is an extended sequence of sentences that conveys a package of information. The information within a discourse can include one or more topics. We can consider a topic segmentation system as a shallow discourse analyser that identifies topic boundaries inside a text. Deep discourse analysers find the rhetorical structure of the texts. A commonly used theory for annotating discourse is Rhetorical Structure Theory (RST) (Mann & Thompson, 1988), which defines a set of about 25 discourse relationships such as elaboration, contrast, concession, antithesis, example, and so forth. It is possible to construct a discourse tree for a given topic, at whose top we would find the clause(s) containing the central idea(s), and below them those that enhance their function (called satellites). Among other applications, discourse analysis has also been applied to knowledge extraction (e.g., Cimiano, Reyle, & Saric, 2005).
Pragmatics

The analysis of a text up to its pragmatic level seeks to explain the meaning of linguistic messages in terms of their context of use. This is an important issue when implementing dialogue systems. It takes into account facts which are not explicit in the text, in the form of intended meanings of the speaker (illocutions) which cannot be ascertained from a semantic analysis. It implies, in most of the cases, taking decisions about the participants in the communication act: the speaker’s beliefs, his intention, and the relevance of his assertions, as well as the listener’s features. For instance, sentence (6) may express the speaker’s pragmatic desire that the listener switch off the air conditioner.

(6) It is really cold here
The pragmatic analysis can take advantage of the discourse analysis, as the rhetorical relations serve as indicators of the speaker’s intention (Leech & Weisser, 2003). Often, anaphora resolution is considered a part of the pragmatic analysis. Anaphora is the linguistic phenomenon where a given referent points back to a previously mentioned item in a text, or antecedent. Anaphora resolution consists in determining the noun, noun group, or clause being addressed by a pronoun or referent placed in a different sentence than its antecedent (Botley & McEnery, 2000).
NLP Toolkits

There is already a wide variety of NLP software toolkits that can be used to perform much of the linguistic processing explained above, especially for the lowest levels of analysis. If we classify these toolkits according to how the linguistic annotations are represented, it is possible to group them into four types (Cunningham et al., 1997; Petasis, Karkaletsis, Paliouras, Androutsopoulos, & Spyropoulos, 2002):

• Additive or markup-based, when the text is annotated with linguistic information with a markup scheme, such as SGML or XML. Existing toolkits in this category include LT-XML (Brew, McKelvie, Tobin, Thompson, & Mikheev, 2000; McKelvie, Brew, & Thompson, 1997) and the wraetlic toolkit (Alfonseca et al., 2006). As Petasis et al. (2002) point out, this approach has the advantage that a program may load from the XML document just the information wanted, resulting in small memory requirements, but such toolkits are usually criticised because the documents typically have to be re-parsed by each module, as these systems are usually implemented as pipelines.

• Referential or annotation-based, when the linguistic information consists of references to the textual data, which is kept separately. Also known as stand-off annotation (Thompson & McKelvie, 1997), this style has been increasingly used, to the point that it is considered a prerequisite of the ISO/TC 37/SC 4 linguistic annotation framework (Ide & Romary, 2003). TIPSTER (Grishman, 1998), GATE (Bontcheva, Tablan, Maynard, & Cunningham, 2004), Ellogon (Petasis et al., 2002), and JET (Grishman, 2005) belong to this category. It has the advantage of allowing multiple annotations for a single document, possibly overlapping each other. GATE is probably the most widely used NLP suite at the moment, having been applied in more than 50 research projects.

• Abstraction-based, when the original text is transformed into an internal data structure that is theoretically grounded, as in the ALEP system (Simkins, 1994) or SProUT (Drozdzynski, Krieger, Piskorski, & Schafer, 2005).

• Systems without a uniform representation, which provide an architecture for communication and control but do not impose a uniform data structure on the different modules, such as the TalLab platform (Wolinski, Vichot, & Gremont, 1998) or OAK (Sekine, 2002), whose modules also understand multiple formats (annotations, SGML, etc.).
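The stand-off style can be made concrete with a few lines of code. The sketch below shows the referential idea only; the class and field names are this example's own invention, not the API of any of the toolkits above.

from dataclasses import dataclass, field

@dataclass
class Annotation:
    start: int             # offset of the first character in the raw text
    end: int               # offset just past the last character
    atype: str             # e.g., 'Token', 'Sentence', 'Person'
    features: dict = field(default_factory=dict)

# The text itself is never modified; annotation layers point into it by
# character offsets, so overlapping annotations can coexist.
text = "Angela Merkel was born in Hamburg."
layer = [
    Annotation(0, 13, "Person"),
    Annotation(0, 13, "NP", {"head": "Merkel"}),  # overlaps the Person span
    Annotation(26, 33, "Location"),
]
for ann in layer:
    print(ann.atype, "->", text[ann.start:ann.end])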
Word Sense Disambiguation

In this section, we review the problem of Word Sense Disambiguation, its usefulness in semantic
annotation of texts, and possible techniques to tackle it. The fact that a word may convey different meanings is very common. Just as an example, the word bank in English has 12 different meanings in the online Merriam-Webster dictionary:

1. A mound, pile, or ridge raised above the surrounding level.
2. The rising ground bordering a lake, river or sea [...].
3. A steep slope [...].
4. The lateral inward tilt of a surface along a curve or of a vehicle (as an airplane) when taking a curve.
5. A protective or cushioning rim or piece.
6. An establishment for the custody, loan or exchange, or issue of money [...].
7. The table, counter or place of business of a money changer.
8. A person conducting a gambling house or game.
9. A supply of something held in reserve [...].
10. A place where something is held available [...].
11. A group or series of objects arranged together in a row or a tier.
12. One of the horizontal and usually secondary or lower divisions of a headline.

The meaning of a word is usually determined by its context. So, for instance, it is relatively easy for a human to classify the word bank as used in sentence (7) as having meaning number two from the list above, and the same word in sentence (8) as conveying the sixth sense.

(7) After bathing in the lake, I took a sunbath on the bank.

(8) The loan I managed to get in that bank is very convenient.
Ide and Véronis (1998) define WSD as the task of associating a given word in a text or a discourse with a definition or meaning (sense) which is distinguishable from other meanings potentially attributable to that word. It is, therefore, a classification problem where, for each possible word, we have a certain number of senses, and a WSD system has to classify all occurrences of that word as pertaining to one of the senses mentioned. A straightforward application of WSD to Semantic Work Environments would be that of ontology-based annotation of concepts inside texts. In particular, we can suppose that our semantic markup refers to the concepts in an ontology containing the 12 meanings of the word bank indicated above. If that is the case, in order to be able to use an ontology-based retrieval engine, it would be necessary, first of all, to disambiguate all the occurrences of the word bank in all the documents in the SWE. A second, related application of word-sense disambiguation would be automatic ontology population: from a sentence such as (9), if a system is able to disambiguate the word bank to the sixth sense, the addition of Midsea as an instance of that concept would be much easier than if there were no WSD module available.

(9) The bank in which I have an account is called Midsea.
Most Word-Sense Disambiguation methods rely on two major sources of information: the context of the polysemous word and one or several external knowledge sources. An external knowledge source is a repository of information that helps the disambiguation task. It may be a dictionary, a thesaurus, a lexicon, or a lexical semantic network, including a repository of word senses and associated information, such as definitions, synonyms, antonyms, topical words, selectional preferences, and so forth.
Machine-Readable Dictionaries
One of the first attempts to use dictionary definitions to disambiguate is that presented by Lesk (1986). In this work, for each polysemous word, a window around it was extracted as its context, and the possible senses were obtained from a dictionary. The dictionary definitions usually contain terms that are very relevant with respect to each of the senses of a word, which helps to discriminate between senses. Therefore, he selected the sense whose definition has the highest word overlap with the context of the word. Intuitively, the best results are obtained using dictionaries with long definitions. The reader can easily check that the word bank in sentences (7) and (8) could be correctly disambiguated using the dictionary definitions from the Merriam-Webster dictionary. Guthrie, Guthrie, Wilks, and Aidinejad (1991) used the Longman Dictionary of Contemporary English (LDOCE) as the sense dictionary. It has, for each possible sense, a box code (specifying the primitive of a sense, for example, abstract, animate, human, and so forth) and a subject code (that classifies the word senses by their domain, for example, economics, engineering, and so forth). In their two-step approach, first the domain of the ambiguous word is identified, and next the disambiguation is done using only the senses pertaining to that domain. Wilks, Fass, Ming Guo, McDonald, Plate, and Slator (1993) proposed an early implementation in the line of what would be later known as the Vector Space Model approach, using LDOCE as the sense dictionary. In this case, both the word context and the dictionary definitions are represented in a vector space, where each dimension corresponds to a word from the controlled vocabulary of LDOCE. The cosine of the angle between two vectors is used as a similarity metric, so the sense chosen for each occurrence of a polysemous word is the one whose associated vector has the highest cosine with the vector of the word’s context.
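The overlap idea can be written down in a few lines. The sketch below is a much-simplified version of Lesk's method, with a toy two-sense inventory whose glosses are abridged from the definitions quoted above; a real system would use a complete dictionary and a larger context window.

import re

# Toy sense inventory for "bank" (abridged glosses for senses 2 and 6).
SENSES = {
    "bank#2": "the rising ground bordering a lake river or sea",
    "bank#6": "an establishment for the custody loan exchange or issue of money",
}

def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def lesk(context):
    # Choose the sense whose gloss shares the most words with the context.
    return max(SENSES, key=lambda s: len(words(SENSES[s]) & words(context)))

print(lesk("After bathing in the lake, I took a sunbath on the bank"))    # bank#2
print(lesk("The loan I managed to get in that bank is very convenient"))  # bank#6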
Thesauri

Words inside a thesaurus are classified in categories, where the words in the same category are semantically related. Thesauri can also be used to discriminate between senses. Nevertheless, as in the case of dictionaries, thesauri were created for humans, and they do not reflect the relations between words in a perfect, error-free way. Walker (1987) used the domains from LDOCE to select the sense whose domain includes the maximum number of words from the context of the polysemous word. Yarowsky (1992) proposed the use of Roget’s Thesaurus and Grolier’s Encyclopedia as external sources of information. Before the disambiguation, for each thesaurus category, a group of “salient” words is selected. Once these groups of salient words have been assigned to each category in the thesaurus, the disambiguation is performed using a 100-word context for each polysemous word and a Bayesian classifier.
Computational Lexicons

Computational lexicons are knowledge bases of words. These resources are usually large-scale knowledge bases, sometimes built by the collaborative effort of different institutions, for general NLP purposes. Computational lexicons usually include many of the common resources exploited in the disambiguation field, combining dictionary definitions, groups of words arranged in a hierarchy like a thesaurus, semantic relations, and syntactic selectional preferences. The lexicons are built either in an enumerative way, where senses are explicitly provided, or in a generative way, in which the senses are generated using rules. In lexical semantic networks, words are structured in a graph representation, where nodes represent concepts (labelled with the set of synonym words that can be used to refer to that concept) and links represent semantic relationships between the concepts. Nodes are called synsets (synonym sets).
Common relationships are antonymy, hyponymy and hyperonymy (is-a relationships), holonymy and meronymy (the part-of relationship), telic (purpose), entailment, derived-from, and pertains-to-topic, among many others. WordNet (Miller, 1995) is a lexical semantic network that has been used extensively for Word Sense Disambiguation. It contains nouns, verbs, adjectives, and adverbs, together with several relationships linking them. Several authors have extended it with domain labels (Bentivogli, Forner, Magnini, & Pianta, 2004; Magnini & Cavaglia, 2000), topic signatures (Agirre, Ansa, Martinez, & Hovy, 2001), or a classification of the nodes as concepts or instances (Alfonseca & Manandhar, 2002a); the latest versions of WordNet include a classification of the hyponymy relationships as instance-hyponym and subconcept-hyponym. Furthermore, there are versions of WordNet in many other languages (Chakrabarti, Narayan, Pandey, & Bhattacharyya, 2002; Hamp & Feldweg, 1997; Mohanty, Ray, Ray, & Santi, 2002; Tufis, Cristea, & Stamou, 2004; Vossen, 1998).
being inferred thanks to the semantic network and used for the disambiguation purpose. These methods are also referred to as relations-based methods, as they follow the relationships while processing the features encoded therein. Distance-based methods use the hierarchy specified by the hyponymy or other relationships. The concepts that contain terms that appear in the context of the polysemous word to disambiguate are marked, and different distance-based similarity metric are used to identify which is the candidate sense that appears nearest the context words inside the hierarchy, using different procedures (e.g., Quillian, 1969). Some variations are the identification of common ancestors of the context terms (Dahlgren, 2000), assigning weights to the paths either depending on the kind of relationship (Sussna, 1993), or using empirically calculated probability estimates to estimate the distance represented by each link (Resnik, 1995a, 1995b). Vorhees (1993) defined the hood of a certain sense S of a polysemous word W as the subgraph from WordNet that contains S, its ancestors and descendants, but which does not contain any other sense of W. The idea is that it represents a grouping of words that embrace one of the senses of W, and every hood represents a different sense of the word. Therefore, the sense of an ambiguous word in a given context can be selected by computing the frequency of the words in the context that occur in each of its possible sense’s hoods. McClelland and Rumelhart (1981) introduced the spreading activation models, which is based on the idea that the introduction of a certain concept in a context will influence and facilitate subsequently introduced concepts that are semantically related. The concepts in a semantic network are activated upon use, and the activation spreads to other nodes. For instance, the concept throw will activate the “physical object” sense of the word “ball.” As it spreads, the activation gradually weakens, but at the same time some nodes in the network can receive activation from
different sources and result in the best sense for the ambiguous word. Finally, Agirre and Rigau (1995) introduced the conceptual density, with a similar idea to that of the hoods: the complete hierarchy is divided into subhierarchies, each of which contains just one sense of the polysemous word, and the subhierarchy that contains the highest density of contextual terms is chosen as the right sense.
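As a small illustration of the distance-based idea, the sketch below uses NLTK's WordNet interface (an assumption of this example; the WordNet data package must be installed) to pick, for bank, the noun sense whose is-a path lies closest to a salient context word.

from nltk.corpus import wordnet as wn

def closest_sense(word, context_word):
    context_sense = wn.synsets(context_word, pos=wn.NOUN)[0]
    # path_similarity grows as the shortest is-a path gets shorter;
    # it can be None across unrelated parts of the graph, hence 'or 0.0'.
    return max(wn.synsets(word, pos=wn.NOUN),
               key=lambda s: s.path_similarity(context_sense) or 0.0)

print(closest_sense("bank", "river"))  # expected: the sloping-land sense
print(closest_sense("bank", "money"))  # expected: the financial institution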
Topic Signatures

Agirre et al. (2001) described a procedure to enrich each of the WordNet synsets, which they have applied to enrich all nominal senses (Agirre & Lopez de Lacalle, 2004). The topic signature of a synset is a vector of terms that co-occur with the words in that synset, together with their frequencies of co-occurrence. Different functions can be used to weight the terms in a topic signature, such as tf-idf, chi-square, log-likelihood (Dunning, 1993), the t-score (Church, Gale, Hanks, & Hindle, 1991), and Mutual Information (Hindle, 1990), among many others. To discover the sense with which a word is used in a context, a vector representing that context is calculated, and the most similar sense is found using a similarity metric between the associated vectors, such as the cosine.
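The following sketch shows this matching step with two invented, very small topic signatures for bank; real signatures contain hundreds of weighted terms.

import math
from collections import Counter

def cosine(v, w):
    dot = sum(v[t] * w[t] for t in v)
    norms = (math.sqrt(sum(x * x for x in v.values()))
             * math.sqrt(sum(x * x for x in w.values())))
    return dot / norms if norms else 0.0

signatures = {
    "bank#2": Counter({"river": 12, "water": 9, "shore": 5, "fishing": 3}),
    "bank#6": Counter({"money": 14, "loan": 8, "account": 7, "interest": 5}),
}
context = Counter("the loan was approved and the money arrived".split())
print(max(signatures, key=lambda s: cosine(signatures[s], context)))  # bank#6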
Annotated Corpora and Machine Learning

A corpus is a collection of documents written in natural language where the words are tagged with their corresponding senses. In this way, a corpus is a bank of samples that enables the extraction of sense-discriminating rules and disambiguation metrics based on the statistics of the senses assigned to the words in the corpus, using Machine Learning techniques. One of the main problems of methods based on annotated corpora, that is to say, methods based on encountering an annotated example in order to apply it later on, is the so-called data sparseness problem. Natural language is so rich that the same thing can be expressed in many textual forms: a specific sense of a very polysemous word can be rare in a corpus extracted from, say, an encyclopedia or a news repository, and even if it is found, the way it is used can be only one of the many possible ways to use that word with that sense. To ensure that all the senses and all the possible ways to use a word are represented in a corpus, the number of documents that have to be manually annotated is very high. Corpora are normally built by hand, and this constitutes a handicap. Therefore, smoothing procedures are also necessary in general. SENSEVAL is an international competition of systems for the semantic analysis of text. SENSEVAL-3 was held in 2004, and SENSEVAL-4/SEMEVAL is currently being organised. In particular, in SENSEVAL-3, the WSD exercise for English consisted in disambiguating nouns, adjectives, and verbs from a sample corpus extracted from the British National Corpus, and the selected sense inventories were from WordNet for nouns and adjectives and from WordSmyth for verbs. The best-scoring systems reported accuracies near 73% for fine sense distinctions (Mihalcea, Chkloski, & Kilgarriff, 2004), mainly using supervised machine learning and unsupervised techniques (Grozea, 2004; Keok Lee, Tou Ng, & Kiah Chia, 2004; Strappavara, Gliozzo, & Giuliano, 2004).

Unsupervised Word Sense Disambiguation

It is also important to mention that there exist procedures to automatically classify occurrences of a word as pertaining to different senses when there is no dictionary of senses available, most of the time using clustering techniques (Schütze, 1992, 1998). In this case, the different senses are not labelled, but it is possible to find commonalities in the contexts of the occurrences of the word and group them together in a few clusters that represent the word meanings. Similar approaches will be discussed when describing Named Entity coreference.
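A toy version of such a clustering procedure is sketched below, with scikit-learn as an assumed library: occurrences of bank are represented as bags of words and grouped into two unlabelled clusters that play the role of induced senses.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

occurrences = [
    "I took a sunbath on the bank of the lake",
    "we walked along the bank of the river",
    "the bank approved my loan application",
    "the bank pays interest on my savings account",
]
vectors = TfidfVectorizer().fit_transform(occurrences)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
for label, text in zip(labels, occurrences):
    print(label, text)  # occurrences sharing a label share an induced "sense"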
Information Extraction

Information Extraction consists in extracting organized information (entities, events, and relationships) from unrestricted natural language text. At least two facts triggered this research at the beginning of the 1990s: the large amounts of textual data that make it impossible to process them manually, and the impetus provided by the ARPA-funded Message Understanding Conferences (MUCs). Nowadays there are some fully automated IE systems working with state-of-the-art technology for some of the tasks addressed in the MUCs, with a performance and accuracy similar to those of a human expert. MUC-1, held in 1987, was merely exploratory, with no formal evaluation made. In MUC-2, the task was to fill templates, with 10 slots per template, and evaluation was done using precision and recall. In the next two MUCs (1991 and 1992), the subject was Latin American terrorism, and the number of slots increased to 24 per template. The conclusion of these conferences was that a system can attain a high accuracy without performing a deep analysis of the sentences, for example, Lehnert, Cardie, Fisher, McCarthy, Riloff, and Soderland (1994) or Cardie (1993). From MUC-5 on, the conference was conducted as part of the TIPSTER project. Topics changed again, the complexity of the task increased, and the Japanese language was added. In MUC-6 and MUC-7 the tasks were: Named Entity Recognition and Classification (identifying persons, organisations, places, currency, etc.), Coreference Resolution (identifying which mentions refer to the same entity in the real world), Template Element (filling templates about people and organisations that represent relationships between them), and Scenario Template (filling templates about events mentioned in the texts). This section will focus mainly on Named Entity Recognition and Classification and on Proper Noun Coreference Resolution.
The purpose of the MUC conferences has been continued by the Automatic Content Extraction Evaluation (ACE), organised yearly by NIST.1 In addition to the task of entity recognition, this competition includes test beds for the benchmarking of systems carrying out the recognition of Mentions, Values, Time, Relations among words, and Events. It also includes Time Expression Recognition and Normalisation.
Named Entity Recognition and Classification (NERC)

As already mentioned, NERC has received much interest even after the last MUC was organised. ACE-2005 included the identification of entities pertaining to the types and subtypes listed in Table 1. This list of entity types and subtypes can be considered as a mini-ontology for which ontology population is to be performed from free text. When the scope of the recognition is expanded to include not only the named references but also the nominal and pronominal references, the task is called mention detection (Florian, Jing, Kambhatla, & Zitouni, 2006).

Table 1. ACE 05 entity types and subtypes

Type                  Subtypes
Facility              Airport, Building-Grounds, Path, Plant, Subarea-Facility
Geo-Political Entity  Continent, Country or District, GPE-Cluster, Nation, Population Center, Special, State or Province
Location              Address, Boundary, Celestial, Land Region-Natural, Region-General, Region-International, Water Body
Organization          Commercial, Educational, Entertainment, Government, Media, Medical Science, Non-Governmental, Religious, Sports
Person                Group, Indeterminate, Individual
Vehicle               Air, Land, Subarea-Vehicle, Underspecified, Water
Weapon                Biological, Blunt, Chemical, Exploding, Nuclear, Projectile, Sharp, Shooting, Underspecified
Furthermore, Language-Independent Named Entity Recognition and Classification has also been promoted by the CoNLL-2002 and CoNLL-2003 competitions (Tjong-Kim-Sang & Meulder, 2003), including data in Spanish, Dutch, English and German, and the following entities: persons, locations, organisations, and miscellaneous entities (any other kind of entity), with no subtypes. For instance, in the following text, obtained from the English Wikipedia, it is possible to recognise many different named entities and mentions:

[person Angela Dorothea Merkel], born in [location Hamburg], [location Germany], on [date July 17, 1954], as [person Angela Dorothea Kasner], is the [position Chancellor] of [location Germany]. [person Merkel], elected to the [organisation German Parliament] from [location Mecklenburg-Western Pomerania], has been the [position chairwoman] of the [organisation Christian Democratic Union CDU] since [date April 9, 2000]. [person She] has been the [position Chairwoman] of the [organisation CDU-CSU] parliamentary party group from [date 2002 to 2005]. The [person most powerful woman] in [location the world], as considered by the [organisation Forbes Magazine], is only the third woman to serve on the G8 (after [person Margaret Thatcher] of the [location UK] and [person Kim Campbell] of [location Canada]).
It can be seen that this includes, for example, person named references (Angela Dorothea Merkel, Angela Dorothea Kasner, Merkel, Margaret Thatcher, and Kim Campbell), which refer to three different entities in the real world. There are also nominal references (most powerful woman) and pronominal references (she). The text also contains locations (Hamburg, Germany, Mecklenburg-Western Pomerania, the world, ...), organisations (German Parliament, Christian Democratic Union, CDU-CSU, Forbes Magazine, and so forth), positions (chancellor, chairwoman), and dates or time intervals. Concerning the techniques applied to NERC, we can distinguish three different kinds of approaches:

• Knowledge-based systems, which are based on the use of rules, patterns, or grammars (Arevalo, Civit, & Marti, 2004; Califf, 1998; Freitag, 2000; Maynard, Cunningham, Bontcheva, & Dimitrov, 2002; Soderland, 1999). These rules are generally handcrafted, so there exist pattern languages, such as JAPE (Cunningham, Maynard, Bontcheva, Tablan, & Ursu, 2002), that simplify the task of writing the rules and the parsers.

• Systems that apply Machine Learning techniques, either alone or in combination, including Memory-Based Learning, Maximum Entropy models and Hidden Markov Models (Florian, Ittycheriah, Jing, & Zhang, 2003; Freitag & McCallum, 2000; Klein, Smarr, Nguyen, & Manning, 2003; Kozareva, Ferrandez, & Montoyo, 2005), Error-Driven Transformation-Based Learning (Black & Vasilakopoulos, 2002), boosting algorithms (Carreras, Marquez, & Padro, 2003), and Support Vector Machines (Isozaki & Kazawa, 2002; Li, Bontcheva, & Cunningham, 2005; Mayfield, McNamee, & Piatko, 2003).

• Systems that combine knowledge-based methods and ML techniques (Mikheev, Grover, & Moens, 1998; Mikheev, Moens, & Grover, 1999).
Most of the initial work on NERC relied on the manual construction of hand-crafted rules. For instance, Mikheev et al. (1998), the system that attained the best results in MUC-7 (MUC7, 1998), applied a set of rules with a very high precision (called sure-fire rules), such as the following:

Xxxxx+ is a? JJ* PROF
This pattern indicates that any sequence of one or more capitalised words followed by the word is, an optional word a, a sequence of zero or more adjectives and a noun representing a profession is the name of a person, for example, in the sentence
Huang Kun is a well-known physicist in China. Similarly, titles such as Dr. or Mr., when followed by a sequence of capitalised words, most of the time indicate that the sequence is the name of a person. Other patterns can be constructed for organisations and locations in a similar fashion. Gazetteers containing extensive lists of surnames, locations, and organisations were also widely used in the first NERC systems. Although these patterns can be built semiautomatically, with a bootstrapping procedure, it involves a large amount of manual work, and they are very difficult to port to new domains and languages. That is the reason why corpus-based techniques, relying on Machine Learning algorithms, have been developed. For instance, the following would be the representation of the beginning of the previous text following the guidelines of the CoNLL-2002 and 2003 competitions:

Angela     NNP  I-NP  I-PER
Dorothea   NNP  I-NP  I-PER
Merkel     NNP  I-NP  I-PER
,          ,    O     O
born       VBP  I-VP  O
in         IN   I-PP  O
Hamburg    NNP  I-NP  I-LOC
,          ,    O     O
Germany    NNP  I-NP  I-LOC
,          ,    O     O
on         IN   I-PP  O
July       NNP  I-NP  I-MISC
17         CD   I-NP  I-MISC
,          ,    O     I-MISC
1954       CD   I-NP  I-MISC
The first column contains the tokens; the second column contains the part-of-speech tags, using the tagset from the Penn Treebank (Marcus et al., 1993); the third column indicates chunking information: which tokens belong to Noun Phrases (I-NP), Prepositional Phrases (I-PP), Verb Phrases (I-VP), or to no base syntactic constituent (O); and the last column contains the annotation
about Named Entities: which tokens belong to a person name (I-PER), a location name (I-LOC), or a miscellaneous entity (I-MISC, a date in the example). Other labels can be used in case there are two chunks or two entities of the same kind consecutively, to indicate the boundary between them. From a corpus like this, a system can use the lexical form of the tokens, their part-of-speech tag, the chunk labels, and their contexts, to learn a model that is able to classify each token with any of the possible labels containing the annotations about the Named Entities. Gazetteer information and any other kind of external sources can also be included in the features of the examples. As mentioned above, many different Machine Learning algorithms have been tried, with good results, in this task. Mixed approaches combine the use of handcrafted patterns with Machine Learning procedures. For instance, Mikheev et al. (1998) attained a precision of 93.39% combining the patterns with a Maximum Entropy model.
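As an illustration of what such hand-crafted patterns look like in practice, the sketch below renders the sure-fire rule quoted above as a regular expression; the profession list is a toy stand-in for the gazetteer such systems would really use, and this is not Mikheev et al.'s actual implementation.

import re

PROFESSIONS = "physicist|chemist|politician|singer"   # toy gazetteer
SURE_FIRE = re.compile(
    r"((?:[A-Z][a-z]+\s)+)"      # Xxxxx+ : one or more capitalised words
    r"is\s(?:an?\s)?"            # 'is', optionally followed by an article
    r"(?:[a-z-]+\s)*?"           # JJ*    : zero or more lowercase modifiers
    r"(?:" + PROFESSIONS + r")"  # PROF   : a profession noun
)

m = SURE_FIRE.search("Huang Kun is a well-known physicist in China.")
if m:
    print("Person:", m.group(1).strip())   # -> Person: Huang Kun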
Ontology-Based NERC

A NERC system, rather than having a small, predefined set of classes, can classify instances found in a text as belonging to any possible concept present inside an ontology. The trend now is the development of unsupervised or weakly-supervised NERC systems that are able to learn to classify instances of any possible concept in ontologies of increasing size. We can mention Cimiano and Volker (2005) and Sekine (2006) as examples of these systems.
Relationship Extraction

One step further in making the knowledge explicit for Semantic Work Environments is to tag the semantic relations between entities in a given text. The tags can simply state the type of relation between two concepts from a given domain of relations, or point
to a certain relation type included in an ontology or other knowledge representation formalism. Because most ontologies and word taxonomies are organised top-down through the hyponymy relation (IS-A), this relationship is usually considered separately, and the task of extracting relationships from text is sometimes subdivided into the extraction of taxonomic relations (hyponymy) and non-taxonomic relations (any other, e.g., part-of, author-of, located-in, works-for, etc.). As in the case of NERC, the use of automatic relationship extraction techniques can be very useful in Semantic Work Environments as a way of automatically acquiring, from texts, semantic information that can later be used in ontology-based information retrieval systems. To illustrate how relations are conveyed in text, let us consider the following sentences:

(8) The giant otter, one of the biggest mammals to be found on the Amazon river, has suffered persecution and loss of habitat.

(9) The happy dog wagged its tail.

(10) The piano on which John Lennon composed “Imagine” is on its way back to the Beatles Story in Liverpool.

In (8), the appositional sentence2 is a clue to the hyponymy relation between giant otter and mammal. The pronoun its in sentence (9) indicates the existence of a holonymy (part-of) relationship between dog and tail. Also, the verb composed in sentence (10) indicates a composer-song relationship between John Lennon and Imagine. The literature shows that the type of relations that can be extracted using an NLP-based automatic procedure varies broadly, from hyponymy (well studied due to its application in building taxonomies) to any of the general nontaxonomic relations like those mentioned above. In most cases the relations are extracted for two concepts, though some approaches seek relations between three or more entities, for example, triples where the three related entities are companies and the relation to extract is merged_into_company, the first two entities in the triple being the merging companies and the third the resulting one (Finkelstein-Landau & Morin, 1999). ACE-2005 included a task for Relation Detection and Recognition that comprised the types shown in Table 2.

The extraction of relations between words copes again with the problem of human language ambiguity. For instance, when using techniques that extract relations by means of sentence patterns, there are phrasal constructions that express quite dissimilar relations. This is especially relevant in the case of a few short patterns such as the genitive construction: X’s Y. As can be seen in (11-15) below, the genitive can be used to convey semantic relationships as dissimilar as scientist-theory, painter-painting, writer-work, architect-building, or location-building, among many others.

(11) Einstein’s Theory of General Relativity
(12) Bosco’s The Garden of Delights
(13) Tolkien’s Lord of the Rings
(14) Gaudi’s Sagrada Familia
(15) Barcelona’s Sagrada Familia
Table 2. ACE 05 relation types and subtypes

Type                      Subtypes
Artifact                  User-Owner-Inventor-Manufacturer
General Affiliation       Citizen-Resident-Religion-Ethnicity, Organization-Location
Metonymy                  ---
Organization Affiliation  Employment, Founder, Ownership, Student-Alumni, Sports-Affiliation, Investor-Shareholder, Membership
Part-Whole                Artifact, Geographical, Subsidiary
Person-Social             Business, Family, Lasting-Personal
Physical                  Located, Near

Following the example, it can be seen that the relational ambiguity can be partially overcome using a Named Entity recogniser. For example, the relation architect-building must hold between a Person and a Building. The relation
location-building can also easily be told apart from the rest, because the entity type of its subject has to be a Location and not a Person. On the other hand, the relations between words can sometimes help to disambiguate them. The Semantic Web is grounded in the use of ontologies, a knowledge representation formalism where concepts, instances, and relations, among other information, are stored. Ontology-guided retrieval (Benjamins & Fensel, 1998) exploits the possibility of specifying relations and word properties in the query, thus avoiding the problem of polysemy, as it is rare that two words that share the same spelling have at the same time an identical set of properties and relations with other words. The AI-based knowledge sources developed in the 1970s and 1980s for relationship extraction were of great theoretical interest, especially because they constituted the first approaches that worked with network-shaped knowledge bases. The experience acquired through their use served as a basis for defining how the subsequent network-shaped models of knowledge, computational lexicons, and ontologies should be structured. The hard task of manually defining the data contained in the knowledge base restricted the use of AI systems to very limited domains. During the 1990s, the development of computational lexicons for natural language processing faced the same problem, the so-called knowledge acquisition bottleneck, that is, the difficulty of acquiring and modelling general knowledge from texts, or knowledge relevant to a particular domain (Gale, Church, & Yarowsky, 1993; Hearst, 1992). Since then, there has been great interest in partially overcoming this bottleneck by the use of semiautomatic procedures to extract new knowledge from text. One of the goals was the automatic extraction of relationships amongst concepts in the lexicons. With reference to the methods that extract IS-A relations between concepts, it is important to point out that the task of finding taxonomic relations for new terms nonexistent in the base ontology is essentially equal to classifying these new terms inside the hierarchy, as many approaches mainly use the subsumption relation to position a term inside an ontology. Therefore, most of the approaches for the discovery of IS-A relations, in practice, classify new terms inside an ontology. Furthermore, in the cases in which an IS-A relation is being learnt between a new instance and an existing concept (e.g., that Zalacaín is an instance of restaurant), the task is essentially identical to ontology-based Named Entity Recognition and Classification. Therefore, there is much overlap between all the problems described in this chapter. The approaches for relationship identification can be classified into the following groups:
• Methods based on Machine-Readable Dictionaries.
• Methods based on distributional properties of words.
• Pattern-based methods (a minimal illustration follows this list).
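As a minimal illustration of the third group, the sketch below renders one Hearst (1992)-style lexico-syntactic pattern ("NP0 such as NP1, NP2 ... and NPn") as a regular expression over raw text; real systems use many such patterns and match them over part-of-speech-tagged or parsed text.

import re

HEARST = re.compile(r"(\w+),?\s+such\s+as\s+([\w\s,]+)")

text = "He plays several instruments, such as the piano, the guitar and the flute."
m = HEARST.search(text)
if m:
    hypernym = m.group(1)                      # -> 'instruments'
    hyponyms = [h.strip() for h in
                re.split(r",|\s+and\s+|\s+or\s+", m.group(2))]
    for h in hyponyms:
        print(f"hyponym({h}, {hypernym})")     # e.g., hyponym(the piano, instruments)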
Methods Based on Machine-Readable Dictionaries

Dictionary definitions, although written in natural language, usually have some structure and are therefore easier to process than unrestricted documents. Definitions have been found very useful, as they are usually concise descriptions of the concepts and include most of the salient information about them. Therefore, some early attempts at relationship extraction focused on Machine-Readable Dictionaries. Wilks et al. (1993) introduce three different approaches to process the LDOCE dictionary in order to transform its contents into machine-tractable information structured as a network: disambiguation, parsing, and classification. Although this work is not particularly focused on relationship extraction, the authors outline a method for the extraction of semantic relations amongst the concepts. To do so, they depart from dictionary
definitions that have already been processed to create the concept network, so the words in the definitions are already disambiguated. Then, manually defined case frames are applied to extract hyponymy and meronymy relations from the definitions. The hyponymy relation is extracted from the genus word in the definition, and the meronymy relation is identified by searching for lexical clues such as "has the part," combined with proper predicate cases in the lexico-syntactic rule. Rigau (1998) describes a system that analyses dictionary definitions to learn a lexico-semantic network similar to WordNet. His approach consists of identifying the genus word in the definitions and, next, disambiguating it, in order to structure the concepts according to the hyperonymy relationship. An important contribution of this work is the set of heuristics and techniques used for the disambiguation of the genus word, and the identification of several of the problems found when extracting ontologies from dictionaries. Richardson, Dolan, and VanderWende (1998) present a methodology to extract relations from dictionary definitions using a frequency-based approach. The authors use LDOCE and the American Heritage Dictionary to acquire a knowledge base from the knowledge contained in the dictionaries. To do so, the concepts in the dictionary are represented in a network and linked by relations. Regarding the extraction of relationships, they use a syntax parser (the same used for the MS Word 97 grammar checker) and lexico-syntactic rules to extract a given set of relations from the dictionary entries. The rules are defined to extract attributes like colour, size, and time, and relations between words like cause, goal, part, and possessor, amongst others. One interesting asset of this approach is that the learnt relations are extended to "similar" words, that is, to words that are found similar to the original words for which the relation was found. WordNet glosses have also been used to extract relationships between synsets. Harabagiu and Moldovan (1998, 1999) and Novischi (2002) proceed by disambiguating the words in the glosses with a WSD approach, parsing them syntactically and annotating the semantic roles of the constituents. Using this information, patterns can be identified to extract semantic relationships between the WordNet synsets. DeBoni and Manandhar (2002) also use a pattern-based method to extract telic relations (which express the purpose or function of a concept) from the glosses. In order to do this, glosses are annotated with a partial syntactic parser.
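To make the genus-extraction step concrete, the fragment below sketches how a genus word might be picked out of a raw definition string. It is a minimal illustration under our own simplifying assumptions (the clause-marker list, the determiner list, and the "last word of the head noun phrase" heuristic); real systems such as Rigau's rely on full parsing and subsequent disambiguation of the genus word, and handle empty heads such as "a kind of X" specially.

    import re

    # A crude heuristic for guessing the genus (hypernym) word of a dictionary
    # definition. Illustrative assumptions only; empty heads such as "a kind
    # of X" need the special handling that real MRD systems provide.
    CLAUSE_MARKERS = re.compile(r"\b(?:that|which|who|where|used|of|for|with|in)\b")

    def extract_genus(definition):
        # Keep only the head noun phrase: the part of the definition before
        # the first relative clause or prepositional attachment.
        head = CLAUSE_MARKERS.split(definition.lower(), maxsplit=1)[0]
        tokens = re.findall(r"[a-z]+", head)
        # Drop determiners; the last remaining token is usually the head noun
        # ("a large building" -> "building").
        tokens = [t for t in tokens if t not in {"a", "an", "the", "any", "some"}]
        return tokens[-1] if tokens else None

    # Toy usage, in the spirit of the Zalacaín example above:
    print(extract_genus("a place where meals are prepared and served to customers"))
    # -> "place", suggesting restaurant IS-A place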
Methods Based on Distributional Properties of Words

The distributional hypothesis states that semantically similar terms share similar contexts, and therefore the co-occurrence distributions of terms, as well as the distribution of collocations, can be used to calculate a certain semantic distance between the concepts represented by those terms (Firth, 1957). Using this distance, a partial ordering between the terms can be calculated, and thus a taxonomy may be obtained, or the terms can be classified inside an existing taxonomy. These statistical methods are mainly used to extract taxonomic relations, although they can also be applied to extract nontaxonomic relations (Maedche & Staab, 2000). Maedche and Staab (2001), Faure and Nédellec (1998), and Caraballo (1999) use a bottom-up clustering algorithm, also called agglomerative clustering, to build a hierarchy of concepts. This procedure starts by considering the set of individual concepts as the set of initial clusters, each cluster containing one single concept. In successive steps, the most similar clusters are merged, according to the similarity found between the terms in each cluster. The similarity between clusters can be computed using any of several Vector Space Model-based similarity metrics, that is, by representing contexts as vectors of words, calculating a weight function on the contextual term frequencies, and measuring
the cosine between every pair of vectors. The contextual vectors can be created using syntactic dependencies, as in the case of the ASIUM tool (Faure & Nédellec, 1998), which learns a hierarchy of nouns according to the verbs with which these nouns are syntactically related. Pereira, Tishby, and Lee (1999) use a top-down clustering algorithm where, analogously to the bottom-up case, the hierarchy is obtained starting from the root, and a similarity metric that measures divergence is used to assign the members of each cluster. Some approaches mix the pure statistical clustering techniques explained above with other techniques: Cimiano, Hotho, and Staab (2004) use a clustering technique based on Formal Concept Analysis (FCA) (Ganter & Wille, 1999). Caraballo (1999) uses the bottom-up clustering algorithm enhanced with a pattern-based method to discover hyponymy relations. One of the problems of clustering algorithms is that, after the statistical ordering, the resulting hierarchy is usually unlabelled. Caraballo uses the lexico-syntactic patterns defined by Hearst (1992) to detect the hyperonyms of each cluster and assign a label to it. Maedche and Staab (2000) discover nontaxonomic relations from texts using an unannotated corpus, a lexicon, a predefined domain taxonomy, and some language processing tools. The first stage of their methodology consists of linguistic processing of the corpus, including syntactic analysis. A Term Identification module identifies relevant terms, and all syntactic dependencies involving these terms are extracted. In a second stage, these word pairs are analysed, using a domain-specific taxonomy, to find the nontaxonomic relationships: the frequencies of the pairs extracted from the corpus are used to compute support and confidence metrics, a balance of which, imposing a threshold, is used to decide which pairs of concepts hold a relation between them. Alfonseca and Manandhar (2002b) use a top-down classification approach to enrich WordNet with new concepts and instances, using as similarity metrics topic signatures, Vector Space Model-based metrics, and vectors created from syntactic dependencies between words.
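As a concrete illustration of the agglomerative procedure described above, the following sketch clusters four terms by the cosine similarity of toy context vectors. The term-verb co-occurrence counts and the merge strategy (summing the vectors as a crude centroid) are illustrative assumptions of ours; actual systems build the vectors from syntactic dependencies over large corpora and use more careful linkage criteria.

    import math
    from collections import Counter

    # Toy context vectors: how often each term co-occurs with each verb.
    contexts = {
        "apple": Counter({"eat": 4, "peel": 2, "grow": 1}),
        "pear":  Counter({"eat": 3, "peel": 1, "grow": 2}),
        "car":   Counter({"drive": 5, "park": 2}),
        "truck": Counter({"drive": 4, "park": 3, "load": 1}),
    }

    def cosine(u, v):
        dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
        norm = math.sqrt(sum(x * x for x in u.values()))
        norm *= math.sqrt(sum(x * x for x in v.values()))
        return dot / norm if norm else 0.0

    # Agglomerative (bottom-up) clustering: start with singleton clusters and
    # repeatedly merge the most similar pair until nothing similar remains.
    clusters = {(term,): vec for term, vec in contexts.items()}
    while len(clusters) > 1:
        (a, b), sim = max(
            (((x, y), cosine(clusters[x], clusters[y]))
             for x in clusters for y in clusters if x < y),
            key=lambda pair: pair[1])
        if sim == 0.0:
            break  # no shared contexts left: stop, leaving separate subtrees
        # Summed counts act as a crude centroid for the merged cluster.
        clusters[a + b] = clusters.pop(a) + clusters.pop(b)
        print("merged", a, "and", b, "at similarity %.2f" % sim)
    # Expected outcome: {apple, pear} and {car, truck} form two subtrees.

Note that, as the text observes, the resulting hierarchy is unlabelled: deciding that the {apple, pear} cluster should be labelled "fruit" requires an additional step such as Caraballo's pattern-based hyperonym detection.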
Pattern-Based Methods

Pattern-based methods rely on lexical or lexico-semantic patterns to discover taxonomic and non-taxonomic relationships between concepts. Apart from a few exceptions (Navigli & Velardi, 2003), these systems use lexical and/or syntactic sequences to search a free-text corpus. Whenever a match is found, the words participating in the pattern are considered candidates for holding a relationship. In most of the systems, the patterns are manually defined. One of the first works on pattern-based relationship extraction was done by Hearst (1992, 1998), who defined regular expressions to automatically extract hyponymy relations from unrestricted text. She looked for pairs of words related in WordNet and extracted from a corpus the sentences where the words co-occurred. By finding commonalities among the contexts of the words, she manually built patterns such as such NP as NP*, which indicate a hyponymy relationship between the enumeration of noun phrases and the first NP in the pattern. Applying the above pattern to a free-text corpus, sentences like "such authors as Herrick, Goldsmith, ..." would indicate that Herrick and Goldsmith are hyponyms of author. The relations discovered in these texts by her patterns are afterwards used to augment the WordNet lexicon. Hearst-like patterns are also used by Kietz, Maedche, and Volz (2000) and Cimiano, Handschuh, and Staab (2004). Alfonseca and Manandhar (2002c) describe a combination of a pattern-based and a distributional approach for learning hyponymy relations. In a similar way to the procedure of Hearst (1998), WordNet is used to extract pairs
of hyperonym-hyponym synsets. Documents containing pairs of concepts from WordNet are downloaded automatically from the World Wide Web, and they are processed syntactically. From the sentences that contain pairs of terms related in WordNet, syntactic patterns are extracted, which are then combined with a distributional approach. Concerning nontaxonomic relationships, Berland and Charniak (1999) use manually defined patterns to find PART-OF relationships. They use six seed words (book, building, car, hospital, plant, and school) and five hand-written lexical patterns that take into consideration the part of speech of their components. The patterns were applied to a news corpus (NANC: North American News Corpus) to create a group of selected words matching the patterns as candidates for a PART-OF relation. The set of selected words was filtered to find the candidate pairs with the highest support from the corpus, using the log-likelihood (Dunning, 1993) and Johnson's sigdiff (Berland & Charniak, 1999). Finkelstein-Landau and Morin (1999) learn patterns for the Merge and Produce relationships between companies and products. The authors use manually built lexico-syntactic patterns and a simple procedure to learn new patterns. Navigli and Velardi (2003, 2004) describe an interesting approach to the extraction of nontaxonomic relations, which they call semantic relations, based on non-lexical patterns. In this case, the patterns take the form of rules over the concepts participating in the relation and their ancestors, in such a way that each rule characterises the family of concepts that can participate in a given relationship. This task is integrated in a whole framework which, using general and domain-specific texts, a core ontology (e.g., WordNet), an annotated corpus (SemCor; Miller et al., 1993), and some linguistic tools, performs terminology extraction and ontology construction. Basically, after applying a syntactic parser, the system extracts from a domain corpus a set of complex expressions (e.g., credit card, public transport, or board of directors) and disambiguates the component words. The complex expressions are then ordered in a taxonomy using the head concepts. Next, some non-taxonomic relations are acquired between the words inside each expression. For instance, there is a temporal relation in afternoon tea, a thematic relation expressed in art gallery, and a manner relation in bus service. Recently, systems that learn relations from the Web in an unsupervised manner have also received much attention. Among these, we may cite Ravichandran and Hovy (2001), Mann and Yarowsky (2003), Pantel and Pennacchiotti (2006), Pennacchiotti and Pantel (2006), Alfonseca et al. (2006), and Snow, Jurafsky, and Ng (2006). Most of these start with a seed list of pairs of concepts that hold a relationship between them, and make use of the size of the World Wide Web to learn accurate patterns that express that relationship. The WWW can again be used to check the accuracy of the new pairs learnt using those patterns.
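To illustrate what such pattern matching looks like in practice, the fragment below implements a single Hearst-style "such NP as NP*" pattern over plain text. Restricting the hyponym candidates to capitalized words and singularising the hypernym by stripping a final "s" are simplifying assumptions of ours; real systems match over part-of-speech-tagged or parsed text and statistically filter the harvested pairs before adding them to an ontology.

    import re

    CAP = r"[A-Z][a-z]+"  # toy noun phrase: a single capitalised word
    SUCH_AS = re.compile(
        rf"such (?P<hyper>[a-z]+) as "
        rf"(?P<hypos>{CAP}(?:, {CAP})*(?:,? (?:and|or) {CAP})?)")

    def extract_hyponyms(sentence):
        """Yield (hyponym, hypernym) candidate pairs from one sentence."""
        for match in SUCH_AS.finditer(sentence):
            hyper = match.group("hyper").rstrip("s")  # naive singularisation
            for hypo in re.split(r",|\band\b|\bor\b", match.group("hypos")):
                if hypo.strip():
                    yield hypo.strip(), hyper

    text = "Works by such authors as Herrick, Goldsmith, and Shakespeare are studied."
    print(list(extract_hyponyms(text)))
    # -> [('Herrick', 'author'), ('Goldsmith', 'author'), ('Shakespeare', 'author')]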
Conclusions and Future Research Trends

The application of the topics reviewed in this chapter lies at the intersection of three dynamic fields: the Semantic Web, Wiki and other Work Environments, and Information Extraction from texts. Many of the techniques presented here are currently being researched to tackle the Semantic Web tagging and population bottleneck, in applications such as Ontology Learning or the automatic annotation of texts. Their use in semantic wikis and other semantic work environments is straightforward. Semantic Work Environment technologies are just emerging, and they constitute a new field to which all these technologies can be successfully applied. Though semi-automatic annotation and Information Extraction have long been the object
of intensive research, there is still much room for improvement. There is special interest in unsupervised systems that learn how to extract data from unannotated, unrestricted text like the World Wide Web, with special emphasis on techniques using machine learning to capture features such as the distributional properties of the words in a language. Additionally, the task of capturing relations among entities has been pointed out as a research opportunity. The integration of automatic annotation procedures and semantic work environments produces a mutual practical benefit: we envision this integration in the form of systems where, on the one hand, SWEs take advantage of IE results in the form of pretagged text. The cost of manually annotating preexisting data is therefore reduced, and though automatic annotations will contain some mistakes, they also serve as an example basis for the users to correct and extend with new annotations. On the other hand, the IE system can bootstrap from the manual corrections and annotations provided, learning from the new examples and thus reducing the data sparseness problem that limits the performance of the extraction procedures.
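A minimal sketch of this feedback loop is given below. All classes and functions are illustrative placeholders rather than an actual SWE or IE API; the point is merely the cycle of pretagging, user correction, and retraining.

    # Illustrative placeholder classes only -- not an actual SWE or IE API.
    class Extractor:
        """A trivially simple 'IE system': it memorises (span, label) examples
        and proposes labels for exact matches it has seen before."""
        def __init__(self):
            self.examples = {}

        def annotate(self, text):
            return {span: label for span, label in self.examples.items() if span in text}

        def train(self, corrections):
            self.examples.update(corrections)  # bootstrap from user feedback

    def annotation_loop(extractor, documents, review):
        for doc in documents:
            proposed = extractor.annotate(doc)   # pretagged text offered to the SWE
            corrected = review(doc, proposed)    # user fixes and extends annotations
            extractor.train(corrected)           # reduces data sparseness over time

    # Toy run: the 'user' always annotates the word "Zalacaín" as a restaurant.
    docs = ["We met at Zalacaín.", "Zalacaín was fully booked."]
    ie = Extractor()
    annotation_loop(ie, docs, review=lambda doc, prop: {**prop, "Zalacaín": "restaurant"})
    print(ie.annotate("Dinner at Zalacaín?"))  # -> {'Zalacaín': 'restaurant'}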
References

Agirre, E., Ansa, O., Martinez, D., & Hovy, E. (2001). Enriching WordNet concepts with topic signatures. In Proceedings of the SIGLEX Workshop on WordNet and Other Lexical Resources: Applications, Extensions and Customizations, in conjunction with the second meeting of the North American Chapter of the Association for Computational Linguistics, Pittsburgh, Pennsylvania.

Agirre, E., & Lopez de Lacalle, O. (2004). Publicly available topic signatures for all WordNet nominal senses. In Proceedings of the 4th International Conference on Language Resources and Evaluation, Lisbon, Portugal.
Agirre, E., & Lopez de Lacalle, O. (2005). Clustering WordNet word senses. In Proceedings of Recent Advances in Natural Language Processing III.

Agirre, E., & Rigau, G. (1995). A proposal for word sense disambiguation using conceptual distance. In Proceedings of Recent Advances in Natural Language Processing, Tzigov Chark, Bulgaria (pp. 258-264).

Alfonseca, E., Castells, P., Okumura, M., & Ruiz-Casado, M. (2006). A rote extractor with edit distance-based generalisation and multi-corpora precision calculation. In Proceedings of the Poster Session of the Annual Meeting of the Association for Computational Linguistics, Sydney, Australia.

Alfonseca, E., & Manandhar, S. (2002a). Distinguishing instances and concepts in WordNet. In Proceedings of the First International Conference on General WordNet, Mysore, India.

Alfonseca, E., & Manandhar, S. (2002b). Extending a lexical ontology by a combination of distributional semantics signatures. In A. Gomez-Perez & R. Benjamins (Eds.), Proceedings of the Thirteenth International Conference on Knowledge Engineering and Knowledge Management: Ontologies and the Semantic Web, Siguenza, Spain (LNAI 2473, pp. 1-7).

Alfonseca, E., & Manandhar, S. (2002c). Improving an ontology refinement method with hyponymy patterns. In Proceedings of the 3rd International Conference on Language Resources and Evaluation, Las Palmas, Spain.

Alfonseca, E., Moreno-Sandoval, A., Guirao, J.M., & Ruiz-Casado, M. (2006). The Wraetlic NLP suite. In 5th International Conference on Language Resources and Evaluation, Genoa, Italy.

Androutsopoulos, I., & Aretoulaki, M. (2003). Natural language interaction. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 136-156). New York: Oxford University Press Inc.
Arevalo, M., Civit, M., & Marti, M.A. (2004). MICE: A module for named entity recognition and classification. International Journal of Corpus Linguistics, 9(1), 53-68.

Benjamins, R., & Fensel, D. (1998). Community is knowledge! In (KA)2. In Proceedings of the 11th Workshop on Knowledge Acquisition, Modeling, and Management, Banff, Canada.

Bentivogli, L., Forner, P., Magnini, B., & Pianta, E. (2004). Revising WordNet domains hierarchy: Semantics, coverage, and balancing. In Proceedings of the Twentieth International Conference on Computational Linguistics, Workshop on Multilingual Linguistic Resources, Geneva, Switzerland (pp. 101-108).

Berland, M., & Charniak, E. (1999). Finding parts in very large corpora. In Proceedings of the Thirty-Seventh Annual Meeting of the Association for Computational Linguistics, University of Maryland, College Park, Maryland (pp. 57-64).

Black, W.J., & Vasilakopoulos, A. (2002). Language-independent named entity classification by modified transformation-based learning and by decision tree induction. In Proceedings of the Sixth Workshop on Computational Language Learning, in association with the Nineteenth International Conference on Computational Linguistics, Taipei, Taiwan (pp. 159-162).

Bontcheva, K., Tablan, V., Maynard, D., & Cunningham, H. (2004). Evolving GATE to meet new challenges in language engineering. Natural Language Engineering, 10, 349-373.

Botley, S., & McEnery, A.M. (Eds.). (2000). Corpus-based and computational approaches to discourse anaphora. Philadelphia, PA: John Benjamins Publishing Co.

Brew, C., McKelvie, D., Tobin, R., Thompson, H.S., & Mikheev, A. (2000). The XML library LT XML version 1.2: User documentation and reference guide. Retrieved March 11, 2008, from http://www.ltg.ed.ac.uk/np/ltxml/xmldoc.html

Burstein, J., Leacock, C., & Swartz, R. (2001). Automated evaluation of essays and short answers. In Proceedings of the Fifth Computer-Assisted Assessment Conference. Loughborough: Loughborough University.

Califf, M.E. (1998). Relational learning techniques for natural language extraction. Ph.D. thesis, University of Texas at Austin.

Caraballo, S.A. (1999). Automatic construction of a hypernym-labeled noun hierarchy from text. In Proceedings of the Thirty-Seventh Annual Meeting of the Association for Computational Linguistics, University of Maryland, College Park, Maryland (pp. 120-126).

Cardie, C. (1993). A case-based approach to knowledge acquisition for domain-specific sentence analysis. In Proceedings of the Eleventh National Conference on Artificial Intelligence (pp. 798-803).

Carreras, X., Marquez, L., & Padro, L. (2003). A simple named entity extractor using AdaBoost. In W. Daelemans & M. Osborne (Eds.), Proceedings of the Seventh Conference on Natural Language Learning, Edmonton, Canada (pp. 152-155).

Chakrabarti, D., Narayan, D.K., Pandey, P., & Bhattacharyya, P. (2002). Experiences in building the Indo WordNet: A WordNet for Hindi. In Proceedings of the First International Conference on General WordNet, Mysore, India.

Church, K., Gale, W., Hanks, P., & Hindle, D. (1991). Using statistics in lexical analysis. In Lexical acquisition: Exploiting online resources to build a lexicon (pp. 115-164).

Cimiano, P., Handschuh, S., & Staab, S. (2004b). Towards the self-annotating Web. In Proceedings of the Thirteenth World Wide Web Conference, New York City, New York.
Cimiano, P., Hotho, A., & Staab, S. (2004a). Clustering concept hierarchies from text. In Proceedings of the Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal.

Cimiano, P., Reyle, U., & Saric, J. (2005). Ontology-driven discourse analysis for information extraction. Data & Knowledge Engineering, 55(1), 59-83.

Cimiano, P., & Volker, J. (2005). Towards large-scale, open-domain and ontology-based named entity classification. In Proceedings of the International Conference on Recent Advances in Natural Language Processing, Borovets, Bulgaria.

Contreras, J., Benjamins, R., Martin, F., Navarrete, B., Aguado, G., Alvarez, I., et al. (2003). Annotation tools and services (Esperonto Project deliverable 31). Madrid, Spain: Universidad Politecnica de Madrid.

Copestake, A. (1992). The ACQUILEX LKB: Representation issues in semi-automatic acquisition of large lexicons. In The Third Conference on Applied Natural Language Processing, Trento, Italy.

Cunningham, H., Bontcheva, K., Tablan, V., & Wilks, Y. (2000). Software infrastructure for language resources: A taxonomy of previous work and a requirements analysis. In Proceedings of the Second International Conference on Language Resources and Evaluation, Athens, Greece.

Cunningham, H., Maynard, D., Bontcheva, K., Tablan, V., & Ursu, C. (2002). The GATE user guide.

Dahlgren, K.G. (2000). Naive semantics for natural language understanding. Boston, MA: Kluwer Academic Publishers.

DeBoni, M., & Manandhar, S. (2002). Automated discovery of telic relations in WordNet. In Proceedings of the First International Conference of General WordNet, Mysore, India.
Drozdzynski, W., Krieger, H.U., Piskorski, J., & Schafer, U. (2005). SProUT – a general-purpose NLP framework integrating finite-state and unification-based grammar formalisms. In 5th International Workshop on Finite-State Methods and Natural Language Processing, Helsinki, Finland.

Dunning, T. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1), 61-74.

Engels, R., & Bremdal, B. (2000). Information extraction: State-of-the-art report (On-To-Knowledge Project, deliverable 5). Asker, Norway: CognIT a.s.

Faure, D., & Nédellec, C. (1998). A corpus-based conceptual clustering method for verb frames and ontology acquisition. In Proceedings of the Workshop on Adapting Lexical and Corpus Resources to Sublanguages and Applications, First International Conference on Language Resources and Evaluation, Granada, Spain.

Finkelstein-Landau, M., & Morin, E. (1999). Extracting semantic relationships between terms: Supervised vs. unsupervised methods. In Proceedings of the International Workshop on Ontological Engineering on the Global Information Infrastructure, Dagstuhl Castle, Germany (pp. 71-80).

Firth, J. (1957). Papers in linguistics 1934-1951. London: Oxford University Press.

Florescu, D., Levy, A., & Mendelzon, A. (1998). Database techniques for the World Wide Web: A survey. Association for Computing Machinery, Special Interest Group on Management of Data Record, 27(3), 59-74.

Florian, R., Ittycheriah, A., Jing, H., & Zhang, T. (2003). Named entity recognition through classifier combination. In W. Daelemans & M. Osborne (Eds.), Proceedings of the Seventh Conference on Natural Language Learning, Edmonton, Canada (pp. 168-171).
Florian, R., Jing, H., Kambhatla, N., & Zitouni, I. (2006). Factorizing complex models: A case study in mention detection. In Proceedings of the 21st International Conference on Computational Linguistics and Forty-Fourth Annual Meeting of the Association for Computational Linguistics, Sydney, Australia (pp. 473-480).

Freitag, D. (2000). Machine learning for information extraction in informal domains. Machine Learning, 39(2-3), 169-202.

Freitag, D., & McCallum, A. (2000). Information extraction with HMM structures learned by stochastic optimization. In Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence (pp. 584-589). Austin, TX: AAAI Press/The MIT Press.

Gale, W.A., Church, K.W., & Yarowsky, D. (1993). A method for disambiguating word senses in a large corpus. Computers and the Humanities, 26, 415-439.

Ganter, B., & Wille, R. (1999). Formal concept analysis: Mathematical foundations. Springer Verlag.

Grishman, R. (1998). TIPSTER architecture. Retrieved March 11, 2008, from http://cs.nyu.edu/cs/faculty/grishman/tipster.html

Grishman, R. (2005). JET: Java extraction toolkit. Retrieved March 11, 2008, from http://cs.nyu.edu/grishman/

Grozea, C. (2004). Finding optimal parameter settings for high performance word sense disambiguation. In Proceedings of the Third Workshop on the Evaluation of Systems for the Semantic Analysis of Text, Barcelona, Spain.

Guthrie, J.A., Guthrie, L., Wilks, Y., & Aidinejad, H. (1991). Subject-dependent co-occurrence and word sense disambiguation. In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, Berkeley, California (pp. 146-152).
Hamp, B., & Feldweg, H. (1997). GermaNet – a lexical-semantic net for German. In Proceedings of the ACL Workshop on Automatic Information Extraction and Building of Lexical Resources for NLP Applications, Madrid, Spain.

Harabagiu, S., Miller, G., & Moldovan, D. (1999). WordNet 2 – a morphologically and semantically enhanced resource. In Proceedings of the SIGLEX Workshop on Multilingual Lexicons, Thirty-Seventh Annual Meeting of the Association for Computational Linguistics, University of Maryland, College Park, Maryland.

Harabagiu, S., & Moldovan, D. (1998). Knowledge processing in an extended WordNet. In WordNet: An electronic lexical database (pp. 379-405). MIT Press.

Harabagiu, S., & Moldovan, D. (2003). Question answering. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 560-582). New York: Oxford University Press Inc.

Hearst, M.A. (1992). Automatic acquisition of hyponyms from large text corpora. In Proceedings of the Fifteenth International Conference on Computational Linguistics, Nantes, France.

Hearst, M.A. (1998). Automated discovery of WordNet relations. In C. Fellbaum (Ed.), WordNet: An electronic lexical database (pp. 132-152). MIT Press.

Hearst, M.A. (2003). Text data mining. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 616-628). New York: Oxford University Press Inc.

Hindle, D. (1990). Noun classification from predicate-argument structures. In Proceedings of the 28th Conference of the Association for Computational Linguistics (pp. 268-275).
Ide, N., & Romary, L. (2003). Outline of the international standard linguistic annotation framework. In Proceedings of the 41st Meeting of the Association for Computational Linguistics – Workshop on Linguistic Annotation: Getting the Model Right, Sapporo, Japan (pp. 1-5).

Ide, N., & Veronis, J. (1998). Introduction to the special issue on word sense disambiguation: The state of the art. Computational Linguistics, 24, 1-40.

Isozaki, H., & Kazawa, H. (2002). Efficient support vector classifiers for named entity recognition. In Proceedings of the Nineteenth International Conference on Computational Linguistics, Morristown, New Jersey (pp. 1-7).

Kaplan, R.M. (2003). Syntax. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 70-90). New York: Oxford University Press Inc.

Kay, M. (2003). Introduction. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 616-628). New York: Oxford University Press Inc.

Kietz, J., Maedche, A., & Volz, R. (2000). A method for semi-automatic ontology acquisition from a corporate intranet. In Proceedings of the Workshop Ontologies and Text, Twelfth International Conference on Knowledge Engineering and Knowledge Management, Juan-les-Pins, France.

Klein, D., Smarr, J., Nguyen, H., & Manning, C. (2003). Named entity recognition with character-level models. In W. Daelemans & M. Osborne (Eds.), Proceedings of the Seventh Conference on
Natural Language Learning, Edmonton, Canada (pp. 180-183).

Kozareva, Z., Ferrandez, O., & Montoyo, A. (2005). Combining data-driven systems for improving named entity recognition. In Natural language processing and information systems (LNCS 3513, pp. 80-90). Springer.

Kushmerick, N., Weld, D.S., & Doorenbos, R.B. (1997). Wrapper induction for information extraction. In M.E. Pollack (Ed.), Proceedings of the 15th International Joint Conference on Artificial Intelligence 1997: Vol. 1, Nagoya, Japan (pp. 729-737). CA: International Joint Conferences on Artificial Intelligence Inc.

Lee, Y.K., Ng, H.T., & Chia, T.K. (2004). Supervised word sense disambiguation with support vector machines and multiple knowledge sources. In Proceedings of the Third Workshop on the Evaluation of Systems for the Semantic Analysis of Text, Barcelona, Spain.

Leech, G., & Weisser, M. (2003). Pragmatics and dialogue. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 136-156). New York: Oxford University Press Inc.

Lehnert, W., Cardie, C., Fisher, D., McCarthy, J., Riloff, E., & Soderland, S. (1994). Evaluating an information extraction system. Journal of Integrated Computer-Aided Engineering, 1(6).

Lemmety, S. (1999). Review of speech synthesis technology. Unpublished Master's thesis, Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, Finland.

Lenat, D. (1995). Steps to sharing knowledge. In N. Mars (Ed.), Towards very large knowledge bases. IOS Press.

Lenat, D., & Guha, R.V. (1990). Building large knowledge-based systems. Reading, MA: Addison-Wesley.

Lesk, M. (1986). Automatic sense disambiguation using machine readable dictionaries. In Proceedings of the Fifth International Conference on Systems Documentation (pp. 24-26).

Li, Y., Bontcheva, K., & Cunningham, H. (2005). Using uneven margins SVM and perceptron for information extraction. In Proceedings of the
Ninth Conference on Computational Natural Language Learning, Ann Arbor, Michigan (pp. 72-79).

Macleod, C., Grishman, R., & Meyers, A. (1994). COMLEX syntax reference manual. Retrieved March 11, 2008, from http://nlp.cs.nyu.edu/comlex/

Maedche, A., & Staab, S. (2000). Discovering conceptual relations from text (Tech. Rep. 399, Institute AIFB). Karlsruhe, Germany: Karlsruhe University.

Maedche, A., & Staab, S. (2001). Ontology learning for the Semantic Web. IEEE Intelligent Systems, 16(2).

Magnini, B., & Cavaglia, G. (2000). Integrating subject field codes into WordNet. In Proceedings of the Second International Conference on Language Resources and Evaluation, Athens, Greece (pp. 1413-1418).

Mann, G., & Yarowsky, D. (2003). Unsupervised personal name disambiguation. In Proceedings of CoNLL-2003.

Mann, W.C., & Thompson, S.A. (1988). Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3), 243-281.

Marcus, M.P., Santorini, B., & Marcinkiewicz, M.A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313-330.

Mayfield, J., McNamee, P., & Piatko, C. (2003). Named entity recognition using hundreds of thousands of features. In W. Daelemans & M. Osborne (Eds.), Proceedings of the Seventh Conference on Natural Language Learning, Edmonton, Canada (pp. 184-187).

Maynard, D., Cunningham, H., Bontcheva, K., & Dimitrov, M. (2002). Adapting a robust multi-genre NE system for automatic content extraction. In Artificial intelligence: Methodology, systems, and applications (LNAI Vol. 2443, pp. 264-273). Springer-Verlag.

McClelland, J.L., & Rumelhart, D.E. (1981). An interactive activation model of context effects in letter perception. Psychological Review, 88, 375-407.

McKelvie, D., Brew, C., & Thompson, H.S. (1997). Using SGML as a basis for data-intensive natural language processing. Computers and the Humanities, 31(5), 367-388.
Mihalcea, R., Chklovski, T., & Kilgarriff, A. (2004). The Senseval-3 English lexical sample task. In Proceedings of Senseval-3: The Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text.

Mikheev, A. (2002). Periods, capitalized words, etc. Computational Linguistics, 28(3), 245-288.

Mikheev, A., Grover, C., & Moens, M. (1998). Description of the LTG system used for MUC-7. In Proceedings of the Seventh Message Understanding Conference.

Mikheev, A., Moens, M., & Grover, C. (1999). Named entity recognition without gazetteers. In Proceedings of the Ninth Conference of the European Chapter of the Association for Computational Linguistics, Bergen, Norway (pp. 1-8).

Miller, G.A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39-41.

Mitkov, R. (Ed.). (2003). The Oxford handbook of computational linguistics. Oxford, UK: Oxford University Press.

Mohanty, S., Ray, N.B., Ray, R.C.B., & Santi, P.K. (2002). Oriya WordNet. In Proceedings of the First International Conference on General WordNet, Mysore, India.

MUC7. (1998). Proceedings of the 7th Message Understanding Conference. Morgan Kaufman.

Navigli, R., & Velardi, P. (2003). Ontology learning and its application to automated terminology translation. IEEE Intelligent Systems, 18(3), 323-340.
Navigli, R., & Velardi, P. (2004). Learning domain ontologies from document warehouses and dedicated websites. Computational Linguistics, 30(2), 151-179. MIT Press.

Novischi, A. (2002). Accurate semantic annotation via pattern matching. In Florida Artificial Intelligence Research Society Conference, Pensacola, Florida.

Pantel, P., & Pennacchiotti, M. (2006). Espresso: Leveraging generic patterns for automatically harvesting semantic relations. In Proceedings of the Twenty-First International Conference on Computational Linguistics and Forty-Fourth Annual Meeting of the Association for Computational Linguistics, Sydney, Australia (pp. 113-120).

Pennacchiotti, M., & Pantel, P. (2006). Ontologizing semantic relations. In Proceedings of the Twenty-First International Conference on Computational Linguistics and Forty-Fourth Annual Meeting of the Association for Computational Linguistics, Sydney, Australia (pp. 793-800).

Pereira, F., Tishby, N., & Lee, L. (1999). Distributional clustering of English words. In Proceedings of the Thirty-First Annual Meeting of the Association for Computational Linguistics, Columbus, Ohio (pp. 183-190).

Petasis, G., Karkaletsis, V., Paliouras, G., Androutsopoulos, I., & Spyropoulos, C. (2002). Ellogon: A new text engineering platform. In Proceedings of the 3rd International Conference on Language Resources and Evaluation, Las Palmas, Spain (pp. 72-78).

Quillian, M.R. (1969). The teachable language comprehender: A simulation program and theory of language. Communications of the ACM, 12(8), 459-476.

Ravichandran, D., & Hovy, E. (2001). Learning surface text patterns for a question answering system. In Proceedings of the 40th Annual Meeting
of the Association for Computational Linguistics (pp. 41-47).

Resnik, P. (1995a). Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, Montreal, Canada (pp. 448-453).

Resnik, P. (1995b). Disambiguating noun groupings with respect to WordNet senses. In D. Yarowsky & K. Church (Eds.), Proceedings of the Third Workshop on Very Large Corpora (pp. 54-68). Somerset, NJ: Association for Computational Linguistics.

Richardson, S.D., Dolan, W.B., & VanderWende, L. (1998). MindNet: Acquiring and structuring semantic information from text. In Proceedings of the Seventeenth International Conference on Computational Linguistics, Montreal, Canada.

Rigau, G. (1998). Automatic acquisition of lexical knowledge from MRDs. Ph.D. thesis, Departament de Llenguatges i Sistemes Informatics, Universitat Politecnica de Catalunya, Barcelona, Spain.

Schütze, H. (1992). Dimensions of meaning. In Proceedings of the 1992 ACM/IEEE Conference on Supercomputing (pp. 787-796).

Schütze, H. (1998). Automatic word sense discrimination. Computational Linguistics, 24(1), 97-123.

Sekine, S. (2002). OAK system. Retrieved March 11, 2008, from http://nlp.cs.nyu.edu/oak/

Sekine, S. (2006). On-demand information extraction. In Proceedings of the Twenty-First International Conference on Computational Linguistics and Forty-Fourth Annual Meeting of the Association for Computational Linguistics.

Simkins, N.K. (1994). An open architecture for language engineering. Paper presented at the First Language Engineering Convention, Paris, France.
Snow, R., Jurafsky, D., & Ng, A.Y. (2006). Semantic taxonomy induction from heterogeneous evidence. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Sydney, Australia.

Soderland, S. (1999). Learning information extraction rules for semi-structured and free text. Machine Learning, 34(1-3), 233-272.

Strapparava, C., Gliozzo, A., & Giuliano, C. (2004). Pattern abstraction and term similarity for word sense disambiguation: IRST at Senseval-3. In Proceedings of the Third Workshop on the Evaluation of Systems for the Semantic Analysis of Text, Barcelona, Spain.

Sussna, M. (1993). Word sense disambiguation for free-text indexing using a massive semantic network. In Proceedings of the Second International Conference on Information and Knowledge Base Management, Arlington, Virginia (pp. 67-74).

Thompson, H.S., & McKelvie, D. (1997). Hyperlink semantics for standoff markup of read-only documents. In Proceedings of SGML Europe '97, Barcelona, Spain.

Tjong-Kim-Sang, E.F., & De Meulder, F. (2003). Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In W. Daelemans & M. Osborne (Eds.), Proceedings of the Seventh Conference on Natural Language Learning (pp. 142-147). Edmonton, Canada.

Trost, H. (2003). Morphology. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 25-47). New York: Oxford University Press Inc.

Tufis, D., Cristea, D., & Stamou, S. (2004). BalkaNet: Aims, methods, results and perspectives. A general overview. Romanian Journal on Information Science and Technology, Special Issue on BalkaNet, 7(1-2), 9-34.

Uren, V.S., Cimiano, P., Iria, J., Handschuh, S., Vargas-Vera, M., Motta, E., et al. (2006). Semantic
annotation for knowledge management: Requirements and a survey of the state of the art. Journal of Web Semantics, 4(1), 14-28.

Voorhees, E.M. (1993). Using WordNet to disambiguate word senses for text retrieval. In Proceedings of the Sixteenth International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 171-180).

Vossen, P. (Ed.). (1998). EuroWordNet: A multilingual database with lexical semantic networks. Dordrecht: Kluwer Academic Publishers.

Walker, D. (1987). Knowledge resource tools for accessing large text files. In S. Nirenburg (Ed.), Machine translation: Theoretical and methodological issues (pp. 247-261). Cambridge, England: Cambridge University Press.

Wilks, Y., Fass, D.C., Ming Guo, C., McDonald, J.E., Plate, T., & Slator, B.M. (1993). Providing machine tractable dictionary tools. In J. Pustejovsky (Ed.), Semantics and the lexicon (pp. 341-401). Cambridge, MA: Kluwer Academic Publishers.

Wolinski, F., Vichot, F., & Gremont, O. (1998). Producing NLP-based online contentware. In Natural Language and Industrial Applications, Moncton, Canada.

Yarowsky, D. (1992). Word-sense disambiguation using statistical models of Roget's categories trained on large corpora. In Proceedings of the 14th International Conference on Computational Linguistics (COLING) (pp. 454-460).
Additional Reading

For a deeper understanding of the topics outlined herein, the reader is encouraged to refer to the following works:

Agirre, E., Màrquez, L., & Wicentowski, R. (Eds.). (2007). SemEval-2007. In Proceedings
of the 4th International Workshop on Semantic Evaluations.

Jurafsky, D., & Martin, J.H. (2000). Speech and language processing. Prentice Hall.

Manning, C.D., & Schütze, H. (2001). Foundations of statistical natural language processing. MIT Press.

Mitkov, R. (2003). The Oxford handbook of computational linguistics. Oxford University Press.

Endnotes

1.	National Institute of Standards and Technology, http://www.nist.gov/speech/tests/ace/
2.	Construct where a noun, or a noun phrase, immediately follows another element of the same class to explain something related to it, for example, Tokyo, capital city of Japan, is located in the Kanto region.
Chapter XIV
Technologies for Semantic Project-Driven Work Environments

Bernhard Schandl, University of Vienna, Austria
Ross King, Austrian Research Centers GmbH (ARC) Research Studios, Austria
Niko Popitsch, Austrian Research Centers GmbH (ARC) Research Studios, Austria
Brigitte Rauter, P.Solutions Informationstechnologie GmbH, Austria
Martin Povazay, P.Solutions Informationstechnologie GmbH, Austria
Introduction

As computer and Internet applications have become ubiquitous, most daily business must handle an increasing amount of information via several applications and systems, such as e-mail applications, file systems, business software, databases, or other systems. Dealing with information flows is not restricted to a special skill level or field of work; rather, it is a significant attribute of any computational work environment.
Most ongoing tasks in companies take place in the context of a project, since management strategies enforce process-driven business and organization. During a project, a large volume of knowledge arises that is connected to the output of the project (products and services) as well as to know-how regarding the project realisation and the use of resources. The important aspect of capturing this knowledge in some form is to impart and recycle organizational knowledge in order to increase business efficiency. This knowledge capture may
result in the creation of (digital) documents like, for example, project plans, resource plans, reports, product sheets, and so on. But there is also a second aspect of knowledge capture, regarding a kind of semantic glue between those assets that should be captured in order to be able to relate them to each other (e.g., who did what on a certain project, how requirement documents and project reports relate, and so forth), that is, to define their context.
The Present

This second aspect of knowledge capture is ignored by most systems and methodologies in place today. Consider, for example, the common practice of storing the majority of project documents on a shared file server. Semantic information about the meaning of a certain document, or about its relation to other documents in the same or in other projects, can only be captured in a very restricted way (e.g., by using file name and/or file path conventions, or by describing such relations in these documents themselves). The result of this approach is that it becomes very cumbersome to find documents on such a file server as soon as it grows to a certain size. This document management strategy supports the finding of documents only by browsing a strictly hierarchical directory structure that follows a certain naming convention, or by searching for low-level metadata features (e.g., creation date) or for some text in the document (full-text search). Specialized software (e.g., project management software) was meant to overcome these shortcomings. Hundreds of tools have been developed in this area—nevertheless, most of them are fairly closed-box systems that are difficult to customize and that force a company to shape its business processes to fit the software's requirements, rather than the other way around. Furthermore, such systems—where used—do not fulfil all of the requirements of a project-driven working environment.
The Future

It is our opinion that highly collaborative semantic systems are needed to advance the state of the art in knowledge capture and reuse in the context of project and document management. Any system that supports users in managing their documents must be able to capture the semantic glue between these documents in some form. We suggest that a mixture of natural language (for users) and formal description languages (for computers) would be very beneficial here. Capturing the semantics of documents and their interrelations supports finding, exploring, reusing, and exchanging digital documents. Furthermore, this context information may be an essential aspect of the long-term preservation of such documents. We believe that the process of capturing semantics must take place when the system users have maximum knowledge about a certain document (i.e., when the document is created or updated) and should interfere with a user's normal workflow as little as possible. As every organization has slightly different internal workflows and requirements for such a system, the underlying software must be highly configurable and easily adapted to the organization's (changing) needs. Furthermore—as project work is always team work—we want to emphasize the need for a strong collaborative character of such systems. The various aspects of information capture and the distributed nature of the utilized information sources furthermore demand an infrastructure that supports the interconnection and integration of multiple heterogeneous data sources. We observe a demand for semantic systems in areas in which knowledge work and collaboration are required, for example, for managing liability cases, audit reports, or inspection reports; in software development, product management, management consultancy, or innovation management. Knowledge is one of the most important assets of organisations in these fields, which accounts for the demand for semantic (knowledge) work environments.
In this chapter, we will illustrate how a suite of semantic technologies can help to decrease the effort required for knowledge organization, storage, and retrieval.
Working in Project-Driven Environments: The Need for Collaborative Semantics

Before delving into technical details, we will give a short description of typical practices and processes in project-driven environments in industry, in order to show where we see potential for capturing information in its semantic context. This example will demonstrate the apparent need for collaborative semantics. In general, a project is a compilation of specific tasks and activities that are required in order to reach a defined goal within a defined time frame. A project is usually realized by a team of people with different skills and experience. Project-oriented work environments face the need to deal with a very large volume of data stored in various files or system applications: material bills, object lists, detailed descriptions and specifications, and other resources. When working in a team, there is increased demand for communication between team members and, in particular, demand for the documentation of communication. Following our own project experience during the last years, we can (amongst others) identify the following questions that frequently arise when working in small- to medium-sized IT projects:

•	What kind of project information is stored in which system, and what meaning does such information have in which context?
•	What are the naming and structure conventions for information, for example, in a file system?
•	Who is responsible for which information and task, and which time constraints have to be considered?
•	What kind of dependencies between information and tasks are observable?
Everyday work consists, to a certain extent, of repeating mini-projects or processes where, in our opinion, semantically enriched information can be captured automatically by a semantic management system, assuming that such a system does not overburden the project team with additional effort. The tracking of information in semantic contexts starts with the definition of the project aim and the planning of its realisation. Typically, this information is stored in file systems, project management systems, or databases. Consequently, the basic ontology for a project is already defined during the project's initial stage, including all persons that are assigned to tasks based on their skills, experience, and responsibilities. In order to reach the defined milestones, the team members focus on how to realize the given tasks. They must analyze what is new and what can be reused. The project is itemized into the components needed to develop an application with its specific attributes, depending on its usage context. They look for successfully realized projects with similar problems: Is this knowledge already available? Were there any similar situations? Who was involved, and what were the detailed specifications of material and costs? In the next stage, the project subteams collect and develop specific information to reach their goals, thereby extending and refining the project ontology. During all process steps, different systems are used to represent information and semantics: groupware systems like Outlook or Exchange, ERP systems, CRM systems, project management tools like MS Project, intranet services, file systems, and others. Often, there is no entity that integrates information from these various sources and presents a unified view of this information to the user. In this chapter, we describe a set of technologies that provide an infrastructure for the realization of small- and medium-sized projects. We describe
METIS, a database that can act as an integration point for ontologies and data gathered from external systems, and Ylvi, a semantic wiki that allows the creation of semantic information during normal project work. The SemDAV protocol, introduced in this chapter, enables communication and the exchange of data and metadata between subsystems. The Semplorer, a graphical user interface constructed on top of this protocol, provides different views and management functions on the unified set of project-related information. In the following, we illustrate how these technologies can be used together to cope with the demands of project-driven work environments.
Related Work

The integration of information from different sources in order to create a unified organizational memory has been studied in various works (Noy, 2004; Wache et al., 2001). Semantic Web technology provides a suitable basis for such a task (Priebe & Pernul, 2003), since it provides a data model for representing nearly every kind of information (RDF1), both theoretical and practical models that allow further processing of metadata (namely, ontological organizing and reasoning), and implementations of these models that are being developed by an increasingly large community.
Ontology-Based Information Integration

Examples of ontology-based information integration systems are Observer (Mena, Illarramendi, Kashyap, & Sheth, 2000) or the RDF-based Ontobroker (Decker, Erdmann, Fensel, & Studer, 1998). The central approach of these systems is the integration of heterogeneous sources by making the semantics of data explicit through the definition of ontologies. Their goal is to enable semantic (or concept-based) queries over the integrated data sources via a uniform query interface. Structural and semantic conflicts between the ontologies that are involved in integration scenarios can be reconciled through the definition of mappings. An important representative of mapping frameworks built on Semantic Web technologies is the MAFRA framework (Maedche, Motik, Silva, & Volz, 2002). Recent works on information integration focus on the requirements of organisations consisting of many scattered units. Peer-to-peer (P2P) networks are a possible approach to fulfil these requirements, and systems like Edutella (Nejdl et al., 2002) or GridVine (Aberer, Cudre-Mauroux, Hauswirth, & van Pelt, 2004) are knowledge-based integration systems for P2P architectures. The EU project BRICKS (Risse, Knezevic, Meghini, Hecht, & Basile, 2005) combines this peer-to-peer approach with the semantic integration techniques mentioned above. Integration of information requires access to the data stored in organizational data sources. D2RQ (Bizer & Seaborne, 2004) is a framework for treating data in relational databases as RDF graphs, and Lehti and Fankhauser (2004) propose an approach for XML data integration with OWL. An example of the useful integration of Semantic Web technology and Web services is the SemanticLIFE project (Anjomshoaa, Manh Nguyen, Shayeganfar, & Tjoa, 2006). In this work, data from different sources, including Outlook databases or instant messaging clients, is integrated with the aim of building a semantic repository of a human's personal information. We believe that this idea can be applied to organizational knowledge as well, keeping in mind that it is still important to consider an individual's personal information sphere. We hypothesize that a significant portion of organizational knowledge is concealed in seemingly personal information like e-mail messages or instant messaging communication, and every approach to organization-wide knowledge management must consider these types of information.
The Semantic Desktop

The above hypothesis is supported by the increasing importance of research that aims at the creation of the semantic desktop (Sauermann, Bernardi, & Dengel, 2005), a new metaphor for human-centered information management using semantic technology. Important projects in this field are Haystack (Karger, Bakshi, Huynh, Quan, & Sinha, 2005), Gnowsis (Sauermann, 2005), Chandler (Fitzgerald, 2003), and DeepaMehta. The common denominator of most semantic desktop projects is the use of ontologies for data organization and the storage of metadata in RDF, which makes the systems open for interconnection with other platforms and allows the composition of organization-wide semantic systems. Another possible approach for data integration from heterogeneous systems is described by Haslhofer (2006), where SPARQL queries and corresponding result graphs are rewritten on-the-fly by mediators, according to predefined schema mappings. Organizational intranets are often considered a primary storage pool for organizational knowledge. Recent studies (Géczy, Izumi, Akaho, & Hasida, 2006; Lamb & Davidson, 2005) indicate that knowledge workers often make use of only a limited subset of intranet resources. By applying Semantic Web technologies not only to the World Wide Web and the desktop, but also to intranet resources, it will be possible to create a technology bridge between these different worlds. As there will be no single point of integration in such environments, the peer-to-peer paradigm may serve as the underlying networking paradigm. Peer-to-peer technology for knowledge management has been presented, for example, by Kaulgud and Dolas (2006) and Le Coche, Mastroianni, Pirrò, Ruffolo, and Talia (2006).
(Semantic) Wikis

Another approach to facilitate semantic knowledge exchange within organizations and/or communities of practice is the usage of wikis (Majchrzak, Wagner, & Yates, 2006) or semantic wikis (Oren, Breslin, & Decker, 2006; Tolksdorf & Simperl, 2006; Völkel, Krötzsch, Vrandecic, Haller, & Studer, 2006). In order to find a common agreement on how to interchange data between different semantic wiki systems, Völkel and Oren (2006) proposed the Wiki Interchange Format (WIF). However, many wiki implementations suffer from two principal problems:

1.	With the increasing number of (semantic) features within wikis, the user interface (i.e., the wiki markup) becomes more complicated, which discourages the use of these features. Also, semantic features are often poorly documented, and the semantics of using them is not clearly defined. Intuitive and clear user interfaces are required in order to encourage users and further promote the idea of semantic wikis.
2.	Many wiki implementations lack support for the management of data that cannot be represented by text-based formats, for example multimedia data, and the integration of legacy systems (e.g., database-driven CRM systems) is difficult.
We consider (semantic) wikis to be a valuable tool for information organization and collaborative knowledge work. In order to increase user acceptance, however, the abovementioned problems still have to be investigated.
Technologies for Engineering an Organizational Memory

In this section, we present an overview of the architecture and relevant technical details of METIS, a framework for the management of multimedia data and metadata; Ylvi, a semantic wiki platform built on top of METIS; and SemDAV,
a Semantic-Web-based protocol that allows the integration of various data sources into a unified information model. We illustrate use cases for these technologies in the context of project-driven work environments and point out how these systems may work together to give enterprise-wide tool support for knowledge tasks.
Architecture Overview

In Figure 1, we give an overview of our envisioned architecture for project-driven work environments. The semantic database for multimedia objects, METIS, provides support for the persistence of data and metadata, as well as a flexible plugin framework for content analysis. On top of METIS, a Web-based, wiki-like interface (Ylvi) provides user access to stored data and annotations through configurable rendering pipelines. Existing desktop applications may be integrated into a semantic repository where enterprise data is represented using Semantic Web technology (SemDAV), which may also be backed by a METIS instance. SemDAV provides a protocol and an Application Programming Interface so that it can be used as a semantic storage system for any business application. Through the Semplorer, a graphical user interface for SemDAV, users may access data using interaction metaphors similar to those provided by common file browsers. In the next sections, we introduce the individual components of our system suite in more detail and illustrate their usage with the continuing example of small and medium project work. Despite the fact that some aspects of our architecture are still under development, we have already carried out research and development projects using our system suite, and initial experience showed that projects benefit from the slim, efficient structures fostered by our tool suite.
Figure 1. Knowledge support for project-driven work environments
METIS: A Flexible Database Foundation for Unified Media Management

Currently, digital data—both in the Internet and in the intranet domain—is shifting more and more towards multimedia data. Reasons for this include:

•	Widely spread media capturing technologies (e.g., digital cameras, video-enabled cell phones, and so forth)
•	Affordable mass storage and storage devices
•	Increasing bandwidth for the transportation of digital content
It is becoming more and more unusual to store data purely in text formats. The increased availability of digital multimedia content demands powerful data management and processing technologies that take the special properties of this kind of data into account. Multimedia management is different from traditional text-based methods. Many techniques that are easily applied to text (e.g., indexing, comparison, transformation) are hardly applicable to other media formats or require quite different approaches (for example, image comparison or video indexing). Most current approaches rely on the capturing and application of metadata in some form to fulfil these tasks. This requires an infrastructure that is able to:

•	Capture the data and its metadata in appropriate formats,
•	Apply domain- and media-specific algorithms on the managed data and metadata, and
•	Expose the managed data to external systems/applications in well-defined exchange formats.
Example: In project work, a significant fraction of the information in a common working environment is hidden in e-mail attachments—some of the attached media items' formats are indexable by the e-mail client, but many are not; thus, searching for e-mail attachments is still cumbersome. Furthermore, the implicit and explicit relations between various media items can be used to support document discovery; for example, one could navigate from a Microsoft Word document on a local hard disk to the e-mail message(s) this document was attached to, and on to other documents by the same sender, or to other related e-mails and attached documents.

As a first step towards addressing these problems with current state-of-the-art multimedia management technologies, we have developed a prototypical infrastructure that supports uniform and semantic multimedia management for a broad range of possible multimedia applications. Typical multimedia database systems focus on specific kinds of media and/or applications (e.g., video databases, image databases). In the following, we provide a short overview of METIS, a database foundation for the unified management of multimedia data of all kinds, comprehensively documented by King, Popitsch, and Westermann (2004, 2007). The major strength of METIS is its flexibility in adapting to a large number of possible multimedia application scenarios. This is achieved by providing highly flexible frameworks for customizing and extending the implemented semantic data model as well as the associated functionality (including metadata extractors and comparison functions, but also interfaces to external systems).

Semantic data model. METIS is based on an expressive data model that can instantiate any desired scheme for media management, description, and classification. The instantiated data model can be extended or changed at any time. So-called semantic packs permit the bundling of domain-specific customizations and the introduction of widely used metadata standards, for example,
Dublin Core (ISO, 2003) or MARC. Semantic packs can be created through an internal build environment or exported directly from Protégé (Noy, Sintek, Decker, Crubezy, Fergerson, & Musen, 2001). METIS includes a query processor that supports hybrid searches (Wen, Li, Ma, & Zhang, 2003) for media objects, taking into account the semantic classification of media, their high-level characteristics and low-level features, and their relationships to other media objects.

Figure 2. METIS architecture overview

Persistence abstraction. METIS is built on a persistence abstraction layer that facilitates the switching of storage back-ends and offers a customizable Web front-end for administration and media management. An extensible locator mechanism allows for transparent access to media in different storage systems and locations.

Media aggregation and production. On the output side, METIS offers an XSLT-based multichannel publishing strategy that can be used for aggregation of the managed media objects and delivery in various context-dependent output formats, like XHTML, SMIL, or Microsoft Word. This production layer can easily be extended to
deliver content in arbitrary formats (e.g., RDF) to external applications.

Plugins. A sophisticated plugin infrastructure enables extensive customizations of the system core, such as media- and domain-specific query operators, similarity measures, and feature extraction algorithms. The implemented frameworks provide plugin interfaces for the following (a sketch of such an interface follows the list):

• Complex data types, for example, MPEG-7 media descriptors (Martínez, Koenen, & Pereira, 2002)
• Data model extensions
• Querying of the contained data
• Metadata and data comparison
• Transparent media locators
• Media aggregation and presentations
• Event-based communication
• Visualization and persistence abstraction
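The chapter does not reproduce the METIS plugin API itself (METIS is a Java-based system), so the following is only a minimal sketch of the general pattern such an extension point follows: media-type-specific extractors registered with a core framework. All class and function names here (MetadataExtractor, register_extractor, and so on) are illustrative assumptions, not the actual METIS interfaces.

```python
# Illustrative sketch only, not the METIS API: media-type-specific
# metadata extractors registered with a core dispatching framework.
from abc import ABC, abstractmethod

class MetadataExtractor(ABC):
    """A plugin that derives metadata from one kind of media object."""

    media_types: tuple = ()  # MIME types this extractor claims to handle

    @abstractmethod
    def extract(self, data: bytes) -> dict:
        """Return a dictionary of metadata attributes for the media object."""

class ImageSizeExtractor(MetadataExtractor):
    media_types = ("image/png",)

    def extract(self, data: bytes) -> dict:
        # PNG stores width and height big-endian at byte offsets 16 and 20.
        return {"width": int.from_bytes(data[16:20], "big"),
                "height": int.from_bytes(data[20:24], "big")}

_registry: dict = {}  # core registry dispatching media to matching plugins

def register_extractor(extractor: MetadataExtractor) -> None:
    for mt in extractor.media_types:
        _registry.setdefault(mt, []).append(extractor)

def extract_all(media_type: str, data: bytes) -> dict:
    merged = {}
    for extractor in _registry.get(media_type, []):
        merged.update(extractor.extract(data))
    return merged

register_extractor(ImageSizeExtractor())
header = (b"\x89PNG\r\n\x1a\n"                 # PNG signature (8 bytes)
          + (13).to_bytes(4, "big") + b"IHDR"
          + (640).to_bytes(4, "big")           # width
          + (480).to_bytes(4, "big"))          # height
print(extract_all("image/png", header))        # {'width': 640, 'height': 480}
```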
METIS differs clearly from traditional media-type-specific databases, including video or image databases, as these are tailored towards their respective media type and do not allow uniform multimedia management. METIS differs from
more comparable systems mainly in its flexibility and extensibility on all architectural levels (back-end, GUI, data model, and functionality), which greatly helps METIS to fit quickly into concrete projects.
Ylvi: A Semantic Multimedia Wiki Framework

In the previous section, we described how the METIS platform can provide uniform access to media resources by providing transparent mechanisms to locate resources in arbitrary data sources, semantic annotation and interrelation of these resources, and querying and aggregated delivery of this media. Such an infrastructure may be sufficient for machines, but human users require additional features, including navigation between media items without the need for searching, or human-readable annotations of media and relations. The semantic glue between media objects is stored in the minds of the users. The question arises: How can this glue be made explicit without penalising the human in favour of the machine?

Example: Typical project documentation consists of multiple documents and media objects that serve completely different documentation purposes (e.g., a project plan, a resource plan, deliverables, material, e-mails). Especially when the project's size renders it unprofitable to establish dedicated organization structures, these documents are stored in a common project directory tree, a database, and/or an e-mail server. Furthermore, these items are stored in various media formats, which the underlying management infrastructure must be able to deal with. As mentioned before, this representation of project data has various drawbacks, as it does not represent any relation between these documents (aside from the fact that they belong to the same project
and are therefore stored in the same folder – if we consider only the project folder). This makes it impossible to search or navigate along such relations or, for example, to request a list of all managed documents of a certain type. Data integration platforms like METIS can provide uniform access to such resources and allow users to model the relationships and semantic properties (e.g., what purpose a certain document serves) that describe them. This enables users to search and exchange this information, but the following questions remain:

• What kind of user interface should be used to enter and access this information?
• What information should be used to browse the data?
• How can the semantic glue between data objects be described in a human-readable fashion?
Although, for example, specialized project management software could provide such functionality, such systems are in general very rigid and tailored towards their respective tasks. They provide only limited interoperability with other systems (e.g., e-mail systems, intranet applications, or third-party document management software), are often difficult to extend, and lack flexibility. Most relations between media objects can be explicitly modelled with systems like METIS (or with the specialized software mentioned above, with all the described drawbacks). However, it is not always preferable to represent such relations only in a formal data model. Although machines can work with this information, it is hard for human users to understand such formal relationships, even if the system tries to translate them into human-readable representations. It is often more desirable to allow humans to annotate media and its interrelations in natural-language text and additionally to allow them to model the formal media relations in a simple manner—this is the goal of semantic wikis.
Ylvi is a semantic wiki framework based on the METIS platform. It combines the advantages of structured, type-/attribute-based media management and the open, relatively unstructured wiki approach. By representing wiki pages as abstract media objects, Ylvi can offer sophisticated media management features to the wiki domain and provide an extensible, highly configurable, and multimedia-enabled semantic wiki. The manifold extensibility of the underlying METIS framework renders Ylvi more a semantic wiki toolkit than just another semantic wiki instance. The semantic typing of Ylvi media, articles, and links between such items greatly enhances the search and browse functionalities for human users. Ylvi provides a comprehensive user interface for capturing the mentioned semantic glue between media objects in the form of natural-language text articles and links between these items. Applied to the above-mentioned example, users could collaboratively create a set of articles that describe the project and its resources, and provide links to these resources. Embedded queries (queries that are included in the article source and rendered as a list of their result set) help to maintain topicality and consistency without additional effort. Such project documentation would be understandable by both computers and humans. As the underlying data model can be extended at any time, such a system can adapt to new requirements in a flexible fashion. In this sense, we consider Ylvi a high-level, collaborative user interface for an underlying media management framework (in this case METIS) that combines the strengths of a configurable semantic multimedia platform for data representation with the intuitive input paradigm of the emerging semantic wiki technologies. Ylvi extends other semantic wiki approaches in three main aspects: its high configurability, strong multimedia support, and adaptive semantic search.

Configurability. The underlying open architecture upon which Ylvi is based provides a broad range of configurable features:
• Configurable markup language: All syntactic elements of the markup language (except for a minimum set of core elements) can be dynamically defined and extended (see the sketch after this list).
• Configurable visualisation: Ylvi articles are rendered by pipelines of rendering plugins, using the METIS cross-channel publishing framework, which supports arbitrary output channels, including XHTML, SMIL, and Microsoft Word.
• Configurable semantics: All semantic modelling elements (article types, attributes, and link types) are configurable on the fly and can be introduced into the system using the functionality provided by METIS (through a Protégé interface, XML import/export, semantic packs, or the Web-based GUI).
• Functional extensions: The METIS plugin frameworks for functional extensions can be used by Ylvi as well. Additional specific plugin types (e.g., render plugins, toolbox plugins) that implement Ylvi-specific functionality (e.g., article rendering) were also developed.
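As an illustration of what configurable semantic markup provides, the following sketch parses typed links out of article source. Ylvi's actual markup elements are configurable and are not reproduced in this chapter, so the [[Type::Target]] notation below is an assumption, borrowed from Semantic MediaWiki for illustration only.

```python
import re

# Hypothetical typed-link syntax, e.g. [[deliverableOf::Project Alpha]];
# plain links are written [[Target]]. Ylvi's real markup is not shown
# in the chapter, so this notation is assumed for the example.
LINK = re.compile(r"\[\[(?:(?P<type>[^\]:|]+)::)?(?P<target>[^\]|]+)\]\]")

def parse_links(article_source: str):
    """Yield (link_type, target) pairs; link_type is None for untyped links."""
    for m in LINK.finditer(article_source):
        yield m.group("type"), m.group("target")

text = "The [[deliverableOf::Project Alpha]] report supersedes [[Old Report]]."
print(list(parse_links(text)))
# [('deliverableOf', 'Project Alpha'), (None, 'Old Report')]
```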
Multimedia support. Ylvi can be characterised as a multimedia-specific wiki, as it provides an abstract data model for both simple and complex media, transparent access to arbitrary data sources through a plug-in locator mechanism, as well as support for multimedia aggregation and multichannel publishing. Ylvi treats both wiki articles and multimedia objects in a uniform way: both are modelled as METIS media objects that can be typed and attributed and may participate in directed, typed links. An overview of the semantic features provided by Ylvi is depicted in Figure 3 and will be described in more detail below.

Semantic search. Ylvi provides comprehensive search facilities, based on a full-text index of articles on the one hand, and on its extensive semantic features on the other. Articles and media instances can be related and annotated using multi-typing, attribution, and typed links (for
Figure 3. Semantic features in Ylvi
Figure 4. Ylvi Screenshot
internal and external resources). The metadata expressed by these modelling primitives can be searched by an adaptive search algorithm that progressively restricts the search space and assists the user in narrowing down the desired search results.

Example: Our experience from recent research and development projects indicates that multimedia-enabled semantic wikis such as Ylvi are highly applicable for collaborative document and media management in inter-organizational projects. In the course of various research projects, we employed Ylvi as a shared project management and documentation platform, with the project ontology developed on the fly during project work using the Ylvi interface. We consider our experience of adapting Ylvi to concrete project requirements within a very short development time a proof of concept for (1) the rapid prototyping goals we aimed at with the development of the METIS platform, and (2) the goal of developing a semantic wiki framework that is easy to adapt to concrete application settings.
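The chapter does not detail the adaptive search algorithm itself; its described effect, each added constraint restricting the remaining search space, can however be sketched as plain faceted filtering. This toy example (not Ylvi's actual implementation) shows how the shrinking candidate count guides refinement:

```python
# Toy illustration of search-space restriction: each facet the user
# fixes (type, attribute, link) narrows the candidate set.
items = [
    {"types": {"Deliverable", "Document"}, "project": "Alpha", "year": 2007},
    {"types": {"Email"}, "project": "Alpha", "year": 2006},
    {"types": {"Deliverable"}, "project": "Beta", "year": 2007},
]

def refine(candidates, predicate):
    result = [i for i in candidates if predicate(i)]
    print(f"{len(candidates)} -> {len(result)} candidates")
    return result

hits = refine(items, lambda i: "Deliverable" in i["types"])  # 3 -> 2
hits = refine(hits, lambda i: i["project"] == "Alpha")       # 2 -> 1
```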
SemDAV: Leveraging Business Knowledge to the Desktop

The SemDAV project (Schandl, 2006; Schandl & King, 2006) aims to provide a suite of technologies that enable users to work with data of whatever form in a unified way. SemDAV follows general design considerations that the authors have derived from observing everyday work:

• Unify the user's view on data: Users are confronted with a variety of data structures, user interfaces, classification and annotation schemes, interaction methods, and workflows. This menagerie is imposed by applications, each of which operates in its own "data world" that is in many cases not connected to the world outside the application. The file system, as the lowest common denominator of all data processing systems, could serve as a central point of cross-application and cross-domain data management. However, current file systems do not provide optimal means for the efficient organization of data. Instead, they are restricted to fixed hierarchies and minimal sets of metadata, most of which do not help the user to remove outdated or unnecessary data efficiently.
• Provide well-known user interaction metaphors: Because of the limitations of file systems, developers are forced to implement application-specific data and metadata formats, which, in turn, requires the design of application-specific user interaction and visualization metaphors. This not only forces the user to adapt to a plurality of interfaces, but also restricts semantic annotation and relation of data to the features implemented by the respective applications.
• Do not overload users with machine semantics: To a great extent, semantics are defined by technical aspects of a system's implementation. It need not be the case that the contacts in a mail application differ from those in instant messaging software. It need not be the case that a picture sent via mail can have a textual description attached (the mail message itself), while a picture received on a USB stick cannot. Machine semantics are often expressed using complicated and expressive schemas or ontologies that have been developed to serve the application's needs. However, many users are not willing or not able to cope with this massive complexity. As popular collaborative services like Flickr2 or del.icio.us3 demonstrate, semantics can be expressed in a much easier and more user-centric manner, as long as collaborators share a certain level of knowledge—a requirement that holds true for a significant fraction of daily work.
• Use open standards and technologies: While the previous three design principles address mainly user-related aspects, this
rule focuses on technical implementation. As history shows, data is often hidden in application-specific formats and schemas. To widely leverage data exchange between applications and systems, it is particularly important that systems provide representations of data in well-defined, open formats. In our view, the enormous success of XML4 is proof of this claim's importance, and the transition to semantically rich formats like RDF is the next logical step towards answering the open question of semantic interoperability.
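A small example of what such an open representation could look like, using the Python rdflib library: the relation between a picture and the mail it arrived with becomes an explicit, exchangeable statement rather than application-internal state. The vocabulary (the ex: properties) is invented for illustration and is not a SemDAV schema.

```python
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/work/")  # illustrative vocabulary only

g = Graph()
g.bind("ex", EX)
# The picture received by mail keeps its context as plain RDF triples,
# readable by any RDF-aware application.
g.add((EX.photo1, EX.attachedTo, EX.mail42))
g.add((EX.mail42, EX.sender, Literal("alice@example.org")))
g.add((EX.mail42, EX.subject, Literal("Project kick-off photos")))

print(g.serialize(format="turtle"))
```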
Example: Using a collaborative system like a semantic wiki (e.g., Ylvi), coworkers can share project-related information and collaboratively improve project documentation. Web browsers provide a suitable technology for unified interfaces, independent of the platform or context from which they are used. However, there will always be the need for private data and personal information that is stored only on personal devices but can nevertheless be related to shared information, for example, personal meeting minutes or mail messages. The SemDAV system will be able to integrate and relate data from various sources and can help to bridge the gap between the public and private spheres.

The SemDAV system architecture for data integration is based on the client/server paradigm, and one of its core components, the SemDAV protocol, can be considered a semantic extension of the WebDAV protocol (Goland, Whitehead, Faizi, Carter, & Jensen, 1999). This architecture can be applied to various application scenarios, since both client and server components can be executed on the same physical machine, and a SemDAV server may act as client for another server. SemDAV proposes a basic abstract atomic data item called a sile (from semantic file). A sile can be any digital object: a file, an image, a piece of music, a person, a machine, an e-mail message. A
Figure 5. SemDAV architecture
sile can be compared to a resource in RDF, to an object in object-oriented programming languages, to a file in a file system, and to any object in the physical world. Siles are self-contained data units that can be subject to attribution, semantic annotation (tagging or classification), and association with other siles. The main components that operate on siles are depicted in Figure 5.

The Semplorer user interface (1) (Schandl, Amiri, Pomajbik, & Todorov, 2007) is the central tool for data and metadata management. It provides a user interface that is oriented towards well-known file management utilities, like the Windows Explorer. Using the Semplorer, users are able to browse and search SemDAV repositories and to manipulate associated metadata in a simple way. Metadata are processed and manipulated not only by the user by means of the Semplorer, but also by applications that are aware of SemDAV features (2). Applications are the interface through which the user actually works with data; thus, it is particularly important that applications track and store metadata as early as possible, especially if no additional user input is required (Schandl & King, 2006).

Applications, as well as the Semplorer, access SemDAV repositories through the SemDAV API (3). This API abstracts from any implementation details and provides convenient methods for accessing siles and for search, retrieval, and modification of their associated metadata. Currently, an API for Java 1.5 is under development. If the client and the server components are executed on the same machine, requests issued through the SemDAV API are directly handed over to the server request handler. Otherwise, requests are translated into SemDAV protocol requests (4). This protocol is built on top of the standardized technologies HTTP (Fielding et al., 1999) for content handling and transfer, and SPARQL (Kendall, 2006) for metadata querying and processing. Thus, all features that can be applied to these technologies
(like network transport, routing, and encryption, as well as query rewriting and optimization) can be applied to SemDAV without further effort. A SemDAV server implementation (5) handles the requests and processes them either using a designated local repository (6) or by accessing external data sources using specially developed adapters (7). A designated repository combines storage capabilities for binary data (e.g., in the file system) and metadata (e.g., using an RDF triple store). Legacy system adapters, on the other hand, may be implemented in order to access, for example, mail servers, CRM databases, or Web resources.

To gain backwards compatibility with existing systems and applications, SemDAV servers will be able to handle WebDAV requests and map them to the SemDAV data model. WebDAV is an extension to HTTP that allows clients to execute authoring tasks on remote servers. WebDAV features can be mapped to common file system metaphors (see, e.g., davfs25), and WebDAV is supported by all major desktop operating systems, including Microsoft Windows, Apple Mac OS X, and a multitude of Linux derivatives. The WebDAV protocol is also implemented by a (perhaps surprising) number of commercial applications. This will allow applications (8) to transparently access data stored in the SemDAV server without any further modification. WebDAV requests (9) are interpreted by a WebDAV server emulation (10) and are mapped to operations on SemDAV repositories.
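Because SemDAV metadata querying is layered on SPARQL, retrieval requests can be phrased as ordinary SPARQL queries. The sketch below runs such a query locally with rdflib so that it is self-contained; over the wire, the query would travel inside a SemDAV protocol request on top of HTTP. The sile vocabulary and the tag property are assumptions, since the chapter does not specify the SemDAV schema.

```python
from rdflib import Graph, Literal, Namespace

S = Namespace("http://example.org/semdav/")  # hypothetical sile vocabulary

# A stand-in for a repository's metadata store.
g = Graph()
g.add((S.minutes_doc, S.tag, Literal("project-x")))
g.add((S.budget_xls, S.tag, Literal("project-y")))

# Find all siles tagged "project-x".
query = """
    PREFIX s: <http://example.org/semdav/>
    SELECT ?sile WHERE { ?sile s:tag "project-x" . }
"""
for row in g.query(query):
    print(row.sile)   # http://example.org/semdav/minutes_doc
```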
Future Research Directions

We identify three main trends that will significantly change the way knowledge workers interoperate in project-driven environments. First, the proliferation of semantics based on Semantic Web technologies will allow the development of more interoperable systems; an evolutionary process in which XML was only the first step. Second, collaborative knowledge organization
metaphors (like tagging or classification) will gain more importance, and people will be willing to participate in such systems if they can discover added value for themselves. Third, despite the blurring borderlines between local and online applications, there will always be the need for personal data; thus, applications will increasingly have to provide methods for merging remote and local resources. Using the frameworks presented in this chapter as base technology, we gave examples of an application suite that addresses these future challenges: METIS is able to semantically integrate multimedia information from various distributed sources; Ylvi provides a collaborative, Web-based interface for rapid knowledge exchange and management; and SemDAV and its client implementation are able to integrate personal and shared semantic information and provide interfaces to quickly manage, annotate, and retrieve information.

For the future knowledge worker, we envision a seamless interface that provides a single point of entry to all the user's information needs. Old, unorganised data swamps, like file systems, mail servers, and Web pages, will be wrapped and integrated into fully interconnected, searchable and browsable knowledge meshes. With nearly infinite storage capacity, users will not have to worry about archiving or deleting information—knowledge organization will be performed with very little additional effort at work time.
Acknowledgment

This work has been partially supported by the Austrian Federal Ministry of Transport, Innovation, and Technology, and the Austrian Federal Ministry of Economics and Labour. The authors also thank Bernhard Haslhofer for valuable contributions to this chapter.
References

Aberer, K., Cudre-Mauroux, P., Hauswirth, M., & van Pelt, T. (2004). GridVine: Building Internet-scale semantic overlay networks. In Proceedings of the 3rd International Semantic Web Conference (ISWC) (pp. 107-121). Hiroshima, Japan: Springer.

Anjomshoaa, A., Manh Nguyen, T., Shayeganfar, F., & Tjoa, A. (2006). Web service based business processes automation using semantic personal information management systems – the SemanticLIFE case. In Proceedings of the 6th International Conference on Practical Aspects of Knowledge Management (pp. 1-12). Vienna, Austria: Springer.

Bizer, C., & Seaborne, A. (2004). D2RQ – treating non-RDF databases as virtual RDF graphs. In Proceedings of the 3rd International Semantic Web Conference. Hiroshima, Japan: Springer.

Decker, S., Erdmann, M., Fensel, D., & Studer, R. (1998). Ontobroker: Ontology based access to distributed and semi-structured information. In DS-8: Proceedings of the IFIP TC2/WG2.6 8th Working Conference on Database Semantics – Semantic Issues in Multimedia Systems (pp. 351-369). Deventer, The Netherlands: Kluwer.

Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., et al. (1999). Hypertext transfer protocol – HTTP/1.1 (RFC 2616). The Internet Society.

Fitzgerald, M. (2003, November). Trash your desktop. MIT Technology Review, 42-46.

Géczy, P., Izumi, N., Akaho, S., & Hasida, K. (2006). Extraction and analysis of knowledge worker activities on Intranet. In Proceedings of the 6th International Conference on Practical Aspects of Knowledge Management (pp. 73-85). Vienna, Austria: Springer.

Goland, Y., Whitehead, E., Faizi, A., Carter, S.R., & Jensen, D. (1999). HTTP extensions for distributed authoring – WebDAV (RFC 2518). The Internet Society.

Haslhofer, B. (2006). A service oriented approach for integrating metadata from heterogeneous digital libraries. In Proceedings of the 1st International Workshop on Semantic Information Integration on Knowledge Discovery. Yogyakarta, Indonesia: Austrian Computer Society Book Series.

ISO. (2003). The Dublin core metadata element set. International Standard ISO 15836-2003.

Karger, D., Bakshi, K., Huynh, D., Quan, D., & Sinha, V. (2005). Haystack: A general-purpose information management tool for end users based on semistructured data. In Proceedings of the 2nd Biennial Conference on Innovative Data Systems Research (pp. 13-26). Asilomar, CA: Online Proceedings.

Kaulgud, V.S., & Dolas, R. (2006). DKOMP: A peer-to-peer platform for distributed knowledge management. In Proceedings of the 6th International Conference on Practical Aspects of Knowledge Management (pp. 119-130). Vienna, Austria: Springer.

Kendall, G.C. (Ed.). (2006). SPARQL protocol for RDF (W3C Candidate Recommendation, 6 April 2006). W3C.

King, R., Popitsch, N., & Westermann, U. (2004). METIS: A flexible database foundation for unified media management. In Proceedings of the 12th Annual ACM International Conference on Multimedia (pp. 744-745). New York, NY: ACM Press.

King, R., Popitsch, N., & Westermann, U. (2007). METIS: A flexible foundation for the unified management of multimedia assets. Multimedia Tools and Applications, 33(3), 325-349.

Lamb, R., & Davidson, E. (2005). Understanding Intranets in the context of end-user computing. ACM SIGMIS Database, 36(1), 64-85.

Le Coche, E., Mastroianni, C., Pirrò, G., Ruffolo, M., & Talia, D. (2006). A peer-to-peer virtual office for organizational knowledge management. In Proceedings of the 6th International Conference on Practical Aspects of Knowledge Management (pp. 166-177). Vienna, Austria: Springer.

Lehti, P., & Fankhauser, P. (2004). XML data integration with OWL: Experiences and challenges. In 2004 Symposium on Applications and the Internet (SAINT 2004) (pp. 160-170).

Maedche, A., Motik, B., Silva, N., & Volz, R. (2002). MAFRA – an ontology mapping framework in the Semantic Web. In Proceedings of the ECAI Workshop on Knowledge Transformation, Lyon, France.

Majchrzak, A., Wagner, C., & Yates, D. (2006). Corporate wiki users: Results of a survey. In Proceedings of the 2006 International Symposium on Wikis (pp. 99-104). Odense, Denmark: ACM Press.

Martínez, J.M., Koenen, R., & Pereira, F. (2002). MPEG-7: The generic multimedia content description standard, part 1. IEEE MultiMedia, 9(2), 78-87.

Mena, E., Illarramendi, A., Kashyap, V., & Sheth, A. (2000). OBSERVER: An approach for query processing in global information systems based on interoperation across pre-existing ontologies. Journal of Distributed and Parallel Databases, 8(2), 223-271.

Nejdl, W., Wolf, B., Changtao, Q., Decker, S., Sintek, M., et al. (2002). EDUTELLA: A P2P networking infrastructure based on RDF. In Proceedings of the 11th International World Wide Web Conference (pp. 604-615). Honolulu, HI: ACM Press.

Noy, N. (2004). Semantic integration: A survey of ontology-based approaches. SIGMOD Record, 33(4), 65-70.

Noy, N.F., Sintek, M., Decker, S., Crubezy, M., Fergerson, R.W., & Musen, M.A. (2001). Creating Semantic Web contents with Protégé-2000. IEEE Intelligent Systems, 16(2), 60-71.

Oren, E., Breslin, J.G., & Decker, S. (2006). How semantics make better wikis. In Proceedings of the 15th International World Wide Web Conference (pp. 1071-1072). Edinburgh, Scotland: ACM Press.

Popitsch, N., Schandl, B., Amiri, A., Leitich, S., & Jochum, W. (2006). Ylvi – multimedia-izing the semantic wiki. In Proceedings of the 1st Workshop on Semantic Wikis – From Wiki to Semantics. Budva, Montenegro: CEUR-WS, Vol. 206.

Priebe, T., & Pernul, G. (2003). Towards integrative enterprise knowledge portals. In Proceedings of the 12th International Conference on Information and Knowledge Management (pp. 216-223). New Orleans, LA: ACM Press.

Risse, T., Knezevic, P., Meghini, C., Hecht, R., & Basile, F. (2005, December). The BRICKS infrastructure – an overview. In Proceedings of the 75th Conference on Electronic Imaging, the Visual Arts & Beyond (EVA 2005), Moscow.

Sauermann, L. (2005). The gnowsis semantic desktop for information integration. In Proceedings of the 3rd Conference on Professional Knowledge Management. Kaiserslautern, Germany: Springer.

Sauermann, L., Bernardi, A., & Dengel, A. (2005). Overview and outlook on the semantic desktop. In Proceedings of the ISWC 2005 Workshop on the Semantic Desktop. Galway, Ireland: CEUR-WS, Vol. 175.

Schandl, B. (2006). SemDAV: A file exchange protocol for the semantic desktop. In Proceedings of the Semantic Desktop and Social Semantic Collaboration Workshop. Athens, GA: CEUR-WS, Vol. 202.

Schandl, B., Amiri, A., Pomajbik, S., & Todorov, D. (2007). Integrating file systems and the Semantic Web. Demo at the 3rd European Semantic Web Conference, Innsbruck, Austria.

Schandl, B., & King, R. (2006). The SemDAV project: Metadata management for unstructured content. In Proceedings of the 1st International Workshop on Contextualized Attention Metadata: Collecting, Managing and Exploiting of Rich Usage Information (pp. 27-32). Arlington, VA: ACM Press.

Tolksdorf, R., & Simperl, E.P.B. (2006). Towards wikis as semantic hypermedia. In Proceedings of the 2006 International Symposium on Wikis (pp. 79-88). Odense, Denmark: ACM Press.

Völkel, M., Krötzsch, M., Vrandecic, D., Haller, H., & Studer, R. (2006). Semantic wikipedia. In Proceedings of the 15th International Conference on World Wide Web (pp. 585-594). Edinburgh, Scotland: ACM Press.

Völkel, M., & Oren, E. (2006). Towards a wiki interchange format (WIF). In Proceedings of the 1st Workshop on Semantic Wikis – From Wiki to Semantics. Budva, Montenegro: CEUR-WS, Vol. 206.

Wache, H., Vögele, T., Visser, U., Stuckenschmidt, H., Schuster, G., Neumann, H., et al. (2001). Ontology-based integration of information. In Proceedings of the IJCAI-01 Workshop on Ontologies and Information Sharing (pp. 108-117).

Wen, J., Li, Q., Ma, W., & Zhang, H. (2003). A multi-paradigm querying approach for a generic multimedia database management system. ACM SIGMOD Record, 32(1), 26-34.
Endnotes

1 RDF: http://www.w3.org/RDF/
2 Flickr: http://www.flickr.com/photos/tags/
3 del.icio.us: http://del.icio.us/tag/
4 XML: http://www.w3.org/XML/
5 WebDAV Linux File System (davfs2): http://dav.sourceforge.net
Chapter XV
An Integrated Formal Approach to Semantic Work Environments Design

Hai H. Wang, University of Southampton, UK
Nicholas Gibbins, University of Southampton, UK
Jin Song Dong, National University of Singapore, Singapore
Yuan Fang Li, National University of Singapore, Singapore
Jing Sun, The University of Auckland, New Zealand
Jeff Pan, University of Aberdeen, UK
Terry R. Payne, University of Southampton, UK
Introduction

The Semantic Web (Berners-Lee, Hendler, & Lassila, 2001) has become increasingly significant as it proposes an evolution of the current World Wide Web from a web of documents to a distributed and decentralised, global knowledge base. Based on the notion of interlinked resources grounded within formally defined ontologies, it promises to be an enabling technology for the automation of many Web-based tasks, by facilitating a shared understanding of a domain through inference over shared knowledge models. Semantic Work Environment (SWE) applications use Semantic Web techniques to support the work of the user by collecting knowledge about the current needs
in a specific activity, and providing both inferred and augmented knowledge that can then be integrated into current work.

Web Services have emerged as distributed, heterogeneous software components that provide machine access to the services otherwise offered on the Web through Web pages. Built upon de facto Web standards for syntax, communication protocols, and markup languages such as XML, Web services provide a near-ubiquitous mechanism for communication between applications and agents. In addition, such services can be composed to provide additional functionality, thus facilitating the rapid construction of new services. However, the dynamic use of services is limited by the need to agree a priori upon data models and interface definitions. By
coupling Web service technology with Semantic Web technology, Semantic Web Services can partially relax these constraints, both in the dynamic use of services and in the data models shared by such services. Several examples of such services have been developed, for example, the ITTALKS services (Cost, Finin, & Joshi, 2002), which are considered in this chapter. The use of reasoning over shared domain models and the description of services and capabilities using Semantic Web Service descriptions are essential in supporting communication and collaboration between agents. Agents may agree on the same vocabulary and domain for communication, but different agents may not necessarily refer to the same concept in the same way. The Semantic Web is by definition highly distributed, and is built upon the assumption that different parties may have a different conceptualisation or representation of the same concept. Agreement on the concepts and the terms used to reference them, their relationships with other concepts, and the underlying axiomatisation of a domain model is addressed by formally defining shared ontologies. These ontologies represent different domains and, through inference, can identify the equivalence of two concepts that may have different representations. The OWL Web Ontology Language (Dean et al., 2004) is a W3C recommendation for representing ontologies based on Description Logics, and provides a basis for representing both terminologies (i.e., vocabularies, relationships, axioms, and rules) and knowledge bases. Whilst providing the mechanisms for defining domain ontologies, representational ontologies (Heijst, Schreiber, & Wielinga, 1997) are necessary for defining the structure of services themselves, and several such ontologies have been proposed (Ankolekar et al., 2002; Roman et al., 2005). These ontologies define at a meta-level what the service does, how to use it, what its outputs and effects are, and so forth, and thus can facilitate the run-time use of services (including discovery and execution) and consequently problem solving
without necessitating human intervention at that point. OWL-S (Ankolekar et al., 2002) is one such ontology (defined in OWL-DL) that describes Web Services, including models for service orchestration (through the process model) and mappings to Web Service definitions. Services found in Semantic Work Environments (SWE) can be defined using Semantic Web Service ontologies, and grounded within a specific domain by using the relevant, shared domain ontologies. The services may have intricate data states, complex process behaviours, and concurrent interactions. The design of such systems requires precise and powerful modelling techniques to capture not only the ontology domain properties but also the services' process behaviours and functionalities. It is therefore desirable to have a powerful formal notation that captures these notions, in addition to the declarative ontological representations of the domains and the services themselves, to precisely design Semantic Work Environments. Timed Communicating Object-Z (TCOZ) (Mahony & Dong, 2000) is a formal specification language which combines the strengths of Object-Z (Duke & Rose, 2000; Smith, 2000) in modelling complex data and state with the advantages of Timed CSP (Schneider & Davies, 1995) in modelling real-time concurrency. The following characteristics of many SWE applications make TCOZ a good candidate for designing such systems:

• A complex SWE application often has both intricate data state and process control aspects. An integrated formal modeling language, like TCOZ, has the capability to model such systems.
• A service-providing agent may offer several kinds of different services concurrently. TCOZ can support this through its multi-threading capabilities.
• A complex SWE system is often composed from many individual services. These services may be provided by other agents, which have their own threads of control. This can be modeled by the active-objects feature in TCOZ.
• A SWE application may include highly distributed components with various synchronous and asynchronous communication channels. These can be specified with the various TCOZ communication interfaces: namely channels, sensors, and actuators.
• Some SWE applications, such as online hospital or online banking applications, may have critical timing requirements. These real-time requirements can also be captured by TCOZ.
Thus, we propose to use TCOZ as a language to model complex SWE applications. We believe one effective way to design a complex system is to use a single expressive modeling language, like TCOZ, to present a complete, rigorous, and coherent requirements model for the complex system as a specification contract. Later on, this complex model can be projected into multiple domains so that existing specialized tools in these corresponding domains can be utilized to perform checking, analysis, and other tasks. We have developed tools to automatically map TCOZ to UML for visualization purposes (Dong, Li, Sun, Sun, & Wang, 2002), to Timed Automata for checking time-related properties, to OZ for animation (Sun, 2003), to Isabelle/HOL for complex reasoning (Sun, 2003), and so forth. We believe that TCOZ, as a high-level design technique, can contribute to SWE application development in many ways. In support of this claim, we have conducted a Semantic Web (SW) service case study, that is, the online talk discovery system, and applied TCOZ to the design stage to demonstrate how TCOZ can be used to augment and model semantic Web services. Using an expressive formal language like TCOZ can provide an unambiguous requirement for the SW service system, and the series of related supporting tools
(Dong et al., 2002; Sun, 2003; Sun, Dong, Liu, & Wang, 2001a) can ensure high quality of the design model. In addition to presenting these general advantages of using a formal language for designing systems, the chapter also presents the development of a set of systematic translation rules and tools for automatically extracting the Web ontology and generating the resulting semantic markup for SW services from the formal TCOZ design model. This is desirable added value, as designing the Web ontology and semantic markup for SW services is itself a tough task for domain engineers. Those transformation rules are nontrivial. For example, the semantics of the OWL subclass construct is different from that of the Z schema inclusion construct, or TCOZ class inheritance, and such differences must be managed appropriately. Rigorous study has been undertaken to avoid conflicts between those semantics. Our online talk discovery system is a simplified variant of the ITTALKS system (Cost et al., 2002), a deployed Semantic Web service suite that has been used extensively. The remainder of the chapter is organized as follows. The next section briefly introduces TCOZ and the SW. We then formally specify the functionalities of the SW service example (i.e., the talk discovery system), present the tool that automatically extracts the ontology used by the SW services from the TCOZ design model, and present the tool that automatically extracts the semantic markup for SW services from the TCOZ design model. We then conclude the chapter.
TCOZ and SW Services Overview

TCOZ Overview

Object-Z and CSP

Object-Z (Duke & Rose, 2000) is an extension of the Z formal specification language to accom-
modate object orientation. The main reason for this extension is to improve the clarity of large specifications through enhanced structuring. Although Object-Z has a type checker, other tool support for Object-Z is somewhat limited in comparison to Z. The essential extension to Z in Object-Z is the class construct, which groups the definition of a state schema with the definitions of its associated operations. A class is a template for objects of that class: for each such object, its states are instances of the state schema of the class and its individual state transitions conform to individual operations of the class. An object is said to be an instance of a class and to evolve according to the definitions of its class. Timed CSP (TCSP) (Schneider & Davies, 1995) extends the well-known Communicating Sequential Processes (CSP) notation with timing primitives. As indicated by its name, CSP is an event-based notation primarily aimed at describing the sequencing of behaviour within a process and the synchronization of behaviour (or communication) between processes. Timed CSP extends CSP by introducing a capability to consider the temporal aspects of sequencing and synchronization. CSP adopts a symmetric view of both process and environment, with events representing a cooperative synchronization between them. Both process and environment may control their behaviour by enabling or refusing certain events or sequences of events. The primary building blocks for Timed CSP processes are sequencing, parallel composition, and choice.
TCOZ Features

Timed Communicating Object-Z (TCOZ) (Mahony & Dong, 2000) is essentially a blending of Object-Z with Timed CSP (Schneider & Davies, 1995), for the most part preserving them as proper sublanguages of the blended notation. The essence of this blending is the identification of Object-Z operation specification schemas with terminating
CSP processes. Thus, operation schemas and CSP processes occupy the same syntactic and semantic category; operation schema expressions may appear wherever processes may appear in CSP, and CSP process definitions may appear wherever operation definitions may appear in Object-Z. The primary specification structuring device in TCOZ is the Object-Z class mechanism. We briefly consider the various aspects of TCOZ. A detailed introduction to TCOZ and its Timed CSP and Object-Z features may be found elsewhere (Mahony & Dong, 2000), where the formal semantics of TCOZ are also documented.
Interface: Channels, Sensors and Actuators

CSP channels are given an independent, first-class role in TCOZ. In order to support the role of CSP channels, the state schema convention is extended to allow the declaration of communication channels. If c is to be used as a communication channel by any of the operations of a class, it must be declared in the state schema to be of type chan. Channels are type heterogeneous and may carry communications of any type. Contrary to the conventions adopted for internal state attributes, channels are viewed as shared (i.e., global) rather than as encapsulated entities. This is an essential consequence of their role as communications interfaces between objects. Thus, the introduction of channels to TCOZ reduces the need to reference other classes in class definitions, thereby enhancing the modularity of system specifications. Complementary to the synchronizing CSP channel mechanism, TCOZ also adopts a nonsynchronizing shared variable mechanism. A declaration of the form "s: X sensor" provides a channel-like interface for using the shared variable s as an input. A declaration of the form "s: X actuator" provides a local-variable-like interface for using the shared variable s as an output. Sensors and actuators may appear either at the system boundary (usually describing how global
analogue quantities are sampled from, or generated by the digital subsystem) or else within the system (providing a convenient mechanism for describing local communications which do not require synchronization).1 The shift from closed to open systems necessitates close attention to issues of control, an area where both Z and CSP are weak (Zave & Jackson, 1997).
Active objects have their own threads of control, while passive objects are controlled by other objects in a system. In TCOZ, an identifier MAIN (nonterminating process) is used to determine the behaviours of active objects of a given class (Dong & Mahoney, 1998). The MAIN operation is optional in a class definition. It only appears in a class definition where the objects of that class are active objects. Classes for defining passive objects will not have the MAIN definition, but may contain CSP process constructors. If ob1 and ob2 are active objects of the class C, then the independent parallel composition behaviours of the two objects can be represented as ob1 ||| ob2, which means ob1.MAIN ||| ob2.MAIN.

The syntactic implication of the above approach is that the basic structure of a TCOZ document is the same as for Object-Z. A document consists of a sequence of definitions, including type and constant definitions in the usual Z style. TCOZ varies from Object-Z in the structure of class definitions, which may include CSP channel and active-object process definitions. Let us use a simple timed message queue system to illustrate the TCOZ notation. The behaviour of the timed message queue system is that it can receive a new message (of type MSG) through an input channel "in" within time duration "Tj", or remove a message and send it through an output channel "out" within time duration "T1". If there is no interaction with the environment within a certain time "To", then a message will be removed from the current list but stored in a (window-like) actuator list (lost) so that other objects (unspecified) with a sensor "lost" can read it at any time. The message queue has a FIFO property. (see Box 1)
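Box 1, which gives the TCOZ class for this message queue, is not reproduced here. As a rough executable illustration of the described behaviour only (not of TCOZ semantics: channels, deadlines, and synchronization are not modelled), the queue can be simulated as follows, with the timing parameters collapsed into a single idle timeout.

```python
from collections import deque

class TimedMessageQueue:
    """Toy simulation of the Box 1 behaviour: a FIFO queue whose head
    message is moved to the shared `lost` list after `timeout` idle
    time units without interaction."""

    def __init__(self, timeout: int):
        self.timeout = timeout
        self.items: deque = deque()
        self.lost: list = []      # actuator-like shared output
        self.idle = 0             # time units since last interaction

    def join(self, msg) -> None:  # receive on channel "in"
        self.items.append(msg)
        self.idle = 0

    def leave(self):              # send on channel "out"
        self.idle = 0
        return self.items.popleft()

    def tick(self) -> None:       # one time unit without interaction
        self.idle += 1
        if self.idle >= self.timeout and self.items:
            self.lost.append(self.items.popleft())
            self.idle = 0

q = TimedMessageQueue(timeout=3)
q.join("m1"); q.join("m2")
for _ in range(3):
    q.tick()
print(q.lost, list(q.items))      # ['m1'] ['m2']
```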
As we can see, Object-Z and TCSP complement each other not only in their expressive capabilities, but also in their underlying semantics. Object-Z is an excellent notation for modelling data and states, but difficult for modelling real-time and concurrency. TCSP is good for specifying timed processes and communication, but, like CSP, it can be cumbersome when capturing the data states of a complex system. By combining the two, TCOZ treats data and algorithmic aspects using Object-Z, whilst treating process control, timing, and communication aspects using TCSP constructs. In addition, the object-oriented flavour of TCOZ provides an ideal foundation for promoting modularity and separation of concerns in system design. With the above modelling abilities, TCOZ is potentially a good candidate for specifying composite systems in a highly constructive manner.
Semantic Web Ontology and Service Overview

As a huge, distributed information space, the World Wide Web was originally designed to seamlessly support human navigation through related, linked documents. Although this medium was originally designed to do more than simply support human-to-human communication (Berners-Lee et al., 2001), machine- or agent-mediated assistance has been hindered by the type of markup used within the documents. An emphasis by content providers on presentation and physical design has resulted in a lack of structure, both at the layout and content levels, and rendered most documents opaque to machine comprehension. Automated approaches to extract knowledge or data from such Web pages, or to simulate human browsing and activity with interactive pages, required the a priori construction of messages (using HTTP) to Web servers, and the subsequent parsing of HTML to extract the desired data. These tasks were achieved either manually by software engineers, or through identifying regularities through the
use of machine learning techniques (Lieberman, 2001) within the raw HTML. Such approaches were fragile, and could easily fail if the message format was changed (as was frequently the case) or if new or unexpected content was returned in the resulting HTML. The emergence of XML and XHTML has been significant in addressing the problem of a lack of structure within Web pages, and the evolution of Web Services as a near-ubiquitous, standard technology has greatly facilitated the automated use of services on the web. However, although XML was designed to define the syntax of a document, it says nothing about the semantics of entities within a document, and consequently does not assist in the interpretation or comprehension of messages or exchange sequences (Bussler, 2001). The Semantic Web provides a mechanism for including semantics into XML-based structured documents by building upon various notions underlying knowledge-based systems: namely the use of a modeling paradigm to model a domain; a representation language to support the sharing of facts (i.e., instances), and models (i.e., ontologies) between agents and applications, and reasoning mechanisms to facilitate the inference of any facts entailed by the model. Ontologies are an explicit, formal specification of a shared conceptualisation of a domain (Studer, Benjamins, & Fensel, 1998): they provide a machine readable and agreed upon representation of an abstraction of some phenomenon. This typically involves defining the concepts within the domain being modeled, their properties and relationships with other concepts. In some cases, ontologies may be complemented by axioms, statements that are always true and that are used to constrain the meaning of concept definitions in the ontologies. Often the declarative definitions are not sufficient to constrain completely the meaning of concepts and to capture the “procedural” or decision making aspects of the application business logic. Therefore ontologies may need to be
complemented by rules, grounded in ontological definitions, to facilitate enhanced representation and reasoning capabilities. Whilst a variety of knowledge representations have been proposed in the past, basing a representation on an XML syntax facilitates the use of Web-based machinery for a variety of tasks, including indexing statements, concepts, and properties (URIs), defining vocabularies (RDF Schema) (Brickley & Guha, 2004), and representing statements about objects (through RDF's subject-predicate-object model). The Web Ontology Language, OWL, is an XML-based language for representing Description Logic terminologies (i.e., vocabularies, relationships, axioms, and rules) and knowledge bases. Whilst many logics may be expressively rich and can support sophisticated knowledge representations, such logics are mostly computationally intractable. A significant advantage of utilizing Description Logics (DLs) for expressing knowledge is that the tractability of different DL subsets (with varying expressivity) is better understood, and in some cases formally proved. Whilst limiting the expressivity of a language may reduce the scope of knowledge that can be represented, ontologies have been developed and successfully deployed for pragmatic use in domains such as medical research and e-science. Several highly optimised reasoning mechanisms have also been developed that support variants of DLs that, whilst theoretically intractable, are pragmatically tractable for most real-world problems. Three increasingly expressive sublanguages of OWL have been defined to date: OWL-Lite, OWL-DL, and OWL-Full. OWL-Lite offers limited expressivity in order to facilitate rapid uptake and to support the definition of simple taxonomic structures with limited axioms; whereas OWL-DL has been designed with the intent to support DL representations of the application business logic, and to provide a language subset that has desirable computational properties for reasoning systems. Finally, OWL-Full encompasses maximal
expressivity, and as a result is not decidable and makes no computational guarantees. The knowledge entailed by such ontologies may be inferred through a variety of different reasoning mechanisms.

Semantic Web Services provide a declarative, ontological framework for describing services, messages, and concepts in a machine-readable format that can also facilitate logical reasoning. Thus, service descriptions can be interpreted based on their meanings, rather than simply on a symbolic representation. Provided that there is support for reasoning over a Semantic Web service description (i.e., the ontologies used to ground the service concepts are identified, or, if multiple ontologies are involved, alignments between the ontologies exist that facilitate the transformation of concepts from one ontology to the other), workflows and service compositions can be constructed based on the semantic similarity of the concepts used. OWL-S is an OWL-based Web service ontology, which supplies Web service providers with a core set of markup language constructs for describing the properties and capabilities of their Web services in unambiguous, computer-interpretable form. OWL-S is intended to enable automatic Web service discovery, automatic Web service invocation, and automatic Web service composition and interoperation. OWL-S consists of three essential types of knowledge about a service: the profile, the process model, and the grounding. The OWL-S profile describes what the service does. The OWL-S process model tells how the service works. The OWL-S grounding tells how the service is used. The OWL-S process model is intended to provide a basis for specifying the behaviors of a wide array of services. There are two chief components of an OWL-S process model—the process and the process control model. The process describes a Web Service in terms of its inputs, outputs, preconditions, and effects and, where appropriate, its component subprocesses. The process model enables planning, composition, and
agent/service interoperation. The process control model, which describes the control flow of a composite process and shows which of the various inputs of the composite process are accepted by which of its subprocesses, allows agents to monitor the execution of a service request. The constructs to specify the control flow within a process model include Sequence, Split, Split+Join, If-Then-Else, Repeat-While, and Repeat-Until.
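As a flavor of how these constructs appear in a concrete description, here is a minimal sketch modeled on the OWL-S surface syntax snippets shown later in this chapter (the process and message names are invented; the actual grammar is defined by the OWL-S specification):

define composite process CheckAndNotify(
  inputs: (request - Request)
  result: NotificationSent )
  {perform CheckAvailability (… …) ;  // Sequence: run the first subprocess,
   perform SendNotification (… …) }   // then the second

Here the semicolon plays the role of the Sequence construct, as in the PIDManager_New example later in this chapter; Split, If-Then-Else, and the other constructs compose subprocesses analogously.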
The Talk Discovery System

In this section, an online talk discovery system is used as an example to demonstrate how the TCOZ notation can be applied to Semantic Web service development.

System Scenario

The talk discovery system is a Web portal offering access to information about talks (i.e., presentations) and seminars. This Web portal can provide not only talk information matching the user's profile in terms of his interest and location constraints, but can also further filter the related talks based on information about the user's personal schedule, and so forth.
In the course of operation, the talk discovery system discovers that there is an upcoming talk that may be of interest to a registered user, based on the information in the user's preferences, which have been obtained from his online, OWL-encoded profile. Upon receiving this information, the user's User Agent gathers further information: it consults with its Calendar agent to determine the user's availability, and with the MapQuest agent to find the distance from the user's office to the talk's venue. Finally, after evaluating the information and making the decision, the User Agent will send a notification back to the talk discovery agent indicating that the user will (or will not) plan to attend.

Formal Model of the Talk Discovery System

The system involves four different intelligent agents which communicate interactively. They are the user's Calendar agent, the MapQuest agent, the user's Personal agent, and the Talk Discovery agent.

Calendar Agent

First, the DATE and TIME sets are defined by Z given type definitions. As this chapter focuses only on demonstrating the approach, the model we present here has been kept simple. The Z given type is chosen to define different concepts, including TIME and DATE. These concepts can be subdivided into detailed components; for example, TIME comprises hour, minute, and second. The more detailed the model, the more detailed the ontology will be when it is derived automatically by our tool (this tool will be discussed further in later sections). The DateTime concept is defined as a schema with two attributes, date and time.
The Calendar agent maintains a schedule for each eligible user and supplies related services. Each eligible user must have a personal ID [PID] registered. This id is used to validate the identity of users when the system receives requests. The Calendar agent has an ID manager which provides functions for certifying identity. It may use Web security techniques, such as digital signatures, to ensure the service is only available to valid users. The following schema specifies the ID manager. The attribute ids denotes the set of customers' IDs registered to the system. The PIDManager can receive a new id through an input channel add (New) and add it to its database, remove an id from its database (Delete), or check whether a customer has registered (Validate). (see Box 2)

Box 2.

The status "Status ::= FREE | BUSY" defined by the Z free type definition indicates whether a person is free or busy. Update, defined in Calendar, is used to update the timetable, and it must complete its task within 1 second. This real-time property is captured by TCOZ's DEADLINE operator. The operation Check_Status is used to check whether a person is available or not for a particular time slot. (see Box 3)

Box 3.

MapQuest Agent

The MapQuest agent is an agent supplying the service for calculating the distance between two places. First, PLACE is defined as a Z given type [PLACE]. The MapQuest agent contains a set of places in its domain and a database storing the distance between any two places.
Personal Agent

The personal agent maintains the user's profile, including the user's name, office location, interests, and so forth (modeled as [NAME, SUBJECT, …]). After receiving a potentially interesting talk notification from the talk discovery agent (defined later), the personal agent uses the operation Check to communicate with his calendar agent to check whether the user is free or not, and with the MapQuest agent to ensure that the talk will be held nearby. In our system we assume that a user only wants to attend talks located within five miles of his office. If the user can attend the talk, the personal agent will inform the discovery agent and contact the calendar agent to update the user's timetable. (see Box 4)

Box 4.

Talk Discovery Agent

Schema Talk is defined for a general talk type. The relation interested_subjects records the subjects of interest for the users. The talk discovery system senses market updates, finding new talk information. Once a new talk is found, it sends a notification to all the users who may be interested.

Extracting OWL Web Ontology from the TCOZ Model

It is important to have a thoroughly designed ontology, since it will be shared by different agents and it forms the foundation of all agents' services. However, designing a clear and consistent ontology is not a trivial job. It is useful to have some tool support in designing the ontology. (see Box 5)
Box 5.
In this section, we demonstrate the development of an XSL application that automatically extracts the ontology-related domain properties from the static aspects of TCOZ formal models, encoded in ZML format (Sun, Dong, Liu, & Wang, 2001b). The ontology for the system can be resolved readily from the static parts of the TCOZ design documents. In the next section, we will demonstrate tools to automatically extract the semantic markup for services from the dynamic aspects of TCOZ formal models.
ZML is an XML environment for the Z family notations (Z/Object-Z/TCOZ). It encodes the Z family of documents in XML, so that the formal model can be easily browsed in a Web browser (e.g., Internet Explorer). The eXtensible Stylesheet Language (XSL) is a stylesheet language for describing rules for matching and translating XML documents. In our case we translate the ZML to OWL and OWL-S. The main process and techniques for the translation are depicted in Figure 1.

Figure 1. TCOZ OWL/OWL-S projection

A set of translation rules translating from the TCOZ model (in ZML) to the OWL ontology is developed in the following presentation.
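To give a flavor of what such an XSL translation rule looks like in code, here is a minimal, hypothetical XSLT template for the given-type rule presented below. The zml:givenType element and its name attribute are our assumptions, as the actual ZML schema is not reproduced in this chapter:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:zml="http://example.org/zml">
  <!-- Hypothetical sketch: emit an OWL class for every ZML given-type declaration. -->
  <xsl:template match="zml:givenType">
    <owl:Class rdf:ID="{@name}"/>
  </xsl:template>
</xsl:stylesheet>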
Given Type Translation
The given types in the TCOZ model are directly translated into OWL/RDFS classes. This rule is applicable to given types defined both inside and outside of a class definition: a given type [T] is mapped to an OWL class T. For example, the given type PID can be translated into a class in OWL with PID as its ID:

Class(PID partial)

Axiomatic (Function and Relation) Definition Translation

The translation from functions and relations in TCOZ to the OWL ontology requires several cases. A relation R between B and C will be translated into an OWL property with B as the domain class and C as the range class. For total functions we translate it into an owl:FunctionalProperty. In our talk discovery example, the relation interested_subjects can be translated into OWL as:

ObjectProperty(interested_subjects domain(person) range(subject))

Axiomatic (Subset and Constant) Definition Translation

Subset: In this situation (a declaration of the form M ⊆ N), if N corresponds to an OWL class, then M will be translated into an OWL subclass of N. If N corresponds to an OWL property, then M will be translated into an OWL subproperty of N.
Constant: In this situation (a declaration of the form X : Y), X will be translated into an instance of Y.
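To make these two rules concrete, consider two invented declarations in the same OWL abstract syntax used throughout this section (STAFF ⊆ PID and noon : TIME are illustrative only; they are not part of the chapter's model). The subset declaration yields a subclass axiom, and the constant declaration yields an instance assertion:

Class(STAFF partial PID)
Individual(noon type(TIME))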
Z State Schema Translation

A Z state schema can be translated into an OWL class. Its attributes are translated into OWL properties with the schema name as the domain OWL class and the Z type declaration as the range OWL class. In order to resolve the name conflict between the same attribute names used in different schemas, we use the schema name appended with the attribute name as the ID for the OWL property.
For example, the Talk schema defined in a previous section can be translated to OWL as:

Class(talk partial)
Class(place partial)
ObjectProperty(talk_place Functional domain(talk) range(place))
ObjectProperty(talk_subject domain(talk) range(subject))

Class Translation

An Object-Z class can be translated into an OWL class. Its attributes defined in the state schema are translated into OWL properties with the class name as the domain OWL class and the type declaration as the range OWL class. Other translation details are similar to the Z state schema translation defined above.
For example, the Person class defined in a previous section can be translated to OWL as:

Class(person partial)
ObjectProperty(person_id Functional domain(person) range(PID))

The predicates defined in an Object-Z class invariant can be translated into an OWL class restriction. For example, suppose that we add the predicate '# interests ≤ 10' in the class Person to denote that a person can register at most 10 interested subjects in the system. This will be translated to:

Class(person partial restriction(interests maxCardinality(10)))

Note that, as OWL is less expressive than TCOZ, not all the predicates defined in TCOZ can be translated. Other translation rules are omitted, as the aim of this chapter is to demonstrate the approach rather than to provide the complete XSL program design.
The translation between TCOZ and OWL is not trivial. Rigorous study has been made to avoid conflicts between the two semantics. For example, schema inclusion and class inheritance do not correspond to the OWL subclass relationship, even though they appear to. The reason is that, based on the semantics of the Z schema and the Object-Z class, an Object-Z class and its subclass have disjoint instance sets:

∀ c1, c2 : Class • c2 inherit c1 ⇒ c2 ∩ c1 = ∅

This is totally different from the OWL subclass relationship, where every instance of an OWL class is also an instance of its superclass. Earlier work (Dong, Lee, Li, & Wang, 2004) gives a Z semantics for DL, and Zucanu, Li, and Dong (2006) show the consistency between the Z semantics for the Semantic Web languages and the original OWL semantics.
Extracting OWL-S Ontology from the TCOZ Model

In the previous two sections we demonstrated how TCOZ can be used to capture the requirements of Semantic Web applications and how to project TCOZ models to OWL ontologies automatically. An OWL ontology is used to define the common understanding of certain concepts. The dynamic aspects of Semantic Web services, which define what the service does and how it behaves, are also crucial. OWL-S has recently emerged to define such information for Semantic Web services. Extracting the semantic markup information (i.e., OWL-S) for a Semantic Web service from the formal requirements model is another important piece of research. In this section, we will demonstrate the development of another XSL program to automatically extract OWL-S information from TCOZ formal models. The semantic markup for the system can likewise be resolved from the TCOZ design documents.
Translation Rules

A set of translation rules translating from the TCOZ model to OWL-S semantic markup for Semantic Web services is developed in the following:
Basic Rule 1 (R1): Each operation in TCOZ is modeled as a process (AtomicProcess or CompositeProcess) in OWL-S. In TCOZ, operations are discrete processes which specify the computation behaviors and interaction behaviors. From a dynamic view, the state of an object is subject to change from time to time according to its interaction behaviors, which are defined by operation definitions. At the same time, the service process allows one to effect some action or change in the world. The connection between operations in TCOZ and service processes in Semantic Web services is thus obvious. In order to resolve the name conflict between the same operation names used in different classes, we use the class name appended with the operation name as the ID for the process.

Basic Rule 2 (R2): In the case that an operation invokes no other operations, the operation is translated as an AtomicProcess. A precondition appearing in a TCOZ operation schema definition is modeled as a precondition in the respective service process. A postcondition appearing in a TCOZ operation schema definition is modeled as an effect in the respective service process. The transformation is shown in Box 6.

Box 6.

Basic Rule 3 (R3): An input appearing in a TCOZ operation schema definition is modeled as an input in the respective service process. An output appearing in a TCOZ operation schema definition is modeled as an output in the respective service process.

Basic Rule 4 (R4): In the case that an operation calls other operations, the operation is translated as a composite process.

Basic Rule 5 (R5): Communication in TCOZ is modelled as an atomic process with input or output. In OWL-S, atomic processes, in addition to specifying the basic actions from which larger processes are composed, can also be thought of as the communication primitives of an (abstract) process specification.

Basic Rule 6 (R6): Each TCOZ process primitive will be translated into the proper OWL-S composite process. For example, two such rules show how the translation is done for the sequential and parallel processes in TCOZ; other translation rules for process primitives are omitted due to limited space.

Basic Rule 7 (R7): The guards in a TCOZ model are used to control the input of an operation. The guards are modelled as preconditions. Other translation rules are omitted, as the aim of this chapter is to demonstrate the approach rather than to provide the complete XSL program design.

Case Study

The PIDManager class defined for the Calendar agent will be used to demonstrate the translation. The PIDManager class has five operations: AddPID, RemovePID, New, Delete, and Validate. Each of them will be translated into an OWL-S process. (see Box 7)

Box 7.
The operation AddPID invokes no other operations, so it will be translated as an AtomicProcess (R2) with some standard header information. The following shows part of the semantic markup for the service AddPID in OWL-S surface syntax2. The OWL-S code in RDF format is omitted here.

define atomic process PIDManager_AddPID(
  inputs: (PIDManager_AddPID_id - PID)
  result: PIDManager_AddPID_Effect ))

AddPID has one input id? declared to be of type PID. It will be translated into the input (PIDManager_AddPID_id) in OWL-S (R3). The operation AddPID has one predicate 'ids' = ids ∪ {id?}' (PIDManager_AddPID_Effect), which is a postcondition. It will be translated into the effect (PIDManager_AddPID_Effect) in OWL-S (R2). The operation RemovePID can be translated similarly.
The operation New calls the other operation AddPID, so it is translated as a composite process (R4). It performs two subprocesses, PIDManager_AddPID_add_id_in and PIDManager_AddPID, in sequence. The PIDManager_AddPID_add_id_in process represents the communication on channel add (R5). The guard of the operation is translated as the precondition (IDnotInIDS) (R7).
The following shows part of the semantic markup OWL-S for the operation New. The operations Delete and Validate can be translated similarly.

define atomic process add_id_in(
  inputs: (add_id - PID)
  precondition: IDnotInIDS …))

define composite process PIDManager_New(
  input: … …; result: … …)
  {perform add_id_in (… …) ; // sequence
   PIDManager_AddPID (… …) }
Conclusion

In this chapter, we have demonstrated that TCOZ can be used as a high-level design language for modeling Semantic Web service ontologies and functionalities. Another major contribution of this chapter is the development of systematic transformation rules and tools, which can automatically map TCOZ models to OWL ontologies and OWL-S service descriptions. Other work (Dong et al., 2004) has recently investigated the development of a Z semantics for the ontology language OWL-DL, and the automatic transformation of OWL and RDF ontologies into Z specifications. This allows us to use Z tools, such as Z/EVES, to provide a checking and verification environment for Web ontologies. That work contrasts with the work presented in this chapter, where we have investigated how RDF and OWL can be used to build a Semantic Web environment for supporting, extending, and integrating various formal specification languages (Dong, Sun, & Wang, 2002a). One additional benefit is that RDF query techniques can facilitate formal specification comprehension. To summarize, we have demonstrated a clear synergy between the Semantic Web and Formal Methods, showing how each can greatly support the other; this has been demonstrated through the ITTalks use case.
Future Research Directions

The rules and tool presented in this chapter allow us to build domain ontologies and service markup more easily. Apart from forming the initial model for the OWL ontology and the OWL-S service description, the TCOZ formal design model can provide an unambiguous requirement for the Semantic Web service system, and the series of related supporting tools (Dong, Sun, & Wang, 2002b; Sun, 2003; Sun et al., 2001a; Sun et al., 2001b) can ensure the
high quality of the design model. Furthermore, the formal model may also be used in a way that leads towards a suitable Web service implementation. The automatic generation of code from formal specifications is a popular research area that has received considerable attention, and many tools and systems already exist (Woodcock & Davies, 1996). However, the refinement from formal models to Web-service-specific implementations is a relatively new research area. The details of the refinement calculus are beyond the scope of this chapter and will be addressed in a separate paper.
References

Ankolekar, A., Burstein, M., Hobbs, J., Lassila, O., McDermott, D., Martin, D., et al. (2002). DAML-S: Web service description for the Semantic Web. In First International Semantic Web Conference (ISWC) Proceedings (pp. 348-363).

Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The Semantic Web. Scientific American.

Brickley, D., & Guha, R. V. (Eds.). (2004). RDF vocabulary description language 1.0: RDF schema. Retrieved March 13, 2008, from http://www.w3.org/TR/rdf-schema/

Bussler, C. (2001). B2B protocol standards and their role in semantic B2B integration engines. Bulletin of the Technical Committee on Data Engineering, 24(1), 67-72.

Cost, R., Finin, T., & Joshi, A. (2002). ITTalks: A case study in the Semantic Web and DAML. In Proceedings of the International Semantic Web Working Symposium (pp. 40-47).

Dean, M., Connolly, D., Harmelen, F., Hendler, J., Horrocks, I., McGuinness, D., et al. (Eds.). (2004). OWL Web ontology language reference. Retrieved March 13, 2008, from http://www.w3.org/TR/owl-ref/

Dong, J. S., Lee, C. H., Li, Y. F., & Wang, H. (2004). Verifying DAML+OIL and beyond in Z/EVES. In The 26th International Conference on Software Engineering (ICSE'04). IEEE Press.

Dong, J. S., & Mahony, B. (1998). Active objects in TCOZ. In The 2nd IEEE International Conference on Formal Engineering Methods (ICFEM'98) (pp. 16-25). IEEE Press.

Dong, J. S., Sun, J., & Wang, H. (2002a). Semantic Web for extending and linking formalisms. In L. H. Eriksson & P. A. Lindsay (Eds.), Proceedings of Formal Methods Europe: FME'02 (pp. 587-606). Copenhagen, Denmark: Springer-Verlag.

Dong, J. S., Sun, J., & Wang, H. (2002b). Z approach to Semantic Web services. In International Conference on Formal Engineering Methods (ICFEM'02), Shanghai, China. LNCS, Springer-Verlag.

Dong, J. S., Li, Y. F., Sun, J., Sun, J., & Wang, H. (2002). XML-based static type checking and dynamic visualization for TCOZ. In 4th International Conference on Formal Engineering Methods (pp. 311-322). Springer-Verlag.

Duke, R., & Rose, G. (2000). Formal object oriented specification using Object-Z. Cornerstones of Computing. Macmillan.

Finin, T., Fritzson, R., McKay, D., & McEntire, R. (1994). KQML as an agent communication language. In N. Adam, B. Bhargava, & Y. Yesha (Eds.), Proceedings of the 3rd International Conference on Information and Knowledge Management (CIKM'94) (pp. 456-463). Gaithersburg, MD: ACM Press.

Ghallab, M., et al. (1998). PDDL: The planning domain definition language V. 2 (Tech. Rep. TR-98-003/DCS TR-1165). Yale Center for Computational Vision and Control.

Heijst, G. V., Schreiber, A. T., & Wielinga, B. J. (1997). Using explicit ontologies in KBS development. International Journal of Human-Computer Studies, 46, 183-292.
Lassila, O., & Swick, R. R. (Eds.). (2004). RDF/XML syntax specification (Rev.). Retrieved March 13, 2008, from http://www.w3.org/TR/rdf-syntax-grammar/

Levesque, H. J., Reiter, R., Lesperance, Y., Lin, F., & Scherl, R. B. (1997). GOLOG: A logic programming language for dynamic domains. Journal of Logic Programming, 31(1-3), 59-83.

Lieberman, H., Nardi, B. A., & Wright, D. J. (2001). Training agents to recognize text by example. Autonomous Agents and Multi-Agent Systems, 4(1-2), 79-92.

Mahony, B., & Dong, J. S. (1999). Sensors and actuators in TCOZ. In J. Wing, J. Woodcock, & J. Davies (Eds.), FM'99: World Congress on Formal Methods (LNCS, pp. 1166-1185). Toulouse, France: Springer-Verlag.

Mahony, B., & Dong, J. S. (2000). Timed communicating Object Z. IEEE Transactions on Software Engineering, 26(2), 150-177.

Mahony, B., & Dong, J. S. (2002). Deep semantic links of TCSP and Object-Z: TCOZ approach. Formal Aspects of Computing, 13, 142-160.

Martin, D., Cheyer, A., & Moran, D. (1999). The open agent architecture: A framework for building distributed software systems. Applied Artificial Intelligence, 13(1/2), 91-128.

Meseguer, J. (1992). Conditional rewriting logic as a unified model of concurrency. Theoretical Computer Science, 96(1), 73-155.

Milner, R. (1999). Communicating and mobile systems: The pi-calculus. Cambridge University Press.

Narayanan, S. (1999). Reasoning about actions in narrative understanding. In International Joint Conference on Artificial Intelligence (IJCAI'1999) (pp. 350-357). Morgan Kaufmann Press.

Roman, D., Keller, U., Lausen, H., Bruijn, J. D., Lara, R., Stollberg, M., et al. (2005). Web services modeling ontology. Journal of Applied Ontology, 39(1), 77-106.

Schlenoff, C., et al. (2000). The process specification language (PSL): Overview and version 1.0 specification (Tech. Rep. NISTIR 6459). Gaithersburg, MD: National Institute of Standards and Technology.

Schneider, S., & Davies, J. (1995). A brief history of timed CSP. Theoretical Computer Science, 138.

Smith, G. (2000). The Object-Z specification language. Advances in Formal Methods. Kluwer Academic Publishers.

Studer, R., Benjamins, V., & Fensel, D. (1998). Knowledge engineering, principles and methods. Data and Knowledge Engineering, 25(1-2), 161-197.

Sun, J. (2003). Tools and verification techniques for integrated formal methods. Ph.D. thesis, National University of Singapore.

Sun, J., Dong, J. S., Liu, J., & Wang, H. (2001a). An XML/XSL approach to visualize and animate TCOZ. In J. He, Y. Li, & G. Lowe (Eds.), The 8th Asia-Pacific Software Engineering Conference (APSEC'01) (pp. 453-460). IEEE Press.

Sun, J., Dong, J. S., Liu, J., & Wang, H. (2001b). Object-Z Web environment and projections to UML. In WWW-10: 10th International World Wide Web Conference (pp. 725-734). ACM Press.

Woodcock, J., & Davies, J. (1996). Using Z: Specification, refinement, and proof. Prentice-Hall International.

Zave, P., & Jackson, M. (1997). Four dark corners of requirements engineering. ACM Transactions on Software Engineering and Methodology, 6(1), 1-30.

Zucanu, D., Li, Y. F., & Dong, J. S. (2006). Semantic Web languages - towards an institutional perspective. In Futatsugi, Jouannaud, & Meseguer (Eds.), Algebra, meaning, and computation: A
festschrift in honor of Prof. Joseph Goguen. Springer-Verlag (LNCS 4060, pp. 99-123).
Additional Reading

Readers who are interested in more detailed information about the formal notations used in this chapter will find that there is a wide selection of resources available. The following publications are good, basic texts:

Davies, J. (1993). Specification and proof in real-time CSP. Cambridge University Press.

Duke, R., & Rose, G. (2000, March). Formal object oriented specification using Object-Z. In R. Bird & C. A. R. Hoare (Eds.), Cornerstones of computing series. Macmillan Press.

Hoare, C. A. R. (1985). Communicating sequential processes. Prentice-Hall International.

Woodcock, J., & Davies, J. (1996). Using Z: Specification, refinement, and proof. Prentice-Hall.

Endnotes

1. Mahony presented a detailed discussion of TCOZ sensors and actuators in (Mahony & Dong, 1999).
2. http://www.daml.org/services/owl-s/1.2/owl-s-gram/owl-s-gram-htm.html
Chapter XVI
Lightweight Data Modeling in RDF

Axel Rauschmayer, University of Munich, Germany
Malte Kiesel, DFKI, Germany
Introduction

When looking at what "information" means in the context of the Semantic Web, there is an interesting dichotomy (Spyns, Meersman, & Jarrar, 2002; Motik, Horrocks, & Sattler, 2007): On one hand, there is ontology engineering, where knowledge is being universally represented and reasoned about. On the other hand, there is data modeling, where Semantic Web technology is used to store and process information that is mainly application-dependent. The former receives most of the attention of the research community, but one must not forget that, even in semantic work environments, one frequently encounters the latter: citations, contacts, product information, events, and so forth. In this chapter, we present the editing meta-model (EMM), which provides standards and techniques for implementing RDF editing: It
defines an RDF vocabulary for editing and clearly specifies the semantics of this vocabulary. It also sketches user interface mechanisms to illustrate how the vocabulary would be used in practice.
RDF has many existing standards that are more or less related to editing and presentation:

1. Schema: The RDF Schema Language RDFS (Brickley & Guha, 2004) and the Web Ontology Language OWL (Bechhofer et al., 2004) are schema1 languages for describing the structure of data.
2. Presentation: The Fresnel Display Vocabulary (Bizer, Pietriga, Karger, & Lee, 2006) helps with presenting the data. It declaratively specifies how RDF data should be formatted and laid out. This means that you can package both the data and instructions on how to display it in the same RDF graph.
3. Reversible embedding for publication: RDFa (Adida & Birbeck, 2006) extends XHTML so that RDF data can be embedded inside it. This means that the process of merging semi-structured data (XHTML) and structured data (RDF) for publication becomes reversible; tools become feasible that, when given a Web address, can extract the data embedded in it, in a clearly defined, unambiguous way.
4. Querying: The SPARQL query language (Prud'hommeaux & Seaborne, 2005) aids in flexible data retrieval and is used in advanced Fresnel applications.
5. Computed data: Not all information has to be explicitly stated; some of it can be derived from existing data. OWL inferencing is one mechanism for such derivation. More powerful, rule-based approaches are currently being standardized ("The Rule Markup Initiative," 2007) for RDF.
EMM focuses on (1), (2), and (3). It is a reflection on how they can be used and extended to better support data modeling. Both OWL and Fresnel are lacking for this task: OWL is too complex for basic data editing, as stated by Hendler, who advocates a simplified version of OWL called "OWL Mini" (Hendler, 2006), which is currently work in progress. By its very nature, OWL does not support editing-specific views on RDF data: For a class of resources, such a view would specify what properties to edit (while ignoring the rest), how to display them, and in what order. Fresnel has been created to complement OWL in this regard and calls editing-specific views lenses. But Fresnel is only concerned with displaying data and lacks several crucial features for editing. The EMM is a combination of subsets and extensions of existing standards and is split into three parts:
• Schema: We specify a simple type system that is loosely based on OWL and provides everything that is needed for most data editing applications. We also introduce some constructs that OWL is missing.
• Presentation: We restrict and extend Fresnel to suit our purposes. Some parts of Fresnel are too advanced for the current iteration of the EMM; some things are naturally missing, as Fresnel was never intended to support editing.
• Editing: Data structures and algorithms that are necessary during the actual editing of data. We never edit the data directly, but change it in a three-step process: First, a lens is used to create a projection of the data to be edited. This projection is a tree-structured view on the data that reflects the definitions of the lens. Second, the user changes the data, rearranging the nodes of the projection as the user pleases. Third, the projection is applied to the data: The changes encoded inside the projection are committed to the data.
In order to better illustrate how the EMM could be implemented, we also show what a graphical user interface for the EMM looks like. For this chapter, we assume familiarity with RDF. Knowledge about OWL and Fresnel is helpful, but we try to explain most of the basic ideas.
Overview of the Editing Meta-Model (EMM)

In this section, we explain some of the conventions we adhere to and give a brief overview of the EMM.
Conventions Used in this Chapter

The following markers are used to indicate how we extend or restrict existing standards:

(Ignored) A feature of a standard is ignored, because it is not (nor will it ever be) useful for editing.
(Future) We postpone implementing a feature, but it is likely that it is useful for editing.
(Extension) A standard lacks a feature that we need. We thus have to define it ourselves.
Building Blocks for Data Modeling in RDF
We try to shield the user from some of the complexities of RDF and never edit RDF data directly, but through an intermediate tree-structured view, a so-called projection. A projection is built from a few basic pieces; it has its own data-building vocabulary, if you will. This vocabulary is reminiscent of object-oriented data modeling and knows of the following (partially overlapping) categories of RDF nodes:

• Records: Resources whose properties are seen as a set of (key, value) pairs.
• Assemblies: A sequence of RDF nodes, encoded as either an RDF collection or an RDF container.
• Atomic data: The leaves in the data tree. Literals are obviously atomic, but even with resources we are not always interested in their properties, but rather in their nodes. An example is a URI node that is used as a Web site address.
• Compound data: All data that is not atomic (i.e., records and assemblies) is called compound.
• Enumerations: Enumerations are classes that explicitly enumerate all of their instances. OWL distinguishes between enumerations of literals ("enumerated datatypes") and enumerations of resources ("enumerated classes"); we seldom make that distinction. Enumerations are always atomic data.
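To make these categories concrete, here is a small hedged Turtle sketch (the example.org vocabulary and data are invented for illustration):

@prefix ex: <http://example.org/data#> .

# A record: a resource whose properties read as (key, value) pairs.
ex:talk42 ex:title    "Semantic Work Environments" ;    # atomic data: a literal leaf
          ex:homepage <http://example.org/talks/42> ;   # atomic data: a URI node used as a Web address
          ex:speakers ( ex:alice ex:bob ) .             # an assembly, encoded as an RDF collection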
Who Uses the EMM and How?

In this section, we look at the human stakeholders involved in editing. All of them come in contact with editing profiles. An editing profile is a collection of meta-data (such as OWL classes or Fresnel lenses) that supports editing a certain kind of data. The main roles in editing are:

• EMM implementor: Whoever develops an RDF editor based on the EMM.
• Profile author: Creates an editing profile and is expected to know at least some RDF. This activity is called meta-editing.
• End user: Uses an editing profile and is not necessarily familiar with RDF. This activity is called data editing.

The profile author should have as much control of the editing process as possible. Thus, the end user sees as little or as much of the innards of RDF as the profile author decides.
The Main EMM Constructs

The EMM has three layers: schema, presentation, and projection (Figure 1). The schema defines the structure of the data. The presentation both selects what data to display (it acts much like a database view this way) and how to display it (style information, if you will): The lens is the view, the format contains the style information, and groups are a package mechanism for both lenses and formats. The editing layer uses projections to encode, visualize, and apply changes to RDF data. Note that the projection is purely a programming language data structure, while all other constructs are defined in RDF.
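As an illustration of the presentation layer, here is a hedged Turtle sketch of a lens and its group (the fresnel: terms come from the Fresnel vocabulary; the example.org names are invented):

@prefix fresnel: <http://www.w3.org/2004/09/fresnel#> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix ex:      <http://example.org/profile#> .

# A lens selecting which properties of a foaf:Person to show, and in what order;
# the group packages lenses (and formats) into a profile.
ex:personLens a fresnel:Lens ;
    fresnel:classLensDomain foaf:Person ;
    fresnel:showProperties ( foaf:name foaf:mbox foaf:homepage ) ;
    fresnel:group ex:personGroup .

ex:personGroup a fresnel:Group .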
Figure 1. The EMM is partitioned into three layers: schema, presentation, and editing. Except for the projection (which is purely a programming language data structure), each of the constructs in the layers is defined by an RDF resource

The User Interface: What Will It All Look Like?

In this section, we want to give a feeling for what the "finished product" should look like. There are
established standards for editing record-based data, as embodied by database programs such as Microsoft Access or Filemaker. We want RDF editing to feel much the same way. We distinguish two kinds of editing: Meta-editing is performed by profile authors and requires at least limited knowledge of RDF. Normal editing is performed by the end user and should hide RDF peculiarities as much as possible. We will first take a look at normal editing.
Normal Editing

The following scenarios require user interface answers:
• Creating new instances: How does the user create new data?
• Editing an instance: How does the user edit newly created or existing data?
• Batch editing: We want to give the user the opportunity to edit not only single instances, but also sets of instances, for example, if the user wants to set a property of several instances to the same value.
Creating New Instances

When the user decides to create a new instance, the user will usually know what kind of instance it should be. Thus we present the user with a list of classes which comprises all instances of rdfs:Class (Figure 2). Obviously, it is the responsibility of the profile author to create these beforehand. We also give the opportunity to create untyped resources or to enter a type manually. In either case, the anchor of the instance will be a blank node, but the user can rename this blank node to a URI later. Note that we currently do not support a new instance being a member of multiple classes. We consider this an advanced operation that could be implemented as adding more class memberships to an instance after instantiation. Whether and how to make this operation available to the end user is left to the EMM implementor.
Figure 2. Creating a new instance
Editing an Instance

Instance editing happens either after the user has created a new instance or when he encounters existing data. One needs a lens in order to edit a resource, but EMM implementations may provide a default lens for editing any resource. As editing controls add considerable visual clutter to a user interface, we have two modes for accessing instances:
•
In browsing mode (section (a) of Figure 3), data is read-only and can easily be copied. We simply display the currently selected resource. This is exactly what a Web browser gives you when used with normal Fresnel. In editing mode (section (b) of Figure 3) we provide the usual editing controls such as text fields and combo boxes. One can change property values, remove properties and add
new properties. This last operation brings up a list of predicates (section (c) of Figure 3) from which the user can choose the one she wants to add. If the range of a property comprises several classes, we display the same predicate several times, each time with a different class. The profile author may define constraints on the data, the main representative being cardinality constraints which state how often a given property must exist. Examples are: “there must be exactly one last name,” “there must be at least one phone number.” We allow these constraints to be violated at any time. For example, a new instance is always created empty, even if there are properties whose minimum cardinality is greater than 0. But we indicate violations with error messages.
Batch Editing
Whereas in single-instance editing one sees a lot of parsed data, batch editing usually starts out with a display that is mostly empty. One can then specify what property values are to be added and what properties are to be removed. The latter operation is an explicitly displayed “remove all” that can be mixed with the former operation. For example: Remove all first names from a set of resources and add the single first name “Joe” to all of them.
Figure 3. Instance editing
Meta-Editing
Meta-editing refers to the profile author editing classes and lenses. For class editing (Figure 4), we display many properties just as one would for instance editing. Two kinds of data associated with a class require special treatment, though: Restrictions make more sense if they are displayed in line with the class, as they are the closest thing that OWL has to declaring what properties a class contains. Lens editing with EMM is self-hosting: You can define a lens to edit lenses.
Schema: What Is the Structure of the Data?
With the schema, the profile author defines the shape of the data: What properties can an instance have? What is the cardinality of a property? What values can it be assigned? And so on. The schema part of EMM is very directly related to RDF Schema and OWL. For editing, we need a class-based approach (as opposed to the property-based approach of RDFS) and we need to be able to find out the exact schema of a class. Thus we first define our own type system that is loosely based on (and vastly simpler than) OWL.
The Basic Type System Constructs
Class definition in the EMM is schema-driven, like for a database or a programming language. That is, a class is defined as having a set of properties. Basic class definitions can be further refined by defining two classes as equivalent or one as a subclass of the other. Finally, abstract classes are used to better support polymorphism: An abstract class matches all instances of its concrete subclasses, but does not have instances of its own. For example, if a property can have values that are either instances of Man or of Woman, its range is Human, an abstract superclass of both classes. When we check if a property value is valid, we match against Human, which accepts all instances of Man and Woman. When it comes to instantiating new property values, Human itself is never directly mentioned; we only offer to instantiate either Man or Woman. The following definitions state the above mentioned ideas more formally.
Figure 4. Class editing

Basic Sets
The following sets are fundamental to our type system:
• Node: The set of all RDF nodes
• Literal ⊂ Node: RDF literals
• URI ⊂ Node: RDF URI nodes
• Class: Classes, a set of symbols whose meaning is defined via equations (see below)
• PDef: Property definitions (as defined below)

A property definition p ∈ PDef is a triple (u, I, R) where:
• u ∈ URI is the predicate, a URI.
• I ∈ ℕ × (ℕ ∪ {∞}) is the cardinality, an integer interval (whose upper bound may be unconstrained).
• R ⊂ Class is the range, a set of classes. The range is the intersection of these classes.
Defining Classes
Basic class definitions: A class is originally defined either as a set of properties or as an enumeration (a set of instances).
1. Defining properties: A := props(p1, …, pn), pi ∈ PDef
2. Enumerating instances: A := enum(e1, …, en), ei ∈ Node

Features: One assigns characteristics to classes by making them a member of special sets. Currently, there is one such set.
• Abstr ⊂ Class: the set of abstract classes.

Basic relations: Two binary relations further refine the class definitions.
• equiv ⊂ Class × Class: a symmetric, reflexive and transitive relation that marks classes as equivalent.
• subExpl ⊂ Class × Class: explicit subclassing; is neither reflexive nor transitive and ignores equivalence.

Derived relations: Two more binary relations are derived from the basic relations.
• subDir ⊂ Class × Class: extends subExpl so that each class receives the superclasses of all of its equivalent classes (in addition to its own superclasses). Equivalence is ignored when it comes to reflexivity and the relation is still not transitive.
• subTrans ⊂ Class × Class: Now we use equivalence to create the reflexive closure of subDir and construct the transitive closure.

We sometimes use the relation names as binary infix predicates or unary functions, with obvious semantics.
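To make the notation concrete, here is a small, hypothetical schema written with these constructs; the class names and the property URI ex:name are invented for illustration and are not EMM vocabulary:

Human ∈ Abstr
Man := props((ex:name, (1, 1), {xsd:string}))
Woman := props((ex:name, (1, 1), {xsd:string}))
(Man, Human) ∈ subExpl
(Woman, Human) ∈ subExpl

Since Human is abstract, it matches all instances of Man and Woman via subTrans, but is never instantiated itself.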
Type System Operations
There are a few basic questions that the EMM needs to ask the type system it is based on, its interface to the type system, if you will. Each of the following subsections describes one type system operation. The following ideas need to be explained beforehand: There are two kinds of classes in RDF: explicit classes are attached to a resource, while implicit classes must be derived from the RDF node itself. (Potentially) implicit classes are subclasses of rdfs:Literal, rdf:List and subclasses (that all have the implicit instance rdf:nil) and enumeration classes (whose enumerated instances might not have a type). The following meta-variables are used in the definitions below:
• C, D ∈ Class: single classes
• A ∈ Abstr: abstract class
• S, T ⊂ Class: class intersection (interpreted as the intersection of its elements)
• X: either a set of classes or a single class
• n ∈ Node: any kind of RDF node
Class Subsumption
S incl D: Is a class intersection S a subclass of a class D?

C incl D :⇔ (C, D) ∈ subTrans
{C1, C2} incl X :⇔ C1 incl X ∧ C2 incl X   (w.l.g.)
X incl {D1, D2} :⇔ X incl D1 ∧ X incl D2
Instance Checking
n instof C: Is a node n an instance of a class C? Instance checking is the foundation for selectors. For implicit classes, we need to decide if n is an instance, otherwise we extract the types of n and turn to class subsumption. (See Box 1.)
Box 1.

n instof C :⇔
   C incl rdf:List    if n is rdf:nil
   dtype(n) incl C    if n is a literal
   n ∈ values(C)      if C is an enumeration
   types(n) incl C    otherwise

Helper functions:
• dtype : Literal → Class determines the datatype of a literal.
• values : Class → 2^Node returns the elements of an enumeration.
• types : Node → 2^Class computes the explicit types of a node.

Default Values for Implicit Classes
When we add a new property to an instance, we give it an initial value that depends on its range. For an explicit class, this is simply a blank node that we type with that class. For implicit classes, we need default instances. Examples:
• Literals: “0” for integers, “” for strings, now for time, and so forth.
• Collections: rdf:nil
• Enumerations: the first member of the enumeration.
The function dflt : Class → Node maps classes to default instances: rdf:List is mapped to rdf:nil, enumerations are mapped to the first enumeration member and literal classes are mapped to one of their instances. All other classes are mapped to ⊥.

Construction Sets
A construction set answers the question how a given class intersection S can be instantiated. It contains two kinds of elements: instance(n) means that n is an instance of S, classes(T) means that the classes in T can be explicitly attached as types of a new blank node. In the former case, we need default values of classes, while in the latter case we have to consider abstract classes. Formally, the function constrs maps a type intersection S to a construction set. (See Box 2.)

Box 2.

constrs(S) =
   {instance(dflt(C))}                       if S = {C} ∧ dflt(C) ≠ ⊥
   ∪_{D ∈ concr(A)} addto(D, constrs({C}))   if S = {A, C} ∧ A ∈ Abstr (w.l.g.)
   {classes(S)}                              otherwise

Helper functions:
• addto adds a new element to a set of classes() elements. For example: addto(C, {classes(D1), classes(D2)}) = {classes(C, D1), classes(C, D2)}
• concr computes the concrete subclasses of an abstract class.

Note that there can be at most one implicit class (first case above), otherwise the type intersection is not instantiable.

Schema Derivation
The schema derivation function sch : Class → 2^PDef computes the schema for a given class. We define sch via simpler precursors called schDir (the direct schema) and schEquiv (the direct schema honoring equivalence).

Direct Schema
The direct schema function schDir for a class C returns those properties that have been directly assigned to C. (See Box 3.)

Box 3.

schDir(C) =
   { p1, …, pn }   if C = props(p1, …, pn)
   ∅               otherwise

Equivalence Schema
The equivalence schema of each member of an equivalence class E is the union of the direct schemas of all members of E.

schEquiv(C) = ∪_{D ∈ equiv(C)} schDir(D)

Schema Honoring Equivalence and Subtyping
Finally, the definition of sch starts with the equivalence schema and recursively combines it with the (complete) schema of all superclasses. (See Box 4.)

Box 4.

sch(D) = schEquiv(D) ◁ sch(C1) ◁ … ◁ sch(Cn)   where {C1, …, Cn} = subDir(D)

The override operation ◁ is defined as follows (Q is a set of property definitions):

{ p1, …, pn } ◁ Q = p1 ◁ ( … (pn ◁ Q) … )

(u, I1, R1) ◁ { p1, …, pn } =
   {(u, I1 ⊓ I2, R1 ⊓ R2), p2, …, pn}   if p1 = (u, I2, R2) (w.l.g.)
   {(u, I1, R1), p1, …, pn}             otherwise

R1 ⊓ R2 = R1   if R1 incl R2
I1 ⊓ I2 = I1   if I1 ⊂ I2

Intuitively, new properties are added monotonically in subclasses. Overriding properties can only restrict the cardinality and range of overridden properties. For example, a subclass may override an inherited definition (u, (0, ∞), {rdfs:Literal}) with (u, (1, 1), {xsd:string}), because (1, 1) ⊂ (0, ∞) and xsd:string incl rdfs:Literal.

More Implicit Classes
RDFS and OWL can already distinguish some nodes via implicit classes: They match nodes not based on what types are attached, but on what the nature of the node is. Examples are rdfs:Literal for any kind of literal and xsd:integer for literals whose datatype is xsd:integer. A few classes are missing:
• emm:PlainLiteral matches any kind of plain literal (without a language tag).
• emm:TaggedLiteral matches any literal that has a language tag.
• Matching specific language tags: For example, lang:fr-CA could be the class of all literals whose language tag is fr-CA.
• emm:UriNode matches all URI nodes. We show this in action in another section.
A class for blank nodes does not seem useful.

Translation from and to OWL
As the EMM type system is far less powerful than OWL, we do not have correspondences for all features of OWL. In this section, we explain what can be translated and how, and what has to be ignored.
Class descriptions:
• OWL class identifiers: correspond almost directly to EMM’s class identifiers. We only allow URIs in the range of rdf:type. OWL Lite additionally allows restrictions (which have blank node anchors). More powerful versions of OWL allow any resource.
• OWL enumerated classes and enumerated datatypes: map directly to EMM enumerations.
• Type intersection: the defined class is the subclass of the intersection operands.
• Type union: the defined class is an abstract superclass of the union operands.
• Property restrictions: are EMM classes with one property definition. owl:allValuesFrom defines the range, cardinality constraints define the cardinality.
  o (Ignored) owl:someValuesFrom, owl:hasValue.
• (Future) We do not support owl:complementOf. In the future, class complements might be interesting for matching (but not for defining) classes.

Class axioms:
• rdfs:subClassOf corresponds to the relation subDir.
• owl:equivalentClass corresponds to the relation equiv.
• owl:disjointWith is assumed for all non-equivalent classes.

Global properties: are treated as if they were property definitions for all classes in their domain (see the sketch after this list).
• Property classes: The type of a global property sometimes implies features such as cardinality.
  o owl:FunctionalProperty: a property with cardinality 1.
  o owl:ObjectProperty: There are only resources in the range.
  o owl:DatatypeProperty: There are only literals in the range.
  o (Future) owl:InverseFunctionalProperty defines a property whose value (which must be a resource) is globally unique. This might be useful for some applications such as unique IDs.
• Properties of global properties:
  o rdfs:domain: defines the domain; multiple values mean that this property applies to each of the mentioned classes.
  o rdfs:range: defines the range; multiple values mean type intersection.
  o (Future) owl:inverseOf
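As a sketch of how such a global property reads on the EMM side (ex:spouse is invented; the OWL and RDFS terms are standard): by the rules above, the declaration below would be treated, for every class in its domain, as a property definition with predicate ex:spouse, cardinality 1 (from owl:FunctionalProperty), and range {foaf:Person}.

ex:spouse rdf:type owl:ObjectProperty , owl:FunctionalProperty ;
  rdfs:domain foaf:Person ;
  rdfs:range foaf:Person .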
Future Schema Features
Once we have associations, we can sensibly support OWL’s logical property characterizations (owl:SymmetricProperty and owl:TransitiveProperty). The property axioms rdfs:subPropertyOf and owl:equivalentProperty can be used for inheriting property definitions and for more powerful selector matching.
Presentation: Select, Order and Style the Data to Be Edited
The schema is used for controlling the structure of the data. The presentation layer of the EMM takes care of displaying the (generally multidimensional) data on a two-dimensional medium. The editing process happens as follows: The user selects one or more resources he wants to edit and then selects a lens to edit them with. A lens specifies which properties are shown and which ones are hidden. Lenses thus provide the user with a limited view on the data, showing only those details that are relevant in the current situation. Lenses also define an order in which to display property values, a necessity for editing, because humans derive meaning from such ordering. Lastly, lenses specify via selectors what resources they can be applied to. Given a set of resources, we can thus automatically generate a list of applicable lenses for the user. In some situations, we also might automatically select the most appropriate lens for the user. This process is called conflict resolution and is explained in another section. Lenses are supported by formats and groups. Formats contain more detailed information as to how to present the data: What widget to display
it in, what font to use, and so forth. Format applicability is also specified via selectors, so they are somewhat independent of lenses even though they do not make any sense on their own. Groups define sets of lenses and formats that should be used in conjunction. As lens and format selection is constrained by the currently active group, groups determine the “look and feel” of the data. The active group is usually changed manually by the user, but automatic (temporary) changes also occur. The presentation layer is almost completely based on Fresnel. The reader can thus consult the Fresnel manual (Bizer, Lee, & Pietriga, 2005) for further details.
The Abstract Box Model: Laying Out RDF Data
For editing, we lay out RDF data as a tree: the root of this tree is the so-called container, a top-level display element such as a GUI window or a printed page that holds all of the data. Inside the container, there are RDF resources. Each resource in turn contains properties, and properties consist of a label and a value. This kind of tree is called the abstract box model (Figure 5) in Fresnel.3 It is sometimes referred to when defining layout and styling and determines some inheritance effects.
Selectors: Matching Resources and Properties
Selectors are the matching mechanism that is used by both lenses and formats to specify to which instances (and/or properties) they apply. Selectors are mostly implicit in Fresnel, but we found it helpful to reify them.
Figure 5. The Fresnel abstract box model defines how elements of the RDF data are to be nested when laying them out. The EMM version of it has one property box per label-value pair. The section box is EMM-specific and groups data that is edited similarly (such as all record properties or all members of an assembly)

Resource Selectors
Given a resource and a resource selector, we can determine whether the selector matches the resource or not. This is used to specify whether a lens or format applies to a resource or not. Resource selectors are either class selectors or instance selectors:
• Instance selectors: Match by looking at the anchor of a resource. Attached to lenses and formats via fresnel:instanceLensDomain and fresnel:instanceFormatDomain, respectively.
  o SPARQL selectors: A SPARQL query defines what instances match a given selector. Not currently used in the EMM.
  o Fresnel Selector Language (FSL) (Pietriga, 2005) selectors: Not currently used in the EMM.
  o Simple selector: Directly state the node of the applicable resource.
• Class selectors: Match by looking at the type of a resource. Attached to lenses and formats via fresnel:classLensDomain and fresnel:classFormatDomain, respectively. A class domain of rdfs:Resource matches any resource.
  o Simple selector: Directly state the ID of the relevant class.

Property Selectors
Property selectors are only used in formats, via the property fresnel:propertyFormatDomain. There are two kinds of property selectors:
1. FSL selector: An FSL expression that returns properties. Not currently used in the EMM.
2. Simple selector: Directly state the predicate.

Selector Specificity
The order in which we have listed the selectors above determines their specificity. That is, an instance selector is more specific than a class selector. When comparing two class selectors, whichever has a class that is a subclass of the other is considered more specific. In situations where several lenses or formats apply, we have to decide which one to use. This process is called conflict resolution, and selector specificity plays a big role in it. For example, if two lenses apply to a resource and one of them has an instance selector, while the other one has a class selector, then the former lens wins.
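For illustration, a sketch of the two selector kinds on lenses in Turtle; ex:alice and the lens names are invented, while the fresnel: domain properties are standard Fresnel:

:personLens rdf:type fresnel:Lens ;
  fresnel:classLensDomain foaf:Person ;        # class selector
  fresnel:showProperties ( foaf:name foaf:mbox ) .

:aliceLens rdf:type fresnel:Lens ;
  fresnel:instanceLensDomain ex:alice ;        # instance selector
  fresnel:showProperties ( foaf:name foaf:knows ) .

If ex:alice has type foaf:Person, both lenses match, but conflict resolution picks :aliceLens, because its instance selector is more specific.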
Groups: Context-Specific Containers for EMM Constructs
Groups are a context-specific container for data such as lenses and formats. Below, we list all the responsibilities of a group. Some of them depend on the concept of the active group: At any time, the user can designate one group to be current. It then determines certain parameters for layout and styling. Whenever a property description “uses” a group in a lens, the group also changes temporarily, affording more control over what lenses and formats are used for the property value. There also is an implicit (and purely conceptual) global group that contains all data that can possibly be attached to a group. It is a superset of all groups, but also contains data that does not belong to any group. Whenever no group is active, we use the global group. A group specifies (a Turtle sketch follows the list):
• Member lenses.4
• Member formats.
• Style information for containers, resources, properties, labels and values. This information is used to complement the same information that is stored in formats. So, in a way, a group is also a format.
• Additional content to be inserted before and/or after resources, properties, labels and values. Again, this content is supplemental to similar format information.
• Primary classes: what classes are most important? Instances of non-primary classes are usually only displayed nested inside primary instances.
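A minimal sketch of group membership in Fresnel’s RDF encoding; the group, lens, and format names are invented for illustration, and fresnel:group is standard Fresnel:

:addressBook rdf:type fresnel:Group .

:personLens rdf:type fresnel:Lens ;
  fresnel:classLensDomain foaf:Person ;
  fresnel:group :addressBook .     # member lens

:nameFormat rdf:type fresnel:Format ;
  fresnel:propertyFormatDomain foaf:name ;
  fresnel:group :addressBook .     # member format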
Lenses: Selecting Trees of RDF Data
A lens is a way for the profile author to state what part of the data she wants the end user to look at. Thus, a lens encodes what properties to edit and in what order to display them. A lens has the following attributes:
• selectors: what instances can this lens be used with?
• showProperties: a list of property descriptions (see below for details). Each property description defines how to display a single property. When laying out the property data, we honor the order in which the properties are listed here. There is also the pseudo-property fresnel:allProperties that can be placed anywhere in the list and that stands for all properties, even the ones we do not know about in advance. Lenses that do not show any properties are called atomic. They produce atomic projections. Their main purpose is to provide schema information and atomic alternatives among sublenses (which are usually compound).
• hideProperties: set of URIs that tells us what properties should be ignored by fresnel:allProperties.
• use: a format directly. Normally, formats function completely separate from lenses. This mechanism allows one to directly specify what format to use, without depending on a selector to match in a certain case.
• purpose: is like a tag that indicates additional roles of a lens. We currently support the following lens purposes:
  o defaultLens: Whenever looking for a matching lens inside a group, we check lenses marked as default first.
  o labelLens: When creating the display label of a resource, we also consider all matching label lenses of the active group.

A property description provides further information about a specific property. The RDF encoding of Fresnel has two variants for defining property descriptions: One either provides just the predicate or a blank node of type fresnel:PropertyDescription. Then the predicate has to be provided as a property of that blank node, which can also hold the following information (see the sketch after this list):
• use: a format or a group. In the case of a format, it works like the “use” property of a lens. In the case of a group, the given group temporarily becomes the active group. Thus, this group is responsible for all data we encounter from here on down in the tree that we project from the RDF data.
• sublenses: These are normal lenses that state their applicability via selectors. A lens can thus determine on a case-by-case basis how the data tree should continue underneath a property. For example, depending on the type of an object, we might either display just its URI or several of its properties.

Lens data encoding in RDF is covered in another section of the chapter.
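A sketch of a lens whose property description attaches a sublens; all names except the fresnel: terms are invented for illustration:

:personWithFriendsLens rdf:type fresnel:Lens ;
  fresnel:classLensDomain foaf:Person ;
  fresnel:showProperties (
    foaf:name
    [ rdf:type fresnel:PropertyDescription ;
      fresnel:property foaf:knows ;
      fresnel:sublens :acquaintanceLens ]   # how to continue below foaf:knows
  ) .

:acquaintanceLens rdf:type fresnel:Lens ;
  fresnel:classLensDomain foaf:Person ;
  fresnel:showProperties ( foaf:name ) .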
Property Descriptions for Collections and Containers
Fresnel uses the pseudo-predicate fresnel:member to denote “any collection and/or container predicate.” One could use rdf:first and rdfs:member for this purpose, but fresnel:member is a clean way of saying that the object should be treated as an assembly without knowing beforehand if it will be a collection or a container; which one it is will be determined by looking at the RDF data. Alas, during editing, when we add a property whose range is an assembly, we need to know what empty structure to initialize the object with. Thus, we always disambiguate fresnel:member to emm:collectionMember or emm:containerMember. This happens according to the following rules (a sketch follows the list):
• If there are lens selectors with a class name, we can examine the inheritance chain of that class to determine whether we have a collection lens, a container lens or both.
• Otherwise, there is no schema information to help us decide either way. If there are no additional properties, we assume an emm:collectionMember (as rdf:List properties cannot coexist with other properties). If there are additional properties, we assume an emm:containerMember.
• Remove all other properties if this is a collection lens. As mentioned above, this is due to the fact that collections do not have stable anchors. Sometimes they do not have their own anchor at all, if you consider rdf:nil an empty anchor.
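For example, a lens for plain rdf:List values could look like the following sketch; standard Fresnel would write fresnel:member here, which the EMM disambiguates to emm:collectionMember as just described (the lens name is invented):

:tagListLens rdf:type fresnel:Lens ;
  fresnel:classLensDomain rdf:List ;
  fresnel:showProperties ( emm:collectionMember ) .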
Conflict Resolution for Lenses and Formats
Whenever several lenses apply and the user does not choose one manually, we have to automatically determine which lens to use. This is done by finding out which lens is most specific. This depends on what group it is in and on the specificity of its selectors. The same algorithm applies to formats as well, with minor modifications, so we describe both algorithms here. We successively try the following options until we have a matching lens or format:
1. (Format-only: the format attached to a lens via fresnel:use.)
2. Active group:
   a. (Lens-only: the default lens of a group, fresnel:purpose = fresnel:defaultLens.)
   b. The lens or format with the most specific matching selector in the active group.
3. Global group: we look at the lenses or formats in this group in the same order as above. Note that this means that we are actually looking at all lenses or formats.
Formats: Styling RDF Data
Formats work in parallel with lenses to determine how the data stored in an RDF resource should be displayed. “How” means: What widget should be used to display the data? What text style should it be displayed in? Should there be additional text appended before and after the data? Formats have the following categories of parameters (see also Table 1; a sketch follows the list):
• Selectors: Formats can be applied to both resources and properties, but some format mechanisms are specific to either resources or properties. Thus, the selectors of a format are in general a mix of resource and property selectors.
• Label configuration: Two things can be specified in standard Fresnel. First, whether the label should be shown at all. Second, whether a fixed text should be displayed as the label instead of the predicate.
• Styles: apply to elements in the abstract box model, namely resources, properties, labels and values. Styles depend on the HTML-specific Cascading Stylesheets standard and are currently ignored in the EMM.
• Value display hint: What widget should be used for editing a property value? For example, even if the value is a URI node, it still might make sense to let the end user edit it inside a text field.
• Additional content: Specify text that should be prepended or appended to the content of resources, properties, labels or values. For example, one might want to start and end a list with square brackets and to separate the list elements with commas.
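A sketch of a simple format in Fresnel’s RDF encoding; :phoneFormat and ex:phone are invented, while the fresnel: terms are standard Fresnel vocabulary:

:phoneFormat rdf:type fresnel:Format ;
  fresnel:propertyFormatDomain ex:phone ;              # property selector
  fresnel:label "Phone" ;                              # fixed label text instead of the predicate
  fresnel:valueFormat [ fresnel:contentAfter ", " ] .  # additional content after each value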
Inheritance of Format Parameters
Table 1 illustrates format inheritance. It works along the projection tree: The group is seen as the root, which contains nested compound projections that hold property projections, which in turn contain either compound or atomic projections. Inside this tree, we compute a merged format for each projection: each format parameter is considered separately and more specific values override less specific ones. For example: If fresnel:labelFormat is specified in both the group and the property, the property wins. In the table, each parameter has its own row. An empty circle indicates that the value can be specified, a filled circle indicates where the parameter is actually used. Entries to the right are more specific than entries to the left. After merging, we do not inherit the merged version (which would propagate values further down the tree than we would like), but the parameters as they have been originally specified. (Extension) In contrast to Fresnel, compound projections (and not just groups) are allowed to specify property, label and value parameters. This is especially useful for text generation, for example, when a resource-specific separator between properties is desired.
Table 1. Inheritance of format parameters. Circles indicate where the parameters can occur, filled circles indicate where they are used. Values to the right override (less specific) values to the left

Format parameter          Group    Compound    Property
fresnel:label                                  ●
fresnel:value                                  ●
fresnel:resourceFormat    ○        ●
fresnel:propertyFormat    ○        ○           ●
fresnel:labelFormat       ○        ○           ●
fresnel:valueFormat       ○        ○           ●
fresnel:containerStyle    ●
fresnel:resourceStyle     ○        ○
fresnel:propertyStyle     ○        ○           ●
fresnel:labelStyle        ○        ○           ●
fresnel:valueStyle        ○        ○           ●

Additional Content: Adding Text
Additional content is specified for resources, properties, labels, and values as fresnel:contentBefore, fresnel:contentAfter, fresnel:contentFirst, fresnel:contentLast and fresnel:contentNoValue. An atomic lens is only considered a value, not a resource. Additional content serves two purposes. First, it can be used to add text data to the display boxes (Figure 5). With “no value,” the content is the box, while, with the other positions, it is placed before or after the box. In a sequence of boxes, the last fresnel:contentAfter is overridden by fresnel:contentLast (if it is there). Similarly, fresnel:contentFirst overrides the first fresnel:contentBefore. Note that each section has its own sequence of properties. When it comes to first and last values (and labels), we also start counting inside the section box and not inside the property box (the box model is shown in Figure 5). Second, additional content is also used to translate a resource to text. If no additional content has been specified, a resource is translated to a sequence of values and labels without any space or separator. As this is undesirable, because it is hard to read, we supply a default that inserts an equal sign after the label and a space after the value. We give an example of an additional content specification in another section.
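To make the bracket-and-comma list rendering mentioned earlier concrete, here is a sketch under the semantics just described; :tagFormat and ex:tag are invented, and whether fresnel:contentLast applies without an explicit fresnel:contentAfter is our reading, not a tested rule:

:tagFormat rdf:type fresnel:Format ;
  fresnel:propertyFormatDomain ex:tag ;
  fresnel:valueFormat [
    fresnel:contentFirst  "[" ;    # overrides contentBefore for the first value
    fresnel:contentBefore ", " ;   # separator before every other value
    fresnel:contentLast   "]"      # appended after the final value
  ] .

For three values a, b, c this would yield “[a, b, c]”.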
Value Display Hints: What Widgets to Use?
Display hints are a fundamental way of styling: they determine what graphical widget displays the relevant information. For example, instead of referring to a resource, a URI node could also be the address of a Web site. Then, for editing, we would like to change it via a text field. During browsing, we want to have a clickable hyperlink. We distinguish two kinds of display hints. Widget hints (Table 2) determine the widget that should be used for displaying the information: For example, should a URI be displayed as a hyperlink or as the image it points to?

Table 2. Widget hints specify what widget should be used for displaying data. Circles indicate what kind of projection can be displayed with a certain widget. Filled circles show the default widget hints. The first three widgets can be configured by modifiers (see Table 3)

Widget                 atomic resource    literal
emm:textWidget         ○                  ● (enum: ○)
emm:nodeWidget         ● (enum: ○)
emm:comboBox           ○ (enum: ●)        ○ (enum: ●)
fresnel:externalLink   ○                  ○
fresnel:image          ○                  ○

• emm:textWidget: Make the node (be it a literal or a URI) editable via a text field.
• emm:nodeWidget: Treat the resource as a reference to a node: display it and make it possible to jump to it.
• emm:comboBox: always display all instances of a class, whether they are explicitly defined via an enumerated class or datatype or whether they have to be collected via a query.
• fresnel:externalLink: The literal or resource refers to an internet location. It should be displayed as a hyperlink that, when clicked on, goes to that location in a Web browser.
• fresnel:image: Like above, treat the literal or resource as an internet address, but display whatever resides there as an image.
• Compound widgets: have no hints, because they are always displayed with the same widget.

Hint modifiers (Table 3) help with configuring some of the widgets: For example, should a text field display the “raw” URI or a qname? Hint modifiers for resources and literals are disjunct (a sketch of attaching hints follows the lists below).

Table 3. Hint modifiers specify how RDF should be displayed inside a widget. Circles indicate in what situations a certain modifier can be used, filled circles indicate defaults

Modifier                  resource    literal
emm:default               ●           ●
emm:parsable              ○           ○
fresnel:uri               ○
emm:editableAnnotation                ○

• Resources: fresnel:uri displays a resource as its raw URI, without a namespace prefix; emm:parsable uses a namespace prefix if possible; emm:default uses (unparsable) “prettier” display aids such as a label or a label lens.
• Literals: emm:parsable displays a parsable text string, like in Turtle (Beckett, 2004); emm:default displays just the text, no quotes and no datatype. Note that the language of a plain literal should still be displayed, because it is fundamental to discerning several similar values. emm:editableAnnotation additionally makes the datatype or language tag editable.
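As a sketch of how a profile author might attach these hints: standard Fresnel specifies display vocabulary through fresnel:value (as with fresnel:externalLink and fresnel:image); reusing fresnel:value for the emm: hints, as in the second format below, is our assumption, not documented EMM vocabulary. :homepageFormat, :genderFormat, and ex:gender are invented:

:homepageFormat rdf:type fresnel:Format ;
  fresnel:propertyFormatDomain foaf:homepage ;
  fresnel:value fresnel:externalLink .   # standard Fresnel display hint

:genderFormat rdf:type fresnel:Format ;
  fresnel:propertyFormatDomain ex:gender ;
  fresnel:value emm:comboBox .           # assumed encoding of the EMM widget hint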
Editing: Specifying and Applying Changes via Projections
In this section, we would like to talk about what data structures are necessary to hold the data during editing. It turns out that there are many things one has to consider so that editing RDF works as one would expect in a database-like application. Editing is a process that comprises four steps: First, the user picks a resource he would like to display. Second, an applicable lens is chosen, either automatically or manually by the user. Third, this lens is used to project the resource, producing, as a result, a projection. A projection contains the data of the resource, but in a format that mirrors the structure of the lens. Compared to traditional parsing, the lens can be considered the grammar and the projection the abstract syntax tree. Similarly, the projection will reflect how deeply a lens is nested and will always be a tree. The inner nodes of this tree are compound projections, the leaves are atomic projections. Fourth and last, after the user has finished editing the projection, it plays
its second role: the changes it contains are written back to the resource; the projection is applied to the resource. This is the basic scenario for using a projection. When designing the data structure, we also have to consider that a projection has to work in batch mode (one should be able to apply the encoded changes to several resources at once). And it should be as self-contained as possible, so that in a client-server setting, it is enough to send the projection to the client, without the need to include schema and presentation information. Last, lenses can also be used for translating a resource to a text string, for example, to generate labels for a whole class of resources. It turns out that this task is easier to perform if one takes the intermediate step of creating a projection.

Figure 6 shows how the box model from Figure 5 is represented as a data structure: The root of a projection tree is always a compound projection; it holds sections that contain data that is edited in a similar way. For example, a projection of a resource could have two sections: one for container (rdf:Seq) data, one for normal properties. That is, it is both an assembly and a record. One can assign a section title, in the showProperties list of a lens, by putting a literal in front of the property description that starts the section. The projection classes have the following purposes:

Figure 6. A UML class diagram for the projection data structures. An assembly section contains members, a record section contains properties. Note that all tree associations such as sections and object are bidirectional, which allows us to go up in the tree. This is used for cycle detection and to share formats

• NodeProjection: is either compound or atomic. In both cases it contains a format and stores the value of the projected node before editing and its current value.
• AtomicProjection: To minimize dependencies on schema and presentation data, we store the enumeration members with atomic projections that need them for editing (e.g., when they are displayed as a combo box widget).
• CompoundProjection: The lens is needed for adding new properties (whose objects must be projected from sublenses). The class is needed for adding a type to a newly created projection. To create atomic new properties, we need the property range and use the class as a fallback if there are no sublenses. Class and lens are both stored as resources, so that the projection does not have a dependency on schema or presentation data. Compound projections usually contain at least one section.
• Section: holds information that is related to the projected resource. Here we explain the assembly and the record section.
• AssemblySection: contains the elements of an ordered multiset as encoded in an rdf:List or an rdf:Seq.
• RecordSection: contains normal property values.
The following sections explain how the projection data structure is actually used.
Creating a Projection

Projecting a Lens
Creating a projection means that we get a set of triples (all the properties of a resource) and translate them to a CompoundProjection. To do that, we initially create an empty CompoundProjection instance and then iterate over all property
descriptions in the lens. If the section that we have last added to the projection is what the property description needs, we let it add its data (as extracted from the triples) to the section. If not, we create a section for the property description. Note that, even if the property description does not match any triples, we still need to create a section, because the section knows how to create new fields inside itself. To put it differently: We would otherwise not be able to “instantiate” those property descriptions that initially did not match any RDF content. Sections are later displayed as user interface elements for adding new properties. Whether a lens is atomic (contains no properties) or compound determines whether it projects an atomic projection (a leaf) or a compound projection (an inner node). To recurse into a property value, we need to find a matching sublens. A sublens might have several selectors. The (most specific) matching selector is crucial, because it determines the value of the class attribute of the CompoundProjection. While we usually ignore property values that do not match any sublens, we are more tolerant if there is only one collection sublens, because collections are often untyped.
Adding Schema Information
The class information in a projection is needed as a fallback when the lens does not have enough information. For example, when adding a new atomic property, we need that property’s range so that we can provide a good default value. This range can be encoded as an atomic lens. If it is not, we use the schema. But what is the class of a projection? We actually have two options. In both cases, the matching selector is important, the selector that was the reason that we picked a certain lens and not another one.
• Use the static schema: Access the schema information of the matching selector (which usually contains a class).
• Use the dynamic schema: Access the schema information of the resource we have projected. Here, we are interested in what type of the resource matched the selector.
For simplicity’s sake, we currently use the static schema.
Projecting a Format
So far, we have not explained how the format value of a projection is created. Table 1 shows how the format is created parameter by parameter: In each line, a filled circle indicates where the parameter value is stored; empty circles show that the value determines other values to the right unless it is overridden. An example: The value of fresnel:propertyFormat is stored in a property projection and is a combination of the values from the group, the compound and the property format: If the parameter is specified in the property format, it is used; if not, we look at the compound projection and the currently active group. In addition to that, we also track the position of the projection and consolidate the additional content parameters (see the section on additional content): fresnel:contentFirst and fresnel:contentLast only have an effect for the first and last element, where they override fresnel:contentBefore and fresnel:contentAfter. Thus, we only keep these latter two parameters and change them during merging as appropriate. Note that finding the format before the merging involves selector matching and conflict resolution. The merged format that is stored in the projection is the result of merging the unmerged formats of the ancestors in the projection tree. Otherwise, formats would be propagated too far down the projection tree.
Editing a Projection

Displaying the Projection
We suggest that implementors of the EMM use two display modes (Table 4): In editing mode, the user has the typical widgets for changing data such as text fields and combo boxes. In browsing mode, we only show read-only text and make that text easy to copy. We are thus operating like a standard Fresnel application.
Table 4. Editing and browsing mode display widgets differently

Widget         Editing Mode                                          Browsing Mode
textWidget     text field                                            uneditable text
nodeWidget     click to change instance (to new or existing node)    click to visit
comboBox       combo box                                             uneditable text
externalLink   text field                                            click to visit
image          text field                                            image
compound       contains editing widgets and an “add” button          contains browsing widgets

Adding a Property
Providing the user with a good user experience when it comes to adding a property is surprisingly complex. Part of the problem is that we need to solve this problem in a generic way and consider all possible options:
• Predicate: What properties do we present the user with? What do we do if we have the pseudo-predicate fresnel:allProperties? What if the user wants to manually enter a predicate?
• Object: How does the user specify an object? Should the user choose an existing node or create a new one? In the latter case, what type should it have? What if the object is a literal or an enumeration? Should the user enter a value right away or do we provide a default and let the user change that default later? How can we make sure that we always present an exhaustive list of options for the object?
For solving this problem, one obviously has several options. We explain one of them in more detail: We ask the user how to create the predicate and the object at the same time, instead of first letting him pick a predicate and then providing him with options for the object (in a more wizard-style fashion). All variations of this (object, predicate) query are encoded as so-called property choices, which take the following forms. Property choices have to be created alongside the projection. They are a projection-specific list of all possibilities for adding a new property. Naturally, the end user will never see their structure, but rather a human-readable label.
• typeset(predicate, mode, set of resources): Given a set of types, we either let the user pick an appropriately typed node or instantiate one for him.
  o mode = choose: filter the choices by the given set of types. As an extended option, the user can manually enter a URI.
  o mode = instantiate: create a new blank node that has the specified types. The user can later rename it to a URI if she wishes so.
• constant(predicate, node): In the case of literals and enumerations, there is no real difference between choosing and instantiating. We give the property a default value that can later be changed.
• enterPredicate(choices): When we let the user enter a predicate, we do have a two-step process: First the user enters a predicate, then the user decides what to do about the object. We encode the latter step as property choices with an empty predicate.

Property choices can be seen as instructions for creating new (default) data. The user makes his choice, the projection is extended, then possibly edited and applied. The algorithm for creating the field choices is as follows. Creating the choices happens directly after creating the projection and is driven by property descriptions. That is, each property description produces a set of field choices. Obviously, their predicate is the predicate of the property description. What kind of choice it is depends on the range of the property. We compute the construction set for the range and translate instance elements to constants and classes elements to typesets. For example, a literal class will produce a constant (say, the empty string for xsd:string or 0 for xsd:integer) and an abstract superclass will produce one typeset choice for each of its subclasses. Normally, we turn the selectors of the sublenses into a set (union) of property ranges. Each class selector is one property range, instance selectors are directly turned into constants. If there are no sublenses, we use the schema to determine the range. If there is not even a schema, we assign two default field choices: one for creating an empty literal, one for choosing any RDF resource. The mode of the choices depends on whether or not a sublens is atomic. Atomic sublenses always produce atomic projections whose structure is not visible, thus choosing existing instances is
the only meaningful operation. For compound sublenses, we always instantiate a new projection, but the user can later swap the new instance for an existing one.
Applying the Projection: Changing the Data
The user always changes the projection and not directly the data. To actually perform the changes, the user applies the projection. Before we can fully describe the application algorithm, we first have to explain another mode of projection application: Whereas until now, we have only talked about editing one resource at a time, batch editing is about editing several resources at once.

There are three kinds of manipulation operations that can be encoded in a projection: adding new data, removing existing data, and changing existing data. Each of these operations faces new challenges with batch editing. Adding an atomic property value, such as a literal, to a set of resources is straightforward, but if the property value is compound, we have to create a new instance for each resource. This kind of manipulation is encoded in a compound projection as the node being null.

For removing data from a single resource, we are working with a definite set of data, so there are no problems. For multiple resources, we want to be able to remove both all properties with a given predicate and one certain predicate-object combination from all resources. But finding out which predicate-object combinations are the same in several resources is complex, especially if nested compound projections are involved. So, while the combined projection of multiple resources could provide a summary, we opted for a simpler solution: it is initially empty, but one can add explicit remove operations that affect the complete subtrees that would have been shown in a single projection. Removing a specific predicate-object combination only works for single resources. Note that for both addition and removal, batch editing forced us to introduce nonspecific operations in addition to
directly specified RDF. If you look at Figure 6, you see that the removed data is kept in or below the sections: the assembly section has a separate attribute for removed elements, the record section stores removed properties in the property projection. The change operation can be seen as a combination of addition and removal, but this viewpoint mainly appears later in the algorithm; initially, it is a distinct operation. When one changes a compound projection, the anchor does not change.5 But when changing an atomic projection, we have to first remove the old property and then add the new property. Thus, we store both the old and the new node with an atomic projection.
Application Result: What to Remove, Add, and Keep
The application algorithm itself produces three sets of triples: triples to remove, triples to add, and unchanged triples. A few examples: null nodes cause a new blank node to be created for each invocation. An atomic projection with an old node and a new node produces one triple to be removed (with the old node) and one triple to be added (with the new node). To remove a projection (for example, when handling the “removed” data in an assembly section or a property projection), we compute the triples for the subtree, ignore the triples to be added and move the unchanged triples to the triples to be removed. Similarly, copying a projection means that we use the union of the removed and unchanged triples. We also offer a “duplication” operation, which means that we have to post-process the union of removed and unchanged triples and swap each existing blank node for a fresh one.
Parsed vs. Created Data
A projection is always a mix of data that has been newly created by the user and data that has been parsed from RDF. The most frequent case is that a parsed compound projection contains new data.
But projection-only data can also contain RDF data if the user chooses to add a reference to an existing resource. The “old node” attribute of a node projection indicates whether a projection is parsed or created: if it is null, then the data does not exist in RDF.
Adding and Removing Types
Compound lenses always make their class the type of newly created data. For removal, they have to find all equivalent types or subtypes of the class and remove them. This is due to our decision to use the static instead of the dynamic type.
Editing Collections
Collections are the only compound data structure where the anchor can change. As a consequence, if there is a collection section in a compound projection, there can be no record or container sections. Furthermore, we need the parent projection whenever the anchor changes, so that we can set the parent property to the new value. Last, we add a type if the anchor changes from rdf:nil to a blank node and remove it if it changes from a blank node to rdf:nil.
Additional Editing Mechanisms
In this section, we look at additional topics that are relevant for editing: How does one handle qualified names? How are resources labeled? How do we best implement thin frontends (as in a typical Web setting, where the server is a fat backend and the client is a thin frontend)?
RDFa: Reversibly Embedding RDF in HTML
We have now seen that, via a combination of lenses and wiki pages, it is easy to merge structured RDF
data and semi-structured wiki pages to publish content. The only problem after publication is that the structured data is lost in the process; one cannot usually undo the merging to get back the original RDF information. Compare this to the many lists of contacts or dates available on the Web that are hard to import into address books or calendars. Help comes in the form of an (X)HTML extension called RDFa (Adida & Birbeck, 2006). It defines a small set of attributes that remain mostly hidden inside the HTML, but whose values allow one to extract the RDF triples that were used to generate the HTML in the first place. In the following example (which is taken from (Wikipedia, RDFa)), relatively free text is marked up with RDFa:
<p xmlns:dc="http://purl.org/dc/elements/1.1/"
   about="http://www.example.com/books/wikinomics">
  In his latest book <cite property="dc:title">Wikinomics</cite>,
  <span property="dc:author">Don Tapscott</span> explains deep changes in
  technology, demographics and business. The book is due to be published in
  <span property="dc:date" content="2006-10-01">October 2006</span>.
</p>
The RDFa-specific parts are the namespace declaration and the about, property, and content attributes: we initially define the namespace prefix dc for the URI http://purl.org/dc/elements/1.1/. Then we say that the paragraph is about the resource http://www.example.com/books/wikinomics, meaning that the properties we are about to define should be attached to it. Then we define the dc:title and dc:author of that book. For the date, we use an RDFa mechanism that lets us display October 2006 to the user, but store 2006-10-01 in RDF. Note that even though we discard the former, it is still semantically marked up as the value of property dc:date. Thus, the following three triples have been embedded in the HTML fragment and can be easily extracted:

<http://www.example.com/books/wikinomics>
  dc:title "Wikinomics" ;
  dc:author "Don Tapscott" ;
  dc:date "2006-10-01" .
There are tools such as bookmarklets (RDFa bookmarklets) available with which it is easy to extract the RDF from RDFa-enriched Web sites.
Designating Primary Classes
Not all available classes are equally interesting. This is why profile authors can designate more important classes as primary. Their special role comes into play in two situations: One can filter lists of instances to only show instances of primary classes, and when creating a new instance, we only show primary classes. Context-specific primary classes are added to a group as either a list via fresnel:primaryClasses or as a property value via emm:primaryClass (which makes it compositional). Global primary classes are instances of rdfs:Class (of which owl:Class is a subclass) that have a property emm:category whose range is emm:ClassCategory. This allows us to group classes by vocabulary or purpose. Group-defined primary classes are part of a default category if they do not explicitly state one.
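A sketch of both variants, using the properties named above; the group and category names are invented for illustration:

:addressBook rdf:type fresnel:Group ;
  fresnel:primaryClasses ( foaf:Person ) .       # context-specific primary classes

foaf:Person emm:category :contactVocabulary .    # global primary class
:contactVocabulary rdf:type emm:ClassCategory .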
Handling Qualified Names
Qualified names (qnames) have become a common practice in RDF to abbreviate long URIs. For example, after defining the prefix name rdf to mean the URI reference http://www.w3.org/1999/02/22-rdf-syntax-ns#, one can abbreviate the URI http://www.w3.org/1999/02/22-rdf-syntax-ns#List as rdf:List. Qualified names originated in RDF/XML via XML namespaces but have been expanded to a universal abbreviation mechanism in other RDF syntaxes such as Turtle and SPARQL. Qnames have two problems: First, in any of these syntaxes, prefix definitions are metadata that is not supported by all RDF syntaxes and thus cannot be easily transferred between different repositories. Second, there can be name clashes (or rather prefix clashes). As an answer to the first problem, EMM reifies qname definitions as resources of type emm:Prefix with the predicates emm:prefixName and emm:uriref. As a partial solution to the second problem, we regard qnames as a purely presentational mechanism and internally always use complete URIs. This resolves ambiguities internally, but explicit disambiguation might still be necessary whenever the user enters a qname. Fresnel mentions qnames as having to be defined per group, but does not (to our knowledge) specify what vocabulary or mechanism should be used for doing so.
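The reified qname definition just described could look like this sketch; whether emm:uriref takes a plain literal is our assumption:

[] rdf:type emm:Prefix ;
   emm:prefixName "rdf" ;
   emm:uriref "http://www.w3.org/1999/02/22-rdf-syntax-ns#" .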
Labeling Resources
There are several ways of turning a resource node into a label (a human readable text string). The following list is in order of preference. That is, if the first solution works, we use it, otherwise we try the next one, and so forth:
1. Use an rdfs:label that is attached to the resource.
2. Find a matching lens whose purpose is fresnel:labelLens. See later in the chapter for an example of a label lens.
3. Editors can define a specialized lens for labels, the so-called default label lens. If no other lens matches, this lens is used, without checking its selectors.
4. Display the resource URI: Abbreviate as a qname, if possible, and display blank nodes as pseudo URIs.
The following example is taken from the Fresnel manual (Bizer et al., 2005). It defines a label lens for instances of foaf:Person that only shows a person’s name.

:foafPersonLabelLens rdf:type fresnel:Lens ;
  fresnel:purpose fresnel:labelLens ;
  fresnel:classLensDomain foaf:Person ;
  fresnel:showProperties foaf:name .
Note that the label will be displayed, but could be hidden with the right format. Label lenses are always turned into single lines of text, unless one explicitly inserts line breaks. Additional content can be used to further refine the label text.
Editor Support for Common Editing Patterns

Creating an Editing Profile
To have editing capabilities as quickly as possible, a natural focus is on a lens. One can specify all necessary schema information inside the lens, but for atomic property values, using atomic lenses is a bit clumsy and it is better to make this information available to all lenses via a schema definition. Thus, creating an editing profile will start with the schema information. Then, an editor should translate a class definition to a lens whose showProperties list contains all of a class’s properties (including inherited ones). Turning one of the directly mentioned predicates into a property description should be a standard operation. Sometimes, the instance data is there before the schema. In this case, there should be an operation for translating a set of instances (so that one can be sure that all properties appear) into a skeleton schema.
a resource r to a resource s means that wherever r appears as subject, predicate, or object in a triple, it has to be replaced with s.
thin frontends With the popularity of AJAX-based (Garrett, 2005) Web applications, we suddenly have a completely different landscape in which to program. The server acts mainly as a data source and provides a service-oriented API. The client is where the complete user interface is located. It only intermittently invokes the server API. For RDF editing, one has two architectural choices: Either one implements a complete client-side RDF database in Javascript and synchronizes with the server. Or the server handles most of the data manipulation and the client only hosts a very shallow frontend layer. The former approach is probably the most elegant in the long term, but it also makes high demands on client resources and that might be undesirable. The latter approach enables us to reuse existing server-side frameworks (often implemented in Java), with minimal adaptation. And it is actually quite well supported by the EMM separation of projection editing and application. We have implemented our AJAX prototype as a thin frontend and have encountered three challenges while doing so:
Thin Frontends

With the popularity of AJAX-based (Garrett, 2005) Web applications, we suddenly have a completely different landscape in which to program: The server acts mainly as a data source and provides a service-oriented API, while the complete user interface is located on the client, which only intermittently invokes the server API. For RDF editing, one has two architectural choices: Either one implements a complete client-side RDF database in JavaScript and synchronizes it with the server, or the server handles most of the data manipulation and the client only hosts a very shallow frontend layer. The former approach is probably the more elegant one in the long term, but it also makes high demands on client resources, which might be undesirable. The latter approach enables us to reuse existing server-side frameworks (often implemented in Java) with minimal adaptation, and it is actually quite well supported by the EMM separation of projection editing and application. We have implemented our AJAX prototype as a thin frontend and have encountered three challenges while doing so:
• Labeling resources: Labeling is quite complicated and should not be performed in the frontend. Thus, we have implemented a generic mechanism that comes into play whenever the server returns a compound data structure that contains RDF resource data (projections are one example). The mechanism extracts this resource data from the data structure, uses it to create a mapping from resource to label, and attaches this mapping to the data structure. The client can now use the mapping to display all resources properly, without knowing anything about the details of labeling.
• Transmitting projections: If we want to minimize traffic between backend and frontend, projection data should be as small as possible. We therefore do not transmit schema or lens data with a projection, only resources that can be used for looking them up when the projection comes back to the server later on.
• Adding properties: The only time we need schema and lens information is when we add new properties. But that process can again be delegated to the backend, which has been made possible by reifying all possible choices as a data structure. After the end user has made a choice, it is left to the backend to change the projection and retransmit it in its entirety.
Examples

The following examples are written in the RDF syntax Turtle (Beckett, 2004).
Bookmarks

In order to support a hypothetical vocabulary for bookmarks, one would first define the appropriate schema, an OWL class. We assume that the prefix bookmark has been appropriately defined.

bookmark:Bookmark rdf:type owl:Class ;
    rdfs:subClassOf [
        rdf:type owl:Restriction ;
        owl:onProperty bookmark:name ;
        owl:allValuesFrom xsd:string
    ] ;
    rdfs:subClassOf [
        rdf:type owl:Restriction ;
        owl:onProperty bookmark:location ;
        owl:allValuesFrom xsd:string
    ] .
Then, one creates a lens that says that the name of this bookmark should always be displayed first, before its location.

[ rdf:type fresnel:Lens ;
    fresnel:classLensDomain bookmark:Bookmark ;
    fresnel:showProperties ( bookmark:name bookmark:location ) ]
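For concreteness, an instance to which this lens would apply might look as follows (the ex: prefix and the values are hypothetical):

ex:w3cBookmark rdf:type bookmark:Bookmark ;
    bookmark:name "W3C" ;
    bookmark:location "http://www.w3.org/" .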
A Lens for Classes

emm:OwlClassLens rdf:type fresnel:Lens ;
    fresnel:classLensDomain rdfs:Class ;
    fresnel:showProperties (
        [ rdf:type fresnel:PropertyDescription ;
          fresnel:property rdfs:subClassOf ;
          fresnel:sublens [ # atomic
              rdf:type fresnel:Lens ;
              emm:lensDomain [
                  rdf:type emm:ClassSelector ;
                  emm:acceptClass rdfs:Class ;
                  emm:rejectClass owl:Restriction ] ] ]
        owl:unionOf
        "Restrictions"
        [ rdf:type fresnel:PropertyDescription ;
          fresnel:property rdfs:subClassOf ;
          fresnel:sublens emm:OwlRestrictionLens ]
        "Global Properties"
        emm:globalProperties
    ) .

emm:OwlRestrictionLens rdf:type fresnel:Lens ;
    fresnel:classLensDomain owl:Restriction ;
    fresnel:showProperties (
        owl:onProperty
        owl:allValuesFrom
        owl:minCardinality
        owl:maxCardinality
        owl:cardinality
    ) .
This lens for classes demonstrates three new features that we have not introduced so far: First, the lens domain is defined by a new kind of resource, an emm:ClassSelector, which here explicitly rejects instances of owl:Restriction. This is necessary because we want to handle restrictions separately later, even though they form a subclass of rdfs:Class. We are still experimenting with this feature; its semantics are not yet fully clear. Second, there are two section headings, "Restrictions" and "Global Properties." These headings are different from additional content, because they
actually start a new section, with the consequence that restrictions are added separately from other class properties. Third, emm:globalProperties introduces a special class-specific section. This section collects all global properties whose domains match the anchor of the current projection. In the user interface, we allow the user to jump to them, but not to edit them (there is a separate lens that serves this purpose).
A Lens for Lenses

The following RDF defines a lens that can be used for editing lenses.

fresnel:Lens rdf:type rdfs:Class .

emm:LensLens rdf:type fresnel:Lens ;
    rdfs:label "Lens Lens" ;
    fresnel:classLensDomain fresnel:Lens ;
    fresnel:showProperties (
        rdfs:label
        fresnel:classLensDomain
        fresnel:instanceLensDomain
        [ rdf:type fresnel:PropertyDescription ;
          fresnel:property fresnel:showProperties ;
          fresnel:sublens emm:ShowPropertiesSublens ]
    ) .

emm:ShowPropertiesSublens rdf:type fresnel:Lens ;
    fresnel:classLensDomain rdf:List ;
    fresnel:showProperties (
        # Two sublenses, depending on whether the list element is a
        # property or a property description
        [ rdf:type fresnel:PropertyDescription ;
          fresnel:property fresnel:member ;
          fresnel:sublens emm:PropertyDescriptionLens ] # check first
        [ rdf:type fresnel:PropertyDescription ;
          fresnel:property fresnel:member ;
          fresnel:sublens [
              # An atomic lens: no showProperties, just a lens domain
              rdf:type fresnel:Lens ;
              fresnel:classLensDomain emm:UriNode ] ]
    ) .

emm:PropertyDescriptionLens rdf:type fresnel:Lens ;
    fresnel:classLensDomain fresnel:PropertyDescription ;
    fresnel:showProperties (
        fresnel:property
        [ rdf:type fresnel:PropertyDescription ;
          fresnel:property fresnel:sublens ;
          fresnel:sublens emm:LensLens ]
        [ rdf:type fresnel:PropertyDescription ;
          fresnel:property fresnel:use ;
          fresnel:sublens emm:FormatLens ]
    ) .
We have given the second of these sublenses (the atomic one above) the more general class lens domain emm:UriNode. This means that, as a last, catch-all case, we accept any URI node as a collection element. A lens domain of rdfs:Property would have been more elegant, but then we could only refer to properties that have an explicit type in the RDF repository.
Additional Content

We demonstrate how a lens can be used to translate a resource to text with the following lens.

ex:TagsLens rdf:type fresnel:Lens ;
    fresnel:classLensDomain rdf:List ;
    fresnel:showProperties (
        [ rdf:type fresnel:PropertyDescription ;
          fresnel:property fresnel:member ;
          fresnel:use [
              rdf:type fresnel:Format ;
              fresnel:propertyFormat [
                  fresnel:contentBefore "(" ;
                  fresnel:contentAfter ")" ] ;
              fresnel:valueFormat [
                  fresnel:contentAfter ", " ;
                  fresnel:contentLast "" ] ] ]
    ) .
This is a lens for collections. We want to display parentheses before and after all list elements (even if there are none) and commas between list elements. If we use this lens to generate text for the list

("Private" "Todo")
we get the result

(Private, Todo)
Using the EMM in a Semantic Work Environment

We are using the EMM in the generic RDF editor HYENA, which is extended via plugins to support various RDF vocabularies. The EMM implementation is one such plugin; another one is a small wiki engine. HYENA has two frontends: It can be run as an Eclipse plugin or as an AJAX-based Web application. Figure 7 shows the Eclipse frontend. The resource we are currently editing is a blog entry for which we have defined a lens. Afterwards, we embed all blog entries via a query into a wiki page that defines a homepage (Figure 8). That same homepage can also be displayed on the Web (Figure 9). For further details on the lens-wiki integration in HYENA, consult Rauschmayer (2007).
Figure 7. The RDF editor HYENA running as an Eclipse plugin. The form-based editor for the selected resource (a blog entry) has been defined by a lens.

Figure 8. HYENA displays a compound wiki page: Each blog entry and each publication is an embedded lens projection. Clicking on them opens a lens-based editor.

Figure 9. HYENA running in Web mode. The currently displayed resource is the same as in Figure 7.

Related Work

Protégé

Protégé is an ontology editor that has been around for quite some time. In terms of visualization, Protégé comes with a set of plugins that support several visualization styles. By default, a generic form-based class/instance editor is used; other plugins provide graph views. It is also possible to create one's own Java-based components that visualize and edit individual classes and instances. Concerning data models, Protégé traditionally comes with its own frame-based ontology model. Via a plugin, the so-called RDF backend (Noy, Sintek, Decker, Crubezy, Fergerson, & Musen, 2001), RDFS support is implemented by mapping the frame model of Protégé to RDF(S) on project load and save. However, the mapping is not perfect: Instances of multiple classes cannot be handled or created; the domain and range of properties are disjunctive, not conjunctive; and RDF Schema is treated as restrictions. More recently, an OWL plugin has been written (Noy, Knublauch, Fergerson, & Musen, 2004), replacing most of the software's internals and no longer relying on the frame-based ontology model. The class hierarchy and its consistency are checked using an external OWL inference engine such as Racer or Pellet.

Haystack (Quan, Huynh, & Karger, 2003)

Haystack consists of several Semantic Web-enabled integrated applications and partly implements what is nowadays called the Semantic Desktop. It uses Adenine, a programming language that can be serialized into RDF, to assist Java code in creating user interfaces.

DBin (Tummarello, Morbidoni, & Nucci, 2006)

DBin allows its users to create discussion groups in which arbitrary subjects can be annotated. Data is stored in a decentralized fashion; DBin servers are used as a lookup service. For creating and viewing information, DBin uses so-called Brainlets, written in XML, which describe class-specific views.

Annotation Profiles (Palmér, Enoksson, & Naeve, 2006)

Annotation Profiles share many of the goals that we have with the EMM, but with a slightly different focus: We came from generic RDF editing and wanted to make it more end-user friendly for common tasks, while honoring RDF standards as much as possible. Annotation Profiles, on the other hand, started as a custom RDF editing application for electronic portfolio management and have since become a universal editing framework with a style reminiscent of smart forms.

Efforts to Extend/Modify RDFS

Several groups have identified the complexity and the comparative nonintuitiveness of OWL and are working on modified versions of RDFS. Basically, these efforts define a set of useful features, such as functional properties or cardinality constraints, and deprecate a number of RDF features that are difficult to handle (blank nodes, sequences, bags). Implicitly, the RDF extensions that the Protégé RDF backend uses for representing Protégé-specific language constructs are such a modified version. Other efforts, such as NRL (Sintek, van Elst, Scerri, & Handschuh, 2007), which also handles named graph semantics, are under way.
Future Research Directions

This section is relatively loose brainstorming about what features make sense for future versions and extensions of the EMM.

Schema

Computed Properties

We have to evaluate to what extent we want to support property values that have not been explicitly asserted, for example, counters, subtotals, and inverse properties (owl:inverseOf). Potential solutions are inferencing, rule-based languages, and imperative languages.
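As a small illustration of the inverse-property case, a single OWL assertion like the following (with hypothetical ex: properties) would allow an editor to compute ex:partOf values instead of requiring them to be asserted explicitly:

ex:partOf owl:inverseOf ex:hasPart .

# Given the asserted triple
ex:engine ex:hasPart ex:piston .
# an inferencing editor could display the computed triple
# ex:piston ex:partOf ex:engine .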
Associations

Associations bring us even closer to object-oriented data modeling. They are an answer to the observation that some RDF properties belong quite tightly to a resource, while others are more like relations between resources. Data-modeling-wise, associations are the first place where transitivity makes sense. For associations, we have to examine how to mark properties as having a more relational nature. Cardinality ideas can be borrowed from OWL, but OWL does not have the means to declare n-to-m relations. As this feature is orthogonal to lenses, we will probably have to introduce a new concept at the presentation layer as well.
Data Validation

One should be able to do some lightweight data validation, for example, via regular expressions. The question then arises whether to support this at the presentation level or whether one should define a new datatype for this purpose.
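A sketch of what such a constraint could look like, using an invented emm:validationPattern predicate (this vocabulary does not exist in the EMM; it merely illustrates the idea):

# Hypothetical validation vocabulary: restrict ex:phone values
# to digits, spaces, and an optional leading "+".
ex:phone emm:validationPattern "^\\+?[0-9 ]+$" .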
Defaults

Especially when the cardinality of a property is greater than zero, the profile author can support the end user by providing defaults. We see two main approaches: Provide complete prototypes for a class, or specify default values per property.
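For instance, the per-property approach could be stated with an invented emm:defaultValue predicate (again, an assumption rather than existing EMM vocabulary):

# Hypothetical: new bookmark:Bookmark instances start with this name.
bookmark:name emm:defaultValue "untitled" .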
Meta-Modeling

OWL has a rich vocabulary for annotating, versioning, and composing ontologies. Many of these constructs seem applicable to data modeling and should probably be included in the EMM.

Using UML for Meta-Editing

Visual representation of schema information caters to a different area of human cognition and is especially helpful when communicating with non-experts. As our style of data modeling is almost object-oriented, the Unified Modeling Language (UML; Booch, Rumbaugh, & Jacobsen, 1999) is a logical choice. It has served us well internally for quickly sketching schema ideas and will be an even better fit once we have extended the EMM with associations. For UML, we have to explore conventions, related work, and how formally we want to progress, while maintaining maximum usability.

Presentation

Advanced Fresnel Features

Fresnel has extended vocabularies with many useful features that should be considered for an advanced version of the Editing Meta-Model: The extended lens vocabulary includes lens inheritance, and the extended format vocabulary supports different output media. It is less clear how useful the Fresnel selector language (Pietriga, 2005) is for editing, but the added flexibility for defining lenses is worthy of more exploration. Finally, we have to evaluate how to best support styling, especially in a GUI setting.
Medium-Specific Additional Content

In current Fresnel, the data provided via fresnel:labelFormat is sometimes HTML-specific. In the future, there should be a way of separating medium-specific text from medium-independent text.
User-Defined Labels

Sometimes, several instances of the same property should have individual labels. For example, if the phone numbers of a contact are all stored in a single property, the user will still want to assign different labels to them ("home," "work," etc.). RDF reification should probably be used for this purpose, together with label defaults that need to be specified somewhere.
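A minimal sketch of this idea, assuming hypothetical ex: resources: the statement carrying the phone number is reified, and the user-defined label is attached to the reified statement.

_:st rdf:type rdf:Statement ;
    rdf:subject ex:alice ;
    rdf:predicate ex:phone ;
    rdf:object "555-0101" ;
    rdfs:label "home" .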
Editing

Sharing Data

Sharing the data held in a projection section is currently not handled well. For example, if one uses the same address twice, removing it once will remove all of its data, leaving the second property with just an empty anchor. The current default is that atomic projections can be shared, while compound projections are not. As a solution, we need the means to specify which nested data may be shared. Then we can warn about removing data to which another property still holds a "reference."
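The problematic situation can be sketched as follows (hypothetical ex: data): both persons share the same compound address node, so removing the address from one of them currently deletes the nested data that the other still refers to.

ex:alice ex:address _:addr .
ex:bob   ex:address _:addr .
_:addr ex:city "Berlin" ;
       ex:zip "10115" .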
Help Texts

For some applications, the profile author will want to guide the end user through the editing process via comments and examples. We will have to evaluate what vocabulary is adequate; rdfs:comment is one of several candidates.
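If rdfs:comment were chosen, a help text could be attached directly to the property in question (the ex: property and text are hypothetical):

ex:phone rdfs:comment "Enter the number in international format, e.g., +49 89 1234567." .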
Data Refactoring

Many schema-changing refactoring operations from programming languages make sense in a data-editing scenario, where they additionally necessitate batch manipulation of presentation and instance data. Examples include renaming a property or moving a property from a class to a superclass.
Conclusion

In this chapter, we have filled a gap in current RDF standards: While RDF is perfectly suited for lightweight data modeling, it lacks clearly defined standards that completely support it. To provide this support, we have described the Editing Meta-Model (EMM), which defines subsets of OWL and Fresnel and supplements these subsets with new vocabulary and implementation guidelines. We think that this undertaking is useful for implementors of data-centric (as opposed to ontology-centric) editors and can serve as a contribution to the ongoing discussion about simpler versions of OWL (Hendler, 2006).
Acknowledgment

We thank Matthias Palmér for valuable feedback on drafts of this document.
References

Adida, B., & Birbeck, M. (2006). RDFa primer 1.0: Embedding RDF in XHTML. Retrieved March 13, 2008, from http://www.w3.org/TR/xhtml-rdfa-primer/

Bechhofer, S., van Harmelen, F., Hendler, J., Horrocks, I., McGuinness, D. L., Patel-Schneider, P. F., & Stein, L. A. (2004). OWL Web ontology language: Reference. Retrieved March 13, 2008, from http://www.w3.org/TR/owl-ref/

Beckett, D. (2004). Turtle - terse RDF triple language. Retrieved March 13, 2008, from http://www.dajobe.org/2004/01/turtle/

Bizer, C., Lee, R., & Pietriga, E. (2005). Fresnel - display vocabulary for RDF - user manual. Retrieved March 13, 2008, from http://www.w3.org/2005/04/fresnel-info/manual/

Bizer, C., Pietriga, E., Karger, D., & Lee, R. (2006). Fresnel: A browser-independent presentation vocabulary for RDF. In Proc. 5th Int. Semantic Web Conf. (ISWC).

Booch, G., Rumbaugh, J., & Jacobsen, I. (1999). The unified modeling language user guide. Addison-Wesley.

Brickley, D., & Guha, R. V. (2004). RDF vocabulary description language 1.0: RDF schema. Retrieved March 13, 2008, from http://www.w3.org/TR/rdf-schema/

Garrett, J. J. (2005). AJAX: A new approach to Web applications. Retrieved March 13, 2008, from http://www.adaptivepath.com/publications/essays/archives/000385.php

Hendler, J. (2006). The dark side of the Semantic Web. Retrieved March 13, 2008, from http://www.mindswap.org/blog/2006/12/13/the-dark-side-of-the-semantic-web/

Motik, B., Horrocks, I., & Sattler, U. (2007). Bridging the gap between OWL and relational databases. In Proc. 16th Int. Conf. World Wide Web (WWW).

Noy, N. F., Knublauch, H., Fergerson, R., & Musen, M. A. (2004). The Protégé OWL plugin: An open development environment for Semantic Web applications. In Proc. 3rd Int. Semantic Web Conf. (ISWC).

Noy, N. F., Sintek, M., Decker, S., Crubezy, M., Fergerson, R. W., & Musen, M. A. (2001). Creating Semantic Web contents with Protege-2000. IEEE Intelligent Systems, 16(2), 60-71.

Palmér, M., Enoksson, F., & Naeve, A. (2006). Annotation profiles: Configuring forms to edit RDF. Submitted for publication.

Pietriga, E. (2005). Fresnel selector language for RDF (FSL). Retrieved March 13, 2008, from http://www.w3.org/2005/04/fresnel-info/fsl/

Prud'hommeaux, E., & Seaborne, A. (2005). SPARQL query language for RDF. Retrieved March 13, 2008, from http://www.w3.org/TR/rdf-sparql-query/

Quan, D., Huynh, D., & Karger, D. R. (2003). Haystack: A platform for authoring end user Semantic Web applications. In Proc. 2nd Int. Semantic Web Conf. (ISWC).

Rauschmayer, A. (2007). Wikifying an RDF editor. Submitted for publication.

Rauschmayer, A., Andonova, A., & Nepper, P. (2007). Increasing the versatility of Java documentation with RDF. In Proc. 3rd International Conference on Semantic Technologies (I-SEMANTICS).

RDFa bookmarklets. (2006). Retrieved March 13, 2008, from http://www.w3.org/2006/07/SWD/RDFa/impl/js/

The rule markup initiative. (2000, November). Retrieved March 13, 2008, from http://www.ruleml.org/

Sintek, M., van Elst, L., Scerri, S., & Handschuh, S. (2007). Distributed knowledge representation on the social semantic desktop: Named graphs, views, and roles in NRL. In Proc. European Semantic Web Conf. (ESWC).

Spyns, P., Meersman, R., & Jarrar, M. (2002). Data modelling vs. ontology engineering. SIGMOD Record, 12-17.

Tummarello, G., Morbidoni, C., & Nucci, M. (2006). Enabling Semantic Web communities with DBin: An overview. In Proc. 5th Int. Semantic Web Conf. (ISWC).

Wikipedia: RDFa. (2006, March). Retrieved March 13, 2008, from http://en.wikipedia.org/wiki/RDFa

Further Reading

As this chapter is a new synthesis of existing work, there is no direct "further reading." The reader can, however, consult the following sources to find out more about some of the topics that we have covered:

• Fresnel: The Fresnel homepage at http://www.w3.org/2005/04/fresnel-info/ provides access to a variety of Fresnel-related information.
• HYENA: The HYENA homepage at http://hypergraphs.de/ provides further information on HYENA. A paper on HYENA itself is currently being written. The best available introductions are Rauschmayer, Andonova, and Nepper (2007) and Rauschmayer (2007). The former shows how HYENA integrates various vocabularies (including Fresnel) to help with Java documentation. The latter presents a wiki that has Fresnel integration.

Endnotes

1. The term schema is obviously not entirely correct when referring to OWL, but we use it anyway, to keep this enumeration reasonably simple.
2. The namespace lang would have to be defined.
3. The Fresnel manual suggests displaying the label only the first time when a property has several values. It is our experience that it makes sense to display it each time, as values can take up quite a lot of (especially vertical) room when nested lenses are involved.
4. In Fresnel, things are actually stated the other way around: Lenses and formats say what group they belong to. Conceptually, we found it easier to think of a group as containing lenses and formats.
5. The only exception is collections.
6. dc stands for the standard RDF vocabulary Dublin Core, which defines predicates for author information, and so forth.
Compilation of References
Aamodt, A., & Plaza, E. (1994). Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Commuications, 7(1), 39-59. Abecker, A. (2004). Business process oriented knowledge management: Concepts, methods, and tools. Doctoral Dissertation, University of Karlsruhe, Germany. Aberer, K., Cudre-Mauroux, P., Hauswirth, M., & van Pelt, T. (2004). GridVine: Building Internet-scale semantic overlay networks. In Proceedings of the 3rd International Semantic Web Conference (ISWC) (pp. 107-121). Hiroshima, Japan: Springer. Aberer, K., Punceva, M., Hauswirth, M., & Schmidt, R. (2002). Improving data access in P2P systems. IEEE Internet Computing, 6(1), 5-67. Acken, J. M. (1998). How watermarking adds value to digital content. Communications of the ACM, 41(7), 74-77. Adida, B., & Birbeck, M. (2006). RDFa primer 1.0: Embedding RDF in XHTML. Retrieved March 13, 2008, from http://www.w3.org/TR/xhtml-rdfa-primer/ Adler, A., Gujar, A., Harrison, B. L, O’Hara, K., & Sellen, A. (1998). A diary study of work-related reading: Design implications for digital reading devices. In CHI (pp. 241-248). Agirre, E., & Lopez de Lacalle, O. (2004). Publicly available topic signatures for all WordNet nominal senses. In Proceedings of the 4th International Conference on Language Resources and Evaluation, Lisbon, Portugal. Agirre, E., & Lopez de Lacalle, O. (2005). Clustering wordnet word senses. In Proceedings of Recent Advances in Natural Language Processing III.
Agirre, E., & Rigau, G. (1995). A proposal for word sense disambiguation using conceptual distance. In Proceedings of Recent Advances in Natural Language Processing, Tzigov Chark, Bulgary (pp. 258-264). Agirre, E., Ansa, O., Martinez, D., & Hovy, E. (2001). Enriching WordNet concepts with topic signatures. In Proceedings of the SIGLEX Workshop on WordNet and Other Lexical Resources: Applications, Extensions and Customizations. In conjunction with the second meeting of the North American Chapter of the Association of Computational Linguistics. Pittsburg, Pennsylvania. Ajchrzak, A., Wagner, C., & Yates, D. (2006). Corporate wiki users: Results of a survey. In Proceedings of the 2006 International Symposium on Wikis (pp. 99-104). Odense, Denmark: ACM Press. Alfonseca, E., & Manandhar, S. (2002). Distinguishing instances and concepts in wordnet. In Proceedings of the First International Conference on General WordNet, Mysore, India. Alfonseca, E., & Manandhar, S. (2002). Improving an ontology refinement method with hyponymy patterns. In Proceedings of the 3rd International Conference on Language Resources and Evaluation, Las Palmas, Spain. Alfonseca, E., & Manandhar, S. (2002). Extending a lexical ontology by a combination of distributional semantics signatures. In A. Gomez-Perez & R. Benjamins (Eds.), Proceedings of the Thirteenth International Conference on Knowledge Engineering and Knowledge Management: Ontologies and the Semantic Web, Siguenza, Spain (LNAI 2473, pp. 1-7). Alfonseca, E., Castells, P., Okumura, M., & Ruiz-Casado, M. (2006). A rote extractor with edit distance-based
Copyright © 2008, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Compilation of References
generalisation and multi-corpora precision calculation. In Proceedings of the Poster Session of the Annual Meeting of the Association of Computational Linguistics, Sidney, Australia. Alfonseca, E., Moreno-Sandoval, A., Guirao, J.M., & Ruiz-Casado, M. (2006). The wraetlic NLP suite. In 5th International Conference on Language Resources and Evaluation, Genova, Italy. Alier, M. (2006, May). El wiki, un camino hacia el trabajo colaborativo en el aula. Bits Espiral, 6, 38-43. Retrieved March 5, 2008, from http://espiral.xtec.net/images/stories/bits/bits6.pdf Anca, Ş. (2007). MaTeSearch, a combined math and text search engine. Bachelor’s thesis, Jacobs University Bremen. Androutsopoulos, I., & Aretoulaki, M. (2003). Natural language interaction. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 136-156). New York: Oxford University Press Inc. Anjomshoaa, A., Manh Nguyen, T., Shayeganfar, F., & Tjoa, A. (2006). Web service based business processes automation using semantic personal information managements Systems – the semantic life case. In Proceedings of the 6th International Conference on Practical Aspects of Knowledge Management (pp. 1-12). Vienna, Austria: Springer. Ankolekar, A., Burstein, M., Hobbs, J., Lassila, O., McDermott, D., Martin, D., et al. (2002). DAML-S: Web service description for the Semantic Web. In First International Semantic Web Conference (ISWC) Proceedings (pp. 348-363).
for the Semantic Web, CEUR Workshop Proceedings, CEUR-WS (pp. 135). Auer, S., & Herre, H. (2006). A versioning and evolution framework for RDF knowledge bases. In Proceedings of Ershov Memorial Conference. Aumüller, D. (2005, March). SHAWN: Structure helps a wiki navigate. In W. Müller & R. Schenkel (Eds.), Proceedings of the BTW-workshop WebDB Meets IR. Retrieved March 4, 2008, from http://dbs.uni-leipzig. de/~david/2005/aumueller05shawn.pdf Aumüller, D. (2005). Semantic authoring and retrieval within a wiki (WikSAR). In Demo Session at the Second European Semantic Web Conference (ESWC2005). Retrieved March 2, 2008, from http://wiksar.sf.net Aumüller, D., & Auer, S. (2005, November). Towards a semantic wiki experience – desktop integration and interactivity in WikSAR. In Proceedings of 1st Workshop on the Semantic Desktop – Next Generation Personal Information Management and Collaboration Infrastructure, (pp. 212-217). Galway, Ireland. Axelrod, R. (1984). The evolution of cooperation. New York: Basic Books. Bächle, M. (2006). Social Software. Informatik Spektrum, 29(2), 121-124. Baraniuk, R., Burrus, C., Hendricks, B., Henry, G., III, A. H., Johnson, D., Jones, D. et al. (2002). ConneXions: DSP education for a networked world. In Acoustics, Speech, and Signal Processing, 2002. Proceedings. (ICASSP ’02); IEEE International Conference, ICASSP Conference Proceedings (vol. 4, pp. 4144-4147).
Apache Lucene. (2007). The Apache Lucene open source project. Retrieved March 5, 2008, from http://lucene. apache.org/
Bardram, J. E., Bunde-Pedersen, J., & Soegaard, M. (2006). Support for activity-based computing in a personal computing operating system. In CHI06 (2006) (pp. 211-220). Montreal, Quebec, Canada.
Arens, Y., Chee, C., Hsu, C.-N., & Knoblock, C. (1993). Retrieving and integrating data from multiple information sources. International Journal on Intelligent and Cooperative Information Systems, 2(2), 127-158.
Barreau, D., & Nardi, B. A. (1995). Finding and reminding: File organization from the desktop. SIGCHI Bulletin, 27(3), 39-43.
Arevalo, M., Civit, M., & Marti, M.A. (2004) MICE: A module for named entity recognition and clasification. International Journal of Corpus Linguistics, 9(1), 53-68. Attwell, G. (2007). The personal learning environments - the future of e-learning? eLearning Papers, 2(1). ISSN 1887-1542. Auer, S. (2005, May 30). Powl – a Web based platform for collaborative Semantic Web development. In Scripting
Barrett, H.C. (2005). Researching electronic portfolios and learner engagement (Tech. Rep., The REFLECT Initiative). Bechhofer, S., van Harmelen, F., Hendler, J., Horrocks, I., McGuinness, D. L., Patel-Schneider, P. F., & Stein, L. A. (2004). Owl Web ontology language: Reference. Retrieved March 13, 2008, from http://www.w3.org/ TR/owl-ref/
Compilation of References
Beckett, D. (2004). Turtle—terse RDF triple language. Retrieved March 13, 2008, from http://www.dajobe. org/2004/01/turtle/
Berners-Lee, T., Hendler, J., & Lassila, O. (2001, August). Mein computer versteht mich. Spektrum derWissenschaft, 8, 42-49.
Bellotti, V., Dalal, B., Good, N., Flynn, P., Bobrow, D. G., & Ducheneaut, N. (2004). What a to-do: Studies of task management towards the design of a personal task list manager. In CHI (pp. 735-742).
Beyer, H., & Holtzblatt, K. (1997). Contextual design: Defining customer-centered systems. Morgan Kaufmann.
Bellotti, V., Ducheneaut, N., Howard, M., & Smith, I. (2003). Taking e-mail to task: The design and evaluation of a task management centered e-mail tool. In CHI (pp. 345-352). Benjamins, R., & Fensel, D. (1998). Community is knowledge! In (KA)2. In Proceedings of the 11th Workshop on Knowledge Acquisition, Modeling, and Management, Banff, Canada. Bentivogli, L., Forner, P., Magnini, B., & Pianta, E. (2004). Revising wordnet domains hierarchy: Semantics, coverage, and balancing. In Proceedings of the Twentieth International Conference on Computational Linguistics, Workshop on Multilingual Linguistic Resources, Geneva, Switzerland (pp. 101-108). Bereiter, C., & Scardamalia, M. (1985). Cognitive coping strategies and the problem of inert knowledge. In S. Chipman, J. Seagal, & R. Glaser (Eds.), Thinking and learning skills. LEA. Berghel, H. (1999). Digital village: Digital publishing. Communications of the ACM, 42(1), 19-23. Berland, M., & Charniak, E. Finding parts in very large corpora. In Proceedings of the Thirty-Seventh Annual Meeting of the Association for Computational Linguistics, University of Maryland, College Park, Maryland (pp.57-64). Berners-Lee, T. (1998). What the Semantic Web can represent. Retrieved March 9, 2008, from http://www. w3.org/DesignIssues/RDFnot.html Berners-Lee, T. (2003, May 20-24). Keynote address: Fitting it all together. In 12th World Wide Web Conference, Budapest, Hungary. Berners-Lee, T., & Fischetti, M. (1999). Weaving the Web: The original design and ultimate destiny of the World Wide Web, by its Inventor. San Francisco, CA: Harper. Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The Semantic Web: A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities. Scientific American Online. Retrieved April 23, 2008, from http://www.sciam.com/article. cfm?id=the-semantic-web
Bird, F. (2005). Some ideas to improve tags use in social software, flat hierarchy versus categories in social software. Bizer, C., & Seaborne, A. (2004). D2RQ - treating nonRDF-databases as virtual RDF graphs. In Proceedings of the 3rd International Semantic Web Conference. Hiroshima, Japan: Springer. Bizer, C., Lee, R., & Pietriga, E. (2005, November). Fresnel – a browser-independent presentation vocabulary for RDF. In End User Semantic Web Interaction Workshop at ISWC 2005. Bizer, C., Pietriga E., Karger, D., & Lee, R. (2006). Fresnel: A browser-independent presentation vocabulary for RDF. In Proc. 5th Int. Semantic Web Conf. (ISWC). Black, W.J., & Vasilakopoulos, A. (2002). Language-independent named entity classification by modified transformation-based learning and by decision tree induction. In Proceedings of the Sixth Workshop on Computational Language Learning, in Association with the Nineteenth International Conference on Computational Linguistics, Taipei, Taiwan (pp. 159-162). Blandford, A., & Green, T. R. G. (2001). Group and individual time management tools: What you get is not what you need. Personal and Ubiquitous Computing, 5(4), 213-230. Bloehdorn, S., Görlitz, O., Schenk, S., & Völkel, M. (2006). TagFS - tag semantics for hierarchical file systems. Bodker, K., Kensing, F., & Simonsen, J. (2004). Participatory IT design. B&T. Boehm, B. (2003). Value-based software engineering. ACM SIGSOFT: Software Engineering Notes, 28(2), 1-7. Bondarenko, O., & Janssen, R. (2005). Documents at hand: Learning from paper to improve digital technologies. In CHI (pp. 121-130). Bontcheva, K., Tablan, V., Maynard, D., & Cunningham, H. (2004). Evolving GATE to meet new challenges in language engineering. Natural Language Engineering, 10, 349-373. Booch, G., Rumbaugh, J., & Jacobsen, I. (1999). The unified modeling language user guide. Addison-Wesley.
Compilation of References
Borst, W. (1997). Construction of engineering ontologies for knowledge sharing and reuse. Ph. D. thesis, University of Twente, The Netherlands. Bos, B., Lie, H. W., Lilley, C., & Jacobs, I. (1998). Cascading style sheets. Level 2; CSS2 Specification. W3c recommendation, World Wide Web Consortium (W3C). Retrieved March 9, 2008, from http://www. w3.org/TR/1998/REC-CSS2-19980512 Bos, N., Olson, J. S., Gergle, D., Olson, G. M., & Wright, Z. (2002). Effects of four computer-mediated communications channels on trust development. In CHI (pp. 135-140). Botley, S., & McEnery, A.M. (Ed.). (2000). Corpus-based and computational approaches to discourse anaphora. Philadelphia, PA: John Benjamins Publishing Co. Bowman, S., & Willis, C. (2003). We media: How audiences are shaping the future of news and information. The Media Center at the American Press Institute. Retrieved March 5, 2008, from http://www.hypergene. net/wemedia/download/we_media.pdf Braganza, A., & Mollenkramer, G. J. (2002). Anatomy of a failed knowledge management initiative: Lessons from PharmaCorp’s experiences. Knowledge and Process Management, 9(1), 23-33. Braun, S., Schmidt, A., & Hentschel, C. (2007). Semantic desktop systems for context awareness - requirements and architectural implications. In 1st Workshop on Architecture, Design, and Implementation of the Semantic Desktop (SemDesk Design), 4th European Semantic Web Conference (ESWC 2007), Innsbruck, Austria. Braun, S., Schmidt, A., Walter, A., Nagypal, G., & Zacharias, V. (2007). Ontology maturing: A collaborative Web 2.0 approach to ontology engineering. In Proceedings of the Workshop on Social and Collaborative Construction of Structured Knowledge at the 16th International World Wide Web Conference (WWW 07), Banff, Canada. Breslin, J. G., Harth, A., Bojars, U., & Decker, S. (2005). Towards semantically-interlinked online communities. In ESWC (pp. 500-514). Brew, C., McKelvie, D., Tobin, R., Thompson, H.S., & Mikheev, A. (2000). The XML library LT XML version 1.2. User documentation and reference guide. Retrieved March 11, 2008, from http://www.ltg.ed.ac.uk/np/ltxml/ xmldoc.html Brickley, D., & Guha, R.V. (2004). RDF vocabulary description language 1.0: RDF schema. Retrieved March 13, 2008, from http://www.w3.org/TR/rdf-schema/
Brickley, D., & Miller, L. (2004). FOAF vocabulary specification FOAF Project. Brusilovsky, P. (2001). Adaptive hypermedia. User Modeling and User Adapted Interaction, 11(1-2). Buckland, M. K. (1992). Emmanuel Goldberg, electronic document retrieval, and Vannevar Bush’s memex. JASIS, 43(4), 284-294. Buffa, M. (2006, May). Intranet Wikis. In IntraWeb Workshop, 15th International Conference on World Wide Web, Edinburgh, Scotland. Buffa, M., & Gandon, F. (2006). SweetWiki: Semantic Web enabled technologies in wiki. In ACM Conference Wikisym. Odense. Burstein, J., Leacock., C., & Swartz, R. (2001). Automated evaluation of essays and short answers. In Proceedings of the Fifth Computer-Assisted Assessment Conference. Loughborough: Loughborough University. Bush, V. (1945, July). As we may think. The Atlantic Monthly, 176(1), 101-108. Bussler, C. (2001). B2B protocol standards and their role in semantic b2b integration engines. Bulletin of the Technical Committee on Data Engineering, 24(1), 67-72. Buswell, S., Caprotti, O., Carlisle, D. P., Dewar, M. C., Gaetano, M., & Kohlhase, M. (2004). The open math standard, version 2.0 (Tech. Rep.). The Open Math Society. Retrieved March 4, 2008, from http://www. openmath.org/standard/om20 Califf, M.E. (1998). Relational learning techniques for natural language extraction. Ph.D. thesis, University of Texas at Austin. Campanini, S.E., Castagna, P., & Tazzoli, R. (2004). Platypus wiki: A semantic wiki wiki Web. In Semantic Web Applications and Perspectives, Semantic Web Workshop. Campbell, C., & Maglio, P. (2003). Supporting notable information in office work. In CHI (pp. 902-903). Canfield Smith, D., Irby, C., Kimball, R., Verplank, B., & Harslem, E. (1982). Designing the Star User Interface. Byte, 4, 242-282. Caraballo, S.A. (1999). Automatic construction of a hypernym-labeled noun hierarchy from text. In Proceedings of the Thirty-Seventh Annual Meeting of the Association for Computational Linguistics, University of Maryland, College Park, Maryland, (pp. 120-126).
Compilation of References
Cardie, C. (1993) A case-based approach to knowledge acquisition for domain-specific sentence analysis. In Proceedings of the Eleventh National Conference on Artificial Intelligence (pp. 798-803). Carlisle, D., Ion, P., Miner, R., & Poppelier, N. (Eds.) (2003). Mathematical markup language (MathML) version 2.0 (2nd ed.). (W3C Recommendation). World Wide Web Consortium. Retrieved March 4, 2008, from http://www.w3.org/TR/MathML2 Carreras, X., Marquez, L., & Padro, L. (2003). A simple named entity extractor using adaboost. In W. Daelemans & M. Osborne (Eds.), Proceedings of the Seventh Conference on Natural Language Learning, Edmonton, Canada, (pp. 152-155). Castillo, J. (1999, November). Trabajo colaborativo en comunidades virtuales. El profesional de la información, 8(11), 40-47. Chakrabarti, D., Narayan, D.K., Pandey, P., & Bhattacharyya, P. (2002). Experiences in building the indo wordnet: A wordnet for hindi. In Proceedings of the First International Conference on General WordNet, Mysore, India. Chan, C., Chen, L.-L., & Geng, L. (2000). Knowledge engineering for an intelligent case-based system for help desk operations. Expert Systems with Applications, 18(2), 125-132. Chat, C., & Nahaboo, C. (2006). Let’s build an intranet at ILOG like the Internet! In IntraWeb Workshop, WWW Conference, Edinburgh. Chawathe, S. S., Garcia-Molina, H., Hammer, J., Ireland, K., Papakonstantinou, Y., Ullman, J. D. et al. (1994). The TSIMMIS project: Integration of heterogeneous information sources. IPSJ, 7-18. Chen, C., & Czerwinski, M. (1998, June). From latent semantics to spatial hypermedia — an integrated approach. In Proceedings of the 9th ACM Conference on Hypertext (Hypertext ‘98), Pittsburgh, PA. Retrieved March 9, 2008, from www.pages.drexel.edu/~cc345/ papers/ht98.pdf
on Recent Advances in Natural Language Processing, Borovets, Bulgary. Cimiano, P., Handschuh, S., & Staab, S. (2004b). Towards the self-annotating Web. In Proceedings of the Thirteenth World Wide Web Conference, New York City, New York. Cimiano, P., Hotho, A., & Staab, S. (2004a). Clustering concept hierarchies from text. In Proceedings of the Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal. Cimiano, P., Reyle, U., & Saric, J. (2005). Ontologydriven discourse analysis for information extraction. Data Knowledge and Engineering, 55(1), 59-83. CNX. (2006). cOnneXiOnS. Retrieved March 9, 2008, from http://www.cnx.org Cole, R. J., Eklund, P. W., & Stumme, G. (2000). CEMvisualisation and discovery in e-mail. In Principles of Data Mining and Knowledge Discovery (pp. 367-374). Cole, R., & Stumme, G. (2000). CEM - a conceptual e-mail manager. In Iccs (pp. 438-452). Collison, C., & Parcell, G. (2005). Learning to Fly (2nd ed.). Capstone, UK. Contreras, J., Benjamins, R., Martin, F., Navarrete, B., Aguado, G., Alvarez, I., et al. (2003). Annotation tools and services (Esperonto Project deliverable 31). Madrid, Spain: Universidad Politecnica de Madrid. Cooper, A., & Reimann, R. (2003). About face 2.0: The essentials of Interaction Design. John Wiley and Sons. Copestake, A. (1992). The acquilex lkb: Representation issues in semi-automatic acquisition of large lexicons. In The Third Conference on Applied Natural Language Processing, Trento, Italy. Corby, O., Dieng-Kuntz, R, & Faron-Zucker, C. (2004). Querying the Semantic Web with the CORESE search enginee. In Proceedings of the 16th European Conference on Artificial Intelligence (ECAI’2004), Valencia (pp. 705-709). IOS Press.
Church, K., Gale, W., Hanks, P., & Hindle, D. (1991). Using statistics in lexical analysis. In Lexical Acquisition: Exploiting Online Resources to Build a Lexicon (pp. 115-164).
Cosley, D., Frankoswki, D., Terveen, L., & Riedl, J. (2006). Using intelligent task routing and contribution review to help communities build artifacts of lasting value. In CHI06 (2006) (pp.32-41), Montreal, Quebec, Canada.
Cimiano, P., & Volker, J. (2005). Towards large-scale, open-domain and ontology-based named entity classification. In Proceedings of the International Conference
Cost, R., Finin, T., & Joshi, A. (2002). Ittalks: A case study in the Semantic Web and DAML. In Proceedings of the International Semantic Web Working Symposium (pp. 40-47).
Compilation of References
Cross, J., & O’Driscoll, T. (2005). Workflow learning gets real. Training Magazine, (2). Cunningham, H., Bontcheva, K., Tablan, V., & Wilks, Y. (2000). Software infrastructure for language resources: A taxonomy of previous work and a requirements analysis. In Proceedings of the Second International Conference on Language Resources and Evaluation, Athens, Greece. Cunningham, H., Maynard, D., Bontcheva, K., Tablan, V., & Ursu, C. (2002). The GATE userGuide. Cutrell, E., & Dumais, S. T. (2006). Exploring personal information management. Commun. ACM, 49(1), 50-51. Dahlgren, K.G. (2000). Naive semantics for natural language understanding. Boston, MA: Kluwer Academic Publishers. Dai, L., Lutters, W. G., & Bower, C. (2005). Why use memo for all? Restructuring mobile applications to support informal note taking. In CHI (pp. 1320-1323). Danis, C., Kellogg, W. A., Lau, T., Dredze, M., Stylos, J., & Kushmerick, N. (2005). Managers’ e-mail: Beyond tasks and to-dos. In CHI ’05: CHI ’05 Extended Abstracts On Human Factors in Computing Systems, New York, NY (pp. 1324-1327). ACM Press. Davis, J., Kay, J., Kummerfeld, B., Poon, J., Quigley, A., Saunders, G., et al. Using workflow, user modeling and tutoring strategies for just-in-time document delivery. Journal of Interactive Learning, 4, 131-148. de Chernatony, L., Harris, F., & Riley, F. D. (2000). Added value: Its nature, roles and sustainability. European Journal of Marketing, 34(1/2), 39-56. Dean, M., Connolly, D., Harmelen, F., Hendler, J., Horrocks, I., McGuinness, D., et al. (Eds.). (2004). OWL Web ontology language reference. Retrieved March 13, 2008, from http://www.w3.org/TR/owl-ref/ DeBoni, M., & Manandhar, S. (2002). Automated discovery of telic relations in wordnet. In Proceedings of the First International Conference of General WordNet, Mysore, India. Decker, B., Ras, E., Rech, J., Klein, B., & Hoecht, C. (2005). Self-organized reuse of software engineering knowledge supported by semantic wikis. In Proceedings of the Workshop on Semantic Web Enabled Software Engineering (SWESE), ISWC Galway. Decker, S. (2006). The social semantic desktop: Next generation collaboration infrastructure [Special issue:
APE2006 academic publishing in Europe: The role of information in science and society]. Information Services and Use, 26(2), 139-144. Decker, S., Erdmann, M., Fensel, D., & Studer, R. (1998). Ontobroker: Ontology based access to distributed and semi-structured information. In DS-8: Proceedings of the IFIP TC2/WG2.6 8th Working Conference on Database Semantics - Semantic Issues in Multimedia Systems (pp. 351-369). Deventer, The Netherlands: Kluwer. del.icio.us. (2006). Retrieved March 9, 2008, from http://del.icio.us/ Dello, K., Tolksdorf, R., & Paslaru E. (2005). Makna Free University of Berlin. Retrieved March 7, 2008, from http://www.apps.ag-nbi.de/makna/wiki/About Dietzold, S., & Auer, S. (2006, June). Access control on RDFTriple stores from a semanticwiki perspective. In Scripting for the Semantic Web, CEUR Workshop Proceedings (pp. 183). ISSN 1613-0073. Dillon, A., McKnight, C., & Richardson, J. (1993). Space — the final chapter or why physical representations are not semantic intentions. In C. McKnight, A. Dillon, & J. Richardson (Eds.), Hypertext: A psychological perspective (pp. 169-191). Chichester: Ellis Horwood. Dittrich, J.-P. (2006). iMeMex: A platform for personal dataspace management. In SIGIR PIM. Dittrich, J.-P., & Vaz Salles, M. A. (2006). iDM: A unified and versatile data model for personal dataspace management. In VLDB, Seoul, Korea (pp. 367-378). Dixon, N. (2004, May/June). Does your organization have an asking problem? Knowledge Management Review. Doan, A., Domingos, P., & Halevy, A. (2001). Reconciling schemas of disparate data sources: A machine-learning approach. In SIGMOD Conference (pp. 509-520). Dong, J. S., & Mahony, B. (1998). Active objects in TCOZ. In The 2nd IEEE International Conference on Formal Engineering Methods (ICFEM’98) (pp. 16-25). IEEE Press. Dong, J. S., Lee, C. H., Li, Y. F., & Wang, H. (2004). Verifying daml+oil and beyond in z/eves. In The 26th International Conference on Software Engineering (ICSE’04) (pp. 201-210). IEEE Press. Dong, J. S., Li, Y. F., Sun, J., Sun, J., & Wang, H. (2002). Xml-based static type checking and dynamic visualization for TCOZ. In 4th International Conference on Formal Engineering Methods (pp. 311-322). Springer-Verlag.
Compilation of References
Dong, J. S., Sun, J., & Wang, H. (2002). Semantic Web for extending and linking formalisms. In L.-H. Eriksson & P. A. Lindsay (Eds.), Proceedings of Formal Methods Europe: FME’02 (pp. 587-606). Copenhagen, Denmark: Springer-Verlag.
Erickson, T., Kellogg, W. A., Laff, M., Sussman, J. B., Wolf, T. V., Halverson, C. A. et al. (2006). A persistent chat space for work groups: The design, evaluation and deployment of loops. In Conference on Designing Interactive Systems (pp. 331-340).
Dong, J. S., Sun, J., & Wang, H. (2002). Z approach to Semantic Web services. In International Conference on Formal Engineering Methods (ICFEM’02), Shanghai, China. LNCS, Springer-Verlag.
Eronen, J., & Röning, J. (2006, June). Graphingwiki – a semantic wiki extension for visualising and inferring protocol dependency. In M. Völkel, S. Schaffert, & S. Decker (Eds.), Proceedings of the 1st Workshop on Semantic Wikis, European Semantic Web Conference 2006 (CEUR Workshop Proceedings), Budva, Montenegro (Vol. 206, pp. 1-15).
Dourish, P. (2003). Where the action is: The foundations of embodied interaction. MIT Press. Dow, K. E., Hackbarth, G., & Wong, J. (2006). Enhancing customer value through IT investments: A NEBIC perspective. The DATA BASE for Advances in Information Systems, 37(2/3), 167-175. Downes, S. (2005). E-learning 2.0. Retrieved March 9, 2008, from http://elearnmag.org. eLearn Magazine Drozdzynski, W., Krieger, H.U., Piskorski, J., & Schafer, U. (2005). SProUT – a general-purpose NLP framework integrating finite-state and unification-based grammar formalisms. In 5th International Workshop on FiniteState Methods and Natural Language Processing, Helsinki, Finland. Duke, R., & Rose, G. (2000). Formal object oriented specification using object-z. Cornerstones of Computing. Macmillan. Dumais, S., Cutrell, E., Cadiz, J. J., Jancke, G., Sarin, R., & Robbins, D. C. (2003). Stuff I’ve seen: A system for personal information retrieval and re-use. In SIGIR (pp. 72-79). Dunning, T (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1), 61-74. Duric, V. (2007, January). Automatization of text categorization based on unlabeled documents. Master thesis, University of Oslo, Institute of Informatics. Efimova, L., & de Moor, A. (2005). Beyond personal Webpublishing: An exploratory study of conversational blogging practices. In HICSS, (p. 107a).
Esselborn-Krumbiegel, H. (2002). Von der Idee zum Text. Eine Anleitung zum wissenschaftlichen Schreiben., Utb. Falkman, G. (2003). Issues in structured knowledge representation a definitional approach with application to case-based reasoning and medical informatics. Ph.D. thesis, Chalmers University of Technology, Göteborg University. Farmer, J., Lindstaedt, S., Droschl, G., & Luttenberger, P. (2004, April 2-3). AD-HOC – work-integrated technology-supported teaching and learning. In 5th International Conference on Organisational Knowledge, Learning, and Capabilities, Innsbruck. Farmer, W., Guttman, J., & Thayer, X. (1992). Little theories. In D. Kapur (Ed.), Proceedings of the 11th Conference on Automated Deduction, Saratoga Springs, NY (LNCS, Vol. 607, pp. 467-581). Springer Verlag. Faure, D., & Nedellec, C. (1998). A corpus-based conceptual clustering method for verb frames and ontology acquisition. In Proceedings of the Workshop on Adapting Lexical and Corpus Resources to Sublanguages and Applications, First International Conference on Language Resources and Evaluation, Granada, Spain. Fensel, D. et al. (1994). Integrating semiformal and formal methods in knowledge-based systems development. In Proceedings of the Japanese Knowledge Acquisition Workshop (JKAW-94), Hitachi, Japan (pp. 73-89).
El-Ansary, S., Alima, L., Brand, P., & Haridi, S. (2003). Efficient broadcast in structured peer-to-peer networks. In 2nd International Workshop on Peer-to-Peer Systems (IPTPS), Berkeley, CA (pp. 304-314).
Fernández García, N., Blázquez del Toro, José M., AriasFisteus, J., Sánchez, L., Sintek, M., Bernardi, A., et al. (2006). NEWS: Bringing Semantic Web technologies into news agencies. In Proceedings of the 5th International Semantic Web Conference, ISWC 2006: Vol. 4273 (pp. 778-791). Athens, Georgia, USA.
Engels, R., & Bremdal, B. (2000). Information extraction: State-of-the-art report (On-To-Knowledge Project, deliverable 5). Asker, Norway: CognIT a.s.
Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., et al. (1999). Hypertext transfer protocol – HTTP/1.1 (RFC 2616). The Internet Society.
Compilation of References
Finin, T., Fritzson, R., McKay, D., & R. McEntire. (1994). KQML as an agent communication language. In N. Adam, B. Bhargava, & Y. Yesha (Eds.), Proceedings of the 3rd International Conference on Information and Knowledge Management (CIKM’94) (pp. 456-463). Gaithersburg, MD: ACM Press. Finkelstein-Landau, M., & Morin, E. Extracting semantic relationships between terms: Supervised vs. unsupervised methods. In Proceedings of the International Workshop on Ontological Engineering on the Global Information Infrastructure, Dagstuhl Castle, Germany, (pp. 71-80). Firth, J. (1957). Papers in linguistics 1934-1951. London: Oxford University Press. Fischer, J., Gantner, Z., Rendle, S., Stritt, M., & SchmidtThieme, L. (2006). Ideas and improvements for semantic wikis. In ESWC (pp. 650-663). Fitzgerald, M. (2003, November). Trash your desktop. MIT Technology Review, 42-46. Fjellheim, R., & Norheim, D. (2006, March). Semantic support for knowledge recall in integrated operations. In Semantic Technology Conference, San Jose, CA. Flickr. (2007). Project home page. Retrieved March 9, 2008, from http://www.flickr.com Florescu, D., Levy, A., & Mendelzon, A. (1998). Database techniques for the World Wide Web: A survey. Association for Computing Machinery, Special Interest Group on Management of Data Record, 27(3), 59-74. Florian, R., Ittycheriah, A., Jing, H., & Zhang, T. (2003). Named entity recognition through classifier combination. In W. Daelemans & M. Osborne (Ed.), Proceedings of the Seventh Conference on Natural Language Learning (pp. 168-171). Edmonton, Canada. Florian, R., Jing, H., Kambhatla, N., & Zitouni, I. (2006). Factorizing complex models: A case study in mention detection. In Proceedings of the 21st International Conference on Computational Linguistics and Forty-Fourth Annual Meeting of the Association of Computational Linguistics, Sydney, Australia, (pp. 473-480). Foerster, H. v. (1993). Wissen und Gewissen. Versuch einer Brücke. Frankfurt am Main, Germany: Suhrkamp. Foo, S., Hui, S. C., Leong, P. C., & Liu, S. (2000). An integrated help support for customer services over the World Wide Web: A case study. Computers in Industry, 41(2), 129-145. Franz, T., & Staab, S. (2005). SAM: Semantics aware instant messaging for the networked semantic desktop.
0
In Proceedings of the 1st Workshop On The Semantic Desktop. 4th International Semantic Web Conference (pp. 167-181). Galway, Ireland. Franz, T., Staab, S., & Arndt, R. (2007). The X-COSIM Integration Framework for a seamless semantic desktop. In Proceedings of the Fourth International ACM Conference on Knowledge Capture (K-CAP 2007) (pp. 143-150). Freitag, D. (2000). Machine learning for information extraction in informal domains. Machine Learning, 39(2-3), 169-202. Freitag, D., & McCallum, A. (2000). Information extraction with HMM structures learned by stochastic optimization. In Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on on Innovative Applications of Artificial Intelligence (pp. 584-589). Austin, TX: AAAI Press/The MIT Press. Friedman, B. (1996). Value-sensitive design. Interactions, ACM, 3(6) 16-23. Friedman, M., & Weld, D. (1997). Efficiently executing information-gathering plans. IJCAI, 1, 785-791. Gal, A., Modica, G., & Jamil, H. (2004). OntoBuilder: Fully automatic extraction and consolidation of ontologies from Web sources. ICDE, 853. Gale, W.A., Church, K.W., & Yarowsky, D. (1993). A method for disambiguating word senses in a lare corpus. Computers and the Humanities, 26, 415-439. Ganter, B., & Wille, R. (1999). Formal concept analysis - mathematical foundations. Springer Verlag. Garrett, J. J. (2005). AJAX: A new approach to Web applications. Retrieved March 13, 2008, from http://www. adaptivepath.com/publications/essays/archives/000385. php Gazendam, L., Malaisé, V., & Brugman, H. (2006). Deriving semantic annotations of an audiovisual program from contextual texts. In Proceedings of the Semantic Web Annotation of Multimedia Workshop (SWAMM’06). Retrieved March 7, 2008, from http://www.cs.vu.nl/guus/ papers/Gazendam06a.pdf Géczy, P., Izumi, N., Akaho, S., & Hasida, K. (2006). Extraction and analysis of knowledge worker activities on Intranet. In Proceedings of the 6th International Conference on Practical Aspects of Knowledge Management (pp. 73-85). Vienna, Austria: Springer.
Compilation of References
Gemmell, J., Bell, G., Lueder, R., Drucker, S. M., & Wong, C. (2002). Mylifebits: Fulfilling the memex vision. In ACM Multimedia (pp. 235-238). Ghallab, M., et al. (1998). PDDL-the planning domain definition language V. 2 (Tech. Rep. TR-98-003/DCS TR-1165). Yale Center for Computational Vision and Control. Goker, M., & Roth-Berghofer, T. (1999). The development and utilization of the case-based help-desk support system Homer. Engineering Applications of Artificial Intelligence, 12(6), 665-680. Goland, Y., Whitehead, E., Faizi, A., Carter, S.R., & Jensen, D. (1999). HTTP extensions for distributed authoring – WebDAV (RFC 2518). The Internet Society. Golbeck, J., & Hendler, J. (2005). Inferring trust relationships in Web-based social networks. ACM Transactions on Internet Technology. Grishman, R. (1998). TIPSTER architecture. Retrieved March 11, 2008, from http://cs.nyu.edu/cs/faculty/grishman/tipster.html Grishman, R. (2005). JET: Java extraction toolkit. Retrieved March 11, 2008, from http://cs.nyu.edu/grishman/ Grönross, C. (1997). Value-driven relational marketing: From products to resources and competencies. Journal of Marketing Management, 13, 407-419. Grozea, C. (2004). Finding optimal parameter settings for high performance word sense disambiguation. In Proceedings of the third workshop on the evaluation of systems for the semantic analysis of text, Barcelona, Spain. Gruber, T. (1993). A translation approach to portable ontology specifications. Knowledge Acquisiton, 5(2), 199-220. Gruber. T. (2005). Folksonomy of ontology: A mash-up of apples and oranges. In 1st Online Conference on Metadata and Semantics Research (MTSR’05). Retrieved March 7, 2008, from http://mtsr.sigsemis.org/ Grudin, J. (1990). Interface. In Proceedings of the 1990 ACM Conference on Computer Supported Cooperative Work (pp. 269-278). Los Angeles, CA: ACM Press. Guthrie, J.A., Guthrie, L., Wilks, Y., & Aidinejad, H. (1991). Subject-dependent co-occurrence and word sense disambiguation. In Proceedings of the 29th Annual Meeting on Association for Computational Linguistics, Berkeley, California (pp. 146-152).
Gütl, C., Pivec, M., Trummer, C., Garcia-Barrios, V.M., Mödritscher, F., Pripfl, J., et al. (2005). AdeLE (adaptive e-learning with eye-tracking): Theoretical background, system architecture and application scenarios. European Journal of Open, Distance and E-Learning.
Haag, S., Cummings, M., & McCubbrey, D. J. (2002). Management information systems for the information age (3rd ed.). Irwin McGraw-Hill.
Haase, P., Broekstra, J., Ehrig, M., Menken, M., Mika, P., Olko, M. et al. (2004). Bibster – a semantics-based bibliographic peer-to-peer system. In The Semantic Web – ISWC 2004 (pp. 122-136). Springer. LNCS 3298.
Hädrich, T., & Priebe, T. (2005). Supporting knowledge work with knowledge stance-oriented integrative portals. In European Conference on Information Systems.
Halevy, A., Etzioni, O., Doan, A., Ives, Z., Madhavan, J., McDowell, L., & Tatarinov, I. (2003). Crossing the structure chasm. CIDR.
Haller, H. (2002). Mappingverfahren zur Wissensorganisation. Diploma thesis at Freie Universität Berlin.
Haller, H. (2003). Mappingverfahren zur Wissensorganisation. Thesis. Published at KnowledgeBoard Europe. Retrieved March 9, 2008, from www.heikohaller.de/literatur/diplomarbeit/mapping_wissorg_haller.pdf
Haller, H., Kugel, F., & Völkel, M. (2006, June). I-mapping wikis – towards a graphical environment for semantic knowledge management. In M. Völkel, S. Schaffert, & S. Decker (Eds.), Proceedings of the 1st Workshop on Semantic Wikis, European Semantic Web Conference 2006 (CEUR Workshop Proceedings), Budva, Montenegro (Vol. 206).
Hammersley, B. (2003). Content syndication with RSS. O’Reilly & Associates.
Hammond, T., Hannay, T., Lund, B., & Scott, J. (2005, April). Social bookmarking tools, a general review. D-Lib Magazine, 11(4).
Hamp, B., & Feldweg, H. (1997). GermaNet – a lexical-semantic net for German. In Proceedings of the ACL Workshop on Automatic Information Extraction and Building of Lexical Resources for NLP Applications, Madrid, Spain.
Handel, M., & Herbsleb, J. D. (2002). What is chat doing in the workplace? In CSCW (pp. 1-10).
Harabagiu, S., & Moldovan, D. (1998). Knowledge processing in an extended wordnet. In WordNet: An Electronic Lexical Database (pp. 379-405). MIT Press.
Harabagiu, S., & Moldovan, D. (2003). Question answering. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 560-582). New York: Oxford University Press Inc.
Harabagiu, S., Miller, G., & Moldovan, D. (1999). WordNet 2 - a morphologically and semantically enhanced resource. In Proceedings of the SIGLEX Workshop on Multilingual Lexicons, Thirty-Seventh Annual Meeting of the Association for Computational Linguistics, University of Maryland, College Park, Maryland.
Haslhofer, B. (2006). A service oriented approach for integrating metadata from heterogeneous digital libraries. In Proceedings of the 1st International Workshop on Semantic Information Integration on Knowledge Discovery. Yogyakarta, Indonesia: Austrian Computer Society Book Series.
Hazaël-Massieux, D., & Connolly, D. (2005). Gleaning resource descriptions from dialects of languages (GRDDL). World Wide Web Consortium (W3C).
Hearst, M. A. (2006). Clustering versus faceted categories for information exploration. Communications of the ACM, 49(4).
Hearst, M.A. (1992). Automatic acquisition of hyponyms from large text corpora. In Proceedings of the Fifteenth International Conference on Computational Linguistics, Nantes, France.
Hearst, M.A. (1998). Automated discovery of WordNet relations. In C. Fellbaum (Ed.), WordNet: An electronic lexical database (pp. 132-152). MIT Press.
Hearst, M.A. (2003). Text data mining. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 616-628). New York: Oxford University Press Inc.
Heid, H. (2004). Kann man zur Verantwortlichkeit erziehen? Über Bedingungen der Möglichkeit verantwortlichen Handelns. In J. H. M. Winkler (Ed.), Die aufgegebene Aufklärung: Experimente pädagogischer Vernunft, Beiträge zur pädagogischen Grundlagenforschung (pp. 145-154). Weinheim und München: Juventa Verlag.
Heijst, G. V., Schreiber, A. T., & Wielinga, B. J. (1997). Using explicit ontologies in KBS development. Int. J. Hum.-Comput. Stud., 46, 183-292.
Hendler, J. (2006). The dark side of the Semantic Web. Retrieved March 13, 2008, from http://www.mindswap.org/blog/2006/12/13/the-dark-side-of-the-semantic-web/
Henry, G. (2004). ConneXions: An alternative approach to publishing. In ECDL 2004 European Conference on Digital Libraries (pp. 421-431). University of Bath, United Kingdom.
Hepp, M., Bachlechner, D., & Siorpaes, K. (2005). OntoWiki: Community-driven ontology engineering and ontology usage based on wikis. In Proceedings of the 2005 International Symposium on Wikis (WikiSym 2005).
Herlocker, J. H., Konstan, J. A., & Riedl, J. (2000). Explaining collaborative filtering recommendations. In Proceedings of ACM 2000 Conference on Computer Supported Cooperative Work (pp. 241-250).
Herring, S. C. (1999). Interactional coherence in CMC. In HICSS ’99: Proceedings of the Thirty-Second Annual Hawaii International Conference on System Sciences, Washington, DC (Vol. 2, p. 2022). IEEE Computer Society.
Herring, S. C., Scheidt, L. A., Wright, E., & Bonus, S. (2005, February 1). Weblogs as a bridging genre. Information Technology & People, 18(2), 142-171.
Hilf, E., Kohlhase, M., & Stamerjohanns, H. (2006). Capturing the content of physics: Systems, observables, and experiments. In J. Borwein & W. M. Farmer (Eds.), Mathematical knowledge management, MKM’06 (pp. 165-178). Springer Verlag.
Hilzensauer, W., Hornung-Prähauser, V., & Schaffert, S. (2006, September). Requirements for a personal development planning in e-portfolios supported by Semantic Web technology. In 6th International Conference on Knowledge Management (I-KNOW06), Graz, Austria.
Hindle, D. (1990). Noun classification from predicate-argument structures. In Proceedings of the 28th Conference of the Association for Computational Linguistics (pp. 268-275).
Holi, M., Hyvönen, E., & Lindgren, P. (2006). Integrating tf-idf weighting with fuzzy view-based search. In Proceedings of the ECAI Workshop on Text-Based Information Retrieval (TIR-06). Retrieved March 7, 2008, from http://www.uni-weimar.de/medien/webis/research/tir/tir-06/
Hutter, D. (2004, July 5). Towards a generic management of change. In C. Benzmüller & W. Windsteiger (Eds.), Computer-supported mathematical theory development (pp. 7-18). Proceedings of the first Workshop on Computer-Supported Mathematical Theory Development held in the frame of IJCAR’04 in Cork, Ireland. ISBN 3-902276-04-5. Retrieved March 4, 2008, from http://www.risc.uni-linz.ac.at/about/conferences/IJCAR-WS7/
Hyvönen, E., Ruotsalo, T., Häggström, T., Salminen, M., Junnila, M., Virkkilä, M. et al. (2006). CultureSampo—Finnish culture on the Semantic Web: The vision and first results. In Developments in Artificial Intelligence and the Semantic Web - Proceedings of the 12th Finnish AI Conference STeP 2006. Retrieved March 7, 2008, from http://www.stes.fi/scai2006/proceedings/proceedings.pdf
Hyvönen, E., & Mäkelä, E. (2006). Semantic autocompletion. In Proceedings of the 1st Asian Semantic Web Conference (ASWC2006). Springer-Verlag.
Hyvönen, E., Mäkelä, E., Salminen, M., Valo, A., Viljanen, K., Saarela, S. et al. (2005). MuseumFinland—Finnish museums on the Semantic Web. Journal of Web Semantics, 3(2-3), 224-241.
Hyvönen, E., Valo, A., Komulainen, V., Seppälä, K., Kauppinen, T., Ruotsalo, T. et al. (2005). Finnish national ontologies for the Semantic Web—towards a content and service infrastructure. In Proceedings of International Conference on Dublin Core and Metadata Applications (DC 2005).
Ide, N., & Romary, L. (2003). Outline of the international standard linguistic annotation framework. In Proceedings of the 41st Meeting of the Association for Computational Linguistics - Workshop on Linguistic Annotation: Getting the Model Right, Sapporo, Japan (pp. 1-5).
Ide, N., & Veronis, J. (1998). Introduction to the special issue on word sense disambiguation: The state of the art. Computational Linguistics, 24, 1-40.
Isaacs, E., Walendowski, A., Whittaker, S., Schiano, D. J., & Kamm, C. A. (2002). The character, functions, and styles of instant messaging in the workplace. In CSCW (pp. 11-20).
ISO. (2003). The Dublin core metadata element set. International Standard ISO 15836-2003.
Isozaki, H., & Kazawa, H. (2002). Efficient support vector classifiers for named entity recognition. In Proceedings of the Nineteenth International Conference on Computational Linguistics, Morristown, New Jersey (pp. 1-7).
JENA. (2007). HP Labs Semantic Web research. Retrieved March 5, 2008, from http://jena.sourceforge.net/
Jian, N., Hu, W., Cheng, G., & Qu, Y. (2005). Falcon-AO: Aligning ontologies with Falcon. In Integrating Ontologies.
Jones, S. R., & Thomas, P. J. (1997). Empirical assessment of individuals’ personal information management systems. Behaviour & Information Technology, 16(3), 158-160.
Kai, H., Raman, P., Carlisle, W., & Cross, J. (1996). A self-improving help desk service system using case-based reasoning techniques. Computers in Industry, 30(2), 113-125.
Kando, N. (1997). Text-level structure of research papers: Implications for text-based information processing systems. In BCS-IRSG Annual Colloquium on IR Research, Workshops in Computing, BCS.
Kane, B., & Luz, S. (2006). Multidisciplinary medical team meetings: An analysis of collaborative working with special attention to timing and teleconferencing. Computer Supported Cooperative Work (CSCW), 15(5), 501-535.
Kaplan, R.M. (2003). Syntax. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 70-90). New York: Oxford University Press Inc.
Karger, D., Bakshi, K., Huynh, D., Quan, D., & Sinha, V. (2005). Haystack: A general-purpose information management tool for end users based on semistructured data. In Proceedings of the 2nd Biennial Conference on Innovative Data Systems Research (pp. 13-26). Asilomar, CA: Online Proceedings.
Karger, D.R., & Quan, D. (2004). What would it mean to blog on the Semantic Web? In Proceedings of the 3rd International Semantic Web Conference (ISWC04), Hiroshima, Japan. Springer-Verlag.
Kaulgud, V.S., & Dolas, R. (2006). DKOMP: A peer-to-peer platform for distributed knowledge management. In Proceedings of the 6th International Conference on Practical Aspects of Knowledge Management (pp. 119-130). Vienna, Austria: Springer.
Kay, M. (2003). Introduction. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 616-628). New York: Oxford University Press Inc.
Kendall, G.C. (Ed.). (2006). SPARQL protocol for RDF (W3C Candidate Recommendation, 6 April 2006). W3C.
Keok Lee, Y., Tou Ng, H., & Kiah Chia, T. (2004). Supervised word sense disambiguation with support vector machines and multiple knowledge sources. In Proceedings of the Third Workshop on the Evaluation of Systems for the Semantic Analysis of Text, Barcelona, Spain.
Khare, R., & Çelik, T. (2006). Microformats: A pragmatic path to the Semantic Web. In WWW ‘06 (pp. 865-866). ACM Press.
Khoussainov, R., & Kushmerick, N. (2005). E-mail task management: An iterative relational learning approach. In Second Conference on E-mail and Anti-Spam, CEAS.
Kidd, A. (1994). The marks are on the knowledge worker. In CHI (pp. 186-191).
Kietz, J., Maedche, A., & Volz, R. (2000). A method for semi-automatic ontology acquisition from a corporate intranet. In Proceedings of the Workshop Ontologies and Text, Twelfth International Conference on Knowledge Engineering and Knowledge Management, Juan-les-Pins, France.
Kim, M.-S., & Raja, N. S. (1991). Verbal aggression and self-disclosure on computer bulletin boards. Annual Meeting of the International Communication Association.
King, R., Popitsch, N., & Westermann, U. (2007). METIS: A flexible foundation for the unified management of multimedia assets. Multimedia Tools and Applications, 33(3), 325-349.
Kiyavitskaya, N., Zeni, N., Cordy, J.R., Mich, L., & Mylopoulos, J. (2005). Semi-automatic semantic annotations for Web documents. In Semantic Web Applications and Perspectives (SWAP 2005), Proceedings of the 2nd Italian Semantic Web Workshop, University of Trento, Trento, Italy.
Klein, D., Smarr, J., Nguyen, H., & Manning, C. (2003). Named entity recognition with character-level models. In W. Daelemans & M. Osborne (Eds.), Proceedings of the Seventh Conference on Natural Language Learning, Edmonton, Canada (pp. 180-183).
Klein, M. C. A., Fensel, D., Kiryakov, A., & Ognyanov, D. (2002). Ontology versioning and change detection on the Web. In A. Gómez-Pérez & V. R. Benjamins (Eds.), EKAW (pp. 197-212). Springer.
Kogan, S. L., & Muller, M. J. (2006). Ethnographic study of collaborative knowledge work. IBM Systems Journal, 45(4), 759.
Kohlhase, A. (2004). CPoint’s mathematical user interface. In Workshop Math User Interfaces (MathUI) at MKM04. Retrieved March 9, 2008, from http://www.activemath.org/paul/MathUI/proceedings/CPoint/CPoint.MathUI04.pdf
Kohlhase, A. (2005). CPoint documentation. Retrieved March 9, 2008, from http://kwarc.eecs.iu-bremen.de/projects/CPoint/
Kohlhase, A. (2005). Overcoming proprietary hurdles: CPoint as invasive editor. In F. de Vries, G. Attwell, R. Elferink, & A. Toedt (Eds.), Open source for education in Europe: Research and practise (pp. 51-56). Heerlen: Open Universiteit of the Netherlands.
Kohlhase, A., & Kohlhase, M. (2004). CPoint: Dissolving the author’s dilemma. In A. Asperti, G. Bancerek, & A. Trybulec (Eds.), Mathematical knowledge management, MKM’04 (LNAI No. 3119, pp. 175-189). Springer Verlag.
Kohlhase, A., & Kohlhase, M. (2007). Reexamining the MKM value proposition: From math Web search to math Web research. In M. Kauers, M. Kerber, R. Miner, & W. Windsteiger (Eds.), MKM/Calculemus 2007 (LNAI No. 4573, pp. 266-279). Springer Verlag.
Kohlhase, M. (2006). OMDoc – an open markup format for mathematical documents (version 1.2). Springer Verlag. LNAI No. 4180.
Kohlhase, M. (2007). A LaTeX style for the added-value analysis. Retrieved March 9, 2008, from http://kwarc.info/projects/latex
Kohlhase, M., & Şucan, I. (2006). A search engine for mathematical formulae. In T. Ida, J. Calmet & D. Wang (Eds.), Proceedings of artificial intelligence and symbolic computation, AISC’2006 (pp. 241-253). Springer Verlag.
Kohlhase, M., Sutner, K., Jansen, P., Kohlhase, A., Lee, P., Scott, D. et al. (2002). Acquisition of math content in an academic setting. In Second International Conference on MathML and Technologies for Math on the Web, Chicago, IL. Retrieved April 23, 2008, from http://www.mathmlconference.org/2002/presentations/kohlhase
Kohonen, T. (1990). The self-organizing map. Proceedings of the IEEE, 78(9), 1464-1480.
Kohonen, T., Kaski, S., Lagus, K., Salojarvi, J., Honkela, J., Paatero, V. et al. (2000). Self organization of a massive document collection. IEEE Transactions on Neural Networks, 11(3), 574-585.
Kozareva, Z., Ferrandez, O., & Montoyo, A. (2005). Combining data-driven systems for improving named entity recognition. In Natural Language Processing and Information Systems (LNCS 3513, pp. 80-90). Springer.
Krieg-Brückner, B., Lindow, A., Lüth, C., Mahnke, A., & Russell, G. (2004, September). Semantic interrelation of documents via an ontology. In G. Engels & S. Seehusen (Eds.), DeLFI 2004, Tagungsband der 2. e-Learning Fachtagung Informatik, 6.-8. Paderborn, Germany (Lecture Notes in Informatics, Vol. P-52, pp. 271-282). Heidelberg, Germany: Springer-Verlag. Retrieved March 4, 2008, from http://www.springer.de
Kriegsman, M., & Barletta, R. (1993). Building a case-based help desk application. IEEE Expert: Intelligent Systems and Their Applications, 8(6), 18-26.
Krishnamurthy, S. (2002). The multidimensionality of blog conversations: The virtual enactment of September 11. In Internet Research 3.0, Maastricht, The Netherlands.
Krötzsch, M. (2008). Semantic MediaWiki. Retrieved April 22, 2008, from http://www.semantic-mediawiki.org
Krötzsch, M., Vrandečić, D., & Völkel, M. (2005). Wikipedia and the Semantic Web – the missing links. In Proceedings of WikiMania 2005.
Krötzsch, M., Vrandečić, D., & Völkel, M. (2006). Semantic MediaWiki. In Proceedings of the 5th International Semantic Web Conference (ISWC06), Athens, GA (Vol. 4273, pp. 935-942).
Krötzsch, M., Vrandečić, D., Page, S. et al. (2007). Semantic MediaWiki development activities – ontoworld.org. Retrieved March 4, 2008, from http://www.semantic-mediawiki.org/wiki/Semantic_MediaWiki_development_activities
Krowne, A. (2003). An architecture for collaborative math and science digital libraries. Unpublished master’s thesis, Virginia Tech. Retrieved March 4, 2008, from http://scholar.lib.vt.edu/theses/available/etd-09022003-150851/
Kushmerick, N., & Lau, T. (2005). Automated e-mail activity management: An unsupervised learning approach. In IUI ’05: Proceedings of the 10th International Conference On Intelligent User Interfaces, New York, NY (pp. 67-74). ACM Press.
Kushmerick, N., Weld, D., & Doorenbos, R. (1997). Wrapper induction for information extraction. IJCAI, 1, 729-737.
Kwaśnik, B. H. (1983). How a personal document’s intended use or purpose affects its classification in an office. In SIGIR (pp. 207-210).
Lamb, R., & Davidson, E. (2005). Understanding Intranets in the context of end-user computing. ACM SIGMIS Database, 36(1), 64-85.
Lange, C. (2007, March). SWiM – a semantic wiki for mathematical knowledge management (Tech. Rep. No. 5). Jacobs University Bremen. Retrieved March 4, 2008, from http://kwarc.info/projects/swim/pubs/tr-swim.pdf
Langeland, T. (2007, May). Integrated operations on the Norwegian continental shelf. In Future Fields Summit 2007, London, England.
Lansdale, M. (1988). The psychology of personal information management. Applied Ergonomics, 19(1), 55-66.
Lassila, O., & Swick, R. R. (1999). Resource description framework (RDF) model and syntax specification (W3C Recommendation). World Wide Web Consortium (W3C). Retrieved March 4, 2008, from http://www.w3.org/TR/1999/REC-rdf-syntax
Lassila, O., & Swick, R. R. (Eds.). (2004). RDF/XML syntax specification (Revised). Retrieved March 13, 2008, from http://www.w3.org/TR/rdf-syntax-grammar/
Le Coche, E., Mastroianni, C., Pirrò, G., Ruffolo, M., & Talia, D. (2006). A peer-to-peer virtual office for organizational knowledge management. In Proceedings of the 6th International Conference on Practical Aspects of Knowledge Management (pp. 166-177). Vienna, Austria: Springer.
Leech, G., & Weisser, M. (2003). Pragmatics and dialogue. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 136-156). New York: Oxford University Press Inc.
Lehnert, W., Cardie, C., Fisher, D., McCarthy, J., Riloff, E., & Soderland, S. (1994). Evaluating an information extraction system. Journal of Integrated Computer-Aided Engineering, 1(6).
Lehti, P., & Fankhauser, P. (2004). XML data integration with OWL: Experiences and challenges. In 2004 Symposium on Applications and the Internet (SAINT 2004) (pp. 160-170).
Lemire, D., Boley, H., McGrath, S., & Ball, M. (2005). Collaborative filtering and inference rules for context-aware learning object recommendation. International Journal of Interactive Technology and Smart Education, 2.
Lemmety, S. (1999). Review of speech synthesis technology. Unpublished master’s thesis, Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, Finland.
Lenat, D. (1995). Steps to sharing knowledge. In N. Mars (Ed.), Towards very large knowledge bases. IOS Press.
Lenat, D., & Guha, R.V. (1990). Building large knowledge-based systems. Reading, MA: Addison-Wesley.
Lesk, M. (1986). Automatic sense disambiguation using machine readable dictionaries. In Proceedings of the Fifth International Conference on Systems Documentation (pp. 24-26).
Leuf, B., & Cunningham, W. (2001). The wiki way: Collaboration and sharing on the Internet. Addison-Wesley.
Levesque, H. J., Reiter, R., Lesperance, Y., Lin, F., & Scherl, R. B. (1997). GOLOG: A logic programming language for dynamic domains. Journal of Logic Programming, 31(1-3), 59-83.
Levy, A., Rajaraman, A., & Ordille, J. (1996). Querying heterogeneous information sources using source descriptions. VLDB, 251-262.
Li, Y., Bontcheva, K., & Cunningham, H. (2005). Using uneven margins SVM and perceptron for information extraction. In Proceedings of the Ninth Conference on Computational Natural Language Learning, Ann Arbor, Michigan (pp. 72-79).
Lichtenstein, S. (2004). Knowledge development and creation in e-mail. In HICSS.
Lieberman, H., Nardi, B. A., & Wright, D. J. (2001). Training agents to recognize text by example. Autonomous Agents and Multi-Agent Systems, 4(1-2), 79-92.
Lindgaard, G. (2004). Making the business our business: One path to value-added HCI. Interactions, ACM, 11(2), 12-17.
locutor: An ontology-based management of change. (2007). System homepage. Retrieved March 4, 2008, from http://www.kwarc.info/projects/locutor/
Löfberg, L., Archer, D., Piao, S., Rayson, P., McEnery, T., Varantola, K. et al. (2003). Porting an English semantic tagger to the Finnish language. In Proceedings of the Corpus Linguistics 2003 Conference, UCREL, Lancaster University (pp. 457-464).
Lopez, V., Motta, E., & Uren, V. (2006). PowerAqua: Fishing the Semantic Web. In Proceedings of the Third European Semantic Web Conference (ESWC2006). Springer-Verlag.
Löwgren, J., & Stolterman, E. (2004). Thoughtful interaction design: A design perspective on Information Technology. The MIT Press.
Ludwig, L., O’Sullivan, D., & Zhou, X. (2004). Artificial memory prototype for personal semantic subdocument knowledge management (PS-KM). ISWC demo.
Mabry, E. A. (1998). Frames and flames: The structure of argumentative messages on the net (pp. 13-26).
Mackay, W. E. (1988). More than just a communication system: Diversity in the use of electronic mail. In CSCW (pp. 344-353).
Macleod, C., Grishman, R., & Meyers, A. (1994). Comlex syntax reference manual. Retrieved March 11, 2008, from http://nlp.cs.nyu.edu/comlex/
Madhavan, J., Bernstein, P., & Rahm, E. (2001). Generic schema matching with Cupid. VLDB, 49-58.
Maedche, A., & Staab, S. (2000). Discovering conceptual relations from text (Tech. Rep. 399, Institute AIFB). Karlsruhe, Germany: Karlsruhe University.
Maedche, A., & Staab, S. (2001). Ontology learning for the Semantic Web. IEEE Intelligent Systems, 16(2).
Maedche, A., Motik, B., Silva, N., & Volz, R. (2002). MAFRA - an ontology mapping framework in the Semantic Web. In Proceedings of the ECAI Workshop on Knowledge Transformation, Lyon, France.
Magnini, B., & Cavaglia, G. (2000). Integrating subject field codes into WordNet. In Proceedings of the Second International Conference on Language Resources and Evaluation, Athens, Greece (pp. 1413-1418).
Mahony, B., & Dong, J. S. (1999). Sensors and actuators in TCOZ. In J. Wing, J. Woodcock, & J. Davies (Eds.), FM’99: World Congress on Formal Methods (LNCS, pp. 1166-1185). Toulouse, France: Springer-Verlag.
Mahony, B., & Dong, J. S. (2000). Timed communicating object Z. IEEE Transactions on Software Engineering, 26(2), 150-177.
Mahony, B., & Dong, J. S. (2002). Deep semantic links of TCSP and Object-Z: TCOZ approach. Formal Aspects of Computing, 13, 142-160.
Majchrzak, A., Wagner, C., & Yates, D. (2006). Corporate wiki users: Results of a survey. In ACM Conference WikiSym 2006, Odense, Denmark.
Mana-Lopez, M. J. (2004). Multidocument summarization: An added value to clustering in interactive retrieval. Transactions on Information Systems, 22(2), 215-241.
Mann, G., & Yarowsky, D. (2003). Unsupervised personal name disambiguation. In Proceedings of CoNLL-2003.
Mann, W.C., & Thompson, S.A. (1988). Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3), 243-281.
Manola, F., & Miller, E. (2004, February 10). RDF primer. W3C Recommendation. Retrieved March 5, 2008, from http://www.w3.org/TR/rdf-primer
Marcus, M.P., Santorini, B., & Marcinkiewicz, M.A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313-330.
Martin, D., Cheyer, A., & Moran, D. (1999). The open agent architecture: A framework for building distributed software systems. Applied Artificial Intelligence, 13(1/2), 91-128.
Martínez, J.M., Koenen, R., & Pereira, F. (2002). MPEG-7: The generic multimedia content description standard, part 1. IEEE MultiMedia, 9(2), 78-87.
Masolo, C., Vieu, L., Bottazzi, E., Catenacci, C., Ferrario, R., Gangemi, A., et al. (2004). Social roles and their descriptions. In Proceedings of the Ninth International Conference on the Principles of Knowledge Representation and Reasoning, Whistler, Canada.
Masui, T., & Takabayashi, S. (2003). Instant group communication with QuickML. In GROUP (pp. 268-273).
MathWebSearch, a semantic search engine. (2007). Web page. Retrieved March 4, 2008, from http://search.mathweb.org
Matsuo, Y., Hamasaki, M., Mori, J., Takeda, H., & Hasida, K. (2004). Ontological consideration on human relationship vocabulary for FOAF. In Proceedings of the 1st Workshop on Friend of a Friend, Social Networking and Semantic Web.
Maturana, H. R., & Pörksen, B. (2002). Vom Sein zum Tun. Die Ursprünge der Biologie des Erkennens. Heidelberg, Germany: Carl-Auer-Systeme Verlag.
Mayfield, J., McNamee, P., & Piatko, C. (2003). Named entity recognition using hundreds of thousands of features. In W. Daelemans & M. Osborne (Eds.), Proceedings of the Seventh Conference on Natural Language Learning, Edmonton, Canada (pp. 184-187).
Maynard, D., Cunningham, H., Bontcheva, K., & Dimitrov, M. (2002). Adapting a robust multi-genre NE system for automatic content extraction. In Artificial Intelligence: Methodology, Systems, and Applications (LNAI Vol. 2443, pp. 264-273). Springer-Verlag.
McClelland, J.L., & Rumelhart, D.E. (1981). An interactive activation of context effects in letter perception. Psychological Review, 88, 375-407.
McDowell, L., Etzioni, O., & Halevy, A. Y. (2004a). Semantic e-mail: Theory and applications. J. Web Sem., 2(2), 153-183.
McEnery, A., & Wilson, A. (Eds.). (2001). Corpus linguistics: An introduction. Edinburgh: Edinburgh University Press.
McGuinness, D. L., & Harmelen, F. van (2004, February). OWL Web ontology language overview (W3C Recommendation). W3C. Retrieved March 4, 2008, from http://www.w3.org/TR/2004/REC-owl-features-20040210/
McGuinness, D., Fikes, R., Rice, J., & Wilder, S. (2000). The Chimaera ontology environment. AAAI/IAAI, 1123-1124.
McKelvie, D., Brew, C., & Thompson, H.S. (1997). Using SGML as a basis for data intensive natural language processing. Computers and the Humanities, 31(5), 367-388.
Melis, E., Goguadze, G., Palomo, A. G., Frischauf, A., Homik, M., Libbrecht, P. et al. (2006). OMDoc in ActiveMath. In OMDoc – an open markup format for mathematical documents (version 1.2, Chap. 26.8). Springer Verlag.
Mena, E., Illarramendi, A., Kashyap, V., & Sheth, P. (2000). OBSERVER: An approach for query processing in global information systems based on interoperation across pre-existing ontologies. Journal of Distributed and Parallel Databases, 8(2), 223-271.
Merz, P., & Gorunova, K. (2005). Reliable multicast and its probabilistic model for job submission in peer-to-peer grids. In 6th International Conference on Web Information Systems Engineering (WISE), New York (pp. 504-511).
Meseguer, J. (1992). Conditional rewriting logic as a unified model of concurrency. Theoretical Computer Science, 96(1), 73-155.
Meyer, P. (2004). The vanishing newspaper: Saving journalism in the information age. University of Missouri Press.
Mihalcea, R., Chklovski, T., & Kilgarriff, A. (2004). The Senseval-3 English lexical sample task. In Proceedings of Senseval-3: The Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text.
Mika, P. (2005, November). Ontologies are us: A unified model of social networks and semantics. In Proceedings of the 4th International Semantic Web Conference (ISWC 2005), Galway, Ireland (LNCS 3729). Springer-Verlag.
Mikheev, A. (2002). Periods, capitalized words, etc. Computational Linguistics, 28(3), 245-288.
Mikheev, A., Grover, C., & Moens, M. (1998). Description of the LTG system used for MUC-7. In Proceedings of the Seventh Message Understanding Conference.
Mikheev, A., Moens, M., & Grover, C. (1999). Named entity recognition without gazetteers. In Proceedings of the Ninth Conference of the European Chapter of the Association for Computational Linguistics, Bergen, Norway (pp. 1-8).
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81-97.
Miller, G.A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39-41.
Milner, R. (1999). Communicating with mobile agents: The pi-calculus. Cambridge University Press.
Mitkov, R. (Ed.). (2003). The Oxford handbook of computational linguistics. Oxford, UK: Oxford University Press.
Mohanty, S., Ray, N.B., Ray, R.C.B., & Santi, P.K. (2002). Oriya WordNet. In Proceedings of the First International Conference on General WordNet, Mysore, India.
Möller, K., Bojars, U., & Breslin, J.G. (2006, June). Using semantics to enhance the blogging experience. In Proceedings of the 3rd European Semantic Web Conference (ESWC06), Budva, Montenegro (LNCS 4011). Springer-Verlag.
Motik, B., Horrocks, I., & Sattler, U. (2007). Bridging the gap between OWL and relational databases. In Proceedings of the 16th International Conference on World Wide Web (WWW).
MUC7. (1998). Proceedings of the 7th Message Understanding Conference. Morgan Kaufmann.
Muljadi, H., & Takeda, H. (2005). Semantic wiki as an integrated content and metadata management system. In Proceedings of ISWC 2005, Galway, Ireland.
Müller, N. (2006). An ontology-driven management of change. In Wissens- und Erfahrungsmanagement, LWA (Lernen, Wissensentdeckung, Aktivität) Conference Proceedings (pp. 186-193).
Müller, N. (2006). OMDoc as a data format for VeriFun. In OMDoc – an open markup format for mathematical documents (version 1.2, pp. 329-332). Springer Verlag.
Murray, J. H. (2003). Inventing the medium. In N. Wardrip-Fruin & N. Montford (Eds.), The new media reader (pp. 3-11). Cambridge, MA: The MIT Press.
Narayanan, S. (1999). Reasoning about actions in narrative understanding. In International Joint Conference on Artificial Intelligence (IJCAI’1999) (pp. 350-357). Morgan Kaufmann Press.
Nardi, B. A., Whittaker, S., & Bradner, E. (2000). Interaction and outeraction: Instant messaging in action. In Conference on Computer Supported Cooperative Work (CSCW) (pp. 79-88).
Navigli, R., & Velardi, P. (2003). Ontology learning and its application to automated terminology translation. IEEE Intelligent Systems, 18(3), 323-340.
Navigli, R., & Velardi, P. (2004). Learning domain ontologies from document warehouses and dedicated websites. Computational Linguistics, 30(2), 151-179. MIT Press.
Necula, G. (2003). What is TexPoint? Retrieved March 9, 2008, from http://raw.cs.berkeley.edu/texpoint/TexPoint.html
Nejdl, W., Wolf, B., Changtao, Q., Decker, S., Sintek, M., et al. (2002). EDUTELLA: A P2P networking infrastructure based on RDF. In Proceedings of the 11th International World Wide Web Conference (pp. 604-615). Honolulu, HI: ACM Press.
Nelson, T. (2001). Talk at ACM conference on hypertext (Hypertext 2001). Retrieved March 9, 2008, from http://asi-www.informatik.uni-hamburg.de/personen/obendorf/download/2003/nelson_ht01.avi.bz2
Niedzwiedzka, B. (2003). A proposed general model of information behaviour. Information Research, 9.
Nielsen, C. M., Overgaard, M., Pedersen, M. B., Stage, J., & Stenild, S. (2006). It’s worth the hassle! The added value of evaluating the usability of mobile systems in the field. In Changing Roles. NordiCHI, ACM (pp. 272-280). Oslo, Norway.
Nixon, L. J. B., & Simperl, E. P. B. (2006, November). Makna and MultiMakna: Towards semantic hypermedia capability in wikis for the emerging Web. In Y. Sure & S. Schaffert (Eds.), Proceedings of Semantics 2006: From Visions to Applications: Semantics – The New Paradigm Shift in IT (pp. 83-98). Vienna, Austria.
Nixon, L.J.B., & Paslaru, E. (2006). Makna and MultiMakna: Towards semantic and multimedia capability in wikis for the emerging Web. In Y. Sure & S. Schaffert (Eds.), Proceedings of the Semantics 2006: From Visions to Applications, Austria (pp. 83-98).
Nonaka, I. (1994, February). A dynamic theory of organizational knowledge creation. Organization Science, 5(1), 14-37.
Nonaka, I., & Takeuchi, H. (1995). The knowledge-creating company. New York: Oxford University Press.
Nonnecke, B., & Preece, J. (2000). Persistence and lurkers in discussion lists: A pilot study. In HICSS ’00: Proceedings of the 33rd Hawaii International Conference on System Sciences, Washington, DC (Vol. 3, p. 3031). IEEE Computer Society.
Norman, D. A. (2002). The design of everyday things. B&T.
Norman, D., & Draper, S. (Eds.). (1986). User centered system design: New perspectives on human-computer interaction. Lawrence Erlbaum Associates Inc, US.
Normann, R., & Ramirez, R. (1998). Designing interactive strategy. From value chain to value constellation. Wiley & Sons.
Nottingham, M., & Sayre, R. (2005). The Atom syndication format. The Internet Engineering Task Force (IETF).
Novischi, A. (2002). Accurate semantic annotation via pattern matching. In Florida Artificial Intelligence Research Society Conference, Pensacola, Florida.
Noy, N. (2004). Semantic integration: A survey of ontology-based approaches. SIGMOD Rec., 33(4), 65-70.
Noy, N. F., & McGuiness, D. L. (2001, March). Ontology development 101: A guide to creating your first ontology (Tech. Rep. No. KSL-01-05). Stanford: Stanford University, Stanford Knowledge Systems Laboratory.
Noy, N. F., Knublauch, H., Fergerson, R., & Musen, M. A. (2004). The Protégé OWL plugin: An open development environment for Semantic Web applications. In 3rd International Conference on the Semantic Web (ISWC).
Noy, N. F., Sintek, M., Decker, S., Crubezy, M., Fergerson, R. W., & Musen, M. A. (2001). Creating Semantic Web contents with Protege-2000. IEEE Intelligent Systems, 16(2), 60-71.
Noy, N., & Musen, A. (2000). PROMPT: Algorithm and tool for automated ontology merging and alignment. AAAI/IAAI, 450-455.
Nygaard, K. (1979). The iron and metal project: Trade union participation. In A. Sandberg (Ed.), Computers dividing man and work (pp. 94-107). Malmö: Swedish Center for Working Life.
O’Hara, K., & Sellen, A. (1997). A comparison of reading paper and online documents. In CHI (pp. 335-342).
O’Neill, D. (1997). Software value add study. ACM SIGSOFT: Software Engineering Notes, 22(4), 11-12.
O’Reilly, T. (2005). What is Web 2.0: Design patterns and business models for the next generation of software. Retrieved March 9, 2008, from http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html
Olsen, H. (2005). Navigation blindness, how to deal with the fact that people tend to ignore navigation tools. The Interaction Designer’s Coffee Break, 13(Q1).
ONCE-CS. (2005). Open network of centres of excellence in complex systems. Retrieved March 9, 2008, from http://complexsystems.lri.fr/Portal/tiki-index.php
Oren, E. (2005). SemperWiki: A semantic personal wiki. In Proceedings of the 1st Workshop on the Semantic Desktop at ISWC: Vol. 175, Galway, Ireland (pp. 107-122).
Oren, E. (2006, November). An overview of information management and knowledge work studies: Lessons for the semantic desktop. In Proceedings of the ISWC Workshop on the Semantic Desktop.
Oren, E., Breslin, J.G., & Decker, S. (2006). How semantics make better wikis. In Proceedings of the 15th International World Wide Web Conference (pp. 1071-1072). Edinburgh, Scotland: ACM Press.
Oren, E., Delbru, R., Gerke, S., Haller, A., & Decker, S. (2007). ActiveRDF: Object-oriented semantic web programming. In Proceedings of the 16th International World-Wide Web Conference, Banff, Canada (pp. 817-824).
Oren, E., Völkel, M., Breslin, J. G., & Decker, S. (2006). Semantic wikis for personal knowledge management. Database and Expert Systems Applications, 4080, 509-518.
Osterfeld, F., Kiesel, M., & Schwarz, S. (2005). Nabu - a semantic archive for XMPP instant messaging. In S. Decker, J. Park, D. Quan & L. Sauermann (Eds.), Proceedings of the 1st Workshop On The Semantic Desktop. 4th International Semantic Web Conference (pp. 159-166). Galway, Ireland.
Oulasvirta, A. (2004). Finding meaningful uses for context-aware technologies: The humanistic research strategy. In Late Breaking Result Papers. ISBN 1-58113-703-6.
Palmér, M., Enoksson, F., & Naeve, A. (2006). Annotation profiles: Configuring forms to edit RDF. Submitted for publication.
Palomo, A. G. (2006). QMath: A human-oriented language and batch formatter for OMDoc. In OMDoc – an open markup format for mathematical documents (version 1.2, Chap. 26.2). Springer Verlag.
Palomo, A. G. (2006). Sentido: An authoring environment for OMDoc. In OMDoc – an open markup format for mathematical documents (version 1.2, Chap. 26.3). Springer Verlag.
Pantel, P., & Pennacchiotti, M. (2006). Espresso: Leveraging generic patterns for automatically harvesting semantic relations. In Proceedings of the Twenty-First International Conference on Computational Linguistics and Forty-Fourth Annual Meeting of the Association of Computational Linguistics, Sydney, Australia (pp. 113-120).
Park, O.C., & Lee, J. (2004). Adaptive instructional systems. In D.H. Jonassen (Ed.), Handbook of research for educational communications and technology (2nd ed.) (pp. 651-684). Mahwah, NJ: Lawrence Erlbaum.
Passant, A. (2007). Using ontologies to strengthen folksonomies and enrich information retrieval in weblogs: Theoretical background and corporate use-case. In ICWSM 2007, Boulder, CO.
Pennacchiotti, M., & Pantel, P. (2006). Ontologizing semantic relations. In Proceedings of the Twenty-First International Conference on Computational Linguistics and Forty-Fourth Annual Meeting of the Association of Computational Linguistics, Sydney, Australia (pp. 793-800).
Pepper, S., Vitali, F., Garshol, L. M., Gessa, N., & Presutti, V. (2006, February 10). A survey of RDF/topic maps interoperability proposals (W3C Working Group Note). Retrieved March 9, 2008, from www.w3.org/TR/rdftm-survey/
Pereira, F., Tishby, N., & Lee, L. (1999). Distributional clustering of English words. In Proceedings of the Thirty-First Annual Meeting of the Association for Computational Linguistics, Columbus, Ohio (pp. 183-190).
Petasis, G., Karkaletsis, V., Paliouras, G., Androutsopoulos, I., & Spyropoulos, C. (2002). Ellogon: A new text engineering platform. In Proceedings of the 3rd International Conference on Language Resources and Evaluation, Las Palmas, Spain (pp. 72-78).
Pietriga, E. (2005). Fresnel selector language for RDF (FSL). Retrieved March 13, 2008, from http://www.w3.org/2005/04/fresnel-info/fsl/
Pivk, A., Cimiano, P., & Sure, Y. (2005, October). From tables to frames. Journal of Web Semantics, 3(2), 132-146.
Planet math – math for the people, by the people. (2007). Retrieved March 4, 2008, from http://www.planetmath.org
Plessers, P., & Troyer, O. D. (2005, November). Ontology change detection using a version log. In Y. Gil, E. Motta, V. R. Benjamins, & M. A. Musen (Eds.), International Semantic Web Conference (Vol. 3729, pp. 578-592). Springer.
Polanyi, M. (1958). Personal knowledge: Towards a post-critical philosophy. London: Routledge & Kegan Paul Ltd.
Popitsch, N., Schandl, B., Amiri, A., Leitich, S., & Jochum, W. (2006). Ylvi – multimedia-izing the semantic wiki. In Proceedings of the 1st Workshop on Semantic Wikis – From Wiki to Semantics. Budva, Montenegro: CEUR-WS, Vol. 206.
Posc Caesar Association. (2007). Retrieved March 5, 2008, from http://www.posccaesar.org/
Powers, S. (2005). Cheap eats at the Semantic Web café. Retrieved March 7, 2008, from http://weblog.burningbird.net/archives/2005/01/27/cheap-eats-at-the-semantic-web-cafe/
Priebe, T., & Pernul, G. (2003). Towards integrative enterprise knowledge portals. In Proceedings of the 12th International Conference on Information and Knowledge Management (pp. 216-223). New Orleans, LA: ACM Press.
Prud’hommeaux, E., & Seaborne, A. (2005). SPARQL query language for RDF. Retrieved March 13, 2008, from http://www.w3.org/TR/rdf-sparql-query/
Quan, D., Huynh, D., & Karger, D. R. (2003). Haystack: A platform for authoring end user Semantic Web applications. In International Semantic Web Conference (pp. 738-753).
Quillian, M.R. (1969). The teachable language comprehender: A simulation program and theory of language. Communications of the ACM, 12(8), 459-476.
Ramonet, I. (1995). Informarse cuesta. Le Monde diplomatique, edición española, 1, 1-1.
Randall, D. W., Bowker, G., & Leigh Star, S. (2001). Sorting things out: Classification and its consequences - review. Computer Supported Cooperative Work, 10(1), 147-153.
Ranganathan, S. R. (1962). Elements of library classification. Bombay: Asia Publishing House.
Ratnasamy, S., Francis, P., Handley, M., Karp, R., & Schenker, S. (2001). A scalable content-addressable network. In ACM SIGCOMM Conference, San Diego, CA (pp. 161-172).
Rauschmayer, A. (2007). Wikifying an RDF editor. Submitted for publication.
Rauschmayer, A., Andonova, A., & Nepper, P. (2007). Increasing the versatility of Java documentation with RDF. In Proceedings of the 3rd International Conference on Semantic Technologies (I-SEMANTICS).
Ravichandran, D., & Hovy, E. (2001). Learning surface text patterns for a question answering system. In Proceedings of the 40th Annual Meeting of the Association of Computational Linguistics (pp. 41-47).
Reenskaug, T. (1979). The original MVC reports. Oslo: T. Reenskaug.
Reif, G., Gall, H., & Jazayeri, M. (2005, May). WEESA: Web engineering for Semantic Web applications. In A. Ellis & T. Hagino (Eds.), Proceedings of the 14th WWW Conference (pp. 722-729). ACM.
Resnik, P. (1995). Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, Montreal, Canada (pp. 448-453).
Resnik, P. (1995). Disambiguating noun groupings with respect to WordNet senses. In D. Yarowsky & K. Church (Eds.), Proceedings of the Third Workshop on Very Large Corpora (pp. 54-68). Somerset, NJ: Association of Computational Linguistics.
Rhodes, B., & Maes, P. (2000). Just-in-time information retrieval agents. IBM Systems Journal, 39, 685-704.
Richardson, S.D., Dolan, W.B., & VanderWende, L. (1998). MindNet: Acquiring and structuring semantic information from text. In Proceedings of the Seventeenth International Conference on Computational Linguistics, Montreal, Canada.
Richter, J., Völkel, M., & Haller, H. (2005, November). DeepaMehta — a semantic desktop. In S. Decker, J. Park, D. Quan, & L. Sauermann (Eds.), Proceedings of the 1st Workshop on The Semantic Desktop. 4th International Semantic Web Conference, Galway, Ireland (Vol. 175, CEUR-WS).
Rigau, G. (1998). Automatic acquisition of lexical knowledge from MRDs. Ph.D. thesis, Departament de Llenguatges i Sistemes Informatics, Universitat Politecnica de Catalunya, Barcelona, Spain.
Risse, T., Knezevic, P., Meghini, C., Hecht, R., & Basile, F. (2005, December). The BRICKS infrastructure – an overview. In Proceedings of the 75th Conference on Electronic Imaging, the Visual Arts & Beyond (EVA 2005), Moscow.
Roman, D., Keller, U., Lausen, H., Bruijn, J. D., Lara, R., Stollberg, M., et al. (2005). Web services modeling ontology. Journal of Applied Ontology, 39(1), 77-106.
Rowstron, A., & Druschel, P. (2001). Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. In IFIP/ACM International Conference on Distributed Systems Platforms, Heidelberg, Germany (pp. 329-350).
Rowstron, A., Kermarrec, A., Castro, M., & Druschel, P. (2001). SCRIBE: The design of a large-scale event notification infrastructure. In Proceedings of the 3rd International Workshop on Networked Group Communication, London, UK (pp. 30-43).
Sauermann, L. (2005). The gnowsis semantic desktop for information integration. In Proceedings of the 3rd Conference on Professional Knowledge Management. Kaiserslautern, Germany: Springer.
Sauermann, L., Bernardi, A., & Dengel, A. (2005). Overview and outlook on the semantic desktop. In Proceedings of the ISWC 2005 Workshop on the Semantic Desktop. Galway, Ireland: CEUR-WS, Vol. 175.
Schaffert, S. (2004, October). Xcerpt: A rule-based query and transformation language for the Web. Ph.D. thesis, University of Munich.
Schaffert, S. (2006). IkeWiki: A semantic wiki for collaborative knowledge management. In Proceedings of the 15th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises (WETICE 2006), Manchester, United Kingdom (pp. 388-396).
Schaffert, S., Bry, F., Besnard, P., Decker, H., Decker, S., Enguix, C. et al. (2005, November). Paraconsistent reasoning for the Semantic Web (Position paper). In Workshop on Uncertainty Reasoning for the Semantic Web (URSW05) at ISWC05, Galway, Ireland.
Schaffert, S., Gruber, A., & Westenthaler, R. (2005, November). A semantic wiki for collaborative knowledge formation. In Semantics 2005, Vienna, Austria.
Schaffert, S., Westenthaler, R., & Gruber, A. (2006, June). IkeWiki: A user-friendly semantic wiki. In Proceedings of the 3rd European Semantic Web Conference (ESWC06) – Demonstrations Track, Budva, Montenegro.
Schandl, B. (2006). SemDAV: A file exchange protocol for the semantic desktop. In Proceedings of the Semantic Desktop and Social Semantic Collaboration Workshop. Athens, GA: CEUR-WS, Vol. 202.
Schandl, B., & King, R. (2006). The SemDAV project: Metadata management for unstructured content. In Proceedings of the 1st International Workshop on Contextualized Attention Metadata: Collecting, Managing and Exploiting of Rich Usage Information (pp. 27-32). Arlington, VA: ACM Press.
Schandl, B., Amiri, A., Pomajbik, S., & Todorov, D. (2007). Integrating file systems and the Semantic Web. Demo at the 3rd European Semantic Web Conference. Innsbruck, Austria.
Schlenoff, C. et al. (2000). The process specification language (PSL): Overview and version 1.0 specification (Tech. Rep. NISTIR 6459). Gaithersburg, MD: National Institute of Standards and Technology.
Schmidt, A. (2005). Bridging the gap between knowledge management and e-learning with context-aware corporate learning solutions. In K. Althoff, A. Dengel, R. Bergmann, M. Nick, & T.R. Berghofer (Eds.), Professional knowledge management. Third Biennial Conference, WM 2005, Kaiserslautern, Germany, April 2005. Revised Selected Papers (pp. 203-213). Springer.
Schmidt, A. (2005). Knowledge maturing and the continuity of context as a unifying concept for knowledge management and e-learning. In Proceedings of I-KNOW 05, Graz, Austria.
Schmidt, A. (2006). Ontology-based user context management: The challenges of dynamics and imperfection. In R. Meersman & Z. Tahiri (Eds.), On the move to meaningful Internet systems 2006: CoopIS, DOA, GADA, and ODBASE. Part I (pp. 995-1011). Springer.
Schmidt, A. (2007). Microlearning and the knowledge maturing process: Towards conceptual foundations for work-integrated microlearning support. In T. Hug, M. Lindner, & P.A. Bruck (Eds.), Proceedings of Microlearning 2007, Innsbruck, Austria.
Schmidt, A., & Braun, S. (2006). Context-aware workplace learning support: Concept, experiences, and remaining challenges. In W. Nejdl & K. Tochtermann (Eds.), Innovative approaches for learning and knowledge sharing. First European Conference on Technology-Enhanced Learning (EC-TEL 2006), Crete, Greece (pp. 518-524). Springer.
Schmidt, A., & Kunzmann, C. (2006). Towards a human resource development ontology for combining competence management and technology-enhanced workplace learning. In R. Meersman, Z. Tahiri, & P. Herero (Eds.), On the move to meaningful Internet systems 2006: OTM 2006 workshops. Part I (pp. 1078-1087). Springer.
Schneider, S., & Davies, J. (1995). A brief history of timed CSP. Theoretical Computer Science, 138.
Schön, D. A. (1983). The reflective practitioner: How professionals think in action. B&T.
Schraefel, M. C., Wilson, M., Russell, A., & Smith, D. A. (2006). mSpace: Improving information access to multimedia domains with multimodal exploratory search. Communications of the ACM, 49(4), 47-49.
Schütze, H. (1992). Dimensions of meaning. In Proceedings of the 1992 ACM/IEEE Conference on Supercomputing (pp. 787-796).
Schütze, H. (1998). Automatic word sense discrimination. Computational Linguistics, 24(1), 97-123.
Sekine, S. (2002). Oak system. Retrieved March 11, 2008, from http://nlp.cs.nyu.edu/oak/
Sekine, S. (2006). On-demand information extraction. In Proceedings of the Twenty-First International Conference on Computational Linguistics and Forty-Fourth Annual Meeting of the Association of Computational Linguistics.
Sellen, A. J., & Harper, R. H. R. (2003). The myth of the paperless office. Cambridge, MA: MIT Press.
Sheth, J., Newman, B., & Gross, B. (1991). Why we buy what we buy: A theory of consumption values. Journal of Business Research, 22, 159-170.
Shneiderman, B. (1998). Designing the user interface: Strategies for effective human-computer interaction (3rd ed.). Addison Wesley Longman.
Simkins, N.K. (1994). An open architecture for language engineering. Paper presented at the First Language Engineering Convention, Paris, France.
Sintek, M., van Elst, L., Scerri, S., & Handschuh, S. (2007). Distributed knowledge representation on the social semantic desktop: Named graphs, views, and roles in NRL. In Proceedings of the European Semantic Web Conference (ESWC).
Sirin, E., Parsia, B., Grau, B. C., Kalyanpur, A., & Katz, Y. (2006). Pellet: A practical OWL-DL reasoner. Journal of Web Semantics. Retrieved March 4, 2008, from http://www.mindswap.org/papers/PelletJWS.pdf
Smith, G. (2000). The Object-Z specification language. Advances in Formal Methods. Kluwer Academic Publishers.
Smith, G. (2005). IA summit folksonomies panel. Retrieved March 7, 2008, from http://atomiq.org/archives/2005/03/ia_summit_folksonomies_panel.html
SnipSnap – the easy Web log and wiki software. (2007). Retrieved March 4, 2008, from http://snipsnap.org/
Snow, R., Jurafsky, D., & Ng, A.Y. (2006). Semantic taxonomy induction from heterogeneous evidence. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Sydney, Australia.
Soderland, S. (1999). Learning information extraction rules for semi-structured and free text. Machine Learning, 34(1-3), 233-272.
Souzis, A. (2005, September/October). Building a semantic wiki. IEEE Intelligent Systems, 20(5), 87-91.
Spyns, P., Meersman, R., & Jarrar, M. (2002). Data modelling vs. ontology engineering. SIGMOD Rec., 12-17.
Stoica, I., Morris, R., Karger, D., Kaashoek, M., & Balakrishnan, H. (2001). Chord: A scalable peer-to-peer lookup service for Internet applications. In ACM SIGCOMM Conference, San Diego, CA (pp. 149-160).
Strapparava, C., Gliozzo, A., & Giuliano, C. (2004). Pattern abstraction and term similarity for word sense disambiguation: IRST at Senseval-3. In Proceedings of the Third Workshop on the Evaluation of Systems for the Semantic Analysis of Text, Barcelona, Spain.
Straub, R. (2005). Lernen im Kontext: Dynamischen Lernkonzepten gehört die Zukunft. Competence-Site, 26(1).
Studer, R., Benjamins, V., & Fensel, D. (1998). Knowledge engineering, principles and methods. Data and Knowledge Engineering, 25(1-2), 161-197.
Sun, J. (2003). Tools and verification techniques for integrated formal methods. Ph.D. thesis, National University of Singapore.
Sun, J., Dong, J. S., Liu, J., & Wang, H. (2001). A XML/XSL approach to visualize and animate TCOZ. In J. He, Y. Li, & G. Lowe (Eds.), The 8th Asia-Pacific Software Engineering Conference (APSEC’01) (pp. 453-460). IEEE Press.
Sun, J., Dong, J. S., Liu, J., & Wang, H. (2001). Object-Z Web environment and projections to UML. In WWW10: 10th International World Wide Web Conference (pp. 725-734). ACM Press.
Sussna, M. (1993). Word sense disambiguation for free-text indexing using a massive semantic network. In Proceedings of the Second International Conference on Information and Knowledge Base Management, Arlington, Virginia (pp. 67-74).
Tapanainen, P., & Järvinen, T. (1997). A non-projective dependency parser. In Proceedings of the 5th Conference on Applied Natural Language Processing (ANLP’97) (pp. 64-71). Washington, DC: ACL.
Tazzoli, R., Castagna, P., & Campanini, S. E. (2004). Towards a semantic WikiWikiWeb. In Poster Abstracts of the 3rd International Semantic Web Conference (ISWC 2004), Hiroshima, Japan.
Teevan, J., Alvarado, C., Ackerman, M. S., & Karger, D. R. (2004). The perfect search engine is not enough: A study of orienteering behaviour in directed search. In CHI ‘04: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Vienna, Austria (pp. 415-422).
The Connexions Team. (2006). Connexions: Help on editing modules. Retrieved March 4, 2008, from http://cnx.org/help/EditingModules
The rule markup initiative. (2000, November). Retrieved March 13, 2008, from http://www.ruleml.org/
Thierstein, J., & Baraniuk, R. G. (2007). Actual facts about ConneXions. Retrieved March 9, 2008, from http://cnx.org/aboutus/publications/paper.2006-07-12.361906064
Thompson, H.S., & McKelvie, D. (1997). Hyperlink semantics for standoff markup of read-only documents. In Proceedings of SGML Europe ’97, Barcelona, Spain.
Tjong-Kim-Sang, E.F., & De Meulder, F. (2003). Introduction to the CoNLL-2003 shared task: Language independent named entity recognition. In W. Daelemans & M. Osborne (Eds.), Proceedings of the Seventh Conference on Natural Language Learning (pp. 142-147). Edmonton, Canada.
Tolksdorf, R., & Simperl, E.P.B. (2006). Towards wikis as semantic hypermedia. In Proceedings of the 2006 International Symposium on Wikis (pp. 79-88). Odense, Denmark: ACM Press.
Tosh, D., & Werdmuller, B. (2004). Creation of a learning landscape: Weblogging and social networking in the context of e-portfolios (Tech. Rep.). University of Edinburgh.
Trost, H. (2003). Morphology. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 25-47). New York: Oxford University Press Inc.
Tufis, D., Cristea, D., & Stamou, S. (2004). BalkaNet: Aims, methods, results and perspectives. A general overview. Romanian Journal on Information Science and Technology. Special Issue on BalkaNet, 7(1-2), 9-34.
Tummarello, G., Morbidoni, C., & Nucci, M. (2006). Enabling Semantic Web communities with DBin: An overview. In Proceedings of the 5th International Semantic Web Conference (ISWC).
Ulbrich, A., Scheir, P., Lindstaedt, S. N., & Görtz, M. (2006). A context-model for supporting work-integrated learning. In W. Nejdl & K. Tochtermann (Eds.), Innovative approaches for learning and knowledge sharing. First European Conference on Technology-Enhanced Learning (EC-TEL 2006), Crete, Greece (pp. 525-530). Springer.
Uren, V.S., Cimiano, P., Iria, J., Handschuh, S., Vargas-Vera, M., Motta, E., et al. (2006). Semantic annotation for knowledge management: Requirements and a survey of the state of the art. Journal of Web Semantics, 4(1), 14-28.
Valen-Sendstad, M. (2006, April). Integrated information platform for reservoir and subsea production systems project (IIP). In The First Norwegian Semantic Days Conference, Stavanger, Norway.
van Assem, M., Menken, M.R., Schreiber, G., Wielemaker, J., & Wielinga, B. (2004). A method for converting thesauri to RDF/OWL. In Proceedings of the Third International Semantic Web Conference (ISWC2004). Springer-Verlag.
Vehviläinen, A. (2006). Ontologiapohjainen kysymys-vastauspalvelu (Ontology-based question-answer service). M.Sc. thesis. Retrieved March 7, 2008, from http://www.seco.tkk.fi/publications/2006/vehvilainen-msc-2006.pdf
Vehviläinen, A., Alm, O., & Hyvönen, E. (2006). Combining case-based reasoning and semantic indexing in a question-answer service. Poster paper, 1st Asian Semantic Web Conference (ASWC2006).
Veinott, E. S., Olson, J., Olson, G. M., & Fu, X. (1999). Video helps remote work: Speakers who need to negotiate common ground benefit from seeing each other. In CHI ’99: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, New York, NY (pp. 302-309). ACM Press.
Vogiazou, Y., Eisenstadt, M., Dzbor, M., & Komzak, J. (2005). From Buddyspace to CitiTag: Large-scale symbolic presence for community building and spontaneous play. In ACM Symposium on Applied Computing (SAC) (pp. 1600-1606). Völkel, M. (2007). From documents to knowledge models. In Proceedings of ProKW workshop, Konferenz Professionelles Wissensmanagement, Potsdam, Germany. Völkel, M., & Haller, H. (2006). Conceptual data structures (CDS) - towards an ontology for semi-formal articulation of personal knowledge. In Proceedings of the 14th International Conference on Conceptual Structure, Aalborg University – Denmark. Völkel, M., & Oren, E. (2006). Towards a wiki interchange format (WIF). In Proceedings of of the 1st Workshop on Semantic Wikis – From Wiki to Semantics. Budva, Montenegro: CEUR-WS Vol. 206. Völkel, M., et al. (2006). Wiki specification request 3 – wiki interchange format. Retrieved March 4, 2008, from http://www.wikisym.org/wiki/index.php/WSR_3 Völkel, M., Krötzsch, M., Vrandecic, D., Haller, H., & Studer, R. (2006). Semantic wikipedia. In Proceedings of the 15th International Conference on World Wide Web (pp. 585-594). Edinburgh, Scotland: ACM Press. Vorhees, E.M. (1993) Using wordnet to disambiguate word senses for text retrieval. In Proceedings of the Sixteenth International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 171-180). Vossen, P. (Ed.). (1998). EuroWordNet - a multilingual database with lexical semantic networks. Dordrecht: Kluwer Academic Publishers. Vrandečić, D., & Krötzsch, M. (2006, June). Reusing ontological background knowledge in semantic wikis. In M. Völkel, S. Schaffert, & S. Decker (Eds.), Proceedings of the 1st workshop on semantic wikis, European Semantic Web Conference 2006 (CEUR Workshop Proceedings, Vol. 206, pp. 16-30), Budva, Montenegro. Wache, H., Vögele, T., Visser, U., Stuckenschmidt, H. Schuster, G., Neumann, H., et al. (2001). Ontology-based integration of information. In Proceedings of the IJCAI01 Workshop on Ontologies and Information Sharing (pp. 108-117). Wagner, C. (2004). Wiki: A technology for conversational knowledge management and group collaboration. Communications of the Association for Information Systems, 13, 265-289.
Compilation of References
Wagner, C., & Bolloju, N. (2005, April-June). Knowledge management with conversational technologies: Discussion forums, weblogs, and wikis. Journal of Database Management, 16(2), i-viii. Walker, D. (1987). Knowledge resource tools for accessing large text files. In S. Nirenberg (Ed.), Machine translation: Theoretical and methodological Issues (pp. 247-261). Cambridge, England: Cambridge University Press. Watt, J., Walther, J., & Nowak, K. (2002). Asynchronous videoconferencing: A hybrid communication prototype. hicss, 01, 13. Watts, D. J., Dodds, P. S., & Newman, M. E. (2002). Identity and search in social networks. Science Magazine, 296, 1302-1305. Weinberger, D. (2002). Small pieces loosely joined: A unified theory of the Web. Basic Books. Wen, J., Li, Q., Ma, W., & Zhang, H. (2003). A multiparadigm querying approach for a generic multimedia database management system. ACM SIGMOD Record, 32(1), 26-34. Wenger, E., McDermott, R., & Snyder, W. M. (2002, March). Cultivating communities of practice. Harvard Business School Press. Whittaker, S., & Sidner, C. L. (1996). E-mail overload: Exploring personal information management of e-mail. In CHI (pp. 276-283). Whittaker, S., Bellotti, V., & Gwizdka, J. (2006). E-mail in personal information management. Communications of the ACM, 49(1), 68-73. Whittaker, S., Jones, Q., Nardi, B. A., Creech, M., Terveen, L. G., Isaacs, E. et al. (2004). Contactmap: Organizing communication in a social desktop. ACM Trans. Comput.-Hum. Interact., 11(4), 445-471. Wikimedia Foundation. (Ed.). (2006, December). Section from Wikimedia meta-wiki. Retrieved March 4, 2008, from http://meta.wikimedia.org/w/index.php?title=Help: Section&oldid=480808 Wikimedia Foundation. (Ed.). (2006, December). Semantic wiki (from Wikipedia, the free encyclopedia). Retrieved March 4, 2008, from http://en.wikipedia.org/ w/index.php?title=Semantic_Wiki&oldid=92323454 Wikipedia: RDFa. (2006, March) Retrieved March 13, 2008, from http://en.wikipedia.org/wiki/RDFa Wilks, Y., Fass, D.C., Ming Guo, C., McDonald, J.E., Plate, T., & Slator, B.M. (1993). Providing machine trac-
table dictionary tools. In J. Pustejovsky (Ed.), Semantics and the lexicon (pp. 341-401). Cambridge, MA: Kluwer Academics Publishers. Winkler, W.E. (1999). The state of record linkage and current research problems. RR99/03, U.S. Bureau of Census. Retrieved March 7, 2008, from http://www. census.gov/srd/www/byname.html Wittgenstein, L. (1953). Philosophische Untersuchungen. Frankfurt am Main, Germany: Suhrkamp (2003). Woelk, D., & Agarwal, S. (2002). Integration of e-learning and knowledge management. In World Conference on ELearning in Corporate, Government, Health Institutions, and Higher Eduction (Vol. 1, pp. 1035-1042). Wolinski, F., Vichot, F., & Gremont, O. (1998). Producing NLP-based online contentware. In Natural Language and Industrial Applications, Moncton, Canada. Woodcock, J., & Davies, J. (1996). Using z: Specification, refinement, and proof. Prentice-Hall International. World Wide Web Consortium. (Ed.). (2006). W3C GRDDL working group. Retrieved March 4, 2008, from http://www.w3.org/2001/sw/grddl-wg/ Xiao, H., & Cruz, I. (2006). Application design and interoperability for managing personal information in the semantic desktop. In Proceedings of the Semantic Desktop and Social Semantic Collaboration Workshop (SemDesk 2006), at the 5th International Semantic Web Conference (ISWC 2006): Vol. 202. Athens, Georgia, USA. Yarowsky, D. (1992). Word-sense disambiguation using statistical models of Roget’s categories trained on large corpora. In Proceedings of the 14th International Conference on Computational Linguistics COLING (pp. 454-460). Yee, K.-P., Swearingen, K., Li, K., & Hearst, M. (2003). Faceted metadata for image search and browsing. In CHI. Yoon, Y., & Acree, A.D. (1993). Development of a casebased expert system: application to a service coordination problem. Expert Systems with Applications, 6, 77-85. Zave, P., & Jackson, M. (1997). Four dark corners of requirements engineering. ACM Trans. Software Engineering and Methodology, 6(1), 1-30. Zhang, J., & Van Alstyne, M. W. (2004). SWIM: Fostering social network based information search. In CHI Extended Abstracts (pp. 1568).
Compilation of References
Zinn, C. (2006, November). Bootstrapping a semantic wiki application for learning mathematics. In Y. Sure & S. Schaffert (Eds.), Proceedings of semantics 2006: From visions to applications: Semantics – the new paradigm shift in IT (pp. 255-260). Vienna, Austria.
Zucanu, D., Li, Y. F., & Dong, J. S. (2006). Semantic Web languages - towards an institutional perspective. In Futatsugi, Jouannaud, & Meseguer (Eds), Algebra, meaning, and computation: A festschrift in honor of Prof. Joseph Goguen. Springer-Verlag (LNCS 4060, pp. 99-123).
About the Contributors
Jörg Rech is a project manager and senior scientist at the Fraunhofer Institute for Experimental Software Engineering (IESE) in Kaiserslautern, Germany. He received the BS (Vordiplom) and the MS (Diplom) in computer science with a minor in electrical engineering from the University of Kaiserslautern, Germany. His research mainly concerns software antipatterns, software patterns, defect discovery, software mining, software retrieval, automated software reuse, software analysis, and knowledge management. He is also the speaker of the GI working group on architectural and design patterns. Contact him at [email protected].
Björn Decker is a solution engineer for semantically enabled knowledge management solutions at empolis GmbH, part of arvato, a Bertelsmann company. Before that, he was a project manager and scientist at the Fraunhofer Institute for Experimental Software Engineering (IESE) in Kaiserslautern, Germany. He received the BS (Vordiplom) and the MS (Diplom) in computer science from the University of Kaiserslautern, Germany. His research interests are experience management, in particular the collaborative maintenance of online repositories and the use of social software. He trains software engineering companies in the use of wikis and in scientific writing. He organizes, and serves on the program committees of, workshops and conferences in the domains of software engineering, semantic wikis, and experience management. Contact him at [email protected].
Eric Ras is a senior scientist at the Fraunhofer Institute for Experimental Software Engineering (IESE) in Kaiserslautern, Germany. He received the BS (Vordiplom) and the MS (Diplom) in computer science from the University of Kaiserslautern, Germany. His research interests are learning material production, vocational training methods, software patterns, and experience management. Eric Ras is the scientific coordinator of the international distance learning program Software Engineering for Embedded Systems at the University of Kaiserslautern, Germany. He organizes, and serves on the program committees of, workshops and conferences in the domains of software engineering, e-learning, and knowledge management. Contact him at [email protected].
***
Enrique Alfonseca has been a lecturer at the Universidad Autonoma de Madrid since 2001. His main research interests are natural language processing topics such as information extraction, text summarisation, sentiment analysis, parsing, semantic annotation, and the application of NLP to e-learning. Olli Alm is a research assistant in the Semantic Computing Research Group (SeCo) at the University of Helsinki, Department of Computer Science, and at the Helsinki University of Technology, Laboratory of Media Technology. His research interests include ontology-based information extraction. Klaus-Dieter Althoff is a full professor at the University of Hildesheim, where he directs a research group on intelligent information systems. He studied mathematics with a focus on expert systems at the University of Technology at Aachen. In 1992 he finished his doctoral dissertation on an architecture for knowledge-based technical diagnosis at the University of Kaiserslautern, where he also received the postdoctoral degree (Habilitation) with a thesis on the evaluation of case-based reasoning systems in 1997. He worked at the Fraunhofer Institute for Experimental Software Engineering as a group leader and department head until he moved to Hildesheim in April 2004. His main interests include techniques, methods, and tools for developing, operating, evaluating, and maintaining knowledge-based systems, with a focus on case-based reasoning, agent technology, experience management, and machine learning. Sören Auer leads the research group Agile Knowledge Engineering and Semantic Web (AKSW) at the Department of Business Information Systems, University of Leipzig. He studied mathematics and computer science in Dresden, Hagen, and Ekaterinburg, Russia. From 2000 to 2003, Sören was managing director of adVIS GmbH, a Dresden-based Internet and IT service provider. In 2006 he finished his PhD thesis on agile knowledge engineering at the University of Leipzig. Sören is the project leader of Powl/OntoWiki, an integrated open-source Semantic Web development framework for the scripting language PHP, and a founding member of the DBpedia project. He is the author of over 50 peer-reviewed scientific publications, co-organiser of several workshops, and chair of the first Social Semantic Web conference. Michel Buffa is an associate professor at the University of Nice Sophia Antipolis. He is also a researcher in the Mainline group of the I3S laboratory, where his work now focuses on the Semantic Web and social networks. Currently he is a visiting scientist in the EDELWEISS/ACACIA group at INRIA Sophia-Antipolis, where he works on the SweetWiki software, which is used by several communities of practice. María Ruiz Casado is a PhD student at the Escuela Politécnica Superior, Universidad Autonoma de Madrid. Her thesis topic is Ontology Learning and Population from Free Text. Pablo Castells has been an associate professor at the Universidad Autonoma de Madrid since 1999, where he leads and participates in several national and international projects in the area of semantic-based knowledge systems. His current research interests focus on information retrieval, personalisation technologies, and context modelling. Ángel García Crespo is the head of the SofLab Group at the Computer Science Department of the Carlos III University and the head of the Institute for Promotion of Innovation Pedro Juan de Lastanosa, Madrid. He holds a PhD in industrial engineering from the Universidad Politécnica de Madrid (with an award from the Instituto J.A. Artigas for the best thesis) and received an Executive MBA from the Instituto de Empresa. Professor García Crespo has led and actively contributed to European projects of FP V and VI, as well as to many business cooperations. He is the author of more than 100 publications in conferences, journals, and books, both Spanish and international. Jin-Song Dong received a Bachelor (1st class honors) and a PhD degree in computing from the University of Queensland in 1992 and 1996, respectively. From 1995 to 1998, he was a research scientist at the Commonwealth Scientific and Industrial Research Organisation in Australia. Since 1998 he has been in the School of Computing at the National University of Singapore (NUS), where he is currently an associate professor and assistant dean. He is a Steering Committee member of the International Conference on Formal Engineering Methods (ICFEM) and the Asia Pacific Software Engineering Conference (APSEC) series. Guillaume Erétéo is a graduate student engineer from the Computer Science Department of Ecole Polytechnique Universitaire de Nice Sophia-Antipolis. Roar Fjellheim is a cofounder and director of business development at Computas AS, a Norwegian SME specializing in work support systems and knowledge-based systems. He is also an associate professor in computer science at the University of Oslo. His primary professional interests are industrial applications of semantic technology, artificial intelligence, and systems engineering. Fjellheim has an engineering degree from the Norwegian University of Science and Technology (Trondheim). Prior to joining Computas, he held positions at the Norwegian Computing Centre (Oslo), CERN (Geneva), Det Norske Veritas (Oslo), and Innovation Norway (San Francisco). Thomas Franz is a researcher in the group for Information Systems and Semantic Web (ISWeb) at the University of Koblenz-Landau, Germany. He joined ISWeb in 2005. His research interests include personal information management, the Semantic Web, and knowledge representation. He is involved in the European research project X-Media and has previously worked in the research center for Knowledge Management at the University of Applied Sciences Cologne, Germany. Thomas holds a master's degree in computer science from the University of Freiburg and a diploma degree from the University of Applied Sciences, Cologne. He attended Clemson University, USA, and Chalmers University, Sweden. Fabien Gandon is a full-time researcher in informatics and computer science in the EDELWEISS/ACACIA research team of INRIA at the research center of Sophia-Antipolis (France). He holds a PhD in computer science and is a graduate engineer in applied mathematics from INSA. His professional interests include the Semantic Web, ontologies, knowledge engineering and modelling, mobility, privacy, context-awareness, Web services, and multi-agent systems. His main domain of application is organizational memories (companies, communities, etc.) and knowledge management in general. He also participates in W3C working groups. Nicholas Gibbins is a lecturer in the School of Electronics and Computer Science at the University of Southampton. He is currently an investigator on the EU-funded TAO project and was previously a member of the EPSRC-funded Advanced Knowledge Technologies project from 2000 to 2007. His research interests include open hypertext and hypermedia, large-scale multiagent systems, and the Semantic Web.
Juan Miguel Gómez is a visiting professor at the Computer Science Department of the Carlos III University. He holds a PhD in computer science from the Digital Enterprise Research Institute (DERI) at the National University of Ireland, Galway, and received his MSc in telecommunications engineering from the Universidad Politécnica de Madrid (UPM). He was involved in a number of EU FP V and VI research projects and was a member of the Semantic Web Services Initiative (SWSI). His research interests include the Semantic Web, Semantic Web services, business process modeling, B2B integration, and, recently, bioinformatics. He has published numerous journal articles, conference papers, and book chapters. Eero Hyvönen is a professor of media technology at the Helsinki University of Technology, Laboratory of Media Technology, and a docent of computer science at the University of Helsinki, Department of Computer Science. He directs the Semantic Computing Research Group (SeCo). Zachary Ives is an assistant professor at the University of Pennsylvania and an associated faculty member of the Penn Center for Bioinformatics. He received his BS from Sonoma State University and his PhD from the University of Washington. His research interests include data integration, peer-to-peer models of data sharing, processing and security of heterogeneous sensor streams, and data exchange between autonomous systems. He is a recipient of the NSF CAREER award and a member of the DARPA Computer Science Study Panel. Malte Kiesel has been working at DFKI Knowledge Management since 2004, participating in the SmartWeb project, which focuses on making Web content available on mobile devices using Semantic Web technologies. His research interests are semantic wikis and creating personalized views using both automatically and manually generated metadata. He maintains the semantic wiki Kaukolu, which was used in the NEPOMUK semantic desktop project. Currently, Kaukolu is being extended in the Mymory project to handle attention data gathered by an eye tracker. The idea is to generate views on information stored in the personal wiki using context as well as attention information. http://smartweb-project.org/ http://kaukoluwiki.opendfki.de/ http://nepomuk.semanticdesktop.org/ http://www.dfki.de/mymory/ Ross King received his PhD in physics from Stanford University. After moving to Vienna in 1995, he migrated to the IT sector and joined Research Studios Austria in 2002 to help found the Studio Digital Memory Engineering, where he presently serves as head of operations. His research interests are primarily concerned with multimedia information management and retrieval. Andrea Kohlhase holds a diploma in mathematics from the Technical University Berlin. After graduation, she spent 10 years in the software industry, but came back to academia via a position as research programmer at Carnegie Mellon University, Pittsburgh. She is currently finishing her PhD studies at the University of Bremen (DiMeB) and works as a senior research associate at Jacobs University, Bremen. Her research interest lies in the field of human-computer interaction, especially semantic interaction design, with a focus on HCI in educational environments. She has developed the software package "CPoint," an invasive semantic editor and educational environment in MS PowerPoint for educators as well as students.
Michael Kohlhase is professor of computer science at Jacobs University Bremen and vice director at the German Research Center for Artificial Intelligence (DFKI), Lab Bremen. His current research interests include automated theorem proving and knowledge representation for mathematics, inference-based techniques for natural language processing, and computer-supported education. He has pursued these interests during extended visits to Carnegie Mellon University, SRI International, and the Universities of Amsterdam and Edinburgh. Michael Kohlhase is the recipient of the dissertation award of the Association of German Artificial Intelligence Institutes (AKI, 1995) and of a Heisenberg stipend of the German Research Council (DFG, 2000-2003). He was a member of the Special Research Action 378 (Resource-Adaptive Cognitive Processes), leading projects on both automated theorem proving and computational linguistics. Michael Kohlhase is a trustee of the MKM and CALCULEMUS conferences, a member of the W3C MathML working group, and the president of the OpenMath Society. Christoph Lange is a PhD student in the Smart Systems program at Jacobs University Bremen. He holds a degree in computer science from the University of Trier. His research interests lie in the areas of semantic and nonsemantic wikis, as well as representing and managing mathematical knowledge. He is the editor of two textbooks about wikis and an active contributor to the German and English editions of Wikipedia. His PhD research topic is the creation of a semantic wiki that supports scientists and learners by integrating various services, based on the SWiM prototype presented in this book. Yuan Fang Li is a research fellow at the School of Computing, National University of Singapore. His research interests include the Semantic Web, formal methods, and software verification. He received his PhD in computer science from the National University of Singapore. Damaris Fuentes Lorenzo holds a B.S. in Technical Engineering in Computer Management from the Universidad Carlos III de Madrid and an MSc in computer engineering with a specialization in the development of enterprise information systems. She is enrolled in a Master of Computer Science and Technology program, specializing in software engineering. She has held several scholarships in the Departments of Telematics and Computer Science of Carlos III University and works in the latter as a research technician. Her research interests include the Semantic Web, ontological engineering, collaborative software, and Web accessibility and usability. Normen Müller is a research fellow in the Smart Systems program at Jacobs University Bremen. He holds a degree in computer science from the Technical University Darmstadt. His research interests lie in the mathematical foundations and logic of computer science, knowledge representation based on description logic, and document engineering, particularly management of change. Building on ontology-driven management, he is currently developing the management-of-change software prototype "locator." David Norheim is a principal engineer who leads Semantic Web activities at Computas AS. He heads a special interest group for the Semantic Web in the Norwegian Computer Society and a semantic interoperability group in the Norwegian Foundation for E-Business and Trade Procedures. He has applied Semantic Web technologies for various user groups and application areas since 2001 and cofounded a Semantic Web start-up company. Norheim has an engineering degree from the Norwegian University of Science and Technology (Trondheim). Prior to joining Computas, he held positions at the Joint Research Centre of the European Commission (Italy), Innovation Norway (San Francisco), and Asemantics (Italy), and pursued university graduate studies at Kjeller (Oslo).
Eyal Oren is a PhD student and researcher at the Digital Enterprise Research Institute. He has published and presented some 30 papers at international conferences and workshops. His research is concerned with application development on the Semantic Web, in particular with data-driven techniques for data access, data navigation, data entry, and data discovery. He is the creator and developer of ActiveRDF, BrowseRDF, and SemperWiki, and has recently been working on Sindice.com, a Semantic Web lookup service. See his homepage at http://www.eyaloren.org/ for more information. Jeff Z. Pan received his PhD in computer science from the University of Manchester in 2004. He is a lecturer in computing science at the University of Aberdeen, UK. His main research interests lie in the design of logics and ontology languages, automated reasoning, ontology usability, and the applications of all of the above (such as in the Semantic Web). He is a PC chair of the First International Conference on Web Reasoning and Rule Systems (RR2007). He serves as a co-editor of the Special Issue on Ontology Dynamics of the Journal of Logic and Computation, as well as a co-editor of the Special Issue on Extended Papers from 2006 Conferences of the Journal of Data Semantics. Jeff is very active in the World Wide Web Consortium (W3C), having served as a cochair of the W3C Multimedia Semantics Incubator Group, as well as a co-coordinator of both the Software Engineering and the Multimedia Task Forces in the W3C Semantic Web Best Practices and Deployment Working Group (2004-2006). He is also a cochair of the Fuzzy RuleML Technical Group and a member of the RuleML steering committee. Terry Payne is a lecturer within the Intelligence, Agents, Multimedia Group at the University of Southampton. He holds an MSc and a PhD in artificial intelligence from the University of Aberdeen, Scotland, and is currently engaged in research on Semantic Web services, semantics for service discovery, and agent-based services. To date he has published over 90 papers and articles, and in 2001 he was the winner of the Semantic Web Challenge. He serves on several program committees for various agents, services, and Semantic Web conferences; chaired the 2003 AAAI Spring Symposium on Semantic Web Services and the 2005 AAAI Fall Symposium on Agents and the Semantic Web; and was one of the co-authors of the W3C OWL-S Service Description Language Recommendation Note. Jurij Poelchau was born in 1963 in Berlin. From 1982 to 1998, he studied physics, mathematics, and philosophy at the Technical University Berlin, where he worked as a scientific assistant. He has publications in the field of many-body quantum mechanics. Since 1998, Jurij has been a researcher and consultant for sustainability strategies at the Agenda-Agency Berlin (cofounder), TU Berlin, Regioconsult, and the fx-Institute for Sustainable Economics (cofounder). His main field of work is synthesizing social, cultural, ecological, economic, and technological innovation. Since 2007, he has been a consultant for the DeepaMehta company and is a cofounder of the amina-foundation. Niko Popitsch received his diploma degree in computer science from the Technical University in Vienna. He gained industry experience by working for several years as a software architect and developer, and joined Research Studios Austria in 2003, where he is currently involved in all facets of the METIS project. His research interests lie in the area of infrastructures for context-aware multimedia applications. Currently he is leading the development of the semantic wiki implementation "Ylvi," which is based on the METIS middleware.
Martin Povazay is the managing director, founder, and owner of P.Solutions Informationstechnologien GmbH. He studied business informatics and physics at the University of Vienna and the Technical University Vienna, and previously worked at the Austrian Parliament and in numerous management positions, for example at Priority Telecom GmbH as manager of the process management staff unit and at Update Software AG as product manager of e-media/Web-based products. Since 2000, he has been involved in numerous research and development projects such as FLoCiEE, PADD, and SemDAV, and has focused on semantic and business systems as well as the design of information systems that take the human factor into account. Axel Rauschmayer received his diploma in computer science from the University of Munich and wrote his diploma thesis in cooperation with the University of Texas at Austin. He was also one of the first three technical people behind Pangora, a shopping portal company that now powers the shopping pages of a wide range of European sites (among others, AOL, T-Online, Froogle, and Lycos). Currently, he is a PhD student at the University of Munich. His main research project is Hyena, an extensible integrated RDF editor whose RDF vocabulary support includes a semantic wiki, Java source code references, and declarative definitions for form-based editors based on the Fresnel display vocabulary. http://hypergraphs.de/ Brigitte Rauter studied European Ethnology at the University of Vienna and received her Mag. phil. in 1997. She has additional skills in Web design and project management. She has worked as an exhibition curator and project manager in several projects, for example, from 2002-2004 on the EU project "Scalex" at the Technical Museum Vienna, in 2005 on the EU project "PADD," and in 2006 on the FIT-IT project "SemDAV" and at P.Solutions. Since 2000, she has worked in the IT branch as a project coordinator, product marketer, and "translator," with a focus on bringing together the technical views and ideas and the users' needs. Jörg Richter was born in 1967 in Berlin and has developed software since 1980. He won the CHIP programming award in 1985. Between 1988 and 1995, he studied computer science (focusing on AI and software engineering) and linguistics at the Technical University Berlin. He was a fellow at the Research Centers for Network Technologies and Multimedia Applications under Prof. Rebensburg, a professional software developer in the areas of knowledge management, e-learning, and learning management, and a lead developer of artfacts.net, the world's largest portal for modern and contemporary art. Since 2005, he has been the CTO of the DeepaMehta company. He won Best-Practice awards for DeepaMehta from the D21, we make IT Berlin, and Brandenburg initiatives. Sebastian Schaffert has been working as a senior researcher and project manager in the group for Knowledge-based Information Systems (KIS) at Salzburg Research since August 2005. Since 2006, he has also been the scientific director of Salzburg NewMediaLab (SNML), the Austrian industry competence centre on New Media hosted by Salzburg Research. SNML carries out industrial research in the areas of social software, semantic systems, and multimedia content management. From 2001-2005, Sebastian was a research and teaching assistant at the Institute for Informatics, University of Munich, Germany. He received his PhD from the University of Munich in 2004 with a dissertation entitled "Xcerpt: A Rule-Based Query and Transformation Language for the Web." Sebastian Schaffert is engaged in research on the Semantic Web, social software, e-learning, knowledge representation and knowledge-based systems, reasoning, and programming languages. His particular interest currently lies in the combination of social software with Semantic Web technologies. He is the primary developer of the semantic wiki system IkeWiki. In the above-mentioned areas, he has numerous publications and is a member of many programme committees of international journals, conferences, and workshops. Sebastian Schaffert was, among other things, also the initiator and co-programme chair of the 1st Workshop "From Wiki to Semantics," which took place in Budva, Montenegro, in 2006, and of the Semantics conference series (2006 in Vienna, 2007 in Graz), which is concerned with the "Social Semantic Web" and industry-relevant research on semantic systems. Bernhard Schandl received his diploma in economics and computer science from the Technical University of Vienna and the University of Vienna. He worked as a software engineer and developer in various companies, including Siemens and Kapsch. Since 2004, he has been employed as a researcher at the University of Vienna. He worked on the METIS/Ylvi platform and in several research projects, including BRICKS and SemDAV. His research interests include semantic multimedia systems, personal information management, and information systems usability issues. Andreas Schmidt is department manager for the competence area Knowledge and Learning within Information Process Engineering at the FZI Research Center for Information Technologies in Karlsruhe, Germany. He received his diploma in computer science from the University of Karlsruhe and has worked in several national and European research projects. Within the project Learning in Process, where he led the scientific activities, he developed a competency-oriented methodology for supporting work-integrated learning on demand. His research interests include workplace learning support, competence management, context-aware services, and ontology-based techniques. He is an assistant lecturer at the University of Karlsruhe and scientific coordinator of the EU Integrating Project MATURE. Sergej Sizov is a research fellow in the ISWeb group (Information Systems & Semantic Web) at the University of Koblenz-Landau, Germany. He holds PhD degrees in applied mathematics and computer science. In the past, he held positions as a researcher, project leader, and lecturer at Saarland University, Germany, and the Max Planck Institute for Computer Science, Germany. In his prior work, he contributed substantially to the methodology of thematically focused Web exploration, collaborative IR methods in decentralized environments, and meta methods for Web-based machine learning applications. His research interests include thematically focused Web search, self-organizing folksonomies, and peer-to-peer search and retrieval. Jing Sun is a senior lecturer at the Department of Computer Science, The University of Auckland, New Zealand. He obtained his PhD degree from the Department of Computer Science, National University of Singapore, in March 2004. Dr. Sun's research interests include software engineering, formal methods, and the Semantic Web. Antti Vehviläinen was a research assistant in the Semantic Computing Research Group (SeCo) at the Laboratory of Media Technology at the Helsinki University of Technology. He is currently working at Accenture Ltd in Finland.
Max Völkel is working as a PhD student and research assistant at the Forschungszentrum Informatik (FZI) at the Universität Karlsruhe (TH). His topics are personal knowledge management, Semantic Web infrastructure, and semantic wikis. He has organized several workshops on semantic wikis (SemWiki 2006 at the ESWC, see http://semwiki.org). In the Knowledge Web Network of Excellence, he has worked on versioning RDF data and programmatic access to ontologies (RDFReactor, http://rdfreactor.semweb4j.org). He works in the EU project NEPOMUK to build a next-generation knowledge articulation tool. He is the author of a number of RDF-based tools such as RDF2Go (http://rdf2go.semweb4j.org), a triple store abstraction layer. Currently he works on a Semantic Web content repository (http://swecr.semweb4j.org), unifying RDF and Web 2.0 content management. He is also one of the founders of the Semantic MediaWiki project (http://ontoworld.org). His recent work includes the management of semantic content and the gradual formalization of text into structures and finally into semantic content. Hai H. Wang obtained Bachelor (1st class honors) and PhD degrees from the School of Computing, National University of Singapore (NUS), in 2001 and 2004. He worked as a research assistant in the School of Computing at NUS from 2001-2003 and as a research associate in the School of Computer Science at The University of Manchester from 2003-2006. Since 2006, he has been in the School of Electronics and Computer Science at The University of Southampton, where he works as a research fellow. His main interests include software engineering, formal methods, ontology, and the Semantic Web.
Index
A
added-value analysis (AVA) 182, 188, 189, 190, 191, 198
added-value service (AVS) 187
architecture of participation 8
artificial intelligence (AI) 10, 34, 35, 42, 112, 137, 154, 174, 175, 176, 203, 232, 313, 322
asynchronous JavaScript and XML (AJAX) 3, 5, 7, 8, 12, 38, 77, 79, 118, 119, 121, 124, 130, 304, 307, 311, 320
Authoring Problem 183
C
communicating sequential processes (CSP) 263–280, 332
communication channels 17, 18, 21, 23, 26, 265
communication systems 17, 18, 19, 25, 31, 326
competency gap 204, 206
ConneXions 181, 182, 183, 191, 194, 195, 196, 197, 198, 199, 201, 314, 317, 322, 333
context-aware learning object 214, 325
context manager 208
CPoint 181, 182, 191, 192, 193, 194, 200, 324
Cyc AI project 154, 226
D
data integration 10, 11, 108, 248, 249, 257, 260, 325
data modeling 281, 282, 283, 309, 310
DeepaMehta 154, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 178, 249, 331
double relativity 181, 182, 186, 187, 198
E
e-portfolios 35, 37
editing meta-model (EMM) 281, 282, 283, 284, 285, 286, 287, 289, 290, 291, 292, 295, 299, 303–310
editing profiles 283, 304
electronic mail (e-mail) 7, 8, 12, 15–33, 101, 140, 142, 144, 146, 155, 156, 157, 158, 160, 161, 166, 171, 172, 207, 212, 220, 245, 248, 251, 253, 257, 314, 315, 317, 318, 325, 326, 327, 335
F
folksonomies 3, 12, 115, 117, 121, 123, 127–132, 321
Fresnel display vocabulary 281
Fresnel language 14, 281, 282, 283, 285, 291, 292, 293, 294, 295, 299, 303, 309, 310, 311, 312, 315, 330
Fresnel lens 198, 282, 283, 285, 286, 291–299, 303–310
G
gleaning resource descriptions from dialects of languages (GRDDL) 12, 14, 56, 67, 122, 125, 130, 137, 322, 335
H
help desk service 100, 101, 107, 110, 112, 323
Hyena editor 307, 312
hyperlinks 35, 36, 37, 56, 60, 69, 70, 72, 73, 143, 144, 218
I
information extraction 14, 103, 238, 239, 240–243, 317, 320–333
information integration 2, 11, 144, 152, 248, 261, 331
instant messaging (IM) 3, 19, 25
Internet relay chat (IRC) 17, 19, 21–30, 168, 319, 321
Internet telephony 18, 21, 23
J
journalism, citizen 71, 72, 81
journalism, participatory 71
K
knowledge capture 245, 246
knowledge management (KM) 18, 20, 32–50, 65–96, 121, 138, 139, 140–160, 200–215, 243, 248, 249, 260, 313, 316, 321–335
knowledge maturity 213
knowledge workers 17, 22, 23, 24, 141, 145, 151, 157, 158, 259, 320, 323
L
learning, context-steered 202–213
learning objects (LOs) 205–213
learning on demand 202–212
M
mailing lists 19, 26, 27. See news groups
mathematical knowledge management (MKM) 50, 51, 65, 67, 68, 200, 322, 324
metadata 1, 2, 11, 25–98, 113–137, 152, 195, 197, 208, 212, 246, 248, 249, 250–260, 303, 321, 323, 328, 335
N
named entity recognition 237, 240, 243, 314, 323, 324, 333
natural language processing (NLP) 149, 218–225, 231–335
news groups 19. See mailing lists
O
Object-Z specification language 263, 264, 265, 266, 267, 272, 274
oil industry 92
ontologies 2, 14, 36, 49, 50–57, 61, 73–77, 81–98, 100–137, 141, 148, 155, 162, 170, 212, 213, 230–268, 274, 277, 279, 309, 315, 320, 322, 323, 327, 328, 330
ontology, CoolOntNews 73, 77
ontology, drilling 89, 92, 94, 95
ontology, Dublin Core 55, 57, 62, 73, 75, 77, 78, 94, 112, 252, 312, 323
ontology, system 47, 49, 50–64
open mathematical documents (OMDoc) 47–68, 121, 192, 193, 194, 200, 324, 327, 328, 329
operations, integrated 97, 98, 320
P
personal knowledge management (PKM) 138, 140–149
personal learning environment (PLE) 212
personal semantic wikis (PSW) 147–149
Prisoner's Dilemma 181–198
Prisoner's Dilemma, Semantic 181–198
R
RDFa 13, 116, 119, 122, 124, 125, 126, 130, 137, 282, 302, 303, 311, 312, 313, 335
relationships extraction 232
resource description framework (RDF) 2–14, 36, 38, 41, 50–66, 72–77, 93–99, 113–130, 137, 147, 148, 160–169, 170, 178, 248, 249, 252, 257–261, 268, 277, 278, 279, 281–287, 291–298, 300–316, 323–334
S
semantically-enhanced personal knowledge management (SPKM) 140, 145, 146, 149, 150
semantic annotation 93–110, 147, 177, 218, 223, 242, 253, 256, 258, 329
semantic desktop 14–30, 83, 146–178, 218, 249, 261, 312, 318–335
semantic glue 246, 253, 254
semantic links (semalinks) 73, 76
semantics, collaborative 247
semantic search 10, 48, 66, 95, 97, 100–126, 254, 327
semantic technologies 4, 85, 149, 169, 213, 247
Semantic Web 1–14, 30–116, 121–148, 151, 155, 160, 162, 174, 177, 178, 18–199, 213–219, 232–281, 308, 311–335
semantic wiki for mathematical knowledge management (SWiM) 47–58, 61–66, 119, 325
semantic work environments (SWEs) 17–29, 182, 183, 185, 198, 217–219, 236
social semantic work environments (SSWEs) 4, 5, 8
social software 2, 3, 12, 35, 40, 50, 51, 71, 136, 315
social software, semantic 12
T
tagging, social 40, 116, 117, 118, 119, 121, 123, 128, 129
timed communicating Object-Z (TCOZ) 263–280, 318, 326, 333
topic maps 164, 165, 166, 169, 178, 330
V
video conferencing 21
W
Web 2.0 1–15, 121, 138, 183, 201, 213, 316, 329
Web logs (blogs) 15, 17, 18, 19, 20, 21, 26, 31, 34, 38, 39, 44, 307, 308, 311, 322, 323, 324
Web logs, semantic 37, 39
Web ontology language (OWL) 2, 36, 38, 42, 50–81, 86, 92–98, 113–248, 260–291, 305–334
Web ontology language with description logics (OWL-DL) 52–67, 263, 268, 277, 332
what you see is what you get (WYSIWYG) 6, 50–63, 116–129
wikis 5, 21–77, 115–128, 137, 152, 217, 235, 249, 250–261, 330, 334, 335
wikis, semantic 26, 27, 30, 37, 47, 50–54, 67, 72, 116, 117, 122, 132–136, 140–148, 220–256, 318, 320, 334
word sense disambiguation 236, 239, 240, 313, 321, 323
work processes, personal 17, 22, 23, 142