Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
6115
Frank Eliassen Rüdiger Kapitza (Eds.)
Distributed Applications and Interoperable Systems 10th IFIP WG 6.1 International Conference, DAIS 2010 Amsterdam, The Netherlands, June 7-9, 2010 Proceedings
Volume Editors Frank Eliassen University of Oslo, Department of Informatics P.O. Box 1080 Blindern, 0316 Oslo, Norway E-mail:
[email protected] Rüdiger Kapitza Friedrich-Alexander University Erlangen-Nuremberg, Computer Science 4 Martensstraße 1, 91058 Erlangen, Germany E-mail:
[email protected]
Library of Congress Control Number: 2010927935
CR Subject Classification (1998): C.2, D.2, H.4, H.5, H.3, C.4
LNCS Sublibrary: SL 5 – Computer Communication Networks and Telecommunications
ISSN 0302-9743
ISBN-10 3-642-13644-3 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-13644-3 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © IFIP International Federation for Information Processing 2010 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper 06/3180
Foreword
In 2010 the International Federated Conference on Distributed Computing Techniques (DisCoTec) took place in Amsterdam, during June 7-9. It was hosted and organized by the Centrum voor Wiskunde & Informatica. DisCoTec conferences jointly cover the complete spectrum of distributed computing subjects ranging from theoretical foundations to formal specification techniques to practical considerations. The 12th International Conference on Coordination Models and Languages (Coordination) focused on the design and implementation of models that allow compositional construction of large-scale concurrent and distributed systems, including both practical and foundational models, run-time systems, and related verification and analysis techniques. The 10th IFIP International Conference on Distributed Applications and Interoperable Systems in particular elicited contributions on architectures, models, technologies and platforms for large-scale and complex distributed applications and services that are related to the latest trends towards bridging the physical/virtual worlds based on flexible and versatile service architectures and platforms. The 12th Formal Methods for Open Object-Based Distributed Systems and 30th Formal Techniques for Networked and Distributed Systems together emphasized distributed computing models and formal specification, testing and verification methods. Each of the three days of the federated event began with a plenary speaker nominated by one of the conferences. On the first day Joe Armstrong (Ericsson Telecom AB) gave a keynote speech on Erlang-style concurrency, and on the second day Gerard Holzmann (Jet Propulsion Laboratory, USA) discussed the question “Formal Software Verification: How Close Are We?”. On the third and last day Joost Roelands (Director of Development Netlog) presented the problem area of distributed social data. 
In addition, there was a joint technical session consisting of one paper from each of the conferences and an industrial session with presentations by A. Stam (Almende B.V., Information Communication Technologies) and M. Verhoef (CHESS, Computer Hardware & System Software), followed by a panel discussion. There were four satellite events: the Third DisCoTec Workshop on Context-aware Adaptation Mechanisms for Pervasive and Ubiquitous Services (CAMPUS), the First International Workshop on Interactions Between Computer Science and Biology (CS2BIO) with keynote lectures by Luca Cardelli (Microsoft Research - Cambridge, UK) and Jérôme Feret (INRIA and École Normale Supérieure - Paris, France), the First Workshop on Decentralized Coordination of Distributed Processes (DCDP) with a keynote lecture by Tyler Close (Google), and the Third Interaction and Concurrency Experience Workshop with keynote lectures by T. Henzinger (IST, Austria) and J.-P. Katoen (RWTH Aachen University, Germany).
I hope this rich program offered every participant interesting and stimulating events. It was only possible thanks to the dedicated work of the Publicity Chair Gianluigi Zavattaro (University of Bologna, Italy), the Workshop Chair Marcello Bonsangue (University of Leiden, The Netherlands), and the members of the Organizing Committee: Susanne van Dam, Immo Grabe, Stephanie Kemper, and Alexandra Silva. To conclude, I want to thank the International Federation for Information Processing (IFIP), the Centrum voor Wiskunde & Informatica, and the Netherlands Organization for Scientific Research (NWO) for their sponsorship.

June 2010
Frank S. de Boer
Preface
This volume contains the proceedings of DAIS 2010, the IFIP International Working Conference on Distributed Applications and Interoperable Systems. The conference was held in Amsterdam, The Netherlands, during June 7-9, 2010, as part of the DisCoTec (Distributed Computing Techniques) federated conference, together with the International Conference on Formal Techniques for Distributed Systems (FMOODS & FORTE) and the International Conference on Coordination Models and Languages (COORDINATION). The DAIS 2010 conference was sponsored by IFIP (International Federation for Information Processing) in cooperation with ACM SIGSOFT and ACM SIGAPP, and it was the tenth conference in the DAIS series of events organized by IFIP Working Group 6.1. The conference program presented the state of the art in research on distributed and interoperable systems. Distributed application technology has become a foundation of the information society. New computing and communication technologies have brought up a multitude of challenging application areas, including mobile computing, inter-enterprise collaborations, ubiquitous services, service-oriented architectures, autonomous and self-adapting systems, and peer-to-peer systems, just to name a few. New challenges include the need for novel abstractions supporting the development, deployment, management, and interoperability of evolutionary and complex applications and services, such as those bridging the physical/virtual worlds. Therefore, the linkage between applications, platforms, and users through multidisciplinary user requirements (such as security, privacy, usability, efficiency, safety, semantic and pragmatic interoperability of data and services, dependability, trust, and self-adaptivity) becomes of special interest. It is envisaged that future complex applications will far exceed those of today in terms of these requirements. The main part of the conference program comprised presentations of the accepted papers.
This year, the technical program of DAIS drew from 53 submitted papers. All papers were reviewed by at least three reviewers. After the initial reviews were posted, a set of candidate papers was selected and subjected to discussion among the reviewers and Program Committee Chairs to resolve differing viewpoints. As a result of this process, 17 full papers were selected for inclusion in the proceedings. The papers presented at DAIS 2010 address integration and QoS provisioning of ubiquitous services and applications, grid computing including reconfiguration languages and volunteer computing, sensor network distributed programming and middleware, context-awareness, composition and discovery of ubiquitous services, fault tolerance and fault-tolerant controllers, cloud and cluster computing, and adaptive and (re)configurable systems. Finally, we would like to take this opportunity to thank the numerous people whose work made this conference possible. We wish to express our deepest
gratitude to the authors of submitted papers, to all Program Committee members for their active participation in the paper review process, to all external reviewers for their help in evaluating submissions, to the Centrum Wiskunde & Informatica (CWI) for hosting the event, to the Publicity Chairs, to Romain Rouvoy who took care of the practical work with the proceedings, to the DAIS Steering Committee for their advice, and to Frank S. de Boer for acting as General Chair of the joint event.

June 2010
Frank Eliassen
Rüdiger Kapitza
DAIS 2010 Organization
Executive Committee

Program Chairs: Frank Eliassen (University of Oslo, Norway), Rüdiger Kapitza (University of Erlangen, Germany)
Publicity Chairs: Hans P. Reiser (University of Lisbon, Portugal), Johan Fabry (Universidad de Chile, Chile), Charles Zhang (Hong Kong University of Science and Technology, China), Gianluigi Zavattaro (University of Bologna, Italy)
Proceedings Chair: Romain Rouvoy (University of Lille 1, France)
Steering Committee

Kurt Geihs, University of Kassel, Germany
Jadwiga Indulska, University of Queensland, Australia
Lea Kutvonen, University of Helsinki, Finland
Elie Najm, ENST, France
Rui Oliveira, Universidade do Minho, Portugal
René Meier, Trinity College Dublin, Ireland
Twittie Senivongse, Chulalongkorn University, Thailand
Sotirios Terzis, University of Strathclyde, UK
Sponsoring Institutions IFIP WG 6.1 ACM SIGSOFT ACM SIGAPP
Program Committee

M. Aoyama, Nanzan University, Japan
J. E. Armendáriz-Íñigo, Universidad Pública de Navarra, Spain
D. Bakken, Washington State University, USA
Y. Berbers, Katholieke Universiteit Leuven, Belgium
A. Beresford, University of Cambridge, UK
A. Beugnard, TELECOM Bretagne, France
G. Blair, Lancaster University, UK
A. Casimiro, University of Lisbon, Portugal
E. Cecchet, University of Massachusetts, USA
I. Demeure, ENST, France
S. Dobson, University of St. Andrews, UK
J. Dowling, SICS, Sweden
D. Donsez, Université Joseph Fourier - Grenoble 1, France
N. Dulay, Imperial College London, UK
F. Eliassen, University of Oslo, Norway
S. Elnikety, Microsoft Research, USA
P. Felber, Université de Neuchâtel, Switzerland
K. Geihs, University of Kassel, Germany
N. Georgantas, INRIA, France
K. Göschka, Vienna University of Technology, Austria
R. Grønmo, SINTEF, Norway
D. Hagimont, INP Toulouse, France
S. Hallsteinsen, SINTEF, Norway
P. Herrmann, NTNU Trondheim, Norway
J. Indulska, University of Queensland, Australia
R. Kapitza, University of Erlangen-Nürnberg, Germany
H. König, BTU Cottbus, Germany
R. Kroeger, Wiesbaden University of Applied Sciences, Germany
L. Kutvonen, University of Helsinki, Finland
W. Lamersdorf, University of Hamburg, Germany
M. Lawley, Queensland University of Technology, Australia
P. Linington, University of Kent, UK
C. Linnhoff-Popien, Munich University, Germany
K. Lund, Norwegian Defence Research Establishment (FFI), Norway
R. Macêdo, Federal University of Bahia, Brazil
R. Meier, Trinity College Dublin, Ireland
A. Montresor, University of Trento, Italy
E. Najm, ENST, France
N. Narasimhan, Motorola Labs, USA
R. Oliveira, Universidade do Minho, Portugal
G. Pierre, Vrije Universiteit Amsterdam, The Netherlands
P. Pietzuch, Imperial College London, UK
A. Puder, State University San Francisco, USA
R. Rouvoy, University of Lille 1, France
D. Schmidt, Vanderbilt University, USA
T. Senivongse, Chulalongkorn University, Thailand
K. Sere, Åbo Akademi University, Finland
S. Terzis, University of Strathclyde, UK
H. Yokota, Tokyo Institute of Technology, Japan
Distributed Computing Techniques 2010 Organizing Committee

Frank S. de Boer, Centrum Wiskunde & Informatica (CWI), The Netherlands (General Chair)
Susanne van Dam, Centrum Wiskunde & Informatica (CWI), The Netherlands
Immo Grabe, Centrum Wiskunde & Informatica (CWI), The Netherlands
Stephanie Kemper, Centrum Wiskunde & Informatica (CWI), The Netherlands
Alexandra Silva, Centrum Wiskunde & Informatica (CWI), The Netherlands
Gianluigi Zavattaro, University of Bologna, Italy (Publicity Chair)
Marcello M. Bonsangue, University of Leiden, The Netherlands (Workshops Chair)
Additional Referees

L. Broto, S. Chollet, A. Diaconescu, I. Dionysiou, M. Dürr, H. Gjermundrød, K. Hamann, T. Hönig, K. Jander, F.A. Kraemer, L. Laibinis, R. Lasowski, J. Pereira, L. Petre, W. Rudametkin, P. Sandvik, S. Schober, M. Segarra, L. Seinturier, G. Simon, A. Vilenica, Y. Watanabe, K. Weckemann, T. Weise, M. Werner
Table of Contents

Ubiquitous Services and Applications
RESTful Integration of Heterogeneous Devices in Pervasive Environments ... 1
  Daniel Romero, Gabriel Hermosillo, Amirhosein Taherkordi, Russel Nzekwa, Romain Rouvoy, and Frank Eliassen
Hosting and Using Services with QoS Guarantee in Self-adaptive Service Systems ... 15
  Shanshan Jiang, Svein Hallsteinsen, Paolo Barone, Alessandro Mamelli, Stephan Mehlhase, and Ulrich Scholz

Grid Computing
Validating Evolutionary Algorithms on Volunteer Computing Grids ... 29
  Travis Desell, Malik Magdon-Ismail, Boleslaw Szymanski, Carlos A. Varela, Heidi Newberg, and David P. Anderson
A Reconfiguration Language for Virtualized Grid Infrastructures ... 42
  Rémy Pottier, Marc Léger, and Jean-Marc Menaud

Sensor Networks
Distributed Object-Oriented Programming with RFID Technology ... 56
  Andoni Lombide Carreton, Kevin Pinte, and Wolfgang De Meuter
WISeMid: Middleware for Integrating Wireless Sensor Networks and the Internet ... 70
  Jeisa P.O. Domingues, Antonio V.L. Damaso, and Nelson S. Rosa

Context Awareness
Structured Context Prediction: A Generic Approach ... 84
  Matthias Meiners, Sonja Zaplata, and Winfried Lamersdorf

Service Orientation
Experiments in Model Driven Composition of User Interfaces ... 98
  Audrey Occello, Cedric Joffroy, and Anne-Marie Dery-Pinna
Service Discovery in Ubiquitous Feedback Control Loops ... 112
  Daniel Romero, Romain Rouvoy, Lionel Seinturier, and Pierre Carton

Distributed Fault Tolerant Controllers
QoS Self-configuring Failure Detectors for Distributed Systems ... 126
  Alirio Santos de Sá and Raimundo José de Araújo Macêdo
Distributed Fault Tolerant Controllers ... 141
  Leonardo Mostarda, Rudi Ball, and Naranker Dulay

Cloud and Cluster Computing
Automatic Software Deployment in the Azure Cloud ... 155
  Jacek Cala and Paul Watson
G2CL: A Generic Group Communication Layer for Clustered Applications ... 169
  Leandro Sales, Henrique Teófilo, and Nabor C. Mendonça

Adaptive and (Re)configurable Systems
Dynamic Composition of Cross-Organizational Features in Distributed Software Systems ... 183
  Stefan Walraven, Bert Lagaisse, Eddy Truyen, and Wouter Joosen
Co-ordinated Utility-Based Adaptation of Multiple Applications on Resource-Constrained Mobile Devices ... 198
  Ulrich Scholz and Stephan Mehlhase

Collaborative Systems
gradienTv: Market-Based P2P Live Media Streaming on the Gradient Overlay ... 212
  Amir H. Payberah, Jim Dowling, Fatemeh Rahimian, and Seif Haridi
Collaborative Ranking and Profiling: Exploiting the Wisdom of Crowds in Tailored Web Search ... 226
  Pascal Felber, Peter Kropf, Lorenzo Leonini, Toan Luu, Martin Rajman, and Etienne Rivière

Author Index ... 243
RESTful Integration of Heterogeneous Devices in Pervasive Environments

Daniel Romero(1), Gabriel Hermosillo(1), Amirhosein Taherkordi(2), Russel Nzekwa(1), Romain Rouvoy(1), and Frank Eliassen(2)

(1) INRIA Lille-Nord Europe, ADAM Project-team, University of Lille 1, LIFL CNRS UMR 8022, 59650 Villeneuve d'Ascq, France
[email protected]
(2) Department of Informatics, University of Oslo, 0316 Oslo, Norway
[email protected], [email protected]
Abstract. More and more home devices are equipped with advanced computational capabilities to improve user satisfaction (e.g., programmable heating systems, Internet TV). Although these devices exhibit communication capabilities, their integration into a larger home monitoring system remains a challenging task, partly due to the strong heterogeneity of technologies and protocols. In this paper, we therefore propose to reconsider the architecture of home monitoring systems by focusing on the data and events that are produced and triggered by home devices. In particular, our middleware platform, named DigiHome, applies i) the REST (REpresentational State Transfer) architectural style to leverage the integration of multi-scale systems-of-systems (from Wireless Sensor Networks to the Internet) and ii) a CEP (Complex Event Processing) engine to collect information from heterogeneous sources and detect application-specific situations. The benefits of the DigiHome platform are demonstrated on smart home scenarios covering home automation, emergency detection, and energy saving.
1 Introduction

Pervasive environments support context-aware applications that adapt their behavior by reasoning dynamically over the user and the surrounding information. This contextual information generally comes from diverse and heterogeneous sources, such as physical devices, Wireless Sensor Networks (WSNs), and smartphones. In order to exploit the information provided by these sources, an integration middleware is required to collect, process, and distribute the contextual information efficiently. However, the heterogeneity of systems in terms of technology capabilities and communication protocols, the mobility of the different interacting entities, and the identification of adaptation situations make this integration difficult. Thus, we need to provide a flexible solution in terms of communication and context processing to leverage context-aware applications on the integration of heterogeneous context providers.

F. Eliassen and R. Kapitza (Eds.): DAIS 2010, LNCS 6115, pp. 1-14, 2010.
© IFIP International Federation for Information Processing 2010
In particular, a solution dealing with context information and control environments must be able to connect with a wide range of device types. However, the resource scarceness of WSNs and mobile devices makes the development of such a solution very challenging. In this paper, we propose the DigiHome platform, a simple but efficient middleware solution to facilitate context-awareness in pervasive environments. Specifically, DigiHome provides support for the integration, processing, and adaptation of context-aware applications. Our solution enables the integration of heterogeneous computational entities by relying on the REST (REpresentational State Transfer) principles [8], standard discovery and communication protocols, and resource representation formats. We believe that the REST concepts of simplicity (in terms of interaction protocols) and flexibility (regarding the supported representation formats) make it a suitable architectural style for integration in pervasive environments. Furthermore, while our solution also benefits from WSNs to operate simple event reasoning on the sensor nodes, we rely on Complex Event Processing [19] to analyze in real time the relationships between the different collected events and trigger rule-based adaptations. The rest of this paper is organized as follows. We start by describing a smart home scenario in which we identify the key challenges in pervasive environments that motivate this work (cf. section 2). We continue with the description of DigiHome, our middleware platform supporting the integration of systems-of-systems in pervasive environments (cf. section 3). Then, we discuss the benefits of our approach (cf. section 4) and the related work (cf. section 5). Finally, we conclude by presenting some promising perspectives for this work (cf. section 6).
2 Background and Motivations

This section introduces the application and architectural foundations of our work in sections 2.1 and 2.3, respectively.

2.1 Motivating Scenario

A smart home generally refers to a house environment equipped with several types of computing entities, such as sensors, which collect physical information (temperature, movement detection, noise level, light, etc.), and actuators, which change the state of the environment. In this scenario, we consider a smart home equipped with occupancy, smoke detection, and temperature sensors. These tiny devices have the ability to collect context information and communicate wirelessly with each other in order to identify the context situation of the environment. In addition, we can also find actuators to physically control lights, TV, and air conditioning. Figure 1 illustrates the integration of these sensors and actuators in our scenario. As shown in this figure, the different entities use heterogeneous protocols to interact. In the scenario, the smart phones provide information about user preferences for home configuration. Conflicts between user preferences are resolved by giving priority to the person who arrived first in the room. The mobile devices also host an application that enables the control of the actuators present in the different rooms. This application can be adapted when
[Figure 1 depicts the smart home devices and their interaction protocols: a smart phone (user preferences; SMS, HTTP), a sprinkler (activation; SOAP), a bulb (lighting level; X10, ACN), an air conditioner (adjust temperature; UPnP), an STB performing event processing (TCP/IP), a TV (on/off, channel, volume; ZigBee), a video camera (room image; ZigBee), and smoke, temperature, and occupancy sensors (ZigBee).]

Fig. 1. Interactions between the smart home devices
there are changes in the actuators' configuration. Finally, there is a Set-Top Box (STB), which is able to gather information and interact with the other co-located devices.

Situation 1: Alice arrives in the living room. The occupancy sensor detects her presence and triggers the temperature sensors to decrease their sampling rate. It also notifies the STB that the room is occupied, which in turn tries to identify the occupant by looking for a profile on her mobile device. When Alice's profile is found, the STB loads it and adjusts the temperature and lighting level of the room according to her preferences.

Situation 2: The sensors detect smoke and notify the STB, which, using the occupancy sensor, detects that the house is empty. The STB therefore notifies Alice via SMS, including a picture of the room captured by the surveillance camera. After checking the picture, Alice decides to remotely trigger the sprinklers using her mobile device. She also tells the system to alert the fire department about the problem. If Alice does not reply to the STB within 5 minutes, the system automatically activates the sprinklers and alerts the fire department.

Situation 3: Alice installs a new TV in the bedroom. The STB detects the presence of the new device, identifies it, and downloads the corresponding control software from an Internet repository. The platform tries to locate the available mobile devices and finds Alice's. The STB proposes to update her mobile device with the components for controlling the new TV.
2.2 Key Challenges

The situations described above allow us to identify several key challenges:

1. Integration of multi-scale entities: The mobile devices and sensors have different hardware and software capabilities, which make some devices more powerful than others. This heterogeneity requires a flexible and simple solution that supports multiple interaction mechanisms and considers the restricted capabilities of some devices. Regarding sensor nodes in particular, the immaturity of high-level communication protocols, as well as the inherent resource scarceness, brings two critical challenges to our work: 1) how to connect sensor nodes to mobile devices and actuators through a standard high-level communication protocol, and 2) how to ensure that the framework running on sensor nodes to support context-awareness and adaptation does not impose high resource demands.

2. Entity mobility: In our scenario, computational entities appear and disappear constantly. In particular, mobile devices providing user profiles are not always accessible (they can be turned off, or the owner can leave the house with them). Similarly, the actuators can be replaced or new ones can be added. Thus, we need to discover new entities dynamically and to support device disconnections.

3. Information processing and adaptation: In order to support adaptation, we first need to identify the situations in which adaptation is required. A lot of information is generated by the different devices in the environment, and we need to define which part of it is useful for identifying relevant situations and reacting accordingly. In our scenario, those situations include loading Alice's profile and adjusting the temperature, sending an alert via SMS in case of an emergency, and adapting Alice's mobile device to control the new TV in her bedroom.
2.3 REST: REpresentational State Transfer

REpresentational State Transfer (REST) is a resource-oriented software architectural style identified by R. Fielding for building Internet-scale distributed applications [8]. Typically, the REST triangle defines the principles for encoding (content types), addressing (nouns), and accessing (verbs) a collection of resources using Internet standards. Resources, which are central to REST, are uniquely addressable using a universal syntax (e.g., a URL in HTTP) and share a uniform interface for the transfer of application states between client and server (e.g., GET/POST/PUT/DELETE in HTTP). REST resources may typically exhibit multiple typed representations using, for example, XML, JSON, YAML, or plain-text documents. Thus, RESTful systems are loosely coupled systems that follow these principles to exchange application states as resource representations. Such stateless interactions reduce resource consumption and improve the scalability of the system. According to R. Fielding [8], "REST's client-server separation of concerns simplifies component implementation, reduces the complexity of connector semantics, improves the effectiveness of performance tuning, and increases the scalability of pure server components. Layered system constraints allow intermediaries—proxies, gateways, and
firewalls—to be introduced at various points in the communication without changing the interfaces between components, thus allowing them to assist in communication translation or improve performance via large-scale, shared caching. REST enables intermediate processing by constraining messages to be self-descriptive: interaction is stateless between requests, standard methods and media types are used to indicate semantics and exchange information, and responses explicitly indicate cacheability.” Synthesis. REST identifies an efficient architectural style for disseminating resources, which can be encoded under various representations. Therefore, we believe that REST provides a suitable framework for mediating and processing context information in an efficient and scalable manner.
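The REST triangle summarized above (uniformly addressed resources, a small fixed set of verbs, and multiple typed representations) can be illustrated with a few lines of Python. This is a minimal in-memory sketch, not part of the DigiHome implementation; all class names, URIs, and state keys are illustrative assumptions.

```python
import json

class RestResource:
    """A uniquely addressable resource exposing typed representations."""
    def __init__(self, state):
        self.state = state

    def get(self, content_type="application/json"):
        # Stateless read: the response carries a full representation.
        if content_type == "application/json":
            return json.dumps(self.state)
        if content_type == "text/plain":
            return "; ".join(f"{k}={v}" for k, v in self.state.items())
        raise ValueError(f"unsupported representation: {content_type}")

    def put(self, new_state):
        # Full state transfer: PUT replaces the resource state.
        self.state = dict(new_state)

class RestServer:
    """Maps URIs (nouns) to resources; verbs form the uniform interface."""
    def __init__(self):
        self.resources = {}

    def handle(self, verb, uri, body=None, content_type="application/json"):
        if verb == "PUT":
            self.resources.setdefault(uri, RestResource({})).put(body)
            return "201 Created"
        if verb == "GET":
            if uri not in self.resources:
                return "404 Not Found"
            return self.resources[uri].get(content_type)
        if verb == "DELETE":
            self.resources.pop(uri, None)
            return "204 No Content"
        return "405 Method Not Allowed"

server = RestServer()
server.handle("PUT", "/home/livingroom/temperature", {"celsius": 21.5})
print(server.handle("GET", "/home/livingroom/temperature"))
print(server.handle("GET", "/home/livingroom/temperature",
                    content_type="text/plain"))
```

Note how the client and server share nothing but the URI, the verb, and the representation: adding a new content type or resource requires no change to the interaction protocol, which is what makes the style attractive for heterogeneous pervasive environments.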
3 The DigiHome Platform

In order to address the challenges introduced in section 2.2, we propose the DigiHome middleware platform for pervasive environments. The main objective of DigiHome is to provide a comprehensive yet simple solution for context processing in this kind of environment. To do so, the platform offers services for the integration of context information and the detection of adaptation situations based on this information. DigiHome also benefits from WSN capabilities to process simple events and make local decisions when possible. With DigiHome, we can support variants of the platform for resource-constrained devices (sensor nodes or mobile devices in our scenario) that interact with more powerful variants running on more capable devices (e.g., the STB in the smart home scenario). Figure 2 depicts the general architecture of DigiHome. In this architecture, the Event Collector retrieves and stores the recent information produced by context collectors, such as mobile devices or sensors. The CEP Engine is responsible for event processing and uses the Decision Executor to perform actions specified by the Adaptation Rules. In DigiHome, the integration of the heterogeneous entities is achieved via the RESTful Communication middleware framework, which provides software connectors following the REST principles. In the rest of this section, we give more details about the integration of information via REST (cf. section 3.1), complex event processing (cf. section 3.2), and the distribution of DigiHome platforms in WSNs (cf. section 3.3). Finally, section 3.4 reports optimizations that are applied to the platform in order to control the reactivity and the stability of the system.
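The Event Collector/CEP Engine pipeline just described can be sketched as follows. This is a hypothetical toy engine, not the actual DigiHome CEP engine (which relies on a dedicated CEP product); the event dictionary schema and rule names are assumptions chosen to mirror Situation 2 of the scenario.

```python
from collections import deque

class SimpleCepEngine:
    """Toy event-correlation engine: collects recent events and fires
    adaptation rules whose predicates match the event window."""
    def __init__(self, window=100):
        self.recent = deque(maxlen=window)  # plays the Event Collector role
        self.rules = []                     # (predicate, action) pairs

    def add_rule(self, predicate, action):
        self.rules.append((predicate, action))

    def publish(self, event):
        # Store the event, then evaluate every rule over the window.
        self.recent.append(event)
        window = list(self.recent)
        return [action(window) for predicate, action in self.rules
                if predicate(window)]

def smoke_in_empty_house(events):
    # Complex event: smoke detected AND the occupancy sensor reports nobody.
    smoke = any(e["type"] == "smoke" for e in events)
    empty = any(e["type"] == "occupancy" and e["count"] == 0 for e in events)
    return smoke and empty

engine = SimpleCepEngine()
engine.add_rule(smoke_in_empty_house,
                lambda evts: "send SMS with camera snapshot")

engine.publish({"type": "occupancy", "count": 0})   # no rule fires yet
actions = engine.publish({"type": "smoke"})          # correlation detected
print(actions)  # -> ['send SMS with camera snapshot']
```

The point of the sketch is the separation of roles: simple events (single sensor readings) are cheap to produce anywhere, while the correlation that defines an adaptation situation lives in declarative rules evaluated on a more capable node such as the STB.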
3.1 RESTful Communication Middleware

The integration challenge identified in section 2.2 requires a flexible infrastructure enabling communication and discovery between all the participants (i.e., mobile devices, sensor nodes, actuators, and the set-top box). To address this issue, we classify the heterogeneity in terms of resources and of the different interaction mechanisms. The communication middleware that we define in DigiHome therefore follows the REST principles. The simplicity, light weight, reusability, extensibility, and flexibility that characterize REST make it a suitable option for context dissemination in pervasive environments.
[Figure 2 shows the DigiHome platform architecture: an Event Collector feeds a CEP Engine governed by Adaptation Rules and an optional Stabilization element; a Decision Execution component carries out the resulting actions. A RESTful Communication layer with protocol-specific connectors (HTTP, SOAP, X10, ACN, ZigBee) links the platform to actuators (sprinkler, bulb) and to information sources (user preferences, environmental information, current temperature).]

Fig. 2. Description of the DigiHome architecture
The DigiHome communication middleware defines ubiquitous connectors encapsulating the distribution concerns. Software connectors [5,30] isolate interactions between components—i.e., they support the transfer of control and data. Connectors can also provide non-functional services, such as persistency, messaging, and invocation, helping to keep component functionality focused on domain-specific concerns. In this way, connectors foster the separation of concerns [30]. In the context of DigiHome, software connectors do not impact the event processing and support multiple implementations of the communication mechanism. Figure 2 depicts several examples of connectors supporting protocols such as ZigBee [37], SOAP, and ACN [6]. For the purposes of our scenario, we chose HTTP as the default interaction protocol for the middleware platform and the mobile devices. The choice of HTTP is motivated by its simplicity and by the possibility of having mobile devices act not only as consumers but also as service providers [21,25,31]. Our RESTful communication middleware supports additional protocols, such as XMPP [27] and Twitter [20], which can be more suitable in other situations. The connectors also support spontaneous interoperability [16] to deal with the volatility of pervasive environments. As already mentioned in section 2, mobile devices, sensors, and actuators can continuously appear in or disappear from the landscape. Therefore, the DigiHome connectors deal with this volatility by means of standard discovery protocols. By exploiting the extension capabilities of these discovery protocols, we can, for example, enrich the context information advertisements with Quality of Context (QoC) [18] attributes for provider selection purposes. In particular, in our scenario we use UPnP [33] to discover mobile devices and actuators. We selected this protocol because many available consumer electronics devices already support UPnP.
Furthermore, although UPnP is an XML-based protocol, its application in WSNs does not impact the
RESTful Integration of Heterogeneous Devices in Pervasive Environments
7
energy consumption, because we do not need to process XML descriptions in sensor nodes (this is the responsibility of the CEP Engine in DigiHome), but only to provide them.
3.2 Complex Event Processing
Complex Event Processing (CEP) is a technology for detecting relationships, in real time, between series of simple and independent events from different sources, using predefined rules [35]. In our scenario, we consider many heterogeneous devices (sensors, mobile devices, etc.) that generate isolated events, which can be combined to obtain valuable information and to make decisions accordingly. Examples in the scenario include activating the sprinklers when a fire is detected, or updating the mobile device so that it can control the new TV. To manage those events, we need a decision-making engine that can process them and relate them in order to identify special situations, using predefined rules. To identify the desired events, the CEP Engine needs to communicate with an Event Collector, which is in charge of dealing with the subscriptions to the event sources. When an adaptation situation is detected, a corresponding action is triggered, which can range from an instruction to an actuator to an adaptation of the system by adding or removing functionality. These actions are received by the Decision Executor, which has the responsibility of communicating with the different actuators in the environment. In DigiHome, for the event processing in the set-top box, we use Esper [7], an open source Java stream event processing engine, to deal with the event management and decision-making process. We chose Esper for our platform because it is the most actively supported open source project for CEP and is very stable, efficient, and fairly easy to use. The following code excerpt shows an example of an Esper rule:

select sum(movement) from MovementSensorEvent.win:time(60 sec)

This rule is related to the scenario presented in section 2.1. It uses a time window, which is a moving interval of time: the rule aggregates all the events from the movement sensor received during the last 60 seconds.
3.3 Support for Wireless Sensor Networks
In DigiHome, there are two scopes for event processing: local event processing and global event processing. To improve the efficiency of the system, the sensor nodes in our configuration are organized as sensor networks in order to avoid delegating local decisions to the STB, which is responsible for global concerns. Specifically, the event generated by a sensor node may be of interest only to another node within the sensor network. Therefore, instead of going through the centralized server framework for making decisions, the WSN itself takes the responsibility of processing such events in a more efficient way. The architecture of DigiHome in sensor nodes supports both local event processing and
global event forwarding. The latter delegates the decision making on global events to the STB. In the former case, the layered framework on the sensor node has the ability to connect directly to other sensor nodes in the environment and to deliver an event to the nodes subscribed to that type of event. Furthermore, the framework provides a lightweight mechanism for event processing in order to keep resource usage at a very low level. The execution layer also benefits from our unified communication protocol to send configuration and adaptation instructions across the WSN. The event manager layer of our framework enables in-WSN decisions whenever an event should be processed together with relevant events generated by other sensor nodes. As an example, when a temperature sensor senses a high temperature, it needs to become aware of the smoke density in the room before deducing that there is a fire. Thus, collaboration at the network management layer becomes essential.
3.4 Platform Optimizations
Stabilization Algorithms. When system events are gathered from different sensors, they are forwarded to the CEP Engine, which analyzes them before deciding which actions the system should perform. This decision-making task is often a costly procedure and therefore benefits from optimization techniques. One such technique consists of stabilizing the data flow, for example between the Event Collector and the CEP Engine. The role of the stabilization mechanism is to filter events, preventing useless triggering of the decision-making task. In [22], stabilization mechanisms are defined as algorithms and techniques that trigger system reconfigurations or adaptations only when relevant changes occur. In the smart home scenario (cf. section 2.1), the stabilization mechanism can be useful at several levels of our architecture. Typically, we can aggregate context information (e.g., the user's preferences) or compute the average of some data (e.g., the temperature). We also have the possibility of introducing stabilization mechanisms between the CEP Engine and the Decision Executor in order to avoid the recurrent triggering of unnecessary adaptations (cf. Figure 2). Concerning the implementation of the stabilization mechanisms in our framework, we use the flexible approach proposed in [23]. This approach suggests a composition model with two modalities: horizontal composition and vertical composition. Horizontal composition consists of executing several stabilization algorithms concurrently, while vertical composition refers to the sequential application of two or more algorithms to the same data sample.
Web Intermediaries. REST enables Web Intermediaries (WBI) to exploit the requests exchanged by the participants in the communication process. WBI are computational entities that are positioned between interacting entities on a network to tailor, customize, personalize, or enhance data as they flow along the stream [15]. We can therefore benefit from this opportunity to improve the performance of DigiHome. When the provided context information does not change much over time, the messages containing this information can be marked as cacheable within the communication protocol. This kind of annotation enables WBI caches to quickly analyze and intercept context requests always
returning the same document. A similar optimization applies to security concerns and the filtering of context requests. Indeed, by using proxy servers as WBI, we can control the requested context resources and decide whether incoming (or outgoing) context requests need to be propagated to the web server publishing the context resource. Other kinds of WBI can also be integrated into the system to perform, for example, resource transcoding, enrichment, or encryption.
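As an illustration of the vertical composition of stabilization algorithms described in this section, the following sketch chains a moving average with a change filter, so that a value is only forwarded to the CEP Engine when it has moved significantly. The class names and threshold are hypothetical and not taken from the implementation in [23].

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

public class StabilizationChain {

    interface Stabilizer { OptionalDouble accept(double sample); }

    // Smooths raw samples over a fixed-size window.
    static class MovingAverage implements Stabilizer {
        private final double[] window;
        private int count;
        MovingAverage(int size) { window = new double[size]; }
        public OptionalDouble accept(double sample) {
            window[count++ % window.length] = sample;
            int n = Math.min(count, window.length);
            double sum = 0;
            for (int i = 0; i < n; i++) sum += window[i];
            return OptionalDouble.of(sum / n);
        }
    }

    // Forwards a value only when it moved more than `threshold` since the
    // last forwarded value, preventing useless re-triggering of decisions.
    static class ChangeFilter implements Stabilizer {
        private final double threshold;
        private Double last;
        ChangeFilter(double threshold) { this.threshold = threshold; }
        public OptionalDouble accept(double sample) {
            if (last == null || Math.abs(sample - last) > threshold) {
                last = sample;
                return OptionalDouble.of(sample);
            }
            return OptionalDouble.empty();
        }
    }

    // Vertical composition: apply the stabilizers in sequence; an empty
    // result anywhere in the chain suppresses the sample.
    static OptionalDouble process(List<Stabilizer> chain, double sample) {
        OptionalDouble v = OptionalDouble.of(sample);
        for (Stabilizer s : chain) {
            if (!v.isPresent()) return v;
            v = s.accept(v.getAsDouble());
        }
        return v;
    }

    public static void main(String[] args) {
        List<Stabilizer> chain = Arrays.asList(new MovingAverage(3), new ChangeFilter(1.0));
        for (double t : new double[] {20.0, 20.2, 20.1, 25.0}) {
            System.out.println(process(chain, t));
        }
    }
}
```

Horizontal composition would instead run several such chains concurrently over the same event stream.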
4 Empirical Validation Although the contribution of this paper lies in the adoption of a versatile architectural style for integrating the diversity of devices and appliances available in pervasive environments, we have also performed an evaluation of a prototype implementing the proposed platform. This experimentation demonstrates that the overhead imposed by the DigiHome platform is reasonable. 4.1 Implementation Details We built a prototype of the DigiHome platform based on the Fractal component model and used the Julia1 implementation of the Fractal runtime environment [2]. In order to test our system, we measured the communication and discovery overheads of our RESTful approach as well as the event processing cost when using Esper. To obtain these results, we implemented scene 1 of the smart home scenario (described in section 2.1). 4.2 Discovery and Communication Overhead Table 1 reports the average latency of context dissemination via REST. In this setup, we retrieve the user preferences from multiple providers (Nokia N800 Internet Tablets and a MacBook Pro) and use multiple formats for the context information (i.e., XML, JSON, and Java Object Serialization). To do that, we installed a lightweight version of the DigiHome platform that includes only the RESTful Communication Middleware and the Event Collector. We also measured the delay for discovering the information provided by the sources. For discovery, we selected the UPnP and SLP [11] protocols. In the tests, the platform aggregates the user's preferences to reduce the number of messages exchanged between the provider and the consumer. The measured time corresponds to the exchange of REST messages as well as the marshalling/unmarshalling of the information. The cost of executing other protocols, such as ACN and ZigBee, is not considered in this paper. The reader can find more information about the overhead introduced by these protocols in [1].
Furthermore, in this experimentation, retrieving preferences from mobile devices as XML was not possible due to a limitation of the Java virtual machines used on the mobile devices (CACAOVM and JamVM). Nevertheless, this is not a problem for our approach, since several representations are available for the same preference.
1 Julia: http://fractal.ow2.org/julia
Table 1. Performance of the RESTful connectors (all times in ms)

Client/Provider(s)        Retrieval Latency        Notification Latency     Discovery Latency
                          Object  JSON    XML      Object  JSON    XML      SLP     UPnP
MacBook Pro (Local)       74.3    85.5    92.5     78.6    90.46   97.77    44.03   59.51
MacBook Pro/MacBook Pro   146.2   158.3   165.3    154.67  167.48  174.88   63.62   120
MacBook Pro/N800          339.6   375.75  N/A      359.26  397.54  N/A      128.99  136
4.3 Event Processing Overhead The times for context dissemination and discovery confirm that DigiHome can integrate heterogeneous entities with a reasonable performance overhead. Furthermore, according to the documentation provided by Esper [7], the engine processes more than 500,000 events per second on a workstation and between 70,000 and 200,000 events per second on an average laptop. This efficiency means that event processing in our system can be performed at a low cost and, given the modularity of our architecture, the Esper engine can be installed on the device that provides the highest processing power. In the context of the DigiHome platform, we observed that Esper took 1 ms on average to process the adaptation rules.
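For illustration, the sliding time-window semantics of the Esper rule from section 3.2 can be reproduced in plain Java as follows. This mimics what `select sum(movement) from MovementSensorEvent.win:time(60 sec)` computes; it is a sketch of the window semantics and does not use the Esper API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class MovementWindow {

    private static final long WINDOW_MS = 60_000; // 60-second sliding window
    private final Deque<long[]> events = new ArrayDeque<>(); // {timestamp, movement}
    private long sum;

    // Called for each movement event; returns the current window sum.
    public long onEvent(long timestampMs, long movement) {
        events.addLast(new long[] {timestampMs, movement});
        sum += movement;
        // Expire events older than the window.
        while (!events.isEmpty() && timestampMs - events.peekFirst()[0] > WINDOW_MS) {
            sum -= events.removeFirst()[1];
        }
        return sum;
    }

    public static void main(String[] args) {
        MovementWindow w = new MovementWindow();
        System.out.println(w.onEvent(0, 1));        // 1
        System.out.println(w.onEvent(30_000, 1));   // 2
        System.out.println(w.onEvent(90_000, 1));   // 2 (first event expired)
    }
}
```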
5 Related Work 5.1 Context Dissemination In the literature, it is possible to find two kinds of solutions dealing with context integration: centralized and decentralized. In the centralized category we find middleware such as PACE [13] and Context Distribution and Reasoning (ConDoR) [26]. PACE proposes a centralized context management system based on repositories. The context-aware clients can interact with the repositories using protocols such as Java RMI or HTTP. For its part, ConDoR takes advantage of the object-oriented model and ontology-based models to deal with context distribution and context reasoning, respectively. In ConDoR, the integration of the information using different protocols is not considered an important issue. The problem with this kind of approach is the introduction of a single point of failure into the architecture, which limits its applicability to ubiquitous computing environments. On the other hand, among decentralized approaches we find solutions like CORTEX [29] and MUSIC [14,17]. CORTEX defines sentient objects as autonomous entities that have the capacity of retrieving, processing, and sharing context information using HTTP and SOAP. The MUSIC middleware is another decentralized solution that proposes a peer-to-peer infrastructure dealing with context mediation. The decentralized approaches address fault tolerance by distributing the information across several machines. However, as with some centralized solutions, the lack of flexibility in terms of communication protocols remains a key limitation of these approaches. In addition, peer-to-peer approaches have performance and security problems. In DigiHome, we provide a decentralized solution, where the different interacting devices
can process the events retrieved from the environment. Furthermore, in DigiHome we provide flexibility in terms of interaction by supporting different kinds of communication protocols, and we also allow spontaneous interoperability. 5.2 Complex Event Processing Given the increasing interest in integrating flows of data into existing systems, CEP has gained attention, as it can help to provide that integration by transforming isolated data into valuable information. In this context we find work similar to ours in [3] and [36]. In [3], the authors integrate CEP into their existing project called SAPHE (Smart and Aware Pervasive Healthcare), and also use Esper as their CEP engine. As the project name indicates, the work is applied to healthcare and uses sensors to monitor a patient's activity and vital signs. They use CEP to correlate and analyze the sensor data in order to calculate critical factors of the patient locally in their set-top box, without having to send all the events to an external server. Their approach, however, lacks a way to discover new services, and they never mention how, if at all, they would interact with actuators in order to adapt to the context and respond to a specific situation. An Event-Driven Architecture (EDA) that combines the advantages of WSNs with CEP is presented in [36]. The authors use an extension of the EPCglobal RFID architecture that allows the interaction of RFID and WSN events. Once the events are collected, they use CEP to detect specific situations. They use a smart shelf application as their scenario to show how the events from both sources can be combined. Even though both technologies interact in their project, their specification is somewhat limited, because they do not specify how the obtained information could be used, other than generating a report that is logged in the EPCIS server.
5.3 Wireless Sensor Networks In [32], the authors describe a WSN-specialized resource discovery protocol called DRD. In this approach, each node sends a binary XML description to another node that has the role of Cluster Head (CH). The CH is selected among all the nodes based on their remaining energy. Therefore, it is necessary to give all the nodes the capacity of being a CH. Consequently, all nodes need an SQLite database, libxml2, and a binary XML parser in order to implement the CH functionality. In DigiHome, with our modular architecture, we consider the resource constraints of sensor nodes and provide a lightweight version of the platform that delegates complex processing to more powerful devices. Therefore, not all the nodes have to be CHs. Furthermore, we benefit from the advertisement capabilities of sensor nodes to identify adaptation situations. In CoBIs [4], business applications are able to access functionality provided by sensor nodes via web services. The major aim of the CoBIs middleware is to mediate service requests between the application layer and the device layer. The focus thereby lies on the deployment and discovery of required services. AGIMONE [12] is a middleware solution supporting the integration of WSNs and IP networks. It focuses on the distribution and coordination of WSN applications across WSN boundaries. AGIMONE integrates the AGILLA [10] and Limone [9] middleware
platforms. AGIMONE is a general-purpose middleware with a uniform programming model for applications that integrates multiple WSNs and the IP network. In our approach, we also promote the integration of sensor nodes via software connectors. Moreover, we enable spontaneous communication with sensor nodes that execute a lightweight version of DigiHome. 5.4 Enterprise Service Bus The Enterprise Service Bus (ESB) is an architectural style that facilitates the integration of business infrastructures. In particular, platforms conforming to the Java Business Integration (JBI) specification [34] define the concept of a Binding Component to deal with the heterogeneity of communication standards by exposing the integrated services as WSDL interfaces. The DigiHome platform instead exploits the concept of software connectors to expose the information available in the surrounding environment as REST resources. This data orientation eases the integration of smart home devices and contributes to the reactivity of the system by introducing a reasonable overhead compared to more traditional RPC-based protocols.
6 Conclusions and Future Work In this paper we have presented DigiHome, a platform that deals with the mobility, heterogeneity, and adaptation issues in smart homes. In particular, DigiHome enables the integration of context information by defining intermediaries that follow the REST principles, and it identifies adaptation situations using this information. The simplicity and data orientation promoted by REST make it an attractive solution to deal with heterogeneity in terms of interactions. The software connectors of DigiHome also enable spontaneous communication by supporting standard protocols (e.g., UPnP and SLP) and by furnishing context provider selection (based on QoC attributes). On the other hand, the modularized architecture of DigiHome allows the definition of variants of the platform that can be deployed on resource-constrained devices. Furthermore, the clear separation of concerns in the DigiHome architecture encourages the exploitation of WSNs for simple processing and local decision making. The suitability of our platform for context integration was evaluated with different discovery protocols and context representations. Future work includes the integration of our platform with FraSCAti [28], a platform conforming to the SCA specification [24]. By integrating our approach with SCA, we foster the reuse of the different components of DigiHome's architecture and support the use of different technologies for the implementation of its components. Furthermore, we can benefit from FraSCAti's reconfiguration capabilities and the separation of concerns of SCA to integrate new communication and discovery protocols at runtime. Acknowledgement. This work is partly funded by the EGIDE Aurora and INRIA SeaS research initiatives.
References
1. ZigBee Alliance: ZigBee and Wireless Radio Frequency Coexistence (June 2007), http://www.zigbee.org/imwp/download.asp?ContentID=11745
2. Bruneton, É., Coupaye, T., Leclercq, M., Quéma, V., Stefani, J.-B.: The Fractal component model and its support in Java. Software: Practice and Experience – Special issue on Experiences with Auto-adaptive and Reconfigurable Systems 36(11-12), 1257–1284 (2006)
3. Churcher, G.E., Foley, J.: Applying and extending sensor web enablement to a telecare sensor network architecture. In: COMSWARE 2009: Proceedings of the Fourth International ICST Conference on COMmunication System softWAre and middlewaRE, pp. 1–6. ACM, New York (2009)
4. CoBIs Consortium: CoBIs. FP STREP project IST 004270 (2009), http://www.cobis-online.de
5. Crnkovic, I.: Building Reliable Component-Based Software Systems. Artech House, Norwood (2002)
6. Entertainment Services and Technology Association (ESTA): Architecture for Control Networks (ACN), http://www.engarts.eclipse.co.uk/acn/
7. EsperTech: Esper, http://esper.codehaus.org/
8. Fielding, R.T.: Architectural Styles and the Design of Network-based Software Architectures. PhD thesis, University of California, Irvine (2000)
9. Fok, C.-L., Roman, G.-C., Hackmann, G.: A lightweight coordination middleware for mobile computing. In: De Nicola, R., Ferrari, G.-L., Meredith, G. (eds.) COORDINATION 2004. LNCS, vol. 2949, pp. 135–151. Springer, Heidelberg (2006)
10. Fok, C.-L., Roman, G.-C., Lu, C.: Mobile agent middleware for sensor networks: An application case study. In: IPSN 2005: Proceedings of the International Conference on Information Processing in Sensor Networks. IEEE, Los Alamitos (2006)
11. Guttman, E., Perkins, C., Veizades, J., Day, M.: Service Location Protocol, Version 2. RFC 2608 (Proposed Standard) (June 1999), http://tools.ietf.org/html/rfc2608
12. Hackmann, G., Fok, C.-L., Roman, G.-C., Lu, C.: AGIMONE: Middleware support for seamless integration of sensor and IP networks. In: Gibbons, P.B., Abdelzaher, T., Aspnes, J., Rao, R. (eds.) DCOSS 2006. LNCS, vol. 4026. Springer, Heidelberg (2006)
13. Henricksen, K., Indulska, J., McFadden, T.: Middleware for Distributed Context-Aware Systems. In: International Symposium on Distributed Objects and Applications (DOA 2005), pp. 846–863. Springer, Heidelberg (November 2005)
14. Hu, X., Ding, Y., Paspallis, N., Bratskas, P., Papadopoulos, G.A., Barone, P., Mamelli, A.: A Peer-to-Peer based infrastructure for Context Distribution in Mobile and Ubiquitous Environments. In: Proceedings of the 3rd International Workshop on Context-Aware Mobile Systems (CAMS 2007), Vilamoura, Algarve, Portugal (November 2007)
15. IBM: Web Intermediaries (WBI), http://www.almaden.ibm.com/cs/wbi/
16. Kindberg, T., Fox, A.: System software for ubiquitous computing. IEEE Pervasive Computing 1(1), 70–81 (2002)
17. Kirsch-Pinheiro, M., Vanrompay, Y., Victor, K., Berbers, Y., Valla, M., Frà, C., Mamelli, A., Barone, P., Hu, X., Devlic, A., Panagiotou, G.: Context Grouping Mechanism for Context Distribution in Ubiquitous Environments. In: Meersman, R., Tari, Z. (eds.) OTM 2008, Part I. LNCS, vol. 5331, pp. 571–588. Springer, Heidelberg (2008)
18. Krause, M., Hochstatter, I.: Challenges in Modelling and Using Quality of Context (QoC). In: Magedanz, T., Karmouch, A., Pierre, S., Venieris, I.S. (eds.) MATA 2005. LNCS, vol. 3744, pp. 324–333. Springer, Heidelberg (2005)
19. Luckham, D.C.: The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems. Addison-Wesley Longman, Boston (2001)
20. Makice, K.: Twitter API: Up and Running – Learn How to Build Applications with the Twitter API. O'Reilly Media, Sebastopol (2009)
21. Nokia: Mobile Web Server (2008), http://wiki.opensource.nokia.com/projects/Mobile_Web_Server
22. Nzekwa, R., Rouvoy, R., Seinturier, L.: Towards a Stable Decision-Making Middleware for Very-Large-Scale Self-Adaptive Systems. In: BENEVOL 2009: The 8th Belgian-Netherlands Software Evolution Seminar (2009)
23. Nzekwa, R., Rouvoy, R., Seinturier, L.: A Flexible Context Stabilization Approach for Self-Adaptive Applications. In: Proceedings of the 7th IEEE Workshop on Context Modeling and Reasoning (CoMoRea), Mannheim, Germany, March 2010, p. 6 (2010)
24. Open SOA: Service Component Architecture Specifications (November 2007), http://www.osoa.org/display/Main/Service+Component+Architecture+Home
25. OSGi Alliance: OSGi – The Dynamic Module System for Java, http://www.osgi.org
26. Paganelli, F., Bianchi, G., Giuli, D.: A Context Model for Context-Aware System Design Towards the Ambient Intelligence Vision: Experiences in the eTourism Domain. In: Stephanidis, C., Pieper, M. (eds.) ERCIM Ws UI4ALL 2006. LNCS, vol. 4397, pp. 173–191. Springer, Heidelberg (2007)
27. Saint-Andre, P.: RFC 3920 – Extensible Messaging and Presence Protocol (XMPP): Core (January 2004), http://tools.ietf.org/html/rfc3920
28. Seinturier, L., Merle, P., Fournier, D., Dolet, N., Schiavoni, V., Stefani, J.-B.: Reconfigurable SCA applications with the FraSCAti platform. In: SCC 2009: Proceedings of the 2009 IEEE International Conference on Services Computing, Washington, DC, USA, pp. 268–275. IEEE Computer Society, Los Alamitos (2009)
29. Sorensen, C.-F., Wu, M., Sivaharan, T., Blair, G.S., Okanda, P., Friday, A., Duran-Limon, H.: A context-aware middleware for applications in mobile ad hoc environments. In: Proceedings of the 2nd Workshop on Middleware for Pervasive and Ad-hoc Computing (MPAC 2004), Toronto, Ontario, Canada, October 2004, pp. 107–110. ACM, New York (2004)
30. Taylor, R.N., Medvidovic, N., Dashofy, E.M.: Software Architecture: Foundations, Theory, and Practice. John Wiley & Sons, Chichester (2009)
31. The Apache Software Foundation: HTTP Server Project, http://httpd.apache.org
32. Tilak, S., Abu-Ghazaleh, N.B., Chiu, K., Fountain, T.: Dynamic Resource Discovery for Wireless Sensor Networks (2005)
33. UPnP Forum: UPnP Device Architecture 1.0 (April 2008), http://www.upnp.org/resources/documents.asp
34. Vinoski, S.: Java Business Integration. IEEE Internet Computing 9(4), 89–91 (2005)
35. Wang, G., Jin, G.: Research and Design of RFID Data Processing Model Based on Complex Event Processing. In: CSSE 2008: Proceedings of the 2008 International Conference on Computer Science and Software Engineering, Washington, DC, USA, pp. 1396–1399. IEEE Computer Society, Los Alamitos (2008)
36. Wang, W., Sung, J., Kim, D.: Complex event processing in EPC sensor network middleware for both RFID and WSN. In: ISORC 2008: Proceedings of the 11th IEEE Symposium on Object Oriented Real-Time Distributed Computing, Washington, DC, USA, pp. 165–169. IEEE Computer Society, Los Alamitos (2008)
37. ZigBee Alliance: ZigBee Protocol, http://www.zigbee.org/
Hosting and Using Services with QoS Guarantee in Self-adaptive Service Systems

Shanshan Jiang1, Svein Hallsteinsen1, Paolo Barone2, Alessandro Mamelli2, Stephan Mehlhase3, and Ulrich Scholz3

1 SINTEF ICT, Postboks 4760 Sluppen, 7465 Trondheim, Norway
[email protected],
[email protected] 2 HP Italy, 20063 Cernusco sul Naviglio, Italy
[email protected],
[email protected] 3 European Media Laboratory GmbH, 69118 Heidelberg, Germany
[email protected],
[email protected]
Abstract. In service-oriented computing, the vision is a market of services with alternative providers offering the same services with different cost and quality of service (QoS) properties, where applications form and adapt dynamically through dynamic service discovery and binding. To ensure decent and stable QoS for end users and efficient use of resources, both client applications and service implementations must be able to adapt their internal configuration as well as their bindings to other actors in response to changes in the environment. To this end, service level negotiation and agreements (SLA) are important to ensure coordinated end-to-end adaptation. In this paper we propose a solution based on the integration of an SLA mechanism into a compositional adaptation planning framework and describe a simple yet powerful implementation targeted at resource-constrained mobile devices. As validation we include a case study based on a peer-to-peer distributed mobile application. Keywords: Service level agreement, service level negotiation, self-adaptation, service-oriented architecture, adaptation planning.
1 Introduction In service-oriented computing, the vision is that systems providing functionality to end users form dynamically through service discovery and binding at runtime. This is supported by a service "market", where alternative service providers offer different service levels (SL) for the same services and where service offers appear, disappear, and change dynamically. A service level agreement (SLA) serves to establish terms and conditions, especially SL guarantees, between service providers and service consumers, and thus allows systems to control the SL provided to end users. In our work on self-adaptation in mobile and ubiquitous computing environments, we have advocated a combination of component-oriented and service-oriented adaptation. Service consumers and providers dynamically adapt both their internal component configuration and their service bindings in order to optimize the utility to the end F. Eliassen and R. Kapitza (Eds.): DAIS 2010, LNCS 6115, pp. 15–28, 2010. © IFIP International Federation for Information Processing 2010
16
S. Jiang et al.
users as well as ensuring efficient utilization of resources. The coordination of the adaptation of part systems is facilitated by service level negotiation and agreements. In the context of the MUSIC project (http://www.ist-music.eu) we have created a development framework based on this approach, including both modeling and middleware support. The principles of this approach to self-adaptation have already been presented and discussed in several publications [1,2,3,4]. The contribution of this paper is to explain the adopted service level negotiation mechanism and how it is integrated with the component-level adaptation apparatus to achieve the coordinated adaptation we are seeking. To validate our design, we have used the MUSIC framework to implement a peer-to-peer media sharing application, allowing users to dynamically form communities and to create and comment on a common media collection. By analyzing the design and behavior of this application we demonstrate that our solution works as intended. The paper is organized as follows: Section 2 presents the MUSIC approach to self-adaptation. Section 3 describes the design and implementation of the MUSIC Negotiation Framework as well as its integration into the adaptation framework. Section 4 presents the InstantSocial case study and demonstrates how SLAs are considered in the adaptation reasoning on both the provider side and the consumer side. Section 5 discusses related work before concluding the paper.
2 Adaptation Framework

The MUSIC approach is an externalized approach to the implementation of self-adaptation, where the adaptation logic is delegated to generic middleware working on the basis of models of the software and its context represented at runtime [1,2]. These models understand systems as collections of collaborating components, modeled as compositions with typed roles and connectors. A connector models a collaboration between two components, where one provides a service to the other. A role models a component providing services to, or requiring services from, other components. A component is either atomic or a composition itself, thus allowing hierarchic decomposition. A composition may delegate the provisioning or consumption of a service to the level above it by leaving the appropriate end of the connector unbound. To build system instances according to the above model, we need to find components which conform to the roles in the composition specification, instantiate these components, and connect the component instances according to the composition specification. Typically there will be several component variants matching a role, differing in terms of a set of varying properties. This is modeled by property predictor functions associated with components. Property predictor functions are expressions over the context, the resources, and the properties of collaborating components, and, in the case of composite components, also the properties of the constituent components. Varying properties typically model variation in extra-functional properties (i.e., QoS properties) and resource needs, but may also represent variation in functionality. Thus, by selecting components with different varying properties we can build systems with different properties from the same system model, and we can modify the properties of a running system by replacing one or more components.
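To make the variant model concrete, the following is a minimal sketch (the class, property, and context names are ours, not the MUSIC API) of how property predictor functions attached to component variants yield predicted properties for a given context:

```python
# Sketch of property predictors attached to component variants.
# Class and property names are illustrative, not the MUSIC API.
from dataclasses import dataclass
from typing import Callable, Dict

Context = Dict[str, float]

@dataclass
class ComponentVariant:
    name: str
    # Property predictors: expressions over the current context (a full
    # model would also use resources and collaborators' properties).
    predictors: Dict[str, Callable[[Context], float]]

    def predict(self, context: Context) -> Dict[str, float]:
        """Evaluate every predictor to obtain the variant's predicted properties."""
        return {prop: f(context) for prop, f in self.predictors.items()}

# Two variants filling the same role, differing in their varying properties.
full = ComponentVariant("Full", {
    "availability": lambda ctx: 0.9,
    "resource_use": lambda ctx: 0.6 * ctx["load"],
})
mini = ComponentVariant("Mini", {
    "availability": lambda ctx: 0.5,
    "resource_use": lambda ctx: 0.2 * ctx["load"],
})

print(full.predict({"load": 0.5}))  # {'availability': 0.9, 'resource_use': 0.3}
```

Selecting a different variant for the same role then changes the predicted properties of the whole system without changing the system model.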
Hosting and Using Services with QoS Guarantee in Self-adaptive Service Systems
A system has a utility function, which expresses how well suited a given configuration is in a given situation, based on the predicted values for the varying properties. The utility function is an expression over the predicted properties of the system and the properties of the current context. The adaptation middleware aims to adapt the running systems so as to maximize the overall utility. In Service-Oriented Architecture (SOA) based computing environments, systems are typically distributed, with part systems¹ deployed on a potentially large number of computers owned and administered by different organizations. Part systems represent end-user applications or service-providing components, or both. The goal of the adaptation planning is to select appropriate variants that can be used in a composition to optimize the overall utility. However, optimizing utility over the entire set of computers involved is likely to be intractable both from a technical and an administrative point of view. Therefore we limit the scope of system models and the optimization of the utility to part systems, and rely on dynamic service discovery and binding to connect part systems, and on service level negotiation between them to ensure coordinated adaptation. The adaptation planning process considers components installed on its computer to populate the roles of the part system model, considers service providers located by dynamic service discovery to bind dependencies on external service providers, and takes into account service level agreements with consumers of provided services. Serving external clients consumes resources, and therefore whether to publish a service outside the part system, or to accept new clients, is also decided at runtime by the adaptation middleware. System models are represented at runtime as plans. A plan contains the details and the QoS properties (in the form of property predictors) of a certain realization.
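The utility-maximizing planning step can be sketched as follows; the candidate properties and weights are invented for the example, not taken from MUSIC:

```python
# Illustrative utility-based ranking over candidate configurations.
def utility(props):
    # Availability is rewarded, resource use is a cost.
    return 0.7 * props["availability"] + 0.3 * (1.0 - props["resource_use"])

# Predicted properties of the candidate configurations for one part system.
candidates = {
    "Full": {"availability": 0.9, "resource_use": 0.6},
    "Mini": {"availability": 0.5, "resource_use": 0.2},
}

# The middleware adapts towards the configuration with maximal utility.
best = max(candidates, key=lambda name: utility(candidates[name]))
print(best)  # Full (0.75 vs. 0.59 for Mini)
```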
The dependency on an external service is represented as a special kind of plan called a service plan, with an associated set of plan variants representing the available service providers. MUSIC provides generic middleware to support running and adapting applications created using the above models. Obviously, components and services have to be designed to be dynamically replaceable, and to handle the transfer of state between variants where necessary. The MUSIC middleware has a pluggable architecture and is implemented on an OSGi framework [2], where it is convenient to extend the architecture with plug-ins. The initially proposed architecture has been modified and extended during the implementation process to incorporate the SLA mechanism. Figure 1 gives a simplified view of the MUSIC middleware. Plans and plan variants are stored in the Plan Repository in the Kernel. The Adaptation Manager handles the adaptation planning process for a part system, which is triggered by context changes detected by the Context Manager and by plan changes in the plan repository. The Adaptation Controller coordinates the adaptation process. The Adaptation Reasoner supports different planning heuristics using metadata provided by the plans. The Reasoner builds valid application configurations by solving their dependencies and ranks the configurations by evaluating their utility based on the computation of the predicted properties. The Configuration Executor handles the reconfiguration process
¹ In MUSIC a part system may actually span several nodes. However, since this is transparent to the SLA mechanism, we do not explain it further in this paper.
using the plans selected by the Reasoner. The Communication component provides basic support for SOA in distributed environments. The Discovery Service publishes and discovers services based on service descriptions, using different discovery protocols. Whenever a service is discovered that matches a service dependency of a part system running on a node, a corresponding service plan variant is created in the plan repository. The plan variants are removed from the plan repository whenever the provider disappears or retires the offer. The plan variants are also updated when the QoS properties provided by the services change. The Remoting Service is responsible for the binding and unbinding of services. At the service provider side, it exports services hosted by the provider (i.e., enabling them to accept service requests), and at the service consumer side, it provides bindings (i.e., remote access) to the discovered remote services.
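The lifecycle of service plan variants driven by discovery events might be sketched like this (class and method names are hypothetical, not the MUSIC API):

```python
# Sketch of the plan repository reacting to discovery events.
class PlanRepository:
    def __init__(self):
        # (service, provider) -> advertised service level
        self.variants = {}

    def on_discovered(self, service, provider, level):
        """A matching service was discovered: create a service plan variant."""
        self.variants[(service, provider)] = level

    def on_qos_changed(self, service, provider, level):
        """The provider now offers a different service level: update the variant."""
        if (service, provider) in self.variants:
            self.variants[(service, provider)] = level

    def on_retired(self, service, provider):
        """The provider disappeared or retired the offer: remove the variant."""
        self.variants.pop((service, provider), None)

repo = PlanRepository()
repo.on_discovered("rs", "nodeB", {"rut": 0.5})
repo.on_qos_changed("rs", "nodeB", {"rut": 0.75})
repo.on_retired("rs", "nodeB")
print(repo.variants)  # {}
```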
Fig. 1. Simplified architecture of MUSIC middleware
In the following we discuss the design and implementation of the MUSIC Negotiation Framework (MNF, depicted in grey), and its integration with the adaptation framework. The MNF is responsible for service level negotiation and violation handling (cf. Sect. 3.3). It interacts with the Adaptation Manager and the Communication component to realize the adaptation process integrated with the SLA mechanism.
3 Integrating SLA with the Adaptation Framework

The current state of the art for SLA specification is WS-Agreement [5], a proposed recommendation of the Open Grid Forum. It specifies the general structures and terms for SLA definition and a simple single-round negotiation protocol. We have selected WS-Agreement as a starting point for our work. However, WS-Agreement is technically too heavy for resource-constrained mobile devices. Therefore, we adopt a custom, lightweight implementation. Below we present the overall approach for the integration work and then describe the main extensions in detail.

3.1 Requirements for the SLA Mechanism

In order to achieve coordinated adaptation of part systems, a service provider needs to know about its consumers (e.g. who and how many) and what they need (e.g. service
level) and incorporate such information into the adaptation process. We have identified a set of requirements for the SLA mechanism in our context:

(i) To allow providers to take into account the needs of the current consumers when adapting. The provider should consider the QoS requirements from the consumers (typically expressed as required service levels) and the number of consumers when allocating resources. Such information should be reflected in the utility function so that it can be integrated into the utility-based adaptation reasoning.
(ii) To allow providers to notify consumers if they change the service level as a result of adaptation.
(iii) To allow the propagation of service level changes throughout the network of providers and consumers.
(iv) To give providers the flexibility of withdrawing a service offer, while maintaining the provisioning of the service to current consumers, in order to avoid becoming overloaded.

For the server side of the mechanism we have focused on service exchange between peer nodes, typical of collaboration-oriented mobile applications, rather than on specialized service provider nodes. This is also reflected in the case used for validation. However, the client side of our solution may also exploit services offered by specialized service provider nodes not using the MUSIC technology.

3.2 Overall Description of the Approach

Below we give an overview of how services, service level negotiation, agreement and monitoring are integrated into the adaptation process in MUSIC:

1. Service publication and discovery: In MUSIC, services are advertised with the service levels predicted by the current variant of the service-providing application. The advertisement mechanism delivers service descriptions to all interested MUSIC nodes. Such a service description contains the information needed to properly specify and locate the service, together with a set of properties describing the current service level offered.
The service level advertised consists of the predicted property values associated with the component providing the service. When a service is discovered at the consumer side, the advertised service level is used to create a service plan variant in the plan repository, which can later be evaluated by the Adaptation Reasoner when computing the utility of the available compositions. There can be multiple service plan variants for a service plan, corresponding to the alternative providers that a MUSIC node in the SLA-Consumer role (cf. Sect. 3.3) can select from.

2. Service selection and negotiation: The Adaptation Reasoner selects the most appropriate service offerings among the available service providers and service levels, each considered as a variant, by deciding whether the variant with its service level contributes to the composition configuration that gives the highest overall utility. If a service variant is selected by the reasoning process, a negotiation process is initiated towards the provider with an offer created based on
the selected service level. If negotiation is successful, an SLA is created and the service is provisioned with the guaranteed service level. If negotiation fails², the Reasoner selects another variant and re-negotiates. The negotiation process thus provides the adaptation planning with a mechanism to bind to the appropriate service provider with guaranteed service levels.

3. Service monitoring: In a ubiquitous service environment, the provided service levels may change dynamically. We use a simplified mechanism (cf. Sect. 3.3) to check the conformance of SLAs against the predicted property values defined in the property predictors, leveraging the MUSIC planning mechanism.

4. Service violation and re-negotiation: A service violation discovered by service monitoring triggers the re-adaptation process of the Adaptation Manager. The violated SLA is terminated, and the Reasoner may select another available service variant and initiate the negotiation process.

3.3 The MUSIC Negotiation Framework

The MUSIC middleware can, at the same time, play two separate roles with respect to the negotiation process: It can provide negotiable services to remote nodes (provider-side negotiation) and use negotiable services provided remotely (consumer-side negotiation). Provider-side negotiation consists of evaluating an offer coming from a remote node and, possibly, creating an SLA between the parties; consumer-side negotiation consists of creating and submitting a request for reaching an SLA towards a service provider. All the SLAs reached as a result of the negotiation process must be properly monitored to verify their compliance with the agreement terms over time. The service level negotiation and monitoring capabilities in MUSIC are provided by a custom, lightweight negotiation model, called the MUSIC Negotiation Framework (MNF) and implemented by the SLA Manager. The internal components of the SLA Manager and their relationships are depicted in Fig. 2 and briefly described in the following. For a detailed description of interfaces and behavior, readers may refer to [6]. The SLARepresentation allows the creation of a MUSIC-specific, internal representation of an SLA describing the terms of the agreement, the actors involved and their roles, the associated QoS, the SLA state, etc. Once created, an SLARepresentation is stored in the SLARepository, a component which collects all the SLAs created by the MUSIC middleware (both when acting as an SLA-Provider and as an SLA-Consumer). In addition, it allows other MUSIC middleware components to register as listeners for change events happening in the repository. The SLAMonitor constantly monitors the QoS of an offered service and checks it against an SLA reached with a service consumer. In MUSIC, we adopt a simplified mechanism called provider-side SLA monitoring: The provider checks SLA conformance at the end of the adaptation planning process, based on the predicted QoS values calculated from the property predictors defined in the service plan variants;
² Due to the time gap between the adaptation reasoning and the negotiation, negotiation may fail, e.g., when the provider disappears, changes its service level, or cannot accept more consumers. However, as the reasoning time is short, the probability of such a failure is small.
Fig. 2. Structure of the SLA Manager (the SLA Negotiator with its SLA Consumer and SLA Provider plug-ins for consumer-side and provider-side negotiation, the SLA Monitor, the SLA Representation, and the SLA Repository)
the consumer relies on the provider for SLA monitoring by periodically checking the SLA state with the provider³. The SLANegotiator performs all the steps enabling consumer-side and provider-side negotiation, described at the beginning of this section. The negotiation logic is supported by corresponding SLA plug-ins that implement specific service level negotiation protocols. The current MNF has plug-ins available for a MUSIC internal negotiation protocol, which is a customized version of WS-Agreement with single-round negotiation. In order to integrate the MNF into the MUSIC framework (cf. Fig. 1), two additional actions are performed at the end of the planning process:

• For all SLA-enabled service plan variants selected by the Adaptation Manager, an SLA negotiation process is triggered for each service. If the negotiation for a service fails, the corresponding service plan variant is invalidated and a re-adaptation process is triggered.
• The Adaptation Manager invokes the SLA Manager to check the SLA states for all active SLAs provided by the MUSIC node, as a mechanism for provider-side SLA monitoring.
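A minimal sketch of the single-round, exact-match negotiation described above (the classes are hypothetical; in MUSIC this is handled by the SLA Manager and its protocol plug-ins):

```python
# Single-round negotiation sketch: the consumer offers the service level
# selected by the reasoner; the provider accepts only if it can still
# honor exactly that level and has remaining capacity.
class Provider:
    def __init__(self, offered_level, max_consumers):
        self.offered_level = offered_level
        self.max_consumers = max_consumers
        self.slas = []                       # the provider's SLA repository

    def negotiate(self, offer):
        """Accept only an exact match with remaining capacity (single round)."""
        if offer != self.offered_level or len(self.slas) >= self.max_consumers:
            return None                      # reject: consumer picks another variant
        sla = {"level": dict(offer), "state": "active"}
        self.slas.append(sla)
        return sla

provider = Provider(offered_level={"rut": 0.5}, max_consumers=2)
assert provider.negotiate({"rut": 0.5}) is not None  # agreement reached
assert provider.negotiate({"rut": 0.4}) is None      # stale offer: rejected
```

A rejection here corresponds to the failure case of step 2 in Sect. 3.2: the service plan variant is invalidated and re-adaptation is triggered.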
3.4 Discussion

The current MNF implementation and its integration into the MUSIC adaptation framework fulfill the first three requirements listed in Sect. 3.1. For (i): The number of consumers equals the number of SLAs and can easily be obtained from the MNF. The QoS requirements of the consumer are reflected in the service offer submitted by the consumer during service level negotiation. Both pieces of information can be
³ A common approach for service monitoring in the literature is to use context sensors to gather data about service level metrics and parameters of the provided service at the consumer side. We adopt the simplified mechanism for provider-side SLA monitoring so as to eliminate the need for consumer-side monitoring. However, context sensors can be readily integrated into the MUSIC framework thanks to the extensible plug-in architecture. See the discussion in this section for the rationale behind this approach.
included in the utility function for adaptation reasoning. For (ii): The provider updates the SLA states when there is a change in the offered service level, and the consumer can detect such a change by periodically checking the SLA states with the provider. For (iii): The propagation of service level changes is realized by leveraging the service discovery and SLA monitoring mechanisms. Requirement (iv) is currently unsupported, but we have designed a mechanism based on special flags that can realize it. The integration implementation has leveraged MUSIC-specific features in SLA monitoring to simplify the processing and improve the performance. Firstly, since the MUSIC adaptation framework uses predicted property values for reasoning (i.e., property values evaluated from the property predictors in the given context), we use the predicted property values when performing provider-side monitoring. In addition, the consumer relies on the provider for SLA monitoring. This mechanism eliminates the need for additional context sensors to collect real-time QoS data at both the provider side and the consumer side. Secondly, as any cause of service property changes triggers an adaptation planning process, it is sufficient to check the property values at the end of the planning process. These mechanisms allow for a practical, lightweight implementation for mobile devices. Although our implementation is MUSIC-specific, it is quite flexible thanks to the plug-in architecture. Our current MNF implementation provides plug-ins for the MUSIC internal negotiation protocol. However, by delegating the negotiation logic to plug-ins, alternative negotiation protocols and technologies can easily be incorporated into the MUSIC framework. For example, the MUSIC internal protocol assumes that a service provider will publish only the current service level.
To work with non-MUSIC nodes using a negotiation protocol [7] that provides alternative service levels in the service description, plug-ins for that negotiation protocol can be implemented on MUSIC nodes. Such plug-ins must create a service plan variant for each alternative service level, such that they are considered as different variants in the adaptation reasoning, and must negotiate the selected service level with the provider. Because our simplified monitoring mechanism relies on the provider for SLA monitoring, it implicitly requires the consumer to trust the provider. If this assumption does not hold in a dynamic environment, the consumer can use context sensor plug-ins to provide consumer-side SLA monitoring, as adopted by other SLA frameworks.
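The variant-per-service-level idea for such plug-ins can be illustrated as follows (the function and field names are assumptions, not the MNF API):

```python
# Sketch: a plug-in for a protocol whose service descriptions advertise
# several alternative service levels creates one plan variant per level,
# so the reasoner treats each level as a separate variant.
def variants_from_description(service, provider, levels):
    """levels: list of service-level dicts taken from the service description."""
    return [
        {"service": service, "provider": provider, "level": level}
        for level in levels
    ]

levels = [{"avy": 0.9, "rut": 0.7}, {"avy": 0.6, "rut": 0.3}]
variants = variants_from_description("ca", "nodeE", levels)
print(len(variants))  # 2
```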
4 InstantSocial Case Study

InstantSocial (IS) [3] is a media sharing platform for transient user groups that allows members to tag, comment on, and search for text and images. IS has three design goals: (i) maintaining a peer-to-peer network yielding high connectivity, (ii) providing access to a large number of media items despite the varying availability of devices, and (iii) balancing the load across multiple resource-limited devices. IS accomplishes these goals by building on the previously described SLA capabilities of the MUSIC middleware. The following design extends and refines the IS version described in [3].

4.1 Design and Utility Function of InstantSocial

Figure 3 shows the design of the InstantSocial application: Variant configurations, components, their properties and associated property predictors. Table 1 lists the
Hosting and Using Services with QoS Guarantee in Self-adaptive Service Systems
23
property types and their descriptions. InstantSocial is divided into two parts: The user interface (UI) and the content repository (CR). The content repository instance holds the media items and their associated data, e.g., comments and tags. This component provides and consumes two different services: The content access service (ca) and the routing service (rs).

Fig. 3. Design of the InstantSocial application (variant configurations Full and Mini, components UI and CR, and their property predictors; notation: >s denotes a provided service, <s a consumed service, s.p is property p of service s, C.rut is the resource utilization based on the local context, id is the id of the local node, and f(x)=x/(x+1))
The content repository is the computationally most demanding part of the InstantSocial application. InstantSocial can run in two variants to allow a reduction of resource usage if necessary. In the Full variant, the application consists of both parts described above, whereas in the Mini variant InstantSocial does not start a content repository. Instead, it uses the content access service provided by a Full instance in its proximity. Each variant configuration is characterized by three properties: Availability (avy), a measure of how much media is accessible in the current configuration; resource utilization (rut), indicating how much the node is in use; and routing service utilization (rsu), indicating how much the consumed routing services are under load. The quality of the ca service is characterized by two properties: The availability (ca.avy) and the resource utilization (ca.rut). The ca.avy property serves as an indicator of the amount of media that is accessible through the service. The ca.rut property signals how much the providing node is currently used; it depends on the number of consumers (noc) of the provided ca and rs services. InstantSocial instances use a routing service rs to interconnect. Each instance hosts one such service and consumes two (rs1 and rs2). The rs service provides the means to route messages through the network and builds an overlay network on the nodes. Note that the overlay network uses directed links between nodes, as opposed to standard network protocols like TCP. The quality of the rs service is determined by a resource utilization (rs.rut) property and a connectivity (rs.conn) property. The rs.rut property is an indicator of how much the routing service is currently used. The rs.conn property indicates the number of other nodes reachable by using this provider. In MUSIC, the utility is a value between 0 and 1. The described properties are mapped into this interval by using the function f(x) (cf. Fig. 3).
The utility function of a configuration is defined as: utility = c1·avy + c2·(1−rut) + c3·(1−rsu), where c1, c2, and c3 are the relative weights of the properties, which sum to 1. As media availability is of high importance to InstantSocial, avy should be dominant. For the following scenario, we use c1=0.6, c2=0.3, and c3=0.1. The rut property of a service provider depends on
the number of its service consumers and thus enables its utility function to consider the consumers’ needs during adaptation. The rsu property is used to select rs providers which have less workload. It therefore regards the needs (workload) of the providers and helps to spread service use, thus preventing the overload of individual nodes.

Table 1. List of the property types of InstantSocial instances

Property type | Description                                    | Value range
rut           | Measure for the resource utilization of a node | 0..1
rsu           | Measure for the routing service utilization    | 0..1
avy           | Measure for the availability of media items    | 0..1
conn          | Connectivity of the node                       | Set of nodes
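The mapping function f(x) = x/(x+1) and the utility function above can be written down directly; the weights are the ones used in the scenario (c1 = 0.6, c2 = 0.3, c3 = 0.1):

```python
# The mapping f and the utility function as defined above,
# with the scenario weights c1=0.6, c2=0.3, c3=0.1.
def f(x):
    """Maps an unbounded load measure (e.g. a consumer count) into [0, 1)."""
    return x / (x + 1.0)

def utility(avy, rut, rsu, c1=0.6, c2=0.3, c3=0.1):
    return c1 * avy + c2 * (1.0 - rut) + c3 * (1.0 - rsu)

# A node whose ca and rs services have three consumers: rut = f(3) = 0.75.
print(f(3))  # 0.75

# avy dominates: a small gain in availability outweighs a larger gain in rsu.
assert utility(0.9, 0.75, 0.5) > utility(0.8, 0.75, 0.0)
```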
4.2 The InstantSocial Scenario

The following scenario – Andy is travelling home after visiting a concert – demonstrates the previously described design.

Fig. 4. Network layout before (left) and after (right) Andy joins the InstantSocial network
Scene 1: Andy visited a Björk concert and now sits in the train on his way back home. Betty, Chris, David, and Erika, also Björk fans, were already on the train. All of them have their InstantSocial instances (B, C, D, and E, respectively) running in Full mode, and they had already built the network depicted in Fig. 4 (left) when Andy entered the train. The arrows point from an rs service consumer to the provider. In the depicted initial situation, the availability property (avy) of all nodes has the same value because each node can reach the media of all the others. However, during network changes, this property differs among the nodes. Each node has its own value of the resource utilization property (rut): For example, the services provided by B are used by one consumer only (rs.rut = 0.5), while D’s services are consumed by three nodes (rs.rut = 0.75). Because the routing service utilization property (rsu) of a node depends on the rut of other nodes, this property differs between the nodes. When Andy starts his instance A, the middleware finds the services provided by the nodes that have already established the network. After the best combination of services is identified (Table 2), the nodes negotiate SLAs for the routing services. After A arbitrarily connects to B and C, the rs.conn property of A’s provided rs service is updated. Note that in order to avoid numerous re-adaptations, only the rs.conn property is subject to the SLA. In other words, if the rs.rut of a consumed service changes, the SLA is not violated. Note that the resource utilization (rut) is independent of rs1 and rs2, and therefore it is not included in Table 2. In this scene, its value is 0, as the services of A have no consumers and there are plenty of local resources available.
Table 2. Utility of node A based on its choice of routing services

rs1 | rs2 | avy | rsu    | utility
B   | C   | 0.8 | 0.5833 | 0.5216
B   | D   | 0.8 | 0.6250 | 0.5175
B   | E   | 0.8 | 0.5833 | 0.5216
C   | D   | 0.8 | 0.7083 | 0.5091
C   | E   | 0.8 | 0.6667 | 0.5133
D   | E   | 0.8 | 0.7083 | 0.5091
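As a cross-check, Table 2 can be reproduced from the scenario data (a sketch; B has one rs consumer and D has three per Scene 1, and the rsu values imply that C and E have two each): rut = f(noc) per provider, rsu is the mean of the two consumed ruts, and the constant rut term of node A (rut = 0) is omitted, as the table notes.

```python
# Reproducing Table 2 from the scenario data.
from itertools import combinations

def f(x):
    return x / (x + 1.0)

consumers = {"B": 1, "C": 2, "D": 3, "E": 2}  # rs consumers per provider
avy, c1, c3 = 0.8, 0.6, 0.1

rows = {}
for rs1, rs2 in combinations("BCDE", 2):
    rsu = (f(consumers[rs1]) + f(consumers[rs2])) / 2
    rows[(rs1, rs2)] = c1 * avy + c3 * (1.0 - rsu)

# (B, C) and (B, E) share the lowest rsu (0.5833) and hence the highest
# utility (about 0.5217), matching the top rows of Table 2.
best = max(rows, key=rows.get)
assert best in {("B", "C"), ("B", "E")}
```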
Scene 2: We assume that B is the first node to note this change in its plans. Node B therefore re-adapts, and its utility function indicates that it is better to disconnect from D in order to connect to A. This decision is based on the higher availability property of the variants that include A. After negotiating with and connecting to A, B also updates its rs.conn property, which leads to SLA violations on all nodes using B. After all re-adaptations have settled, the network reaches full mutual reachability again, now including node A (Fig. 4, right).

Scene 3: On Chris’ mobile, the resources start getting too low to run a Full configuration. Because Chris is still browsing through some media items, the adaptation middleware reconfigures his application into the Mini mode and therefore has to choose a content access provider. Candidates are the ca services of A, B, D, and E. These nodes still provide the same availability but different resource utilization (ca.rut), as this latter value depends on the number of consumers: Node E has two consumers, while A, B, and D have only one each. Therefore, C is free to choose from the latter and finally connects to A.

Fig. 5. Network layout after node C has switched to Mini mode (left) and after the resulting re-adaptation of node D (right)
Scene 4: After C re-configured into the Mini mode (Fig. 5, left), D and E are no longer able to reach A and B. By chance, D notices the removal of C first. The resulting re-adaptation has to select two nodes among A, B, and E to which D connects. All variants using E have high utility, so this connection is maintained. The choice between A and B is based on their rsu property. Both have one consumer of their routing service, but A has another consumer: The provided content access service is now used by C. The resulting higher rut value of A leads to a higher rsu value for variants that include A. Therefore, D connects to B and again reaches all the nodes in the network (Fig. 5, right).
4.3 Experiences Gained with the MUSIC Approach

The adaptive behavior of the system described in the previous section is quite complex and relies on the dynamic balancing of multiple, partly conflicting concerns across a number of users and computers. Nevertheless, the approach described in this paper keeps modeling such systems relatively simple, because a) the separation of the application logic from the adaptation logic decreases complexity significantly, b) the property predictor and utility functions provide a natural way to express the decision logic, and c) the dynamic service discovery and service level negotiation support ensures the necessary coordination between the involved nodes. In our approach a service-providing component advertises one service level at any time, the one predicted by the model in the given context. This approach appears to be appropriate in peer-to-peer oriented scenarios like our case study. However, in more client-server oriented scenarios, a server might want to advertise different service levels and prioritize requests in accordance with the agreed service level. Since MUSIC has been conceived primarily as a client-side technology, we have not focused on such requirements. However, the consumer side of the MNF allows MUSIC nodes to discover and use services provided by non-MUSIC nodes, which may behave in this way, as already discussed in Sect. 3.4. Another limitation of our approach is that the SLAs may be overly strict. Changing a service property of a MUSIC-hosted service always causes a violation of all its SLAs. In some cases, the consumer might prefer a looser SLA and be notified only if the provided service level decreased or moved outside given bounds. Consider for example the property rs.conn defined above. If a node is added to this set, the change does not have to constitute an SLA violation. The desired semantics of the agreement between provider and consumer in this case is that a particular set of nodes is reachable through the provided service. Consequently, extending this set does not violate this condition. Of course, a proper utility function will choose the same service again, so the application is not reconfigured and, for the user, everything stays the same. However, preventing SLA violations in such cases would reduce the resources spent on adaptation reasoning. In summary, the proposed SLA architecture is sound and applicable. In particular, it is lightweight compared to existing approaches like WS-Agreement, and thus more suitable for mobile devices.
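The looser, set-based semantics suggested for terms like rs.conn could be checked as follows (a hypothetical helper, not part of the current MNF):

```python
# Looser SLA check for a set-valued term: merely extending the agreed
# set keeps the SLA intact; only losing an agreed node is a violation.
def conn_violated(agreed_conn, current_conn):
    """Violated only if an agreed-reachable node is no longer reachable."""
    return not set(agreed_conn) <= set(current_conn)

assert conn_violated({"B", "C"}, {"A", "B", "C"}) is False  # set grew: no violation
assert conn_violated({"B", "C"}, {"B"}) is True             # lost C: violation
```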
5 Related Work

Several SLA specifications have been proposed targeting software-based SLA negotiation, such as WSLA [7] and WS-Agreement [5]. WS-Agreement is the most mature one, and we have selected it as the starting point for our work. There are also several SLA frameworks proposed by projects such as SLA@SOI [8], BREIN [9], BEinGRID [10], and AssessGrid [11]. These SLA frameworks with their software implementations, however, do not specifically target self-adaptive service systems, nor do they consider the specific resource constraints of mobile devices. In fact, current SLA implementations, e.g. based on WS-Agreement, are technically heavy and not suitable for resource-constrained mobile devices.
Several works have addressed self-adaptation supported by generic middleware, similar to the MUSIC approach. CARISMA [12] focuses on the adaptation of middleware-level services. Planning consists of choosing among predefined rule-based adaptation policies using utility functions and resolving policy conflicts using an auction-like procedure. CARISMA does not support dynamic service discovery that can trigger application reconfiguration, and the rule-based policies do not consider the prediction of non-functional properties. However, the auction-like procedure used by CARISMA could be integrated into the MUSIC middleware as a particular negotiation protocol. The self-adaptation techniques proposed by Rainbow [13] are also similar to MUSIC. Rainbow uses a component-based architecture model; adaptation strategies based on situation-action rules are scored using utility preferences specified for the quality dimensions, and the adaptation manager selects the highest scoring strategy. QuA [14] is a QoS-aware adaptation framework also based on utility functions. It calculates predicted quality using predictors and specifies quality requirements and adaptation policies using utility functions that map quality predictions to a scalar value. QuA has been applied to support self-adaptive SOA applications by integrating both interface-layer and application-layer mechanisms, providing cross-layer adaptation [15]. However, the QuA middleware has no prototype implementation and does not focus on mobile applications. Genie [16] adopts a similar approach to self-adaptation, using component frameworks and architecture models to support runtime adaptability. It is, however, not service-oriented and has no dynamic service discovery or SLA support. As far as we know, these self-adaptive systems do not provide an SLA mechanism for adaptation targeted at the mobile domain. We are unaware of other work that uses SLA as a mechanism to achieve coordinated end-to-end adaptation.
We therefore consider our integration of SLA mechanisms into the adaptation framework, together with a fully working reference implementation for mobile devices, an essential contribution towards ensuring QoS-aware and guaranteed self-adaptation.
6 Conclusions and Future Work

In this paper we have described how we use and integrate an SLA mechanism in an adaptation framework for self-adaptive service systems, in order to allow service providers to take the needs of their clients into account in their adaptation logic and thus achieve coordinated end-to-end adaptation. This approach has been implemented in the MUSIC adaptation framework. As a preliminary validation of the implemented solution, a case study is included, demonstrating how it is exploited in the peer-to-peer mobile application InstantSocial to achieve coordinated dynamic adaptation of a set of collaborating application instances on different devices. Initial tests show that the application behaves as described. Performance measurements are currently in progress using several trial applications in addition to InstantSocial. We intend to improve the implementation by extending some capabilities of the framework which are currently provided at a proof-of-concept level. To simplify the implementation, the compliance of a service level with an offer is currently based on an exact match between the values of the QoS required and provided. We plan to introduce more complex reasoning for handling flexible logic conditions on the QoS
S. Jiang et al.
terms to be compared, such as “greater than”, “less than”, “between”, etc. We also plan to improve the current implementation to fulfill the last requirement mentioned in Sect. 3.1, i.e., to support the flexibility of selective SLA creation. In addition, we intend to provide additional plug-ins that enhance the MNF by interacting with negotiation protocols different from the MUSIC-specific one.

Acknowledgements. This work was partly funded by the European Commission through the project MUSIC (EU IST 035166).
References
1. Rouvoy, R., et al.: Composing Components and Services using a Planning-based Adaptation Middleware. In: Pautasso, C., Tanter, É. (eds.) SC 2008. LNCS, vol. 4954, pp. 52–67. Springer, Heidelberg (2008)
2. Rouvoy, R., et al.: MUSIC: Middleware Support for Self-Adaptation in Ubiquitous and Service-Oriented Environments. In: Cheng, B.H.C., et al. (eds.) Software Engineering for Self-Adaptive Systems. LNCS, vol. 5525, pp. 164–182. Springer, Heidelberg (2009)
3. Fraga, L., Hallsteinsen, S., Scholz, U.: InstantSocial – Implementing a Distributed Mobile Multi-user Application with Adaptation Middleware. EASST Communications 11 (2008)
4. Hallsteinsen, S., Jiang, S., Sanders, R.: Dynamic software product lines in service oriented computing. In: 3rd Int. Workshop on Dynamic Software Product Lines, DSPL (2009)
5. Andrieux, A., et al.: Web Services Agreement Specification (WS-Agreement). Open Grid Forum Recommended Specification (2005)
6. Barone, P.: D4.3 System design of the MUSIC architecture. MUSIC deliverable (2009)
7. Keller, A., Ludwig, H.: The WSLA Framework: Specifying and Monitoring Service Level Agreements for Web Services. Journal of Network and Systems Management 11(1) (2003)
8. SLA@SOI project, http://sla-at-soi.eu/
9. BREIN project, http://www.eu-brein.com/
10. BEinGRID project, http://www.beingrid.eu/
11. AssessGrid project, http://www.assessgrid.eu/
12. Capra, L., Emmerich, W., Mascolo, C.: CARISMA: Context-Aware Reflective Middleware System for Mobile Applications. IEEE Trans. on Software Engineering 29(10) (2003)
13. Garlan, D., et al.: Rainbow: Architecture-based self-adaptation with reusable infrastructure. Computer 37(10), 46–54 (2004)
14. Gjørven, E., et al.: Self-adaptive systems: A middleware managed approach. In: Keller, A., Martin-Flatin, J.-P. (eds.) SelfMan 2006. LNCS, vol. 3996, pp. 15–27. Springer, Heidelberg (2006)
15. Gjørven, E., Rouvoy, R., Eliassen, F.: Cross-layer Self-adaptation of Service-Oriented Architectures. In: MW4SOC 2008, pp. 37–42. ACM, New York (2008)
16. Bencomo, N., Blair, G.: Using Architecture Models to Support the Generation and Operation of Component-Based Adaptive Systems. In: Cheng, B.H.C., et al. (eds.) Software Engineering for Self-Adaptive Systems. LNCS, vol. 5525, pp. 183–200. Springer, Heidelberg (2009)
Validating Evolutionary Algorithms on Volunteer Computing Grids

Travis Desell¹, Malik Magdon-Ismail¹, Boleslaw Szymanski¹, Carlos A. Varela¹, Heidi Newberg², and David P. Anderson³

¹ Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
² Department of Physics, Applied Physics and Astronomy, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
³ Space Sciences Laboratory, University of California, Berkeley, CA 94720, USA
Abstract. Computational science is placing new demands on distributed computing systems as the rate of data acquisition far outpaces improvements in processor speed. Evolutionary algorithms provide an efficient means of optimizing the increasingly complex models required by different scientific projects, which can have very complex search spaces with many local minima. This work describes different validation strategies used by MilkyWay@Home, a volunteer computing project created to address the extreme computational demands of three-dimensionally modeling the Milky Way galaxy, which currently consists of over 27,000 highly heterogeneous and volatile computing hosts providing a combined computing power of over 1.55 petaflops. The validation strategies presented form a foundation for efficiently validating evolutionary algorithms on unreliable or even partially malicious computing systems, and have significantly reduced the time taken to obtain good fits of MilkyWay@Home’s astronomical models.

Keywords: Volunteer Computing, Evolutionary Algorithms, Validation.
1 Introduction
The demands of computational science continue to increase as the rates of data acquisition and modeling complexity far outpace advances in processor speed. Because of this, highly distributed computing environments are very useful in achieving the amount of computational power required. A very effective way of accumulating a large amount of computational power is to utilize volunteer computing grids built with software such as BOINC [1], which lets volunteers easily participate in many different computing projects based on their personal interests. The MilkyWay@Home project (http://milkyway.cs.rpi.edu) has had particular success using this software, gathering a volunteer base of over 27,000 active computing hosts with a combined computing power of over 1.55 petaflops in a little over two years.
F. Eliassen and R. Kapitza (Eds.): DAIS 2010, LNCS 6115, pp. 29–41, 2010. © IFIP International Federation for Information Processing 2010
T. Desell et al.
However, utilizing a volunteer computing grid comes with its own set of unique challenges. As these projects are open to the public and often open source, as in the case of MilkyWay@Home, the code run on clients can be executed on any type of hardware, resulting in a very heterogeneous environment. Users can also be geographically distributed, so latency can be highly heterogeneous (e.g., MilkyWay@Home has active users in over 130 countries). Additionally, because users can configure their own hardware and compile the open-source application with hardware-specific optimizations, errors from improperly configured hardware and miscompiled applications must be handled without invalidating other work being done by the system. Evolutionary algorithms (EAs) are an approach to optimizing challenging problems in computational science, as they can efficiently find global minima in search spaces with many local minima; traditional methods such as conjugate gradient descent and Newton methods quickly get stuck in local minima and fail in these circumstances. This work examines how to efficiently perform validation for evolutionary algorithms on volunteer computing grids, using MilkyWay@Home as a test system. Typically, volunteer computing projects require every result returned by computing hosts to be validated in some manner; in the case of evolutionary algorithms, however, the number of results requiring validation can be significantly reduced. Two approaches are presented for performing validation. The first is pessimistic, assuming that results are invalid and waiting for their validation before they are used. The other is optimistic, using results as soon as they are reported and later reverting values that are found to be invalid. Using two common EAs, differential evolution and particle swarm optimization, optimistic validation is shown to significantly improve the convergence rate of the EAs run by MilkyWay@Home.
The remainder of this paper is organised as follows. Section 2 presents the EAs used in this work. The validation strategies used are given in Section 3 and their performance on MilkyWay@Home is discussed in Section 4. The paper concludes with a discussion and future work in Section 5.
2 Evolutionary Algorithms for Continuous Search Spaces
Effective approaches to global optimization for continuous search spaces include differential evolution (DE) and particle swarm optimization (PSO). In general, individuals are sets of parameters to the objective function being optimized. Applying the objective function to an individual provides the fitness of that individual, and the evolutionary algorithms evolve individuals through different heuristics to try to find the best possible fitness, which optimizes the objective function.

2.1 Differential Evolution
Differential evolution is an evolutionary algorithm used for continuous search spaces developed by Storn and Price over 1994–1995 [13]. Unlike other evolutionary algorithms, it does not use a binary encoding strategy or a probability
density function to adapt its parameters; instead, it performs mutations based on the distribution of its population [11]. For a wide range of benchmark functions, it has been shown to outperform or be competitive with other evolutionary algorithms and particle swarm optimization [15]. Differential evolution evolves individuals by selecting pairs of other individuals, calculating their differential, scaling it, and then applying it to another parent individual. Some kind of recombination (e.g., binomial or exponential) is then performed between the current individual and the parent modified by the differentials. If the fitness of the generated individual is better than that of the current individual, the current individual is replaced with the new one. Differential evolution is often described with the naming convention “de/parent/pairs/recombination”, where parent describes how the parent is selected (e.g., best or random), pairs is the number of pairs used to calculate the differentials, and recombination is the type of recombination applied. Two common recombination strategies take two parents p^1 and p^2, whose i-th parameters, p^1_i and p^2_i respectively, are used to generate the i-th parameter of the child, c_i:

– binomial recombination, bin(p^1, p^2):

c_i = \begin{cases} p^1_i & \text{if } r(0,1) < \sigma \text{ or } i = r(0,D) \\ p^2_i & \text{otherwise} \end{cases} \quad (1)

– exponential recombination, exp(p^1, p^2):

c_i = \begin{cases} p^1_i & \text{from } r(0,1) < \sigma \text{ or } i = r(0,D) \\ p^2_i & \text{otherwise} \end{cases} \quad (2)

Binomial recombination selects parameters randomly from p^1 at a recombination rate σ, always selecting at least one parameter of p^1 at random; otherwise it uses the parameter from p^2. Exponential recombination selects all parameters from parent p^1 after a randomly chosen parameter, or until a random number is generated that is lower than the recombination rate, whichever comes first. In general, a new potential individual n_i(l+1) for a new population l+1 is generated from the i-th individual x_i(l) of the previous population l, and selected if its fitness, f(x), is greater than that of the previous individual:

x_i(l+1) = \begin{cases} n_i(l+1) & \text{if } f(n_i(l+1)) > f(x_i(l)) \\ x_i(l) & \text{otherwise} \end{cases} \quad (3)

The j-th parameter is calculated given p pairs of random individuals from the population l, where r(l)^0 \neq \ldots \neq r(l)^{2p}. θ, φ and σ are the user-defined parent scaling factor, recombination scaling factor and crossover rate, respectively. D is the number of parameters in the objective function, and b(l) is the best individual in population l. Two popular variants, used in this paper, are:

– de/best/p/bin:

n_i(l+1) = \mathrm{bin}\left(x_i(l),\; \theta\, b(l)_j + \phi \sum_{k=1}^{p} \left[r(l)^{1k}_j - r(l)^{2k}_j\right]\right) \quad (4)

– de/rand/p/bin:

n_i(l+1) = \mathrm{bin}\left(x_i(l),\; \theta\, r(l)^{0}_j + \phi \sum_{k=1}^{p} \left[r(l)^{1k}_j - r(l)^{2k}_j\right]\right) \quad (5)
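To make the recombination and selection rules concrete, the following Python sketch implements one synchronous generation of de/best/1/bin on a toy maximization problem. It is a minimal illustration, not MilkyWay@Home's asynchronous implementation: the sampled pair is not forced to be distinct from the current individual, and the parameter names theta, phi and sigma simply mirror the definitions above.

```python
import random

def bin_recomb(p1, p2, sigma):
    """Binomial recombination: take p1's parameter when r(0,1) < sigma
    or at one forced random index, otherwise take p2's parameter."""
    forced = random.randrange(len(p1))
    return [a if (i == forced or random.random() < sigma) else b
            for i, (a, b) in enumerate(zip(p1, p2))]

def de_best_1_bin_step(pop, f, theta=0.5, phi=0.5, sigma=0.5):
    """One synchronous generation of de/best/1/bin."""
    best = max(pop, key=f)
    out = []
    for x in pop:
        r1, r2 = random.sample(pop, 2)       # one pair of random individuals
        donor = [theta * b + phi * (a - c)   # scaled best plus scaled differential
                 for b, a, c in zip(best, r1, r2)]
        n = bin_recomb(x, donor, sigma)      # recombination
        out.append(n if f(n) > f(x) else x)  # greedy selection
    return out

# Maximize f(x) = -(x0^2 + x1^2), whose optimum is at the origin.
sphere = lambda x: -(x[0] ** 2 + x[1] ** 2)
pop = [[random.uniform(-10, 10) for _ in range(2)] for _ in range(20)]
for _ in range(100):
    pop = de_best_1_bin_step(pop, sphere)
```

Because the selection step only ever replaces an individual with a better one, the best fitness in the population is monotonically non-decreasing over generations.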
For more detail, Mezura-Montes et al. have studied many different variants of differential evolution on a broad range of test functions [10].

2.2 Particle Swarm Optimization
Particle swarm optimization was initially introduced by Kennedy and Eberhart [9,7] and is a population-based global optimization method inspired by biological swarm intelligence, such as bird flocking and fish schooling. This approach consists of a population of particles which “fly” through the search space based on their previous velocity, their individual best found position (cognitive intelligence) and the global best found position (social intelligence). Two user-defined constants, c_1 and c_2, allow modification of the balance between local (cognitive) and global (social) search. Later, an inertia weight ω was added to the method by Shi and Eberhart to balance the local and global search capability of PSO [12]; it is used in this work and by most modern PSO implementations. Recently, PSO has also been shown to be effective in peer-to-peer computing environments by Bánhelyi et al. [2]. The population of particles is updated iteratively as follows, where x is the position of the particle at iteration t, v is its velocity, p is the individual best for that particle, and g is the global best position:

v_i(t+1) = \omega\, v_i(t) + c_1\, \mathrm{rand}()\,(p_i - x_i(t)) + c_2\, \mathrm{rand}()\,(g_i - x_i(t)) \quad (6)
x_i(t+1) = x_i(t) + v_i(t+1)
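The velocity and position update of Eq. (6) can be sketched in a few lines of Python. This is a minimal single-machine illustration for a toy maximization problem; it omits the asynchronous, distributed aspects of the MilkyWay@Home implementation, and the function name pso_step is illustrative.

```python
import random

def pso_step(xs, vs, pbest, gbest, f, omega=0.6, c1=2.0, c2=2.0):
    """One iteration of the PSO update: inertia + cognitive + social
    velocity terms, then the position update; returns the new global best."""
    for i in range(len(xs)):
        vs[i] = [omega * v
                 + c1 * random.random() * (p - x)
                 + c2 * random.random() * (g - x)
                 for v, p, g, x in zip(vs[i], pbest[i], gbest, xs[i])]
        xs[i] = [x + v for x, v in zip(xs[i], vs[i])]
        if f(xs[i]) > f(pbest[i]):
            pbest[i] = list(xs[i])   # new individual best
    return max(pbest, key=f)         # new global best

# Maximize f(x) = -(x0^2 + x1^2).
sphere = lambda x: -sum(v * v for v in x)
xs = [[random.uniform(-5, 5) for _ in range(2)] for _ in range(10)]
vs = [[0.0, 0.0] for _ in range(10)]
pbest = [list(x) for x in xs]
gbest = max(pbest, key=sphere)
for _ in range(50):
    gbest = pso_step(xs, vs, pbest, gbest, sphere)
```

As with the individual bests, the global best is only ever replaced by a better position, so its fitness is monotonically non-decreasing.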
3 Validation Strategies
As computing systems grow larger, the potential for faulty or erroneous computing hosts increases. This is also a concern with applications that may occasionally return incorrect results. In the case of volunteer computing systems such as MilkyWay@Home, which are open to the public, there is always the risk of malicious users, or of bad or improperly configured hardware, returning false results. Additionally, as volunteer computing platforms such as BOINC [1] typically encourage participation by awarding credit for completed work and tracking the best participants, there is some incentive to cheat in order to be awarded more credit than deserved. Different approaches have been developed to perform validation on volunteer computing systems. The BOINC framework provides validation based on redundancy and reaching a quorum [1]. However, validating every work unit in an asynchronous search setting leads to a large amount of wasted computation.
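The redundancy-and-quorum idea can be sketched as a simple agreement check over the values reported by different hosts for the same work unit. This is an illustration only, not BOINC's validator API: real BOINC validators are project-supplied functions and typically compare floating-point results within a tolerance rather than exactly.

```python
from collections import Counter

def quorum_validate(reported, quorum=2):
    """Accept a reported fitness value once at least `quorum` hosts
    agree on it; return None while no value has reached the quorum."""
    if not reported:
        return None
    value, count = Counter(reported).most_common(1)[0]
    return value if count >= quorum else None
```

For example, `quorum_validate([-3.91, -3.91, -4.05])` accepts -3.91, while a single unconfirmed report yields no accepted value yet.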
BOINC also provides a strategy which uses a measure of trust. In this strategy, hosts become more trusted as they return results which validate, and lose trust as they return results that are invalid. Using this strategy, results from trusted hosts are assumed to be valid and only occasionally are their results validated to make sure they are still reporting correct results. This approach is unsuitable for EAs however, as a single invalid result that remains in the population can invalidate the entire search. Other work has examined strategies for dissuading participants in volunteer computing systems from cheating by detecting bad hosts [14,8], however these do not address the issue of ensuring correct results. The amount of validation required by EAs can be significantly reduced when compared to strategies which validate every result. This is because EAs only progress when new individuals are inserted into the populations. Individuals with lower fitness are simply discarded.
Table 1. The average number of individuals inserted into the population during the given number of evaluations, averaged over 20 searches with different initial parameters

Search      0–25,000 Evaluations    25,001–50,000 Evaluations
APSO        476                     208
ADE/Best    551                     221
Data from MilkyWay@Home’s logs has shown that only a small number of results ever make it into the population. Table 1 shows the average number of inserts over 20 different searches performed on MilkyWay@Home using data from Sagittarius stripe 22. The number of inserts for both the first and second 25,000 reported results is shown for asynchronous particle swarm optimization and for differential evolution with best parent selection. From this information it becomes apparent that only a small number of results ever make it into the population: less than 4% in the first half of the search and less than 2% in the second half. Additionally, as the searches progress it becomes more difficult to find new good search areas, and the number of evaluated individuals inserted into the population decreases. Data from MilkyWay@Home’s logs also shows that for a random sample of 500,000 results, only 2,609, or about 0.5%, were errors that could have been inserted into the population. By ignoring the erroneous results that would not be inserted into the population, even though they may hold a correct result which could potentially be useful, the amount of computation dedicated to validation can be decreased dramatically. However, it is important to note that the search will not progress until better individuals are inserted into its population, so the longer it takes to verify good results, the slower the search will progress; if too few resources are devoted to validation, the search may progress extremely slowly. Two different validation strategies have been implemented and tested: pessimistic and optimistic. Pessimistic validation assumes results are faulty and only uses validated results to generate new work (see Figure 2). While this ensures
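The fractions quoted above can be recomputed directly from Table 1 and the log sample:

```python
# Inserts per 25,000 reported results for each half of the search (Table 1),
# plus the observed rate of errors that could have entered the population.
halves = {"APSO": (476, 208), "ADE/Best": (551, 221)}
rates = {name: (first / 25000, second / 25000)
         for name, (first, second) in halves.items()}
error_rate = 2609 / 500000  # errors among a random sample of 500,000 results
```

This gives roughly 2% inserts in the first half, under 1% in the second half, and an error rate of about 0.5%, consistent with the bounds stated above.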
Fig. 1. The optimistic validation strategy. Results that improve the unvalidated population are used to generate new individuals as soon as they are received, and reverted to previously validated results if they are found to be invalid. When unvalidated results are validated, they are inserted into the validated population.
Fig. 2. The pessimistic validation strategy. Results are validated before they are used to generate new individuals.
that newly generated results are generated from valid individuals, progress in the search is delayed while waiting to validate reported individuals with good fitness. Optimistic validation assumes results are correct and uses the best known individuals to generate new work (see Figure 1). This allows the search to progress as fast as possible, however there is the chance that erroneous individuals will be used to generate new work until those erroneous results are found to be invalid.
These strategies were implemented using two populations: one consisting of validated results and the other of unvalidated results. For pessimistic validation, new individuals are generated by copying individuals from the unvalidated population for validation at a specified rate, or by performing DE or PSO on the validated population otherwise. When an individual in the unvalidated population is validated, it is inserted into the validated population. For optimistic validation, new individuals are also generated from the unvalidated population at the specified rate; however, DE and PSO are also performed on the unvalidated population for the other generated individuals. When an individual in the unvalidated population is found to be valid, it is inserted into the validated population. If an individual is found to be invalid, it is reverted to the previously validated result in the validated population.
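The two-population bookkeeping described above can be sketched as follows. The class and method names are illustrative, not taken from the MilkyWay@Home code; the sketch only captures the report/confirm/revert life cycle of a population slot under optimistic validation.

```python
class OptimisticPopulation:
    """Optimistic validation bookkeeping: the unvalidated population
    drives the search immediately, and an entry found to be invalid is
    reverted to the last validated result for that slot."""

    def __init__(self, size, f):
        self.f = f
        self.validated = [None] * size    # last validated individual per slot
        self.unvalidated = [None] * size  # best reported, possibly unverified

    def _fit(self, ind):
        return self.f(ind) if ind is not None else float("-inf")

    def report(self, slot, individual):
        # Optimistic: use an improving result as soon as it is reported.
        if self.f(individual) > self._fit(self.unvalidated[slot]):
            self.unvalidated[slot] = individual

    def confirm(self, slot):
        # Validation succeeded: promote into the validated population.
        self.validated[slot] = self.unvalidated[slot]

    def invalidate(self, slot):
        # Validation failed: revert to the previously validated result.
        self.unvalidated[slot] = self.validated[slot]
```

A pessimistic variant would instead only read from `validated` when generating new work, inserting into it exclusively via `confirm`.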
4 Results
The effects of optimistic and pessimistic validation were also tested using MilkyWay@Home while fitting a model of the Sagittarius dwarf galaxy on data acquired from Data Release 7 of the Sloan Digital Sky Survey. This problem involves calculating how well a model of three tidal streams of stars and a background function fits a 5-degree wedge of 100,789 observed stars, collected such that the wedge is perpendicular to the direction of the tidal stream’s motion (for more information about the astronomy and the fitness function, readers are referred to [4,5]). In total there are 20 real-valued parameters to be optimized in the objective function. This model is calculated by a wide variety of hosts: the fastest high-end double-precision GPUs can calculate the fitness in under two minutes, high-end CPUs require around an hour, and the slowest CPUs can take days. At the time these results were gathered, MilkyWay@Home had approximately 27,000 volunteered hosts participating in the experiments and a combined computing power of 1.55 petaflops (statistics taken from http://boincstats.com). Both particle swarm optimization and differential evolution were tested with a fixed population of 200 individuals. Particle swarm optimization used an inertia weight of ω = 0.6 and c_1 = c_2 = 2.0. Differential evolution used best parent selection and binomial recombination, i.e., de/best/1/bin, which has been shown to be a very robust and fast version of differential evolution [10], with a parent scaling factor, recombination scaling factor and crossover rate of θ = φ = σ = 0.5. Figures 3 and 4 compare optimistic and pessimistic validation for PSO and DE/best, respectively. Five searches were run for each validation rate as it was increased from 10% to 40%; the best validated fitness presented is the average of those five searches. For both DE/best and PSO, optimistic validation significantly outperformed pessimistic validation. Figure 5 compares the best validation strategies and validation rates. For optimistic validation, DE/best found the best solutions with a validation rate of
Fig. 3. Comparison of the progress of particle swarm optimization on MilkyWay@Home while optimizing the model for Sagittarius Stripe 22 with different validation rates for optimistic and pessimistic validation
Fig. 4. Comparison of the progress of differential evolution with best parent selection on MilkyWay@Home while optimizing the model for Sagittarius Stripe 22 with different validation rates for optimistic and pessimistic validation
Fig. 5. Comparison of the best-performing validation strategies and validation rates
20%, while PSO found the best solutions with a validation rate of 40%. For pessimistic validation, both PSO and DE/best found the best solutions on average with a 30% validation rate. While using optimistic validation significantly improved the convergence rate of the optimization methods used, it also reduced the effect of the validation rate on their convergence. For pessimistic validation, changing the validation rate had a large effect on the search convergence rate, while this effect was significantly reduced for optimistic validation, almost to the point where the differences could be attributed to noise. However, it is still very interesting to note that the higher validation rates still outperformed lower validation rates.
5 Discussion
The requirement of high validation rates for pessimistic validation leads to the conclusion that it is very important to be able to quickly use newly found good individuals in the generation of new individuals to keep the search progressing quickly. However, the fact that optimistic validation also requires high validation rates, in spite of using results for the generation of new individuals as soon as they are reported, suggests that even a small number of failures, if not invalidated quickly, can have very negative effects on the performance of the search. This negative effect could have a few causes. First, the effect of invalid individuals in the search population may be extremely negative, considering
the validation rates required when only 0.5% of reported results are invalid. Another possibility is that with optimistic validation there could be a large delay between the use of high-fitness individuals and their insertion into the validated population. If such a high-fitness unvalidated individual is later found to be invalid, it could be reverted to a very old validated one, which could significantly degrade the performance of the search. Yet another possibility involves the types of errors that occur in the system. For both optimization methods used, DE/best and PSO, the heuristic for generating new individuals always uses the best individual in the population. If the majority of errors have fitness better than the best individual, these will corrupt all newly generated individuals until they are found to be invalid, which could also help explain the negative effects of not validating individuals quickly enough. For future work, in order to further examine increasing the efficiency of validation of evolutionary algorithms, it would be interesting to measure the effect of errors in a controlled environment. Using the simulation framework developed in previous work [6], it would be possible to simulate errors (both those that are better than the best individual in the population, and those that only make a single individual invalid) and analyze their effects for different benchmark optimization problems. It would also be interesting to study what rate of errors causes pessimistic validation to outperform optimistic validation. Another area of potential research is to combine the validation strategies presented in this work with those available in BOINC.
Instead of keeping two populations of validated and unvalidated individuals and generating individuals for verification by copying them from the unvalidated population, the BOINC software could be used to handle validation by quorum on individuals which could potentially be inserted into the validated population. This would ensure that any potentially improving result is validated, so that when an individual in the unvalidated population is found to be invalid, it rolls back only to the last best individual found. However, this approach can lead to slower validation of results and has the potential to use a significant amount of memory and disk space, as the number of results awaiting validation can grow unbounded; those effects must be studied as well. Another potential area of interest is that hyper-heuristics have recently been shown to be effective in distributed computing scenarios [3]. Using meta- or hyper-heuristics to automatically tune not only the parameters involved in the optimization methods, but also the validation rate, could further improve convergence times. The results presented show that optimistic validation can be a very effective strategy for improving the convergence rates of evolutionary algorithms on volunteer computing grids, without resorting to validating every result, as can be required for other algorithms. These strategies have been used on a live volunteer computing grid with over 27,000 active volunteered computing hosts and have provided effective and efficient validation for the MilkyWay@Home project.
Acknowledgements Special thanks go to the Marvin Clan, David Glogau, and the Dudley Observatory for their generous donations to the MilkyWay@Home project, as well as the thousands of volunteers that made this work a possibility. This work has been partially supported by the National Science Foundation under Grant Numbers 0612213, 0607618, 0448407 and 0947637. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
References
1. Anderson, D.P., Korpela, E., Walton, R.: High-performance task distribution for volunteer computing. In: e-Science, pp. 196–203. IEEE Computer Society, Los Alamitos (2005)
2. Bánhelyi, B., Biazzini, M., Montresor, A., Jelasity, M.: Peer-to-peer optimization in large unreliable networks with branch-and-bound and particle swarms. In: Giacobini, M., Brabazon, A., Cagnoni, S., Di Caro, G.A., Ekárt, A., Esparcia-Alcázar, A.I., Farooq, M., Fink, A., Machado, P. (eds.) EvoCOMNET. LNCS, vol. 5484, pp. 87–92. Springer, Heidelberg (2009)
3. Biazzini, M., Bánhelyi, B., Montresor, A., Jelasity, M.: Distributed hyper-heuristics for real parameter optimization. In: GECCO 2009: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, pp. 1339–1346. ACM, New York (2009)
4. Cole, N.: Maximum Likelihood Fitting of Tidal Streams with Application to the Sagittarius Dwarf Tidal Tails. PhD thesis, Rensselaer Polytechnic Institute (2009)
5. Cole, N., Newberg, H., Magdon-Ismail, M., Desell, T., Dawsey, K., Hayashi, W., Purnell, J., Szymanski, B., Varela, C.A., Willett, B., Wisniewski, J.: Maximum likelihood fitting of tidal streams with application to the Sagittarius dwarf tidal tails. Astrophysical Journal 683, 750–766 (2008)
6. Desell, T.: Asynchronous Global Optimization for Massive Scale Computing. PhD thesis, Rensselaer Polytechnic Institute (2009)
7. Eberhart, R.C., Kennedy, J.: A new optimizer using particle swarm theory. In: Sixth International Symposium on Micromachine and Human Science, pp. 33–43 (1995)
8. Golle, P., Mironov, I.: Uncheatable distributed computations. In: Naccache, D. (ed.) CT-RSA 2001. LNCS, vol. 2020, pp. 425–440. Springer, Heidelberg (2001)
9. Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: IEEE International Conference on Neural Networks, vol. 4, pp. 1942–1948 (1995)
10. Mezura-Montes, E., Velázquez-Reyes, J., Coello, C.A.C.: A comparative study of differential evolution variants for global optimization. In: Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, pp. 485–492 (2006)
11. Schwefel, H.-P.: Evolution and Optimum Seeking. John Wiley & Sons, New York (1995)
12. Shi, Y., Eberhart, R.C.: A modified particle swarm optimizer. In: IEEE World Congress on Computational Intelligence, pp. 69–73 (May 1998)
13. Storn, R., Price, K.: Minimizing the real functions of the ICEC 1996 contest by differential evolution. In: Proceedings of the IEEE International Conference on Evolutionary Computation, Nagoya, Japan, pp. 842–844 (1996)
14. Szajda, D., Lawson, B., Owen, J.: Hardening functions for large scale distributed computations. In: IEEE Symposium on Security and Privacy, p. 216 (2003)
15. Vesterstrom, J., Thomsen, R.: A comparative study of differential evolution, particle swarm optimization, and evolutionary algorithms on numerical benchmark problems. In: Congress on Evolutionary Computation (CEC 2004), vol. 2, pp. 1980–1987 (June 2004)
A Reconfiguration Language for Virtualized Grid Infrastructures

Rémy Pottier, Marc Léger, and Jean-Marc Menaud

Ascola (EMN/INRIA, LINA), École des Mines de Nantes, 4 rue Alfred Kastler, 44307 Nantes, France
[email protected]
Abstract. The growing need for computational power to serve the increasing number of on-line services and the growing complexity of applications makes it necessary to build corresponding hardware infrastructures and to share distributed hardware and software resources through grid computing. To help optimize resource utilization, system virtualization is an increasingly adopted technique in data centers. However, this software layer adds to the administration complexity of servers, and it requires specific management tools to deal with hypervisor functionalities such as live migration. To address this problem, we propose VMScript, a domain-specific language for the administration of virtualized grid infrastructures. This language relies on set manipulation; it is used to introspect physical and virtual grid architectures through query expressions and, notably, to modify VM placement on machines.
1
Introduction
Data centers are among the most important Internet components (along with access points and networks). These infrastructures are used most of the time to host online services. Traditional data centers typically host a large number of relatively small-sized applications, together with hardware and software for multiple organizational units or even different companies, whereas a traditional cluster belongs to a single organization, has a relatively homogeneous hardware and system software platform, and shares a common systems-management layer. From an architectural perspective, a data center is thus closer to a grid (a federation of clusters) than to a single cluster. That is why, as for grids, the physical administration of such an infrastructure is a real challenge, both for monitoring and for basic administrative operations (shutdown, reboot, etc.). From a software perspective, virtualization [8] has spread through data centers. It improves efficiency of resource utilization and flexibility of application execution. In this approach, each hosted small-sized application runs in a
This work is partially funded by the SelfXL ANR/ARPEGE project (http://selfxl.gforge.inria.fr/dokuwiki/doku.php).
F. Eliassen and R. Kapitza (Eds.): DAIS 2010, LNCS 6115, pp. 42–55, 2010. c IFIP International Federation for Information Processing 2010
virtual machine. Virtual machines (VMs) can thus be used to consolidate the workloads of under-utilized servers onto fewer physical machines so as to save on hardware and power consumption [10]. However, virtualization, as a new abstraction layer, adds complexity for data center administrators. While administrators agree on the benefits of virtualization for reducing costs and improving flexibility, most also recognize that it makes administration more complex and error-prone. That is why many companies are adapting their management tools and instrumentation to the needs of virtualized environments. One of the new issues raised by administrators is the fine-grained management of VMs. In particular, administrators want to express complex queries on resources (introspection) and to manipulate elements (intercession). Low-level APIs (e.g., the Xen API [4]) provide primitive operations on VMs, such as instantiation, shutdown, and static or live migration, but no complex operations on sets of elements. Manipulating collections of resources is then done by invoking these APIs from general-purpose or scripting languages, which are not necessarily adapted in terms of concision and precision of syntax. Our proposition relies on several domain-specific languages (DSLs) for grid administration. First of all, we define a model for describing grid resources. In this model, a grid user defines a task as a set of VMs called a virtual job, which can execute on the grid. A grid is then modeled as a graph of physical elements, like machines, racks, or clusters, and logical elements, including VMs, virtual jobs, and users. We define description languages for administrators to represent grid physical architectures and virtual organizations as sets of users. Another language allows users to describe the virtual jobs they want to submit to the grid.
In addition to these description languages, another DSL is used to manage the resources previously described, i.e., to navigate in the grid physical and logical architecture and select elements with given properties. This last language is also used to execute reconfiguration operations on grid elements, like VM migrations [5]. This paper is organized as follows. Section 2 presents related work on languages for grid management. Section 3 presents our model for representing resources in a grid architecture and the description languages used to build these architectures and virtual jobs. Section 4 describes a domain-specific language for grid management based on selection, navigation, and dynamic reconfiguration in grid architectures. Our DSL approach to grid administration is evaluated in Section 5 before concluding in Section 6.
2
Related Work
We consider two categories of work related to grid administration and DSLs: those involving basic operations and those concerning language aspects. Basic operations: shells and APIs. Basic management operations are performed by using hypervisor APIs, which allow the manipulation of VMs (instantiation, migration, destruction, etc.). As each hypervisor (Xen [4], KVM [11], etc.) has its own
API, the libvirt API¹ may be used to manage different virtualization solutions through a common interface. Above these APIs, shells unify the most common management operations so that administrators can manage a server in the same way whatever the hypervisor is. Shell (e.g., Usher [13]) and API approaches are designed for local management of a given server, but administrators need information about the whole grid to manage grid resources. To fill this gap, some virtual machine managers [15] (e.g., Virtual Machine Manager²) offer an overview of the grid with real-time monitoring. These tools help administrators manage all grid elements through a common interface, sometimes a graphical user interface, whatever the hypervisor is. However, they offer only limited operations in terms of resource queries and complex reconfigurations. Domain-specific languages. Language approaches address description and reservation in the grid context. A grid description may be divided into the description of grid resources and the description of how to use these resources (that is to say, job descriptions). The Job Submission Description Language (JSDL) [3] is an XML-based language for describing a job and its needs (resources and applications). For grid architecture description, VXDL [9] is a language for specifying and modeling virtual resources and their interconnection networks. It describes virtual infrastructures, especially virtual networks, and can query the model about the network topology. Other specific languages have been proposed to let users obtain resources and use them. ClassAd [14] and xRSL [2] are declarative languages based on attribute-value pairs. Language keywords identify properties on which users can make a selection. Users describe the resources required (network, disk, memory, etc.) and how to use them. SWORD [1] is a framework which collects grid monitoring information into a database and provides a query language for selecting and ranking required resources.
These languages address grid resource utilization and grid description, but not administrative tasks. Our aim is to overcome the limitations of these tools and languages by proposing an approach based on several domain-specific languages for both describing and managing (observation and reconfiguration) resources in grids.
3
Specification of Grid Architectures
We distinguish two kinds of actors: administrators and users. Administrators configure servers and networks and, with virtualization, define virtual machine placement. Users submit their jobs and manage them without explicitly choosing specific servers. Each user describes the resources necessary for his job. In our case, grid resources are modeled by virtual machine requirements, as in [7]. A job is composed of a set of virtual machines which will be executed on the infrastructure; it is then called a virtual job or vjob (called a lease in other work [16]). Each user belongs to a virtual organization (VO).
¹ http://libvirt.org/html/libvirt-libvirt.html
² http://virt-manager.et.redhat.com/
3.1
Life Cycle in Grid Management
Figure 1 describes the classical life cycle in a grid, from user vjob submission to the administrator's maintenance operations. First, after the user has specified his vjob, he submits it to the vjob configuration parser. This parser builds a vjob with the appropriate number of virtual machines. Then, this vjob is submitted to the management system. If the submission succeeds, all virtual machines of the vjob are placed in the grid.
Fig. 1. Global architecture of the grid framework
The management system is used to build a grid representation, to modify it, and to query it. It knows all elements of the grid representation and ensures some desirable properties such as uniqueness (e.g., a unique IP per machine). The management system also checks that grid elements are correctly and completely configured. For example, the virtual machine memory is essential to place a VM on a server. Moreover, the management system is able to place VMs in the grid with respect to their needs; if no placement is found for a VM, it is rejected. The grid representation is built from a full description of the grid supplied by an administrator. This representation is a structural model of the grid resources (discussed in Section 3.2) and it is tied to the real grid by a monitoring system (in our case Ganglia [12]). The monitoring system checks grid representation information to ensure consistency between the real grid and its representation. This causal connection is only maintained for servers and VMs in our framework, because the monitoring system does not give information about other elements, like the cluster organization. So if an unexpected event happens, for example an element disappears, the monitoring system detects it and the grid representation is updated accordingly.
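The reconciliation between representation and monitored reality can be pictured with a short Python sketch. It is not the paper's implementation (which relies on Ganglia and Java); the function and variable names here are purely illustrative.

```python
# Sketch (illustrative naming): reconcile the grid representation with
# what the monitoring system actually reports, dropping elements that
# disappeared from the real grid.
representation = {'pastel-1': {'state': 'on'}, 'pastel-2': {'state': 'on'}}

def reconcile(representation, monitored_hosts):
    """Remove representation entries for machines the monitor no longer sees."""
    for host in list(representation):
        if host not in monitored_hosts:
            del representation[host]   # element disappeared: update the model

# The monitor only sees pastel-1; pastel-2 is removed from the representation.
reconcile(representation, monitored_hosts={'pastel-1'})
print(sorted(representation))  # -> ['pastel-1']
```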
46
R. Pottier, M. L´eger, and J.-M. Menaud
Several verifications are performed on this description by the grid configuration parser in collaboration with the monitoring system and the management system. First, the parser checks the structure of the grid by comparing the description against the grid model. Second, information from the grid description is compared to monitoring system information. Once a representation is built, the administrator can perform grid reconfigurations, like adding a server, in the management console. He may write a sequence of operations in a script executed in the management console, for example migrating a virtual machine and then shutting down a server.

3.2
A Grid Model
Our model of grids (Figure 2) is conceived as a multi-graph with labeled nodes and arcs. In this graph, the nodes correspond to grid elements with properties and operations, and the arcs represent relations between these elements. The graph is navigable through bidirectional relations. This model is one particular view of what a grid is, but it can be adjusted to describe other kinds of grid organizations. It is composed of two kinds of elements: physical elements and logical elements.
Fig. 2. A grid model
A physical element is basically a server container. The smaller the container is, the more accurate the location information of the server is. Servers are identified by the more generic term machine. Some node properties are mandatory and must be initialized to enable grid management. Optional properties, like the operating system, allow administrators to simplify grid management. A machine is placed into a rack at a specific level. A set of racks composes a cluster which
itself belongs to a site. A site is a general term designating a set of clusters; it can represent either simply a room where servers are located or a city with several data centers. The main element of the logical view is the virtual machine (VM). In a vjob defined by several VMs, we can create special groups of VMs called VMSets. For instance, a VMSet can be used to group all VMs hosting servers for a given tier in a 3-tier application. Each vjob is linked to its owner, represented by a User element. A virtual organization (VO) is a set of users who can connect to the grid. The physical and logical views are linked by the hosting relation between machines and VMs. An important property is the life-cycle state of machines, VMs and vjobs. These states represent the current element life cycle and make it possible to restrict the execution of some operations. The machine life cycle is a trivial two-state automaton with on and off states. The VM life cycle consists of five states: uninitialized before some mandatory properties are configured, initialized, started, suspended and stopped. As we only consider live migration operations on VMs, the VM state remains running during migration. A vjob is a composition of VMs, so a vjob has the same life cycle as a VM. Several description languages are provided to specify grid architectures. These languages are based on XML and XML Schema, so that descriptions conform to our grid model and its mandatory properties. The first language is used by administrators to describe the physical architecture of grids. This description is used by the grid configuration parser to build the physical representation of the grid.
<site name="EMN" city="Nantes">
  <cluster name="Xen">
    <level number="2">
      <machine hostname="pastel-1.b217.home" ip="192.168.0.107"
               mac="00:21:70:25:55:b0">
        <memory capacity="4000"/>
      </machine>
    </level>
    <level number="3">
      <machine hostname="pastel-2.b217.home" ip="dhcp"
               mac="00:21:70:25:55:b1">
        ...
Example of a partial description of a physical grid architecture
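To make the graph model concrete, the following Python sketch mirrors the structure described above (physical containment from site down to machine, the hosting relation to VMs, and per-element life-cycle states). The class and attribute names are ours, not the paper's Java API.

```python
# Sketch of the grid model (illustrative naming): physical containers
# (site > cluster > rack > machine) and logical elements (VMs), linked
# by a hosting relation, each carrying a life-cycle state.
class Element:
    def __init__(self, kind, name, state='off'):
        self.kind, self.name, self.state = kind, name, state
        self.children = []   # containment: site -> cluster -> rack -> machine
        self.hosts = []      # hosting relation: machine -> VMs

site = Element('site', 'EMN')
cluster = Element('cluster', 'Xen')
rack = Element('rack', 'b217')
machine = Element('machine', 'pastel-1.b217.home', state='on')
site.children.append(cluster)
cluster.children.append(rack)
rack.children.append(machine)

# A VM moves through the five-state life cycle described in the text.
vm = Element('vm', 'vm-1', state='initialized')
machine.hosts.append(vm)
vm.state = 'started'

print(machine.name, [v.state for v in machine.hosts])
# -> pastel-1.b217.home ['started']
```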
As different actors handle the logical view differently, there are two languages to describe it. The first one allows an administrator to link users with a grid by specifying VOs. The second one is used by grid users to define their vjobs to be submitted to the grid.
48
R. Pottier, M. L´eger, and J.-M. Menaud
  ...
  </vmset>
</vjob>
Example of a vjob description
4
A Domain Specific Language for Grid Management
Once our grid model has been defined, administrators and users manage resources by navigating and selecting elements in grid physical and logical architectures and by dynamically reconfiguring these architectures. This section describes VMScript, a domain-specific language for grid management, i.e., introspection and intercession on grid elements. This language is inspired by previous work on a reconfiguration language for component-based architectures called FScript [6]. Our language is divided into two parts, for introspection and reconfiguration respectively. The introspection language, named VMPath, is used to express queries on grid architectures. The reconfiguration language, VMScript, allows the execution of dynamic reconfiguration operations on grids and is a superset of VMPath.

4.1
Selection and Navigation in Grid Architectures
A grid configuration (or architecture) is defined as a labeled directed multigraph. To query these architectures, VMPath is used as a side-effect-free declarative language. It is restricted to navigation in grid architectures and the selection of grid elements by their location or their properties; therefore, the execution of a VMPath expression cannot lead to modifications of the grid. VMPath syntax. The language has a very concise but powerful syntax based on XPath 1.0 [17], the W3C standard query language for XML documents. Several arguments are in favor of this choice: – XPath does not depend on the specific syntax of XML documents, so it can be used on abstract graph models such as our grid model. Actually, XPath only defines concepts of nodes, properties and relations between nodes. – The syntax is open and flexible. Although XPath specifies a fixed set of nodes and relations (XPath axes) to query XML documents, it is possible to define new types of nodes and relations. Our grid model does not use XPath's base XML axes (child, attribute, etc.) but defines its own navigation axes.
– The syntax is concise and readable: XPath allows one-line queries. Moreover, XPath defines a node-set data type which allows powerful set queries with set operations. Despite all these advantages, VMPath does not rely on existing XPath implementations because those implementations are too tied to XML representations. The generic syntax of a VMPath expression consists of a sequence of steps separated by slashes (cf. Figure 3). A step is composed of an axis specifier, which indicates the arc to follow in the graph for navigation, and a set of optional predicates to filter the selected nodes. There is no intrinsic notion of hierarchy in navigation, so a navigation axis does not necessarily represent a hierarchical relation between elements. The beginning of the expression, $grid, refers to the initial node set used in the query. This node set is stored in a VMPath variable and denotes in this case a grid element. The navigation axis used in the expression is the site axis, which selects all site nodes belonging to the grid. This set of sites is then filtered by a predicate that selects only the sites located in Paris. The '@' symbol is used to query the value of the location property.
$grid/site[@location == "Paris"]/...
(initial node set · axis specifier · optional filtering predicates · additional steps)

Fig. 3. Syntax of query expressions
VMPath is a dynamically typed language (type checking is performed at runtime). The four primitive data types defined are the same as in XPath 1.0: node-set, string, number, and boolean. As there is no notion of attribute nodes, a special type multi-set has been added to deal with multi-sets of primitive types. The VMPath language supports the classic arithmetic, boolean and comparison operators and also set operators (union, intersection and difference). Functions in VMPath are side-effect-free procedures. A library of predefined functions is provided with the language. These functions are essentially:
– property accessors to get values of node properties (e.g., 'name()' to get the name of a node). The '@' notation before a property name is strictly equivalent to the accessor function on the property (e.g., '@name'). It should be noted that these functions can be applied to a set of elements. For instance, 'name($set)' would return a multi-set of strings corresponding to all the names of the elements contained in the set '$set'.
– functions for string manipulation (e.g., 'concat()' for string concatenation, 'match()' to test the matching of a string against a regular expression).
– aggregation functions on element collections: 'size()' (returns the cardinality of a set), 'sum()' (returns the sum of a number set), etc.
VMPath examples. VMPath can be used to express a wide range of queries on grid architectures; some examples are presented below. A selection of all racks in a grid is performed with the following expression:

$grid/site/cluster/rack
A shortcut navigation axis is usable when there is no ambiguous path in the graph to reach the wanted nodes. For instance, the previous expression could be expressed with a shortcut axis as follows:

$grid//rack
Grid elements can be selected by the value of their properties. For example, we may want to find the rack which contains a machine with a specific IP address in a cluster:

$cluster//machine[@ip == '192.168.110.36']//rack
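The evaluation model behind such step/predicate queries can be sketched in a few lines of Python. This is not the VMScript implementation (which is written in Java); the `Node` and `step` names are ours, and the node and property names are taken from the examples above.

```python
# Minimal sketch of VMPath-style evaluation: nodes carry properties and
# named relations; one step follows an axis and filters with a predicate.
class Node:
    def __init__(self, kind, **props):
        self.kind = kind
        self.props = props    # e.g. {'ip': '192.168.110.36'}
        self.rels = {}        # axis name -> list of target nodes

    def add(self, axis, target, reverse=None):
        self.rels.setdefault(axis, []).append(target)
        if reverse:           # relations are bidirectional in the model
            target.rels.setdefault(reverse, []).append(self)

def step(nodes, axis, pred=lambda n: True):
    """One query step: follow `axis` from every node, keep matching targets."""
    out = []
    for n in nodes:
        out.extend(t for t in n.rels.get(axis, []) if pred(t))
    return out

# Build a tiny grid fragment: cluster -> rack -> machine (with back-links).
cluster = Node('cluster', name='Xen')
rack = Node('rack', number=1)
machine = Node('machine', ip='192.168.110.36')
cluster.add('rack', rack, reverse='cluster')
rack.add('machine', machine, reverse='rack')

# $cluster/rack/machine[@ip == '192.168.110.36'] ...
hits = step(step([cluster], 'rack'), 'machine',
            lambda n: n.props.get('ip') == '192.168.110.36')
# ... then navigate back to the enclosing rack via the bidirectional relation.
racks = step(hits, 'rack')
print(len(hits), len(racks))  # -> 1 1
```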
4.2
Dynamic Reconfiguration of Grids
The VMPath query language is embedded in another DSL focusing on the dynamic reconfiguration of grids: VMScript. VMPath expressions are used to select the grid elements to reconfigure. VMScript is an imperative language providing procedures and control structures so as to program grid reconfiguration scripts. Procedures. VMScript makes the distinction between two kinds of procedures: functions and actions. Functions are side-effect-free procedures used only for grid introspection, whereas actions are intercession procedures that actually modify grid configurations. A primitive action in our model is a primitive graph transformation of a grid representation, as listed below:
– Addition or removal of a node. For instance, to add or remove a cluster node in the graph, we could use respectively the following procedures:

new-cluster();
delete-cluster($cluster);
– Addition or removal of a relation between nodes. For instance, these two procedures respectively add and remove a rack relation between a cluster and a rack, i.e., add and remove a rack in a cluster:

add-rack($cluster, $rack);
remove-rack($cluster, $rack);
– Modification of the value of a node property. To change the name of a grid, the following setter is applied:

set-name($grid, "mygrid");
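The flavor of such model-derived primitives can be mimicked in plain Python: given a declarative description of node kinds, relations, and properties, the corresponding add-/remove-/set- actions are generated mechanically. All names in this sketch (`MODEL`, `generate_actions`) are ours, purely illustrative.

```python
# Illustrative sketch: generating primitive graph-transformation actions
# (new-*, add-*, remove-*, set-*) from a declarative model description.
MODEL = {
    'cluster': {'relations': ['rack'], 'properties': ['name']},
    'rack':    {'relations': ['machine'], 'properties': ['number']},
}

def generate_actions(model):
    actions = {}
    for kind, spec in model.items():
        # node creation (deletion omitted for brevity)
        actions[f'new-{kind}'] = (lambda k=kind: {'kind': k, 'rels': {}, 'props': {}})
        for rel in spec['relations']:
            def add(parent, child, r=rel):
                parent['rels'].setdefault(r, []).append(child)
            def remove(parent, child, r=rel):
                parent['rels'][r].remove(child)
            actions[f'add-{rel}'] = add
            actions[f'remove-{rel}'] = remove
        for prop in spec['properties']:
            def setter(node, value, p=prop):
                node['props'][p] = value
            actions[f'set-{prop}'] = setter
    return actions

actions = generate_actions(MODEL)
cluster = actions['new-cluster']()
rack = actions['new-rack']()
actions['add-rack'](cluster, rack)
actions['set-name'](cluster, 'mygrid')
print(cluster['props']['name'], len(cluster['rels']['rack']))  # -> mygrid 1
```

If the model description changes, regenerating the action table transparently picks up the new kinds and relations, which is the property the paper relies on.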
All these primitive actions are automatically generated from the description of our grid model so that possible modifications in the model are transparently taken into account. Native procedures like primitive actions are implemented in Java, the implementation language of the VMScript interpreter. However it is possible to define procedures directly in the VMScript language. These userdefined procedures can be loaded at any time in the interpreter. A VMScript procedure is specified by means of the f unction or action keywords:
function isEmpty(set) {
  return size($set) == 0;
}

action migrate-vjob(vjob, dest) {
  for $vm : $vjob/vm {
    migrate($vm, $dest);
  }
}
The first procedure is a function which returns true if a set is empty and false otherwise. The second procedure is an action which takes a virtual job and a machine as arguments; it consolidates the vjob on the destination machine by migrating all of its VMs there. These two procedures are part of a standard library. This library contains utility procedures which are loaded when the interpreter is instantiated. Control structures. VMScript supports classic control structures in addition to the sequencing of instructions. New variables can be created by assigning them an initial value. Variables are mutable and their scope is defined by the block where they are declared. In the following example, 'grid' is a global variable since it is defined outside any block. It is initialized with a grid node built from the configuration file mygrid.xml describing a grid architecture. A declared variable is then referenced by means of the '$' symbol.

grid = adlnew-grid("mygrid.xml");
echo($grid);
The conditional if-then-else uses the standard C syntax. The following example tests whether the memory capacity of a machine is above a threshold. If the test evaluates to true, it adds a VM to the machine; otherwise it prints a message to the standard output.

if ($machine@mem_cap >= 2000) {
  add($vm, $machine);
} else {
  echo("Not enough memory.");
}
Iteration is restricted to finite sets with a for loop. This limitation prevents programming infinite loops and non-terminating scripts. The execution semantics of an iteration is to iterate sequentially over every element in the set. For example, the following code iterates over every machine in the grid which does not host any VM.

for $m : $grid//machine[size(./vm) == 0] {
  shutdown($m);
}
Some native actions are defined to take either a single value or a set as argument. In the latter case, the primitive action is executed in parallel on each element of the set. For instance, the previous example could be executed in parallel with the same action but without explicit iteration:

shutdown($grid//machine[size(./vm) == 0]);
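This set-lifting of actions can be sketched as follows, here using a thread pool to stand in for the parallel execution; the `lift` name and the dummy `shutdown` are ours, purely illustrative.

```python
# Sketch (illustrative naming): lifting a single-element action so that it
# also accepts a set, executing over the elements in parallel.
from concurrent.futures import ThreadPoolExecutor

def lift(action):
    def lifted(arg):
        if isinstance(arg, (list, set, tuple)):
            with ThreadPoolExecutor() as pool:
                list(pool.map(action, arg))   # one parallel call per element
        else:
            action(arg)                        # plain single-element call
    return lifted

stopped = []
shutdown = lift(lambda machine: stopped.append(machine))

shutdown('m1')                # single value
shutdown(['m2', 'm3', 'm4'])  # whole set, no explicit iteration
print(sorted(stopped))        # -> ['m1', 'm2', 'm3', 'm4']
```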
An explicit return statement stops the execution of a procedure and returns to the caller, possibly with a value.
function cpu-cons-average(machines) {
  return average($machines@cpu_cons);
}
Execution model. Primitive actions defined in VMScript are directly mapped onto operations in the grid model. For instance, a migrate-vm action on a VM node corresponds to a migrate Java method on a VM object of our model API. As previously mentioned, and thanks to the causal connection between the grid and its representation, executing an operation in the model amounts to executing operations on real machines and VMs through SSH and calls to native APIs (e.g., the Xen API for Xen VMs). VMScript code is executed by an interpreter programmed in Java which can be embedded in applications. Furthermore, an interactive console is provided so as to interactively execute queries on a grid and run reconfiguration scripts.
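The routing from a model-level action to a hypervisor-specific backend can be pictured as a simple dispatch. The class and function names below are ours (not the paper's Java API), and the generated command strings are merely representative of what would be sent over SSH.

```python
# Illustrative dispatch sketch: a model-level migrate action is routed to
# the driver of the hypervisor running on the source machine, which builds
# the concrete command to execute remotely.
class XenDriver:
    def migrate(self, vm, dest):
        return f"xm migrate --live {vm} {dest}"

class KvmDriver:
    def migrate(self, vm, dest):
        return f"virsh migrate --live {vm} qemu+ssh://{dest}/system"

DRIVERS = {'xen': XenDriver(), 'kvm': KvmDriver()}

def migrate_vm(vm, source_hypervisor, dest):
    """Model-level migrate action: pick the backend, build the command."""
    return DRIVERS[source_hypervisor].migrate(vm, dest)

print(migrate_vm('vm-12', 'xen', 'pastel-2'))
# -> xm migrate --live vm-12 pastel-2
```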
5
Evaluation
In this section we first show the expressiveness of the VMScript language by comparing it to a general-purpose scripting language linked to a VM API. We then present several use cases in grid management that we experimented with VMScript.

5.1
Comparison with a General Purpose Language
In this section, we compare an action written in VMScript with the same action written in the Bash scripting language. The purpose of the example is to shut down machines in order to perform some maintenance tasks on hardware (for example, changing a hard disk). We want to select machines by their CPU capacity and their kernel version; if these machines do not host VMs, we shut them down. For this experiment, all machines are located in the same rack. The machines run a Linux operating system with different kernel versions and different CPU and memory capacities, and each machine runs a Xen (v3.2.1) hypervisor. From a UNIX shell, we get the CPU capacity by reading the /proc/cpuinfo file and keeping only the value of the cpu MHz metric. We obtain the kernel version with the command uname -r. To check that no VM is hosted on the machine, the Xen API is invoked to list all virtual machines hosted on a node.
for machine in $*; do
  kernel=$(ssh root@$machine uname -r)
  cpu=$(ssh root@$machine cat /proc/cpuinfo | grep 'cpu MHz' | head -n 1 | sed 's/[^0-9.]//g')
  if [ "$cpu" = "2000" -a "$kernel" = "2.6.26-1-xen-amd64" ]; then
    if [ -z "$(ssh root@$machine xm list | sed '1d' | sed '1d')" ]; then
      ssh root@$machine halt
    fi
  fi
done

Action written in a Bash script
From the VMScript console, we set a variable 'rack' to the rack to analyze. We then select all machines of this rack with the desired cpu_cap property. For the kernel version, the optional property os_dist is used. We query the grid representation with the function size() to check that no VM is running on a machine.
shutdown($rack/machine[@cpu_cap == 2000][@os_dist == '2.6.26-1-xen-amd64'][size(vm) == 0]);
Action written in VMScript
We can see the benefits of using the VMScript DSL:
– its concision: the VMScript action takes a single line versus nine lines of code in Bash;
– its homogeneity and genericity: it is not necessary to invoke a specific hypervisor API in the code;
– its guarantees: the shutdown action in VMScript has a precondition checking that no VM is actually hosted on a machine before shutting it down.

5.2
Some Common Use Cases in Grid Management
Some samples of VMScript code are given below to exemplify the use of the language for grid management. The action keep-min-nodes is used to ensure that a given number of machines is started so that they can easily host new VMs.

action keep-min-nodes(grid, nb) {
  for s : $grid//site {
    no_vm = $s//machine[size(vm) == 0];
    on = size($no_vm);
    if ($on > $nb) {
      shutdown(subset($no_vm, $on - $nb));
    } else if ($on < $nb) {
      power-on(subset($s//machine[@state == 'off'], $nb - $on));
    }
  }
}
The following action selects the servers running a Xen hypervisor, uploads a new Xen configuration file to them, and restarts the Xen daemon.

action hypervisors-config(grid, hypervisor, filePath) {
  xen = $grid//machine[@hypervisor == 'xen' && @os_family == 'Linux'];
  put-file($xen, filePath);
  execute-command($xen, '/etc/init.d/xend restart');
}
In the next action, we add a new server to the physical architecture and migrate the most memory-demanding virtual machine of the most loaded server onto it, in order to free some memory.
action new-machine(grid, hypervisor, filePath) {
  add-elements('new-server.xml');
  new-server = $grid//machine[@name == 'pastel-90'];
  server1 = $grid//machine[@mem_free == min($grid//machine@mem_free)];
  migrate($server1/vm[@mem_need == max(./machine/vm@mem_need)], $new-server);
}
6
Conclusion
Managing a virtualized grid infrastructure is a hard task, and tools are required to help administrators with it. At the same time, although a grid may aggregate many heterogeneous physical and software resources, it must offer a simple interface to its users, i.e., the application (job) providers. Regarding these concerns, we have proposed a domain-specific language approach for grid management. More precisely, several DSLs are used for grid description, querying and reconfiguration. All these languages rely on the definition of a particular model of a grid. A graph-based representation of grids is maintained at runtime and conforms to this model. A first description language allows administrators to specify a grid physical architecture as a hierarchical assembly of physical elements like machines and clusters. A second language is used to group grid users into virtual organizations. The last description language is dedicated to the specification of jobs (called virtual jobs) by users. A job, which has to be executed on the grid, is described essentially as a set of VMs; a user can specify the resources (CPU, memory, etc.) required to run the job. The VMScript language focuses on querying grid architectures and on grid reconfiguration. A subset of the language is declarative and is used to query the grid through its runtime representation. A query is expressed by navigating in the graph and selecting elements with optional predicates. The imperative part adds side effects to the language with control structures and procedures called actions. These actions actually modify the grid architecture, for instance by placing and moving VMs on machines. Our grid model essentially focuses on the representation of machines as physical resources. It does not currently deal with network topology and properties such as latency and bandwidth.
However, as this model is extensible without modification of the language syntax, these concerns could be addressed in future work, provided that a suitable monitoring system supplies that information.
References

1. Albrecht, J., Oppenheimer, D., Vahdat, A., Patterson, D.A.: Design and implementation tradeoffs for wide-area resource discovery. In: Proceedings of the 14th IEEE Symposium on High Performance Distributed Computing, Research Triangle Park, pp. 113–124. IEEE Computer Society, Los Alamitos (2005)
2. Globus Alliance: Extended Resource Specification Language (xRSL). Technical report, Globus Alliance (2009)
3. Anjomshoaa, A.: Job Submission Description Language (JSDL) specification, version 1.0. Technical report, Global Grid Forum (2005)
4. Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., Warfield, A.: Xen and the art of virtualization. In: SOSP 2003: Proceedings of the 19th ACM Symposium on Operating Systems Principles, pp. 164–177. ACM, New York (2003)
5. Clark, C., Fraser, K., Hand, S., Hansen, J.G., Jul, E., Limpach, C., Pratt, I., Warfield, A.: Live migration of virtual machines. In: NSDI 2005: Proceedings of the 2nd Symposium on Networked Systems Design & Implementation, pp. 273–286. USENIX Association, Berkeley (2005)
6. David, P.-C., Ledoux, T., Léger, M., Coupaye, T.: FPath & FScript: Language support for navigation and reliable reconfiguration of Fractal architectures. Annals of Telecommunications – Special Issue on Software Components – The Fractal Initiative 64, 45–63 (2009)
7. Figueiredo, R.J., Dinda, P.A., Fortes, J.A.B.: A case for grid computing on virtual machines. In: ICDCS 2003: Proceedings of the 23rd International Conference on Distributed Computing Systems, p. 550. IEEE Computer Society, Washington (2003)
8. Goldberg, R.P.: Architecture of virtual machines. In: Proceedings of the Workshop on Virtual Computer Systems, pp. 74–112. ACM, New York (1973)
9. Primet, P.V.-B., Koslovski, G.P., Charão, A.S.: VXDL: Virtual resources and interconnection networks description language. In: Networks for Grid Applications (2009)
10. Hermenier, F., Lorca, X., Menaud, J.-M., Muller, G., Lawall, J.: Entropy: a consolidation manager for clusters. In: VEE 2009: Proceedings of the 2009 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pp. 41–50. ACM, New York (2009)
11. Kivity, A., Kamay, Y., Laor, D., Lublin, U., Liguori, A.: KVM: the Linux virtual machine monitor. In: Proceedings of the Linux Symposium, June 2007, vol. 1, pp. 225–230 (2007)
12. Massie, M.L., Chun, B.N., Culler, D.E.: The Ganglia distributed monitoring system: design, implementation, and experience. Parallel Computing 30(7), 817–840 (2004)
13. McNett, M., Gupta, D., Vahdat, A., Voelker, G.M.: Usher: an extensible framework for managing clusters of virtual machines. In: LISA 2007: Proceedings of the 21st Large Installation System Administration Conference, pp. 1–15. USENIX Association, Berkeley (2007)
14. Raman, R., Livny, M., Solomon, M.: Matchmaking: distributed resource management for high throughput computing. In: Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, pp. 28–31 (1998)
15. Rosenblum, M., Garfinkel, T.: Virtual machine monitors: current technology and future trends. Computer 38(5), 39–47 (2005)
16. Sotomayor, B., Keahey, K., Foster, I.: Combining batch execution and leasing using virtual machines. In: HPDC 2008: Proceedings of the 17th International Symposium on High Performance Distributed Computing, pp. 87–96. ACM, New York (2008)
17. World Wide Web Consortium: XML Path Language (XPath) version 1.0. W3C Recommendation (November 1999), http://www.w3.org/TR/xpath/
Distributed Object-Oriented Programming with RFID Technology

Andoni Lombide Carreton, Kevin Pinte, and Wolfgang De Meuter

Software Languages Lab, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
{alombide,kpinte,wdmeuter}@vub.ac.be
Abstract. Our everyday environments will soon be pervaded with RFID tags integrated in physical objects. These RFID tags can store a digital representation of the physical object and transmit it wirelessly to pervasive, context-aware applications running on mobile devices. However, communicating with RFID tags is prone to many failures inherent to the technology. This hinders the development of such applications, as traditional programming models require the programmer to deal with the RFID hardware characteristics manually. In this paper, we propose extending the ambient-oriented programming paradigm to program RFID applications, by considering RFID tags as intermittently connected mutable proxy objects hosted on mobile distributed computing devices.

Keywords: RFID, pervasive computing, ambient-oriented programming, mobile RFID-enabled applications.
1 Introduction
RFID is generally considered a key technology in developing pervasive, context-aware applications [1], [2]. RFID tags are becoming so cheap that it will soon be possible to tag one's entire environment, thereby wirelessly dispersing information to nearby context-aware applications. An RFID system typically consists of one or more RFID readers and a set of tags. The RFID reader is used to communicate with the tags, for example to inventory the tags currently in range or to write data to a specific tag. RFID tags can be either passive or active. Active tags contain an integrated power source (e.g. a battery), which allows them to operate over longer ranges and to have more reliable connections. Some even have limited processing power. Passive tags are more commonly used because they are very inexpensive. Passive tags use the incoming radio frequency signal to power their integrated circuit and reflect a response signal. Most RFID tags possess non-volatile memory on which they can store a limited amount of data. The technologies on which we focus are cheap, writable passive tags and RFID readers integrated into mobile devices (such as smartphones).
Funded by a doctoral scholarship of the “Institute for the Promotion of Innovation through Science and Technology in Flanders” (IWT Vlaanderen).
F. Eliassen and R. Kapitza (Eds.): DAIS 2010, LNCS 6115, pp. 56–69, 2010. © IFIP International Federation for Information Processing 2010
This technology gives rise to distributed applications running on mobile devices that both disperse application-specific data to and process contextual data from tagged physical objects in their environment. They spontaneously interact with physical objects without assuming any additional infrastructure. We will refer to such applications as mobile RFID-enabled applications (see section 3.1 for an example). These applications use RFID technology in a radically different way than the RFID systems deployed today, which only use RFID tags as digital barcodes and almost never exploit the writable memory on these tags. Furthermore, today's systems assume infrastructure in the form of a centralized backend database that associates the digital barcode with additional information. In mobile RFID-enabled applications, communication with RFID tags is prone to many failures. Tags close to each other can cause interference and can move out of the range of the reader while communicating with it. These failures may be permanent, but it may be that at a later moment in time the same operation succeeds because of minimal changes in the physical environment. For example, a tag moves back in range or suddenly suffers less from interference. As a consequence, dealing with these failures and interacting with the low-level abstraction layers offered by RFID vendors from within a general-purpose programming language results in complex and brittle code. In this paper, we propose a natural extension to distributed object-oriented programming by representing physical objects tagged with writable RFID tags as true mutable software objects. We will model these objects as proxy objects acting as stand-ins for physical objects. For this model to be applicable to mobile RFID-enabled applications, it must adhere to the following requirements:

R1: Addressing physical objects. RFID communication is based on broadcasting a signal. However, to be able to associate a software object with one particular physical object, it is necessary to address a single designated physical object.

R2: Storing application-specific data on RFID tags. Since mobile RFID-enabled applications do not rely on a backend database, the data on the RFID tags should be self-contained and stored on the writable memory of the tags [3].

R3: Reactivity to appearing and disappearing objects. It is necessary to observe the connection, reconnection and disconnection of RFID tags to keep the proxy objects synchronized with their physical counterparts. Differentiating between connection and reconnection is important to preserve the identity of the proxy object. Furthermore, it should be possible to react upon these events from within the application.

R4: Asynchronous communication. To hide latency and keep applications responsive, communication with proxy objects representing physical objects should happen asynchronously. Blocking communication would freeze the application as soon as one physical object is unreachable.

R5: Fault-tolerant communication. Treating communication failures as the rule instead of the exception allows applications to deal with temporary unavailability of the physical objects and makes them resilient to failures. For example, read/write operations frequently fail due to hardware phenomena.
The remainder of this paper is organized as follows. Section 2 discusses related work. Section 3 starts by introducing a mobile RFID-enabled application scenario. Thereafter we use the scenario as a running example to present the language constructs that make up our model. Section 4 discusses the limitations of our system. Finally, section 5 concludes this paper.
2 Related Work and Motivation
This section discusses the current state of the art concerning RFID applications and supporting software, and how current approaches do not meet the requirements listed in the previous section.

RFID Middleware. Typical application domains for RFID technology are asset management, product tracking and supply chain management. In these domains RFID technology is usually deployed using RFID middleware, such as Accada [4] and Aspire RFID [5]. RFID middleware applies filtering, formatting or logic to tag data captured by a reader such that the data can be processed by a software application. RFID middleware uses a setup where several RFID readers are embedded in the environment, controlled by a single application agent. These systems rely on a backend database which stores the information that can be indexed using the identifier stored on the tags. They use this infrastructure to associate application-specific information with the tags, but do not allow storing this information on the tags directly (requirement R2). Therefore, current RFID middleware is not suited to develop mobile RFID-enabled applications.

RFID in Pervasive Computing. In [6], mobile robots carrying an RFID reader guide visually impaired users by reading RFID tags that are associated with a certain location. In [7], users are equipped with mobile readers and RFID tags are exploited to infer information about contextual activity in an environment based on the objects they are using or the sequence of tags read by the reader. Rememberer [8] provides visitors of a museum with an RFID tag. This tag is used as the user's memory of the visit and stores detailed information about selected exhibitions. However, none of the above systems provide a generic software framework to develop mobile RFID-enabled applications, but instead use ad hoc implementations directly on top of the hardware. In [9], RFID tags are used to store application-specific data.
The RFID tags form a distributed tuple space that is dynamically constructed by all tuples stored on the tags that are in reading range. Mobile applications can interact with the physical environment (represented by tuple spaces) by means of tuple space operations. The system not only allows reading data from RFID tags, but at any time, data in the form of tuples can be added to and removed from the tuple space. However, there is no way to control on which specific tag the inserted tuples will be stored. RFID tags cannot represent physical objects as there is no way to address one specific RFID tag as dictated by requirement R1. Hence, the programmer must
constantly convert application data types (e.g. objects) to tuples and vice-versa. Therefore, this approach suffers from the object-relational impedance mismatch [10] and does not integrate automatically with object-oriented programming.
3 Distributed Object-Oriented Programming with RFID-Tagged Objects
In this section, we discuss our RFID programming model. It is conceived as a set of language constructs that satisfy all requirements listed in section 1. We do this by means of an example mobile RFID-enabled application that we use as a case study to motivate our implementation. First, we introduce the general idea of the application.

3.1 A Mobile RFID-Enabled Application Scenario
The scenario consists of a library of books that are all tagged with writable passive RFID tags. The user of the application carries a mobile computing device that is equipped with an RFID reader. On this device, software is running that allows the user to see the list of books that are nearby (i.e. in the reading range of the RFID device), sorted on different properties of the books (e.g. author, title, ...). This list is updated with the books that enter and leave range as the user moves about in the library. Additionally, the user can select a book from the list of nearby books, upon which a dialog box opens. In this dialog box, the user can write a small review about the book. This review is stored on the tagged book itself. Other users can then select that same book from their list of nearby books and browse the reviews on the book, or add their own review.

3.2 Ambient-Oriented Programming with RFID Tags
In the mobile RFID-enabled application introduced in the previous section, mobile devices hosting different instances of the application move throughout an environment of tagged books. These books dynamically enter and leave the communication range of the mobile devices and interact spontaneously. These properties are very similar to the ones exhibited by distributed applications in mobile ad hoc networks [11]. Similar to mobile devices in mobile ad hoc networks, RFID tags and readers should interact spontaneously when their ranges overlap. Ambient-oriented programming [12] is a paradigm that integrates the network failures inherent to mobile ad hoc networks into the heart of its programming model. To this end, ambient-oriented programming extends traditional object-oriented programming in a number of ways. First, when objects are transferred over a network connection it is not desirable to have to send the class definition along with the object. This leads to consistency problems and performance issues [13], [14]. Hence, a first characteristic of ambient-oriented programming is the usage of a classless object model. A second characteristic is the use of non-blocking communication primitives. With blocking communication, a program will wait for
the reply to a remote computation, causing the application to block whenever a communication partner is unavailable [15]. The last characteristic is dynamic device discovery to deal with a constantly changing network topology without the need for URLs or other explicit network addressing. Since we are modeling physical objects in a pervasive computing environment as self-contained software objects, ambient-oriented programming provides a fitting framework to cope with the problems listed in the introduction. A promising model epitomizing this paradigm is a concurrency and distribution model based on communicating event loops [16]. In this model, event loops form the unit of distribution and concurrency. Every event loop has a message queue and a single thread of control that perpetually serves messages from the queue. An event loop can host multiple objects that can be published in the network. Other event loops can discover these published objects, obtaining a remote reference to the object. Client objects communicate with a remote object by sending messages over the remote reference; the messages are then placed in the message queue of the event loop hosting the remote object. The event loop's thread handles these messages in sequence, ensuring the hosted objects are protected against race conditions. A remote reference operates asynchronously: the client object will not wait for the message to be delivered, but immediately continues with other computations. Within the same event loop, local object references are accessed using regular, synchronous message sending. Figure 1 illustrates the communicating event loops model. When mobile devices move out of each other's range, the event loops that are hosted on the different devices are disconnected from each other. However, upon such a disconnection, all remote references become disconnected and buffer incoming messages, as illustrated by figure 2.
When the communication is reestablished, the remote references are automatically restored and all buffered messages are automatically flushed to the message queue of the destination event loop. AmbientTalk is an ambient-oriented programming language that uses the communicating event loop model as its model for concurrency and distribution [17]. It is conceived as a scripting language that eases the composition of distributed Java components in mobile ad hoc networks. We implemented our RFID system in AmbientTalk and in the next sections we introduce the concrete language abstractions that allow us to program with RFID-tagged objects as
Fig. 1. Overview of the communicating event loops model
Fig. 2. Messages to disconnected objects are buffered until reconnection
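The event-loop model described above can be mimicked outside AmbientTalk. The following Python sketch is not the authors' implementation; it only illustrates the two mechanisms just discussed — a single thread serving a message queue, and a remote reference that buffers messages while disconnected and flushes them on reconnection. All names (`EventLoop`, `RemoteReference`, `Book`) are illustrative.

```python
import queue
import threading

class EventLoop:
    """Owns objects and serves one message at a time from its queue."""
    def __init__(self):
        self.inbox = queue.Queue()
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        while True:
            obj, selector, args = self.inbox.get()
            getattr(obj, selector)(*args)   # single thread: no race conditions
            self.inbox.task_done()

class RemoteReference:
    """Asynchronous reference that buffers messages while disconnected."""
    def __init__(self, target_loop, target_obj):
        self.loop, self.obj = target_loop, target_obj
        self.connected = True
        self.buffer = []

    def send(self, selector, *args):
        """Returns immediately; never blocks the sender."""
        msg = (self.obj, selector, args)
        if self.connected:
            self.loop.inbox.put(msg)
        else:
            self.buffer.append(msg)         # hold until reconnection

    def disconnect(self):
        self.connected = False

    def reconnect(self):
        """Flush buffered messages, in order, to the destination queue."""
        self.connected = True
        for msg in self.buffer:
            self.loop.inbox.put(msg)
        self.buffer.clear()

class Book:
    """A hosted object used for demonstration."""
    def __init__(self):
        self.reviews = []
    def add_review(self, text):
        self.reviews.append(text)
```

A message sent while the reference is disconnected sits in the buffer and is only processed after `reconnect()`, matching the behavior sketched in figure 2.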
mutable software objects. Each of the following sections corresponds to a requirement formulated in section 1 and is numbered accordingly.

R1 RFID-Tagged Objects as Proxy Objects
As discussed earlier, we model RFID-tagged objects as proxy objects. An example of a book proxy object is given below. It contains slots for the ISBN, title and reviews and provides two mutator methods to update the book's title and add reviews:

1  deftype Book;
2
3  def aBook := object: {
4    def ISBN := 123;
5    def title := "My Book";
6    def reviews := Vector.new();
7
8    def setTitle(newTitle)@Mutator {
9      title := newTitle;
10   };
11   def addReview(review)@Mutator {
12     reviews.add(review);
13   };
14 } taggedAs: Book;
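Because such an object is self-contained, its entire state can be written onto the tag itself. As a rough Python sketch of that idea (AmbientTalk additionally serializes the object's methods, which plain JSON cannot capture; the memory size and all names here are illustrative assumptions, not measurements of real tags):

```python
import json

# Illustrative capacity only; real passive tags often hold far less.
TAG_MEMORY_BYTES = 1024

# A classless, self-contained snapshot of the book proxy object:
# state plus a type tag, so it can be rebuilt without a class
# definition or a backend database.
book_snapshot = {
    "typeTag": "Book",
    "ISBN": 123,
    "title": "My Book",
    "reviews": [],
}

def serialize_for_tag(obj):
    """Serialize an object and check that it fits the tag's writable memory."""
    data = json.dumps(obj).encode("utf-8")
    if len(data) > TAG_MEMORY_BYTES:
        raise ValueError("object too large for tag memory")
    return data

def deserialize_from_tag(data):
    """Rebuild the proxy object's state from the bytes read off a tag."""
    return json.loads(data.decode("utf-8"))
```

The round trip through `serialize_for_tag` and `deserialize_from_tag` preserves the snapshot exactly, which is the property the proxy objects rely on.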
The hardware limitations of RFID tags render it impossible to deploy a full fledged virtual machine hosting objects on the tags themselves. We thus store a serialized data representation of a proxy object on its corresponding tag. Because we use a classless object model, objects are self-contained: there is no class that defines their behavior. Upon deserialization the object’s behavior (its methods) is preserved and used to reconstruct the proxy object. Since we cannot rely on classes to categorize objects, we use type tags. These are “mini-ontologies” that are attached to an object to identify its “type”. In the above example, we define a type Book on line 1 and attach that type to the aBook object in line 14. In section R3 we use the type tag to discover objects of a certain kind. Of course, the data stored on the tags has to be synchronized with the state of these proxy objects. Methods that change the state of the book objects are
Fig. 3. Overview of the RFID event loop
annotated by the programmer with the Mutator annotation.¹ These annotations are used by the implementation to detect when objects change and have to be written to the corresponding tag. For example, calling the addReview mutator method on a book object first updates the reviews field by adding the new review. Subsequently, the system serializes the modified book object and stores it on the correct RFID tag. The proxy objects are managed by what we will henceforth denote as the RFID event loop. It controls an RFID reader to detect appearing and disappearing tags (a and b) and it associates proxy objects with them (A and B). These proxy objects can then be used by other event loops to interact with the tags as if they were mutable software objects. They do this by obtaining remote references to the proxy objects. Remote references (X and Y) reflect the state of the corresponding RFID tags (a and b). When a tag moves out of range of the reader, the remote reference is signaled of this disconnection; conversely, when a tag moves back in range, the remote reference is signaled of the reconnection. Figure 3 shows a general overview of the RFID system.

R2 Storing Objects on RFID Tags
When the RFID event loop detects a blank RFID tag, the tag is represented by a generic proxy object which responds to only one message: initialize. The code below shows how a blank tag is initialized as a book object:

when: tag<-initialize(aBook) becomes: {|book| ...};
The RFID event loop generates a data representation of the aBook object by serializing it and stores this data on the RFID tag that corresponds with the tag proxy object. The reference tag to the generic proxy object is obtained using the discovery constructs we explain in section R3. From this point on, the RFID tag is no longer “blank”, as it contains application-specific data. When storing the object on the tag succeeds, the call to initialize returns with a new remote reference book that points to a newly constructed proxy object representing the book (the when:becomes: construct is explained in section R4). The RFID event loop keeps track of the unique link between a proxy object and a tag by means of the serial number that each tag carries.

¹ AmbientTalk is a highly dynamic programming language, which makes it impossible to determine from the source code whether mutating operations are going to be invoked. Annotations and type tags are the same data type and are programmer-defined.

R3 Reactivity to Appearing and Disappearing Objects
As explained in section R1, the RFID event loop notifies other event loops of the appearance and disappearance of the objects they have remote references to. In the code example shown below, an event handler that will execute a block of code each time an object of type Book is discovered is installed using the whenever:discovered: construct. The registered code block is parametrized by the remote reference to the book object (which is also used to send it asynchronous messages).

whenever: Book discovered: {|book|
  whenever: book disconnected: {
    // react on disappearance
  };
  whenever: book reconnected: {
    // react on reappearance
  };
};
Once a remote reference to a book is obtained, within the whenever:discovered: callback, two more event handlers can be registered on the book remote reference using the whenever:disconnected: and whenever:reconnected: constructs. These allow one to install a block of code which is executed as soon as the object denoted by the book remote reference moves in or out of range of the reader. Notice that upon reconnection the proxy object maintains its identity through the book reference. For each whenever-handler there exists a when-variant that executes only once.

R4 Asynchronous Communication
Applications that acquire a remote reference to a proxy object can communicate with it via asynchronous message sending. Messages sent to proxy objects are handled sequentially by the thread encapsulated in the RFID event loop. This ensures that all proxy objects hosted by the RFID event loop are protected against race conditions. When the remote reference to a proxy object is disconnected, all messages sent to it are locally buffered in the remote reference. When the connection is restored, the messages are flushed to the RFID event loop’s message queue. This means that a message sent to a proxy object of which the RFID tag temporarily suffers from interference or is temporarily unavailable will eventually be processed. Messages sent to proxy objects can either retrieve data (read operations) or trigger behavior that causes side effects (write operations). Both kinds of operations aim to keep the tag synchronized with the proxy object. Performing a read operation on a proxy object causes the proxy object to be updated with the data on the corresponding tag. Performing write operations first cause a side
effect on the proxy object; thereafter the corresponding RFID tag is updated to contain the modified proxy object. Reading and writing tags is thus caused by sending messages to the proxy objects; this also means that access to the RFID reader is managed by the RFID event loop's message queue and protected against concurrent access. Asynchronous messages are sent using the <- operator. The following example asks a book for its title and displays it:

when: book<-getTitle() becomes: {|title| system.println(title)};
system.println("here first!");
The asynchronous call to getTitle immediately returns with a future object. Such a future object can be used to notify callbacks that the return value of the asynchronous call was received. This happens by means of the when:becomes: construct. Using this construct, a block of code can be registered on the future that is executed once the future signals that the return value of the message was received, taking the return value as an argument. This example thus immediately prints "here first!" and only after the future signals the reply does it print the title of the book. If the RFID tag corresponding to the book object has disappeared upon sending the message, the remote reference buffers the message until the tag reappears. This message will only be sent when the RFID tag represented by the remote reference is back in range.

R5 Fault-tolerant Communication
Buffering an asynchronous message to a proxy object ensures that the message will eventually be sent if the tag moves in range. This makes the communication fault-tolerant, as no exception is raised when the object is unavailable for a short period of time. However, failures may not be temporary: a tag may move out of range and never return again. Using the Due annotation, we can annotate the message send with a duration that controls how long a message is buffered before timing out. For example, we can add short reviews to a book:

def myReview := "not suitable for beginners";
when: book<-addReview(myReview)@Due(10.seconds) becomes: {|ack|
  // message processed successfully
} catch: TimeoutException using: {|e|
  // message timed out
};
Suppose the RFID tag corresponding with book would leave the reader’s range before the addReview message is received by the book’s proxy object. Then the message is buffered for at most 10 seconds. If the tag does not respond in time, a timeout exception is raised. If the tag reappears in range within this time frame, the message to add the review myReview is delivered to the RFID event loop and the corresponding book object is updated and stored on the RFID tag. Remember from section R1 that addReview was annotated as a mutator method. This means that first the reviews field of the proxy object is updated by adding the new review. Subsequently, the RFID event loop serializes the
changed book object and stores it entirely on the correct RFID tag. Only after both of these operations complete successfully does the future object trigger all its registered when-observers. If this does not happen within the 10-second time frame, the exception is signaled to client applications and their registered catch: blocks are invoked.

3.3 Addressing Specific Groups of RFID-Tagged Objects
As mentioned in section 2, RFID tags are typically used in large quantities, e.g. in warehouse applications. In mobile RFID-enabled applications it is often necessary to address a specific group of objects, e.g. to update the price stored on all tags that represent a certain product. However, such a collection of RFID tag objects has a highly dynamic nature due to the volatile connections with the RFID tags. At any point in time, tags move out of range and new tags move in range. Instead of forcing the programmer to manually manage collections of nearby objects, AmbientTalk has a dedicated abstraction to discover and address a group of objects: ambient references [18]. At any point in time, an ambient reference designates the set of proximate objects of a certain type. This abstraction is applicable because we represent physical objects as remote proxy objects. An ambient reference represents a variable collection of proxy objects, e.g. the set of nearby books. This set is updated behind the scenes when books move in and out of range. The example below shows an ambient reference to all books in the proximity, denoted by the Book type:

def books := ambient: Book;
Ambient references allow one to specify various predicates to refine the set of objects designated. This is shown in the example below, where books are selected based on their category field:

def computerScienceBooks := ambient: Book where: {|b|
  b.category == "Computer Science";
};
A last example shows how we can address a single object out of the group of nearby objects encapsulated in the ambient reference. For example, if all books about computer science are placed in the same shelf in the library, it is sufficient to query any one book about this topic in range for its shelf:

def shelfFuture := computerScienceBooks<-getShelf()@Any;
when: shelfFuture becomes: { |shelf|
  system.println("The book should be on shelf: " + shelf);
};
computerScienceBooks<-setShelf("5D")@Sustain;
This happens by annotating the getShelf message with @Any. We can also reach all objects in range using one-to-many communication. The last line of the example updates the shelf where computer science books should be located (e.g. because they have to be moved). The Sustain annotation causes the setShelf message to be perpetually sent to newly discovered computer science books.
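The group semantics of an ambient reference — a filtered, volatile membership set with "ask any one member" and one-to-many delivery — can be sketched in Python. This is a simplified analogue, not the AmbientTalk implementation: it is synchronous, does not model @Sustain's resending to later arrivals, and all class and method names are invented for illustration.

```python
import random

class BookProxy:
    """Stand-in for one tagged book (illustrative)."""
    def __init__(self, title, category, shelf):
        self.title, self.category, self.shelf = title, category, shelf
    def get_shelf(self):
        return self.shelf
    def set_shelf(self, shelf):
        self.shelf = shelf

class AmbientReference:
    """A volatile group of in-range proxies matching a filter predicate."""
    def __init__(self, where=lambda p: True):
        self.where = where
        self.members = []
    def discovered(self, proxy):
        """A tag moved into range; admit it if it matches the filter."""
        if self.where(proxy) and proxy not in self.members:
            self.members.append(proxy)
    def disconnected(self, proxy):
        """A tag moved out of range; drop it from the group."""
        if proxy in self.members:
            self.members.remove(proxy)
    def send_any(self, selector, *args):
        """Like @Any: ask one arbitrary in-range member."""
        if self.members:
            return getattr(random.choice(self.members), selector)(*args)
    def send_all(self, selector, *args):
        """One-to-many delivery to every member currently in range."""
        for proxy in list(self.members):
            getattr(proxy, selector)(*args)
```

Filtering at `discovered` time mirrors the where: predicate: objects of the wrong category never enter the group, so `send_all` and `send_any` only ever touch matching proxies.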
3.4 Putting It All Together
Finally, in this section we bring together the language constructs presented throughout this paper to implement the example application introduced in section 3.1. First of all, while the user moves about in the library, the list of nearby books has to be updated. The following code snippet shows this:

1 deftype Book;
2 def books := ambient: Book;
3
4 whenEach: books<-getBookInfo()@Sustain becomes: {|infoAndRef|
5   GUI.addBookInfoAndReferenceToList(infoAndRef);
6 };
7 whenever: Book discovered: {|book|
8   whenever: book disconnected: { GUI.removeBookFromList(book) };
9 };
The first line declares the Book type and the second line creates an ambient reference that refers to all books in range. On line 4, the asynchronous message getBookInfo to the books ambient reference is annotated with @Sustain, which causes the ambient reference to perpetually send this message to newly appearing books. This returns a multifuture, i.e. a special future object that can trigger the same callback block multiple times with a new value. This callback is registered on the multifuture with a special when-construct (whenEach:becomes:). The code block is triggered each time the multifuture is resolved with a new return value from the message invocation on the ambient reference. The return value of this message is the info about the book (i.e. ISBN number, title and authors) and a reference to the book object. These return values are bound to the infoAndRef parameter of the observer block, which is added to the list in the user interface object. This causes the user interface to show a new entry in the list of nearby books, and to associate a reference to the book entry in this list. On line 7, for every book discovered, a whenever:disconnected: observer is installed that, when triggered because a book went out of range, removes the book from the list in the user interface by means of the book remote reference. Notice that although the remote reference points to an unreachable book, it can still be used to look up the book in the list and remove it. This is an example of the system being tailored towards scenarios where disconnections are the default rather than the exception. As mentioned earlier, the references to the books are being associated with the list entries. This way, when a user double clicks on a list entry, a dialog box is shown in which the user can type a small review or some comments about the book. 
When accepting the input data of the dialog box, the application attempts to add the text the user just entered to the list of reviews associated on the book itself. This is illustrated by the code snippet below. As we showed earlier in section R4, invoking the addReview method on a book is a mutating operation (i.e. the method is tagged as a Mutator) which causes the book proxy object to be synchronized with its physical representation on the RFID tag.
Notice that this write operation might not happen instantaneously because the RFID tag might be out of range for some time. The following code snippet shows the function that is called after the user has written a comment in the dialog box we described above:

def addReviewToBook(book, text) {
  when: book<-addReview(text)@Due(5.seconds) becomes: {|ack|
    showOkDialog("Review added successfully!");
  } catch: TimeoutException using: {|exc|
    showWarningDialog("Failed to add review!");
  }
};
The dialog object passes the reference to the book and the user’s text as arguments to the function shown above. This addReviewToBook function asynchronously sends the addReview message to the book via the remote reference passed as an argument. The message is annotated with @Due(5.seconds) to indicate that if the message is not successfully processed within 5 seconds, a TimeoutException should be raised. The when:becomes:catch: observer installed on the future returned by the message send can trigger two blocks. The becomes: block is triggered when the message was successfully processed by the proxy object and, in addition, the mutated data was successfully written to the physical RFID tag (since the addReview method is a mutator). As mentioned earlier, within the 5-second timeout period the RFID tag might have moved in and out of range several times, but the underlying implementation of the language constructs keeps attempting to write the data until this timeout period has passed. If the timeout period passes without the review being successfully written to the tag, the catch: block of the observer is invoked. This block simply shows a dialog box that notifies the user that adding the review failed. In response, the user can try again, perhaps after moving closer to the book.
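The retry-until-deadline behaviour of @Due described above can be sketched outside AmbientTalk, in plain Java with hypothetical names (tryWrite stands for one attempted tag write; this is not the actual language implementation):

```java
import java.util.function.BooleanSupplier;

// Sketch of @Due semantics: keep attempting the tag write until it succeeds
// or the deadline passes. Names and structure are illustrative only.
public class DueRetry {
    // Returns true on success, false when the timeout expired
    // (the case in which the catch: block would be triggered).
    public static boolean retryUntil(BooleanSupplier tryWrite, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        do {
            // The tag may be out of range; a failed attempt is simply retried.
            if (tryWrite.getAsBoolean()) return true;
        } while (System.currentTimeMillis() < deadline);
        return false; // timeout: a TimeoutException would be raised at language level
    }
}
```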
4 Limitations and Future Work
The thread associated with each event loop consumes incoming messages sequentially. This means that no objects are shared between different threads and race conditions cannot occur. However, when RFID tags are considered an ambient environmental memory, a set of RFID tags may very well be in range of multiple users at the same time. When these users concurrently update the same tag from different devices, distributed race conditions on that tag may occur. In our experiments we have employed passive RFID tags, which are only powered during communication. This means that there is no way of locking an RFID tag for a limited amount of time. Another limitation of this type of tag is that it currently offers only a very limited amount of writable memory. We have tested our implementation using RFID tags with up to 8 kbits of writable memory. This means that we can
only store very small serialized objects on the tags. On the other hand, the technology is progressing, and we can expect the storage on passive tags to steadily increase while costs drop. To circumvent these limitations, we are currently experimenting with active RFID tags. These tags are battery-powered and keep running independently of the readers. This means that they can store more data and can execute code, which opens up opportunities for solving the problems mentioned above where more expensive tags are an option, and which may additionally lead us in new research directions.
5 Conclusion
Today, developing mobile RFID-enabled applications remains complicated because application developers have to deal with the hardware characteristics manually, at a very low level, in a general-purpose programming language. Current middleware platforms are not suited to developing such applications (which require writing application-specific data on tags). On the other hand, lower-level approaches do not integrate the hardware characteristics into the heart of their programming model, introducing the complexity that we are trying to tackle. The abstractions presented in this paper integrate closely with the object-oriented message passing paradigm, thereby aligning physical objects tagged with writable RFID tags with true mutable software objects. By implementing an example mobile RFID-enabled application, we have observed that the requirements that we set forward for programming mobile RFID-enabled applications are met in the following ways:

Addressing physical objects. The implementation of the application shows that mobile RFID-enabled applications can be written in an object-oriented fashion, where application-level proxy objects uniquely represent physical objects in one’s physical environment.

Storing application-specific data on RFID tags. The data needed to construct these proxy objects is stored on the RFID tags themselves.

Reactivity to appearing and disappearing objects. Application logic is expressed in terms of reactions to changes in the physical environment, relying on a number of expressive abstractions that are integrated into a communicating event loops framework.

Asynchronous communication. Interacting with physical objects is achieved by applying the message passing metaphor to the proxy objects, by means of asynchronous message passing and asynchronous signaling of return values.

Fault-tolerant communication. Communication failures are considered the rule rather than the exception. Failures that must be considered permanent are detected and raise the appropriate exceptions.
References

1. Waller, V., Johnston, R.B.: Making ubiquitous computing available. Commun. ACM 52(10), 127–130 (2009)
2. Bleecker, J.: A manifesto for networked objects — cohabiting with pigeons, arphids and aibos in the internet of things (2006)
3. Roussos, G., Kostakos, V.: RFID in pervasive computing: State-of-the-art and outlook. Pervasive Mob. Comput. 5(1), 110–131 (2009)
4. Floerkemeier, C., Roduner, C., Lampe, M.: RFID application development with the Accada middleware platform. IEEE Systems Journal, Special Issue on RFID Technology 1, 82–94 (2007)
5. Kefalakis, N., Leontiadis, N., Soldatos, J., Gama, K., Donsez, D.: Supply chain management and NFC picking demonstrations using the AspireRfid middleware platform. In: Companion 2008: Proceedings of the ACM/IFIP/USENIX Middleware 2008 Conference Companion, pp. 66–69. ACM, New York (2008)
6. Kulyukin, V., Gharpure, C., Nicholson, J., Pavithran, S.: RFID in robot-assisted indoor navigation for the visually impaired. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 2, pp. 1979–1984 (2004)
7. Philipose, M., Fishkin, K.P., Perkowitz, M., Patterson, D.J., Fox, D., Kautz, H., Hahnel, D.: Inferring activities from interactions with objects. IEEE Pervasive Computing 3(4), 50–57 (2004)
8. Fleck, M., Frid, M., O’Brien-Strain, E., Kindberg, T., Rajani, R., Spasojevic, M.: From informing to remembering: Deploying a ubiquitous system in an interactive science museum. IEEE Pervasive Computing, 13–21 (April–June 2002)
9. Mamei, M., Quaglieri, R., Zambonelli, F.: Making tuple spaces physical with RFID tags. In: Symposium on Applied Computing, pp. 434–439. ACM, New York (2006)
10. Carey, M.J., DeWitt, D.J.: Of objects and databases: A decade of turmoil. In: 22nd International Conference on Very Large Data Bases, pp. 3–14. Morgan Kaufmann, San Francisco (1996)
11. Dedecker, J., Van Cutsem, T., Mostinckx, S., D’Hondt, T., De Meuter, W.: Ambient-oriented programming. In: 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, pp. 31–40. ACM, New York (2005)
12. Dedecker, J., Van Cutsem, T., Mostinckx, S., D’Hondt, T., De Meuter, W.: Ambient-oriented programming in AmbientTalk. In: Thomas, D. (ed.) ECOOP 2006. LNCS, vol. 4067, pp. 230–254. Springer, Heidelberg (2006)
13. Ungar, D., Chambers, C., Chang, B.-W., Hölzle, U.: Organizing programs without classes. Lisp Symb. Comput. 4(3), 223–242 (1991)
14. Dedecker, J., De Meuter, W.: Using the prototype-based programming paradigm for structuring mobile applications (2002)
15. Murphy, A.L., Picco, G.P., Roman, G.-C.: Lime: A middleware for physical and logical mobility. In: 21st International Conference on Distributed Computing Systems, Washington, DC, USA, p. 524. IEEE Computer Society, Los Alamitos (2001)
16. Miller, M., Tribble, E.D., Shapiro, J.: Concurrency among strangers: Programming in E as plan coordination. In: De Nicola, R., Sangiorgi, D. (eds.) TGC 2005. LNCS, vol. 3705, pp. 195–229. Springer, Heidelberg (2005)
17. Van Cutsem, T., Mostinckx, S., Gonzalez Boix, E., Dedecker, J., De Meuter, W.: AmbientTalk: Object-oriented event-driven programming in mobile ad hoc networks. In: XXVI International Conference of the Chilean Society of Computer Science, Washington, DC, USA, pp. 3–12. IEEE Computer Society, Los Alamitos (2007)
18. Van Cutsem, T., Dedecker, J., Mostinckx, S., Gonzalez, E., D’Hondt, T., De Meuter, W.: Ambient references: Addressing objects in mobile networks. In: 21st ACM SIGPLAN Symposium on Object-Oriented Programming Systems, Languages, and Applications, pp. 986–997. ACM, New York (2006)
WISeMid: Middleware for Integrating Wireless Sensor Networks and the Internet

Jeisa P.O. Domingues, Antonio V.L. Damaso, and Nelson S. Rosa

Universidade Federal de Pernambuco, Centro de Informática
Caixa Postal 7851 - 50740-540 - Recife - PE - Brazil
{jpo,avld,nsr}@cin.ufpe.br
Abstract. Wireless sensor networks (WSNs) have recently received growing attention, as they have great potential for many distributed applications in different scenarios. Whatever the scenario, most WSNs are actually connected to an external network, through which sensed information is passed to the Internet and control messages can reach the WSN. This paper presents WISeMid, a middleware that focuses on integrating the Internet and WSNs at the service level instead of integrating protocol stacks and/or mapping logical addresses. WISeMid allows the integration of WSN and Internet services by providing transparency of access, location and technology. To validate WISeMid, some results of a power consumption evaluation of the middleware are presented.

Keywords: Wireless Sensor Networks, Internet, Middleware, Integration, Service.
1 Introduction
A wireless sensor network (WSN) is composed of a large number of sensor nodes, deployed either inside a phenomenon or very close to it. Those nodes sense a physical quantity (e.g., pressure or temperature), process the sensed data and transmit it to a sink node, which can act as a gateway to other networks, a connection to a powerful data processor or an access point for a human interface. Most WSNs are connected to an external network, through which their data can reach the final user and control messages can reach the WSN [1]. WSNs have been receiving growing attention as sensor nodes become smaller, cheaper and more intelligent, which enables the development of real and complex applications in scenarios such as military target tracking and surveillance, natural disaster relief, biomedical health monitoring, hazardous environment exploration and seismic sensing. As WSNs become more numerous and their data more valuable, it becomes increasingly important to have common means to share data over the Internet [2]. Since WSNs can be easily deployed in various environments to monitor target objects and various conditions, and to collect information, they are considered
Antonio V.L. Damaso is supported by CNPq/Brazil.
F. Eliassen and R. Kapitza (Eds.): DAIS 2010, LNCS 6115, pp. 70–83, 2010.
© IFIP International Federation for Information Processing 2010
one essential infrastructure for pervasive computing systems [3]. Also, the WSN is one of the many networks that will compose the Ambient Networks [4]. For all those reasons, integrating WSNs with the Internet has become increasingly desirable and necessary.

A number of solutions have been proposed in recent years to integrate WSNs and the Internet. Most of them aim at integrating those networks by mapping the protocol stacks and logical address formats used in both networks. Those solutions focus on accessing the network nodes through their logical addresses, which raises several problems. In this context, this paper proposes a solution that aims at integrating applications instead of networks (that is, protocol stack and/or logical address format mapping). The idea is to provide an infrastructure, namely WISeMid, that allows applications, which are considered services, to be integrated in a transparent way. In practice, a service that is offered by a sensor node in a WSN or by a host in the Internet should be accessed in a uniform way irrespective of the client or the service location. Hence, application developers only need to know the service name to access its operations, as WISeMid takes responsibility for hiding the heterogeneity of all low-level network mechanisms.

In order to validate our middleware, a power consumption evaluation is performed, and the presented results show that although the components added by the WISeMid infrastructure increase the power consumption, the services and features it offers save significant energy, which is a tradeoff worth making.

The remainder of this paper is organized as follows. Related work is presented in Section 2. Section 3 introduces WISeMid, describing its elements and its implementation. Section 4 presents some WISeMid evaluation results concerning power consumption. Finally, Section 5 discusses some conclusions and presents some future work.
2 Related Work
Some approaches have been proposed to integrate WSNs and the Internet. The simplest one is the gateway-based approach. It may use an application-layer gateway, translating query messages from one side (typically the Internet) into messages that can be understood on the other side (usually the WSN) [2] and/or mapping addresses [5]; or a Delay Tolerant Networks (DTN) gateway, providing interoperability between and among WSNs, which are considered DTN networks [6]. Overlay-based approaches have been proposed, where some sensor nodes use the TCP/IP protocols or some hosts use WSN protocols [7]. Also, mobile agents have been used to dynamically access the WSN from the Internet [1].

Although directly employing the TCP/IP suite in the WSN would enable its seamless integration with TCP/IP networks, this approach has several problems [8]: the addressing and routing schemes of IP are host-centric and do not fit well with the sensor network paradigm, where the main interest is the data generated by the sensors and the individual sensor is of minor importance; the header overhead of TCP/IP is very large for small packets, and its size may
constitute nearly 90% of each packet when sending a few bytes of sensor data, which is not acceptable as it wastes valuable energy in radio transmission; TCP does not perform well over wireless links, where packets are frequently dropped because of bit errors; and the end-to-end retransmissions used by TCP consume energy at every hop of the retransmission path. Also, the memory and computational resources of sensor nodes are limited and not able to run a full instance of the TCP/IP protocol stack. To address that problem, some works have proposed simplified versions of the TCP/IP protocol stack [9,10].

A reflective, service-oriented middleware for WSNs is proposed in [11]. The middleware acts as a broker between applications and the WSN, translating application requirements into WSN configuration parameters. It monitors both network and application execution states, performing a network adaptation whenever it is needed. From an external point of view, applications are service requestors and sink nodes are service providers, releasing the descriptions of the WSN services and offering access to these services. Although it has some similarities with our proposal, it focuses on network adaptation capability. Besides, it assumes the WSN is a service provider, but not a service consumer.

The implementation of tiny web services directly on sensor nodes is presented in [12], including an XML parser, an HTTP server and a simplified TCP/IP protocol stack. Like the previous approach, it considers the sensor nodes to be only service providers (actually, web service providers), not consumers.

Unlike most of the mentioned works, this paper proposes a solution that focuses on integrating the Internet and WSNs at the service level instead of integrating protocol stacks and/or mapping logical addresses. Also, even though some proposals are service-oriented, they only allow requesting services offered by the WSN, as most integration approaches focus only on accessing the sensor nodes’ data from the Internet and not the other way around. Although this is the usual situation, as sensors are typically providers of measured data, there are some cases where it is better for the sensor to request a service outside the WSN. That happens when the sensor does not have enough resources to perform some computation, or when performing such a task on the sensor consumes more resources than sending a message to request it. For instance, some research proposes management applications that run outside the WSN. In those approaches, a powerful computer connected to the sink node runs the management application, which collects information from the sensor nodes about the condition of their resources. It then performs some computation and sends back to the sensor nodes some configuration changes, such as new routes that spend less energy. One example of those solutions can be found in [13].
3 WISeMid
WISeMid (Wireless sensor network’s and Internet’s Services integration Middleware) is a communication infrastructure that supports the integration of WSNs and the Internet at the service level (see Fig. 1). In this context, applications running on Internet/WSN nodes may play the role of service providers or service users.
In practice, a service user must be able to communicate with a service provider no matter whether they are running in the same network or not. Hence, WISeMid should provide an infrastructure that allows these services to be integrated in such a transparent manner that a service is accessed in the same way irrespective of whether it is provided by a WSN sensor node or by an Internet host. Additionally, WISeMid should support application services developed in different technologies, such as Web Services, Java RMI, EJB and JMS, although its current implementation supports only WISeMid services.
Fig. 1. WISeMid logical view
Fig. 2. WISeMid physical view
Figure 2 presents a physical view of how WISeMid spreads out through the Internet and the WSN. The WISeMid implementations for the WSN and the Internet are not the same, as they have different requirements and components (explained in the following sections). The physical communication is performed through an Internet host that is connected to the WSN sink node via a serial port (USB). This host executes a special WISeMid service called SAGe, which acts as a proxy between both networks (see Sect. 3.5).

3.1 Overview
In order to promote network integration at the service level, our middleware has to address some issues. WISeMid should deal with four different kinds of heterogeneity, namely operating system, network, hardware and programming language. WSN and Internet nodes have different hardware platforms (e.g., PCs and MICAz nodes) and network protocol stacks (e.g., ZigBee and TCP/IP). In addition, applications running in both networks are developed atop different operating systems (e.g., Windows or Unix-based OSs, and TinyOS) and using distinct programming languages (e.g., Java and nesC).

The heterogeneity of programming languages raises another issue: data type mapping. For example, consider a service user written in nesC (running in the WSN) and a service provider written in Java (running in the Internet). When the service user invokes an operation on the service provider, it is necessary to translate nesC data types into Java ones in a way that is transparent to application developers. Along with handling the considered heterogeneities, it is also necessary to define the basic abstraction adopted for building applications (e.g., objects, services), the communicating entities (e.g., clients/servers, peers), the way these
entities communicate (e.g., synchronously, asynchronously), and the distributed services provided by the middleware (e.g., a naming service).

3.2 IDL
As service is the key concept in the proposed approach, an initial step consists of defining how a service is described. For this purpose, we have defined the WISeMid IDL (Interface Definition Language), which lets us define service interfaces in a uniform way, i.e., wherever the service runs (WSN or Internet), its interface is described using the proposed IDL. The general structure of an interface defined in WISeMid IDL is shown as follows:

1: module PACKAGE_NAME{
2:   interface INTERFACE_NAME{
3:     [OPER_TYPE] OUTCOME_TYPE OPER_NAME([TYPE ARG1,...]) [raises(EXCEPTION_NAME1,...)]
4: } }
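For concreteness, a hypothetical service interface in this IDL might read as follows (service, operation and exception names are invented for illustration; the syntax is inferred from the grammar):

```
module weather {
  interface TemperatureService {
    oneway void setReportingInterval(int interval)
    float getTemperature() raises(SensorFailure)
  }
}
```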
The module (package) that contains the service is specified first (1). Then, the service interface includes its name (2) and the provided operations (3). Each operation has a name and typed input/output parameters, and may raise exceptions. Additionally, an operation is by default a request-response operation, but it may be defined as a one-way operation, which means that no response is expected when the operation is invoked.

3.3 Requirements
Considering the issues introduced in Sect. 3.1, the following WISeMid requirements have been defined:

(R01) service providers should register their services in a Naming Service;
(R02) service users should ask the Naming Service for the service they want to use;
(R03) WISeMid should provide location transparency, so that service users are not aware of the location of the service being used;
(R04) WISeMid should provide access transparency, i.e., service users access local and remote services in a similar way;
(R05) service data should have the same interpretation whatever the programming languages used to implement the service users and service providers;
(R06) service providers/users communicate among themselves using the Request/Reply communication pattern;
(R07) service communication is synchronous; and
(R08) services should be stateless and untyped.

In addition to these requirements, which concern both networks, there are some specific requirements regarding the limited resources of sensor nodes:

(R09) the messages transmitted to the WSN should be kept as short as possible (for instance, if an argument type uses four bytes to represent a value but the argument value fits in one byte, and there is a compatible type that uses only one byte to represent that value, our middleware should convert this argument to the smaller type before sending it to a sensor node); and
(R10) unnecessary messages should not be forwarded to the WSN (for example, when an Internet application requests sensed data such as temperature, whose variability on a second/minute time scale is not very significant, WISeMid may decide not to forward this request to the WSN, returning to the application the last value obtained from the WSN).
3.4 Architecture
The WISeMid architecture is depicted in Fig. 3 and consists of three layers: Infrastructure, Distribution and Common Services.
Fig. 3. WISeMid Architecture
The Common Services layer includes services that are not particular to a specific application domain: Aggregation, which performs sensor data aggregation and runs in the WSN; Grouping, which defines clusters inside the WSN; Naming, which stores the information needed to access a service and runs in the Internet; and SAGe, which is in charge of forwarding messages from/to the WSN and runs in the Internet. Additionally, SAGe provides location transparency by acting as a service proxy between both networks, and performs some tasks concerning the aforementioned WSN-specific requirements in order to avoid wasting the sensors’ restricted resources (see Sect. 3.5).

The Distribution layer includes the following elements: the Stub, which represents a local instance of the service within the client process and offers the same interface as the remote service; the Requestor, which constructs a remote invocation on the client side from parameters such as the remote service location, service name and arguments; the Skeleton, which dispatches remote invocations to the remote service using the invocation information sent by the Requestor; and the Marshaller, which serializes and deserializes the parameters passed between client and server using WIOP messages. WIOP is our middleware interoperability protocol, which is described in the next section.

The Infrastructure layer consists of the Client Request Handler and the Server Request Handler, which handle network communication using the communication facilities provided by the operating systems, e.g., sockets (Windows) and GenericComm (TinyOS). As sensor nodes have limited resources, some elements of the WISeMid architecture are not present in the WSN: the Requestor is not implemented by the WSN service users (its functions are performed by the Stub), and WIOP messages are treated as byte sequences, which means that the Marshaller is not necessary.
3.5 Implementation
The WISeMid implementation is divided into two parts, one for the WSN nodes, developed in nesC, and another for Internet hosts, developed in Java. Implementation details of the main middleware elements are described as follows.

WIOP. The WISeMid Inter-ORB Protocol (WIOP) is a GIOP-based protocol that defines the Request/Reply messages exchanged between clients and servers. A WIOP message is divided into a header and a body. The WIOP message header is composed of the following fields: endianness (e.g., big endian or little endian); msgType (e.g., request or reply message); and msgSize, which stores the message size in bytes. The WIOP message body may contain a Request or a Reply message. These messages also have a header and a body. The Request message header comprises the fields: requestId, which stores the Request message ID; responseExpected, which signals whether the request has a Reply message or not; serviceId, which is the ID of the requested service; and operation, which holds the name of the operation being invoked. The Request body consists of the number of arguments (numArgs) followed by a sequence of the type and value of each argument. The Reply message header contains the fields: requestId, which stores the related Request message ID; and replyStatus, which signals whether there was any exception while executing the request, with possible values NO_EXCEPTION (0), USER_EXCEPTION (1), SYSTEM_EXCEPTION (2) and LOCATION_FORWARD (3). The Reply body is composed of the result type and its value.

Although containing the same fields, the WIOP field sizes are smaller in the version running in the WSN (called WIOPs) than in the Internet version (called WIOPi). In order to avoid wasting energy during radio transmission, WIOPs messages are kept as small as possible. Also, the way arguments are stored in the Request body is different in the WSN version. In WIOPi, the arguments are stored one by one, each argument consisting of a type and a value.
For example, three arguments would be stored in the sequence: type1, value1, type2, value2, type3, value3. In WIOPs, only the first argument is stored individually (its type and value in a row). From the second argument on, the arguments are grouped into couples, where the types of both arguments come first, followed by their respective values. This is possible because types are represented by integer numbers between 0 and 11 (e.g., the float type is represented by the number 7), so each type can be stored in only 4 bits. Hence two types can be grouped into one byte, followed by their related argument values. In this case, three arguments would be stored in the sequence: type1, value1, type2, type3, value2, value3. Besides saving energy through its reduced size, the sensor message format also accounts for the sensors’ limited processing power, as it is already laid out as a byte array, avoiding the need for a Marshaller implementation. The WISeMid Naming Service and Internet services use the Internet format, while the sensor services use the WSN format. Only SAGe handles both formats.
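A minimal sketch of this pairwise argument layout, in Java with illustrative names (the actual WIOPs byte layout is defined by the middleware and not reproduced here):

```java
import java.io.ByteArrayOutputStream;

// Sketch of the WIOPs pairwise argument layout described above.
// Type codes fit in 4 bits (0..11), so from the second argument on,
// two type nibbles are packed into one byte, followed by both values.
public class WiopArgPacker {
    // Each argument i is a (types[i], values[i]) pair; names are illustrative.
    public static byte[] pack(int[] types, byte[][] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (types.length == 0) return out.toByteArray();
        // First argument: its own type byte followed by its value.
        out.write(types[0]);
        out.write(values[0], 0, values[0].length);
        // Remaining arguments: group type nibbles in couples, then both values.
        int i = 1;
        while (i < types.length) {
            if (i + 1 < types.length) {
                out.write((types[i] << 4) | types[i + 1]); // two 4-bit types in one byte
                out.write(values[i], 0, values[i].length);
                out.write(values[i + 1], 0, values[i + 1].length);
                i += 2;
            } else {
                out.write(types[i]); // odd argument out keeps its own type byte
                out.write(values[i], 0, values[i].length);
                i += 1;
            }
        }
        return out.toByteArray();
    }
}
```

With three one-byte arguments this produces the sequence type1, value1, (type2|type3), value2, value3, i.e. five bytes instead of six.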
Naming Service. The WISeMid Naming Service stores the references of services executing in the Internet and the WSN, in such a way that a service may only be accessed/used after being registered in the Naming Service. The Naming Service’s interface includes five operations: Bind, to register a service by its name, associating it with its reference; Lookup, to return the reference associated with a service name; Rebind, to change the reference that is associated with a service name; Unbind, to unregister a service name; and List, to list all registered services. The service reference includes the service ID, endianness, the IP address and the port number. In case the service is running in the WSN (i.e., the sensor node does not have an IP address), the stored IP address is SAGe’s address.

SAGe. As stated in Sect. 3.4, SAGe (Sensor Advanced Gateway) [14] is an important element of the WISeMid architecture. Running on the Internet host connected to the WSN sink node via a serial port (see Fig. 2), SAGe’s main function is to act as a service proxy between both networks, enabling communication between services running on Internet hosts and WSN nodes in a transparent way. Furthermore, SAGe also performs some tasks concerning the previously defined WSN-specific requirements (R09 and R10). This section describes how SAGe provides location transparency and implements those requirements.

a. Binding of a WSN service: Once a WSN service starts, it sends a message invoking the Bind operation of the Naming Service. When SAGe receives that message, it creates a ServiceReference for the WSN service including SAGe’s IP address and port, then sends a binding request to the Naming Service, registering the WSN service as a SAGe service. It also keeps the created reference cached as a SageServiceReference, which maps the ServiceReference to the node ID of the sensor providing the service.
In this manner, SAGe knows which sensor node a Request message for a WSN service must be forwarded to.

b. Invocation of a WSN service: When SAGe receives a Request message from the Internet invoking a WSN service, it converts the message to a WIOPs Request and sends it to the WSN using the SageServiceReference that was cached when the WSN service was bound. Once the Reply message from the sensor service provider is received, SAGe converts it into a WIOPi Reply message and forwards it to the Internet service user. If no SageServiceReference is found, SAGe does not forward the request to the WSN. Instead, it sends a Reply message reporting an error to the Internet host that requested the service.

c. Invocation of an Internet service: When a sensor node service user performs a lookup for an Internet service, SAGe checks whether this service is already known, i.e., whether its reference is cached. If the service is unknown, SAGe converts and forwards the lookup request to the WISeMid Naming Service. When it receives the WIOPi Reply, it stores the returned ServiceReference (as a SageServiceReference) and sends the service ID to the sensor node service. Using the received service ID, the WSN service invokes the Internet service operation. When SAGe receives the sensor Request message, it uses the cached ServiceReference to invoke the requested operation and, once the Reply message arrives, SAGe converts and forwards it to the sensor node service user.
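The caching behind steps a and b amounts to a routing table keyed by service ID. A minimal sketch in Java, with illustrative names and integer IDs standing in for the real reference objects:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of SAGe's reference cache: a WSN service is registered with the
// Naming Service under SAGe's own address, while SAGe remembers which
// sensor node actually provides it. Names and types are illustrative.
public class SageReferenceCache {
    private final Map<Integer, Integer> serviceToNode = new HashMap<>();

    // Called when a Bind message arrives from the WSN (step a).
    public void bind(int serviceId, int sensorNodeId) {
        serviceToNode.put(serviceId, sensorNodeId);
    }

    // Called for an Internet Request (step b): returns the sensor node the
    // request must be forwarded to, or -1 to signal that the request is not
    // forwarded and an error Reply is sent back instead.
    public int route(int serviceId) {
        return serviceToNode.getOrDefault(serviceId, -1);
    }
}
```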
J.P.O. Domingues, A.V.L. Damaso, and N.S. Rosa
d. WSN requirements implementation: SAGe implements the WSN-specific requirements defined in Sect. 3.3. To meet requirement R09, SAGe performs an additional step when converting an Internet Request message into a sensor Request message. For each argument in the Request body, it tries to fit the argument value into a smaller type (that is, a compatible type that uses fewer bytes). For instance, if the argument is a long (an 8-byte integer) but its value is 525, it can be stored in a short (a 2-byte integer). Thus SAGe converts the argument from a long into a short and adds only 2 bytes to the WIOPs message instead of the original 8, avoiding the transmission of 6 unnecessary bytes. The same step is performed for the result value of WIOPi Reply messages when converting them into WIOPs Reply messages. To implement the other WSN-specific requirement (R10), SAGe performs three further procedures. The first is not forwarding to the WSN any Internet Request that asks for a sensor service that has not been registered; doing so would be useless and energy-wasting since no sensor announced that service. The second procedure applies when a sensor requests an Internet service: SAGe does not give up at the first unsuccessful attempt to connect to the Internet service provider. Considering that the failure may be sporadic, SAGe retries the connection a configurable number of times before returning an error to the sensor node. This measure aims to keep the sensor from sending another Request in case the answer is essential for its application. The last procedure consists of avoiding sending equivalent Request messages (that is, messages asking for the same service with the same parameters) within a short period of time. Assuming that some sensed values do not change very quickly, sending the same Request for the same service within a short time period will likely return the same value, resulting in unnecessary processing and energy consumption.
Hence, SAGe groups equivalent Request messages and, for a configurable period of time, only one Request is sent to the sensor service provider; the received Reply message is stored and forwarded as the answer to all the equivalent Request messages. For cases where the sensed value changes very often, this procedure may be turned off by setting the Reply message storage timeout to null (i.e., 0 seconds).
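The argument-narrowing step of requirement R09 can be sketched as follows. The type table mirrors the long/short example above; the function itself is a hypothetical illustration, not SAGe's actual code:

```python
# Illustrative sketch of SAGe's argument narrowing (R09): fit an integer
# argument into the smallest compatible signed type before transmission.
# The type table is an assumption based on the long/short example above.

INT_TYPES = [            # (name, size in bytes, min value, max value)
    ("byte",  1, -2**7,  2**7  - 1),
    ("short", 2, -2**15, 2**15 - 1),
    ("int",   4, -2**31, 2**31 - 1),
    ("long",  8, -2**63, 2**63 - 1),
]

def narrow(value):
    """Return the name and size of the smallest type holding `value`."""
    for name, size, lo, hi in INT_TYPES:
        if lo <= value <= hi:
            return name, size
    raise OverflowError("value does not fit any supported type")
```

For the paper's example, `narrow(525)` yields `("short", 2)`, so 6 of the original 8 bytes are saved.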
4 Evaluation
As energy is a critical resource in wireless sensor networks, this section presents the results of experiments that analyze how WISeMid affects the power consumption of a sensor node. For all scenarios, we use two MICAz motes: one connected to an MTS400 basic environment sensor board, running the application that constitutes the scenario; the other connected to a MIB520 USB programming board, working as the base station (BS), i.e., the sink node. The BS is connected to an Internet host that runs the WISeMid SAGe service or the TinyOS SerialForwarder application, which acts as a proxy between the WSN and the Internet. Also, two other services run on Internet hosts: the Naming Service and the service/application under evaluation.
WISeMid: Middleware for Integrating WSNs and the Internet
In order to estimate the power consumption of the sensor node, an oscilloscope (Agilent DSO03202A) was used. A PC connected to the oscilloscope captures the start and end times of the code snippet's execution by monitoring an LED on the sensor, which is turned on/off to signal the execution start/end. The PC runs a tool named AMALGHMA [15], which is responsible for calculating the power consumption. Five scenarios have been analyzed so far, and they can be divided into two groups: one that studies the impact of the WISeMid infrastructure on the power consumption of a WSN node, and one that evaluates the efficiency of some WISeMid features specifically designed to save power. To make the results more reliable, all values presented here are mean values over 1000 executions of the code under study.

4.1 WISeMid Infrastructure Impact
These scenarios compare the power consumption of a service that uses the WISeMid infrastructure with that of a similar application that uses only TinyOS. It is worth noting that TinyOS does not have the notion of a service; therefore an application with functionality similar to the service actually runs on TinyOS. The first scenario measures the power consumption of a service that sends 30 messages to the Internet with an interval of 100ms between them. The reason for using 30 messages is to make the difference between the power consumption in both cases more evident without being costly: sending only one message would result in an almost unnoticeable difference, whereas more than thirty would make the measurements slower and difficult to synchronize with the oscilloscope window. The interval of 100ms between consecutive messages is necessary to ensure the next message is only sent after the previous one has been completely transmitted; smaller values were tried, but messages were still being lost. When the WISeMid infrastructure is used, a one-way WIOP message is created and sent through the WISeMid components. When TinyOS is used, a message is created in the usual way (i.e., by defining a struct). Both messages contain only one byte carrying the maximum value: 127 (01111111). Also, while the WISeMid service uses SAGe, the TinyOS application uses the SerialForwarder. Figure 4 shows the results of this scenario. Using WISeMid consumed 1.5% more energy than using only TinyOS. It was expected that WISeMid would be more power demanding, as it adds more layers (components) to the sensor node. However, the increase is small and acceptable considering the benefits it brings. In the second scenario, the power consumption of a service that receives 30 messages from the Internet is calculated. The interval between each message is 100ms.
Similarly to the previous scenario, WIOP messages are used for the WISeMid service whereas typical messages are handled by the TinyOS application, both carrying a byte with the value 127. The results for this scenario are presented in Fig. 5. As expected, the WISeMid service consumed more power than the TinyOS application (6.15%). As in the previous experiment, the benefits of adopting WISeMid have a small cost in terms of power consumption.
Fig. 4. Power consumption for sending 30 packets - TinyOS versus WISeMid
Fig. 5. Power consumption for receiving 30 packets - TinyOS versus WISeMid
The last scenario of this group combines the two previous scenarios and adds the message processing and temperature reading, which are the steps necessary to answer a request to a Temperature service provided by the sensor node. This service implements the TEMP interface and thus provides the getTemp() operation, which returns the sensed temperature in degrees Celsius. The service interface is described in WISeMid IDL as follows:

module example {
  interface TEMP {
    long getTemp();
  }
}
This interface definition was compiled by the WISeMid IDL compiler, ProxiesGen, to generate the temperature service's stub (Java) and skeleton (nesC). The Java stub is used by the Internet application to access the service, whilst the nesC skeleton enables access to the temperature service on the server side. Note that in this case the service user is located in the Internet and the service provider is in the WSN node, but neither knows where the other is located. This is due to the location transparency provided by the WISeMid infrastructure (meeting requirement R03). In the equivalent code that uses only TinyOS, the Internet application must know the SerialForwarder IP address and port number and handle all network communication itself, using, for instance, socket connections. The service abstraction provided by WISeMid explains the power consumption increase of 16.11% compared to the TinyOS application version, as presented in Fig. 6. Although this is not a negligible increase, the facilities offered by the WISeMid service abstraction, as well as the energy savings brought by some WISeMid services, compensate for it, as the next results show.
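The generated stub described above is Java; purely for illustration, the proxy pattern it embodies can be sketched in Python. All names here (including the transport) are hypothetical stand-ins, not ProxiesGen output:

```python
# Illustrative proxy/stub sketch: the client calls get_temp() as a local
# method; the stub marshals the call into a Request message and hands it
# to a transport (SAGe, in WISeMid's case). All names are hypothetical.

class TempStub:
    def __init__(self, transport, service_id):
        self.transport = transport
        self.service_id = service_id

    def get_temp(self):
        request = {"service": self.service_id, "op": "getTemp", "args": []}
        reply = self.transport.invoke(request)   # location-transparent call
        return reply["result"]

class FakeTransport:
    """Stand-in for the WIOP transport, used here only to show the flow."""
    def invoke(self, request):
        assert request["op"] == "getTemp"
        return {"result": 25}                    # pretend sensed temperature

stub = TempStub(FakeTransport(), service_id=7)
```

The point of the pattern is that the client never sees an address or port; swapping `FakeTransport` for a real network transport changes nothing in the caller's code.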
Fig. 6. Power consumption for requesting a sensor service 30 times - TinyOS versus WISeMid
4.2 WISeMid Services
This section presents the results of scenarios that analyze the power savings achieved by some WISeMid services. The first scenario studies the energy saving offered by the Aggregation service provided by WISeMid. For that purpose, a sensor service sends 30 WIOP messages in a row, with an interval of 100ms between the messages. When the Aggregation service is used, instead of 30 messages the sensor node sends only one message carrying the mean value of the 30 messages. As Fig. 8 shows, the Aggregation service saves 11.18% of energy. In the last scenario, a feature offered by SAGe as an implementation of the WSN-specific requirement R10 is evaluated. When this feature is on, SAGe stores every Reply message for a given period of time and forwards it to all equivalent
Fig. 7. Energy saving by using the SAGe’s Reply Storage feature
Fig. 8. Energy saving by using the WISeMid’s Aggregation service
Request messages that arrive during this period, thus avoiding sending Requests that would return the same result (see Sect. 3.5). To evaluate this, an Internet service user requests the Temperature service 50 consecutive times. We increased the number of requests from 30 to 50 in order to make the difference in power consumption between both cases more noticeable. As the initial experiments used an interval of 100ms between consecutive requests, the Reply message storage timeout was set to 300ms, allowing three requests to be sent during this time to check whether SAGe would "block" two of them. After confirming that, we kept the timeout value, but decided to make this scenario more realistic by using random intervals between consecutive requests. These intervals are now randomly generated following a Uniform distribution with parameters 10 and 280 (in ms), which allows the reception of 1 to 30 requests during the 300ms a Reply message is stored, depending on the generated values. In order to compare the power consumption with this feature "on" and "off", the same seed was used for both cases. Figure 7 shows that this SAGe feature saves 25.14% of energy, as it avoids sending unnecessary requests to the WSN. The results presented in this section show that, on the one hand, the WISeMid infrastructure increases power consumption through the additional components it adds to the sensor code but, on the other hand, the services and features it offers save significant energy, which is a tradeoff worth making.
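The Reply-storage feature evaluated above can be sketched as a timeout cache. The API below is an illustrative simplification, not SAGe's actual interface:

```python
# Illustrative sketch of SAGe's Reply storage (R10): equivalent Requests
# arriving within the storage timeout are answered from the cached Reply
# instead of being forwarded to the WSN. Names are assumptions.

class ReplyStore:
    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms     # 0 disables the feature
        self._cache = {}                 # request key -> (reply, stored-at)

    def handle(self, key, now_ms, forward):
        """Answer `key` from cache if fresh, else call `forward(key)`.
        Returns (reply, forwarded-to-WSN?)."""
        if self.timeout_ms > 0 and key in self._cache:
            reply, at = self._cache[key]
            if now_ms - at < self.timeout_ms:
                return reply, False      # served from cache, no WSN traffic
        reply = forward(key)             # send the Request to the sensor
        self._cache[key] = (reply, now_ms)
        return reply, True
```

With a 300ms timeout, equivalent requests at t = 0, 100 and 200 ms trigger a single forward to the sensor, matching the "block two of three" behaviour checked in the experiment.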
5 Conclusion and Future Work
In this paper, we presented the WISeMid middleware as an approach to address the issue of integrating WSNs and the Internet. The proposed approach concentrates on integrating services instead of layers. WISeMid provides an infrastructure that allows integrating WSN and Internet services with transparency of access, location and technology. Hence, a service offered by a sensor node in a WSN or by a host in the Internet can be accessed in the same way irrespective of the client's or the service's location. To validate WISeMid, a power consumption evaluation was presented, showing that although the components added by the WISeMid infrastructure increase the power consumption, the services and features it offers save significant energy, which is a tradeoff worth making. In terms of future work, other power consumption evaluations are now being conducted. Also, we are improving the proposed middleware by including typed services, allowing a client to ask for a service by specifying only its type, and a life cycle manager for the remote services, which will enable stateful services. Some features will also be added to SAGe, such as turning it into a distributed service, to prevent it from becoming a bottleneck in large-scale WSNs, and deploying conversion between WIOP and other interoperability protocols (e.g., IIOP, JRMP), to enable WISeMid to support application services developed in different technologies, such as Web Services, Java RMI, EJB and JMS.
WISeMid: Middleware for Integrating WSNs and the Internet
83
References

1. Bai, J., Zang, C., Wang, T., Yu, H.: A Mobile Agents-Based Real-time Mechanism for Wireless Sensor Network Access on the Internet. In: 2006 IEEE International Conference on Information Acquisition, pp. 311–315 (2006)
2. Reddy, S., Chen, G., Fulkerson, B., Kim, S.-J., Park, U., Yau, N., Cho, J., Hansen, M., Heidemann, J.: Sensor-Internet Share and Search: Enabling Collaboration of Citizen Scientists. In: Workshop for Data Sharing and Interoperability (IPSN 2007), pp. 11–16 (2007)
3. Zheng, Y., Cao, J., Chan, A.T.S., Chan, K.C.C.: Sensors and Wireless Sensor Networks for Pervasive Computing Applications. J. Ubiquitous Computing and Intelligence 1(1), 17–34 (2007)
4. Niebert, N., Prehofer, C., Hancock, R., Norp, T., Nielsen, J.: Ambient Networks - A New Concept for Mobile Networking. Technical report, Wireless World Research Forum (2004)
5. Kim, J.-H., Kim, D.-H., Kwak, H.-Y., Byun, Y.-C.: Address Internetworking between WSNs and Internet supporting Web Services. In: 2007 International Conference on Multimedia and Ubiquitous Engineering (MUE 2007), pp. 232–240 (2007)
6. Ho, M., Fall, K.: Poster: Delay Tolerant Networking for Sensor Networks. In: 1st IEEE Conf. Sensor and Ad Hoc Communications and Networks (2004)
7. Dai, H., Han, R.: Unifying Micro Sensor Networks with the Internet via Overlay Networking. In: 29th Annual IEEE International Conference on Local Computer Networks, pp. 571–572 (2004)
8. Dunkels, A., Voigt, T., Alonso, J., Ritter, H., Schiller, J.: Connecting Wireless Sensornets with TCP/IP Networks. In: 2nd International Conference on Wired/Wireless Internet Communications, pp. 143–152 (2004)
9. Dunkels, A.: Full TCP/IP for 8-Bit Architectures. In: 1st International Conference on Mobile Systems, Applications and Services, pp. 85–98 (2003)
10. Durvy, M.: Poster Abstract: Making Sensor Networks IPv6 Ready. In: 6th ACM Conference on Networked Embedded Sensor Systems (2008)
11. Delicato, F.C., Pires, P.F., Rust, L., Pirmez, L., Rezende, J.F.: Reflective Middleware for Wireless Sensor Networks. In: 2005 ACM Symposium on Applied Computing, pp. 1155–1159 (2005)
12. Priyantha, N.B., Kansal, A., Goraczko, M., Zhao, F.: Tiny Web Services: Design and Implementation of Interoperable and Evolvable Sensor Networks. In: 6th ACM Conference on Embedded Network Sensor Systems, pp. 253–266 (2008)
13. Ozturgut, H., Scholz, C., Wieland, T., Niedermeier, C.: SCOPE - Sensor Mote Configuration and Operation Enhancement. In: 22nd International Conference on Architecture of Computing Systems, pp. 84–95 (2009)
14. Damaso, A., Domingues, J., Rosa, N.: SAGe: Sensor Advanced Gateway for Integrating Wireless Sensor Networks and Internet. In: 3rd Workshop on Applications of Ad hoc and Sensor Networks, AASNET (to appear, 2010)
15. Tavares, E.: Software Synthesis for Energy-Constrained Hard Real-Time Embedded Systems. PhD Thesis, Center of Informatics, Federal University of Pernambuco (November 2009)
Structured Context Prediction: A Generic Approach∗ Matthias Meiners, Sonja Zaplata, and Winfried Lamersdorf Distributed Systems and Information Systems Computer Science Department, University of Hamburg {4meiners,zaplata,lamersdorf}@informatik.uni-hamburg.de
Abstract. Context-aware applications and middleware platforms are evolving into major driving factors for pervasive systems. The ability to also make accurate assumptions about future contexts further enables such systems to proactively adapt to upcoming situations. However, the provision of a reusable system component to facilitate the development of such future-context-aware applications is still challenging, as such a component has to be generic but, at the same time, as efficient and accurate as possible. To address these requirements, this paper presents the approach of Structured Context Prediction, which constitutes a framework facilitating the application of existing prediction methods. It allows application developers to integrate domain-specific knowledge by creating a customized prediction model at design time and to select, implement and combine prediction methods for the intended purpose. Feasibility is evaluated by applying a prototype system component to two mobile application scenarios, showing that both high accuracy and efficiency are possible.
1 Introduction
The vision of Ubiquitous Computing fosters the development of smart devices and applications which are able to assist the mobile user while ideally remaining in the background [1]. In consequence, the ability to obtain, process, manage and provide context information describing the user's environment and situation has become one of the most important requirements for such systems. The prediction of future context is an important further step which enables devices and applications to also proactively support the user [2]. With the ongoing emergence of generic context management systems and middleware support (e.g. [3, 4]), application developers are often able to build context-aware applications on the basis of generic frameworks. Supplementary, this paper faces the challenge to offer reusable system support for context prediction (in the following referred to as a prediction system) in order to support the development of context-aware applications which should not only consider
The research leading to these results has received funding from the European Community’s Seventh Framework Programme FP7/2007-2013 under grant agreement 215483 (S-Cube).
F. Eliassen and R. Kapitza (Eds.): DAIS 2010, LNCS 6115, pp. 84–97, 2010. © IFIP International Federation for Information Processing 2010
the current context, but which are also able to derive and use information about future situations – as e.g. illustrated by the following two use cases: Example 1 (Energy Management). A mobile device uses predictions about its future usage for energy management. If e.g. a user often utilizes his mobile phone in a similar way, the application responsible for energy management can use the prediction system in order to learn about this behavior. Based on predictions, it can make advanced recommendations about the optimal time interval for reloading the battery, save energy in an adjusted way or warn the user in case of critically intensive usage leading to an upcoming uncovered demand for energy. Example 2 (Service Availability). In mobile ad-hoc networks, the availability of a specific software service often depends on context data such as location or network connections. Thus, service availability can e.g. be predicted by forecasting the position of devices which provide such services and/or of devices which need to consume them. Predictions about service availability can thus be used to improve task assignment by always selecting the most “promising” device [4]. However, such a generic applicability imposes several principal requirements on a prediction system: First, the system should support a preferably wide range of applications and diversity of exploitable contexts [5, p. 11] in order to maximize reusability. Furthermore, there are inter-individual differences between users [6] which can also change continuously [7, p. 77]. Thus, the system has to adapt to the individual user at runtime by learning about the characteristics and regularities which determine the future context (cp. [6] [7, p. 77]). Besides being generic and adaptive, the prediction system should be able to produce customized predictions for different scenarios in a reliable and satisfying way, i.e. as accurate and efficient as possible. 
Especially mobile devices often suffer from a lack of resources [8], so that the corresponding requirement for efficiency makes this trade-off even more challenging. Finally, the core motivation to support application developers implies a preferably low effort for them. In summary, the following requirements can be identified:

1. Support for a wide range of applications
2. Support for diverse kinds of context
3. Adaptation to the individual user of the system
4. Accuracy of prediction results
5. Efficiency w.r.t. restricted resources of mobile devices
6. Low effort for application developers
In order to elaborate an approach which fulfills these requirements, the rest of the paper is organized as follows: Section 2 analyzes major existing approaches to context prediction and examines hybrid prediction techniques. Section 3 introduces the approach of Structured Context Prediction which partly makes use of such hybrid techniques while integrating domain-specific knowledge in order to achieve accuracy and efficiency. The approach is evaluated in Section 4 using both quantitative evaluation on the basis of Example 2 as well as conceptual evaluation by means of the requirements identified in this section. Finally, Section 5 gives a short summary and an outlook.
2 Background and Related Work
As the basis for every prediction about the future, there must be a sufficient amount of related data collected in the past. In Example 2, the availability of a service and the position of the consuming device are parameters to be considered. Both are called variables in this paper and can have different values at different points of time (e.g. position = at home and service available = false). Such values constitute so-called historical data: Definition 1 (Historical Data). Historical data of variables $V_1, ..., V_n$ in a time interval with discrete points of time $j \in \mathbb{Z}$ consist of values $v_{i,j}$ for all variables $V_i$ and all points of time $j$. The definition is inspired by time series and stochastic processes as both are very generic concepts. It is used in combination with stochastics in order to express uncertainty about future variable values. Furthermore, different variables may have different types of scales (e.g. nominal or ratio, according to statistics). Finally, variables can also be understood as attributes of entities (e.g. the position of the user), establishing the connection to context, which is defined by Dey and Abowd on the basis of entities [9, p. 3-4]. Being on the agenda of context-awareness [5, p. 6], histories provide essential input for prediction. Additionally, this paper takes into account the principle of inductive learning (cp. [10, p. 60-61]), which is e.g. used in data mining, machine learning, pattern recognition and statistics. Inductive learning extracts knowledge about characteristics and regularities from observed parts of historical data (training data). This is advantageous because storing or repeatedly processing complete histories is avoided, and thus the resources of mobile devices are used more efficiently. The learned knowledge is applied for predictions to infer future context from current and recent context.
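Definition 1 can be represented directly as a mapping from (variable, time) pairs to values. This is a minimal illustration; the variable names follow Example 2:

```python
# Minimal representation of historical data per Definition 1: a value
# v_{i,j} for each variable V_i at each discrete point of time j.
# Variables and values follow Example 2 and are illustrative.

history = {
    ("position", 0): "at home",
    ("position", 1): "at home",
    ("position", 2): "office",
    ("service_available", 0): False,
    ("service_available", 1): False,
    ("service_available", 2): True,
}

def value(var, j):
    """Return v_{i,j}, the value of variable `var` at time j."""
    return history[(var, j)]
```

Note that nothing constrains the value domains: `position` here is nominal while another variable could be ratio-scaled, matching the remark on scale types above.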
The approaches of Mayrhofer [11], Sigg [7] and Petzold [12] are considered to be major contributions towards generic context prediction. Mayrhofer’s approach uses an exchangeable prediction method [11, p. 37]. The approach combines context prediction with a preceding extraction of context on a high abstraction level (high-level-context [5], e.g. the complex situation in a meeting) from context on a low abstraction level (low-level-context [5], e.g. the noise level in the current room) [11, p. 5, 33, 62]. Context prediction is simplified by this approach because only few different contexts have to be considered as possible values of only one variable. Thus, a prediction based on high-level-context permits high efficiency. However, only predictions about high-level-context are possible, and Sigg states that such a high-level-context-approach is disadvantageous for accuracy [7, p. 179]. The effort for application developers using Mayrhofer’s approach is low [11, p. 128]: Almost no knowledge about the application domain is used [11, p. 128], but its integration is recommended by Mayrhofer as future work [11, p. 132]. Sigg’s approach is nearly fully generic, in particular it is not restricted to the prediction of high-level-context [7, p. 92]. On the other hand, the simplification of prediction based on high-level-context is missing. Sigg considers a single prediction method applied at runtime which is exchangeable at design time [7, p. 91],
Fig. 1. Characteristics of existing approaches compared to Structured Context Prediction (SCP ) in view of the requirements identified in Section 1
but regardless of the choice of the method, it has to deal with a potentially high number of possible variable/value combinations as well as additional applicationand variable-dependent requirements. Such requirements could involve fast predictions of the values of a specific variable, different scale types or the utilization of the type of dependencies among variables [11, p. 66] [7, p. 96]. Mayrhofer, Sigg and Petzold state that there is no universal method fulfilling all possible requirements [11, p. 86, 91] [7, p. 203-204] [12, p. 142]. Thus, it is expected that there will always be serious limitations considering accuracy and efficiency for generic context prediction as long as only a single method is used. The effort for application developers using Sigg’s approach is relatively low [7]. Petzold’s approach is restricted to the prediction of primary context [9], i.e. time, position, identity and activity [12, p. 141], and is therefore not fully generic. In addition to the other approaches, it allows a parallel, hybrid application of multiple methods [12, p. 87] (similarly in [13]). This means that the same prediction task can be assigned to multiple methods in order to better fulfill application- and variable-dependent requirements. The advantages of multiple methods can be combined [12, p. 142], e.g. different specialized methods can be utilized to address different aspects of the prediction. This allows for high accuracy and efficiency. On the other hand, the combination of methods leads to higher effort for application developers who have to select and combine the methods in order to apply the approach to their individual application domains. Figure 1 summarizes the main characteristics of the presented approaches showing that there is still no approach which is generic enough and offers high accuracy and efficiency at the same time. 
However, the application of multiple prediction methods is interesting because it offers the possibility to achieve high accuracy and efficiency. Also, the integration of domain-specific knowledge has to be considered because it narrows the prediction task and can simplify achieving high accuracy and efficiency while remaining generic. Beyond context, Hilario distinguishes different kinds of techniques for a hybrid application of multiple methods [14]. The parallel application of methods which is used by Petzold is called coprocessing [14]. In contrast, chainprocessing denotes the sequential application of methods so that the prediction result of one method is used as input for another method [14]. Both techniques expose benefits concerning accuracy and efficiency [14, p. 21-22] [15, p. 1-4]. Bayesian
Networks offer the possibility to describe a graph-based dependency structure of variables (cp. [16, p. 101, 112-114] [17]). However, such a generic graph structure is also interesting for describing the connections between methods which use the output of one method as the input of another (cp. 3.2). Several methods could be applied for prediction in a hybrid way (e.g. neural networks, regression, decision trees, discriminant functions, Markov chains, ARMA and more). However, many of them are not suitable because they require too many resources to be used on mobile devices or do not support adaptive online-learning [2, p. 34] [11, p. 66], i.e. they are not able to update already learned knowledge and therefore need an explicit learning phase. The remaining methods come into consideration for the approach presented in the following section.
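The two hybrid techniques introduced above can be sketched side by side: coprocessing runs several methods on the same prediction task and combines their outputs (here by majority vote), while chain-processing feeds one method's output into the next. The methods below are toy stand-ins, not the reference configuration:

```python
# Toy sketch of hybrid method application (names are illustrative).
from collections import Counter

def coprocess(methods, inputs):
    """Coprocessing: run all methods on the same task, majority-vote."""
    results = [m(inputs) for m in methods]
    return Counter(results).most_common(1)[0][0]

def chainprocess(methods, inputs):
    """Chain-processing: each method's output is the next one's input."""
    for m in methods:
        inputs = m(inputs)
    return inputs
```

For instance, three position predictors voting "office", "office", "at home" yield "office" under coprocessing, whereas chain-processing would pipe a raw sensor value through successive refinement methods.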
3 The Approach of Structured Context Prediction
The approach of Structured Context Prediction (SCP) realizes a generic prediction system. It is based on fundamental principles derived from the preceding analysis of existing approaches, and introduces the new concept of Prediction Nets and an architecture for a corresponding prediction system.

3.1 Fundamental Principles
In order to overcome the remaining conflict between the requirements of genericness, accuracy and efficiency, the proposed prediction system is based on two major principles: In contrast to Mayrhofer who prioritizes unobtrusiveness [11, p. 131], the approach of Structured Context Prediction uses knowledge about the application domain as valuable information which has to be incorporated by the application developers at design time. It is thereby extending Petzold’s ideas. The second principle is a hybrid application of multiple, exchangeable prediction methods. Thus, methods which are appropriate to ensure accuracy and efficiency of domain-specific predictions can be selected and combined by the application developers respectively. The knowledge about the application domain is described as a prediction model which specifies the way predictions have to be performed and configures the prediction system. Among other things, it assigns a method to each variable in order to predict its value and interrelates the methods. The method uses the values of other variables as inputs which are again predicted by their own methods or are known (e.g. measured by a sensor). Additionally, an adaptation to the individual user at runtime is achieved by adaptive online-learning as the default learning mechanism. Figure 2 shows how the preceding principles and techniques complement one another in order to apply the generic prediction system to a concrete prediction task. The integration of domain-specific knowledge can be illustrated by the application responsible for the energy management of a mobile phone (Example 1). The fact that the user is telephoning can be represented as the value of a boolean variable which is predicted by a method using the values of other variables such
Fig. 2. General methodology of Structured Context Prediction
as the time of day and the position of the user, which are in turn each predicted by their own variable's method. The set of usable methods can be extended by implementing new, possibly application-dependent methods. Consequently, an implementation of a prediction system according to the presented approach constitutes a framework which can be extended by further methods as "plug-ins". So far, a reference configuration of methods has been established which is mainly based on linear regression and probability tables. The notion "probability tables" refers to a method which stores occurrence frequencies of variable/value combinations as knowledge. The properties of the two methods complement one another and they are therefore well suited for hybrid application. From the perspective of an application developer, the whole procedure of using the prediction system consists of two parts: the first is the development of the prediction model at design time (cp. Section 3.2); the second is the retrieval of predictions by the respective application at runtime (cp. Section 3.3).

3.2 Prediction Nets
As introduced above, the main part of a prediction model specifies how the methods and respective variables are connected. This part is called Prediction Net. A Prediction Net specifies that the value of a variable at a specific point of time is predicted using the values of other variables at the same point of time or earlier (e.g. the position of a user can be predicted by the preceding positions). Figure 3 shows a simple example of a Prediction Net which is mainly intended to predict the energy consumption of a mobile phone (cp. Example 1). The phone usage at a specific point of time is e.g. predicted by the number of missed calls, the position and the time of day at the same point of time, and the phone usage one and two time steps earlier. Prediction Nets are defined formally as follows:

Definition 2 (Prediction Net). A Prediction Net is a finite directed graph N = (W, E). The node set W is a set of variables {V1, ..., Vn}. An edge Vi →Δ Vi′ := (Vi, Δ, Vi′) in the edge set E ⊆ W × N0 × W expresses that the value of Vi at the point of time j − Δ is used as input for the prediction of the value of Vi′ at the point of time j. The symbol Δ denotes a time offset. The notations Vi → Vi′ := Vi →0 Vi′ and Vi →Δ1,...,Δl Vi′ := ⋃(k=1..l) Vi →Δk Vi′ are allowed as abbreviations. A Prediction Net contains no cycles of the form Vi →Δ1 ... →Δl Vi with Σ(k=1..l) Δk = 0.
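Definition 2 can be sketched as a small data structure. The following Python fragment uses hypothetical names (it is not the authors' implementation); note that, since all time offsets are non-negative, a cycle whose offsets sum to zero can only consist of zero-offset edges, so the acyclicity constraint reduces to cycle detection on the subgraph of 0-offset edges.

```python
class PredictionNet:
    def __init__(self):
        self.edges = set()  # (src, delta, dst) triples, delta = time offset

    def add_edge(self, src, delta, dst):
        if delta < 0:
            raise ValueError("time offsets must be non-negative")
        self.edges.add((src, delta, dst))
        if self._has_zero_offset_cycle():
            self.edges.discard((src, delta, dst))
            raise ValueError("edge would create a cycle with offset sum 0")

    def parents(self, var):
        """Input variables of `var`, with their time offsets."""
        return [(s, d) for (s, d, t) in self.edges if t == var]

    def _has_zero_offset_cycle(self):
        # DFS cycle detection restricted to edges with delta == 0.
        adj = {}
        for s, d, t in self.edges:
            if d == 0:
                adj.setdefault(s, []).append(t)
        WHITE, GREY, BLACK = 0, 1, 2
        color = {}
        def visit(v):
            color[v] = GREY
            for w in adj.get(v, []):
                c = color.get(w, WHITE)
                if c == GREY or (c == WHITE and visit(w)):
                    return True
            color[v] = BLACK
            return False
        return any(visit(v) for v in adj if color.get(v, WHITE) == WHITE)
```

A self-loop with offset 1 (e.g. the phone usage predicted by its own preceding value) is legal, whereas two 0-offset edges forming a cycle are rejected.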
90
M. Meiners, S. Zaplata, and W. Lamersdorf
Fig. 3. Prediction Net for Example 1

Fig. 4. Prediction Net fragment without (left) and with coprocessing (right)
Prediction Nets are inspired by Dynamic Bayesian Networks [18], which explicitly take the factor time into account. The main difference between Bayesian Networks and Prediction Nets is that Bayesian Networks describe dependencies between variables, whereas Prediction Nets describe connections between the methods which are assigned to the variables. Prediction Nets only allow predictions along the edges of the graph. On the one hand, this makes the design of such a net more complex; on the other hand, it facilitates the predictions, which is advantageous considering the restricted resources of mobile devices. Thus, Prediction Nets are not considered to be a prediction method themselves, but are rather intended as a frame for integrating existing methods. It is, for example, possible to use a probability table for a variable with a nominal scale type and at the same time regression for another variable with a ratio scale type in order to handle a linear dependency efficiently with only low storage requirements. The first step in performing a prediction with a Prediction Net is to generate the relevant part of the respective unfolded Prediction Net. An unfolded Prediction Net is a representation of a Prediction Net which represents each variable multiple times, i.e. with one variable instance for each point of time. If, e.g., the position of the user (cp. V3 in Figure 3) is to be predicted by the preceding position, the Prediction Net contains the variable V3 and the edge V3 →1 V3. The unfolded Prediction Net is thus determined as ... → V3,−2 → V3,−1 → V3,0 → V3,1 → V3,2 → ..., where ..., −2, −1, 0, 1, 2, ... are points of time (the realization of a Markov chain). Prediction Nets are a powerful modeling instrument; e.g. they can be used for coprocessing as shown in Figure 4 (right side). Here, V5 and V6 are variables with different methods for the same prediction task. The different results are integrated by the method of V4 (e.g. by arithmetic mean or majority vote).

3.3 Architecture
Figure 5 shows the architecture proposed by the approach of Structured Context Prediction for a prediction system which can be used as a reusable component in context-aware applications. Learning and prediction as concurrent processes are mapped to different layers. The two layers are linked by the knowledge layer which constitutes a data layer at the bottom of the architecture. The learning layer creates and updates knowledge and the prediction layer uses knowledge for predictions which are performed on demand by default. The architecture
Structured Context Prediction: A Generic Approach
91
Fig. 5. Architecture of a prediction system according to the SCP approach
permits the prediction system to be located on a different device than the application itself (e.g. a powerful server offering location-dependent predictions). The data acquisition can also be performed remotely in order to use sensors of other devices (e.g. GPS). A further possibility of distribution supported by the architecture is to share learned knowledge with other devices. All three of the mentioned layers contain parts of the methods. Each method possesses its own knowledge (e.g. frequencies of variable values in the case of a probability table) and its algorithms for updating the knowledge and predicting the value of the variable associated with the method. However, a method does not have to be aware of the structure of the Prediction Net. The knowledge layer contains knowledge about relationships, characteristics and regularities determining the context. The first part of this knowledge is the given prediction model which contains the Prediction Net. The second part contains the instance data which is created and updated by adaptive online learning using the specified methods at runtime in order to adapt to the actual user of the application. In the developed prototype system, the prediction model is created by the application developers as an XML representation. The data acquisition layer is responsible for acquiring context data. It makes this data available through a uniform interface and abstracts from the interfaces of physical or logical sensors (cp. e.g. [3]). The learning layer operates concurrently and independently of the application by default. This means that it periodically obtains relevant context data from the data acquisition layer and assigns it to the methods as training data. The methods extract knowledge from the training data by inductive learning. They have to support adaptive online learning unless this is explicitly not desired (e.g. in the case of a user-independent dependency).
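The per-variable method interface implied by this layered architecture can be sketched as follows. This is an illustrative fragment with hypothetical names, not the authors' Java ME API: a method owns its knowledge and offers learning and prediction, but never sees the Prediction Net structure.

```python
from abc import ABC, abstractmethod

class Method(ABC):
    @abstractmethod
    def learn(self, inputs, value):
        """Learning layer: update this method's private knowledge
        from one training sample (runs concurrently by default)."""

    @abstractmethod
    def predict(self, inputs):
        """Prediction layer: predict the associated variable's value
        on demand from the supplied input values."""

class LastValueMethod(Method):
    """Trivial example method whose only knowledge is the most
    recently observed value; real methods in the reference
    configuration would use e.g. probability tables or regression."""

    def __init__(self):
        self.last = None

    def learn(self, inputs, value):
        self.last = value

    def predict(self, inputs):
        return self.last
```

Because every method implements the same two operations, the learning and prediction layers can coordinate arbitrary plug-in methods uniformly.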
The prediction layer makes use of an algorithm which coordinates the methods. This is necessary because, unlike learning, the prediction normally cannot
be performed by a single method alone. For example, if the phone usage is to be predicted using the Prediction Net in Figure 3, the position and the number of missed calls also have to be predicted by their respective methods. Accordingly, a prediction initiated by the application begins with the generation of the relevant part of the unfolded Prediction Net. For every relevant variable at each relevant point of time, a prediction unit called a predictor is created. A predictor obtains input values from its parent predictors and passes them to the method associated with its variable in order to predict the required value at the given point of time. The following algorithm summarizes the prediction of value vi,j of variable Vi at the point of time j:

if vi,j already predicted then
    return vi,j
else
    if vi,j known then
        return vi,j
    else
        let parent predictors predict their values
        predict vi,j with own method using these values
        return vi,j
    end if
end if

The algorithm is executed multiple times in order to capture probabilistic behavior. This is inspired by an algorithm called Stochastic Simulation which was originally developed for Bayesian Networks [19, p. 189-191]. In a prediction round, each predictor and its method predict one of the possible values of the corresponding variable at the corresponding point of time. A value should be chosen with high probability only if the probability of its occurring in reality is also high. The individual prediction results are used to finally obtain a probability distribution. This can either be a distribution of possible values of the variable at a specific point of time in the future, or a distribution of possible points of time in the future at which the variable will have a specific value. In addition to choosing the number of prediction rounds, an alternative reduced mode without repeated execution of the algorithm can be selected.
This enables high scalability in comparison to the more common algorithms for Bayesian Networks which could be adapted for Prediction Nets.
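The coordination algorithm above can be sketched in a few lines of Python. All names here are hypothetical: the net maps each variable to its parents (parent variable, time offset) and a prediction function, the unfolded Prediction Net is walked on demand via recursion, and repeated rounds turn per-round point predictions into a probability distribution.

```python
import random
from collections import Counter

def predict_value(net, known, var, t, cache):
    """Predict the value of `var` at time step `t` (one round).

    net:   {var: (parents, predict_fn)} where parents is a list of
           (parent_var, delta) pairs and predict_fn(inputs) returns a value
    known: {(var, t): value} for context data already measured by sensors
    """
    if (var, t) in cache:                 # vi,j already predicted this round
        return cache[(var, t)]
    if (var, t) in known:                 # vi,j known
        return known[(var, t)]
    parents, predict_fn = net[var]
    # Let the parent predictors predict their values first; this walks
    # exactly the relevant part of the unfolded Prediction Net.
    inputs = [predict_value(net, known, p, t - d, cache) for p, d in parents]
    value = predict_fn(inputs)            # predict vi,j with own method
    cache[(var, t)] = value
    return value

def predict_distribution(net, known, var, t, rounds=70):
    """Repeat the algorithm to capture probabilistic behavior and
    aggregate the round results into a probability distribution."""
    results = Counter(predict_value(net, known, var, t, {})
                      for _ in range(rounds))
    return {v: n / rounds for v, n in results.items()}
```

For a Markov-chain-style edge (the position predicted by the preceding position), predicting the position several steps ahead recurses back until a measured value is reached.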
4 Evaluation
A prototype of a generic prediction system according to the approach of Structured Context Prediction has been implemented for the Java Micro Edition™ and was applied to the two application scenarios motivated in Examples 1 and 2. In particular, the framework is used by the existing DEMAC middleware [4] for the prediction of service availabilities in order to enhance the distribution of mobile business processes. The following subsection presents the experiences which have
Fig. 6. The Prediction Net for Example 2 (SA = service availability, IR = integrated result of coprocessing)
been made with this second scenario and the developed prototype prediction system. The section concludes with a general conceptual evaluation and discussion.

4.1 Scenario-Based Evaluation
The first part of the evaluation is based on the prediction of service availabilities as motivated by Example 2 (cp. Section 1). As a first step, an application-specific prediction model is developed and configured for the prediction of service availabilities for a user's device which is temporarily connected to different ad-hoc networks. In consequence, the service availability is assumed to depend on the total number of devices within these networks as potential service providers and, similarly to Figure 3, on the time of day and the position of the consuming device (e.g. a printer service is available at the location of the user's company for the whole day and an ad-hoc file exchange service is temporarily offered by adjacent mobile devices). Thus, these four variables and the dependencies between them basically constitute the Prediction Net shown in Figure 6. The mapping of the original four variables of the example to multiple variables in the net results from the selected prediction methods and the use of coprocessing. The service availability (SA) at a specific point of time is predicted by using four methods in parallel. Similarly, the position at a specific point of time is predicted by using the time of day at this point of time as well as the position at the preceding point of time (Markov chain) and by extrapolating the position. As most of these variables are not numerical, the prediction model uses probability tables as the main prediction method of the reference configuration. In addition, more specialized methods are used (e.g. a method realizing a majority vote and a method for determining the time of day as a periodic variable). Developing an appropriate prediction model for the introduced scenario is not trivial because predictions about arbitrary services with different characteristics
have to be supported. Additionally, because of the resulting network load, service availabilities cannot be measured regularly, so it is e.g. not possible to predict the availability of a service with a Markov chain using its preceding availability. The example scenario consists of realistic historical data about the behavior of a user and their mobile device spanning an interval of seven days (Monday to Sunday). It contains the net size, the time of day, the position and the service availability as values of the corresponding variables at different points of time, representing context data measured by real sensors. For the practical experiment, two services with different behaviors have been chosen: a stationary print service is regularly available when the user is at work, and an ad-hoc file exchange service is offered spontaneously by a few mobile devices carried by other people in the direct vicinity of the user and is thus only available very infrequently. The quantitative evaluation covers accuracy and efficiency. High efficiency means that the ratio of resource consumption to the quality of the results is appropriate. The efficiency of the prediction system and of the created prediction model is determined by measuring the resource consumption and the accuracy of the prediction results. All results are based on predictions about the availability of a service as a boolean variable at a specific point of time. The evaluation is run on an average notebook (1.5 GHz Pentium M processor). If appropriate methods (such as those in the reference configuration) are used, the memory requirements are bounded and do not increase significantly because the instance data, i.e. the knowledge learned by the prediction methods, is saved instead of the measured raw context data. In most cases, the memory consumption is dominated by probability tables.
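Such a probability-table method can be sketched as follows (a minimal illustration with hypothetical names, not the prototype's Java ME code): its knowledge is simply a table of occurrence frequencies, and a prediction round samples a value with a probability proportional to its observed relative frequency, as required by the stochastic prediction rounds of Section 3.3.

```python
import random
from collections import defaultdict

class ProbabilityTable:
    """Stores occurrence frequencies of input/value combinations as
    its knowledge. The table needs at most one counter per possible
    combination, which is what bounds the instance data."""

    def __init__(self):
        self.freq = defaultdict(lambda: defaultdict(int))

    def learn(self, inputs, value):
        # Inductive online learning: count one observed combination.
        self.freq[tuple(inputs)][value] += 1

    def predict(self, inputs):
        row = self.freq.get(tuple(inputs))
        if not row:
            return None                       # no knowledge yet
        # Sample one value proportionally to its observed frequency.
        values, weights = zip(*row.items())
        return random.choices(values, weights=weights)[0]
```

For example, after repeatedly observing that a service is available for the input ("Monday", "work"), the table predicts availability for that input with high probability.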
Thus, the upper bound of memory required for the instance data depends on the number of variables connected with the method and the number of their possible values in the Prediction Net. The maximum amount of memory required for the instance data in the example scenario is about 20 KB, and the processing time for learning is insignificant (i.e. considerably less than 1% CPU load). The processing time of a prediction depends on the number of variables in the Prediction Net, the number of prediction rounds and the number of time steps in the time interval which is taken into account for the prediction. Theoretical considerations show that – regarding these dependencies – the time complexity is linear. This result is also confirmed by practical experiments regarding the number of time steps in the time interval (cp. Figure 7) and, similarly, the number of prediction rounds (time consumption ranging from 13 ms for 50 rounds to 117 ms for 500 rounds if a prediction about service availability is requested 60 minutes ahead). Considering the current processing power of smaller mobile devices (e.g. smartphones), the results indicate that even such relatively complex predictions take less than one second and that the resource consumption is thus well suited even for less powerful mobile devices. The analysis of accuracy begins with the "empty" prediction system which has no knowledge learned at runtime yet. In the course of time, the system learns from the current values of the historical data. Predictions are executed concurrently. The achieved accuracy is determined by comparing the predicted
Fig. 7. Prediction of service availability at different future points of time with 70 prediction rounds, every value averaged over 176 predictions
Fig. 8. Correct predictions per day about the availability of a print service (solid line) and an ad-hoc file exchange service (dashed line)
probability of a service's availability with the actual availability (as a boolean value) described in the historical source data. A prediction is considered correct if the prediction result states that service availability is probable (resp. improbable) and the service is actually available (resp. unavailable) in the future. Figure 8 shows the accuracy of predictions about the availability of the two service types on different days. Because the (simpler) ad-hoc file exchange service is often unavailable, this regularity can be learned quickly, and predictions about its availability already start with relatively good results, i.e. predicting that the service is not available is correct in most cases. Furthermore, over the following days, the system learns to distinguish when the service is available and the accuracy slightly increases. The prediction results about the availability of the print service also improve very quickly. Thus, on Tuesday the system is already able to predict that the print service is available when the user is at work. However, the predictions about the ad-hoc file exchange service are still not completely satisfactory, i.e. a trivial prediction approach always predicting that the service will be unavailable would not be significantly worse. Therefore, an enhanced solution could exploit the full potential of coprocessing by automatically preferring the methods with the smallest uncertainty arising from the prediction and thus improve the adaptation to the individual user. In the case of the ad-hoc file exchange service, e.g., the net size should play a more important role than the time of day and the position, which currently both rather disturb predictions.

4.2 Conceptual Evaluation and Discussion
The fundamental principles of the approach of Structured Context Prediction are appropriate to fulfill most of the requirements identified in Section 1. First, they establish genericness (requirements 1, 2, 3). They enable a configuration of the prediction system which meets application- and variable-dependent demands (e.g. originating from different scale types and dependency types) because methods can be chosen flexibly according to the domain-specific knowledge. Additionally, variables as attributes of entities constitute a generic metamodel for capturing the application domain. Thus, many different applications with diverse demands
for their domains and possible contexts are supported (requirements 1, 2). An adaptation to the user takes place by adaptive online learning (requirement 3). The effort for application developers to enable context predictions will be decreased in many cases if an implementation of a prediction system according to the presented approach is used as reusable system support (requirement 6), because the system coordinates the methods and offers a reference configuration of already implemented methods which can be extended in the future. When all required methods are implemented, the application developers can handle them as black boxes and only have to configure them, combine them and define the data dependencies between them in an abstract way by using the graphical representation of the Prediction Net and the XML representation of the prediction model. Compared to Mayrhofer's and Sigg's approaches, the effort for application developers is still high because domain-specific knowledge is used and has to be incorporated by the application developers. However, this compromise facilitates the prediction task at runtime and enables high accuracy and efficiency without limiting the approach of Structured Context Prediction to a special application domain, i.e. it keeps the approach generic. The possibility to select appropriate methods for the considered application domain is a positive aspect not only with regard to genericness, but also with regard to accuracy and efficiency (requirements 4, 5). For each variable, the best method to handle the characteristics and regularities determining the value of this variable can be selected. Efficiency can in particular be enhanced if no dependencies are expected between a set of variables. For example, the availability of a service does not necessarily depend on the availability of another service (cp. Example 2). Some dependencies exist, but are known to be unstable, i.e.
to change from time to time due to unobserved, external influences. Such dependencies are candidates to be ignored so that accuracy and efficiency can be improved. Additionally, it makes sense not to offer and prepare predictions about variables which will never be used. For example, it is unnecessary to enable predictions about the future position of a device using the information whether a service is available, unless this is needed by the application. Finally, the development of an appropriate prediction model can ensure scalability and applicability in the context of heterogeneous devices. For example, the application developers have the possibility to select methods with only a small demand for resources in case the application is targeted at resource-restricted mobile devices.
5 Conclusion
As a further step towards pervasive environments, the approach of Structured Context Prediction realizes a prediction system as a framework with high genericness and, at the same time, potential for high accuracy and efficiency. Thus, an approach with a new combination of characteristics in comparison to the approaches analyzed in Section 2 is established (cp. Figure 1). In contrast to the analyzed previous approaches, which are often restricted to special applications, e.g. to those which only use high-level context, the composability of prediction
methods and the integration of domain-specific knowledge as proposed here enable support for a wide range of applications. However, there are still some open research tasks, especially with respect to the usability of the developed framework. In particular, a reduction of the effort for application developers (e.g. by tools supporting the development of prediction models) constitutes a major challenge to further facilitate the prediction of future context by context-aware applications.
References

1. Satyanarayanan, M.: Pervasive Computing: Vision and Challenges. IEEE Personal Communications 8(4), 10–17 (2001)
2. Mayrhofer, R.: Context Prediction based on Context Histories: Expected Benefits, Issues and Current State-of-the-Art. In: Proc. of ECHISE 2005, pp. 31–36 (2005)
3. Salber, D., Dey, A.K., Abowd, G.D.: The Context Toolkit: Aiding the Development of Context-Enabled Applications. In: Proc. of CHI 1999. ACM, New York (1999)
4. Zaplata, S., Kunze, C.P., Lamersdorf, W.: Context-based Cooperation in Mobile Business Environments: Managing the Distributed Execution of Mobile Processes. BISE 2009(4) (2009)
5. Chen, G., Kotz, D.: A Survey of Context-Aware Mobile Computing Research. Technical Report TR2000-381, Dartmouth College (2000)
6. Jameson, A., Wittig, F.: Leveraging Data About Users in General in the Learning of Individual User Models. In: Proc. of the 17th Int. Joint Conf. on Artificial Intelligence, pp. 1185–1192. Morgan Kaufmann, San Francisco (2001)
7. Sigg, S.: Development of a Novel Context Prediction Algorithm and Analysis of Context Prediction Schemes. PhD thesis, University of Kassel (2008)
8. Satyanarayanan, M.: Fundamental Challenges in Mobile Computing. In: Proceedings of PODC 1996, pp. 1–7. ACM, New York (1996)
9. Dey, A.K., Abowd, G.D.: Towards a Better Understanding of Context and Context-Awareness. Technical report, Georgia Institute of Technology (1999)
10. Symeonidis, A.L., Mitkas, P.A.: Agent Intelligence through Data Mining. Springer, Heidelberg (2005)
11. Mayrhofer, R.: An Architecture for Context Prediction. PhD thesis, Johannes Kepler University Linz (2004)
12. Petzold, J.: State Predictors for Context Prediction in Ubiquitous Systems. PhD thesis, University of Augsburg (2005) (in German)
13. Petzold, J., et al.: Hybrid Predictors for Next Location Prediction. In: Ma, J., Jin, H., Yang, L.T., Tsai, J.J.-P. (eds.) UIC 2006. LNCS, vol. 4159, pp. 125–134. Springer, Heidelberg (2006)
14. Hilario, M.: An Overview of Strategies for Neurosymbolic Integration. In: Connectionist-Symbolic Integration, pp. 13–35. Lawrence Erlbaum Assoc., Mahwah (1995)
15. Dietterich, T.G.: Ensemble Methods in Machine Learning. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 1–15. Springer, Heidelberg (2000)
16. Borgelt, C., Kruse, R.: Graphical Models – Methods for Data Analysis and Mining. John Wiley & Sons, Chichester (2002)
17. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco (1988)
18. Dagum, P., Galper, A., Horvitz, E.: Dynamic Network Models for Forecasting. In: Proceedings of UAI, pp. 41–48. Morgan Kaufmann, San Francisco (1992)
19. Jensen, F.V.: Bayesian Networks and Decision Graphs. Springer, Heidelberg (2001)
Experiments in Model Driven Composition of User Interfaces

Audrey Occello, Cedric Joffroy, and Anne-Marie Dery-Pinna

Université de Nice Sophia-Antipolis, Polytech'Nice Sophia, 930, Route des Colles, B.P. 145, F-06903 Sophia Antipolis cedex, France
Abstract. Reusing and composing pieces of software is common practice in software engineering. However, reusing the user interfaces that come with software systems is still ongoing work. The Alias framework helps developers to reuse and compose user interfaces according to the way they compose new systems from smaller units, as a means of speeding up the design process. In this paper we describe how we rely on Model Driven Engineering to operationalize our composition process.

Keywords: User interface composition, metamodeling, transformations.
1 Introduction
Software Composition is about the reuse of software artifacts in order to construct larger systems from smaller ones, as with the Service Oriented Architecture (SOA) [1] or Component-Based Software Engineering (CBSE) [2] paradigms. Techniques evolve to ever improve the reusability, customizability and maintainability of such composed systems. However, composition is often focused on the functional part of a system and not on its interactive part. Hence, the User Interface (UI) often has to be built from scratch each time a new system is composed, for example from a set of services or components. Based on the hypothesis that each service comes along with a UI, we propose to exploit the relationships between them to deduce the UI of the application resulting from a composition of services. The Alias framework¹ builds a UI for an application A as a function of: 1) the way services are composed to form A; and 2) the interactions between such services and their corresponding UIs. The originality of Alias is to reason at the Abstract User Interface (AUI) level, which simplifies the composition algorithm and makes it reusable and oblivious to heterogeneity:

– algorithm simplification: the AUI level enables us to focus on the composition rules without burdening ourselves with widget type and style,
– reuse: the same algorithm can be used to deduce the composition of Swing UIs, Flex UIs, Ajax UIs and so on,
– obliviousness to heterogeneity: we can deduce the composition of UIs written in different languages, as is done in the plasticity research area [3].

¹ http://users.polytech.unice.fr/~joffroy/ALIAS/
F. Eliassen and R. Kapitza (Eds.): DAIS 2010, LNCS 6115, pp. 98–111, 2010. © IFIP International Federation for Information Processing 2010
Given these facts, we believe in the legitimacy of adopting a Model Driven Engineering (MDE) [4] approach. In this paper, we describe how we experiment with MDE to operationalize our composition process. As there are various ways of implementing MDE, we also discuss how we deal with modeling and tool choices and the questions our experiment raises. The remainder of this paper is organized as follows. Section 2 illustrates the Alias composition process using a tour operator scenario. Section 3 introduces the metamodels on which the framework is based to describe user interfaces and services as well as their compositions. Section 4 gives an overview of the transformations that operationalize the Alias composition process. Section 5 compares Alias with related work. Section 6 concludes.
2 Alias Composition Process Overview
The strength of the Alias approach is that it spares the developer from implementing user interfaces from scratch. Instead, the developer focuses on composing business components or services as usual. Then, the UI of the resulting composite application made of services is deduced from: 1) the interaction links between each user interface and its corresponding service, and 2) the way services are composed to form the composite application. The only requirement for using the Alias framework is to respect a separation of concerns between the different elements that compose an application: the UI part and the service part need to be clearly identifiable, as well as the interaction links between the two parts (triggering a given operation on event handling is considered an interaction link between the UI and the service). Section 2.1 presents a scenario to illustrate the proposal. Section 2.2 gives some details about the composition engine and its role in the overall process.

2.1 Tour Operator Scenario
We illustrate the Alias approach with a Hotel Booking and Flight Reservation composition scenario. We want to reuse these two services and their corresponding user interfaces in order to build a new service that enables users to book a hotel and a flight simultaneously, as would happen in a tour operator company. With such a service, the user would be able to plan a trip faster. To illustrate our proposal, we only focus on the search part of these services. Extensions of the example can be found on the Alias website.

The Hotel Booking service. This service handles two operations: (i) getAvailableHotels returns a list of available hotels for a given quadruplet (country, city, check-in and check-out dates) and (ii) bookARoom books a room in a hotel. To use this service, there are different user interfaces (such as the one proposed by http://www.travelocity.com/Hotels). We only focus on the UI that enables the user to check the availability of hotels. To view available hotels, a user has to follow these steps: 1) choose a country, a city and check-in/check-out dates; 2) search for available hotels (getAvailableHotels operation call).
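The Hotel Booking service and its interaction link can be sketched as follows. The Python signatures are hypothetical (the paper does not fix an implementation language); only the operation names come from the scenario. Triggering the operation inside a UI event handler is precisely what Alias treats as an interaction link between the UI part and the service part.

```python
class HotelBookingService:
    """Stubbed sketch of the scenario's Hotel Booking service."""

    def get_available_hotels(self, country, city, check_in, check_out):
        """Return the list of available hotels for the given quadruplet."""
        return []        # stub: a real service would query its back end

    def book_a_room(self, hotel, room):
        """Book a room in a hotel."""
        return True      # stub

def on_search_clicked(service, form):
    """UI event handler for the hotel search form: the operation call
    here is the interaction link that Alias later exploits."""
    return service.get_available_hotels(
        form["country"], form["city"], form["check_in"], form["check_out"])
```

The Flight Reservation service below has the same shape, which is what later allows the shared input fields (country, city, dates) to be merged in the composite UI.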
The Flight Reservation service. This service handles two operations: (i) getAvailableFlights returns a list of flights and (ii) reserveAFlight makes the reservation of the flight. To use this service, there are different user interfaces (such as the one proposed by http://www.airfrance.us/). We only focus on the UI that checks available flights. To view available flights, a user has to follow these steps: 1) choose a country and a city to select a departure and a destination airport, and a departure and a return date; 2) search for available flights (getAvailableFlights operation call).

2.2 Alias Framework Steps and Role of the Composition Engine

The composition engine aims at deducing which elements of the existing UIs to keep, which ones to leave out, and what to do with duplicated elements in the UI corresponding to the service composition. The expression of the resulting UI structure as a composition of the existing UIs is not written by the developer. The composition rules that give the resulting UI structure are generated by the engine as a function of the composition inputs (the interaction links between each user interface and its corresponding service, and the way services are composed). Alias uses first-order logic to generate such composition rules. The composition engine is described as PROLOG predicates, the composition inputs are generated as PROLOG facts, and the composition rules to generate the resulting UI are obtained by inference. We do not detail the composition engine further as it is not in the scope of this paper. Using the Alias framework can be divided into five steps and implies the manipulation of services, UIs and compositions at two levels of representation: a concrete level that corresponds to the Platform Specific Model (PSM) of the MDA layer modeling stack [5] and an abstract level that corresponds to the Platform Independent Model (PIM).

1. The developer selects the services to compose (the Hotel Booking service and the Flight Reservation service in the tour operator scenario);
2. The framework collects information about each service and their UIs to create abstract representations;
3. The developer makes explicit the composition links between the different services of the composite application and the interaction links between services and UIs at the abstract representation level;
4. The composition engine computes the resulting user interface as a set of elements reused from the abstract representations of the existing UIs;
5. The information of the abstract representation of the resulting UI is used to generate a first sketch of the Concrete User Interface (CUI) at the platform level.

Steps 1, 2 and 5 imply being capable of obtaining and interpreting different PSMs corresponding to various UI description languages (Flex, XUL, Swing, etc.), service description languages (SCA, OSGi, WS-*, etc.) and composition formalisms (BPEL orchestration, component assembly, etc.). Steps 2, 3 and 4 are necessary to perform the composition in a generic way: UIs, services and the way they are composed need to be made explicit in a pivotal formalism, the PIM, which is presented in Section 3. An overview of the model transformations that operationalize this process is described in Section 4.
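The kind of deduction step 4 performs can be illustrated with a toy fragment. The actual engine is implemented as PROLOG predicates; the Python below is only an illustration with a hypothetical data model: given interaction links between UI elements and service operations, and the set of operations used by the composite service, it keeps the elements of used operations and merges duplicates that feed equivalent inputs.

```python
def deduce_ui(links, used_operations):
    """links: list of (ui_element, operation, input_role) triples.
    Returns the kept UI elements, keeping only one element per
    input role and dropping elements of unused operations."""
    kept, seen_roles = [], set()
    for element, operation, role in links:
        if operation not in used_operations:
            continue          # leave out elements of unused operations
        if role in seen_roles:
            continue          # duplicated input -> keep a single element
        seen_roles.add(role)
        kept.append(element)
    return kept

# Tour-operator scenario: both search forms ask for a country, a city
# and dates, so the composite UI needs those input fields only once.
links = [
    ("hotel.country", "getAvailableHotels", "country"),
    ("hotel.city", "getAvailableHotels", "city"),
    ("hotel.dates", "getAvailableHotels", "dates"),
    ("flight.country", "getAvailableFlights", "country"),
    ("flight.city", "getAvailableFlights", "city"),
    ("flight.dates", "getAvailableFlights", "dates"),
    ("hotel.book", "bookARoom", "room"),
]
used = {"getAvailableHotels", "getAvailableFlights"}
```

Here `deduce_ui(links, used)` keeps one country field, one city field and one dates field, and drops the booking widget because `bookARoom` is not part of the composite search service.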
Experiments in Model Driven Composition of User Interfaces
3 Metamodels Involved in the UI Composition Process
In previous work [6], we defined three languages to describe UIs at different levels of abstraction: ALIAS-Behavior for modeling UI elements at a very high level, ALIAS-Structure for modeling more concrete aspects of the UI structure, and ALIAS-Layout for modeling the UI layout. The main goal of this set of languages was to compose heterogeneous UIs directly: we experimented with various composition algorithms. This first step led us to the conclusion that we do not need to take all these aspects into account in the composition rules. Composing the style or the layout is not pertinent when considering heterogeneous UIs: for the composite UI to look coherent, it is necessary to keep the style and the layout of only one of the UIs to be composed. Such information is reintroduced after the composition, during the transformation to the concrete UI level. Ongoing work is focused on ALIAS-Behavior, the pivotal formalism upon which the composition reasoning is done. This section presents how we metamodeled this pivotal formalism. In recent work, we moved from a pure UI composition to a UI composition deduced from service composition. As a consequence, we have to manage information not only about UIs but also about services. We could have distinguished the formalism describing UIs from the one describing services. However, the degree of abstraction that we chose allows for manipulating UIs and services in the same way (see section 3.1). This facilitates the reasoning about composition and simplifies the formalism. We applied separation of concerns and split our formalism design into two metamodels (PIMs), each one having the most suitable representation to achieve its goal easily. The first one deals with service and UI information extraction and exchange (section 3.1) and the second one deals with the composition itself (section 3.2).
As their goals differ, merging the two would have led us to privilege one representation in such a unified metamodel, making the other less efficient. Moreover, there is only a small subset of information in common between the two metamodels.

3.1 AliasExchange Metamodel
The AliasExchange metamodel represents both UIs and services. Information that needs to be reified for a UI concerns: the data inputs, independently of the widget chosen to retrieve this data (text fields, lists, trees, etc.); the data outputs, again independently of the widget chosen to display this data (labels, etc.); and action triggers (user interactions), independently of the widget chosen to trigger actions (buttons, menu items, etc.). Information that needs to be reified for a service essentially concerns the signature of the operations implemented by the service: input parameters, output parameters and the name of the operation. We can thus identify an isomorphism between the two sets of information, and that is why a unique representation for both UIs and services is possible: 1) UI data inputs correspond to service operation parameters, 2) service operation
A. Occello, C. Joffroy, and A.-M. Dery-Pinna
results correspond to UI data outputs, 3) interactions with the user are located in UI action elements and service operation calls. To sum up, the metamodel (Fig. 1) defines the Entity class, representing UIs and services, which are made of a set of AliasElement; and the Input, Output and Action classes, which inherit from the Element class describing UI elements and service operations through an id, a name and a type. Each Action element is associated with Input and Output elements in order to capture the parameters and return values of an operation and to associate user interactions with input and output widgets. In this metamodel, each entity is considered individually: we reify neither the interactions between the UI and the service nor the interactions between services.
Fig. 1. UML Class diagram of the AliasExchange Metamodel
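As a rough illustration (not the actual implementation), the classes of Fig. 1 can be sketched as Python dataclasses; the attribute and element names below are assumptions based on the text:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AliasElement:
    id: str
    name: str
    type: str
    semantic: str = ""   # keyword used for conflict resolution, e.g. "way-in"

@dataclass
class Input(AliasElement):
    pass

@dataclass
class Output(AliasElement):
    pass

@dataclass
class Action(AliasElement):
    # parameters and return values of an operation / input and output widgets
    inputs: List[Input] = field(default_factory=list)
    outputs: List[Output] = field(default_factory=list)

@dataclass
class Entity:   # a UI or a service, considered individually
    name: str
    elements: List[AliasElement] = field(default_factory=list)

# A fragment of the flight availability checking UI as an AliasExchange model
search = Action("a1", "searchFlights", "void",
                inputs=[Input("i1", "departureCity", "String"),
                        Input("i2", "destinationCity", "String")],
                outputs=[Output("o1", "availableFlights", "List")])
ui = Entity("FlightAvailabilityUI", [*search.inputs, *search.outputs, search])
```

Note that, as in the metamodel, the Entity holds its elements but no links to other entities: interactions between UIs and services are deferred to AliasCompose.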
The semantic attribute is a parameter of the composition algorithm used in case of conflicts. It helps in deciding whether to merge or group UI elements if they are equivalent or if they belong to the same family of information for the user (e.g., two UI elements that denote contact information). It also makes it possible to prevent the merging when UI elements share the same structure (name, type) but differ in their semantics (e.g., hotel check-in and flight departure are both dates, but the first one means “way-out” and the second one “way-in”). Currently, this attribute corresponds to some keywords (e.g., “way-in”, “way-out”, “contact information”) and is filled in manually in order to validate the composition algorithm and to compare different merging alternatives. Ultimately, we plan to decorate AliasExchange models with ontology annotations to enhance conflict detection. The challenge is then to decorate the models automatically. As a proof of concept, we plan to extract such information from Web Service ontology languages such as OWL-S (http://www.w3.org/Submission/OWL-S/). Figure 2 shows the AliasExchange model for the flight availability checking UI of the tour operator scenario. There are six inputs (names of departure/destination country and city, check-in and check-out dates), one output (a list of available flights) and one action (users search for flights).
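The role of the semantic attribute in conflict resolution can be sketched as follows; the decision logic and the keyword values are assumptions drawn from the examples above, not the actual algorithm:

```python
def merge_decision(e1, e2):
    """Decide what to do with two conflicting UI elements, using their
    structure (name, type) plus the semantic keyword (assumed logic)."""
    same_structure = (e1["name"], e1["type"]) == (e2["name"], e2["type"])
    same_semantic = e1["semantic"] == e2["semantic"]
    if same_structure and same_semantic:
        return "merge"       # equivalent elements: keep only one
    if same_semantic:
        return "group"       # same family of information for the user
    return "keep-both"       # same structure but different meaning

# Hotel check-in ("way-out") vs flight departure ("way-in"): both dates,
# but their semantics differ, so merging is prevented.
checkin   = {"name": "date", "type": "Date", "semantic": "way-out"}
departure = {"name": "date", "type": "Date", "semantic": "way-in"}
print(merge_decision(checkin, departure))  # -> keep-both
```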
Fig. 2. Model for the user interface of the Flight Reservation service
Figure 3 represents the Flight Reservation service and highlights the isomorphism between the two sets of information: we recognize the data shared with the UI component for availability checking as well as the data relative to the other UI, not depicted in this paper. Inputs are operation parameters, outputs are operation results and actions are operations such as getAvailableFlights. Types are not shown to avoid overloading the figure.
Fig. 3. Model for the Flight Reservation service
This metamodel eases the exchange of service and UI descriptions between developers, who compose services, and UI designers, who create the concrete UI as a function of the abstract UI deduced from such a composition.
3.2 AliasCompose Metamodel
The AliasCompose metamodel represents interactions involved in compositions between a service and its UIs and between several services. To express the interactions, we use two different binding types: data links and event links. Event links represent control flows (between two operations or between an operation and the UI element that triggers the call). Data links represent dataflows (between UI elements and operations or between operations). The metamodel (Fig. 4) looks similar to component metamodels such as the UML 2.0 component diagram [7] or SCA [8] because UIs and services are represented as components with ports. However, the granularity of our ports is finer: at the data or operation level, not at the programming interface level. We adopt the component metaphor as it has become a de facto standard to express bindings. AliasCompose shares some information about each individual UI and service with AliasExchange. However, it keeps neither the name and type of inputs/outputs nor the relationships between actions and their relative input and output elements, as it does not need a precise definition of each individual service/UI to express the interaction and composition links.
Fig. 4. UML Class diagram of the AliasCompose Metamodel
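The concepts above can be sketched as follows; class, port and component names are illustrative assumptions, the real metamodel being the one of Fig. 4:

```python
class Component:
    """A UI or a service, reduced to named ports (no types, no structure)."""
    def __init__(self, name, inputs=(), outputs=(), triggers=()):
        self.name = name
        self.inputs = list(inputs)
        self.outputs = list(outputs)
        self.triggers = list(triggers)

class Binding:
    """A link between two ports: 'data' for dataflow, 'event' for control flow."""
    def __init__(self, kind, source, target):
        assert kind in ("data", "event")
        self.kind, self.source, self.target = kind, source, target

ui = Component("FlightAvailabilityUI",
               inputs=["departureCity", "destinationCity"],
               outputs=["availableFlights"],
               triggers=["search"])
service = Component("FlightReservation",
                    inputs=["departureCity", "destinationCity"],
                    outputs=["flights"],
                    triggers=["getAvailableFlights"])

# First refinement step: interaction links between the UI and its service
bindings = [
    Binding("data", ("FlightAvailabilityUI", "departureCity"),
                    ("FlightReservation", "departureCity")),
    Binding("event", ("FlightAvailabilityUI", "search"),
                     ("FlightReservation", "getAvailableFlights")),
]
```

The port granularity illustrates the difference from SCA: each data item and each operation is its own port, rather than a whole programming interface.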
From AliasExchange to AliasCompose: The AliasCompose models for the Flight Reservation service and the flight availability checking UI are obtained from the AliasExchange models illustrated in section 3.1. The service and the UI are components represented as boxes; inputs and outputs are represented at the left and right sides of the box; and triggers are represented on the top side of the box. The AliasCompose model for the service is depicted in the lower part of Figure 5 and the AliasCompose model for its UI in the upper part of this figure. First refinement step: To express the interaction links between the service and the UI, the two corresponding models need to be refined into a third one where bindings (see the Binding class in figure 4) between component ports are added. Such bindings are illustrated in Figure 5 as dashed lines. The overall figure depicts the AliasCompose model corresponding to this refinement.
Fig. 5. AliasCompose model showing the interactions between the UI and the service
Second refinement step: The Composite class (see figure 4) is used to reify the result of the service composition as a new component. It expresses which ports are kept and which ones are dropped, and makes it possible to deduce which UI elements may be merged in the resulting UI. For example, the lower part of figure 6 shows the AliasCompose model resulting from the refinement of the two AliasCompose models corresponding to the Hotel Booking service and the Flight Reservation service, with city input merging. Results of the composition engine: The upper part of Figure 6 shows the AliasCompose model for the UI computed by the composition engine. The bindings between the upper part and the lower part of the figure describe the interactions between the resulting UI and the composition of services. Exploitation of the resulting UI abstract description: When the composition engine has computed the abstract description of the UI for a given composition of services, the resulting AliasCompose model is translated back to AliasExchange models in order to generate code for specific platforms. At this step, details concerning UI structure and layout choices need to be reintroduced. Hence, AliasExchange models which correspond to UIs are annotated with extra information to describe UI elements more precisely, with widget-specific characteristics (lists or check-boxes, buttons or menu items, . . . ) and the position of UI elements. For this, we plan to reuse our work around ALIAS-Structure and ALIAS-Layout.
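A minimal sketch of this second refinement step, with invented names and port identifiers, might look like:

```python
class Composite:
    """Reifies a service composition: records which ports are kept,
    which are dropped, and which are merged into a single port."""
    def __init__(self, name):
        self.name = name
        self.kept, self.dropped, self.merged = [], [], []

    def merge_ports(self, port_a, port_b, merged_name):
        """Unify two ports (e.g. the two 'city' inputs) into one."""
        self.merged.append((port_a, port_b, merged_name))

    def keep(self, port):
        self.kept.append(port)

    def drop(self, port):
        self.dropped.append(port)

# Tour operator composite: the city inputs of the two services are merged,
# so the corresponding UI elements may be merged in the resulting UI;
# elements bound to dropped ports disappear from it.
trip = Composite("TourOperator")
trip.merge_ports("Hotel/city", "Flight/destinationCity", "city")
trip.keep("Hotel/checkin")
trip.keep("Flight/departureCity")
```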
Fig. 6. AliasCompose model for the composition of the Hotel Booking service and the Flight Reservation service and the deduced UI
Annotations would be added using the decorator design pattern. This part of the work is still in progress; we hope to be able to transform a decorated AliasExchange model into an existing model of abstract UI dedicated to plasticity (such as Teresa [9], for example) in order to obtain an ergonomic final UI. We do not discuss this point further in this paper.
4 Overview of the Transformations Involved in the Composition Process
This section describes the end-to-end transformation chain that operationalizes the composition process steps described in section 2.2. An overview of the transformation chain is depicted in Figures 7 and 8. Solid arrows represent automated transformations and dashed arrows represent hand-crafted transformations.
Fig. 7. Abstraction and composition related transformations
– Transformations 1 and 2 correspond to “step 2” and consist in the extraction of information about each existing concrete service and UI and its expression in the AliasExchange formalism.
– Transformation 3 is a pre-processing task of “step 3” to obtain UIs and services in the AliasCompose formalism. Transformation 4 corresponds to “step 3”, in which the developer specifies: (i) the interaction links that exist between existing concrete UIs and services in the AliasCompose formalism (first refinement step) and (ii) the composition links between services in the AliasCompose formalism (second refinement step).
– Transformations 5, 5’, 6, 7 and 7’ are related to “step 4” (the use of the composition engine) and consist in a technological space shift. They correspond respectively to: the generation of Prolog facts from Alias models, the internal rule inference, and the translation of Prolog results back to Alias models specifying the structure of the resulting UI (AliasExchange model) and its links to the composition of services (AliasCompose).
– Transformations 8 and 8’ correspond to “step 5” and give feedback to application developers and UI designers on the UI composition. These transformations are very important because one can see whether the result of the composition is correct or not: it is easier to deal with a concrete UI than with Prolog facts or with an abstract UI.
Each transformation has been characterized based on the classifications of Czarnecki [10] and of Mens [11]. The figure outlines the fact that our composition
Fig. 8. Concretization related transformations
process covers a large spectrum of transformations: 1) endogenous and exogenous, 2) vertical and horizontal, 3) reverse engineering (abstraction from PSM to PIM), synthesis (concretization from PIM to PSM), and migration, as well as internal refinement. The diversity of transformations has led us to experiment with different tools and mechanisms. We thus use pure MDE tools such as ATL [12] and Acceleo2 as well as ad-hoc techniques such as implementing visitor design patterns and conditional transformations [13] (which are based on a description of a model as Prolog facts: with some transformation rules we can describe how we would like to transform a source model into another one or into source code). For the time being, we are not sure which combination of tools fits best. Some questions are still open, such as:
– How to make a transformation bidirectional? We would like to use only one implementation in order to automate transformations 4 and 8’, because the latter is really the inverse of the former concerning the first refinement.
– Which technique better deals with runtime and instance-level transformation? We plan to use Alias at runtime in order to adapt the UI as a function of service discovery and disappearance in the context of ubiquitous computing. Transformations would then occur at runtime and data transformation would be at an instance level, not at a class/type level.
Lastly, limits of the current transformation chain essentially concern the reverse engineering of UIs and services. Transformations 1 and 2 are implemented as a visitor. However, this approach is only possible if we work on source code. We plan to try out an extraction technique based on language reflection such as the one proposed by Bézivin et al. [14].
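A toy version of such a visitor-based extraction might look as follows; the widget classes are illustrative stand-ins for a real toolkit such as Swing, not the actual implementation:

```python
class Widget:
    def accept(self, visitor):
        raise NotImplementedError

class TextField(Widget):
    def __init__(self, name):
        self.name = name
    def accept(self, v):
        v.visit_text_field(self)

class Button(Widget):
    def __init__(self, name):
        self.name = name
    def accept(self, v):
        v.visit_button(self)

class Panel(Widget):
    def __init__(self, children):
        self.children = children
    def accept(self, v):
        for c in self.children:
            c.accept(v)

class AliasExtractor:
    """Collects inputs and action triggers, independently of the
    concrete widgets, as required by the AliasExchange formalism."""
    def __init__(self):
        self.inputs, self.actions = [], []
    def visit_text_field(self, w):
        self.inputs.append(w.name)
    def visit_button(self, w):
        self.actions.append(w.name)

ui = Panel([TextField("departureCity"), TextField("destinationCity"),
            Button("searchFlights")])
extractor = AliasExtractor()
ui.accept(extractor)
print(extractor.inputs)   # -> ['departureCity', 'destinationCity']
print(extractor.actions)  # -> ['searchFlights']
```

As the paper notes, this only works when the source code (or an object tree built from it) is available, which motivates the reflection-based alternative.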
Transformation 4 is handwritten because it is too complicated to extract interaction links using a visitor or transformation rules in the case of the first refinement, but it can be automated easily for the second refinement (composition links). The automation of transformation 4 is not essential in a design approach but is crucial if we want to use Alias at runtime. We plan to test the extraction technique based on language reflection also for the automated discovery of composition links, by exploiting introspection over composition formalisms such as component assemblies [8], [15], [16] or service orchestrations [17], [18].
2 http://www.acceleo.org
5 Related Work
Models have long been used in Human-Computer Interaction (HCI) to make knowledge explicit. Efforts were rapidly put on UI code generation to make explicit and reuse the know-how in HCI. Nowadays, models and transformations have been rediscovered under the umbrella of Model Driven Engineering (MDE) to tackle problems such as UI composition or UI plasticity (adaptation of the UI to the context of use while maintaining ergonomic properties [19]). The Cameleon Reference Framework (CRF) [3] defines four levels of abstraction of UIs: (i) Task and concept, defining the user and system tasks and the concepts of a specific domain; (ii) Abstract User Interface (AUI), describing the structure of the user interface without any specific widget; (iii) Concrete User Interface (CUI), describing the widgets of the user interface and specifying elements from the AUI; and finally (iv) Final User Interface (FUI), implementing the CUI in a specific language. The first two levels are PIMs; the last two levels are PSMs (from a language point of view or from an operating system point of view). The framework also proposes transformations to shift from one level to the next. The metamodels described in this article are at the AUI level: AliasExchange is a subset of an AUI, and AliasCompose reifies additional information about composition/interaction links. Many approaches adopt an MDE approach based on this four-level model abstraction of UIs. For example, [20] or [21] use the AUI or CUI to compose user interfaces. In the end, they use model transformation to generate a FUI usable by an end-user. In such work, composition is addressed but mostly from an ergonomic and usability point of view: the composition process is based on the structural aspects of user interfaces located at the Abstract/Concrete User Interface (AUI/CUI) or task level and does not exploit information related to the functional part as Alias does.
Servface [22], a European project, decorates service descriptions with user interface annotations. These annotations allow for the generation of a high-quality UI to interact with the annotated services. The generation process of a FUI is based on refinement through the different models presented in [3]. Servface and Alias share the same goal: building a user interface for a composition of services. However, Servface composition is implemented by using a task tree description (service operations are bound to system tasks) and annotations, whereas Alias composition extracts the task sequences in UI interactions from the service workflow instead of duplicating such information. Work on planning [23] proposes an approach to compose interactive services (i.e., functional core and UI) from user needs. For this, the framework asks users about what they want in natural language. Then, an incomplete task model is built and transformed into a planning problem. After that, the results of the planning problem are translated back into task models which are refined successively into an AUI, a CUI and then a FUI. Last but not least, most of these approaches work at one CRF level and then define successive transformations to obtain a FUI. Hence, they follow a top-down approach and mostly use model-to-code transformation. In contrast,
the Alias composition process adopts a top-down approach as well as a bottom-up one, as existing artifacts are extracted from code. Alias makes greater use of transformation techniques, as mentioned in section 4. However, the work on planning and Alias have both demonstrated the power of model transformation for bridging domains: the first one between UI composition and planning, and the second one between UI composition and predicate logic.
6 Conclusion
Alias is a logical approach for composing user interfaces at the Abstract User Interface (AUI) level. Its originality comes from the fact that the composition is deduced from the way services are composed. This paper has shown the pertinence of adopting an MDE [4] approach with regard to our needs:
– expressing the composition algorithm in the formalism that fits best,
– reusing the composition algorithm for different concrete platforms (service-based platforms such as OSGi, web services, SCA or component-based platforms such as Fractal, OpenCCM, Sofa and so on),
– handling the heterogeneity of UI description languages.
We have shown how we used MDE to operationalize the Alias composition process. We definitely think that MDE brings a lot of benefits to the UI composition research area. This approach enables the isolation of the composition algorithm: changing the formalism will not impact the whole framework but only the transformations directly related to this aspect. In the same way, using reverse engineering prevents the designer from focusing on platform diversity and avoids the combination issues that would arise if we translated the concrete services and UIs directly into the composition algorithm in PROLOG. In addition, the use of MDE gives a lot of opportunities to the Alias framework. Considering that work around plasticity in the Human-Computer Interaction (HCI) community is based on models at different levels, our ultimate goal is to transform Alias models into plasticity models to obtain final user interfaces with ergonomic properties.
Acknowledgements. We thank the DGE M-Pub 08 2 93 0702 project for its funding.
References
1. Natis, Y.V.: Service-Oriented Architecture Scenario. Gartner, Inc. (2003)
2. Heineman, G., Councill, W. (eds.): Component-Based Software Engineering: Putting the Pieces Together. Addison-Wesley, Reading (2001)
3. Calvary, G., Coutaz, J., Thevenin, D., Limbourg, Q., Bouillon, L., Vanderdonckt, J.: A unifying reference framework for multi-target user interfaces. Interacting with Computers 15(3), 289–308 (2003)
4. Schmidt, D.C.: Model-Driven Engineering. IEEE Computer 39, 25–32 (2006)
5. OMG: Model Driven Architecture. OMG Document ormsc/2001-07-01 (2001)
6. Pinna-Déry, A.M., Joffroy, C., Renevier, P., Riveill, M., Vergoni, C.: ALIAS: A Set of Abstract Languages for User Interface Assembly. In: SEA 2008, Orlando, Florida, USA, IASTED, pp. 77–82. ACTA Press (2008)
7. The Object Management Group: Unified Modeling Language Specification 2. OMG Document formal/2009-02-02 (2009)
8. Marino, J., Rowley, M.: Understanding SCA (Service Component Architecture). Addison-Wesley Professional (2009)
9. Mori, G., Paternò, F., Santoro, C.: Design and development of multidevice user interfaces through multiple logical descriptions. IEEE Transactions on Software Engineering 30, 507–520 (2004)
10. Czarnecki, K., Helsen, S.: Classification of model transformation approaches. In: OOPSLA 2003 Workshop on Generative Techniques in the Context of Model-Driven Architecture (2003)
11. Mens, T., Van Gorp, P.: Applying a model transformation taxonomy to graph transformation technology. Electronic Notes in Theoretical Computer Science 152, 143–159 (2006)
12. Jouault, F., Kurtev, I.: Transforming models with ATL. In: Bruel, J.-M. (ed.) MoDELS 2005. LNCS, vol. 3844, p. 128. Springer, Heidelberg (2006)
13. Kniesel, G., Koch, H.: Program-independent composition of conditional transformations. Technical Report IAI-TR-03-1, ISSN 0944-8535, CS Dept. III, University of Bonn, Germany (2003) (updated February 2004)
14. Bézivin, J., Chevrel, R., Brunelière, H., Jossic, A., Jouault, F., Piers, W.: ModelExtractor: an automatic parametric model extractor. In: International Workshop on Object-Oriented Reengineering (WOOR) at ECOOP 2006, Nantes, France (2006)
15. The Object Management Group: CORBA Component Model Specification, 4.0 edition. OMG Document formal/2006-04-01 (2006)
16. Bruneton, E., Coupaye, T., Leclercq, M., Quéma, V., Stefani, J.B.: The Fractal component model and its support in Java: Experiences with auto-adaptive and reconfigurable systems. Softw. Pract. Exper. 36, 1257–1284 (2006)
17. Peltz, C.: Web services orchestration and choreography. Computer 36, 46–52 (2003)
18. Khalaf, R., Mukhi, N., Weerawarana, S.: Service-oriented composition in BPEL4WS. In: WWW (Alternate Paper Tracks) (2003)
19. Scapin, D., Bastien, J.: Ergonomic criteria for evaluating the ergonomic quality of interactive systems. Behaviour & Information Technology 16, 220–231 (1997)
20. Lepreux, S., Hariri, A., Rouillard, J., Tabary, D., Tarby, J., Kolski, C.: Towards Multimodal User Interfaces Composition Based on UsiXML and MBD Principles. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4552, p. 134. Springer, Heidelberg (2007)
21. Pinna-Déry, A.M., Fierstone, J.: Component model and programming: a first step to manage Human Computer Interaction Adaptation. In: Chittaro, L. (ed.) Mobile HCI 2003. LNCS, vol. 2795, pp. 456–460. Springer, Heidelberg (2003)
22. Servface Project: Service Annotation for User Interface Composition (7th Framework European Programme project) (2008), http://www.servface.org
23. Gabillon, Y., Calvary, G., Fiorino, H.: Composing interactive systems by planning. In: UbiMob 2008, Saint Malo, France, pp. 37–40 (2008)
Service Discovery in Ubiquitous Feedback Control Loops

Daniel Romero, Romain Rouvoy, Lionel Seinturier, and Pierre Carton

INRIA Lille-Nord Europe, ADAM Project-team, University of Lille 1, LIFL CNRS UMR 8022, 59650 Villeneuve d’Ascq, France
[email protected]
Abstract. Nowadays, context-aware applications can discover and interact with services in ubiquitous environments in order to customize their behavior. In general, these providers use diverse discovery and interaction protocols. Furthermore, they can join and leave the environment at any time, making the utilization of services difficult. This mobility and variability in terms of protocols therefore require low coupling between the interacting entities and support for spontaneous communications. Unfortunately, the existing works in the literature fail to deal with these needs in a simple and flexible way. In this paper we face this problem by defining ubiquitous bindings for SCA (Service Component Architecture) applications. These bindings modularize the discovery concerns, promoting the sharing of common discovery functionalities and simplifying the integration of discovery protocols. In this way, these bindings enable the transparent advertisement, discovery, filtering, and access of services available in the environment. Our ubiquitous bindings are integrated into the FraSCAti platform and their benefits are demonstrated by building ubiquitous feedback control loops.
1 Introduction
In ubiquitous environments, different computational entities (mobile devices, laptops, sensors, etc.), both providing and consuming services, arrive and leave routinely [1]. The different applications executing in these environments exploit this richness to adapt and improve their behavior. However, because there is no standard discovery protocol in ubiquitous environments, service providers select the most suitable one according to their capabilities. This results in a collection of advertised services using heterogeneous discovery and interaction protocols, and characterized by their dynamism, that cannot always be discovered and accessed by clients. Therefore, spontaneous communications become an important issue to deal with the dynamicity and unpredictability of ubiquitous environments [1]. Unfortunately, the works in the literature dealing with this issue tend to be complex, not flexible enough, or do not consider the needs of consumers in terms of interaction mechanisms [2,3,4]. In this paper, we extend the SCA (Service Component Architecture) model [5] with the notion of ubiquitous bindings in order to support service discovery in

F. Eliassen and R. Kapitza (Eds.): DAIS 2010, LNCS 6115, pp. 112–125, 2010.
© IFIP International Federation for Information Processing 2010
ubiquitous environments. These bindings enable applications to discover and filter SCA services available in the environment, which are then accessed using the supported SCA bindings. We claim that SCA's extensibility and independence from communication and implementation technologies allow the transparent management of service discovery. Furthermore, by encapsulating spontaneous communications in SCA bindings, we provide the flexibility required for choosing the most suitable discovery and communication mechanism between consumers and providers. We illustrate the utilization of our bindings by designing ubiquitous feedback control loops (FCLs) [6]. This kind of FCL allows us to deal with the runtime adaptation of context-aware applications while supporting the mobility of the participating entities. The ubiquitous bindings are implemented as SCA components to promote reuse and integration into the FraSCAti platform [7]. The rest of this paper is organized as follows. We start by introducing the motivations for discovery and adaptation in the landscape of context-aware applications (cf. section 2). We continue with the foundation of our proposal (cf. section 3) before introducing our ubiquitous FCLs (cf. section 4). Then, we present our lightweight solution for supporting discovery in SCA applications (cf. section 5) and the evaluation of our implementation (cf. section 6). In section 7, we compare our approach with existing solutions for service discovery in ubiquitous environments. Finally, we summarize the conclusions of this work and promising research directions in section 8.
2 Motivations and Challenges
This section highlights the challenges for context-aware applications in ubiquitous environments. We start by describing a motivating scenario (cf. section 2.1) before introducing the challenges it exhibits (cf. section 2.2).

2.1 Motivating Scenario
A smart home generally refers to a house environment equipped with various types of sensor nodes, which collect information about the current temperature, occupancy (movement detection), noise level, and light states. In addition, actuators are deployed within rooms to physically control appliances, such as lights, air conditioning, blinds, television, and stereo. In this environment, both sensors and actuators can be accessed from mobile devices owned by the family. Furthermore, the control system deployed in such a smart home is able to retrieve preferences about room configuration from mobile devices and change the room state accordingly. For example, preferences can describe the temperature and light levels accepted by each family member. When several people share the same room, the decision is based on merged preferences. In case of conflict, the decision prioritizes the first person that entered the room. The mobile devices also have an application that enables users to control the appliances at home. This application has several modules (one for each appliance)
that are activated or deactivated according to the current battery level, the battery-saving preference and the module-activation preference (this information is also provided by the mobile device). The modules are also installed/uninstalled according to changes in the appliance configurations. The following paragraphs describe two concrete situations of the scenario. Alice listens to music in the living room. The temperature conforms to her preferences. When Bob enters the living room, the controller detects his device and retrieves his preferences (related to the room configuration as well as to the application allowing access to the appliances). Let us assume that there is no conflict with the temperature level. However, the light level is too low for Bob's tolerance. The light range specified by Alice includes the level accepted by Bob. Hence, the system decides to modify the light level of the room according to Bob's preferences. On the other hand, the system analyzes the battery level and decides that it must deactivate the multimedia module allowing downloads from a multimedia server available at home. In another situation, Alice installs a new TV. The system detects the new device and retrieves the required module to control it. When Bob arrives home, the system detects his device and installs the module as well.

2.2 Key Challenges
According to our scenario, we can identify three key challenges for context-aware applications in ubiquitous environments: i) heterogeneity, ii) mobility and iii) runtime adaptation. The first one refers to the variability in terms of devices (which differ in their processing capabilities), services (implemented with several technologies) and context information (which has different syntax and semantics) present in the environment. This requires a flexible solution in terms of communication (e.g., protocols and data representation) that allows applications to access the context and services. Mobility is concerned with the dynamicity of service and context providers, which can spontaneously join and leave (e.g., the mobile devices in the scenario). Hence, applications should keep working even if some providers are gone, and they should have the possibility to discover new services. Finally, the adaptation of context-aware applications requires the retrieval and processing of context information for deciding on the needed reconfigurations. This adaptation should consider the variable capabilities of the different devices that execute the context-aware applications. As presented in this paper, we face the mobility and heterogeneity issues using our RESTful (cf. section 3.3) and ubiquitous bindings (cf. section 5). We also provide some highlights for dealing with adaptation at runtime by defining ubiquitous feedback control loops (cf. section 4).
3 Background
In this section, we present the foundations of our proposal, i.e., feedback control loops (cf. section 3.1) and SCA (cf. section 3.2). We also introduce the
Service Discovery in Ubiquitous Feedback Control Loops
FraSCAti platform (cf. section 3.2) and RESTful bindings (cf. section 3.3), which we use for implementing ubiquitous FCLs.

3.1 Feedback Control Loops for Autonomic Computing
Autonomic computing enables the development of applications that exhibit properties such as self-configuration, self-healing, self-optimization, and self-protection [6,8]. These properties are generally achieved by means of the MAPE-K model, which comprises the following phases: i) the Monitoring phase collects, aggregates, and filters events from a managed resource; ii) the Analysis phase processes the information collected in the previous step; iii) the Planning phase defines the actions needed to achieve the goals and objectives determined by the analysis; and iv) the Execution phase executes the resulting plan using the adaptation capabilities of the system. All the phases share the Knowledge base, which includes historical logs, configuration information, metrics, and policies.

3.2 The Service Component Architecture (SCA)
The Service Component Architecture [5] is a set of specifications for building distributed applications based on SOA and Component-Based Software Engineering (CBSE) principles. In SCA, the basic building blocks are software components, which provide services (provided interfaces), declare references (required interfaces), and expose properties. References and services are connected by means of wires. SCA specifies a hierarchical component model: components can be implemented either by primitive language entities or by subcomponents; in the latter case, they are called composites. SCA is designed to be independent of programming languages, Interface Definition Languages (IDLs), communication protocols, and non-functional properties. In this way, an SCA-based application can be built, for example, from components written in Java, PHP, and COBOL. Furthermore, several IDLs are supported, such as WSDL and Java interfaces. To support interaction via different communication protocols, SCA provides the notion of binding. For SCA references, bindings describe the access mechanism used to call a service; for services, they describe the access mechanism that clients have to use to invoke the service. Finally, an SCA component may be associated with policy sets or intents that declare the set of non-functional services it depends upon. The SCA specification includes security and transaction policies [9], but the model may be extended with new ones if required.

The FraSCAti Platform. This platform [7] allows the development and execution of SCA-based distributed applications. The platform itself is built as an SCA application, i.e., its different subsystems are implemented as SCA components. FraSCAti extends the SCA component model with reflective capabilities at both the application and platform levels.
Furthermore, the platform applies interception techniques for extending SCA components with non-functional services, such as confidentiality, integrity and authentication.
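To make these SCA notions concrete, the following minimal Python sketch (our illustration, not part of the SCA specification or the FraSCAti API; all class and function names are invented) models components exposing services and references connected by wires:

```python
# Minimal, illustrative model of SCA-style components: services (provided
# interfaces), references (required interfaces), and wires connecting them.
# All names here are invented for this sketch.

class Component:
    def __init__(self, name):
        self.name = name
        self.services = {}    # service name -> callable implementation
        self.references = {}  # reference name -> resolved target service

    def provide(self, service_name, impl):
        self.services[service_name] = impl

    def invoke(self, reference_name, *args):
        target = self.references.get(reference_name)
        if target is None:
            raise RuntimeError(f"unwired reference: {reference_name}")
        return target(*args)

def wire(client, reference_name, server, service_name):
    """An SCA-style wire: connects a client reference to a server service."""
    client.references[reference_name] = server.services[service_name]

# A context provider component offering a battery-level service...
provider = Component("BatteryContext")
provider.provide("battery-level", lambda: 42)

# ...and a controller component consuming it through a reference.
controller = Component("Controller")
wire(controller, "context", provider, "battery-level")
print(controller.invoke("context"))  # -> 42
```

In actual SCA, wires are declared in composite descriptors and bindings determine the access protocol; the sketch only conveys the component/service/reference/wire vocabulary.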
D. Romero et al.

3.3 REpresentational State Transfer (REST)
REST [10] is an architectural style for defining distributed applications. Typically, REST defines the principles for encoding (content types), addressing (nouns), and accessing (verbs) resources using Internet standards (e.g., URIs, HTTP, XML, and MIME types). Resources, which are key to REST, are addressable using a universal syntax (e.g., a URL in HTTP) and share a uniform interface for the transfer of application states between client and server (e.g., GET/POST/PUT/DELETE in HTTP). REST resources typically exhibit multiple typed representations using, for example, XML, JSON, YAML, or plain-text documents. The simplicity, lightness, reusability, extensibility, and flexibility that characterize REST therefore make it a suitable option for exchanging context information in ubiquitous environments.

RESTful Bindings. The REST bindings [11] follow the REpresentational State Transfer principles. These bindings support multiple context representations (e.g., XML, JSON, and Java object serialization) and communication protocols (HTTP, XMPP, FTP, etc.). This flexibility allows us to deal with heterogeneous context managers and context-aware applications, as well as with the different capabilities of the devices that execute them. Details about the architecture of these bindings are presented in [11].

Synthesis: SCA provides a flexible and extensible component model that can be used in ubiquitous environments to deal with heterogeneity and mobility. In particular, as we present in section 5, we benefit from its protocol independence to define ubiquitous bindings that enable spontaneous communications. Furthermore, the FraSCAti capabilities in terms of runtime adaptation, for applications and for the platform itself, make it a good option for building autonomic FCLs in ubiquitous environments.
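As a small illustration of one REST resource exhibiting multiple typed representations (the representation types mirror those supported by the RESTful bindings; the code and the sample values are ours):

```python
import json
from xml.etree.ElementTree import Element, tostring

# One context resource (a battery-level reading, invented values) rendered in
# two of the representations mentioned above; a RESTful binding would pick a
# representation according to the client's capabilities.
resource = {"contextType": "batteryLevel", "value": 87, "unit": "percent"}

def to_json(res):
    return json.dumps(res)

def to_xml(res):
    root = Element("context")
    for key, value in res.items():
        child = Element(key)
        child.text = str(value)
        root.append(child)
    return tostring(root, encoding="unicode")

print(to_json(resource))
print(to_xml(resource))
```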
4 Ubiquitous Feedback Control Loops
To face the adaptation challenge in ubiquitous environments, we propose the architecture presented in Figure 1 to implement our ubiquitous FCLs [11]. We choose FCLs to support dynamic reconfigurations because they provide a clear isolation of the different steps of the adaptation process. This feature allows us to distribute the concerns across several entities and to reduce the coupling between them. We call these FCLs "ubiquitous" because they have the capacity to configure themselves at execution time. This means that some parts of the loop can dynamically join and leave, such as the applications running on mobile devices in our smart home scenario. Furthermore, the low coupling between the FCL parts promotes their integration at runtime with other ubiquitous FCLs. In our FCL (cf. Figure 1), the Controller encapsulates the functionalities required for monitoring, analyzing, and planning. This means that the Controller detects the presence of new services, collects information from the mobile devices (which join and leave the environment), processes the retrieved information, and decides on the required reconfigurations of the context-aware applications.
These applications can either be deployed on the mobile devices or be one of the available services in the environment (e.g., the Multimedia Server). Consequently, the Controller needs to dynamically locate the service that operates the reconfigurations of the context-aware applications. In particular, the mobile device and the Multimedia Server enclose the execution part of our FCL. Moreover, the mobile device also hosts monitoring responsibilities, since it notifies the Controller when changes in the provided context information occur (e.g., the battery level decreases or increases). Thus, the mobility of the different elements (mobile devices and services) in the FCL makes the definition of ubiquitous FCLs necessary. The next section introduces the bindings required in SCA to deal with this mobility.
[Figure: the Home Control System (Controller) comprises Reconfiguration Executor, Module Store, Rule Engine, Reconfiguration Engine (FScript), Adaptation Triggering, and Context Processing components on the SCA platform (FraSCAti). It is connected, via ubiquitous bindings (UB) and an RPC (SOAP) wire, to a UPnP TV, a Multimedia Server (Server Runtime) with a Multimedia Provider and its own Reconfiguration Engine (FScript), and a Mobile Device (Mobile Runtime) running the lightweight SCA platform FraSCAme with a client-side application (View/Controller), TV Control and Multimedia modules, and a Context Policy. Legend: third-party provider, SCA component, SCA composite, SCA service, SCA reference, SCA wire (local/remote), UB = Ubiquitous Binding.]

Fig. 1. Ubiquitous Feedback Control Loop for the smart home scenario
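The monitor-analyze-plan-execute cycle that the Controller realizes can be sketched as a simple loop. The context values, the rule, and the reconfiguration action below are invented for illustration; the actual Controller relies on a rule engine and FScript reconfiguration scripts:

```python
# Illustrative feedback control loop: monitor context, analyze it against a
# policy, plan a reconfiguration, and execute it. Thresholds and actions are
# invented for this sketch.

def monitor(device):
    # Collect context notified by the mobile device.
    return {"batteryLevel": device["battery"], "batterySaving": device["saving"]}

def analyze(context):
    # Decide whether an adaptation is needed (illustrative rule).
    return context["batterySaving"] and context["batteryLevel"] < 20

def plan(adaptation_needed):
    # Produce the reconfiguration actions.
    return ["deactivate:multimedia-module"] if adaptation_needed else []

def execute(actions, modules):
    for action in actions:
        verb, module = action.split(":")
        if verb == "deactivate":
            modules.discard(module)
    return modules

device = {"battery": 15, "saving": True}
modules = {"tv-control-module", "multimedia-module"}
modules = execute(plan(analyze(monitor(device))), modules)
print(modules)  # the multimedia module has been deactivated
```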
5 Discovery of Ubiquitous Services
As already mentioned, one of the challenges for context-aware applications is mobility. In this paper, we tackle this issue by defining a new type of binding for the SCA component model: ubiquitous bindings. These bindings provide a simple and lightweight communication mechanism and promote low coupling between the interacting entities. Furthermore, following the SCA principles, the ubiquitous bindings manage SCA service discovery in a transparent way. These advantages make the ubiquitous bindings a suitable solution to deal with dynamicity in our ubiquitous FCLs. In the rest of this section, we present the design (cf. section 5.1) and implementation (cf. section 5.2) of the ubiquitous bindings enabling the discovery of SCA services.

5.1 Ubiquitous Bindings
In ubiquitous environments, services constantly join and leave. For this reason, we need to provide our SCA-based FCLs with the functionality required to deal
with this dynamicity. In order to introduce spontaneous interoperable communications [1,12] in SCA, we define the concept of ubiquitous binding. This new type of binding integrates state-of-the-art Service Discovery Protocols (SDPs) and enables the establishment of communication wires at runtime. To do so, we consider three design aspects of SDPs [12]: i) provider invocation, ii) description and attribute definition, and iii) provider selection. Regarding invocation, some SDPs (e.g., UPnP [13]) define the communication mechanism themselves. However, this mechanism is not always the most suitable. Therefore, a ubiquitous binding advertises a service provided by an SCA component as being accessible via the different SCA bindings associated with it. On the other hand, we are interested in service description and provider selection because we need to choose the service provider according to the customer requirements. Hence, we benefit from the discovery protocol flexibility to define properties associated with QoS (Quality of Service) or QoC (Quality of Context) attributes (in the case of context-aware applications) [14] in the service advertisements. For defining the filters that allow provider matching, we use LDAP filters [15]. Figure 2 depicts the definition of the ubiquitous bindings for services (left side) and references (right side). The Discovery Protocol is the name of the discovery protocol associated with the binding. On the client side, the definition of a ubiquitous binding includes the filter attribute. This attribute specifies an LDAP filter expressing restrictions on the required service in terms of its properties. On the server side, the ubiquitous binding can have properties that provide additional information about the service, such as QoC attributes. Each property is described by a property element.
By defining the ubiquitous bindings in this way, we can support the discovery of SCA context services via different discovery protocols, and then access the services using the most suitable communication protocol according to the application needs. The lower part of Figure 2 shows examples of a ubiquitous binding with the SLP [16] protocol. The precision and probabilityOfCorrection properties are QoC attributes [17] that describe the context provided by the service. These attributes and the contextType property are used in the definition of the LDAP filter in the reference. The bindings on the service side correspond to the different communication protocols that can be used to access the context information.
[Figure: SCA definitions of a ubiquitous binding for a service (server side) and a reference (client side). The server-side example declares a battery-level service whose ubiquitous binding carries property elements such as probabilityOfCorrection="medium", reputation="medium", and contextType="batteryLevel"; the client-side reference declares the corresponding LDAP filter.]

Fig. 2. SCA definition of the ubiquitous bindings
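Provider selection against advertised properties can be illustrated with a small matcher for conjunctive, equality-only filters in RFC 4515 syntax. This is a deliberately reduced subset written by us for illustration; real SDP integrations translate or evaluate full LDAP filters:

```python
import re

def matches(ldap_filter, properties):
    """Evaluate a conjunctive equality-only LDAP filter, e.g.
    '(&(contextType=batteryLevel)(reputation=medium))', against the
    properties advertised by a provider. Only a tiny subset of RFC 4515
    is handled here (no OR, negation, wildcards, or ordering)."""
    clauses = re.findall(r"\((\w+)=([^()]+)\)", ldap_filter)
    return all(properties.get(attr) == value for attr, value in clauses)

# Advertised properties of the battery-level service from Figure 2.
advertised = {
    "contextType": "batteryLevel",
    "probabilityOfCorrection": "medium",
    "reputation": "medium",
}

print(matches("(&(contextType=batteryLevel)(reputation=medium))", advertised))  # True
print(matches("(&(contextType=temperature))", advertised))                      # False
```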
5.2 Implementation of the Ubiquitous Bindings in the FraSCAti Platform
We have integrated our ubiquitous bindings into the FraSCAti [7] platform. The choice of FraSCAti is motivated by two main reasons: i) the reflective capabilities it introduces in the SCA programming model, which allow dynamic introspection and reconfiguration of SCA-based context consumers and producers, and ii) the possibility of running the lightweight version of the platform (FraSCAme) on mobile devices with limited capabilities [11]. Figure 3 depicts the integration of our ubiquitous bindings into FraSCAti. As can be seen, a ubiquitous binding is composed of a Discoverer and an Advertiser component. The Discoverer plays the role of a stub in a traditional FraSCAti binding [18]: a reference of the client component is connected to the Discoverer, which is responsible for providing access to remote SCA services. In addition, the Discoverer enables SCA components to search for the required services at runtime. When the service is detected and selected, the Discoverer component provides access to it. On the server side, the Advertiser (the skeleton, in FraSCAti terminology) publishes the services whose bindings are declared as ubiquitous. Both the Discoverer and the Advertiser are associated with a specific discovery protocol (e.g., UPnP or SLP). The proposed architecture for these components modularizes the different concerns of service discovery (i.e., search, selection, and provider monitoring) and introduces some optional optimizations (in the Discoverer case). In this way, we foster the reuse of the different components (in particular for provider selection), the flexibility to use different implementations, and the possibility of choosing only the required components (not all components are mandatory). In the following sections, we present the detailed architecture of the Discoverer and Advertiser components.
[Figure: the Context Processing component on the Adaptation Runtime accesses a remote Context Policy on the Mobile Runtime through a <<Stub>> Discoverer, assisted by a Provider Selector, an Active Stub, and a Stubs Registry; advertisements, search responses, and search requests flow over a communication channel to the <<Skeleton>> Advertiser and a <<Skeleton>> REST binding on the server side. Legend: component, composite, reference, service, wire.]

Fig. 3. Integration of ubiquitous bindings into FraSCAti
Discoverer Component. This component is associated with a specific SCA reference. In order to reduce the memory footprint, we externalize the components providing common functionality to different discoverers. In particular, different implementations of the discoverer can share the components for provider
selection, the active stubs (which encapsulate the communication with the remote service), and the stub registry (which keeps a list of the stubs already instantiated). The Discoverer component (left side in Figure 4) has a Discoverer Orchestrator that coordinates the discovery of the requested SCA service. The Finder component sends requests to detect the potential providers in the environment. If filters with attributes are supported (e.g., in SLP), the Finder translates the LDAP filters to the protocol scheme. If the SDP does not support automatic service selection and selection is needed (e.g., with UPnP), the Finder uses a provider selector for choosing the service. When a provider is selected, the Discovery Orchestrator disables the Finder and uses the Provider Monitor to monitor the service availability. When the service is invoked for the first time, the Discovery Orchestrator checks in the stubs registry whether there is already a stub for the service provider. If so, the Discovery Orchestrator selects the registered stub as the Active Stub. Otherwise, the Orchestrator uses the FraSCAti binding factory [11] (which is used to create wires and bindings in the platform) to instantiate and configure the Active Stub. When the provider becomes unavailable, the Provider Monitor notifies the Orchestrator, which activates the Finder again and asks it to find a new provider.
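The stub-management logic just described can be sketched as follows; the class and method names are ours, not the FraSCAti API:

```python
# Illustrative Discovery Orchestrator logic: reuse a registered stub when one
# exists, otherwise create one through a binding factory; on provider loss,
# fall back to the finder. All names are invented for this sketch.

class DiscoveryOrchestrator:
    def __init__(self, finder, binding_factory):
        self.finder = finder                    # locates candidate providers
        self.binding_factory = binding_factory  # instantiates stubs
        self.stub_registry = {}                 # provider id -> stub

    def get_stub(self, provider_id):
        # First invocation: check the registry before creating a new stub.
        stub = self.stub_registry.get(provider_id)
        if stub is None:
            stub = self.binding_factory(provider_id)
            self.stub_registry[provider_id] = stub
        return stub

    def on_provider_unavailable(self, provider_id, ldap_filter):
        # The Provider Monitor reports the loss; re-enable the Finder.
        self.stub_registry.pop(provider_id, None)
        return self.finder(ldap_filter)

orchestrator = DiscoveryOrchestrator(
    finder=lambda f: "provider-2",           # next matching provider
    binding_factory=lambda pid: f"stub-for-{pid}",
)
print(orchestrator.get_stub("provider-1"))   # stub created via the factory
print(orchestrator.get_stub("provider-1"))   # same stub, reused from registry
print(orchestrator.on_provider_unavailable("provider-1", "(contextType=*)"))
```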
[Figure: the Discoverer composite contains the Finder (configured with an ldap-filter property), the Discovery Orchestrator, and an optional Provider Monitor, with references to the provider-selector, active-stub, stub-registry, and binding-factory; the Advertiser composite contains the Service Registry and the Promoter, exposing component-state and change-service-property-value services. Legend: composite, component, optional component, reference, service, wire, property.]

Fig. 4. Discoverer and Advertiser Architecture
Advertiser Component. The Advertiser (right side in Figure 4) contains a Promoter component with the following responsibilities:

1. Advertise the available SCA services in the Service Registry. Each entry in the Service Registry contains the information required to advertise the published service (e.g., name and type). This information can be updated at runtime.
2. Listen for and process search requests.
3. Notify events associated with the SCA component state.

A given SCA component contains one Advertiser per ubiquitous binding type. This means that all the services with the same ubiquitous binding type may be exported using the same advertiser instance.
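A sketch of the Promoter's registry and request handling (with invented names and data), showing one advertiser instance serving several services of the same binding type:

```python
# Illustrative Promoter: one instance per ubiquitous binding type, advertising
# all services of that type and answering search requests. Names and the
# sample entries are invented for this sketch.

class Promoter:
    def __init__(self, binding_type):
        self.binding_type = binding_type
        self.registry = {}  # service name -> advertised info (updatable)

    def advertise(self, name, info):
        self.registry[name] = info

    def update(self, name, **changes):
        # Runtime update of a registry entry (responsibility 1).
        self.registry[name].update(changes)

    def handle_search(self, predicate):
        # Answer a search request with the matching advertised services
        # (responsibility 2).
        return [n for n, info in self.registry.items() if predicate(info)]

slp_promoter = Promoter("slp")
slp_promoter.advertise("battery-level", {"contextType": "batteryLevel"})
slp_promoter.advertise("temperature", {"contextType": "temperature"})
slp_promoter.update("battery-level", reputation="medium")
print(slp_promoter.handle_search(lambda i: i["contextType"] == "batteryLevel"))
```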
Implementation Details. We have implemented ubiquitous bindings for SLP and UPnP. For discovery via UPnP, we use Cyberlink for Java (version 1.7), and for SLP, the jSLP library. Our RESTful bindings are based on the Comanche web server [19]. Both FraSCAti and Comanche are based on the Fractal component model and use the Julia implementation of the Fractal runtime environment [19].

Synthesis: The ubiquitous bindings provide a flexible and simple mechanism that allows a transparent management of mobility in our ubiquitous FCLs. These bindings leverage the clear separation of concerns promoted by the SCA component model to avoid impacting the business logic. Moreover, the modularity of the SCA architecture of our bindings promotes their reuse and the flexibility to select the most suitable implementations of the different components.
6 Empirical Validation
To evaluate the performance of the ubiquitous bindings, we implemented scene 1 of the smart home scenario (cf. section 2.1). We tested several configurations of the scenario using two Dell Latitude 430 laptops with the following hardware and software configuration: 1.33 GHz processor, 2 GB of RAM, Intel Pro Wireless 3945ABG card, Windows XP SP3, Java Virtual Machine 1.6.0_14, Julia 2.5.2, and FraSCAti 1.2. The mobile clients are two Nokia N800 Internet Tablets with a 400 MHz processor, 128 MB of RAM, a WLAN 802.11b/e/g interface, Linux Maemo (kernel 2.6.21), CACAO Java Virtual Machine 0.99.4, Julia 2.5.2, and FraSCAti 1.2. We also evaluated the RESTful bindings using XML, JSON, and Java object serialization as context representations. We used the Xerces2 Java Parser 2.9.1 library for XML, and JSON-lib 2.2.3 to serialize the information as JSON documents. We used SCA services to simulate the home sensors and actuators. These services are always executed on the same Dell Latitude hosting the server part of the scenario. Table 1 summarizes the overhead observed for discovery via the ubiquitous bindings. This time includes the discovery, instantiation, and configuration of the SCA wires. The given measures are the average of 10,000 successful tests, of which the first 100 were considered as part of the warm-up. We reduced the UPnP execution time by avoiding the retrieval of the service description file; we do not need this file because the advertisement messages already contain the properties required to select the provider. Regarding the discovery cost, we observe that it is possible to integrate the ubiquitous bindings in the feedback
Cyberlink for Java: http://cgupnpjava.sourceforge.net/
jSLP: http://jslp.sourceforge.net/
Comanche web server: http://fractal.ow2.org/tutorials/comanche.html
Julia: http://fractal.ow2.org/julia
Xerces2 Java Parser: http://xerces.apache.org/xerces2-j/
JSON-lib: http://json-lib.sourceforge.net
Table 1. Performances of RESTful bindings

                                            Discovery Latency     Retrieval Latency
Providers Configuration    Provider         SLP (ms)  UPnP (ms)   Object (ms)  JSON (ms)  XML (ms)
a) 1 Local Provider        N/A              68        73          244          304        315
b) 1 External Provider     Laptop           91        111         292          252        261
c) 1 External Provider     N800             216       284         513          817        818
d) 2 External Providers    Laptop & N800    507       547         576          839        845
e) 2 External Providers    N800 A & B       736       769         641          989        1046
control loops with a reasonable overhead (68 ms per message). We also notice that the network increases the discovery latency by approximately 25%, comparing the test with a local provider (configuration a) with the laptop as provider (configuration b). Although the measurements with mobile devices (configurations c, d, and e) show that we can discover services in a reasonable time, their use as providers considerably increases the discovery latency. This additional cost is mainly due to the limited processing capacity of these devices. As expected, SLP is more efficient than UPnP, since the latter is a higher-level protocol. Table 1 also reports the costs of interactions once the services have been discovered in the feedback control loop. In these tests, we use our RESTful bindings (cf. section 3.3) and three different representations for information retrieval (Java object serialization, JSON, and XML). As can be seen, exchanging the information costs 244 ms per message (configuration a). In the configurations including mobile devices (c, d, and e), we again observe additional overhead.
7 Related Work
In this section, we present some works that deal with service discovery in ubiquitous environments. INDISS [2] (INteroperable DIscovery System for network Services) is a system based on event-based parsing techniques to provide full service discovery interoperability. The authors claim that this interoperability is achieved without altering existing applications and services. INDISS exploits the multicast groups used by different discovery protocols to detect the protocols being used in the environment. Then, INDISS transforms the SDP messages into events that are in turn transformed into messages corresponding to the SDP supported by the client application. Although interoperability between discovery protocols is an interesting solution to the mobility problem, the applications still always have to use the communication protocol defined by the SDP, even if it is not suitable. Our SCA-based solution provides the flexibility required for client (resp. server) applications to search for (resp. advertise) the required (resp. provided) services using the most suitable discovery and interaction protocols. In this way, the devices only deploy the required functionality. In [20], the authors propose a framework for the development of an adaptive multi-personality service discovery middleware, which operates in both fixed and ad-hoc networks. According to the authors, the framework promotes component reuse
and simplifies the configuration and dynamic reconfiguration of multiple concurrent protocols. With our ubiquitous bindings, we also foster reuse and dynamic reconfiguration capabilities thanks to the combination of the SCA component model and the FraSCAti platform. ReMMoC (Reflective Middleware for Mobile Computing) [4] is an adaptive middleware for the discovery and access of services by mobile clients. According to the authors, ReMMoC reconfigures itself to use the discovery protocols currently present in the environment. Furthermore, the middleware interoperates with services implemented upon different interaction types. Although our approach does not deal with adaptation at the communication level, by using the reconfiguration capabilities offered by FraSCAti and our ubiquitous feedback control loops, we could identify situations where new discovery or interaction bindings are required and deploy them on the clients. Furthermore, the integration of ubiquitous bindings in SCA promotes their use in any SCA application, not only in mobile clients.
8 Conclusions and Perspectives
In order to deal with the mobility issue in ubiquitous environments, we have enabled spontaneous communications in SCA-based applications. To do so, in this paper we defined ubiquitous bindings, a new kind of binding for the SCA standard that allows the runtime integration of service providers and consumers. Our ubiquitous bindings advertise and discover services via different discovery protocols and select them by applying LDAP filters. The flexibility of the ubiquitous bindings allows service access via the traditional SCA bindings associated with the services. Furthermore, the design of the ubiquitous bindings is based on SCA, enabling their integration in any SCA runtime platform. By benefiting from SCA extensibility and its clear separation of concerns, we integrate discovery management into applications in a transparent way. Thus, the originality of our solution rests on the simplicity and efficiency achieved by combining well-defined and accepted standards and protocols. To illustrate the use of ubiquitous bindings, we defined ubiquitous FCLs enabling the adaptation of context-aware applications. The exchange of context information in these FCLs is achieved via RESTful bindings that allow us to face heterogeneity in ubiquitous environments. The ubiquitous bindings have been integrated into the FraSCAti platform, following an architecture that promotes the sharing of common service discovery functionality. The suitability of our ubiquitous bindings was confirmed by tests executed on a smart home scenario. Future work includes further tests using different kinds of mobile devices, protocols, and service providers. In the particular case of ubiquitous FCLs, we plan to improve the performance of our solution by introducing a cache mechanism that enables the temporary storage of the retrieved context information. In this way, when all the required information has been gathered, it can be processed even if
one of the context providers is unavailable (or the connection was lost). Finally, we will exploit the introspection and reconfiguration capabilities brought into SCA by the FraSCAti platform to instrument the adaptation process on the mobile devices via our RESTful bindings.
References

1. Kindberg, T., Fox, A.: System software for ubiquitous computing. IEEE Pervasive Computing 1(1), 70–81 (2002)
2. Bromberg, Y.D., Issarny, V.: INDISS: interoperable discovery system for networked services. In: Alonso, G. (ed.) Middleware 2005. LNCS, vol. 3790, pp. 164–183. Springer, Heidelberg (2005)
3. Nakazawa, J., Tokuda, H., Edwards, W.K., Ramachandran, U.: A bridging framework for universal interoperability in pervasive systems. In: ICDCS 2006: Proceedings of the 26th IEEE International Conference on Distributed Computing Systems, Washington, DC, USA, p. 3. IEEE Computer Society, Los Alamitos (2006)
4. Grace, P., Blair, G.S., Samuel, S.: A reflective framework for discovery and interaction in heterogeneous mobile environments. SIGMOBILE Mob. Comput. Commun. Rev. 9(1), 2–14 (2005)
5. Beisiegel, M., et al.: Service Component Architecture (November 2007)
6. Hariri, S., Khargharia, B., Chen, H., Yang, J., Zhang, Y., Parashar, M., Liu, H.: The Autonomic Computing Paradigm. Cluster Computing 9(1), 5–17 (2006)
7. Seinturier, L., Merle, P., Fournier, D., Dolet, N., Schiavoni, V., Stefani, J.B.: Reconfigurable SCA applications with the FraSCAti platform. In: SCC 2009: Proceedings of the 2009 IEEE International Conference on Services Computing, Washington, DC, USA, pp. 268–275. IEEE Computer Society, Los Alamitos (2009)
8. Parashar, M., Hariri, S.: Autonomic computing: An overview. In: Banâtre, J.-P., Fradet, P., Giavitto, J.-L., Michel, O. (eds.) UPP 2004. LNCS, vol. 3566, pp. 257–269. Springer, Heidelberg (2005)
9. Open SOA: SCA Transaction Policy, Version 1.0 (December 2007)
10. Fielding, R.T.: Architectural Styles and the Design of Network-based Software Architectures. PhD thesis, University of California, Irvine (2000)
11. Romero, D., Rouvoy, R., Seinturier, L., Chabridon, S., Denis, C., Nicolas, P.: Enabling Context-Aware Web Services: A Middleware Approach for Ubiquitous Environments. In: Sheng, M., Yu, J., Dustdar, S. (eds.) Enabling Context-Aware Web Services: Methods, Architectures, and Technologies. Chapman and Hall/CRC (July 2009)
12. Zhu, F., Mutka, M.W., Ni, L.M.: Service discovery in pervasive computing environments. IEEE Pervasive Computing 4(4), 81–90 (2005)
13. UPnP Forum: UPnP Device Architecture 1.0 (April 2008), http://www.upnp.org/resources/documents.asp
14. Krause, M., Hochstatter, I.: Challenges in modelling and using quality of context (QoC). In: Magedanz, T., Karmouch, A., Pierre, S., Venieris, I.S. (eds.) MATA 2005. LNCS, vol. 3744, pp. 324–333. Springer, Heidelberg (2005)
15. Smith, M., Howes, T.: RFC 4515 - Lightweight Directory Access Protocol (LDAP): String Representation of Search Filters. IETF RFC (2006)
16. Guttman, E., Perkins, C., Veizades, J., Day, M.: Service Location Protocol, Version 2. RFC 2608 (Proposed Standard) (June 1999), http://tools.ietf.org/html/rfc2608
17. Dey, A.K., Abowd, G.D., Salber, D.: A conceptual framework and a toolkit for supporting the rapid prototyping of context-aware applications. Hum.-Comput. Interact. 16(2), 97–166 (2001)
18. SCOrWare Project: SCA Platform Specifications - Version 1.0 (2007)
19. Bruneton, E., Coupaye, T., Leclercq, M., Quéma, V., Stefani, J.B.: The Fractal component model and its support in Java. Software: Practice and Experience - Special issue on Experiences with Auto-adaptive and Reconfigurable Systems 36(11-12), 1257–1284 (2006)
20. Flores-Cortés, C.A., Blair, G.S., Grace, P.: A multi-protocol framework for ad-hoc service discovery. In: MPAC 2006: Proceedings of the 4th International Workshop on Middleware for Pervasive and Ad-Hoc Computing, p. 10. ACM, New York (2006)
QoS Self-configuring Failure Detectors for Distributed Systems Alirio Santos de Sá and Raimundo José de Araújo Macêdo Distributed Systems Laboratory - LaSiD / Computer Science Department Federal University of Bahia, Salvador, Bahia, Brazil {aliriosa,macedo}@ufba.br
Abstract. Failure detectors are basic building blocks from which fault tolerance for distributed systems is constructed. The Quality of Service (QoS) of failure detectors refers to the speed and accuracy of detection and is defined from the applications and the computing environment under consideration. Existing failure detection approaches for distributed systems do not support the automatic (re)configuration of failure detectors from QoS requirements. However, when the behavior of the computing environment is unknown and changes over time, or when the application itself changes, self-configuration is a basic issue that must be addressed, particularly for applications with response time and high availability requirements. In this paper we present the design and implementation of a novel autonomic failure detector based on feedback control theory, which is capable of self-configuring its QoS parameters at runtime from previously specified QoS requirements.
1 Introduction
Enterprise Information Technology infrastructures make use of different hardware and software components to supply highly available, secure, and scalable services for applications with distinct quality of service (QoS) requirements. In order to fulfill such requirements, these infrastructures need mechanisms which guarantee quick reaction and recovery in the presence of system component failures. In this context, failure detectors are fundamental for monitoring failures, enabling recovery processes to be triggered. Thus, the design and implementation of failure detectors have been the object of intense research in recent decades [1,2,3,4,5,6,7,8]. In distributed systems over networked computers, failure detectors are implemented by monitored and monitor processes which periodically exchange messages with each other. To guarantee quick recovery in the presence of component failures, the monitoring period has to be as short as possible. However, shorter monitoring periods can increase resource consumption, compromise the application response time, and decrease the efficiency and speed of the detection and recovery mechanisms. In addition, complications emerge when the computational environment or the application characteristics change at runtime. In these scenarios, adjusting the monitoring period automatically is a tremendous challenge which has not

F. Eliassen and R. Kapitza (Eds.): DAIS 2010, LNCS 6115, pp. 126–140, 2010.
© IFIP International Federation for Information Processing 2010
QoS Self-configuring Failure Detectors for Distributed Systems
been appropriately addressed in the literature. Most published papers have focused on adaptive detection, which uses prediction mechanisms to calculate the detection timeout without considering the dynamic adjustment of monitoring periods [2,4,5,6]. The few papers which consider the dynamic configuration of the monitoring period do not take into account the QoS metrics generally accepted in the literature [9,7,8]. Paper Contributions. This paper proposes a failure detector that is capable of self-configuring its operational parameters at runtime, in response to changes in the computing environment or application, according to user-defined QoS requirements. Systems with such characteristics are known as autonomic systems, of which self-configuration is one of the required properties [10]. The main difficulty in implementing such self-configuring failure detectors lies in modeling the distributed system dynamics, which is hard to characterize using specific probability distribution functions when load-varying environments are considered (e.g., cloud computing environments). To model this dynamic behavior of distributed systems we use feedback control theory, commonly applied in industrial automation systems [11]. The proposed failure detector was fully implemented and evaluated by simulation using QoS metrics such as detection time, mistake recurrence time, mistake duration, percentage of mistakes and availability. These metrics allowed us to evaluate the speed and accuracy of failure detections under varying computational loads. Although there is no directly comparable related work, we compared our approach with a traditional adaptive failure detector manually configured with different monitoring periods. The results showed that, in most cases, our autonomic failure detector performed better than the adaptive failure detector for each monitoring period considered. Paper Organization.
The remainder of this paper is organized as follows. Section 2 discusses related work and the theoretical context of the contribution of this paper. Section 3 presents the system model and the QoS metrics used to configure and evaluate the proposed failure detector. Section 4 describes the design and evaluation of our autonomic failure detector approach. Finally, section 5 presents final remarks and future work.
2 Related Work and Theoretical Context
Fault-tolerance mechanisms for distributed systems must guarantee the correct functioning of services even in the presence of faults. The design of these mechanisms considers models with hypotheses about the behavior of components under faulty conditions. The most commonly used model is the crash model, which assumes that failed (crashed) components do not respond to any request. Even when the crash model cannot be verified in real scenarios, it can be emulated by masking and fault-hierarchy techniques [12]. Detecting crash failures of system components is a basic issue for the working of many fundamental protocols and
A. Santos de Sá and R.J. de Araújo Macêdo
algorithms used to build dependable systems. For example, in a passive replication scheme, the failure of the primary replica has to be readily detected in order to allow a backup replica to take on the role of the failed replica with minimal impact on the distributed application [13]. Crash failure detection usually considers monitored processes which periodically send their state, using messages named heartbeats, to a monitor process. The monitor determines a timeout interval that defines the latest expected arrival instant of heartbeat messages. If a heartbeat does not arrive within the timeout interval, the monitor will believe that the monitored process has crashed. This detection model depends on timeliness constraints on the transmission and processing of the monitoring messages. In an asynchronous distributed system, time bounds for the processing and transmission of messages are unknown, which makes it impossible to solve certain fault-tolerance problems deterministically [14]. To address this, Chandra and Toueg (1996) [1] introduced the unreliable failure detectors approach. These failure detectors are termed unreliable because they may wrongly indicate the failure of a process that is actually correct and, conversely, may not indicate the failure of faulty processes. Chandra and Toueg demonstrated how, by encapsulating a certain level of synchrony, unreliable failure detectors can solve fundamental problems of asynchronous distributed systems (e.g., consensus and atomic broadcast). Despite the great importance of Chandra and Toueg's work for the understanding and solution of fundamental problems in distributed systems, the absence of timeliness bounds in the asynchronous model imposes hard practical challenges for the implementation of failure detectors. One of these challenges is deciding appropriate values for the detection timeout. Long timeouts make detection slow and can compromise the system response time during failures.
However, short timeouts can degrade failure detector reliability and damage system performance, as the many algorithms and protocols which rely on failure detector information may perform additional processing and message exchanges imposed by false failure suspicions. Therefore, many researchers have studied the use of delay predictors in failure detector implementations (e.g., [4,5,6]). These predictors suggest, at runtime, values for the detection timeout so as to achieve quick detection with minimal impact on the reliability of the detector. However, these works do not consider the dynamic adjustment of the monitoring period, which is another important factor in the performance of failure detectors. The specification of Chandra and Toueg's failure detectors addresses properties which are difficult to evaluate in practice (e.g., "eventually every process that crashes is permanently suspected by some correct process"). Because of that, Chen et al. (2002) [2] defined QoS metrics which have since been used to evaluate the speed and accuracy of failure detector implementations. The designer's task is then to choose a monitoring period and a delay predictor which deliver a detection service with a QoS level suitable for the application requirements. When the computing environment behavior does not change over time, or when it is possible to estimate this behavior using some probability distribution function, we can configure the failure detector so that it exhibits a satisfactory QoS level. For example, in [2] a procedure to make an
offline configuration of the failure detector is presented. In the same paper the authors suggest that this procedure can be re-executed whenever the network traffic characteristics change; however, the effects of this re-execution on the detector performance have not been evaluated. In [3], the authors briefly comment on a consensus-based procedure to dynamically adjust the monitoring period when certain conditions occur. However, the authors have neither detailed their solution nor evaluated it in terms of QoS metrics. The authors of [9], [7] and [15] explored the runtime configuration of failure detectors. However, these works assume that the environment behavior does not change, and they do not demonstrate how to dynamically set up a failure detector using QoS metrics such as detection time, mistake duration and mistake recurrence time. As far as we are aware, this is the first work to propose a failure detector able to dynamically self-configure by adjusting both the monitoring period and the timeout according to a specified QoS.
3 Basic Issues: System Model and QoS Metrics
We consider a distributed system made up of a finite set Π = {p1, p2, ..., pn} of processes interconnected by unreliable channels, where corrupted messages are discarded at the receiving processes and messages may be lost. No upper bounds are assumed for message transmission delays or processing times; that is, an asynchronous distributed system is assumed. Processes can fail by prematurely halting (crash fault model); Byzantine failures are not considered. In this work we develop a crash detection service using a pull monitoring style [16]. In this monitoring style, a monitor process pi periodically asks about the state of a monitored process pj by sending an "are you alive?" (aya) message, and pj must respond with a message called heartbeat (hb), or "I am alive!". The aya and hb messages exchanged between a pair of processes pi and pj are sequentially numbered in each monitoring period (τ). Thus, aya_k is the k-th "are you alive?" message sent from pi to pj, and hb_k the corresponding k-th heartbeat message sent from pj to pi. We use s_k and r_k to denote the sending and receiving instants of aya_k and hb_k, respectively, according to the local clock of pi. Clocks are not assumed to be synchronized. Let rto_k denote the estimated timeout for the receipt of hb_k. If a heartbeat does not arrive within the timeout rto, pi will insert pj into its suspect list. If pi later receives a heartbeat with a timestamp greater than that of the last heartbeat received, pi will remove pj from its suspect list. In the configuration and performance evaluation of the detection service we apply the QoS metrics proposed by [2]: Detection Time (TD), Mistake Recurrence Time (TMR) and Mistake Duration (TM). The Detection Time represents the time interval between the crash of a monitored process (pj) and the time when the monitor process (pi) starts to suspect pj permanently.
The Mistake Recurrence Time measures the time between two consecutive mistakes of the failure detector. Finally, the Mistake Duration measures how long a mistake lasts. TM and TMR are failure detector reliability metrics and can be compared to the Mean Time to Repair (MTTR) and the Mean Time Between Failures (MTBF), respectively. Thus, in our work we have used TMR and TM to
estimate the failure detector availability (AV) by AV = (TMR − TM)/TMR. The application requirements in terms of detection QoS are defined by TD^U, TM^U and TMR^L, which represent the maximum TD, the maximum TM and the minimum TMR, respectively. This follows the notation of [2], where U and L represent upper and lower bounds, respectively, with the minor difference that we write TD, TM and TMR instead of T_D, T_M and T_{MR} as originally proposed in [2].
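The pull-style monitoring and suspect-list bookkeeping described above can be sketched as follows. This is a minimal illustration; the class and method names are ours, not from the paper's implementation:

```python
# Sketch of the pull-style crash monitoring of Section 3 (illustrative names).
# A process is suspected when hb_k misses its timeout, and the suspicion is
# removed when a fresher (higher-numbered) heartbeat arrives.

class PullMonitor:
    def __init__(self):
        self.suspects = set()
        self.last_hb_seq = {}  # pid -> sequence number of last heartbeat received

    def on_timeout(self, pid, k):
        """hb_k did not arrive within rto_k: suspect the process."""
        if self.last_hb_seq.get(pid, -1) < k:
            self.suspects.add(pid)

    def on_heartbeat(self, pid, k):
        """hb_k received; a fresher heartbeat clears the suspicion."""
        if k > self.last_hb_seq.get(pid, -1):
            self.last_hb_seq[pid] = k
            self.suspects.discard(pid)

mon = PullMonitor()
mon.on_timeout("p_j", 0)    # hb_0 late: p_j becomes suspected
assert "p_j" in mon.suspects
mon.on_heartbeat("p_j", 1)  # fresher heartbeat: suspicion removed
assert "p_j" not in mon.suspects
```

Note that a stale heartbeat (sequence number not greater than the last one seen) changes nothing, which is what makes the suspicion removal rule safe.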
4 Design and Evaluation of the AFD Approach
The proposed autonomic failure detector (AFD) consists of an autonomic manager (or controller) which observes the behavior of a basic failure detection service module (the plant, or managed element). Based on previously defined QoS detection requirements, the autonomic manager calculates the monitoring period and the detection timeout which a monitor process pi has to use to check the state of a monitored process pj (see Figure 1).
Fig. 1. General View of the Autonomic Failure Detector
The autonomic manager executes three basic tasks: (i) computing environment and detection service sensing; (ii) timeout regulation; and (iii) monitoring period regulation. These tasks are described in the following subsections.

4.1 Computing Environment and Detection Service Sensing
Computing Environment Sensing. In distributed systems, when a process pi receives a heartbeat from a process pj, it cannot know whether pj is still working, because the received heartbeat only carries past information about the state of pj. Thus, if pi receives a heartbeat at instant r_k, it knows that pj was working until r_k − rtt_k/2, where rtt_k = r_k − s_k is the round-trip-time delay of the monitoring messages and rtt_k/2 is an estimate of the transmission delay of hb. Hence, the greater the interval between the arrivals of heartbeat messages, the greater the uncertainty of pi about the state of pj. We can formalize this uncertainty as follows: at a known instant t, the time interval during which pi is unaware of the state of pj, which we name the uncertainty time interval (uti), can be computed by uti(t) = t − (r_u − rtt_u/2), where r_u and rtt_u represent the instant of the last heartbeat received and the last round-trip-time computed, respectively. If uti is measured when the "are you alive?" message aya_k is being sent, then pi computes uti at each interval k by:
Fig. 2. uti: (a) without message losses; (b) with message loss in interval k + 1
uti_k = s_k − (r_u − rtt_u/2). Figure 2 illustrates uti for (a) the case where messages are not lost and (b) the case where heartbeats are lost. In order to represent the interaction delay from a monitored process to a monitor, we define the variable delay based on the last uti estimated: delay_k = |uti_k − τ_{k−1}|, where τ_k = s_{k+1} − s_k represents the monitoring period. This delay variable will later be used to calculate the monitoring period related to a given system load and a specified QoS (see section 4.3). If a monitored process pj does not fail and messages are not lost, then |uti_k − τ_{k−1}| = rtt_k/2 (i.e., delay_k = rtt_k/2); otherwise, delay increases in proportion to τ times the number of heartbeats not received by pi since r_u. We denote the minimum delay, the maximum delay and the maximum delay variation observed during the failure detector execution as delay^L, delay^U and jitter^U, respectively. When the characteristics of the computing environment change, the observed delay^L, delay^U and jitter^U may change too. Thus, we apply a forgetting factor f in the computation of these variables, defined as f_k = max{0, (TD^U − delay_k^L)}/TD^U, with f_0 = 1. If TD^U ≈ delay^L then f → 0 and the autonomic manager forgets the previous values of these variables. On the other hand, if TD^U >> delay^L then f → 1 and the previous values have the greatest impact on computing the new values. The algorithm below shows the steps used to calculate the delay-related variables.
1. Compute delay_k = |uti_k − τ_{k−1}|;
2. If k = 0 then define delay_k^L = delay_k, delay_k^U = delay_k and jitter_k = 0;
3. If delay_k^L > delay_k then delay_k^L = delay_k, otherwise delay_k^L = f ∗ delay_{k−1}^L + (1 − f) ∗ delay_k;
4. If delay_k^U < delay_k then delay_k^U = delay_k, otherwise delay_k^U = f ∗ delay_{k−1}^U + (1 − f) ∗ delay_k;
5. Compute jitter_k = |delay_k − delay_k^L|; if jitter_k^U < jitter_k then jitter_k^U = jitter_k, otherwise jitter_k^U = f ∗ jitter_{k−1}^U + (1 − f) ∗ jitter_k.
In order to handle occasional delay variations, the autonomic manager uses a filtered version of delay with the same forgetting factor f. Let delay^F be the filtered version of delay; we compute delay_k^F = f ∗ delay_{k−1}^F + (1 − f) ∗ delay_k, with delay_0^F = delay_0. The variables delay, delay^L, delay^U and delay^F represent past information obtained from received heartbeat messages. In order to predict the current (expected) value of delay, denoted delay^E, a safety margin (the jitter) is added to the filtered version of delay: delay_k^E = delay_k^F + jitter_k^U. This delay^E is then used in the monitoring period regulation mechanism (see section 4.3).
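The delay-sensing steps above (extreme values with forgetting factor, filtered delay, and the expected delay delay^E) can be sketched as follows. This is a minimal illustration with our own names; in particular, we assume the forgetting factor is computed from the previous delay^L, a detail the paper leaves implicit:

```python
class DelaySensor:
    """Sketch of the delay-related variables of Section 4.1 (illustrative,
    not the authors' implementation). TD_U is the required maximum
    detection time TD^U."""

    def __init__(self, TD_U):
        self.TD_U = TD_U
        self.dL = self.dU = self.dF = None  # delay^L, delay^U, delay^F
        self.jU = 0.0                       # jitter^U

    def update(self, delay):
        """Feed one observed delay_k; returns the expected delay delay^E."""
        if self.dL is None:                 # k = 0 initialisation
            self.dL = self.dU = self.dF = delay
            return self.dF + self.jU
        # forgetting factor f = max(0, TD^U - delay^L) / TD^U (previous delay^L)
        f = max(0.0, self.TD_U - self.dL) / self.TD_U
        self.dL = delay if delay < self.dL else f * self.dL + (1 - f) * delay
        self.dU = delay if delay > self.dU else f * self.dU + (1 - f) * delay
        jitter = abs(delay - self.dL)
        self.jU = jitter if jitter > self.jU else f * self.jU + (1 - f) * jitter
        self.dF = f * self.dF + (1 - f) * delay   # filtered delay delay^F
        return self.dF + self.jU                  # delay^E = delay^F + jitter^U
```

For example, with TD^U = 50 ms, feeding delays of 2 ms and then 4 ms yields delay^E values of 2 ms and 4 ms respectively, the second one already padded by the newly observed jitter.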
Detection Service Sensing. We compute the detection time considering the worst case, i.e., when the monitored process pj fails immediately after having sent a heartbeat. Thus, the autonomic manager of a monitor process pi estimates that pj supposedly crashed at instant tcrash_k = r_{k−1} − rtt_{k−1}/2, and pi suspects pj when it does not receive the heartbeat hb_k at tsuspect_k = s_k + rto_k (where rto_k denotes the estimated timeout for receiving hb_k; see section 3). Consequently, TD can be computed by TD_k = tsuspect_k − tcrash_k. For each monitoring period, we compute the number of false suspicions (nf) and the suspicion duration (sd) as follows. When a suspicion occurs, nf_k = nf_{k−1} + 1 and sd_k = r_k − (s_u + rto_u), where s_u represents the sending time of the aya_u for which the corresponding hb_u was not received within the timeout interval (rto_u). Otherwise, if no suspicion occurs, nf_k = nf_{k−1} and sd_k = 0. Thus, we can compute the mean mistake duration and the mean mistake recurrence time by TM_k = (Σ_{l=0..k} sd_l)/nf_k and TMR_k = s_k/nf_k, respectively. We compute the percentage of mistakes (PoM) and the detection service availability (AV) by PoM_k = nf_k/ne and AV_k = (TMR_k − TM_k)/TMR_k, where ne = k + 1 is the total number of estimations carried out by the monitor process.
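The bookkeeping above can be sketched as follows (illustrative names; a suspicion's duration sd_k is assumed to be supplied by the caller when the late heartbeat finally arrives):

```python
class DetectionQoS:
    """Sketch of the detection-service metrics of Section 4.1 (not the
    authors' implementation)."""

    def __init__(self):
        self.nf = 0        # number of false suspicions nf_k
        self.sd_sum = 0.0  # accumulated suspicion durations sum(sd_l)
        self.ne = 0        # number of estimations ne

    def interval(self, s_k, suspicion_duration=None):
        """Called once per monitoring interval; s_k is the sending instant
        of aya_k. Returns (TM, TMR, PoM, AV), or None before any mistake."""
        self.ne += 1
        if suspicion_duration is not None:  # a false suspicion occurred
            self.nf += 1
            self.sd_sum += suspicion_duration
        if self.nf == 0:
            return None                     # metrics undefined so far
        TM = self.sd_sum / self.nf          # mean mistake duration
        TMR = s_k / self.nf                 # mean mistake recurrence time
        PoM = self.nf / self.ne             # percentage of mistakes
        AV = (TMR - TM) / TMR               # detection availability
        return TM, TMR, PoM, AV
```

For instance, one false suspicion of 0.5 ms in the second interval (s_k = 20 ms) gives TM = 0.5 ms, TMR = 20 ms, PoM = 0.5 and AV = 0.975.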
4.2 Timeout Regulation
Failure detection for asynchronous distributed systems [2,3,4,5,6] requires timeout estimators which have: (a) fast convergence, so that they quickly learn variations in the delay magnitude; (b) high accuracy, so that they can meet the application requirements in terms of detection time; and (c) an over-biased estimate of the delay magnitude, in order to prevent detection mistakes. These requirements are hard to achieve together because optimizing one criterion may have a negative impact on another. If the characteristics of the environment change often, then it is a good idea to have a timeout estimator with a high learning rate and high accuracy. However, if changes in the environment behavior are short and spurious, an estimator with a fast learning rate will probably make mistakes. Over-biased estimation is a good strategy to prevent mistakes, but overestimating the detection timeout can lead to slow detections, compromising application responsiveness. To address fast convergence, high accuracy and a minimal over-biased estimate, we consider timeout estimators based on end-to-end delay observation, as used in the traditional failure detection literature for asynchronous distributed systems. We then designed a novel strategy based on the detection availability (AV) to suggest a safety margin (α) so as to decrease failure detector mistakes and to achieve the desired detection availability, as follows. If the detection service is inaccurate (i.e., AV is low), then the safety margin α is increased to improve detection accuracy; otherwise, if AV is high, then α is decreased to improve detection speed. For an interval k, let AV^L and AV_k be the minimum and the observed detector availability, respectively, and let α_0 = 0; the detection timeout is regulated by the following algorithm.
1. Compute AV^L = (TMR^L − TM^U)/TMR^L and AV_k = (TMR_k − TM_k)/TMR_k;
2. Compute e_k = AV^L − AV_k and α_k = α_{k−1} + τ_{k−1} ∗ e_k; if α_k < 0 then α_k = 0;
3. Use a previously selected timeout estimator from the literature to suggest the timeout rto_k^C, and compute the detection timeout by rto_k = rto_k^C + min[α_k, TD^U − delay^L].
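A minimal sketch of this regulation is given below, paired with a Jacobson-style delay predictor as the base estimator rto^C (the evaluation in section 4.4 uses the Jacobson algorithm [19]). The function names, the gain values and the state layout are illustrative assumptions:

```python
def jacobson_rto(rtt, state, g=0.125, h=0.25, beta=4.0):
    """Jacobson-style delay predictor (smoothed RTT plus scaled variance),
    serving as the base estimator rto^C. `state` holds (srtt, rttvar) or
    None on the first sample."""
    srtt, rttvar = state if state else (rtt, rtt / 2)
    err = rtt - srtt
    srtt += g * err
    rttvar += h * (abs(err) - rttvar)
    return srtt + beta * rttvar, (srtt, rttvar)

def regulate_timeout(rto_c, alpha_prev, tau_prev, AV_k, AV_L, TD_U, delay_L):
    """Safety-margin regulation of Section 4.2 (sketch): alpha integrates
    the availability error AV^L - AV_k, is kept non-negative, and its
    contribution is capped so that TD stays below TD^U."""
    alpha = max(0.0, alpha_prev + tau_prev * (AV_L - AV_k))
    rto = rto_c + min(alpha, TD_U - delay_L)
    return rto, alpha
```

With rto^C = 5 ms, τ_{k−1} = 1 ms, AV_k = 0.9 and AV^L = 0.99, the margin grows to α_k = 0.09 ms and rto_k = 5.09 ms; when AV_k exceeds AV^L the margin shrinks (and clamps at zero).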
Any timeout estimator based on delay observation which meets the requirements discussed above can be used with our novel safety-margin strategy.

4.3 Monitoring Period Regulation
The goal of the monitoring period regulation is to minimize the detection time without compromising the accuracy of the detector. To attain this objective, we implement a feedback control loop embedded in the autonomic manager. This control loop carries out three activities: (i) characterization of the computing environment resource consumption; (ii) definition of the control problem; and (iii) design and tuning of the controller. Because we assume that the environment characteristics change and are unknown, we treat the computing environment as a black-box system and use delay variations to help us estimate resource consumption. Despite the unpredictable variations of delay, we use linear equations to model resource consumption and to estimate the relationship between delay and the monitoring period. These linear models are an approximation and do not fully characterize the environment behavior; nonetheless, they are a good tool for describing the control problem, defining the desired dynamic behavior of the distributed environment and designing the control law. To overcome the limitations of the linear approach, we designed an adaptation law which dynamically tunes the parameters of the model, enabling it to self-adjust to changes in the computing environment. Each of the activities of the period regulation control loop is described in greater detail below; these activities result in the algorithm presented at the end of this section. Characterization of the Computing Environment. As previously mentioned, we abstract the environment as a Black-Box System (BBS). Thus, a monitor process submits a service request (i.e., an "are you alive?" message) and receives a service response (i.e., a heartbeat message with the state of a monitored process) from the BBS.
To estimate the BBS execution capability, we take the minimum delay (delay^L) as an indication of the BBS service time. This service time is later used as a reference to estimate resource consumption in the system. Similarly, in this black-box modeling we take the expected delay delay^E as an indication of the BBS response time; such a response time varies with the number of service requests. The resource consumption (rc) can be estimated from delay variations as:

rc_k = 0 if delay^U = delay^L; otherwise rc_k = (delay_k^E − delay^L)/(delay^U − delay^L).    (1)
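Equation (1) translates directly into code; a small sketch with illustrative names:

```python
def resource_consumption(delay_E, delay_L, delay_U):
    """Equation (1): 0 when no delay spread has been observed yet,
    otherwise the relative position of delay^E in [delay^L, delay^U]."""
    if delay_U == delay_L:
        return 0.0
    return (delay_E - delay_L) / (delay_U - delay_L)
```

For example, with delay^L = 2 ms, delay^U = 6 ms and delay^E = 3 ms, the estimated consumption is 0.25.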
Resource consumption can also be modeled as a function of the monitoring period: a shorter period will likely lead to higher resource consumption. We use this fact later to design the plant model (i.e., the BBS behavior) by correlating both views of resource consumption: the one observed from delay variations and the one expected from the monitoring period. We use the following equation to model resource consumption as a function of the monitoring period: rc_k = (τ^U − τ_{k−1})/(τ^U − τ^L), where τ^U = TD^U − delay^L and τ^L = delay^L represent the maximum and minimum monitoring periods, respectively. Relating this equation to Equation 1, we can compute delay_k^E = [(τ^U − τ_{k−1})/(τ^U − τ^L)] ∗ (delay^U − delay^L) + delay^L. Taking the derivative of delay^E with respect to τ, we have υ = −Δdelay_k^E/Δτ_{k−1} = (delay^U − delay^L)/(τ^U − τ^L), where Δx_i = x_i − x_{i−1}. With respect to a monitor pi, if we just observe the behavior of the delay variation in the BBS with respect to the variation of τ, then it is possible to compute the relationship between the input (u = Δτ) and output (y = Δdelay^E) of the BBS using a linear ARX model, a commonly applied technique [17]:

y_{k+1} = y_k + υ ∗ u_k    (2)
Feedback Control Problem. The control problem is to regulate τ to obtain lower detection times using the available resources. We denote the maximum resource consumption with respect to pi as rc^U ∈ (0, 1]. We compute the difference between rc^U and the resource consumption rc_k (Equation 1) as e_k = rc^U − rc_k. When e > 0, rc is lower than expected, so τ is decreased to enable fast detection. When e < 0, rc is greater than expected, so τ is increased to consume fewer resources and to avoid failure detector mistakes. Design and Tuning of the Controller. Period regulation is carried out using a Proportional-Integral (PI) controller [17]. This controller produces a control action (the required variation of τ) from the difference (e_k) between rc^U and rc_k. A PI controller implements the following control law:

u_k = K_P ∗ e_k + K_I ∗ Δt ∗ Σ_{i=0..k−1} e_i    (3)

where Δt is the time interval between the last and the current activation of the controller, and K_P and K_I are the proportional and integral gains of the controller, respectively [17]. Tuning the controller entails finding values of the gains K_P and K_I which address the following performance requirements [17]: stability, accuracy, settling time (K_s) and maximum overshoot (M_p). As previously discussed, if the environment characteristics change then these performance requirements cannot be guaranteed with a fixed linear (PI) controller. Additionally, the design of such a linear controller requires the choice of a constant time interval, named the sampling period (h), at which the controller observes the resource consumption and actuates. The sampling period depends on the magnitudes of the environment delays, which are unpredictable, so the sampling period
cannot be constant either. Moreover, traditional design based on z-transform transfer functions (see [17]) is valid only if the sampling period is constant, so in principle we could not make use of the z-transform either. To address these limitations, we use z-transform transfer functions only for the initial setup of the PI controller, and then apply an adaptation law to the gains K_P and K_I. Our initial setup uses Equations 2 and 3 to obtain the z-transform transfer functions P(z) = υ/(z − 1) and C(z) = K_P + K_I ∗ Δt ∗ z/(z − 1), which represent the BBS behavior and the PI control law, respectively. We then use P(z) and C(z) to define the closed-loop transfer function F_r(z) = C(z)P(z)/(1 + C(z)P(z)). With these definitions, we assume that the closed loop reaches the desired output in a settling time K_s = delay^U. Moreover, under moderate load conditions (i.e., mean delay greater than standard deviation), we define the maximum output overshoot as 10% (i.e., M_p = 0.1). Thus, we use the pole placement technique (see [17] for a discussion) to estimate the complex poles (cp) of F_r(z) as cp = m ∗ exp(±jθ), where m = exp(−4/K_s) and θ = π ∗ log(m)/log(M_p) are the magnitude and angle of the poles cp, respectively. We then define K_P and K_I using the following adaptation law:
1. If delay_k^L = delay_k^U then φ = 0, else φ = 1/(delay_k^U − delay_k^L);
2. Compute ψ = 1/(τ_k^U − τ_k^L), K_P = (φ − m^2)/ψ and K_I = (m^2 − 2 ∗ m ∗ cos(θ) + 1)/ψ.

As previously discussed, delay^L and delay^U change over time, so the closed-loop poles cp change as well. The proposed adaptation law adjusts K_P and K_I so as to handle these variations and to improve the control loop performance. Period Regulation Algorithm. The algorithm below, used to regulate the monitoring period, follows from the previous discussion.
1. If k = 0 then ui_0 = 0;
2. Obtain e_k = rc^U − rc_k, adapt K_P and K_I, compute the integral control action ui_k = ui_{k−1} + e_k ∗ K_I ∗ Δt and the proportional control action up_k = K_P ∗ e_k;
3. Define τ_k^L = delay_k^L and τ_k^U = TD^U − delay_k^L, and obtain τ_k = τ_k^L + (ui_k + up_k); if τ_k > τ_k^U then τ_k = τ_k^U, else if τ_k < τ_k^L then τ_k = τ_k^L.
4.4 Performance Evaluation
Setup of the Simulations. The simulations were carried out in Matlab/Simulink/TrueTime [18]. The simulated environment has three computers, named c1, c2 and c3, connected by a switched Ethernet with a nominal transfer rate of 10 Mbps and a memory buffer of 8 Mbits. Messages have a fixed size of 1518 bits, and when a buffer overflow occurs messages are discarded. A process in c1 monitors failures of a process in c2. A process in c3 generates random bursts of messages in the network. The bursts are generated in such a way that the mean utilization of the bandwidth increases by 10% every 1000 ms, returning to zero after reaching 90%. The burst generation allows us to evaluate the detectors under different load conditions. The simulation is executed until
we have transferred approximately 10^4 monitoring messages (about 40000 ms). The experiments compare the performance of our autonomic detector (AFD) with an adaptive detector (AD), both using the Jacobson algorithm [19] for timeout prediction. In the performance evaluation we manually configured the AD with τ = 1 ms and τ = 5 ms, and configured the AFD as follows: rc^U = 0.5, TD^U = 50 ms, TM^U = 1 ms and TMR^L = 10000 ms. The failure detectors were evaluated by the TD, TM, TMR, AV and PoM metrics. Simulation Results. Figures 3 to 7 present the performance of the failure detectors in terms of the considered metrics under varying network loads. In these figures, the x-axis represents time in milliseconds and the y-axis the metric under consideration. In Figures 3, 4 and 6, TD, TM and TMR are given in milliseconds.
Fig. 3. Performance in terms of the Detection Time
In terms of the TD metric (Figure 3), the AD with τ = 1 ms presents the lowest TD under low network load, but its TD increases significantly under high network load (the highest values of the curve). The AD with τ = 5 ms presents a mean TD of 5.5 ms. The AFD varies TD with the network load, but its observed TD is always lower than 7 ms, with a mean of 5 ms.
Fig. 4. Performance in terms of the Mistake Duration
In terms of the TM metric (Figure 4), the AD with τ = 1 ms presents mistake durations around 10^−1 ms, with a mean of 0.70 ms. The AD with τ = 5 ms and the AFD have TM around 10^−2 ms, with mean mistake durations of 0.05 ms and 0.06 ms, respectively. However, the TM shown by the AFD is always lower than 0.07 ms and varies less than that of the AD.
Fig. 5. Performance in terms of the Percentage of Mistakes
In terms of the PoM metric (Figure 5), the AD presents a mean PoM of 12.0% and 9.0% for τ = 1 ms and τ = 5 ms, respectively. The AFD always presents a PoM lower than 0.6%, with a mean of 0.1%. The better accuracy of the AFD is due to its safety margin and to its use of longer periods under high network loads, which contributes to more stable delays.
Fig. 6. Performance in terms of the Mistake Recurrence Time
In terms of the TMR metric (Figure 6), the AD presents an initial peak and stabilizes around 6.8 ms and 55.3 ms for τ = 1 ms and τ = 5 ms, respectively. The AFD shows an initial oscillation in TMR, which then grows with time, reaching 1900 ms. This better performance of the AFD is also due to its safety margin and to longer periods under high network loads.
Fig. 7. Performance in terms of the Detection Availability
Lastly, in terms of the AV metric (Figure 7), the AD with τ = 1 ms presents a mean AV around 0.9 but shows very oscillatory behavior. The AD with τ = 5 ms has an AV around 0.999. The AFD has an AV around 0.9999 and varies less than the AD. Observe that AV is directly derived from TM and TMR, hence the better performance of the AFD, which also performed better for TM and TMR. From the performance data presented above we conclude that the AFD performs better in terms of detection time, mistake recurrence time, percentage of mistakes and availability, and performs similarly to the AD in terms of mistake duration. We carried out similar experiments comparing the AFD against another AD (using the Bertier approach [3]), and the AFD also performed better. These results are available in a technical report on the LaSiD website (http://www.lasid.ufba.br) and are omitted here due to space constraints.
5 Final Remarks
Traditional failure detection approaches for distributed systems do not support self-configuration of the failure detector from QoS metrics. However, when the computing environment characteristics are unknown and can change, self-configuration is a basic ability for guaranteeing the tradeoff between response time and availability. Self-configuration requires modeling the dynamic behavior of the distributed system, which is a great challenge when the computing environment can change. To address these challenges, this paper presented the design and implementation of a novel detector based on control theory which is able to dynamically configure the monitoring period and the detection timeout, following the observed changes in the computing environment and according to user-defined QoS constraints. We developed a series of experiments in a simulated environment to verify the performance of the autonomic failure detector (AFD) in terms of speed and accuracy. These evaluations allowed us to observe how fast and how accurate the detector was under different network load scenarios. Even without similar works for a direct comparison, the evaluations compared our approach with an adaptive failure detector (AD) available in the literature, manually configured with different monitoring
QoS Self-configuring Failure Detectors for Distributed Systems
139
periods. The simulations demonstrated that the AFD indeed could dynamically regulate the monitoring period to achieve the desired QoS and, in most cases, our approach performed better than the AD considered. Since the primary goal of our research is to provide a mechanism to dynamically adjust the monitoring period, AFD was designed to encapsulate any available AD not just the one used in the implementation and evaluation presented. This makes our solution general enough to take advantage of any other AD with better performance. Finally, the evaluations have considered only local networks which are usual in cluster computing environments. In future works, we are going to evaluate our approach in WAN scenarios and apply it to other mechanisms related to the management of autonomic applications under development at LaSiD [20].
References

1. Chandra, T.D., Toueg, S.: Unreliable failure detectors for reliable distributed systems. Journal of the ACM 43(2), 225–267 (1996)
2. Chen, W., Toueg, S., Aguilera, M.K.: On the quality of service of failure detectors. IEEE Trans. on Computers 51(2), 561–580 (2002)
3. Bertier, M., Marin, O., Sens, P.: Performance analysis of a hierarchical failure detector. In: Proc. of Int. Conf. on DSN, USA, pp. 635–644. IEEE Computer Society, Los Alamitos (June 2003)
4. Lima, F.R.L., Macêdo, R.J.A.: Adapting failure detectors to communication network load fluctuations using SNMP and artificial neural nets. In: Maziero, C.A., Gabriel Silva, J., Andrade, A.M.S., de Assis Silva, F.M. (eds.) LADC 2005. LNCS, vol. 3747, pp. 191–205. Springer, Heidelberg (2005)
5. Nunes, R.C., Jansch-Pôrto, I.: QoS of timeout-based self-tuned failure detectors: the effects of the communication delay predictor and the safety margin. In: International Conf. on DSN, pp. 753–761. IEEE Computer Society, Los Alamitos (July 2004)
6. Falai, L., Bondavalli, A.: Experimental evaluation of the QoS of failure detectors on wide area network. In: Proc. of Int. Conf. on DSN, pp. 624–633. IEEE CS, Los Alamitos (July 2005)
7. Xiong, N., Yang, Y., Chen, J., He, Y.: On the quality of service of failure detectors based on control theory. In: 20th Int. Conf. on Advanced Information Networking and Applications, April 2006, vol. 1 (2006)
8. Satzger, B., Pietzowski, A., Trumler, W., Ungerer, T.: A lazy monitoring approach for heartbeat-style failure detectors. In: 3rd Int. Conf. on Availability, Reliability and Security (ARES 2008), March 2008, pp. 404–409 (2008)
9. Mills, K., Rose, S., Quirolgico, S., Britton, M., Tan, C.: An autonomic failure-detection algorithm. In: WOSP 2004: Proc. of the 4th Int. Workshop on Software and Performance, pp. 79–83. ACM, New York (2004)
10. IBM: Autonomic computing: IBM's perspective on the state of information technology. Technical report, IBM Corporation, New York, USA (2001)
11. Ogata, K.: Discrete-Time Control Systems, 2nd edn. Prentice-Hall, Upper Saddle River (1995)
12. Cristian, F.: Understanding fault-tolerant distributed systems. Communications of the ACM 34(2), 56–78 (1991)
A. Santos de Sá and R.J. de Araújo Macêdo
13. Birman, K.: Replication and fault-tolerance in the ISIS system. ACM SIGOPS Operating Systems Review 19(5), 79–86 (1985)
14. Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. J. ACM 32(2), 374–382 (1985)
15. So, K.C.W., Sirer, E.G.: Latency and bandwidth-minimizing failure detectors. In: EuroSys 2007: Proc. of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems, pp. 89–99. ACM, New York (2007)
16. Felber, P.: The CORBA Object Group Service: A Service Approach to Object Groups in CORBA. PhD thesis, École Polytechnique Fédérale de Lausanne (1998)
17. Hellerstein, J.L., Diao, Y., Parekh, S., Tilbury, D.M.: Feedback Control of Computing Systems. Wiley Interscience, Canada (2004)
18. Henriksson, D., Cervin, A.: TrueTime 1.13 - Reference Manual. Tech. Report TFRT-7605-SE, Dept. of Automatic Control, Lund Institute of Technology (October 2003)
19. Jacobson, V.: Congestion avoidance and control. In: Proc. of the SIGCOMM 1988 Symp., Stanford, CA, vol. 18(4), pp. 314–329 (1988); ACM CC Review
20. Andrade, S.S., Macêdo, R.J.A.: A non-intrusive component-based approach for deploying unanticipated self-management behaviour. In: Proc. of IEEE ICSE 2009 Workshop on Software Engineering for Adaptive and Self-Managing Systems (May 2009)
Distributed Fault Tolerant Controllers

Leonardo Mostarda, Rudi Ball, and Naranker Dulay
Imperial College London, UK SW7 2AZ
{lmostard,rkb,nd}@imperial.ac.uk
Abstract. Distributed applications are often built from sets of distributed components that must be co-ordinated in order to achieve some global behaviour. The common approach is to use a centralised controller for co-ordination, or occasionally a set of distributed entities. Centralised co-ordination is simpler but introduces a single point of failure and poses problems of scalability. Distributed co-ordination offers greater scalability, reliability and applicability but is harder to reason about and requires more complex algorithms for synchronisation and consensus among components. In this paper we present a system called GOANNA that, from a finite state machine (FSM) specification of the global behaviour of interacting components, can automatically generate a correct, scalable and fault-tolerant distributed implementation.
1 Introduction
Programmers often face the problem of correctly co-ordinating distributed components in order to achieve a global behaviour. Such systems include sense-and-react systems, military reconnaissance and rescue missions, and autonomous control systems as found in aviation and safety-critical systems. The common approach is to build a centralised control system that enforces the global behaviour of the components. The advantage of centralised co-ordination is that the implementation is simpler [1,2], as there is no need to implement synchronisation and consensus among components; furthermore, many tools are available for the definition and implementation of centralised controllers [3]. Existing distributed solutions are typically application-specific and require that the programmer understands and implements (often subtle) algorithms for synchronisation and consensus [4,5]. In this paper we present a novel approach to generate a distributed and fault-tolerant implementation from a single finite state machine (FSM) definition of a global behaviour. We model the system as a set of components providing and requiring services. Co-ordination (global behaviour) is defined by a global FSM that specifies the interactions among sets of components. Sets provide support to group available components at runtime and allow the selection of an alternative instance of a component in case of failure. A global state machine is automatically translated into a collection of local ones, one for each set. A FSM Manager at
This research was supported by UK EPSRC research grants EP/D076633/1 (UBIVAL) and EP/E025188/1 (AEDUS2).
F. Eliassen and R. Kapitza (Eds.): DAIS 2010, LNCS 6115, pp. 141–154, 2010. © IFIP International Federation for Information Processing 2010
each host is responsible for handling the events and invocations for its local state machines and for ensuring correct global behaviour. A Leader is responsible for the management and synchronisation of FSM Managers. This is achieved through an extension of a Paxos-based consensus protocol that implements correct, scalable and fault-tolerant execution of the global FSM. In particular, scalability is obtained by using different optimisations derived from the FSM structure. Various approaches could benefit from automatically generated distributed implementations derived from a centralised specification. For instance, approaches to the automatic synthesis of component-based applications such as [6,7] are commonly used to generate a centralised controller whose global state machine, obtained through composition, can have millions of states. Such approaches would benefit from our distribution approach since it ensures correctness and provides scalability. We have implemented our approach in a system called GOANNA that takes state machines as input and generates distributed implementations in Java, C or nesC as output. The system is being used to develop distributed applications for sensor networks, unmanned vehicles and home networks [8,9], and could be used as a backend for tools that produce centralised controllers using finite state machines [7]. The main contributions of this paper are: (1) the GOANNA platform, which supports the co-ordination of components from a state machine specification and automatically generates a distributed implementation utilising a Paxos-based consensus protocol; (2) we show that the system guarantees the global behaviour in the presence of node and communication failures or new nodes discovered at runtime; (3) we show that our optimisations significantly increase scalability with respect to the number of components, and we present the time overhead of the runtime system.
2 Overview
In this section we overview how we specify and distribute the co-ordination for a small fire alarm sensor system [8]. The system is composed of temperature and smoke sensors and a sprinkler actuator. The basic requirement is that the sprinkler should operate only when the temperature and smoke readings exceed a threshold.
2.1 System Model and State Machines
We assume the system is composed of a set of components that provide and require services. Components may already be bound together. In our fire alarm system we have Sprinkler components providing the services waterOn() and waterOff() to enable and disable water flow, and Temperature and Smoke components requiring the services tempEvent(int val) and smokeEvent(int val), where val denotes the value of the temperature and smoke reading, respectively. Our global FSM specifies the sequences of events (resulting from component interactions) that are permitted in the running system, and can proactively invoke services. More specifically, a global state machine is defined by a list of
event-state-condition-action rules defined in terms of participating components (grouped into sets as described below). In Figure 1 we show the state machine for the fire alarm application. Each rule states that when the system is in a given state, the related event is observed, and the condition is true, then the action is applied. For example, the rule of line 10 states that when the state is 1, the event smokeEvent is observed on a smoke sensor, and the smoke value is greater than 20, then the water must be enabled and the state changed to 2.
 1  global fsm fireAlarm(set Smoke smokeSet,
 2                       set Temperature temperatureSet,
 3                       set Sprinkler sprinklerSet) {
 4    tempEvent on temperatureSet from *
 5      0-1: event.val>50 -> {}
 6      3-4: !(event.val>50) -> { signal to sprinklerSet waterOff(); }
 7
 8    smokeEvent on smokeSet from *
 9      1-0: !(event.val>20) -> {}
10      1-2: event.val>20 -> { signal to sprinklerSet waterOn(); }
11
12    waterOn on sprinklerSet from *
13      2-3: {} -> {}
14
15    waterOff on sprinklerSet
16      4-0: {} -> {}
17
18    on timeout(10000)
19      2-2: {} -> { signal to sprinklerSet waterOn(); }
20  }

Fig. 1. The GOANNA global FSM

Fig. 2. The graphical form of the global FSM (states 0-4 connected by the tempEvent, smokeEvent, waterOn, waterOff and timeout(10000) transitions; diagram omitted)
Events. Component interactions currently map to four GOANNA events: two for client endpoints (outgoing call and returned reply) and two for server endpoints (incoming call and outgoing reply). In the global finite state machine of Figure 1 we show events expressed using our syntax. For example, the event "tempEvent on temperatureSet from *" corresponds to a tempEvent returned reply on a Temperature component inside the temperatureSet. In this case we do not specify the component sending the reply (a hardware sensor). Timeout events are also supported and are generated by GOANNA when no rule has been applied within the specified time t. For example, timeout(10000) will raise a timeout after 10 seconds if no other rule has been applied.

State-condition-action rules. For each event the global FSM can define a list of state-condition-action rules. A rule is of the form q_s - q_d : condition → action, where q_s and q_d are states. When the event is observed, the state of the global FSM is q_s, and the condition is true, then the action can be applied and the state changed to q_d. When an event is observed but no rule can be applied (the condition does not hold or there are no relevant transitions) then a reaction policy can be applied, such as discarding the event.
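The rule semantics above can be made concrete with a small interpreter. The following sketch is illustrative only (class and function names are assumed, not GOANNA's implementation); it encodes some of the fire-alarm rules of Figure 1, with a signal function standing in for the service invocations:

```python
# Illustrative model of event-state-condition-action rules (not GOANNA code).
# A rule fires when the event matches, the FSM is in the rule's source state
# and the condition holds; otherwise the reaction policy (here: discard) applies.

class Rule:
    def __init__(self, event, qs, qd, condition, action):
        self.event, self.qs, self.qd = event, qs, qd
        self.condition, self.action = condition, action

class GlobalFSM:
    def __init__(self, q0, rules):
        self.state = q0
        self.rules = rules

    def handle(self, event, val):
        for r in self.rules:
            if r.event == event and r.qs == self.state and r.condition(val):
                r.action(val)          # perform the rule's action
                self.state = r.qd      # move to the destination state
                return True
        return False                   # no rule applicable: discard the event

log = []
signal = lambda call: log.append(call)  # stand-in for 'signal to set call()'
fsm = GlobalFSM(0, [
    Rule("tempEvent", 0, 1, lambda v: v > 50, lambda v: None),
    Rule("tempEvent", 3, 4, lambda v: not v > 50, lambda v: signal("waterOff")),
    Rule("smokeEvent", 1, 0, lambda v: not v > 20, lambda v: None),
    Rule("smokeEvent", 1, 2, lambda v: v > 20, lambda v: signal("waterOn")),
    Rule("waterOn", 2, 3, lambda v: True, lambda v: None),
])
fsm.handle("tempEvent", 60)   # state 0 -> 1
fsm.handle("smokeEvent", 30)  # state 1 -> 2, signals waterOn
```

With these two events the sketch reaches state 2 with waterOn signalled, matching the path described for line 10 of Figure 1.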
configuration fireAlarmConfiguration(floor: int)
    set t:  Temperature where place == floor
    set sm: Smoke where place == floor
    set sp: Sprinkler where place == floor
    instance is: fireAlarm(t, sm, sp);
Fig. 3. Fire alarm configuration
2.2 System Configuration
GOANNA system configurations specify both component sets and global FSM instances. Sets provide support to classify components as they are discovered and allow the selection of an alternative instance of a component in case of failure. Sets are formed by component type and by a where predicate that can use attributes such as host name, position and node capabilities to group components as they are discovered. In Figure 3, the three sets t, sm and sp group all components of types Temperature, Smoke and Sprinkler, respectively, running on floor floor of the building. Components join and leave a set at runtime. When an action from the global FSM must be performed, an instance from the appropriate set is selected. This removes the need to manage the availability of components in the state machine specification and allows the selection of a new component in case of failure. Sets are used to implement the following asynchronous best-effort primitives: (i) signal to set call and (ii) signal to c in set call. The former invokes the method call on all components belonging to set, while the latter calls the same service on exactly one component c. Global FSM definitions can be multiply instantiated. For example, for our fireAlarm application we could instantiate a global FSM for each floor in a building.
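The set mechanism can be sketched as follows. This is a hedged illustration (all names are assumed): a set filters discovered components by type and where predicate, and failover simply selects another live member:

```python
# Illustrative sketch of GOANNA-style component sets (names assumed).
class ComponentSet:
    def __init__(self, ctype, where):
        self.ctype, self.where, self.members = ctype, where, []

    def join(self, comp):              # called as components are discovered
        if comp["type"] == self.ctype and self.where(comp):
            self.members.append(comp)

    def leave(self, comp):             # components may leave at runtime
        self.members.remove(comp)

    def select(self):                  # pick one available instance (failover)
        for c in self.members:
            if c["alive"]:
                return c
        return None

    def signal_all(self, call):        # stand-in for 'signal to set call':
        return [c["id"] for c in self.members if c["alive"]]  # ids reached

sprinklers = ComponentSet("Sprinkler", where=lambda c: c["place"] == 2)
sprinklers.join({"id": "sp1", "type": "Sprinkler", "place": 2, "alive": True})
sprinklers.join({"id": "sp2", "type": "Sprinkler", "place": 2, "alive": True})
sprinklers.join({"id": "sp3", "type": "Sprinkler", "place": 5, "alive": True})  # wrong floor

sprinklers.members[0]["alive"] = False   # sp1 fails
backup = sprinklers.select()             # failover selects sp2
```

The where predicate mirrors the place == floor clauses of Figure 3; sp3 is filtered out at discovery because it is on a different floor.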
2.3 Distribution
GOANNA automatically decomposes a global FSM into a collection of local ones (see Figure 4), one for each set, plus a special FSM that contains all timeout events defined in the global FSM. In the following we refer to this timeout state machine as the skeleton. The FSM Manager local to each host uses the local FSMs to validate the component interactions related to the host it resides on, while the leader uses the skeleton to execute timeouts. The leader also holds the correct state of the global FSM. When a FSM manager must validate an event
Fig. 4. Centralised Control System - Distribution Implementation
(w.r.t. the behaviour defined by the global FSM) it uses its local FSMs and its local state of the global FSM. This state (even if out of date) can be sufficient to reject the event locally, reducing the number of synchronisations with the leader. If the FSM manager can accept the event it tries to propose a new state to the leader. The leader denies the proposal request when the FSM manager has an outdated state (in which case the leader updates the FSM manager with the correct state); otherwise the leader grants the proposal request. In this case the FSM manager performs the actions from its local FSM and synchronises the leader with the new state.
2.4 State Machine Consensus Protocol
Our consensus protocol extends Multi-Paxos with Steady State [10] with additional information in order to obtain a correct distributed state machine implementation. More specifically, it adds all the information needed to execute actions (from the FSMs) and to correctly parse system traces. This is achieved using timeouts to manage the one-to-one communications between the FSM managers (executing the action) and the leader checking it. Multi-Paxos is normally described using client, acceptor, learner, and leader¹ roles. In our implementation the client, acceptor and learner roles are included in our FSM Manager. The basic idea is that a FSM manager locally verifies event acceptance before proposing its new state. After a new-state proposal request the leader can either decline the request (e.g., the FSM manager's state can be out of date) or accept it, waiting for the action to complete and the new state to be updated. Although these steps are the basis for correct distribution, they are not efficient in terms of memory and traffic overhead. State machines (automatically generated by high-level tools [7]) can be composed of millions of states, so deploying them on each host can be inefficient. Moreover, FSM managers could continuously propose their new local states, overloading the network. Our global state machine distribution process therefore produces a partition of the FSM transitions, which are loaded only when needed, while our protocol takes advantage of the state machine structure in order to avoid useless protocol instances. The idea is that an outdated local state can be enough to reject an event (see Section 3.6 for details).
2.5 Fault Tolerance Model
In GOANNA we make the following assumptions: (i) software components fail independently from their FSM Manager; (ii) FSM Managers can fail and recover; (iii) the leader fails and stops (but a new leader from a ranked set of nodes will be chosen); (iv) the ranked leaders control each other using reliable communication; (v) a set of backups is used as stable storage for the last state accepted by the leader. These assumptions are used to guarantee that a transition of a global FSM is performed if a FSM Manager can select an available component, the leader is running, and the majority of backups are running.
¹ The leader is also known as the proposer.
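The role of the backups as stable storage hinges on majority acknowledgement: an update survives as long as any majority of backups holds it. A minimal sketch of this quorum rule (the backup transport is simulated by plain callables; names are assumed):

```python
# Sketch of the stable-storage assumption: a state update is durable once a
# majority of backups acknowledge it. Each backup is simulated as a callable
# that returns True (stored and acknowledged) or False (crashed, no ack).
def replicate(state, backups):
    acks = sum(1 for b in backups if b(state))   # send accept(state) to each
    return acks > len(backups) // 2              # majority => update committed

ok_backup = lambda s: True      # live backup: stores the state and acks
dead_backup = lambda s: False   # crashed backup: no ack

replicate("q2", [ok_backup, ok_backup, dead_backup])    # 2 of 3 ack: committed
replicate("q2", [ok_backup, dead_backup, dead_backup])  # 1 of 3 ack: not committed
```

With three backups, as in the evaluation setup of Section 4, one backup failure is tolerated.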
3 Distributed State Machine Co-ordination
In this section we describe in detail how GOANNA generates a distributed state machine implementation from a global FSM specification.
3.1 The System Model
We first introduce some notation used to describe the system model. The set E denotes the set of all possible component events, and e_1, ..., e_n are elements of E. The set E^c denotes the set of events locally observed on a component c, and e^c_1, ..., e^c_n are elements of E^c. We use T_s to denote all possible traces (i.e., sequences of events) inside the system and T_c to denote all traces local to a component c. Traces are subject to the happened-before relation (→) [11], i.e., a message can be received only after it has been sent. A system trace in T_s can be obtained through a linearisation [12]. The basic idea is that all component-local traces can be merged by using the following two rules: (i) independent events from different traces can occur in any order in the merged trace; (ii) events within the same trace must retain their order.
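The two merge rules can be illustrated with a small enumerator of linearisations. This is an illustration only (it enumerates all order-preserving interleavings and does not model the cross-trace happened-before constraint, under which some interleavings would be excluded):

```python
# Sketch of the two merge rules: events of different component-local traces
# may interleave in any order, while each trace keeps its internal order.
def linearisations(traces):
    if all(not t for t in traces):
        yield []
        return
    for i, t in enumerate(traces):
        if t:  # take the next event of trace i; the remaining events keep order
            rest = traces[:i] + [t[1:]] + traces[i + 1:]
            for tail in linearisations(rest):
                yield [t[0]] + tail

merged = list(linearisations([["send_m", "recv_ack"], ["recv_m", "send_ack"]]))
# every merged trace keeps send_m before recv_ack and recv_m before send_ack
```

Two traces of two events each yield 6 interleavings; rule (ii) guarantees the per-trace order in all of them.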
3.2 State Machine Definitions
A state machine is used to define the correct traces in our distributed system. In the following we provide the formal FSM definition.

Definition 1. A state machine is a 4-tuple A = (Q, q_0, I, rules) where: (i) Q is a finite set of states; (ii) q_0 ∈ Q is the initial state; (iii) I is a finite set of events s.t. I ⊆ E; and (iv) rules is a list of 5-tuples (e, q_s, q_d, condition, action) where e ∈ E and q_s, q_d ∈ Q.

Definition 2. Let A = (Q, q_0, I, rules) be a state machine and e ∈ I be an event. Let q be the current state of A. The event e can be accepted by a rule (e, q_s, q_d, condition, action) in rules if q = q_s and the condition is satisfied.

Definition 3. Let A = (Q, q_0, I, rules) be a state machine and t = e_1 ... e_i ... a trace in T_s. Let q_0 be the initial state of A and e_1 the first symbol to read. A accepts the sequence t if for each current state q_{i-1} and next symbol e_i, A can accept e_i by a rule (e_i, q_{i-1}, q_i, condition, action). When the rule is applied the action is performed, q_i becomes the new state of A, and e_{i+1} is the next symbol to read.

The language T_A recognised by a state machine A is composed of all traces accepted by it. Events outside the FSM alphabet are ignored (i.e., they are not subject to FSM validation). When different state machines are defined, an event must be accepted by all of them. We emphasise that T_A is a subset of T_s (all possible system traces). More specifically, a global FSM defines all permitted traces inside the system.
3.3 Local State Machine Generation
In order to distribute the global state machine, we first need to decompose it into a set of local ones, one for each set, and a skeleton. We use A = (Q, q_0, I, rules) to denote a global state machine, s_c to denote a set s defined over the component type c, and A_{s_c} = (Q_{s_c}, q_{s_0}, I_{s_c}, rules_{s_c}) to denote the local state machine assigned to the set s_c. In order to generate all local state machines we consider all sets defined in the global FSM. For each set s_c we generate the local state machine A_{s_c} by examining the global state machine A for rules of the form R = (e^c, q_s, q_d, condition, action). Every time one of these rules is found, the event e^c is added to I_{s_c}, the states q_s and q_d are added to Q_{s_c}, and the rule R is added to rules_{s_c}. In other words, the state machine A_{s_c} contains all interactions that take place locally on a component of type c belonging to the set s_c. The skeleton A_k contains the list of all timeout rules.
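The decomposition step above can be sketched as a single pass over the global rules. This is an illustrative simplification (rules are reduced to (event, set, q_s, q_d) tuples and names are assumed), applied to the fire-alarm transitions of Figure 2:

```python
# Sketch of the decomposition: for each set, keep exactly the global rules
# whose event occurs on that set; timeout rules go to the skeleton.
def decompose(global_rules):
    local, skeleton = {}, []
    for rule in global_rules:              # rule = (event, set_name, qs, qd)
        event, set_name, qs, qd = rule
        if event == "timeout":
            skeleton.append(rule)          # all timeouts form the skeleton
        else:
            local.setdefault(set_name, []).append(rule)
    return local, skeleton

rules = [("tempEvent", "temperatureSet", 0, 1),
         ("tempEvent", "temperatureSet", 3, 4),
         ("smokeEvent", "smokeSet", 1, 0),
         ("smokeEvent", "smokeSet", 1, 2),
         ("waterOn", "sprinklerSet", 2, 3),
         ("waterOff", "sprinklerSet", 4, 0),
         ("timeout", None, 2, 2)]
local, skeleton = decompose(rules)
# each global transition ends up in exactly one local FSM or in the skeleton,
# i.e. the decomposition is a partition of the global transitions
```

Here the sprinklerSet-local FSM receives the waterOn and waterOff rules, matching Figure 5(c), and the single timeout rule forms the skeleton of Figure 5(d).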
[Figure 5 panels: (a) smokeSet-local fsm; (b) tempEvent local fsm; (c) sprinklerSet-local fsm; (d) skeleton - diagrams omitted]
Fig. 5. Generated local state machines and skeleton
In Figure 5 we show all local state machines generated from the global state machine of Figure 2. We emphasise that each transition has been projected locally to the set that its event relates to. Effectively, our distribution algorithm defines a partition of the global state machine transitions.
3.4 Successful Protocol Execution
In Figure 6 we show the global flow of a successful protocol execution. We denote with Ai a component instance of the type A. The protocol starts when an instrumentation point related to a component instance c detects an incoming/outgoing message. This generates an event e and invokes the procedure validate(c,e) on its local FSM manager. The manager finds all sets s_c the component c belongs to, loads the related local FSM A_{s_c}, and looks for a state q_i in which the event e can be
[Sequence diagram residue omitted: instrumentation point → FSM manager: validate(c,e); FSM manager → leader: propose(result); leader → FSM manager: response (key, accepted); FSM manager → leader: actionExecuted(key, newStates); leader → backups: accept(newStates); backups → leader: accepted(newStates)]
Fig. 6. Successful protocol execution
accepted (see Definition 2 for the definition of acceptance). If the event can be accepted, the FSM manager starts the protocol by sending a propose(result) request to the leader containing the fsm instance name Ai and the new proposed state qi. The leader receives the request and compares the received state qi with its local state, say qi′. Moreover, it checks whether the fsm Ai has been locked by another FSM manager. Suppose that the states are the same (qi == qi′) and no fsm instance has been locked. Then the leader generates a new key and responds with a response data structure to the FSM manager. This structure contains the key (denoting the protocol instance) and an outcome (set to accepted). With this answer the leader promises the FSM manager the lock on the required fsm instance Ai. The FSM manager receives the response, performs the local actions (from the rules of the local FSM), and sends back to the leader an actionExecuted(key, newStates) message, where newStates contains the new state after the execution of the rule. The leader receives the request and checks the existence of the key. If the key exists, it deletes the key, unlocks the fsm instance Ai, and updates its local state with the received one. The process of updating the state uses a Multi-Paxos protocol with Steady State: the new state is sent to a set of backups through an accept request, and when the majority of them acknowledge the update (through an accepted request) the protocol can correctly terminate. When multiple state machine instances are defined, the FSM manager must check the event acceptance for all of them. As in the execution above, if the event is accepted the FSM manager starts the protocol but communicates all the states, locks all state machine instances, and applies all actions (when it receives the grant from the leader).
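The leader-side bookkeeping of this exchange can be sketched as follows. This is a hedged illustration (class, method and field names are assumed, not GOANNA's API), and the backup replication step is elided to a comment:

```python
# Hedged sketch of the leader's bookkeeping for one protocol instance.
import itertools

class Leader:
    def __init__(self, states):
        self.states = dict(states)       # fsm instance name -> current state
        self.locks = {}                  # fsm instance name -> granted key
        self.keys = itertools.count(1)   # fresh key per protocol instance

    def propose(self, fsm, proposed_state):
        if fsm in self.locks:
            return ("locked", None)      # another manager holds the lock
        if self.states[fsm] != proposed_state:
            # manager out of date: reply with the correct (most updated) state
            return ("out-of-sync", self.states[fsm])
        key = next(self.keys)
        self.locks[fsm] = key            # promise the lock to this manager
        return ("accepted", key)

    def action_executed(self, fsm, key, new_state):
        if self.locks.get(fsm) != key:
            return False                 # late or unknown key: no update
        del self.locks[fsm]              # delete key, unlock the fsm instance
        self.states[fsm] = new_state     # then replicated to the backups
        return True

leader = Leader({"fireAlarm#1": 1})
outcome, key = leader.propose("fireAlarm#1", 1)       # lock granted
locked = leader.propose("fireAlarm#1", 1)             # second manager: locked
done = leader.action_executed("fireAlarm#1", key, 2)  # commit new state 2
stale = leader.propose("fireAlarm#1", 1)              # old state: out-of-sync
```

The two error outcomes correspond to the exceptions described in the next section, and the key check mirrors the action-execution timeout behaviour: an actionExecuted carrying a deleted key leaves the state untouched.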
3.5 Protocol Exceptions
A protocol instance can raise manager out-of-sync and fsm instance locked exceptions. A manager out-of-sync exception is raised when a state sent by the FSM manager differs from the leader's. This is a consequence of a FSM manager whose proposed states are not synchronised with the global execution, and it is detected and notified by the leader. In particular, after the leader receives the propose(result) request it replies with an out-of-sync error containing its own state (i.e., the most updated one). This is used by the FSM manager to update its local state. A locked exception is generated when a FSM manager proposes a state related to an instance Ai that has been locked by another FSM manager. In this case the leader sends back a response data structure with the locked error. Failures of FSM managers and their communication links are handled in our protocol using timeouts. These are needed to manage the asynchronous one-to-one communications between the FSM managers (executing the action) and the leader checking it. In the following we describe those failures and how they are handled by the leader and by FSM managers.
A leader can see a FSM manager or link failure during three possible steps of the protocol execution: (i) when it is responding to a propose request (propose response failure); (ii) while waiting for an actionExecuted message (action execution timeout); (iii) when responding to an actionExecuted message (action execution response failure). These failures can be the result of a FSM manager fault, a communication failure, or a slow (overloaded) FSM manager. A propose response failure occurs when the leader fails to communicate the outcome of a proposal of states (i.e., a response data structure) to a FSM manager. In this case a timeout is raised on the leader side, which deletes any key or lock granted. An action execution timeout occurs when a FSM manager receives the permission to execute its local actions but does not respond with an actionExecuted message. In this case the timeout is triggered on the leader side. This causes the key to be deleted (i.e., the protocol instance to be ended) and all FSMs to be unlocked. It is worth mentioning that even if the FSM manager sends an actionExecuted invocation after the timeout expires, this will be detected (the key no longer exists) and the FSM states will not be updated. Therefore, in the case of non-recoverable actions the global execution can be inconsistent. The action execution timeout provides resilience to component failures. When a component fails to execute its action the leader does not update the FSMs (that is, the global behaviour does not progress); it times out and waits for a new request. In this way a new component instance (correctly synchronised) can still perform another action. An action execution response failure occurs when the leader correctly receives an actionExecuted message from a FSM manager but fails to acknowledge the reception. In this case the leader ends the protocol instance and waits for the next request.
A FSM manager can see a leader or link failure during four possible steps of the protocol execution: (i) when invoking a propose request (propose invocation failure); (ii) while waiting after the propose request (propose response failure); (iii) when invoking actionExecuted (actionExecuted invocation failure); (iv) while waiting for the actionExecuted response (actionExecuted response failure). These failures can be the result of a leader fault, a communication failure, or slow leader execution. In all cases the FSM Manager ends the protocol execution and returns an error to the instrumentation point. In our protocol we have a set of ranked leaders. While the highest-ranked leader is servicing FSM Managers, the lower-ranked leaders monitor it for failure. More specifically, when the highest-ranked leader is no longer detected, the next leader in the rank is elected; it recovers all correct global states from the backups. An error in the protocol execution is always returned to the instrumentation point, which can be programmed to implement different reactions, such as retrying the parsing or discarding the event. One should be aware that there are cases in which the protocol may not make any progress, for instance when the same FSM manager is always granted permission and always fails. In order to avoid this kind of livelock the leader always chooses a random FSM manager when granting permission.
Our distributed FSM implementation has been proved to be correct by showing that a linearisation of the traces produced by the FSM Managers (see Section 3.1) is always accepted by the global FSM (see the extended technical report of this paper for details [13]).
3.6 Protocol Optimisations
While our protocol solves the general problem of consensus among FSM managers, the state machine structure allows for several optimisations.

In drop duplicate requests the FSM manager buffers each result data structure that has been sent with a propose request. Any further propose that contains the same state machine instance Ai with the same state qi is locally buffered and held until the first request has returned its result. If the result contains an error related to Ai, the same error is returned to all instrumentation points; otherwise, if the request has been accepted, the FSM manager waits for the action to complete and releases one of the requests.

Grouping allows different operations to be sent in the same message, reducing the number of messages sent. For instance, all signal requests related to the same action execution are grouped together and sent in a single message.

The drop unreachable requests optimisation avoids sending propose requests that are certain to be dropped. It is based on the function reachable_A : Q × Q → Bool, which is derived from the structure of a global state machine A. In particular, reachable_A(q_s, q_d) is true when the state q_d is reachable from the state q_s and false otherwise. The FSM manager keeps track, for each instance Ai, of the last updated state q_s. Before proposing a new state q_d the FSM manager verifies reachable_A(q_s, q_d). When reachable_A(q_s, q_d) is false the event is locally rejected without interacting with the leader: the proposed state q_d cannot be reached from q_s.
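The reachable_A function can be precomputed once from the FSM's transition relation. The sketch below uses a Floyd-Warshall-style transitive closure over a hypothetical chain-shaped FSM (the fire alarm FSM of Figure 2 is cyclic, so every state is reachable from every other and the optimisation would never reject there); the function names are assumed:

```python
# Sketch of 'drop unreachable requests': precompute reachable(qs, qd) from
# the transition structure, then reject locally any proposal whose target
# state cannot follow the last known state.
def reachable_table(states, transitions):
    reach = {q: {q} for q in states}      # every state reaches itself
    for qs, qd in transitions:
        reach[qs].add(qd)                 # direct transitions
    for k in states:                      # Floyd-Warshall transitive closure
        for i in states:
            if k in reach[i]:
                reach[i] |= reach[k]
    return reach

# An assumed chain-shaped FSM 0 -> 1 -> 2 -> 3 -> 4 for illustration.
states = [0, 1, 2, 3, 4]
reach = reachable_table(states, [(0, 1), (1, 2), (2, 3), (3, 4)])

def should_propose(last_state, proposed):
    # False => reject the event locally, without contacting the leader
    return proposed in reach[last_state]
```

For example, should_propose(2, 1) is false here, so the event would be rejected locally even if the manager's recorded state 2 is out of date: no future global state could make the proposal valid.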
4 Evaluation and Results
A more extensive evaluation of results can be found in [13]. GOANNA for Java 1.5 was evaluated on a 100 Mbit network using a cluster of 50 Intel Pentium machines, each with at least 2 GB of RAM and running the Linux operating system. As many as 2600 Components (sensors) were executed. The experiments sought to (a) validate the GOANNA implementation, (b) measure the outcome of induced failures (killing and rebooting hosts) and (c) highlight the performance optimisations resulting from using the FSM structure. The Average Event Time (AET) represents the time taken for validate(c,e) to complete (Figure 6), i.e. the time taken for an FSM Manager to validate a component interaction event. Throughput was measured as the average number of requests that a Leader could handle per second. Each experiment created FSM Managers and allocated a set of Components (sensors) to each. The scenario considered a GOANNA hierarchy made up of a single Leader, multiple FSM Managers and multiple Components.
Execution Overhead. Execution overhead was measured using AET to maintain consistency between experiments. The evaluation was performed on several configurations, with between 10 and 125 components. Each sensor Component was executed in a separate thread and sent a reading every 400 ms. In configuration A, all sensors ran on a single host; in configuration B, two hosts each ran half of the sensor instances; in configuration C, each of three hosts ran a third of the sensors; and in configuration D, each of ten hosts ran a tenth of the sensors (summary in Figure 7).
Fig. 7. Event Validation Time
Fig. 8. Throughput of validation requests
In configuration A, the average time for an FSM Manager to validate a component interaction was 834 ms; in configuration B it was 404 ms; in configuration C, 280 ms; and in configuration D, 76 ms. This shows that the protocol scales approximately linearly as components are distributed across more hosts. Figure 8 shows the throughput of the Leader and FSM Managers, that is, the number of validate requests handled per second, and hence the scalability of the approach.
Average Event Time Performance Measurement. To measure average event time performance we considered a standard system containing a Leader, three Backups, m FSM Managers and 52 Components per FSM Manager (50 temperature sensors, 1 smoke sensor and 1 sprinkler). Increasing the number of FSM Managers multiplied the number of Components providing sensor data to the Leader; with the hierarchical communication network we saw the distribution of load commonly seen in hierarchical network architectures (Figure 9), placing increased load on the Leader. Baseline performance increased from 360 ms for 520 Components to 1196 ms for 2600 Components, a change of 836 ms for a 5-fold increase in total Components, or roughly 167 ms per 520 Components added to the system. This performance reduction, while significant, presents the opportunity to further distribute load amongst Leaders to maintain a low AET.
Fault Tolerance. While previous performance tests had considered GOANNA in simple initialisation and computation phases, failures were introduced to
Fig. 9. Base Average Event Time Performance
Fig. 10. Kill Performance
measure the capacity of GOANNA to deal with the addition (booting) and removal (killing) of FSM Managers and Components at runtime. Three Backups, including a single Leader, and a collection of m FSM Managers were started and components allocated. Each FSM Manager's components were instructed to operate and provide sensor readings for 30 seconds.
– Kill. In each experiment a set of 50 FSM Managers initially operated normally. We introduced failures by killing the FSM Manager process on individual hosts. In the kill scenario, once an FSM Manager was removed from the GOANNA system it was not restarted. The reduced FSM Manager load on the Leader was seen to improve the performance of GOANNA, with a reduced AET after an FSM Manager had been killed. Figures 10 and 12 show the final AET value reported for the kill experiment, illustrating the reduction in AET as the number of FSM Managers killed increases. Significant maximum values exist due to GOANNA timeouts occurring after FSM Managers have been killed. Figure 10 illustrates the maximum, minimum and average event time where FSM Managers are removed from the GOANNA system. The Leader was seen to handle the removal and time-out of FSM Manager messages gracefully.
– Kill-Reboot. GOANNA was seen to perform consistently and normally where FSM Managers were first killed and then rebooted. The restarted FSM Manager processes did not adversely affect the AET of the overall system; AET fluctuated only between 1143 ms and 1271 ms as the number of FSM Managers increased. Figure 11 shows the maximum, minimum and average event time where FSM Managers are killed and then rebooted.
Comparison. GOANNA was seen to continue to execute normally where FSM Managers were removed from the system (Figure 12). A Leader's performance adapted to the loss of FSM Managers, improving the provision of service to the FSM Managers that remained; we see this as a reduction in AET.
Where FSM Managers were rebooted, the system performance behaved as if no failure had occurred; the results mimicked a faultless system.
Fig. 11. Kill-Reboot Performance
Fig. 12. Comparison of Performance: Kill and Kill-Reboot

4.1 Summary
We attribute the success of GOANNA to both the GOANNA protocols and the hybrid FSM approach used. The increase in AET with increased numbers of Components commonly affects such distributed systems and was expected. Performance degradation for systems approaching 2600 components produced a shallow gradient in AET performance loss (Figure 9). Thus, it is possible to distribute additional Leaders in a scalable and strategic manner, tuning the number of Leaders or FSM Managers to achieve a specific AET performance that a system builder deems acceptable for a given sensor deployment.
5 Related Work
Various techniques have been developed to generate a distributed implementation from a logically centralised specification. In [14] the authors use an aspect-oriented approach to automatically generate the global behaviour. They specify component definitions and aspects related to functional and non-functional requirements; some of the aspects are used to weave components together. Our global state machines offer a more structured way to specify the global behaviour and can also be used in property verification. In [15] the authors propose a monitoring-oriented approach that combines formal specifications with the implementation to check conformance of the implementation at runtime. System requirements can be expressed using languages such as temporal logic. Specifications are verified against the system execution and user-defined actions can be triggered upon violation of the formal specifications. Although this approach allows the specification of global behaviour, the behaviour is verified by a centralised server; in contrast, in our approach all conditions and predicates are executed locally. Our earlier work [16] performs state-machine monitoring, but on closed distributed systems, and assumes no failures. GOANNA supports active co-ordination, dynamic systems, and fault-management using consensus. In [17] the authors present a workflow engine that simulates distributed execution by migrating the workflow instance (specification plus run-time data) between execution nodes. In [18] the authors split the specification into several parts in order to obtain a distributed execution; however, this approach defines a set of independent communicating entities rather than a global behaviour.
6 Conclusions
In this paper we have described GOANNA, a system that models the co-ordination of component-based systems as a global state machine specification and automatically generates a correct, scalable and fault-tolerant implementation. GOANNA decomposes global state machines into local ones, and uses a consensus protocol to synchronise them. The system guarantees the global behaviour in the presence of faults and supports the introduction of new component instances at runtime.
References
1. Guerraoui, R., Rodrigues, L.: Reliable Distributed Programming. Springer, Heidelberg (2006)
2. Oppenheimer, P.: Top-Down Network Design. Cisco Systems Inc. (2004)
3. Schroeder, B.A.: On-line monitoring: A tutorial. IEEE Computer 28, 72–78 (1995)
4. Chandra, T.D., Griesemer, R., Redstone, J.: Paxos made live: an engineering perspective. In: PODC 2007, pp. 398–407 (2007)
5. Burrows, M.: The Chubby lock service for loosely-coupled distributed systems. In: OSDI (2006)
6. Giannakopoulou, D., Pasareanu, C.S., Barringer, H.: Assumption generation for software component verification. In: ASE, pp. 3–12 (2002)
7. Penna, G.D., Magazzeni, D., Intrigila, B., Melatti, I., Tronci, E.: Automatic generation of optimal controllers through model checking techniques. In: ICINCO-ICSO, pp. 26–33 (2006)
8. Mostarda, L., Marinovic, S., Dulay, N.: Distributed orchestration of pervasive services. In: IEEE AINA (2010)
9. Pediaditakis, D., Mostarda, L., Dong, C., Dulay, N.: Policies for self tuning home networks. In: IEEE POLICY 2009 (2009)
10. Lamport, L.: Paxos made simple, fast, and byzantine. In: OPODIS, pp. 7–9 (2002)
11. Lamport, L.: Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 558–565 (1978)
12. Ben-Ari, M.: Principles of Concurrent and Distributed Programming, 2nd edn. Addison-Wesley, Reading (2006)
13. Mostarda, L., Ball, R., Dulay, N.: Distributed fault tolerant controllers. Technical report, Department of Computing, Imperial College London (2010)
14. Cao, F., Bryant, B.R., Burt, C.C., Raje, R.R., Olson, A.M., Auguston, M.: A component assembly approach based on aspect-oriented generative domain modeling. Electr. Notes Theor. Comput. Sci. 114, 119–136 (2005)
15. Chen, F., Rosu, G.: Towards monitoring-oriented programming: A paradigm combining specification and implementation. Electr. Notes Theor. Comput. Sci. 89 (2003)
16. Inverardi, P., Mostarda, L., Tivoli, M., Autili, M.: Synthesis of correct and distributed adaptors for component-based systems: an automatic approach. In: ASE, pp. 405–409 (2005)
17. Montagut, F., Molva, R.: Enabling pervasive execution of workflows. In: 2005 International Conference on Collaborative Computing: Networking, Applications and Worksharing (2005)
18. Sen, R., Roman, G.C., Gill, C.: CiAN: A workflow engine for MANETs. In: Lea, D., Zavattaro, G. (eds.) COORDINATION 2008. LNCS, vol. 5052, pp. 280–295. Springer, Heidelberg (2008)
Automatic Software Deployment in the Azure Cloud

Jacek Cala and Paul Watson

School of Computing Science, Newcastle University, Newcastle upon Tyne, NE1 7RU, United Kingdom
{jacek.cala,paul.watson}@newcastle.ac.uk
Abstract. For application providers, cloud computing has the advantage that it reduces the administrative effort required to satisfy processing and storage requirements. However, to simplify the task of building scalable applications, some cloud computing platforms impose constraints on the application architecture, its implementation and the tools that may be used in development; Microsoft Azure is no exception. In this paper we show how an existing drug discovery system, Discovery Bus, can benefit from Azure even though none of its components was built in the .Net framework. Using an approach based on the "Deployment and Configuration of Component-based Applications Specification" (D&C), we were able to assemble and deploy jobs that include different types of process-based tasks. We show how extending D&C deployment models with temporal and spatial constraints provided the flexibility needed to move all the compute-intensive tasks within the Discovery Bus to Azure with no changes to their original code.
1 Introduction
The emergence of cloud execution environments shifts the management of, and access to, computing and storage resources to a new, higher, and more efficient level. Often, however, it requires developers to rebuild or at least substantially re-engineer existing applications to meet new requirements. This is true for Microsoft's Azure cloud, which was designed for applications based on the .Net framework and which supports a specific, queue-based software architecture. For many existing software systems, whose development may have consumed significant resources, it could be prohibitively expensive to redesign and reimplement them to fit these constraints. Therefore, the key question for us was whether it was possible for existing software to benefit from new execution environments such as the Azure cloud without the need for significant, and so expensive, changes. This paper presents details of the automatic deployment platform we developed for Azure, driven by the needs of a chemistry application performing QSAR analysis. Quantitative Structure-Activity Relationship (QSAR) is a method used to mine experimental data for patterns that relate the chemical structure of a

F. Eliassen and R. Kapitza (Eds.): DAIS 2010, LNCS 6115, pp. 155–168, 2010.
© IFIP International Federation for Information Processing 2010
drug to its activity. Predicting the properties of new structures requires significant computing resources and we wanted to exploit the Windows Azure Cloud in order to accelerate this process [7].
2 Motivation and Approach
The Discovery Bus is a multi-agent system that automates QSAR analysis. It implements a competitive workflow architecture that allows the exhaustive exploration of molecular descriptor and model space, automated model validation and continuous updating as new data and methods become available [3]. Our main goal was to move as many of the compute-intensive agents as possible to Azure, to make the most of the parallelization offered by the cloud. However, the key problem we encountered when moving Discovery Bus components to Azure was that none of them were created in the .Net framework; instead, they were written in the Java, C++, and R languages. Therefore, to be able to use the Azure platform directly we needed a solution that would enable us to run existing, non-.Net software in the cloud. For efficiency, and to reduce the amount of the software stack we need to maintain ourselves, this should ideally be a solution that allows the deployment of only a Discovery Bus agent, rather than a whole OS stack including that agent, as in the Infrastructure as a Service (IaaS) approach. Our aim was therefore to build an open and lightweight deployment platform that can support a diverse set of existing Discovery Bus agents running in Azure. The Windows Azure platform is a combination of processing and storage services. The compute services are divided into two types of nodes: web and worker role nodes. The web role enables the creation of applications based on ASP.NET and WCF and is the entry point for external clients to any Azure-based application. In contrast, the worker role runs applications as independent background processes. To communicate, web and worker roles can use the Azure storage service, comprising queue, table and blob storage. The primary means of communication for Azure-based systems are queues.
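The queue-centred communication style just described (web roles enqueue tasks, worker roles dequeue and process them independently) can be sketched in miniature with Python's standard library; here `queue.Queue` merely stands in for the Azure queue storage service, and the doubling "processing" is a placeholder.

```python
import queue
import threading

# Minimal sketch of the Azure queue pattern: one "web role" enqueues
# tasks, several "worker role" threads dequeue and process them.
tasks = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        job = tasks.get()
        if job is None:              # sentinel: shut the worker down
            break
        results.put((job, job * 2))  # stand-in for real processing
        tasks.task_done()

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()

for job in range(10):                # the "web role" submits requests
    tasks.put(job)
tasks.join()                         # more workers drain the queue faster

for _ in workers:                    # stop all workers
    tasks.put(None)
for w in workers:
    w.join()
```

Scaling out means only adding more worker threads (in Azure, more worker role instances); neither the producer nor the queue changes.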
The queue-based approach aims at maximising scalability: many web role nodes can insert tasks into a queue, while many worker role nodes can acquire tasks from the queue. By simply increasing the number of workers, the tasks remaining in the queue can be processed faster [10]. This anycast-like operation model [1] fits very well those problems that do not require much communication apart from a single request-response pattern, with no state preserved in the workers. Much of the Discovery Bus processing has this kind of communication style, but to use Azure for QSAR processing the missing link is the ability to install and execute non-.Net components. Moreover, as these components often have specific prerequisites, such as the availability of third-party libraries and a Java runtime environment, a deployment tool must allow more sophisticated dependencies to be expressed. We based our deployment solution on the "Deployment and Configuration of Component-based Distributed Applications Specification" (D&C) [11]. It defines
the model-based approach to deployment¹ and is built according to the Model Driven Architecture (MDA). D&C specifies a Platform Independent Model (PIM) of deployment, which follows the general idea that the deployment process remains the same, independent of the underlying software implementation technology. The PIM can be further customized with Platform Specific Models (PSMs), such as the PSM for CCM [12], to address aspects of deployment specific to a particular software technology. The D&C specification defines one of the most complete deployment standards [5]; however, it originates from the object-oriented and component-based domains, and the PSM for CCM is the only existing PIM to PSM mapping. One of our intentions was to determine whether the deployment approach proposed by the OMG can be used at a different virtualization level; in our case, a software component is usually an OS executable and a component instance is commonly a process in an operating system. This is in stark contrast to the definition of component proposed in the specification. One of the key qualities of model-based deployment, which distinguishes it from the script-based and language-based approaches, is the separation between the model of the execution environment and the model of the software (Fig. 1). This improves the reusability of software and environment definitions, as the same software description can be used for deployment over different execution environments, and the same environment description can be used for the deployment of different software. This separation is especially important for heterogeneous environments, where deployment planning enables the matching of software components to appropriate execution nodes. Previously, Azure offered one type of resource, but recently Microsoft enabled four different computing node sizes in their cloud platform.
However, model-based deployment also provides the means to express execution node resources, which is crucial for planning deployments in multi-layer virtualized environments. The deployment of a component at a lower virtualization level enriches the set of software resources, enabling further deployments at higher levels. For example, if a Java runtime environment (JRE) is deployed on a node at the OS level, that node gains the ability to run Java applications. The JRE can be expressed as a node resource, which is then taken into account during planning. These advantages motivate the need for deployment planning, and the model-based approach.
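As an illustration of the planning step this separation enables, a planner could match a plan's requirements against the resources each node advertises. All node and resource names below are invented for the sketch; as noted later, our current plan generator does not yet perform such matching.

```python
def matching_nodes(nodes, requirements):
    """Return the nodes whose advertised resources satisfy a plan's
    requirements.

    `nodes` maps a node name to its resource set, e.g. software deployed
    at a lower virtualization level such as an installed JRE; the names
    are illustrative, not part of the D&C specification.
    """
    return [name for name, resources in nodes.items()
            if requirements <= resources]

nodes = {
    "node-a": {"os", "jre"},       # JRE already deployed at the OS level
    "node-b": {"os"},
    "node-c": {"os", "jre", "r"},  # JRE and R runtime deployed
}

# A Java-based component should prefer nodes that already provide a JRE:
candidates = matching_nodes(nodes, {"jre"})
```

Deploying a component at a lower level (e.g. installing a JRE) simply adds to a node's resource set, enabling further deployments at higher levels.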
3 Architecture of the Automatic Deployment Solution
In order to make the most of the scalability offered by the Azure cloud, our deployment platform is a queue-based system that allows a web role to request the deployment of jobs on worker nodes. For the purpose of the QSAR use case, jobs represent the code of Discovery Bus agents but our platform is generic and allows potentially any kind of program to be run. Every job is described by a deployment plan that defines what to install and execute. We extended models defined in the D&C specification with some of the concepts proposed in our previous work [2]. A deployment plan can describe multiple tasks to be

¹ In contrast to the script-based and language-based approaches [6].
Fig. 1. Model-based deployment separates the software model from the execution environment model
deployed, each of which may be of a different type, such as Java-based applications or Python and R scripts. The tasks can be deployed independently, or can be bound with spatial and temporal dependencies. A deployment plan may also include dependencies on other plans that need to be enacted first, e.g. Java-based applications require a Java runtime to be available first. The overall architecture of our system is shown in Fig. 2. Plans are submitted by a web role Controller to a common queue and can then be read by workers. A submitted plan is acquired by a single worker DeployerEngine that tries to install and execute the job. Once the execution is finished, the worker returns results to the Azure blob storage, where they can be found by the Controller. Usually, the Controller sends different types of jobs to the queue, while the DeployerEngine matches a job to an appropriate deployer class that can handle it. To deal with the plans we distinguished two levels of deployers: the operating system level and the process level. All of them realize the same interface, IDeployer
Fig. 2. An illustration of the communication path between the web role Controller and worker role Deployers
(Fig. 3). It follows a one-phase activation scenario, which is a slightly simplified version of the two-phase activation proposed by D&C. The IDeployer interface includes operations related to plan installation and activation, whereas the two-phase deployer also includes an intermediate initialization step. The purpose of install is to collect together all the artifacts required to activate the plan on the target node. The activate operation is responsible for running the plan, and its behavior depends heavily on the level of the implementing deployer. The purpose of deinstall and deactivate is to reverse install and activate respectively. When implementing the process-level deployers, we noticed that irrespective of the type of job to be run (a Java class or jar, a Python or R script, or a standalone executable) the install operation remains the same and only minor changes are required in the activate operation. The changes are mainly related to running a different virtual machine executable or standalone program and setting its parameters according to the configuration properties provided in the plan. Moreover, for all process-based components we distinguished exactly the same input and output ports: standard input, standard output and standard error. This unifies the way in which the results of job execution are returned to the Controller.
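A Python rendering of this deployer hierarchy might look as follows; the actual implementation targets .Net, so class names follow Fig. 3 but method bodies are only a sketch.

```python
from abc import ABC, abstractmethod

class IDeployer(ABC):
    """One-phase activation interface: install/activate plus their
    reversing operations deinstall/deactivate."""

    @abstractmethod
    def install(self, plan): ...
    @abstractmethod
    def activate(self, plan): ...
    def deinstall(self, plan): ...   # reverses install
    def deactivate(self, plan): ...  # reverses activate

class BaseProcessDeployer(IDeployer):
    # install is identical for every process-level deployer: collect the
    # artifacts the plan needs on the target node (sketched as a list).
    def install(self, plan):
        return list(plan["artifacts"])

    def activate(self, plan):
        # Subclasses differ mainly in which executable they launch and
        # how its parameters are set from the plan's properties.
        return [self.executable(), *plan.get("args", [])]

class JavaAppDeployer(BaseProcessDeployer):
    def executable(self):
        return "java"

class RScriptDeployer(BaseProcessDeployer):
    def executable(self):
        return "Rscript"
```

This mirrors the observation in the text: one shared install, and an activate that varies only in the command line it builds.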
Fig. 3. Class hierarchy showing the relation between the different types of deployers
Conversely, the implementation of OSLibraryDeployer differs significantly from the others. The main task of this entity is to enable access to a requested software package for the higher-level deployers. The idea is that OSLibraryDeployer installs a package and, during activation, exposes its contents as node resources. We have implemented an automated plan generator that produces deployment plans using manually prepared plan templates. Its main disadvantage is that it does not take node resources into account and treats all worker nodes equally
irrespective of what resources they currently offer. However, in the future we would like to adapt this to search for node resources during the deployment planning phase and select the node which best fits a submitted plan.
4 Expressiveness of the Deployment Solution
One of the essential elements of model-based deployment is the expressiveness of the models used to describe the software and the execution environment. Although D&C defines one of the most complete deployment standards, we noticed in our previous work [2] that there are some key aspects the specification does not address. In particular, it lacks: modelling of multi-layer virtualized systems, support for different software technologies, and the ability to express temporal constraints on deployment actions. In this section we present more details of how we adapted the platform independent model of deployment proposed by the OMG to our needs. The first important assumption is that in this work we address different virtualization levels than the one proposed in the PSM for CCM. As mentioned above, a deployment plan can include components of different types, such as OS libraries, executables and Python scripts. The key to mapping the D&C PIM to the virtualization levels addressed by our deployment engine was to notice that a component instance may represent either a process or an installed and active OS library. This allowed us to develop the mapping for a deployment plan and all its elements; Table 1 includes the details of this mapping. The main feature of the mapping presented here is the definition of a component interface. For the process level we defined a component as having three ports: input, output and error. Using these ports, components within a plan can be connected together much like processes in an operating system shell. Moreover, they can be bound to external locations such as a URL or a blob in Azure storage. Fig. 4 depicts a sample plan comprising two tasks that run concurrently. They send their error streams to a common blob; Task1 sends its output stream to Task2 and to an external URL. Also, it acquires input from an external location.
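The port wiring just described can be captured, for illustration, as a list of (source port, destination) bindings; this in-memory representation is hypothetical, since the real plans are D&C descriptors.

```python
# Sketch of the port-binding idea: each process-level component exposes
# input/output/error ports that can be wired to other components or to
# external locations (URLs, Azure blobs). All names are illustrative.
plan = {
    "tasks": ["Task1", "Task2"],
    "bindings": [
        ("Task1.input",  "url:http://example.org/input"),
        ("Task1.output", "Task2.input"),                  # pipe, as in a shell
        ("Task1.output", "url:http://example.org/copy"),  # fan-out to a URL
        ("Task1.error",  "blob:errors"),                  # shared error blob
        ("Task2.error",  "blob:errors"),
    ],
}

def destinations(plan, port):
    """All locations a port is redirected to (a port may fan out)."""
    return [dst for src, dst in plan["bindings"] if src == port]
```

Note that a single source port may appear in several bindings (one output to many locations) and a single destination may be shared by several ports (many errors to one blob), matching the redirection rules described above.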
As shown in the figure, at the process virtualization level, deployment plans that represent a composite component also have the three ports mentioned. This allowed us to unify the way in which the results of process-based jobs are returned. By default, the outcome of a deployment plan is two blobs: output and error. Any component in a plan can be bound to these ports, which means that the contents of their outputs are redirected appropriately. It is also possible to redirect the ports of multiple tasks to a single output, and to redirect a single output to multiple locations. At the process virtualization level, different components can also be combined in a single plan using temporal and spatial constraints (Fig. 5). Temporal constraints allow the expression of component dependencies that are usually modelled in the form of a directed acyclic graph. The StartToStart and FinishToFinish collocations enable synchronization barriers to be created between components' deployment activities, whereas FinishToStart allows instances to
Table 1. Mapping of D&C execution model entities to process and OS virtualization levels

D&C entity | Process virtualization level | OS virtualization level
deployment plan | a composite of processes that may be interconnected and bound with spatial and temporal constraints | usually a single library component; potentially a composite of multiple libraries
component interface description | defines three ports: input, output and error, and a single property: priority | simple type referring to a library
artifact deployment description | a file; a part of program code (e.g. an executable, resource file, configuration script) | a file; a part of library code (e.g. an executable, resource file, configuration script)
monolithic deployment description | program code; groups all program files together | library code; groups all library files together
instance deployment description | a process | installed and active library; presents resources for higher level deployers
instance resource deployment description | assigned node resource; may refer to resources of the lower level OS deployer | assigned node resource (e.g. disk space)
plan connection description | connection between standard input, output and error process streams | n/a
plan property mapping | process priority | n/a
Fig. 4. A deployment plan showing input and output ports of process-based tasks. Ports may refer to URLs and Azure blobs.
be linked in a deployment chain. We also adapted the original D&C spatial constraints² to the process virtualization level. Instead of SameProcess/DifferentProcess, the SpatialConstraintKind allows the expression of the need to run selected processes in the same or different working directories. This may be useful when components within a plan have some implicit dependencies. For example, if they use common input files, they may need to run in the same directory. Conversely, when they produce result files with the same names, they need to execute in separate working directories.
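For illustration, resolving FinishToStart constraints into a valid execution order is a topological sort over the constraint graph. This sketch uses Python's `graphlib` and generic task names; it is not part of our platform.

```python
from graphlib import TopologicalSorter

def execution_order(tasks, finish_to_start):
    """Order tasks so that every FinishToStart (predecessor, successor)
    pair is respected; a cyclic set of constraints raises
    graphlib.CycleError."""
    ts = TopologicalSorter({t: set() for t in tasks})
    for before, after in finish_to_start:
        ts.add(after, before)   # `after` cannot start until `before` finishes
    return list(ts.static_order())

# Three generic tasks chained into a deployment sequence:
order = execution_order(
    ["TaskA", "TaskB", "TaskC"],
    [("TaskA", "TaskB"), ("TaskB", "TaskC")],
)
```

StartToStart and FinishToFinish barriers would instead group tasks that must begin or end together, which a simple topological order cannot express on its own; they need synchronization at run time.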
Fig. 5. Temporal and spatial constraints in the proposed deployment model
The last element of the proposed mapping relates to the resources assigned to deployed components. We made a separation between the lower level OS deployer and the higher level process deployers. Our intention is that when the OS deployer activates a library on a node, it also registers a set of resources associated with that library. This enables a deployment planner, working at the process virtualization level, to search for the best node to deploy to. For example, if a node previously executed a Java-based application, it has a JRE deployed; therefore, any subsequent Java-based components should preferably be deployed on this node rather than on one that does not provide a JRE. Although we do not yet address deployment planning and use manually prepared deployment plans, our platform is well suited for it.
5 Application of Our Deployment Solution to Discovery Bus
Discovery Bus is a system that implements the competitive workflow architecture as a distributed, multi-agent system. It exhaustively explores the available

² i.e. the PlanLocality and PlanLocalityKind entities.
model and descriptor space, which demands a lot of computing resources [3]. Fortunately, the architecture of Discovery Bus makes it easy to move agents to different locations in a distributed environment. This enabled us to place the most compute-intensive agents in the cloud. However, as agents are not limited to any particular implementation technology, the key requirement for our deployment platform was to support different software technologies, including Java, R and native applications. Our platform can deploy arbitrary jobs; however, for the purpose of the QSAR use case we prepared a number of predefined deployment plans that describe specific Discovery Bus agents. One of the most compute-intensive tasks is the calculation of molecular descriptors. Figure 6 shows a deployment plan that we designed to run this job in Azure. First, the plan expresses a dependency on the JRE and CDK libraries. These dependencies are directed to the lower level OS deployer, which installs and activates them on an Azure worker node. Second, the plan includes an artifact description for the input data file, which is provided to the descriptor calculator task on its activation. The location of this artifact is given when the plan is created. Third, the calculate descriptors program sends results to its standard output stream, which we redirect to the external output port of the plan. By default, the output of a deployment plan is stored in the Azure blob storage. Similarly, the error stream of the task is redirected to the external error port and is transmitted to the blob storage.
(Figure 6 depicts the CalculateCDKDescriptorsAgent deployment plan: a Java app "Calculate CDK Descriptors" with an InputFile artifact and external output and error ports, with "depends on" links to the CDK Library and Java Runtime Environment OS libraries.)

Fig. 6. A deployment plan created for a selected, Java-based Discovery Bus agent
J. Cala and P. Watson

This example shows how easy it is to prepare a suitable deployment plan for an existing application. However, there are many cases when a process-based task returns results not only via the standard output or error streams but also creates files, or communicates results over a network. Neither our approach nor Azure prevents sending data via a network, but the more difficult case occurs when the task execution produces files. As these are stored in local worker storage, we supplemented the deployment plan with a post-processing task that transfers them to a desired location.

Figure 7 shows a sample two-task plan that sequentially executes the main R script and an additional post-processing program that converts the results and sends them to a specific location. Although this required some extra development effort, we were able to use this approach to move all of the compute-intensive Discovery Bus agents to Azure without changing any of their original code.
(Figure 7 depicts the FilterFeaturesAgent deployment plan: an R script "Filter Features" with an InputFile artifact and a generic app "PostFF", related by the spatial constraint SameDirectory and the temporal constraint FinishToStart, with a "depends on" link to the R Runtime Environment OS library.)

Fig. 7. One of the predefined deployment plans used in processing a Discovery Bus task
The tasks carry the spatial constraint SameDirectory, which requires them to be executed in the same working directory. This requirement stems from the fact that there are implicit dependencies between the two tasks, i.e. the PostFF program reads files produced by the FilterFeatures script. Therefore, we imposed the temporal constraint FinishToStart on these two tasks; as a result, PostFF cannot run until FilterFeatures completes. The plan also shows a dependency on the R runtime environment, which is not shipped with Azure by default and, therefore, needs to be deployed prior to the execution of this plan. This dependency is directed at the OSLibraryDeployer. It is not, however, the same as a temporal constraint, which creates a chain of deployment actions; instead, "depends on" demands only prior availability.
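The FinishToStart semantics can be illustrated in plain Java. This is a toy sketch, not our engine's implementation: the dependent task is submitted only once the first task's Future has completed.

```java
import java.util.concurrent.*;

// Toy sketch of the FinishToStart temporal constraint: the second task is
// only submitted after the first one has finished.
public class FinishToStartSketch {
    public static void runChain(Runnable first, Runnable second) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            pool.submit(first).get();   // FinishToStart barrier: wait for completion
            pool.submit(second).get();  // only now may the dependent task run
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        StringBuffer log = new StringBuffer();
        runChain(() -> log.append("FilterFeatures;"),
                 () -> log.append("PostFF;"));
        System.out.println(log);
    }
}
```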
6 Related Work
The problem of deploying compute-intensive jobs in distributed systems has its roots in the beginnings of distributed systems. Although cloud computing creates new opportunities to exploit, most prior work on scientific applications has been carried out in the area of Grid computing.

Condor (http://www.cs.wisc.edu/condor) is one of the main examples of Grid-like environments that provide batch execution over distributed systems. It is a specialized workload management system for compute-intensive jobs. Serial or parallel jobs submitted to Condor are placed in a queue, and decisions are made on when and where to run them based on a scheduling policy. Execution progress is then monitored until completion. To match jobs to the most appropriate resources (compute nodes), Condor uses the ClassAd mechanism, a flexible and expressive framework for matching resource requests (jobs) with resource offers (machines). Jobs can state both requirements and preferences, while compute nodes can specify requirements and preferences about the jobs they are willing to run. All of these may be described through expressions, allowing a broad range of policies to be implemented [9].

There are three key differences between Condor and our solution. First, Condor uses the ClassAd mechanism and a central manager to schedule jobs, whereas Azure proposes, and we use, a queue-based approach for scalability. However, Condor supports flexible matching of jobs to resources, while we treat all computing resources as indistinguishable; we leave the problem of resource discovery and deployment planning for future work. Second, the ability to express dependencies between deployment plans, such as between an R script and its execution environment, allows for the clear and concise definition of job prerequisites. This simplifies complex deployment in distributed environments, and is not supported by Condor. Third, we based our solution on the D&C deployment models, which provide a recursive definition of a component as either a monolith or an assembly of subcomponents. Condor instead uses the ClassAd mechanism, which describes a job without any relationship to others, except where temporal constraints are expressed by directed acyclic graphs and evaluated by the DAGMan scheduler (http://www.cs.wisc.edu/condor/dagman). We believe that temporal constraints are an important feature of a deployment plan and therefore supplemented the D&C models to enable their definition. Condor does not support more sophisticated constraints, such as the resources required to realize a connection between executing jobs. Neither do we currently consider this aspect of the D&C models, but we expect to extend our system to use it in the near future.

The Neudesic Grid Computing Framework (Neudesic GCF, http://azuregrid.codeplex.com) is dedicated to the Azure platform.
It provides a solution template and base classes for loading, executing, and aggregating grid tasks on the Microsoft cloud. Neudesic GCF offers a set of templates such as Worker, Loader and Aggregator: a Worker implements each of the computing tasks, a Loader reads input data from local resources and generates tasks, and an Aggregator receives results and stores them. Unlike our solution, Neudesic GCF requires building applications from scratch. A developer is given a set of templates to program against, but as everything needs to be built in .NET, this framework was not suitable for our use case. Moreover, Neudesic GCF assumes the homogeneity of Azure resources, as all workers are identical. With the use of model-based deployment, our solution is better prepared to be extended to take account of heterogeneous Azure computing resources.

Several cloud computing solutions are available apart from Azure; the most prominent are the Google App Engine (GAE) and the Amazon Elastic Compute Cloud (Amazon EC2).

The Google App Engine (http://code.google.com/appengine) is a cloud computing platform primarily dedicated to Python and Java language applications. It provides a high-level platform for running distributed systems and has the built-in ability to automatically scale the number of worker nodes following changes in the incoming workload. However, unlike Microsoft Azure, this environment is closed with respect to running native programs, which would require us to rewrite the existing code. Moreover, as it is aimed at building scalable web servers, GAE limits the time for processing a request to 30 seconds (see the Google App Engine documentation, ver. 2009-12-15, Sects. "What Is Google App Engine" and "Quotas and Limits"), which makes it unsuitable for scientific computation in general and our QSAR scenario in particular, as a single job often needs much more time to complete.

Amazon EC2 (http://aws.amazon.com/ec2) provides a virtual computing environment controlled through a web service interface. It gives complete control over the operating system installed on a computing node, which makes it more open than Azure [8]. However, as a result, it demands more administrative and management effort, whereas Azure can adopt a declarative approach to configuration through simple configuration descriptors. Despite the fact that our solution is based on Azure, we believe that it could be ported to Amazon EC2 as well, providing the same benefits of deployment automation as for Azure.

RightScale (http://www.rightscale.com) offers a range of products that can run on top of different cloud platforms such as Amazon EC2, Rackspace and FlexiScale. RightScale's Grid Edition framework (GE framework) provides a queue-based solution for running batch processes; overall, it follows a similar pattern to the one we use. Additionally, the GE framework enables auto-scaling, which can adapt the number of worker nodes in response to changing factors. The key difference between RightScale's GE framework and our deployment solution is in the granularity of workers. The GE framework uses server templates as its basic unit of deployment. A server template is a preconfigured image of an operating system that is used to launch new server instances in the cloud; for each particular task or set of tasks, a user needs to prepare an appropriate server template.
In our approach, by contrast, the basic unit of deployment is a process-level component, which is much smaller than an OS image. Our deployment platform allows arbitrary jobs to run on any active worker, irrespective of the type of job (currently any task from a supported set of types: a Windows executable, a Java jar/class file, an R script, a Python script or a command-line script). This promotes better resource sharing and guarantees a more effective solution, especially for smaller and short-running tasks. Moreover, our deployment plans can include many interrelated subtasks, which results in a much more expressive framework and enables the assembly of applications from existing components.
7 Conclusions and Future Work
In this paper we discussed an automatic deployment framework for the Azure cloud platform. We used the framework to run compute-intensive Discovery Bus agents in the cloud. Despite the fact that none of the Bus components was implemented in the .NET framework, our framework allowed us to seamlessly integrate most existing agents with Azure without any code modification. The exception was those agents that produce results in the local file system. Here, some additional development effort was required: we had to implement a simple task to transfer output files to a designated location. Then, using our deployment platform, we could express spatial and temporal constraints between processes, thus ensuring that all the results produced are correctly transmitted.

Apart from the results specifically focused on the Discovery Bus as a motivating scenario, this work showed that the platform-independent model defined in the D&C specification can be successfully applied to the process and OS virtualization levels. The PIM creates a very expressive deployment framework that, with only minor modifications, allowed us to build composite deployment plans describing process-based applications. Further, the ability to express dependencies between deployment plans supports the clear and concise definition of component prerequisites. By using the proposed deployment framework we were able to successfully run Discovery Bus agents in Azure.

To generalise the solution, we see three major directions for future work. Firstly, we would like to support the two-phase initialization of process-based components. This will enable us to distribute the components included in a deployment plan over multiple worker nodes, even when the components are interconnected. Secondly, we see the need for deployment planning to make our solution more effective. The ability to determine node resources would allow the matching of work to those nodes that best fit the submitted plan.
However, our major focus remains on preserving efficient scalability for a queue-based system, while enabling resource discovery and deployment planning (an interesting approach to this problem is presented in [4]). Finally, an interesting future direction for our framework is the dynamic deployment of new deployer types. If our DeployerEngine implements the IDeployer interface, we can dynamically install and activate deployers that support new component types. This in turn will increase flexibility and facilitate the runtime evolution of the deployment platform according to changing needs. Acknowledgements. We would like to thank Microsoft External Research for funding this work. Also, we are grateful to the wider team working on the project: the Discovery Bus team (David Leahy, Vladimir Sykora and Dominic Searson), the e-Science Central Team (Hugo Hiden, Simon Woodman) and Martin Taylor.
References

1. Abley, J., Lindqvist, K.: Operation of Anycast Services. Request for Comments 4786, Best Current Practice 126 (2006)
2. Cala, J.: Adaptive Deployment of Component-based Applications in Distributed Systems. PhD thesis, AGH-University of Science and Technology, Krakow (submitted February 2010)
3. Cartmell, J., Enoch, S., Krstajic, D., Leahy, D.E.: Automated QSPR through Competitive Workflow. Journal of Computer-Aided Molecular Design 19, 821–833 (2005)
4. Castro, M., Druschel, P., Kermarrec, A.-M., Rowstron, A.: Scalable Application-Level Anycast for Highly Dynamic Groups. In: Stiller, B., Carle, G., Karsten, M., Reichl, P. (eds.) NGC 2003 and ICQT 2003. LNCS, vol. 2816, pp. 47–57. Springer, Heidelberg (2003)
5. Dearle, A.: Software Deployment, Past, Present and Future. In: FOSE 2007: 2007 Future of Software Engineering, pp. 269–284. IEEE Computer Society, Los Alamitos (2007)
6. Talwar, V., Milojicic, D., Wu, Q., Pu, C., Yan, W., Jung, G.: Approaches for Service Deployment. IEEE Internet Computing 9(2), 70–80 (2005)
7. Watson, P., Hiden, H., Woodman, S., Cala, J., Leahy, D.: Drug Discovery on the Azure Cloud. Poster presentation, Microsoft e-Science Workshop, Pittsburgh (2009)
8. Amazon Web Services LLC: Amazon Elastic Compute Cloud: User Guide, API Version 2009-11-30 (2010)
9. Condor Team: Condor Version 7.5.0 Manual. University of Wisconsin-Madison (2010)
10. Microsoft Corp.: Windows Azure Queue — Programming Queue Storage. Whitepaper (2008)
11. Object Management Group, Inc.: Deployment and Configuration of Component-based Distributed Applications Specification, Version 4.0 (2006)
12. Object Management Group, Inc.: Common Object Request Broker Architecture (CORBA) Specification, Version 3.1, Part 3: CORBA Component Model (2008)
G2CL: A Generic Group Communication Layer for Clustered Applications

Leandro Sales¹, Henrique Teófilo², and Nabor C. Mendonça²

¹ Dipartimento di Elettronica e Informazione, Politecnico di Milano, Via Ponzio, 34/5 – 20133 Milano, Italy
[email protected]
² Mestrado em Informática Aplicada, Universidade de Fortaleza (UNIFOR), Av. Washington Soares, 1321 – 60811-905 Fortaleza – CE, Brazil
[email protected], [email protected]
Abstract. Generic group communication frameworks offer several benefits to developers of clustered applications, including better software modularity and greater flexibility in selecting a particular group communication system. However, current generic frameworks only support a very limited set of group communication primitives, which has hampered their adoption by many “real-world” clustered applications that require higher-level group communication services, such as state transfer, distributed data structures and replicated method invocation. This paper describes the design, implementation and initial evaluation of G2CL, a Generic Group Communication Layer that offers a set of commonly used high-level group communication services implemented on top of an existing generic framework. Compared to current group communication solutions, G2CL offers two main contributions: (i) its services can be configured to run over any group communication system supported by the underlying generic framework; and (ii) it implements the same service API used by JGroups, a popular group communication toolkit, which may reduce its learning curve and make the task of migrating to G2CL particularly attractive for JGroups users.
1 Introduction
Group communication, i.e., the ability to reliably transmit messages amongst a group of processes, plays an important role in the design of dependable and adaptable distributed systems [8]. This form of communication has been particularly valuable in clustered environments, where classical group communication applications include replication, load balancing, resource management and monitoring, and highly available services [7].

A group communication system (GCS) is a type of middleware that implements a set of reusable group communication services that can be useful in multiple application domains. Some of the most popular GCSs currently in use are JGroups [4], Spread [2] and Appia [20], each providing its own set of group communication primitives and protocols. Choosing an appropriate GCS for a given distributed application is an important design decision that can be made difficult by the fact that those systems tend to vary widely not only in terms of the communication abstractions they implement, but also in terms of the delivery semantics and quality-of-service (QoS) guarantees they provide [7]. Another difficulty is that, once a developer commits to a particular GCS, her application code becomes tightly coupled to that system's API. Such coupling is undesirable for two main reasons: (i) it requires changing the application code every time the target API evolves; and (ii) it makes it extremely hard to migrate the application to a different GCS (with a different API), thus preventing developers from easily benefiting from a new (possibly more effective) GCS in the future.

An interesting way for developers of distributed applications to avoid coupling their application code to the services provided by a specific middleware solution is to use a generic middleware API. Typically, such generic APIs are implemented by means of a plug-in mechanism which allows application developers to select a particular concrete middleware system at configuration time, without the need to change their application code. This approach has been successfully used in a number of distributed software domains, including structured peer-to-peer communication [9], publish-subscribe systems [22] and grid applications [21].

In terms of group communication, there have been some attempts to provide a common API for different GCSs, such as Hedera [13], jGCS [6] and Shoal [26]. However, all those systems implement only basic operations for reliable message transmission and group management. The problem, in this case, is that some mature GCSs, such as JGroups, also offer a number of higher-level group-related services, for instance object state transfer, distributed data structures and transparent invocation of replicated objects.

F. Eliassen and R. Kapitza (Eds.): DAIS 2010, LNCS 6115, pp. 169–182, 2010.
© IFIP International Federation for Information Processing 2010
As a consequence, many "real-world" distributed applications that rely on those high-level services cannot benefit from existing generic group communication APIs.

In this paper, we describe the design, implementation and initial evaluation of G2CL, a Generic Group Communication software Layer that implements a set of commonly used high-level group communication services, similar to those already provided by JGroups. In contrast to JGroups, though, all services provided by G2CL are implemented on top of an existing generic API, which allows them to be easily reconfigured to run over any GCS supported by the underlying plug-in mechanism. To demonstrate the power of G2CL we have successfully used it to replace JGroups as the group communication solution for the JOnAS JEE application server [17]. The migration from JGroups to G2CL in the JOnAS source code was done with relatively little programming effort, as we describe later in the paper, and has allowed us to evaluate the impact of using different G2CL configurations on the performance of JOnAS under a variety of load conditions. These results build our confidence that G2CL can be a valuable addition to the set of programming tools currently available to developers of distributed applications.

The rest of the paper is organized as follows. Section 2 gives a brief overview of two technologies that have greatly influenced our work on G2CL, namely JGroups and jGCS. Section 3 describes the main design decisions and implementation strategies used in the development of G2CL. Section 4 reports on our initial evaluation of G2CL using JOnAS as a case study. Section 5 further discusses our results and highlights the merits and limitations of our work. Finally, Section 6 concludes the paper and outlines our future research agenda.
2 Related Technologies

2.1 JGroups
JGroups [4] was one of the first group communication toolkits written entirely in Java. It provides a simple API for accessing its basic group communication services, whose main component is the Channel interface. This interface is used to send and receive messages asynchronously to and from a group of processes, and to monitor group changes by means of the Observer design pattern [12]. Currently, JGroups offers a single implementation of the Channel interface, called JChannel (in our work we used JGroups version 2.6.10, released on April 28, 2009; JGroups is available at http://www.jgroups.org).

The Channel interface hides the actual protocol stack used by JGroups for message transmission. However, JGroups allows developers to configure their own protocol stack by combining different protocols for message transmission (for instance, TCP or UDP over IP multicast), data fragmentation, reliability, security, failure detection, membership control, etc. This can be done via an external XML file, whose properties are loaded by JGroups at initialization time, thus avoiding the need to change the application code directly.

On top of the basic services provided by the Channel interface, JGroups implements another set of higher-level services, called building blocks [16], which offer more sophisticated group communication abstractions for application developers. These include services such as MessageDispatcher, which implements primitives for synchronous message transmission; RPCDispatcher, which implements a remote invocation mechanism for replicated objects on top of the MessageDispatcher service; and ReplicatedHashMap, which implements a distributed version of Java's HashMap class on top of the RPCDispatcher service. Due to its great flexibility in defining customized protocol stacks, and also to its rich set of building blocks, JGroups has been a popular choice amongst clustered application developers, having recently been incorporated into the JBoss project [15].

2.2 jGCS

The Group Communication Service for Java (jGCS) [6] is a generic group communication framework that aims at providing a common Java API to several existing GCSs. Its ultimate goal is to facilitate reuse of the different services implemented by those systems without requiring substantial changes to the source code of the target application.
The jGCS architecture relies on a plug-in mechanism based on the Inversion of Control (IoC) design pattern [10]. This mechanism is used by jGCS to decouple its service API from the underlying service implementation, thus allowing the same API to be reused across different GCSs. The actual service implementation (plug-in) used by jGCS can be defined at initialization time, via an external configuration file. The current version of jGCS offers plug-ins for several GCSs, including JGroups, Spread [2] and Appia [20] (in our study we used jGCS version 0.6.1, released on October 29, 2007; jGCS is available at http://jgcs.sourceforge.net/). The jGCS API is divided into four complementary interfaces, namely the configuration, common, data and control interfaces, described in more detail below.

Configuration Interface. This interface decouples the application code from implementation-dependent group communication concerns, such as group configuration and the specification of message delivery guarantees. The actual GCS plug-in to be used is defined at configuration time, by means of an external configuration file. At execution time, the jGCS services are instantiated according to the specified configuration, using a dependency injection mechanism [10] or a service locator [1]. The main classes of this interface are ProtocolFactory, which implements the Abstract Factory design pattern [12] to allow the initialization of new protocol instances based on the underlying plug-in configuration; GroupConfiguration, which encapsulates the group information (e.g., the group ID) necessary to open a new group session through which the application can exchange messages with other group members and monitor group membership changes; and Service, which encapsulates the specification of message delivery guarantees to be used during message transmission.

Common Interface. This interface contains common classes shared by all other interfaces. Its main class is Protocol, whose instances are created by the ProtocolFactory class from the configuration interface. A Protocol object is used to create, for a given GroupConfiguration object, the objects responsible for message exchange and group membership management, of types DataSession and ControlSession, respectively, described next.

Data Interface. This interface contains classes responsible for sending and receiving group messages. Its main classes are DataSession, which is used to send messages to a group and also to register observers [12] that handle messages received from that same group; Message, which encapsulates a message to be sent to or received from a group, together with the address of the sender; and MessageListener, which must be implemented by all observers registered with a DataSession. To avoid forcing any specific data format or serialization mechanism on the application, the message body is stored as a byte array inside Message, with the application being responsible for serializing the message before transmission and deserializing it after receipt.
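Since the message body is an opaque byte array, a typical application-side pattern is a small codec that converts objects to and from that body. The helper below is our own illustration, not part of jGCS; it uses standard Java serialization, but any format would do.

```java
import java.io.*;

// Application-side helper for byte-array message bodies: serialize before
// sending, deserialize after receipt.
public class PayloadCodec {
    public static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        return bos.toByteArray();
    }

    public static Object fromBytes(byte[] body) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(body))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] body = toBytes("state-update-42"); // would go into the message body
        System.out.println(fromBytes(body));      // recovered on delivery
    }
}
```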
Control Interface. This interface contains classes responsible for group management, from simple notifications of members joining or leaving a group to the creation of new virtual group views. The main classes of this interface are ControlSession, which provides methods for members to join or leave a group and also to register observers to listen to notifications of membership changes (e.g., join, leave and failure of members); ControlListener, which must be implemented by all observers registered with a ControlSession; MembershipSession, which is an extension of class ControlSession used to obtain a list of members currently connected to a group and also to register observers to listen to changes in group views; and MembershipListener, which must be implemented by all observers registered with a MembershipSession.
3 G2CL
G2CL is an extensible group communication software layer that sits on top of existing generic frameworks. Its main design goal is to offer a more sophisticated set of generic group communication services, similar to those provided by the building blocks of JGroups, but with all the benefits associated with the use of a loosely coupled software architecture. To achieve this goal, we took some important design decisions, discussed below.
3.1 Main Design Decisions
Choice of Generic Framework. Our first design decision concerned selecting the generic framework to be used as the basis for the implementation of G2CL. Of the three generic frameworks currently available, i.e., Hedera [13], jGCS [6] and Shoal [26], only Hedera and jGCS were considered mature enough for our purposes, with both providing plug-ins for several existing GCSs. Shoal, on the other hand, only supports a single GCS (namely, JXTA [18]) and was thus discarded as a candidate. The choice between Hedera and jGCS was based on several factors, including an analysis of their design features and performance overhead. In the end, we chose jGCS because of its well-designed API, which has been implemented following well-known object-oriented design principles and patterns [6], and because it imposes a much lower performance overhead than Hedera, particularly for small messages [24,25].

Service Implementation Model. Another important design decision concerned defining an appropriate implementation model for G2CL. Given the rich set of group communication building blocks offered by JGroups, and its popularity amongst distributed application developers, we decided to implement the G2CL services following, whenever possible, the same building block API (including class names and method signatures) used by JGroups. This decision has the potential to facilitate the task of migrating an existing JGroups-based clustered application to G2CL, since both systems implement similar APIs. Another benefit is that G2CL users can greatly reduce their learning curve by leveraging JGroups' extensive API documentation and code base.
jGCS Extensions. During the design of G2CL we identified the need to make some minor extensions to the classes and interfaces originally provided by jGCS. These extensions are described below.

As is typical with communication abstractions that encapsulate lower-level services, implementing some of the G2CL services required a way to add service-specific headers to application messages, separate from their body. Such headers store control information relevant to the implementation of some services, but which must not be exposed to the application. Since this facility is not readily supported by the Message class currently provided by jGCS, we had to define a new message class, called G2CLMessage. In order to maintain compatibility with the DataSession class of jGCS, G2CLMessage implements jGCS's Message interface. This allows G2CLMessage objects to be transmitted like any other message using any jGCS plug-in.

Another extension made to jGCS was the implementation of a new DataSession class, called MarshalDataSession, which works as an adapter [12] between the G2CL services and the original DataSession used by the jGCS plug-ins. The main responsibility of this new class is to intercept all message transmission calls made to the plug-in by the application and then execute the transformations necessary to convert between a message of type G2CLMessage and a message of type Message. In this way, all G2CL services rely only on MarshalDataSession for message transmission (instead of the original DataSession class of jGCS).
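The idea behind such a headered message can be sketched as one that marshals its service headers ahead of the opaque application body. The layout and names below are illustrative only; G2CLMessage's actual wire format may differ.

```java
import java.io.*;
import java.util.*;

// Sketch of a message carrying service-specific headers apart from the body.
public class HeaderedMessage {
    public final Map<String, String> headers = new LinkedHashMap<>();
    public byte[] body = new byte[0];

    public byte[] marshal() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(headers.size());              // header block first...
        for (Map.Entry<String, String> e : headers.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
        out.writeInt(body.length);                 // ...then the opaque body
        out.write(body);
        return bos.toByteArray();
    }

    public static HeaderedMessage unmarshal(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        HeaderedMessage m = new HeaderedMessage();
        int n = in.readInt();
        for (int i = 0; i < n; i++) m.headers.put(in.readUTF(), in.readUTF());
        m.body = new byte[in.readInt()];
        in.readFully(m.body);
        return m;
    }
}
```

The application only ever sees the body; a service such as a fragmentation decorator would read and write its own header entries.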
3.2 Implemented Services
The initial set of group communication services implemented as part of G2CL was selected based on an informal analysis of the JGroups services most commonly used in practice. The selected services were classified into two groups, named high-level services and service decorators, described below.

High-level Services. These services encapsulate a MarshalDataSession instance, hiding its basic message transmission functionality so as to provide application developers with a more sophisticated group communication API. Four services were initially implemented in this group: MessageDispatcher, RpcDispatcher, ReplicatedHashMap and StateTransferDataSession, briefly described below.

MessageDispatcher. Provides a way to send synchronous messages to the group with request-response correlation. Sending a synchronous message to the group can cause ambiguity as to when execution should resume; hence, the sender chooses among different policies that specify how many members must receive the message before it is considered sent.
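The response-counting idea behind such policies can be pictured with a plain-Java sketch (illustrative names, not G2CL's implementation): the caller blocks until a chosen number of member replies have arrived, e.g. all of them or just the first.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of synchronous group dispatch with a response policy: block until
// 'expected' members have replied.
public class DispatchSketch {
    public static List<String> castAndWait(List<Callable<String>> members,
                                           int expected) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            CompletionService<String> cs = new ExecutorCompletionService<>(pool);
            for (Callable<String> m : members) cs.submit(m);
            List<String> replies = new ArrayList<>();
            for (int i = 0; i < expected; i++) replies.add(cs.take().get());
            return replies;   // resume once the policy's quota is met
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<String>> group =
            Arrays.asList(() -> "ack-1", () -> "ack-2", () -> "ack-3");
        System.out.println(castAndWait(group, 3).size()); // wait-for-all policy
        System.out.println(castAndWait(group, 1).size()); // first-reply policy
    }
}
```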
G2CL: A Generic Group Communication Layer for Clustered Applications
RpcDispatcher. Provides a way to make remote method invocations on the members of the group. When creating its own RpcDispatcher instance, each member needs to specify the object on which received method invocations should be performed. As method invocations are synchronous, to avoid ambiguity, as in MessageDispatcher, the invoker needs to choose between different policies that specify when the method should return. StateTransferDataSession. Provides a DataSession with a state transfer mechanism implemented on top of the JGroups State Transfer service [5]. It should be used when the application needs to maintain replicated state amongst all group members. ReplicatedHashMap. Implements a Map object replicated across all members of the group. Any change to the map (via invocation of clear(), put(), remove(), etc.) is transparently propagated to all replicas in the group. Invocations of read-only methods always access the local replica. Due to space limitations, and because these services provide the same functionality as their corresponding services in JGroups, with a similar API, we omit the details of their implementation from the paper. For a more detailed account of these services, the interested reader is referred to [11]. Service Decorators. Services of this group add extra functionality (such as message fragmentation and encryption) to the basic message transmission service provided by the MarshalDataSession class. As the group name implies, these services are based on the Decorator design pattern [12]. Their implementation keeps the same interface provided by MarshalDataSession, so that their use is completely transparent to the application. Currently, G2CL provides four service decorators, namely FragDataSession, BundleDataSession, CompressDataSession and CryptoDataSession. These services provide mechanisms for message fragmentation, message bundling, message compression and message encryption, respectively.
Each service decorator can be used either in isolation, or combined with other service decorators, forming a chain of responsibility [12] where different decorators can be added or removed from the chain without affecting the application code. To facilitate the use and configuration of service decorators, G2CL provides a MarshalDataSessionFactory class whose main responsibility is to create a new MarshalDataSession instance. If necessary, the MarshalDataSessionFactory can also instantiate a chain of decorators for the new MarshalDataSession object. The creation of both the MarshalDataSession instance and its chain of decorators can be configured by the user in a manner that is independent of the application code, using a dependency injection mechanism or a service locator.
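The decorator scheme can be illustrated with a compression decorator. The DataSession interface below is a simplified stand-in for MarshalDataSession, and CompressSession is a hypothetical analogue of CompressDataSession built on the JDK's GZIP streams; neither is the actual G2CL code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// Stand-in for the MarshalDataSession send interface.
interface DataSession {
    byte[] send(byte[] msg) throws IOException; // returns what goes on the wire
}

// Terminal session: pretends to hand the bytes to the jGCS plug-in.
class BaseSession implements DataSession {
    public byte[] send(byte[] msg) { return msg; }
}

// Decorator in the style of CompressDataSession: same interface, extra
// behaviour, freely stackable with other decorators in a chain.
class CompressSession implements DataSession {
    private final DataSession next;
    CompressSession(DataSession next) { this.next = next; }

    public byte[] send(byte[] msg) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(msg); // compress, then delegate down the chain
        }
        return next.send(out.toByteArray());
    }
}
```

Because every decorator keeps the DataSession interface, chains such as `new CompressSession(new CryptoSession(new BaseSession()))` (CryptoSession being a further hypothetical decorator) can be assembled, reordered, or dropped by a factory without touching application code.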
4   Evaluation
To assess the migration effort and potential performance impact associated with the use of G2CL in a real clustered application, we have conducted a case study
L. Sales, H. Teófilo, and N.C. Mendonça
involving the JOnAS Java EE application server [17]. The reason for selecting JOnAS as our target application is two-fold: (i) it is a mature clustered technology of non-trivial size (on the order of 230,000 lines of Java code); and (ii) it makes intensive use of a number of group communication services and building blocks provided by JGroups, for which similar services are already implemented as part of G2CL.
4.1   JOnAS Overview
The Java Open Application Server (JOnAS) is an open source implementation of the Java EE 5 specification [27].³ JOnAS supports the creation of reliable EJB applications by providing a high-availability (HA) service based on a cluster of JOnAS instances. When a client application requests the creation of a Stateful Session Bean (SFSB) component, one of the servers in the cluster is chosen to respond to that client's invocations until the client requests the removal of that SFSB. Before sending a response to the client, the server propagates any change in the state of the SFSB to the other servers in the cluster, which act as backup servers for that component. If the server initially allocated to a replicated component fails, the state of the SFSB can be recovered by one of its backup servers, which will start handling future invocations for that component on behalf of the failed server. To implement its HA service, JOnAS relies on an RMI-like replication protocol called Clustered Method Invocation (CMI), which is specifically tailored for transparently invoking replicated objects. The CMI protocol uses several high-level group communication services provided by JGroups to implement a number of features, including a distributed version of a JNDI-based resource registry and a state propagation mechanism. More specifically, the distributed registry uses the RPCDispatcher and StateTransfer services of JGroups to guarantee that any changes made to the registry by one of the servers are reliably propagated to the other servers (for instance, when a new object is created). The state propagation mechanism, in turn, uses the MessageDispatcher service of JGroups to guarantee that, whenever the server responsible for a replicated object fails, at least one of the remaining servers in the cluster will have all the necessary information to continue responding to any ongoing or future client request on behalf of the failed server.
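The primary-backup behaviour described above can be reduced to a small sketch. All names here are ours, and the real JOnAS propagates state through group communication messages rather than direct map writes; the point is only the invariant that every backup already holds the propagated state before the client sees a reply, so fail-over loses nothing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative replica of a stateful bean: just a key-value state.
class ReplicatedBean {
    final Map<String, String> state = new HashMap<>();
}

// Sketch of the HA scheme: apply an update on the primary, propagate it to
// every backup before the (conceptual) reply leaves, and let any backup
// take over on failure.
class Cluster {
    final List<ReplicatedBean> members = new ArrayList<>();
    int primary = 0;

    Cluster(int n) { for (int i = 0; i < n; i++) members.add(new ReplicatedBean()); }

    // Client-visible update: primary first, then state propagation.
    void update(String key, String value) {
        members.get(primary).state.put(key, value);
        for (int i = 0; i < members.size(); i++)
            if (i != primary) members.get(i).state.put(key, value);
    }

    // Fail over: the next member already holds the propagated state.
    void failPrimary() { primary = (primary + 1) % members.size(); }

    String read(String key) { return members.get(primary).state.get(key); }
}
```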
In the following, we describe how we replaced, in the JOnAS source code, all JGroups services used in the implementation of the CMI protocol with the corresponding generic services provided by G2CL.
4.2   Migration Process
Our migration process concentrated on two JOnAS classes, namely SynchronyzedDistributedTree and JGMessageManager. These are the main classes
³ In our work, we have used JOnAS version 5.1.0-M5. JOnAS is available at http://jonas.ow2.org.
Table 1. JOnAS migration numbers

Module     |     # Packages      |      # Classes       |         # LoC
           | Total  Changed (%)  | Total  Changed (%)   |  Total    Changed (%)
CMI        |   80   2 (2.50%)    |   216   6 (2.78%)    |  18,691   2,106 (11.28%)
OW2-UTIL   |  235   1 (0.42%)    |   596   5 (0.84%)    |  33,538     270 (0.80%)
JOnAS      |  396   1 (0.25%)    | 2,133   1 (0.04%)    | 180,030     111 (0.06%)
All        |  711   4 (0.56%)    | 2,945  12 (0.41%)    | 232,259   2,487 (1.01%)
involved in the implementation of the distributed registry and the replication mechanism of CMI, respectively. In both classes our migration strategy consisted, essentially, of changing all lines of code (and, when necessary, their associated configuration files) responsible for initializing the target JGroups services (i.e., RPCDispatcher, StateTransfer and MessageDispatcher), in order to replace them with the code necessary to initialize the corresponding services of G2CL. One notable exception was the need to implement a new message serialization mechanism for JOnAS. This was required because the original version of JOnAS uses the serialization mechanism provided by JGroups, while jGCS (and, consequently, G2CL) leaves the serialization process to be implemented by the application. Finally, we also had to change the way JOnAS handles the identification of group members. In the original version of JOnAS, group members are identified by the Address class of JGroups. In the new version, based on G2CL, this class was replaced by the SocketAddress class, which is the class used to identify group members in jGCS. It is interesting to note that, even though the JGroups services we replaced are actually used in many other parts of the JOnAS source code, we did not have to change any of those parts. This was due to our decision to keep the same JGroups API when implementing the corresponding services in G2CL. Table 1 quantifies our migration effort in terms of the number of JOnAS packages, classes and lines of code (LoC) that had to be modified as part of our G2CL migration strategy. From that table we can see that most of the changes were performed in the CMI module, where about 2.5% of its packages, 2.8% of its classes and 11% of its lines of code had to be modified. These numbers reflect the fact that CMI makes intensive use of JGroups in its implementation, as explained above.
Even though many of the changes made to the CMI module were certainly non-trivial, we can still see these numbers in a positive light if we consider that nearly 98% of the packages and classes of that module (comprising about 90% of its lines of code) were left unchanged after the migration. The percentage of changes in the other modules was much smaller, as expected, varying between 0.06% and 0.8%. Overall, we only had to change about 1% of the total lines in the JOnAS source code.
The above numbers indicate that the programming effort required by the G2CL migration process was relatively low compared to the full size of the JOnAS source code. They also reflect the fact that group communication, although crucial to the provisioning of some important JOnAS services, is used only sparingly in its implementation.
4.3   Performance Impact Analysis
Despite the clear software engineering benefits that can be associated with the use of generic APIs, one cannot neglect the inevitable performance impact that such APIs may impose on the services they generalize. With this concern in mind, we have analysed the potential overhead caused by G2CL on the performance of JOnAS. Our analysis compared the performance of the original version of JOnAS, based on JGroups, against that of the new version, based on G2CL, using three different jGCS plug-in configurations. Method. Our analysis was carried out in a local cluster environment, configured to emulate a typical Java EE clustering scenario [19]. This environment was composed of nine PCs connected through a dedicated 10/100 Mbps Fast Ethernet switch. Each PC had the following configuration: Intel Core 2 Duo processor; 2 GB RAM (DDR2); and Debian Linux (version 5.0) operating system. Six PCs were used in the business layer, each one running a separate JOnAS instance with CMI and the HA service enabled, playing the role of replicated EJB containers. Two other PCs were used in the presentation layer, each one also running a separate JOnAS instance, but now playing the roles of both web containers and CMI clients. Finally, one PC was used to run the Apache server (version 2.2.11), which was responsible for balancing the load amongst the servers of the presentation layer. To compare the performance of the different JOnAS versions, we developed a simple EJB application with a single SFSB. This SFSB implements the basic functionality of a shopping cart in an e-commerce application, offering operations to insert, update and remove items from the shopping cart. For persistence, we used the PostgreSQL relational database system (version 8.3) [23]. This EJB application was installed in all six servers of the business layer, with its SFSB component configured as a replicated CMI object.
We also developed a simple web-based client application to continuously invoke a series of operations provided by the replicated object (the shopping cart) at the business layer. Both the EJB application and the client application were implemented so as to create an execution scenario similar to the one used by Lodi et al. in [19], where the authors compared the performance of an enhanced version of the JBoss application server [15]. We ran multiple sets of experiments, with each experiment involving a different version of JOnAS. In total, we analysed the performance of four JOnAS versions: the original version, based on JGroups, and three variations of the new version, based on G2CL, using the jGCS plug-ins for JGroups, Spread and Appia,
respectively. In all experiments we varied the number of clients from 50 to 100, so as to observe the performance of the different versions of JOnAS under different load conditions. To generate the client loads we used the ApacheBench (ab) benchmarking tool (version 2.0) [3]. In terms of group communication features, we configured the three jGCS plug-ins to provide the same set of guarantees that is provided by JGroups in the original version of JOnAS. This was necessary to make sure that the new version of JOnAS, based on G2CL, would behave, at least functionally, in a similar fashion to its original version. Finally, we used the client response time as our performance measure [14]. In our analysis, this measure was computed as the average response time observed across all clients during the same experiment. To achieve a confidence interval of 95%, each experiment was executed at least 30 times, with extreme outliers being removed using the boxplot method [28]. Results. Figure 1 shows the average client response time observed for the four versions of JOnAS as a function of the number of simultaneous client requests handled by the EJB application. As we can see, the different JOnAS versions are non-uniformly affected as the number of client requests grows. In addition, when we compare the original version of JOnAS, which uses JGroups directly, against the new version, based on G2CL configured with the JGroups plug-in, we note that their performance is very close, with a slight advantage to the former. This shows that the performance overhead imposed by G2CL on JOnAS is minimal (for 50 simultaneous requests, their performance differs by about 27% in favor of the original version, with that difference quickly falling below 5% as the number of simultaneous requests approaches the 70 mark).
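The measurement procedure (averaging per-client response times after discarding extreme outliers) amounts to the following sketch. The paper cites [28] without giving the exact outlier rule, so the standard boxplot fences (Q1 − 1.5·IQR to Q3 + 1.5·IQR) are assumed here.

```java
import java.util.Arrays;

// Sketch of the measurement pipeline: drop samples outside the boxplot
// fences, then average what remains.
class ResponseStats {
    // Linear-interpolation quantile of an already-sorted sample.
    static double quantile(double[] sorted, double q) {
        double pos = q * (sorted.length - 1);
        int lo = (int) Math.floor(pos), hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    // Mean after removing extreme outliers with the 1.5*IQR boxplot rule.
    static double trimmedMean(double[] samples) {
        double[] s = samples.clone();
        Arrays.sort(s);
        double q1 = quantile(s, 0.25), q3 = quantile(s, 0.75);
        double iqr = q3 - q1;
        double lo = q1 - 1.5 * iqr, hi = q3 + 1.5 * iqr;
        return Arrays.stream(s).filter(v -> v >= lo && v <= hi).average().orElse(Double.NaN);
    }
}
```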
Fig. 1. Performance analysis results
We also observed that the new JOnAS version configured with the Appia plug-in imposes a virtually constant performance loss (on the order of 25%) when compared with the original version. When the new version is configured with the Spread plug-in, the observed performance loss is even higher (up to 42% for 100 simultaneous requests). These results suggest that the performance impact imposed by the use of G2CL cannot be determined a priori, as it is likely to be influenced by the performance of the underlying jGCS plug-in. In this regard, we believe G2CL can offer a real contribution towards more effective clustering solutions, since it frees developers to experiment with new group communication mechanisms without requiring a significant programming effort.
5   Discussion
In our work, we use the term generic to convey the notion of flexibility and portability. In this sense, we say that both jGCS and G2CL implement generic APIs, in that both can easily be configured to use different GCSs as their underlying group communication mechanism. Similarly, we say that JGroups implements a specific (non-generic) API, since its services are tightly coupled to its own group communication primitives and protocols. In terms of group communication abstractions, G2CL adds little extra functionality beyond that already provided by JGroups. However, compared with JGroups, G2CL's main advantage is its greater flexibility for configuring its underlying group communication mechanism, which can be any of the GCSs currently supported by jGCS. Therefore, by providing a flexible implementation, based on jGCS, of a number of commonly used JGroups services, G2CL combines the best of both systems. With respect to its performance impact, our early experimental results from the JOnAS case study show that G2CL incurs a noticeable yet non-significant performance loss compared with JGroups used in standalone mode, particularly under high load conditions. Nonetheless, we believe that the use of a generic group communication API can still pay off in terms of improving application performance. As we have already shown elsewhere [24,25], Spread can outperform JGroups by a large margin under certain communication scenarios. This means that, for some distributed applications that use JGroups, migration to G2CL using a Spread-based configuration might actually result in a real performance gain. A further investigation of the conditions under which migrating to G2CL might improve application performance is an interesting topic for future work. An important limitation of our work thus far is that we have limited our evaluation to a single performance metric (i.e., client response time).
In this regard, we plan to further investigate the potential impact of using different G2CL configurations on other well-known performance metrics, such as server throughput and memory consumption.
6   Conclusion
In this paper, we have presented our work on G2CL, a generic software layer providing a rich set of high-level group communication services. Our early experience in using G2CL in the JOnAS application server as well as in other middleware technologies suggests that it can be effectively used as a generic group communication solution for existing clustered technologies, requiring a relatively modest migration effort and imposing a minimal performance overhead, particularly for applications originally based on JGroups. A natural line for future research is to extend G2CL with new group communication services and features. We are also conducting more case studies, involving open source clustered applications of varying sizes and domains and using new performance metrics, in order to better analyse the benefits and limitations of our approach. G2CL is being developed as an open source project. Its source code and documentation are freely available at http://g2cl.googlecode.com.
References
1. Alur, D., Malks, D., Crupi, J., Booch, G., Fowler, M.: Core J2EE Patterns: Best Practices and Design Strategies, 2nd edn. Sun Microsystems, Inc., Mountain View (2001)
2. Amir, Y., Danilov, C., Stanton, J.: A Low Latency, Loss Tolerant Architecture and Protocol for Wide Area Group Communication. In: Proceedings of the 2000 International Conference on Dependable Systems and Networks (FTCS-30, DCCA-8), pp. 327–336. IEEE CS Press, New York (2000)
3. Apache: Apache HTTP server benchmarking tool (1996), http://httpd.apache.org/docs/2.0/programs/ab.html
4. Ban, B.: Design and Implementation of a Reliable Group Communication Toolkit for Java. Tech. rep., Cornell University (1998)
5. Ban, B.: A Flexible API for State Transfer in the JavaGroups Toolkit (2007) (unpublished manuscript)
6. Carvalho, N., Pereira, J., Rodrigues, L.: Towards a Generic Group Communication Service. In: Proceedings of the 8th International Symposium on Distributed Objects and Applications (DOA 2006), pp. 1485–1502. Springer, Montpellier (2006)
7. Chockler, G.V., Keidar, I., Vitenberg, R.: Group Communication Specifications: A Comprehensive Study. ACM Computing Surveys 33(4), 427–469 (2001)
8. Coulouris, G., Dollimore, J., Kindberg, T.: Distributed Systems – Concepts and Design, 4th edn. Addison-Wesley, Boston (2005)
9. Dabek, F., Zhao, B., Druschel, P., Kubiatowicz, J., Stoica, I.: Towards a Common API for Structured Peer-to-Peer Overlays. In: Kaashoek, M.F., Stoica, I. (eds.) IPTPS 2003. LNCS, vol. 2735, pp. 33–44. Springer, Heidelberg (2003)
10. Fowler, M.: Inversion of Control – IoC (2004), http://martinfowler.com/articles/injection.html
11. G2CL: Generic Group Communication Layer (2009), http://g2cl.googlecode.com/
12. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, Boston (1995)
13. Hedera: Hedera Group Communications Wrappers (2008), http://hederagc.sourceforge.net/
14. Jain, R.: The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation and Modeling. Wiley-Interscience, New York (1991)
15. JBoss: JBoss Application Server (2009), http://www.jboss.org/jbossas/
16. JGroups: JGroups – Building Blocks (2009), http://www.jgroups.org/blocks.html
17. JOnAS: JOnAS – Java Open Application Server (2009), http://jonas.ow2.org/
18. JXTA: JXTA Community Project (2008), https://jxta.dev.java.net/
19. Lodi, G., Panzieri, F., Rossi, D., Turrini, E.: SLA-Driven Clustering of QoS-Aware Application Servers. IEEE Transactions on Software Engineering 33(3), 186–197 (2007)
20. Miranda, H., Pinto, A., Rodrigues, L.: Appia – a Flexible Protocol Kernel Supporting Multiple Coordinated Channels. In: Proceedings of the 21st International Conference on Distributed Computing Systems (ICDCS 2001), pp. 707–710. IEEE CS Press, Phoenix (2001)
21. van Nieuwpoort, R.V., Kielmann, T., Bal, H.E.: User-Friendly and Reliable Grid Computing Based on Imperfect Middleware. In: Proceedings of the ACM/IEEE Conference on Supercomputing (SC 2007). ACM Press, Reno (2007)
22. Pietzuch, P., Eyers, D., Kounev, S., Shand, B.: Towards a Common API for Publish/Subscribe. In: Proceedings of the 2007 Inaugural International Conference on Distributed Event-based Systems, pp. 152–157. ACM, Toronto (2007)
23. PostgreSQL: PostgreSQL (2009), http://www.postgresql.org/
24. Sales, L., Teófilo, H., D'Orleans, J., Mendonça, N.C., Barbosa, R., Trinta, F.: Performance Impact Analysis of Two Generic Group Communication APIs. In: Proceedings of the 1st IEEE International Workshop on Middleware Engineering (ME 2009), pp. 148–153. IEEE CS Press, Bellevue (2009)
25. Sales, L., Teófilo, H., Mendonça, N.C., D'Orleans, J., Barbosa, R., Trinta, F.: An Evaluation of the Performance Impact of Generic Group Communication APIs. Int. Journal of High Performance Systems Architecture 2(2), 90–98 (2009), http://dx.doi.org/10.1504/IJHPSA.2009.032026
26. Shoal: Shoal – A Dynamic Clustering Framework (2008), https://shoal.dev.java.net/
27. SUN: Java Platform, Enterprise Edition (Java EE) (2006), http://java.sun.com/javaee/
28. Triola, M.F.: Elementary Statistics, 7th edn. Addison-Wesley, Boston (1997)
Dynamic Composition of Cross-Organizational Features in Distributed Software Systems Stefan Walraven, Bert Lagaisse, Eddy Truyen, and Wouter Joosen DistriNet, Dept. of Computer Science, K.U. Leuven, Belgium {stefan.walraven,bert.lagaisse,eddy.truyen,wouter.joosen}@cs.kuleuven.be
Abstract. Companies offering software services to external customer organizations must ensure that the non-functional requirements of all these customer organizations are satisfied. However, in such a cross-organizational context where services are provided and consumed by different organizations, the implementation of features, for example security, is scattered across the services of the different organizations and cannot be condensed into a single module that is applicable to multiple services. In this paper we present an aspect-based coordination architecture for dynamic composition of cross-organizational features in distributed software systems such as systems of systems or service supply chains. The underlying approach of this architecture is to specify the features at a higher level that abstracts the internal mechanism of the organizations involved. A coordination middleware dynamically integrates the appropriate features into the service composition, driven by metadata-based user preferences. Keywords: Cross-organizational, Feature-oriented, Service engineering, Dynamic composition, AOSD.
1   Introduction
A software service is used by several customer organizations simultaneously, possibly each of them with their own end users. Each customer organization may have different – possibly conflicting – requirements with respect to the features provided by the service. As non-functional requirements are often application-specific and pervasive, a software service is required to allow on-demand integration of tailorable features. To fit those varying requirements, recent trends in service engineering aim to combine the benefits of feature-based and service-based approaches [3,13,1,16]. This combination offers a modular and pluggable solution that increases the reusability of services and supports on-demand customization tailored to the user's needs. The building blocks for these customizations are modular features. A feature is a distinctive mark, quality, or characteristic of a software system or systems in a domain [11]. Features define common facets of the domain as well as differences between related systems in the domain; they make each system in a domain different from the others.
F. Eliassen and R. Kapitza (Eds.): DAIS 2010, LNCS 6115, pp. 183–197, 2010.
© IFIP International Federation for Information Processing 2010
However, services are mostly used in a service composition consisting of services from different organizations. In such a cross-organizational context, service implementations are black boxes, implemented and deployed by different organizations on possibly different service platforms, and only the interface descriptions are publicly available [1]. In this cross-organizational and heterogeneous context, a problem arises when a feature implementation cannot be contained within the service provider, but crosscuts the service provider and service consumer (see Fig. 1). Because such a logical but distributed feature cannot be condensed into a single feature any more, on-demand service customization tailored to the user’s preferences is hard to achieve in a cross-organizational context. Cross-org Service consumer
Service provider
{preferences}
Provider application
Consumer application
Distributed & heterogeneous features
End user
Service platform X
Service platform Y
Fig. 1. Problem context of cross-organizational feature composition
However, services are mostly used in a service composition consisting of services from different organizations. In such a cross-organizational context, service implementations are black boxes, implemented and deployed by different organizations on possibly different service platforms, and only the interface descriptions are publicly available [1]. In this cross-organizational and heterogeneous context, a problem arises when a feature implementation cannot be contained within the service provider, but crosscuts the service provider and service consumer (see Fig. 1). Because such a logical but distributed feature can no longer be condensed into a single module, on-demand service customization tailored to the user's preferences is hard to achieve in a cross-organizational context. A typical example of a cross-organizational crosscutting feature is security in distributed e-finance software, such as online stock trading systems. When implementing an access control concern (authentication and authorization) in such an application, security actions need to be performed for every interaction between the distributed subsystems, as presented in Fig. 2. In this cross-organizational context it is difficult to argue that a single feature module encapsulates the implementation of the internal security mechanisms of the organizations involved as well as the global security policy governing how security must be addressed in the overall interaction between different organizations. The latter security policy belongs to a level of abstraction above the internal security mechanisms, allowing different implementations. The problem of cross-organizational customization of services has not been well addressed in the current state of the art. On the one hand, existing coordination architectures for cross-organizational service provisioning (GlueQoS [29], BCL [18], T-BPEL [26]) focus on dynamically establishing and monitoring agreements for enacting service delivery, but fail to support the coordination and dynamic composition of system-wide software features. On the other hand, state-of-the-art dynamic composition technologies such as dynamic aspect weaving fail to offer the right mechanisms for expressing customizations at the right level of abstraction. This paper proposes an aspect-based coordination architecture for dynamic composition of cross-organizational features in distributed software systems such
as systems of systems or service supply chains. The underlying approach of this architecture is to provide support for cross-organizational service customization by leveraging some of the principles of cross-organizational coordination architectures. First, a high-level feature ontology is specified to agree on a technology-independent feature model among the organizations within a specific contract domain or service network. Second, service consumers are able to express a desired feature configuration through the specification of a feature policy that is based on the vocabulary of this ontology. Third, each organization has to map its part of the feature ontology to a specific aspect-based implementation platform. The underlying coordination architecture is responsible for managing the feature ontology, its organization-specific mapping to aspects and the deployment of user-specific feature policies within various scopes such as per service binding, per session and per message. The remainder of this paper is structured as follows. In Sect. 2 we further motivate and illustrate the importance of cross-organizational feature composition in distributed software systems. Section 3 briefly discusses related work. We describe our aspect-based coordination architecture for dynamic composition of cross-organizational features in Sect. 4 and evaluate the performance overhead of our prototype in Sect. 5. Section 6 concludes the paper.
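The three ingredients of the approach (a shared ontology of feature names, a consumer-side feature policy, and a per-organization mapping from ontology terms to that organization's own aspect implementations) can be sketched as a simple lookup. Every concrete name here ("signing", "DsaSigningAspect", and so on) is illustrative, not part of the proposed architecture's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// A consumer's feature policy: the ontology terms it wants enabled.
class FeaturePolicy {
    final Set<String> selected = new HashSet<>();
    FeaturePolicy select(String feature) { selected.add(feature); return this; }
}

// One organization's private mapping from shared ontology terms to its
// own (technology-specific) aspect implementations.
class Organization {
    final Map<String, String> ontologyToAspect = new HashMap<>();
    Organization map(String feature, String aspectImpl) {
        ontologyToAspect.put(feature, aspectImpl);
        return this;
    }

    // Resolve a consumer policy to the aspects this organization deploys;
    // terms it has no mapping for are simply not its responsibility.
    List<String> resolve(FeaturePolicy policy) {
        List<String> aspects = new ArrayList<>();
        for (String f : policy.selected)
            if (ontologyToAspect.containsKey(f)) aspects.add(ontologyToAspect.get(f));
        return aspects;
    }
}
```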
2   Problem Motivation and Illustration
We present an example in the e-finance domain to further motivate and illustrate the importance of cross-organizational feature composition in distributed software systems (see Fig. 2). Banks offer their customers a stock trading service to inspect, buy and store stock quotes. To be able to provide this service, those banks cooperate with the stock market, which in turn cooperates with a settlement company. Such a cross-organizational service composition allows each participant to take up two roles: service consumer and service provider. For example, the bank company is a server for the bank customers, but consumes the QuotesOrderService of the stock market.
Fig. 2. Illustration of the stock trading service composition including the signature-based authentication feature as a single module
Since different clients have different needs, the service providers must ensure that the different and varying non-functional requirements are satisfied, for example security, transaction support, load balancing, priority processing and stepwise feedback. In our example the bank customers can obtain different variants of the stock trading service composition by selecting features tailored to their preferences. For instance, a bank can offer its customers several signature-based authentication options from which they can choose one, based on different algorithms (e.g. SHA1withDSA). In the same way the stock market will negotiate with the different bank companies about the message carrier, the protocols used, or which trusted third party (TTP) will be used. The stock trading service in Fig. 2 includes a signature-based authentication feature, applied on the connection between the stock market and the settlement company. This feature affects both the service consumer, to sign the messages, and the service provider, to verify the signatures. This clearly illustrates that a single feature module, consisting of client and server functionality, can affect multiple services in a service composition. However, each company in a cross-organizational service network has its own IT administration and trust domain, and will not allow external parties to add or update feature implementations. The services provided by the different partners are black boxes, independently maintained by each company's own administrators. This black-box scenario hinders feature modularization and composition in a cross-organizational context [1]. Moreover, since the services are loosely coupled, the different parties can use different service platforms, programming languages and feature composition techniques (e.g. Java and .NET platforms).
Therefore a feature can no longer be condensed into a single module and must instead be applied in a distributed and heterogeneous way. Cross-organizational features need to be split into consumer-side and provider-side parts that fulfill the service consumer and service provider responsibilities, respectively. A uniform high-level representation of those features is necessary to be able to share them in a cross-organizational and heterogeneous application domain or service network. Furthermore, we need a coordination middleware to dynamically activate the appropriate feature implementations throughout the service composition in a consistent way.
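The consumer-side/provider-side split of the signature feature from Fig. 2 can be illustrated with the JDK's java.security API. SHA1withDSA is the algorithm named in the text; the class names and key handling are our own sketch, and a real deployment would of course distribute the keys across the two organizations' trust domains.

```java
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Consumer-side part of the feature: sign every outgoing message.
class ConsumerSide {
    private final PrivateKey key;
    ConsumerSide(PrivateKey key) { this.key = key; }

    byte[] sign(byte[] message) throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA1withDSA");
        s.initSign(key);
        s.update(message);
        return s.sign();
    }
}

// Provider-side part of the feature: verify the signature on arrival.
class ProviderSide {
    private final PublicKey key;
    ProviderSide(PublicKey key) { this.key = key; }

    boolean verify(byte[] message, byte[] signature) throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA1withDSA");
        s.initVerify(key);
        s.update(message);
        return s.verify(signature);
    }
}
```

The two halves share only the ontology-level feature name and the key material; each organization is free to weave its half into its own platform however it sees fit.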
3 Related Work
This paper tackles the problem statement by combining and improving two bodies of work: on the one hand, cross-organizational coordination architectures; on the other hand, dynamic software composition technology, in particular aspect-oriented middleware.

Cross-organizational coordination architectures. A core tenet of the research on cross-organizational coordination architectures is a multi-layered architecture distinguishing between policy and mechanism. In previous work we have defined a reference model [28] for classifying the different approaches. In general, a cross-organizational coordination architecture consists of an agreement language
Dynamic Composition of Cross-Organizational Features
187
for specifying agreements (either contract- or policy-based) and a coordination middleware for establishing agreements and monitoring them for violations. An agreement between a service consumer and provider specifies the rules of engagement with which both the service consumer and the service provider must comply. For example, agreements specify the flow of interactions and message types to be exchanged (BCL [18]), modal constraints such as authorizations, obligations, prohibitions and timings (Ponder [4]), QoS requirements (GlueQoS [29]), or the usage of protocols and standards (T-BPEL [26]). Second, the underlying coordination middleware supports establishing agreements between client and server dynamically, and enforces the agreements or detects violations of them. These architectures, however, are not designed for user-specific customization of shared service instances and the consistent deployment of distributed and heterogeneous software features throughout cross-organizational service compositions.

Aspect-oriented middleware. Aspect-oriented software development (AOSD) [6] has been put forward as a possible solution to the problem of crosscutting (often non-functional) concerns. In addition, AOSD is often applied to enable modularization and composition of features [17,15,14]. Aspect-oriented frameworks [21,9,24,12,23,22] have played a key role in the modularization of middleware platforms: these have evolved from monolithic platforms with a declarative configuration interface towards architectures that can plug in application-specific or user-specific extensions on demand. Current aspect-oriented frameworks also support dynamic aspect weaving in a reliable and atomic manner [19,27]. These AO techniques therefore make such platforms ready for deployment in usage contexts where a shared service instance can be dynamically customized to customer-specific requirements by dynamically weaving in desired features.
In a cross-organizational context, however, current AO technology fails to offer coordination mechanisms for deploying multiple aspect modules across multiple organizations and heterogeneous platforms.
4 Aspect-Based Architecture for Cross-Organizational Composition
This section describes the aspect-based coordination architecture enabling dynamic composition of cross-organizational features. Figure 3 illustrates the approach underlying the architecture. In line with the research on cross-organizational coordination architectures, our architecture assumes that a conceptual model for cross-organizational features is agreed upon between all organizations within a particular domain or a specific service network. This conceptual model defines a feature ontology (in fact a feature model [11]), shared by all organizations involved, for naming and defining the different features and their alternative implementation strategies. Next, the feature ontology is mapped to an aspect-based feature implementation within each organization, which can use an AO technology of its choice. A service consumer can then express user-specific feature preferences when binding to a service provider. An underlying coordination middleware will ensure that the appropriate feature
188
S. Walraven et al.
[Figure 3 depicts a service consumer organization and a service provider organization that share an agreement based on a common feature ontology. On each side, an aspect-based feature implementation mapping links the ontology to local feature implementations; the end user selects features via preferences, and metadata-driven dynamic activation applies them to the consumer and provider applications.]
Fig. 3. Overview of the approach
implementations are activated dynamically throughout the service composition, in a consistent manner and at the right moment, driven by metadata. We now explain the architecture in more detail. First, the high-level, technology-independent feature ontology is presented. Next, we describe the aspect-based feature implementation mapping. Thereafter, we show how users can specify customization requirements through feature policies. Finally, we present the design of our coordination middleware. As a running example we use the dynamic composition of security features such as authentication and non-repudiation in the stock trading service composition.
4.1 High-Level Feature Ontology
The conceptual model in our approach for specifying cross-organizational features consists of a high-level feature ontology. This feature ontology should be abstract and independent of the aspect-based implementation (the computational model), so that the different organizations in the service network can implement the same features differently, depending on their choice of implementation platform and composition technology. Within a particular domain, for example the e-finance domain, a standard for non-functional features (e.g. security) can be agreed upon. Figure 4 presents an example of a feature ontology for security features based on existing standards [8,20,2]. Security protection breaks down into authentication, authorization, audit, availability, confidentiality, integrity and non-repudiation. We also itemized some possible feature implementation strategies, based on the different algorithms (e.g. SHA1withDSA for signature-based authentication). A feature node within the ontology consists of a feature identifier, a unique name for referring to that feature, and a high-level, technology-independent feature contract describing the intended behavior of the feature and the roles that the different parties involved have to play. These roles are described by a name
Fig. 4. Example of a feature ontology for the stock trading composition
(e.g. Service Consumer) and a set of responsibilities that specify constraints on behavior and interfaces. Furthermore, composition rules can be specified that prescribe which features depend on other features and which features cannot be executed during the same request due to feature interference. For example, the SignatureBasedNR non-repudiation feature (see Fig. 4) defines two roles: a service consumer who retrieves the customer account information, and a service provider responsible for securely logging the customer account, the name and arguments of the request, and the cryptographic signature of the message. The service provider role requires a CustomerAccount attribute, which will be provided by the service consumer role. A dependency rule is also needed, prescribing that SignatureBasedNR requires the SHA1withDSA authentication feature to provide a Signature attribute. We have not yet designed a concrete language for representing feature contracts, although we expect that such a language would build on existing feature modeling techniques for services, such as Service Diagrams [7]. We do offer a declarative specification language for representing feature identifiers and roles from the ontology and their mapping to specific feature implementations.
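To make the structure of such an ontology node concrete, the following Java sketch models a feature node with its identifier, its roles (each with a set of responsibilities), and its dependency rules. The class and field names are our own illustration; the paper deliberately leaves the concrete feature contract language open.

```java
import java.util.List;
import java.util.Map;

// Hypothetical in-memory form of a feature ontology node: an identifier,
// roles mapped to their responsibilities, and dependency rules on other
// features. Illustrative only; not the paper's specification language.
public class FeatureNode {
    public final String id;                       // e.g. "SignatureBasedNR"
    public final Map<String, List<String>> roles; // role name -> responsibilities
    public final List<String> requires;           // identifiers of required features

    public FeatureNode(String id, Map<String, List<String>> roles, List<String> requires) {
        this.id = id;
        this.roles = roles;
        this.requires = requires;
    }
}
```

For example, the SignatureBasedNR node above would carry a ServiceProvider role with the logging responsibility and a dependency on SHA1withDSA.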
4.2 Aspect-Based Feature Implementation Mapping
The mapping between the high-level feature ontology and the aspect-based implementations is specified at the level of the internal processes and data, hiding the implementation details from external parties. By capturing the semantics of the features in a high-level feature ontology, the different features can be implemented independently by each of the service providers using their favorite service platform and AO composition technology. Hence, the different services in the network may have their own optimized aspect-based implementations of the different features, and the most appropriate feature implementation in each service may depend on environmental circumstances. However, the feature
implementations have to satisfy certain constraints, enforced by the feature ontology. In addition, the implementation of the different features and the software composition strategy are open for adaptation by each of the local administrators. The use of AOSD [6] enables a clean separation of concerns, in which the core functionality of a service is separated from any feature behavior. Features are therefore implemented separately from each other as composite entities containing a set of aspect-components, which provide the behavior of the features (so-called advice). This advising behavior can be dynamically composed on all the components of a service, at the consumer side as well as the provider side. The aspect-components of the features are composed by means of declarative specifications in the form of AO-compositions: these specify on which elements of the service platform the aspect-components must be applied.

Listing 1. Example of a feature implementation mapping

featureImplMapping SHA1withDSAImpl {
  implements: SHA1withDSA;
  role: ServiceConsumer;
  ao-composition {
    id: SHA1withDSASigning;
    pointcut {
      kind: execution;
      componenttype: *;
      componentinstance: *;
      interface: ITransport;
      method: sendMessage;
    }
    advice {
      comptype: SHA1withDSASignature;
      interface: ISignature;
      method: sign;
    }
  }
}
Each feature implementation mapping within a specific organization is described by means of a declarative specification that specifies (i) the feature and role that are implemented and (ii) a set of AO-compositions to integrate the feature into the internal processes and data of the organization. Such an AO-composition specifies a pointcut and a set of advices to apply. Listing 1 presents an example of a feature implementation mapping for the service consumer role of the SHA1withDSA signature-based authentication feature. The AO-composition imposes on the transport layer of the service platform (pointcut) to digitally sign the messages, by means of the SHA1withDSASignature aspect-component (advice), before they are sent.
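As an illustration of how an organization might hold such mappings in memory, the following hedged Java sketch links a (feature, role) pair to a simplified AO-composition, with the pointcut and advice flattened to strings. All identifiers are assumptions for illustration, not part of the paper's declarative language.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative registry of feature implementation mappings, keyed by feature
// identifier, as consulted later by the activation aspects. The real mapping
// is a structured declarative specification (cf. Listing 1).
public class FeatureImplMapping {
    public final String feature;  // e.g. "SHA1withDSA"
    public final String role;     // e.g. "ServiceConsumer"
    public final String pointcut; // flattened, e.g. "execution(ITransport.sendMessage)"
    public final String advice;   // flattened, e.g. "SHA1withDSASignature.sign"

    public FeatureImplMapping(String feature, String role, String pointcut, String advice) {
        this.feature = feature; this.role = role;
        this.pointcut = pointcut; this.advice = advice;
    }

    public static final Map<String, FeatureImplMapping> registry = new HashMap<>();

    public static void register(FeatureImplMapping m) { registry.put(m.feature, m); }
}
```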
4.3 Expressing User-Specific Preferences through Feature Policies
The feature ontology is accessible to the end users of the service application and allows them to select the desired set of features. A service consumer selects a set of features by instantiating a feature policy: a declarative configuration that specifies, per service binding, which features are desired for that particular binding (see Listing 2). A service binding simply identifies the URI and interface of the service provider in question. When user-specific preferences
dictate that a particular feature must be applied, the feature implementations of the consumer and provider side will be dynamically composed for every message exchanged through that service binding.

Listing 2. Example of a feature policy

servicebinding {
  URI: http://www.stocktradingexample.be;
  port: StockTradingServiceSoapEndpoint;
  features: SHA1withDSA, SignatureBasedNR;
}
4.4 Coordination Architecture
In order to process user requests in a consistent manner throughout the cross-organizational service composition, coordination is needed between all participating services. To achieve this coordination, every request is tagged with extra meta-information specifying the set of features corresponding to the user-specific preferences. This metadata propagates with the message flow initiated by that user request; as such, knowledge about the desired combination of features travels with the message flow. The coordination middleware also ensures that the appropriate feature implementations are activated dynamically when required.
Fig. 5. The coordination middleware architecture
Figure 5 presents our coordination middleware, built on top of an AO-framework. The coordination middleware relies on this AO-framework to support dynamic AO-composition. The runtime composition of cross-organizational features throughout service compositions consists of two main phases: selection and activation. These phases are represented in the architecture as modular packages: the Selection and ConsumerActivation packages are part of the consumer service platform; the ProviderActivation package is included in the provider
service platform. In the rest of this subsection we first explain the selection phase; then the general structure and operation of the activation phase are described.

Selection of Cross-Organizational Features. The machinery for the selection phase of the coordination middleware consists of the FeatureSelection aspect and the PolicyManager component. FeatureSelection is imposed onto the application layer, where it intercepts requests to remote services that are subject to a certain feature policy (cf. Listing 2). The PolicyManager processes the feature policies and stores which features apply to each service binding; these data structures are hash maps with constant access time. The FeatureSelection aspect queries the PolicyManager to determine which features apply for a given service binding, and annotates the intercepted messages with the feature identifiers of the required features. It also keeps track of which service bindings have already been customized: the PolicyManager is queried for required features only for new service bindings.

Activation of Selected Cross-Organizational Features. After the necessary cross-organizational features have been selected, they need to be activated. The activation phase of the coordination middleware consists of the ConsumerActivation and ProviderActivation aspects and the ConsumerFeatureMapping and ProviderFeatureMapping components. The feature mapping components store a hash map of feature implementations and their associated AO-compositions, which makes it possible to look up the appropriate feature implementation for a given feature identifier. ConsumerFeatureMapping and ProviderFeatureMapping handle the service consumer and service provider roles of the features, respectively. The ConsumerActivation aspect imposes on the consumer service platform; at the provider side the ProviderActivation aspect intercepts all incoming messages.
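The selection machinery described above can be approximated by the following Java sketch: a PolicyManager-style hash map resolves the required features per service binding in constant time, and messages are annotated with feature identifiers only on the first use of a binding. Class and method names are illustrative assumptions, not the prototype's actual API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the selection phase: policy lookup per binding plus message
// annotation, with a cache of already-customized bindings so the policy
// store is consulted only once per binding.
public class Selection {
    // PolicyManager-style store: service binding URI -> required feature ids.
    public static final Map<String, Set<String>> policies = new HashMap<>();
    private static final Set<String> customized = new HashSet<>();

    public static Set<String> featuresFor(String binding) {
        return policies.getOrDefault(binding, Set.of());
    }

    // FeatureSelection-style step: annotate only on first use of a binding.
    public static Map<String, Object> annotate(String binding, Map<String, Object> msg) {
        if (customized.add(binding)) {      // true only for new bindings
            msg.put("features", featuresFor(binding));
        }
        return msg;
    }
}
```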
These locations are the first joinpoints, and thus the entry points, in the call flow of the remote requests, at the consumer-side middleware stack as well as the provider-side middleware stack. ConsumerActivation intercepts all messages and checks them for metadata with selected features. The selected features are compared with the features currently applied on the particular service binding to see whether any changes are necessary. If so, ConsumerActivation queries the ConsumerFeatureMapping component for the descriptions of the AO-compositions of the selected features and uses this information to compose the necessary aspect-components. The activation aspect then sends a request to the AO-framework to deploy those aspect-components. The last step in the feature activation mechanism is notifying the provider side about the feature change: the feature updates are piggybacked on the message, ensuring that the features are applied throughout the service composition in a consistent manner. At the provider side, the ProviderActivation aspect intercepts the incoming requests and checks them for feature updates. If necessary, the new features
are activated. The activation at the provider side is analogous to the consumer side. Concretely, ProviderActivation first inspects the currently activated cross-organizational features of the service binding, then applies the feature changes using the feature implementation mappings retrieved from ProviderFeatureMapping. Incoming requests are also verified to be compliant with the feature policies: the messages are inspected for unsupported or missing features, and non-compliant messages are rejected with an exception message. Since the activation aspects check all requests, cross-organizational features can also easily be deployed with a per-message scope. In that case the FeatureSelection aspect needs to query the PolicyManager for all intercepted messages, and the activation aspects ensure that features are applied only once per service binding.
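The activation logic of both sides can be sketched as follows, under simplifying assumptions: the consumer side weaves only the difference between selected and currently active features and piggybacks the updates, while the provider side rejects requests with unsupported or missing features. In the real middleware, weaving is delegated to the AO-framework; all names here are illustrative.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hedged sketch of the activation phase, consumer and provider side.
public class Activation {
    // Currently active features per service binding.
    public static final Map<String, Set<String>> active = new HashMap<>();

    // Consumer side: returns the features that had to be newly deployed.
    public static Set<String> activate(String binding, Set<String> selected,
                                       Map<String, Object> msg) {
        Set<String> updates = new HashSet<>(selected);
        updates.removeAll(active.getOrDefault(binding, Set.of()));
        if (!updates.isEmpty()) {
            // here the real middleware asks the AO-framework to weave the
            // aspect-components retrieved from the feature mapping component
            active.put(binding, new HashSet<>(selected));
            msg.put("featureUpdates", updates); // piggybacked provider notification
        }
        return updates;
    }

    // Provider side: reject requests with unsupported or missing features.
    public static void validate(Set<String> requested, Set<String> supported,
                                Set<String> mandatory) {
        for (String f : requested)
            if (!supported.contains(f))
                throw new IllegalArgumentException("unsupported feature: " + f);
        for (String f : mandatory)
            if (!requested.contains(f))
                throw new IllegalArgumentException("missing feature: " + f);
    }
}
```

Note how repeating the same selection on a binding yields no updates, which is the property the evaluation below exploits to keep the overhead low.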
5 Evaluating the Performance Overhead
The evaluation aims to measure the absolute and relative performance overhead that our coordination architecture introduces on the overall responsiveness of service applications. In particular, we want to measure the overhead introduced by the Selection and Activation components. Before presenting the evaluation of our architecture, we briefly discuss our prototype implementation. As a proof of concept, a prototype of the coordination architecture has been implemented as a framework using Java SE 6. It is built on top of our own aspect-component framework, which offers support for dynamic AO-composition. Java dynamic proxies are used to intercept invocations and call advices. All declarative specifications are expressed as XML files. The coordination architecture is developed as an extension to the service platform, enabling the dynamic composition of cross-organizational features in distributed applications. Since it is a modular and aspect-based extension, the coordination middleware can be omitted or removed when needed. Our current prototype relies on an aspect-component framework that also supports weaving various middleware features in the distribution layer. We measured the round-trip latency by comparing a version of our distribution layer in which all features are statically composed against one in which all features are dynamically woven using our coordination architecture. In particular, we focused on one security feature for the distribution layer: the implementation of a signature-based authentication feature (for signing and verifying method invocations) as described in the running example of this paper. We used the round-trip latency benchmark presented in [5], which requires minimal processing time. This way the processing time influences the results as little as possible, and the results clearly show the worst-case runtime overhead of the coordination architecture.
As described in [5], the benchmark application is composed of two JVM applications: a client and a server. The client invokes a remote service which implements an empty ping method with no arguments and a void return value. The benchmark scenario is configured as
follows. The remote service and the client run in two JVMs on different hosts in the same LAN. 100 benchmark series are executed sequentially. For each series, 20000 warm-up operations are performed; next, 20 steps of 5000 invocations are measured, and for each step an average execution time is logged. Thus 2000 average measures (100 series * 20 steps) are obtained for each benchmarked distribution layer, representing 10 million operations. This benchmark application is extended with the signature-based authentication feature and executed on the two versions of our distribution layer: the default version and the version extended with our coordination architecture. In both cases the authentication feature is deployed from the beginning. The results of the round-trip latency benchmark are presented in Fig. 6(a) and indicate that the overhead introduced by our coordination middleware is negligible. We can achieve these good results because the Selection aspect in our coordination architecture keeps track of which service bindings have already been customized. In addition, the feature activation mechanism only notifies the provider side when there are updates; in the current benchmark setup, where the feature is deployed from the beginning, this results in very few updates. To measure the relative performance overhead of our coordination architecture, we compared our prototype against standard middleware platforms, Java RMI [25] and JBoss Remoting [10], using the default benchmark application (without security features). Java RMI is a minimal distribution layer with almost no customization capabilities. JBoss Remoting is a distribution layer with a standardized interface for pluggable transport and serialization layers. The three middleware platforms all use Java serialization and TCP sockets, but differ in their composition architecture: hard-coded (Java RMI), object-based framework (JBoss Remoting), or aspect-component framework (our approach).
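For reference, the series/warm-up/step structure of the benchmark can be sketched as follows, with a local no-op standing in for the remote ping (the real benchmark of [5] measures remote invocations across two JVMs on separate hosts):

```java
// Hedged sketch of the benchmark structure: a number of series, each with a
// warm-up phase followed by measured steps, logging one average latency per
// step. With the paper's parameters (100, 20000, 20, 5000) this yields the
// 2000 averages over 10 million operations mentioned above.
public class Benchmark {
    static void ping() { /* local stand-in for the remote empty method */ }

    // Returns series * steps average latencies, in milliseconds per invocation.
    public static double[] run(int series, int warmup, int steps, int perStep) {
        double[] averages = new double[series * steps];
        int k = 0;
        for (int s = 0; s < series; s++) {
            for (int i = 0; i < warmup; i++) ping();          // warm-up operations
            for (int step = 0; step < steps; step++) {
                long t0 = System.nanoTime();
                for (int i = 0; i < perStep; i++) ping();
                averages[k++] = (System.nanoTime() - t0) / 1e6 / perStep;
            }
        }
        return averages;
    }
}
```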
Fig. 6. Subfigure (a) presents the average time (in ms) needed to execute a ping operation in the two versions of the distribution layer, with the signature-based authentication feature activated. Subfigure (b) shows the average number of minimal method operations per second in the different distribution middlewares. The box plots at the end of the bars represent the minimum, average and maximum, respectively, during the different benchmarks.

Footnote: The benchmark tests were performed on systems with an Intel Core 2 Duo 3.00 GHz processor and 4 GB memory, running Ubuntu 8.04 (hardy) and OpenJDK Runtime Environment version 1.6.0.
The results of the second benchmark are used to calculate the average number of operations per second (see Fig. 6(b)) in the different distribution middlewares. These results show that Java RMI has the best performance (as expected), but, more importantly, our coordination architecture and distribution layer introduce a relative overhead (in comparison to Java RMI) that is a factor of 6 smaller than that of JBoss Remoting. These initial results indicate that the overhead introduced by our coordination middleware is acceptable, especially in a cross-organizational context where long-running and asynchronous interactions occur more frequently than synchronous interactions with timing constraints.
6 Conclusion
Cross-organizational customization of services is not well addressed by the current state of the art. In this paper we presented an aspect-based coordination architecture for the dynamic composition of cross-organizational features in distributed software systems. A high-level feature ontology specifies for each feature the responsibilities of service consumer and provider. Using this feature ontology we are able to map feature identifiers to aspect-based feature implementations, allowing each organization to implement the features independently using an AOP technology of its choice. The underlying coordination middleware ensures that user-specific feature preferences are processed in a consistent manner throughout the cross-organizational service composition. Our architecture has been validated in a prototype, which we have benchmarked and which shows acceptable performance.
References
1. Apel, S., Kaestner, C., Lengauer, C.: Research Challenges in the Tension Between Features and Services. In: SDSOA 2008: 2nd International Workshop on Systems Development in SOA Environments, pp. 53–58. ACM, New York (2008)
2. Beznosov, K.: Engineering Access Control for Distributed Enterprise Applications. Ph.D. thesis, Florida International University, Miami, Florida, USA (July 2000)
3. Cohen, S., Krut, R. (eds.): Proceedings of the First Workshop on Service-Oriented Architectures and Software Product Lines. Carnegie Mellon University - Software Engineering Institute (May 2008)
4. Damianou, N., Dulay, N., Lupu, E., Sloman, M.: The Ponder Policy Specification Language. In: Sloman, M., Lobo, J., Lupu, E.C. (eds.) POLICY 2001. LNCS, vol. 1995, pp. 18–38. Springer, Heidelberg (2001)
5. Demarey, C., Harbonnier, G., Rouvoy, R., Merle, P.: Benchmarking the Round-Trip Latency of Various Java-Based Middleware Platforms. In: CPM 2004: The OOPSLA 2004 Component and Middleware Performance Workshop, pp. 7–24. Studio Informatica, Vancouver (2004)
6. Filman, R.E., Elrad, T., Clarke, S.: Aspect-Oriented Software Development. Addison-Wesley, Boston (2004)
7. Harhurin, A., Hartmann, J.: Service-Oriented Commonality Analysis Across Existing Systems. In: SPLC 2008: 12th International Software Product Line Conference, pp. 255–264 (2008)
8. International Organization for Standardization (ISO): Information Processing Systems - Open Systems Interconnection - Basic Reference Model - Part 2: Security Architecture, ISO 7498-2:1989 (1989)
9. JBoss Community: JBoss AOP, http://www.jboss.org/jbossaop/
10. JBoss Community: JBoss Remoting, http://www.jboss.org/jbossremoting/
11. Kang, K.C., Cohen, S.G., Hess, J.A., Novak, W.E., Peterson, A.S.: Feature-Oriented Domain Analysis (FODA) Feasibility Study. Tech. Rep. 21, Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA (1990)
12. Lagaisse, B., Joosen, W.: True and Transparent Distributed Composition of Aspect-Components. In: van Steen, M., Henning, M. (eds.) Middleware 2006. LNCS, vol. 4290, pp. 42–61. Springer, Heidelberg (2006)
13. Lee, J., Muthig, D., Naab, M.: An Approach for Developing Service Oriented Product Lines. In: SPLC 2008: 12th International Software Product Line Conference, pp. 275–284 (2008)
14. Lee, K., Kang, K.C., Kim, M., Park, S.: Combining Feature-Oriented Analysis and Aspect-Oriented Programming for Product Line Asset Development. In: SPLC 2006: 10th International Software Product Line Conference, pp. 10–112 (2006)
15. Loughran, N., Rashid, A.: Framed Aspects: Supporting Variability and Configurability for AOP. In: Bosch, J., Krueger, C. (eds.) ICOIN 2004 and ICSR 2004. LNCS, vol. 3107, pp. 127–140. Springer, Heidelberg (2004)
16. Medeiros, F.M., de Almeida, E.S., de Lemos Meira, S.R.: Towards an Approach for Service-Oriented Product Line Architectures. In: Third Workshop on Service-Oriented Architectures and Software Product Lines (SOAPL), pp. 151–164 (2009)
17. Mezini, M., Ostermann, K.: Variability Management with Feature-Oriented Programming and Aspects. In: SIGSOFT 2004/FSE-12: ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 127–136. ACM, New York (2004)
18.
Milosevic, Z., Linington, P.F., Gibson, S., Kulkarni, S., Cole, J.: Inter-Organisational Collaborations Supported by E-Contracts. In: Building the E-Service Society, pp. 413–429. Springer, Heidelberg (2004)
19. Nicoară, A., Alonso, G.: Dynamic AOP with PROSE. In: ASMEA 2005: Workshop on Adaptive and Self-Managing Enterprise Applications, pp. 125–138 (2005)
20. OMG: CORBA Security Services Specification. Version 1.8 (March 2002), http://www.omg.org/cgi-bin/doc?formal/02-03-11.pdf
21. Pawlak, R., Seinturier, L., Duchien, L., Florin, G.: JAC: A flexible solution for aspect-oriented programming in Java. In: Yonezawa, A., Matsuoka, S. (eds.) Reflection 2001. LNCS, vol. 2192, pp. 1–24. Springer, Heidelberg (2001)
22. Rouvoy, R., Eliassen, F., Beauvois, M.: Dynamic planning and weaving of dependability concerns for self-adaptive ubiquitous services. In: SAC 2009: ACM symposium on Applied Computing, pp. 1021–1028. ACM, New York (2009)
23. Söldner, G., Schober, S., Schröder-Preikschat, W., Kapitza, R.: AOCI: Weaving Components in a Distributed Environment. In: DOA 2008: Distributed Objects and Applications, pp. 535–552. Springer, Heidelberg (2008)
24. SpringSource: AOP with Spring, http://static.springsource.org/spring/docs/3.0.x/spring-framework-reference/html/aop.html
25. Sun Microsystems: Java RMI, http://java.sun.com/javase/technologies/core/basic/rmi/
26. Tai, S., Mikalsen, T., Wohlstadter, E., Desai, N., Rouvellou, I.: Transaction Policies for Service-Oriented Computing. Data & Knowledge Engineering 51(1), 59–79 (2004)
27. Truyen, E., Janssens, N., Sanen, F., Joosen, W.: Support for Distributed Adaptations in Aspect-Oriented Middleware. In: AOSD 2008: 7th International Conference on Aspect-Oriented Software Development, pp. 120–131. ACM, New York (2008)
28. Truyen, E., Joosen, W.: A Reference Model for Cross-Organizational Coordination Architectures. In: 12th International Conference on Enterprise Distributed Object Computing Workshops, vol. 0, pp. 252–263 (2008)
29. Wohlstadter, E., Tai, S., Mikalsen, T., Rouvellou, I., Devanbu, P.: GlueQoS: Middleware to Sweeten Quality-of-Service Policy Interactions. In: ICSE 2004: 26th International Conference on Software Engineering, pp. 189–199 (2004)
Co-ordinated Utility-Based Adaptation of Multiple Applications on Resource-Constrained Mobile Devices Ulrich Scholz and Stephan Mehlhase European Media Laboratory GmbH Schloß-Wolfsbrunnenweg 33, 69118 Heidelberg, Germany {Ulrich.Scholz,Stephan.Mehlhase}@eml-development.de
Abstract. Running several applications on a small, mobile device requires their constant adjustment to changing environments, user preferences, and resources. The decision about this adjustment has to consider various factors, of which the optimality of the result is only one: further non-functional aspects, including user distraction and the smoothness of operation, have to be taken into account, too. This paper explains various events causing adaptation and details several non-functional aspects to be considered. It then presents Serene Greedy, a pragmatic approach for deciding upon adaptation and non-adaptation of simultaneously running applications in resource-constrained, mobile settings. Finally, this paper discusses Serene Greedy by comparing it against other adaptation reasoning techniques with respect to performance and the mentioned non-functional properties.
1 Introduction

With the emergence of ubiquitous computing, common future scenarios will consist of people moving around carrying general-purpose mobile devices, which they use extensively to assist both leisure and business related tasks. Naturally, users expect their devices to run powerful applications as well as to run several of them simultaneously, serving different purposes at the same time. For developers of mobile applications this scenario is very challenging. Users expect applications on these devices to have capabilities close to those of contemporary laptop PCs. On top of that, such applications have to cope with various additional restrictions, such as sudden context changes, scarce resources, and limited device capabilities. Applications that meet these complex requirements have to provide the variability to adjust to the varying environment as well as a reasoning mechanism that selects the best fitting variant for every specific situation. Implementing these capabilities in addition to the application functionality is a demanding task. Developers can meet the challenges of a mobile setting by building on dedicated middleware platforms that provide reasoning and variability modeling support [5,9]. For example, utility-based adaptation reasoning makes it possible to factor out the optimization mechanism from the business logic: the developer provides a function as a measure for the usefulness of a particular application variant in a given situation; a reasoning mechanism then selects the optimal variant. While utility-based adaptation reasoning has been
This work was partly funded by the European Commission through the project MUSIC (EU IST 035166) as well as by the Klaus Tschira foundation.
F. Eliassen and R. Kapitza (Eds.): DAIS 2010, LNCS 6115, pp. 198–211, 2010. c IFIP International Federation for Information Processing 2010
Co-ordinated Utility-Based Adaptation of Multiple Applications
demonstrated to work well for individual applications, handling multiple simultaneous applications poses additional challenges that currently receive little attention.

The contributions of this paper are as follows: First, it describes issues arising when handling multiple applications, in particular the performance decrease caused by excessive adaptations as well as non-functional problems such as stalling and user distraction caused by low-yielding re-configurations. It then presents Serene Greedy, a utility-based adaptation reasoning technique suitable for co-ordinated adaptation of multiple applications. Finally, it discusses the results of applying this technique in the context of the MUSIC middleware [9].

The next section introduces terms and concepts related to utility-based adaptation of multiple applications, while Sect. 3 describes non-functional aspects of adapting them. Section 4 gives a detailed analysis of different adaptation reasons as well as their influence and importance for maintaining optimal usefulness. Section 5 presents Serene Greedy, a pragmatic approach to the adaptation of sets of simultaneously running applications. Section 6 demonstrates and discusses Serene Greedy. Section 7 reviews related work and gives further directions. Section 8 concludes the paper.
2 Utility-Based Adaptation of Multiple Applications

Non-functional adaptation of multiple applications poses various challenges for the application developer. We describe these problems in the context of an execution environment for applications that facilitates adaptation to a varying context [4]. We assume such an environment to follow an externalized approach to the implementation of self-adaptation, where the adaptation logic is delegated to generic middleware working on the basis of models of the software and its context represented at runtime. We also presuppose the use of utilities as a means to specify the objectives that guide the adaptation logic [7].

2.1 Components, Variants, Context, and Resources

Applications are assembled from components, i.e., pieces of code, and several different collections of components, each called a variant, can realize the same application. At runtime, the knowledge required for adaptation is represented by plans, where each plan contains the code of a component and information on how to assemble this component with others. Plans can be installed and removed at runtime, even those used by a running variant, so the set of available variants of an application can change dynamically.

Context is a set of values that describes the world from the view of the middleware (but not properties of the middleware itself). Applications state which context they depend on and the middleware provides the corresponding values on request. Context values change in accordance with changes in the world and, in principle, such changes are outside the control of the middleware, the applications, and the user.

Resources, e.g., memory and CPU, are specific context values whose availability determines whether a variant can be executed in a specific situation. Each variant announces a specific, fixed amount for each resource that it requires. Resources are assigned to variants by the middleware and only the middleware can take them away.
Consequently, running variants can continue to run regardless of resource changes.
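The runtime model sketched above, with applications assembled from plans into variants, can be illustrated with minimal data structures. The class and field names are purely illustrative and not part of any middleware API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Plan:
    """A plan: component code plus assembly information (sketched)."""
    component: str

@dataclass
class Variant:
    """One way to realize an application from a set of plans,
    with a fixed demand per resource type."""
    plans: frozenset
    demand: dict

installed = {Plan("ui-basic"), Plan("ui-rich"), Plan("net")}
rich = Variant(frozenset({Plan("ui-rich"), Plan("net")}), {"memory_kb": 100})

def uses_installed_plans(variant, installed_plans):
    """A variant can only be considered if every plan it uses
    is currently installed; removing a used plan invalidates it."""
    return variant.plans <= installed_plans

print(uses_installed_plans(rich, installed))                  # True
print(uses_installed_plans(rich, installed - {Plan("net")}))  # False
```

Because plans can be removed at runtime, re-checking this condition after each plan change is what makes the set of available variants dynamic.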
U. Scholz and S. Mehlhase
An application variant is valid if it is given enough resources, i.e., if for each resource the required amount is smaller than what is available, and if it uses installed plans only; otherwise, it is invalid. In addition, a valid, running variant becomes invalid if the user removes a plan that is used by that variant.

2.2 Utility and the Utility Function

Many variants provide the same function to the user (e.g., participation in a picture-sharing community), but often with different quality (e.g., with respect to reliability and bandwidth). The degree to which a particular variant has the potential to satisfy the user's needs is called the utility of that variant, which is a real number between zero and one (worst and best). Each application variant has an associated utility function and the utility of a running application at a specific point in time is given by evaluating the current variant's utility function on the current context. Formally, the utility function is a mapping f_u: V × C → [0, 1], where V is the set of variants and C is the set of possible contexts. As shorthand, we say utility function of variant v to refer to the utility function where the variant is held constant to v. Because the function f_u is arbitrary, the utility values of a variant under two different contexts are unrelated in the general case; likewise, the utilities of two different variants under the same context are unrelated.

When working with multiple applications, the user takes interest in them to varying degrees. To allow the user to indicate his preferences to the middleware, it is possible to assign a priority to applications, which is a real number between zero and one (lowest and highest). We call an unprioritized utility a raw utility.
Priorities enable the user to weight the relevance of applications according to his/her real needs: giving low priority to an application with high utility indicates that this application does not help the user much even though it provides optimal service on an absolute scale. The product of priority and raw utility of an application is the application's weighted utility, and the sum of the weighted utilities of all n running applications, normalized by the sum of their priorities, is the overall weighted utility

  u_ow = (Σ_{i=1}^{n} p_i u_i) / (Σ_{i=1}^{n} p_i),

or simply overall utility. Utility-based adaptation assumes that u_ow equals user satisfaction and aims to keep this number high at all times.

2.3 Application States and Adaptation

Depending on the interest of the user, an installed application can be in use or not. Consequently, applications can be in two different states, called installed and running; users can start and stop them. Figure 1 gives a state diagram of the possible transitions. On starting an application, the middleware selects and configures its initial variant. After an event that might render the current variant sub-optimal, the middleware has to adapt, i.e., to re-consider all currently valid variants of all running applications together with their priorities. If necessary, it then has to exchange the current variant of some applications with another variant. The first step in this process is called adaptation reasoning, the second re-configuration. The latter step handles state transfer transparently. Besides re-configuring a running application and leaving it untouched, adaptation reasoning can also decide to terminate an application, i.e., to stop it without user request:
Fig. 1. State diagram of application states (states: installed, starting, running; transitions: start, stop, adapt; a start can be aborted and a running application terminated)
If the running applications consume all available resources then running another one is impossible. In the component-based approach, termination is always necessary if an application no longer has a valid variant, as well as if a set of applications does not have a set of variants that is valid in combination (i.e., every combined set contains an invalid variant). In the latter case, one or more applications of the set have to be terminated. For the same reasons as with termination, adaptation reasoning can abort the start of an application. The next section shows another possible reason for termination.

2.4 Indirect Dependencies between Applications

In this work, we consider the adaptation of multiple, independent applications running on the same device. Although such applications do not functionally depend on each other, there is an indirect dependency among them via their shared use of system resources. In resource-constrained settings, giving more resources to one application requires taking them away from another. Therefore, distributing the available resources is part of finding a valid variant set. The same is true if the weighted utility of an application changes. For this reason, finding the variant set with the highest overall utility in general requires considering all running applications.

Because of indirect dependencies, the maintenance of the optimal overall utility can cause termination: consider the case of two applications that can run simultaneously, i.e., that have variants that are valid in combination. If a variant of one application has a high weighted utility but uses so many resources that no variant of the second can run, then this variant alone might yield a higher overall utility than running any valid pair of variants. In this case, the middleware might stop the second application.
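This termination case can be made concrete with a short numeric sketch. The numbers are illustrative; `overall_utility` implements the u_ow formula of Sect. 2.2 and validity is reduced to a single memory budget:

```python
from itertools import product

def overall_utility(apps):
    """u_ow = sum(p_i * u_i) / sum(p_i) for (priority, raw_utility) pairs."""
    return sum(p * u for p, u in apps) / sum(p for p, _ in apps)

# Each variant: (priority, raw_utility, memory demand in kB).
app_a = [(1.0, 0.9, 160), (1.0, 0.5, 60)]   # high-priority application
app_b = [(0.2, 0.8, 80)]                     # low-priority application
available = 190

# A pair of variants is valid only if its combined demand fits the budget.
pairs = [(a, b) for a, b in product(app_a, app_b)
         if a[2] + b[2] <= available]
best_pair = max(overall_utility([(v[0], v[1]) for v in pair]) for pair in pairs)
alone = overall_utility([app_a[0][:2]])

# The resource-hungry variant of A alone beats every valid pair, so the
# middleware might stop application B entirely.
print(alone > best_pair)  # True
```

Here the only valid pair yields u_ow = (0.5 + 0.16) / 1.2 = 0.55, while running the 160 kB variant of A alone yields 0.9, which is exactly the indirect-dependency effect described above.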
3 Non-functional Aspects of Adaptation

The utility-based adaptation approach takes the overall weighted utility as the sole measure for the quality of its result. In other words, utility is thought to equal user satisfaction. Ideally, the middleware constantly adapts the running applications such that their current variants provide optimal overall utility at all times. Changes in the variant set remain unnoticed by the user except for modifications in application functionality. The user is expected to approve these user-perceivable changes because they are essential for maintaining high utility, i.e., user satisfaction. Obviously, this approach is based on strong assumptions: it requires the application designer to encode the user's perceived application quality into a real value between
zero and one. But there are additional user-visible effects that influence user satisfaction, especially when considering multiple applications. Even if each adaptation constantly upholds an optimal utility, the side-effects of this mechanism are noticeable by the user. In the following we discuss the side-effects performance decrease, stalling of applications, fidgetiness, and application termination.

Performance decrease: Adaptation requires processing resources, and if the reasoning runs on a machine that is shared with applications then these resources are not available for the latter. Because, in the worst case, finding an optimal set of variants is exponential in the number of variants, the user might experience a significant performance decrease. For multiple applications this problem is particularly prominent because it results in long search times even if each application has a moderate number of variants.

Application stalling: After the reasoning mechanism has decided upon the new variants of applications, they are re-configured by the configuration middleware. This process requires suspending and re-starting applications, which the user experiences as stalling. In general, we can assume that the middleware limits the negative effects of re-configuration by detecting and discarding requests for unchanged applications.

Fidgetiness: User-visible changes, e.g., of the GUI and of application functionality, have the potential to annoy the user. In the case of explicit user actions (e.g., disconnection of a device or a change of location), the user "understands" and endorses a resulting adaptation. Drastic changes of unimportant applications that yield only slight improvements are less accepted. Systems that exhibit such behavior are fidgety. Consider the case of an application with two visually distinct variants.
The utility functions of all variants are almost the same, except for the dependency on a binary context property: each variant is slightly better than the other for one of the property values. If the property quickly oscillates between these two values, the application changes accordingly, although the absolute utility improvement is negligible for each change. In such cases, suppressing re-configurations for small improvements reduces the fidgetiness of the system.

Application termination: The user can be displeased by applications that terminate on their own, i.e., without explicit user request. The same alienation can occur if the user starts an application, the middleware decides it is better not to run it, and aborts the start. As detailed in Sect. 2.3, this drastic measure must be taken if an application cannot be started or cannot continue to run, and it might be the result of maintaining the optimal overall utility.
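Suppressing re-configurations for small improvements amounts to a simple significance guard. The sketch below is illustrative; the threshold value is an assumption, not a prescribed constant:

```python
def should_reconfigure(current_utility, candidate_utility, threshold=0.1):
    """Keep the running variant unless the candidate's utility gain
    reaches the significance threshold."""
    return candidate_utility - current_utility >= threshold

# A quickly oscillating binary context flips the best variant back and
# forth, but each flip gains almost nothing, so the guard suppresses it.
print(should_reconfigure(0.78, 0.80))  # False: gain 0.02, stay put
print(should_reconfigure(0.60, 0.80))  # True: gain 0.20, adapt
```

The same idea, generalized to weighted utilities and adaptation-event classes, underlies the Serene Greedy technique of Sect. 5.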
4 Adaptation Events and Affected Applications

Mobile applications have to react to changes in their environment. If such adaptation events can affect the utility of the currently running applications then the middleware may have to adapt. Adaptation events can occur at any time and in any number. An application can be affected by one or more events or it can be unaffected. For each of the various combinations, the consequences for the adaptation and the overall system differ. In the following, we first examine the different events and then classify applications according to the events they are affected by.
4.1 Adaptation Events

For our notion of utility-based adaptation, we can distinguish five classes of events:

Changes in Application Status: A user request to start or to stop an application results in an event of this kind.

Plan Changes: Plans can be installed and uninstalled by the user at any time. Adding a plan can affect the utility of an application because it possibly allows new variants that improve utility. Removing a used plan renders the using application invalid; removing an unused one has no effect.

Context Changes: If an application depends on a particular context element then changes in that element can cause a change in the utilities of all variants of that application. Because the mapping from context to utility is arbitrary and cannot be foreseen, finding the new best variant requires examining all variants. On the other hand, a running application can continue to run regardless of changes in context, although its utility might no longer be optimal.

Priority Changes: The priority of an application scales the raw application utility. Consequently, a change in priority affects neither the validity of a variant nor the utility ordering of the variants of an application. Of course, changing application priority can render the current variant set sub-optimal via indirect dependencies.

Resource Changes: Variants of an application differ in their use of system resources. Because of indirect dependencies between applications, the amount of available resources determines the set of valid variants.

4.2 Classification of Applications Affected by Adaptation Events

An adaptation event can affect the running applications in different ways: for example, if the user stops one application, the others can continue to run without change. On the other hand, preserving the optimal utility requires considering all applications in combination: if one application adapts, the others have to adapt, too.
In the following we define four classes into which we group the running applications in case of an adaptation event. To which class an application belongs depends on the kind of event and on whether the application is directly affected or not. Loosely speaking, the classes are ordered top-down according to the "seriousness" of not adapting their applications. The classes are mutually exclusive; if an application could belong to several classes then it is included in the one mentioned first. For example, if an application is affected by a change of context and of resources, it is in class "Utilities Changed".

Adaptation Required: Contains applications that are started or stopped, or that use a removed plan. Applications in this class must be adapted by the middleware. Applications not in this class do not require adaptation, i.e., all variants valid before the event are valid afterwards, although they might yield a low utility. For these applications the middleware can decide to skip adaptation reasoning and re-configuration.

Utilities Changed: Contains applications affected by a context change. Because the utilities of the affected variants change arbitrarily, it is unknown without adaptation whether there is a better valid variant.

Utilities Similar: Contains applications with new plans and with a new priority. Furthermore, it contains any application in case of a resource increase. The utility functions
of previously valid variants of an application with new plans are unchanged, while new variants might be available. The same is true for any application in case of a resource increase. If an application has a new priority, its variant set is unchanged but its utility function is scaled by a constant factor.

Unaffected: Contains all applications not directly affected by any adaptation event. Adapting an unaffected application is least likely to improve overall utility: provided it is given the same amount of resources, adaptation reasoning will decide for the currently running variant again. Because of indirect dependencies between applications, it might still be useful to adapt unaffected applications along with affected ones.
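The first-match-wins grouping above can be sketched in a few lines. The event tags are illustrative names, not part of any middleware API:

```python
def classify(app, events):
    """Assign an application to the first matching class of Sect. 4.2.
    `events` is the set of event tags affecting this application."""
    if {"started", "stopped", "used_plan_removed"} & events:
        return "Adaptation Required"
    if "context_changed" in events:
        return "Utilities Changed"
    if {"plan_added", "priority_changed", "resource_increase"} & events:
        return "Utilities Similar"
    return "Unaffected"

# An app hit by both a context change and a resource increase falls into
# the class mentioned first, as the mutual-exclusion rule requires.
print(classify("a", {"context_changed", "resource_increase"}))  # Utilities Changed
print(classify("a", set()))                                     # Unaffected
```

Testing the classes strictly in order implements the "included in the one mentioned first" rule directly.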
5 The Serene Greedy Adaptation Reasoning Technique

This section presents an adaptation reasoning technique designed for resource-constrained mobile devices that pragmatically balances optimality against the non-functional aspects of adaptation presented in Sect. 3. In principle, these aspects are relevant for the adaptation of multiple applications in general. As different settings require different adaptation mechanisms, we state several properties that we assume to be present in resource-constrained platforms. We continue by detailing two adaptation mechanisms that have been demonstrated applicable to solve the adaptation problem. Finally, we present the Serene Greedy adaptation technique.

5.1 Adaptation in Resource-Constrained, Mobile Settings

Adaptation in a mobile setting is assumed to be performed by a single algorithm running as part of the middleware; there are no resources to perform extensive negotiations or to wait for external consultation. The adaptation process is atomic from the viewpoint of the applications and potentially affects all applications controlled by the middleware.

Application adaptation has two parts: adaptation reasoning and re-configuration. Changing a running variant as well as starting an application requires both steps. Adaptation reasoning is computationally expensive but does not stop the applications that are reasoned about. Re-configuration is cheaper than reasoning but requires suspending and re-starting a running application. Performing adaptation reasoning for an application yields a list of all valid variants, sorted by utility. It always considers all variants of an application; caching and pre-processing between different adaptations are not used. Nevertheless, adaptation reasoning has to be performed at most once per application during one adaptation process.
In the following, "applications" refers only to those that can potentially run after adaptation, i.e., to non-stopped running applications as well as newly started ones.

5.2 Brute Force and Greedy

The Brute Force adaptation technique [1] can serve as a baseline for adaptation reasoning. It searches through all sets of variants of all applications. In particular, it always performs adaptation reasoning for all applications and it does not distinguish between adaptation events. Termination handling is taken into account by applying two optimization criteria: the first prefers large valid variant sets, the second optimizes overall
sereneGreedy
  c_sig := 0.1  /* Double value in the range [0,1] */
  A := set of all applications; sumP := 0
  while (|A| > 0)
    S := {t | a in A, t := getSereneGuess(a, c_sig), t != null}
    if (S == {})
      terminateOrAbortStarting(A)
      return
    else
      (p_a, u_a, v_a) := tuple in S with highest p_a*u_a
      A := A \ {a}; sumP := sumP + p_a
      if (p_a*u_a/sumP >= c_sig || cannotContinueToRun(a))
        establishVariant(v_a)
      else
        continue(a)

Fig. 2. The Serene Greedy reasoning method
weighted utility. Therefore, Brute Force will prefer a large, low-yielding variant set over a single variant with high utility. If all applications can run, i.e., in a resource-rich setting, Brute Force will yield the optimal utility. On the downside, it is exponential in the number of applications: if it is applied to p applications with q variants each then it considers q^p variant sets.

The Greedy adaptation technique [1] performs adaptation reasoning on each application individually. It then selects applications one by one, preferring those that provide a valid variant yielding the highest weighted utility. If the resources are used up then it stops the remaining applications or aborts their start. Usually, Greedy evaluates far fewer variant combinations than Brute Force, i.e., only up to p × q. A drawback of Greedy is that the selected application variants may quickly exhaust the available resources. Thus, the user will often be able to run fewer applications than with an optimal Brute Force approach.

5.3 Serene Greedy

The simplest way to prevent the non-functional downsides of adaptation is to not adapt. Reasoning techniques such as Brute Force and Greedy, which always reason about all applications and re-configure indiscriminately, are prone to waste resources, stall applications, and annoy the user by being fidgety. But obviously, not adapting – if possible at all – will likely result in a sub-optimal overall utility.

Serene Greedy tries to reach a pragmatic balance between optimality and the non-functional aspects of adaptation in two ways: (i) it uses a notion of significance, i.e., it tries to make unforced changes to the system only if the improvement is deemed significant, and (ii) it tries to guess whether an adaptation is significant or not, based on the classification of applications affected by adaptation events (cf. Sect. 4.2). Figures 2 to 4 present the Serene Greedy algorithm.
Figure 2 shows the main loop, which collects guesses of achievable weighted utility, chooses the application with the best guess, and then either re-configures it or keeps it running unchanged. The latter option is taken if it is available and if a change would not yield a significant improvement.
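The main loop of Fig. 2 can be transcribed into runnable Python. This is a sketch: precomputed guess tuples stand in for getSereneGuess, and `cannot_continue` is a plain set rather than a middleware query:

```python
C_SIG = 0.1

def serene_greedy(guesses, cannot_continue=frozenset()):
    """Sketch of the main loop of Fig. 2. `guesses` maps each application
    to a (priority, utility, variant) tuple, or None if the application
    has no valid variant. Returns (decisions, terminated)."""
    remaining = dict(guesses)
    sum_p, decisions, terminated = 0.0, {}, []
    while remaining:
        usable = {app: t for app, t in remaining.items() if t is not None}
        if not usable:
            # No remaining application has a valid variant:
            # terminate it (or abort its start).
            terminated.extend(remaining)
            break
        a = max(usable, key=lambda app: usable[app][0] * usable[app][1])
        p_a, u_a, v_a = remaining.pop(a)
        sum_p += p_a
        if p_a * u_a / sum_p >= C_SIG or a in cannot_continue:
            decisions[a] = ("establish", v_a)  # re-configure (or start)
        else:
            decisions[a] = ("continue", None)  # keep running unchanged
    return decisions, terminated

# A dominates; B fails the significance test and keeps its variant;
# C has no valid variant and is terminated.
guesses = {"A": (0.5, 0.9, "A1"), "B": (0.15, 0.4, "B3"), "C": None}
decisions, terminated = serene_greedy(guesses)
print(decisions, terminated)
```

For B, the weighted utility 0.06 divided by the accumulated priority sum 0.65 is about 0.09, which falls below c_sig, so B's running variant is kept.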
getSereneGuess(a, c_sig): (p, u, v)
  if (adaptationRequired(a) || cannotContinueToRun(a))
    return adaptationReasoning(a)
  else
    v_a := running variant of a
    u_a := raw utility of v_a under the current context
    p_a := current priority of a
    if (!unaffected(a) && u_a < 1 - c_sig && p_a >= c_sig)
      return adaptationReasoning(a)
    else
      return (p_a, u_a, v_a)

Fig. 3. Making serene guesses

cannotContinueToRun(a): Boolean
  Yields true if either application a is currently not running or if there is no valid variant of a that can run with the available resources; false otherwise.

adaptationReasoning(a): Tuple of priority, utility, and variant
  Performs adaptation reasoning on application a. Returns the tuple (p_a, u_a, v_a), where v_a is a valid variant that provides the optimal raw utility u_a and p_a is the priority of a. It returns null if there is no valid variant. On the first call to this function, all variants of a are considered; subsequent calls simply return a cached result.

establishVariant(v_a) and continue(a)
  After applying the first method, v_a is the running variant of application a: if v_a is currently running then it remains untouched; otherwise, the method re-configures or starts v_a. The second method keeps the running variant of application a unchanged.

adaptationRequired(a): Boolean and unaffected(a): Boolean
  These functions yield true if application a is classified in the respective class according to Sect. 4.2; false otherwise.

terminateOrAbortStarting(A)
  All applications in set A are terminated if running; otherwise starting them is aborted.

Fig. 4. Additional functions and methods of Serene Greedy
Figure 3 supplies the serene guesses: if an adaptation is required, it reports the resulting variant and its utility. For applications in class "Unaffected" it hands back the running variant and its current utility. For applications in classes "Utilities Changed" and "Utilities Similar", adaptation reasoning is performed only if their current raw utility is less than excellent (u_a < 1 − c_sig) and their priority allows for a significant weighted utility (p_a ≥ c_sig). Figure 4 describes the functions and methods used by the algorithm.
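The guess logic of Fig. 3 also translates directly to Python. Again a sketch: the application record is a hypothetical dictionary and `reason` stands in for full adaptation reasoning:

```python
C_SIG = 0.1

def serene_guess(app, reason):
    """Transcription of getSereneGuess (Fig. 3): reason only when forced,
    or when the app is affected, not near-optimal, and significant."""
    if app["adaptation_required"] or app["cannot_continue"]:
        return reason(app)
    p_a, u_a, v_a = app["priority"], app["raw_utility"], app["variant"]
    if not app["unaffected"] and u_a < 1 - C_SIG and p_a >= C_SIG:
        return reason(app)
    return (p_a, u_a, v_a)

def reason(app):
    """Hypothetical full reasoning: returns the best valid tuple."""
    return (app["priority"], 0.95, "best")

# An affected app already running close to the optimum (u >= 1 - c_sig)
# keeps its variant without any reasoning effort.
app = {"adaptation_required": False, "cannot_continue": False,
       "unaffected": False, "priority": 0.5, "raw_utility": 0.92,
       "variant": "v1"}
print(serene_guess(app, reason))  # (0.5, 0.92, 'v1')
```

Lowering `raw_utility` to, say, 0.5 makes the same application pass the significance test, and the guess then comes from full reasoning instead.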
6 Discussion

In this section we demonstrate and discuss the Serene Greedy adaptation mechanism with regard to the non-functional aspects detailed in Sect. 3. The demonstration uses artificial applications that clearly exhibit the effects under discussion. Of course, an evaluation with real applications and users would be preferable. But because the impact of, e.g., fidgetiness is subjective, such studies require large samples and are out of the scope of this
paper. The applied algorithms were implemented as part of the MUSIC framework [9], which provides an externalized approach as described in Sect. 2.1. The source code of MUSIC and of the applications used in this section is available online.1

For pragmatic reasons, MUSIC does not consider resource changes as adaptation-causing events, i.e., it categorizes an application that is only affected by a resource change as "Unaffected" and not as "Utilities Similar". MUSIC currently disregards these events because they happen frequently and considering them would result in constant adaptation reasoning. Note, however, that the arguments given in Sects. 3 and 4 remain valid despite this change, Serene Greedy is demonstrated according to Sect. 5, and resources are heeded during adaptation reasoning. Section 7 details an extension that would allow taking advantage of resource changes.

Serene Greedy requires as input a value for significance. The higher this value, the more applications remain unchanged during adaptation. Certainly, this value should be chosen in accordance with the applications under consideration: important changes should yield a significant utility increase. For the following demonstration we have decided on a significance value of c_sig = 0.1.

6.1 Performance

The performance of an adaptation mechanism strongly depends on the number of evaluated variants. A multi-application setting allows reducing this number in an easy way by avoiding the adaptation of unaffected applications. We demonstrate this effect with four applications in a resource-rich setting, where only two of them depend on a specific context element. Each of the applications has 10 different variants. After a context change event, we measure the total number of evaluated variants and the total number of applications that were subject to adaptation reasoning. We also measure the average adaptation time on a normal PC. The results of this scenario are summarized in Table 1.

Table 1. Performance of different reasoning algorithms

  Algorithm       Avg. time (ms)   # Variants   # Apps reasoned about
  Brute Force         117.72         10 000               4
  Greedy               15.70             40               4
  Serene Greedy         9.41             20               2
The numbers clearly show the performance differences of the three algorithms: Brute Force evaluates an exponential number of variants, which results in a long run time. Furthermore, Brute Force and Greedy reason about all applications, while Serene Greedy reasons only about two. Note that all three algorithms re-configure only the two applications that are affected by the context change.

Although certainly synthetic, the given scenario is realistic. The number of applications is kept small, as on small devices most users do not run many applications at the same time. Also, applications usually differ in their context dependencies. Note that using a resource-constrained setting, e.g., a small mobile device, would show an even more significant performance gain of Serene Greedy over the other two.
1 http://developer.berlios.de/projects/ist-music
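The variant counts of Table 1 follow directly from the search-space sizes, assuming four applications with ten variants each, two of them affected by the context change:

```python
p, q = 4, 10          # applications and variants per application
affected = 2          # applications depending on the changed context

brute_force = q ** p           # all variant combinations
greedy = p * q                 # per-application reasoning, all apps
serene_greedy = affected * q   # reasoning only for affected apps

print(brute_force, greedy, serene_greedy)  # 10000 40 20
```

The measured run times track these counts: the exponential factor dominates Brute Force, while Serene Greedy halves Greedy's work by skipping the two unaffected applications.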
Table 2. Priorities, utilities, and memory requirements of applications A and B

  Variant   Priority   Utility (context "1")   Utility (context "2")   Memory requirement (kB)
  A1          0.5              0.9                     0.8                      100
  A2          0.5              0.8                     0.7                       40
  B1          0.15             0.8                     0.1                      120
  B2          0.15             0.3                     0.4                       70
  B3          0.15             0.4                     0.3                       80
6.2 Fidgetiness and Stalling

Fidgety applications waste resources on reasoning and often stall. The following experiment shows how Serene Greedy improves on both effects while losing only little utility. The set-up consists of two applications A and B in a resource-constrained setting that does not allow all variant combinations to be valid. The two applications depend on a context element that oscillates between the values "1" and "2". Furthermore, application A has a higher priority than application B. Table 2 gives the priorities of the applications and the utility values of their variants, as well as their memory requirements. The available memory is 190 kB.

Scene 1 in Fig. 5 shows the initial situation of the experiment, using context value "1". The variant set with optimal overall utility, A1, B1, is not valid because of resource constraints. Therefore, Brute Force selects the set A2, B1. Greedy and Serene Greedy both select A1 because it has the highest weighted utility. We also assume that they both select B3, so that they initially yield the same overall utility. After changing the context to "2" (scene 2), all reasoners change to the set A1, B2. Brute Force adapts both applications, the others only one. On the following context change back to "1" (scene 3), their behavior differs: Brute Force selects its initial variant set, thus re-configuring both applications; Greedy re-configures application B; while Serene Greedy does not re-configure at all. The additional change from B2 to B3 yields an increase of overall utility of about 0.04, which is insignificant. Regarding overall utility, Brute Force always yields the best values: 0.8, 0.71, and 0.8, while the others are sub-optimal under context "1". After reaching context "1" the second time, Greedy is slightly better than Serene Greedy, because the latter waives the re-configuration to B3.
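The overall utilities quoted for Brute Force can be re-derived from Table 2 with a quick numeric check:

```python
def overall(pairs):
    """Overall weighted utility for (priority, raw_utility) pairs."""
    return sum(p * u for p, u in pairs) / sum(p for p, _ in pairs)

# Scene 1, context "1": Brute Force runs A2 (u=0.8) and B1 (u=0.8).
print(round(overall([(0.5, 0.8), (0.15, 0.8)]), 2))  # 0.8
# Scene 2, context "2": all reasoners run A1 (u=0.8) and B2 (u=0.4).
print(round(overall([(0.5, 0.8), (0.15, 0.4)]), 2))  # 0.71
```

The same function shows why Greedy and Serene Greedy start with A1: its weighted utility 0.5 × 0.9 = 0.45 exceeds that of every B variant.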
The experiment shows that Serene Greedy re-configures, i.e., stalls, less than the other two while yielding sub-optimal but comparable utility compared to Greedy for applications in classes other than "Unaffected". The latter observation is supported by an analysis of the maximal difference between the two reasoners in case of a sub-optimal decision of Serene Greedy for a single application a after a sequence of identical ones. This difference is at most the overall utility using the optimal variant minus the one using the current, i.e.,

  u_ow^max − u_ow^cur = p_a × (u_a^max − u_a^cur) / (p_a + p^sum),

where u_a^max, u_a^cur, and p^sum are the current maximal raw utility of a, the current utility of the running variant of a, and the sum of priorities of all applications accepted prior to a, respectively. Because of c_sig > p_a and u_a^max ≥ u_a^cur ≥ 1 − c_sig (cf. Fig. 3), each individual decision of Serene Greedy is at most c_sig / (1 + c_sig^(−1) × p^sum) ≤ c_sig below the optimum achievable in this situation. Multiple fidgety applications are uncritical, too, because with an increasing number
Fig. 5. Behavior of the different adaptation mechanisms (overall utility, between 0.7 and 0.8, of Brute Force, Greedy, and Serene Greedy over scenes 1 to 3; numbers on the edges indicate the number of re-configured applications)
of running applications their individual contributions decrease. The error term reflects this correlation by an increasing denominator.

As demonstrated, Serene Greedy can reduce fidgetiness when adapting multiple applications. Nevertheless, this problem cannot be solved by a reasoning technique alone: a fidgety system disturbs the user, but how much it does so is subjective and also depends on the applications. For one user a perceivable change can be important, while another does not care. Likewise, a slight improvement in utility can make the difference for one user but not for another. In the end, overcoming fidgetiness will require co-ordinating the applications as well as foresight by the developer and user involvement.

6.3 Application Termination

In a resource-scarce setting, the involuntary termination of applications can be unavoidable. The same can result from maintaining the optimal utility. As such behavior will likely annoy the user, it has to be minimized. Unfortunately, the problem of keeping a maximal number of applications running is a variant of the Knapsack problem and thus NP-complete [6]. Brute Force, which is optimal in this respect, evaluates an exponential number of variant combinations. Greedy and Serene Greedy perform better with respect to run time; consequently, they are sub-optimal regarding utility and stop more applications.

In general, the problem of termination becomes more prominent for systems that run adaptation techniques limiting the search effort in favor of performance: they might miss variant sets that keep applications running and therefore terminate more applications than necessary. Thus, performance and susceptibility to termination have to be considered in combination when choosing an adaptation algorithm. At least, an adaptation mechanism should keep important applications running, i.e., those with high weighted utility, and notify the user about those it decides to stop.
Serene Greedy adheres to the first requirement by preferring high yielding applications. Clearly, the second one is a task for the middleware as a whole.
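To make the Knapsack connection concrete, the following sketch (our own illustration; the tuple layout and numbers are invented, not taken from the MUSIC implementation) contrasts an exhaustive search that keeps the largest number of applications with a greedy pass ordered by weighted utility, which can terminate more applications than necessary:

```python
from itertools import combinations

# Illustration only: applications are (weighted_utility, resource_need) pairs
# and "capacity" is the scarce resource. Keeping as many applications running
# as possible is a Knapsack-style problem, so the exact search is exponential.
def brute_force_keep(apps, capacity):
    """Return a largest subset of apps whose total resource need fits capacity."""
    for size in range(len(apps), 0, -1):
        for subset in combinations(apps, size):
            if sum(need for _, need in subset) <= capacity:
                return list(subset)
    return []

def greedy_keep(apps, capacity):
    """Admit apps in order of decreasing weighted utility; skip what no longer fits."""
    kept, used = [], 0
    for utility, need in sorted(apps, reverse=True):
        if used + need <= capacity:
            kept.append((utility, need))
            used += need
    return kept

apps = [(10, 6), (4, 5), (4, 5)]
brute_force_keep(apps, capacity=10)  # keeps the two low-utility applications
greedy_keep(apps, capacity=10)       # keeps only the one high-utility application
```

The greedy pass honours the "keep important applications running" requirement (it admits the high-utility application first) but stops more applications than the optimum.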
7 Related Work and Further Directions

According to the roadmap presented in [2], our work on non-functional effects addresses challenges for the engineering of self-adaptive systems, among them understanding the various aspects of self-adaptation, such as user needs and system properties, as well as classifying the available modeling dimensions. Serene Greedy falls into the category of effects of adaptation: it prefers the optimality and continued operation of applications with high
210
U. Scholz and S. Mehlhase
utility over those with low utility. Its decisions are predictable because high-value applications are likely to continue in case some applications must be stopped; it reduces overhead by not adapting low-value applications; and it increases resilience because remaining operational is preferred over an insignificant performance gain.

Systems that take non-functional aspects into account when adapting usually aim at more "high-level" aspects than the ones covered by this paper. Cheng et al. [3] present a language to describe non-functional objectives and information about the system, which allows an adaptation mechanism to take the described aspects into account. Aura [5] aims at selecting optimal providers for resources and other characteristics (e.g., security) to keep the user undistracted. Pladian et al. [8] try to improve user experience by anticipating the future resource needs of multiple running applications and take into account the overhead of adapting. Serene Greedy tries to limit this overhead by deciding not to adapt and not to re-configure. While these methods address important non-functional aspects of adaptation, they remain susceptible to the "low-level" ones related to the actual search (or non-search) through the available set of variants. On small mobile devices, and for future demanding applications, we consider these search-related issues highly important because the optimality of a decision can easily be outweighed by the effort required to find it. Sykes et al. [10] consider the frequency of adaptation and the related delay when adapting single applications, thus improving on stalling and user distraction. Our approach considers the adaptation of multiple applications, thus taking into account the fidgetiness caused by adapting unimportant applications.

The remainder of this section presents directions in which we plan to extend Serene Greedy. As explained in Sect.
6, its implementation as presented in this paper does not use resource change events, so it misses the chance to gain valuable increases in utility on resource-constrained devices. We plan to remove this limitation as follows: define an adaptation delay and, for each resource, a significant amount. For each resource, when reasoning about an application, record the need of the optimal variant. If more than this amount is available, disregard changes; otherwise, on significant changes, adapt after the given delay. With this strategy, resource changes are taken into account while the undesired non-functional effects are limited.

A way to limit stalling caused by re-configuration is to stop and re-start only those parts of an application that differ between the variants; unaffected parts can continue to run. Imagine an application with a GUI and a business component. If adaptation decides to exchange the latter but leaves the GUI unchanged, the user might not notice the change. Realizing this technique transparently requires the application designer to provide information about which parts of an application can be adapted independently, new reasoning techniques that take the perpetuation of components into account, and administration code that buffers communication between re-configured and unchanged components.

As mentioned before, the utility function can be kept (in part) in memory, which allows filtering variants beforehand and thus saves re-evaluating them. Consequently, the time to decide upon a good variant set can be reduced. On the downside, the memory used for storing this information is not available to the applications, so this approach might yield sub-optimal utility. We plan to explore the use of more structured utility functions, where the same effect can be achieved with less memory.
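The proposed resource-change strategy can be sketched as follows (a minimal sketch; the threshold values, names, and time unit are our assumptions, not part of the paper's implementation):

```python
# Sketch of the planned resource-change handling: per resource, record the
# optimal variant's need, ignore changes while the headroom stays above a
# per-resource "significant amount", and otherwise adapt only after a delay.
ADAPTATION_DELAY_S = 5.0                 # assumed adaptation delay (seconds)
SIGNIFICANT_AMOUNT = {"memory_kb": 512}  # assumed per-resource thresholds

def adaptation_due(resource, available, recorded_need, now):
    """Return the earliest time to adapt, or None if the change is ignored."""
    headroom = available - recorded_need
    if headroom >= SIGNIFICANT_AMOUNT[resource]:
        return None                      # more than the significant amount left
    return now + ADAPTATION_DELAY_S      # significant change: adapt after delay

adaptation_due("memory_kb", 4096, 2048, now=100.0)  # ignored -> None
adaptation_due("memory_kb", 2300, 2048, now=100.0)  # adapt at 105.0
```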
Co-ordinated Utility-Based Adaptation of Multiple Applications
211
8 Conclusions

The operation of multiple adaptive applications on small, mobile devices requires handling them in a co-ordinated way; otherwise, non-functional effects of the adaptation process can obstruct and annoy the user. This paper contributes towards this end in three ways: (i) we discuss the problem of indirect dependencies between applications in resource-constrained settings and identify the resulting non-functional aspects of adaptation, in particular the problems of application termination and of fidgetiness, i.e., disturbing adaptations of unimportant applications; (ii) we then analyze the different events that can cause applications to adapt; this analysis allows us to classify running applications according to the consequences for the system and the utility of not adapting them; finally, (iii) we present Serene Greedy, an adaptation method based on the given classification. We compare an implementation of this method with two other adaptation techniques for the MUSIC middleware. The results show that Serene Greedy reduces the stalling and fidgetiness of adapting multiple applications while providing improved performance and a utility comparable to the Greedy reasoning technique.
References

1. Brataas, G., Hallsteinsen, S., Rouvoy, R., Eliassen, F.: Scalability of decision models for dynamic product lines. In: Proceedings of the International Workshop on Dynamic Software Product Line, DSPL 2007 (September 2007)
2. Cheng, B.H.C., de Lemos, R., Giese, H., Inverardi, P., Magee, J., et al.: Software engineering for self-adaptive systems: A research roadmap. In: Cheng, B.H.C., de Lemos, R., Giese, H., Inverardi, P., Magee, J. (eds.) Software Engineering for Self-Adaptive Systems. LNCS, vol. 5525, pp. 1–26. Springer, Heidelberg (2009)
3. Cheng, S.-W., Garlan, D., Schmerl, B.: Architecture-based self-adaptation in the presence of multiple objectives. In: ICSE Workshop on Software Engineering for Adaptive and Self-Managing Systems, SEAMS (2006)
4. Floch, J., Hallsteinsen, S., Stav, E., Eliassen, F., Lund, K., Gjørven, E.: Using architecture models for runtime adaptability. IEEE Software 23(2), 62–70 (2006)
5. Garlan, D., Siewiorek, D.P., Smailagic, A., Steenkiste, P.: Project Aura: Towards distraction-free pervasive computing. IEEE Pervasive Computing 1(2) (2002)
6. Karp, R.M.: Reducibility among combinatorial problems. In: Miller, R., Thatcher, J. (eds.) Complexity of Computer Computations, pp. 85–103. Plenum Press, New York (1972)
7. Kephart, J.O., Das, R.: Achieving self-management via utility functions. IEEE Internet Computing 11(1), 40–48 (2007)
8. Pladian, V., Garlan, D., Shaw, M., Satyanarayanan, M., Schmerl, B., Sousa, J.: Leveraging resource prediction for anticipatory dynamic configuration. In: SASO 2007 Conference on Self-Adaptive and Self-Organizing Systems (2007)
9. Rouvoy, R., et al.: MUSIC: Middleware support for self-adaptation in ubiquitous and service-oriented environments. In: Cheng, B.H.C., de Lemos, R., Giese, H., Inverardi, P., Magee, J. (eds.) Software Engineering for Self-Adaptive Systems. LNCS, vol. 5525, pp. 164–182. Springer, Heidelberg (2009)
10. Sykes, D., Heaven, W., Magee, J., Kramer, J.: Exploiting non-functional preferences in architectural adaptation for self-managed systems. In: ACM Symposium on Applied Computing, Track on Dependable and Adaptive Distributed Systems (2010)
gradienTv: Market-Based P2P Live Media Streaming on the Gradient Overlay

Amir H. Payberah1,2, Jim Dowling1, Fatemeh Rahimian1,2, and Seif Haridi1,2

1 Swedish Institute of Computer Science (SICS)
2 KTH - Royal Institute of Technology
Abstract. This paper presents gradienTv, a distributed, market-based approach to live streaming. In gradienTv, multiple streaming trees are constructed using a market-based approach, such that nodes with increasing upload bandwidth are located closer to the media source at the roots of the trees. Market-based approaches, however, exhibit slow convergence properties on random overlay networks, so to facilitate the timely discovery of neighbours with similar upload bandwidth capacities (thus enabling faster convergence of the streaming trees), we use the gossip-generated Gradient overlay network. In the Gradient overlay, nodes are ordered by a gradient of node upload capacities, and the media source is the highest point in the gradient. We compare gradienTv with the state-of-the-art NewCoolstreaming in simulation, and the results show significantly improved bandwidth utilization, playback latency, and playback continuity, as well as a reduction in the average number of hops from the media source to nodes.
1 Introduction

Live streaming using overlay networks is a challenging problem. It requires distributed algorithms that, in a heterogeneous network environment, improve system performance by maximizing the nodes' upload bandwidth utilization, and improve user viewing experience by minimizing the playback latency and maximizing the playback continuity of the stream at nodes.

In this paper, we improve on the state-of-the-art NewCoolstreaming system [7] for these requirements by building multiple media streaming overlay trees, where each tree delivers a part of the stream. The trees are constructed using distributed algorithms such that a node's depth in each tree is inversely proportional to its relative available upload bandwidth. That is, nodes with relatively higher upload bandwidth end up closer to the media source(s), at the root of each tree. This reduces load on the source, maximizes the utilization of available upload bandwidth at nodes, and builds lower-height trees (reducing the number of hops from nodes to the source). Although we only consider upload bandwidth for constructing the Gradient overlay in this paper, the model can easily be extended to include other important node characteristics such as node uptime, load and reputation.

Our system, called gradienTv, uses a market-based approach to construct multiple streaming overlay trees. Firstly, the media source splits the stream into a set of substreams, called stripes, and divides each stripe into a number of blocks. Sub-streams allow more nodes to contribute bandwidth and enable more robust systems through

F. Eliassen and R. Kapitza (Eds.): DAIS 2010, LNCS 6115, pp. 212–225, 2010.
© IFIP International Federation for Information Processing 2010
gradienTv: Market-Based P2P Live Media Streaming on the Gradient Overlay
213
redundancy [4]. Nodes in the system compete to become children of nodes that are closer to the root (the media source), and parents prefer children who offer to forward the highest number of copies of the stripes. A child node explicitly requests and pulls the first block it requires in a stripe from its parent. The parent then pushes to the child subsequent blocks in the stripe, as long as it remains the child's parent. Children can proactively switch parent when the market-modelled benefit of switching is greater than the cost of switching.

The challenge with implementing this market-based approach is to find the best possible matching between parents and children in a timely manner, while having as few parent switches as possible. In general, for a market-based system to work efficiently, information and prices need to spread quickly between participants. Insufficient information at market participants results in inefficient markets. In a market implemented using an overlay network, where the nodes are market participants, the communication of information and prices between nodes is expensive. For example, finding the optimal parent for each node requires, in principle, flooding to communicate with all other nodes in the system. Flooding, however, is not scalable. Alternatively, an approach that finds parents based on random walks or sampling from a random overlay produces slow convergence times for the market and results in excessive parent switching, as information spreads only slowly in the market.

We present a fast, approximate solution to this problem based on the Gradient overlay [17]. The Gradient is a gossip-generated overlay network, built by sampling from a random overlay, where nodes organize into a gradient structure with the media source at the centre of the gradient and nodes with decreasing relative upload bandwidth found at increasing distance from the centre.
A node's neighbours in the Gradient have similar, or slightly higher, upload bandwidth. The Gradient therefore efficiently acts as a market maker that matches up nodes with similar upload bandwidths, enabling the market mechanisms to quickly construct stable streaming overlay trees. As nodes with low relative upload bandwidths are rarely matched with nodes with high relative upload bandwidths (as can be the case in a random overlay), there is significantly less parent-switching before the streaming overlay trees converge.

We evaluate gradienTv by comparing it with NewCoolstreaming, a successful and widely used media streaming solution. We show in simulation that our market-based approach ensures that the system's upload bandwidth is near-maximally utilized, the playback continuity at clients is improved compared to NewCoolstreaming, the height of the media streaming trees constructed is much lower than in NewCoolstreaming, and, as a consequence, playback latency is lower than in NewCoolstreaming.
2 Related Work

There are two fundamental problems in building data delivery (media streaming) overlay networks: (i) what overlay topology is built for data dissemination, and (ii) how a node discovers other nodes supplying the stream.

Early data delivery overlays use a tree structure, where the media is pushed from the root through interior nodes to leaf nodes. Examples of such systems include Climber [14], ZigZag [18] and NICE [3]. The short latency of data delivery is the main advantage of
214
A.H. Payberah et al.
this approach [24]. Disadvantages, however, include the fragility of the tree structure upon the failure of nodes close to the root, and the fact that all traffic is forwarded only by the interior nodes. SplitStream [4] improved this model by using multiple trees, where the stream is split into sub-streams and each tree delivers one sub-stream. Orchard [11], ChunkySpread [19] and CoopNet [12] are other solutions in this class.

An alternative to tree-structured overlays is the mesh structure, in which the nodes are connected in a mesh network [24] and request missing blocks of data explicitly. The mesh structure is highly resilient to node failures, but it is subject to unpredictable latencies due to the frequent exchange of notifications and requests [24]. SopCast [9], DONet/Coolstreaming [25], Chainsaw [13], BiToS [20] and PULSE [15] are examples of mesh-based systems.

Another class of systems combines tree and mesh structures to construct a data delivery overlay. Example systems include CliqueStream [2], mTreebone [22], NewCoolStreaming [7], Prime [10] and [8]. GradienTv belongs to this class, where the mesh is the Gradient overlay.

The second fundamental problem is how nodes discover the other nodes that supply the stream. CoopNet [12] uses a centralized coordinator, GnuStream [6] uses controlled flooding requests, SplitStream [4] and [8] use DHTs, while NewCoolstreaming [7], DONet/Coolstreaming [25] and PULSE [15] use a gossip-generated random overlay network to search for nodes.

NewCoolstreaming [7] has the most similarities with gradienTv. Both systems have the same data dissemination model, where a node subscribes to a sub-stream at a parent node, and the parent subsequently pushes the stream to the child. However, gradienTv's use of the Gradient overlay to discover nodes to supply the stream contrasts with NewCoolStreaming, which samples nodes from a random overlay (referred to as the partner list).
A second major difference is that NewCoolStreaming only reactively changes a parent when a sub-stream is identified as being slow, whereas gradienTv proactively changes parents to improve system performance.
3 Gradient Overlay

The Gradient overlay is a class of P2P overlays that arrange nodes using a local utility function at each node, such that nodes are ordered in descending utility values away from a core of the highest-utility nodes [16,17]. As can be seen in Figure 1, the highest-utility nodes (darkest colour) are found at the core of the Gradient, and nodes with decreasing utility values (lighter grays) are found at increasing distance from the centre.

The Gradient maintains two sets of neighbours using gossiping algorithms: a similar-view and a random-view. The similar-view of a node is a partial view of the nodes whose utility values are close to, but slightly higher than, the utility value of this node. Nodes periodically gossip with each other and exchange their similar-views. Upon receiving a similar-view, a node updates its own similar-view by replacing its entries with those nodes that have utility values closer to (but higher than) its own. In contrast, the random-view constitutes a random sample of the nodes in the system; it is used both to discover new nodes for the similar-view and to prevent partitioning of the similar-view.
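The similar-view maintenance rule can be sketched as follows (our own simplification of the gossip protocol; bare utility values stand in for full node descriptors):

```python
# Simplified similar-view update: from the union of the current view and a
# gossiped sample, keep the peers whose utility is closest to, but not below,
# our own -- the "close but slightly higher" preference of the Gradient.
def update_similar_view(own_utility, similar_view, received, view_size):
    pool = set(similar_view) | set(received)
    eligible = sorted((u for u in pool if u >= own_utility),
                      key=lambda u: u - own_utility)
    return eligible[:view_size]

update_similar_view(3, similar_view=[5, 9], received=[3, 4, 2, 7], view_size=3)
# the low-utility peer (2) is rejected and the closest-but-higher ones are kept
```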
Fig. 1. Gradient overlay network
4 GradienTv System

In gradienTv, the media source splits the media into a number of stripes and divides each stripe into a sequence of blocks. GradienTv constructs a media streaming overlay tree for each stripe, where blocks are pushed from parents to children. Newly joined nodes discover stripe providers using the Gradient overlay and compete with each other to establish a parent-child relationship with providers. A node proactively changes its parent for a stripe if it finds a lower-depth parent for that stripe and if that parent either has a free upload slot or prefers this node to one of its existing children.

We use the term download slot to define a network connection at a node used to download a stripe. Likewise, an upload slot refers to a network connection at a node that is used to forward a stripe. If node p assigns its upload slot to node q's download slot, we say p is the parent of q and q is the child of p.

Our market model uses the following three properties, calculated at each node, to match nodes that can forward a stripe with nodes that want to download that stripe:

1. Currency: the total number of upload slots at a node, that is, the number of stripes a node is willing and able to forward simultaneously. A node uses its currency when requesting to connect to another node's upload slot.
2. Connection cost: the minimum currency that should be provided for establishing a connection to receive a stripe. The connection cost of a node that has an unused upload slot is zero; otherwise, the node's connection cost equals the lowest currency of its already connected children. For example, if node p has three upload slots and three children with currencies 2, 3 and 4, the connection cost of p is 2.
3. Depth: the shortest path (number of hops) from a node to the root for a particular stripe. Since the media stream consists of several stripes, nodes may have different depths in different trees.
The lower the depth a node has for a stripe, the more desirable a parent it is for that stripe. Nodes constantly try to reduce their depth over all their stripes by competing with other nodes for connections to lower-depth nodes.

4.1 Gradient Overlay Construction

Each node maintains two sets of neighbouring nodes: a random-view and a similar-view. Cyclon [21] is used to create and update the random-view, and a modified version
Fig. 2. Different market-levels of a system, the similar-view of node p and its fingers
of the Gradient protocol is used to build and update the similar-view. The node references stored in each view contain the utility values of the nodes.

The utility value of a node is calculated using two factors: the node's upload bandwidth and a disjoint set of discrete utility values that we call market-levels. A market-level is defined as a range of network upload bandwidths that share the same utility value. For example, in Figure 2, we define some example market-levels: mobile broadband (64-127 Kbps) with utility value 1, slow DSL (128-511 Kbps) with utility value 2, DSL (512-1023 Kbps) with utility value 3, fibre (>1024 Kbps) with utility value 4, and the media source with utility value 5. A node measures its upload bandwidth (e.g., using a server or trusted neighbour) and calculates its utility value as the market-level that its upload bandwidth falls into. For instance, a node with 256 Kbps upload bandwidth falls into the slow DSL market-level, so its utility value is 2.

A node prefers to fill its similar-view with nodes from the same market-level or one level higher. A consequence of this preference function is that low-bandwidth nodes only have connections to one another. However, low-bandwidth nodes often do not have enough upload bandwidth to simultaneously deliver all stripes in a stream. Therefore, in order to enable low-bandwidth nodes to utilize the spare slots of higher-bandwidth nodes, nodes maintain a finger list, where each finger points to a node in a higher market-level (if one is available). In Figure 2, each ring represents a market-level, the black links show the links within the similar-view, and the gray links are the fingers to nodes in higher market-levels.

Nodes bootstrap their similar-view using a bootstrap server; initially, the similar-view of a node is filled with random nodes that have equal or higher utility values. Algorithm 1 is executed periodically by a node p to maintain its similar-view.
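The bandwidth-to-utility mapping of the example market-levels described above can be sketched as follows (a sketch using the four example levels from the text; the evaluation in Sect. 5 uses 11 levels):

```python
# Example market-levels from the text: each maps a range of upload bandwidths
# (in Kbps) onto one discrete utility value.
MARKET_LEVELS = [
    (64, 1),    # mobile broadband: 64-127 Kbps
    (128, 2),   # slow DSL: 128-511 Kbps
    (512, 3),   # DSL: 512-1023 Kbps
    (1024, 4),  # fibre: >= 1024 Kbps
]

def utility_value(upload_kbps):
    """Return the utility value of the market-level the bandwidth falls into."""
    value = 0
    for lower_bound, level_value in MARKET_LEVELS:
        if upload_kbps >= lower_bound:
            value = level_value
    return value

utility_value(256)  # slow DSL -> utility value 2
```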
The algorithm works as follows: in every round, p increments the age of all nodes in its similar-view. It removes the oldest node, q, from its similar-view and sends a subset of the nodes in its similar-view to q (lines 3-6). Node q responds by sending back a subset of its own similar-view to p. Node p then merges the view received from q with its existing
Algorithm 1. Updating the similar-view

 1: procedure UpdateSimilarView
 2:   this.similarView.updateAge()
 3:   q ← oldest node from this.similarView
 4:   this.similarView.remove(q)
 5:   pView ← this.similarView.subset()          ▷ a random subset of p's similarView
 6:   Send pView to q
 7:   Recv qView from q                          ▷ qView is a subset of q's similarView
 8:   for all node_i in qView do
 9:     if U(node_i) = U(p) OR U(node_i) = U(p) + 1 then
10:       if this.similarView.contains(node_i) then
11:         this.similarView.updateAge(node_i)
12:       else if this.similarView has free entries then
13:         this.similarView.add(node_i)
14:       else
15:         node_j ← pView.poll()                ▷ get and remove one entry from pView
16:         this.similarView.remove(node_j)
17:         this.similarView.add(node_i)
18:       end if
19:     end if
20:   end for
21:   for all node_a in this.randomView do
22:     if U(node_a) = U(p) OR U(node_a) = U(p) + 1 then
23:       if this.similarView has free entries then
24:         this.similarView.add(node_a)
25:       else
26:         node_b ← (x ∈ this.similarView such that U(x) > U(p) + 1)
27:         if node_b ≠ null then
28:           this.similarView.remove(node_b)
29:           this.similarView.add(node_a)
30:         end if
31:       end if
32:     end if
33:   end for
34: end procedure
Algorithm 2. Parent assignment

1: procedure assignParent
2:   for all stripe_i in stripes do
3:     candidates ← findParent(i)
4:     if candidates ≠ null then
5:       newParent ← a random node from candidates
6:       send ASSIGNREQUEST | i to newParent
7:     end if
8:   end for
9: end procedure
Algorithm 3. Select candidate parents from the similar-view and the fingers

 1: procedure findParent(i)
 2:   candidates ← Ø
 3:   if this.stripe_i.parent = null then
 4:     this.stripe_i.parent.depth ← ∞
 5:   end if
 6:   for all node_j in (similarView ∪ fingers) do
 7:     if node_j.stripe_i.depth < this.stripe_i.parent.depth AND
 8:         node_j.connectionCost < this.currency then
 9:       candidates.add(node_j)
10:     end if
11:   end for
12:   return candidates
13: end procedure
similar-view by iterating through the received list of nodes and preferentially selecting those nodes in the same market-level as p or at most one level higher. If the similar-view is not full, p adds the node; if a reference to the node already exists in p's similar-view, p just refreshes the age of its reference. If the similar-view is full, p replaces one of the nodes it had sent to q with the selected node (lines 8-20). In addition, p merges its similar-view with its own local random-view in the same way as described above; upon merging, when the similar-view is full, p replaces a node whose utility value is more than p's utility value plus one (lines 21-33).

The fingers to higher market-levels are also updated periodically. Node p goes through its random-view and, for each higher market-level, picks a node from that market-level if such a node exists in the random-view. If not, p keeps the old finger.

4.2 Streaming Tree Overlay Construction

Algorithm 2 is called periodically by nodes to build and maintain a streaming overlay tree for each stripe. For each stripe i, a node p checks whether its similar-view or finger list contains a node that has (i) a lower depth than its current parent, and (ii) a connection cost less than p's currency. If such a node is found, it is added to a list of candidate parents for stripe i (Algorithm 3). Next, we use a random policy to select a node from the candidate parents, as it fairly balances connection requests over the nodes in the system. In contrast, if we selected the candidate parent with the minimum depth, then even a low variance in the currency of nodes would cause excessive connection requests to the nodes with high upload bandwidth.
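The candidate filtering and random selection of Algorithms 2 and 3 can be sketched in runnable form (the dictionary layout and names are ours, not the MUSIC/gradienTv data structures):

```python
import random

# Sketch: a node collects candidate parents for a stripe that are shallower
# than its current parent and affordable given its currency, then picks one
# at random to balance connection requests over the system.
def find_candidates(neighbours, stripe, current_parent_depth, currency):
    """Filter similar-view and fingers for affordable, lower-depth parents."""
    return [n for n in neighbours
            if n["depth"][stripe] < current_parent_depth
            and n["connection_cost"] < currency]

def pick_parent(neighbours, stripe, current_parent_depth, currency, rng=random):
    candidates = find_candidates(neighbours, stripe, current_parent_depth, currency)
    return rng.choice(candidates) if candidates else None

neighbours = [
    {"id": "a", "depth": {0: 2}, "connection_cost": 0},
    {"id": "b", "depth": {0: 5}, "connection_cost": 0},  # deeper than current parent
    {"id": "c", "depth": {0: 1}, "connection_cost": 9},  # costs more than our currency
]
pick_parent(neighbours, stripe=0, current_parent_depth=4, currency=3)  # -> node "a"
```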
Algorithm 4. Handling the assign request

 1: upon event ASSIGNREQUEST | i from p
 2:   if has free uploadSlot then
 3:     assign an uploadSlot to p
 4:     send ASSIGNACCEPTED | i to p
 5:   else
 6:     worstChild ← lowest-currency child
 7:     if worstChild.currency ≥ p.currency then
 8:       send ASSIGNNOTACCEPTED | i to p
 9:     else
10:       assign an uploadSlot to p
11:       send RELEASE | i to worstChild
12:       send ASSIGNACCEPTED | i to p
13:     end if
14:   end if
15: end event
Algorithm 4 is called whenever a receiver node q receives a connection request from a node p. If q has a free upload slot, it accepts the request. Otherwise, if p's currency is greater than the connection cost of q, q abandons the child with the lowest currency and accepts p as a new child; the abandoned node then has to find a new parent. If q's connection cost is greater than or equal to p's currency, q declines the request.
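The parent-side decision of Algorithm 4 can be sketched in runnable form (the function and return-value layout are ours):

```python
# Sketch of the parent's response to an assign request: accept if an upload
# slot is free; otherwise preempt the lowest-currency child only when the
# requester's currency is strictly higher; otherwise reject.
def handle_assign_request(free_slots, children_currencies, requester_currency):
    """Return ('accept', evicted_child_currency_or_None) or ('reject', None)."""
    if free_slots > 0:
        return ("accept", None)
    worst = min(children_currencies)
    if worst >= requester_currency:
        return ("reject", None)
    return ("accept", worst)  # a RELEASE is sent to the evicted child

handle_assign_request(0, [2, 3, 4], 3)  # preempts the child with currency 2
handle_assign_request(0, [2, 3, 4], 2)  # rejected: connection cost equals currency
```

Note how this matches the connection-cost example in Sect. 4: a fully occupied parent with children of currencies 2, 3 and 4 has connection cost 2, so only requesters with currency above 2 are admitted.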
5 Experiments and Evaluation

In this section, we compare the performance of gradienTv with NewCoolstreaming under simulation. In summary, we define three different experiment scenarios: join-only, flash-crowd, and catastrophic failure, and we show that gradienTv outperforms NewCoolstreaming in all of these scenarios for the following metrics: playback continuity, bandwidth utilization, playback latency, and path length.1

Experiment setup

We have implemented both gradienTv and NewCoolstreaming using the Kompics platform [1]. Kompics provides a framework for building P2P protocols and simulation support using a discrete event simulator. Our implementation of NewCoolstreaming is based on the system description in [7,23]. We have validated our implementation of NewCoolstreaming by replicating, in simulation, the results from [7].

In our experimental setup, we set the streaming rate to 512 Kbps and, unless stated otherwise, experiments involve 1000 nodes. The stream is split into 4 stripes and each stripe is divided into a sequence of 128 KB blocks. The media source is a single node with 40 upload slots. Nodes start playing the media after buffering it for 30 seconds. This is comparable with the most widely deployed P2P live streaming system, SopCast, which has an average startup time of 30-45 seconds [9]. The size of a node's partial view (the similar-view in gradienTv, the partner list in NewCoolstreaming) is 15 nodes. The number of upload slots for the non-root nodes is picked randomly from 1 to 10, which corresponds to upload bandwidths from 128 Kbps to 1.25 Mbps. As the average upload bandwidth of 704 Kbps is not much higher than the streaming rate of 512 Kbps, nodes have to find good matches as parents in order to achieve good streaming performance. We assume all nodes have enough download bandwidth to receive all the stripes simultaneously.
In gradienTv, we define 11 market-levels, such that nodes with the same number of upload slots are located at the same market-level. For example, nodes with one upload slot (128 Kbps) are members of the first market-level, nodes with two upload slots (256 Kbps) are located in the second market-level, and the media source with 40 upload slots (>5 Mbps) is the only member of the 11th market-level. Latencies between nodes are modelled using a latency map based on the King dataset [5].

In the experiments, we measure the following metrics:

1. Playback continuity: the percentage of blocks that a node received before their playback time. To measure playback quality in our experiments, we count the number of nodes that have a playback continuity greater than 90%;
2. Bandwidth utilization: the ratio of the total number of utilized upload slots to the total number of requested download slots;
3. Playback latency: the difference in seconds between the playback point of a node and the playback point at the media source;
4. Path length: the minimum distance in number of hops between the media source and a node for a stripe.
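The first two per-node metrics could be computed as in the following sketch (field names are our assumptions):

```python
# Sketch of the first two metrics as percentages: playback continuity
# (blocks received on time over blocks due) and bandwidth utilization
# (utilized upload slots over requested download slots).
def playback_continuity(blocks_on_time, blocks_due):
    return 100.0 * blocks_on_time / blocks_due

def bandwidth_utilization(utilized_upload_slots, requested_download_slots):
    return 100.0 * utilized_upload_slots / requested_download_slots

playback_continuity(115, 128)    # just below the 90% quality threshold
bandwidth_utilization(350, 400)  # 87.5% of requested slots are served
```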
1 The source code and the results are available at: http://www.sics.se/~amir/gradientv
Fig. 3. Playback continuity in percent (Y-axis), against time in seconds (X-axis): (a) gradienTv; (b) NewCoolstreaming. Curves show the join-only, flash-crowd, and failure scenarios.
We compare our system with NewCoolstreaming in the following scenarios:

1. Join-only: 1000 nodes join the system following a Poisson distribution with an average inter-arrival time of 100 milliseconds;
2. Flash crowd: first, 100 nodes join the system following a Poisson distribution with an average inter-arrival time of 100 milliseconds; then, 1000 nodes join following the same distribution with a shortened average inter-arrival time of 10 milliseconds;
3. Catastrophic failure: as in the join-only scenario, 1000 nodes join the system following a Poisson distribution with an average inter-arrival time of 100 milliseconds; then, 400 existing nodes fail following a Poisson distribution with an average inter-arrival time of 10 milliseconds, and the system continues its operation with only 600 nodes.

In addition to these scenarios, we also evaluate the behaviour of gradienTv when varying two key parameters: (i) the playback buffering time and (ii) the number of nodes.

Playback Continuity

In this section, we compare the playback continuity of gradienTv and NewCoolstreaming in the three scenarios: join-only, flash crowd and catastrophic failure. In figures 3(a) and 3(b), the X-axis shows the time in seconds, while the Y-axis shows the percentage of the nodes in the overlay that have a playback continuity of more than 90%. We can see that gradienTv significantly outperforms NewCoolstreaming for the whole duration of the experiment in all scenarios. Moreover, after the system stabilizes, we observe full playback continuity in gradienTv. This out-performance is due to the faster convergence of the streaming overlay trees in gradienTv, where high-capacity nodes can quickly discover and connect to the source using the similar-view, while in NewCoolstreaming nodes take longer to find parents, as they search by updating their random view through gossiping.
Another reason for the outperformance is the difference in the policies used by a child to pull the first block from a new parent. In gradienTv, whenever a node p selects a new parent q, p informs q of the last block in its buffer, and q sends the subsequent blocks to p. In NewCoolstreaming, in contrast, the first requested block is determined by looking at the head of the partners, which causes NewCoolstreaming to miss blocks when switching parents.
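The two first-block policies can be contrasted in a toy sketch (function names and block numbering are ours, not from either system's code):

```python
def first_block_gradientv(child_last_block, parent_head_block):
    """gradienTv-style: the child tells its new parent the last block it
    holds; the parent streams from the next block onward, so nothing is
    skipped."""
    return child_last_block + 1

def first_block_newcoolstreaming(child_last_block, parent_head_block):
    """NewCoolstreaming-style: the first pulled block is derived from the
    partner's head position, so blocks between the child's buffer and the
    head are missed."""
    return parent_head_block

# Example: a child holding up to block 41 switches to a parent whose head is 50.
# gradienTv resumes at block 42; NewCoolstreaming starts at 50, missing 8 blocks.
missed = first_block_newcoolstreaming(41, 50) - first_block_gradientv(41, 50)
```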
gradienTv: Market-Based P2P Live Media Streaming on the Gradient Overlay
Bandwidth Utilization
Our second experiment compares the bandwidth utilization of gradienTv (figure 4(a)) and NewCoolstreaming (figure 4(b)). We observe that when the system has no churn, as in the join-only scenario, both systems utilize the bandwidth equally well. In the flash crowd and catastrophic failure scenarios, the performance of both systems drops significantly. However, gradienTv recovers faster, as nodes are able to find parents more quickly using the Gradient overlay.
Fig. 4. Bandwidth utilization in percent (Y-axis), against time in seconds (X-axis)
Path Length
In the third experiment, we compare the average path length of both streaming overlays. Before looking at the experiment results, we calculate the minimum depth of a k-ary tree with n nodes as log_k(n). In our experiments, there are on average 5 upload slots per node (as upload slots are uniformly distributed from 1 to 10), so the minimum depth of the trees is expected to be log_5(1000) ≈ 4.29. Figures 5(a) and 5(b) show the tree depth of the system for gradienTv and NewCoolstreaming. We observe that gradienTv constructs trees with an average height of 4.3, which is very close to the minimum height. The figures also show that the depth of the trees in gradienTv is about half the depth of the trees in NewCoolstreaming. Shorter trees enable lower playback latency. What is more, we observe that the average depth of the trees is independent of the inter-arrival time of the joining nodes. This can be seen in figures 5(a) and 5(b), where the depth of the trees, after the system stabilizes, is the same. More interestingly, in the catastrophic failure scenario, we can see a sharp drop in the NewCoolstreaming tree depth, as a result of the drop in the number of nodes remaining in the system and the fact that many remaining nodes do not have any path to the media source. The same behaviour is observed in gradienTv, but since nodes can find appropriate nodes to connect to more quickly, the fluctuation in the average depth of the trees is smaller than in NewCoolstreaming.

A.H. Payberah et al.

Fig. 5. Average path length in number of hops (Y-axis), against time in seconds (X-axis)

Playback Latency
This experiment shows how the average playback latency of nodes changes over time in our three scenarios (figures 6(a) and 6(b)). In the join-only scenario, we can see that 200 seconds after starting the simulation, the playback latency in gradienTv converges to just over 30 seconds, close to the initial buffering time, which is set to 30 seconds. For the join-only scenario, gradienTv exhibits lower average playback latency than NewCoolstreaming. This is because its streaming trees have lower depth, and, therefore, nodes receive blocks earlier than in NewCoolstreaming. This is also the case for the two other experiment scenarios, flash crowd and catastrophic failure. Here, we can see an increase in the average playback latency for both systems. This is due to the increased demand for parents by new nodes and by nodes with failed parents. While the nodes are competing for parents, they may fail to receive the media blocks in time for playback. Therefore, they have to pause until a parent is found and the streaming is resumed, which results in higher playback latency. Nevertheless, once both systems stabilize, nodes ignore the missing blocks and fast forward to play from the block where the streaming from the new parent resumes. Hence, the playback latency improves after the system has settled down. There is a significant difference between the behaviour of gradienTv and NewCoolstreaming upon an increase in the playback latency. In gradienTv, if the playback latency exceeds the initial buffering time and enough blocks are available in the buffer, nodes are given the choice to fast forward the stream and decrease the playback latency. In contrast, NewCoolstreaming jumps ahead in playback by switching parent(s), causing it to miss blocks, which negatively affects playback continuity.
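The minimum-depth bound log_k(n) used in the path-length experiment is easy to check numerically (a sketch; the function name is ours):

```python
import math

def min_tree_depth(n_nodes, avg_upload_slots):
    """Minimum depth of a k-ary tree with n nodes: log_k(n)."""
    return math.log(n_nodes, avg_upload_slots)

# 1000 nodes, 5 upload slots per node on average:
depth = min_tree_depth(1000, 5)  # ≈ 4.29, matching the observed average of 4.3
```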
Fig. 6. Average playback latency in seconds (Y-axis), against time in seconds (X-axis)
(a) Playback continuity against time. (b) Playback latency against time.
Fig. 7. The behaviour of gradienTv for different playback buffer lengths (in seconds)
Buffering Time
We now evaluate the behaviour of gradienTv for different initial playback buffering times. We compare four different settings: 0, 10, 20 and 30 seconds of initial buffering time. Two metrics that are affected by changing the initial buffering time are playback continuity and playback latency. Figure 7(a) shows that when there is no initial buffering, the playback continuity drops to under 20% after 50 seconds of playback, but as the system stabilizes the playback continuity increases. Buffering 10 seconds of blocks in advance results in fewer playback interruptions when nodes change their parents, but better playback continuity is achieved with 20 and 30 seconds of buffering. Figure 7(b) shows how playback latency increases when the buffering time is increased. The initial buffering time is thus a parameter that trades off better playback continuity against higher playback latency.

Number of Nodes
In this experiment, we evaluate the performance of the system for different system sizes. We simulate systems with 128, 256, 512, 1024, 2048, and 4096 nodes, where nodes join the system following a Poisson distribution with an average inter-arrival time of 100 milliseconds. In figure 8(a), we show the bandwidth utilization after all the nodes have joined (for the different system sizes). We define d as the time when all nodes have joined for a particular size: for the system with 128 nodes, d is 13 seconds, while for the system with 4096 nodes, d is 410 seconds. This experiment shows that, regardless of system size, nodes successfully utilize the upload slots at other nodes. This implies that convergence, in terms of matching upload slots to download slots, appears to be independent of the number of nodes in the system. A necessary condition, of course, is that there is enough available upload and download bandwidth to deliver the stream to all nodes. In the second experiment, we measure the tree depth while varying the system size.
We can see in figure 8(b) that the depth of the trees is very close to the theoretical minimum depth in each scenario. For example, the average depth of the trees with 1024 nodes is 4.34, which is very close to log_5(1024) ≈ 4.30.
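The reported values of d follow directly from the join process: with a 100 ms mean inter-arrival gap, all n nodes have joined after roughly n × 0.1 seconds (a back-of-the-envelope sketch; the function name is ours):

```python
def join_completion_time(n_nodes, mean_interarrival_s=0.1):
    """Expected time until all nodes have joined, given Poisson arrivals
    with the stated mean inter-arrival gap (0.1 s in the experiments)."""
    return n_nodes * mean_interarrival_s

join_completion_time(128)   # ≈ 12.8 s, matching the reported d = 13 s
join_completion_time(4096)  # ≈ 409.6 s, matching the reported d = 410 s
```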
(a) Bandwidth utilization against time. (b) Path length against time.
Fig. 8. Bandwidth utilization and path length for varying numbers of nodes
6 Conclusions
In this paper, we presented gradienTv, a P2P live streaming system that uses both the Gradient overlay and a market-based approach to build multiple streaming trees. The constructed streaming trees have the property that the higher a node's upload capacity, the closer that node is to the root of the tree. We showed how the Gradient overlay helps nodes efficiently find good neighbours for building these streaming trees. Our simulations showed that, compared to NewCoolstreaming, gradienTv has higher playback continuity, builds streaming trees of lower depth, achieves better bandwidth utilization, and has lower playback latency.
References

1. Arad, C., Dowling, J., Haridi, S.: Developing, simulating, and deploying peer-to-peer systems using the Kompics component model. In: COMSWARE 2009: Proceedings of the Fourth International ICST Conference on COMmunication System softWAre and middlewaRE, pp. 1–9. ACM, New York (2009)
2. Asaduzzaman, S., Qiao, Y., Bochmann, G.: CliqueStream: an efficient and fault-resilient live streaming network on a clustered peer-to-peer overlay. In: Proceedings of the 2008 Eighth International Conference on Peer-to-Peer Computing, pp. 269–278. IEEE Computer Society, Los Alamitos (2008)
3. Banerjee, S., Bhattacharjee, B., Kommareddy, C.: Scalable application layer multicast. In: SIGCOMM 2002: Proceedings of the 2002 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 205–217. ACM, New York (2002)
4. Castro, M., Druschel, P., Kermarrec, A.-M., Nandi, A., Rowstron, A., Singh, A.: SplitStream: high-bandwidth multicast in cooperative environments. In: SOSP 2003: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, pp. 298–313. ACM Press, New York (2003)
5. Gummadi, K.P., Saroiu, S., Gribble, S.D.: King: Estimating latency between arbitrary Internet end hosts. In: SIGCOMM Internet Measurement Workshop (2002)
6. Jiang, X., Dong, Y., Xu, D., Bhargava, B.: GnuStream: a P2P media streaming system prototype. In: ICME 2003: Proceedings of the 2003 International Conference on Multimedia and Expo, Washington, DC, USA, pp. 325–328. IEEE Computer Society, Los Alamitos (2003)
7. Li, B., Qu, Y., Keung, Y., Xie, S., Lin, C., Liu, J., Zhang, X.: Inside the New Coolstreaming: Principles, measurements and performance implications. In: IEEE INFOCOM 2008: The 27th Conference on Computer Communications, pp. 1031–1039 (2008)
8. Locher, T., Meier, R., Schmid, S., Wattenhofer, R.: Push-to-Pull Peer-to-Peer Live Streaming. In: Pelc, A. (ed.) DISC 2007. LNCS, vol. 4731, pp. 388–402. Springer, Heidelberg (2007)
9. Lu, Y., Fallica, B., Kuipers, F., Kooij, R., Van Mieghem, P.: Assessing the quality of experience of SopCast. Journal of Internet Protocol Technology 4(1), 11–23 (2009)
10. Magharei, N., Rejaie, R.: PRIME: Peer-to-peer receiver-driven mesh-based streaming. In: INFOCOM (2007)
11. Mol, J.J.D., Epema, D.H.J., Sips, H.J.: The Orchard algorithm: P2P multicasting without free-riding. In: P2P 2006: Proceedings of the Sixth IEEE International Conference on Peer-to-Peer Computing, Washington, DC, USA, pp. 275–282. IEEE Computer Society, Los Alamitos (2006)
12. Padmanabhan, V.N., Wang, H.J., Chou, P.A., Sripanidkulchai, K.: Distributing streaming media content using cooperative networking. In: NOSSDAV 2002: Proceedings of the 12th International Workshop on Network and Operating Systems Support for Digital Audio and Video, pp. 177–186. ACM, New York (2002)
13. Pai, V., Kumar, K., Tamilmani, K., Sambamurthy, V., Mohr, A.E., Mohr, E.E.: Chainsaw: Eliminating trees from overlay multicast. In: Castro, M., van Renesse, R. (eds.) IPTPS 2005. LNCS, vol. 3640, pp. 127–140. Springer, Heidelberg (2005)
14. Park, K., Pack, S., Kwon, T.: Climber: An incentive-based resilient peer-to-peer system for live streaming services. In: Workshop on Peer-to-Peer Systems, IPTPS (2008)
15. Pianese, F., Keller, J., Biersack, E.W.: PULSE, a flexible P2P live streaming system. In: INFOCOM. IEEE, Los Alamitos (2006)
16. Sacha, J., Biskupski, B., Dahlem, D., Cunningham, R., Meier, R., Dowling, J., Haahr, M.: Decentralising a service-oriented architecture. Accepted for publication in Peer-to-Peer Networking and Applications
17. Sacha, J., Dowling, J., Cunningham, R., Meier, R.: Discovery of stable peers in a self-organising peer-to-peer gradient topology. In: Eliassen, F., Montresor, A. (eds.) DAIS 2006. LNCS, vol. 4025, pp. 70–83. Springer, Heidelberg (2006)
18. Tran, D.A., Hua, K.A., Do, T.T.: ZigZag: An efficient peer-to-peer scheme for media streaming. In: INFOCOM (2003)
19. Venkataraman, V., Yoshida, K., Francis, P.: Chunkyspread: Heterogeneous unstructured tree-based peer-to-peer multicast. In: ICNP 2006: Proceedings of the 2006 IEEE International Conference on Network Protocols, Washington, DC, USA, pp. 2–11. IEEE Computer Society, Los Alamitos (2006)
20. Vlavianos, A., Iliofotou, M., Faloutsos, M.: BiToS: enhancing BitTorrent for supporting streaming applications. In: IEEE Global Internet, pp. 1–6 (2006)
21. Voulgaris, S., Gavidia, D., van Steen, M.: CYCLON: Inexpensive Membership Management for Unstructured P2P Overlays. Journal of Network and Systems Management 13(2), 197–217 (2005)
22. Wang, F., Xiong, Y., Liu, J.: mTreebone: A hybrid tree/mesh overlay for application-layer live video multicast. In: ICDCS 2007: Proceedings of the 27th International Conference on Distributed Computing Systems, p. 49 (2007)
23. Xie, S., Li, B., Keung, G.Y., Zhang, X.: Coolstreaming: Design, Theory and Practice. IEEE Transactions on Multimedia 9(8), 1661 (2007)
24. Yiu, W.P.K., Jin, X., Chan, S.H.G.: Challenges and approaches in large-scale P2P media streaming. IEEE MultiMedia 14(2), 50–59 (2007)
25. Zhang, X., Liu, J., Li, B., Yum, T.S.P.: CoolStreaming/DONet: A data-driven overlay network for peer-to-peer live media streaming. In: IEEE INFOCOM (2005)
Collaborative Ranking and Profiling: Exploiting the Wisdom of Crowds in Tailored Web Search

Pascal Felber1, Peter Kropf1, Lorenzo Leonini1, Toan Luu2, Martin Rajman2, and Etienne Rivière1

1 University of Neuchâtel, Switzerland
[email protected]
2 EPFL, Switzerland
[email protected]
Abstract. Popular search engines essentially rely on information about the structure of the graph of linked elements to find the most relevant results for a given query. While this approach is satisfactory for popular interest domains or when the user expectations follow the main trend, it is very sensitive to the case of ambiguous queries, where queries can have answers over several different domains. Elements pertaining to an implicitly targeted interest domain with low popularity are usually ranked lower than expected by the user. This is a consequence of the poor usage of user-centric information in search engines. Leveraging semantic information can help avoid such situations by proposing complementary results that are carefully tailored to match user interests. This paper proposes a collaborative search companion system, CoFeed, that collects user search queries and accesses feedback to build user- and document-centric profiling information. Over time, the system constructs ranked collections of elements that maintain the required information diversity and enhance the user search experience by presenting additional results tailored to the user interest space. This collaborative search companion requires a supporting architecture adapted to large user populations generating high request loads. To that end, it integrates mechanisms for ensuring scalability and load balancing of the service under varying loads and user interest distributions. Experiments with a deployed prototype highlight the efficiency of the system by analyzing improvement in search relevance, computational cost, scalability and load balance.
1 Introduction
Search engines certainly play the most significant role in today’s Web usage. Leading search engines rely on the observation of the structure of linked elements [7] (i.e., the graph formed by hyperlinks between pages and data items), which is used in conjunction with the keywords forming a query to decide on the most relevant elements, or for advanced approaches with user-centric search options and hints (e.g., when using Google’s SearchWiki [1]). These search engines do not leverage the collective knowledge that is created by the users as part of their navigation choices. Instead, the bulk of the score used to decide on this relevance depends
This work is partially funded by the Hasler foundation and SNF project 102819.
F. Eliassen and R. Kapitza (Eds.): DAIS 2010, LNCS 6115, pp. 226–242, 2010. © IFIP International Federation for Information Processing 2010
on the links pointing to the element; that is, scores are mostly based on structural information. While this is efficient for retrieving the most relevant elements when the implicit semantic search area (i.e., interest domain) is the most popular one, there exist many situations where the elements that are the most cited, or that belong to the most renowned sites, are not those expected by the user. For instance, a Web search for the query term "Java" returns a list of elements that overwhelmingly focus on the programming language. This is obviously a result of the predominance of computer-related resources on the Web. Nonetheless, a user looking for information on the Indonesian island of Java will be dissatisfied by not finding any relevant information (from her point of view) before the items of rank 6 and 16.1 The solution for avoiding such a situation and obtaining better-tailored results is to pair the structural information used by the search engine with some semantic information about the expectations of a particular user. Concretely, information about which items were deemed interesting by other users with similar interests can be leveraged to avoid search domain inadequacies. As a result, information diversity, which is not well captured by solely monitoring the structure of the information graph, can be achieved by taking into account the diversity of expectations from querying users and using the wisdom of crowds, learned from past accesses, to determine relevant content for one particular user. Information about one user's interest can be derived from the set of elements that she accessed as a result of her previous queries (feedback information), and from the keywords forming these past queries themselves.
Similarly, the set of elements that are deemed interesting by users of some semantic interest profile can be derived from the elements they accessed after a Web search, that is, relevant elements can be extracted by correlating user accesses and extracted interests. We believe that the best approach for proposing such a service is to build a companion service to complement search engines, instead of creating a new stand-alone search mechanism. Indeed, despite many research efforts invested so far to propose collaborative search engines (e.g., Faroo, YaCy, Wowd2), no system has been able to reach a sufficient level of quality and efficiency to truly compete with its centralized counterparts. This is a direct consequence of the bootstrap problem [9]: the added value of a new collaborative search engine becomes perceivable only when the system has attracted enough users to fully sustain its specific functionalities.

Fig. 1. Usage of a companion search service

1 On http://www.google.com at the time of writing.
2 http://www.faroo.com, http://YaCy.net, http://www.wowd.com/.
P. Felber et al.
Figure 1 presents a general vision of the companion service: the user sends her request to a keyword-based search engine, which returns results based on structural information. Meanwhile, the same query is sent to the collaboratively built companion semantic search service. Note that the latter request is paired with some semantic profile, which is a representation of the user's interest field. The companion service then returns a set of elements tailored to the user requirements on the basis of her semantic profile, and ranked according to their relevance to her interest domains. The additional results can then be presented together with the results from the traditional search engine, in a similar manner to the way context-sensitive ads are presented as suggestions to the user on most current centralized search engines' result pages. This simple presentation is also used by [15]. Although more elaborate presentations of the results to the user can be devised, we consider this to be a research task on its own, and not the focus of this paper. Information about subsequent accesses (i.e., which item is accessed for some query and in which order) is sent to the semantic ranking service and used for building, for each request, a set of items that preserves information diversity.

Building such a system poses a set of challenging research issues related to information management (Section 2). First, how to accurately capture the semantic information associated with user activities (profiling interests, using actual accesses to construct a representation of some user's interests)? Second, how to process the feedback information to maintain sets of relevant elements that capture information diversity? Third, how to efficiently construct from these sets a tailored ranked list of results to answer user requests?
Another challenging question, which this paper answers in detail (Section 3), concerns an appropriate infrastructure for supporting such a service. A centralized approach is easy to implement, but scalability in the number of users comes at a prohibitively high cost, especially if the service also has to tolerate failures. Moreover, it again poses a bootstrap problem, with many resources necessary before being able to serve a reasonably sized set of users. On the other hand, a distributed (collaborative) architecture has a much lower cost of bootstrap, and as the number of users increases, the number of servers also increases.3 Last, besides relieving the bootstrap and scalability issues, distributed architectures are known to be better candidates for implementing fault tolerance and for balancing the load of serving clients over a large set of collaborative machines.

Fig. 2. Components & information flow

3 Note that end users are not necessarily acting as servers as in a pure peer-to-peer model. Instead, institutions can dedicate one or a few servers for provisioning the system as the popularity of the service increases—hence the collaborative aspect.

Overall Architecture. Before describing in detail the components and algorithms of CoFeed, we start with a general overview of its architecture, as depicted in Figure 2. CoFeed consists of a software component on the client computer (typically a browser plugin) and a distributed infrastructure that implements the collaborative ranking. The distributed infrastructure is composed of a possibly large number of nodes that collectively store and update repositories of items and associated relevance feedback information. Queries from a client are sent to some existing search engine. At the same time, they are sent to CoFeed together with user-specific interest profile information. The routing substrate is in charge of delivering the query and the profile to the appropriate node, responsible for the target query's repository (see arrows labeled 1 in Figure 2). Based on the query terms and the profiling information, the ranking module on that node produces a ranked result list tailored for the user. The client can then combine the lists obtained from the search engine and CoFeed to improve the overall quality of the results presented to the user. Relevance information is gathered on the user's machine by observing accesses to elements returned by any of the search methods (documents A, B, and C in the figure). This information is used by the profiling module to consolidate the local interest profile of the user. It is also sent to the insertion module on the node that is in charge of the query repository, along with the profile of the user, to update the relevance tracking information for this query (see arrows labeled 2 in Figure 2).
2 Profiling, Storing and Ranking
This section describes how our system gathers profiling information, processes user queries, and stores and ranks relevant documents. The set of information associated with one query and stored in CoFeed is called a repository. We use the notations and terminology summarized in Table 1.

Profiling user interests. The accesses of users to documents form the basis for constructing their local user interest profile (UP). Each document from the result list is associated with a snippet, which contains a larger set of keywords (or tags) representing the content of the document. These keywords are used to form the interest profile (UP) of the user, which is used in turn to construct the document profiles (DP) maintained in the distributed repositories. Keywords are normalized in the system by classical means (stemming, noise-word list, alphabetical sort, duplicate word elimination). As storing all keywords for all accesses is obviously not possible, CoFeed represents profiles using Bloom filters [6], which are space-efficient probabilistic data structures allowing fast and false-negative-free inclusion tests over a set of elements. Moreover, Bloom filters have the added advantage of preserving privacy, as users may not want their set of accessed elements to be sent in plain over the network.
Table 1. Notations

Q        Query (a set of keywords, normalized by stemming, stop words removal, etc.)
P(Q)     Node in charge of the repository for query Q
RFitem   A relevance feedback item (composed of Q, D, UP, Snippet)
D        URL of a feedback item
DP       Document profile (Bloom filter)
UP       Interest profile of the user (Bloom filter)
Snippet  Summary of the document (title & synopsis with some/all query terms)
Freq     Frequency of a feedback item for a query Q as managed by node P(Q) (calculated as a moving average)
Tfirst   First arrival time of a feedback item for a given query Q on P(Q)
Tlast    Last arrival time of a feedback item for a given query Q on P(Q)
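CoFeed represents the profiles of Table 1 (UP, DP) as Bloom filters with 8,192 bits and 3 hash functions, and compares them via Jaccard similarity over their set bits, as described in the surrounding text. A minimal sketch; the salted-SHA-1 hashing scheme and the keyword examples are our assumptions:

```python
import hashlib

BITS = 8192  # bit-array size used in the CoFeed prototype
K = 3        # number of hash functions used in CoFeed

def _bit_positions(keyword):
    # K positions derived from salted SHA-1 digests (the concrete hashing
    # scheme is our assumption; the paper only fixes BITS and K)
    return {int(hashlib.sha1(f"{i}:{keyword}".encode()).hexdigest(), 16) % BITS
            for i in range(K)}

def profile(keywords):
    """Build a profile (UP or DP) as the set of bit indices set in the filter."""
    bits = set()
    for kw in keywords:
        bits |= _bit_positions(kw)
    return bits

def jaccard(p1, p2):
    """Estimate set similarity from the filters: |AND bits| / |OR bits|."""
    union = p1 | p2
    return len(p1 & p2) / len(union) if union else 0.0

# Hypothetical keyword sets: a travel-oriented user profile compared with
# a travel document and a programming document.
up = profile(["java", "island", "volcano"])
dp_travel = profile(["java", "island", "bali"])
dp_coding = profile(["java", "compiler", "jvm"])
# jaccard(up, dp_travel) is expected to exceed jaccard(up, dp_coding),
# so the travel document would rank higher for this user.
```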
A Bloom filter maps elements from an unbounded set to a bounded set of k bits in a bit array of medium size (8,192 bits in our prototype) by using k different uniform hash functions (we use 3 hash functions in CoFeed). Elements (keywords from snippets and queries) are inserted in the profiles (UP) by setting the k bits corresponding to these hash functions in the associated filter. Inclusion is tested by checking the bits corresponding to each of the k hash functions, and can yield some false positives. This is not much of a concern in CoFeed, as Bloom filters are not used for inclusion tests but for estimating the union and intersection sizes of two sets. This is done by counting the number of bits set in, respectively, the logical OR or the logical AND of the two corresponding Bloom filters. In CoFeed, we compare two profiles S1 and S2 by using the Jaccard similarity: |S1 ∩ S2| / |S1 ∪ S2|. This similarity metric between a UP and a DP represents the adequacy of a document to the user interest domain. The same metric between two DPs represents their semantic distance. In order to avoid the saturation of Bloom filters over time, as new queries are performed and as more feedback is inserted in CoFeed, we use for both document and user profiles a variant called time-decaying Bloom filters [8]. In this variant, bits that are set are associated with decaying timers. Newer elements have a higher weight and older information gradually disappears over time. The larger memory required for each bit is compensated by the frequent removal of elements (and thus the clearing of some bits) from the set. Using this structure allows CoFeed to spontaneously adapt to variations in the popularity of queries, and users to receive feedback that is more relevant to their ongoing search session.

Collecting interest feedback. When a user browses the result list for a query, the title, document reference, and snippet help her select the most relevant documents w.r.t.
her query and her interests. The action of accessing some document following a query produces a feedback information item, representing an implicit vote for a document that the user, given her implicit expectations (as summarized by her user profile UP), deemed interesting for the query. The following information is tracked and forms an RFitem: (1) the original query Q, (2) the document reference D, e.g., a URL, (3) the local interest profile UP of the user
after it has been updated with keywords from Q and the snippet, and (4) the snippet of the document, when available. Elements that are not accessed are simply ignored.

Managing repositories. The repository for a query Q is maintained by a specific node P(Q) in the system. Section 3 explains how this node is reached and how the load for popular queries is dynamically shared amongst several nodes. Managing a repository for some query Q consists of two operations: (1) the management of the relevance feedback information received for Q, and (2) the generation of the results to be sent to a user submitting a request for Q. We maintain one entry per tuple (Q, D) in the storage. The entries contain additional information (DP, Snippet, Freq, Tfirst, Tlast), which is used for various tasks: sorting query results, storage management and garbage collection. Upon arrival of a new RFitem (Q, D, UP, Snippet) at time t (see arrows labeled 2 in Figure 2), if an entry (Q, D) already exists, it is updated by computing the union of the DP and UP Bloom filters, updating the frequency, and setting Tlast to t; otherwise, a new entry is created and initialized using the content of the new RFitem.

Item ranking. When the node P(Q) receives a request in the format (Q, UP) (see arrows labeled 1 in Figure 2), the storage manager extracts the RFitems from the list associated with query Q and sorts them according to their similarity score w.r.t. the user profile (i.e., Sim(UP, DP)) and to their frequency. The resulting ranked list of document descriptors (URL, Snippet) is then sent back to the user. To ensure that a user profile UP provides sufficiently meaningful information to rank search results according to the user's interests, we use on the client side a threshold that specifies the minimum number of distinct documents from the ongoing search session that must be embedded in the user profile for it to be sent along with the query.
This helps ensure a minimum level of quality in the results returned by CoFeed and avoids spending bandwidth and resources when no gain can be expected from the ranking information.
Garbage collection. Clients continuously insert new feedback information into the system, while the storage on each node may be limited. A garbage collection mechanism periodically reclaims storage space while making sure that the most important information is preserved. Whenever a predefined storage size limit has been reached (or no resources are available), entries are evicted according to a set of rules based on (in decreasing order of priority): (1) the frequency of item updates and the last update time; (2) popularity thresholds; (3) the utility of items for constructing result lists. We omit further details of the garbage collection mechanism for the sake of brevity.
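As a concrete illustration, the per-entry update and ranking logic described above can be sketched as follows. This is a simplifying sketch, not the paper's implementation: the Bloom filters are modeled as plain Python sets, and Jaccard similarity stands in for Sim(UP, DP), whose exact definition is not fixed in this section.

```python
import time

class RepositoryEntry:
    """One (Q, D) entry with its DP, Snippet, Freq, Tfirst, Tlast fields."""
    def __init__(self, doc, dp, snippet, now):
        self.doc, self.dp, self.snippet = doc, dp, snippet
        self.freq, self.t_first, self.t_last = 1, now, now

def insert_rf_item(repo, query, doc, up, snippet, now=None):
    """Merge an RF item (Q, D, UP, Snippet) into the repository for `query`:
    union of the DP and UP filters (sets here), frequency increment, Tlast update."""
    now = now if now is not None else time.time()
    entry = repo.setdefault(query, {}).get(doc)
    if entry is None:
        repo[query][doc] = RepositoryEntry(doc, set(up), snippet, now)
    else:
        entry.dp |= up          # union of the DP and UP filters
        entry.freq += 1
        entry.t_last = now

def rank_results(repo, query, up):
    """Sort entries by similarity to the user profile, then by frequency."""
    def sim(profile, dp):       # Jaccard similarity as a stand-in for Sim(UP, DP)
        return len(profile & dp) / len(profile | dp) if profile | dp else 0.0
    entries = repo.get(query, {}).values()
    ranked = sorted(entries, key=lambda e: (sim(set(up), e.dp), e.freq), reverse=True)
    return [(e.doc, e.snippet) for e in ranked]
```

The same entry metadata (Freq, Tlast) is what the garbage collection rules above would consult when evicting entries.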
3 Distributed Storage System
This section presents the design rationale of CoFeed’s distributed storage system for managing repositories and allowing efficient processing of ranking and feedback insertion requests. We describe the resulting architecture and focus specifically on its two key features, routing and load balancing mechanisms.
P. Felber et al.
As previously mentioned, our objective in the design of CoFeed is to support large populations of clients, each submitting many requests. To avoid the prohibitive cost of scalable centralized solutions (e.g., high-traffic server farms), we propose a decentralized approach in which a set of nodes cooperates to provide the service. These nodes may be provided by ISPs or participating institutions (e.g., universities) that collectively share the processing load. The number of these nodes can grow with the number of clients, which solves the bootstrap problem from a resource-provisioning perspective. The repository associated with a query is under the responsibility of a specific node in the system, but high loads are shared amongst several nodes. This node is located by using an efficient key-based routing protocol, which is described below. A challenging aspect when designing the CoFeed distributed infrastructure is that the popularity distribution of queries is typically highly skewed (that is, distributed according to a power law). This means that a small subset of the queries is requested extremely often while the vast majority is only rarely requested. Given the high skew in the distribution, one must ensure that popular queries do not overload specific nodes in the infrastructure. To protect against such scenarios, we have designed adaptive load-balancing mechanisms to dynamically offload nodes experiencing too much incoming load. These mechanisms rely neither on fixed load threshold parameters nor on manual tuning (see Section 3). Routing. Each query Q is associated with a node P(Q). This node stores the repository associated with Q: document references, relevance tracking, and interest profiling information. Our overall design is a specialized form of a distributed hash table (DHT). It combines a key-based routing (KBR) layer and a storage layer. The role of the KBR layer is to locate the node responsible for some query based on its key.
To that end, it relies on a structured overlay (e.g., an augmented ring), where each node is assigned a unique identifier and the responsibility for a range of data item identifiers. In our case, each query Q has an identifier determined by hashing its terms to a key h(Q). The node P(Q) whose range covers h(Q) is responsible for maintaining Q's repository and for providing the appropriate sorted set of document references when asked to by some remote node. During the routing process, on each routing step towards the destination, the storage layer can be notified by a Transit call that a message is transiting via the local node. It can in turn modify the content of this message, or even answer the request on behalf of P(Q). This mechanism is used in our design to implement load balancing. A typical DHT provides a raw put/get interface to the application. Elements are stored as blocks on the node responsible for their key, and also retrieved as blocks. Our design differs in the following important point: our storage layer does not store information blindly, but provides an interface and functionalities that are specific to the storage and processing of ranking and feedback information. This has a strong impact on the design of the fault-tolerance and load-balancing mechanisms. We based our system on the routing layer of Pastry [18], known for its stability and its performance (small number of hops, usage of network distance for choosing neighbors, etc.). In Pastry, nodes are organized in an augmented ring and maintain routing tables of size O(log_b N), where b is a system parameter
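A minimal sketch of this key-to-node mapping follows. It is illustrative only: term hashing is made order-insensitive by sorting, and the "first node clockwise from the key" convention is an assumption (Pastry's actual rule, numeric closeness of identifiers, differs slightly).

```python
import hashlib
from bisect import bisect_left

def query_key(query, bits=32):
    """h(Q): hash the (sorted, lowercased) query terms to a key on the ring."""
    terms = " ".join(sorted(query.lower().split()))
    return int(hashlib.sha1(terms.encode()).hexdigest(), 16) % (1 << bits)

def responsible_node(node_ids, key):
    """P(Q): the node whose range covers the key; here, the first node
    clockwise from the key on the identifier ring (a common convention)."""
    ids = sorted(node_ids)
    i = bisect_left(ids, key)
    return ids[i % len(ids)]    # wrap around the ring
```

Sorting the terms before hashing makes `query_key("java sun")` and `query_key("sun java")` map to the same repository node.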
(keys are expressed in base b). Greedy routing succeeds in at most O(log_b N) steps. When routing a request to its destination, each intermediary node selects as the next hop a node from its routing table whose identifier has a longer common prefix with the target key than its own. As each routing step "resolves" at least one digit, at most d = O(log_b N) routing steps are required. An interesting property of such a greedy routing strategy is that routing paths towards a destination converge to the same set of nodes, and do so with an increasing probability as they get closer to the destination: the more digits have been resolved, the fewer nodes remain that have a longer common prefix with the target key. Routes from all nodes to some key in the network thus collide in the last hops. The path convergence property is particularly useful for the design of load-balancing mechanisms [17, 21], as described next.
Load balancing. CoFeed needs to manage large numbers of users simultaneously and support the storage of and access to repositories in a scalable manner. The skew of query popularities is the main problem, as nodes responsible for storing the most popular queries may receive unbearable amounts of traffic. When some node P(Q) gets overloaded by requests for a popular query Q, it replicates its responsibility for managing information and answering requests related to Q. A wide range of techniques has been proposed for balancing load in structured overlays (e.g., [13, 17, 19, 21]). All these proposals, however, target scenarios where the number of accesses is much greater than the number of updates to the data. These systems support access to non-mutable data by placing replicas on nodes that lie on the path towards its key. Our system requirements are different. First, the amount of writes (insertion of interest tracking information) and the amount of reads (queries) are of the same order. Caching only read accesses is thus not possible: routing every insertion for a query Q to the node P(Q) would involve notifying all copies, resulting in a load similar to the one avoided by caching access requests. It is thus necessary to also cache insertions, that is, to allow copies of information about a query to be modified independently from the "master" copy. We call such a copy a delegate: a replica onto which modifications are possible with only loose synchronization to its master copy. Second, queries are very dynamic by nature (e.g., a little-known personality can suddenly become famous and trigger millions of searches). Therefore, load balancing needs to be reactive, i.e., be able to initiate and cancel delegation dynamically as a function of the actual load.
Fig. 3. Delegation mechanism (a request for Q is intercepted and served by a delegate on the routing path towards P(Q); delegates periodically synchronize their updated copies with the master copy).
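The Transit-based interception performed by a delegate can be sketched as follows. This is a dictionary-based illustration, not the system's API: the message fields (`kind`, `query`, `doc`) and the per-query repository layout are assumptions.

```python
def transit(node, msg):
    """Transit upcall sketch: as a request routes towards P(Q), each hop checks
    whether it is a delegate for the target query and, if so, serves the
    request locally instead of forwarding it."""
    repo = node["delegated"].get(msg["query"])
    if repo is None:
        return None                       # not a delegate: let routing continue
    if msg["kind"] == "rank":
        # answer on behalf of P(Q): documents sorted by (toy) score, best first
        return sorted(repo, key=repo.get, reverse=True)
    repo[msg["doc"]] = repo.get(msg["doc"], 0) + 1   # absorb the insertion
    return "absorbed"
```

Both reads and writes are intercepted, which is exactly why loose periodic synchronization with the master copy is needed.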
Algorithm 1. Node p's periodic (Δdel) auditing of incoming links:

    cand ← {c ∈ p.in | p.in_c ≥ Σ_{x ∈ p.in} p.in_x / |p.in|}
    foreach c ∈ cand (in parallel) do retrieve p.load_c from c
    if p.load_p > γdel × Σ_{c ∈ cand} p.load_c / |cand| then
        during time Δlog, log requests from nodes in cand   // details of logging omitted for brevity
        foreach c ∈ cand do
            c.q, c.qload ← most frequent query from c, and its associated load
        p.load_avg ← (p.load_p + Σ_{c ∈ cand} p.load_c) / (|cand| + 1)
        choose d ∈ cand that yields the minimal |p.load_d(d.q → d) − p.load_avg|
        if d.qload > Σ_{x ∈ p.in} p.in_x × ξdel then
            send a copy of the repository for query d.q to d
            delegate d.q to d

Table 2. Delegation: constants and notations

    Constants                                            Value
    Δdel      Auditing period                            15 mn
    Δlog      Request logging period                     5 mn
    γdel      Imbalance tolerance before delegating      180%
    ξdel      Minimum relative gain for delegation       10%

    Notations
    p.in      All "last-hop" nodes that sent some request to p during the last period Δdel
    p.in_x    Number of requests p received from x during the last period Δdel
    p.load_x  Incoming request load at x, as known to p
    p.load_p  Incoming request load at p: Σ_{x ∈ p.in} p.in_x
Figure 3 presents the principle of delegation: a request (either for ranking or for insertion) is sent by the node on the left side and is routed towards the node P(Q) on the right side. As the next-to-last node on the path is a delegate of P(Q) for Q, it notices that a request for Q is going through its KBR layer and intercepts it. It replies on behalf of P(Q) or inserts the information in its local copy. Periodic synchronization takes place between the delegates and their delegator (which may itself be a delegate). Delegates are chosen according to the auditing Algorithm 1, which is run periodically by each node to evaluate its need for delegation. Table 2 gives the default parameter values used by the algorithm, as well as the notations used in the pseudocode. The periodic auditing of the local load for deciding on a new delegation works as follows. P(Q) keeps a counter p.in_x of the number of requests received on each of its incoming links p.in, labeled by the previous hop x. Note that p does not maintain information about which query was targeted, as the role of this lightweight passive monitoring is only to detect load imbalance, not to spot its origin. All nodes in p.in that sent more than the average load received on
all p's incoming links are asked for their own incoming load, normalized to the period Δdel. This information is stored in p.load_x for node x. The auditing of nodes for delegation is done only if sufficient imbalance is detected between the incoming load on node p and the load experienced by the nodes in the cand (candidates) set. The imbalance threshold is γdel: a value of 180% indicates that p has to handle more than 80% more requests than the average of the cand nodes being investigated for possible delegation. If such imbalance is detected, the node enters a logging phase (active monitoring) in which the requests received from cand are recorded. This phase does not have to be as long as the passive monitoring phase, as only the most requested queries are of interest to p for deciding on a delegation, and those are likely to occur in great quantity even in a short period. Then, the most popular query received from each node c ∈ cand is evaluated as a potential target for delegation. Basically, we select as delegate a node such that, when ignoring the most popular set of requests for the same query coming from that node, the difference between the load experienced by p and the average load experienced by the nodes in cand is minimal. Said differently, the goal is less to unload p than to evenly distribute the processing load on all nodes. Moreover, in order to prevent oscillations of delegations and un-delegations, p requires that at least a fraction ξdel of its load will be handled by the new delegate. When the delegation of a query to node d is decided, p sends a copy of the repository it has for the delegated query and instructs d to handle requests on its behalf. The cost of a delegation depends only on the size of the repository, which is typically small (on the order of a few kilobytes). Delegates can in turn use this mechanism for re-delegating Q: the master copy on P(Q) and its delegates form a tree.
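The auditing logic can be sketched in Python as follows. The node object `p` and its probing/logging callbacks are hypothetical stand-ins for the message exchanges of Algorithm 1, and estimating the post-delegation load as `p.load − qload` is our reading of the |p.load_d(d.q → d) − p.load_avg| term; treat this as a sketch, not the paper's implementation.

```python
def audit_and_delegate(p, gamma_del=1.8, xi_del=0.10):
    """Periodic auditing sketch. `p` carries: in_counts (requests per incoming
    link over the last period), load (own incoming load), probe_load(c) and
    log_hottest_query(c) -> (query, qload) callbacks. Returns (delegate, query)
    if a delegation is decided, or None."""
    total = sum(p.in_counts.values())
    avg_in = total / len(p.in_counts)
    # candidates: last-hop nodes that sent at least the average load
    cand = [c for c, n in p.in_counts.items() if n >= avg_in]
    loads = {c: p.probe_load(c) for c in cand}           # retrieve p.load_c
    if p.load <= gamma_del * sum(loads.values()) / len(cand):
        return None                                      # no imbalance: stop
    hot = {c: p.log_hottest_query(c) for c in cand}      # active monitoring
    load_avg = (p.load + sum(loads.values())) / (len(cand) + 1)
    # pick the candidate whose hottest query, once delegated, best levels p's load
    d = min(cand, key=lambda c: abs((p.load - hot[c][1]) - load_avg))
    if hot[d][1] > total * xi_del:                       # minimum relative gain
        return d, hot[d][0]
    return None
```

With the default γdel = 180% and ξdel = 10%, delegation triggers only on a clear imbalance and only if the delegate would take at least a tenth of p's load, which matches the anti-oscillation goal stated above.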
Synchronization between the copies is performed periodically, when the number of changes, denoted delta in Figure 3, reaches a configurable threshold. Pair-wise synchronization is used to aggregate the two copies into a new list, either by inserting "new" elements in the master list or by re-ranking the union of the two lists and keeping the k highest items. This list is then forwarded along the tree, resetting all deltas to 0. Delegations are revoked by similar mechanisms: a node can revoke a delegation, based on the observed request load, either if it receives notably more requests than the node it is a delegate for, or if the revocation of the delegation helps balance the load between a delegate and its delegator (i.e., the mean incoming load for both nodes gets closer to the average load observed by the delegate). This process uses hysteresis-based threshold values to avoid oscillations: the threshold for triggering a delegation is higher than the threshold for revoking one. We omit the detailed algorithm for brevity.
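The union-and-re-rank variant of pairwise synchronization can be sketched as follows. The representation of the lists as (doc, score) pairs and the rule that the higher score wins on collisions are illustrative assumptions.

```python
def synchronize(master, delegate_copy, k):
    """Pairwise synchronization sketch: take the union of the two ranked lists
    (each a list of (doc, score) pairs), re-rank the union, and keep the k
    highest items; the merged list would then be forwarded along the tree and
    all deltas reset to 0."""
    merged = {}
    for doc, score in master + delegate_copy:
        if doc not in merged or score > merged[doc]:
            merged[doc] = score          # keep the higher of the two scores
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:k]
```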
4 Evaluation
In this section, we evaluate CoFeed using two methods. Both use an actual implementation of the system. First, we assess the validity of interest profiling by running it against user behavior models. Second, we evaluate the performance
and effectiveness of the infrastructure itself by observing the peak performance on a single node and the scalability in terms of managed elements, as well as distributed aspects: performance of routing, load balancing, and reactiveness to dynamically changing loads. Experiments were conducted on a cluster of 11 dual-core computers, each with 2 GB of main memory and running GNU/Linux. In experiments that involve large numbers of nodes but no time-based performance measurements, each machine of the cluster executes multiple processes that represent different nodes. Naturally, for experiments that evaluate the performance of a single node w.r.t. time or peak performance, machines are used exclusively by one process. The implementation is based on a combination of C and Lua deployed using the Splay infrastructure [11].
User-centric ranking effectiveness. We first evaluate the effectiveness of interest-based profiling and ranking for actually reporting better-tailored results to the user, especially in the case where this user issues requests for ambiguous query terms. To that end, we developed both a synthetic data distribution model and a user behavior model. We do not consider distributed system aspects in this first part of the evaluation and assume that one node replies to all requests coming for one particular query (i.e., there is no use of load balancing). Our evaluation metrics are the ranks of elements of interest for the user, given her interest domain, with and without interest profiling. We consider a set of U users u1, u2, ..., interested in a set of queries Q = {q1, q2, ...} (e.g., "java", "jaguar", etc.). All these terms are ambiguous, and are associated with a set of documents (or elements) belonging to two or more interest domains chosen amongst D = {d1, d2, ...}. The actual number of domains dom(qi) for one query qi is determined randomly using a power-law distribution: dom(qi) = 1 + extra(qi), with Pr[extra(qi)] ∝ extra(qi)^(−αdom/query).
This means that most queries are associated with documents along 2 domains, a smaller set with documents over 3 domains, and an even smaller one with 4. No query is associated with more than 4 domains, and the parameter αdom/query determines the skewness of this distribution. Each domain has a popularity, which is also determined using a power-law distribution: Pr[di] ∝ i^(−αdompop). For each query qi, the dom(qi) domains are selected according to this domain popularity distribution. Each user is interested in one single domain, also selected according to the same domain popularity distribution, and issues requests for elements related to this domain. We consider a set of documents (or elements) E = {e1, e2, ...}, each of which is associated with one single interest domain chosen according to the domains' popularity distribution. For each domain di we create a list of documents E(di), which is used as follows to generate a set of elements at each repository. Each query qi is associated with a sorted set of 100 documents E(qi) representing the repository's content. Each element in this set is dedicated to one of the domains with which qi is associated, chosen according to the domain popularity distribution. The elements of the set are then filled by using a randomly picked and shuffled subset of E(di). We use the values in Table 3 for the parameters of the workload.
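The domain assignment just described can be sketched as follows. It is a hedged illustration: the power-law sampling is exact, but drawing distinct domains by rejection is our choice and is not specified in the text.

```python
import random

def power_law_choice(n, alpha, rng):
    """Sample a rank in 1..n with Pr[i] proportional to i^(-alpha)."""
    weights = [i ** -alpha for i in range(1, n + 1)]
    return rng.choices(range(1, n + 1), weights=weights, k=1)[0]

def domains_for_query(num_domains, alpha_dom_query, alpha_dompop, rng):
    """dom(q) = 1 + extra(q), with Pr[extra] proportional to extra^(-alpha)
    and at most 4 domains per query as in the text; the dom(q) distinct
    domains are then drawn from the domain popularity distribution."""
    extra = power_law_choice(3, alpha_dom_query, rng)   # extra in {1, 2, 3}
    chosen = set()
    while len(chosen) < 1 + extra:                      # reject repeated draws
        chosen.add(power_law_choice(num_domains, alpha_dompop, rng))
    return chosen
```

With αdom/query = 1 this yields weights 1, 1/2, 1/3 for 2, 3, and 4 domains respectively, matching the "most queries span 2 domains" statement.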
Table 3. Workload parameters

    Name         Value   Role
    |U|          500     Number of users
    |Q|          2,000   Number of queries
    |D|          20      Number of interest domains
    |E|/|D|      400     Number of documents/elements per domain
    αdom/query   1       Distribution of the number of extra domains per query
    αdompop      0.8     Distribution of the popularity of interest domains
Each document is associated with some text that represents the content of the document. This text is composed of a random number of keywords (between 15 and 30) chosen among queries from the domains associated with the document. To simulate the fact that the snippet returned by a centralized search engine for a given document varies according to the search keywords (e.g., it highlights the sentences that surround the occurrence of the keywords in the original document), the snippet is generated as a random subset of 5 to 7 keywords forming the document content. One such snippet is generated initially for each query a document is attached to. The search and access behavior of users is modeled in two phases. In a first phase, each user issues requests for queries that are attached to her interest domain and receives the list of elements as it is stored in the repository (i.e., without using interest-based ranking). This process continues until the user has sent at least 100 interest feedback items to the system (by simulated clicks on some of the returned results). This first phase helps construct the user and document profiles. We simulate the behavior of a user interested in the domain d receiving a list of items for some query q as follows. The user favors elements that are (1) higher up in the list, and (2) related to domain d. To model this behavior, we choose accessed elements according to a power-law distribution of the ranks in the list, with Pr[accessing the ith element] ∝ i^(−0.8), and we drop links that are not in d with probability 80%. In other words, there is a 20% chance that a user accesses some link that is not in her interest domain. This accounts for some "pollution" in the user and document profiles that is representative of real users' behaviors. In a second phase, we compare the lists received by the users for their queries with and without the use of profiles and interest-based ranking.
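The click model above can be sketched as follows. The number of accesses drawn per result list (5 here) is an illustrative assumption; the position bias and the 80% drop probability for out-of-domain links follow the text.

```python
import random

def simulated_clicks(results, user_domain, domain_of, rng,
                     alpha=0.8, p_keep_other=0.2):
    """User click model: position bias Pr[click on rank i] proportional to
    i^(-0.8), and links outside the user's domain are dropped with
    probability 80% (kept with probability p_keep_other)."""
    n = len(results)
    weights = [(i + 1) ** -alpha for i in range(n)]      # rank i is index i+1
    picked = rng.choices(range(n), weights=weights, k=5) # a few accesses
    clicks = []
    for i in set(picked):
        doc = results[i]
        if domain_of(doc) == user_domain or rng.random() < p_keep_other:
            clicks.append(doc)                           # generates an RF item
    return clicks
```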
This allows us to evaluate whether user profiling helps promote links interesting to the user by ranking them higher. For our evaluation, we consider two sets of domains: the 25% most popular ones (ranked 1 to 5) and the 25% least popular ones (ranked 16 to 20). We consider all the requests made by users that are interested in any of the domains of each set. For each such request, we examine the ranks of items that belong to the corresponding interest domain. We consider the ranks of the first 5 elements in the returned lists that are of the correct domain: the higher these 5 elements are in the list, the more effective the search mechanism is from the user point
Fig. 4. Impact of interest-based profiling: (a) 25% least and most popular domains; (b) 5% least and most popular domains.
of view. We compare the distribution of these ranks both when using the direct result from the simulated search engine and when using CoFeed. Obviously, there are more users interested in the 25% most popular interest domains than in the 25% least popular ones, and elements that are in the latter are ranked lower in the list returned by the centralized search engine model (being attached to an ambiguous query, they compete for positions in the list with at least one more popular domain). The goal of CoFeed is to promote links that are related to the user's domain of interest toward the first positions of her tailored list. Figure 4(a) shows the cumulative distribution of the rank in the returned list for these first 5 elements, both when CoFeed interest-based ranking is used and when it is not, considering the 25% most/least popular domains. We observe that elements for the popular domains are already ranked higher than elements for the least popular domains: the median of the ranks of elements for popular domains is 5, while it is about 20 for unpopular domains. For both sets, CoFeed effectively promotes the elements that are really of interest to the user: a vast majority of such elements appears in the first 5 ranks of the list. Figure 4(b) presents a similar plot, but when considering the 5% most/least popular interest domains. We observe similar results, with elements from unpopular domains ranked much higher in the list for the users who need them.
Repository peak performance. Next, we observe the performance of our prototype by running a single repository P(Q) on a single machine (Core 2 Duo processor at 2.4 GHz with 2 GB memory) and submitting synthetic request loads in a synchronous manner: we thereby achieve the highest possible throughput of requests that can be handled by one node in the system.
We do not limit the size of the repository, as we want to highlight the relative cost of inserting feedback information and ranking elements as a function of the number of stored elements. Figure 5 presents the evolution of the maximal throughput for insertions and ranking requests submitted alternately. We observe that for reasonable repository sizes (up to 8,000 elements, which we expect to be the common case in practice), the throughput is consistently higher than 100 requests served per second. Note that the cost of one single request increases logarithmically in the size of the repository. The throughput still achieves as many as 50 ranking and 100 insertion requests per second with 30,000 items in the repository.
Fig. 5. Performance of a single repository: maximum possible load (insertion and ranking requests per second) vs. repository size (items stored, in thousands).

Fig. 6. Evaluation of the delegation mechanism's reactiveness and efficiency. Day 1: balancing from skewed popularities (distribution of requests per second over all queries: percentiles and maximum). Days 2-3: transient popular query load adaptation (cumulative requests per second for one popular query, split between the primary and its delegates, with the request rate evolving between 2/s and 20/s).
Routing layer. We measured the distribution of route lengths at the KBR layer for various system sizes. As expected [18], the distribution of route lengths is balanced around a low average route size (3.7 for 128 nodes, 5.7 for 4,096 nodes, 6.5 for 16,384 nodes), which grows logarithmically in the system size.
Delegation-based load balancing: efficiency and reactiveness. Figure 6 shows a 3-day experiment using real request load from AOL [16]. (Unfortunately, we could not use this data set for our evaluation of the profiling and ranking effectiveness, because it lacks the necessary feedback information.) The experiment evaluates two aspects: a bootstrap phase with no dramatic change in user interest, showing the balancing process with stable popularity distributions, and a second phase with a previously unknown query Qpop (artificially added to the AOL data set) suddenly generating a massive load in the system, followed by a massive loss of popularity. During this time period, the associated P(Qpop) has to efficiently tackle the massive and sudden load imbalance.
The first day (on the left) presents the evolution of the distribution of the request load on all 1,024 nodes, as delegation progressively takes place. The
system starts up without bootstrap at time t0 = 0 hours, and initially no delegation is made. The evolution of the load distribution is presented by stacking up percentiles: the median load is thus represented by the 50th percentile and the maximal load by the lightest shade of gray. We observe that, from an initial high imbalance (where some nodes receive 10 times more load than 50% of all nodes), the system quickly converges to a reasonable imbalance (the most loaded node receives approximately twice as many requests as 50% of the nodes). Further balancing could probably be achieved by modifying γdel and ξdel (see Section 3), but the small extra gain would likely not compensate for the additional synchronization messages necessary to balance the load more evenly. The last two days (on the right) present the reactiveness of the system for one single and suddenly popular query Qpop (while the first day presented results for all queries). At time ts = 24 hours, randomly chosen nodes in the system start issuing requests for Qpop. The rate (shown in the bottom-right graph) reaches 20 requests per second in 4.8 hours, i.e., 10 times more than the median overall load at each node; it then remains constant for 9.6 more hours, before decreasing over 19.2 hours. The upper graph presents the load for Qpop at P(Qpop) (black bars) and its delegates (gray bars). Each bar represents the load and the number of delegates at the end of a 70-minute observation period. We observe that the number of delegates follows the popularity trend closely, in both directions (gain and loss). Furthermore, the load at P(Qpop) experiences a small increase in the beginning but remains very low and stable once delegation is active. While some delegates may handle only a very small portion of the load, they are still serving 1 or 2 queries per second, i.e., about the median load at all nodes.
This is due to delegation decisions being made not based on fixed thresholds but on the comparison of the loads of several nodes. Similarly, some delegates have a higher load than others, but the imbalance remains within the limits imposed by the γdel and ξdel parameters.
5 Related Work
Many of the research efforts on P2P Web search focus on decreasing the bandwidth consumption as compared to a centralized approach [5, 12, 14, 22]. However, none of these P2P systems has yet gained sufficient popularity, as they all suffer from the bootstrapping problem. CoFeed avoids this problem by leveraging existing search engines and providing added value to the user. The personalization of search results for a user based on her interest profile was studied in [23, 24], but not in a context where knowledge is collaboratively built and aggregated. The use of social annotations (e.g., from bookmarking platforms such as del.icio.us) to improve Web search has recently been explored [4, 20]. Another example is the PeerSpective system [15], which leverages implicit interest between communities of users based on the posting of links from one page to another on social networks (e.g., Facebook, MySpace, etc.). Such services operate in a centralized way and require intervention from the user to bookmark and annotate accessed items, which restricts them to a small subset of power users.
Our approach is closer to the Chora [9] and Sixearch [3] systems, which also use decentralized architectures for sharing and leveraging user search experiences. CoFeed differs from these systems in several ways; notably, they neither use interest profiling nor target information diversity. A decentralized storage system specifically designed for P2P Web search has been proposed in [10] for term frequency-inverse document frequency (TF-IDF). Unlike CoFeed, this system does not provide any mechanism for handling the skew in the popularity of queries, does not deal with term extraction, and does not use user-centric information to answer queries. Lopes et al. have proposed in [13] a storage architecture for large data on top of a DHT, using B+-trees to balance the storage load over several nodes. This architecture was designed for TF-IDF and only supports non-mutable data. Several other systems use the routing path convergence property, notably for load balancing [21] and for replication and performance [17].
6 Conclusion
We have presented the architecture and building blocks of a novel collaborative ranking service, CoFeed, that can efficiently complement existing search engines. CoFeed leverages user-centric information such as interest profiling and relevance tracking in order to return search result lists tailored to the user's interests. Collaborative ranking allows us to present tailored results to users, which can be more relevant especially when the user's expectations do not follow the main trend. CoFeed combines methods for interest profiling with mechanisms to maintain information diversity. It builds on a supporting distributed P2P system that combines classical key-based routing with an application-specific storage layer. This layer provides novel load-balancing mechanisms driven by the application's needs and characteristics.
References
1. http://googleblog.blogspot.com/2008/11/searchwiki-make-search-your-own.html
2. Adamic, L.A., Huberman, B.A.: Zipf's law and the Internet. Glottometrics 3, 143–150 (2002)
3. Akavipat, R., Wu, L.-S., Menczer, F., Maguitman, A.: Emerging semantic communities in peer web search. In: P2PIR 2006 (2006)
4. Bao, S., Xue, G., Wu, X., Yu, Y., Fei, B., Su, Z.: Optimizing web search using social annotations. In: WWW 2007 (2007)
5. Bender, M., Michel, S., Weikum, G., Zimmer, C.: The Minerva project: Database selection in the context of P2P search. Datenbanksysteme in Business, Technologie und Web 65, 125–144 (2005)
6. Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)
7. Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems 30(1-7), 107–117 (1998)
8. Cheng, K., Xiang, L., Iwaihara, M., Xu, H., Mohania, M.M.: Time-decaying bloom filters for data streams with skewed distributions. In: RIDE-SDMA 2005 (2005)
9. Gylfason, H., Khan, O., Schoenebeck, G.: Chora: Expert-based P2P web search. In: AAMAS 2006 (2006)
10. Klemm, F., Aberer, K.: Aggregation of a term vocabulary for peer-to-peer information retrieval: a DHT stress test. In: Moro, G., Bergamaschi, S., Joseph, S., Morin, J.-H., Ouksel, A.M. (eds.) DBISP2P 2005. LNCS, vol. 4125, pp. 187–194. Springer, Heidelberg (2005)
11. Leonini, L., Rivière, E., Felber, P.: SPLAY: Distributed systems evaluation made simple (or how to turn ideas into live systems in a breeze). In: NSDI 2009 (2009)
12. Li, J., Loo, B., Hellerstein, J., Kaashoek, F., Karger, D., Morris, R.: The feasibility of peer-to-peer web indexing and search. In: Kaashoek, M.F., Stoica, I. (eds.) IPTPS 2003. LNCS, vol. 2735. Springer, Heidelberg (2003)
13. Lopes, N., Baquero, C.: Taming hot-spots in DHT inverted indexes. In: LSDS-IR 2007 (2007)
14. Luu, T., Klemm, F., Podnar, I., Rajman, M., Aberer, K.: Alvis peers: A scalable full-text peer-to-peer retrieval engine. In: P2PIR 2006 (2006)
15. Mislove, A., Gummadi, K.P., Druschel, P.: Exploiting social networks for internet search. In: HotNets 2006 (2006)
16. Pass, G., Chowdhury, A., Torgeson, C.: A picture of search. In: InfoScale 2006 (2006)
17. Ramasubramanian, V., Sirer, E.G.: Beehive: O(1) lookup performance for power-law query distributions in peer-to-peer overlays. In: NSDI 2004 (2004)
18. Rowstron, A., Druschel, P.: Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. In: Guerraoui, R. (ed.) Middleware 2001. LNCS, vol. 2218, p. 329. Springer, Heidelberg (2001)
19. Rowstron, A., Druschel, P.: Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility. In: SOSP 2001 (2001)
20. Schenkel, R., Crecelius, T., Kacimi, M., Michel, S., Neumann, T., Parreira, J.X., Weikum, G.: Efficient top-k querying over social-tagging networks. In: SIGIR 2008 (2008)
21. Serbu, S., Bianchi, S., Kropf, P., Felber, P.: Dynamic load sharing in peer-to-peer systems: When some peers are more equal than others. IEEE Internet Computing 11(4), 53–61 (2007)
22. Suel, T., Mathur, C., Wu, J.-W., Zhang, J., Delis, A., Kharrazi, M., Long, X., Shanmugasundaram, K.: ODISSEA: A peer-to-peer architecture for scalable web search and information retrieval. In: WebDB 2003 (2003)
23. Tan, B., Shen, X., Zhai, C.: Mining long-term search history to improve search accuracy. In: SIGKDD 2006 (2006)
24. Teevan, J., Dumais, S.T., Horvitz, E.: Personalizing search via automated analysis of interests and activities. In: SIGIR 2005 (2005)
Author Index
Anderson, David P. 29
Ball, Rudi 141
Barone, Paolo 15
Cala, Jacek 155
Carton, Pierre 112
Damaso, Antonio V.L. 70
de Araújo Macêdo, Raimundo José 126
De Meuter, Wolfgang 56
Dery-Pinna, Anne-Marie 98
Desell, Travis 29
Domingues, Jeisa P.O. 70
Dowling, Jim 212
Dulay, Naranker 141
Eliassen, Frank 1
Felber, Pascal 226
Hallsteinsen, Svein 15
Haridi, Seif 212
Hermosillo, Gabriel 1
Jiang, Shanshan 15
Joffroy, Cedric 98
Joosen, Wouter 183
Kropf, Peter 226
Lagaisse, Bert 183
Lamersdorf, Winfried 84
Léger, Marc 42
Leonini, Lorenzo 226
Lombide Carreton, Andoni 56
Luu, Toan 226
Magdon-Ismail, Malik 29
Mamelli, Alessandro 15
Mehlhase, Stephan 15, 198
Meiners, Matthias 84
Menaud, Jean-Marc 42
Mendonça, Nabor C. 169
Mostarda, Leonardo 141
Newberg, Heidi 29
Nzekwa, Russel 1
Occello, Audrey 98
Payberah, Amir H. 212
Pinte, Kevin 56
Pottier, Rémy 42
Rahimian, Fatemeh 212
Rajman, Martin 226
Rivière, Etienne 226
Romero, Daniel 1, 112
Rosa, Nelson S. 70
Rouvoy, Romain 1, 112
Sales, Leandro 169
Santos de Sá, Alirio 126
Scholz, Ulrich 15, 198
Seinturier, Lionel 112
Szymanski, Boleslaw 29
Taherkordi, Amirhosein 1
Teófilo, Henrique 169
Truyen, Eddy 183
Varela, Carlos A. 29
Walraven, Stefan 183
Watson, Paul 155
Zaplata, Sonja 84