Software Technologies for Embedded and Ubiquitous Systems: 7th IFIP WG 10.2 International Workshop, SEUS 2009 Newport Beach, CA, USA, November 16-18, ... Applications, incl. Internet Web, and HCI)
Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
5860
Sunggu Lee Priya Narasimhan (Eds.)
Software Technologies for Embedded and Ubiquitous Systems 7th IFIP WG 10.2 International Workshop, SEUS 2009 Newport Beach, CA, USA, November 16-18, 2009 Proceedings
Volume Editors Sunggu Lee Pohang University of Science and Technology (POSTECH) Department of Electronic and Electrical Engineering San 31 Hyoja Dong, Nam Gu, Pohang, Gyeongbuk 790-784, South Korea E-mail: [email protected] Priya Narasimhan Carnegie Mellon University Electrical and Computer Engineering Department 5000 Forbes Avenue, Pittsburgh, PA 15213-3890, USA E-mail: [email protected]
Library of Congress Control Number: 2009937935
CR Subject Classification (1998): C.2, C.3, D.2, D.4, H.4, H.3, H.5
LNCS Sublibrary: SL 3 – Information Systems and Applications, incl. Internet/Web and HCI
ISSN 0302-9743
ISBN-10 3-642-10264-6 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-10264-6 Springer Berlin Heidelberg New York
The 7th IFIP Workshop on Software Technologies for Future Embedded and Ubiquitous Systems (SEUS) followed on the success of six previous editions in Capri, Italy (2008), Santorini, Greece (2007), Gyeongju, Korea (2006), Seattle, USA (2005), Vienna, Austria (2004), and Hakodate, Japan (2003), establishing SEUS as one of the emerging workshops in the field of embedded and ubiquitous systems. SEUS 2009 continued the tradition of fostering cross-community scientific excellence and establishing strong links between research and industry. The fields of both embedded computing and ubiquitous systems have seen considerable growth over the past few years. Given the advances in these fields, and also those in the areas of distributed computing, sensor networks, middleware, etc., the area of ubiquitous embedded computing is now being envisioned as the way of the future. The systems and technologies that will arise in support of ubiquitous embedded computing will undoubtedly need to address a variety of issues, including dependability, real-time behavior, human–computer interaction, autonomy, resource constraints, etc. All of these requirements pose a challenge to the research community. The purpose of SEUS 2009 was to bring together researchers and practitioners with an interest in advancing the state of the art and the state of practice in this emerging field, with the hope of fostering new ideas, collaborations and technologies. SEUS 2009 would not have been possible without the effort of many people. First of all, we would like to thank the authors, who contributed the papers that made up the essence of this workshop. We are particularly thankful to the Steering Committee Co-chairs, Peter Puschner, Yunmook Nah, Uwe Brinkschulte, Franz Rammig, Sang Son and Kane H. Kim, without whose help this workshop would not have been possible.
We would also like to thank the General Co-chairs, Eltefaat Shokri and Vana Kalogeraki, who organized the entire workshop, and the Program Committee members, who each contributed their valuable time to review and discuss each of the submitted papers. We would also like to thank the Publicity Chair Soila Kavulya and the Local Arrangements Chair Steve Meyers for their help with organizational issues. Thanks are also due to Springer for producing this publication and providing the online conferencing system used to receive, review and process all of the papers submitted to this workshop. Last, but not least, we would like to thank the IFIP Working Group 10.2 on Embedded Systems for sponsoring this workshop. November 2009
Sunggu Lee Priya Narasimhan
Organization
General Co-chairs
Eltefaat Shokri, The Aerospace Corporation, USA
Vana Kalogeraki, University of California at Riverside, USA
Program Co-chairs
Sunggu Lee, Pohang University of Science and Technology (POSTECH), Korea
Priya Narasimhan, Carnegie Mellon University, USA
Steering Committee
Peter Puschner, Technische Universität Wien, Austria
Yunmook Nah, Dankook University, Korea
Uwe Brinkschulte, Goethe University, Frankfurt am Main, Germany
Franz Rammig, University of Paderborn, Germany
Sang Son, University of Virginia, USA
Kane H. Kim, University of California at Irvine, USA
Program Committee
Allan Wong, Hong Kong Polytech, China
Doo-Hyun Kim, Konkuk University, Korea
Franz J. Rammig, University of Paderborn, Germany
Jan Gustafsson, Mälardalen University, Sweden
Kaori Fujinami, Tokyo University of Agriculture and Technology, Japan
Kee Wook Rim, Sunmoon University, Korea
Lynn Choi, Korea University, Korea
Minyi Guo, University of Aizu, Japan
Paul Couderc, INRIA, France
Robert G. Pettit IV, The Aerospace Corporation, USA
Roman Obermaisser, Vienna University of Technology, Austria
Tei-Wei Kuo, National Taiwan University, Taiwan
Theo Ungerer, University of Augsburg, Germany
Wenbing Zhao, Cleveland State University, USA
Wilfried Elmenreich, University of Klagenfurt, Austria
Yukikazu Nakamoto, University of Hyogo and Nagoya University, Japan
Publicity and Local Arrangements Chairs
Soila Kavulya, Carnegie Mellon University, USA
Steve Meyers, The Aerospace Corporation, USA
Table of Contents

Design and Implementation of an Operational Flight Program for an Unmanned Helicopter FCC Based on the TMO Scheme ..... 1
  Se-Gi Kim, Seung-Hwa Song, Chun-Hyon Chang, Doo-Hyun Kim, Shin Heu, and JungGuk Kim

Power Modeling of Solid State Disk for Dynamic Power Management Policy Design in Embedded Systems ..... 24
  Jinha Park, Sungjoo Yoo, Sunggu Lee, and Chanik Park

Optimizing Mobile Application Performance with Model-Driven Engineering ..... 36
  Chris Thompson, Jules White, Brian Dougherty, and Douglas C. Schmidt

A Single-Path Chip-Multiprocessor System ..... 47
  Martin Schoeberl, Peter Puschner, and Raimund Kirner

Towards Trustworthy Self-optimization for Distributed Systems ..... 58
  Benjamin Satzger, Florian Mutschelknaus, Faruk Bagci, Florian Kluge, and Theo Ungerer

An Experimental Framework for the Analysis and Validation of Software Clocks ..... 69
  Andrea Bondavalli, Francesco Brancati, Andrea Ceccarelli, and Lorenzo Falai

Towards a Statistical Model of a Microprocessor's Throughput by Analyzing Pipeline Stalls
  Uwe Brinkschulte, Daniel Lohn, and Mathias Pacher

Model-Based Analysis of Contract-Based Real-Time Scheduling ..... 227
  Georgiana Macariu and Vladimir Crețu

Exploring the Design Space for Network Protocol Stacks on Special-Purpose Embedded Systems ..... 240
  Hyun-Wook Jin and Junbeom Yoo

HiperSense: An Integrated System for Dense Wireless Sensing and Massively Scalable Data Visualization ..... 252
  Pai H. Chou, Chong-Jing Chen, Stephen F. Jenks, and Sung-Jin Kim

Applying Architectural Hybridization in Networked Embedded Systems ..... 264
  Antonio Casimiro, Jose Rufino, Luis Marques, Mario Calha, and Paulo Verissimo

Concurrency and Communication: Lessons from the SHIM Project ..... 276
  Stephen A. Edwards

Location-Aware Web Service by Utilizing Web Contents Including Location Information ..... 288
  YongUk Kim, Chulbum Ahn, Joonwoo Lee, and Yunmook Nah

The GENESYS Architecture: A Conceptual Model for Component-Based Distributed Real-Time Systems ..... 296
  Roman Obermaisser and Bernhard Huber

Approximate Worst-Case Execution Time Analysis for Early Stage Embedded Systems Development ..... 308
  Jan Gustafsson, Peter Altenbernd, Andreas Ermedahl, and Björn Lisper

Using Context Awareness to Improve Quality of Information Retrieval in Pervasive Computing
  Joseph P. Loyall and Richard E. Schantz
Design and Implementation of an Operational Flight Program for an Unmanned Helicopter FCC Based on the TMO Scheme

Se-Gi Kim (1), Seung-Hwa Song (2), Chun-Hyon Chang (2), Doo-Hyun Kim (2), Shin Heu (3), and JungGuk Kim (1)

(1) Hankuk University of Foreign Studies, {undeadrage,jgkim}@hufs.ac.kr
(2) Konkuk University, [email protected], {chchang,doohyun}@konkuk.ac.kr
(3) Hanyang University, [email protected]
Abstract. HELISCOPE is the name of a project supported by the MKE (Ministry of Knowledge Economy) of Korea to develop a flying-camera service that transmits the scene of a fire from an unmanned helicopter. In this paper, we introduce the design and implementation of the OFP (Operational Flight Program) for the unmanned helicopter's navigation based on the well-known TMO scheme. Navigation of the unmanned helicopter is directed by flight-mode commands from our GCS (Ground Control System). As the RTOS on the FCC (Flight Control Computer), RT-eCos3.0, which has been developed on top of eCos3.0 to support the basic task model of the TMO scheme, is used. To verify this navigation system, a HILS (Hardware-in-the-Loop Simulation) system using the FlightGear simulator has also been developed. The structure and functions of RT-eCos3.0 and the HILS are also introduced briefly. Keywords: unmanned helicopter, on-flight software, TMO.
and communication devices connected to the FCC. The OFP must process real-time sensor inputs and commands coming from the GPS/INS (GPS/Inertial Navigation System), the AHRS (Attitude and Heading Reference System) and the GCS in a deadline-based manner. A main FC (Flight Control) task of the OFP calculates real-time control signals from these inputs and must send them to the actuators of the helicopter within the pre-given deadline. Also, the FCC must report the status of the aircraft to the GCS periodically or upon requests from the GCS. Our OFP consists of one time-triggered task and several message-triggered tasks. Collection of sensor data and commands is handled by several message-triggered tasks. Upon receiving data, these tasks store them into the ODS (Object Data Store) of the OFP-TMO. The ODS also contains some parameters describing the capability of the aircraft. Periodic calculation and sending of control outputs based on the data in the ODS is performed by the main time-triggered task. All these tasks are scheduled by RT-eCos3.0 based on their timing constraints. To verify the navigation system, a HILS (Hardware-in-the-Loop Simulation) system using the open-source FlightGear simulator has also been developed. In section 2, the TMO model and the RT-eCos3.0 kernel are introduced briefly as related work, and in section 3, the design and implementation of the OFP are described. In section 4, the HILS system for verification is discussed, and in section 5, we conclude.
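As a rough Python sketch of this task structure (all names here are illustrative; the real OFP is written against the TMOSL on RT-eCos3.0, not this API), message-triggered reader tasks fill a shared ODS, and a time-triggered FC task periodically consumes it:

```python
import threading
import time

class ODS:
    """Object Data Store shared by all tasks of the OFP-TMO (illustrative)."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}   # latest GPS/INS, AHRS and SWM readings, flight mode, ...

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def snapshot(self):
        # Return a consistent copy so the FC task can compute without holding the lock.
        with self._lock:
            return dict(self._data)

def reader_task(ods, key, queue):
    """Message-triggered (SvM-like) task: blocks until a sensor message
    arrives on its channel, then stores it into the ODS."""
    while True:
        ods.put(key, queue.get())

def fc_task(ods, period_s, compute_control, send_to_actuators):
    """Time-triggered (SpM-like) task: every period it reads the ODS,
    computes the control outputs and sends them to the actuators."""
    next_release = time.monotonic()
    while True:
        send_to_actuators(compute_control(ods.snapshot()))
        next_release += period_s
        time.sleep(max(0.0, next_release - time.monotonic()))
```

The point of the snapshot is that the periodic FC task never blocks on a reader for longer than one dictionary copy, mirroring how the BCC keeps SvMs from disturbing SpM executions.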
2 The TMO Model and the RT-eCos3.0 Kernel

In this section, the distributed real-time object model TMO and the RTOS that has been used in implementing the OFP are introduced briefly as related work.

2.1 TMO Model [2], [3]

The TMO model is a real-time distributed object model for timeliness-guaranteed computing at design time. A TMO instance consists of three types of object member: an ODS (Object Data Store), time-triggered methods (SpM: Spontaneous Method) and message-triggered methods (SvM: Service Method). An SpM is actually a member thread that is activated by a pre-given timing constraint and must finish its periodic executions within the given deadline. An SvM is also a member thread, activated by an event message from a source outside the TMO. The main differences between the TMO model and conventional objects can be summarized as follows.

- TMO is a distributed computing component, and thus TMOs distributed over multiple nodes may interact via distributed IPC.
- The two types of method are active member threads in the TMO model (SpMs and SvMs).
- SvMs cannot disturb the executions of SpMs. This rule is called the BCC (Basic Concurrency Constraint). Basically, activation of an SvM is allowed only when potentially conflicting SpM executions are not in place.

2.2 RT-eCos3.0 Scheduling and IPC [2], [5]

RT-eCos3.0, which is a real-time extension of eCos3.0, supports multiple per-thread real-time scheduling policies. The policies are the EDF (Earliest Deadline
First)/BCC, LLF (Least Laxity First)/BCC and the non-preemptive FTFS (First Triggered First Scheduled) schedulers. The non-preemptive FTFS scheduler is used when an off-line scheduling scenario is given by our task serializer [7], which determines the initial offsets of time-triggered tasks so that all task instances can be executed without overlap and preemption. On the other hand, the EDF/BCC and the LLF/BCC are normally used when there is no pre-analysis tool or when task serialization is not possible. This means that the EDF and LLF schedulers are intended for systems with dynamic execution behaviors. With these real-time schedulers, two basic types of real-time task, time-triggered and message-triggered, are supported by the kernel. SpMs and SvMs of the TMO model are mapped to these tasks by the TMOSL (TMO Support Library) for TMO programmers. The timing precision used for representing timing constraints such as start/stop time, period and deadline has been enhanced to the microsecond unit. Although scheduling is performed every millisecond, the kernel computes the current time in microseconds by checking and compensating the raw PIC clock ticks since the last clock interrupt. With these schedulers, the kernel shows a task-switch overhead of 1.51 microseconds and a scheduling overhead of 2.73 microseconds in a 206 MHz ARM9 processor environment. Management of message-triggered tasks is always done in conjunction with the logical multicast distributed IPC of RT-eCos3.0. Once a message arrives via the network-transparent IPC at a channel associated with a message-triggered task, the task is activated and scheduled to finish its service within the pre-given deadline. The IPC subsystem consists of two layers: the lower layer is the intra-node channel IPC layer and the upper one is the network-transparent distributed IPC layer. This layering allows flexible configurations of the RT-eCos3.0 IPC based on various protocols.
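As a rough illustration of the difference between the two dynamic policies, the selection step of EDF and LLF can be sketched as follows. This is a simplification (the real kernel schedules preemptively at millisecond granularity); the task fields are our own names:

```python
def edf_pick(tasks, now):
    """Earliest-Deadline-First: among released tasks, run the one whose
    absolute deadline is nearest."""
    ready = [t for t in tasks if t["release"] <= now]
    return min(ready, key=lambda t: t["deadline"]) if ready else None

def llf_pick(tasks, now):
    """Least-Laxity-First: laxity = deadline - now - remaining execution
    time; run the released task with the smallest laxity."""
    ready = [t for t in tasks if t["release"] <= now]
    return min(ready, key=lambda t: t["deadline"] - now - t["remaining"]) if ready else None
```

A task with a later deadline but a large remaining execution time can have less laxity than an earlier-deadline task, so the two policies can make different choices on the same task set.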
Besides supporting this basic channel IPC, the TMOSL has been enhanced to support the Gate and the RMMC (Real-time Multicast Memory replication Channel), a highly abstracted distributed IPC model of the TMOSM of U.C. Irvine [4]. Since the role of an SvM is to handle external asynchronous events, the channel IPC has been extended so that an external device such as an I/O device or a sensor-alarm device can be associated with a channel. In this case, a message-triggered task can be activated when an asynchronous input occurs and is scheduled to finish within the predefined deadline.
3 Design and Implementation of the OFP for an Unmanned Helicopter FCC Based on the TMO Scheme

In this section, the control points of a helicopter and the OFP are described.

3.1 Helicopter Mechanics

Unlike fixed-wing aircraft, a helicopter makes a stable flight by using the thrust and upward force generated by the fixed-speed rotation of the engine and the angles of the main and tail rotor blades. For changes of heading, vertical flight and forward movement, it uses the tilt of the rotor disk and the tail rotor.
Fig. 1. Main components of a helicopter [1]
A helicopter can maintain the powerful hovering state that is not possible in fixed-wing aircraft, but it has difficulty maintaining a stable attitude because of its complicated lifting mechanism. Fig. 1 shows the main components of a helicopter and Table 1 shows the major control points and their motion effects.

Table 1. Helicopter controls and effects [1]

Cyclic lateral
  Directly controls: Varies main rotor blade pitch left/right
  Primary effect: Tilts main rotor disk left and right through the swashplate
  Secondary effect: Induces roll in direction moved
  Used in forward flight: To turn the aircraft
  Used in hover flight: To move sideways

Cyclic longitudinal
  Directly controls: Varies main rotor blade pitch fore/aft
  Primary effect: Tilts main rotor disk forward and back via the swashplate
  Secondary effect: Induces pitch nose down or up
  Used in forward flight: Control attitude
  Used in hover flight: To move forwards/backwards

Collective
  Directly controls: Collective angle of attack for the rotor main blades via the swashplate
  Primary effect: Inc./dec. pitch angle of rotor blades causing the aircraft to rise/descend
  Secondary effect: Inc./dec. torque and engine RPM
  Used in forward flight: To adjust power through rotor blade pitch setting
  Used in hover flight: To adjust skid height/vertical speed

Tail rotor: Rudder
  Directly controls: Collective pitch supplied to tail rotor blades
  Primary effect: Yaw rate
  Secondary effect: Inc./dec. torque and engine RPM (less than collective)
  Used in forward flight: Adjust sideslip angle
  Used in hover flight: Control yaw rate/heading
3.2 Flight Modes of the Unmanned Helicopter

The unmanned helicopter receives commands from the GCS, and the OFP on the FCC calculates the values of the control signals to be sent to the control points using the current sensor values from the GPS/INS, AHRS and SWM (Helicopter Servo Actuator Switching Module). These control signals are sent to the control points via the SWM. The OFP also sends information on location and attitude to the GCS for periodic monitoring. In case of loss of control, the SWM is set to a manual remote-control mode. Fig. 2 describes the control structure. Actually, Fig. 2 describes the whole structure of the HELISCOPE project, including the MCC (Multimedia Communication Computer) board and its communication mechanism; however, that part is not described in this paper. The auto-flight modes of our unmanned helicopter are as follows (Fig. 3):

- Hovering
- Auto-landing
- Point navigation
- Multi-point navigation
In hovering mode, the helicopter tries to maintain its current position and attitude even in wind. Hovering is the final state of all flight modes except auto-landing, and is the most difficult of the auto-flight modes. In auto-landing mode, the landing velocity is controlled according to the current altitude. In point-navigation mode, the helicopter moves to a target position given by the GCS. In this mode, a compensation algorithm is used when there is a deviation from the original track to the target caused by wind. When the craft arrives at the target, it switches to hovering mode. Multi-point navigation is a sequential repetition of point navigations.
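The mode transitions of Fig. 3 can be approximated by a small transition table. The event names below are hypothetical, since the paper describes the transitions only informally; the key property encoded is that every mode except auto-landing ends in hovering:

```python
# Hypothetical encoding of the flight-mode transition diagram (Fig. 3).
TRANSITIONS = {
    ("hovering", "cmd_land"): "auto-landing",
    ("hovering", "cmd_goto"): "point-navigation",
    ("hovering", "cmd_route"): "multi-point-navigation",
    ("point-navigation", "arrived"): "hovering",          # arrival ends in hovering
    ("multi-point-navigation", "last_point_reached"): "hovering",
    ("auto-landing", "touchdown"): "landed",
}

def next_mode(mode, event):
    """Return the successor mode; events with no transition are ignored."""
    return TRANSITIONS.get((mode, event), mode)
```

A table-driven state machine keeps the FC task's per-cycle mode logic to a single dictionary lookup, which is easy to audit against the diagram.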
Fig. 2. Structure of the HELISCOPE
Fig. 3. Flight Mode Transition
Fig. 4. Structure of the OFP-TMO
3.3 Design of the OFP Based on the TMO Scheme

The OFP basically consists of a TMO instance containing one time-triggered task, four message-triggered tasks and the ODS. The ODS contains data that are periodically read from the GPS/INS, AHRS and SWM. The following describes the roles of these five tasks and the data contained in the ODS.

- GPS/INSReader (IO-SvM): This task collects data sent from the GPS/INS device, which periodically sends temporal and spatial data at 10 Hz. The data mainly consists of information on altitude, position and velocity for each direction.
- AHRSReader (IO-SvM): This task collects data sent from the AHRS device, which periodically sends attitude information of the aircraft at 20 Hz.
- SWMReader (IO-SvM): This task collects data sent from the SWM at 10 Hz. The data consists of the current values of the four control points (Cyclic-lateral, Cyclic-longitudinal, Collective, Rudder).
- GCSReader (SvM): This task receives various flight commands from the GCS sporadically. Commands from the GCS include information on flight mode, target positions, switch-to-manual-mode, etc.
- FC (SpM): FC means Flight Control; this task runs with a 20 Hz period and a 40 ms deadline. It calculates the next values for the four control points from the ODS and sends the control values back to the SWM for controlling the helicopter.
- ODS: Besides the data collected by the reader tasks, the ODS also contains information on the current flight mode, the next values for the control points and capability parameters of the aircraft.

The frequencies of the SvMs and the FC-SpM can be changed according to the actual frequencies of the real devices and the capability of the CPU used. The frequencies of the IO-SvMs above are set to those of the physical devices that will be used in the field test. The calculations performed by the FC for the four flight modes consist of four basic auto-flight rule operations: SetForwardSpeed, SetSideSpeed, SetAltitude and SetHeading. SetForwardSpeed and SetSideSpeed generate the two-axis Cyclic values for forward and side speeds. SetAltitude generates a Collective value to rise, descend or preserve the current altitude. Finally, SetHeading generates a value to change heading. For each flight mode, an appropriate combination of these four basic operations is used.
For example, in the point-navigation mode, the SetHeading operation generates a value for heading (rudder) to the target position and the SetForwardSpeed operation generates values for the two Cyclic control points for the maximum speed to the target. To avoid too rapid changes of attitude, upper and lower limits are imposed on the values generated by the four basic operations, because the maximum and minimum values for Cyclic Lateral/Longitudinal, Collective and Rudder depend on the kind of aircraft [6].

3.4 GCS (Ground Control System)

The GCS is an application tool with which a user can monitor the UAV status and send control messages to the FCC. Our GCS provides a graphical interface for easy control. A protocol for the wireless communication devices (RF modem) between the UAV and the GCS has been designed. There are two types of packet: the to-GCS-packet for monitoring and the from-GCS-packet for control. Both packets consist of a header, a length, status or control data, and a checksum in the tail.

- to-GCS-packet
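The saturation step applied to the outputs of the four basic operations might be sketched as follows. The limit values are placeholders, since the real maxima and minima depend on the airframe [6]:

```python
def clamp(v, lo, hi):
    """Saturate a control value to its aircraft-dependent limits."""
    return max(lo, min(hi, v))

# Illustrative normalized limits; the real values depend on the aircraft [6].
LIMITS = {
    "cyclic_lat": (-1.0, 1.0),
    "cyclic_lon": (-1.0, 1.0),
    "collective": (0.0, 1.0),
    "rudder":     (-1.0, 1.0),
}

def fc_outputs(raw):
    """Apply the limits to the raw values produced by the four basic
    operations (SetForwardSpeed, SetSideSpeed, SetAltitude, SetHeading)."""
    return {name: clamp(raw[name], *LIMITS[name]) for name in LIMITS}
```

Saturating every output in one place guarantees that no combination of rule operations can command an attitude change outside what the airframe tolerates.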
The FCC system transmits to-GCS-packets containing the UAV status data described in Table 2 to the GCS at 20 Hz.

Table 2. UAV status data

- Attitude
- Location
- Velocity
- Control signal
- FCC system
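The packet layout described above (header, length, data, checksum in the tail) might look like the following sketch. The sync byte and the additive checksum are our assumptions; the paper does not specify the exact encoding:

```python
import struct

HEADER = 0xA5  # illustrative sync byte; the real protocol value is not given

def make_to_gcs_packet(payload: bytes) -> bytes:
    """Encode header | length | payload | checksum with a one-byte
    additive checksum over everything before it."""
    body = struct.pack("BB", HEADER, len(payload)) + payload
    checksum = sum(body) & 0xFF
    return body + struct.pack("B", checksum)

def parse_packet(pkt: bytes) -> bytes:
    """Validate the header and checksum, then return the payload."""
    header, length = struct.unpack_from("BB", pkt)
    if header != HEADER or (sum(pkt[:-1]) & 0xFF) != pkt[-1]:
        raise ValueError("bad packet")
    return pkt[2:2 + length]
```

A tail checksum lets the GCS drop frames corrupted on the RF modem link before any field is interpreted.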
- from-GCS-packet

A from-GCS-packet contains a command message to control the UAV flight mode described in section 3.2.

- User Interface

The user interface has been designed for prompt monitoring and convenient control. Figs. 5 and 6 show the user interface of our GCS. Various status data including the
Fig. 5. Main interface window
Fig. 6. Control signal window
attitude of a UAV contained in the downlink packets are displayed in 2D/3D/bar graphs and gauges of the GCS for easy analysis and debugging of the FCC control logic. The GCS has six widgets for control operations such as landing, taking off, hovering, moving, self-return and status update.
4 Experiments with the Hardware-in-the-Loop Simulation System

To test and verify the OFP in the TMO scheme, we used a HILS system with the open-source FlightGear v0.9.10 simulator. The FlightGear simulator supports various flying-object models, 3-D regional environments and model-dependent algorithms. In the HILS environment, the OFP receives the GPS/INS, AHRS and SWM information from the FlightGear simulator. Fig. 7 shows the HILS architecture. For the FCC, a board with the ST Thomson DX-66 STPC Client chip has been used to enhance floating-point operations. Among the various flight-stability tests performed, the results of hovering and of heading in the point-navigation mode are shown in Figs. 8 and 9. In the hovering test, the helicopter takes off at an altitude of 7 m, hovers at an altitude of 17 m for 10 seconds, and then auto-lands. Fig. 8 shows the desired references and the actual responses of the HILS system in this scenario; we can see a maximum deflection of almost 0.5 m when hovering. This result is tolerable in our application.
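The basic HILS exchange can be sketched as a loop in which the simulator stands in for the real sensor and actuator devices. The function signatures below are illustrative, not FlightGear's actual I/O interface, which depends on the configured protocol:

```python
def hils_loop(read_sensors, write_actuators, fcc_step, cycles):
    """One HILS cycle: read the simulated GPS/INS + AHRS + SWM state,
    run one FC computation, and feed the commands back into the
    simulated flight model."""
    for _ in range(cycles):
        sensors = read_sensors()       # simulator plays the sensor devices
        commands = fcc_step(sensors)   # OFP control calculation (FC task)
        write_actuators(commands)      # closes the loop in the simulator
```

Keeping the simulator behind plain read/write callables means the same `fcc_step` logic can later be pointed at the real devices for the field test.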
Fig. 7. Structure of the HILS
Fig. 8. Stability of hovering control
Fig. 9. Heading-control in the point navigation mode
5 Conclusions and Future Works

In this paper, we introduced the design and implementation of the OFP (Operational Flight Program) for an unmanned helicopter's navigation based on the well-known TMO scheme and RT-eCos3.0, and verified the system using HILS. Since the OFP can naturally be composed of time-triggered and message-triggered tasks, use of the TMO model, which supports object-oriented real-time concurrent programming, provides a well-structured and easily extendable scheme for designing and implementing the OFP. Moreover, we also found that the RTOS, RT-eCos3.0,
is very suitable for this kind of application because of its accurate timing behavior and small size. Having finished the HILS testing with the flying-object model supported by FlightGear, the remaining work is to make some minor corrections to the parameters and detailed algorithms that depend on the real flying-object model. Our plan is to start field testing with a real aircraft this year.

Acknowledgement. This work was supported by the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2009-C1090-0902-0026), and also partially supported by the MKE and the KETI, Korea, under the HVI (Human-Vehicle Interface) project (100333312).
References

1. Kim, D.H., Nodir, K., Chang, C.H., Kim, J.G.: HELISCOPE Project: Research Goal and Survey on Related Technologies. In: 12th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, pp. 112–118. IEEE Computer Society Press, Tokyo (2009)
2. Kim, J.H., Kim, H.J., Park, J.H., Ju, H.T., Lee, B.E., Kim, S.G., Heu, S.: TMO-eCos2.0 and Its Development Environment for Timeliness-Guaranteed Computing. In: 1st Software Technologies for Dependable Distributed Systems, pp. 164–168. IEEE Computer Society Press, Tokyo (2009)
3. Kim, K.H., Kopetz, H.: A Real-Time Object Model RTO.k and an Experimental Investigation of Its Potentials. In: 18th IEEE Computer Software & Applications Conference, pp. 392–402. IEEE Computer Society Press, Los Alamitos (1994)
4. Jenks, S.F., Kim, K.H., et al.: A Middleware Model Supporting Time-Triggered Message-Triggered Objects for Standard Linux Systems. Real-Time Systems – J. of Time-Critical Computing Systems 36, 75–99 (2007)
5. Kim, J.G., et al.: TMO-eCos: An eCos-Based Real-Time Micro Operating System Supporting Execution of a TMO Structured Program. In: 8th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, pp. 182–189. IEEE Computer Society Press, Seattle (2005)
6. Kim, S.P.: Guide and Control Rules for an Unmanned Helicopter. In: 2nd Workshop on HELISCOPE, pp. 1–12. ITRC, Konkuk University, Seoul (2008)
7. Kim, H.J., Kim, J.G., et al.: An Efficient Task Serializer for Hard Real-Time TMO Systems. In: 11th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, pp. 405–413. IEEE Computer Society Press, Orlando (2008)
Energy-Efficient Process Allocation Algorithms in Peer-to-Peer Systems

Ailixier Aikebaier, Tomoya Enokido, and Makoto Takizawa
Abstract. Information systems are composed of various types of computers interconnected in networks. In addition, information systems are shifting from the traditional client-server model to the peer-to-peer (P2P) model. P2P systems are scalable and fully distributed, without any centralized coordinator. It is getting more significant to discuss how to reduce the total electric power consumption of the computers in information systems, in addition to developing distributed algorithms that minimize the computation time. In this paper, we do not discuss the micro level, like the hardware specification of each computer. We discuss a simple model that shows the relation between the computation and the total power consumption of multiple peer computers performing various types of processes at the macro level. We also discuss algorithms for allocating a process to a computer so that the deadline constraint is satisfied and the total power consumption is reduced.
1 Introduction

Information systems are getting scalable, so that various types of computational devices like server computers and sensor nodes [1] are interconnected in various types of networks, like wireless and wired networks. Various types of distributed algorithms [6] have so far been developed, e.g., for allocating computation resources to processes and synchronizing multiple conflicting processes, to minimize the computation and response times, maximize the throughput, and minimize the memory space. On the other hand, green IT technologies [4] have to be realized in order to reduce the consumption of natural resources like oil and resolve air pollution on the Earth. In information systems, the total electric power consumption has to be reduced. Various hardware technologies like low-power-consumption CPUs [2,3] are now being developed. Biancini et al. [8] discuss how to reduce the power consumption of a data center with a cluster of homogeneous server computers by turning off servers which are not required for executing a collection of web requests. Various types of algorithms to find the required number of servers among homogeneous and heterogeneous servers are discussed [5,9]. In wireless sensor networks [1], routing algorithms to reduce the power consumption of the battery in a sensor node are discussed.

S. Lee and P. Narasimhan (Eds.): SEUS 2009, LNCS 5860, pp. 12–23, 2009.
© IFIP International Federation for Information Processing 2009

In this paper, we consider peer-to-peer (P2P) overlay networks [7] where computers are in nature heterogeneous and cannot be turned off by persons other than
the owners. In addition, P2P overlay networks are scalable and fully distributed, with no centralized coordination. Each peer has to find peers which not only satisfy the QoS requirements but also spend less electric power. First, we discuss a model for performing processes on a computer. Then, we measure how much electric power one type of computer spends to perform a Web application process. Next, we discuss a simple power consumption model for performing a process on a computer, based on experiments with servers and personal computers. In the simple model, a computer consumes the maximum electric power if at least one process is performed; otherwise, the computer consumes the minimum electric power, i.e. it is in the idle state. According to our experiments, the simple model fits personal computers with one CPU, independently of the number of cores. A request to perform a process, like a Web page request, is allocated to one of the computers in the network. We discuss a laxity-based allocation algorithm to reduce not only the execution time but also the power consumption in a collection of computers. In the laxity-based algorithm, processes are allocated to computers so that the deadline constraints are satisfied, based on the laxity concept. In section 2, we present a system model for performing a process on a computer. In section 3, we discuss a simple power consumption model obtained from the experiments. In section 4, we discuss how to allocate each process to a computer so as to reduce the power consumption. In section 5, we evaluate the process allocation algorithms.
2 Computation Model

2.1 Normalized Computation Rate

A system S includes a set C of computers c1, ..., cn (n ≥ 1) interconnected in reliable networks. A user issues a request to perform a process, like a Web page request. The process is performed on one computer. There is a set P of application processes p1, ..., pm (m ≥ 1). The term process means an application process in this paper. We assume each process ps can be performed on any computer in the computer set C. A user issues a request to perform a process ps to a load balancer K. For example, a user issues a request to read a Web page on a remote computer. The load balancer K selects one computer ci in the set C for a process ps and sends the request to the computer ci. On receipt of the request, the process ps is performed on the computer ci and a reply, e.g. a Web page, is sent back to the requesting user. Requests from multiple users are performed on a computer ci. A process being performed at time t is referred to as current. A process which has already terminated before time t is referred to as previous. Let Pi(t) (⊆ P) be the set of current processes on a computer ci at time t. Ni(t) shows the number of current processes in the set Pi(t), Ni(t) = |Pi(t)|. Let P(t) show the set of all current processes on the computers in the system S at time t, ∪i=1,...,n Pi(t). Suppose a process ps is performed on a computer ci. Here, Tis is the total computation time of ps on ci and minTis shows the computation time Tis where the process ps is exclusively performed on ci, i.e. without any other process. Hence, minTis ≤ Tis for every process ps. Let maxTs and minTs be max(minT1s, ..., minTns) and min(minT1s, ..., minTns), respectively. If a process ps is exclusively performed on the fastest computer ci and the slowest computer cj, minTs = minTis and maxTs =
A. Aikebaier, T. Enokido, and M. Takizawa
minTjs, respectively. A time unit (tu) shows the minimum time to perform the smallest process. We assume it takes at least one time unit [tu] to perform a process on any computer, i.e. 1 ≤ minTs ≤ maxTs. The average computation rate (ACR) Fis of a process ps on a computer ci is defined as follows:

Fis = 1 / Tis [1/tu].    (1)
Here, 0 < Fis ≤ 1/minTis ≤ 1. The maximum ACR maxFis is 1/minTis. Fis shows what fraction of the total amount of computation of a process ps is performed in one time unit. maxFs = max(maxF1s, ..., maxFns) and minFs = min(maxF1s, ..., maxFns). maxFs and minFs show the maximum ACRs maxFis and maxFjs of the fastest computer ci and the slowest computer cj, respectively. The more processes are performed on a computer ci, the longer it takes to perform each of the processes on the computer ci. Let αi(t) indicate the degradation rate of a computer ci at time t (0 ≤ αi(t) ≤ 1). αi(t1) ≥ αi(t2) if Ni(t1) ≤ Ni(t2) for every pair of different times t1 and t2. We assume αi(t) = 1 if Ni(t) ≤ 1 and αi(t) < 1 if Ni(t) > 1. Suppose it takes 50 [msec] to exclusively perform a process ps on a computer ci. Here, minTis = 50 and Fis = maxFis = 1/50 [1/msec]. Suppose it takes 75 [msec] to perform the process ps while other processes are performed on the computer ci. Here, Fis = 1/75 [1/msec]. Hence, αi(t) = 50/75 ≈ 0.67. We define the normalized computation rate (NCR) fis(t) of a process ps on a computer ci at time t as follows:

fis(t) = αi(t) · maxFis / maxFs = αi(t) · minTs / minTis [1/tu]    (2)

For the fastest computer ci, fis(t) = 1 if αi(t) = 1, i.e. Ni(t) = 1. If a computer ci is faster than another computer cj and the process ps is exclusively performed on ci and cj at times ti and tj, respectively, fis(ti) > fjs(tj). If a process ps is exclusively performed on a computer ci, αi(t) = 1 and fis(t) = maxFis / maxFs. The maximum NCR maxfis is maxFis / maxFs, and 0 ≤ fis(t) ≤ maxfis ≤ 1. The NCR fis(t) shows how many steps of a process ps are performed on a computer ci at time t. The average computation rate (ACR) Fis depends on the size of the process ps while fis(t) depends on the speed of the computer ci.
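The rates defined above can be sketched in code. The following helper functions are illustrative only (hypothetical names, not from the paper), using the 50/75 [msec] example from the text:

```python
# Illustrative sketch (hypothetical helpers, not the paper's code):
# the ACR F_is, degradation rate alpha_i, and NCR f_is of Section 2.1.

def acr(T_is):
    """Average computation rate F_is = 1 / T_is [1/tu] (equation (1))."""
    return 1.0 / T_is

def ncr(alpha_i, minT_s, minT_is):
    """Normalized computation rate f_is(t) = alpha_i(t) * minT_s / minT_is (equation (2))."""
    return alpha_i * minT_s / minT_is

# Numbers from the example in the text: minT_is = 50 [msec] when run exclusively,
# T_is = 75 [msec] when run with other processes, hence alpha_i = 50/75.
minT_is, T_is = 50.0, 75.0
alpha_i = minT_is / T_is
print(round(alpha_i, 2))   # 0.67
print(acr(minT_is))        # maxF_is = 1/50 = 0.02
```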
Next, suppose that a process ps is started and terminated on a computer ci at times stis and etis, respectively. Here, the total computation time Tis is etis − stis. The following formulas hold for the degradation rate αi(t) and the NCR fis(t):

∫[stis, etis] (αi(t) / minTis) dt = 1    (3)

∫[stis, etis] fis(t) dt = minTs · ∫[stis, etis] (αi(t) / minTis) dt = minTs    (4)

If there is no other process, i.e. αi(t) = 1 on the computer ci, fis(t) = maxFis / maxFs = minTs / minTis. Hence, Tis = etis − stis = minTis. If other processes are performed, Tis = etis − stis > minTis. Here, minTs shows the total amount of computation to be performed by the process ps.
Figure 1 shows the NCRs fis(t) and fjs(t) of a process ps which is exclusively performed on a pair of computers ci and cj, respectively. Here, the computer ci is the fastest in the computer set C. The NCR fis(t) = maxfis = 1 for stis ≤ t ≤ etis, and Tis = etis − stis = minTs. On the slower computer cj, fjs(t) = maxfjs < 1 and Tjs = etjs − stjs > minTs. Here, maxfis · minTis = minTs = maxfjs · minTjs from equation (4). The areas under fis(t) and fjs(t) have the same size minTs (= Tis). Figure 2 shows the NCR fis(t) of a process ps on a computer ci at time t, where multiple processes are performed concurrently with the process ps. fis(t) is smaller than maxfis if other processes are concurrently performed on the computer ci. Here, Tis = etis − stis > minTs and ∫[stis, etis] fis(t) dt = minTs.

Fig. 1. Normalized computation rates (NCRs)

Fig. 2. Normalized computation rate fis(t)
Next, we define the computation laxity Lis(t) [tu] of a process ps on a computer ci at time t as follows:

Lis(t) = minTs − ∫[stis, t] fis(x) dx.    (5)
The laxity Lis(t) shows how much computation the computer ci still has to spend to complete a process ps at time t. Lis(stis) = minTs and Lis(etis) = 0. If the process ps were exclusively performed on the computer ci, the process ps would be expected to terminate at time etis = t + Lis(t).

2.2 Simple Computation Model

Computers differ in performance. First, we consider a simple computation model. In the simple computation model, a computer ci satisfies the following properties:

[Simple computation model]
1. maxfis = maxfiu for every pair of different processes ps and pu performed on a computer ci.
2. Σ[ps ∈ Pi(t)] fis(t) = maxfi.    (6)
The maximum normalized computation rate (NCR) maxfi of a computer ci is maxfis for any process ps. This means the computer ci performs any process at the maximum clock frequency. Pi(t) shows the set of processes being performed on a computer ci at time t. In the simple computation model, we assume the degradation rate αi(t) = 1. On a computer ci, each process ps starts at time stis and terminates at time etis. We discuss how the NCR fis(t) of each process ps changes in the presence of multiple processes on a computer ci. A process ps is referred to as preceding another process pu on a computer ci if etis < stiu. A process ps is interleaved with another process pu on a computer ci iff etiu ≥ etis ≥ stiu. The interleaving relation is symmetric but not transitive. A process ps is referred to as connected with another process pu iff (1) ps is interleaved with pu or (2) ps is interleaved with some process pv and pv is connected with pu. The connected relation is symmetric and transitive. A schedule schi of a computer ci is a history of the processes performed on the computer ci. Processes in schi are partially ordered by the precedence relation and related by the connected relation. Here, let Ki(ps) be the closure subset of the processes in the schedule schi which are connected with a process ps, i.e. Ki(ps) = {pu | pu is connected with ps}. Ki(ps) is an equivalence class of the connected relation, i.e. Ki(ps) = Ki(pu) for every process pu in Ki(ps). Ki(ps) is referred to as a knot in schi. The schedule schi is divided into knots Ki1, ..., Kili which are pairwise disjoint. Let pu and pv be a pair of processes in a knot Ki(ps) where the starting time stiu is the minimum and the termination time etiv is the maximum. That is, the process pu is performed first and the process pv finishes last in the knot Ki(ps). The execution time TKi of the knot Ki(ps) is etiv − stiu.
Let KPi(t) be the current knot, i.e. the set of current or previous processes which are connected with at least one current process in Pi(t) at time t. In the simple model, the following theorem holds directly from the simple model properties:

[Theorem] Let Ki be a knot in a schedule schi of a computer ci. The execution time TKi of the knot Ki is Σ[ps ∈ Ki] minTis.

Let us consider a knot Ki of three processes p1, p2, and p3 on a computer ci as shown in Figure 3 (1). Here, Ki = {p1, p2, p3}. First, suppose that the processes p1, p2, and p3 are serially performed, i.e. eti1 = sti2 and eti2 = sti3. Here, the execution time TKi is eti3 − sti1 = minTi1 + minTi2 + minTi3. Next, the three processes p1, p2, and p3 start at time st and terminate at time et as shown in Figure 3 (2). Here, the execution time TKi = minTi1 + minTi2 + minTi3. Lastly, let us consider a knot Ki where the processes are concurrently performed. The processes p1, p2, and p3 start at the same time, sti1 = sti2 = sti3, are concurrently performed, and the process p3 terminates last at time eti3 after p1 and p2, as shown in Figure 3 (3). Here, the execution time TKi of the knot Ki is eti3 − sti1 = minTi1 + minTi2 + minTi3. The current knot KPi(t1) is {p1, p2, p3} at time t1 and KPi(t2) is {p1, p2} at time t2.

Fig. 3. Execution time of a knot: (1) serial execution, (2) parallel execution, (3) mixed execution

It depends on the scheduling algorithm how large each NCR fis(t) is in equation (6), fis(t) = αis · maxfi where Σ[ps ∈ Pi(t)] αis = 1. In the fair scheduler, each fis(t) is the same as the others, i.e. αis = 1/|Pi(t)|:

fis(t) = maxfi / |Pi(t)|.    (7)
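The partition of a schedule into knots can be sketched in code. The following is an illustrative sketch (the representation of each process as a (start, end) interval on one computer is an assumption for illustration, not the paper's data structure); it groups processes related by the transitive closure of the interleaving relation:

```python
def knots(schedule):
    """Partition a schedule (list of (start, end) intervals) into knots:
    maximal groups of processes related by the transitive closure of the
    interleaving (interval overlap) relation."""
    groups = []
    current, current_end = [], None
    for st, et in sorted(schedule):
        if current and st <= current_end:
            current.append((st, et))           # interleaved with the current knot
            current_end = max(current_end, et)
        else:
            if current:
                groups.append(current)         # a gap closes the previous knot
            current, current_end = [(st, et)], et
    if current:
        groups.append(current)
    return groups

# Three serially executed processes (et of one = st of the next) form one knot;
# a process starting after a gap begins a new knot.
print(knots([(0, 5), (5, 8), (10, 12)]))  # [[(0, 5), (5, 8)], [(10, 12)]]
```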
2.3 Estimated Termination Time

Suppose there are a set P of processes {p1, ..., pm} and a set C of computers {c1, ..., cn} in a system S. We discuss how to allocate a process ps in the process set P to a computer ci in the computer set C. Here, we assume the system S to be heterogeneous, i.e. some pairs of computers ci and cj have different specifications and performance. Suppose a process ps is started on a computer ci at time stis. A set Pi(t) of current processes is being performed on the computer ci at time t.

[Computation model] Let KPi(t) = {pi1, ..., pili} be the current knot of processes, where the starting time is st. The total execution time T(st, t) of the processes in the current knot KPi(t) is given as:

T(st, t) = minTi1 + minTi2 + · · · + minTili    (8)
In Figure 3 (3), t1 shows the current time. A process p1 is first initiated at time sti1 and terminates before time t1 on a computer ci. A pair of processes p2 and p3 are currently performed at time t1. Here, KPi(t1) is the current knot {p1, p2, p3} at time t1, and T(sti1, t1) = minTi1 + minTi2 + minTi3. The execution time from time sti1 to t1 is t1 − sti1. At time t1, we can estimate that the processes p2 and p3, which are concurrently performed, terminate at time t1 + T(sti1, t1) − (t1 − sti1) = sti1 + T(sti1, t1). sti1 is referred to as the starting time of the current knot KPi(t). No process is performed for some period before sti1, and some process is performed at every time from sti1 to t. The estimated termination time ETi(t) of the current processes on a computer ci means the time when every current process of time t terminates if no other process were performed after time t. ETi(t) is given as follows:

ETi(t) = t + T(stis, t) − (t − stis) = stis + T(stis, t)    (9)
Suppose a new process ps is started at the current time t. By using equation (9), we can obtain the estimated termination time ETi(t) of the current processes on each computer ci at time t. From the computation model, the estimated termination time ETis(t) of a new process ps starting on a computer ci at time t is given as follows:

ETis(t) = ETi(t) + minTis    (10)
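Equations (9) and (10) can be illustrated with a small sketch (hypothetical names; we assume the minimum computation times of the processes in the current knot are known):

```python
def ET_i(knot_start, knot_min_times):
    """Estimated termination time of the current processes (equation (9)):
    ET_i(t) = st + T(st, t), where T(st, t) is the sum of the minimum
    computation times of the processes in the current knot."""
    return knot_start + sum(knot_min_times)

def ET_is(knot_start, knot_min_times, minT_is_new):
    """Estimated termination time of a newly started process ps (equation (10)):
    the current knot's estimated end plus the new process's minimum time."""
    return ET_i(knot_start, knot_min_times) + minT_is_new

# A knot started at time 10 with processes of minimum times 5, 3, and 4 [tu];
# a new process with minT_is = 6 [tu] is expected to end at 10 + 12 + 6 = 28.
print(ET_is(10, [5, 3, 4], 6))  # 28
```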
3 Simple Power Consumption Model

Suppose there are n (≥ 1) computers c1, ..., cn and m (≥ 1) processes p1, ..., pm. In this paper, we assume the simple computation model is taken for each computer, i.e. the maximum clock frequency is stable for each computer ci. Let Ei(t) show the electric power consumption of a computer ci at time t [W/tu] (i = 1, ..., n). maxEi and minEi indicate the maximum and minimum electric power consumption of a computer ci, respectively. That is, minEi ≤ Ei(t) ≤ maxEi. maxE and minE show max(maxE1, ..., maxEn) and min(minE1, ..., minEn), respectively. Here, minEi shows the power consumption of a computer ci which is in the idle state. We define the normalized power consumption rate (NPCR) ei(t) [1/tu] of a computer ci at time t as follows:

ei(t) = Ei(t) / maxE (≤ 1).    (11)
Let minei and maxei show the minimum power consumption rate minEi / maxE and the maximum one maxEi / maxE of the computer ci, respectively. If the fastest computer ci maximally consumes electric power at the maximum clock frequency, ei(t) = maxei = 1. For a lower-speed computer cj, i.e. maxfj < maxfi, ej(t) = maxej < 1. We propose two types of power consumption models for a computer ci, the simple and multi-level models. In the simple model, the NPCR ei(t) depends on how many processes are performed, as follows:

ei(t) = maxei if Ni(t) ≥ 1; minei otherwise.    (12)

This means that if one process is performed on a computer ci, the electric power is maximally consumed on the computer ci. Even if more than one process is performed, the maximum power is consumed on the computer ci. A personal computer with one CPU satisfies the simple model, as discussed in the experiments of the succeeding section. The total normalized power consumption TPCi(t1, t2) of a computer ci from time t1 to time t2 is given as follows:

TPCi(t1, t2) = ∫[t1, t2] ei(t) dt    (13)
Note that TPCi(t1, t2) ≤ t2 − t1. For the fastest computer ci, TPCi(t1, t2) = maxei · (t2 − t1) = t2 − t1 if at least one process is performed at every time from t1 to t2 in the simple model. Let Ki be a knot of a computer ci whose starting time is sti and termination time is eti. The normalized total power consumption of the computer ci to perform every process in the knot Ki is TPCi(sti, eti). In the simple model, TPCi(sti, eti) = ∫[sti, eti] maxei dt = (eti − sti) · maxei = Σ[ps ∈ Ki] minTis · maxei.
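Under the simple model, equations (12) and (13) reduce power accounting to counting busy time. A small illustrative sketch (hypothetical helpers, not the paper's code):

```python
def npcr(n_processes, min_e, max_e):
    """NPCR e_i(t) under the simple model (equation (12)): the maximum rate
    whenever at least one process runs, the minimum (idle) rate otherwise."""
    return max_e if n_processes >= 1 else min_e

def tpc_knot(knot_min_times, max_e):
    """Total normalized power consumption over a knot in the simple model:
    TPC_i = (sum of minT_is over the knot) * maxe_i."""
    return sum(knot_min_times) * max_e

print(npcr(3, 0.2, 0.5))         # 0.5: busy, regardless of how many processes run
print(tpc_knot([5, 3, 4], 0.5))  # 6.0: a knot of total length 12 [tu] at rate 0.5
```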
4 Process Allocation Algorithms

4.1 Round-Robin Algorithms

We consider two types of algorithms, the weighted round robin (WRR) [20] and weighted least connection (WLC) [21] algorithms. For each of the WRR and WLC algorithms,
we consider two cases, Per (performance) and Pow (power). In Per, the weight is given in terms of the performance ratio of the servers. That is, the higher the performance a server supports, the more processes are allocated to the server. In Pow, the weight is defined in terms of the power consumption ratio of the servers. The less power a server consumes, the more processes are allocated to the server.

4.2 Laxity-Based Algorithm

Some applications have a deadline constraint TCs on a process ps issued by the application, i.e. the process ps has to terminate by the deadline. Here, a process ps has to be allocated to a computer ci so that the process ps can terminate by the deadline TCs. Cs(t) denotes the set of computers which satisfy the condition TCs, i.e. Cs(t) = {ci | ETis(t) ≤ TCs}. That is, on a computer ci in Cs(t), the process ps is expected to terminate by the deadline TCs. Hence, if the process ps is allocated to one computer ci in Cs(t), the process ps can terminate before TCs. Next, we assume that the normalized power consumption rate (NPCR) ei(t) of each computer ci is given by equation (12) according to the simple model. We can estimate the total power consumption laxity leis(t) of a process ps between time t and ETis(t) at time t when the process ps is allocated to the computer ci [Figure 4]. leis(t) of the computer ci is given by equation (14):

leis(t) = maxei · (ETis(t) − t)    (14)
Suppose a process ps is issued at time t. A computer ci in the computer set C is selected for the process ps with the constraint TCs at time t as follows:

Alloc(t, C, ps, TCs) {
  Cs = φ; NoCs = φ;
  for each computer ci in C {
    if ETis(t) ≤ TCs, Cs = Cs ∪ {ci};
    else /* ETis(t) > TCs */ NoCs = NoCs ∪ {ci};
  }
  if Cs ≠ φ { /* candidate computers are found */
    computer = ci such that leis(t) is the minimum in Cs;
    return(computer);
  } else { /* Cs = φ */
    computer = ci such that ETis(t) is the minimum in NoCs;
    return(computer);
  }
}

Cs and NoCs are the sets of computers which can and cannot satisfy the constraint TCs, respectively. Here, Cs ∪ NoCs = C and Cs ∩ NoCs = φ. In the procedure Alloc, if there is at least one computer which can satisfy the time constraint TCs of the process ps, one of the computers with the minimum power consumption laxity is selected. If there is no computer which can satisfy the application time constraint TCs, one of the computers which can terminate the process ps earliest is selected in the computer set C.
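The Alloc procedure can be sketched directly in executable form. The following Python version is illustrative only: the data layout (dictionaries mapping each computer to its estimated termination time ETi(t), minimum computation time minTis, and maximum power rate maxei) is an assumption, not the paper's interface.

```python
def alloc(t, ET, minT, max_e, TC_s):
    """Laxity-based allocation (sketch of Alloc): t is the current time,
    ET[ci] = ET_i(t), minT[ci] = minT_is, max_e[ci] = maxe_i,
    TC_s is the deadline constraint of process ps."""
    Cs, NoCs = {}, {}
    for ci in ET:
        ET_is = ET[ci] + minT[ci]        # equation (10)
        le_is = max_e[ci] * (ET_is - t)  # power consumption laxity, equation (14)
        if ET_is <= TC_s:
            Cs[ci] = (le_is, ET_is)      # candidate: meets the deadline
        else:
            NoCs[ci] = ET_is
    if Cs:  # pick the candidate with the minimum power consumption laxity
        return min(Cs, key=lambda ci: Cs[ci][0])
    # otherwise pick the computer with the earliest estimated termination time
    return min(NoCs, key=lambda ci: NoCs[ci])

# Two computers: c1 is slower but frugal, c2 is faster but power-hungry.
choice = alloc(t=0,
               ET={"c1": 4, "c2": 2}, minT={"c1": 6, "c2": 3},
               max_e={"c1": 0.4, "c2": 1.0}, TC_s=10)
print(choice)  # prints "c1": both meet the deadline, but c1's laxity 4.0 < c2's 5.0
```

With a tighter deadline that no computer can meet (e.g. TC_s = 4), the sketch falls back to the earliest-terminating computer, as in the paper's else branch.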
Fig. 4. Estimation of power consumption
5 Evaluation

5.1 Environment

We measure how much electric power computers consume for Web applications. We consider a cluster system composed of Linux Virtual Server (LVS) systems which are interconnected in gigabit networks as shown in Figure 5. The NAT-based routing system VS-NAT [12] is used as the load balancer K. The cluster system includes three servers s1, s2, and s3, on each of which Apache 2.0 [11] is installed, as shown in Figure 5. The load generator server L first issues requests to the load balancer K. Then, the load balancer K assigns each request to one of the servers according to some allocation algorithm. Each server si compresses the reply file by using the Deflate module [13] on receipt of a request from the load generator server L. We measure the peak consumption of electric power and the average response time of each server si (i = 1, 2, 3). The power consumption ratio of the servers s1, s2, and s3 is 0.9 : 0.6 : 1 as shown in Figure 5.

Fig. 5. Cluster system: the load generation server and the load balancer are connected to the servers s1, s2, and s3 through gigabit switches; the servers s1, s2, and s3 have 1, 1, and 2 CPUs (1, 1, and 2 cores), respectively.

On receipt of a Web request, each server si finds the reply file of the request and compresses the reply file by using the Deflate module. The size of the original reply file is 1 Mbyte and the compressed reply file is 7.8 Kbyte in size. The Apache benchmark software [10] is used to generate Web requests, where a total of 10,000 requests are issued and 100 requests are concurrently issued to each server. The performance ratio of the servers s1, s2, and s3 is 1 : 1.2 : 4 as shown in Figure 5. The server s3 is the fastest and consumes the most electric power. The server s1 is slower than s3 but consumes more electric power than the server s2.

5.2 Experimental Results

If the weight is based on the performance ratio (Per), the requests are allocated to the servers s1, s2, and s3 with the ratio 1 : 1.2 : 4, respectively. On the other hand, if the weight is based on the power consumption ratio (Pow), the requests are allocated to the servers s1, s2, and s3 with the ratio 0.9 : 0.6 : 1, respectively. Here, by using the Apache benchmark software, the load generation server L transmits a total of 100,000 requests to the servers s1, s2, and s3, where six requests are concurrently issued to the load balancer K. The total power consumption of the cluster system and the average response time of a request from a Web server are measured. We consider a static Web server where the size of a reply file for a request is not dynamically changed, i.e. the compressed version of the same HTML reply file is sent back to each user. In this experiment, the original HTML file and the compressed file are 1,025,027 [Byte] and 13,698 [Byte] in size, respectively. On the load balancer K, the following process allocation algorithms are adopted: the weighted round-robin (WRR) [20] algorithms, WRR-Per and WRR-Pow, and the weighted least connection (WLC) [21] algorithms, WLC-Per and WLC-Pow. Figure 6 shows the total power consumption [W/H] of the cluster system over time.
WRR-Per and WLC-Per show the total power consumption of the servers in the WRR and WLC algorithms with the performance-based weight (Per), respectively. WRR-Pow and WLC-Pow indicate the power consumption of WRR and WLC with the power-consumption-based weight (Pow), respectively.

Fig. 6. Total power consumption [W/H] of the cluster system over time for WRR-Per, WLC-Per, WRR-Pow, and WLC-Pow

In WRR-Per and WLC-Per, the total execution time and peak power consumption are almost the same. In addition, the total execution time and peak power consumption are almost the same in WRR-Pow and WLC-Pow. This experimental result shows that the total power consumption and total execution time are almost the same for the two allocation algorithms if the same weight ratio is used. If the weight of the load balancing algorithm is given in terms of the performance ratio (Per), the peak power consumption is higher than with the power consumption ratio (Pow). However, with Pow, the total execution time is longer than with Per. Here, the total power consumption is calculated by multiplying the execution time and the power consumption. The experiment shows that the total power consumption is reduced by using the performance-based weight (Per).
6 Concluding Remarks

In this paper, we discussed a simple power consumption model of computers. We discussed the laxity-based algorithm to allocate a process to a computer so that the deadline constraint is satisfied and the total power consumption is reduced on the basis of the laxity concept. We obtained experimental results on the electric power consumption of Web servers. We evaluated the simple model through the experiment on the PC cluster and showed that the PC cluster follows the simple model. We are now considering other types of applications, like database transactions, and measuring the power consumption of multi-CPU servers.
References

1. Akyildiz, I.F., Kasimoglu, I.H.: Wireless Sensor and Actor Networks: Research Challenges. Ad Hoc Networks Journal (Elsevier) 2, 351–367 (2004)
2. AMD, http://www.amd.com/
3. Intel, http://www.intel.com/
4. Green IT, http://www.greenit.net
5. Heath, T., Diniz, B., Carrera, E.V., Meira Jr., W., Bianchini, R.: Energy Conservation in Heterogeneous Server Clusters. In: PPoPP 2005: Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 186–195 (2005)
6. Lynch, N.A.: Distributed Algorithms, 1st edn. Morgan Kaufmann Publishers, San Francisco (1997)
7. Montresor, A.: A Robust Protocol for Building Superpeer Overlay Topologies. In: Proc. of the 4th International Conference on Peer-to-Peer Computing, pp. 202–209 (2004)
8. Bianchini, R., Rajamony, R.: Power and Energy Management for Server Systems. IEEE Computer 37(11) (November 2004); Special issue on Internet data centers
9. Rajamani, K., Lefurgy, C.: On Evaluating Request-Distribution Schemes for Saving Energy in Server Clusters. In: Proc. of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software, pp. 111–122 (2003)
10. ab - Apache HTTP server benchmarking tool, http://httpd.apache.org/docs/2.0/programs/ab.html
11. Apache 2.0, http://httpd.apache.org/
12. VS-NAT, http://www.linuxvirtualserver.org/
13. Apache Module mod_deflate, http://httpd.apache.org
14. Aron, M., Druschel, P., Zwaenepoel, W.: Cluster Reserves: A Mechanism for Resource Management in Cluster-Based Network Servers. In: Proceedings of the International Conference on Measurement and Modeling of Computer Systems, pp. 90–101 (2000)
15. Bevilacqua, A.: A Dynamic Load Balancing Method on a Heterogeneous Cluster of Workstations. Informatica 23(1), 49–56 (1999)
16. Bianchini, R., Carrera, E.V.: Analytical and Experimental Evaluation of Cluster-Based WWW Servers. World Wide Web Journal 3(4) (December 2000)
17. Heath, T., Diniz, B., Carrera, E.V., Meira Jr., W., Bianchini, R.: Self-Configuring Heterogeneous Server Clusters. In: Proceedings of the Workshop on Compilers and Operating Systems for Low Power (2003)
18. Rajamani, K., Lefurgy, C.: On Evaluating Request-Distribution Schemes for Saving Energy in Server Clusters. In: Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, pp. 111–122 (2003)
19. Colajanni, M., Cardellini, V., Yu, P.S.: Dynamic Load Balancing in Geographically Distributed Heterogeneous Web Servers. In: Proceedings of the 18th International Conference on Distributed Computing Systems, p. 295 (1998)
20. Weighted Round Robin (WRR), http://www.linuxvirtualserver.org/docs/scheduling.html
21. Weighted Least Connection (WLC), http://www.linuxvirtualserver.org/docs/scheduling.html
Power Modeling of Solid State Disk for Dynamic Power Management Policy Design in Embedded Systems

Jinha Park1, Sungjoo Yoo1, Sunggu Lee1, and Chanik Park2

1 Department of Electronic and Electrical Engineering, POSTECH (Pohang University of Science and Technology), 790-784 Hyoja-dong, Nam-gu, Pohang, Korea {litrine,sungjoo.yoo,slee}@postech.ac.kr
2 Flash Software Team, Memory Division, DS Solution, Samsung Electronics, Hwasung, Gyeonggi-do, Korea [email protected]
Abstract. Power consumption has now become the most critical performance-limiting factor for solid state disks (SSDs) in embedded systems. It is imperative to devise design methods and architectures for power-efficient SSD designs. In our work, we present the first step towards low power SSD design, i.e., power estimation of SSD. We present a practical approach to SSD power estimation which tries to keep the advantage of real measurement, i.e., accuracy, while overcoming its limitations, i.e., long execution time and lack of repeatability (and high cost), by a trace-based simulation. Since it is based on real measurements, it takes into account the power consumption of the SSD controller as well as that of the Flash memories. We show the effectiveness of the presented method in designing a dynamic power management policy for SSD.

Keywords: Solid state disk, power consumption, measurement, trace-based simulation, dynamic power management, low power states.
SSD can achieve higher performance than HDD mostly by parallel accesses to relatively low speed Flash devices (e.g., achieving a throughput higher than 240MB/s by accessing 8 Flash devices at 33Mbytes/sec each). High performance SSD inherits the same level of power consumption constraints that traditional HDD has in embedded systems. For instance, SSD has a peak power budget of about 1A and an average power consumption budget of about 1.2W in typical notebook PCs [2] and is expected to have a much lower power budget in smart phones. Further performance improvement in SSD will require more power consumption, especially due to more aggressive parallel accesses to Flash devices. However, such an aggressive parallel scheme is not easily applicable given the power consumption constraints of embedded systems: there is little room left in the peak and average power budgets to be spent, via aggressively parallel accesses, on further performance improvement of SSD. Power consumption has now become the most critical performance-limiting factor in SSD. It is imperative to devise design methods and architectures for power efficient SSD designs. There has been little work on low power SSD designs. In our work, we present the first step towards low power SSD design, i.e., power estimation of SSD. We also report our application of the power estimation method to a low power SSD design, where a parameter of dynamic power management is explored in a fast and cost effective way with the help of the presented power estimation method.
1.2 Power Estimation of SSD

The design space for low power SSD design is huge due to the various possible design choices, e.g., parameter sets in dynamic power management (e.g., time-out parameters, DPM policies, etc.), Flash Translation Layer (FTL) algorithms, and SSD controller architectural parameters (e.g., I/O frequency of Flash devices, DRAM buffer size and caching/prefetch methods in the controller, etc.).1 When exploring the design space, there are two typical methods of evaluating the power consumption of design candidates: real measurement and full system simulation with a power model. Real measurement, which gives accurate power information, is in use in SSD product designs. There are two critical problems with real measurements: long design cycles (e.g., hours of SSD execution are required for the evaluation of one design candidate) and changing battery characteristics over repeated runs [3].2 The two problems prevent designers from performing extensive design space exploration, which may require evaluating numerous design candidates. Thus, it is impractical to evaluate all the choices with real SSD executions due to the long execution time and the high cost of batteries.3 The second candidate, cycle-level full system simulation with a power model, is prohibitive due to too long a simulation runtime. Assuming a 200MHz SSD controller and ~100 Kcycles/sec of simulation speed, it may take 125 days to simulate less than 1.5 hours of real SSD execution. Thus, SSD power estimation based on a detailed simulation model may not be practical in real SSD designs.

1 In this paper, we consider only software-based solutions. There can be hardware design candidates such as # of channels, # of ways/channel, DRAM capacity/speed, # of CPUs, etc.
2 Battery lifetime measurements require a procedure of fully charging and then completely discharging the battery while running the system. The battery characteristics degrade significantly after several (~10) such procedures. Thus, a new battery needs to be used for subsequent battery lifetime measurements.
3 Statistical approximation and optimization methods, e.g., response surface models, can also be applied to reduce the number of real executions.

1.3 Our Contribution

In this paper, we present a practical approach to SSD power estimation which tries to keep the advantage of real measurement, i.e., accuracy, while overcoming its limitations, i.e., long execution time and lack of repeatability (and high cost). The power estimation method takes as input real power measurements and an SSD access trace. Then, it performs a trace-based simulation of the SSD operation, gathering the information on power consumption. Finally, it gives as output a power profile (power consumption over time). Since it is based on real measurements, it takes into account the power consumption of the SSD controller as well as that of the Flash memories. The presented method of SSD power estimation gives fast estimation, via trace-based simulation, and accuracy, based on real power measurements. We also present an application of our method to designing a DPM policy for SSD. The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces Flash memory and SSD operations. Section 4 explains the power estimation method. Section 5 reports experimental results including the application to DPM policy design. Section 6 concludes the paper.
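The core idea above — replay an access trace and charge measured per-state power to each interval — can be sketched as follows. The state names and data layout are assumptions for illustration; the actual estimation method is detailed in Section 4.

```python
def power_profile(trace, measured_power):
    """Trace-based power estimation sketch: trace is a list of
    (state, duration_in_seconds) intervals from SSD operation;
    measured_power maps each state to watts obtained from real measurements
    (so controller and Flash power are both included).
    Returns the per-interval profile and the total energy in joules."""
    profile, energy = [], 0.0
    for state, dur in trace:
        p = measured_power[state]
        profile.append((state, dur, p))  # power consumption over time
        energy += p * dur
    return profile, energy

# Hypothetical measured numbers: 1.2 W active, 0.1 W idle.
_, energy = power_profile([("active", 2.0), ("idle", 3.0)],
                          {"active": 1.2, "idle": 0.1})
print(round(energy, 3))  # 2.7
```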
2 Related Work

There have been several studies on power characterization and optimization of HDDs [4][5][6][7][8][9][10]. Hylick et al. explain that the mechanical parts incur large power overheads, especially when the HDD starts to run the spindle and head [4]. Zedlewski et al. present a performance and power model of HDDs based on a disk simulation model, DiskSim [5]. Their HDD power model considers the entire HDD as a single entity and has four active-mode power states (seeking, rotation, reading, and writing) and two idle-mode ones. IBM reports that fine-grained power states enable greater energy reduction than conventional coarse-grained ones, because short idle periods can be exploited to enter low power states more frequently and stay there longer [6]. Lu et al. compare several DPM policies for HDDs [7]. Douglis et al. offer a DPM policy where the time-out (if there is no new access during the time-out, a low power state is entered) is determined adaptively based on the accuracy of previous time-out predictions [8]. Helmbold et al. present a machine learning-based disk spin-down method [9]. Bisson et al. propose an algorithm that adaptively calculates the time-out for disk spin-down, utilizing multiple time-out parameters and considering the spin-up latency cost [10].

Regarding SSDs, a performance model was presented only recently in [11], and there is little work on power characterization and modeling for SSDs. In terms of power optimization in Flash-based storage, approaches fall into two categories depending on whether the optimization target is active-mode or idle-mode power consumption. Joo et al. present a low power coding scheme for MLC (multi-level cell) Flash memory, which has value-dependent power characteristics (e.g., in the case of 2-bit MLC, coding the two-bit data values 00 and 01 consumes different amounts of power) [12]. Recently, a low power
Power Modeling of SSD for Dynamic Power Management Policy Design
27
solution was presented for 3D die stacking of Flash memory, where multiple Flash dies share a common charge pump circuit [13]. Regarding idle-mode power reduction, commercial SSD products [14] utilize a fixed time-out and DIPM (device initiated power management): when the SSD detects an idle period, it asks the host to grant a transition to the low power state.
3 Preliminary: Flash Memory Operation and SSD Architecture

Typically, a Flash memory device contains, in a single package, multiple silicon Flash memory dies in a 3D die stack [13]. The Flash memory dies share the I/O signals of the device in a time-multiplexed way. We call the I/O signals of the package a channel and each memory die a way. A single Flash memory device can support a data throughput of up to data width * I/O frequency, e.g., 33 MBps at 33 MHz. In order to support higher bandwidth, we need to access Flash memory dies (ways) and devices (channels) in parallel. Fig. 1 shows an example SSD architecture consisting of two channels (i.e., two Flash devices) and four ways (i.e., four Flash memory dies on a single device) per channel.

The controller takes commands from the host (e.g., a smartphone or notebook CPU) and performs its Flash Translation Layer (FTL) algorithm to find the physical page address(es) to access. Then, it accesses the corresponding channels and ways, if needed, in parallel. In terms of available parallelism, each way can run in parallel when performing internal operations, e.g., internal read and program operations that transfer data between the internal memory cell array and its I/O buffer. However, the controller can access, at a time, only one way on each channel, i.e., the I/O signals of the package, in order to transfer data between the I/O buffer of the Flash memory die and the controller. Thus, the peak throughput is determined by the number of channels * I/O frequency.

One salient characteristic of Flash memory is that no update is allowed on already written data. When a memory cell needs an update, it first needs to be erased before new data is written to it. We call such a constraint "erase before write". In order to overcome the low performance due to "erase before write", log buffers (often called update blocks) are utilized, and new write data is written to the log buffers.
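The peak-throughput arithmetic above can be sketched in a few lines of Python. This is only an illustration: the 8-bit data width and 33 MHz figure follow the example in the text, and the function names are ours, not from the paper.

```python
def device_throughput_mbps(data_width_bits: int, io_freq_mhz: float) -> float:
    """Throughput of a single Flash device: data width * I/O frequency.
    With an 8-bit bus, 1 byte moves per I/O cycle, so MHz maps to MB/s."""
    return (data_width_bits / 8) * io_freq_mhz

def peak_ssd_throughput_mbps(num_channels: int,
                             data_width_bits: int,
                             io_freq_mhz: float) -> float:
    """Only one way per channel can drive the shared I/O signals at a time,
    so peak throughput scales with the number of channels, not ways."""
    return num_channels * device_throughput_mbps(data_width_bits, io_freq_mhz)

# Example: the two-channel architecture of Fig. 1 at 33 MHz, 8-bit width
print(peak_ssd_throughput_mbps(2, 8, 33.0))  # 66.0 MB/s
```

Note that adding more ways per channel improves the overlap of internal operations (program, erase) but does not raise this peak I/O figure, which matches the channel-level conflict model used later in Section 4.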
The Flash translation layer (FTL) on the SSD controller maintains the address mapping between logical data and physical data [15][16][17]. In reality, the controller, or more specifically the FTL, determines the performance and power consumption, especially of random reads/writes. Thus, the controller overhead needs to be taken into account in the power estimation of an SSD.
(We use the two term pairs, channel / I/O signals of the device and way / die, interchangeably throughout this paper.)
4 SSD Power Estimation

The power estimation requires as input:

- Performance and power measurement data: read/write latency for each of the read/write data sizes of 1/2/4/8/16/32/64/128/256 sectors, power consumption of sequential reads/writes, power consumption per power state (idle, partial, and slumber), and power state transition delay values;
- Information on the SSD architecture: # channels, ways/channel, and tR/tRC/tPROG/tERASE; and
- An SSD access trace obtained from a real execution on the host system.
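As a rough illustration, the inputs listed above could be grouped into a single configuration record along the following lines. This is a sketch with field names we invented; the paper does not prescribe such a structure.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class SSDPowerModelInputs:
    """Inputs to the trace-based power estimation (illustrative layout)."""
    # measured latency (seconds) per (operation, size-in-sectors), sizes 1..256
    latency: Dict[Tuple[str, int], float]
    # measured power (W) of sequential reads/writes
    p_sequential: Dict[str, float]            # {"read": ..., "write": ...}
    # measured power (W) per power state
    p_state: Dict[str, float]                 # {"idle": ..., "partial": ..., "slumber": ...}
    # power state transition (wakeup) delays in seconds
    state_transition_delay: Dict[str, float]
    # SSD architecture parameters
    num_channels: int = 8
    ways_per_channel: int = 8
    timing: Dict[str, float] = field(default_factory=dict)  # tR, tRC, tPROG, tERASE
```

The SSD access trace itself (a list of timestamped read/write commands) would be supplied separately as the workload to replay.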
We perform a trace-based simulation of the entire SSD (controller as well as Flash memories), considering critical architectural resource conflicts at the channel level. Then, we obtain a power profile over time as well as an output execution trace (e.g., the status of each channel/way and the latency of each SSD access). The trace-based simulation also allows for the simulation of a DPM policy, where power state transitions are performed based on the given DPM policy.

4.1 Performance and Power Modeling

Resource conflict modeling is critical in performance modeling. Given the performance data per SSD access size and the information on the SSD architecture, we model the critical resource conflict at the channel level. To do that, we decompose the measured latency into two parts in terms of resource parallelism: the latency of Flash I/O and that of the controller and Flash internal operation. We model the channel-level resource conflict with the Flash I/O latency, since only one Flash I/O operation can be active at a time on each channel. However, the controller and Flash internal operations (read from cell array to I/O buffer, program, and erase) can be performed in parallel. Fig. 2 (a) illustrates the decomposition of latency for a single one-sector SSD write operation (i.e., a SATA write command of one sector size) in the case of the SSD architecture in Fig. 1. At time t0, the controller transfers, via the corresponding channel, one sector of data to the I/O buffer of the target Flash die. After the I/O operation is finished at time t1, we model that the controller and Flash program operations take the remaining portion of the latency, from time t1 to t2.

For the power modeling of active-mode operation, we decompose the power consumption into two parts: a baseline part and an access-dependent part. The baseline part corresponds to the power consumption of the idle state, when there is no SSD access in progress while the SSD controller is turned on.
We measure the power consumption of the idle state and use it as the baseline part. The access-dependent part is obtained by subtracting the baseline from the measured power consumption of sequential reads/writes. The access-dependent part is further decomposed into the power consumption of per-way operations. Fig. 2 (b) illustrates the decomposition, showing the power profile for the case of Fig. 2 (a). The baseline, i.e., the power consumption of the idle state, is consumed over the entire period in the figure (to be specific, until a transition to a low power state is made). The access-dependent part for a single write operation (its derivation will be explained later in this section) is added to the total power between time
Fig. 2. Performance and power modeling ((b) power profile of a single write, with the measured level Psequential_write; (c) trace-based simulation of sequential writes; (d) power profile of sequential writes)
t0 and t2, during which the write operation, including Flash I/O, program, and controller execution, is executed. Due to limitations in measuring power consumption, we do not make a further decomposition to separately handle each of Flash I/O, read, program, erase, and controller execution. We expect that more detailed power measurement will enable such a fine-grained decomposition and thus more accurate power estimation, which is left for future work.

Figs. 2 (c) and (d) illustrate how we derive the power consumption of per-way operations from the access-dependent part of sequential reads/writes. Fig. 2 (c) shows the result of trace-based simulation for the case of 8-page sequential writes to the SSD of Fig. 1. At times t0, t1, t2, and t3, the controller starts to access one Flash die on each channel to transfer a page of data from the controller to the I/O buffer of the Flash die. Then, it initiates a program operation in the Flash die. After the latency of program and controller, the next Flash I/O operations start at times t5, t6, t7, and t8, respectively. Fig. 2 (d) shows the corresponding power profile. First, the baseline portion covers the entire period. Then, we add the contribution of each Flash die to the total power, as the figure shows. Thus, we see the peak plateau from t3 to t10, when all eight Flash dies and the controller are active. The measured power consumption of sequential writes corresponds to the power consumption at the plateau. Thus, we obtain the per-way power consumption of a write operation (Pper_way_write) as follows:

Pper_way_write = (Psequential_write - Pidle) / (# active Flash dies at the plateau)

The per-way power consumption of a read operation is calculated in the same way.

4.2 Trace-Based Simulation of Performance, Power and DPM Policy

The trace-based simulation covers the DPM policy as well as performance and power consumption. Fig. 3 (a) illustrates how a given DPM policy is simulated in the trace-based simulation.
In the figure, we assume that a time-out (TO)-based DPM policy is simulated. Thus, the SSD needs to enter a low power state after an idle period of TO
since the completion of the previous access. In the figure, at time t13, an idle period starts and a TO timer starts to count down. At the same time, the power consumption drops to the level of the idle-state power consumption. At t14, the TO timer expires and a low power state is entered. The figure shows that the total power consumption drops again, down to the level of the entered low power state. At t15, a read command for 8 pages arrives. However, since the SSD is in a low power state, it takes a wakeup time, Twakeup, to make a state transition to the active state. Thus, the read operations start at time t16, after the wakeup delay. Fig. 3 (b) shows the current profile obtained in the trace-based simulation.

Fig. 4 shows the pseudo code of the trace-based simulation. The simulation is event-driven and handles three types of events: host command arrival (e.g., a SATA read/write command), start/end of Flash operations (I/O operations, and read/program/erase operations), and power state transitions (e.g., when a time-out counter expires). The simulation advances the simulation time, Tnow, to the time point when the next event occurs (line 2 in the figure). At a time point where there is any event (line 3), we select the event in the order end → start → state transition, where 'end' and 'start' represent events for the end and start of a Flash operation, respectively (line 4). If the selected event is a host command, then we run the FTL algorithm to find the corresponding physical page addresses (PPAs) (line 6). Note that, if there is no available log buffer space, the FTL can invoke garbage collection, which schedules data move and erase operations on the corresponding channels and ways; the garbage collection method is specific to the FTL algorithm. Note also that the power consumption and runtime of the FTL algorithm are included in the access-dependent part of the per-way power consumption and the controller latency.
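The per-way power derivation from Section 4.1 amounts to a one-line computation. A sketch follows; the numeric values are illustrative placeholders, not measurements from the paper.

```python
def per_way_power(p_sequential: float, p_idle: float,
                  active_dies_at_plateau: int) -> float:
    """Per-way (per-die) power of a read or write operation, derived from the
    measured sequential-access power and the idle-state baseline:
    P_per_way = (P_sequential - P_idle) / (# active dies at the plateau)."""
    if active_dies_at_plateau <= 0:
        raise ValueError("at least one die must be active at the plateau")
    return (p_sequential - p_idle) / active_dies_at_plateau

# Illustrative numbers: 2.1 W during sequential writes, 0.5 W idle baseline,
# 8 dies active at the plateau (as in the 8-page example of Fig. 2 (c)/(d)).
print(per_way_power(2.1, 0.5, 8))  # 0.2 W per die
```

During simulation, this per-die increment is added to the total power when a way's operation starts and subtracted when it ends, on top of the always-present baseline.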
If the current state is a low power state (line 8), we need a state transition to the active state. Thus, during the wakeup period, we mark the power state as an intermediate transition state (PState = Transition2Active), set the total power to the idle-state power consumption (Ptotal = Pidle), and remove, if any, a future event for transition to a low power state (lines 8-12).
A host command can create 2 to 128 event pairs for the start/end of Flash operations. If it is a read or write command for a data size less than or equal to the page size, it creates two event pairs: one pair, <start, end>, for the controller and Flash internal read or write operation, and the other pair, <start, end>, for the Flash I/O operation. The maximum of 128 event pairs is created by a host command for a 64-page (256-sector) read or write. In the trace-based simulation, the created event pairs are scheduled by a function, Schedule_events_for_Flash_operations(PPAs, Tinit), where PPAs is a list of physical page addresses obtained from the FTL algorithm run (on line 6). The function performs ASAP (as soon as possible) scheduling of the event pairs: it schedules each event pair at the earliest time point when the corresponding Flash channel (for Flash I/O operations) or way (for Flash internal operations and controller operation) becomes available. If the current state is a low power state, the scheduled time of the first event pair is adjusted to account for the wakeup delay (lines 11 and 13).

If the new event (selected on line 4) is a start event and the power state is the intermediate transition state (line 16), then the power state is set to the active state (line 17). Then, the power consumption of the newly started operation is added to the total power consumption (line 18). If the new event is an end event (line 19), then the power consumption of the just-finished operation is subtracted from the total power consumption (line 20). If there is no future event for a Flash operation, it means that there is no active Flash channel or way and an idle period starts. Thus, we can insert at this point the function of the DPM policy under development. Since we assume a simple TO-based DPM policy in Fig. 4, we schedule a TO event at Tnow + TO (line 23).
1  while (Tnow < end of simulation) {
2    Advance time Tnow to the next event
3    while (any event at time Tnow) {
4      new_event = pop(event_list(Tnow))  // pop the end-of-Flash-operation events first
5      If (new_event == host command)
6        Run FTL to find the corresponding PPAs
7        Tinit = Tnow
8        If (current status == low power state)
9          PState = Transition2Active
10         Ptotal = Pidle
11         Tinit = Tnow + Twakeup
12         Clear future events for transition to low power state
13       Schedule_events_for_Flash_operations(PPAs, Tinit)
14     Else if (new_event == start or end of Flash operation)
15       If (new_event == start of Flash operation), then
16         If (PState == Transition2Active), then
17           PState = Active
18         Add the power consumption of the newly started operation to Ptotal
19       Else  // new_event == end of Flash operation
20         Subtract the power consumption of the just finished operation from Ptotal
21         If there is no more future event for Flash operation, then
22           // Insert DPM policy here. The following is a TO-based DPM policy example
23           Schedule a TO event at Tnow + TO
24     Else  // power state transition event
25       // TO event for a power state transition in the DPM policy example
26       PState = LowPowerState
27       Ptotal = Plow_power_state
28       // If there is any lower power state, then schedule a TO event here
29   }  // end of "any event at time Tnow"
30 }
Fig. 4. Pseudo code of trace-based simulation algorithm
If the new event (selected on line 4) is a power state transition event (a TO event in the DPM policy example), then the power state is set to the low power state (line 26), and the total power consumption is set to that of the low power state (line 27). If there is any lower power state, i.e., in the case of a TO-based DPM policy with more than one low power state, then we can schedule another TO event (line 28). The entire trace-based simulation continues until all the input SSD accesses have been simulated.
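As a companion to the pseudo code in Fig. 4, a much-simplified runnable sketch of TO-based DPM over a trace is given below. It replays one access at a time (no channel/way parallelism, no intermediate transition state), and the power and timing constants in the example call are invented, so it illustrates only the energy bookkeeping, not the full simulator.

```python
def simulate_to_policy(accesses, p_idle, p_low, p_active, timeout, t_wakeup):
    """Replay a trace of (arrival_time, duration) accesses, sorted by arrival,
    under a single time-out DPM policy.

    Energy accounting per idle gap: P_idle until the TO timer expires, then the
    low-power level; waking up costs t_wakeup at P_idle and delays the access.
    Returns (total_energy_J, total_wakeup_delay_s)."""
    energy = 0.0
    wakeup_penalty = 0.0
    t = 0.0          # completion time of the previous access
    low_power = False
    for arrival, duration in accesses:
        gap = max(0.0, arrival - t)
        if gap > timeout:
            # idle at P_idle until the timer expires, then the low power state
            energy += timeout * p_idle + (gap - timeout) * p_low
            low_power = True
        else:
            energy += gap * p_idle
        start = max(arrival, t)
        if low_power:
            # wake up before servicing the access
            energy += t_wakeup * p_idle
            wakeup_penalty += t_wakeup
            start += t_wakeup
            low_power = False
        energy += duration * p_active
        t = start + duration
    return energy, wakeup_penalty

# Two accesses 1 s apart; the 0.9 s gap exceeds TO=0.2 s, so one wakeup occurs.
print(simulate_to_policy([(0.0, 0.1), (1.0, 0.1)],
                         p_idle=0.5, p_low=0.05, p_active=1.0,
                         timeout=0.2, t_wakeup=0.01))
```

A short TO saves idle energy in long gaps but accumulates wakeup delay, which is exactly the trade-off swept in the experiments of Section 5.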
5 Experiments

We implemented the power estimator in Matlab. For the experiments, we used a Samsung SSD (2.5", 128 GB, SATA2) [18]. As the input performance and power consumption data, we used measurement data obtained from real usage of the SSD on a notebook PC running Windows Vista. For performance, we used the measured latency of read/write commands for data sizes of 1, 2, 4, 8, 16, 32, 64, 128, and 256 sectors, respectively. We also used the measured power consumption of sequential reads/writes and that of the low power states (power consumption was measured for the idle state and two low power states called partial and slumber). We collected the input traces of SSD accesses from the notebook PC by running three scenarios of MobileMark 2007: Reader, Productivity, and DVD [19]. We also used PCMark05 to collect an SSD access trace as a heavy SSD usage case [20].

The accuracy of performance/power estimation was evaluated by comparing the estimation results with the corresponding measurement data. The comparison showed that the trace-based simulation gives the same estimation results as the measurement data in both power consumption (of sequential reads/writes) and latency (of all the read/write data sizes).

We applied the trace-based simulation to a time-out (TO)-based DPM policy design for an SSD with two low power states (partial and slumber). TO-based DPM is used in HDD [6] and SSD [14] products. DPM in SSDs differs from that in HDDs, since DPM in SSDs can exploit short idle periods (much shorter than a second) which could not be utilized in HDDs due to the high wakeup delay of the mechanical parts (in seconds). Thus, DPM in SSDs can give a greater reduction in energy consumption by entering low power states more frequently. A TO-based DPM policy requires selecting a suitable TO which gives the minimum energy consumption over various scenarios. Fig. 5 shows performance estimation results obtained by sweeping the TO value for each of the three MobileMark scenarios.
In the TO sweep, for simplicity, we set the two TO parameters (one for the transition from the active state to the first low power state, partial, and the other for the transition from partial to the second low power state, slumber) to the same value. A single run of trace-based simulation takes 1~20 minutes, depending on the number of accesses and the TO values, which is 6~100+ times faster than real SSD runs. (We assumed MLC, 4 KB/page, 33 MHz I/O frequency, 8 channels, and 8 ways/channel based on the peak performance and capacity. We expect that faster trace-based simulation can be achieved by implementing the algorithm in C/C++ rather than Matlab.)
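The TO sweep can be mimicked with a toy model that charges the idle-state power until the timer expires, the low-power level afterwards, and a fixed wakeup cost per expired timer. All constants and gap lengths below are invented for illustration; the real sweep replays full MobileMark/PCMark traces.

```python
# Illustrative constants: idle power (W), low-power state power (W), wakeup delay (s)
P_IDLE, P_LOW, T_WAKEUP = 0.5, 0.05, 0.01

def idle_energy(gap, to):
    """Energy (J) and wakeup delay (s) for one idle gap under time-out `to`."""
    if gap <= to:
        return gap * P_IDLE, 0.0            # timer never expires: no wakeup
    low = gap - to
    return to * P_IDLE + low * P_LOW + T_WAKEUP * P_IDLE, T_WAKEUP

def sweep(gaps, candidates):
    """For each candidate TO, total idle energy and accumulated wakeup delay."""
    results = {}
    for to in candidates:
        e = d = 0.0
        for g in gaps:
            eg, dg = idle_energy(g, to)
            e += eg
            d += dg
        results[to] = (round(e, 6), round(d, 6))
    return results

gaps = [0.002, 0.5, 0.03, 2.0]              # a mix of short and long idle periods
print(sweep(gaps, [0.001, 0.01, 0.1, 1.0]))
```

Even this toy reproduces the qualitative trend of Fig. 5: small TO values minimize energy but maximize accumulated wakeup delay, and vice versa.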
Fig. 5. MobileMark 2007 (a)~(c) and PCMark05 results (d)
Fig. 5 shows that there can be a trade-off between energy reduction and performance drop. The general trend is that as TO increases, energy consumption increases, since fewer idle periods are utilized for energy reduction, while the performance penalty (due to accumulated wakeup delay) decreases, since there are fewer wakeups. As shown in the figure, among the MobileMark scenarios, the sensitivity of the performance drop to TO is most significant in the Productivity scenario. This is because the Productivity scenario has 5.34 and 6.26 times the SSD accesses of the DVD and Reader scenarios, respectively. However, the absolute level of performance drop is not significant in the Reader and DVD scenarios and is moderate in the Productivity scenario. This is because the MobileMark runtime is dominated by idle periods, which occupy about 95% of the total runtime. Thus, the performance impact of the DPM policy is not easily visible with the MobileMark traces.

However, users may experience performance loss due to aggressive DPM policies (e.g., a short TO such as 1 ms) when the SSD is heavily accessed. PCMark05 represents such a scenario, where the host runs continuously, accessing the SSD more frequently than MobileMark. Fig. 5 (d) shows the result for PCMark05, which incurs up to a 16.9% performance drop in the case of an aggressive DPM policy (TO = 1 ms). This is mainly because PCMark05 has 9.7 times more SSD accesses (per minute) than the Productivity scenario. Considering the results in Fig. 5, there is a trade-off between energy reduction and performance drop which designers need to investigate in low power SSD design. As a final remark, Fig. 5 shows that there is still a large gap between the result of the Oracle DPM (where we obtain maximum energy reduction without performance drop) and that of the optimal single-TO case. Our power estimation method will contribute to the design of sophisticated DPM policies that exploit this trade-off while closing the gap.
6 Conclusion

In this paper, we presented a power estimation method for SSDs in embedded systems. It takes as input an SSD access trace and measured performance and power consumption data of SSD accesses, and gives as output the power profile, considering the DPM policy under development. The presented method achieves accuracy in power consumption, being based on real measurement data, and fast estimation, by applying a trace-based simulation approach. We also presented a case study of applying the method to designing a DPM policy for an SSD. As future work, we will perform an extensive analysis of the accuracy of the power estimation, with comparisons against real measurements in various scenarios.
Acknowledgement This work was supported in part by Samsung Electronics.
References
1. Kim, B.: Design Space Surrounding Flash Memory. In: International Workshop on Software Support for Portable Storage (IWSSPS) (2008)
2. Creasey, J.: Hybrid Hard Drives with Non-Volatile Flash and Longhorn. In: Windows Hardware Engineering Conference (WinHEC), Microsoft (2005)
3. Communications with Samsung engineers
4. Hylick, A., Rice, A., Jones, B., Sohan, R.: Hard Drive Power Consumption Uncovered. ACM SIGMETRICS Performance Evaluation Review 35(3), 54–55 (2007)
5. Zedlewski, J., Sobti, S., Garg, N., Zheng, F., Krishnamurthy, A., Wang, R.: Modeling Hard-Disk Power Consumption. In: The USENIX Conference on File and Storage Technologies (FAST), pp. 217–230. USENIX Association (2003)
6. IBM: Adaptive Power Management for Mobile Hard Drives. IBM (1999), http://www.almaden.ibm.com/almaden/mobile_hard_drives.html
7. Lu, Y., De Micheli, G.: Comparing System-Level Power Management Policies. IEEE Design & Test of Computers 18(2), 10–19 (2001)
8. Douglis, F., Krishnam, P., Bershad, B.: Adaptive Disk Spin-down Policies for Mobile Computers. In: 2nd Symposium on Mobile and Location-Independent Computing, pp. 121–137. USENIX Association (1995)
9. Helmbold, D., Long, D., Sconyers, T., Sherrod, B.: Adaptive Disk Spin-Down for Mobile Computers. Mobile Networks and Applications 5(4), 285–297 (2000)
10. Bisson, T., Brandt, S.: Adaptive Disk Spin-Down Algorithms in Practice. In: The USENIX Conference on File and Storage Technologies (FAST). USENIX Association (2004)
11. Dirik, C., Jacob, B.: The Performance of PC Solid-State Disks (SSDs) as a Function of Bandwidth, Concurrency, Device Architecture, and System Organization. In: International Symposium on Computer Architecture, pp. 279–289. ACM, New York (2009)
12. Joo, Y., Cho, Y., Shin, D., Chang, N.: Energy-Aware Data Compression for Multi-Level Cell (MLC) Flash Memory. In: Design Automation Conference, pp. 716–719. ACM, New York (2007)
13. Ishida, K., Yasufuku, T., Miyamoto, S., Nakai, H., Takamiya, M., Sakurai, T., Takeuchi, K.: A 1.8V 30nJ Adaptive Program-Voltage (20V) Generator for 3D-Integrated NAND Flash SSD. In: International Solid-State Circuits Conference, pp. 238–239. IEEE, Los Alamitos (2009)
14. Intel: X25-M and X18-M Mainstream SATA Solid-State Drives. Intel (2009), http://www.intel.com/design/flash/nand/mainstream/index.htm
15. Lee, S., Park, D., Jung, T., Lee, D., Park, S., Song, H.: A Log Buffer-Based Flash Translation Layer Using Fully-Associative Sector Translation. ACM Transactions on Embedded Computing Systems (TECS) 6(3) (2007)
16. Kang, J., Cho, H., Kim, J., Lee, J.: A Superblock-Based Flash Translation Layer for NAND Flash Memory. In: The 6th ACM & IEEE International Conference on Embedded Software (EMSOFT), pp. 161–170. ACM, New York (2006)
17. Lee, S., Shin, D., Kim, Y., Kim, J.: LAST: Locality-Aware Sector Translation for NAND Flash Memory-Based Storage Systems. ACM SIGOPS Operating Systems Review 42(6) (2008)
18. Samsung: Samsung SSD. Samsung (2009), http://www.samsung.com/global/business/semiconductor/products/flash/ssd/2008/product/pc.html
19. Business Applications Performance Corporation: MobileMark 2007. BAPCo (2007), http://www.bapco.com/products/mobilemark2007/
20. Futuremark: PCMark05. Futuremark (2009), http://www.futuremark.com/products/pcmark05/
Optimizing Mobile Application Performance with Model–Driven Engineering Chris Thompson, Jules White, Brian Dougherty, and Douglas C. Schmidt Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN USA {jules,briand,schmidt}@dre.vanderbilt.edu, [email protected]
Abstract. Future embedded and ubiquitous computing systems will operate continuously on mobile devices, such as smartphones, with limited processing capabilities, memory, and power. A critical aspect of developing future applications for mobile devices will be ensuring that the application provides sufficient performance while maximizing battery life. Determining how a software architecture will affect power consumption is hard because the impact of software design on power consumption is not well understood. Typically, the power consumption of a mobile software architecture can only be determined after the architecture is implemented, which is late in the development cycle when design changes are costly. Model-driven Engineering (MDE) is a promising solution to this problem. In an MDE process, a model of the software architecture can be built and analyzed early in the design cycle to identify key characteristics, such as power consumption. This paper describes current research in developing an MDE tool for modeling mobile software architectures and using them to generate synthetic emulation code to estimate power consumption properties. The paper provides the following contributions to the study of mobile software development: (1) it shows how models of a mobile software architecture can be built, (2) it describes how instrumented emulation code can be generated to run on the target mobile device, and (3) it discusses how this emulation code can be used to glean important estimates of software power consumption and performance.
those made during earlier software lifecycle phases [12], e.g., during architectural design and analysis.

Conventional techniques for developing mobile device software are not well suited to identifying performance and power consumption trade-offs during earlier phases of the software lifecycle. These limitations stem largely from the difficulty of comparing the power consumption of one architectural design against another without implementing and testing each on the target device. Moreover, for each function an application performs, there are often multiple possible designs for accomplishing the same task, each differing in terms of operational speed, battery consumption, and accuracy. Even though these design variations can significantly impact device performance, there are too many permutations to implement and test each.

For example, if a mobile application communicates with a server, it can do so via several protocols, such as HTTP, HTTPS, or other socket connections. Developers can also elect to have the application and/or mobile device infrastructure submit data immediately or in a batch at periodic intervals. Each design option can result in a different power consumption profile [13]. If the developer elects to use HTTPS over HTTP, the developer gains additional security; the overhead associated with key exchange and the encryption/decryption process, however, incurs additional processing time and increases the amount of information that must be transmitted over the network. Both of these require more power and time than standard HTTP would. The combination of these architectural options results in too many possible variations to implement and test each one within a reasonable budget and production cycle: a given application could have hundreds or thousands of viable configurations that satisfy the stated requirements.

Solution approach → Emulation of application behavior through model-driven testing and auto-generated code.
Model-driven engineering (MDE) [15] provides a promising solution to the challenges described above. MDE relies on modeling languages, such as domain-specific modeling languages (DSMLs) [16], to visually represent various aspects of application and system design. These models can then be utilized for code generation and performance analysis. By creating a model of candidate solution architectures early in the design phase, instrumented architectural emulation code can be generated and then run on actual mobile devices. This MDE-based approach allows developers to quickly emulate a multitude of possible configurations and provides them with actual device performance data without investing the time and effort of manually writing application code. The generated code emulates the modeled architecture by consuming sensor data, computational cycles, and memory as specified in the model, as well as transmitting/receiving faux data over the network. Since wireless transmissions consume most of the power on mobile devices [3] and network interaction is a key performance bottleneck, large-scale power consumption and performance trends can be gleaned by executing the emulation code. Moreover, as the real implementation is built, the actual application logic can be used to replace the faux resource-consuming code blocks to refine the accuracy of the model. This MDE-based solution has been utilized previously to eliminate some inherent flaws of serialized phasing in layered systems, specifically as they apply to system QoS, and to identify design flaws early in the
software production life-cycle [9]. Some prior work [8] also employs model-driven analysis to conduct what-if analysis on potential application architectures.

By utilizing MDE-based analysis, mobile software developers can quantitatively evaluate key performance and power consumption characteristics earlier in the software lifecycle (e.g., at design time) rather than later (e.g., during and after implementation), thereby significantly reducing the software refactoring costs caused by design flaws. MDE provides this by not only allowing developers to generate emulation code, but also by giving them a high-level understanding of their application that is easy to modify on the fly. Changes can be made at design time by simply moving model elements around rather than rewriting code. Moreover, since emulation code is automatically generated from the model, developers can quickly understand key performance and power consumption characteristics of potential solution architectures without investing the time and effort to implement them.

This paper describes emerging R&D efforts that seek to provide developers of mobile applications with an MDE-based approach to optimizing application resource consumption across a multitude of platforms at design time. This paper also describes a methodology for increasing battery longevity in mobile devices through application-layer modifications. By focusing on the application layer, developers can still reap the benefits of advanced SDKs and compilers that shield the developer from hardware-centric decisions.

Paper organization.
The remainder of this paper is organized as follows: Section 2 presents a sample mobile application running on Google’s Android platform and introduces several challenges associated with resource consumption optimization and mobile application development; Section 3 discusses our current research work on developing an MDE tool that allows developers to predict software architecture performance and power consumption properties earlier in the development process; finally, Section 4 presents concluding remarks and lessons learned.
2 Motivating Example

This section presents a motivating mobile application running on Google’s Android platform and describes several challenges associated with resource consumption optimization and mobile application development.

2.1 Overview of Wreck Watch

Managing system resources properly can significantly affect device performance and battery life. For instance, reducing CPU instructions not only speeds performance but also reduces the time the process is in a non-idle state, thereby reducing power consumption; reducing network traffic also speeds performance and reduces the power supplied to the radio. To demonstrate the importance of proper resource management and the value of model-based resource analysis, we present the following example mobile application, called Wreck Watch, shown in Figure 1.
Optimizing Mobile Application Performance with Model–Driven Engineering
Fig. 1. Wreck Watch Behavior
Wreck Watch runs on Google Android smartphones to detect car accidents (1) by analyzing data from the device’s GPS receiver and accelerometer and looking for sudden acceleration events from a high velocity that are indicative of a collision. Car accident data is then posted to an HTTP server where (2) it can be retrieved by other devices in the area to help alleviate traffic congestion, notify first responders, and (3) provide accident photos to an emergency response center. Users of Wreck Watch can also elect to have certain people contacted in the event of an accident via SMS message or a digital PBX. Figure 1 shows this behavior of Wreck Watch. Since the Wreck Watch application runs continuously in the background, it must carefully manage its power consumption. The application needs to run at all times and consume a great deal of sensor information to accurately detect wrecks. If not designed properly, therefore, these characteristics could result in a substantial decrease in battery life. In our testing, for example, the Wreck Watch application was able to completely drain the device battery in less than an hour simply through its use of sensors and network connectivity. In the case of Wi-Fi, the radio represents nearly 70% of device power consumption [2] and in extreme cases can consume 100 times the power of one CPU instruction to transmit one byte of data [3]. The amount of power consumed by the network adapter is generally proportional to the amount of information transmitted [1]. The framing and overhead associated with each protocol can therefore significantly affect the power consumption of the network adapter. Prior work [5] demonstrated that significant power savings could be achieved by modifying the MAC layer to minimize collisions and maximize time spent in the idle state.
This work also recognized that network operations generally involve only the CPU and transceiver, and that by reducing client-side processing, the power consumed by network transactions could be substantially reduced. Similarly, other work [7] demonstrated that such power savings could also be achieved through transport-layer modifications.
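Returning to the detection step itself: the heuristic described earlier (a large acceleration spike while travelling at high speed) might be sketched as follows. The class name, thresholds, and method signature here are illustrative assumptions, not Wreck Watch's actual code:

```java
// Hypothetical sketch of a Wreck Watch-style detection rule: a collision is
// suspected when the device was recently travelling fast and then experiences
// a sudden, large acceleration spike. Thresholds are illustrative only.
public class WreckDetector {
    // Assumed thresholds (not from the paper): 4 g spike, 40 km/h prior speed.
    private static final double ACCEL_THRESHOLD_G = 4.0;
    private static final double SPEED_THRESHOLD_KMH = 40.0;

    /** Returns true if the readings are consistent with a collision. */
    public static boolean isWreck(double lastSpeedKmh,
                                  double ax, double ay, double az) {
        // Magnitude of the acceleration vector, in g.
        double magnitude = Math.sqrt(ax * ax + ay * ay + az * az);
        return lastSpeedKmh > SPEED_THRESHOLD_KMH
                && magnitude > ACCEL_THRESHOLD_G;
    }

    public static void main(String[] args) {
        System.out.println(isWreck(65.0, 5.2, 1.1, 0.3)); // spike at speed: true
        System.out.println(isWreck(5.0, 5.2, 1.1, 0.3));  // parked bump: false
    }
}
```

On Android, the acceleration vector would come from a `SensorEventListener` and the prior speed from a GPS `LocationListener`; real thresholds would need tuning against crash data.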
Although MAC and transport layer modifications are typically beyond the scope of most software projects, especially mobile application development, the data transmitted on the network can be optimized so it is as lightweight as possible, thereby accomplishing, on a much smaller scale, some of the same effects. The remainder of this paper uses the Wreck Watch application to showcase key design challenges that developers face when building power-aware applications for mobile devices.

2.2 Design and Behavioral Challenges of Mobile Application Development

Despite the ease with which mobile applications can be developed via advanced SDKs (such as Google Android and Apple iPhone), developers still face many challenges related to power consumption. If developers do not fully understand the implications of their designs, they can substantially reduce device performance. Battery life represents a major metric used to compare devices and can be influenced significantly by design decisions. Designing mobile applications while remaining cognizant of battery performance presents the following challenges to developers:

Challenge 1: Accurately predicting battery consumption of arbitrary architectural decisions is hard. Each instruction executed can result in the consumption of an unknown amount of battery power. Accurately predicting the power consumed for each line of code is hard given the level of abstraction present in modern SDKs, as well as the complexity of, and numerous variations between, physical devices. Moreover, disregarding language commonalities between completely unrelated devices, mobile platforms, such as Android, are designed to operate on a plethora of hardware configurations, each of which may exhibit different power consumption characteristics.

Challenge 2: Trade-offs between performance and battery life are not readily apparent.
Although performance and power consumption are generally design trade-offs, the actual relationship between the two metrics is not readily apparent. For example, when comparing two networking protocols, plain HTTP might operate much faster, requiring only 10 ms to transmit data that SOAP requires 50 ms to transmit. At the same time, HTTP might consume 0.5 mW, while SOAP consumes 1.5 mW. Without the context of real-world performance on a physical device, it would be difficult to predict the overhead associated with SOAP. Moreover, this data may vary from one device to the next.

Challenge 3: Effects of transmission medium on power consumed are largely device, application, and environment specific. Wireless radios consume a substantial amount of device power relative to other mobile-device components [6], where the power consumed is directly proportional to the amount of information transmitted [1]. Each radio also provides differing data rates, as well as power consumption characteristics. Depending on the application, developers must choose the connection medium best suited to application requirements, such as medium availability and transmission rate. The differences between transmission media are generally subtle and may even depend on environmental factors [10], such as network congestion, that are impossible to accurately predict. To deterministically and accurately quantify performance, therefore, testing must be performed in environmentally accurate situations.
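The hypothetical HTTP/SOAP figures quoted in Challenge 2 can be turned into a concrete energy comparison, since energy per transmission is simply power draw multiplied by transmission time. The numbers below are the illustrative ones from the text, not measurements:

```java
// Worked example of the Challenge 2 trade-off, using the hypothetical
// figures from the text: energy per transmission = power draw x duration.
public class ProtocolEnergy {

    /** Energy in microjoules: milliwatts x milliseconds = microjoules. */
    static double energyMicroJoules(double powerMilliWatts, double durationMillis) {
        return powerMilliWatts * durationMillis;
    }

    public static void main(String[] args) {
        double http = energyMicroJoules(0.5, 10);   // 5.0 uJ per transmission
        double soap = energyMicroJoules(1.5, 50);   // 75.0 uJ per transmission
        // SOAP costs 15x the energy of plain HTTP for the same payload,
        // even though its power draw is only 3x higher.
        System.out.println("HTTP: " + http + " uJ, SOAP: " + soap
                + " uJ, ratio: " + (soap / http));
    }
}
```

The point of the arithmetic is that neither the duration nor the power figure alone reveals the trade-off; only their product, measured on a real device, does.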
Challenge 4: It is hard to accurately predict the effects of reducing sensor data consumption rates on power utilization. To provide the most accurate readings and results, device sensors would be polled as frequently as they sample data. This method consumes the most power, however, by not only requiring that the sensor be enabled constantly, but also by increasing the amount of data the device must process. In turn, reducing the time that the sensor is active significantly reduces the effectiveness and accuracy of the readings. Determining the exact amount of power saved by a reduction in polling rate or other sensor accuracy change is difficult without profiling such a change on a device.

Challenge 5: Accurately assessing effects of different communication protocols on performance is hard without real-world analysis. Each communication protocol has a specific overhead associated with it that directly affects its overall throughput. The natural choice would be to select the protocol with the lowest overhead. While this decision yields the highest performance, it also results in a tightly coupled architecture [11] and substantially increases production time. Such a protocol would only be useful for the specific data set for which it was designed, in contrast to a standardized protocol, such as HTTP. Standardized protocols often support features that are unnecessary for many mobile applications, however, making much of the additional data required for HTTP transactions pure overhead. It is challenging to predict how much of a trade-off in performance is required to select the more extensible protocol because the power cost of such protocols cannot be known without profiling them in a real-world usage scenario.

Discussions on performance optimization have often focused on hardware- or firmware-level changes and ignored potential application-layer enhancements [3,5,6].
Interestingly, this corresponds to the level of abstraction present in each layer: device drivers and hardware have little or no abstraction, while software applications are often more thoroughly abstracted. It is this level of abstraction, however, that makes such optimizations challenging, because the developer often has little or no control over the final machine code. Application code thus cannot be benchmarked until it has been fully developed and compiled. Moreover, problems identified after the code is developed are substantially more costly to correct than those that can be identified at design time. Optimizing the performance of an application before any code is written is therefore of great value. Moreover, because power consumption is generally hardware-specific [1], such optimizations result in a tightly coupled architecture that requires the developer to rewrite code to benchmark other configurations.
3 Model-Based Testing and Performance Analysis

This section describes our current work in developing a modeling language extension to the Generic Eclipse Modeling System (GEMS) (www.eclipse.org/gmt/gems) [17], called the System Power Optimization Tool (SPOT), for optimizing performance and power consumption of mobile applications at design time. GEMS is an MDE tool for building Domain Specific Modeling Languages (DSMLs) for the Eclipse platform. The goal of SPOT is to allow developers to rapidly model potential application architectures and
obtain feedback on the performance and power consumption of the architecture without manual implementation. The performance data is produced by generating instrumented architectural emulation code from the architectural model that is then run on the target hardware. After execution, cumulative results can be downloaded from the target device for analysis. This section describes the modeling language, emulation code generation, and performance measurement infrastructure that we are developing to address the five challenges described in Section 2.2.

3.1 Mobile Application Architecture Modeling and Power Consumption Estimation with SPOT

To accurately model mobile device applications, SPOT provides a domain-specific modeling language (DSML) with components that (1) represent key, resource-consuming aspects of a mobile application’s architecture and (2) allow developers to specify visual diagrams of a mobile application architecture, as shown in the workflow diagram in Figure 2. SPOT’s DSML, called the System Power Optimization Modeling Language (SPOML), allows developers to build architectural specifications from the following types of model elements:

• CPU consumers, which represent computationally intense code segments, such as location-based notifications that require distance calculations on hundreds of points.
• Memory consumers, which represent sections of application code that will incur heavy memory operations, reducing performance and increasing power consumption, e.g., displaying an image, stored on disk, on the screen.
• Sensor data consumers, which poll device sensors at user-defined intervals.
• Network consumers, which periodically utilize network resources, emulating actual application traffic.
• Screen drawing agents, which interact with device graphics libraries, such as OpenGL, to consume power by rendering images to the display.
The sensor and network data consumers operate independently of application logic and simply present an interface through which their data can be accessed. The CPU consumer, however, needs to incorporate application-specific logic, as well as logic from other aspects of the application. The CPU consumer module also allows developers to integrate actual application logic as it becomes available, to replace emulation code that is generated by SPOML. To provide the software developer with the most flexibility and extensibility possible, SPOML provides them with many key power-consumptive architectural options that would be present if they were actually writing device code. For example, if the device presents 10 possible options for granularity of GPS readings, SPOML provides all 10 possibilities via visual elements, such as drop-down menus and check boxes. SPOML also provides constraint checking that warns developers at design time if certain configuration options are unlikely to work together. Ultimately, SPOT provides developers with the ability to modify design characteristics rapidly and model their system without any application-specific logic, as well as provides them with a means to incorporate actual application code.
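As one example of what a generated emulation building block might look like, a sensor data consumer could be sketched as below. This is a hand-written illustration under the assumption that SPOT emits something functionally similar; the class and method names are not SPOT's actual generated code:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a generated sensor data consumer: it polls a faux
// sensor at the interval specified in the model and exposes the last reading
// through an accessor, mirroring the "interface through which their data can
// be accessed" described in Section 3.1. Names are illustrative assumptions.
public class SensorDataConsumer implements Runnable {
    private final long pollIntervalMillis;          // taken from the model
    private final AtomicLong lastReading = new AtomicLong();
    private volatile boolean running = true;

    public SensorDataConsumer(long pollIntervalMillis) {
        this.pollIntervalMillis = pollIntervalMillis;
    }

    /** Interface through which other emulation components read sensor data. */
    public long getLastReading() { return lastReading.get(); }

    public void stop() { running = false; }

    @Override
    public void run() {
        while (running) {
            // Faux sensor read: consumes a poll cycle without real hardware.
            lastReading.set(System.nanoTime());
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

Varying `pollIntervalMillis` in the model and re-running the emulation is exactly the kind of experiment Challenge 4 calls for.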
Fig. 2. SPOT Analysis Cycle
3.2 Architectural Emulation Code Generation

Due to the difficulty of estimating power consumption for an arbitrary device and software architecture, it is essential to evaluate application performance on the actual physical hardware in production conditions. To accomplish this task, SPOT can automatically generate instrumented code to perform the functions outlined by the architecture modeled in SPOML. This code generation is done by traversing the in-memory object graph of the model and outputting optimized code to perform the resource-intensive operations specified in the model. The architectural emulation code is constructed from several basic building blocks, as described above. The sensor consumers remain largely the same between applications and require little input from the user developing the model. The only variable in their construction is the rate at which they poll the sensor. They present an interface through which their data can be accessed. The network consumer itself consists of several modules: a protocol, a transmission scheme, and a payload interface. The payload interface defines methods that allow other components of the application to utilize the network connection and, for the purposes of emulation and analysis, this interface also helps define the structure of the data to transmit. The protocol module allows the developer to select from a set of predefined protocols (e.g., HTTP or SOAP) or create a custom protocol with a small amount of code. The transmission scheme defines a set of behaviors for how to
transmit data back to the server, which allows developers to specify whether the application should transmit as soon as data is available, wait until a certain amount of data is available, or even wait until a certain connection medium is available (such as Wi-Fi or EDGE). Finally, the screen rendering agent allows users to specify the interval at which the screen is refreshed or invalidated for a given view. Each module described above relies almost entirely on prewritten and optimized code. Of greater complexity for users are the CPU and memory consumers. Users may elect to utilize prewritten code that closely resembles the functionality they wish to provide. Alternatively, they can write their own code to use in these modules to profile their architecture more accurately. This iterative approach allows developers to quickly model their architecture without writing detailed application logic and then, as this code becomes available, refine their analysis to better represent the performance and behavior of the ultimate system.

3.3 Performance and Resource Consumption Management

When generating emulation code, SPOT also generates instrumentation code to record device performance and track power consumption. This code writes these metrics to a file on the device that can later be downloaded to a host machine for analysis after testing. This approach allows developers to quantitatively compare metrics such as application responsiveness (by way of processor idle time, etc.), network utilization and throughput, and battery longevity. These comparisons provide the developer with a means to quickly and accurately design a system that minimizes power consumption without sacrificing performance. In some instances, this analysis could even highlight simple changes, such as reducing the size of XML tags to reduce the overhead associated with downloading information from a server.
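The instrumentation side could be as simple as the sketch below: a recorder that appends timestamped metric samples to a device-local file for later download and analysis. The CSV layout and metric names are assumptions for illustration, not SPOT's real on-device format:

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Locale;

// Hypothetical sketch of the instrumentation code generated alongside the
// emulation code: it appends timestamped samples to a file on the device
// that is later pulled to a host machine for analysis, as Section 3.3
// describes. Format and names are illustrative assumptions.
public class MetricsRecorder implements AutoCloseable {
    private final PrintWriter out;

    public MetricsRecorder(String path) throws IOException {
        out = new PrintWriter(new FileWriter(path, /* append = */ true));
        out.println("timestamp_ms,metric,value");   // simple CSV header
    }

    /** Record one sample, e.g. record("battery_pct", 87.5). */
    public void record(String metric, double value) {
        out.printf(Locale.ROOT, "%d,%s,%.3f%n",
                System.currentTimeMillis(), metric, value);
    }

    @Override
    public void close() {
        out.close();
    }
}
```

A CSV-per-run layout keeps post-test analysis on the host trivial (any spreadsheet or script can diff two candidate architectures' traces).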
For each challenge presented in Section 2.2, we establish that with current methods certain characteristics of a design can only be fully understood post-implementation. Additionally, with newer platforms such as Google’s Android, the mobile device has become an embedded multi-application system. Since each device has significantly fewer resources than its tethered brethren, however, individual applications must be cognizant of their resource consumption. The value of understanding a given application’s power consumption profile is thus greatly increased. The solutions to each of these challenges lie within the same space: utilization of a model that can be used to accurately assess battery life. SPOT addresses mobile application performance analysis through the use of auto-generated code specified by a DSML, which allows users to estimate performance and power consumption early in the development process. Moreover, developers can perform continuous integration testing by replacing faux code with application logic as it is developed.
4 Concluding Remarks

The capabilities of mobile devices have increased substantially over the last several years and, with platforms such as Apple’s iPhone and Google’s Android, will no doubt continue to expand. These platforms have ushered in a new era of applications and have presented developers with a wealth of new opportunities. Unfortunately,
with these new opportunities have come new challenges that developers must overcome to make the most of these cutting-edge platforms. In particular, predicting performance characteristics of a given design is hard, especially those characteristics associated with power consumption. A promising approach to address these challenges is to enhance model-driven engineering (MDE) tools to enable developers to quickly understand the consequences of architectural decisions. These conclusions can be drawn long before implementation, significantly reducing production costs and time while substantially increasing battery longevity and overall system performance. From our experience developing SPOT, we have learned the following lessons:

• By utilizing MDE it becomes possible to quantitatively compare design decisions and deliver some level of optimization with regard to power consumption.
• Developing applications for platforms such as Android requires extensive testing, as hardware configurations can greatly influence performance.
• It is impossible to completely profile a system configuration, because ultimate device performance and power consumption depend on user interaction, network traffic, and other applications on the device.
The WreckWatch application is available under the Apache open-source license and can be downloaded at http://vuphone.googlecode.com.
References

1. Feeney, L., Nilsson, M.: Investigating the energy consumption of a wireless network interface in an ad hoc networking environment. In: IEEE INFOCOM, vol. 3, pp. 1548–1557 (2001)
2. Liu, T., Sadler, C., Zhang, P., Martonosi, M.: Implementing software on resource-constrained mobile sensors: experiences with Impala and ZebraNet. In: Proceedings of the 2nd International Conference on Mobile Systems, Applications, and Services, pp. 256–269 (2004)
3. Pering, T., Agarwal, Y., Gupta, R., Want, R.: CoolSpots: Reducing the power consumption of wireless mobile devices with multiple radio interfaces. In: Proceedings of the Annual ACM/USENIX International Conference on Mobile Systems, Applications and Services, MobiSys (2006)
4. Poole, J.: Model-driven architecture: Vision, standards and emerging technologies. In: Workshop on Metamodeling and Adaptive Object Models, ECOOP (2001)
5. Chen, J., Sivalingam, K., Agrawal, P., Kishore, S.: A comparison of MAC protocols for wireless local networks based on battery power consumption. In: IEEE INFOCOM 1998, Seventeenth Annual Joint Conference of the IEEE Computer and Communications Societies (1998)
6. Krashinsky, R., Balakrishnan, H.: Minimizing energy for wireless web access with bounded slowdown. Wireless Networks 11, 135–148 (2005)
7. Kravets, R., Krishnan, P.: Application-driven power management for mobile communication. Wireless Networks 6, 263–277 (2000)
8. Paunov, S., Hill, J., Schmidt, D., Baker, S., Slaby, J.: Domain-specific modeling languages for configuring and evaluating enterprise DRE system quality of service. In: 13th Annual IEEE International Symposium and Workshop on Engineering of Computer Based Systems, ECBS 2006 (2006)
9. Hill, J., Tambe, S., Gokhale, A.: Model-driven engineering for development-time QoS validation of component-based software systems. In: Proceedings of the International Conference on Engineering of Component Based Systems (2007)
10. Carvalho, M., Margi, C., Obraczka, K., Garcia-Luna-Aceves, J.J.: Modeling energy consumption in single-hop IEEE 802.11 ad hoc networks. In: Thirteenth International Conference on Computer Communications and Networks (ICCCN 2004), pp. 367–377 (2004)
11. Gay, D., Levis, P., Culler, D.: Software design patterns for TinyOS. ACM, New York (2007)
12. Boehm, B.: A spiral model of software development and enhancement. In: Software Engineering: Barry W. Boehm’s Lifetime Contributions to Software Development, Management, and Research, vol. 21, p. 345. Wiley-IEEE Computer Society Press (2007)
13. Tan, E., Guo, L., Chen, S., Zhang, X.: PSM-throttling: Minimizing energy consumption for bulk data communications in WLANs. In: IEEE International Conference on Network Protocols, ICNP 2007, pp. 123–132 (2007)
14. Kang, J., Park, C., Seo, S., Choi, M., Hong, J.: User-centric prediction for battery lifetime of mobile devices. In: Proceedings of the 11th Asia-Pacific Symposium on Network Operations and Management: Challenges for Next Generation Network Operations and Service Management, pp. 531–534 (2008)
15. Kent, S.: Model Driven Engineering. In: Butler, M., Petre, L., Sere, K. (eds.) IFM 2002. LNCS, vol. 2335, pp. 286–298. Springer, Heidelberg (2002)
16. Lédeczi, A., Bakay, A., Maroti, M., Völgyesi, P., Nordstrom, G., Sprinkle, J., Karsai, G.: Composing domain-specific design environments. Computer, 44–51 (2001)
17. White, J., Schmidt, D.C., Mulligan, S.: The Generic Eclipse Modeling System. In: Model-Driven Development Tool Implementer’s Forum at the 45th International Conference on Objects, Models, Components and Patterns, Zurich, Switzerland (June 2007)
A Single-Path Chip-Multiprocessor System

Martin Schoeberl, Peter Puschner, and Raimund Kirner
Institute of Computer Engineering, Vienna University of Technology, Austria
[email protected], {peter,raimund}@vmars.tuwien.ac.at
Abstract. In this paper we explore the combination of a time-predictable chip-multiprocessor system with the single-path programming paradigm. Time-sliced arbitration of the main memory access provides time-predictable memory load and store instructions. Single-path programming avoids control-flow-dependent timing variations. To keep the execution time of tasks constant, even in the case of shared memory access of several processor cores, the tasks on the cores are synchronized with the time-sliced memory arbitration unit.
1 Introduction

As more and more speedup features are added to modern processors and we are moving from single-core to multi-core processor systems, the analysis of the timing of the applications running on these systems is getting increasingly complex. The timing of single tasks per se is difficult to understand and to analyze. Besides that, task timing can no longer be considered as an isolated issue in such systems, as the competition for shared resources and interferences via the state of the shared hardware lead to mutual dependencies of the progress and timing of different tasks. We are convinced that the only way of making these highly complex processing systems time-predictable is to impose some restrictions on their architecture and on the way in which the mechanisms of the architecture are used. So far we have worked along two main lines of research aiming at real-time processing systems with predictable timing: On the software side we have conceived the single-path execution strategy [1]. The single-path approach allows us to translate task code in a way that the resulting code has exactly one execution trace that all executions of the task have to follow. To this end, the single-path conversion eliminates all input-dependent control flow decisions – by applying a set of code transformations [2] and if-conversion [3] it translates all input-dependent alternatives (i.e., code with if-then-else semantics) into straight-line predicated code. Loops with input-dependent termination are converted into loops that are semantically equivalent but whose iteration count is fully determined at system construction time. Architecture-wise we have been working on time-predictable processors and chip-multiprocessor (CMP) systems. We have developed the JOP prototype of a time-predictable processor [4] and built a CMP system with a number of JOP cores [5].
In this multiprocessor system a static time-division multiple access (TDMA) arbitration scheme controls the accesses of the cores to the common memory.

S. Lee and P. Narasimhan (Eds.): SEUS 2009, LNCS 5860, pp. 47–57, 2009.
© IFIP International Federation for Information Processing 2009

The pre-planning of
48
M. Schoeberl, P. Puschner, and R. Kirner
memory access schedules eliminates the need for dynamic conflict resolution and guarantees the temporal isolation that is necessary to allow for an independent progression of the computations on the CMP cores. So far, we have dealt with each of the two topics in separation. This paper is the first that describes our work on combining the concepts of the single-path approach and our time-predictable CMP architecture. We thus present an execution environment that provides both temporal predictability to the highest degree and the performance benefits of parallel code execution on multiple cores. By generating deterministic single-path code, running this code on predictable processor cores, and using a rigid, pre-planned scheme to access the global memory we manage to achieve completely stable, and therefore predictable execution times for each single task in isolation as well as for entire applications consisting of multiple cooperating tasks running on different cores. To the best of our knowledge this has not been achieved for any other state-of-the-art CMP system so far.
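The two single-path transformations named in the introduction can be illustrated in source form. The sketch below is in Java for readability; the actual single-path conversion operates on compiled code using predicated machine instructions, and the 0/1 arithmetic here merely stands in for predication. All names and bounds are illustrative:

```java
// Illustrative sketch of the single-path transformations: (a) if-conversion
// of an input-dependent alternative and (b) rewriting a search loop with an
// input-dependent exit into a loop with a fixed trip count. In both methods
// every call executes the same instruction sequence, regardless of input.
public class SinglePath {

    /** (a) If-conversion: max() without an input-dependent branch. The
     *  ternary stands in for a hardware compare/set, not a jump. */
    static int maxSinglePath(int a, int b) {
        int pred = (a > b) ? 1 : 0;        // predicate as a 0/1 value
        return pred * a + (1 - pred) * b;  // both operands always used
    }

    /** (b) Loop conversion: linear search whose trip count depends only on
     *  the statically known array length, never on the data values. */
    static int indexOfSinglePath(int[] data, int key) {
        int result = -1;
        for (int i = 0; i < data.length; i++) {   // constant trip count
            // Predicate: record the index only for the first match.
            int hit = (result == -1 && data[i] == key) ? 1 : 0;
            result = hit * i + (1 - hit) * result;
        }
        return result;
    }
}
```

The price of the constant trace is that the loop always runs to its bound even when the key is found early; the payoff is an execution time that is identical for every input.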
2 The Single-Path Chip-Multiprocessor System

The main goal of our approach is to build an architecture that provides a combination of good performance and high temporal predictability. We rely on chip-multiprocessing to achieve the performance goal and on an offline-planning approach to make our system predictable. The idea of the latter is to take as many control decisions as possible before the system is actually run. This reduces the number of branching decisions that need to be taken during system operation, which, in turn, causes a reduction of the number of possible action sequences with possibly different timings that need to be considered when planning respectively evaluating the system’s timely operation.

2.1 System Overview

We consider a CMP architecture that hosts n processor cores, as shown in Figure 1. On each core the execution of simple tasks is scheduled statically as a cyclic executive. All cores’ schedulers have the same major cycle, which is synchronized to the shared memory arbiter. Each of the processors has a small local method cache (M$) for storing recently used methods, a local stack cache (S$), and a small local scratchpad memory (SPM) for storing temporary data. The scratchpad memory can be mapped to thread-local scopes [6] for integration into the Java programming language. All caches contain only thread-local data and therefore no cache coherence protocol is needed. To avoid cache conflicts between the different cores, our CMP system does not provide a shared cache. Instead, the cores of the time-predictable CMP system access the shared main memory via a TDMA-based memory arbiter with fine-grained, statically scheduled access.

2.2 TDMA Memory Arbiter

The TDMA-based memory arbiter provides a static schedule for the memory access. Therefore, access time to the memory is independent of tasks running on other cores. In the default configuration each processor core has an equally sized slot for the memory
Fig. 1. A JOP based CMP system with core local caches (M$, S$) and scratchpad memories (SPM), a TDMA based shared memory arbiter, and the memory controller
access. The TDMA schedule can also be optimized for different utilizations of processing cores. In [7] we have optimized the TDMA schedule to distribute slack time of tasks to other tasks with a tighter deadline. The worst-case execution time (WCET) of memory loads or stores can be calculated by considering the worst-case phasing of the memory access pattern relative to the TDMA schedule [8]. With single-path programming, and the resulting static memory access pattern, the execution time of tasks on a TDMA-based CMP system is almost constant. The only jitter results from different phases of the task start time relative to the TDMA schedule. The maximal execution time jitter, due to different phases between the task start time and the TDMA schedule, is the length of the TDMA round minus one. Thus, the TDMA arbiter supports time-predictable program execution very well. The maximal jitter due to TDMA delays is bounded and relatively small. If one is interested in completely avoiding even this short, bounded execution time jitter, this can be achieved by synchronizing the task start with the TDMA schedule, using the deadline instruction described in Section 3.2.

2.3 Tasks

All tasks in our system are periodic. Tasks are considered to be simple tasks according to the Simple-Task Model introduced in [9]:¹ Task inputs are assumed to be available when
¹ More complex task structures can be simulated by splitting tasks into sets of cooperating simple tasks.
a task instance starts, and outputs become ready for further processing upon completion of a task execution. Within its body a task is purely functional, i.e., it neither accesses common resources nor includes delays or synchronization operations. To realize the simple-task abstraction, a task implementation actually consists of a sequence of three parts: read inputs – execute – write outputs. While the application programmer must provide the code for the execute part (i.e., the functional part), the first and the third part can be automatically generated from the description of the task interface. These read and write parts of the task implementations copy data between the shared state and task-local copies of that state. The local copies can reside in the common main memory or in the processor-local scratchpad memory. The placement depends on the access frequency and size of the local state. Care must be taken to schedule the data transfers between the local state copy and the global, shared state such that all precedence and mutual exclusion constraints between tasks are met. This scheduling problem is very similar to the problem of constructing static scheduling tables for distributed hard real-time computer systems with TDMA message scheduling, in which task execution has to be planned such that task-order relations are obeyed and the message and task sequencing guarantees that all communication constraints are met. A solution to this scheduling problem can be found in [10]. Following our strategy to achieve predictability by minimizing the number of control decisions taken during runtime, all tasks are implemented in single-path code. This means that we apply the single-path transformation described in [1,2] to (a) serialize all input-dependent branches and (b) transform all loops with input-dependent termination into loops with a constant iteration count.
In this way, each instance of a task executes the same sequence of instructions and has the same temporal access pattern to instructions and data.

2.4 Mechanisms for Performance and Time Predictability

By executing tasks on different cores, each with some local cache and scratchpad memory, we increase the system's performance over a single-processor system. The following mechanisms make the operation of our system highly predictable:

– Tasks on a single core are executed in a cyclic executive, avoiding cache influences due to preemption.
– Accesses to the global shared memory are arbitrated by a static TDMA memory arbitration scheme, thus leaving no room for unpredictable conflict-resolution schemes and unknown memory access times.
– The starting point of all task periods and the starting point of the TDMA cycle for memory accesses are synchronized, and each task execution starts at a pre-defined offset within its period. Further, the single-path task implementation guarantees a unique trace of instruction and memory accesses. All these properties taken together allow for an exact prediction of instruction execution times and memory access times, thus making the overall task timing fully transparent and predictable.
– As the read and write sections of the tasks may need more than a single TDMA slot for transferring their data between the local and the global memory, read and write operations are pre-planned and executed in synchrony with the global execution cycle of all tasks.
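The three-part task structure described above can be sketched as follows. This is a minimal illustration of the simple-task model, not code from the actual system: class and member names are ours, and plain Java fields stand in for the shared main memory and the scratchpad-resident local copies.

```java
// Sketch of the simple-task structure: read inputs – execute – write outputs.
// On the real system, the read/write parts would be generated from the
// task-interface description; only execute() is programmer-provided.
public class SimpleTask {
    // Global shared state (stands in for the shared main memory).
    static int sharedIn;
    static int sharedOut;

    // Task-local copies (stand in for scratchpad or local memory).
    private int localIn;
    private int localOut;

    void read()    { localIn = sharedIn; }          // generated copy-in part
    void execute() { localOut = 2 * localIn + 1; }  // purely functional body
    void write()   { sharedOut = localOut; }        // generated copy-out part

    // One task instance: the three parts always run in this order.
    void runInstance() {
        read();
        execute();
        write();
    }
}
```

Because the execute part touches only local state, its timing is independent of other tasks; all memory contention is confined to the pre-planned read and write parts.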
A Single-Path Chip-Multiprocessor System
Besides its support for predictability, our planning-based approach allows for the following optimizations of the TDMA schedules for global memory accesses. These optimizations are based on the knowledge available at planning time:

– The single-path implementation of tasks allows us to spot exactly which parts of a task's execute part need a higher and which parts need a lower bandwidth for accessing the global memory (e.g., a task does not have to fetch instructions from global memory while executing a method that it has just loaded into its local cache). This information can be used to adapt the memory-access schedule to optimize the overall performance of memory accesses. While an adaptation of memory-access schedules to the bandwidth requirements of different processing phases has been proposed before [11,12], it seems that this technique can provide its maximum benefit when applied to single-path code – only the execution of single-path code yields a unique, and therefore fully predictable, sequence and timing of memory accesses.
– A similar optimization can be applied to the timing of memory accesses during the read and write sections of the task implementations. These sections access shared data and should therefore run under mutual exclusion. Mutual exclusion is guaranteed by the static, table-driven execution regime of the system. Still, the critical sections should be kept short. The latter could be achieved by an adaptation of the TDMA memory schedule that assigns additional time slots to tasks at times when they perform memory-transfer operations.

Our target is a time-deterministic system, which means that not only the value of a function is deterministic, but also its execution time. It is desirable to know exactly which instruction is executed at each point in time. Execution time shall be a repeatable and predictable property of the system [13].
3 Implementation

The proposed design is evaluated in the context of the Java optimized processor (JOP) [4] based CMP system [5]. We have extended JOP with two instructions: a predicated move instruction for single-path programming in Java and a deadline instruction to synchronize application tasks with the TDMA-based memory arbiter.

3.1 Conditional Move

Single-path programming substitutes control decisions (if-then-else) by predicated move instructions. To avoid execution-time jitter, the predicated move has to have a constant execution time. On JOP we have implemented a predicated move for integer values and references. This instruction represents a new, system-specific Java virtual machine (JVM) bytecode. The new bytecode is mapped to a native function for access from Java code. The semantics of the function

result = Native.condMove(x, y, b);
is equivalent to
result = b ? x : y;
without the need for any branch instruction. The following listing shows the usage of the conditional move for integer and reference data types. The program will print 1 and true.

String a = "true";
String b = "false";
String result;
int val;
boolean cond = true;

val = Native.condMove(1, 2, cond);
System.out.println(val);
result = (String) Native.condMoveRef(a, b, cond);
System.out.println(result);
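Input-dependent loops are handled analogously. The following sketch is our own illustration, not output of the transformation tool: the loop always runs for a constant iteration count and a predicate neutralizes the iterations beyond the input-dependent bound. On JOP the ternary operator would be realized with Native.condMove so that no branch remains.

```java
// Single-path style summation: the loop always executes MAX_N iterations,
// so the instruction sequence and timing do not depend on the input n.
public class SinglePathLoop {
    static final int MAX_N = 8; // assumed worst-case iteration bound

    static int sum(int[] a, int n) {
        int s = 0;
        for (int i = 0; i < MAX_N; i++) {
            // Predicate: is this iteration still "live"?
            boolean live = i < n && i < a.length;
            int v = live ? a[i] : 0; // conditional move on JOP
            s += v;
        }
        return s;
    }
}
```

The result is the same as that of the original input-terminated loop, but every call executes the identical instruction sequence.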
The representation of the conditional move as a native function call has no call overhead; the function call is substituted by the system-specific bytecode at link time (similar to function inlining).

3.2 Deadline Instruction

In order to synchronize a task with the TDMA schedule, a wait instruction with a resolution of single clock cycles is needed. We have implemented a deadline instruction as proposed in [14]. The deadline instruction stalls the processor pipeline until the desired time in clock cycles. To avoid a change in the execution pipeline, we have implemented a semantic equivalent of the deadline instruction: instead of changing the instruction set of JOP, we have implemented an I/O device for the cycle-accurate delay. The time value for the absolute delay is written to the I/O device, and the device delays the acknowledgment of the I/O operation until the cycle counter reaches this value. This simple device is independent of the processor and can be used in any architecture where an I/O request needs an acknowledgment.

I/O devices on JOP are mapped to so-called hardware objects [15]. A hardware object represents an I/O device as a plain Java object; field reads and writes are actual I/O register reads and writes. The following code shows the usage of the deadline I/O device.

SysDevice sys = IOFactory.getFactory().getSysDevice();
int time = sys.cntInt;
time += 1000;
sys.deadLine = time;
The first statement requests a reference to the system device hardware object. This object (sys) is accessed to read out the current value of the clock-cycle counter. The deadline is set to 1000 cycles after the current time; the assignment sys.deadLine = time writes the deadline time stamp into the I/O device and blocks until that time.
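A software model can clarify these semantics. The following is our own simulation, not the JOP hardware or its API: writing an absolute time value stalls the writer until the cycle counter reaches that value, which lets a task align its next activation with a multiple of the TDMA round.

```java
// Simulated deadline I/O device: the write "blocks" (here: the simulated
// clock jumps forward) until the cycle counter reaches the written value.
public class DeadlineDeviceSim {
    private long cycle;                 // simulated clock-cycle counter

    long cnt() { return cycle; }
    void tick(long n) { cycle += n; }   // computation consumes n cycles

    void deadline(long absoluteTime) {
        if (absoluteTime > cycle) cycle = absoluteTime; // stall until then
    }

    // Stall until the next multiple of the TDMA round length.
    long alignToRound(long round) {
        long next = ((cycle + round - 1) / round) * round;
        deadline(next);
        return next;
    }
}
```

For example, after 343 cycles of work, alignToRound(18) stalls until cycle 360, so every task iteration observes the same arbiter phase.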
4 Evaluation

We evaluate our proposed system within a Cyclone EP1C12 field-programmable gate array that contains 3 processor cores and 1 MB of shared memory. The shared memory is an SRAM with a 2-cycle read access time and a 3-cycle write access time. Some bytecode instructions contain several memory accesses (e.g., an array access needs three memory reads: a read of the array size for the bounds check, an indirection through a forwarding handle,² and the actual read of the array value). For several bytecode instructions the WCET is minimized with a slot length of 6 cycles. The resulting TDMA round for three cores is 18 cycles.

As a first experiment we measure the execution time of a short program fragment with access to the main memory. Without synchronizing the task start with the TDMA arbiter we expect some jitter. To provoke all possible phase relations between the task and the TDMA schedule, the deadline instruction was used to shift the task start relative to the TDMA schedule. The resulting execution time varies between 342 and 359 clock cycles. Therefore, the maximum observed execution-time jitter is the length of the TDMA round minus one (17 cycles). With the deadline instruction we make each iteration of the task start at a multiple of the TDMA round (18 clock cycles in our example). In that case each task executes for a cycle-accurate constant duration. This little experiment shows that single-path programming on a CMP system, synchronized with the TDMA-based memory arbitration, results in repeatable execution time [13].

4.1 A Sample Application

To validate our programming model for cycle-accurate real-time computing, we developed a controller application that consists of five communicating tasks. This case study demonstrates that cycle-accurate computing is possible on a CMP system. Further, it gives us some insights into the practical aspects of using the proposed programming model.
The architecture of the sample application is given in Figure 2. The application is demonstrative because of its rather complex inter-task communication pattern, which shows the need for precise scheduling decisions to meet the different precedence constraints. The application consists of the following tasks:

– τ1 and τ2 are the sampling tasks that read from sensors. τ1 samples the reference value and τ2 samples the system value. These two tasks share the same code base and run at twice the frequency of the controller task to allow low-pass filtering by averaging the sensor values.
– τ3 is the proportional-integral-derivative controller (PID controller) that gets the reference value from τ1 and the feedback of the current system value from τ2.
– τ4 is a system guard, similar to a watchdog timer, that monitors the liveness of τ1, τ2, and τ3. Whenever the write phase of τ1, τ2, or τ3 has not been executed between two subsequent activations of τ4, the system is set into an error state.
² The forwarding handle is needed for the implementation of the real-time garbage collector.
[Figure: the tasks τ1 (STSampler), τ2 (STSampler), τ3 (STController), τ4 (STGuard), and τ5 (STMonitor) distributed over Core 1, Core 2, and Core 3 of the JOP chip-multiprocessor.]

Fig. 2. Sample application: control application
[Figure: communication links between the tasks τ1–τ5.]

Fig. 3. Communication directions of the control application
– τ5 is a monitoring task that periodically collects the sensor values (from τ1 and τ2) and the control value (from τ3). The write part of τ5 is currently empty, but it can be used to include the code for transferring the collected system state to a host computer.

The inter-task communication of the sample application is summarized in Figure 3. It shows that this small application has a relatively complex communication pattern: each task communicates with almost all other tasks. The communication pattern has a direct influence on the system schedule. The resulting precedence constraints have to be taken into account when scheduling the read, execute, and write phases of each task. And of course, since this is a CMP system, some of the task phases are executed in parallel, which complicates the search for a tight schedule.

Tasks τ1–τ5 are implemented in single-path code, so their execution time does not depend on control-flow decisions. Since the scheduler also has a single-path implementation, the system executes exactly the same instruction sequence in each scheduling round.
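The table-driven execution regime can be sketched as follows. This is our own illustration (the entry layout and names are assumptions, not the system's data structures): a pre-computed table lists, per scheduling round, the activation offset and the task phase to run, so dispatch itself contains no input-dependent control decisions.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a table-driven cyclic executive: the scheduler just walks the
// static table in order; offsets and phase order are fixed at planning time.
public class CyclicExecutive {
    static class Entry {
        final int offset;    // activation offset within the round (cycles)
        final String phase;  // e.g. "tau1.read"
        Entry(int offset, String phase) { this.offset = offset; this.phase = phase; }
    }

    // Execute one scheduling round; returns the dispatch trace.
    static List<String> runRound(Entry[] table) {
        List<String> trace = new ArrayList<>();
        for (Entry e : table) {
            // Here the real system would stall (deadline instruction) until
            // 'offset' and then run the phase; we only record the dispatch.
            trace.add(e.offset + ":" + e.phase);
        }
        return trace;
    }
}
```

Because the table is fixed, every round produces the identical dispatch sequence, matching the repeatable timing argued for above.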
Table 1. Measured single-path execution time in clock cycles

Task      Read    Execute    Write    Total
τ1, τ2     594        774      576     1944
τ3         864      65250      576    66690
τ4       26604        324    28422    55350
τ5        1368        324      324     2016
All tasks are synchronized on each activation with the same phase of the TDMA-based memory arbiter. Therefore, their execution time does not have any jitter due to different phase alignments of the memory arbiter. With such an implementation style it is possible on JOP to determine the WCET of each task directly by a single execution-time measurement (by enforcing either a cache hit or a cache miss of the method). Table 1 shows the observed WCET values for each task, given separately for the read, execute, and write parts. The absolute WCET values are not that important; more important is the fact that the execution time of each task is deterministic and does not depend on the input data.

To summarize the practical aspects of the programming model: even this relatively simple application results in a scheduling problem that is rather tricky to solve without tool support. For the purpose of this paper we solved it manually, using a graphical visualization of the relative execution times to determine the activation times of each task. However, to successfully use this programming model for industrial production code, the use of a scheduling tool is highly advisable [10]. With respect to generating a tight schedule, the predictable execution time of all tasks turned out to be very helpful.
5 Related Work

Time-predictable multi-threading is developed within the PRET project [14]. The processor cores are based on a RISC architecture. Chip-level multi-threading for up to six threads eliminates the need for data forwarding, pipeline stalling, and branch prediction. The accesses of the individual threads to the shared main memory are scheduled, similarly to our TDMA arbiter, by the so-called memory wheel. The PRET architecture implements the deadline instruction to perform time-based, instead of lock-based, synchronization for access to shared data. In contrast to our simple-task model, where synchronization is avoided due to the three different execution phases, the PRET architecture performs time-based synchronization within the execution phase of a task.

The approach most closely related to our work is presented in [11,12]. The proposed CMP system is also intended for tasks according to the simple-task model [9]. Furthermore, the local cache loading for the cores is performed from a shared main memory. Similar to our approach, a TDMA-based memory arbitration is used. The papers deal with optimization of the TDMA schedule to reduce the WCET of the tasks. The design also considers changes of the arbiter schedule during task execution to optimize the execution time. We think that this optimization can be best performed when the
access pattern to the memory is statically known – which is only possible with single-path programming. Therefore, the former approach to TDMA schedule optimization should be combined with our single-path based CMP system.

Optimization of the TDMA schedule of a CMP-based real-time system has also been proposed in [7]. The described system proposes a single core per thread to avoid the overhead of thread preemption. It is argued that future systems will contain many cores and that the limiting resource will be the memory bandwidth. Therefore, the memory access is scheduled instead of the processing time.
6 Conclusion

A statically scheduled chip-multiprocessor system with single-path programming and a TDMA-based memory arbitration delivers repeatable timing. The repeatable and predictable timing of the system simplifies the safety argument: measurement of the execution time can be used instead of WCET analysis. We have evaluated the idea in the context of a time-predictable Java chip-multiprocessor system. The cycle-accurate measurements showed that the approach is sound.

For the evaluation of the system we have chosen a TDMA slot length that is optimal for the WCET of individual bytecodes. Whether this slot length is also optimal for single-path code is an open question. In future work we will evaluate different slot lengths to optimize the execution time of single-path tasks. Furthermore, the change of the TDMA schedule at predefined points in time is another option we want to explore.
Acknowledgments

The research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under grant agreement numbers 214373 (Artist Design) and 216682 (JEOPARD).
References

1. Puschner, P., Burns, A.: Writing temporally predictable code. In: Proc. 7th IEEE International Workshop on Object-Oriented Real-Time Dependable Systems, pp. 85–91 (January 2002)
2. Puschner, P.: Transforming execution-time boundable code into temporally predictable code. In: Kleinjohann, B., Kim, K.K., Kleinjohann, L., Rettberg, A. (eds.) Design and Analysis of Distributed Embedded Systems, pp. 163–172. Kluwer Academic Publishers, Dordrecht (2002); IFIP 17th World Computer Congress – TC10 Stream on Distributed and Parallel Embedded Systems (DIPES 2002)
3. Allen, J., Kennedy, K., Porterfield, C., Warren, J.: Conversion of Control Dependence to Data Dependence. In: Proc. 10th ACM Symposium on Principles of Programming Languages, pp. 177–189 (January 1983)
4. Schoeberl, M.: A Java processor architecture for embedded real-time systems. Journal of Systems Architecture 54(1-2), 265–286 (2008)
5. Pitter, C., Schoeberl, M.: A real-time Java chip-multiprocessor. Trans. on Embedded Computing Sys. (accepted for publication, 2009)
6. Wellings, A., Schoeberl, M.: Thread-local scope caching for real-time Java. In: Proceedings of the 12th IEEE International Symposium on Object/Component/Service-oriented Real-time Distributed Computing (ISORC 2009), Tokyo, Japan. IEEE Computer Society, Los Alamitos (2009)
7. Schoeberl, M., Puschner, P.: Is chip-multiprocessing the end of real-time scheduling? In: Proceedings of the 9th International Workshop on Worst-Case Execution Time (WCET) Analysis, Dublin, Ireland, OCG (July 2009)
8. Pitter, C.: Time-predictable memory arbitration for a Java chip-multiprocessor. In: Proceedings of the 6th International Workshop on Java Technologies for Real-time and Embedded Systems (JTRES 2008) (2008)
9. Kopetz, H.: Real-Time Systems. Kluwer Academic Publishers, Dordrecht (1997)
10. Fohler, G.: Joint scheduling of distributed complex periodic and hard aperiodic tasks in statically scheduled systems. In: Proceedings of the 16th Real-Time Systems Symposium, pp. 152–161 (December 1995)
11. Andrei, A., Eles, P., Peng, Z., Rosen, J.: Predictable implementation of real-time applications on multiprocessor systems on chip. In: Proceedings of the 21st Intl. Conference on VLSI Design, pp. 103–110 (January 2008)
12. Rosen, J., Andrei, A., Eles, P., Peng, Z.: Bus access optimization for predictable implementation of real-time applications on multiprocessor systems-on-chip. In: Proceedings of the Real-Time Systems Symposium (RTSS 2007), pp. 49–60 (December 2007)
13. Lee, E.A.: Computing needs time. Commun. ACM 52(5), 70–79 (2009)
14. Lickly, B., Liu, I., Kim, S., Patel, H.D., Edwards, S.A., Lee, E.A.: Predictable programming on a precision timed architecture. In: Altman, E.R. (ed.) Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES 2008), Atlanta, GA, USA, pp. 137–146. ACM, New York (2008)
15. Schoeberl, M., Korsholm, S., Thalinger, C., Ravn, A.P.: Hardware objects for Java. In: Proceedings of the 11th IEEE International Symposium on Object/Component/Service-oriented Real-time Distributed Computing (ISORC 2008), Orlando, Florida, USA. IEEE Computer Society, Los Alamitos (2008)
Towards Trustworthy Self-optimization for Distributed Systems

Benjamin Satzger, Florian Mutschelknaus, Faruk Bagci, Florian Kluge, and Theo Ungerer

Department of Computer Science, University of Augsburg, Germany
{satzger,bagci,kluge,ungerer}@informatik.uni-augsburg.de
http://www.informatik.uni-augsburg.de/sik
Abstract. The increasing complexity of computer-based technical systems requires new ways to control them. The initiatives Organic Computing and Autonomic Computing address exactly this issue. They demand that future computer systems adapt dynamically and autonomously to their environment, and they postulate so-called self-* properties. These are typically based on decentralized autonomous cooperation of the system's entities. Trust can be used as a means to enhance cooperation schemes, taking into account trust facets such as reliability. The contributions of this paper are algorithms to manage and query trust information. It is shown how such information can be used to improve self-* algorithms. To quantify our approach, evaluations have been conducted.

Keywords: Trust, self-*, self-optimization, Organic Computing, Autonomic Computing.

1 Introduction
The evolution of computer systems, starting from mainframes towards ubiquitous distributed systems, has progressed rapidly. Common to early systems is the need for human administrators. Future systems, however, should act to a large extent autonomously in order to keep them manageable. The investigation of techniques that allow complex distributed systems to self-organize is therefore of high importance. The initiatives Autonomic Computing [14,5] and Organic Computing [3] propose systems with life-like properties and the ability to self-configure, self-optimize, self-heal, and self-protect.

Safety and security play a major role in information technology, and especially in the area of ubiquitous computing. Nodes in such a system are restricted to a local view, and typically no central instance can be responsible for the control and organization of the whole network. Trust and reputation can serve as a means to build safe and secure distributed systems in a decentralized way. With appropriate trust mechanisms, nodes of a system have a clue about which nodes

S. Lee and P. Narasimhan (Eds.): SEUS 2009, LNCS 5860, pp. 58–68, 2009. © IFIP International Federation for Information Processing 2009
to cooperate with. This is very important to improve reliability and robustness in systems that depend on the cooperation of autonomous nodes.

In this paper we adopt the definition that trust is a peer's belief in another peer's trust facet. There are many facets of trust in computer systems. Such facets may concern, for instance, availability, reliability, functional correctness, and honesty. The related term reputation emphasizes that trust information is based on recommendation. The development of trustworthy self-* systems concerns aspects of (1) the generation of trust values based on direct experiences, (2) the storage, management, and retrieval of trust values, and (3) the usage of this information to enhance the trustworthiness of the overall system. Generating direct trust values strongly depends on the trust facet. Direct experiences concerning the facet availability may be gathered by using heartbeat messages. The facet functional correctness of a sensor node may be estimated by comparison with the measured values of sensors nearby. In this paper we focus on (2) and (3), i.e., how to manage, access, and use trust information.

An instance of a distributed ubiquitous system which exploits self-* properties is our Smart Doorplate Project [11]. This project envisions the use of smart doorplates within an office building. The doorplates are, among other things, able to display current situational information about the office owner and to direct visitors to his current location based on a location-tracking system. A middleware called "Organic Computing Middleware for Ubiquitous Environments" (OCµ) [10] serves as a common platform for all included devices. The middleware system OCµ was developed to offer self-configuration, self-optimization, self-healing, and self-protection capabilities. It is based on the assumption that applications are composed of services, which are distributed to the nodes of the network.
Service distribution is performed during the initial self-configuration phase, considering the available resources and the requirements of the services. At runtime, resource consumption is monitored. In earlier work we developed a self-optimization mechanism [13,12] to balance the resource consumption (load) between nodes by transferring services to other nodes. OCµ is an open system and is designed to allow services and nodes from different manufacturers to interact. In this work we incorporate a trust mechanism into our middleware to allow network entities to decide to what extent to cooperate with other nodes/services. This is used to enhance the self-optimization algorithm.

The paper is organized in seven sections. Section 2 gives an overview of the state of the art of trust in distributed systems. Section 3 presents the basic self-optimization algorithm we have developed. Section 4 introduces different algorithms to build a trust management layer which is able to provide the functionalities covered by (2) as mentioned above. In Section 5 we present the trustworthy self-optimization, which extends the basic self-optimization and takes trust into account; this refers to (3). Then, Section 6 describes measurements of an implementation of the algorithm. Finally, Section 7 concludes the paper.
2 Related Work
There are many approaches to incorporating trust into distributed systems. In this section some relevant papers are highlighted. Repantis et al. [7] describe a middleware based on reputation. In their model, nodes can request data and services (objects) and may receive several responses from different nodes. In this case the object from the most trustworthy provider is chosen. The information about the reputation of a node is stored on its direct neighbors and appended to response messages. The nodes define thresholds for any object they request; thus, only providers with a higher reputation are taken into account. After reception of an object the provider is rated based on the satisfaction of the requester. In [7] nodes share one common reputation value, which means that all nodes have the same trust in a certain node. In contrast, Theodorakopoulos et al. [9] describe a model based on a directed graph in which the vertices are the network's nodes and the weighted edges describe trust relations. The weight of an edge (u, v) describes the trust of node u in v. Each weight consists of a trust value and the confidence in its correctness.

The TrustMe [8] protocol focuses on the anonymity of network members. It represents a technique to store and access trust information within a peer-to-peer network; the mining of trust information plays a minor role. An asymmetric encryption technique is used to allow for protection against attacks. In contrast to many trust management systems, which support very limited anonymity or assume anonymity to be an undesired feature, TrustMe emphasizes the importance of anonymity. The protocol provides anonymity for both the trust provider and the trust requester. Cornelli et al. [4] present a common approach to requesting trust values: a node A interested in the reputation of node B sends a broadcast message and receives responses from all nodes that have a trust value for B. The response messages are encrypted with the public key of A.
After reception of an encrypted answer the node contacts the responder to identify bogus messages. In [6], a special approach is used to store trust information: Distributed Hash Tables (DHTs) store the trust value of a node on a number of parent nodes, which are identified by hash functions applied to the id of the child. A node requesting the trust value of a network member uses the hash functions to calculate the parents which hold the value and sends a request to them.

Aberer et al. [2] present a reputation system to gather trust values. An interesting point is that trust values are mutual, while traditionally nodes judge independently; the hope is to achieve an advantage through this cooperation. This idea is integrated into the calculation of the global trust value of a node: both the interactions triggered by the node itself and the requested interactions are taken into account. Global trust values are binary, i.e., nodes are considered either trustworthy or not. If nodes cheat during an interaction they are considered globally untrustworthy. If a node detects a cheating communication partner it files a complaint. With a growing number of interactions with different nodes, the probability rises that a liar is unmasked. In this model reputation values are stored within the network in a distributed way, using the so-called PGrid [1].
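The parent-selection step of the DHT-based scheme in [6] can be sketched as follows. This is our own illustration; the hash functions and parameters are placeholders, not those of the original system. The key idea is that the parents storing a child's trust value are derived deterministically from the child's id, so any requester can recompute where the value is held.

```java
// Derive the ids of the k parent nodes that store a child's trust value.
// Each "hash function" is simulated by salting the child id with an index.
public class TrustDht {
    static int[] parents(String childId, int k, int networkSize) {
        int[] p = new int[k];
        for (int i = 0; i < k; i++) {
            int h = (childId + "#" + i).hashCode();
            p[i] = Math.floorMod(h, networkSize); // node id in [0, networkSize)
        }
        return p;
    }
}
```

Because the derivation is deterministic, the requester and the storing nodes always agree on the parent set without any lookup messages.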
3 Basic Self-optimization Algorithm
The basic self-optimization algorithm [13,12] is inspired by the human hormone system. Its task is to balance the load of a service-based distributed system. This artificial hormone system consists of metrics which calculate a reaction (a service transfer), nodes producing digital hormones which indicate their load, receptors collecting hormones and handing them over to the metrics, and finally the digital hormones holding load information. To minimize overhead, the digital hormone value encodes both the activator and the inhibitor hormone: if the value of the digital hormone is above a given level it activates the reaction, while a lower value inhibits it. To further reduce overhead, hormones are piggybacked onto application messages and do not result in additional messages.

The basic idea behind the self-optimization is: when a heavily loaded node receives a message containing a hormone which states that the sender is lightly loaded, services are transferred to this sender. The metrics used to decide whether to balance the load between two nodes of the network are named transfer strategies because they decide on the transfer of a service. Our self-optimization has the ability to improve a system during runtime. It yields very good results in load-balancing, using only local decisions and with minimal overhead. However, it is not considered whether a service is transferred to a trustworthy node. Bogus nodes (e.g., nodes running a malicious middleware) might attract services in a systematic way and could induce a breakdown of the system. Unreliable, faulty nodes might not have the ability to properly host services. Conversely, one would want to utilize particularly reliable, trustworthy nodes for important services. Therefore, we propose to incorporate trust information into the transfer decision.
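A minimal transfer strategy along these lines might look as follows. The thresholds and names are our assumptions for illustration; the actual metrics in [13,12] are more elaborate. The single hormone value acts as an activator when it signals low remote load and as an inhibitor otherwise.

```java
// Sketch of a hormone-based transfer strategy: transfer a service when the
// local node is heavily loaded and the received hormone indicates that the
// sender is lightly loaded. Loads are normalized to [0, 1].
public class TransferStrategy {
    static final double LOCAL_HIGH = 0.8; // assumed activation threshold
    static final double REMOTE_LOW = 0.3; // assumed inhibition threshold

    // Decide on transferring a service to the sender of the hormone.
    static boolean transfer(double localLoad, double senderHormone) {
        return localLoad > LOCAL_HIGH && senderHormone < REMOTE_LOW;
    }
}
```

The trustworthy variant proposed in this paper would additionally require the sender's trust value to exceed a threshold before a transfer is allowed.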
4 Trust Management
As mentioned, a trust system needs a component which generates trust values based on direct experiences, and it needs a component which is able to process this information. The generation of direct trust values depends strongly on the trust facet and the domain. Since a network can be seen as a graph, each node has a set of direct neighbors. In order to be able to estimate the trust facet availability, direct neighbors may periodically check each other's availability status. But depending on domain and application the mining of trust values differs. In a sensor network the measured data of a node can be compared with the measurements of its neighbors; in this way nonconforming sensors can be identified. Since we do not focus on the generation of trust values by observation but on the generic management of trust information, we simply assume that any node has a trust value about each of its direct neighbors. This trust value T(k1, k2) is within [0, 1] and reflects the subjective trust of node k1 in node k2 based on its experiences. T(k1, k2) = 0 means k1 does not trust k2 at all, while a value of 1 stands for 'full' trust. We assume that direct neighbors directly monitor each other to determine a trust value. This trust value might be inadequate due to insufficient monitoring
data. It may be possible that the trust value is either too optimistic or too pessimistic. With continuous monitoring of the neighbors, their trust can be estimated better and better.

In the following, three trust algorithms are presented that manage trust information in order to make it useful for the network's entities. The algorithms are explained as being used together with an algorithm that spreads information via hormones, like our self-optimization algorithm. However, the trust management is not limited to such a usage. It is assumed that nodes do not alter messages they forward. In a real-world application this should be assured by the usage of security techniques like encryption.

4.1 Forwarder
Only direct neighbors are able to measure trust directly. Nodes outside the direct neighborhood need to rely on some kind of reputation; direct neighbors can also use reputation as additional information. This first approach to propagating trust is quite simple. When a node B sends an application message to a node F, the direct neighbor D which forwards the message appends its trust value T(D, B) to it. Node F thus receives a message containing a hormone (with, e.g., load information) and D's trust in B, as shown in Figure 1. This trust value is highly subjective, as it is measured by only one node, but the approach introduces no additional messages for trust retrieval: immediately after the receipt of an application message, the receiver has information about the trust of the sender.
Fig. 1. Forwarder algorithm
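As a sketch of this step (the message and trust-table layouts below are hypothetical, not the paper's implementation), the forwarder simply attaches its own direct trust in the original sender:

```python
def forward(message, forwarder_id, trust_table):
    """Return a copy of the message with the forwarder's trust in the sender attached."""
    annotated = dict(message)
    sender = message["sender"]
    # T(D, B): the forwarder's direct, subjective trust in the sender (None if unknown).
    annotated["forwarder_trust"] = (forwarder_id, trust_table.get(sender))
    return annotated

# Node D forwards a hormone-carrying message from B towards F.
msg = {"sender": "B", "hormone": {"load": 0.4, "capacity": 1.0}}
trust_of_D = {"B": 0.8, "C": 0.3}            # D's direct observations
received = forward(msg, "D", trust_of_D)
print(received["forwarder_trust"])           # ('D', 0.8)
```

No extra messages are needed: the receiver learns one (subjective) trust value for free with every forwarded application message.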
4.2 Distant Neighborhood
In this approach, not only the trust value of one direct neighbor is considered but those of all neighbors of the node in question. A node sends an application message with an appended hormone to the receiver. If the receiver needs to know the trust of the sender, it sends a trust request to all neighbors of the sender, which reply with a trust response message. Finally, trust in the sender is calculated as the average of all received trust values. This method produces many more messages than the Forwarder algorithm, but also provides more reliable information. A further advantage is the ability to detect diverging trust values, which might be used to identify bogus nodes. In Figure 2, node B sends
Fig. 2. Distant Neighborhood algorithm
an application message together with its capacity and load to node J, which afterwards asks A, C, D, and E for their trust in node B.

4.3 Near Neighborhood
In this variant (see Fig. 3), a node spreads trust requests to query trust information, e.g., after the receipt of an application message. These trust requests carry a hop counter. Initially the hop counter is set to one, which means that the node first asks its own direct neighbors. If they have information about the target node, they reply with their trust value; otherwise they send a negative response. If the node receives positive answers, it averages the corresponding values. If it receives only negative answers, it increments the hop counter and repeats the trust request. At the latest when the request reaches a direct neighbor of the target node, a trust value is returned. The requesting node stores the resulting trust value in order to answer other requests. Note that a node will execute this algorithm in order to update and refine its data even if it already has trust information about the target node; in that case, the existing trust value is integrated into the averaging process. Initially, this algorithm produces many more messages than
Fig. 3. Near Neighborhood algorithm
the ones described above. However, as trust values are distributed within the network, the number of messages decreases over time, because it becomes more and more likely that only a few hops are needed until a node with information about the target node is found. The same holds for the accuracy of trust values, which increases with the runtime of the algorithm.
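The expanding-ring search described above can be sketched as follows. This is a toy simulation with hypothetical data structures (a dict-of-lists graph and per-node trust tables), not the paper's implementation:

```python
def hop_neighbors(graph, start, hops):
    """All nodes at exactly `hops` hops from `start` (BFS by level)."""
    level, seen = {start}, {start}
    for _ in range(hops):
        level = {n for v in level for n in graph[v] if n not in seen}
        seen |= level
    return level

def near_neighborhood_trust(graph, requester, target, trust_tables):
    """Expanding-ring trust query: ask nodes at hop distance 1, 2, ... until at
    least one positive answer arrives; average the received trust values."""
    hops = 1
    while True:
        ring = hop_neighbors(graph, requester, hops)
        if not ring:
            return None                      # target unreachable: no answer possible
        answers = [trust_tables[n][target] for n in ring if target in trust_tables[n]]
        if answers:
            return sum(answers) / len(answers)
        hops += 1

graph = {"X": ["Y"], "Y": ["X", "Z"], "Z": ["Y", "T"], "T": ["Z"]}
trust = {"X": {}, "Y": {}, "Z": {"T": 0.6}, "T": {}}     # only Z has observed T
print(near_neighborhood_trust(graph, "X", "T", trust))   # 0.6 (found at hop 2)
```

Caching the result (as the text describes) would make later queries for the same target terminate at hop 1.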
5 Trustworthy Self-optimization
The self-optimization reallocates services during runtime to ensure a uniform distribution of load. The version described in Section 3 does not take the trust of nodes into account. In the following, an approach is presented to incorporate trust into the self-optimization; any of the three trust management algorithms presented above can be used for this purpose. We assume that different services have different priorities: if a service is of high importance for the functionality of the system or collects sensitive data, its priority is high. The prioritization may be defined at design time or adapted dynamically during runtime. The trustworthy self-optimization aims at load-sharing with additional consideration of the services' priority, i.e., it avoids hosting services with high priority on nodes with low trust. A self-optimization step is performed strictly locally, with no global control. As in the basic self-optimization algorithm, each node piggybacks hormones containing information about its current load and its capacity onto each application message. This enables the receiver to compare this load value with its own load. Additionally, the sender's trust can be queried via the proposed trust management layer. A transfer strategy takes this information and decides whether or not to transfer a service. If a service is identified, an attempt is made to transfer it to the sender. This is only possible if the sender still has enough free resources to host the service; due to the dynamics of the network, this is not always guaranteed. The basic idea of the transfer strategy is to find a balance between pure load-balancing and trustworthiness of the service distribution. The parameter α determines which aspect to focus on: a higher value for α emphasizes the need to transfer services to nodes with optimal trust values, while a lower value results in a focus on pure load-balancing.
All services that B is able to host, for which A's trust in B is higher than their priority, and whose relocation would balance the load significantly, are considered for transfer.
6 Evaluation
Several test scenarios have been investigated in order to evaluate the trustworthy self-optimization. Each scenario consists of 100 nodes with random resource capacities (e.g., RAM); in addition, a random global but hidden trust value is assigned to each node. In real-world applications the trust of a node must be measured; as this strongly depends on the trust facet and the application, we have chosen a theoretical approach to simulate trust estimation by direct observation. It is based on the assumption that nodes are able to estimate the trust of another node better and better over time.

Fig. 4. Simulation of direct trust monitoring

In the simulation, the trust of a node in its direct neighbor converges to the true hidden trust value with an increasing number of mutual interactions, as shown in Figure 4. In this example, the true trust value of the node is 0.5. Initially, the node is only able to estimate the trust very roughly, while the error decreases statistically with the number of interactions. Formally, the trust of node r in node k after n interactions is modeled by Tn(r, k) = t(k) + ρn. In this formula, t(k) is the true global but hidden trust value of k and ρn is a random value which falsifies t(k). With further interactions the possible range of ρn decreases, i.e., |ρn| > |ρn+1|. This random value simulates the inability to estimate trustworthiness precisely. In the simulation, the nodes send dummy application messages to random nodes. These are used to piggyback the information necessary for self-optimization as described above. After the reception of such a message, the receiving node determines the trust of the sender using one of the trust algorithms; then it is decided whether a service is transferred or not. Initially, each node obtains a random number of services. Resource consumption and priority of a service are chosen randomly, while the sum of all service weights is not allowed to exceed a node's capacity. The proposed trustworthy self-optimization is used for load-balancing and additionally tries to assign services with high priority to highly trusted nodes. Rating functions are used to evaluate the fitness of a network configuration concerning trust and equal load-sharing. The main idea of the rating function for trusted service distribution fT is to reward services with high priority and resource consumption running on very trustworthy nodes:

fT = Σ_{n ∈ N} t(n) · Σ_{s ∈ S(n)} c(s) · p(s).
N is the set of all nodes, S(n) is the set of services of a node n, t(n) is its true trust value, and c(s) and p(s) are the resource consumption and priority of a service s.
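A minimal sketch of computing fT under these definitions (the data layout is illustrative):

```python
def f_T(network):
    """fT = sum over nodes n of t(n) * sum over services s of c(s)*p(s).
    `network` maps a node id to (t(n), [(c(s), p(s)), ...])."""
    return sum(t_n * sum(c * p for c, p in services)
               for t_n, services in network.values())

net = {
    "n1": (0.9, [(0.5, 1.0), (0.2, 0.1)]),   # high-priority work on a trusted node
    "n2": (0.2, [(0.5, 1.0)]),               # the same work on an untrusted node scores less
}
print(round(f_T(net), 3))  # 0.568
```

Note how the same high-priority service contributes 0.45 on the trusted node n1 but only 0.1 on the untrusted node n2, which is exactly the reward structure described above.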
The rating function for load-sharing fL compares the current service distribution with the theoretical global optimum. For each simulation the network consists of 100 nodes. At the beginning, the network is rated by fT and fL; then the simulation is started, and after each step the network is rated again. Within one step, 100 application messages are sent randomly, which means that some nodes may send more than one message and others none, reflecting the asynchronous character of the distributed system. The receipt of an application message may result in a service transfer, depending on the trust algorithm used and the node's load. Additionally, it is measured how many services are transferred. Each evaluation scenario has been tested 250 times with randomly generated networks, and the values have been averaged. Figure 5 shows the gains in trustworthiness of the service distribution with regard to the rating function fT. Distant Neighborhood reached the best results, followed by Forwarder and Near Neighborhood; however, Forwarder introduces no additional messages for trust value distribution. Without consideration of trust in the self-optimization, services are transferred to random nodes and the overall trust is not improved, but even declines slightly.

Fig. 5. Trustworthy service distribution (fT)

Figure 6 shows the network's load-sharing with regard to the function fL. Compared to the initial average load distribution of 75% of the theoretical optimum, the self-optimization combined with any trust algorithm improves the load balance. This means that the consideration of trust does not prevent the self-optimization from balancing the load within the system. However, the quality of pure load-sharing cannot be reached by any trustworthy algorithm.

Fig. 6. Load-sharing (fL)

Distant Neighborhood performs best in the conducted measurements. It improves the trustworthiness of the service distribution by about 20% while causing a deterioration of the load-sharing by about 4% (compared to load-sharing with no consideration of trust). This is considered a beneficial trade-off: a slightly better load-sharing would have little impact on the whole system, whereas running important services on unreliable or malicious nodes may result in very poor system performance. Forwarder avoids overhead due to queries, as trust values are appended to application messages. This algorithm improves the trustworthy service distribution by about 13% and decreases load-sharing by about 5%. Near Neighborhood shows results similar to Forwarder, but its explicit trust queries cause a higher overhead. However, due to its working principle, it might show better results with a longer experiment duration.
7 Conclusion
This paper presents an approach for a trust management layer and shows how information provided by this layer can be used to improve a self-optimization algorithm. Three trust management algorithms have been introduced. With Forwarder, direct neighbors automatically append their directly observed trust values to each application message. Distant Neighborhood explicitly asks all direct neighbors of a node for a trust value. The last trust algorithm distributes trust values within the network for faster gathering of trust information, especially after a longer runtime. The trustworthy self-optimization does not only consider pure load-balancing but also aims to transfer services only to nodes regarded as sufficiently trustworthy. A feature of the self-optimization is that it does not use explicitly sent messages but appends information in the form of hormones to application messages in order to minimize overhead. Three different transfer strategies have been proposed which determine, with regard to load and trust, whether a service is transferred to another node or not.
The presented techniques have been evaluated. The results show that trust aspects can be integrated into the system with few restrictions regarding load-balancing. The proposed trust mechanisms describe a way to increase the robustness of self-* systems with cooperating nodes.
References

1. Aberer, K.: P-Grid: A self-organizing access structure for P2P information systems. In: Batini, C., Giunchiglia, F., Giorgini, P., Mecella, M. (eds.) CoopIS 2001. LNCS, vol. 2172, p. 179. Springer, Heidelberg (2001)
2. Aberer, K., Despotovic, Z.: Managing trust in a peer-2-peer information system. In: CIKM, pp. 310–317. ACM, New York (2001)
3. Allrutz, R., Cap, C., Eilers, S., Fey, D., Haase, H., Hochberger, C., Karl, W., Kolpatzik, B., Krebs, J., Langhammer, F., Lukowicz, P., Maehle, E., Maas, J., Müller-Schloer, C., Riedl, R., Schallenberger, B., Schanz, V., Schmeck, H., Schmid, D., Schröder-Preikschat, W., Ungerer, T., Veiser, H.-O., Wolf, L.: Organic Computing - Computer- und Systemarchitektur im Jahr 2010 (in German). VDE/ITG/GI position paper (2003)
4. Cornelli, F., Damiani, E., di Vimercati, S.D.C., Paraboschi, S., Samarati, P.: Choosing reputable servents in a P2P network. In: WWW, pp. 376–386 (2002)
5. Horn, P.: Autonomic computing: IBM's perspective on the state of information technology (2001)
6. Kamvar, S.D., Schlosser, M.T., Garcia-Molina, H.: EigenRep: Reputation management in P2P networks. In: Proceedings of the 12th International World Wide Web Conference, WWW 2003 (2003)
7. Repantis, T., Kalogeraki, V.: Decentralized trust management for ad-hoc peer-to-peer networks. In: Terzis, S. (ed.) MPAC. ACM International Conference Proceeding Series, vol. 182, p. 6. ACM, New York (2006)
8. Singh, A., Liu, L.: TrustMe: Anonymous management of trust relationships in decentralized P2P systems. In: Shahmehri, N., Graham, R.L., Caronni, G. (eds.) Peer-to-Peer Computing, pp. 142–149. IEEE Computer Society, Los Alamitos (2003)
9. Theodorakopoulos, G., Baras, J.S.: Trust evaluation in ad-hoc networks. In: WiSe 2004: Proceedings of the 3rd ACM Workshop on Wireless Security, pp. 1–10. ACM, New York (2004)
10. Trumler, W.: Organic Ubiquitous Middleware. PhD thesis, Universität Augsburg (July 2006)
11. Trumler, W., Bagci, F., Petzold, J., Ungerer, T.: Smart Doorplate. In: First International Conference on Appliance Design (1AD), Bristol, GB, May 2003, pp. 24–28 (2003)
12. Trumler, W., Pietzowski, A., Satzger, B., Ungerer, T.: Adaptive self-optimization in distributed dynamic environments. In: Di Marzo Serugendo, G., Martin-Flatin, J.-P., Jélasity, M., Zambonelli, F. (eds.) First IEEE International Conference on Self-Adaptive and Self-Organizing Systems (SASO 2007), Cambridge, Boston, Massachusetts, pp. 320–323. IEEE Computer Society, Los Alamitos (2007)
13. Trumler, W., Thiemann, T., Ungerer, T.: An artificial hormone system for self-organization of networked nodes. In: Pan, Y., Rammig, F.J., Schmeck, H., Solar, M. (eds.) Biologically Inspired Cooperative Computing, Santiago de Chile, pp. 85–94. Springer, Heidelberg (2006)
14. Weiser, M.: The computer for the 21st century (1995)
An Experimental Framework for the Analysis and Validation of Software Clocks Andrea Bondavalli1, Francesco Brancati1, Andrea Ceccarelli1, and Lorenzo Falai2 1 University of Florence, Viale Morgagni 65, I-50134, Firenze, Italy {bondavalli,francesco.brancati,andrea.ceccarelli}@unifi.it 2 Resiltech S.r.l., Via Filippo Turati 2, 56025, Pontedera (Pisa), Italy [email protected]
Abstract. The experimental evaluation of software clocks requires the availability of a high-quality clock to be used as reference time, and particular care to be able to immediately compare the value provided by the software clock with the reference time. This paper focuses i) on the definition of a proper evaluation process and consequent methodology, and ii) on the assessment of both the measuring system and the results. These aspects of experimental evaluation activities are mandatory in order to obtain valid results and reproducible experiments, including the comparison of possible different realizations or prototypes. As a case study to demonstrate the framework, we describe the experimental evaluation performed on a basic prototype of the Reliable and Self-Aware Clock (R&SAClock), a recently proposed software clock for resilient time information that provides both the current time and the current synchronization uncertainty (a conservative and self-adaptive estimation of the distance from an external reference time). Keywords: experimental framework and methodology, assessment and measurements, software clocks, R&SAClock, NTP.
paper, the evaluation process is carefully planned, and the validity of the measuring system and of the results is investigated and assessed through principles of measurement theory. As a further benefit, the experimental set-up and the whole evaluation process can be easily adapted and reused for the evaluation of different types of software clocks and for comparisons of possible different implementations or prototypes within the same category. The paper illustrates the experimental process and set-up by showing the evaluation of a prototype of the Reliable and Self-Aware Clock (R&SAClock [5]), a recently proposed software clock. R&SAClock exploits services and data provided by any chosen synchronization mechanism (for external clock synchronization) to provide both the current time and the current synchronization uncertainty (an adaptive and conservative estimation of the distance of the local clock from the reference time). The rest of this paper is organized as follows. In Section 2 we introduce our case study: the R&SAClock prototype that will be analyzed. Section 3 describes our experimental process and the measuring sub-system. Section 4 presents the results obtained by the planned experiments and their analysis. Conclusions are in Section 5.
2 The Reliable and Self-aware Clock

2.1 Basic Notions of Time and Clocks

Let us consider a distributed system composed of a set of nodes. We define reference time as the unique time view shared by the nodes of the system, reference clock as the clock that always holds the reference time, and reference node as the node that owns the reference clock. Given a local clock c and any time instant t, we define c(t) as the time value read by local clock c at time t. The behavior of the local clock is characterized by the quantities offset, accuracy and drift. The offset Θc(t) = t − c(t) is the actual distance of local clock c from reference time at time t [9]. This distance may vary through time. Accuracy Ac is an upper bound of the offset [10] and is often adopted in the definition of system requirements and therefore targeted by clock synchronization mechanisms. Drift ρc(t) describes the rate of deviation of a local clock c at time instant t from the reference time [10]. Unfortunately, accuracy and offset are usually of little practical use for systems: accuracy is usually a large value that is not a representative estimation of the current distance from reference time, and offset is difficult to measure exactly at any time t. Synchronization mechanisms typically compute an estimated offset Θ̃c(t) (and an estimated drift ρ̃c(t)), without offering guarantees and only at synchronization instants. Instead of the static notion of accuracy, a dynamic conservative estimation of the offset provides more useful information. For this purpose, the notion of uncertainty as used in metrology [4], [3] can provide such a useful estimation: we define the synchronization uncertainty Uc(t) as an adaptive and conservative evaluation of the offset Θc(t) at any time t; that is, Ac ≥ Uc(t) ≥ |Θc(t)| ≥ 0 [5].
Finally, we define the root delay RDc(t) as the transmission delay (one-way or round trip, depending on the synchronization mechanism), including all system-related delays, from the node that holds local clock c to the reference node [9].
2.2 Basic Specifications of the R&SAClock

R&SAClock is a new software clock for external clock synchronization (a unique reference time is used as target of the synchronization) that provides to users (e.g., system processes) both the time value and the synchronization uncertainty associated with that time value [5]. R&SAClock is not a synchronization mechanism; rather, it acts as a new software clock that exploits services and data provided by any chosen synchronization mechanism (e.g., [9], [11]). When a user asks R&SAClock for the current time (by invoking the function getTime), R&SAClock provides an enriched time value [likelyTime, minTime, maxTime, FLAG]. LikelyTime is the time value obtained by reading the local clock, i.e., likelyTime = c(t). MinTime and maxTime are computed using the synchronization uncertainty provided by the internal mechanisms of R&SAClock. More specifically, for a clock c at any time instant t, we extend the notion of synchronization uncertainty Uc(t) by distinguishing between a right synchronization uncertainty (positive) Ucr(t) and a left synchronization uncertainty (negative) Ucl(t), such that Uc(t) = max[Ucr(t); −Ucl(t)]. The values minTime and maxTime are respectively a left and a right bound of the reasonable values that can be attributed to the actual time: minTime is set to c(t) + Ucl(t) and maxTime is set to c(t) + Ucr(t). The user that exploits the R&SAClock can impose an accuracy requirement, that is, the largest synchronization uncertainty that the user can accept to work correctly. Correspondingly, R&SAClock sets its output FLAG, a Boolean value that indicates whether the current synchronization uncertainty is within the accuracy requirement or not. The main core of R&SAClock is the Uncertainty Evaluation Algorithm (UEA), which equips the R&SAClock with the ability to compute the synchronization uncertainty.
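The construction of the enriched time value can be sketched as follows (function and parameter names are ours, not the prototype's API; the numeric values are made up):

```python
import math

def get_time(c_t, u_left, u_right, accuracy_req):
    """Sketch of the enriched time value [likelyTime, minTime, maxTime, FLAG].
    c_t: local clock reading; u_left <= 0 and u_right >= 0 are the left and right
    synchronization uncertainties; FLAG reports whether the overall uncertainty
    max(u_right, -u_left) meets the user's accuracy requirement."""
    likely_time = c_t
    min_time = c_t + u_left          # left bound on the actual time
    max_time = c_t + u_right         # right bound on the actual time
    flag = max(u_right, -u_left) <= accuracy_req
    return likely_time, min_time, max_time, flag

# 4 ms uncertainty on each side, 10 ms accuracy requirement -> FLAG is True.
likely, lo, hi, ok = get_time(100.000, -0.004, 0.004, accuracy_req=0.010)
assert math.isclose(lo, 99.996) and math.isclose(hi, 100.004) and ok
```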
Different implementations of the UEA may lead to different versions of R&SAClock (for example, in [5] and [12] two different versions are shown). Besides the R&SAClock specification shown, we identify the following two non-functional requirements:

REQ1. The service response time provided by R&SAClock is bounded: there exists a maximum reply time ∆RT from a getTime request made by a user to the delivery of the enriched time value (the probability that the getTime is not provided within ∆RT is negligible).

REQ2. For any minTime and maxTime in any enriched time value generated at time t, it must hold that minTime ≤ t ≤ maxTime with a coverage ∆CV (by coverage we mean the probability that this equation is true).

2.3 R&SAClock Prototype and Target System

Here we describe the prototype of the R&SAClock and the system in which it executes as the target system used for the subsequent experimental evaluations. The R&SAClock prototype works with Linux and with the NTP synchronization mechanism. The UEA implemented in this prototype computes a symmetric left and right synchronization uncertainty with respect to likelyTime, i.e., −Ucl(t) = Ucr(t) and Uc(t) = Ucr(t) [5]. Using functionalities of both NTP and Linux, the UEA gets i) c(t) by querying the local clock and ii) the root delay RDc(t) and the estimated offset Θ̃c(t) by
monitoring the NTP log file (NTP refreshes the root delay and estimated offset when it performs a synchronization). The behavior of the UEA is as follows. First, the UEA reads from a configuration file an upper bound δc on the clock drift, fixed to 50 parts per million (ppm) in the experiments, and listens on the NTP log file. When NTP updates the log file, the UEA reads the estimated offset and the root delay and starts a counter called TSLU, which represents the Time elapsed Since the Last (most recent) Update of root delay and estimated offset. Given t, the most recent time instant at which root delay and estimated offset have been updated, at any time t1 ≥ t the synchronization uncertainty Uc(t1) is computed as:
Uc(t1) = |Θ̃c(t)| + RDc(t) + (δc · TSLU).   (1)
The basic idea of (1) is that, given Uc(t) = |Θ̃c(t)| + RDc(t) ≥ |Θc(t)| at time t, we have Uc(t1) ≥ |Θc(t1)| at t1 ≥ t (a detailed discussion is in [5]). The target system is depicted in Fig. 1. The hardware components are a Toshiba Satellite laptop, which we call PC_R&SAC, and the NTP servers connected to PC_R&SAC through a high-speed Internet connection. The software components are the R&SAClock prototype, the NTP client (a daemon process) and the software local clock of PC_R&SAC. The NTP client synchronizes the local clock using information from the NTP servers.
Fig. 1. The target system: R&SAClock for NTP and Linux
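Equation (1) can be illustrated with a toy calculation (not the prototype's code; the input values are made up):

```python
def uea_uncertainty(est_offset, root_delay, tslu, drift_bound=50e-6):
    """Equation (1): Uc(t1) = |estimated offset| + root delay + drift_bound * TSLU,
    with drift_bound = delta_c (50 ppm in the experiments) and TSLU the time in
    seconds elapsed since NTP last updated the offset and root delay."""
    return abs(est_offset) + root_delay + drift_bound * tslu

# 2 ms estimated offset, 10 ms root delay, 60 s since the last NTP update:
u = uea_uncertainty(est_offset=-0.002, root_delay=0.010, tslu=60.0)
print(f"{u * 1000:.1f} ms")  # 15.0 ms
```

The uncertainty grows linearly between NTP updates (the δc · TSLU term) and snaps back down whenever NTP refreshes offset and root delay, which is exactly the sawtooth behavior the text describes.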
3 The Experimental Evaluation Process and Methodology

The process of our experimental evaluation starts by identifying the goals, then designing the experimental set-up (composed of injectors, probes and the experiment control subsystems), then planning the experiments to conduct, and finally defining the structure and organization of the data related to the experiments.

3.1 Objective

The objective of our analysis is in this case to validate an R&SAClock prototype, verifying whether and to what extent it fulfills its requirements under varying operating conditions, both nominal and faulty. We aim to assign values to ∆RT and ∆CV.
3.2 Planning of the Experimental Set Up

The experimental set-up is described by the grey components of Fig. 2. Its hardware components are an HP Pavilion desktop, which we call PC_GPS, and a high-quality GPS (Global Positioning System [8]) receiver. Through such a receiver, the PC_GPS is able to construct a clock tightly synchronized to the reference time, which is used as the reference clock. Obviously such a reference clock does not hold the exact reference time, but it is orders of magnitude closer to the reference time than the clock of the target system: it is sufficiently accurate to suit our experiments. The PC_GPS contains a software component for the control of the experiment (Controller hereafter) that is composed of an actuator, which commands the execution of workload and faultload to the Client (e.g., requests for the enriched time value, which the Client will forward to the R&SAClock), and of a monitor, which receives from the Client the information on the completion of services, accesses the reference clock and writes data on the disk. The Client is a software component located on the target system: it performs injection functions, to inject the faultload and to generate the workload, and probe functions, to collect the relevant quantities and write these data on the disk. An experimental set-up in which the target system and the Controller are placed on the same PC would require two software clocks, thus introducing perturbations that are hard to address.
Fig. 2. Measuring system and target system
Given the description of the target system (an implementation of an R&SAClock) and of the experimental set-up, we now describe how the relevant measures can be collected. To verify requirements REQ1 and REQ2, our measuring system implements solutions which are specific to R&SAClock but general for any instance of it. To evaluate REQ1, the Client logs, for each request for the enriched time value, the time Client.start at which the Client queries the R&SAClock and the time Client.end at which it receives the answer from R&SAClock (these two values are collected by reading the local clock of PC_R&SAC). To verify REQ2 (see Fig. 3), the measuring system computes a time interval [Controller.start, Controller.end] that contains the actual time at which the enriched time value is generated. This time interval is collected by the Controller's monitor, which reads the reference clock. Controller.start is the time instant at which a request for the enriched time value is sent by the Controller to the Client, and Controller.end is the time instant at which the enriched time value is received by the Controller. REQ2 is satisfied if [Controller.start, Controller.end] is within [minTime, maxTime].
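The two checks can be sketched as simple predicates (variable names follow the text; the numeric values are made up):

```python
def check_req1(client_start, client_end, max_reply):
    """REQ1: the getTime reply must arrive within the bound (local-clock timestamps)."""
    return client_end - client_start <= max_reply

def check_req2(controller_start, controller_end, min_time, max_time):
    """REQ2: the reference-time interval bracketing the generation of the
    enriched time value must lie within [minTime, maxTime]."""
    return min_time <= controller_start and controller_end <= max_time

assert check_req1(5.000, 5.002, max_reply=0.010)
assert check_req2(99.998, 100.001, min_time=99.996, max_time=100.004)
assert not check_req2(99.998, 100.010, min_time=99.996, max_time=100.004)
```

Note that REQ2 is checked conservatively: the whole interval must fit inside [minTime, maxTime], since the exact generation instant is only known to lie somewhere within it.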
Fig. 3. The time interval [Controller.start, Controller.end] used to evaluate REQ2
3.3 Instrumentation of the Experimental Set Up

PC_R&SAC is a Linux PC connected to (one or more) NTP servers by means of an Internet connection and to PC_GPS (another Linux-based PC) by means of an Ethernet crossover cable. The Controller and the Client are two high-priority processes that communicate using a socket. Fig. 4 shows their interactions to execute the workload. The Client waits for the Controller's commands. At periodic time intervals, the Controller sends a message containing a getTime request and an identifier ID to the Client, and logs ID and Controller.start. When the Client receives the message, it logs ID and Client.start and performs a getTime request to the R&SAClock. When the Client receives the enriched time value from R&SAClock, it logs the enriched time value, Client.end and ID, and sends a message containing an acknowledgment and ID to the Controller. The Controller receives the acknowledgment and logs ID and Controller.end. At the experiment termination, the log files created on the two machines are combined by pairing entries with the same ID. Controller and Client interact to execute the faultload as follows. The Controller sends to the Client the commands to inject the faults (e.g., to close the Internet connection or to kill the NTP client), and logs the related event; the Client executes the received command and logs the related event. Data logging is handled by NetLogger [6], a tool for logging data in distributed systems that guarantees negligible system perturbation.
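The pairing of the two log files by ID can be sketched as follows (the record layout is illustrative, not NetLogger's actual format):

```python
def pair_logs(controller_log, client_log):
    """Join the two log files on the request ID.
    Each record is a dict: {'id': ..., 'start': ..., 'end': ...}."""
    by_id = {rec["id"]: rec for rec in client_log}
    return [
        {"id": c["id"],
         "controller": (c["start"], c["end"]),                        # GPS reference clock
         "client": (by_id[c["id"]]["start"], by_id[c["id"]]["end"])}  # local clock
        for c in controller_log if c["id"] in by_id
    ]

controller = [{"id": 1, "start": 10.000, "end": 10.004}]
client = [{"id": 1, "start": 3.001, "end": 3.002}]
paired = pair_logs(controller, client)
print(paired[0]["client"])  # (3.001, 3.002)
```

Each paired record then feeds the REQ1 check (from the client timestamps) and the REQ2 check (from the controller timestamps and the logged enriched time value).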
Fig. 4. Controller, Client and R&SAClock interactions to execute the workload
3.4 Planning of the Experiments

Here the execution scenarios, the faultload, the workload and the experiments need to be defined. Our framework allows us to easily define and modify these aspects of the experimental planning, as desired or as needed by the objectives of the analysis. In this case of the basic R&SAClock prototype, with the objective of showing how our set-up works, the selection has been quite simple and not particularly demanding (for the target system) or rich (for obtaining a complete assessment).

Execution scenarios. Two execution scenarios are considered, corresponding to the two most important operating conditions of the R&SAClock: i) beginning of synchronization (the NTP client has an initial transient phase and starts to synchronize to the NTP servers, and the PC_R&SAC is connected to the network), and ii) nominal operating condition of the NTP client (this represents a steady-state phase: the NTP client is active and fully working, holding information on the local clock).

Faultload. Besides the situation with no faults, the following situations are considered: i) loss of connection between the NTP client and servers (thus making the NTP servers unreachable), and ii) failure of the NTP client (the Controller commands to shut down the NTP client). These are the most common scenarios that the R&SAClock is expected to face during its execution.

Workload. The workload selected is simply composed of getTime requests to the R&SAClock, sent once per second. This workload does not allow us to observe the behavior of the target system under overload or stress conditions, and must be replaced by a more demanding one if one wants to thoroughly evaluate the behavior of the target system.

Experiments.
Combining the scenarios, the faultload and the workload, we identify four significant experiments: i) beginning of synchronization, no faults injected; ii) nominal operating condition, no faults injected; iii) nominal operating condition, failure of the NTP client; iv) nominal operating condition, loss of connection. The duration of each experiment is 12 hours; the rationale (confirmed by the collected evidence) is that 12 hours are more than sufficient to observe all the relevant events in each experiment.

3.5 Structure of the Data

The structure and organization of the data related to the experiments is shown in Fig. 5; the data are organized using a star schema [7], an intuitive model composed of fact and dimension tables. Fact tables (such as table R&SAClock_Results in Fig. 5) contain an entry for each experiment run; each entry in turn contains the values observed for the relevant metrics and the values of the parameters of the experimental setup used in that specific run. Each dimension table refers to a specific characteristic of the experimental setup and contains the possible values for that feature (in Fig. 5, tables Scenario, Workload, Faultload, Experiment, Target_System).
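As a concrete illustration of the star schema described above, the following sketch builds a fact table referencing the five dimension tables of Fig. 5 with SQLite. The column names of the fact table (and the ampersand-free table name RSAClock_Results) are assumptions for the sake of a runnable example, not the actual schema used by the authors.

```python
import sqlite3

# In-memory database; the dimension tables mirror those named in Fig. 5.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

for dim in ("Scenario", "Workload", "Faultload", "Experiment", "Target_System"):
    cur.execute(f"CREATE TABLE {dim} (id INTEGER PRIMARY KEY, description TEXT)")

# Fact table: one row per experiment run, referencing every dimension and
# holding the observed metrics (the metric columns here are illustrative).
cur.execute("""
    CREATE TABLE RSAClock_Results (
        run_id INTEGER PRIMARY KEY,
        scenario_id INTEGER REFERENCES Scenario(id),
        workload_id INTEGER REFERENCES Workload(id),
        faultload_id INTEGER REFERENCES Faultload(id),
        experiment_id INTEGER REFERENCES Experiment(id),
        target_system_id INTEGER REFERENCES Target_System(id),
        max_offset_ms REAL,
        max_uncertainty_ms REAL
    )
""")
conn.commit()

tables = {r[0] for r in cur.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}
print(sorted(tables))
```

Queries that join the fact table to one dimension at a time then answer questions such as "how did the offset behave per faultload", which is the kind of reasoning the star schema is meant to support.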
A. Bondavalli et al.
Fig. 5. Structure of the data related to the experiments, organized following a star schema
This model allows us to structure and highlight the objectives, the results and the key elements of each evaluation; consequently, it helps to reason on and keep clear the purposes and contexts of the analysis.
4 Analysis of Results

We subdivide the offline process of investigating the collected results into three activities: i) data staging on the collected raw data, to populate a database where data can be easily analyzed, ii) investigation of the validity of the results, and iii) presentation and discussion of the results.

4.1 Data Staging

We organize the data staging in three steps: log collection, log parsing and database loading. In log collection the events are merged into a unique log file using NetLogger's API (Application Programming Interface). In log parsing we use an AWK script (AWK is a programming language for processing text-based data) to parse the raw data and create CSV (Comma Separated Value: a standard data file format used for the storage of data structured in table form) files, which are easier to handle than the raw data. In database loading we create SQL (Structured Query Language) queries starting from the content of the CSV files to populate the database.

4.2 Quality of the Measuring System and Results

We assess the quality of the measuring system along the principles of experimental validation and fault injection [1], [14], and the confidence of the results through principles of measurement theory [3], [4]; we focus on the uncertainty of the results and the intrusiveness of the measuring system [3], [4]. Furthermore, repeatability [4] is discussed to identify to what extent the results can be replicated in other experiments.
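The three staging steps above (log collection, log parsing, database loading) can be sketched end to end in a few lines. The raw log format and field names below are assumptions in a NetLogger-like "key=value" style, not the actual NetLogger schema; the sketch plays the role of the AWK script (text to CSV) and the SQL loading step (CSV to database).

```python
import csv
import io
import sqlite3

# Hypothetical raw log lines, already merged into one file by log collection.
raw_log = """\
ts=1000.000 event=Client.start id=1
ts=1000.001 event=Client.end id=1
ts=1000.002 event=Controller.start id=1
ts=1000.004 event=Controller.end id=1
"""

# Log parsing (the role of the AWK script): raw text -> typed rows -> CSV.
rows = []
for line in raw_log.splitlines():
    fields = dict(pair.split("=", 1) for pair in line.split())
    rows.append((float(fields["ts"]), fields["event"], int(fields["id"])))

csv_buf = io.StringIO()
csv.writer(csv_buf).writerows(rows)

# Database loading: CSV content -> SQL inserts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts REAL, event TEXT, run_id INTEGER)")
for ts, event, run_id in csv.reader(io.StringIO(csv_buf.getvalue())):
    conn.execute("INSERT INTO events VALUES (?, ?, ?)",
                 (float(ts), event, int(run_id)))
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # → 4
```

Going through an intermediate CSV, as the authors do, keeps the parsing step independent of the database technology and leaves an easily inspectable artifact between the two steps.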
Intrusiveness (perturbation). Although the Client is a high-priority thread, its execution does not perturb the target system and does not affect the results. In fact, the only other relevant thread that, if delayed, could induce a change in the system behavior is the R&SAClock thread, responsible for the generation of the enriched time value; this thread has the highest priority in the target system.

Uncertainty. The actual time instant in which the enriched time value is computed lies within the interval [Controller.start, Controller.end], whose length is the length of the interval [Client.start, Client.end] plus the delay due to the communication between the Client and the Controller. The sampled duration of this interval is within 1.7 ms in 99% of cases. Analyzing the remaining 1% of executions, we discovered that they were affected by large communication delays; we therefore decided to discard these runs. We set the time instant in which the enriched time value is computed to the middle of the interval [Controller.start, Controller.end], that is, (Controller.end + Controller.start) / 2. Such a value is affected by an uncertainty of (Controller.end − Controller.start) / 2 = 0.85 ms (milliseconds) with confidence 1 [4]. Since the time instant in which the enriched time value is computed is the time instant in which the likelyTime is generated, we can attribute the same uncertainty to the likelyTime. As a consequence, the measured offset (difference between reference time and likelyTime) also suffers from the same uncertainty. In principle the resolution of our measurement system should also contribute to the final uncertainty; in our case, however, the Linux clock resolution is 1 µs (microsecond) and its contribution to the computation of uncertainty is negligible.

Repeatability. Re-executing the same experiment will almost certainly produce different data: repeatability in deterministic terms as defined in [4] is not achievable. However, re-executing the same set of experiments will lead to statistically compatible results and to the same conclusions.
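The timestamp-and-uncertainty attribution described above can be sketched as follows. The 1.7 ms discard threshold comes from the text; the timestamps used in the example calls are illustrative.

```python
# Midpoint timestamping with half-width uncertainty, as described above:
# runs whose [Controller.start, Controller.end] interval exceeds 1.7 ms
# are discarded because of large communication delays.

MAX_INTERVAL_S = 1.7e-3  # bound observed in 99% of the sampled runs

def attribute_timestamp(controller_start, controller_end):
    """Return (timestamp, uncertainty) in seconds, or None when the run
    must be discarded because the interval is too wide."""
    width = controller_end - controller_start
    if width > MAX_INTERVAL_S:
        return None
    midpoint = (controller_end + controller_start) / 2
    uncertainty = width / 2
    return midpoint, uncertainty

ok = attribute_timestamp(100.0000, 100.0017)   # width 1.7 ms: kept
bad = attribute_timestamp(100.0000, 100.0050)  # width 5 ms: discarded
print(ok, bad)
```

A 1.7 ms interval yields an uncertainty of 0.85 ms, matching the figure quoted in the text; the same uncertainty is then attributed to likelyTime and to the measured offset.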
[Figure 6 plots the distance from reference time (seconds, y-axis from −0.06 to 0.06) against time (seconds, x-axis from 200 to 1000): the curves maxTime, likelyTime and minTime are shown around the GPS-provided reference time, with the uncertainty bounds Ucr(t) = Uc(t) and Ucl(t) = −Uc(t) growing at Gc = 50 ppm between synchronizations.]
Fig. 6. A sample trace of execution of the R&SAClock
4.3 Results

Fig. 6 explains how to read the results shown in Figs. 7-9. Reference time is on the x-axis. The central dashed line labeled likelyTime is the distance between likelyTime and reference time: it is the offset of the local clock of PC_R&SAC, which may vary during the execution of the experiments. The two external lines represent the distances of minTime and maxTime from reference time; these distances also vary during the execution. If the NTP client performs a synchronization to the NTP servers at time t, the synchronization uncertainty Uc(t) is reset to Θ. After the synchronization, the synchronization uncertainty grows steadily at a rate of 50 ppm until the next synchronization. The time interval between two adjacent synchronizations varies depending on the behavior of the NTP client.
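The sawtooth behavior of Uc(t) described above, reset at each synchronization and growing linearly at 50 ppm, can be sketched in a few lines. The base value Theta and the synchronization instants below are illustrative, not measured values.

```python
# Synchronization uncertainty as described in the text: Uc(t) is reset to a
# base value Theta at each NTP synchronization and then grows at 50 ppm
# until the next one. minTime/maxTime form the envelope around likelyTime.

DRIFT_PPM = 50e-6   # 50 ppm growth rate between synchronizations
THETA = 0.010       # assumed uncertainty right after a sync, in seconds

def uncertainty(t, sync_times):
    """Uc(t), given the (sorted) instants of past synchronizations."""
    last_sync = max(s for s in sync_times if s <= t)
    return THETA + DRIFT_PPM * (t - last_sync)

def interval(likely_time, t, sync_times):
    """The [minTime, maxTime] interval around likelyTime at time t."""
    u = uncertainty(t, sync_times)
    return likely_time - u, likely_time + u

u = uncertainty(300.0, [0.0, 200.0])  # 100 s after the last synchronization
print(u)  # → 0.015  (0.010 + 50e-6 * 100)
```

Frequent synchronizations keep Uc(t) small; rare ones let it grow, which is exactly the trade-off visible in the experiments below.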
[Figure 7 contains two panels plotting the distance from reference time (seconds) over 12 hours, each showing the maxTime, likelyTime and minTime curves: panel a) spans −0.25 to 0.25 s, panel b) spans −0.15 to 0.15 s.]
Fig. 7. a) Exp. 1: beginning of synchronization. b) Exp. 2: nominal operating condition.
Experiment 1. Fig. 7a shows the behavior of the R&SAClock prototype at the beginning of synchronization. The initial offset of the local clock of PC_R&SAC is 100.21 ms. At the beginning of the experiment, the NTP client performs frequent synchronizations to correct the local clock. After 8 hours, the offset is close to zero and consequently the NTP client performs less frequent synchronizations; this behavior affects Uc(t), which increases. Reference time is always within [minTime, maxTime].

Experiment 2. Fig. 7b shows the behavior of the R&SAClock prototype when the target system is in nominal operating condition and no faults are injected. The offset is close to zero and the local clock of PC_R&SAC is stable: the NTP client performs rare synchronization attempts. Reference time is always within [minTime, maxTime]. Uc(t) varies from 65.34 ms to 281.78 ms; the offset is at worst 4.22 ms.

Experiment 3. Fig. 8a shows the behavior of the R&SAClock prototype when the target system is in nominal operating condition and the NTP client has failed (the figure does not include the beginning of synchronization, about 8 hours). LikelyTime drifts from reference time, since the NTP client no longer disciplines the local clock: after 12 hours, the offset is close to 500 ms. Since the actual drift of the local clock is smaller than , the reference time is always within [minTime, maxTime].
[Figure 8 contains two panels plotting the distance from reference time (seconds, spanning roughly −3 to 2.5) over 12 hours, each showing the maxTime, likelyTime and minTime curves.]
Fig. 8. a) Exp. 3: failure of the NTP client. b) Exp. 4: unavailability of the NTP servers.
Experiment 4. Fig. 8b shows the behavior of the R&SAClock prototype when the target system is in nominal operating condition and the Internet connection is lost (the NTP client is unable to communicate with the NTP servers and consequently does not perform synchronizations). The NTP client disciplines the local clock using information from the most recent synchronization. After 12 hours the offset is 26.09 ms: the NTP client, thanks to stable environmental conditions, succeeds in keeping likelyTime relatively close to the reference time. Reference time is within [minTime, maxTime].

Assessment of REQ1 and REQ2. The time intervals [Client.start, Client.end] from all the samples collected in the experiments are shown in Fig. 9. The highest value is 1.630 ms, thus REQ1 is satisfied simply by setting ∆RT ≥ 1.630 ms. However, the multimodal distribution shows that the response time of a getTime varies significantly depending on the current system activity and possible overloads of system resources. This suggests the possibility of building a new, improved prototype with a reduced ∆RT and less variance in the interval [Client.start, Client.end] (e.g., implementing the R&SAClock within the OS layer and the getTime as an OS call).
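The REQ1 check described above reduces to comparing the configured bound ∆RT against the largest observed response-time sample. The sample durations and the ∆RT value below are illustrative, chosen only so that the worst case matches the 1.630 ms reported in the text.

```python
# REQ1 as described above: the bound ∆RT must cover the largest observed
# [Client.start, Client.end] interval. Sample durations are in milliseconds.
samples_ms = [0.8, 0.9, 1.1, 1.3, 1.630, 1.2]

worst_case_ms = max(samples_ms)
delta_rt_ms = 1.7  # an assumed configuration choice, >= 1.630 ms

req1_satisfied = delta_rt_ms >= worst_case_ms
print(worst_case_ms, req1_satisfied)  # → 1.63 True
```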
[Figure 9 is a histogram of the intervals [Client.start, Client.end]: number of samples (up to 10000) on the y-axis, duration in microseconds (700 to 1700) on the x-axis.]

Fig. 9. Intervals [Client.start, Client.end]
In the experiments shown, REQ2 is always satisfied (∆CV = 1). However, the interval [minTime, maxTime] is often very wide, even when the offset is continuously close to zero. The results of the experiments suggest that different (more efficient) UEAs, which predict the oscillator drift behavior using statistical information on past values, may be developed and used. A preliminary investigation is in [12].
6 Conclusions and Future Work

In this paper we described a process and setup for the experimental evaluation of software clocks. The main issues addressed have been the provision of a high-quality clock (resorting to a high-quality GPS) to be used as reference time in the experimental setup, and particular care in designing the measuring system, which has allowed us to assess the validity of both the measuring system and the results. Besides the design and planning of the experimental activities, the paper illustrates the experimental process and setup by showing the evaluation of a prototype of the R&SAClock [5], a recently proposed software clock. Even the simple experiments described allowed us to gain insight into the major deficiencies of the considered prototype and to identify directions for improvement.

Acknowledgments. This work has been partially supported by the European Community through the project IST-FP6-STREP-26979 (HIDENETS - HIghly DEpendable ip-based NETworks and Services).
References

1. Hsueh, M., Tsai, T.K., Iyer, R.K.: Fault Injection Techniques and Tools. Computer 30(4), 75-82 (1997)
2. Avizienis, A., Laprie, J., Randell, B., Landwehr, C.: Basic Concepts and Taxonomy of Dependable and Secure Computing. IEEE Trans. on Dependable and Secure Computing 1(1), 11-33 (2004)
3. BIPM, IEC, IFCC, ISO, IUPAC, OIML: Guide to the Expression of Uncertainty in Measurement (2008)
4. BIPM, IEC, IFCC, ISO, IUPAC, OIML: ISO International Vocabulary of Basic and General Terms in Metrology (VIM), Third Edition (2008)
5. Bondavalli, A., Ceccarelli, A., Falai, L.: Assuring Resilient Time Synchronization. In: Proceedings of the 27th IEEE Symposium on Reliable Distributed Systems (SRDS), pp. 3-12. IEEE Computer Society, Washington (2008)
6. Gunter, D., Tierney, B.: NetLogger: a Toolkit for Distributed System Performance Tuning and Debugging. In: IFIP/IEEE Eighth International Symposium on Integrated Network Management, pp. 97-100 (2003)
7. Kimball, R., Ross, M., Thornthwaite, W.: The Data Warehouse Lifecycle Toolkit. J. Wiley & Sons, Inc., Chichester (2008)
8. Dana, P.H.: Global Positioning System (GPS) Time Dissemination for Real-Time Applications. Real-Time Systems 12(1), 9-40 (1997)
9. Mills, D.: Internet Time Synchronization: the Network Time Protocol. IEEE Trans. on Communications 39, 1482-1493 (1991)
10. Verissimo, P., Rodriguez, L.: Distributed Systems for System Architects. Kluwer Academic Publishers, Dordrecht (2001)
11. Cristian, F.: Probabilistic Clock Synchronization. Distributed Computing 3, 146-158 (1989)
12. Bondavalli, A., Brancati, B., Ceccarelli, A.: Safe Estimation of Time Uncertainty of Local Clocks. In: IEEE Symposium on Precision Clock Synchronization for Measurement, Control and Communication, ISPCS (to appear, 2009)
13. Bondavalli, A., Ceccarelli, A., Falai, L., Vadursi, M.: Foundations of Measurement Theory Applied to the Evaluation of Dependability Attributes. In: Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pp. 522-533 (2007)
14. Arlat, J., Aguera, M., Amat, L., Crouzet, Y., Fabre, J.-C., Laprie, J.-C., Martins, E., Powell, D.: Fault Injection for Dependability Validation: a Methodology and Some Applications. IEEE Trans. on Software Engineering 16(2), 166-182 (1990)
15. Veitch, D., Babu, S., Pàsztor, A.: Robust Synchronization of Software Clocks across the Internet. In: Proceedings of the 4th ACM SIGCOMM Conference on Internet Measurement, pp. 219-232 (2004)
Towards a Statistical Model of a Microprocessor's Throughput by Analyzing Pipeline Stalls

Uwe Brinkschulte, Daniel Lohn, and Mathias Pacher

Institut für Informatik, Johann Wolfgang Goethe-Universität Frankfurt, Germany
{brinks,lohn,pacher}@es.cs.uni-frankfurt.de
Abstract. In this paper we model a thread's throughput, the instructions per cycle rate (IPC rate), running on a general microprocessor as used in common embedded systems. Our model is not limited to a particular microprocessor, because our aim is to develop a general model that can be adapted to fit different microprocessor architectures. We include stalls caused by different pipeline obstacles such as data dependencies, branch misprediction, etc. These stalls involve latency clock cycles blocking the processor. We describe each kind of stall in detail and develop a statistical model for the throughput covering the entire processor pipeline.
1 Introduction
Nowadays, the development of embedded and ubiquitous systems is strongly advancing. We find microprocessors embedded and networked in all areas of life, e.g. in cell phones, cars, planes, and household aids. In many of these areas the microprocessors need special capabilities, e.g. guaranteeing execution time bounds for real-time applications like a control task on an autonomous guided vehicle. Therefore, we need models of the timing behavior of these microprocessors by which the execution time bounds can be computed.

In this paper we develop a statistical model for the IPC rate of a general-purpose multi-threaded microprocessor to predict timing behavior, thus improving the real-time capability. We consider both effects like data dependencies and processor speed-up techniques like branch and branch target prediction, or caches. The model is a transfer function computing the IPC rate. By analyzing this model we obtain bounds for the IPC rate which can be used to compute bounds for the execution time of user applications.

Another important use of a model like this is to control the IPC rate similar to [1,2,3]. Controlling the IPC rate in pipelined microprocessors is one of the long-term goals of our work: if we develop precise statistical models of the throughput, we are able to adjust the controller parameters in a very fine-grained way. In addition, we can compute estimations for the applications' time bounds, which is necessary for real-time systems.

S. Lee and P. Narasimhan (Eds.): SEUS 2009, LNCS 5860, pp. 82-90, 2009. © IFIP International Federation for Information Processing 2009
The paper is structured as follows: Section 2 presents related work and similar approaches. In Section 3 we discuss modern scalar and multi-threaded microprocessors, and in Section 4 we present our model, which is validated by an example. Section 5 concludes the paper and gives an outlook on future work.
2 State of the Art
Many approaches for Worst Case Execution Time (WCET) analysis are known. Most of them examine the semantics of the program code with respect to the pipeline used for execution, resulting in a cycle-accurate analysis of the program code. One example is the work in [4]: the authors examine the WCET of the Motorola ColdFire 5307 and study, e.g., cache interferences occurring during loop execution. In [5], the authors discuss WCET analysis for an out-of-order execution processor. They transform the WCET analysis problem by computing and examining the execution graph (whose nodes represent tuples consisting of an instruction identifier and a pipeline stage) of the program code to be executed. The authors of [6] consider WCET analysis for processors with branch prediction. They classify each control transfer instruction with respect to branch prediction and use a timing analyzer to estimate the WCET according to the instruction classification. A WCET analysis with respect to industrial requirements is discussed in [7]. A good review of existing WCET analysis techniques is given in [8]; the authors also present a generic framework for WCET analysis. In [9], the author proposes to split the cache into several independent caches to simplify the WCET analysis and obtain tighter upper bounds. The authors of [10] design a model of a single-issue in-order pipeline for static WCET analysis and consider time dependencies between instructions.

The papers presented above mostly provide cycle-accurate techniques for the WCET analysis of program code. This differs from our approach, as we use a probabilistic approach based on the pipeline structure. The characteristics of the program code are generalized by statistical values such as the probability of a misprediction. As a result, our model is not necessarily cycle-accurate, but we are able to use analytical techniques to examine the throughput behavior for individual programs as well as for program classes. Furthermore, as mentioned in Section 1, our long-term goal is to control the IPC rate. Using control theory as a basis, a model of a processor as proposed in this paper is necessary not only to analyze, but also to improve and guarantee real-time behavior by a closed control loop.
3 Pipeline Stalls in Microprocessors
Techniques like long pipelines, branch prediction and caches were developed to improve the average performance of modern super-scalar microprocessors, but the worst-case oriented real-time behavior suffers for various reasons like branch misprediction, cache misses, etc. Besides, there is only one set of registers in single-threaded processors, producing context switch costs of several clock cycles in case of a thread switch. On the application level, thread synchronization also introduces latency clock cycles if one thread has to wait for a synchronization event. This problem depends on the programming model and is not affected by the architecture of the microprocessor used for its execution.

Multi-threaded processors suffer from the same real-time problems as single-threaded processors. However, there are several differences which make them an interesting research platform for modeling the IPC rate: contrary to single-threaded processors, multi-threaded processors have mostly separated internal resources, like program counters, status and general-purpose registers, for each thread. This decreases the interdependencies between different threads caused by the processor hardware. The remaining thread interdependencies depend only on the programming model; e.g., the context switching time between different threads is eliminated. In addition, if a scheduling strategy like Guaranteed Percentage Scheduling (GP scheduling, see [11]) is used, the controller is able to control each thread in a fine-grained way. In GP scheduling, a requested number of clock cycles is assigned to each thread, and this assignment is guaranteed within a certain time period (e.g. 100 clock cycles). Fig. 1 gives an example of three threads with a GP rate of 30% for Thread A, 20% for Thread B, and 40% for Thread C, and a time period of 100 clock cycles. This means that thread A gets 30 clock cycles, thread B gets 20 clock cycles, and thread C gets 40 clock cycles within each 100 clock cycle period.
[Figure 1 depicts two consecutive 100-clock-cycle periods, each containing a slot of 30 clock cycles for Thread A (30%), 20 clock cycles for Thread B (20%), and 40 clock cycles for Thread C (40%).]
Fig. 1. Example for a GP schedule
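The GP schedule of Fig. 1 can be sketched as a small cycle-allocation function. The slot layout (one contiguous slot per thread per period, in request order) is an assumption matching the figure, not a requirement of GP scheduling itself.

```python
# Guaranteed Percentage scheduling sketch: within each period of 100 clock
# cycles, every thread receives its requested share of cycles, as in Fig. 1.

PERIOD = 100

def gp_schedule(quotas, periods=1):
    """quotas: dict thread -> requested percentage. Returns the list of
    (thread, cycles) slots executed over the given number of periods."""
    assert sum(quotas.values()) <= 100, "requested rates exceed the period"
    schedule = []
    for _ in range(periods):
        for thread, pct in quotas.items():
            schedule.append((thread, pct * PERIOD // 100))
    return schedule

slots = gp_schedule({"A": 30, "B": 20, "C": 40}, periods=2)
print(slots)
# Each period: A gets 30 cycles, B gets 20, C gets 40;
# the remaining 10 cycles per period are unassigned.
```

The guarantee is per period, which is what later allows the model to treat the Guaranteed Percentage rate G(n) of a thread as its ideal IPC rate.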
4 Modeling
In this section we present a statistical model of a microprocessor evolving the parameters influencing the throughput. Our first approach considers a processor with
one core and a simple scalar multi-threaded pipeline. Our analysis of throughput hazards starts at the lowest hardware level, the pipeline. Every instruction to be executed passes through the pipeline. Therefore, we have to consider the first pipeline stage, the instruction fetch unit; this stage exists in almost all modern microprocessors [11,12].

The instruction set of a processor can be divided into different classes: an instruction is either controlflow or data related. We can compute a probability for the occurrence of instructions from these classes. The probability of the occurrence of a controlflow class instruction in the interval n is denoted by pa(n), while pb(n) represents the probability of a data related instruction in the interval n. We assume the probabilities to be time dependent, because they may change with the currently executed program code.

First, we consider controlflow related instructions like unconditional and conditional branches. These instructions may lead to a lower IPC rate, caused by delay cycles in the pipeline. Therefore, it is necessary to identify them as early as possible and handle them appropriately¹. This is done with the help of a branch target buffer (BTB) [11]. The BTB contains the target addresses of unconditional branches, and some additional prediction bits for conditional branches to predict the branch direction. Whenever the target address of an instruction cannot be found in the BTB, the pipeline has to be stalled until the target address has been computed. Therefore, we model these delay cycles by a penalty Datarget, while patarget(n) is the probability that such a stall event occurs in the time interval n. If a conditional branch is fetched, the predictor may fail. In this case the pipeline has to be flushed and the instructions of the other branch direction have to be fed into the pipeline, mostly leading to a long pipeline stall. The actual number of delay cycles depends on the length of the pipeline [11].
We call pamp(n) the probability that a branch is mispredicted in the interval n and Damp the penalty in delay cycles for pipeline flushing.

Now, we consider data related instructions, because data dependencies also influence the IPC rate. There are three different kinds of dependencies [11]. The anti dependency or write-after-read hazard (WAR) is the easiest one, because it does not affect the execution in an in-order pipeline at all: as an example, assume an instruction writes to a register that was read earlier by another instruction; this does not influence the execution in any way. Output dependencies or write-after-write hazards (WAW) can be solved by register renaming, thus not affecting the IPC rate either. True dependencies or read-after-write hazards (RAW) are the worst kind of data dependencies. Their impact on the IPC rate can be reduced by hardware (forwarding techniques [12]) or by software (instruction reordering [12]). However, in several cases instructions have to wait in the reservation stations and several delay cycles have to be inserted into the pipeline until the dependency is solved. pbd denotes the statistical probability of a pipeline-stalling data dependency and Dbd denotes the average penalty in clock cycles.
¹ A modern microprocessor is able to detect controlflow related instructions already in the instruction fetch stage of its pipeline.
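The BTB behavior described above, a miss stalls the pipeline until the target is computed, a hit yields a target plus a direction prediction, can be sketched with a dictionary-backed buffer. The 2-bit saturating counter is one common realization of the "prediction bits" the text mentions; the structure is a simplification, not a model of any particular processor.

```python
# Illustrative branch target buffer: unconditional branches store a target,
# conditional branches additionally carry 2-bit prediction state.

class BTB:
    def __init__(self):
        self.entries = {}  # pc -> (target, 2-bit saturating counter)

    def lookup(self, pc):
        """Return (target, predicted_taken), or None on a miss
        (the pipeline stalls until the target is computed)."""
        entry = self.entries.get(pc)
        if entry is None:
            return None
        target, counter = entry
        return target, counter >= 2

    def update(self, pc, target, taken):
        """Record the resolved branch outcome and its target."""
        _, counter = self.entries.get(pc, (target, 1))
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
        self.entries[pc] = (target, counter)

btb = BTB()
assert btb.lookup(0x40) is None      # miss: stall, compute target (Datarget)
btb.update(0x40, 0x80, taken=True)   # counter 1 -> 2: weakly taken
print(btb.lookup(0x40))              # target 0x80, predicted taken
```

A miss corresponds to the Datarget penalty in the model below; a hit with a wrong direction prediction corresponds to the Damp flush penalty.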
The following formula (1) computes the IPC rate I of a microprocessor, including the above mentioned pipeline obstacles, in the interval n:

    I(n) = G(n) / (1 + X(n))
    X(n) = pa(n) (patarget(n) Datarget + pamp(n) Damp) + pb(n) pbd(n) Dbd          (1)
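Formula (1) transcribes directly into code. The sketch below is a plain evaluation of the formula with illustrative inputs; in the ideal case (perfect prediction, no stalling dependencies) the penalty term vanishes and I(n) equals the Guaranteed Percentage rate.

```python
# Formula (1): the expected penalty X(n) lowers the IPC rate below the
# Guaranteed Percentage rate G(n). All arguments are per-interval values.

def ipc_rate(G, p_a, p_target, D_target, p_mp, D_mp, p_b, p_bd, D_bd):
    X = p_a * (p_target * D_target + p_mp * D_mp) + p_b * p_bd * D_bd
    return G / (1 + X)

# Perfect prediction and no data hazards: X = 0, so I(n) == G(n).
ideal = ipc_rate(G=0.5, p_a=0.2, p_target=0.0, D_target=2,
                 p_mp=0.0, D_mp=5, p_b=0.8, p_bd=0.0, D_bd=1)
print(ideal)  # → 0.5
```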
The IPC rate I(n) of the executed thread in the interval n is the Guaranteed Percentage rate G(n) divided by one plus a penalty term X(n), where X(n) is the expected value of all inserted penalty delay cycles. If we assume perfect branch prediction and no pipeline stalls caused by data dependencies, the probabilities for pipeline-stalling events would be zero, turning the whole term X(n) into zero. The resulting IPC rate would equal the Guaranteed Percentage rate, since no latency cycles occur. However, in case a data dependency could not be solved by forwarding, pbd(n) would not be zero and X(n) would contain the penalty for the data dependency; therefore, the IPC rate would suffer. Figure 2 shows the impact of those pipeline hazards on the IPC rate.

Fig. 2. Impact of pipeline hazards on the IPC rate

The next step is to consider the effects of caches on the IPC rate, ignoring any delay cycles from other pipeline stages. Since we have no out-of-order execution, every cache miss leads to a pipeline stall until the required data or instruction is available. The statistical probability of a cache miss occurring in the interval n is denoted by pc(n) and Dc is the average penalty in delay cycles. So the resulting formula is quite similar to formula (1):

    I(n) = GP(n) / (1 + Y(n))
    Y(n) = pc(n) Dc          (2)
Y(n) is the expected value of all delay cycles in the interval n, lowering the IPC rate I(n). Figure 3 shows the effects of cache misses on the IPC rate.

Fig. 3. Impact of cache misses on the IPC rate

Our final goal in this paper is to combine the pipeline hazard and the cache miss effects in one formula. As there is no dependency between cache misses and pipeline hazards, all the inserted delay cycles can simply be added, resulting in a final penalty of Z(n). Thus, we can bring together the effects of pipeline hazards and cache misses, leading to the following formula (3):

    I(n) = GP(n) / (1 + Z(n))
    Z(n) = X(n) + Y(n)
    X(n) = pa(n) (patarget(n) Datarget + pamp(n) Damp) + pb(n) pbd(n) Dbd          (3)
    Y(n) = pc(n) Dc

Figure 4 shows the corresponding IPC rate, taking into account all effects at the hardware level. Now, we show that formula (3) is an adequate model of a simple microprocessor. Therefore, we examine a short code fragment of ten instructions executed in the time interval i:

1. data instruction
2. controlflow instruction (jump target not known)
3. data instruction
4. data instruction (with dependency)
5. data instruction
Fig. 4. The final IPC rate including pipeline hazards and cache misses
6. data instruction (with dependency)
7. controlflow instruction
8. data instruction (cache miss)
9. data instruction
10. data instruction
We assume our microprocessor has a five-stage pipeline, runs two different threads, and grants a Guaranteed Percentage value of 0.5 to each of them. Furthermore, we assume a penalty of 2 clock cycles for an unknown branch target, 5 clock cycles for flushing the pipeline after a mispredicted branch, 1 clock cycle for an unresolved data dependency, and 30 clock cycles for a cache miss. Analyzing the code fragment produces the following probability values (patarget and pbd are conditional on a controlflow or data instruction, respectively: one of the two controlflow instructions has an unknown target, and two of the eight data instructions have a stalling dependency):

pa(i) = 0.2
patarget(i) = 0.5
pamp(i) = 0
pb(i) = 0.8
pbd(i) = 0.25
pc(i) = 0.1

Having these values, we are able to compute the IPC rate according to our model:

X(i) = 0.2 · (0.5 · 2 + 0) + 0.8 · 0.25 · 1 = 0.4
Y(i) = 0.1 · 30 = 3
Z(i) = 0.4 + 3 = 3.4
I(i) = 0.5 / (1 + 3.4) ≈ 0.114
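The computation above can be reproduced directly from formula (3). The probabilities come from the ten-instruction fragment and the penalties are the ones assumed in the text; nothing below is new data.

```python
# Worked example with formula (3): ten instructions, two threads at GP = 0.5,
# penalties of 2 (unknown target), 5 (flush), 1 (dependency), 30 (cache miss).

p_a, p_target, p_mp = 0.2, 0.5, 0.0  # 2/10 controlflow, 1 of 2 unknown target
p_b, p_bd = 0.8, 0.25                # 8/10 data, 2 of 8 stalling dependencies
p_c = 0.1                            # 1/10 cache misses
D_target, D_mp, D_bd, D_c = 2, 5, 1, 30
GP = 0.5

X = p_a * (p_target * D_target + p_mp * D_mp) + p_b * p_bd * D_bd
Y = p_c * D_c
Z = X + Y
I = GP / (1 + Z)
print(X, Y, Z, round(I, 3))  # → 0.4 3.0 3.4 0.114
```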
To verify the model, we examine what happens at pipeline level. At the beginning of interval i it takes five clock cycles to fill the pipeline. At the 6th clock cycle the first instruction is completed, and then the pipeline is stalled for two cycles to compute the branch target of instruction 2, so instruction 2 finishes at the 9th clock cycle. Instruction 3 is completed at the 10th clock cycle and instruction 4 at the 12th clock cycle, because the unresolved data dependency of instruction 4 leads to a pipeline stall of one cycle. At the 13th clock cycle instruction 5 is finished, and at the 15th and 16th clock cycles instructions 6 and 7, too. Because a cache miss happens during the execution of instruction 8, it finishes at the 47th clock cycle. The last two instructions finish at the 48th and 49th clock cycles. Since the thread has a GP value of 0.5, we have to double the execution time. This means the execution of the code fragment would take 98 clock cycles on the real processor, which is already very close to our model (within about 10%). If we neglect the first cycles needed to fill the pipeline, we even get exactly an IPC rate of I(i) = 10/88 ≈ 0.114. Since real programs consist of many instructions, the time for the first pipeline filling can easily be neglected, thus enabling our model to predict the correct value of the IPC rate.
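The cycle count in the verification above can be checked mechanically: after a 5-cycle fill, an in-order five-stage pipeline completes one instruction per cycle, plus the stated penalty whenever an instruction stalls. This is a counting sketch of the walk-through, not a pipeline simulator.

```python
# Cycle-count check of the verification above.
FILL = 5
penalties = {2: 2,    # unknown branch target
             4: 1,    # unresolved data dependency
             6: 1,    # unresolved data dependency
             8: 30}   # cache miss

cycle = FILL
completion = {}
for instr in range(1, 11):
    cycle += 1 + penalties.get(instr, 0)
    completion[instr] = cycle

print(completion[10])                 # → 49 cycles including pipeline fill
print(2 * completion[10])             # → 98 on the real processor at GP = 0.5
print(round(10 / (2 * (completion[10] - FILL)), 3))  # → 0.114, i.e. 10/88
```

The completion cycles 6, 9, 10, 12, 13, 15, 16, 47, 48, 49 match the text exactly, and neglecting the fill reproduces the model's IPC rate of 0.114.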
5 Conclusion and Future Work
In this paper we developed a statistical model of a simple multi-threaded microprocessor to compute the throughput of a thread. We started by considering the influence of hardware effects like pipeline hazards and cache misses on the IPC rate: first, we considered each hardware effect on its own, then we combined them all into a single formula, formula (3). We showed with the help of an example that our model adequately describes the IPC rate.

Future work will concern further improvements of the model, taking into account more advanced hardware techniques like multicore or out-of-order execution. As already mentioned above, our future work will not only concern computing the throughput of a thread, but also controlling and stabilizing it to a given IPC rate by closed control loops. Therefore, we want to develop a model with which we are able to identify the most important parameters for the IPC rate.
References

1. Brinkschulte, U., Pacher, M.: A Control Theory Approach to Improve the Real-Time Capability of Multi-Threaded Microprocessors. In: ISORC, pp. 399-404 (2008)
2. Pacher, M., Brinkschulte, U.: Implementing Control Algorithms Within a Multithreaded Java Microcontroller. In: Beigl, M., Lukowicz, P. (eds.) ARCS 2005. LNCS, vol. 3432, pp. 33-49. Springer, Heidelberg (2005)
3. Brinkschulte, U., Pacher, M.: Improving the Real-time Behaviour of a Multithreaded Java Microcontroller by Control Theory and Model Based Latency Prediction. In: WORDS 2005, Tenth IEEE International Workshop on Object-oriented Real-time Dependable Systems, Sedona, Arizona, USA (2005)
Joining a Distributed Shared Memory Computation in a Dynamic Distributed System

Roberto Baldoni1, Silvia Bonomi1, and Michel Raynal2

1 Università La Sapienza, Via Ariosto 25, I-00185 Roma, Italy
2 IRISA, Université de Rennes, Campus de Beaulieu, F-35042 Rennes, France
{baldoni,bonomi}@dis.uniroma1.it, [email protected]
Abstract. This paper is on the implementation of high-level communication abstractions in dynamic systems (i.e., systems where the entities can enter and leave arbitrarily). Two abstractions are investigated, namely the read/write register and the add/remove/get set data structure. The paper studies the join protocol that a process has to execute when it enters the system, in order to obtain a consistent copy of the (register or set) object despite the uncertainty created by the net effect of concurrency and dynamicity. It presents two join protocols, one for each abstraction, with provable guarantees.

Keywords: Churn, Dynamic system, Provable guarantee, Regular register, Set object, Synchronous system.
1 Introduction

Dynamic systems. The passage from statically structured distributed systems to unstructured ones is now a reality. Smart environments, P2P systems and networked systems are examples of modern systems where the application processes are not aware of the current system composition. Because they run on top of a dynamic distributed system, these applications have to accommodate constant changes of their membership (i.e., churn) as a natural ingredient of their life. As an extreme, an application can cease to run when no entity belongs to the membership, and can later have a membership formed by thousands of entities. Considering the family of state-based applications, the main issue consists in maintaining their state despite membership changes. This means that a newcomer has to obtain a valid state of the application before joining it (state transfer operation). This is a critical operation, as too high a churn may prevent the newcomer from obtaining such a valid state. The shorter the time taken by the join procedure to transfer a state, the higher the churn rate the join protocol is able to cope with.

Join protocol with provable guarantees. This paper studies the problem of joining a computation that implements a distributed shared memory on top of a message-passing dynamic distributed system. The memory we consider is made up of the noteworthy object abstractions that are the regular registers and the sets. For each of them, a notion of admissible value is defined. The aim of that notion is to give a precise meaning to the object value a process can obtain in the presence of concurrency and dynamicity.

S. Lee and P. Narasimhan (Eds.): SEUS 2009, LNCS 5860, pp. 91–102, 2009.
© IFIP International Federation for Information Processing 2009
92
R. Baldoni, S. Bonomi, and M. Raynal
The paper proposes two join protocols (one for each object type) that provide the newcomer with an admissible value. To that end, the paper considers an underlying synchronous system where, while processes can enter and leave the application, their number always remains constant. While the regular register object is a well-known shared memory abstraction introduced by Lamport [10], the notion of a set object in a distributed context is less familiar. The corresponding specification given in this paper extends the notion of weak set introduced by Delporte-Gallet and Fauconnier in [5].

Roadmap. The paper is made up of 5 sections. First, Section 2 introduces the register and set objects (high-level communication abstractions), and Section 3 presents the underlying computation model. Then, Sections 4 and 5 present two join protocols, each suited to a specific object.
2 Distributed Shared Memory Paradigm

A distributed shared memory is a programming abstraction, built on top of a message-passing system, that allows processes to communicate and exchange information by invoking operations that return or modify the content of shared objects, thereby hiding the complexity of the message exchanges needed to maintain them. One of the simplest shared objects that can be considered is a register. Such an object provides the processes with two operations, called read and write. Objects such as queues and stacks are more sophisticated. It is assumed that every process is sequential: it invokes a new operation on an object only after receiving an answer from its previous object invocation. Moreover, we assume a global clock that is not accessible to the processes. This clock can be seen as measuring the real time as perceived by an external observer that is not part of the system.

2.1 Base Definitions

An operation issued on a shared object is not instantaneous: it takes time. Hence, two operations executed by two different processes may overlap in time. Two events (denoted invocation and response) are associated with each operation. They occur at the beginning of the operation (invocation time) and at its end (return time). Given two operations op and op′ having invocation times tB(op) and tB(op′) and return times tE(op) and tE(op′), respectively, we say that op precedes op′ (op ≺ op′) iff tE(op) < tB(op′). If op does not precede op′ and op′ does not precede op, then they are concurrent (op || op′).

Definition 1 (Execution History). Let H be the set of all the operations issued on a shared object O. An execution history Ĥ = (H, ≺) is a partial order on H satisfying the relation ≺.

Definition 2 (Sub-history Ĥt of Ĥ at time t). Given an execution history Ĥ = (H, ≺) and a time t, the sub-history Ĥt = (Ht, ≺t) of Ĥ at time t is such that: (i) Ht ⊆ H, and (ii) ∀ op ∈ H such that tB(op) ≤ t, op ∈ Ht.
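The precedence and concurrency relations of Section 2.1 can be sketched in a few lines (a hypothetical illustration, not part of the paper's formalism; the class and function names are ours):

```python
# op precedes op' iff op returns before op' is invoked; otherwise,
# if neither precedes the other, the two operations are concurrent.
from dataclasses import dataclass

@dataclass
class Op:
    t_B: float  # invocation time
    t_E: float  # return time

def precedes(op, op2):
    return op.t_E < op2.t_B

def concurrent(op, op2):
    return not precedes(op, op2) and not precedes(op2, op)

w = Op(t_B=0.0, t_E=2.0)   # a write
r = Op(t_B=1.0, t_E=3.0)   # a read overlapping the write
print(precedes(w, r), concurrent(w, r))  # False True
```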
Joining a Distributed Shared Memory Computation
93
2.2 Regular Register: Definition

A register object R has two operations. The operation write(v) defines the new value v of the register, while the operation read() returns a value from the register. The semantics of a register is given by specifying which values are returned by its read operations. Without loss of generality, we consider that no two write operations write the same value. This paper considers a variant of the regular register abstraction as defined by Lamport [10]. In our case, a regular register can have any number of writers and any number of readers [13]. The writes appear as if they were executed sequentially, this sequence complying with their real-time occurrence order (i.e., if two writes w1 and w2 are concurrent they can appear in any order, but if w1 terminates before w2 starts, w1 has to appear as being executed before w2). As far as a read operation is concerned, we have the following. If no write operation is concurrent with a read operation, that read operation returns the last value written in the register. Otherwise, the read operation returns any value written by a concurrent write operation, or the last value of the register before these concurrent writes.

Definition 3 (Admissible value for a read() operation). Given a read() operation op, a value v is admissible for op if:
– ∃ write(v) : write(v) ≺ op ∨ write(v) || op, and
– ∄ write(v′) (with v′ ≠ v) : write(v) ≺ write(v′) ≺ op.

Definition 4 (Admissible value for a regular register at time t). Given an execution history Ĥ = (H, ≺) of a regular register R and a time t, let Ĥt = (Ht, ≺t) be the sub-history of Ĥ at time t. An admissible value at time t for R is any possible admissible value v for an instantaneous read operation op executed at time t.
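Definition 3 can be checked mechanically. The sketch below (hypothetical helper names; operations are reduced to their invocation/return times, and writes are assumed unique per value as in the text) tests whether a value is admissible for a read:

```python
# v is admissible for a read op if some write(v) precedes or is
# concurrent with op, and no write(v') is interposed between
# write(v) and op (second clause of Definition 3).
from dataclasses import dataclass

@dataclass
class Write:
    value: int
    t_B: float  # invocation time
    t_E: float  # return time

def admissible(v, read_t_B, read_t_E, writes):
    for w in (w for w in writes if w.value == v):
        w_precedes = w.t_E < read_t_B
        w_concurrent = not w_precedes and not (read_t_E < w.t_B)
        if not (w_precedes or w_concurrent):
            continue
        # check there is no write(v') with write(v) ≺ write(v') ≺ op
        if not any(w.t_E < w2.t_B and w2.t_E < read_t_B
                   for w2 in writes if w2.value != v):
            return True
    return False

writes = [Write(1, 0, 1), Write(2, 2, 3)]
print(admissible(2, 4, 5, writes))  # True: 2 is the last written value
print(admissible(1, 4, 5, writes))  # False: 1 was overwritten by write(2)
```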
The add(v) operation takes as input a parameter v and returns the value ok when it terminates. Its aim is to add the element v to S. Hence, if {x1, x2, . . . , xk} are the values belonging to S before the invocation of add(v), and if no remove(v) operation is executed concurrently, the value of the set will be {x1, x2, . . . , xk, v} after it terminates. The remove(v) operation takes as input a parameter v and returns the value ok. Its aim is to delete the element v from S if v belongs to S; otherwise the remove operation has no effect. The get() operation takes no input parameter. It returns a set containing the current content of S, without modifying the content of the object. In a concurrency-free context, every get() operation returns the current content of the set: the content of the set is well-defined when the operations occur sequentially. In order to state without ambiguity the value returned by a get() operation in a concurrency context, let us introduce the notion of admissible values for a get() operation op (i.e., Vad(op)) by defining two sets, denoted the sequential set (Vseq(op)) and the concurrent set (Vconc(op)).
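As a baseline, the sequential (concurrency-free) semantics of the set object can be sketched as follows (a hypothetical illustration; the distributed, churn-tolerant implementation is the subject of the rest of the paper):

```python
# Minimal sequential sketch of the set object's add/remove/get interface.
class SetObject:
    def __init__(self):
        self._s = set()

    def add(self, v):
        self._s.add(v)
        return 'ok'            # add(v) returns ok when it terminates

    def remove(self, v):
        self._s.discard(v)     # no effect if v does not belong to S
        return 'ok'

    def get(self):
        return set(self._s)    # a copy: get() does not modify S

s = SetObject()
s.add(1); s.add(2); s.remove(1); s.remove(7)
print(s.get())  # {2}
```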
Fig. 1. Vseq and Vconc in distinct executions: (a) Vseq = {1, 2}, Vconc = ∅; (b) Vseq = ∅, Vconc = ∅; (c) Vseq = ∅, Vconc = {1}
Definition 5 (Sequential set for a get() operation). Given a get() operation op executed on S, the set of sequential values for op is a set (denoted Vseq(op)) that contains all the values v such that:
1. ∃ add(v) : add(v) ≺ op, and
2. if remove(v) exists, then add(v) ≺ op ≺ remove(v).

As an example, let us consider Figure 1(a). The sequential set Vseq(op) = {1, 2} because there exist two operations adding the values 1 and 2, respectively, that terminate before the get() operation starts, and there is neither a remove(1) nor a remove(2) before get() terminates. Differently, Vseq(op) = ∅ in Figure 1(b).

Definition 6 (Concurrent set for a get() operation). Given a get() operation op executed on S, the set of concurrent values for the get() operation is a set (denoted Vconc(op)) that contains all the values v such that:
1. ∃ add(v) : add(v) || op, or
2. ∃ add(v), remove(v) : (add(v) ≺ op) ∧ (remove(v) || op), or
3. ∃ add(v), remove(v) : add(v) || remove(v) ∧ add(v) ≺ op ∧ remove(v) ≺ op.

When considering the execution of Figure 1(c), Vconc(op) = {1} due to item 1 of the previous definition.

Definition 7 (Admissible set for a get() operation). Given a get() operation op, a sequential set Vseq(op) and a concurrent set Vconc(op), a set Vad(op) is an admissible set of values for op if Vseq(op) ⊆ Vad(op) ∧ Vad(op) \ Vseq(op) ⊆ Vconc(op).

As an example, let us consider the executions depicted in Figure 1. For Figure 1(a) and Figure 1(b), there exists only one admissible set Vad for the get operation, namely Vad(op) = {1, 2} and Vad(op) = ∅, respectively. Differently, for Figure 1(c) there exist two different admissible sets for the get operation, namely Vad(op) = ∅ and Vad(op) = {1}. Note that, in the execution depicted in Figure 1(c), if there was another get() operation after the add() and remove() operations, these two get() operations could return different admissible sets. In order to take such a point into consideration, consistency criteria have to be defined.

Definition 8 (Admissible sets of values at time t). An admissible set of values at time t for S (denoted Vad(t)) is any possible admissible set Vad(op) for any get() operation op that would occur instantaneously at time t.

Fig. 2. Sub-history Ĥt at time t

As an example, consider the scenario depicted in Figure 2. The sub-history at time t is the partial order of all the operations started before t (i.e., the operations belonging to the set Ht are add(4) and get() executed by pi; get(), remove(4) and add(1) executed by pj; and add(3) and remove(3) executed by pk). The instantaneous get operation op is concurrent with add(1) executed by pj and remove(3) executed by pk. The sequential set Vseq(op) for op is ∅ because, for both add operations preceding op, there exists a remove that does not follow op, while the concurrent set Vconc(op) for op is {1, 3}. The possible admissible sets for op (and hence the possible admissible sets at time t) are then (i) ∅, (ii) {1}, (iii) {3} and (iv) {1, 3}.
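Definitions 5 and 6 can be turned into a small checker. The sketch below (hypothetical helper names; item 3 of Definition 6 and the case of repeated adds/removes of the same value are omitted for brevity) computes Vseq(op) and Vconc(op) from timed operations:

```python
# Operations are (kind, value, t_B, t_E) tuples; a ≺ b iff a returns
# before b is invoked, and a || b iff neither precedes the other.
def precedes(a, b):
    return a[3] < b[2]

def concurrent(a, b):
    return not precedes(a, b) and not precedes(b, a)

def seq_and_conc(ops, get_op):
    adds = [o for o in ops if o[0] == 'add']
    rems = {o[1]: o for o in ops if o[0] == 'remove'}  # one remove per value
    vseq, vconc = set(), set()
    for a in adds:
        v, r = a[1], rems.get(a[1])
        # Definition 5: add(v) ≺ op, and any remove(v) follows op
        if precedes(a, get_op) and (r is None or precedes(get_op, r)):
            vseq.add(v)
        # Definition 6, item 1: add(v) || op
        if concurrent(a, get_op):
            vconc.add(v)
        # Definition 6, item 2: add(v) ≺ op and remove(v) || op
        elif precedes(a, get_op) and r is not None and concurrent(r, get_op):
            vconc.add(v)
    return vseq, vconc

ops = [('add', 1, 0, 1), ('remove', 1, 2, 3), ('add', 2, 0, 1)]
get_op = ('get', None, 4, 5)
print(seq_and_conc(ops, get_op))   # ({2}, set())
```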
3 Joining a Computation in a Dynamic Distributed System

3.1 System Model

The distributed system is composed, at each time, of a fixed number (n) of processes that communicate by exchanging messages. Processes are uniquely identified by their indexes, and they may join and leave the system at any point in time. The system is synchronous in the following sense. The processing times of local computations are negligible with respect to communication delays, so they are assumed to be equal to 0. In contrast, messages take time to travel to their destination processes, but their transmission time is upper bounded. Moreover, we assume that processes can access a global clock (this is for ease of presentation; as we are in a synchronous system, such a global clock could be implemented by synchronized local clocks). We assume
that there exists an underlying protocol (implemented at the connectivity layer) that keeps processes connected.

3.2 The Problem

Given a shared object O (e.g., a register or a set), it is possible to associate with it, at each time t, a set of admissible values. Processes continuously join the system over time, and every process pi that enters the computation has no information about the current state of the object, with the consequence of being unable to perform any operation. Therefore, every process pi that wishes to enter the computation needs to retrieve an admissible value for the object O from the other processes. This problem is captured by adding a join() operation that has to be invoked by every joining process. This operation is implemented by a distributed protocol that builds an admissible value for the object.

3.3 Distributed Computation

A distributed computation is defined, at each time, by a subset of processes. A process p, belonging to the system, that wants to participate in the distributed computation has to execute the join() operation. Such an operation, invoked at some time t, is not instantaneous. But, from time t, the process p can receive and process messages sent by any other process that belongs to the system and participates in the computation. The processes participating in the computation implement a shared object. A process leaves the computation in an implicit way. When it does, it leaves the computation forever and no longer sends messages. (From a practical point of view, if a process wants to re-enter the system, it has to enter it as a new process, i.e., with a new name.) We assume that no process crashes during the computation (i.e., it does not crash from the time it joins the system until it leaves). In order to formalize the set of processes that participate actively in the computation, we give the following definition.

Definition 9.
A process is active from the time it returns from the join() operation until the time it leaves the system. A(t) denotes the set of processes that are active at time t, while A([t1, t2]) denotes the set of processes that are active during the interval [t1, t2].

3.4 Communication Primitives

Two communication primitives are used by processes belonging to the distributed computation: point-to-point and broadcast communication.

Point-to-point communication. This primitive allows a process pi to send a message to another process pj as soon as pi knows that pj has joined the computation. The network is reliable in the sense that it does not lose, create or modify messages. Moreover, the synchrony assumption guarantees that if pi invokes "send m to pj" at time t, then pj receives that message by time t + δ′ (if it has not left the system by that time). In that case, the message is said to be "sent" and "received".
Broadcast. Processes participating in the distributed computation are equipped with an appropriate broadcast communication sub-system that provides the processes with two operations, denoted broadcast() and deliver(). The former allows a process to send a message to all the processes in the distributed system, while the latter allows a process to deliver a message. Consequently, we say that such a message is "broadcast" and "delivered". These operations satisfy the following property.

– Timely delivery: Let t be the time at which a process p belonging to the computation invokes broadcast(m). There is a constant δ (δ ≥ δ′), known by the processes, such that if p does not leave the system by time t + δ, then all the processes that are in the system at time t and do not leave by time t + δ deliver m by time t + δ.

Such a pair of broadcast operations was first formalized in [8] in the context of systems where processes can commit crash failures. It has been extended to the context of dynamic systems in [7].

3.5 Churn Model

The phenomenon of continuous arrival and departure of nodes in the system is usually referred to as churn. In this paper, the churn of the system is modeled by means of the join distribution λ(t), the leave distribution µ(t) and the node distribution N(t) [3]. The join and leave distributions are discrete functions of time that return, for any time t, respectively the number of processes that have invoked the join operation at time t and the number of processes that have left the system at time t. The node distribution returns, for every time t, the number of processes inside the system. We assume n0 processes inside the system at the beginning, and we assume λ(t) = µ(t) = cn0 (where c ∈ [0, 1] is a percentage of the nodes of the system), meaning that at each time unit the number of processes that join the system is the same as the number of processes that leave, i.e., the number of processes inside the system N(t) is always equal to n0.
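As a toy check of this churn model (the helper name is ours), with λ(t) = µ(t) = c·n0 joins and leaves per time unit, the node distribution N(t) indeed stays constant at n0:

```python
# Step the system forward: at each time unit, c*n0 processes join and
# c*n0 processes leave, so N(t) never changes. (c*n0 is assumed to be
# an integer number of processes; int() truncates otherwise.)
def node_distribution(n0, c, steps):
    n, history = n0, []
    for _ in range(steps):
        n += int(c * n0)   # lambda(t): processes invoking join at time t
        n -= int(c * n0)   # mu(t): processes leaving at time t
        history.append(n)
    return history

print(node_distribution(100, 0.1, 5))  # [100, 100, 100, 100, 100]
```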
4 Joining a Register Computation

4.1 The Protocol

Local variables at a process pi. Each process pi has the following local variables.
– Two variables denoted registeri and sni; registeri contains the local copy of the regular register, while sni is the associated sequence number.
– A boolean activei, initialized to false, that is switched to true just after pi has joined the system.
– Two set variables, denoted repliesi and reply_toi, that are used during the period in which pi joins the system. The local variable repliesi contains the 3-uples <id, value, sn> that pi has received from other processes during its join period, while reply_toi contains the processes that are joining the system concurrently with pi (as far as pi knows).

The local variables of each process pk (of the n processes that compose the initial set of processes) are such that registerk contains the initial value of the regular register (say the value 0), snk = 0, activek = true, and repliesk = reply_tok = ∅.
operation join(i):
(01) registeri ← ⊥; sni ← −1; activei ← false; repliesi ← ∅; reply_toi ← ∅;
(02) wait(δ);
(03) if (registeri = ⊥) then
(04)   repliesi ← ∅; broadcast INQUIRY(i); wait(2δ);
(05)   let <id, val, sn> ∈ repliesi such that (∀ <−, −, sn′> ∈ repliesi : sn ≥ sn′);
(06)   if (sn > sni) then sni ← sn; registeri ← val end if
(07) end if;
(08) activei ← true;
(09) for each j ∈ reply_toi do send REPLY(<i, registeri, sni>) to pj end for;
(10) return(ok).
————————————————————————
(11) when INQUIRY(j) is delivered:
(12) if (activei) then send REPLY(<i, registeri, sni>) to pj
(13) else reply_toi ← reply_toi ∪ {j}
(14) end if.
(15) when REPLY(<j, value, sn>) is received: repliesi ← repliesi ∪ {<j, value, sn>}.

Fig. 3. The join() protocol for a register object in a synchronous system (code for pi)
The join() operation. When a process pi enters the system, it first invokes the join operation. The algorithm implementing that operation, described in Figure 3, involves all the processes that are currently present (be they active or not). The interested reader will find a proof in [3]. First, pi initializes its local variables (line 01) and waits for a period of δ time units (line 02); this waiting period is explained later. If registeri has not been updated during this waiting period (line 03), pi broadcasts (with the broadcast() operation) an INQUIRY(i) message to the processes that are in the system (line 04) and waits for 2δ time units, i.e., the maximum round-trip delay (line 04)¹. When this period terminates, pi updates its local variables registeri and sni to the most up-to-date values it has received (lines 05-06). Then pi becomes active (line 08), which means that it can answer the inquiries it has received from other processes, and does so if reply_toi ≠ ∅ (line 09). Finally, pi returns ok to indicate the end of the join() operation (line 10).

When a process pi receives a message INQUIRY(j), it answers pj by sending back a REPLY(<i, registeri, sni>) message containing its local variables if it is active (line 12). Otherwise, pi postpones its answer until it becomes active (line 13 and lines 08-09). Finally, when pi receives a message REPLY(<j, value, sn>) from a process pj, it adds the corresponding 3-uple to its set repliesi (line 15).
¹ The statement wait(2δ) can be replaced by wait(δ + δ′), which provides a more efficient join operation; δ is the upper bound for the dissemination of a message sent by the reliable broadcast, which is a one-to-many communication primitive, while δ′ is the upper bound for a response sent to a process whose id is known, using a one-to-one communication primitive. So, wait(δ) is related to the broadcast, while wait(δ′) is related to point-to-point communication. We use the wait(2δ) statement to simplify the presentation.
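The value-selection step at lines 05-06 of Figure 3 simply adopts the reply carrying the largest sequence number when it is fresher than the local copy; a sketch (hypothetical helper name, replies modeled as (id, value, sn) tuples):

```python
# Among all gathered REPLY triples, pick the one with the highest
# sequence number; keep the local copy if it is at least as fresh.
def select_freshest(replies, local_sn, local_val):
    if not replies:
        return local_sn, local_val
    pid, val, sn = max(replies, key=lambda r: r[2])  # ties: first such reply
    return (sn, val) if sn > local_sn else (local_sn, local_val)

replies = [(3, 'a', 4), (7, 'b', 9), (2, 'c', 9)]
print(select_freshest(replies, -1, None))  # (9, 'b')
```

Note that when several replies carry the same maximal sequence number they necessarily carry the same written value in the protocol, so the tie-breaking rule is irrelevant there.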
Fig. 4. Why wait(δ) is required: (a) without wait(δ); (b) with wait(δ)
Why the wait(δ) statement at line 02 of the join() operation? To motivate the wait(δ) statement at line 02, let us consider the execution of the join() operation depicted in Figure 4(a). At time τ, the processes pj, ph and pk are the three processes composing the system, and pj is the writer. Moreover, the process pi executes join() just after τ. The value of the copies of the regular register is 0 (square on the left of pj, ph and pk), while registeri = ⊥ (square on its left). The "timely delivery" property of the broadcast invoked by the writer pj ensures that ph and pk deliver the new value v = 1 by τ + δ. But, as it entered the system after τ, there is no such guarantee for pi. Hence, if pi does not execute the wait(δ) statement at line 02, its execution of lines 03-07 can provide it with the previous value of the regular register, namely 0. If, after obtaining 0, pi issues another read, it obtains again 0, while it should obtain the new value v = 1 (because 1 is the last value written and there is no write concurrent with this second read issued by pi). The execution depicted in Figure 4(b) shows that this incorrect scenario cannot occur if pi is forced to wait for δ time units before inquiring to obtain the last value of the regular register.
5 Joining a Set Computation

5.1 The Protocol

Local variables at process pi. Each process pi has the following local variables.
– Two variables denoted seti and sni; seti is a set variable and contains the local copy of the set, while sni is an integer variable that counts how many update operations have been executed by process pi on the local copy of the set.
– A FIFO set variable lastopsi used to maintain a history of the update operations executed by pi. This variable contains all the 3-uples <val, op_type, id>, each one characterizing an operation of type op_type ∈ {add, remove} of the value val issued by a process with identity id.
– A boolean activei, initialized to false, that is switched to true just after pi has joined the system.
– Three set variables, denoted repliesi, reply_toi and pendingi, that are used in the period during which pi joins the system. The local variable repliesi contains the
3-uples <set, sn, ops> that pi has received from other processes during its join period, while reply_toi contains the processes that are joining the system concurrently with pi (as far as pi knows). The set pendingi contains the 3-uples <val, op_type, id>, each one characterizing an update operation executed concurrently with the join.

Initially, n processes compose the system. The local variables of each of these processes pk are such that setk contains the initial value of the set (without loss of generality, we assume that, at the beginning, every process pk has nothing in its variable setk), snk = 0, activek = true, and pendingk = repliesk = reply_tok = ∅.

The join() operation. The algorithm implementing the join operation for a set object, described in Figure 5, involves all the processes that are currently present (be they active or not). First, pi initializes its local variables (line 01) and waits for a period of δ time units (line 02); the motivations for this waiting period are basically the same as those described for the regular register: it is needed to prevent pi from missing some updates. After this waiting period, pi broadcasts (with the broadcast() operation) an INQUIRY(i) message to the processes that are in the system and waits for 2δ time units, i.e., the maximum round-trip delay (line 02). When this period terminates, pi first updates its local variables seti, sni and lastopsi to the most up-to-date values it has received (lines 03-04) and then executes all the operations concurrent with the join that are contained in pendingi and not yet executed (lines 05-13). Then pi becomes active (line 14), which means that it can answer the inquiries it has received from other processes, and does so if reply_toi ≠ ∅ (line 15). Finally, pi returns ok to indicate the end of the join() operation (line 16).
When a process pi receives a message INQUIRY(j), it answers pj by sending back a REPLY(<seti, sni, lastopsi>) message containing its local variables if it is active (line 18). Otherwise, pi postpones its answer until it becomes active (line 19 and line 15). Finally, when pi receives a message REPLY(<set, sn, ops>) from a process pj, it adds the corresponding 3-uple to its set repliesi (line 21).

5.2 The add() and remove() Protocols

These protocols are trivially executed by sending an update message using the broadcast primitive (i.e., their execution time is bounded by δ). At the receipt of the update message, every process pi checks its state. If pi is active, it simply adds or removes the value from its local copy of the set. If pi is not active (i.e., it is still executing the join() protocol), it buffers the operation in the local set pendingi by adding the 3-uple <val, op_type, id>. Such a tuple is made up of (i) the value val to be updated, (ii) the type op_type of the operation (add or remove), and (iii) the id of the process that issued the update. Every operation in the set pendingi will then be executed by pi at the end of the join() protocol (lines 05-13 of Figure 5).

5.3 Correctness Proof

Due to page limitation, this section only states two lemmas and the main theorem. Their proofs can be found in [4].
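The replay of buffered concurrent operations (lines 05-13 of Figure 5) can be sketched as follows (hypothetical helper name; tuples mirror the <val, op_type, id> 3-uples, and lastops is modeled as a plain set):

```python
# Apply every pending operation not already recorded in lastops to the
# local copy of the set, bumping the sequence number for each one.
def replay_pending(local_set, sn, lastops, pending):
    for op in pending:                 # op = (val, op_type, id)
        if op in lastops:
            continue                   # already executed: skip it
        val, op_type, _ = op
        sn += 1
        lastops.add(op)
        if op_type == 'add':
            local_set.add(val)
        else:
            local_set.discard(val)     # remove has no effect if val is absent
    return local_set, sn

s, sn = replay_pending({1, 2}, 2, set(), [(3, 'add', 9), (1, 'remove', 5)])
print(sorted(s), sn)  # [2, 3] 4
```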
operation join(i):
(01) sni ← 0; lastopsi ← ∅; seti ← ∅; activei ← false; pendingi ← ∅; repliesi ← ∅; reply_toi ← ∅;
(02) wait(δ); broadcast INQUIRY(i); wait(2δ);
(03) let <set, sn, ls> ∈ repliesi such that (∀ <−, sn′, −> ∈ repliesi : sn ≥ sn′);
(04) seti ← set; sni ← sn; lastopsi ← ls;
(05) for each <val, op_type, id> ∈ pendingi do
(06)   if (<val, op_type, id> ∉ lastopsi) then
(07)     sni ← sni + 1;
(08)     lastopsi ← lastopsi ∪ {<val, op_type, id>};
(09)     if (op_type = add) then seti ← seti ∪ {val}
(10)     else seti ← seti \ {val}
(11)     end if
(12)   end if
(13) end for;
(14) activei ← true;
(15) for each j ∈ reply_toi do send REPLY(<seti, sni, lastopsi>) to pj end for;
(16) return(ok).
————————————————————————
(17) when INQUIRY(j) is delivered:
(18) if (activei) then send REPLY(<seti, sni, lastopsi>) to pj
(19) else reply_toi ← reply_toi ∪ {j}
(20) end if.
(21) when REPLY(<set, sn, ops>) is received: repliesi ← repliesi ∪ {<set, sn, ops>}.

Fig. 5. The join() protocol for a set object in a synchronous system (code for pi)
Lemma 1. Let c < 1/(3δ). ∀t : |A([t, t + 3δ])| ≥ n(1 − 3δc) > 0.

Lemma 2. Let t0 be the time at which the computation of a set object S starts, Ĥ = (H, ≺) an execution history of S, and Ĥt1+3δ = (Ht1+3δ, ≺) the sub-history of Ĥ at time t1 + 3δ. Let pi be a process that invokes join() on S at time t1 = t0 + 1. If c < 1/(3δ), then at time t1 + 3δ the local copy seti of S maintained by pi will be an admissible set at time t1 + 3δ.

Theorem 1. Let Ĥ = (H, ≺) be the execution history of a set object S, and pi a process that invokes join() on the set S at time t. If c < 1/(3δ), then at time t + 3δ the local copy seti of S maintained by pi will be an admissible set at time t + 3δ.
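A numeric sanity check of Lemma 1's bound (the helper name and the sample values are ours): as long as the churn rate satisfies c < 1/(3δ), at least n(1 − 3δc) processes remain active across any interval of length 3δ.

```python
# Lower bound on the number of processes active during [t, t + 3δ].
def active_lower_bound(n, c, delta):
    assert c < 1 / (3 * delta)     # the lemma's precondition on the churn
    return n * (1 - 3 * delta * c)

print(active_lower_bound(100, 0.125, 2))  # 25.0 processes stay active
```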
References

1. Aguilera, M.K.: A Pleasant Stroll Through the Land of Infinitely Many Creatures. ACM SIGACT News, Distributed Computing Column 35(2), 36–59 (2004)
2. Baldoni, R., Bonomi, S., Kermarrec, A.M., Raynal, M.: Implementing a Register in a Dynamic Distributed System. In: Proc. 29th IEEE Int'l Conference on Distributed Computing Systems (ICDCS 2009), pp. 639–647. IEEE Computer Society Press, Los Alamitos (2009)
3. Baldoni, R., Bonomi, S., Raynal, M.: Regular Register: an Implementation in a Churn Prone Environment. In: SIROCCO 2009. LNCS, vol. 5869. Springer, Heidelberg (2009)
4. Baldoni, R., Bonomi, S., Raynal, M.: Joining a Distributed Shared Memory Computation in a Dynamic Distributed System. Tech Report 5/09, MIDLAB, Università di Roma La Sapienza, Italy (July 2009), http://www.dis.uniroma1.it/˜midlab/publications
5. Delporte-Gallet, C., Fauconnier, H.: Two Consensus Algorithms with Atomic Registers and Failure Detector Ω. In: Garg, V., Wattenhofer, R., Kothapalli, K. (eds.) ICDCN 2009. LNCS, vol. 5408, pp. 251–262. Springer, Heidelberg (2009)
6. Dolev, S., Gilbert, S., Lynch, N., Shvartsman, A., Welch, J.: GeoQuorums: Implementing Atomic Memory in Mobile Ad Hoc Networks. In: Fich, F.E. (ed.) DISC 2003. LNCS, vol. 2848, pp. 306–320. Springer, Heidelberg (2003)
7. Friedman, R., Raynal, M., Travers, C.: Abstractions for Implementing Atomic Objects in Distributed Systems. In: Anderson, J.H., Prencipe, G., Wattenhofer, R. (eds.) OPODIS 2005. LNCS, vol. 3974, pp. 73–87. Springer, Heidelberg (2006)
8. Hadzilacos, V., Toueg, S.: Reliable Broadcast and Related Problems. In: Distributed Systems, pp. 97–145 (1993)
9. Ko, S., Hoque, I., Gupta, I.: Using Tractable and Realistic Churn Models to Analyze Quiescence Behavior of Distributed Protocols. In: Proc. 27th IEEE Int'l Symposium on Reliable Distributed Systems (SRDS 2008) (2008)
10. Lamport, L.: On Interprocess Communication, Part 1: Models, Part 2: Algorithms. Distributed Computing 1(2), 77–101 (1986)
11. Leonard, D., Yao, Z., Rai, V., Loguinov, D.: On Lifetime-Based Node Failure and Stochastic Resilience of Decentralized Peer-to-Peer Networks. IEEE/ACM Transactions on Networking 15(3), 644–656 (2007)
12. Merritt, M., Taubenfeld, G.: Computing with Infinitely Many Processes. In: Herlihy, M.P. (ed.) DISC 2000. LNCS, vol. 1914, pp. 164–178. Springer, Heidelberg (2000)
13. Shao, C., Pierce, E., Welch, J.: Multi-writer Consistency Conditions for Shared Memory Objects. In: Fich, F.E. (ed.) DISC 2003. LNCS, vol. 2848, pp. 106–120. Springer, Heidelberg (2003)
BSART (Broadcasting with Selected Acknowledgements and Repeat Transmissions) for Reliable and Low-Cost Broadcasting in the Mobile Ad-Hoc Network

Ingu Han*, Kee-Wook Rim, and Jung-Hyun Lee
Abstract. In this paper we propose an enhanced broadcasting method, BSART (Broadcasting with Selected Acknowledgements and Repeat Transmissions), which mitigates both the broadcast storm and ACK implosion in a mobile ad hoc network whose nodes carry switched-beam antenna elements enabling directional communication. To curb the broadcast storm we use the DPDP (Directional Partial Dominant Pruning) method. To control the ACK implosion that arises in ACK-based reliable transmission, each node checks, per antenna element, how many nodes must receive the message: if that number reaches a threshold, the node retransmits the message a fixed number of times without requesting ACKs, the number being chosen according to the transmission success probability of that antenna element (the R-method); otherwise the node verifies reception through ACKs on that element (the A-method). The combined R/A scheme bounds both the number of forwarding transmissions and the number of ACKs per antenna element. Simulations show that our method achieves a higher delivery ratio than existing schemes while reducing the number of broadcast messages and ACKs.
Keywords: selected broadcasting, mobile ad-hoc network, node selection.
1 Introduction

Because every node acts not only as a host but also as a router, broadcasting is indispensable in a wireless ad hoc network, for instance to locate a particular node's position or to discover whether a node exists. To control the broadcast storm problem, in which broadcasting generates an excessive number of duplicated messages, it is useful to let only a small subset of nodes forward a received message [1][2][3]. A CDS (connected dominating set) of the network can serve as the forward node set, but finding the lowest-cost CDS is an NP-complete problem.

* This research was supported by the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Advancement) (IITA-2009-C1090-0902-0020).

Among the various heuristics for building a CDS, one approach is source-independent broadcasting, which maintains a single CDS for the whole network; another is source-dependent broadcasting, which builds a CDS per broadcast source; a third mixes the two [2][3][5][6]. In general, the former reduces the number of selected nodes, while the latter supports node mobility and spreads the traffic. A wireless ad hoc network suffers a higher transmission error rate than a wired network, and the probability of message loss is high because signals interfere and collide with each other. One remedy is ACK transmission; another is selective flooding, which tolerates partially overlapping receptions [4][7][8]. However, if every node that receives a broadcast message responds with an ACK, many ACKs arrive simultaneously and cause congestion, a phenomenon known as ACK implosion [9]. A lost ACK further degrades link performance, because the sender must then retransmit the message [1][9]. Related work on ad hoc networks with omnidirectional antennas applies ACK-based confirmation only to nodes that must both receive and forward, while dead-end nodes, which only receive, rely on duplicated copies from neighboring nodes [2][11]. These methods, however, force the selection of forwarding nodes neighboring every dead-end node, which can inflate the forward node set; the numbers of broadcast messages and ACKs then both grow, so they are not an adequate answer to the broadcast storm or ACK implosion.
In networks with directional antennas, methods for reducing duplicated messages include MAC-layer message forwarding, directional self-pruning, and three-hop horizon pruning, but these efforts do not consider reliable transmission or the ACK implosion problem, even though they reduce the number of broadcast messages [5][8][12]. Most studies address only one of the two problems; Lou and Wu considered both [1][2][3][6][10][11][14][15]. In this paper we propose a low-cost, reliable broadcasting method for mobile ad hoc networks with switched-beam antennas that enable directional transmission. Our method controls the broadcast storm with DPDP and, where ACK implosion threatens, applies SART (selected acknowledgements and repeat transmissions) per antenna element. Simulations show that the proposed method greatly reduces both the broadcast storm and ACK implosion while enabling reliable directional transmission in a mobile ad hoc network.
2 System Model

The mobile ad hoc network discussed in this paper is divided into K non-overlapping sectors, and we assume each node carries a switched-beam antenna with one element per sector. Let Go be the transmission gain of an omnidirectional antenna and Gd that of a directional antenna; in general, Gd > Go. For example, an omnidirectional antenna using 10 dBm power reaches 250 m, while the same power with a beam angle of 60∘ reaches 450 m [16]. A switched-beam antenna uses only one antenna element at a time, so omnidirectional broadcasting can be realized by a sequential sweeping process [16]: clockwise, antenna elements 0, 1, 2, ..., K−1 transmit the message in turn with a constant delay. Transmitting on only a chosen group of elements realizes selective flooding. Let dd = λdo (λ > 1), where dd is the reach of the directional antenna and do that of the omnidirectional antenna; the directional antenna then covers λ2 times the area, so the network model gains λ2 times as many neighbors per node.

The mobile ad hoc network can be described by a unit disk graph G = (V, E), where V is the set of wireless mobile nodes and E the set of edges. An edge (u,v) ∈ E is a wireless link between nodes u and v that can reach each other. We assume every wireless link (u,v) is symmetric: if u can transmit messages to v, then v can transmit to u. We call the nodes that u can reach u's neighbors and denote u's neighbor set by N(u); by definition, u ∈ N(u). Denoting u's 2-hop neighbor set by N(N(u)) or N2(u), we have {u} ⊆ N(u) ⊆ N2(u), and N(v) ⊆ N2(u) whenever v ∈ N(u). Writing Nh(u) for the nodes within h hops of u and Hh(u) for the nodes exactly h hops from u, we have Nh(u) = Nh−1(u) ∪ Hh(u) for h ≥ 1, with N0(u) = H0(u) = {u}. For convenience, we omit the subscript when h = 1.
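The neighbor-set definitions above can be sketched in code. The following Python sketch assumes a unit-disk model; the node positions and radius are hypothetical values chosen for illustration, not taken from the paper.

```python
import math

# Hypothetical node positions and radius for a unit-disk model
# (illustrative values, not taken from the paper).
positions = {0: (100, 100), 1: (300, 120), 2: (520, 140), 3: (560, 380)}
RADIUS = 250.0

def neighbors(u):
    """N(u): nodes within reach of u; by definition u is in N(u)."""
    ux, uy = positions[u]
    return {v for v, (vx, vy) in positions.items()
            if math.hypot(ux - vx, uy - vy) <= RADIUS}

def n_hop(u, h):
    """N_h(u): nodes within h hops, via N_h(u) = N_{h-1}(u) ∪ H_h(u)."""
    reached = {u}
    for _ in range(h):
        reached = set().union(*(neighbors(w) for w in reached))
    return reached

def h_hop(u, h):
    """H_h(u): nodes exactly h hops from u; H_0(u) = {u}."""
    return n_hop(u, h) - n_hop(u, h - 1) if h >= 1 else {u}
```

With these positions, N(0) = {0, 1} and H2(0) = {2}, consistent with the recurrence N2(0) = N1(0) ∪ H2(0).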
Fig. 1. Omnidirectional antenna and directional antenna; a directional antenna consisting of 6 antenna elements (K=6)
Fig. 2. An example using 4 antenna elements
Fig. 2 illustrates N2(1) = N(1) ∪ H2(1) = {1,2,3,4,5,9,10} ∪ {6,7,8,11} = {1,2,3,4,5,6,7,8,9,10}. The set of nodes that can communicate directly with antenna element i of node u, that is, the 1-hop nodes reached through one of the K non-overlapping elements, is denoted Ni→(u). Then Ni→(u) ⊆ N(u) and N(u) = N0→(u) ∪ N1→(u) ∪ ... ∪ N(K−1)→(u) ∪ {u}. The degree of node u is |N(u)|−1 = |N0→(u)| + |N1→(u)| + ... + |N(K−1)→(u)|, where |Ni→(u)| is the number of nodes in Ni→(u). We assume every node keeps its antenna elements in a fixed orientation, for example by using a magnetic compass. Because radio waves travel in straight lines, the antenna elements that nodes u and v (u ∈ N(v)) use to communicate stand in a diagonal relationship: if u transmits to v on element j (0 ≤ j ≤ K−1), then v must receive on element (j + K/2) mod K. In Fig. 2, node 2 transmits to node 8 on antenna 1, so node 8 receives the message from node 2 via antenna 3. Let Dv→u = {i | u ∈ Ni→(v)} and Dv→V = ∪w∈V Dv→w for any node set V ⊆ N(v). For example, in Fig. 2, D8→2 = {3}, N(10) = {1,2,4,9}, and D10→N(10) = D10→1 ∪ D10→2 ∪ D10→4 ∪ D10→9 = {0} ∪ {1} ∪ {0} ∪ {1} = {0,1}. We assume each node u broadcasts HELLO periodically to obtain its neighbors' state; a node v that receives HELLO from u piggybacks its own 1-hop neighbor set N(v) on the HELLO it sends back to u.
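The diagonal relationship between antenna elements can be checked with a few lines of Python. The sector layout assumed here (element 0 starting at angle 0, proceeding counterclockwise) is an illustrative convention, not fixed by the paper.

```python
import math

K = 4  # number of non-overlapping antenna elements (sectors)

def sector(dx, dy, k=K):
    """Element covering direction (dx, dy); element 0 is assumed to
    start at angle 0 and sectors proceed counterclockwise."""
    ang = math.atan2(dy, dx) % (2 * math.pi)
    return int(ang // (2 * math.pi / k))

def opposite(j, k=K):
    """Element the receiver must use when the sender transmits on j:
    (j + K/2) mod K, the diagonal relationship from the text."""
    return (j + k // 2) % k
```

For instance, opposite(1) returns 3: node 2 sends to node 8 on antenna 1 and node 8 receives on antenna 3, as in Fig. 2.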
3 BSART: Broadcasting with Selected Acknowledgements and Repeat Transmissions

Suppose node v obtains its forward node set F(v) and dead-end set D(v) using DPDP. For each antenna element i (0 ≤ i ≤ K−1), v then computes the set of nodes Ti = Ni→(v) ∩ {F(v) ∪ D(v)} to which the message must be delivered. If |Ti| reaches a constant threshold, v transmits the message repeatedly a fixed number of times on element i (for convenience, the R-method); otherwise v confirms reception via ACKs (the A-method). For example, to avoid receiving three or more ACKs per antenna element simultaneously, set c = 3. If ACK confirmation in the style of Lou and Wu, which gives every node at least two reception opportunities from its neighbors, were applied uniformly to the region containing both F(v) and D(v), the ACKs and messages generated simultaneously would increase congestion; with a directional antenna, however, each element can be controlled separately [10]. When one node receives too many ACKs at once, ACK implosion occurs, causing not only a drop in performance but also extreme delay. Let M(v, s, seq#, F(v), mode, DATA) be the message to broadcast, where v is the ID of the forwarding node, s the ID of the broadcast source, and seq# the sequence number of the broadcast message generated by s; s and seq# together identify duplicates. DATA is the payload, and F(v) is the forward node set obtained by DPDP. In addition, every v maintains Rv, the set of antenna elements to which the R-method applies, and Av, the set of elements to which the A-method applies. For every antenna element i ∈ {0, 1, ..., K−1}, v computes Ti = Ni→(v) ∩ {F(v) ∪ D(v)}. The notation used in the algorithm is as follows.
• D(v): dead-end node set of v
• F(v): forward node set of v
• Ni→(v): neighbor set that v can reach with antenna element i
• TXmax: maximum number of retransmissions
• WAITmax: waiting time for an ACK
• Tint: time slot for transmitting M
• tx_cnti: number of transmissions made via antenna element i
• timeri:wait: timer for awaiting an ACK after transmitting M via antenna element i
• Av: set of antenna elements of v to which the A-method applies
• Rv: set of antenna elements of v to which the R-method applies
• Piv: set of nodes expected to respond with an ACK on receiving M via antenna element i, where i ∈ Av and Piv = Ni→(v) ∩ {F(v) ∪ D(v)}; if i ∈ Rv, Piv = Ф
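The per-element R/A classification can be sketched compactly. The following Python sketch follows the paper's notation, but the concrete data structure and names are illustrative.

```python
def classify(targets, c):
    """Split antenna elements into R_v (repeat without ACK, |P_i| >= c)
    and A_v (confirm via ACK, 0 < |P_i| < c).

    targets maps element id i -> P_i = N_i→(v) ∩ (F(v) ∪ D(v)).
    """
    r_v, a_v = set(), set()
    for i, p_i in targets.items():
        if not p_i:
            continue  # no forward or dead-end node on this element
        (r_v if len(p_i) >= c else a_v).add(i)
    return r_v, a_v
```

With c = 2 and, say, targets = {0: {1, 2}, 1: {3, 8}, 2: {4}, 3: {6}} (per-element sets invented for illustration), elements 0 and 1 fall into Rv and elements 2 and 3 into Av, mirroring R0 = {0,1}, A0 = {2,3} in the example at the end of this section.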
• ack_requ: flag set to 1 while v waits for an ACK from neighbor node u, used when |Piv| < c (A-method)
• ackedu: flag set to 1 once the ACK from u has been received
Algorithm: BSART (Broadcasting with Selected Acknowledgements and Repeat Transmissions)
input: M(u, s, seq#, F(u), Pu, mode, DATA), c
output: Av, Rv, F(v), M(v, s, seq#, F(v), mode, DATA)
initial state: tx_cnti = 0; Av = Rv = Ф; for all i ∈ {0,1,...,K−1}, Piv = Ф
// we assume that ACKs are never lost
case 1: node v is a broadcast source
  1.1 v = s; seq# = seq# + 1
  1.2 B(u,v) = N(v); U(u,v) = H2(v)  // u = Ф
  1.3 jump to 2.4
case 2: on receiving M(u, s, seq#, F(u), mode, DATA) from neighbor node u
  2.1 if mode = A and v ∈ F(u), execute the following; otherwise jump to 2.2  // ACK transmission
    2.1.1 if M is a duplicate, stop
    2.1.2 transmit ACK(v, u, s, seq#) via the antenna element f that received M; jump to 2.3
  2.2 if M is not a duplicate, receive M and stop  // dead-end node
  2.3 B(u,v) = N(v) − N(u); U(u,v) = H2(v) − N(u) − N(N(v) ∩ N(u))
  2.4 compute F(v) with DPDP
  2.5 for each antenna element i (i = 0 to K−1), compute Piv = Ni→(v) ∩ {F(v) ∪ D(v)}, then execute the following
    2.5.1 if |Piv| ≥ c, execute the following; otherwise jump to 2.5.2  // R-method
      2.5.1.1 Rv = Rv ∪ {i}
      2.5.1.2 for all x ∈ Piv, ack_reqx = 0
      2.5.1.3 mode = R; Piv = Ф  // retransmission mode
    2.5.2 if |Piv| < c, execute the following  // A-method
      2.5.2.1 Av = Av ∪ {i}
      2.5.2.2 for all x ∈ Piv, ack_reqx = 1
      2.5.2.3 mode = A
  2.6 for each antenna element i ∈ Rv ∪ Av, execute the following
    2.6.1 if i ∈ Rv, transmit M(v, s, seq#, F(v), mode, DATA) the fixed number of times via antenna element i
    2.6.2 if i ∈ Av, execute the following via antenna element i
      2.6.2.1 timeri:wait = WAITmax
      2.6.2.2 transmit M(v, s, seq#, F(v), mode, DATA)
case 3: on receiving ACK(w, v, s, seq#) from neighbor node w via antenna element f
  3.1 if f ∈ Av and ack_reqw = 1, execute the following
    3.1.1 ackedw = 1; ack_reqw = 0
    3.1.2 Pfv = Pfv − {w}; if Pfv = Ф, timerf:wait = Ф
case 4: when timeri:wait expires while tx_cnti < TXmax and Pfv ≠ Ф
  4.1 timeri:wait = WAITmax; tx_cnti = tx_cnti + 1
  4.2 set F(v) = Pfv and transmit M via antenna element i

Consider applying BSART when the broadcast source is node 0, with c = 2, so that an antenna element with only one target node works with ACKs, and suppose the success probability of a single directional transmission is p = 1/2. By steps 2.3 and 2.4 we get B(Ф,0) = N(0) − N(Ф) = N(0) = {0,1,2,3,4,6,8}, U(Ф,0) = H2(0) − N(Ф) − N(N(Ф) ∩ N(0)) = H2(0) = {5,7,9}, and F(0) = {2,4,8} (F(0) = {3,6,8} is also possible). By steps 2.5 and 2.6 we get R0 = {0,1}, A0 = {2,3}, P20 = {4}, P30 = {6}. Because the transmission success probability is p = 1/2, M is transmitted twice on antennas 0 and 1. By 2.5.2, node 0 awaits ACKs on the antenna elements 2 and 3 in A0, i.e., ack_req4 = 1 and ack_req6 = 1. In the same way, for the nodes 2, 4, 8 in F(0) = {2,4,8}: F(2) = Ф, A2 = {0}, R2 = Ф, P02 = {7}; F(4) = Ф, A4 = {3}, R4 = Ф, P34 = {5}; F(8) = Ф, A8 = {1}, R8 = Ф, P18 = {9}. Assuming ACKs are never lost, consider the messages generated by BSART with c = 2. The A-method produces 5 message transmissions that require ACKs, and hence 5 ACKs. The R-method transmits on antenna elements 0 and 1, broadcasting twice on each element regardless of the number of receivers, i.e., 2×2 retransmitted messages, for a total of 14 (= 5 + 5 + 2×2) messages. For comparison, if every transmission were confirmed by ACK, node 0 would generate 4 messages, since its receivers span 4 antenna elements; similarly node 2 would generate 2 messages, and nodes 3, 4, 5, 8 one message each. Even if transmission succeeds without ACK implosion, an ACK arrives from every node in the set {1,2,3,4,5,8,9}, i.e., everyone except the source node 0, so 7 ACKs are generated, more than under BSART.
Moreover, this is the minimum number that can be generated; in addition, ACK implosion can occur at node 0 if it receives two ACKs simultaneously on antennas 0 and 1.
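The arithmetic of the worked example, and the reliability the R-method buys, can be checked directly. The independence assumption on per-transmission success below is ours, made for this sketch.

```python
def total_messages(a_sends, ack_count, r_antennas, repeats):
    """Message count in the worked example: A-method sends, their ACKs,
    and R-method antennas each repeating the message `repeats` times."""
    return a_sends + ack_count + r_antennas * repeats

def r_success(p, repeats):
    """Chance that at least one of `repeats` unacknowledged copies
    arrives, assuming independent per-transmission success p."""
    return 1 - (1 - p) ** repeats
```

Here total_messages(5, 5, 2, 2) reproduces the 14 messages counted above, and r_success(0.5, 2) = 0.75 illustrates why two repeats were used with p = 1/2.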
4 Experiments and Evaluation

We considered a 1000×1000 area with 20, 40, 60, 80, and 100 nodes placed by a random number generator. Table 1 lists the major parameters. The protocols compared include BF (blind flooding), HHH, and SHJ [8][16]. BF generates many overlapped messages but achieves a high delivery ratio.
Node speed is set to 0-20 m/s under the random waypoint model. Under HHH, each forward node designates one forwarding node per direction, and a designated node retransmits a received message in every direction except the one it arrived from. In the SHJ algorithm, each node u forwards the broadcast message together with its neighbor information N(u), and a receiving node v forwards it in every direction f that satisfies Nf→(v) − N(u) − {u} ≠ Ф. The simulation measures the following:
□ forwarding direction (antenna element) ratio
□ message forwarding ratio by number of nodes, number of antenna elements, and movement speed
□ ACK message processing time by number of antenna elements
Table 1. Major parameters for simulation

  Parameter   Value
  TXmax       4
  K           4, 6, 8
  WAITmax     10
  c           2, 3, 4, 5
  p           0.3
The experiments were carried out with the NS-2 simulator, with each module programmed in C++ and Tcl/Tk. Fig. 3 shows the ratio of antenna elements selected for broadcasting with 20, 40, 60, 80, and 100 nodes. BSART uses fewer than 30% of the antenna elements, i.e., 1.2 (= 4×0.3) per node, very similar to the HHH algorithm. Under BF, messages are transmitted in all directions as the number of nodes grows. SHJ uses at most 2.4 elements per node. Figs. 4, 5, and 6 show the message delivery ratio as a function of the number of nodes, the number of antenna elements, and the movement speed, respectively.

Fig. 3. Antenna element ratio per node

Fig. 4. Message delivery ratio per nodes

As the number of nodes increases, the delivery ratio increases; HHH shows the lowest ratio. Apart from BF, the ratios are similar to one another, and BSART and DCB reach almost 100% once the number of nodes exceeds 60. Fig. 5 also shows the delivery ratio for c = 2, 3, 4, 5 when the A-method is applied with K = 1, 4, 8 antenna elements. In general, a larger c makes the A-method more frequent, leading to more ACK-confirmed transmissions; consequently, the delivery ratio increases with c, as Fig. 5 shows.
Fig. 5. Message delivery ratio per antenna element
Fig. 6 shows the delivery ratio versus node movement speed with 60 nodes. BF and SHJ keep a high ratio regardless of movement, while DCB-SD and BSART stay above 90%. HHH's ratio varies the most with node movement [11].
Fig. 6. Message delivery ratio per mobility
Fig. 7 shows the ACK processing time as a function of the constant c that decides the A-method. The smaller c is and the larger K is, the shorter the processing time. As K increases, fewer nodes must be served per antenna element, which effectively lowers c.
Fig. 7. ACK handling time per No. of antenna element
5 Conclusion

In this paper we proposed BSART (Broadcasting with Selected Acknowledgements and Repeat Transmissions), which provides low-cost, reliable directional broadcast with switched-beam antennas in mobile ad hoc networks. To handle the ACK implosion that accompanies reliable transmission, we combined, per antenna element, the ACK-based A-method with the R-method, which simply retransmits the message a fixed number of times without ACKs: an antenna element with at least c receiving nodes retransmits a fixed number of times, while the others request ACKs. Experiments showed that our algorithm reduces the numbers of broadcast messages and ACKs while supporting reliable delivery; with movement speeds under 20 m/s, K = 4, and c = 2, the message delivery ratio exceeded 90%. A closer performance analysis of BSART and its application to sensor networks remain as future work.
References
1. Ni, S., Tseng, Y., Chen, Y., Sheu, J.: The broadcast storm problem in a mobile ad hoc network. In: Proc. MOBICOM 1999, pp. 151–162 (1999)
2. Lim, H., Kim, C.: Flooding in wireless ad hoc networks. Computer Communications 24(3-4), 353–363 (2001)
3. Lou, W., Wu, J.: On reducing broadcast redundancy in ad hoc wireless networks. IEEE Trans. Mobile Computing 1(2), 111–122 (2002)
4. Basagni, S., Conti, M., Giordano, S., Stojmenovic, I. (eds.): Mobile Ad Hoc Networking. IEEE/Wiley (2004)
5. Dai, F., Wu, J.: Efficient broadcasting in ad hoc networks using directional antennas. IEEE Trans. Parallel & Distributed Systems 17(4), 1–13 (2006)
6. Wu, J., Dai, F.: A generic distributed broadcast scheme in ad hoc wireless networks. IEEE Trans. Computers 53(10), 1343–1354 (2004)
7. Boukerche, A., Chlamtac, I. (eds.): Handbook of Algorithms for Mobile and Wireless Networking and Computing. CRC Press, Boca Raton (2005)
8. Hu, C., Hong, Y., Hou, J.: On mitigating the broadcast storm problem with directional antennas. In: Proc. IEEE ICC 2003 (2003)
9. Impett, M., Corson, M.S., Park, V.: A receiver-oriented approach to reliable broadcast in ad hoc networks. In: Proc. WCNC 2000, pp. 117–122 (2000)
10. Lou, W., Wu, J.: A reliable broadcast algorithm with selected acknowledgements in mobile ad hoc networks. In: Proc. IEEE GLOBECOM 2003 (2003)
11. Lou, W., Wu, J.: Double-covered broadcast (DCB): a simple reliable broadcast algorithm in MANETs. In: Proc. IEEE INFOCOM 2003 (2003)
12. Spohn, M., Garcia-Luna-Aceves, J.J.: Improving route discovery in on-demand routing protocols using two-hop connected dominating sets. Ad Hoc Networks 4(4) (July 2006)
13. Lou, W., Wu, J.: Localized broadcasting in mobile ad hoc networks using neighbor designation. Technical Report, Dept. of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL (July 2003)
14. Qayyum, A., Viennot, L., Laouiti, A.: Multipoint relaying for flooding broadcast messages in mobile wireless networks. In: Proc. 35th Hawaii Int'l Conf. on System Sciences (HICSS-35), January 2002, pp. 3898–3907 (2002)
15. Alagar, S., Venkatesan, S., Cleveland, J.: Reliable broadcast in mobile wireless networks. In: Proc. MILCOM 1995, pp. 236–240 (1995)
16. Choudhury, R.R., Vaidya, N.H.: Performance of ad hoc routing using directional antennas. Ad Hoc Networks 3(2), 157–173 (2005)
17. Shen, C.C., Huang, Z., Jaikaeo, C.: Directional broadcast for ad hoc networks with percolation theory. Tech. Report, Comp. and Info. Sciences, Univ. of Delaware (February 2004)
DPDP: An Algorithm for Reliable and Smaller Congestion in the Mobile Ad-Hoc Network

Ingu Han*, Kee-Wook Rim, and Jung-Hyun Lee
Abstract. PDP (Partial Dominant Pruning) is known as the most practical method for reducing duplicated broadcast messages by designating forward nodes on the fly when a broadcast occurs. In this paper we introduce DPDP (Directional PDP) for mobile wireless networks with directional antennas, which reduces both the number of forward nodes and the number of antenna elements used. Simulations show that, compared with PDP, our algorithm reduces the number of forwarding nodes per antenna element and the number of duplicated messages each node receives, even as the number of antenna elements grows beyond the omnidirectional case.

Keywords: partial dominant pruning, selected broadcasting, mobile ad-hoc network.
1 Introduction

Because every node acts not only as a host but also as a router, broadcasting is necessary in a mobile ad hoc network to find a routing path to a certain node or to discover location information. To cope with the broadcast storm, it is common to let only designated nodes forward messages [1][2][3][4]. These forwarding nodes form a CDS (connected dominating set) of the network, but finding the lowest-cost CDS is known to be NP-complete. The heuristics for designating a CDS fall into source-independent and source-dependent broadcasting [1][3][5][6]. Source-independent broadcasting maintains a single CDS for the given network, while source-dependent broadcasting builds the CDS around the broadcasting node on the fly; the latter can therefore yield several CDSs, whereas the former cannot. In general, source-independent broadcasting guarantees a smaller number of nodes, but source-dependent broadcasting suits dynamic situations. A mobile ad hoc network with directional antennas makes good use of bandwidth and power and reduces interference with neighboring nodes, but owing to technical difficulty, research on broadcasting with directional antennas in mobile ad hoc networks has started only recently.*

* This research was supported by the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Advancement) (IITA-2009-C1090-0902-0020).

Most of this work designates a CDS by source-independent broadcasting or tries to reduce redundant broadcast messages by considering the antennas' directions [6][7]; no prior work designates the message forwarding node set around the broadcasting node as this paper does. We propose directional partial dominant pruning, an extension of PDP that reduces both the number of antenna elements used and the number of forwarding nodes [1]. Simulations show that our algorithm outperforms the original PDP in reducing both nodes and antenna elements.
2 Network Model

Fig. 1 shows an omnidirectional antenna and a directional antenna; Fig. 2 shows a switched-beam antenna whose 360∘ coverage is divided into fan-shaped sectors, each with its own antenna element [8][9]. An omnidirectional antenna using 10 dBm power reaches 250 m, while the same power with a beam angle of 60∘ reaches 450 m [9]. A switched-beam antenna uses only one antenna element at a time, so omnidirectional broadcasting can be realized by a sequential sweeping process [8]. We call the nodes that u can reach u's neighbors and denote u's neighbor set by N(u); by definition, u ∈ N(u). Denoting u's 2-hop neighbor set by N(N(u)) or N2(u), we have {u} ⊆ N(u) ⊆ N2(u), and N(v) ⊆ N2(u) follows if v ∈ N(u). Writing Nh(u) for the nodes within h hops of u and Hh(u) for the nodes exactly h hops from u, we have Nh(u) = Nh−1(u) ∪ Hh(u) for h ≥ 1, with N0(u) = H0(u) = {u}. For convenience, we omit the subscript when h = 1. The set of nodes that can communicate directly with antenna element i of node u, that is, the 1-hop nodes reached through one of the K non-overlapping elements, is denoted Ni→(u). Then Ni→(u) ⊆ N(u) and N(u) = N0→(u) ∪ N1→(u) ∪ ... ∪ N(K−1)→(u) ∪ {u}.
Fig. 1. Omnidirectional antenna and directional antenna
Fig. 2. Directional antenna consisting of 6 antenna elements (K=6)
Fig. 3. An example using 4 antenna elements
Because radio waves travel in straight lines, the antenna elements used by u and v (u ∈ N(v)) to communicate stand in a diagonal relationship: if u transmits to v on element j (0 ≤ j ≤ K−1), then v must receive on element (j + K/2) mod K. In Fig. 3, node 2 transmits to node 8 on antenna 1, so node 8 receives the message from node 2 via antenna 3.
Let Dv→u = {i | u ∈ Ni→(v)} and Dv→V = ∪w∈V Dv→w for any node set V ⊆ N(v). For example, in Fig. 3, D8→2 = {3}, N(10) = {1,2,4,9}, and D10→N(10) = D10→1 ∪ D10→2 ∪ D10→4 ∪ D10→9 = {0} ∪ {1} ∪ {0} ∪ {1} = {0,1}. We assume each node u broadcasts HELLO periodically to obtain its neighbors' state; a node v that receives HELLO from u piggybacks its 1-hop neighbor set N(v) on the HELLO it returns to u.
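The set Dv→V is a straightforward union over per-node element lookups. In the Python sketch below, the helper element_of and the ELEM mapping, which encodes the Fig. 3 example for node 10, are illustrative.

```python
def d_set(v, targets, element_of):
    """D_{v→V} = ∪_{w∈V} D_{v→w}: the antenna elements v needs in order
    to reach every node in targets. element_of(v, w) returns the element
    v uses toward w (a hypothetical lookup for this sketch)."""
    return set().union(*({element_of(v, w)} for w in targets))

# Encoding of the Fig. 3 example: node 10 reaches 1 and 4 on element 0,
# and 2 and 9 on element 1.
ELEM = {(10, 1): 0, (10, 2): 1, (10, 4): 0, (10, 9): 1}
```

For instance, d_set(10, {1, 2, 4, 9}, lambda v, w: ELEM[(v, w)]) yields {0, 1} = D10→N(10).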
3 Directional Partial Dominant Pruning

To apply PDP, designed for the omnidirectional antenna model, to the directional antenna model, we considered the following.
• We modified the criterion for selecting a node in B(u,v) (= N(v) − N(u)) to cover the nodes in U(u,v) (= H2(v) − N(u) − N(N(u) ∩ N(v))): among the nodes p ∈ B(u,v) covering some q ∈ U(u,v), we preferentially select the p with maximum |Ni→(p) ∩ U(u,v)|; on a tie, we select the p with maximum |N(p) ∩ U(u,v)|, and on a further tie a random node.
• We then determine the antenna element set Dv→B(u,v) used to forward the message to the nodes of B(u,v), i.e., the 1-hop nodes of v that must receive the message, including the selected F(v). Unlike the omnidirectional model, the directional model must transmit only toward the receiving nodes' directions, to reduce interference and wasted bandwidth.

Algorithm: DPDP (Directional Partial Dominant Pruning)
input: N(v), N2(v), F(u)
output: F(v), Dv→B(u,v)
initial state: F(v) = Dv→B(u,v) = Ф
1. B(u,v) = N(v) − N(u), U(u,v) = H2(v) − N(u) − N(N(u) ∩ N(v))
2. if t ∈ U(u,v) is covered by s ∈ B(u,v), do the following
  2.1 F(v) = F(v) ∪ {s}, Dv→B(u,v) = Dv→B(u,v) ∪ Dv→s
  2.2 B(u,v) = B(u,v) − {s}, U(u,v) = U(u,v) − {N(s) ∩ U(u,v)}
3. repeat until U(u,v) = Ф
  3.1 find Dv→B(u,v) = Dv→B(u,v) ∪ {i} for the p with maximum |Ni→(p) ∩ U(u,v)|; if a tie occurs, jump to 3.2, otherwise jump to 3.3
  3.2 find the p with maximum |N(p) ∩ U(u,v)|; if a tie occurs again, find the p with maximum |H(p)|; if a tie occurs yet again, select a random p; then set Dv→B(u,v) = Dv→B(u,v) ∪ Dv→p
  3.3 F(v) = F(v) ∪ {p}, B(u,v) = B(u,v) − {p}, U(u,v) = U(u,v) − {N(p) ∩ U(u,v)}

Fig. 3 shows F(v) in the case where node 2 is the broadcast source. At node 2, B(Ф,2) = N(2) − {} = {1,2,7,8,9,10} and U(Ф,2) = H2(2) − N(Ф) − N(N(2) ∩ N(Ф)) = H2(2) = {3,4,5}. We select a node in B(Ф,2) covering the maximum number of nodes in U(Ф,2). Because |N0→(1) ∩ U(Ф,2)| = |{3,5}| = 2 and N(1) ∩ U(Ф,2) = U(Ф,2), we get F(2) = {1} and D2→B(Ф,2) = {0,1,2,3}. At node 1, B(2,1) = N(1) − N(2) = {1,2,3,4,5,9,10} − {1,2,7,8,9,10} = {3,4,5} and U(2,1) = H2(1) − N(2) − N(N(2) ∩ N(1)).
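At its heart, the covering loop of step 3 is a greedy set cover over U(u,v). The Python sketch below keeps only that core, omitting the per-antenna criterion and the tie-breaking of steps 3.1-3.2; nbr, the example adjacency NB, and the assumption that U(u,v) is coverable by B(u,v) are all ours.

```python
def dpdp_core(b, u_set, nbr):
    """Greedy core of DPDP step 3: repeatedly add the node of B(u,v)
    covering the most still-uncovered nodes of U(u,v). Tie-breaking and
    the per-antenna criterion (steps 3.1-3.2) are omitted; nbr(p)
    returns N(p). Assumes U(u,v) is coverable by B(u,v)."""
    f, b, u = set(), set(b), set(u_set)
    while u:
        p = max(b, key=lambda q: len(nbr(q) & u))  # best cover of U(u,v)
        f.add(p)
        b.discard(p)
        u -= nbr(p)  # remove the newly covered 2-hop nodes
    return f

# Adjacency for the Fig. 3 example (only the sets the example needs).
NB = {1: {1, 2, 3, 4, 5, 9, 10}, 10: {1, 2, 4, 9}}
```

Running dpdp_core({1, 2, 7, 8, 9, 10}, {3, 4, 5}, lambda q: NB.get(q, set())) returns {1}, matching F(2) = {1} computed above.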
4 Simulation and Evaluation

To evaluate the proposed algorithm, we considered a 1000×1000 area with 20, 40, 60, 80, and 100 nodes distributed uniformly, and measured:
• the number of forwarding nodes
• the average number of forwarding nodes per antenna element
• the number of duplicated messages per node
The experiments were carried out with the NS-2 simulator; the PDP and DPDP modules were programmed in C++ and Tcl/Tk. For convenience, we did not model the MAC and physical layers. We tested K (the number of antennas per node) = 1, 4, 8, without node mobility. Fig. 4 shows the number of forwarding nodes selected by the DPDP algorithm. Using directional antennas requires more forwarding nodes than K = 1, i.e., the omnidirectional case, but the difference stays within 5; this follows from step 3.1, which selects the node covering the maximum number of neighbors per antenna element. As the number of antennas increases, the number of forwarding nodes grows, but not considerably. Fig. 5 shows the number of forwarding nodes per antenna, i.e., the number of forwarding nodes divided by K, which indirectly indicates interference and power consumption. The larger K is, the fewer forwarding nodes per antenna, so performance improves, and the ACK implosion problem is also relieved [10]. The gap widens as K grows; for K = 4 and 8, the difference reaches 230% compared with K = 1. This shows that DPDP profits from directional antennas in mobile ad hoc networks.
DPDP: An Algorithm for Reliable and Smaller Congestion
Fig. 4. Relation of No. of nodes and No. of forwarding nodes
Fig. 5. No. of forwarding nodes per antenna element
Fig. 6 shows the number of redundancy messages; in the case K = 8, fewer than 2 redundant messages occur per node. In the cases K > 1, nodes receive messages from fixed directions compared to K = 1 (omnidirectional antenna), so as K grows, the duplication ratio gets smaller.
Fig. 6. Redundancy ratio per node
I. Han, K.-W. Rim, and J.-H. Lee
In the case K = 8, the duplication ratio is reduced by 160%–190%. The simulation results show that our algorithm is superior to the legacy algorithm in many respects and very useful.
5 Conclusion
In this paper, we proposed directional partial dominant pruning, an expanded version of PDP which reduces not only the number of antenna elements but also the number of forwarding nodes. By simulation, we showed that our algorithm is superior to the legacy PDP in terms of reducing the number of forwarding nodes and antenna elements. The algorithm preferentially selects a node p ∈ B(u,v) maximizing |Ni→(p) ∩ U(u,v)|, i.e., a node that covers the most nodes q ∈ U(u,v). And to reduce redundancy messages, the algorithm finds the antenna element set Dv→B(u,v). The simulation results show that our algorithm is superior to the legacy algorithm in many respects and very useful. Finally, research that allows node mobility and uses the MAC layer is required.
References
1. Lou, W., Wu, J.: On reducing broadcast redundancy in ad hoc wireless networks. IEEE Trans. Mobile Computing 1(2), 111–122 (2002)
2. Ni, S., Tseng, Y., Chen, Y., Sheu, J.: The broadcast storm problem in a mobile ad hoc network. In: Proc. MOBICOM 1999, pp. 151–162 (1999)
3. Lim, H., Kim, C.: Flooding in wireless ad hoc networks. Computer Communications 24(3–4), 353–363 (2001)
4. Basagni, S., Conti, M., Giordano, S., Stojmenovic, I. (eds.): Mobile Ad Hoc Networking. IEEE/Wiley (2004)
5. Wu, J., Dai, F.: A generic distributed broadcast scheme in ad hoc wireless networks. IEEE Trans. Computers 53(10), 1343–1354 (2004)
6. Dai, F., Wu, J.: Efficient broadcasting in ad hoc networks using directional antennas. IEEE Trans. Parallel & Distributed Systems 17(4), 1–13 (2006)
7. Hu, C., Hong, Y., Hou, J.: On mitigating the broadcast storm problem with directional antennas. In: Proc. IEEE ICC 2003 (2003)
8. Ramanathan, R., Redi, J., Santivanez, C., Wiggins, D., Polit, S.: Ad hoc networking with directional antennas: a complete system solution. IEEE JSAC 23(3) (2005)
9. Ramanathan, R.: On the performance of ad hoc networks with beamforming antennas. In: Proc. MobiHOC 2001, pp. 95–105 (2001)
10. Impett, M., Corson, M.S., Park, V.: A receiver-oriented approach to reliable broadcast in ad hoc networks. In: Proc. WCNC 2000, pp. 117–122 (2000)
Development of Field Monitoring Server System and Its Application in Agriculture

Chang-Sun Shin1, Meong-Hun Lee1, Yong-Woong Lee1, Jong-Sik Cho1, Su-Chong Joo2, and Hyun Yoe1

1 School of Information and Communication Engineering, Sunchon National University, Korea
{csshin,leemh777,ywlee,cho1318,yhyun}@sunchon.ac.kr
2 School of Electrical, Electronic and Information Engineering, Wonkwang University, Korea
[email protected]
Abstract. In agricultural fields, environmental factors such as temperature, humidity, solar radiation, CO2, and soil moisture are essential elements that influence growth rate, productivity of produce, sugar content of fruit, acidity, etc. If we manage these environmental factors efficiently, we can achieve improved results in agricultural production. For monitoring and managing the growth environment, this paper suggests the Field Monitoring Server System (FMSS), which can operate on solar power. We implemented the Ubiquitous Field Server System (UFSS) in our previous work; compared with the UFSS, the FMSS improves power consumption, mobility, and user-friendly environment monitoring. The system collects environmental data directly from environment sensors, a soil sensor, and a CCTV camera. To implement a stand-alone system, we applied a solar cell panel so that the system can operate without an external power source. To indicate the location of the system, a Global Positioning System (GPS) module is installed. Finally, we confirmed that the FMSS monitors field conditions using various facilities and operates correctly without external support.
implementation of our system. Finally, we discuss conclusions and future work in Section 5.
2 Related Works
Environmental factors, like light, water, temperature, and soil, are the essential elements of agriculture. Several researchers have studied agricultural field monitoring. The Jet Propulsion Lab. at NASA studied solar-powered environment sensors to monitor the temperature, humidity, and oxygen of the environment and soil, applying Sensor Web 3.1 with low power and small size [5]. The Phytech Co. in Israel developed a plant growth monitoring system; this system measures the environment status using sensors attached to plants and sends the information to the farmer's home via the Internet [6].
Fig. 1. A plant growth monitoring system of the Phytech Co.
Tokyo Univ. in Japan monitored several farmlands in Asia, collecting environmental data and soil information from the farmlands using a stand-alone system [7]. The above studies do not consider communication between systems or sensors, or solar power, and they collect environmental data at only one spot in the field. In a large-scale agricultural field, we need to collect environmental data at several spots. Our system adopts a Ubiquitous Sensor Network (USN) and a solar battery. We also develop the FMSS as a stand-alone system integrating sensors, a CCTV camera, a solar cell, a database, a web server, and a GPS module.
3 Field Monitoring Server System (FMSS) Architecture
The Field Monitoring Server System (FMSS) collects real-time environmental field data from various sensors in the physical layer and implements agriculture application services, including real-time monitoring, at the higher application layer. Figure 2 shows the architecture of the FMSS.
Fig. 2. Field Monitoring Server System’s architecture consisting of three layers
Our suggested FMSS consists of three layers; each layer and its components are explained in detail as follows. The physical layer includes various sensors, a GPS module, a CCTV camera, and a solar cell. The system reduces electric power consumption by using a low-power embedded board consisting of a CPU, AD converter, DA converter, Ethernet controller, and wireless LAN; the system also includes a database and web server. Our system is a self-charging stand-alone system using solar-electric power. The middle layer has the sensor manager, the location manager, the motion manager, the information storage, and the web server. The sensor manager manages the information from the soil sensor and environment sensors. The location manager interacts with the GPS module at the physical layer, storing and managing the location data of the system. The motion manager provides stream data of the field status for users. The information storage stores the information from the physical devices in the database. The web server provides the environment information from the physical devices to users via the Internet. The application layer provides users with the environment monitoring service, the location monitoring service, and the motion monitoring service. These three layers are integrated into the FMSS; by interacting with each other, they provide field environment information to farmers. We implemented the Ubiquitous Field Server System (UFSS) in previous work [8]. Compared with the UFSS, the FMSS improves power consumption, mobility, and user-friendly environment monitoring. We also performed a field test in a yard to verify the system's executability.
C.-S. Shin et al.
3.1 Environment Monitoring Service
The environment monitoring service shows data collected from the soil and environment sensors in the physical layer. First, this service sends the raw data from the environment sensors to the sensor manager. The raw data are the temperature, humidity, soil Electrical Conductivity (EC) ratio, CO2, and illumination of the field. The sensor manager then changes the raw data into digital information and stores the information in the information storage. The web server shows this environment information to users. Figure 3 shows the procedure of the environment monitoring service.
Fig. 3. Procedure of the environment monitoring service
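The data flow of this service can be sketched as follows; the calibration function, its scale factors, and the sensor names are illustrative assumptions, not values from the paper:

```python
def convert(sensor, raw):
    # Illustrative calibration: map a raw 10-bit ADC count (0-1023) to an
    # engineering value; the scale factors here are assumptions.
    scale = {"temperature": 100.0, "humidity": 100.0, "co2": 5000.0}
    return raw / 1023.0 * scale.get(sensor, 1.0)

def sensor_manager(raw_readings, storage):
    # The sensor manager changes raw sensor data into digital information
    # and stores it in the information storage; the web server then reads
    # the storage to show the information to users.
    for sensor, raw in raw_readings.items():
        storage[sensor] = convert(sensor, raw)
    return storage
```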
3.2 Location Monitoring Service
This service monitors the system's location in the field. First, the GPS module sends the system's location data to the location manager. The location manager then stores the location data in the information storage, and the web server provides the location of the system to users. Figure 4 explains the procedure of this location monitoring service.
Fig. 4. Procedure of the location monitoring service
3.3 Motion Monitoring Service
This service provides motion data from the CCTV camera to users. First, the CCTV camera sends stream data to the motion manager. The motion manager stores the stream data in the information storage, and the web server shows the stream data to users via the Internet. Figure 5 describes the procedure of the motion monitoring service.
Fig. 5. Procedure of the motion monitoring service
4 Development of the FMSS
In this chapter, we develop the FMSS. Figure 6 shows the whole system model. The FMSS consists of autonomous systems: each system has a solar cell, a storage battery, and a low-power embedded board. The system stores electric power in the daytime and uses it at night.
Fig. 6. FMSS model including an embedded board, a GPS, solar-charging devices and sensors
4.1 System Components
This system consists of physical devices and software modules. We explained the software modules in Chapter 3. The physical devices are sensing and information-gathering devices; the devices attached to our system are shown in Figure 7.
Fig. 7. Solar cell, soil sensor, network sensor node, GPS module and CCTV camera in the FMSS

Table 1. Power consumption of each module and power supply of the solar battery

Power consumption:
  Module               Voltage    Current    Power
  GPS Receiver         DC 5V      0.2A       1W
  Embedded Board       DC 9V      500mA      5W
  CCTV Camera          DC 9V      400mA      4W
  Soil Sensor          DC 5V      10mA       0.05W
  Environment Sensor   DC 3V      2.3A       6.9W
  TOTAL                DC 12V     1.4A       16.95W

Supply power:
  Module       Voltage     Current / Capacity    Power
  Solar Cell   DC 26.4V    7.6A                  200W
  Battery      DC 12V      64A (20HR)
Table 1 shows the power consumption of the equipped modules and the solar power supply in the FMSS. The total power consumption of the equipped modules (GPS receiver, embedded board, CCTV camera, soil sensor, and environment sensor) is 16.95W. The solar cell supplies a maximum of 200W of electric power in the 25°C test environment, which is enough to operate the system. Figure 8 shows the main system with the sensors' data receiver, database, and web server, and Figure 9 shows the solar battery. Integrating the above components into the system, Figure 10 shows the FMSS prototype including the software modules. The FMSS can be applied to various environments such as precision agriculture, livestock monitoring, and greenhouse monitoring. For the field test, we deployed the system in a wild field.
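The totals in Table 1 can be re-checked with a few lines; the wattages below are the table's listed per-module values:

```python
# Per-module power consumption as listed in Table 1 (watts)
modules = {
    "GPS receiver": 1.0,
    "Embedded board": 5.0,
    "CCTV camera": 4.0,
    "Soil sensor": 0.05,
    "Environment sensor": 6.9,
}
total_w = sum(modules.values())     # matches the 16.95 W TOTAL row
solar_max_w = 200.0                 # maximum solar cell supply at 25 degrees C
headroom_w = solar_max_w - total_w  # ample margin left for charging the battery
```

With the listed values, total_w sums to the table's 16.95 W, well below the 200 W maximum supply.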
Fig. 8. Embedded board including environment sensor receiver, database, and the web server
Fig. 9. Soil sensor receiver, solar battery
Fig. 10. Prototype of the FMSS deployed in a wild field
4.2 Implementation Results
Figure 11 shows the results of the FMSS's GUI from the web server. Part (a) of the figure presents the real-time motion from the CCTV camera. Part (b) shows the location of the current system on a GIS map. We used the GPS data to map
the location. Parts (c), (d), and (e) of Figure 11 show the sensing values from the soil sensor, the sensing values from the environmental sensors, and the average temperature, respectively.
Fig. 11. A GUI for the FMSS’s application
Figure 12 shows the position of the FMSS. We can confirm the system's location or movement through the location monitoring service of the FMSS.
Fig. 12. Location of the FMSS on the map
To confirm the successful operation of the FMSS using the solar cell, we performed a field test on a sunny day with a mean temperature of 25°C. Figure 13 shows a
graph of the power consumption results from the field test. As a final result, a solar cell charged for about 10 hours can support the operation of the FMSS for about 24 hours. Hence, our FMSS can operate on solar power in the field without a wired link or an additional recharging process.
Fig. 13. Comparing estimated lifetime with actual lifetime of the system
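A back-of-envelope calculation is consistent with this result; the sketch below reads the battery row of Table 1 as a 64 Ah capacity at the 20-hour rate and assumes ideal, full-depth discharge, which is optimistic:

```python
battery_wh = 12 * 64    # nominal battery energy: 12 V x 64 Ah = 768 Wh
load_w = 16.95          # total module consumption from Table 1 (watts)
ideal_runtime_h = battery_wh / load_w  # nominal runtime at full load
```

This yields roughly 45 h of nominal stored energy at full load, so the observed 24 h of operation is plausible once real-world conversion losses and limited depth of discharge are accounted for.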
5 Conclusions
This paper proposed the Field Monitoring Server System (FMSS), which can collect and monitor the environmental information from a given field and the system's location. To verify the executability of our system, we implemented the FMSS prototype and showed the execution results of the system. From these results, we confirmed that the FMSS monitors field conditions using various facilities and operates correctly without external support. The FMSS also improves power consumption, mobility, and user-friendly environment monitoring. The FMSS can be a powerful system for solving fundamental problems in large-scale agricultural areas. In the future, we will develop an improved monitoring system which operates over CDMA or other USN technologies and apply it to reference point control in the GIS field.
Acknowledgements. This research was supported by the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Advancement) (IITA-2009-(C1090-0902-0047)).
References
1. Akyildiz, I.F., et al.: A Survey on Sensor Networks. IEEE Communications Magazine 40(8) (2002)
2. Burrell, J., Brooke, T., Beckwith, R.: Vineyard computing: sensor networks in agricultural production. IEEE Pervasive Computing 3(1), 38–45 (2004)
3. Fountas, S., Pedersen, S.M., Blackmore, S.: ICT in Precision Agriculture – diffusion of technology. In: ICT in agriculture: perspective of technological innovation (2005)
4. Tilman, D., Cassman, K.G., Matson, P.A., Naylor, R., Polasky, S.: Agricultural sustainability and intensive production practices. Nature 418, 671–677 (2002)
5. Delin, K.A., Jackson, S.P., Burleigh, S.C., Johnson, D.W., Woodrow, R.R., Britton, J.T.: The JPL Sensor Webs Project: Fielded Technology. In: Space Mission Challenges for IT Proceedings. Annual Conference Series, pp. 337–341 (2003)
6. http://www.phytech.com/products/phytalk_system.html
7. Mizoguchi, M., Mitsuishi, S., Ito, T., Ninomiya, S., Hirafuji, M., Fukatsu, T., Kiura, T., Tanaka, K., Toritani, H., Hamada, H., Honda, K.: Real-time Monitoring of Farmland in Asia using Field Server. In: International Symposium on Geoinformatics for Spatial Infrastructure Development in Earth and Allied Sciences (October 2008)
8. Shin, C.S., Joo, S.C., Lee, Y.W., Sim, C.B., Yoe, H.: An Implementation of Ubiquitous Field Server System Using Solar Energy Based on Wireless Sensor Networks. Studies in Computational Intelligence 209 (2009)
9. Yoe, H., Eom, K.-b.: Design of Energy Efficient Routing Method for Ubiquitous Green Houses. In: 1st International Conference on Hybrid Information Technology (November 2006)
10. Shin, C.S., Kang, M.S., Jeong, C.W., Joo, S.C.: TMO-Based Object Group Framework for Supporting Distributed Object Management and Real-Time Services. In: Zhou, X., Xu, M., Jähnichen, S., Cao, J. (eds.) APPT 2003. LNCS, vol. 2834, pp. 525–535. Springer, Heidelberg (2003)
11. Shin, C.S., Lee, Y.W., Lee, M.H., Park, J.W., Yoe, H.: Design of Ubiquitous Glass Green Houses. In: Software Technologies for Future Dependable Distributed Systems, pp. 169–172 (March 2009)
12. Kang, B.J., Park, D.H., Cho, K.R., Shin, C.S., Cho, S.E., Park, J.W.: A Study on the Greenhouse Auto Control System based on Wireless Sensor Network. In: International Conference on Security Technology, December 2008, pp. 41–44 (2008)
13. Lee, M.H., Be, K., Kang, H.J., Shin, C.S., Yoe, H.: Design and Implementation of Wireless Sensor Network for Ubiquitous Glass Houses. In: 7th IEEE/ACIS International Conference on Computer and Information Science, May 2008, pp. 397–400 (2008)
On-Line Model Checking as Operating System Service

Franz J. Rammig, Yuhong Zhao, and Sufyan Samara

Heinz Nixdorf Institute, University of Paderborn
Fürstenallee 11, D-33102 Paderborn, Germany
[email protected]
Abstract. A complementary verification method for real-time applications with a dynamic task structure has been developed. Here the real-time application is developed by means of Model-Driven Engineering. The basic verification technique is model checking. However, the model checking is executed at run-time whenever some reconfiguration of the task set takes place. Instead of exploring the entire state space of the model to be checked, only a partial state space at the model level covering the execution trace of the checked task is explored. This on-line model checking can be seen as an extension of the traditional schedulability acceptance test, which is needed anyway in systems with a dynamic task set. Therefore this runtime verification is implemented as a service of the underlying operating system. In this paper we describe the method in general, explain some design and implementation decisions, and provide experimental results.

Keywords: On-line model checking, Verification service, Real-time operating system.
1 Introduction
Real-time applications are safety critical in many cases. A careful quality assurance process therefore is mandatory. This process includes more and more formal verification techniques like model checking. Model checking has the advantage of being fully automated and inherently includes means for diagnosis in case of errors. On the other hand, model checking is substantially confronted with the so called state explosion problem. This means that the state space to be explored grows very quickly to an unmanageable size whenever problems of practical relevance are to be handled. Numerous approaches to overcome this deficiency have been developed, like partial order reduction [1], compositional reasoning [2], and other simplification and abstraction techniques, which aim to reduce the state space to be explored by over-approximation [3] or under-approximation
This work is developed in the course of the Collaborative Research Center 614 Self-Optimizing Concepts and Structures in Mechanical Engineering - Paderborn University, and is published on its behalf and funded by the Deutsche Forschungsgemeinschaft (DFG).
S. Lee and P. Narasimhan (Eds.): SEUS 2009, LNCS 5860, pp. 131–143, 2009. c IFIP International Federation for Information Processing 2009
[4] techniques. Over-approximation techniques generate an abstract model by adding redundant behaviors to the original one (weakening constraints), such that correctness at the abstract level implies correctness of the original model. Under-approximation techniques generate an abstract model by removing irrelevant behaviors from the original one (strengthening constraints), so that falseness at the abstract level implies falseness of the original model. Applying these techniques can relieve the state explosion problem to some degree, but cannot resolve it totally. That is, the correctness of a complex system with respect to some properties cannot always be verified completely. In this paper we propose a complementary technique, namely on-line model checking (or model-based runtime verification) [5,6]. Deferring formal verification to the execution phase of a real-time application may seem a strange idea, especially in real-time computing, where one prefers to execute off-line as many activities related to a task as possible. However, we are looking at real-time applications with a highly dynamic task set. For a software system with self-adaptive capability, the task set consists of instances that are activated under various profiles; the actual environmental conditions determine which profile is used, i.e., which tasks are activated. In such real-time applications with dynamic task sets, an acceptance test concerning schedulability has to be executed whenever a new task is added to the task set. It seems natural to extend this acceptance test with a logical safety test, which may be implemented by means of model checking. But then we would be confronted with the state explosion problem again, now even under real-time constraints.
To make on-line model checking feasible, we suppose that the real-time application is developed by means of Model-Driven Engineering (MDE) [7], which is an efficient software engineering approach to complex systems development. According to MDE, we can follow three steps to develop a software system:
1. model the system according to the system specification,
2. verify the system model against the system specification, and
3. synthesize the system implementation (source code) from the system model.
Theoretically speaking, the following assertions are supposed to be true:
– The system model is consistent with the system specification.
– The system implementation is consistent with the system model.
However, are they really true under any specific running environment? We try to answer this question by doing model checking at runtime. The basic idea (as shown in Fig. 1) is to check on-line whether the monitored execution trace of the system
Fig. 1. On-line model checking framework
Fig. 2. Partial system model to be explored
conforms to the system model on the one hand, and whether a partial system model that covers the execution trace satisfies the system properties on the other hand. Here the partial system model is obtained by exploring only those states that can be reached from the current states monitored at runtime, as shown in Fig. 2. Intuitively, if this partial system model is checked safe against the system specification, and the monitored states conform to the corresponding states in the partial system model, then we have more confidence in the correctness of the actual execution trace; it does not matter even if the rest of the system model still contains errors. Of course, sophisticated techniques have to be used to make this really work, which will be detailed in the sequel. In this way, we obtain a natural solution to the state explosion problem: instead of looking at the entire state space, we pay attention only to a partial state space covering the execution trace. As a result, we do not need to simplify or abstract the value domains of system variables at all. It is worth mentioning that off-line model checking is usually valid only under the assumption that the platform on which the real-time application runs behaves correctly; this assumption is no longer needed for on-line model checking. Commonly used run-time services are usually provided by the underlying operating system, and this is exactly our approach: we provide on-line model checking as a service of the underlying Real-time Operating System (RTOS). The verification service is implemented as an isolated task in user space. This isolates model checking from the task to be verified and makes sure that errors in the task cannot infect the verification service. To enhance efficiency, the verification service runs in its own address space, which is attached to the kernel address space. The address space of the application is mapped into this verification address space as a "read only" partition. This avoids cache refilling on context switches between the verification service and the task to be checked, and allows the verification service fast access to the task's state variables.
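The partial exploration sketched in Fig. 2 amounts to a bounded search from the monitored current state. A minimal sketch in Python (the successor-map representation of the model and the safety predicate are illustrative assumptions):

```python
from collections import deque

def check_partial(model, current_state, safe, depth):
    """Explore only the states reachable from the monitored current state,
    up to `depth` transitions, checking a safety predicate on each visited
    state. `model` maps a state to its successors. Returns (ok, witness)."""
    visited = {current_state}
    queue = deque([(current_state, 0)])
    while queue:
        s, d = queue.popleft()
        if not safe(s):
            return False, s      # unsafe state reachable within the bound
        if d == depth:
            continue             # do not explore beyond the look-ahead bound
        for t in model.get(s, ()):
            if t not in visited:
                visited.add(t)
                queue.append((t, d + 1))
    return True, None            # partial state space is safe up to `depth`
```

As the monitored current state advances, the exploration front advances with it, so the full state space is never built.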
State-of-the-art runtime verification has been discussed in the literature for years (see Section 4). The basic idea is to monitor the execution of the source code and to check the so-far-observed execution trace against the given properties, usually specified as LTL formulas. The checking progress always falls behind the system execution, because the checking procedure can continue only after a new state has been observed. In contrast, our runtime verification is applied at the model level. The states observed from the execution trace are mainly used to reduce the state space to be explored at the model level. That is, the checking progress is not strictly bound to the progress of the system execution; our on-line model checking might run ahead of or behind the execution of the source code. If the processing speed is fast enough, our runtime verification can keep looking a certain number of time steps ahead of the system execution and then tell the real-time application how many time steps ahead are safe.
2 Problem Statement
Without loss of generality, let M = {M1, M2, ..., Mn} model a real-time reconfigurable system which consists of n (> 0) components M1, M2, ..., Mn running in parallel. M may reconfigure itself at runtime either by adding a new component Mi to M or by removing an existing component Mi from M. This also includes replacing one component with another, as shown in Fig. 3, which can be done by consecutive removing and adding operations. The components in M can communicate with each other only through the underlying RTOS. This forms a dependency relationship between the components in M. Without doubt, system reconfiguration might more or less affect the behavior of the related components in the system. What is more, the impact of the RTOS on the inter-process communication might also affect the behavior of the related components. For instance, component B might be affected most by replacing component C with component E in Fig. 3. Could these effects violate some safety conditions associated with the related components in the system?
Fig. 3. A reconfiguration example
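Since all inter-component communication passes through the RTOS, the RTOS can determine which components a reconfiguration may affect. A minimal sketch (the component names and the dependency-map representation are illustrative):

```python
def affected_components(dependencies, removed, added):
    """Return the components whose behavior may be affected by replacing
    `removed` with `added`. `dependencies` maps each component to the set
    of components it communicates with via the RTOS."""
    affected = set()
    for comp, deps in dependencies.items():
        if removed in deps or added in deps:
            affected.add(comp)
    return affected
```

For the example in Fig. 3, a component B that depends on C would be reported as affected when C is replaced by E.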
Since the reconfiguration might occur according to the actual running environment, it is hard to answer this question by off-line verification techniques alone, due to unpredictable factors. Therefore, it is necessary to check on-line, at the model level, whether the most affected component in M still maintains safety after the system reconfiguration. In doing so, we suppose that whenever the real-time application needs to reconfigure itself, the RTOS is informed about this in advance. As any modification of the task set can happen only under the control of the RTOS, this requirement is reasonable. With the information given by the real-time application, the RTOS triggers the verification service as an isolated task in user space and then schedules the verification service as early as possible before the component to be checked, without violating the real-time deadlines. To achieve this, we follow a deterministic approach by reserving a fixed time slot at the beginning of each scheduling cycle of the RTOS. This time slot is mainly reserved for the verification service. If there is no active on-line model checking task, the scheduler is allowed to allocate this slot to preemptive low-priority tasks, which can be moved and replaced by on-line model checking at any time the verification service is triggered. In this way, if the checking process is efficient enough, we can always check at the model level what might happen in the near future relative to the current state of the component's execution. In case an error is detected, or the checking progress falls behind the execution of the checked task, the real-time application is informed in order to allow it to undertake appropriate countermeasures. The properties to be checked are safety conditions that might be sensitive to the context of the related component. LTL (or ACTL) formulas are used to formally specify the safety properties, as the discrete-time extensions of LTL (or ACTL) formulas are just shorthand notations for the usual LTL (or ACTL) formulas [8].
3 On-Line Model Checking

3.1 Overview
As mentioned in Section 1, we suppose that the real-time application is developed following the MDE approach. In this way, we can model the system in UML with real-time extension1 on the one hand and specify constraints in OCL with real-time extension on the other hand. From the real-time UML model, we can derive an FSM model and synthesize source code, respectively. Since the FSM model and the source code come from the same origin, there exists a mapping function σ from concrete states (derived from the source code) to abstract states (derived from the FSM model). From the real-time OCL constraints, we can derive the LTL (or ACTL) formulas and then transform them into Büchi automata. Having the concrete model (source code), the abstract model (FSM model), and the properties (Büchi automata) at hand, our on-line model checking aims to check
1 http://wwwcs.uni-paderborn.de/cs/fujaba/
Fig. 4. Overview
if the execution trace conforms to the FSM model (consistency checking) and meanwhile if a partial state space of the FSM model conforms to the Büchi automaton (safety checking), as shown in Fig. 4. Here the partial state space reflects the near future relative to the current state observed from the execution trace of the running system.

3.2 Model Checking Paradigm
Recall that we have reserved a fixed time slot at the beginning of each scheduling cycle for on-line model checking. Without loss of generality, let the time slot be td time units. After the verification service is triggered, in each scheduling cycle we have td time units to do on-line model checking from the current state of the task to be checked, which was obtained in the previous scheduling cycle, as shown in Fig. 5. Of course, the current (concrete) state must first be mapped to the corresponding abstract state at the model level before it can be used by model checking. If the current state cannot be mapped to an appropriate abstract state, the execution trace no longer conforms to the behavioral model; in this case, the verification service terminates the checking process and informs the RTOS to deal with the problem. Otherwise, the on-line model checking continues until one of the following two cases happens:

Case No: if at some time point an error is detected, the verification service terminates with the answer No to the real-time application via the RTOS.

Case Yes: if a sufficient partial state space that covers the execution trace of the task is successfully checked, the verification service reports a definite Yes to the real-time application via the RTOS and then terminates the safety checking process (while the consistency checking can continue if necessary).

Notice that the "No" case only means that the detected errors might happen in the future, because we check at the model level and thus do not know whether
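One step of this mapping and consistency check can be sketched as follows; σ is represented here as a plain dictionary from concrete to abstract states, and the FSM model as a successor map, both of which are illustrative representations:

```python
def consistency_step(sigma, abstract_successors, prev_abs, concrete_state):
    """One consistency-checking step: map the monitored concrete state to
    an abstract state via sigma and verify it is a legal successor of the
    previously matched abstract state. Returns the new abstract state, or
    None if the execution trace no longer conforms to the FSM model."""
    abs_state = sigma.get(concrete_state)
    if abs_state is None:
        return None              # no abstract counterpart: inform the RTOS
    if prev_abs is not None and abs_state not in abstract_successors[prev_abs]:
        return None              # not a transition allowed by the FSM model
    return abs_state
```

Each returned abstract state then serves as the starting point for exploring the partial state space of the next look-ahead window.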
Fig. 5. Scheduling of verification service and task to be checked
the errors are spurious or not. To prevent the errors from really happening, we conservatively choose to inform the real-time application that an error might emerge in the future. That is, the RTOS might raise an exception together with a counterexample (if necessary). How to handle the exception is application-domain specific, so we do not discuss it here. The implementation of a component is in fact a refinement of the model of the component, i.e., the model is an abstraction of the implementation of the component. Thus, an ACTL/LTL formula being true at the model level implies that it is also true at the implementation level, while its being false at the model level does not imply that it is also false at the implementation level. In this sense, our runtime verification is conservative due to its being applied at the model level. However, the advantage of predicting and thus avoiding potential errors is gained precisely because it is applied at the model level. Experimental Results. A stand-alone prototype for on-line model checking of invariants, LTL and ACTL properties has been implemented. We have done some experiments on the BEEM2 benchmark set, derived from mutual exclusion algorithms, communication protocols and other examples from research and industry. The benchmark set contains only FSM models, so we randomly generate the execution traces from the same FSM models. This simplifies the monitoring procedure of capturing the runtime information (current states) to be used by on-line model checking. In this way, we can estimate the performance of our verification service to some degree. Two experiments were done on a Pentium-IV 3.00 GHz processor with 1 GB memory running Linux. One experiment is on-line invariant checking. This experiment helps find out the influence of the out-degrees of the states on the look-ahead performance, i.e., how far away the model checking can look ahead from each state of the given model within a predefined time interval.
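The per-cycle checking paradigm described above can be sketched as follows. This is a hedged, illustrative reconstruction, not the authors' implementation: the names `check_cycle`, `abstraction` (the mapping from concrete to abstract states) and `bad_states` are our assumptions, and the model is a plain successor map.

```python
import time

def check_cycle(model, abstraction, concrete_state, bad_states, td_seconds):
    """Run one on-line model-checking slot of at most td_seconds.

    Returns "inconsistent" if the trace no longer conforms to the model,
    "no" if a (potential) error is detected in the explored partial state
    space, "yes" if the partial state space was exhausted without errors,
    or "undecided" if the time budget ran out first.
    """
    start = time.monotonic()
    current = abstraction.get(concrete_state)
    if current is None:                      # trace does not conform to the model
        return "inconsistent"
    frontier, visited = [current], {current}
    while frontier:
        if time.monotonic() - start > td_seconds:
            return "undecided"               # resume from here in the next cycle
        state = frontier.pop()
        if state in bad_states:              # safety violation found at model level
            return "no"
        for succ in model.get(state, []):
            if succ not in visited:
                visited.add(succ)
                frontier.append(succ)
    return "yes"                             # partial state space fully checked
```

A real service would additionally carry the `visited`/`frontier` sets across scheduling cycles instead of restarting the exploration each time.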
16 typical models were selected from the BEEM benchmark set to perform on-line invariant checking. The features of these models are given in terms of the number of states, the number of transitions, the average degree of the states, the height of BFS, and the maximal stack of DFS, as well as the number of Boolean (state) variables. In this experiment, each transition in the models is set to represent 1 millisecond, i.e., it takes 1 millisecond to move from one state to the next. We also refer to one transition as one time step. For each model, this
2 http://anna.fi.muni.cz/models/
F.J. Rammig, Y. Zhao, and S. Samara
| Model | Type | States | Transitions | Avg. Degree | Max. Out-degree | BFS Height | Max. DFS Stack | Boolean Variables | Transition Time Unit (ms) | Min. Look-ahead | Max. Look-ahead | Avg. Look-ahead |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sorter_1 | Controller | 20544 | 30697 | 1.5 | 5 | 198 | 617 | 36 | 1 | 40 | 299 | 103 |
| collision_1 | Communication protocol | 5593 | 10792 | 1.9 | 5 | 57 | 617 | 25 | 1 | 26 | 81 | 48.7 |
| synapse_2 | Protocol | 61048 | 125334 | 2.1 | 18 | 41 | 2349 | 46 | 1 | 7 | 28 | 21.5 |
| driving_phils_2 | Mutual exclusion algorithm | 33173 | 81854 | 2.5 | 9 | 150 | 3702 | 27 | 1 | 31 | 97 | 65.7 |
| blocks_1 | Planning and scheduling | 7057 | 18552 | 2.6 | 6 | 19 | 4263 | 23 | 1 | 8 | 21 | 14 |
| peterson_1 | Mutual exclusion algorithm | 12498 | 33369 | 2.7 | 5 | 54 | 1862 | 30 | 1 | 13 | 39 | 31.7 |
| szymanski_1 | Mutual exclusion algorithm | 20264 | 56701 | 2.8 | 3 | 72 | 2064 | 27 | 1 | 13 | 90 | 49.7 |
| hanoi_1 | Puzzle | 6561 | 19680 | 3 | 3 | 256 | 4376 | 36 | 1 | 56 | 103 | 75.9 |
| iprotocol_2 | Communication protocol | 29994 | 100489 | 3.4 | 7 | 91 | 443 | 39 | 1 | 18 | 451 | 50 |
| phils_3 | Mutual exclusion algorithm | 729 | 2916 | 4 | 6 | 17 | 518 | 18 | 1 | 156 | 357 | 265 |
| cyclic_scheduler_1 | Protocol | 4606 | 20480 | 4.4 | 8 | 55 | 1819 | 40 | 1 | 23 | 437 | 278 |
| rushhour_1 | Puzzle | 1048 | 5446 | 5.2 | 9 | 73 | 535 | 28 | 1 | 66 | 248 | 150.7 |
| rushhour_2 | Puzzle | 2242 | 12603 | 5.6 | 10 | 80 | 906 | 32 | 1 | 36 | 408 | 116.4 |
| pouring_1 | Puzzle | 503 | 4481 | 8.9 | 9 | 13 | 348 | 16 | 1 | 42 | 101 | 71.9 |
| reader_writer_2 | Protocol | 4104 | 49190 | 12 | 19 | 13 | 4097 | 25 | 1 | 4 | 16 | 9.9 |
| pouring_2 | Puzzle | 51624 | 1232712 | 23.9 | 25 | 15 | 44509 | 18 | 1 | 1 | 4 | 2 |
Fig. 6. Experimental result of on-line invariant checking
experiment is designed to compute how many time steps model checking can look ahead from each state in the model within one time step (i.e., 1 ms). So the invariant to be checked is a Boolean formula derived from the set of states in each model. The experimental results in Fig. 6 show the minimal, the maximal and the average look-ahead from the states of each model. It is easy to see that the maximal out-degree of a model has a larger influence on look-ahead performance than the average degree of the model. The other experiment is on-line LTL model checking. The model driving_phils_2 is derived from a mutual exclusion algorithm of processes accessing several resources, motivated by "The Driving Philosophers" in [9]. The property to be checked is G(ac0 → F gr0), where the proposition ac0 denotes that process 0 requests a resource and the proposition gr0 denotes that the resource is granted to process 0. In other words, if process 0 requests a resource, it will eventually be granted. The experimental result in Fig. 7 is obtained by setting td = 5 ms and running 2000 scheduling cycles. That is, at each scheduling cycle the verification service is allocated 5 ms to perform on-line model checking. The property is not violated at least up to these 2000 checking rounds. Fortunately, the verification service can always run enough time steps ahead of the simulated execution of this model. The minimal look-ahead is 23 time steps, the maximal look-ahead is 74 time steps and the average look-ahead is 57.2 time steps relative to the corresponding current states monitored from the randomly generated execution trace. Compared to the usual (off-line) model checking, our on-line model checking can reduce the state space to be explored by using the monitored states obtained while the system is running. From this point of view, the computational complexity of on-line model checking is less than that of traditional model checking.
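The look-ahead measurement can be reconstructed as a time-bounded breadth-first exploration, where one BFS level corresponds to one time step. This sketch is an assumption about how such a measurement could be implemented; the function name and data layout are ours, not the authors'.

```python
import time

def look_ahead(model, start, budget_seconds):
    """Count how many time steps (BFS levels) can be explored from `start`
    within the given time budget. One level = one transition = one time step."""
    deadline = time.monotonic() + budget_seconds
    level, visited = {start}, {start}
    steps = 0
    while level and time.monotonic() < deadline:
        nxt = set()
        for state in level:
            for succ in model.get(state, []):
                if succ not in visited:
                    visited.add(succ)
                    nxt.add(succ)
        if not nxt:
            break                # whole reachable state space covered
        level = nxt
        steps += 1               # one more time step explored ahead
    return steps
```

Running this from every state of a model and taking the minimum, maximum and mean would yield the three look-ahead columns reported in Fig. 6.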
Compared to the usual runtime verification, our runtime verification checks the
Fig. 7. Experimental result of on-line LTL checking
system properties at the model level, while using the monitored states just to do consistency checking and then to shrink the state space to be explored. As a result, the computational complexity of the model-based runtime verification is greater than that of conventional runtime verification. However, if we make our model-based runtime verification look ahead only several time steps at each checking round, then its computational complexity in terms of time and memory overhead will be closer to that of state-of-the-art runtime verification. In addition, our model-based runtime verification can check more general properties specified by ACTL and/or LTL formulas, since [10] shows that the property patterns to be checked in practice are usually not very complex.
3.3 Pre-checking and Post-checking
Ideally, we wish that on-line model checking could always run enough (time) steps ahead of the execution of the task to be verified. This depends on the complexity of the behavioral model of the task as well as on the underlying hardware architecture. Therefore, we have to face the reality that the verification service might fall behind the execution of the task to be checked. As a result, we introduce two checking modes: pre-checking and post-checking. We say that the verification service is in pre-checking mode if it runs ahead of the execution of the task to be checked; otherwise, it is in post-checking mode, as shown in Fig. 8. In pre-checking mode, the verification service can naturally predict violations before they really happen. In post-checking mode, it seems that violations could only be detected after they have already happened. Fortunately, it is still possible to "predict" violations even in post-checking mode because our on-line verification works at the model level. In case an error is found at some place
Fig. 8. Pre-checking and Post-checking
other than on the monitored execution trace in the partial state space being checked, then we can "predict" that there might be an error in the model which has not happened yet. In this sense, both checking modes are useful for safety-critical systems. Notice that our on-line model checking can observe the actual execution trace of the task being checked once it falls behind. This means that only a rather small state space needs to be explored in post-checking mode. Thus, there is still a chance for the verification service to overtake the task being checked again. From this point of view, it seems as if the verification service and the task are involved in a two-player game. In the course of the game, we say that the verification service wins against the task being checked if the verification service holds the leading position for a longer time than the task does. Without doubt, we need to find an improved strategy to give the verification service a higher probability of winning against the task to be checked. Recall that the source code of the system implementation is usually validated by simulation and testing. Therefore, in the future we are going to learn some heuristic knowledge in the system testing phase so that the system model can be enriched with more useful information. This heuristic information can then guide on-line model checking to reduce the state space to be explored whenever necessary.
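The pre-/post-checking distinction and the "two-player game" reading above can be made concrete with a minimal sketch. The names (`checking_mode`, `game_winner`) and the representation of progress as integer time steps are our assumptions for illustration only.

```python
def checking_mode(verified_step, execution_step):
    """verified_step: farthest time step already checked at model level;
    execution_step: the task's current time step."""
    if verified_step > execution_step:
        return "pre-checking"    # violations can be predicted before they occur
    return "post-checking"       # only the observed trace needs exploration

def game_winner(history):
    """history: one (verified_step, execution_step) pair per scheduling cycle.
    The verification service wins if it leads in more cycles than it trails."""
    lead_cycles = sum(1 for v, e in history if v > e)
    if lead_cycles > len(history) - lead_cycles:
        return "verification service"
    return "task"
```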
4 Related Work
Unlike our on-line model checking, state-of-the-art runtime verification takes the system implementation and the system specification into account. The basic idea is to monitor the execution of the source code and afterwards check the so-far observed execution trace against the system properties, usually specified by LTL formulas. This kind of runtime verification can only do post-checking, i.e., the checking progress always falls behind the system execution because the checking procedure can continue only after a new state has been observed. Consequently, property violations are usually detected after they have already happened. Notice that even if a property is checked to be correct with this approach, it
does not imply that the monitored execution trace conforms to the system model, nor that the system model satisfies the same property as well. The former depends on the consistency between the system implementation and the system model, while the latter depends on the granularity of the system model and the property automaton to be checked. Typically, [11] presents runtime checking of the behavioral equivalence between a component implementation and its interface specification by writing the interface specification in the executable AsmL, so that one can synchronously run the interface specification and the component implementation while monitoring whether they are equivalent on the observed behaviors; [12] presents runtime certified computation, whereby an algorithm not only produces a result for a given input, but also proves by deductive reasoning that the result is correct with respect to the given input; [13] presents runtime checking of the conformance between a concurrent implementation of a data structure and a high-level executable specification with atomic operations, by first instrumenting the implementation code to extract the execution information into a log and then executing a verification thread concurrently with the implementation, using the logged information to check whether the execution conforms to the high-level specification; [14] presents monitoring-oriented programming (MOP) as a lightweight formal method to check conformance of an implementation to its specification at runtime, by first inserting specifications as annotations at various user-selected places in programs and then translating the annotations into efficient monitoring code in the same target language as the implementation during a pre-compilation stage.
Similar to MOP, Temporal Rover [15] is a commercial code generator that allows programmers to insert specifications in programs via comments and then generates from the specifications the executable verification code, which is compiled and linked as part of the application under test. In addition, Java PathExplorer (JPaX) [16] is a runtime verification environment for monitoring the execution traces of a Java program by first extracting events from the executing program and then analyzing the events via a remote observer process. Moreover, [17] extends the usual runtime verification techniques to on-line verify and steer a Discrete Event System (DES) by looking ahead into a partial system model to predict violations and then applying steering actions to prevent them. This method requires that the time delay for the DES to move from the current state to the next state be long enough that the runtime checking has sufficient time to explore a partial system model, which is generated after the current state is known. Our on-line model checking can explore the system model even before the current state is known, and then shrinks the state space after the current state is known. That is, the progress of our runtime verification is not strictly bound to the execution of the source code, i.e., it may run before or after the system execution. If the processing speed is fast enough, our runtime verification can keep running a certain number of time steps ahead of the system execution and then tell the system how many time steps ahead are safe. Also, our runtime verification can check more general properties specified by ACTL and/or LTL formulas.
5 Conclusion
On-line model checking has the potential to serve as a powerful complementary verification technique for real-time applications with dynamic task sets. It is complementary in the sense that we have to assume that the newly accepted task has been verified off-line under the assumptions necessary for such a verification. The on-line model checking can then be restricted to verifying whether the actual execution trace is correct under the real environmental conditions. As our on-line model checking dramatically reduces the state space to be verified, a much finer granularity concerning the value domains of system variables can be handled. By this, and due to the fact that a priori unknown run-time conditions can be considered as well, our run-time verification establishes an additional safety level. This method can also be seen as a complementary attempt to overcome the well-known state explosion problem of model checking. Whenever the state space is reduced, it is essential to reduce it to the states that are relevant. Our method automatically and dynamically reduces the state space to exactly those states that are relevant for the actual execution trace. The resulting verification method can be implemented as an operating system service comparable to the schedulability acceptance test that is part of any RTOS able to handle dynamic task sets. This service is triggered whenever some reconfiguration of the task set to be handled takes place. In contrast to the traditional a posteriori runtime verification methods published so far, our approach can look into the future, i.e., into a partial state space at the model level relative to the current state of the execution trace. Experimental results show that run-time model checking is possible when the approach outlined in this paper is followed. Although these experiments have so far been carried out based on simulations, there is a strong indication that systems of practical relevance can be handled as well.
References
1. Godefroid, P.: Partial-Order Methods for the Verification of Concurrent Systems. LNCS, vol. 1032. Springer, Heidelberg (1996); foreword by Wolper, P.
2. Berezin, S., Campos, S.V.A., Clarke, E.M.: Compositional reasoning in model checking. In: de Roever, W.-P., Langmaack, H., Pnueli, A. (eds.) COMPOS 1997. LNCS, vol. 1536, pp. 81–102. Springer, Heidelberg (1998)
3. Clarke, E.M., Grumberg, O., Long, D.E.: Model checking and abstraction. ACM Trans. Program. Lang. Syst. 16(5), 1512–1542 (1994)
4. Lee, W., Pardo, A., Jang, J.Y., Hachtel, G., Somenzi, F.: Tearing based automatic abstraction for CTL model checking. In: ICCAD 1996: Proceedings of the 1996 IEEE/ACM International Conference on Computer-Aided Design, Washington, DC, USA, pp. 76–81. IEEE Computer Society, Los Alamitos (1996)
5. Zhao, Y., Oberthür, S., Kardos, M., Rammig, F.J.: Model-based runtime verification framework for self-optimizing systems. Electr. Notes Theor. Comput. Sci. 144(4), 125–145 (2006)
6. Zhao, Y., Rammig, F.J.: Model-based runtime verification framework. In: Proceedings of Formal Engineering Approaches to Software Components and Architectures (FESCA 2009), York, UK (March 2009)
7. Kent, S.: Model driven engineering. In: Butler, M., Petre, L., Sere, K. (eds.) IFM 2002. LNCS, vol. 2335, pp. 286–298. Springer, Heidelberg (2002)
8. Clarke, E.M., Grumberg, O., Peled, D.A.: Model Checking. MIT Press, Cambridge (1999)
9. Baehni, S., Baldoni, R., Guerraoui, R., Pochon, B.: The driving philosophers. In: Proceedings of the 3rd IFIP International Conference on Theoretical Computer Science (TCS 2004) (2004)
10. Dwyer, M.B., Avrunin, G.S., Corbett, J.C.: Patterns in property specifications for finite-state verification. In: ICSE 1999: Proceedings of the 21st International Conference on Software Engineering, pp. 411–420. IEEE Computer Society Press, Los Alamitos (1999)
11. Barnett, M., Schulte, W.: Spying on components: A runtime verification technique. In: Leavens, G.T., Sitaraman, M., Giannakopoulou, D. (eds.) Workshop on Specification and Verification of Component-Based Systems (October 2001)
12. Arkoudas, K., Rinard, M.: Deductive runtime certification. In: Proceedings of the 2004 Workshop on Runtime Verification (RV 2004), Barcelona, Spain (April 2004)
13. Tasiran, S., Qadeer, S.: Runtime refinement checking of concurrent data structures. In: Proceedings of the 2004 Workshop on Runtime Verification (RV 2004), Barcelona, Spain (April 2004)
14. Chen, F., Rosu, G.: Towards monitoring-oriented programming: A paradigm combining specification and implementation. In: Proceedings of the 2003 Workshop on Runtime Verification (RV 2003), Boulder, Colorado, USA (2003)
15. Drusinsky, D.: The Temporal Rover and the ATG Rover. In: Havelund, K., Penix, J., Visser, W. (eds.) SPIN 2000. LNCS, vol. 1885, pp. 323–330. Springer, Heidelberg (2000)
16. Havelund, K., Rosu, G.: Java PathExplorer — a runtime verification tool. In: Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space (ISAIRAS 2001), Montreal, Canada (June 2001)
17. Easwaran, A., Kannan, S., Sokolsky, O.: Steering of discrete event systems: Control theory approach. Electr. Notes Theor. Comput. Sci. 144(4), 21–39 (2006)
Designing Highly Available Repositories for Heterogeneous Sensor Data in Open Home Automation Systems
Roberto Baldoni, Adriano Cerocchi, Giorgia Lodi, Luca Montanari, and Leonardo Querzoni
Dipartimento di Informatica e Sistemistica "A. Ruberti", Sapienza Università di Roma, Rome, Italy
{baldoni,cerocchi,lodi,montanari,querzoni}@dis.uniroma1.it
Abstract. Smart home applications are currently implemented by vendor-specific systems managing mainly a small number of homogeneous sensors and actuators. However, the sharp increase in the number of intelligent devices in a house and the foreseen explosion of the smart home application market will completely change this vendor-centric scenario towards open, expandable systems made up of a large number of cheap heterogeneous devices. As a matter of fact, new smart home solutions have to be able to tackle scalability, dynamicity and heterogeneity requirements. In this paper we present the architecture of a basic building block, namely a distributed repository service, for smart home systems. The repository stores data from heterogeneous devices deployed in the house, which can then be retrieved by context-aware applications implementing home automation functionalities. Our architecture, based on a DHT, offers a completely decentralized and reliable storage service able to support complex query functionalities.
1 Introduction
Thanks to recent progress in the areas of wired and wireless networking, sensor networks, networked appliances and embedded computing, all the enabling technologies needed to realize the vision of smart automation in home environments seem to be available. Despite this fact, currently available smart home applications are mainly represented by complex prototypes that still have a long way to go before reaching the status of commercial products. Existing applications are developed primarily with proprietary technology and seem to lack a long-term vision of evolution and interoperation. The future market for smart home applications will comprise a wide variety of devices and services from different manufacturers and developers. We must therefore achieve platform and vendor independence as well as architectural openness before smart homes become commonplace.
The work described in this paper was partially supported by the EU Projects SM4All, SOFIA and eDIANA.
Future open smart home applications will be ready for the market only if they meet the following requirements:
Scalability - current applications are limited to a small number of devices, but an open architecture would open the market to many different vendors offering a wider selection of devices, thus raising the current limit to hundreds or even thousands of devices per home;
Dynamics - while current applications are mainly based on cabled devices installed by experts, we envisage a future where new devices can be added to existing environments in a plug-and-play fashion and where wearable devices can follow users and join the home environment only when the user enters it. In this scenario, future applications must be able to tolerate or even leverage the dynamic environments where they will be required to run;
Heterogeneity - a large base of available devices offered by different vendors will clearly increase the heterogeneity of the environments, causing interoperability issues and requiring new approaches for resource sharing and scheduling;
Reliability - as users start to put confidence in smart home applications, the reliability aspects of these applications will gain more importance, requiring the definition of new techniques to guarantee their correct behaviour.
In this scenario, a fundamental building block is represented by a repository where devices (e.g., sensors, actuators, etc.) store data that can be retrieved either by the devices themselves or by some context-aware application for further processing. This paper describes the design of a scalable and reliable repository well suited to smart home applications. To this aim, the repository is implemented in a fully distributed fashion. Processes constituting the repository can be deployed on the various devices located in the house that offer sufficient computational and storage resources; TVs, smart phones, PCs, refrigerators, etc. are examples of such devices.
Processes cooperate in a peer-to-peer fashion to implement storage and query functionalities in a dynamic, scalable and reliable way. In order to accommodate the heterogeneity of the data that could be stored in the repository, these functionalities are realized through a mapping component able to efficiently store and organize pieces of data so as to facilitate their search and retrieval. The rest of this paper is organized as follows: Section 2 introduces the architecture of the repository, detailing its internal components and explaining how data can be stored in and retrieved from it; Section 3 introduces an application scenario that shows a possible usage of the repository in a realistic setting; Section 4 describes related work in this field of research and, finally, Section 5 concludes the paper.
2 Architecture of the Repository
This section is devoted to describing the architecture of our repository; however, before delving into its details, we will briefly introduce the reader to the smart home environment where our repository will be deployed.
2.1 The Smart Home Environment
Traditionally, smart home solutions (e.g., [1,5,6,2]) were provided by a single vendor, used a single, often closed, communication standard, and were expensive. In the future of this area we envisage a scenario where a single home will host a large number (up to hundreds) of devices coming from different vendors, with different hardware, communication interfaces, and operating software, that will interoperate and cooperate to offer complex services to house residents [9]. The cooperation among widely different technologies will be guaranteed through the use of middleware platforms [7] and interconnection standards [3,4] able to hide from software developers the complexities stemming from the unavoidable differences among communication standards. Devices interacting within home environments will be characterized by different capabilities and different available resources: dumb temperature/light sensors, automatic blinds, phones, light switches, home appliances, media centers, PCs, etc. Most of them will offer different kinds of services, like reading the current temperature value or showing a high-definition movie on a TV. Some of them, arguably the more powerful ones, will be able to make part of their resources available to the system to offer storage space or computing power. These resources will be used to build and offer to house inhabitants complex services that could not be offered by devices working alone in a "digitally closed" world. In the following we will assume that all these devices are able to communicate using a common middleware infrastructure. We also assume that all devices and services running in the system agree on a common schema representing the environment. This schema contains an organized description of all the devices present in the environment together with the services they offer and the state they maintain.
Such a shared schema can be queried and then used to automatically compose services offered by different devices. Figure 1 depicts a possible general schema for an environment hosting a number of heterogeneous sensor and actuator devices; note that the schema defines a hierarchy of elements through IS-A relationships; every element can contain a set of attributes used to describe its state (for example, a temperature sensor could have an attribute "current temperature"). The schema also describes the types and admitted values for every attribute it defines. The schema can be represented within the architecture using a markup language like OWL
Fig. 1. General descriptive schema of the environment
[8]. In this paper we assume that the schema is given and static, and that it is known by all devices in the system. Devices produce data represented by XML documents whose format respects the schema; we can imagine a document as a set of nested tags completely specifying the device where the data originated and the content of the data itself. Using a common format to describe data is fundamental to support operations among heterogeneous sources.
2.2 Architecture Overview
In this section we present the architecture of our repository service. The service is provided by a distributed set of processes, all running the same software component, that cooperate to provide the required functionalities: (i) storage of data provided by devices and (ii) retrieval of data matching queries possibly issued by devices or other software components. These processes can exchange messages using the communication primitives provided by the underlying middleware; here we do not make any specific assumptions on these primitives, as a simple TCP/IP-like channel would perfectly suit our needs. Figure 2 depicts the internal architecture of the repository service. It consists of three main components: the Mapper, a hash function and a key-value storage. Arrows in the picture depict data as it flows through the repository; the upper half of the picture shows what happens when an external component queries the repository for some data, while the bottom half shows the repository actions when some data provided by a device must be stored. Data provided to the repository is stored in a key-value storage component. This component provides a simple interface: a store(key,data) primitive that is used to store
Lookup keys
Results
Mapper Data
Mapped Data
Hash Data storage key
Key-Value storage
Repository
DATA
STORE
store(key,data)
RETRIEVE
mappings
lookup(key)
Query
XML data Storage key
Fig. 2. Architecture of the repository. The figure also shows main operations that can be executed on it: data storage (bottom half) and retrieval (top half).
a document data with an identifier key, and a lookup(key) primitive that returns all data associated to identifier key. Keys are usually represented by strings of bits. A key-value storage represents a very simple and flexible method to store large amounts of data; thanks to its simplicity, it can easily be distributed to improve its scalability. However, the interface it provides limits users to exact-match-like data searches. This severely reduces the expressiveness of queries that can be issued to the repository; in this way it would be impossible to issue a query like "retrieve all data from devices that sensed a temperature higher than 21°C in the last three hours". Such complex queries are quite common in smart home scenarios, because knowledge about a specific service context must often be built from scratch without retrieving the complete environmental context (e.g., I do not want to know the temperature in every room of the house if the service will be delivered only in the kitchen). To increase the flexibility of queries issued to the repository, we introduce the Mapper component, whose goal is to decouple interactions with the key-value storage by automatically mapping complex queries to sets of keys. Mappings must be defined such that, if some data generated by a device matches a query, then the data and the query are mapped to two sets of keys with non-empty intersection. This intersection property guarantees the correct behaviour of the data retrieval functionality. Mapping is realized by transforming original data and queries into their mapped versions and then hashing such documents to obtain keys that can be used to access the key-value storage.
2.3 The Key-Value Storage
This functional component can be embodied by any service able to store data associated to an identifier key. Many different technologies can be used to implement this service.
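The store/lookup interface described above can be illustrated with a trivial in-memory stand-in; this is a sketch under the assumption of a single local process, whereas the paper's actual service distributes the same interface over a DHT.

```python
from collections import defaultdict

class KeyValueStorage:
    """Minimal illustration of the store(key,data)/lookup(key) interface."""

    def __init__(self):
        self._table = defaultdict(list)

    def store(self, key, data):
        # Several documents may share a key, so we append rather than overwrite.
        self._table[key].append(data)

    def lookup(self, key):
        # Returns all data associated to the identifier key (possibly empty).
        return list(self._table[key])
```

Note the exact-match limitation discussed above: `lookup` can only answer for one concrete key, which is why the Mapper is needed to translate range queries into sets of keys.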
Given the need to run the repository on a possibly large number of devices that can be added or removed at runtime, we advocate the usage of a Distributed Hash Table (DHT). DHTs implement a simple key-value storage as a completely distributed and self-organizing system of processes running on different hosts. A DHT is able to change its composition at runtime by allowing new processes to join the storage system or by gracefully removing nodes that silently left it. Data stored within the DHT is automatically moved and replicated to adapt to the new system configuration and to resist unexpected changes. Moreover, thanks to their completely decentralized functionalities, DHTs are able to scale to a large number of processes and fairly balance the load among them. Past research on DHTs mainly focused on developing systems for large-scale wide area network settings [12]; however, given their nice properties with respect to reliability, scalability and the ability to autonomously react to changes in the system, we think DHTs could perfectly fit our smart home scenario.
2.4 The Mapper
Each query issued to the repository is an XML document with the same format used by devices to store their data, but where attributes can contain data or complex operators. These operators can be simple comparison operators (>, =, <) for numerical values, regular expressions for strings, or a composite construct with various constraints. When a query reaches the Mapper component, these complex constraints expressed on various attributes are
Designing Highly Available Repositories for Heterogeneous Sensor Data
used to derive from the original query a set of mapped queries; each of these mapped queries has a different value specified for every attribute. The translation from an attribute constraint to one or more specified values is executed using a mapping for the attribute. Mappings for the various attributes are retrieved from a local storage inside the Mapper component. Mappings must be defined for every attribute associated to every element of the schema. A mapping is built by partitioning the set of all valid values for an attribute into subsets and electing a representative value for each of them. This operation is quite intuitive if we consider a numerical attribute with values bounded by min and max: all the values in this interval can be partitioned into contiguous intervals and the first value of each interval can be elected as the representative. For example, the following mapping can be considered for a temperature sensor with an attribute “current temperature” whose values are floating point numbers bounded by -10 and 60:

/Device/Sensor/Temperature/current temperature ⇒ [−10, 0, 10, 20, 30, 40, 50]

Mappings are strictly related to the schema describing the environment; therefore we assume that they are defined at startup time and remain unchanged at runtime. Issues related to runtime updates to mappings will be considered in future work.

2.5 Data Storage and Retrieval

Let us now detail how data produced by devices can be stored in and then retrieved from the repository. For the sake of simplicity we will first show query management. When a query is issued to the repository it is first passed to the Mapper component. Figure 3 shows how a query is managed within the Mapper component. The Mapper splits it into its components, i.e. the attributes and their constraints (phase 1). It then checks its content one attribute at a time and retrieves the corresponding mapping from the internal storage.
Constraints associated to the attribute are then matched (phase 2) against the sets of values contained in the mapping: all intervals that do not match the constraints are discarded (grey intervals in the figure). When all attributes have
Fig. 3. Query management within the Mapper component
R. Baldoni et al.
been considered, the Mapper component builds the XML documents representing the mapped queries by combining, for all the different attributes, all the representative values from intervals that have not been discarded in the previous phase (phase 3); each mapped query contains, for each attribute of the original query, one of the representative values. Mapped queries produced by the Mapper are then passed to the hash component. This component simply implements a hashing function that returns a string of bits for any incoming mapped query. These strings are the keys that must be used to invoke the lookup procedure on the key-value storage component. If the storage component is implemented as a DHT the hashing function can be the same function provided by the DHT implementation; otherwise any collision-resistant function will fit. For each key passed to the storage, the associated documents are returned. All these data constitute the response to the query. The management of new data submitted by devices for storage is equivalent to query management. The main difference is that data provided by devices will contain values for every attribute, i.e. no constraints are admitted in these documents. When this data is passed to the Mapper component, it is decomposed into its components; the Mapper then checks its content one attribute at a time and retrieves the corresponding mapping from the internal storage. When the value of an attribute is matched against the sets of values defined by the mapping, one and only one set is positively matched; this is due to the fact that a mapping is a complete partitioning of the value space defined by the attribute. After all attributes have been matched, the corresponding representative values are combined to obtain the mapped data; note that, since a single representative value is returned by the matching phase for each defined attribute, only one instance of mapped data will be created in this process.
Mapped data produced by the Mapper is then passed to the hash component to obtain the storage key associated to the submitted data. The submitted data and the corresponding key are then used as parameters of the storage component’s store primitive. Clearly, the matching between a query and the mappings for the attributes it contains can generate false positives in the result set, i.e. data that was mapped to one of the keys associated to the query but that does not satisfy the constraints defined in the query. The amount of false positives depends on the granularity of the mappings defined for the various attributes: the more value sets are defined in the mapping for a specific attribute, the lower the probability of false positives caused by constraints expressed on that attribute. False negatives are not possible as long as the intersection property is satisfied for all mappings.
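The store and lookup paths through the Mapper can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation: it covers a single Temp value attribute using the mapping defined in Section 3, substitutes SHA-1 for the DHT's hash function, and invents the helper names:

```python
import hashlib
from bisect import bisect_right

# Mapping for Temp value as used in Section 3: 5-degree intervals
# between the bounds 10 and 40, each represented by its first value.
TEMP_MAPPING = [10, 15, 20, 25, 30, 35]
TEMP_MAX = 40

def representative(value):
    # Representative of the one interval that contains the exact value.
    return TEMP_MAPPING[bisect_right(TEMP_MAPPING, value) - 1]

def key_for(mapped_value):
    # Hash the canonical form of a mapped document; any
    # collision-resistant function will fit.
    doc = "/Device/Sensor/Temperature/Temp_value=%s" % mapped_value
    return hashlib.sha1(doc.encode()).hexdigest()

def data_key(value):
    # Exact data matches one and only one interval, hence one key.
    return key_for(representative(value))

def query_keys(threshold):
    # A ">threshold" constraint matches every interval [lo, hi)
    # whose value set intersects (threshold, +inf).
    bounds = TEMP_MAPPING[1:] + [TEMP_MAX]
    return [key_for(lo) for lo, hi in zip(TEMP_MAPPING, bounds) if hi > threshold]

# Intersection property: matching data and query share a key.
assert data_key(26.5) in query_keys(25.5)  # 26.5 satisfies ">25.5"
# False positive: 26.5 does not satisfy ">29", yet both the reading
# and the query map to representative 25, so it is still returned.
assert data_key(26.5) in query_keys(29.0)
```

With several attributes, the representatives surviving the match for each attribute would be combined into one mapped query per combination before hashing.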
3 Application Scenario

Our example is based on a single house, and within this house we will focus our attention on only three locations: the kitchen, the dining room and the toilet. The left plan in Figure 4 shows the positioning of actuators (represented by triangles), temperature sensors (represented by circles) and light sensors (represented by squares). A fourth symbol is used to denote devices (like the PC or the TV set) whose computational and storage resources are sufficient to host an instance of the repository. Obviously, the set of devices in a real scenario would easily be larger than the one presented here; we decided
Fig. 4. Example of a smart home and its corresponding schema of the environment
to limit both the number and the different types of devices to improve the readability of this example. We consider two types of devices that can produce data: sensors (light and temperature) and actuators (that open and close windows and doors, and turn on and off lights, the refrigerator, the TV and the heating system). Starting from these considerations, a schema of the environment can be produced (right half of Figure 4). The schema describes any element in the environment as a subtype of the Device type. Different types are characterized by different attributes. Some attributes, like Temp value and Gas flow for the Temperature sensor and Stove types respectively, are bounded numerical values; other attributes, like Door for the Fridge type, are simple boolean values; finally, attributes like Channel for the Television type are enumerated text values. Regardless of the specificities of the considered scenario, each attribute in the schema is supposed to be detailed with its value type and possible bounds or valid value sets.

Mappings definition - Given the schema of Figure 4, a mapping for each attribute must be defined in the Mapper component. Let us consider the attributes Temp value and Light amount for the two sensor types. Assume that temperature values are bounded by 10 and 40 degrees Celsius and grouped in intervals 5 degrees wide. The Mapper internal storage unit will contain an entry like the following one1:

/Device/Sensor/Temperature/Temp value ⇒ [10, 15, 20, 25, 30, 35]

With respect to the Light amount attribute, we can imagine that we are interested only in four value intervals: dark, soft light, normal light and strong light. A light sensor returns values expressed in lumen, thus we have to discretize the great set of possible light amounts into four coarse intervals. Therefore, the Mapper will contain an entry like the following one:

1
In this example we use a compact representation of intervals for numerical attributes in the mapping. This compact representation is subject to variations depending on the specific attribute type.
/Device/Sensor/Light/Light amount ⇒ [0, 1000, 5000, 20000]

An enumeration attribute like Location has a predefined set of acceptable values; in our example it could have the values “Kitchen”, “Dining Room” and “Toilet”. In this case grouping the values in sets for the mapping is useless and a 1-to-1 mapping can be considered.

Op 1: temperature data update - Each device producing data knows the data schema, and this lets it produce well-formed data. An example of how data produced by a temperature sensor could be represented is shown in Listing 1; in this case the data is an instance of the environment schema.

Listing 1. Representation of data produced by a temperature sensor

<Device>
  <Home_ID>1</Home_ID>
  <Location>Kitchen</Location>
  <Sensor>
    <Temperature>
      <Temp_value>26.5</Temp_value>
    </Temperature>
  </Sensor>
</Device>
This listing represents data produced by a temperature sensor located in the kitchen of our home, which is identified by Home ID 1. When this data is passed to the Mapper component, it considers the list of attributes the data contains and retrieves the corresponding mappings from the internal storage. In this case the mappings are:

/Device/Home ID ⇒ [1]
/Device/Location ⇒ [“Kitchen”, “Dining Room”, “Toilet”]
/Device/Sensor/Temperature/Temp value ⇒ [10, 15, 20, 25, 30, 35]

The Home ID and Location attributes are thus mapped to their corresponding values. The Temp value attribute is mapped to the representative value 25, as the real value 26.5 is included in the range 25–30. Now that the Mapper knows the values for the mapped attributes it can build the mapped data:

Listing 2. Mapped data

<Device>
  <Home_ID>1</Home_ID>
  <Location>Kitchen</Location>
  <Sensor>
    <Temperature>
      <Temp_value>25</Temp_value>
    </Temperature>
  </Sensor>
</Device>
The mapped data is then used to calculate, through the hash function, the key representing the sensor data in the key-value storage component. Note how all data generated by temperature sensors located in the kitchen of house 1 and whose sensed value is in the range 25–30 is stored under the same key.

Op 2: querying temperature data - Suppose now that an external software component needs to interrogate the repository and obtain data from all temperature sensors in the house whose last reading reported a temperature value greater than 25.5°C. This kind of query could be useful, for example, to control the heating subsystem in order to maintain a constant temperature in the house. The query looks like the following:

Listing 3. A query submitted to the repository

<Device>
  <Home_ID>1</Home_ID>
  <Location>*</Location>
  <Sensor>
    <Temperature>
      <Temp_value>>25.5</Temp_value>
    </Temperature>
  </Sensor>
</Device>
Note how values for some attributes have been substituted by constraints: a star “*” wildcard is used to match any possible value of the attribute, meaning that the query is not interested in restricting the search to a specific location, while the constraint “>25.5” limits the range of temperatures that are returned. The Mapper receives the query and splits it into its components. It retrieves the mappings for all three attributes and starts to match the constraints against the intervals. All intervals in the mapping associated to the attribute Location are matched, given the “*” constraint. The Home ID attribute, defined with value 1, leads to a 1-to-1 mapping. Finally, the constraint “>25.5” defined for the attribute Temp value is matched against the intervals contained in the corresponding mapping; a match is positive if there is a non-empty intersection between the set of values defined by the constraint and the set of values contained in one of the intervals defined by the mapping; the matching operation thus returns the values 25, 30 and 35. All these matched values are combined in all possible ways to obtain the mapped queries. One of the 9 mapped queries generated by the Mapper component is the one presented as Listing 2. This mapped query is indistinguishable from the mapped data we showed in the previous section. This means that this query generates at least one key that corresponds to the key previously generated to store the sensor data. This is correct, indeed, as the sensor data perfectly matches the query. Note that a slightly different query, e.g. a query requiring all data from temperature sensors exposing a value greater than 29°C, would have produced the same mapped queries. In this case the previously stored sensor data would constitute a false positive, returned in the response due to the lack of precision in the attribute mapping.
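The combination step that yields the nine mapped queries can be sketched as a cartesian product over the matched representatives. This is an illustrative sketch only; the attribute names mirror the listings above:

```python
from itertools import product

# Representatives surviving the matching phase for the query of
# Listing 3 (Home ID = 1, Location = *, Temp value > 25.5).
matched = {
    "Home_ID": [1],
    "Location": ["Kitchen", "Dining Room", "Toilet"],
    "Temp_value": [25, 30, 35],
}

attrs = list(matched)
mapped_queries = [dict(zip(attrs, combo))
                  for combo in product(*(matched[a] for a in attrs))]

print(len(mapped_queries))  # 1 * 3 * 3 = 9 mapped queries
```

Each of the nine resulting documents is hashed to a lookup key; the combination for (1, "Kitchen", 25) is exactly the mapped data of Listing 2.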
4 Related Work

Even if the possibility to store and retrieve data is fundamental in all smart home applications, to the best of our knowledge issues related to the design of embedded repositories for such applications have hardly been tackled in the state of the art. Probably this is due to the fact that previous works in this area often considered simple centralized approaches to the problem. To find information about the distributed storage problem in this kind of environment, we have to look at works addressing problems related to context-awareness, which typically need to access a reliable repository. Schmidt in [13] explains that context-aware systems are computing systems that provide relevant services and information to users based on their situational conditions. This status of the system must be stored reliably in order to be queried and explored. From this point of view the availability of a reliable distributed repository can be very useful to context-aware systems deployed in a smart home. In our work we focused on how such a reliable distributed repository could be realized. Khungar and Riekki introduced Context Based Storage (CBS) in [11], a context-aware storage system. The structure of CBS is designed to store all types of available data related to a user and provide mechanisms to access this data everywhere using devices capable of retrieving and using the information. CBS provides a simple way to store documents using the context explicitly provided by the user. It allows users to retrieve documents from a ubiquitous storage using the context related directly to the document, or context related to the user that is then linked to the document through timestamping methods. The main difference between CBS and our system is that in CBS special emphasis is given to group activities and access control, since CBS is designed for a ubiquitous environment.
In our system access rights are not considered, since we assume that within a closed home environment the set of users is well known and security is tackled when accessing the system. Another great difference regards the way data is stored: the storage system in our architecture is completely distributed and built to provide reliable storage and access to devices deployed in the environment. All the previous solutions assume the presence of a central database, a storage server that in some cases could be overloaded. In [14] the authors show a solution targeted at enhancing the process of data retrieval. The system is based on the idea of prefetching data from a central repository to improve responsiveness to user requests. The solution proposed in this paper tries to overcome these problems using a scalable distributed approach where all participating nodes are able to provide the same functionalities. A solution similar to the one proposed in this paper has been previously adopted in the field of large-scale data diffusion to implement a content-based publish/subscribe system on top of a DHT [10]. The main difference between the two approaches is in the way data is accessed: while publish/subscribe systems assume that queries (subscriptions) are generated before data (publications) is diffused, our system targets a usage pattern closer to the way classical storage systems are accessed.
5 Conclusions

This paper presented the architecture of a distributed repository for smart home application environments characterized by the presence of a large number of heterogeneous
devices. The repository bases its reliability and scalability properties on an underlying DHT that is used to store and retrieve data. The limitation imposed by the DHT lookup primitive is solved by introducing a mapping component able to correctly map queries and matching data. The authors plan to start experimenting with this idea through an initial prototype that will be adopted for testing purposes. The aim of these tests will be to evaluate the adaptability of the proposed architecture to different application scenarios. A further improvement planned as future work will consist in modifying the system so that the mapping definitions are automatically adapted at run-time, in order to reduce the number of false positives returned by the repository in response to queries without adversely affecting its performance.
References
1. AMX, http://www.amx.com/
2. BTicino “My Home”, http://www.myhome-bticino.it/
3. KNX, http://www.knx.org/
4. LonWorks, http://www.echelon.com/
5. Lutron Electronics Co., Inc., http://www.lutron.com/
6. Philips Dynalite, http://www.dynalite-online.com/
7. The UPnP forum, http://www.upnp.org/
8. Web Ontology Language (OWL), http://www.w3.org/2004/OWL/
9. Smart Homes for All: An embedded middleware platform for pervasive and immersive environments for-all. EU STREP Project: FP7-224332 (2008)
10. Baldoni, R., Marchetti, C., Virgillito, A., Vitenberg, R.: Content-based publish-subscribe over structured overlay networks. In: Proceedings of the International Conference on Distributed Computing Systems, pp. 437–446 (2005)
11. Khungar, S., Riekki, J.: A context based storage system for mobile computing applications. SIGMOBILE Mob. Comput. Commun. Rev. 9(1), 64–68 (2005)
12. Rowstron, A., Druschel, P.: Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In: Guerraoui, R. (ed.) Middleware 2001. LNCS, vol. 2218, pp. 329–350. Springer, Heidelberg (2001)
13. Schmidt, A.: Ubiquitous Computing – Computing in Context. PhD thesis, Lancaster University (2002)
14. Soundararajan, G., Mihailescu, M., Amza, C.: Context-aware prefetching at the storage server. In: Proceedings of the 33rd USENIX Technical Conference, pp. 377–390 (2008)
Fine-Grained Tailoring of Component Behaviour for Embedded Systems Nelson Matthys, Danny Hughes, Sam Michiels, Christophe Huygens, and Wouter Joosen IBBT-DistriNet, Department of Computer Science, Katholieke Universiteit Leuven, B-3001, Leuven, Belgium {firstname.lastname}@cs.kuleuven.be
Abstract. The application of run-time reconfigurable component models to networked embedded systems has a number of significant advantages, such as encouraging software reuse, adaptation to dynamic environmental conditions and management of changing application demands. However, reconfiguration at the granularity of components is inherently heavy-weight and thus costly in embedded scenarios. This paper argues that in some cases component-based reconfiguration imposes an unnecessary overhead and that more fine-grained support for the tailoring of component functionality is required. This paper advocates a high-level policy-based approach to tailoring component functionality. To that end, we introduce a lightweight framework that supports fine-grained adaptation of component functionality based upon high-level policy specifications. We have realized and evaluated a prototype of this framework for the LooCI component model.
1 Introduction
Run-time reconfigurable component models provide an attractive programming model for Wireless Sensor Networks (WSNs). As WSN environments are typically highly dynamic, run-time reconfigurable component models allow this dynamism to be effectively managed through the deployment of new functionality or the modification of existing compositions. WSNs are also increasingly expected to support multiple applications over the long term. In response, reconfigurable component models allow system functionality to evolve to meet changing application requirements. Run-time reconfigurable component models also promote reuse, which is essential in resource-constrained WSN environments. A number of run-time reconfigurable component models have been developed for embedded systems, most notably OpenCOM [4], RUNES [3], and OSGi [13]. These component models address the problems of dynamism, evolution and reuse by offering developers:

– Concrete interfaces that promote the reuse of components between applications.
– On-demand component deployment that can be used to manage dynamism and evolution through the injection of new functionality.

S. Lee and P. Narasimhan (Eds.): SEUS 2009, LNCS 5860, pp. 156–167, 2009.
© IFIP International Federation for Information Processing 2009
– Component rewiring that can be used to modify component compositions on the fly and thus offers a mechanism to manage dynamism and evolution. The ability to dynamically wire a third-party component into a composition also promotes reuse.

In sum, run-time reconfigurable component models allow for reconfiguration of system functionality through the introduction of new components, or the modification of relationships between existing components. However, component-based reconfiguration has two critical disadvantages:

– Coarse granularity: As reconfigurations may be enacted only by modifying relationships between components or deploying new components, component-based reconfiguration is a poor fit for enacting fine-grained changes. Thus, while component-based reconfiguration provides a generic mechanism for enacting changes, it is inefficient when the change may be represented by a few lines of code. This is particularly critical for embedded platforms, such as WSN nodes, where memory is limited and software updates are costly operations.
– Complexity of abstraction level: Component-based reconfiguration is complex and requires a domain expert to enact it properly. This complexity prevents end-users from tailoring the functionality of the deployed system themselves. Furthermore, expressing simple changes in a component-based system should be offered at the abstraction level of the end-user.

This paper addresses the problems of coarse granularity and complexity through the introduction of a lightweight policy framework for adapting component behaviour. Policies for this framework are high-level and platform-independent, thus allowing end-users to more easily tailor component behaviour. The performance of this system is evaluated through a number of case studies.
The remainder of this paper is structured as follows: Section 2 provides background on component and policy frameworks for networked embedded systems, while Section 3 presents the design of a policy language and corresponding framework for tailoring component behaviour. An initial prototype of this framework is evaluated based on a case study in Section 4. Section 5 critically discusses advantages and shortcomings of our approach. Finally, Section 6 concludes and presents directions for future work.
2 Background
This section first discusses the state of the art in component models for networked embedded systems. Section 2.2 then discusses existing policy-based mechanisms for tailoring component functionality. Finally, Section 2.3 provides a brief overview of the LooCI component model.

2.1 Component Models for Networked Embedded Systems
NesC [6] is perhaps the best known component model for networked embedded systems and is used to implement the TinyOS [12] operating system. NesC
provides an event-driven programming approach together with a static component model. NesC components cannot be dynamically reconfigured; however, the static approach of NesC allows for whole-program analysis and optimization. Maté [11] extends NesC and provides a framework to build application-specific virtual machines. As applications are composed using specific virtual machine instructions, they can be represented concisely, which saves power that would otherwise be consumed by transmitting software modules. However, compared to component-based approaches, Maté has one critical shortcoming: compositions are limited by the functionality that is already deployed on each node, and thus it is not possible to inject new functionality into a Maté application without reflashing each node. OpenCOM [4] is a general-purpose, run-time reconfigurable component model and, while it is not specifically targeted at networked embedded systems, it has been deployed in a number of WSN scenarios [7]. OpenCOM supports dynamic reconfiguration via a compact runtime kernel. Reconfiguration in OpenCOM is coarse-grained, being achieved through the deployment of new components and the modification of connections between components. The RUNES [3] component model brings OpenCOM functionality to more embedded devices. Along with a smaller footprint, RUNES adds a number of introspection API calls to the OpenCOM kernel. Like OpenCOM, RUNES allows for only coarse-grained component-based reconfiguration. The OSGi component model [13] targets powerful embedded devices along with desktop and enterprise computers. OSGi provides a secure execution environment, support for run-time reconfiguration and life-cycle management. Unfortunately, while OSGi is suitable for powerful embedded devices, the smallest implementation, Concierge [15], consumes more than 80 KB, making it unsuitable for highly resource-constrained devices.

2.2 Policy Techniques for Tailoring Component Behaviour
Over the last decade, research on policy-based management [2] has primarily been applied to facilitate management tasks, such as component configuration, security, or Quality of Service in large-scale distributed systems. Policy-based management allows the specification of requirements about the intended behaviour of a managed system using a high-level policy language, which are then automatically enforced in the system. Furthermore, policies can be changed dynamically without having to modify the underlying implementation or requiring the consent or cooperation of the components being governed. ESCAPE [16] is a component-based policy framework for programming sensor network applications using TinyOS [12]. Similar to our approach, ESCAPE advocates the use of policy rules to govern component behaviour. However, policies in ESCAPE are exclusively used to specify interactions between components, removing interaction code from the individual components, whereas in our approach we apply policy techniques to configure entire component compositions, including the existing information flow. In addition, ESCAPE is implemented
on top of the static NesC component model [6], whereas our policy framework builds on top of a more flexible run-time reconfigurable component model. Recently, the Service Component Architecture (SCA) defined a policy framework specification [14], which aims to use policies for describing capabilities and constraints that can be applied to service components or to the interactions between different service components. While not bound to a specific implementation technology, the SCA policy framework focuses on service-oriented environments such as OSGi [13], which may only be applied to relatively powerful embedded devices. The approach this paper proposes is to combine the key benefits of a run-time reconfigurable component model (i.e. the ability to inject new functionality dynamically and to reason about distributed relationships between components) with the efficiency of policy-based tailoring of functionality. As we will show in Section 4, this reduces the burden on developers while also reducing the performance overhead for simple reconfigurations. Furthermore, the policy language we propose is high-level and easy to understand, allowing end-users, as well as domain experts, to customize the functionality of component compositions.

2.3 LooCI: The Loosely-Coupled Component Infrastructure
The Loosely-coupled Component Infrastructure (LooCI) [8] is designed to support Java ME CLDC 1.1 platforms such as the Sun SPOT [17]. LooCI is comprised of a component model, a simple yet extensible networking framework and a common event bus abstraction. LooCI components support run-time reconfiguration, interface definitions, introspection and support for the rewiring of bindings. LooCI offers support for two component types, macrocomponents and microcomponents. Macrocomponents are coarse-grained and service-like, building upon the notion of Isolates inherent in embedded Java Virtual Machines such as Sentilla [1] or SQUAWK [18]. Isolates are process-like units of encapsulation and provide varying levels of control over their execution (exactly what is provided depends on the specific JVM). LooCI standardizes and extends the functionality offered by Isolates. Each macrocomponent runs in a separate Isolate and communicates with the runtime middleware via Inter Isolate RPC (IIRPC), which is offered by the underlying system. Unlike microcomponents, macrocomponents may use multiple threads and utility libraries. Microcomponents are fine-grained and self-contained. All microcomponents run in the master Isolate alongside the LooCI runtime. Unlike macrocomponents, microcomponents must be single threaded and self-contained, using no utility libraries. Aside from these restrictions, microcomponents offer identical functionality to macrocomponents in a smaller memory footprint. Unlike OpenCOM or RUNES, LooCI components are indirectly bound over a lightweight event bus. LooCI components define their provided interfaces as the set of LooCI events that they publish. The receptacles of a LooCI component are similarly defined as the events to which they subscribe. As bindings are indirect, they may be modified in a manner that is transparent to the composition.
Furthermore, as all events are part of a globally specified event hierarchy, it becomes easier to understand and modify data flows.
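The indirect binding described above can be sketched as a toy event bus in the spirit of LooCI (this is not the actual LooCI API; the class, event type and handler names are invented for illustration):

```python
class EventBus:
    """Toy indirect binding: components publish and subscribe to
    event types instead of holding direct references to each other."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, event_type, handler):
        # A component's receptacles are the event types it subscribes to.
        self._subscribers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, payload):
        # A component's provided interface is the set of events it publishes.
        for handler in self._subscribers.get(event_type, []):
            handler(payload)


bus = EventBus()
received = []
bus.subscribe("TEMPERATURE", received.append)  # consumer component
bus.publish("TEMPERATURE", 26.5)               # producer component
print(received)  # the consumer saw the event
```

Because the producer never learns who consumes its events, subscribers can be rewired without touching the producer, which is exactly what makes binding modification transparent to the composition.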
3 A Policy-Based Component Tailoring Framework

3.1 Policy Language Design and Tool Support
The specification of policies to tailor component behaviour is accomplished using policy rules following Event-Condition-Action (ECA) semantics, which correspond well to the event-driven nature of the target embedded platforms. An ECA policy consists of a description of the triggering events, an optional condition, which is a logical expression typically referring to external system aspects, and a list of actions to be enforced in response. In addition, our prototype policy language allows various functions to be called inside the condition and action parts of a policy. By using these policies, we offer a simple, yet powerful method for end-users to tailor component behaviour. In addition, we provide tool support to allow end-users to easily tailor system behaviour. Our tool first allows the end-user to select the components and interfaces that can be tailored. Secondly, after specification of the corresponding policies, the tool parses and analyzes each policy for syntactic consistency. Finally, the tool allows the end-user to choose the nodes to which the policy should be deployed. Concrete examples of the policy language can be found in Section 4.

3.2 Policy Framework Design
As illustrated in Figure 1, the policy framework is deployed on each sensor node and consists of three key components: the Policy Engine, the Rule Manager, and a Policy Distribution component. The Policy Engine is the main component in the framework and is responsible for intercepting events as they pass between two components and evaluating them against the set of policy rules on each node. In case of a match (i.e. a triggering event and a condition evaluating to true), the engine enforces the actions defined in the action part of the matching policy. Typical examples of actions are denying the event passage, publishing a custom event, or invoking a particular function in the middleware runtime. Potential conflicts between multiple matching policies are handled by following a priority-based ordering of policies, whereby only the actions of the highest-priority policy are executed. Distribution of policy files from the back-end to the sensor network is achieved using a Policy Distribution component hosted on each individual sensor node. After specification and analysis of a policy by our tool, the policy is transformed into a compact binary representation that can be efficiently disseminated to the sensor nodes. On reception of this binary policy representation, the policy distribution component passes it to the Rule Manager component. The Rule Manager on each individual sensor node is responsible for storing and managing the set of policy rules on the node. After reception of a binary
Fine-Grained Tailoring of Component Behaviour for Embedded Systems
Fig. 1. Overview of the policy framework
policy from the distribution component, the Rule Manager converts the policy into a data structure suitable for more efficient evaluation, which is then passed to the Policy Engine on a per-triggering-event basis. By retaining the ability to dynamically change the set of policies at run-time, the framework can be adapted to evolving application demands.
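The priority-based conflict handling described above can be sketched as follows (a simplified Python model with hypothetical field names, not the actual on-node implementation): when several policies match an event, only the highest-priority policy's actions are enforced.

```python
def dispatch(event, policies):
    """Evaluate an event against a rule set; if several policies match,
    enforce only the actions of the highest-priority one."""
    matching = [p for p in policies
                if p["trigger"] == event["type"] and p["condition"](event)]
    if not matching:
        return None  # no policy matched: the event passes through untouched
    winner = max(matching, key=lambda p: p["priority"])
    return winner["action"](event)

# Two rules match hot TEMP events; the higher-priority "deny" rule wins.
policies = [
    {"trigger": "TEMP", "priority": 1,
     "condition": lambda e: True, "action": lambda e: "allow"},
    {"trigger": "TEMP", "priority": 2,
     "condition": lambda e: e["value"] > 20, "action": lambda e: "deny"},
]
verdict = dispatch({"type": "TEMP", "value": 25}, policies)  # "deny"
```

A total order on priorities keeps conflict resolution deterministic on every node, at the cost of silently discarding the actions of lower-priority matches.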
4 Case-Study Based Evaluation
This section presents a scenario that requires two archetypal reconfigurations of a distributed component composition: (i.) introduction of filtering functionality and (ii.) binding interception and monitoring. For each case, we compare the overhead of realizing reconfigurations using LooCI macrocomponents and microcomponents to that of realizing them using the policy framework introduced in Section 3. Specifically, Section 4.1 describes our motivating application scenario. Section 4.2 describes how compositions may be modified through component reconfiguration and policy application. Section 4.3 then considers the overhead for developers inherent in each approach, while Section 4.4 analyzes the memory consumption of each approach. Finally, Section 4.5 explores the performance overhead of component-based versus policy-based reconfiguration.

4.1 Application Scenario
Consider a WSN-based warehouse monitoring scenario in which a company, STORAGE CO, provides temperature-controlled storage of
N. Matthys et al.
goods, wherein the temperature of stored packages is monitored using a WSN running the LooCI middleware. STORAGE CO offers two classes of service for stored goods: best effort temperature control and assured temperature control. The customers of STORAGE CO (CHOCOLATE CO and CHEMICAL CO) each have different storage requirements that evolve over time.

– Best effort temperature control: in this scheme, STORAGE CO sets temperature alarms, which alert warehouse employees if the temperature of a stored package has breached a specified threshold. As the scheme is alarm-based, it generates low levels of traffic, increasing battery life and reducing cost.

– Assured temperature control: in this scheme, STORAGE CO provides continuous data to warehouse employees, who may view detailed temperature data and take pre-emptive action to avoid package spoilage. As this scheme transmits continuous data, it decreases node battery life and increases costs.

Scenario 1. CHOCOLATE CO begins by requesting the assured temperature service level from STORAGE CO; however, due to tightening cost constraints, CHOCOLATE CO later requests that its service level be switched to best effort. CHEMICAL CO begins by requesting the low-cost best effort service; however, stricter government regulations require CHEMICAL CO to increase its coverage to assured temperature control.

Scenario 2. STORAGE CO wishes to perform a detailed analysis of how its WSN infrastructure is being used, and thus deploys functionality to monitor all component bindings in its WSN. This functionality includes accounting of all events that pass.

4.2 Component-Based Modification versus Policy-Based Modification
Scenario 1. This section explores how the changing requirements of the customers on both temperature monitoring schemes can be reflected using (i.) component-based modification of the compositions, and (ii.) a single composition customized using our policy-based approach.

Component-Based Tailoring of Functionality. The assured and best effort temperature monitoring schemes discussed in Section 4.1 may be represented by two distinct component compositions, as shown in Figure 2. In the assured monitoring scheme, a TEMP SENSOR exposes a single interface of type TEMP, which is wired to the matching receptacle of a TEMP MONITORING component. In the best effort temperature monitoring scheme, the TEMP SENSOR component is wired to the matching receptacle of a TEMP ALARM component, the ALARM interface of which is then wired to the matching receptacle of a TEMP MONITORING component.
Fig. 2. Component configurations
In the case of CHOCOLATE CO, switching from assured to best effort temperature monitoring, the existing TEMP SENSOR component is unwired from the TEMP MONITORING component and rewired to a TEMP ALARM component, the ALARM interface of which is wired to the TEMP MONITORING component. In the case of CHEMICAL CO, switching from best effort to assured monitoring, the existing TEMP ALARM component is unwired from the TEMP MONITORING and TEMP SENSOR components. Subsequently, the TEMP interface of the TEMP SENSOR component is wired to the matching receptacle of the TEMP MONITORING component.

Policy-Based Modification. To enable CHOCOLATE CO to switch from assured to best effort monitoring, the developer needs to specify and enable the following policy with priority 1:

policy "assured-to-best-effort" "1" {
  on TEMP as t;  // TEMP contains (source, dest, value)
  if (t.value > 20 && t.dest == TEMP_MONITORING_CHOC_CO) then (
    // publish an ALARM event to TEMP_MONITORING_CHOC_CO
    publish ALARM(t.source, TEMP_MONITORING_CHOC_CO, t.value);
    deny t;  // and block the TEMP event from further dissemination
  )
}
This policy specifies that the policy engine should intercept all TEMP events, allowing only those with a temperature value higher than 20 degrees Celsius to pass, converting them to ALARM events destined for the TEMP MONITORING component. To enable CHEMICAL CO to switch from best effort to assured temperature monitoring, the developer needs to specify and enable the following policy:
policy "best-effort-to-assured" "1" {
  on TEMP as t;
  if (t.dest == TEMP_ALARM) then (
    // allow sending to TEMP_ALARM for threshold checking
    allow t;
    t.dest = TEMP_MONITORING_CHEM_CO;  // change destination
    publish t;  // assure sending to TEMP_MONITORING_CHEM_CO
  )
}
This policy changes the destination of TEMP events from the TEMP ALARM to the TEMP MONITORING CHEM CO component to enforce the assured monitoring scheme. In addition, it must not break the existing composition (i.e. TEMP events must also still be sent to the TEMP ALARM component).

Scenario 2: Insertion of Global Monitoring Behaviour. The network-wide monitoring of component interactions described in Section 4.1 may also be implemented using a component-based or policy-based approach. In either case, the reception and transmission of each event should be logged to an ACCOUNTING component, which stores events for future retrieval and analysis. To implement logging or accounting using component-based modification, STORAGE CO would be required to continually probe the network to discover the state of compositions and then insert a BINDING MONITOR interception component into each discovered binding, clearly a resource-intensive process. In contrast, as the LooCI Event Manager provides a common point of interception for all events on each node, a single, generic policy may be inserted to perform equivalent monitoring. As all events are routed through the policy engine, such a configuration is agnostic to the component compositions executing on the WSN and entails significantly lower overhead. A policy to implement this is shown below:

policy "logging" "1" {
  on * as e;  // all events have source, dest, data[] as payload
  then (
    // always do accounting of event occurrence
    invoke ACCOUNTING(e.source, e.dest, e.data[]);
    allow e;  // do not block e, allow it to continue
  )
}
While this example is simple, we believe that the ability to install per-node as well as per-binding policies to enforce various non-functional concerns may reduce overhead in many scenarios.

4.3 Overhead for the Developer
In this section, we analyze the effort required to implement the TEMP ALARM component and compare this with the effort required to develop a functionally equivalent policy, as described in Section 4.2. Each implementation was analyzed in terms of Source Lines of Code (SLoC). The results are shown in Table 1.
Perhaps more critical than the conservation of development effort, illustrated by the SLoC savings shown in Table 1, is the high-level and platform-independent nature of the policy specification language, which, unlike a Java-based LooCI component, could equally be applied to a TinyOS [12] or Contiki [5] software configuration where a suitable policy interpreter exists.

4.4 Memory Footprint
The size of the policy framework is 26 kB. Subsequently, we analyzed the static memory (size on disk) and dynamic memory (RAM) consumed by the software elements introduced in Section 4.2. As can be seen in Table 2, policy-based reconfiguration consumes significantly less memory than component-based reconfiguration, a critical advantage in memory-constrained environments like WSNs. Table 2. Memory Consumption
4.5 Performance Overhead

We evaluated the performance of policy-based and component-based reconfiguration using a standard SunSPOT node (180 MHz ARM9 CPU, 512 kB RAM, SQUAWK VM ‘BLUE’ version) and a 3 GHz Pentium 4 desktop with 1 GB of RAM running Linux 2.6 and Java 1.6. We first logged the time required to deploy and initialize the policy specification and component implementation required to achieve the reconfigurations described in Section 4.2. We then analyzed the time each took to handle an incoming TEMP event (i.e. process it and disseminate an ALARM event to the gateway). In each case, the SPOT node was deployed between 20 cm and 30 cm from the network gateway and we performed 50 experiments, the averaged results of which are shown in Table 3. As can be seen from Table 3, not only is the overhead of deploying and initializing a policy significantly lower than that of deploying and initializing a component, the ongoing per-event overhead caused by applying a policy to a binding is also lower than that caused by inserting a new macrocomponent, and equal to microcomponent performance. In embedded environments where CPU and energy resources are scarce, we believe that policy-based reconfiguration provides concrete benefits over component-based reconfiguration for tailoring compositions, as it does not introduce additional overhead.
Table 3. Performance Comparison

                     Microcomponent  Macrocomponent  Policy
Deployment           11330 ms        11353 ms        200 ms
Initialization       8418 ms         7420 ms         6 ms
Execution overhead   28 ms           43 ms           28 ms

5 Discussion
The evaluation presented in the previous section clearly shows that policy-based modification of component compositions can have significant advantages in terms of: (i.) lowering development overhead, (ii.) reducing memory footprint and (iii.) improving performance. This leads to a critical question: when should component-based modification of functionality be applied, and when should policy-based tailoring be used? The policy-based approach is suited to enforcing non-functional concerns such as accounting or security on component compositions, as these concerns are orthogonal to the composition and do not radically change the end-to-end information flow in the component composition. Despite its concrete advantages, policy-based composition modification is not without drawbacks: it can reduce the reusability of components. In a pure (or functional) component composition, the functionality of each component is solely identified by its type along with the interfaces and receptacles it provides. As the application of policies to component bindings can modify functionality in a manner that is opaque, this can effectively render a component unreliable for use in other compositions and thus reduce the maintainability of the system. Managing long-term system evolution must therefore be done with care; we believe that policies should instead be used to efficiently realize transient modifications to compositions and to enforce non-functional concerns on compositions.
6 Conclusions
This paper has presented a policy-based framework that can be used to tailor the functionality of component compositions. We have presented a compact and lightweight prototype of this framework realized for the LooCI component model and, through evaluation, we have shown that policy-based tailoring can reduce overhead for developers, reduce memory consumption and improve the performance of reconfiguration when compared to purely component-based reconfiguration approaches. In the short term, future work will focus on further researching the impact of policy-based modifications on component compositions. In addition, we plan to evaluate policy-based tailoring of functionality in a logistics scenario with concrete WSN end-users. In the longer term, we hope to improve the expressiveness of our policy language, implement prototypes of our policy engine for the OpenCOM [4] and OSGi [13] component models, and evaluate its performance there.
Acknowledgments. Research for this paper was partially funded by the Interuniversity Attraction Poles Programme of the Belgian State (Belgian Science Policy) and the Research Fund K.U.Leuven, and was conducted in the context of the IBBT-DEUS project [9] and the IWT-SBO-STADiUM project No. 80037 [10].
References

1. Sentilla Perk Platform (July 2009), http://www.sentilla.com/
2. Boutaba, R., Aib, I.: Policy-based management: A historical perspective. J. Network Syst. Manage. 15(4), 447–480 (2007)
3. Costa, P., Coulson, G., Mascolo, C., Mottola, L., Picco, G.P., Zachariadis, S.: Reconfigurable component-based middleware for networked embedded systems. International Journal of Wireless Information Networks 14(2), 149–162 (2007)
4. Coulson, G., Blair, G., Grace, P., Taiani, F., Joolia, A., Lee, K., Ueyama, J., Sivaharan, T.: A generic component model for building systems software. ACM Trans. Comput. Syst. 26(1), 1–42 (2008)
5. Dunkels, A., Gronvall, B., Voigt, T.: Contiki - a lightweight and flexible operating system for tiny networked sensors. In: Proceedings of the 29th Annual IEEE International Conference on Local Computer Networks (LCN 2004), Washington, DC, USA, pp. 455–462. IEEE Computer Society, Los Alamitos (2004)
6. Gay, D., Levis, P., von Behren, R., Welsh, M., Brewer, E., Culler, D.: The nesC language: A holistic approach to networked embedded systems. In: PLDI 2003: Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, pp. 1–11. ACM Press, New York (2003)
7. Hughes, D., Greenwood, P., Blair, G., Coulson, G., Grace, P., Pappenberger, F., Smith, P., Beven, K.: An experiment with reflective middleware to support grid-based flood monitoring. Conc. Comp.: Pract. Exper. 20(11), 1303–1316 (2008)
8. Hughes, D., Thoelen, K., Horré, W., Matthys, N., Del Cid, J., Michiels, S., Huygens, C., Joosen, W.: LooCI: a loosely-coupled component infrastructure for networked embedded systems. Technical Report CW 564, K.U.Leuven (September 2009)
9. IBBT-DEUS project (July 2009), https://projects.ibbt.be/deus
10. IWT STADiUM project 80037: Software technology for adaptable distributed middleware (July 2009), http://distrinet.cs.kuleuven.be/projects/stadium/
11. Levis, P., Culler, D.: Maté: a tiny virtual machine for sensor networks. In: Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems, New York, USA, pp. 85–95 (2002)
12. Levis, P., Madden, S., Gay, D., Polastre, J., Szewczyk, R., Woo, A., Brewer, E.A., Culler, D.E.: The emergence of networking abstractions and techniques in TinyOS. In: Proceedings of the 1st Symposium on Networked Systems Design and Implementation (NSDI 2004), March 2004, pp. 1–14 (2004)
13. OSGi Alliance: About the OSGi Service Platform, whitepaper, rev. 4.1 (June 2007)
14. OSOA: SCA Policy Framework. SCA Version 1.00 (March 2007)
15. Rellermeyer, J.S., Alonso, G.: Concierge: a service platform for resource-constrained devices. SIGOPS Oper. Syst. Rev. 41(3), 245–258 (2007)
16. Russello, G., Mostarda, L., Dulay, N.: ESCAPE: A component-based policy framework for sense and react applications. In: Chaudron, M.R.V., Szyperski, C., Reussner, R. (eds.) CBSE 2008. LNCS, vol. 5282, pp. 212–229. Springer, Heidelberg (2008)
17. Sun Microsystems: Sun SPOT world (July 2009), http://www.sunspotworld.com/
18. Sun Squawk Virtual Machine (July 2009), http://squawk.dev.java.net/
MapReduce System over Heterogeneous Mobile Devices Peter R. Elespuru, Sagun Shakya, and Shivakant Mishra Department of Computer Science University of Colorado, Campus Box 0430 Boulder, CO 80309-0430, USA
Abstract. MapReduce is a distributed processing algorithm which breaks up large problem sets into small pieces, such that a large cluster of computers can work on those small pieces in an efficient, timely manner. MapReduce was created and popularized by Google, and is widely used as a means of processing large amounts of textual data for the purpose of indexing it for search later on. This paper examines the feasibility of using smart mobile devices in a MapReduce system by exploring several areas, including quantifying the contribution they make to computation throughput, end-user participation, power consumption, and security. The proposed MapReduce System over Heterogeneous Mobile Devices consists of three key components: a server component that coordinates and aggregates results, a mobile device client for iPhone, and a traditional client for reference and to obtain baseline data. A prototypical research implementation demonstrates that it is indeed feasible to leverage smart mobile devices in heterogeneous MapReduce systems, provided certain conditions are understood and accepted. MapReduce systems could see sizable gains of processing throughput by incorporating as many mobile devices as possible in such a heterogeneous environment. Considering the massive number of such devices available and in active use today, this is a reasonably attainable goal and represents an exciting area of study. This paper introduces relevant background material, discusses related work, describes the proposed system, explains obtained results, and finally, discusses topics for further research in this area. Keywords: MapReduce, iPhone, Android, Mobile Platforms, Apache, Ruby, PHP, jQuery, JavaScript, AJAX.
1 Introduction
Distributed computing has come into its own in the Internet age. Such a large computational pool has given rise to endeavors such as the SETI@Home [14] and Folding@Home [7] projects, which both allow any willing person to surrender a portion of their desktop computer or laptop to a much larger computational goal. In the case of SETI@Home, millions of users participate to analyze data in search of extraterrestrial signals, whereas Folding@Home

S. Lee and P. Narasimhan (Eds.): SEUS 2009, LNCS 5860, pp. 168–179, 2009.
© IFIP International Federation for Information Processing 2009
is a bit more practical: its goal is “to understand protein folding, misfolding, and related diseases”. These systems, along with others mentioned later, are conceptually similar to what we propose: a system that allows people to participate freely in these kinds of massive, computationally bound problems so that results may be obtained quickly. There are many similar approaches to solving large, computationally intensive problems. One of the most famous is the problem of providing relevant search of the Internet itself [2]. Google has emerged as the superior provider of that capability, and a portion of that superiority comes by way of the underlying algorithms used to make its process efficient, elegant, and reliable [13]: MapReduce [4]. MapReduce is similar to other mechanisms employing parallel computations, such as parallel prefix schemes [12] and scan primitives [3], and is even fairly similar to blocked sort-based indexing algorithms [16]. We believe there exists a blatant disregard of certain capable devices [11] in the context of these kinds of distributed systems. Existing implementations have neglected the mobile device computation pool, and we suspect this is due to a number of factors that hamper most current mobile devices. It seems only smart phones are powerful enough, computation-wise, for most of these distributed workloads. There are many additional concerns as well that have been covered by prior work, such as power usage, security concerns [9] and potential interference with the device’s intended usage model as a phone. All of these factors limit the viability of incorporating mobile devices into a distributed system. It is our belief that despite these limitations, there are solutions that allow the inclusion of the massive smart phone population [6] into a distributed system.
One logical progression of MapReduce, and other such distributed algorithms, is toward smart mobile devices, primarily because there are so many of them and they are largely untapped. Even a small-scale incorporation of this class of device can have an enormous impact on the systems at large and how they accomplish their goals. Increases in data volume underscore the need for additional computational power, as the world continues to create far more data than it can realistically and meaningfully process [5]. Using smart mobile devices, in addition to the more traditional set of servers, is one possible way to increase computational power for these kinds of systems, and is exactly what we attempt to prove and quantify in specific cases by leveraging prior work on MapReduce. This paper explores the feasibility of using smart mobile devices in a MapReduce system by exploring several areas, including quantifying the contribution they make to overall computation throughput, end-user participation, power consumption, and security. We have implemented and experimented with a prototype of a MapReduce system that incorporates three types of devices: a standard Linux server, an iPhone, and an iPhone simulator. Preliminary results from our performance measurements support our claim that mobile devices can indeed contribute positively in a large heterogeneous MapReduce system, as well as similar systems. Given that the number of smart phones is clearly on the rise, there is immense potential in using them to build computationally-intensive parallel processing applications.
The rest of the paper is organized as follows. In Section 2, we briefly outline the MapReduce system. Section 3 touches on similar endeavors. In Section 4, we describe the design of our system, and in Section 5, we describe the implementation details. In Section 7, we discuss experimental results measured from our prototype implementation. Next, we discuss some optimizations in Section 8 and then finally conclude our paper in Section 10.
2 MapReduce
MapReduce [4] is an increasingly popular programming paradigm for distributed data processing, above and beyond merely indexing text. At the highest architectural level, MapReduce is comprised of a few critical pieces and processes. If you have a large collection of documents or text, that corpus must be broken into manageable pieces, called splits. Commonly, a split is one line of a document, which is the model we follow as well. Once split, a master node must assign splits to workers, who then process each piece, store some aspect of it locally, but ultimately return it to the master node or something else for reduction. The reduction then typically partitions the results for faster usage, accounting for statistics, document identification and so on. We describe the MapReduce process in three phases: Map, Emit, and Reduce (see Figure 1). In our system, the Map phase is responsible for taking a large data set and chunking it into splits. The Emit phase entails the distributed processing nodes obtaining and working on the splits, and returning a processed result to another entity, the master node or job server that coordinates everything. Unlike most MapReduce implementations, the nature of mobile devices precludes us from using anything other than network communications to read and write data, as well as assign jobs and process them. The final phase is Reduce, which in our case further minimizes the received results into a unique set of data that ultimately gets stored in a database for simplicity. For example, given a large set of plain text files over which you may wish to search by keyword, a MapReduce system begins with a master node that takes all those text files, splits them up line by line, and parcels them out to participants. The participating computation nodes find the unique set of keywords in each line of text they were given, and emit that set back to the master node.

Fig. 1. High Level Map Reduce Explanation

The master node, after getting all of the pieces back, aggregates all
of the responses to determine the overall unique set of keywords for that whole set of data, and stores the result in a database, file, or some other persistent storage medium. At this point the data can be analyzed and searched in whatever way desired from within our web application. One of the biggest strengths of MapReduce lies in its inherent distribution of phases, which results in an extremely high degree of reliable parallelism when implemented properly. MapReduce is both fault- and slow-response-tolerant, which are very desirable characteristics in any large distributed system.
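The keyword-search example above can be condensed into a single-process Python sketch of the three phases (purely illustrative; the real system distributes the Emit phase over the network to heterogeneous clients):

```python
def map_phase(documents):
    """Master: break the corpus into one-line work units (splits)."""
    return [line for doc in documents for line in doc.splitlines() if line]

def emit_phase(split):
    """Worker: compute the unique set of keywords in one split."""
    return set(split.lower().split())

def reduce_phase(emitted):
    """Master: merge per-split keyword sets into one overall unique set."""
    keywords = set()
    for s in emitted:
        keywords |= s
    return keywords

docs = ["sensor data log\ntemperature alarm", "sensor alarm report"]
splits = map_phase(docs)                               # 3 one-line splits
results = reduce_phase(emit_phase(s) for s in splits)  # 6 unique keywords
```

Because each split is processed independently, the Emit calls can be farmed out to any mix of workers, which is what makes slow or failing clients tolerable.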
3 Related Work
There have been a number of other explorations of heterogeneous MapReduce implementations and their performance [15], as well as some more unique expansions on the idea, such as using JavaScript in an entirely client-side, in-browser processing framework [8] for MapReduce. None of this related work, however, focuses on using a mobile device pool as a major computation component. To complement these related works, we focus on mobile devices and, in particular, on the specifics of heterogeneity in the context of mobile devices mixed with more traditional computation resources.
4 The Heterogeneous Mobile Device MapReduce System
Our problem encompasses three areas: 1) provide a mechanism for interested parties to participate in a smart phone distributed computational system, and ensure they are aware of the potential side effects; 2) make use of this opt-in device pool to compute something and provide aggregate results; and 3) provide meaningful results to interested parties, and summarize them in a timely fashion, considering the reliability of devices on wireless and cellular networks. Our solution is the Heterogeneous Mobile Device MapReduce System.

Fig. 2. System Summary

There are several key components in our system: 1) a server which acts as the master node and coordinator for MapReduce processing; 2) server-side client code used to provide faster, more powerful client processing in conjunction with mobile devices; 3) the mobile device client, which
implements MapReduce code to get, work on, and emit results of data from the master node; and finally 4) the BUI, or browser user interface (web application), which lets the results be searched (see Figure 2). The MapReduce master node server leverages the Apache [17] web server for HTTP. To provide the MapReduce stack, we actually have two different implementations of our master node/job server code, one in Ruby [18] and one in PHP [19]; however, we primarily used the PHP variant during our testing. Once the master node has been seeded with some content to process, it is told to begin accepting participant connections. Once the process begins, clients of any type, mobile or traditional, may connect, get work, compute and return results. During processing, clients, whether they are mobile devices or processes running on a powerful server, can continually request work and compute results until nothing is left to do for a given collection. In this case, the server still responds to requests, but does not return work units since the cycle is complete (see Figure 3). After all the data has been processed, clients can still request work, but obviously are not furnished anything. At this point, our web application front end is used to search for keywords throughout the documents which were just processed. The web application was implemented in PHP and makes use of the jQuery [20] JavaScript framework to provide asynchronous (AJAX) page updates as workers complete units, in real-time.

Fig. 3. Client Flow

More can be seen in Figure 2. Further, Figure 4 illustrates exactly what the entire process looks like.
Fig. 4. Work Loop
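The client work loop of Figure 3 can be sketched as follows, with a toy in-memory `JobServer` standing in for the real HTTP master node (the actual request/response format of our server is not shown here; the class and method names are our own):

```python
class JobServer:
    """Toy stand-in for the master node: hands out splits, collects results."""
    def __init__(self, splits):
        self.pending = list(splits)
        self.results = []

    def get_work(self):
        # Return None once the cycle is complete, mirroring the real server
        # still answering requests but furnishing no work unit.
        return self.pending.pop() if self.pending else None

    def emit(self, result):
        self.results.append(result)

def client_loop(server, process):
    """Request work, compute, emit the result, and repeat until done."""
    completed = 0
    while (split := server.get_work()) is not None:
        server.emit(process(split))
        completed += 1
    return completed

server = JobServer(["alpha beta", "beta gamma"])
units = client_loop(server, lambda s: set(s.split()))  # completes 2 units
```

In the deployed system this loop runs over HTTP against the PHP job server, so any client type, mobile or traditional, can participate with the same protocol.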
5 System Development
There are a few additional aspects of developing this system that warrant discussion. Our experience with the development environment and lessons learned are worth sharing as well.

5.1 Mobile Client Application Development Experience
We developed our mobile client application on the iPhone OS platform using the iPhone SDK, the Cocoa Touch framework and the Objective-C programming language. As part of the iPhone SDK, the Xcode development environment was used for project management, source code editing and debugging. To run and test the MapReduce mobile client, we used the iPhone simulator in addition to actual devices. Apple’s Interface Builder provided a drag-and-drop tool to develop the user interface very rapidly. All in all, the experience was extremely positive [10].

5.2 Event-Driven Interruption on iPhone
Event handling on the iPhone client proved rather interesting, due largely to the fact that certain events can override an application and take control of the device against the application’s will. While the iPhone is processing data, events like an incoming phone call, an SMS message or a calendar alert can take control of the device. In the case of an incoming phone call, the application is paused. Once the user hangs up, the iPhone client is relaunched by iPhone OS, but it is up to the application to maintain state. While on the call, if the user goes back to the home screen or launches another application, the iPhone client does not resume, and again the application is responsible for maintaining state. When an SMS message or calendar event occurs, the computation continues in the background unless the user clicks on the message or views the calendar dialog; in that case, the behavior is the same as for a phone call. These events, which are entirely out of the control of the application, pose an interesting challenge and must be addressed during development.
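One common way to address such forced interruptions, sketched below in Python for brevity (the real client is written in Objective-C, and `client_state.json` is a hypothetical path), is to checkpoint the in-progress work unit so that a relaunched application can resume instead of losing the split:

```python
import json
import os

STATE_FILE = "client_state.json"  # hypothetical checkpoint location

def save_state(split_id, partial_result):
    """Hook for 'application is about to be interrupted' events."""
    with open(STATE_FILE, "w") as f:
        json.dump({"split_id": split_id,
                   "partial": sorted(partial_result)}, f)

def restore_state():
    """Hook for relaunch: return the checkpoint, or None if nothing to resume."""
    if not os.path.exists(STATE_FILE):
        return None
    with open(STATE_FILE) as f:
        state = json.load(f)
    os.remove(STATE_FILE)  # consume the checkpoint so it is used only once
    return state

save_state(42, {"alarm", "sensor"})   # e.g. called when a phone call arrives
resumed = restore_state()             # e.g. called when the app is relaunched
```

Alternatively, a client could simply abandon the split and let the job server reassign it, trading a little redundant computation for simpler client code; MapReduce's tolerance of slow or failed workers makes either choice workable.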
6 End-User Participation
Participants fall largely into two camps: captive and voluntary. For example, if a system such as ours were deployed in a large corporation where most employees have company-provided mobile devices, that company could require employees to allow their devices to participate in the system. These are what we consider captive users. Normal users, on the other hand, are true volunteers and participate for different reasons. The key is to come up with methods which engage both of these types of users so that the overall experience is positive for everyone involved. There are a large number of possible solutions to entice both types of users. Both captive and voluntary users could be offered prizes for participation, or perhaps simply receive accolades for being the participant with the most computed work units. This is similar to what both SETI@Home and Folding@Home
P.R. Elespuru, S. Shakya, and S. Mishra
do, and has proven effective. The sense of competition and participation drives people to team up in the hope of being the most productive participant. We return to this topic in Section 9.3.
7
Results
Our results were very interesting. We created several data sets composed of randomly generated text files. Overall data set sizes ranged from 5 MB to almost 50 MB, and within each data set the individual text documents ranged from a few kilobytes up to roughly 64 kilobytes. Processing throughput was largely consistent, independent of both the overall data set size and the distribution of included document sizes.

Fig. 5. Client Type Comparison

Figure 5 illustrates exactly what we expected: the simulated iPhone clients were the fastest, followed by the traditional Perl clients, and lastly the real iPhone clients, which processed data at the slowest rate of all clients tested. This behavior was expected because the simulated iPhone clients ran on the same machine as the server software during our tests, while the Perl clients were executed on remote Linux machines. Interestingly, mixing and matching client types did not seem to affect the contribution of any one client type: Perl clients processed data at roughly the same rate regardless of whether a given test included only Perl clients, as did the simulated and real iPhone clients.

Fig. 6. Min Max Average

Figure 6 presents another visualization that clearly shows a fair amount of variation across the client types. Again, the simulated iPhone clients processed the most data, primarily because they ran on the same machine as the server component; the traditional Perl clients were not far behind, and the real iPhone clients were the laggards of the bunch.
MapReduce System over Heterogeneous Mobile Devices
7.1 Interpretation of the Results
Simulated iPhone clients processed an average of 1.64 MB/sec, Perl clients an average of 1.29 MB/sec, and real iPhone clients an average of 0.12 MB/sec. The simulated iPhone clients can be thought of as another form of local client; they help highlight the overhead of the wireless connection and the processing capabilities of the real phones. These averages were consistent across a variety of data sets, both in size and in textual content. Our results show, very consistently, that the iPhones performed roughly an order of magnitude slower than the traditional clients, which is a very exciting result. It implies that a large portion of processing could be moved to these kinds of mobile clients, provided enough of them exist at a given time to perform the necessary workload. For example, a company could purchase one server to operate as the master node and farm all of the processing out to mobile devices within the company, provided it has on the order of one hundred or more employees with such devices, which is a very likely scenario. This also suggests that the system could be particularly useful for non-time-sensitive computations. For example, if a company had a large set of text documents to process, it could install a client on its employees' mobile devices; those devices could then connect and work on the data set over a long period, as long as they can process data faster than it is created. Since it is easy to quantify the contribution each device type can make, such a system could easily monitor its own progress. In summary, there is a large class of problems for which this system is a viable and exciting solution.

Fig. 7. Projected System Throughput
We had a limited number of actual devices to test with (three, to be specific), but all performed consistently across all tests and data sets, so we feel comfortable projecting forward to estimate the impact of more devices. As the number of actual devices increases, throughput should grow as represented in Figure 7. If a system utilized 500 mobile devices, we expect it would be capable of processing close to 60 MB/sec of textual data; similarly, 10,000 devices would likely yield the ability to process 1,200 MB/sec (1.2 GB/sec!) of data. This certainly suggests our system warrants further exploration, but it also points to the fact that other components of the system would start to become bottlenecks. For example, at those rates, massive network bandwidth would be needed just to support the data transfer necessary for processing to take place.
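The projection in Figure 7 is simple linear scaling from the measured per-device average. The arithmetic can be sketched as follows (assuming, as above, 0.12 MB/sec per real iPhone and no server or network bottleneck, which the text notes would eventually appear):

```python
# Per-client throughput averages measured in our tests (MB/sec).
SIMULATED_IPHONE = 1.64
PERL_CLIENT = 1.29
REAL_IPHONE = 0.12


def projected_throughput(n_devices, per_device_rate=REAL_IPHONE):
    """Aggregate throughput in MB/sec, assuming clients scale linearly
    and neither the server nor the network becomes the bottleneck."""
    return n_devices * per_device_rate
```

With this model, `projected_throughput(500)` gives the 60 MB/sec figure and `projected_throughput(10000)` the 1,200 MB/sec figure quoted above.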
8 Optimizations
There are a few areas where this system could be improved to provide a more automatic and pleasant experience. It is particularly important that the end-user experience be as seamless and elegant as possible.
8.1 Automatic Discovery
Currently, a client needs to know the IP address and port number of the server in order to participate. This required prior knowledge of the server address is a barrier to entry for our implementation of MapReduce. To allow auto-discovery, we could run Bonjour (also known as mDNS), a service discovery protocol, on the server and clients. Bonjour automatically broadcasts the service being offered. With Bonjour enabled on both server and clients, a WiFi network is not an absolute requirement. However, Bonjour has limitations as a service discovery protocol: all devices must be on the same subnet of the same local area network, which imposes client limits that would reduce the viability of our system in those situations.
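Since a real mDNS implementation needs a running network stack, the discovery idea can be illustrated with just the message layer: the announcement a server might broadcast and the client-side parser. This is a hypothetical sketch, not part of our implementation; the service name and payload format are assumptions.

```python
import json

SERVICE_TYPE = "_mapreduce._tcp"  # hypothetical service name


def build_announcement(host, port, version=1):
    """Serialize the server's contact information for a discovery broadcast."""
    return json.dumps({"service": SERVICE_TYPE, "host": host,
                       "port": port, "version": version}).encode("utf-8")


def parse_announcement(payload):
    """Return (host, port) if the payload advertises our service, else None."""
    try:
        msg = json.loads(payload.decode("utf-8"))
    except (ValueError, UnicodeDecodeError):
        return None  # ignore malformed or foreign broadcasts
    if msg.get("service") != SERVICE_TYPE:
        return None
    return msg["host"], msg["port"]
```

A client would listen for such broadcasts and connect to the first `(host, port)` pair it successfully parses, removing the need to configure the server address by hand.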
8.2 Device Specific Scaling
An important goal of this system is that it can be used on heterogeneous mobile devices, and not all mobile devices perform the same or have the same power-usage characteristics. The system should ideally know about each type of device it can run on and maintain a profile of sorts, allowing the system to optimize itself. For example, on a Google Android device the client would have one profile, and on an iPhone another, and in each case the client application would tailor itself to the environment on which it is running. The ultimate goal is to maximize performance relative to power consumption.
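Such a profile table might look like the following sketch (the device names and tuning values are illustrative assumptions, not measurements):

```python
# Hypothetical per-device tuning profiles: work-unit size and whether the
# client may process data while running on battery.
PROFILES = {
    "iphone":  {"chunk_kb": 64,  "battery_ok": False},
    "android": {"chunk_kb": 128, "battery_ok": True},
    "default": {"chunk_kb": 32,  "battery_ok": False},
}


def profile_for(device_type):
    """Pick the tuning profile for a device, falling back to a
    conservative default for unknown device types."""
    return PROFILES.get(device_type, PROFILES["default"])
```

The client would consult its profile at startup to decide how large a work unit to request and under what power conditions to run.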
8.3 Other Client Types
In addition to smart mobile devices of various types and traditional clients, other kinds of clients could be used alongside these two. In particular, a JavaScript client would allow any web browser to connect and participate in the system [8]. The combination of these three client types would be formidable indeed, forming a potentially massive computational system.
9 Additional Considerations
There are a number of areas we did not explore as part of our implementation. However, the following topics would need to be considered in an actual production implementation.
9.1 Security
Security is a multi-faceted topic in this context. Our concerns are twofold: first, whether the client implementation could affect the security of end users' mobile devices or in any way be used to compromise them; and second, whether the data mobile devices receive while participating would be risky to expose if a device were compromised by some other means. A production implementation would need to ensure that, even if a mobile device is compromised, any data associated with this processing system is inaccessible. One way to accomplish this would be to store local results in encrypted form and transmit them via HTTPS, so that the communication channel also minimizes the opportunity for compromise. Another consideration is whether to process sensitive information in the first place; certain regulations may prevent it altogether.
9.2 Power Usage
Power usage is a critical topic for a system such as this. The additional load placed on mobile devices will certainly draw more power, which could even be disastrous in some situations. For example, if the mobile device is an emergency phone, running down its battery to participate in a computation is a very bad idea. Ultimately, power usage must be considered when deciding which devices to allow in the mix. Several measures could address these concerns, such as adding code to the mobile client that prevents it from participating once battery charge drops below a certain level. This may prove tricky, however, since not all mobile platforms include API calls that allow an application to probe for that kind of low-level system information. A balance must be reached, and it is the responsibility of the client application implementation to maintain it.
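A minimal sketch of such a guard follows (the threshold value and signature are assumptions, not part of our implementation):

```python
def should_participate(battery_fraction, on_charger, min_battery=0.5):
    """Decline work when running on battery below a threshold, so that
    participation can never run an emergency phone flat. When the device
    is on a charger, participation is always allowed."""
    if on_charger:
        return True
    return battery_fraction >= min_battery
```

On platforms that expose no battery API, the client would have to fall back to a more conservative policy, such as participating only while charging.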
9.3 Participation Incentives
Regardless of whether a participating end user is a captive corporate user or a true volunteer, there should be an incentive structure that rewards participation in a manner that benefits all parties. Several incentives could be considered. One would be to offer a reward for participating, based for example on the number of work units completed; this would entice users to participate even more. A community or marketplace could even be built around this concept: companies could post documents they want processed and offer to pay some amount per work unit completed, and users could agree to participate for a given company, allowing their devices to churn out results as quickly as possible. The payment would have to be small to be viable, perhaps a few cents per work unit. Such a marketplace could easily become quite large and be beneficial to all involved. Amazon has a similar concept in place with its Mechanical Turk, which
allows people to post work units which other people then complete for a small sum of money [1]. Another possibility would be to bundle the processing into applications where it runs in the background, such as a music player, so that work continues while the player is playing music. The incentive could be a discount of a few cents when purchasing songs through that application, tied to some number of successfully completed jobs. The possibilities are numerous.
10 Conclusions
As our results make clear, mobile devices can certainly contribute positively to a large heterogeneous MapReduce system. The gain from even a few tens of mobile devices is substantial, and it will only grow as more and more mobile devices participate. Assuming a good server implementation exists, the mobile client contribution should increase with each new mobile device added. There is presumably a point of diminishing returns relative to network communication overhead, but the potential benefit is still very real. If non-captive user bases can be properly motivated, there is large potential here to process massive amounts of data for a wide range of uses. This is conceptually similar to existing cloud computing, except that the computation and storage resources happen to be mobile devices, or they interoperate between the traditional cloud and a new set of mobile cloud resources.
References
1. Amazon, Inc.: Amazon Mechanical Turk, https://www.mturk.com/mturk/welcome
2. Barroso, L.: Web Search for a Planet: The Google Cluster Architecture. IEEE Micro 23(2) (March 2003)
3. Blelloch, G.E.: Scans as Primitive Parallel Operations. IEEE Transactions on Computers 38(11) (November 1989)
4. Dean, J., Ghemawat, S.: MapReduce: Simplified Data Processing on Large Clusters. ACM, New York (2004)
5. Dubey, P.: Recognition, Mining, and Synthesis Moves Computers to the Era of Tera. Technology@Intel Magazine (February 2005)
6. Egha, G.: Worldwide Smartphone Sales Analysis, UK (February 2008)
7. Folding@Home: Folding@Home project, http://folding.stanford.edu/
8. Grigorik, I.: Collaborative MapReduce in the Browser (2008)
9. Hunkins, J.: Will Smartphones be the Next Security Challenge (October 2008)
10. iPhone Developer Program: iPhone development, http://developer.apple.com/iphone/program/develop.html
11. Krazit, T.: Smartphones Will Soon Turn Computing on its Head. CNet (March 2008)
12. Ladner, R.E., Fischer, M.J.: Parallel Prefix Computation. Journal of the ACM 27(4) (October 1980)
13. Mitra, S.: Robust System Design with Built-in Soft-Error Resilience. IEEE Computer 38(2) (February 2005)
14. SETI@Home: SETI@Home Project, http://setiathome.ssl.berkeley.edu/
15. Zaharia, M., Konwinski, A., Joseph, A.D., Katz, R., Stoica, I.: Improving MapReduce Performance in Heterogeneous Environments. In: OSDI (2008)
16. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)
17. Apache Web Server: Apache, http://httpd.apache.org/
18. Ruby Programming Language: Ruby, http://www.ruby-lang.org/en/
19. PHP Programming Language: PHP, http://www.php.net/
20. jQuery JavaScript Framework: jQuery, http://jquery.com/
Towards Time-Predictable Data Caches for Chip-Multiprocessors
Martin Schoeberl, Wolfgang Puffitsch, and Benedikt Huber
Institute of Computer Engineering, Vienna University of Technology, Austria
[email protected], [email protected], [email protected]
Abstract. Future embedded systems are expected to use chip-multiprocessors to provide the execution power for increasingly demanding applications. Multiprocessors increase the pressure on memory bandwidth, and processor-local caching is mandatory. However, data caches are known to be very hard to integrate into worst-case execution time (WCET) analysis. We tackle this issue from the computer architecture side: we provide a data cache organization that enables tight WCET analysis. Similar to the split between instruction and data caches, we argue for splitting the data cache for different data areas. In this paper we show cache simulation results for the split-cache organization, propose a modularization of the data cache analysis for the different data areas, and evaluate the implementation costs in a prototype chip-multiprocessor system.
1 Introduction
With respect to caching, memory is usually divided into instruction memory and data memory. This cache architecture was proposed in the first RISC architectures [1] to resolve the structural hazard of a pipelined machine, where an instruction has to be fetched concurrently with a memory access. The independent caching of instructions and data has also enabled the integration of cache hit classification of instruction caches into worst-case execution time (WCET) analysis [2]. While analysis of the instruction cache is a mature research topic, data cache analysis is still an open problem. After n accesses with unknown addresses to an n-way set-associative cache, the abstract cache state is lost.
In previous work we have argued for cache splitting in general [3], and in particular that caches for data with statically unknown addresses should be fully associative. In this paper we evaluate time-predictable data cache solutions in the context of the Java virtual machine (JVM). We provide simulation results for different cache organizations and sketch the resulting modular analysis. Furthermore, an implementation in the context of a Java processor shows the resource consumption and limitations of highly associative cache organizations. Access type examples are taken from the JVM implemented on the Java processor JOP [4]. Implementation details of other JVMs may vary, but the general classification of the data areas remains valid. Part of the proposed solution can be adapted to other object-oriented languages, such as C++ and C#, as well.
S. Lee and P. Narasimhan (Eds.): SEUS 2009, LNCS 5860, pp. 180–191, 2009. © IFIP International Federation for Information Processing 2009
2 Data Areas and Access Instructions
The memory areas used by the JVM can be classified into five categories:
Method area. The instruction memory that contains the bytecodes for execution. On compiled Java systems this is the native code area.
Stack. Thread-local stack used for stack frames, arguments, and local variables.
Class information. A data structure representing the different types. Contains the type description, the method dispatch table, and the constant pool.
Heap. Garbage-collected heap of class instances. The object header, which contains auxiliary information, is stored on the heap or in a distinct handle area.
Class variables. Shared memory area for static variables.
Caching of the method area and the stack area has been covered in [5] and [6]. In this paper we are interested in a data cache solution for the remaining data areas. On standard cache architectures these memory areas and the stack memory share the same data cache.
2.1 Data Access Types
Data memory accesses (except stack accesses) can be classified as follows:
CLINFO. Type information, method dispatch table, and interface dispatch table. The method dispatch table is read on virtual and static method invocation and on the return from a method. The method dispatch table contains two words per method. Bytecodes: new, anewarray, multianewarray, newarray, checkcast, instanceof, invokestatic, invokevirtual, invokespecial, invokeinterface, *return.
CONST. Constant pool access; part of the class information. Bytecodes: ldc, ldc_w, ldc2_w, invokeinterface, invokespecial, invokestatic, invokevirtual.
STATIC. Access to static fields; this is the class variables area. Bytecodes: getstatic, putstatic.
HEADER. Dynamic type, array length, and fields for garbage collection. The type information on JOP is a pointer to the method dispatch table within CLINFO. On JOP each reference is accessed via one indirection, called the handle, to simplify the compacting garbage collection.
The header information is part of the handle area. Bytecodes: getfield, putfield, *aload, *astore, arraylength, invokevirtual, invokeinterface.
FIELD. Object field access; part of the heap. Bytecodes: getfield, putfield.
ARRAY. Array access; part of the heap. Bytecodes: *aload, *astore.
2.2 Cache Access Types
The different types of data cache accesses can be classified into four classes with respect to cache analysis:
– The address is always known statically. This is the case for static variables (STATIC), which are resolved at link time, and for the constant pool (CONST), which only depends on the currently executed method.
– The address depends on the dynamic type of the operand, but not on its value. Therefore, the set of possible addresses is restricted by the receiver types determined for the call site. The class info table, the interface table, and the method table are in this category (CLINFO).
– The address depends on the value of the reference. The exact address is unknown, as some value on the managed heap is accessed, but in addition to the symbolic address a relative offset is known. Instance fields and array elements, both showing some degree of spatial locality, belong to this category (FIELD, ARRAY).
– The last category contains handles, references to the method dispatch table, and array lengths (HEADER). They reside on the heap as well, but only the symbolic address is known.
2.3 Cache Coherence
For a chip-multiprocessor system, the cache coherence protocol is the major limiting factor on scalability. Splitting data caches also simplifies the cache coherence protocol. Most data areas are actually constant (CLINFO, CPOOL). Access into the handle area (HEADER) is pseudo-constant: the data is written into the header area during object creation and cannot be changed by a thread. However, the garbage collector can modify this area. To provide a coherent view of the handle area between the garbage collector and the mutators, a cache for the handle area has to be updated or invalidated appropriately. Data on the heap (FIELD, ARRAY) and in the static area (STATIC) is shared by all threads. With a write-through cache, coherence can be enforced by invalidating the cache on monitorenter and before reads from volatile fields.
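The invalidation rule can be illustrated with a small model (a Python sketch, not the JOP implementation): two cores share a backing store through private write-through caches, and a stale cached line disappears once the reader invalidates at a synchronization point.

```python
class WriteThroughCache:
    """Write-through cache over a shared backing store. Coherence is
    enforced by invalidating the whole cache at synchronization points,
    mirroring the rule above: invalidate on monitorenter and before
    reads from volatile fields."""

    def __init__(self, memory):
        self.memory = memory  # shared dict: address -> value
        self.lines = {}       # private cached copies

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]  # fill on miss
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory[addr] = value  # write-through keeps memory current

    def invalidate(self):
        """Called on monitorenter / before a volatile read."""
        self.lines.clear()
```

Between synchronization points a core may observe a stale value, which is permitted by the Java memory model for unsynchronized accesses; the invalidate call restores a coherent view exactly where the model requires it.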
3 Cache Benchmarks
Before developing a new cache organization we ran benchmarks to evaluate memory access patterns and possible cache solutions. Our assumption is that the hit rate in the average case correlates with the hit classification in the WCET analysis when different access types are cached independently. Therefore, we can reason about useful cache sizes from the benchmark results. For the benchmarks we use two real-world embedded applications [7]: Kfl is one node of a distributed control application, and Lift is a lift controller deployed in industrial automation. The Kfl application is very static, written in a conservative, procedural style; Lift was written in a more object-oriented style. Furthermore, two benchmarks from an embedded TCP/IP stack (UdpIp and Ejip) are used to collect performance data.
Table 1 shows the access frequencies to the different memory areas for all benchmarks. There are no write accesses to the constant data areas and also none to the pseudo-constant area (HEADER); as we measure applications without object allocation at runtime, the data in the HEADER area is not mutated. The general trend is that load instructions dominate the memory traffic (between 89% and 93%).
Table 1. Data memory traffic to different memory areas (in % of all data memory accesses). Rows: CLINFO, CONST, STATIC, HEADER, FIELD, ARRAY; columns: load and store fractions for Kfl, Lift, UdpIp, and Ejip.
For the Kfl application there are no field accesses (FIELD). The dominating accesses are to static fields (STATIC), static method invocation (CLINFO), and the constant pool (CONST); the rest are related to array accesses (HEADER, ARRAY). The Lift application has a quite different access pattern: instance field accesses dominate all reads (FIELD and HEADER). Fewer methods are invoked than in the Kfl application, and fewer static fields are accessed. The array access frequency of both applications is similar (4%–5%); for the TCP/IP benchmarks it is, due to many buffer manipulations, considerably higher (11% loads).
3.1 Cache Simulations
As a first step we simulate different cache configurations with a software simulation of JOP (JopSim) and evaluate the average-case hit count.
Handle Cache. As all operations on objects and arrays need an indirection through the handle, we first simulate a cache for the handles. The address of a handle is not known statically, therefore we assume a small fully associative cache with an LRU replacement policy. The results for different sizes are shown in Table 2; the size is in single words. It is quite interesting to note that even a single-entry cache provides a hit rate for the handle indirection of up to 72%. Caching a single handle should be simple enough that single-cycle hit detection, with the memory read started in the same cycle, should be possible. In that case, even a uniprocessor JOP with a two-cycle memory read will gain some speedup. A size of just 8 entries results in a reasonable hit rate between 84% and 95%.
Constants and the Method Table. Mixing accesses to the method table and to the constant pool in one direct-mapped cache is an option when the receiver types can be determined precisely. However, if the set of possible receiver types is large, the analysis becomes less precise.
Therefore, we evaluate individual caches for constant pool accesses (CPOOL) and method table accesses (CLINFO). Table 3 shows that a small direct-mapped cache of 512 words (2 KB) gives a hit rate of 100%. Keeping the cache sizes small is important for our intended system: we are targeting chip-multiprocessor systems with private caches, even for accesses to constants, to keep the individual tasks time-predictable. A shared cache would not allow any cache analysis of individual tasks.
Table 2. Hit rate of a handle cache, fully associative, LRU replacement

        Hit rate (%)
Size   Kfl  Lift  UdpIp  Ejip
1       72    15     43    69
2       82    20     80    78
4       84    94     87    82
8       88    95     91    84
16      92    95     94    84
32      95    95     96    86
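The JopSim measurement behind Table 2 can be approximated by a small trace-driven model (a Python sketch, not the actual JopSim code): a fully associative cache with LRU replacement is simply an ordered set in which a hit moves the entry to the most-recently-used position.

```python
from collections import OrderedDict


def lru_hit_rate(trace, size):
    """Hit rate of a fully associative cache with LRU replacement over an
    address trace, as in the handle-cache experiments."""
    cache = OrderedDict()
    hits = 0
    for addr in trace:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark most recently used
        else:
            if len(cache) >= size:
                cache.popitem(last=False)  # evict least recently used
            cache[addr] = True
    return hits / len(trace)
```

Feeding such a model with the handle-address trace of an application directly yields the per-size hit rates tabulated above.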
Table 3. Hit rate of a constant pool cache, direct mapped

        Hit rate (%)
Size   Kfl  Lift  UdpIp  Ejip
32      68    69     77    82
64      96    69     79    95
128     98    69     88    95
256    100   100    100    95
512    100   100    100   100

Table 4. Hit rate of a method table cache, direct mapped

        Hit rate (%)
Size   Kfl  Lift  UdpIp  Ejip
32      64    83     62    49
64      85    83     77    74
128     91   100     85    93
256    100   100     97    95
The hit rate of a direct-mapped cache for the method table (MTAB) shows behavior similar to the constant pool caching, as shown in Table 4. A size of 256 words gives a hit rate between 95% and 100%. Note that the method table is accessed by both static and virtual method invocations. While the MTAB entry is known statically for static methods, the MTAB entry for virtual methods depends on the receiver type. If data-flow analysis can determine most of the receiver types, the combination of a single cache for the constant pool and the method table is an option to explore further.
Static Fields. Table 5 shows the results for a direct-mapped cache for static fields. For object-oriented programs (represented by Lift), this cache can be kept very small. Although the addresses are statically known, as are the addresses of constants, a combination of these two caches is not useful: static fields need to be kept cache coherent, while constant pool entries are implicitly cache coherent. Cache coherence enforcement, with cache invalidation at synchronized blocks, limits the hit rate in UdpIp and Ejip.
Table 5. Hit rate of a static field cache, direct mapped

        Hit rate (%)
Size   Kfl  Lift  UdpIp  Ejip
32      76   100     33    77
64      85   100     33    77
128     99   100     33    77
256    100   100     33    77
Table 6. Hit rate of an instance field cache, fully associative, LRU replacement

        Hit rate (%)
Size   Kfl  Lift  UdpIp  Ejip
1       84    17     47     9
2       84    75     59    13
4       84    86     65    18
8       84    88     67    20
16      84    88     67    20
32      84    88     67    20
Object Fields. The addresses of object fields are unknown to the analysis. Therefore, we can only attack the analysis problem via high associativity. Table 6 shows hit rates of fully associative caches with an LRU replacement policy. For the Lift benchmark we observe a moderate hit rate of 88% for a very small cache of just 8 entries. UdpIp and Ejip saturate at 8 entries due to cache invalidation during synchronized blocks of code.
3.2 Summary
The simulations of different caches for different memory areas show that quite small caches can provide a reasonable hit rate. Moreover, as the memory access latency for a CMP system with time-sliced memory arbitration can be quite high,1 even moderate cache hit rates are a reasonable improvement.
4 Cache Analysis
In the following section we sketch the cache analysis as it will be performed in a future version of our WCET analysis tool [8]. We leverage the cache splitting of the data areas for a modular analysis; e.g., the analysis of heap-allocated objects is independent of the analysis of the cache for constants or the cache for static fields.
1 Our 8-core CMP prototype with a time slot of 6 cycles per core has a worst-case memory latency of 48 cycles.
In multithreaded programs, it is necessary to invalidate the cache when entering a synchronized block or reading from volatile variables.2 We require that accesses to shared data are properly synchronized, which is the correct way to access shared data in Java. In this case it is safe to assume that object references on the heap are not changed by another thread at arbitrary points in the program, resulting in a significantly more precise analysis. The effect of synchronization, namely invalidating some of the caches, has to be taken into account, though. The running example, taken from the Lift application, is illustrated in Figure 1. The figure comprises the source code of the method checkLevel and the corresponding control flow graph in static single assignment (SSA) form. Each basic block is annotated with the cache accesses it triggers.
4.1 Static and Type-Dependent Addresses
If we only deal with statically known addresses in a data cache, the standard cache hit/miss classification (CHMC) for instruction caches delivers precise results and is therefore a good choice [9]. In the example, there is only one static variable, LEVEL_POS. If we assume a direct-mapped cache for static variables, and a separate one for values on the heap, all but the first access to the field will be a cache hit every time checkLevel is executed. When the address depends on the type of the operand, we have to deal with a set of possible addresses. The straightforward extension of CHMC to sets of memory addresses is to update the abstract cache state for each possible address and then join the resulting states. This leads to a very pessimistic classification when dynamic dispatch is used, however, and is therefore only acceptable if the exact address is known for most references.
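For straight-line code with statically known addresses, CHMC reduces to tracking a "must" abstract state per cache set. The following sketch (a simplification that ignores the joins needed at control-flow merges, so it is not a full analysis) illustrates the idea for a direct-mapped cache:

```python
def classify_accesses(trace, n_sets):
    """Cache hit/miss classification for a direct-mapped cache when every
    address is statically known: a 'must' abstract state maps each cache
    set to the single address guaranteed to be cached there. Returns
    'always-hit' / 'miss' labels for a straight-line access trace."""
    must = {}  # set index -> address known to be cached there
    labels = []
    for addr in trace:
        s = addr % n_sets  # set index for a direct-mapped cache
        if must.get(s) == addr:
            labels.append("always-hit")
        else:
            labels.append("miss")
            must[s] = addr  # the access loads the line, so it is now known
    return labels
```

In a real analysis, the "must" states of converging control-flow paths are intersected at join points, which is exactly where the extension to address sets becomes pessimistic under dynamic dispatch.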
4.2 Persistence Analysis
If dynamic types are more common, a more promising approach is to classify program fragments, or partitions, where it is known that one or all memory addresses are locally persistent. If this is the case, they will be missed at most once during one execution of the program fragment. For both direct-mapped and N-way set-associative caches with LRU replacement, a dataflow analysis for persistence has been introduced in [10]. For FIFO caches, the concept of persistence is useful as well, but it is no longer safe to assume that a persistent address is loaded at the first access.
Most work on persistence analysis focuses on dataflow equations and global persistence, leaving out some aspects which deserve more attention. Persistence across the whole program is rare and only of theoretical interest. We therefore identify a set of nested scopes [11] and determine for each scope which cache lines or cache sets are locally persistent. A scope is a subgraph of the control flow graph which represents a set of execution sequences. Methods, loops, and loop bodies are typical examples of scopes, but partitions of less regular shape are possible as well. To reduce the amount of analysis work, persistence is checked in a bottom-up manner, starting at the leaves of the scope nesting graph. In the example, we partitioned the flow graph of checkLevel into two scopes, the first of which contains the method turnOffLeds, while checkLevel.2.Loop is a subscope of the second one.
2 The semantics of volatile variables in the Java memory model is similar to synchronized blocks: the complete global state has to be locally visible before the read access. Simply bypassing the cache for volatile accesses is not sufficient.
4.3 Object Header and Fields
As the addresses of the object header and the object fields depend on the instance and are not known at compile time, we use small, fully associative caches to track the cache state. There is usually a high number of handle accesses in object-oriented programs,
but many of them do not change often. In our architecture, the object header is fully transparent at the bytecode level and is managed by the runtime system. Caching is hence expected to yield a substantial benefit. On other platforms, which compile the bytecode and hold handles and array length values in registers, caching those values is probably less beneficial.
To calculate the symbolic addresses of object headers and fields used in some scope, the data dependencies of the control flow graphs in SSA form are analyzed. In SSA form, each variable is only defined once. If the definition is of the form v = φ(v1, v2), the definition is called a φ node, and the value of v is either that of v1 or v2. For each object header used, those data dependencies which are defined in the same scope, and might be executed more than once within the scope, are identified. If none of those definitions is a φ node or depends on an indeterministic instruction, the variable representing the object corresponds to a unique symbolic address. Finally, if all references used within a scope correspond to a unique symbolic address, we are able to perform a local persistence analysis. Additionally, using a variant of the global value numbering technique used in optimizing compilers [12], the quality of the analysis is further improved by identifying variables mapping to the same symbolic address.
In the running example, no handle has a data dependency on a φ node, and therefore persistence analysis is relatively simple. If a fully associative cache with four cache lines is used, all object headers of scope checkLevel are locally persistent. If the object header cache only has two entries, at least those headers used in scope turnOffLeds and checkLevel.2.Loop are locally persistent.
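The scope-local persistence condition can be stated compactly. In the sketch below (Python; the SSA details are reduced to a set of variables defined by φ nodes, so this is an illustration of the condition, not the analysis itself), a scope's accesses are locally persistent in a fully associative LRU cache when every reference has a unique symbolic address and the distinct addresses fit in the cache:

```python
def unique_symbolic_address(var, phi_defs):
    """A reference has a unique symbolic address in a scope if its
    definition chain contains no phi node (simplified SSA check)."""
    return var not in phi_defs


def locally_persistent(scope_accesses, associativity, phi_defs=frozenset()):
    """All accesses in a scope are locally persistent (missed at most once)
    in a fully associative LRU cache if every reference has a unique
    symbolic address and the number of distinct addresses fits in the
    cache."""
    if any(not unique_symbolic_address(v, phi_defs) for v in scope_accesses):
        return False
    return len(set(scope_accesses)) <= associativity
```

This mirrors the example above: with four cache lines, all headers of checkLevel fit and are persistent; with two lines, only the smaller subscopes satisfy the condition.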
5 Cache Implementation

We have implemented various forms of caches in the context of the Java processor JOP [4]: (1) a small fully associative cache with LRU replacement, (2) a fully associative cache with FIFO replacement, and (3) a direct mapped cache. We have combined the different caches to distinguish between different data areas.

5.1 LRU and FIFO Caches

The crucial component of an LRU cache is the tag memory. In our implementation it is organized as a shift register structure to implement the aging of the entries (see Figure 2). The tag memory that represents the youngest cache entry (cache line 0) is fed by a multiplexer from all other tag entries and the address from the memory load. This multiplexer is the critical path in the design and limits the maximum associativity. Table 7 shows the resource consumption and maximum frequency of the LRU and FIFO caches. The resource consumption is given in logic cells (LC) and in memory bits. As a reference, a single core of JOP consumes around 3500 LCs, and the maximum frequency in the Cyclone-I device without data caches is 88 MHz. The large multiplexer in the LRU cache visibly lowers the maximum frequency for configurations with a high associativity. The implementation of a FIFO replacement strategy avoids changing all tag memories on each read. Therefore, the resource consumption is less than for an LRU
Fig. 2. LRU tag memory implementation

Table 7. Implementation results for LRU and FIFO based data caches (associativity 16-way to 256-way)
cache and the maximum frequency is higher. However, hit detection still has to be performed on all tag memories in parallel, and the matching entry has to be selected.

5.2 Split Cache Implementation

We have combined a direct mapped cache and an LRU cache with one JOP core. The LRU cache stores the object header and the object fields; the direct mapped cache stores class info, constants, and static fields; array data is not cached. Table 8 shows the resources and the maximum system frequency of different cache configurations. The first line gives the base numbers without any data cache. From the resource consumption we can see that a direct mapped cache is cheap to implement. Furthermore, the maximum clock frequency is independent of the direct mapped cache size. A highly associative LRU cache (i.e., 32-way and more) dominates the maximum clock frequency and consumes considerable logic resources.
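The behavioral difference between the two replacement policies discussed in this section can be illustrated with a small software model. This is a sketch of the replacement behavior only, not of the hardware; the class and method names are ours:

```python
from collections import OrderedDict, deque

class LRUCache:
    """Fully associative cache with LRU replacement. The tag memory is
    modelled as an ordered map; the hardware uses a shift-register tag
    structure to age the entries on every access."""
    def __init__(self, lines):
        self.lines, self.tags = lines, OrderedDict()

    def access(self, tag):
        hit = tag in self.tags
        if hit:
            self.tags.move_to_end(tag)         # accessed entry becomes youngest
        else:
            if len(self.tags) == self.lines:
                self.tags.popitem(last=False)  # evict least recently used
            self.tags[tag] = True
        return hit

class FIFOCache:
    """FIFO replacement: the tag memory only changes on a miss, which
    keeps the hardware cheaper than LRU at the cost of a lower hit rate."""
    def __init__(self, lines):
        self.lines, self.fifo = lines, deque()

    def access(self, tag):
        if tag in self.fifo:
            return True                        # hit: no state change
        if len(self.fifo) == self.lines:
            self.fifo.popleft()                # evict oldest entry
        self.fifo.append(tag)
        return False
```

On the access trace A, B, A, C, A with two cache lines, the LRU cache scores two hits while the FIFO cache scores only one, because FIFO does not refresh A's age on a hit.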
6 Related Work

Early work on data cache access classification by White et al. focuses on computing addresses and analyzing array access patterns [13]. It is assumed, however, that the
Table 8. Implementation results for a split cache design
exact memory accesses can be resolved. Ferdinand et al. [10] discuss the use of dataflow analysis for data cache analysis. They suggest using persistence analysis to deal with memory accesses which reference one out of a set of possible addresses. To overcome the problems with unknown memory addresses, Lundqvist et al. [14] suggest distinguishing unpredictable and predictable memory accesses to improve the analysis of data caches. If an address cannot be resolved at compile time, accesses to that address are considered unpredictable. Data structures which might be accessed by unpredictable memory accesses are marked to be moved into an uncached memory area. Vera et al. [15] lock the cache during accesses to unpredictable data. The locking proposed there affects all kinds of memory accesses, though, and is therefore necessarily coarse-grained.
7 Conclusion

Chip-multiprocessor systems increase the pressure on the memory bandwidth, and caching of instructions and data is mandatory. In order to estimate tight WCET values, we propose to split data caches for different data areas. Benchmarking of embedded applications shows possible trade-offs between achievable hit rates and the sizes of the different caches. Splitting the data cache for different access types (e.g., constant pool and heap) allows us to modularize the cache analysis. Furthermore, unknown addresses of one data access type have no impact on data accesses of a different type. Caches for data where the address is not known statically (e.g., heap allocated data) can only be analyzed when the cache has a very high associativity. From our prototype implementation within an FPGA we conclude that LRU caches scale up to an associativity of 16 and FIFO caches up to an associativity of 64.
Acknowledgements

The research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under grant agreement number 216682 (JEOPARD).
References

1. Patterson, D.A.: Reduced instruction set computers. Commun. ACM 28(1), 8–21 (1985)
2. Arnold, R., Mueller, F., Whalley, D., Harmon, M.: Bounding worst-case instruction cache performance. In: Proceedings of the Real-Time Systems Symposium 1994, pp. 172–181 (December 1994)
3. Schoeberl, M.: Time-predictable cache organization. In: Proceedings of the First International Workshop on Software Technologies for Future Dependable Distributed Systems (STFSSD 2009), Tokyo, Japan. IEEE Computer Society, Los Alamitos (2009)
4. Schoeberl, M.: A Java processor architecture for embedded real-time systems. Journal of Systems Architecture 54(1-2), 265–286 (2008)
5. Schoeberl, M.: A time predictable instruction cache for a Java processor. In: Meersman, R., Tari, Z., Corsaro, A. (eds.) OTM-WS 2004. LNCS, vol. 3292, pp. 371–382. Springer, Heidelberg (2004)
6. Schoeberl, M.: Design and implementation of an efficient stack machine. In: Proceedings of the 12th IEEE Reconfigurable Architecture Workshop (RAW 2005), Denver, Colorado, USA. IEEE, Los Alamitos (2005)
7. Schoeberl, M.: Application experiences with a real-time Java processor. In: Proceedings of the 17th IFAC World Congress, Seoul, Korea (July 2008)
8. Huber, B.: Worst-case execution time analysis for real-time Java. Master's thesis, Vienna University of Technology, Austria (2009)
9. Theiling, H., Ferdinand, C., Wilhelm, R.: Fast and precise WCET prediction by separated cache and path analyses. Real-Time Syst. 18(2/3), 157–179 (2000)
10. Ferdinand, C., Wilhelm, R.: On predicting data cache behavior for real-time systems. In: Müller, F., Bestavros, A. (eds.) LCTES 1998. LNCS, vol. 1474, pp. 16–30. Springer, Heidelberg (1998)
11. Engblom, J., Ermedahl, A.: Modeling complex flows for worst-case execution time analysis. In: RTSS 2000: Proceedings of the 21st IEEE Real-Time Systems Symposium, pp. 163–174. IEEE Computer Society, Los Alamitos (2000)
12. Click, C.: Global code motion/global value numbering. SIGPLAN Not. 30(6), 246–257 (1995)
13. White, R.T., Mueller, F., Healy, C., Whalley, D., Harmon, M.: Timing analysis for data and wrap-around fill caches. Real-Time Syst. 17(2-3), 209–233 (1999)
14. Lundqvist, T., Stenström, P.: A method to improve the estimated worst-case performance of data caching. In: RTCSA 1999: Proceedings of the Sixth International Conference on Real-Time Computing Systems and Applications, Washington, DC, USA, pp. 255–262. IEEE Computer Society, Los Alamitos (1999)
15. Vera, X., Lisper, B., Xue, J.: Data caches in multitasking hard real-time systems. In: RTSS 2003: Proceedings of the 24th IEEE International Real-Time Systems Symposium, Washington, DC, USA, pp. 154–165. IEEE Computer Society, Los Alamitos (2003)
From Intrusion Detection to Intrusion Detection and Diagnosis: An Ontology-Based Approach

Luigi Coppolino, Salvatore D'Antonio, Ivano Alessandro Elia, and Luigi Romano

Dipartimento per le Tecnologie - Università degli Studi di Napoli "Parthenope"
{luigi.romano,luigi.coppolino,salvatore.dantonio,ivano.elia}@uniparthenope.it
http://www.dit.uniparthenope.it/FITNESS
Abstract. Currently available products provide only some support in terms of Intrusion Prevention and Intrusion Detection, but they very much lack Intrusion Diagnosis features. We discuss the limitations of current Intrusion Detection System (IDS) technology, and propose a novel approach - which we call Intrusion Detection & Diagnosis System (ID2S) technology - to overcome such limitations. The basic idea is to collect information at several architectural levels, using multiple security probes, which are deployed as a distributed architecture, to perform sophisticated correlation analysis of intrusion symptoms. This makes it possible to escalate from intrusion symptoms to the adjudged cause of the intrusion, and to assess the damage in individual system components. The process is driven by ontologies. We also present preliminary experimental results, providing evidence that our approach is effective against stealthy and non-vulnerability attacks.

Keywords: Intrusion Detection and Diagnosis, Information Diversity, Ontologies, Stealthy and non-vulnerability attacks.
1 Rationale and Contribution
By Diagnosis, we mean the capability of: i) clearly identifying the causes of the attacks, and ii) accurately estimating their consequences on individual system components. Currently available products provide only some (indeed limited) support in terms of Intrusion Prevention and Intrusion Detection, but they very much lack Intrusion Diagnosis capabilities. We strongly believe that this technology trend should be subverted, and that more effort should be put into the development of effective techniques for implementing Intrusion Diagnosis features. We propose a novel approach, which extends Intrusion Detection System (IDS) technology to what we call Intrusion Detection & Diagnosis System (ID2S) technology, to overcome this limitation. The basic idea is to collect information at several architectural levels (namely: Network, Operating System, Data Base, and Application), using multiple security probes which are deployed as a distributed

S. Lee and P. Narasimhan (Eds.): SEUS 2009, LNCS 5860, pp. 192–202, 2009.
© IFIP International Federation for Information Processing 2009
architecture, and use Complex Event Processing (CEP) technology to perform sophisticated correlation analysis of intrusion symptoms. The idea of collecting information from different sources to gain more insight into attack/intrusion related phenomena is not new. A (far from complete) list of remarkable works is: [3], [4], and [5]. While these works exploit the concept of correlation and multilayer analysis, they do not address the issue of diagnosing the kind of anomaly or attack the system is experiencing. In our approach, the escalation process from intrusion symptoms to the adjudged cause of the intrusion and to an assessment of the damage in individual system components is driven by ontologies. More precisely, we have developed two sets of ontologies: the first one allows us, given that a particular symptom has been observed, to identify which attacks may have generated that symptom; the second one can be used to infer an estimate of the damage to specific system components from knowledge of the attack. The output of the process can then be used to drive remediation actions, and ultimately replenish system resources. To demonstrate the effectiveness of our approach, we have conducted preliminary experiments on a testbed consisting of a web server running Joomla, the well-known open-source Content Management System (CMS) written in PHP. The experiments exposed vulnerabilities of the target applications to SQL injection (SQLi) and Cross Site Scripting (XSS) attacks, which are described in the Bugtraq repository. The experimental tests have demonstrated that our approach leads to better detection results, both in terms of improved accuracy of the classification process and of enhanced reliability of the decision-making process. Also importantly, we were able to clearly identify the nature of the attack, as well as the specific system components affected by it.
We emphasize that the proposed approach is effective against an emerging class of new attacks, which is referred to in the literature [1] as "stealthy". These attacks represent a major threat, since not only do they have a dramatic impact in terms of economic losses [2], but (i) they are also invisible to current state-of-the-art IDSs, and (ii) current Intrusion Prevention Systems are ineffective against them. However, since such attacks have clear symptoms at architectural levels other than the network, by collecting information at multiple architectural levels using diverse security probes, and performing sophisticated correlation analysis of attack symptoms, we are able to detect them. The rest of the paper is organized as follows. Sect. 2 describes the approach we propose, and how it is currently implemented in the framework of the INTERSECTION and INSPIRE projects. Sect. 3 describes the ontology-based detection and diagnosis process. Sect. 4 provides a description of the case study and presents preliminary experimental results. Finally, Sect. 5 gives some concluding remarks, along with information concerning the directions of our future work.
2 Conceptual Architecture of the ID2S Technology
In order to effectively assess the security status of a networked system, the results of the monitoring activities performed at different observation points need to be correlated. Such observation points are distributed throughout the network
L. Coppolino et al.
Fig. 1. Conceptual architecture of the ID2S technology
as well as throughout the system to be protected. The more diverse the information sources and the processing methods, the more effective the correlation process. The deployment of probes at different observation points located in the networked system and at different architectural levels (network level, operating system level, application level, etc.) makes it possible to fulfill the requirement of diversity of the information sources. By exploiting information diversity, it is possible to improve the accuracy of the detection process, as well as to implement diagnostic capabilities. Fig. 1 shows the architecture of the proposed Intrusion Detection and Diagnosis System, which comprises the following functional blocks:

– Event Collection - Collecting security-related events from a wide range of detection probes implies dealing with heterogeneous data sources. To cope with heterogeneity, a promising solution is the use of Adaptable Parsers (APs). APs extract security-related information from multiple data feeds, and convert it to IDMEF (Intrusion Detection Message Exchange Format) standard messages, which are then routed to the Event Distribution Channel. A system component, called Decision Engine (DE), is in charge of processing data available on the Event Distribution Channel, and of correlating them, in order to decide whether the collected symptoms represent an actual attack or not. The current implementation of the AP components relies on Java Compiler Compiler (JavaCC) technology. More details on adaptable parsers are available in [8].
– Stream Mapping - This function converts IDMEF messages describing the symptoms of a possible intrusion to streams of tuples, which are then fed to a Complex Event Processing (CEP) engine. CEP technology enables the Decision Engine to perform sophisticated correlations on information gathered
from the multiple security probes. The current implementation of the CEP engine is based on Borealis [10].
– Detection - This phase consists in the extraction of higher-level knowledge from situational information. This is done in real time or near real time.
– Diagnosis - An ontology-based hierarchical organization of event patterns is used to automate the process of deriving the queries which implement the diagnostic analysis. The knowledge formalized by the ontology is used by the diagnostic process to identify which intrusions/attacks are the possible cause of the observed symptoms, and which part of the system has been affected by the malicious activity.
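The Event Collection and Stream Mapping stages described above can be sketched as follows. The probe log formats and the parser functions are invented for illustration, and the field names are only loosely inspired by the IDMEF data model; they do not reproduce the actual adaptable parsers:

```python
# Normalize heterogeneous probe output into IDMEF-style dictionaries
# before routing them to the Event Distribution Channel.
def parse_scalp_line(line):
    # Hypothetical Scalp output: "<timestamp> <attack-class> <request>"
    ts, attack_class, request = line.split(" ", 2)
    return {"analyzer": "Scalp", "create_time": ts,
            "classification": attack_class, "target": request}

def parse_aqfm_line(line):
    # Hypothetical AQFM output: "<timestamp> <symptom> rate=<value>"
    ts, symptom, rate = line.split(" ")
    return {"analyzer": "AQFM", "create_time": ts,
            "classification": symptom,
            "intensity": float(rate.split("=")[1])}

# All normalized events share one schema, so the Decision Engine can
# correlate them regardless of which probe produced them.
channel = [parse_scalp_line("2009-11-16T10:00:01 sqli /index.php?id=1"),
           parse_aqfm_line("2009-11-16T10:00:02 query_failure rate=0.4")]
```

The point of the normalization is that downstream queries only ever see one tuple schema, which is what makes the stream-mapping step into the CEP engine straightforward.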
3 Ontology-Based Detection and Diagnosis
The Decision Engine is in charge of detecting ongoing attacks by analyzing and correlating the security-related events which have been conveyed to the Event Distribution Channel. The queries are formulated based on the information and knowledge contained in the threat ontology. Fig. 2 presents a high-level view of such an ontology. Properties of concepts and sub-concepts (denoted by ellipses) are not shown because they would make the ontology unwieldy. An event generated by a probe is considered a potential symptom of an attack against a specific target. Each kind of attack is associated with a set of symptoms. An attack is described by using a number of indicators that assess the trustworthiness of the probe used to detect it. Attack has the isEvaluatedBy property, which is defined by the AttackIndicator concept. AttackIndicator has the following properties: (i) hasTrustworthinessValue, which is defined by the Trustworthiness concept; (ii) isAssociatedTo, which is defined by the Probe concept, and (iii) indicates, which is defined by the Symptom concept. Symptoms are classified into Abuses, Misuses and Suspicious Acts.
Fig. 2. The Threat Ontology
Abuses are actions which change the state of an asset. They are further divided into Anomaly-based and Knowledge-based Abuses. The first category of abuses includes anomalous behaviors (e.g., unusual application load, anomalous input requests, etc.), while the second category relies on the recognition of signatures of well-known attacks (e.g., brute force attacks). Misuses are out-of-policy behaviors which do not affect the state of the system components (e.g., authentication failures, query failures). Suspicious Acts do not violate any policy. They are events of interest to the probes (e.g., execution of commands providing information about the system state). The Symptom concept has the following properties: (i) isDetectedBy, which is defined by the Probe concept; (ii) isCausedBy, which is defined by the Attacker concept, and (iii) isDirectedTo, which is defined by the Target concept. Each Symptom is characterized by the hasDetectionTime property, which specifies the detection time of the symptom, and the hasIntensityScore property, which measures the probability of occurrence of the symptom with reference to the specific probe detecting it. Targets are classified into four categories: Network, Operating System, Data Base, and Application. A software tool, named Query Generator, browses the threat ontology, extracts the properties characterizing an attack, and generates the queries to be executed by the Complex Event Processor. Since the events feeding the distribution channel encompass information about both the attack symptom and the detection probe, the Symptom and Probe concepts of the threat ontology can be used to build the query which drives the correlation process performed by the CEP engine. More precisely, the structure of the threat ontology shows that the Attack Indicator (AI) concept indicates a Symptom and isAssociatedTo a Probe. Therefore, events on the Event Distribution Channel can be considered as AI instances.
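The escalation from an observed symptom to candidate attacks can be sketched with the threat ontology reduced to plain data. The concrete entries below are illustrative examples, not the full ontology, and the trustworthiness values are invented:

```python
# Each Attack isEvaluatedBy a list of attack indicators; each indicator
# indicates a symptom, isAssociatedTo a probe, and hasTrustworthinessValue.
ONTOLOGY = {
    "SQLi": [
        {"probe": "Scalp", "symptom": "sqli_signature",  "trust": 0.7},
        {"probe": "ACDM",  "symptom": "anomalous_chars", "trust": 0.5},
        {"probe": "AQFM",  "symptom": "query_failures",  "trust": 0.9},
    ],
    "XSS": [
        {"probe": "Scalp", "symptom": "xss_signature",   "trust": 0.6},
        {"probe": "ACDM",  "symptom": "anomalous_chars", "trust": 0.5},
    ],
}

def attacks_for_symptom(symptom):
    """Escalate from an observed symptom to the attacks that may
    have generated it, following the isEvaluatedBy/indicates links."""
    return [attack for attack, ais in ONTOLOGY.items()
            if any(ai["symptom"] == symptom for ai in ais)]
```

A shared symptom such as an anomalous character distribution maps to both SQLi and XSS, which is exactly why the later aggregation and filtering phases are needed to discriminate between them.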
While generating the queries, the Query Generator uses the Attack Indicator properties to define the query parameters which help detect the specific attack. Fig. 3 shows the main functions that compose the process, namely: Classification, Aggregation, and Filtering.

Classification. The purpose of the classification function is to create a separate stream for each possible attack (i.e., every Attack described in the ontology), in order to make the subsequent aggregation function more efficient. Every attack is associated with a stream containing the Attack Indicators, i.e., the attack information which should be used to detect the attack according to the isEvaluatedBy relationship in the threat ontology. The Classifier is implemented as a set of concatenated filter and map queries. A filter query extracts from the event descriptions on the Event Distribution Channel only the attack details that concern a specific attack, while a map query converts the selected information to a stream of attack indicators.

Aggregation. Once, for every class of Attack, a stream has been created containing only the relevant AIs, such indicators are aggregated in order to formulate hypotheses about the ongoing attacks. The aggregation is performed by combining subsets of AIs. The aggregation process scales with the number and the type
Fig. 3. The query generation process
of matching patterns, thanks to the use of the ontology, which makes it possible to select the most relevant information (for the specific domain). The implemented aggregation patterns include temporal proximity and Source and/or Target matching of the Symptoms. Event aggregation is performed by making join queries which generate meta-events containing hypotheses on the possible results of the diagnostic activity. In order to discriminate among such hypotheses, a confidence degree is associated with every meta-event by using the HasConfidenceLevel property. The confidence degree is computed through a weighted combination of the HasIntensityScore values of the aggregated Symptoms, where the weight of every Symptom is given by the hasTrustworthinessValue of its AI.

Filtering. The filtering function performs the crucial task of selecting the Attack instances that have a HasConfidenceLevel value exceeding a configurable threshold. This threshold-based filtering makes it possible to lower the number of false positives, as only the aggregated events showing a robust detection pattern will be considered an actual diagnosis of an attack and raise an alert.

The diagnostic process aims to extract a higher level of knowledge from the aggregated symptoms of an ongoing Attack. This process is not performed offline on the aggregated AIs. Conversely, it is performed during all the phases of the correlation process. The goal of the diagnostic process is to characterize the ongoing attack in terms of:

– Attack Type - The Attack Type indicates the class of the ongoing attack. It is determined during the classification phase, by discriminating AIs based on the detectable types of Attack. The aggregation performed on a given class of AIs will result in the detection of that kind of Attack.
– Attack Targets - Information about the attack target(s) is inferred by looking at the Target concepts of the aggregated AIs. In most cases, not all aggregated Symptoms are DirectedTo the same Target. For example, detecting an attack against an application server requires that the behavior of both the application server and the database be monitored, and in case of an ongoing attack the Event Distribution Channel will be fed with different events. Some events are raised by probes monitoring the application server, while others are generated by probes observing the database. The generated events correspond to symptoms having different targets, and a naive event correlation would therefore be unfeasible. The information allowing the CEP to identify the relationships and dependencies between the different system components is inferred from the Attack Indicator concepts. This approach enables the correlation of the different symptoms and the identification of the system components affected by the attack. These components are considered potentially violated targets, and as such they should not be considered trusted.
– Attack Latency - Basically, the Attack Latency (AL) parameter is the amount of time that an ongoing attack against a system has gone undetected. More precisely, it represents an upper bound estimate of the amount of time during which the system has been manipulated by the attacker. Given the timestamps t1, t2, ..., tN of the aggregated AIs, and tD the instant when the attack is detected (i.e., the instant when the alert is generated), AL can be evaluated as: AL = tD − min(t1, t2, ..., tN), with N being the number of available AIs.
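The confidence computation, threshold filtering, and latency estimate above can be sketched in a few lines. The paper does not fix the exact weighted-combination formula, so a normalized weighted mean is assumed here, and the event field names are ours:

```python
def confidence(ais):
    """HasConfidenceLevel: weighted combination of the aggregated
    symptoms' hasIntensityScore values, each weighted by the
    hasTrustworthinessValue of its attack indicator (assumed here
    to be a normalized weighted mean)."""
    total_weight = sum(ai["trust"] for ai in ais)
    return sum(ai["trust"] * ai["intensity"] for ai in ais) / total_weight

def filter_alerts(meta_events, threshold=0.6):
    # Threshold-based filtering: only attack hypotheses whose confidence
    # exceeds the configurable threshold raise an alert.
    return [m for m in meta_events if confidence(m["ais"]) > threshold]

def attack_latency(ai_timestamps, detection_time):
    """AL = tD - min(t1, ..., tN): an upper bound on the time the
    ongoing attack went undetected."""
    return detection_time - min(ai_timestamps)
```

Raising the threshold trades detection rate for fewer false positives, which is the tuning knob the filtering phase exposes.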
4 Case Study and Experimental Results
In this section we present preliminary results attained by applying the proposed approach in a laboratory experimental testbed. The testbed includes a MySQL database and an Apache web server running an open-source Content Management System (CMS) written in PHP, namely Joomla (v1.5). We carried out experimental tests in order to assess the capability of the proposed system to detect and diagnose two common attacks, namely SQL injection (SQLi) and Cross Site Scripting (XSS). Attacks were modeled and injected into the normal traffic profile. The following probes have been deployed throughout the networked system: (i) Apache Scalp [11], a host-level signature-based analyzer of Apache web server access logs, which uses a set of rules to spot malicious requests; (ii) ACDM (Anomalous Character Distribution Monitor) [6], a host-level anomaly-based probe which analyzes character distribution in HTTP requests and gives a score to every anomalous request; (iii) AQFM (Anomalous Query Failures Monitor), a database-level probe that monitors the rate of failed queries in a SQL database.

4.1 SQL Injection: Assumptions and Experimental Setup
Assuming that the system administrators apply security patches to the web application components in order to prevent attacks exploiting well-known vulnerabilities, any attacker will have to proceed by trial and error in order to find a
way to exploit new and unknown vulnerabilities. For this reason the SQLi attack was modeled as composed of a set of unsuccessful attempts possibly followed by a successful exploitation of the vulnerability. The preliminary attempts performed by the attacker will leave a trace in the web server access log. These requests will look increasingly similar to the successful one as the attacker learns more about the internal mechanisms of the application, and increasingly different from the requests in the "normal" traffic. Moreover, these blind attempts will at first result in the injection of syntactically wrong queries, leaving further traces of the ongoing attack, since normal traffic usually does not generate failing queries. Given this attack model, both the application server and the database are monitored in order to detect SQLi attacks. The application server is monitored by using two complementary probes: Scalp, which uses a signature-based approach, and ACDM, which uses an anomaly-based approach. The database is monitored by using the AQFM probe. From an ontology point of view, the SQLi Attack isEvaluatedBy three AIs: (i) SQLi attack detection performed by Scalp; (ii) Anomalous Character Distribution detected by ACDM; (iii) Anomalous Query Failures detected by AQFM. The proposed Intrusion Detection and Diagnosis system collects such AIs and merges them into a separate event stream. Afterward, the AIs are aggregated in order to group the traces of the same ongoing attack. Since both Scalp and ACDM analyze the Apache access logs, events raised by the two probes are aggregated on the basis of the access log entry they refer to. Subsequent attack attempts are correlated by using time proximity and source matching patterns. Once aggregation has been performed, a set of meta-events is produced. Each meta-event contains a global, multi-level evidence of an attack and can be considered a preliminary attack diagnosis.
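The time-proximity and source-matching aggregation just described can be sketched as follows; the event field names and the window length are assumptions for illustration:

```python
def aggregate(ais, window=60.0):
    """Group attack indicators that share the same source and fall
    within `window` seconds of the previous indicator in the group
    (temporal proximity + source matching)."""
    groups = []
    for ai in sorted(ais, key=lambda a: a["time"]):
        for g in groups:
            if (g[-1]["source"] == ai["source"]
                    and ai["time"] - g[-1]["time"] <= window):
                g.append(ai)  # continuation of the same attack trace
                break
        else:
            groups.append([ai])  # start a new candidate meta-event
    return groups
```

Each resulting group corresponds to one meta-event: a sequence of attempts from one source, close enough in time to plausibly belong to the same ongoing attack.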
The diagnostic process provides further information which is deduced by looking at the aggregated AIs. For example, the starting and ending times of the attack are obtained by evaluating the lowest and the highest timestamp values.

4.2 Cross Site Scripting: Assumptions and Experimental Setup
Strategies for the detection of Cross Site Scripting (XSS) attacks and SQLi attacks are very similar, since both rely on the analysis of the HTTP requests performed by the attacker in order to find attack traces. Cross Site Scripting is modeled as a sequence of attempts, which reflects the fact that initially the attacker has no information about how to inject the malicious code. The difference between SQLi attacks and XSS attacks is that no query failure is detected by monitoring the database, as long as the XSS code injection is not exploited to trigger the injection of SQL commands. This is because the attacker usually injects client-side code (like JavaScript) into web pages accessed by other users, so as to steal sensitive information (i.e., cookies or session tokens) from their browsers without modifying the server-side code of the application. This kind of Attack isEvaluatedBy the Scalp XSS and ACDM probes. Thanks to the use of the ontology, attack type, target and latency are correctly diagnosed.
4.3 SQLi and XSS: Experimental Results
In this section, we discuss the results of the experimental campaign with respect to SQLi and XSS attacks. Fig. 4a shows the performance of the detection process when a single AI is used. With respect to detection, the experiments show that:

– Scalp performs well when detecting SQLi attacks (72%), while its detection rate for XSS attacks is lower (63%);
– ACDM provides a very high detection rate (94%), at the cost of a rather high false positive rate (36% of the normal traffic is erroneously perceived as malicious);
– AQFM never fails to report database-level traces of SQLi attacks (100%), but normal traffic and XSS attacks may generate failing queries, too.

As to diagnosis, it is worth noting that:

– Due to their intrinsic nature, the anomaly-based probes, namely ACDM and AQFM, are only capable of raising alarms upon detection of anomalous requests; they are neither able to provide any information about the specific threat which originated the alarm, nor able to provide further diagnostic information, e.g., detection latency or the parts of the system under attack;
– Since SQLi and XSS attacks share similar patterns, they can easily be confused. Even Scalp, although a signature-based probe, marks many SQLi requests as XSS (44% of the SQLi attacks, with 20% mapped to both kinds of attack).

Fig. 4b shows the performance of the detection process when complex correlation rules are applied to symptoms detected by probes which monitor different architectural levels and use diverse detection approaches. Results show that the detection rate increases significantly, and the accuracy of the diagnosis is improved. In particular, the wrong marking performed by Scalp is eliminated. Furthermore, the false positives produced by the ACDM probes are drastically
Fig. 4. (a) Detection performance using a single AI; (b) detection performance correlating multiple AIs
reduced. Basically, even though Scalp does not detect all the malicious requests and, even worse, sometimes gives wrong hints, the system almost always detects the SQLi attacks by adding to the Scalp detections those obtained via correlation with symptoms detected by ACDM and AQFM. In this way, the rate of correctly diagnosed SQLi attacks rises from the 73% achieved by Scalp to the 91% of our ID2S. When application-level symptoms are aggregated with ACDM and AQFM symptoms, the successful diagnosis of XSS attacks rises from the 63% achieved by Scalp to the 71% of the correlation-based approach. The cases in which our ID2S fails can be associated to two main scenarios: (i) wrong diagnoses are performed when query failures are generated by an XSS attack, which leads to an incorrect SQLi diagnosis (4%); (ii) false positives are raised by normal traffic that triggers a false detection by the ACDM (36%). When such a detection is incorrectly aggregated with query failures generated by an overlapping SQLi attack, it produces a false SQLi diagnosis (5%); otherwise it can produce a false XSS diagnosis (11%). It should be emphasized that the false positives generated by ACDM are halved (16%) by means of the correlation with other symptoms.
5
Conclusions and Future Work
In this work, we have discussed the limitations of current Intrusion Detection System (IDS) technology and proposed a novel approach, which we call Intrusion Detection & Diagnosis System (ID²S) technology, to overcome such limitations. The basic idea is to collect information at several architectural levels (namely: network, operating system, database, and application), using multiple security probes deployed as a distributed architecture, and to use Complex Event Processing (CEP) technology to perform sophisticated correlation analysis of intrusion symptoms. The escalation process from intrusion symptoms to the adjudged cause of the intrusion, and to an assessment of the damage in individual system components, is driven by ontologies. We have conducted preliminary experiments on a testbed consisting of web servers running a well-known open-source Content Management System (CMS) written in PHP, namely Joomla. The experimental tests have demonstrated that our approach is effective against an emerging class of new attacks, referred to in the literature [1] as “stealthy”. These attacks represent a major threat, since not only do they have a dramatic impact in terms of economic losses [2], but (i) they are invisible to current state-of-the-art IDSs, and (ii) current Intrusion Prevention Systems are ineffective against them. Experiments conducted so far indicate that the proposed approach has three important advantages:
– it improves the performance of the detection process;
– it provides diagnostic features;
– it reduces the false positive rate.
Future work will follow two main directions. The first objective will be to conduct a thorough experimental analysis, in order to collect more evidence of the
L. Coppolino et al.
effectiveness of the approach. The second objective will be the implementation of more sophisticated correlation approaches and the development of more detailed ontologies, so as to allow finer-grained estimation of the consequences of attacks on individual system components.
Acknowledgements The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) under Grant Agreement no. 216585 (INTERSECTION Project) and Grant Agreement no. 225553 (INSPIRE Project).
References
1. Jakobsson, M., XiaoFeng, W., Wetzel, S.: Stealth attacks in vehicular technologies. In: Proc. of the IEEE Vehicular Technology Conference, September 26–29, vol. 2, pp. 1218–1222 (2004)
2. IDC: Worldwide Threat Management Security Appliances 2007–2011 Forecast and 2006 Vendor Shares: Still Stacking the Racks, Doc # 209303 (November 2007)
3. Repp, N., Berbner, R., Heckmann, O., Steinmetz, R.: A Cross-Layer Approach to Performance Monitoring of Web Services. In: Proc. of the Workshop on Emerging Web Services Technology, CEUR-WS (December 2006)
4. Yu-Sung, W., Bagchi, S., Garg, S., Singh, N.: SCIDIVE: a stateful and cross protocol intrusion detection architecture for voice-over-IP environments. In: Proc. of the Dependable Systems and Networks Conference, June 28, pp. 433–442 (2004)
5. Vigna, G., Robertson, W., Vishal, K., Kemmerer, R.A.: A stateful intrusion detection system for World-Wide Web servers. In: Proc. of the 19th Annual Computer Security Applications Conference, December 8–12, pp. 34–43 (2003)
6. Kruegel, C., Vigna, G.: Anomaly detection of web based attacks. In: Proc. of the 10th ACM Conference on Computer and Communication Security (CCS 2003), pp. 251–261. ACM Press, New York (2003)
7. Majorczyk, F., Totel, E., Mé, L., Saïdane, A.: Anomaly Detection with Diagnosis in Diversified Systems using Information Flow Graphs. In: Proc. of the IFIP TC 11 23rd International Information Security Conference, July 17, pp. 301–315 (2008)
8. Campanile, F., Cilardo, A., Coppolino, L., Romano, L.: Adaptable Parsing of Real-Time Data Streams. In: 15th EUROMICRO International Conference on Parallel, Distributed and Network-Based Processing (PDP 2007), February 7–9, pp. 412–418 (2007)
9. Fisher, K., Gruber, R.: PADS: a domain-specific language for processing ad hoc data. In: Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation (2005)
10. The Borealis project, http://www.cs.brown.edu/research/borealis/public/
11. apache-scalp, Apache log analyzer for security, http://code.google.com/p/apache-scalp/
Model-Based Testing of GUI-Driven Applications
Vivien Chinnapongse¹, Insup Lee¹, Oleg Sokolsky¹, Shaohui Wang¹, and Paul L. Jones²
¹ University of Pennsylvania
² U.S. Food and Drug Administration
{vichi,lee,sokolsky,shaohui}@cis.upenn.edu, [email protected]
Abstract. While thorough testing of reactive systems is essential to ensure device safety, few testing methods center on GUI-driven applications. In this paper we present one approach for the model-based testing of such systems. Using the AHLTA-Mobile case study to demonstrate our approach, we first introduce a high-level method of modeling the expected behavior of GUI-driven applications. We show how to use the NModel tool to generate test cases from this model and present a way to execute these tests within the application, highlighting the challenges of using an API-geared tool in a GUI-based setting. Finally we present the results of our case study.
1
Introduction
Thorough testing of reactive systems is an active research area with a long history. Reactive systems are primarily event-driven systems that operate by continuously interacting with their environment, responding to received signals. Operation of reactive systems is often safety- and life-critical. Rigorous development and analysis techniques are required to ensure safe and correct operation of such systems. In many safety-critical domains, for example in avionics and medical device areas, government regulators certify or approve systems before they can be used. In particular, the U.S. Food and Drug Administration (FDA) approves medical devices for use in the United States. An important class of reactive systems comprises systems interacting with a human user. Such systems offer a user interface, through which the user can send signals to the system and observe its responses. The user typically learns to interact with the system by reading the user manual or through targeted training sessions. In either case, the user forms a mental model of the system
This research has been supported in part by the FDA/TATRC grant MIPR6MRXMM6093 and NSF grants CNS-0509327 and CNS-0720703.
S. Lee and P. Narasimhan (Eds.): SEUS 2009, LNCS 5860, pp. 203–214, 2009. c IFIP International Federation for Information Processing 2009
in his/her head. This model is then used as a specification, against which operation of the system is assessed. In this paper, we are interested in establishing conformance between the system operation and user expectations. Conformance between the mental model and observable behavior of the system is important from different perspectives. From the development perspective, it will help avoid usability problems in the system. From the regulatory perspective, it may help to evaluate the necessary user training and instruction materials that accompany the device. We concentrate on GUI-driven handheld devices as a particular case of user-centric reactive systems. In the long-standing collaboration between experts at the FDA and the high-confidence systems design group at Penn (e.g., [1,3]), we have considered several medical devices that fall in this category. This paper has been motivated by a recent case study, in which we analyze a point-of-injury data entry device application called AHLTA-Mobile [2]. The Armed Forces Health Longitudinal Tracking Application–Mobile (AHLTA-Mobile) is a point-of-care handheld medical assistant developed by the Telemedicine and Advanced Technology Research Center (TATRC), approved for use by the FDA and deployed in the U.S. Army. AHLTA-Mobile is a C# application on the Microsoft® Windows Mobile™ platform. It assists medical personnel, whether deployed, on military bases, or at military medical centers, with diagnosis and treatment of patients. Medical personnel also use the solution to record patient clinical encounters and transmit those records to a central data repository. AHLTA-Mobile provides users access to service members’ complete medical records and offers advice for diagnosis and treatment. It contains a set of question-and-answer examinations that evaluate common battlefield injuries such as concussions.
For the safety of patients it is important that the device always functions correctly, because misdiagnosis and incorrect treatment can cause serious harm. For the purposes of this paper we are concerned with the correctness of a subset of AHLTA-Mobile’s behavior, the Military Acute Concussion Evaluation (MACE) module. MACE is a series of eight GUI screens, displaying forms to be completed by the user. Seven of these screens, to which we refer as MACE 1 through MACE 7, are used to enter the results of the user examination, while the last screen, MACE Results, is used to enter a diagnosis and offers the possibility to save the results by entering them in a database. Relevant screens including Start Screen, Resume Screen, No Unit, and Error are also considered. The screens are navigated by invoking the Next Screen button on each screen or the Previous menu item in the Tools menu. In response to users invoking an action, the system moves to a different screen or updates information on the current screen. Note that the user can enter data into the appropriate fields on the screen, but cannot modify user interface actions. This observation led us to represent the mental model of the device as a state machine, in which states are identified with GUI screens and transitions represent changing screens in response to invoking UI elements. Each transition in such a state machine is labeled with the UI element
¹ Throughout this paper we use a sans font for the names of GUI items. We use a fixed-width font to identify model and source code elements.
that effects the change. In our case study, we constructed the model manually through a careful reading of the AHLTA-Mobile user manual [16]. We discuss the model in more detail in Section 3.2. Of the 114,000 lines of C# code that comprise the AHLTA-Mobile application, MACE screen classes and auxiliary classes contain approximately 6,000 lines of code. Given the state-machine model of the system, we can pursue two approaches to ascertain compliance of the system with its model. One is model-based testing, where the model is used to generate a test suite, which is then applied to the system implementation. Several tools are available for model-based testing of software. In our case study, we used NModel [7] from Microsoft Research, one of the few tools that target C# applications. The other alternative is to extract a state-machine model from the application source code and compare it directly to the mental model using a suitable notion of state machine equivalence or preorder. Although it makes more thorough testing possible, this alternative is much more challenging, and is a subject of our ongoing work. The contributions of this paper are threefold. First, we present an approach to capture behavioral models of GUI-driven handheld devices. We believe that the high-level modeling approach we have applied to represent the mental model of the AHLTA-Mobile device will be equally applicable to most devices in this category. Second, we present lessons learned in applying model-based testing to the AHLTA-Mobile case study. We discuss the challenges we faced while applying the NModel methodology in the GUI-based setting, and the ways in which we overcame these challenges. Finally, we present the results of the case study, which uncovered inconsistencies between the device behavior and the desired behavior described in the manual. The paper is organized as follows: Section 2 describes the NModel framework to be used in analyzing AHLTA-Mobile.
Section 3 discusses the development of a mental model, both as an extended finite state machine (EFSM) and as an NModel model program. Section 4 explains the creation of a test harness to link an implementation with test cases. The testing of the AHLTA-Mobile application is described in Section 5. Section 6 discusses related research work. We conclude our paper with a discussion of our contributions in Section 7.
2
Using NModel to Analyze MACE
Developed at Microsoft Research, the NModel [6,7] framework is a model-based software testing and analysis tool for C# programs. NModel allows us to create a formal model of an implementation’s expected behavior and determine through model-based testing whether or not the implementation’s actual behavior and the model are consistent. The open-source tool is freely available online, and there is a good level of support and documentation. No other tool we discovered matched this description, and we decided NModel would suit our purposes reasonably well.
The NModel framework consists of the following components:
– a library for creating model programs, executable specifications for implementations,
– a model program viewer (mpv) for viewing model programs as finite-state machines (FSMs),
– an offline test generator (otg), which performs link coverage of model programs to produce test cases, and
– a conformance tester (ct), which takes test cases and executes them within the implementation.² This must be coupled to the implementation with a test harness, called a stepper.
A diagram of the steps involved in testing implementations with NModel is provided in Figure 1:
1. First, we take the specifications and/or the user manual and write a model program using the NModel library. We can use this model program to generate a graphical FSM using mpv for a visual representation.
2. Then, we use otg to generate a test suite from the model program.
3. To test the implementation with the test suite, we first write a stepper to couple the test cases described in the model program with the implementation.
4. Finally, we run ct with the test suite and the implementation coupled with the stepper to check for consistency between the implementation and the model. The output of ct is Success if the implementation is correct and Failure otherwise.
implementation
manual generation
model program
model program viewer (mpv)
graphical FSM
offline test generator (otg)
test suite
manual generation
stepper
conformance tester (ct)
output
Fig. 1. Testing implementations with NModel 2
² ct can also generate test cases on the fly from a model program during test execution, but this is not necessary and is therefore not discussed in this paper.
3
Creating the Mental Model
The first step of our process was to produce a mental model of MACE from the AHLTA-Mobile user manual. The challenge in creating the model was to find an adequate modeling approach that captures the user perception of the application. Taken in its full complexity, the problem of user perception goes well beyond the scope of the case study. However, after showing AHLTA-Mobile to several potential users, we concluded that the application can be modeled as an extended finite state machine (EFSM), which has long been used in model-based testing [5]. In the following, we give a brief definition of EFSM, followed by the description of our modeling approach and a discussion of the implementation of a given EFSM as a model program in NModel. 3.1
Extended Finite State Machines
Preliminaries. For a finite set of variables X = {x1, ..., xn}, each ranging over the space of values O, a valuation is a function v : X → O that assigns to each variable x its current value. The set of valuations of X is denoted V(X). A predicate P over X is a boolean-valued function P : V(X) → {true, false}. A valuation transformer T is a function T : V(X) → V(X). An EFSM M is a tuple ⟨Q, Σ, X, E, q0, v0⟩, where Q is a set of states with the designated initial state q0, Σ is a finite alphabet, X is a set of variables with the initial valuation v0, and E is a transition relation. A transition t ∈ E is a tuple ⟨q1, g, a, u, q2⟩, where q1, q2 are the source and destination states of the transition, respectively. The symbol a ∈ Σ is the event that triggers the transition. The guard g is a predicate over the variables of M that states when the transition is allowed to be taken. Finally, the update u is a valuation transformer that reflects changes to variables when the transition occurs. For the purpose of this paper, we represent each update as a sequence of assignments xi = fi(X). A run of M is an alternating sequence (q0, v0) a1 (q1, v1) a2 ... such that, for each i, M has a transition ⟨qi−1, gi, ai, ui, qi⟩ such that gi(vi−1) = true and vi = ui(vi−1). That is, in every step of the execution, a transition of M is taken such that its guard is satisfied by the variable values in the source state, and the valuation after the transition is taken is updated according to the update specified by the transition. The update occurs by performing the assignments in their syntactic order in ui. 3.2
EFSM Model for AHLTA-Mobile
The AHLTA-Mobile user manual uses two ways to convey the expected behavior of the application to the user: first, it offers pictures of each GUI screen, and second, it describes the actions that may be performed when a given screen is displayed. With the exception of editing actions, the outcome of performing an action is the new screen being displayed. We found it natural from the documentation to formulate the mental model as an EFSM that encompasses the observable behavior of the system, identifying screens with states and actions with transitions between the screens.
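To make the run semantics of Section 3.1 concrete, an EFSM can be sketched as a small interpreter over guarded transitions. The sketch below is purely illustrative (Python rather than the paper's C#/NModel setting); the states and variables form a toy fragment inspired by the MACE screens, not the actual model:

```python
# Illustrative EFSM interpreter: transitions are (source, guard, action,
# update, target) tuples, following the definition in Section 3.1.

def run(transitions, q0, v0, actions):
    """Replay a sequence of actions from (q0, v0); return the final
    (state, valuation), or raise if no enabled transition exists."""
    state, v = q0, dict(v0)
    for a in actions:
        for (q1, guard, sym, update, q2) in transitions:
            if q1 == state and sym == a and guard(v):
                v = update(v)      # apply the update u to the valuation
                state = q2
                break
        else:
            raise ValueError(f"action {a!r} not enabled in state {state!r}")
    return state, v

# Toy fragment in the spirit of MACE: Next is only enabled once the
# first screen's required fields have been edited.
transitions = [
    ("MACE 1", lambda v: True, "Edit",
     lambda v: {**v, "Edited1": True}, "MACE 1"),
    ("MACE 1", lambda v: v["Edited1"], "Next",
     lambda v: v, "MACE 2"),
]
print(run(transitions, "MACE 1", {"Edited1": False}, ["Edit", "Next"]))
```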
For this case study we focus on a subset of MACE’s behavior, capturing the actions Resume and Suspend. The resulting EFSM is MAM = ⟨QAM, ΣAM, XAM, EAM, StartScreen, v0⟩, where
– QAM = {StartScreen, MACE1, . . . , MACE7, MACEResults, ExamIndex, ResumeScreen, NoUnit},
– ΣAM = {Edit, Next, Suspend, Resume, Select, Start, MACE},
– XAM = {Edited1, . . . , Edited7, EditedResults, Suspended, Selected, UnitInfo}.
EAM is visually represented in Figure 2, somewhat simplified for readability, and v0 is discussed below. In the EFSM model, states represent the following subset of AHLTA-Mobile screens relevant to MACE. MACE is comprised of eight screen states, MACE 1 through MACE 7 and MACE Results. Each screen is a form that has to be completed before the next may be displayed. The Start Screen is the initial screen where the application begins after the user has logged in and a patient has been selected. The Exam Index is a menu from which the user can navigate to MACE, and the Resume Screen is a menu from which the user may resume a suspended exam. The alphabet of this EFSM consists of the following actions available within MACE:
– Edit completes the required fields in the current screen.
– Next clicks Next to navigate to the next screen.
– Suspend clicks Suspend to suspend the evaluation.
– Resume clicks Resume to resume the evaluation.
– Select selects the appropriate exam to resume.
– Start clicks Exam Index on the initial screen.
– MACE clicks MACE within the Exam Index.
Each action would label a transition in the EFSM representation of the mental model of MACE. Note that the Edit action is the only one that does not correspond to the invocation of a particular user interface element. In our approach, we do not model the contents of MACE forms. Instead, we capture only the fact
that some editing has to be performed before the user can move to the next screen. The variables of the EFSM model have been introduced to capture the conditional execution of user actions as specified in the user manual. For example, the action Resume may only be executed from the Start Screen state if the value of the boolean variable Suspended is true, indicating that the exam has been previously suspended. Other variables include UnitInfo, which indicates whether unit information has been previously specified for the patient; Selected, which indicates whether an exam has been selected in the Resume Screen state to resume; and Edited1 . . . EditedResults, which indicate whether the required fields in the MACE screens MACE 1 . . . MACE Results have been completed. The initial values of Suspended, Selected, and Edited1 . . . EditedResults are all false. We have separately considered initial valuations with UnitInfo either true or false. The reason for this is that the patient’s unit information is set in another part of the AHLTA-Mobile system, which has been excluded from the case study. 3.3
Model Program Representation of the Mental Model
After creating a formal representation of the mental model we needed to translate it into a model program that could be used by NModel to test the AHLTA-Mobile application. Model programs, executable specifications written in C# using the NModel library, are action oriented. They define which actions in an application may be taken in what circumstances. A model program contains a set of variables that captures the state of the model program, and a collection of methods that represent actions. For each action a, the model program contains two methods: a(), which represents the action itself, and aEnabled(), which, based on the current state of the model program, determines whether a is enabled. The body of a() is a collection of cases that update the variables of the model program when the action method is invoked. Given an EFSM M = ⟨Q, Σ, X, E, q0, v0⟩ that defines states as screens, we mechanically translate it into a model program as follows:
1. Declare the Model class, which references the System and NModel libraries.
2. Within Model, create all variables in X and initialize them according to v0. Add the string variable current that stores the label of the current state of M, initially q0.
3. For each a in Σ:
(a) Create an action skeleton:
[Action("a")]
static public void a() {}
static bool aEnabled() { return false; }
(b) For each t = ⟨q, g, a, u, q′⟩ in E:
i. Add the following lines to a():
if (current.Equals("q") && g) { current = "q'"; update(u); }
where update(u) updates all variables according to u. Given our sequential interpretation of u in the definition of EFSM, the assignments of u can be syntactically transcribed into C# statements.
ii. Add the following line to the beginning of aEnabled():
if (current.Equals("q") && g) return true;
The described translation is mechanical and can be easily made automatic. However, in the case study, we followed the described procedure manually. As an example of this translation, consider the action MACE in Figure 3. The action may be taken when the Exam Index screen is displayed and may lead to No Unit or MACE 1 depending on the value of UnitInfo. This fragment of the EFSM yields the model program shown below it.
(Fig. 3, top: EFSM fragment in which the MACE action leads from Exam Index to MACE 1 when UnitInfo holds, and to No Unit when !UnitInfo.)
[Action("MACE")]
static public void MACE() {
    if (unitinfo)
        current = "MACE 1";
    else
        current = "No Unit";
}
static bool MACEEnabled() {
    return current.Equals("Exam Index");
}
Fig. 3. Representing the MACE() action in a model program
After writing the model program representation of the MACE mental model, we used mpv to produce an FSM. We then used otg to automatically generate a test suite for MACE.
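The idea behind offline generation with link (transition) coverage can be sketched independently of NModel: compute a shortest action path to each reachable state, then extend that path over every outgoing edge, so that the resulting sequences jointly traverse all transitions. The sketch below is a simplified Python illustration of this idea, not NModel's actual otg algorithm, and the small FSM is a toy fragment inspired by the MACE screens:

```python
# Illustrative offline test generation with transition (link) coverage:
# one test per edge, each prefixed by a shortest path to the edge's source.
from collections import deque

def cover_transitions(fsm, q0):
    """fsm: dict mapping state -> list of (action, next_state).
    Returns action sequences from q0 that jointly cover every edge."""
    prefix = {q0: []}              # shortest action path to each state (BFS)
    queue = deque([q0])
    while queue:
        q = queue.popleft()
        for action, q2 in fsm.get(q, []):
            if q2 not in prefix:
                prefix[q2] = prefix[q] + [action]
                queue.append(q2)
    # One test case per edge: reach the source state, then take the edge.
    return [prefix[q] + [action]
            for q in prefix for action, _ in fsm.get(q, [])]

fsm = {
    "Start Screen": [("Start", "Exam Index")],
    "Exam Index": [("MACE", "MACE 1")],
    "MACE 1": [("Edit", "MACE 1"), ("Next", "MACE 2")],
}
for test in cover_transitions(fsm, "Start Screen"):
    print(test)
```

A production generator would additionally merge overlapping tests into fewer, longer runs and account for guards over the EFSM variables; the sketch covers only the state graph.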
4
Writing a Test Harness
NModel requires the use of a stepper, a test harness that invokes an instance of the implementation to be tested and causes the appropriate actions to be executed when invoked by ct. For simple applications, like the samples provided on the NModel website [6], when ct requests that an action be executed, the stepper simply calls a corresponding method that exists within the implementation. In AHLTA-Mobile this was not possible: our actions did not directly correspond to single methods provided in the application but instead
to multiple methods triggered by user input events, like keystrokes and mouse clicks. Attempting to associate input actions with existing methods, like callbacks for buttons, was problematic for a few reasons. Since callback methods are normally private, code needed to be modified in many places for them to be used; an inelegant solution that presented many possibilities for errors to be introduced. We also needed to know which instance of any object we were manipulating, requiring further additions to the source code. This method also required detailed knowledge about how the implementation worked, which was both tedious and, as we found, unnecessary. Instead of using callbacks and related methods in order to simulate actions, we inserted actual keystroke and mouse click events into the application’s message loop. We did this in AHLTA-Mobile by retrieving object handles from the C# message loop within the application and sending our user input events directly to the appropriate handles via the message loop. This method allowed us to add code in only one part of the application, making it simpler to work with and reducing the opportunities to introduce errors into the application.
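This design choice, driving the GUI by injecting synthetic input events into the message loop rather than calling private callbacks, can be illustrated with a toy message loop. The sketch below is a deliberately simplified Python stand-in (the actual stepper posts keystroke and mouse-click events to Windows Mobile control handles through the C# message loop); it shows why the harness needs only one hook, the posting interface the application already exposes:

```python
# Toy message loop illustrating event injection: the harness enqueues
# synthetic events exactly as the windowing system would, so no private
# callback has to be exposed or modified.
from collections import deque

class ToyApp:
    def __init__(self):
        self.queue = deque()
        self.screen = "MACE 1"
        # Handlers stay private to the app; the harness never calls them.
        self._handlers = {("click", "Next"): self._on_next}

    def _on_next(self):
        self.screen = "MACE 2"

    def post(self, event):
        """The only hook the test harness needs."""
        self.queue.append(event)

    def pump(self):
        """Dispatch loop, as the application itself would run it."""
        while self.queue:
            handler = self._handlers.get(self.queue.popleft())
            if handler:
                handler()

app = ToyApp()
app.post(("click", "Next"))   # harness simulates a mouse click on Next
app.pump()
print(app.screen)             # MACE 2
```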
5
Testing AHLTA-Mobile
Once a stepper is written, we can run ct with the coupled implementation and stepper and the test suite generated by otg as arguments, as shown previously in Figure 1. Running the conformance tester quickly revealed an error: suspending an exam does not lead to the Start Screen as expected but instead to the Exam Index. This resulted in a timeout, a function included in NModel in case an implementation does not behave as expected and must be terminated. Since the Start Screen, and thus the Resume button, never appeared, it was never clicked and no exam could be selected, causing the application to stall. The test trace that caused the error is given below.
TestResult(0, Verdict("Failure"), "Action timed out",
  Trace(
    Test(0),
    Start(),
    MACE(),
    Suspend(),
    Resume(),
    Select()
  )
)
6
Related Work
The use of state machines for specifying user interfaces has been explored as early as the mid-1980s in [17]. At that time, however, state machines were applied to textual user interfaces, which are much simpler to model and analyze (for example, they do not involve callbacks). With the advent of flexible, dynamically modifiable GUI systems, research in the human-computer interface (HCI) area has focused primarily on dynamic aspects of GUI-based systems, where state
machines appear to be less useful. However, in the domain of GUI-driven handheld devices considered in our case study, EFSMs are quite appropriate and yield high-level and accurate models of user expectations of the system. Model-based testing of GUI programs is also explored in [9], where the authors use randomized online testing instead of providing offline tests that achieve transition coverage of the model. That paper presents a very different modeling approach based on labeled transition systems with concurrency. The approach involves two levels of models. A high-level model describes the various user-level actions that may be performed. A user-level action may require several GUI operations, such as popping up a menu and then selecting an item in the menu. A low-level model then describes how these actions are accomplished. We believe that the approach of [9] is targeted towards systems with dynamically created and manipulated GUI screens. In our case, their multi-level approach would be overkill. Several other research works focus on different aspects of model-based testing. [13] mentions the use of model-based test case generation for fault detection, and employs hierarchical predicate transition Petri nets as a formalism. [18] discusses and compares several testing methodologies for open source software using model-based testing. In [12], the authors present extensions to the Spec Explorer tool to automate testing based on Spec# specifications. A GUI mapping tool allows the tester to associate actions with physical objects that appear on the GUI display. The tool generates C# code with methods that have the same signature as those specified, and actions are performed externally according to tests generated by Spec Explorer. [14] specializes the task modelling notation to ConcurTaskTrees.
7
Conclusions and Discussion
In this paper we presented an approach for the behavioral modeling of GUI-driven handheld devices. We illustrated how the NModel methodology can be applied for the model-based testing of this class of devices. We discussed the challenges we faced in applying this approach and our ways of overcoming them. Finally, we presented the results of our case study of the AHLTA-Mobile application, demonstrating an inconsistency between the observed behavior and the behavior described in the user manual. We believe that our approach is applicable to most GUI-driven handheld devices, offering a viable method of establishing conformance between system operation and user expectations for these types of reactive systems. While this work is an encouraging step forward, it is still far from the comprehensive methodology needed for the analysis of user-centric GUI-driven devices that we envision. Several aspects of such a methodology remain open problems, as discussed below. Realistic mental models. In this paper, we constructed the mental model based on the contents of a user manual. Clearly, perception of the appropriate use for a device by a user is formed through other factors as well and can be quite different
from the literal representation of the user manual in formal notation [11,10]. Empirically constructed mental models, capturing probabilistic information about observed user behaviors, are used in the testing literature under the name of usage models or usage profiles [4]. A predictive way to construct such mental models is needed, especially for new kinds of devices. A practical mental modeling methodology should build on both cognitive science and computer science. Detecting and managing underspecification. A big part of the challenge in constructing mental models is that natural-language documents describing a system are never complete, and users interpret them by making assumptions based on their knowledge and prior experience with similar systems. The problem here is that these assumptions are so natural for the reader that it is often hard to detect that an implicit assumption has been made. Representation of alternatives would require us to apply a different modeling and testing approach. A possible way to capture alternatives is to use nondeterministic EFSMs, where different transitions labeled by the same symbol would correspond to different alternatives. During testing, as long as the implementation offers a behavior corresponding to one alternative, a test should succeed. A slight complication here is the need to ensure consistency: if an alternative has been resolved in some way during a test execution, then later in the same execution it has to be resolved the same way. We also will have to rely on a different tool to generate and execute tests, since NModel operates on input-deterministic model programs. Soft vs. hard inputs. Many approaches to model-based testing require that the model does not restrict the tester from performing an action. Technically, this corresponds to the notions of weak input enabledness [15] or input completeness [8].
In our case, this requirement should be relaxed, because the system interacts with its environment via what we call soft inputs, such as GUI buttons on the screen. Such a button may not be present on some screens, and in that case the tester should actually be prevented from invoking that input.
References
1. Alur, R., Arney, D., Gunter, E.L., Lee, I., Lee, J., Nam, W., Pearce, F., Van Albert, S., Zhou, J.: Formal specifications and analysis of the computer-assisted resuscitation algorithm (CARA) infusion pump control system. Software Tools for Technology Transfer 5(4), 308–319 (2004)
2. AHLTA-Mobile fact sheet. Medical Communications for Combat Casualty Care Web Site, https://www.mc4.army.mil/AHLTA-Mobile.asp
3. Arney, D., Jetley, R., Jones, P., Lee, I., Sokolsky, O.: Formal methods based development of a PCA infusion pump reference model: the generic infusion pump (GIP) project. In: Joint Workshop on High-Confidence Medical Devices, Software and Systems and Medical Device Plug-and-Play Interoperability, July 2007, pp. 23–33 (2007)
4. Brooks, P., Memon, A.M.: Automated GUI testing guided by usage profiles. In: Proceedings of the 22nd IEEE International Conference on Automated Software Engineering (ASE 2007) (November 2007)
214
V. Chinnapongse et al.
5. Cheng, K.T., Krishnakumar, A.S.: Automatic functional test generation using the extended finite state machine model. In: Proceedings of the 30th International Conference on Design Automation (DAC 1993), June 1993, pp. 86–91 (1993)
6. Microsoft Corporation: NModel website (2009), http://www.codeplex.com/NModel
7. Jacky, J., Veanes, M., Campbell, C., Schulte, W.: Model-based Software Testing and Analysis with C#. Cambridge University Press, Cambridge (2008)
8. Jard, C., Jéron, T.: TGV: theory, principles and algorithms. A tool for the automatic synthesis of conformance test cases for non-deterministic reactive systems. Software Tools for Technology Transfer (2004)
9. Kervinen, A., Maunumaa, M., Pääkkönen, T., Katara, M.: Model-based testing through a GUI. In: Grieskamp, W., Weise, C. (eds.) FATES 2005. LNCS, vol. 3997, pp. 16–31. Springer, Heidelberg (2006)
10. Legrenzi, P., Girotto, V.: Mental models in reasoning and decision making. In: Garnham, A., Oakhill, J. (eds.) Mental models in cognitive science, pp. 95–118 (1996)
11. Lewis, C.: A model of mental model construction. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 1986), pp. 306–313 (1986)
12. Paiva, A., Faria, J., Tillmann, N., Vidal, R.: A model-to-implementation mapping tool for automated model-based GUI testing. In: Lau, K.-K., Banach, R. (eds.) ICFEM 2005. LNCS, vol. 3785, pp. 450–464. Springer, Heidelberg (2005)
13. Reza, H., Endapally, S., Grant, E.S.: A model-based approach for testing GUI using hierarchical predicate transition nets. In: ITNG, pp. 366–370 (2007)
14. Silva, J.L., Campos, J.C., Paiva, A.C.R.: Model-based user interface testing with Spec Explorer and ConcurTaskTrees. Electronic Notes in Theoretical Computer Science 208, 77–93 (2008)
15. Tretmans, J.: Test generation with inputs, outputs and repetitive quiescence. Software - Concepts and Tools 17(3), 103–120 (1996)
16. U.S. Army Medical Research & Materiel Command, Mobile Computing Group, Telemedicine and Advanced Technology Research Center, Fort Detrick, Maryland: AHLTA-Mobile User Manual, v2.2.61
17. Wasserman, A.I.: Extending state transition diagrams for the specification of human-computer interaction. IEEE Transactions on Software Engineering 11(8), 699–713 (1985)
18. Xie, Q., Memon, A.: Model-based testing of community-driven open source GUI applications. In: 22nd International Conference on Software Maintenance (ICSM 2006), pp. 145–154 (2006)
Parallelizing Software-Implemented Error Detection
Ute Schiffel, André Schmitt, Martin Süßkraut, Stefan Weigert, and Christof Fetzer
Technische Universität Dresden, Department of Computer Science, Dresden, Germany
{ute,andre,suesskraut,stefan,christof}@se.inf.tu-dresden.de
http://wwwse.inf.tu-dresden.de
Abstract. Because of economic pressure, more commodity hardware with insufficient error detection is used in critical applications. Moreover, commodity hardware is expected to become less reliable because of the continuously decreasing feature size. Thus, we expect that software-implemented approaches to deal with unreliable hardware will be needed. Arithmetic codes are well suited for this purpose because they can provide very good error detection capabilities independent of the actual failure modes of the underlying hardware. But arithmetic codes generate high slowdowns. This paper first describes our encoding, which uses an expensive AN-code. Second, we show how we harness the power of modern multicore CPUs to parallelize this expensive but flexible and powerful software-implemented fault detection technique. Our measurements show that under continuous probabilistic error injection, AN-encoding reduces the number of runs with incorrect output from 15.9% for the unencoded execution to 0.5% in the encoded case. Our parallelization reduces the observed slowdowns by an order of magnitude.
1 Introduction
Historically, hardware reliability has been increasing with every new generation. In the future, however, it is expected that the decreasing feature size of hardware will lead to less reliable hardware [5]. Moreover, the error rate in logical circuits has overtaken error rates in memory [8]. Thus, the usage of memory protection alone is not sufficient anymore. Historically, critical and especially safety-critical systems have mostly been built using special-purpose hardware with better error detection and masking with the help of redundancy. Some hardware is even radiation hardened to prevent environment-induced execution errors. However, these solutions are expensive and usually an order of magnitude slower than commodity hardware. We expect that in the future there will be even higher economic pressure to use commodity hardware for dependable computing. Therefore, there is a need to cope with the restrictive failure detection capabilities of commodity hardware in
S. Lee and P. Narasimhan (Eds.): SEUS 2009, LNCS 5860, pp. 215–226, 2009. © IFIP International Federation for Information Processing 2009
216
U. Schiffel et al.
software. Commodity hardware will not exhibit pure fail-stop behavior but will instead exhibit value failures, which are much more difficult to detect and to mask. We aim at implementing a system which will turn these arbitrary value failures into easier-to-handle crash failures without the need for special hardware. Arithmetic codes (see Sec. 2) facilitate software-implemented hardware error detection. In this paper, we use an AN-code. The main advantage of arithmetic codes is that one can ensure error detection with a given probability, independent of the underlying hardware. Arithmetic codes introduce a very high overhead. Previous approaches [6,12] reduced the overheads of arithmetic codes by not completely encoding applications and, additionally, by using less powerful AN-codes. We, instead, protect every instruction of a program using the same powerful AN-code. Section 3 demonstrates how we reduce the slowdowns of the encoding by parallelizing the encoded execution using the power of modern multicore CPUs. The measurements presented in Sec. 4 show that under continuous probabilistic error injection, AN-encoding reduces the number of runs with incorrect output from 15.9% for the unencoded execution to 0.5% in the encoded case. Note that one can improve the strength of the encoding (by using a larger A, see Section 2) to reduce the percentage of incorrect outputs even further. Of course, a larger A also increases the overheads. Our parallelization reduces the observed slowdowns by an order of magnitude on a 16-core system. We discuss related work in Sec. 5.
2 AN-Encoding
Arithmetic codes are a technique to detect hardware errors during runtime. The encoding adds redundancy to all data words. This results in a larger domain for data words, and only a small subset of the domain contains the valid code words. Arithmetic codes are conserved by correctly executed arithmetic operations: a correctly executed operation, given valid code words as input, outputs a valid code word. A faulty arithmetic operation destroys the code with a very high probability, i.e., results in an invalid code word [2]. In the following, we will briefly summarize our previous work with AN-code, which is published in [14]. We want to give the reader a general idea about the concept and an understanding why the application of AN-code is as computationally expensive as it is. The AN-code is one of the most widely known arithmetic codes. The encoded version x_c of variable x is obtained by multiplying its original functional value x_f with a constant A. This encoding is only done for input values of a program. All computations take multiples of A as inputs and, if executed error-free, produce multiples of A as outputs. Code checking is done by computing the modulus with A, which is zero for a valid code word. Before a variable is externalized, i.e., used as a parameter of an external function or as a memory address in a load or a store operation, it is checked if it is a valid code word. If the check fails, the application is aborted.
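The encode/check cycle above can be sketched as follows. This is an illustrative sketch: A = 65521 is a prime chosen for the example, not the constant used by our encoding compiler, and Python's unbounded integers hide the word-size issues discussed later. The last lines also hint at why a power of two is a poor choice for A:

```python
A = 65521                          # illustrative prime, not the compiler's A

def encode(x):                     # x_c = A * x_f
    return x * A

def decode(xc):                    # check the code word, then recover x_f
    assert xc % A == 0, "invalid code word -- abort"
    return xc // A

# Correctly executed arithmetic preserves the code:
s = encode(20) + encode(22)        # == A * 42, still a valid code word

# A bit flip almost surely destroys the code (value changes by 2**17,
# which is not a multiple of the prime A):
corrupted = encode(42) ^ (1 << 17)

# With a power-of-two A, a flip in the high-order bits goes undetected:
A2 = 1 << 16
flipped = 42 * A2 ^ (1 << 20)      # still a multiple of A2, value now wrong
```

Here `corrupted % A` is nonzero, so the check catches the flip, while `flipped` remains a "valid" code word under A2 although its functional value changed.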
The advantage of an AN-code is that the probability of detecting an operation error does not depend on the used hardware but only on the choice of A. Assuming a failure model with equally distributed bit flips and a constant Hamming distance between all code words, the resulting probability of detecting one error is approximately 1 − 2^(−k), where k is the size of A in bits. Thus, A should be as large as possible. Furthermore, it should not be a power of two because a multiplication by A would only shift the bits to the left and no bit flips in the higher bits would be detected. A should also have as few factors as possible to increase the probability of detecting an error. Large prime numbers are therefore a good choice for A. For encoding a program with an AN-code, every variable has to be replaced with its larger AN-encoded version. Every instruction is substituted by its AN-preserving counterpart. We perform the instrumentation at compilation time:
– The scope of the protection includes compiler errors. For example, errors in lowering the source code to an executable binary will most likely result in invalid codes and hence become detectable.
– We do not introduce further slowdowns because of dynamic instrumentation.
See [21] for a detailed discussion of advantages and disadvantages of encoding at compile time vs. at runtime. We have implemented our encoding compiler using the LLVM compiler framework [10]. We encode LLVM's bitcode, which is a static single assignment assembler-like language. The advantage of LLVM's bitcode, in comparison to any native assembler, is its manageable amount of instructions for which we have to provide encoded versions and the LLVM framework for analyzing and modifying LLVM bitcode. For AN-encoding LLVM bitcode, we solved the following problems:
– We need encoded versions of all operations supported by LLVM.
Therefore, we provide a set of basic hand-encoded arithmetic operations and a set of encodable replacement operations which use only the basic arithmetic operations.
– We have to handle calls to un-encoded external libraries.
– AN-codes are only applicable to integers. Thus, we encode floating point operations by replacing them with encodable software implementations which make use only of integers.
– We have to provide encoded versions of all constants and initialization values. LLVM enables us to find and modify all those initializations. Thus, we replace them with appropriate multiples of A.
– For the encoding of memory content, a specific word size had to be chosen: we chose 32 bit. We require the compiler to align all memory accesses to that word size because only whole encoded words can be read.
Basic Hand-Encoded Arithmetic Operations. Executing arithmetic operations on AN-encoded data mostly requires some corrections to obtain a correctly
Fig. 1. Slowdowns of encoded arithmetic operations compared to their native versions
encoded result. For example, consider the multiplication of two encoded values a_c = A ∗ a_f and b_c = A ∗ b_f. When just multiplying a_c and b_c, the obtained result is A² ∗ a_f ∗ b_f, but the correctly encoded result should be A ∗ a_f ∗ b_f. Thus, an additional division by A is required. Our encoded arithmetic operations are hand-encoded. See [19] for the details of an AN-code with signatures, i.e., where the value of a variable is not only multiplied by a factor A but a unique constant is also added. Fig. 1 shows the slowdowns of our AN-encoded operations compared to their unencoded versions. While AN-encoded additions and subtractions take only two times as long as their native versions, an AN-encoded multiplication takes 126 times longer. Divisions and comparisons are between 3 and 10 times slower. The main reasons for those slowdowns are: (1) the implementation of the overflow behavior of integer operations as defined in the C standard, and (2) that encoded multiplications and divisions require expensive 128-bit integer operations.
Replacement Operations. Since encoding by hand is a tedious and error-prone task, we automated as much of the remaining encoding tasks as possible. Thus, we provide a library of so-called replacement operations. Those contain implementations of the following operations: shifts, casts, bitwise logical operations, and remainder operations. The replacement operations are written in such a way that they can be automatically encoded by our encoding compiler, i.e., they only use arithmetic operations for which hand-encoded versions exist. Before encoding a program, all not directly encodable operations, i.e., all operations which have no encoded variant in our basic set of hand-encoded arithmetic operations, are replaced with their appropriate encodable replacement operation. Fig. 2 depicts the slowdowns generated by the slowest replacement operations compared to their native versions: for the unencoded and for the encoded replacement operations.
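The multiplication correction described above can be sketched as follows. This is a simplified illustration: a real implementation must reproduce C overflow semantics and needs 128-bit intermediates, both of which Python's unbounded integers gloss over, and A = 65521 is again an illustrative prime:

```python
A = 65521                          # illustrative prime

def an_add(ac, bc):
    # A*a_f + A*b_f == A*(a_f + b_f): addition needs no correction
    return ac + bc

def an_mul(ac, bc):
    # (A*a_f) * (A*b_f) == A**2 * a_f * b_f: divide once by A to
    # obtain the correctly encoded product A*(a_f * b_f)
    return (ac * bc) // A
```

For example, `an_mul(6 * A, 7 * A)` yields `42 * A`, which is again a valid code word.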
Especially the bitwise logical operations (not_x, and_x, or_x, and xor_x) generate large slowdowns. Their implementation uses tabulated results for 16-bit (not_x) and 8-bit (the other operations) blocks and expensive shift operations to combine those results. Arithmetic right shifts (ashr_x) are expensive because they require two accesses to tabulated data and an encoded division. Finally, the encoded versions of signed and unsigned remainder operations (srem_x and urem_x) and upcast and downcast operations (sext-x-to-y and trunc-x-to-y) still generate slowdowns between 8 and 64. They also require expensive encoded divisions.
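The table-based idea behind these replacement operations can be sketched as follows: a 32-bit AND is computed from tabulated 8-bit results using only lookups and arithmetic (+, *, //, %), so the routine itself consists of operations the encoding compiler can encode. The flat table layout and block handling are illustrative simplifications, not our actual implementation:

```python
# Tabulated AND results for all pairs of 8-bit blocks (65536 entries,
# indexed as a*256 + b).
AND8 = [a & b for a in range(256) for b in range(256)]

def and32(x, y):
    """Bitwise AND of two 32-bit words using only arithmetic operations
    and table lookups (no shifts, no bitwise operators)."""
    result = 0
    for i in range(4):                     # four 8-bit blocks per 32-bit word
        xb = (x // 256**i) % 256           # extract block i via div/mod
        yb = (y // 256**i) % 256
        result += AND8[xb * 256 + yb] * 256**i   # recombine via mul/add
    return result
```

The many divisions and multiplications in this recombination are exactly what makes the encoded bitwise operations so much slower than their native one-instruction counterparts.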
Fig. 2. Slowdowns of unencoded and AN-encoded versions of replacement operations compared to their native versions
Calls to Unencoded External Code. In contrast to dynamic binary instrumentation, static instrumentation does not allow for protection of external libraries whose source code is not available at compilation time. For calls to those libraries, we currently provide hand-coded wrappers which decode parameters and, after executing the unencoded original, encode the obtained results. Note that AN-encoding leads to unexpected performance characteristics. Some operations whose unencoded versions are very fast (casts, shifts, bitwise logical operations, multiplications and divisions) suddenly induce very large overheads. Depending on the encoded application, AN-encoding results in slowdowns between 7.5 and 238 (see Sec. 4).
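The decode-call-encode pattern of such a wrapper might look as follows. This is a sketch under assumptions: the real wrappers are hand-coded per library call rather than generated generically, and the names below are illustrative:

```python
A = 65521                                  # illustrative prime

def check_decode(xc):
    """Decoding doubles as a code check before the value leaves
    the encoded domain."""
    assert xc % A == 0, "invalid code word -- abort"
    return xc // A

def wrap_external(func):
    """Wrap an unencoded external function: decode (and thereby check)
    all arguments, run the original, re-encode its result."""
    def wrapper(*encoded_args):
        plain_args = [check_decode(a) for a in encoded_args]
        return func(*plain_args) * A       # result re-enters encoded domain
    return wrapper
```

For example, `wrap_external(max)(3 * A, 9 * A)` returns `9 * A`, while passing a corrupted argument trips the code check before the external call.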
3 Parallelizing Encoded Execution
We mitigate the performance overhead of the encoded execution by parallelization. Fig. 3 shows the general approach, which is very similar to [11] with two important exceptions: (1) to reduce the overall overhead and to increase safety, we use static instrumentation instead of dynamic binary instrumentation, and (2) we introduce speculative variables to decouple the execution of the individual epochs. Speculative variables are discussed in more detail in Sec. 3.2.
Fig. 3. (a) Original application execution without encoding and parallelization. (b) Sequential execution with encoding. (c) Parallelized encoded execution. Unencoded predictor for fast state prediction. Parallelized executors execute slow encoded variant.
An encoded execution (see Fig. 3 (b)) introduces, in general, a substantial runtime overhead compared to the unencoded execution (Fig. 3 (a)). To parallelize the encoded execution (Fig. 3 (c)), we execute the original unencoded application within the predictor process. The predictor's execution is partitioned into epochs. An executor process reexecutes a given epoch using its encoded version. Executors run on additional CPU cores in parallel to each other and to the predictor. They synchronize their state using speculative variables to make the approach scalable. The predictor runs up to two orders of magnitude faster than the executors. Hence, it can provide the snapshot for starting the executor of epoch e_{i+1} even if the executor of the previous epoch e_i has not yet finished. Our parallelization approach is not completely transparent to the application developer. The application developer has to mark potential places in the code where a snapshot could be taken, i.e., a new epoch can be started. Ideally, these snapshot places are executed periodically with constant frequency at runtime.

3.1 Platform Support
At compile time, we first generate two code bases from the unencoded application: the predictor code base and the executor code base. This stage just duplicates the functions of the original code base and renames them. Second, we instrument both code bases to allow switching from the predictor code base to the executor code base at epoch boundaries. At runtime, the added code rewrites the stack when switching from the predictor's code base to the executor's code base (e.g., it rewrites the return addresses on the stack to point into the executor's code base). After these preparations, the encoding compiler encodes the executor code base. The instrumentation process is the same as for encoding without parallelization. At runtime, we provide a snapshot mechanism (similar to fork [11]) which starts a new executor for each new epoch started by the predictor. The executor replays the same computation for an epoch e as the predictor performed for e but with encoding. Therefore, any input received by the predictor is deterministically replayed in the executors. The input is encoded at runtime by the hand-coded wrappers described in Sec. 2. All externally visible side-effects (issued via system calls) of the predictor are held back until they are verified by the executors. After successfully executing an epoch, an executor explicitly approves it to make its side-effects externally visible. Because the executors are running in parallel, the verification order of system calls might be different from the order in which they were issued in the predictor. We allow out-of-order issuing of system calls by the executors but ensure their in-order retirement. The deterministic replay and speculative execution of external side-effects are transparent to the encoding compiler and its runtime support. We implemented these two features as a kernel module for Linux, similar to Speck [11].

3.2 Speculative Variables
To parallelize the encoded execution of epochs by executors, the executor of epoch e_i starts independently from the state of the executor of its predecessor epoch e_{i−1}. Initially, the executor of e_i contains only the unencoded state from the snapshot of the predictor. Whenever the current state is read in e_i, it is lazily encoded. This approach does not protect against errors in the predictor's execution or the snapshots. Hence, after executing e_i, its executor verifies the initial encoded state of e_i against the final encoded state of e_{i−1}. By comparing only encoded states, we achieve end-to-end safety. We implemented the described approach using speculative variables. A speculative variable holds a value and optionally an obligation. The value is written within the executed epoch, i.e., computed in that epoch. The obligation is the initial value which was read from the snapshot and needs to be verified at some later point in time. At runtime, the whole encoded state is stored in speculative variables as follows. Every word in memory is assigned to one speculative variable. An epoch starts with an empty set of speculative variables. Speculative variables are created lazily when their corresponding memory addresses are accessed for the first time. When a memory address i is read or written, the value of its speculative variable v_i is read or written, respectively. A speculative variable v_i is either created by an encoded read from address i or an encoded write to i. When created by a write, the value of v_i is set to the (already) encoded value given to the write. A speculative variable created by a write does not have an obligation. If v_i is created by a read, the unencoded value at address i is read from the predictor's snapshot at the start of the current epoch e_i. This unencoded value is then encoded and written to the value of v_i and to its obligation. Subsequent accesses to v_i do not touch its obligation. At the end of epoch e_i, all obligations created in the encoded executor of e_i are checked against the final state of the encoded executor of the preceding epoch e_{i−1}.
Therefore, the executor of e_{i−1} writes its final encoded state into a global view shared by all executors. At the start of the application, the global view contains the encoded initial state of the application. The executor E_i of e_i waits for the executor of e_{i−1} to terminate. Then E_i verifies all obligations of e_i against the global view. If this verification fails, the application is stopped. In the future, we want to retry the execution of the current epoch to tolerate transient faults. After the verification, E_i updates the global view with the current values of all speculative variables of e_i.
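The life cycle of speculative variables described above can be sketched as follows. The data layout and global-view handling are simplified assumptions; the real implementation works on memory words inside a Linux kernel module:

```python
A = 65521                                # illustrative prime

class Epoch:
    """One executor epoch: lazily created speculative variables,
    each holding [encoded value, optional obligation]."""
    def __init__(self, snapshot):        # unencoded predictor snapshot
        self.snapshot = snapshot
        self.vars = {}                   # addr -> [value, obligation]

    def read(self, addr):
        if addr not in self.vars:
            enc = self.snapshot[addr] * A      # lazy encoding on first read
            self.vars[addr] = [enc, enc]       # obligation = initial value
        return self.vars[addr][0]

    def write(self, addr, enc_value):
        if addr in self.vars:
            self.vars[addr][0] = enc_value     # obligation stays untouched
        else:
            self.vars[addr] = [enc_value, None]  # write creates no obligation

    def verify_and_retire(self, global_view):
        # Check every obligation against the predecessor's final encoded
        # state, then publish this epoch's final state.
        for addr, (_, obligation) in self.vars.items():
            if obligation is not None and global_view.get(addr) != obligation:
                raise RuntimeError("state mismatch -- stop application")
        global_view.update({a: v for a, (v, _) in self.vars.items()})
```

Comparing only encoded values in `verify_and_retire` is what gives the end-to-end safety argument: a wrong predictor snapshot shows up as an unfulfilled obligation.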
4 Evaluation
We evaluated our approach using four applications: (1) md5 calculates the md5 hash of a string, (2) primes implements the Sieve of Eratosthenes, (3) tcas is an open-source implementation of the traffic alert and collision avoidance system [1] which is mandatory for airplanes, and (4) pid is a Proportional-Integral-Derivative controller [23].
Performance. We measured the performance as the slowdown of the runtime of the AN-encoded application over the runtime of the unencoded application. Fig. 4 depicts the slowdown of the AN-encoded applications compared to their unencoded versions using different amounts of parallelization by restricting the
Fig. 4. Slowdowns of sequential and parallelized AN-encoded applications
maximum number of AN-encoded executors running in parallel (x-axis). All tests were executed on 64-bit Fedora 10 running on a 16-core machine. The sequential slowdowns (1 parallel executor in Fig. 4) range from 7.5 to 238. The more expensive encoded operations an application uses, the larger its slowdown becomes. Especially md5 uses many bitwise logical operations and thus experiences a larger slowdown. Using two parallel executors does not halve the slowdown of the sequential execution because parallelization itself generates overhead by forking new executors, encoded switching from the predictor code base to the executor code base, and checking the obligations. Starting with 2 parallel executors, the slowdown decreases linearly with the number of parallel executors. Between 8 and 32 parallel executors the decrease stagnates since we are then overloading our 16-core machine. The more overhead the AN-encoded version generates, the better its parallelization scales. With 16 cores, the average slowdown of the tested AN-encoded applications can be reduced from 110 in the sequential case to 16 in the parallelized case. Thus, parallelizing our AN-encoded applications using 16 cores makes them at best 11.5 times (tcas), at worst 2.3 times (primes), and on average 6.9 times faster.
Error Detection. Fig. 5 shows the results of our error injection experiments. We implemented the error injection tool using LLVM. At compilation time, we insert so-called trigger points wherever possible. At runtime, we decide at each trigger point if it is triggered, i.e., if an error is inserted. If it is not triggered, execution is carried on as in the error-free case. The caption of Fig. 5 describes the implemented error model. While unencoded applications show high rates of undetected errors (incorrect output), this is not the case for AN-encoded programs.
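The trigger-point mechanism just described can be sketched as follows (probabilities, bit width, and the single encoded addition are illustrative). Note that with a prime A, a single bit flip in an otherwise valid code word is always caught by the modulo check, since flipping bit b changes the value by 2^b, which is never a multiple of an odd prime; the residual escape probability of roughly 2^(−k) stems from multi-bit corruptions:

```python
import random

A = 65521                                # illustrative prime

def trigger(value, p, rng, bits=64):
    """Trigger point: with probability p, flip one bit of the value
    (the FO error type, a faulty operation result)."""
    if rng.random() < p:
        return value ^ (1 << rng.randrange(bits))
    return value

def encoded_run(a, b, p, rng):
    """One encoded addition with possible fault injection, classified
    into the outcome categories used in Fig. 5."""
    r = trigger(a * A + b * A, p, rng)   # encoded addition, maybe faulty
    if r % A != 0:
        return "failure detected"        # invalid code word: abort
    return "correct output" if r == (a + b) * A else "incorrect output"
```

Running this with injection forced on (p = 1.0) classifies every run as "failure detected", matching the single-bit-flip argument above.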
The highest rate of incorrect output for AN-encoded programs, 7%, occurs for md5 with faulty operations, while the highest rate for unencoded programs is 71% for md5 with exchanged operators. For the probabilistic error injection, it even goes down to 0.5% of undetected errors on average, compared to 15.9% for the unencoded execution. In the case of failure detected, we either detected an invalid code word and stopped the application, or modified data led to another inconsistency in the
(Fig. 5: for each benchmark (primes, tcas, md5, pid), native and AN-encoded, the bars show the normalized behavior in % per error type (EO1, EO2, FO, LS, MO, ALL, Prob), classified as no error, correct output, failure detected, performance failure, or incorrect output.)
Fig. 5. We inserted the following transient error types: exchange operands (EO1): A different but valid operand is used. exchange operators (EO2): A different operator is used, e.g., a plus instead of a minus. faulty operations (FO): The result of an operation is modified by bit-flips. lost stores (LS): A store operation is omitted. modify operands (MO): An operand is modified by bit-flips. For each example program 10,000 runs with the same input were executed for each error type. In each run exactly one error of that type was inserted. ALL represents the average over all those experiments. Prob, however, shows the results for probabilistically executed injections of all types. At each possible injection point, we decided if an error should be injected. The same error probability was used for the unencoded and the AN-encoded version. This results in more errors being injected into the AN-encoded version because of its larger code size.
program causing a crash. The AN-encoded versions always show a higher rate of failure detected than the unencoded runs. The probabilistic runs (Prob), where multiple errors are injected in nearly all AN-encoded runs, result in a higher percentage of failures detected because the more errors are injected, the higher the probability of detection [20]. A large part of the injections produced correct output which does not differ from the output of the error-free run. For three of the four AN-encoded versions, the share of such runs increases. Especially with the AN-encoded versions of the tcas benchmark, we see performance failures, i.e., the application runs much longer than the error-free run. tcas contains several loops whose conditions contain comparisons with 8-bit integers. Encoding increases the size of those values to 64 bit. Injected errors probably increase the contained functional value.
This results in longer-running loops since AN-encoded comparisons currently do not check the code of their operands. Code checking is expensive, and especially for control systems such as tcas or pid it might be cheaper to use a watchdog to check for liveness. No error marks those runs of the probabilistic injections into which no error was injected. This is the case for 35% to 52% of the native runs but for none of the AN-encoded runs. Obviously, the increased code path length makes it more probable that a run is hit by an error. On the other hand, AN-encoding makes it possible to prevent erroneous output to a large extent.
5 Related Work
Error Detection. For an overview of hardware approaches for hardware error detection, see our previous work [14]. They all have in common that custom hardware is typically very expensive and not often adapted to new, faster technologies. Our intention is to provide a software-implemented error detection mechanism which provides detection guarantees independent of the used hardware. This allows the use of up-to-date hardware in safety-critical systems which require certification. Control-flow checking approaches such as [4,16], in contrast to AN-encoding, can detect invalid control flow for the executed program, that is, execution of sequences of instructions which are not permitted for the executed binary. Modified data values, which an AN-code detects, remain undetected by these approaches. In [15,22], invariants contained in the executed program are used to check the validity of the generated results. Thereby good error detection can be provided, but for most programs it is difficult—if not impossible—to design invariants with good failure detection capabilities and to assess the quality of these invariants. Replicated execution and comparison of the obtained results, as for example used by [18,3], provide no guarantees with respect to permanent hardware errors or soft errors which disturb the voting mechanism. ED4I [12] also uses an AN-code, but the authors choose a factor A which is a power of two whenever a program contains logical operations. Thereby, logical operations become easily encodable, but the detection capabilities are reduced immensely because the resulting code cannot detect bit flips in the higher-order bits of data values. But those contain the original functional value. [6] applies the AN-code only to registers and not to memory, and only to operations which can easily handle encoded values, such as additions and subtractions. In the end, that presumably leaves only small parts of applications AN-encoded.
As should be expected, their fault injection experiments show a non-negligible amount of undetected failures for most of the tested applications. Neither [12] nor [6] discusses overflow problems with AN-codes, which we pointed out and solved in [19].
Parallelized Checks. Recent work [11,13,17] exploits modern multi-core systems for reducing the overhead of expensive runtime checks. As in our parallelization framework, the execution is split into epochs. The application runs as speculator on a single core, while the other cores replay the execution of the speculator using multiple parallel checkers. But there are also major differences. Speck [11] and SuperPin [17] rely on dynamic binary instrumentation. Instrumenting AN-encoding at runtime is less safe than static instrumentation [21]. While Speck changes the whole OS kernel, we implemented a kernel module which is much easier to deploy and maintain. SuperPin does not have speculation support at the syscall level. Thus, erroneous output cannot always be blocked before it is verified. Different AN-encoded checker epochs have to share state because predecessors are required to verify the error-freeness of the start state of their successors. Our understanding is that SuperPin does not support sharing state between parallel running checker epochs. Speck and parallel DIFT [13] merge the checking data gathered by the checkers in a separate thread into sequential order. This thread can become a bottleneck when the number of checkers increases. Parallel DIFT uses a hardware extension [7] to stream data between cores. We do not rely on any specialized hardware but implemented state sharing using speculative variables.
6 Conclusion
We demonstrated that AN-codes can be used to reduce the rate of undetected incorrect output for frequently occurring error events from 15.9% to 0.5% and for rare error events with only one injected error per run from 30.8% down to 1.7%. The remaining undetected errors can be tackled by using the more powerful but also more expensive AN-code with signatures as introduced by Forin in [9].
References
1. The Paparazzi Project, http://paparazzi.enac.fr/wiki/Main_Page
2. Avizienis, A.: Arithmetic error codes: Cost and effectiveness studies for application in digital system design. Transactions on Computers (1971)
3. Bolchini, C., Miele, A., Rebaudengo, M., Salice, F., Sciuto, D., Sterpone, L., Violante, M.: Software and hardware techniques for SEU detection in IP processors. J. Electron. Test. 24(1-3), 35–44 (2008)
4. Borin, E., Wang, C., Wu, Y., Araujo, G.: Software-based transparent and comprehensive control-flow error detection. In: Proceedings of the International Symposium on Code Generation and Optimization (CGO), Washington, DC, USA, pp. 333–345. IEEE Computer Society, Los Alamitos (2006)
5. Borkar, S.: Designing reliable systems from unreliable components: The challenges of transistor variability and degradation. IEEE Micro (2005)
6. Chang, J., Reis, G.A., August, D.I.: Automatic instruction-level software-only recovery. In: Proceedings of the International Conference on Dependable Systems and Networks (DSN). IEEE Computer Society, Los Alamitos (2006)
7. Chen, S., Kozuch, M., Strigkos, T., Falsafi, B., Gibbons, P.B., Mowry, T.C., Ramachandran, V., Ruwase, O., Ryan, M., Vlachos, E.: Flexible hardware acceleration for instruction-grain program monitoring. In: ISCA 2008: Proceedings of the 35th International Symposium on Computer Architecture. IEEE Computer Society, Los Alamitos (2008)
8. Dixit, A., Heald, R., Wood, A.: Trends from ten years of soft error experimentation. In: System Effects of Logic Soft Errors (SELSE) (2009)
9. Forin, P.: Vital coded microprocessor principles and application for various transit systems. In: IFA-GCCT, September 1989, pp. 79–84 (1989)
10. Lattner, C., Adve, V.: LLVM: A compilation framework for lifelong program analysis & transformation. In: Proceedings of the International Symposium on Code Generation and Optimization (CGO). IEEE Computer Society, Los Alamitos (2004)
11. Nightingale, E.B., Peek, D., Chen, P.M., Flinn, J.: Parallelizing security checks on commodity hardware. SIGARCH Comput. Archit. News (2008)
12. Oh, N., Mitra, S., McCluskey, E.J.: ED4I: Error detection by diverse data and duplicated instructions. IEEE Trans. Comput. 51 (2002)
13. Ruwase, O., Gibbons, P.B., Mowry, T.C., Ramachandran, V., Chen, S., Kozuch, M., Ryan, M.: Parallelizing dynamic information flow tracking. In: SPAA 2008: Proceedings of the Twentieth Annual Symposium on Parallelism in Algorithms and Architectures. ACM, New York (2008)
14. Schiffel, U., Süßkraut, M., Fetzer, C.: AN-encoding compiler: Building safety-critical systems with commodity hardware. In: The 28th International Conference on Computer Safety, Reliability and Security (SafeComp 2009) (2009)
15. Stefanidis, V.K., Margaritis, K.G.: Algorithm based fault tolerance: Review and experimental study. In: International Conference of Numerical Analysis and Applied Mathematics (2004)
16. Vemu, R., Abraham, J.A.: CEDA: Control-flow error detection through assertions. In: IOLTS 2006: Proceedings of the 12th IEEE International Symposium on On-Line Testing. IEEE Computer Society, Los Alamitos (2006)
17. Wallace, S., Hazelwood, K.: SuperPin: Parallelizing dynamic instrumentation for real-time performance. In: 5th Annual International Symposium on Code Generation and Optimization, San Jose, CA, March 2007, pp. 209–217 (2007)
18. Wang, C., Kim, H.-s., Wu, Y., Ying, V.: Compiler-managed software-based redundant multi-threading for transient fault detection. In: International Symposium on Code Generation and Optimization (CGO) (2007)
19. Wappler, U., Fetzer, C.: Hardware failure virtualization via software encoded processing. In: 5th IEEE International Conference on Industrial Informatics (INDIN 2007) (2007)
20. Wappler, U., Fetzer, C.: Software encoded processing: Building dependable systems with commodity hardware. In: Saglietti, F., Oster, N. (eds.) SAFECOMP 2007. LNCS, vol. 4680, pp. 356–369. Springer, Heidelberg (2007)
21. Wappler, U., Müller, M.: Software protection mechanisms for dependable systems. In: Design, Automation and Test in Europe (DATE 2008) (2008)
22. Wasserman, H., Blum, M.: Software reliability via run-time result-checking. J. ACM (1997)
23. Wescott, T.: PID without a PhD. Embedded Systems Programming 13(11) (2000)
Model-Based Analysis of Contract-Based Real-Time Scheduling
Georgiana Macariu and Vladimir Crețu
Computer Science and Engineering Department, Politehnica University of Timisoara, Timisoara, Romania
Abstract. We apply automata theory to analyze the schedulability of real-time component-based applications running on uniform multi-processor platforms. The resource requirements of each application or application component are specified in a service contract, resulting in a hierarchy of contracts. As we are interested in determining the schedulability of such applications, this hierarchy of contracts is mapped to a hierarchical scheduling strategy. We use model checking and transform the schedulability analysis problem into a reachability check on a timed automata model of the service contracts.
1 Introduction
In recent years, real-time embedded software development has focused more and more on building flexible and extensible applications. Component-based software systems achieve these objectives by gluing together individually designed, developed and tested software components, each with its own timing requirements. Therefore, when building such a component-based system, one must ensure that components can coexist without jeopardizing each other's execution. One solution for the temporal isolation of applications running on uni-processor systems is hierarchical scheduling based on execution time servers [1]. In hierarchical scheduling, each application has its own scheduler and can use the scheduling policy that best suits its needs. Based on such a hierarchical scheduling scheme, Harbour introduced the concept of service contracts [2]. In Harbour's model, every application or application component may have a set of service contracts describing its minimum resource requirements. These contracts are used in online or offline negotiations to determine whether the resource requirements can be guaranteed. Recently, hierarchical scheduling has been used in a multi-processor scheduling framework for integrating applications with hard, soft and non-real-time requirements [3]. Research is also under way to extend the service contract model to component-based multi-processor real-time systems. Chang et al. [4] have proposed a two-level resource contract model. First, each application has a contract specifying the resources to be reserved for its execution; this is called an external contract. Next, every component of the application has its own contract, called an internal contract, describing the portion of the resources specified in
S. Lee and P. Narasimhan (Eds.): SEUS 2009, LNCS 5860, pp. 227–239, 2009. © IFIP International Federation for Information Processing 2009
the external contract that must be distributed to the component. Each component consists of one or more tasks which may require parallel execution. Internal contracts are mapped to abstract servers, which are further divided into execution time sub-servers (called simply servers in what follows) in order to support parallel execution of the components. External contracts, on the other hand, are mapped to multi-processor time partitions [5]. As each application is mapped to a separate time partition, a specific scheduling policy may be associated with it. Starting from the hierarchical scheduling solution proposed in [4], we apply timed automata theory [6] to the specification of service contracts for components and applications. A component contract describes the tasks of the component and their arrival patterns, modeled by a timed automaton. An application contract refers to the servers of all components in the application and to the arrival patterns of these servers, also modeled as a timed automaton. We allow a different scheduling policy for each component and application, with no restriction on task preemption. Furthermore, we present a compositional approach based on timed automata for the schedulability analysis of component-based real-time applications running on uniform multi-processor platforms. For each application, schedulability (i.e., checking that all application components can be executed such that all their tasks meet their deadlines) can be analyzed separately. In the timed automata formalism, schedulability analysis is reduced to reachability and can be performed using a tool like UPPAAL [7]. Schedulability analysis of real-time systems using timed automata has been proved decidable and applied successfully for non-preemptive task scheduling policies.
However, in timed automata models as defined in [6], time elapses at the same rate for all components; they therefore cannot be used for preemptive scheduling policies, where the execution of tasks can be suspended and resumed later. Stopwatch automata [8], a subclass of linear hybrid automata, have been proposed as a solution for modeling preemptible tasks. However, since reachability for linear hybrid automata has been proved undecidable [9], this result extends to stopwatch automata. An over-approximation method based on Difference Bound Matrices has been applied in [8] for a coarse reachability analysis of stopwatch automata. Even so, the schedulability checking problem has been shown to be decidable for non-uniformly recurring tasks triggered by events. [10] introduces timed automata extended with tasks, a class of timed automata with subtraction where clocks may be updated by subtraction in a bounded zone, and proves that schedulability checking relative to a preemptive scheduling policy is decidable for this class of automata. This result has been extended to multi-processor real-time systems in [11], where it is shown that the schedulability problem is decidable for preemptive scheduling policies with fixed execution time tasks. However, they do not allow task migration, meaning that a task instance is bound to one processor until it finishes. In our framework, a task instance may execute on any processor depending on the availability of the execution time servers and the configuration of the multi-processor time partition. Global multi-processor schedulability analysis using model checking has been investigated for tasks with static priorities in [12]. The models in [12] allow
restricted and full migration of task instances. Every task is modeled separately, and the schedulability of the tasks is checked in decreasing order of their priority, which limits the applicability of the analysis to static scheduling policies. It also implies that for a task set with N tasks, model checking has to be performed N times in order to determine the schedulability of the entire set, and at most N + 1 clocks are necessary for a task set of size N. Unlike this model checking solution, our proposal addresses both static and dynamic scheduling policies. Moreover, it requires just a single run of the model checker for the entire task set, using a single clock, in a setting with resources that are not continuously available and multiple levels of scheduling. This paper is organized as follows. Section 2 introduces our formal model for contract-based scheduling and Section 3 gives details on the timed automata used in the system model. We present performance evaluation results in Section 4. Section 5 concludes this paper.
2 The Contract-Based Scheduling Model
This section presents the formal model of the service contracts. As explained in Section 1, there are two levels of such contracts. The first level specifies the resource requirements of a single application, while the second level describes the requirements of each individual component of the application. Corresponding to the two levels of contracts there are two scheduling levels. At the upper level, each component of an application has a scheduler for scheduling its tasks, while at the lower level an application scheduler manages the servers associated with each component of the application.
2.1 Component Contracts
A component C consists of a finite set of n tasks T and a timed automaton A_C where:
- a component task τ_i ∈ T is a tuple τ_i = (w_i, p_i, o_i, d_i), with w_i the worst case execution time of the task, p_i the inter-arrival time between instances of the same task, o_i the first release of the task, and d_i the deadline of the task, where w_i ≤ d_i ≤ p_i,
- tasks may execute in parallel and are independent of each other,
- A_C models the execution of the tasks in T by taking transitions labeled with the actions tReady_i, tFinish_i and tOverrun_i, ∀ 1 ≤ i ≤ n, representing the release and ending of task τ_i, and the actions tGo_i and tPreempt_i through which the component scheduler notifies task execution start/restart and suspension.
The tasks of the component are executed according to a component-specific scheduling policy implemented by a scheduler associated with the component. The parameters of the tasks along with the task arrival pattern determine the resource requirements of the component. These resource requirements can be supported using one or more execution time servers, depending on whether the tasks
must execute in parallel or not. The period, deadline and budget of the servers associated with a component are specified in the component contract. A server is defined by a tuple (q, p, o), where q is the capacity of the server, p is its replenishment period (i.e., the server becomes active every p time units) and o is the time of its first release. Each server may also have a deadline equal to its period. It is assumed that there is a finite set of servers S containing the servers of all the components of an application.
Definition 1 (Component contract). A component contract C_C providing a set of n_s execution servers S_C ⊆ S is a timed automaton A_Cc over the set of actions Σ_C such that:
- A_Cc specifies the activation pattern of the servers σ_i ∈ S_C, 1 ≤ i ≤ n_s,
- Σ_C is split into two sets:
  - output actions: Σ_CO = {sReady_i, sFinish_i, sOverrun_i, sActive_i, sInactive_i | 1 ≤ i ≤ n_s}
  - input actions: Σ_CI = {sGo_i, sPreempt_i | 1 ≤ i ≤ n_s}.
The A_Cc automaton sends the output action sReady_i to the scheduler associated with the application as soon as server σ_i is ready for execution, and sends sFinish_i or sOverrun_i to the same scheduler to notify it that the server has finished its execution or missed its deadline, respectively. In response to its actions, A_Cc can receive from the application scheduler sGo_i, telling it that server σ_i can start its execution, or sPreempt_i, which results in server σ_i being suspended from execution until the next sGo_i action. The actions sActive_i and sInactive_i announce to the component scheduler that server σ_i has consumed all its budget or, respectively, that it has replenished its budget and can be used again to execute tasks.
2.2 Application Contracts
As proposed in [4], the application contracts are supported by a multi-processor time partition model. Each application is associated with a time partition which has a local scheduler that executes the execution time servers assigned to the components of the application. In a uni-processor system, a time partition is implemented as a fixed-length major time frame composed of several scheduling windows. A scheduling window is defined by its offset from the beginning of the partition's major time frame and by its length. The scheduling scheme of the major time frame repeats during the execution of the system, such that all scheduling windows are essentially periodic. In a multi-processor system, we assume there is a major time frame for each processor, but the frames on all processors have equal length and are synchronized. The scheduling windows of the frames on different processors can differ. From the above specification we next derive a formal definition of the multi-processor time partition.
Definition 2 (Time partition). A time partition TP in a multi-processor system is described by a set of major time frames {F_i | 1 ≤ i ≤ m, length(F_i) = L}, one for each of the m processors in the system, where F_i is a set of scheduling windows with periods that are exact divisors of L.
In our setting the time partition is used to support application contracts. In a simple scenario, the application contract could specify a few pairs of period and length values which, upon successful negotiation of the contract, would be mapped to a set of scheduling windows.
Definition 3 (Application contract). An application contract C_A is a pair (TP, A_Ca) where:
- TP is the multi-processor time partition provided by the contract, and
- A_Ca is a timed automaton over the action set Σ_SW modeling the scheduling scheme of the major time frame:
  - Σ_SW = {swActive_k, swInactive_k}, where k is a scheduling window in TP,
  - the action swActive_k signals to the application scheduler that scheduling window k is now active, while swInactive_k signals its deactivation.
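Definition 2 can be illustrated with a small sketch. The Window type and the constraint that a window fits inside its own period are our illustrative assumptions; only the condition that window periods exactly divide L comes from the definition.

```python
from dataclasses import dataclass

@dataclass
class Window:
    offset: int   # start relative to the major time frame (assumption)
    length: int   # duration of the window
    period: int   # must be an exact divisor of L

def valid_partition(frames, L):
    """Check Definition 2: one major time frame per processor, all of
    length L, and every scheduling window period exactly divides L."""
    return all(L % w.period == 0 and w.offset + w.length <= w.period
               for frame in frames for w in frame)

# Two processors sharing a synchronized major time frame of length L = 12.
tp = [[Window(0, 3, 6), Window(1, 2, 4)],   # processor 1
      [Window(2, 2, 12)]]                   # processor 2
assert valid_partition(tp, 12)
assert not valid_partition([[Window(0, 2, 5)]], 12)  # 5 does not divide 12
```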
3 The Timed Automata Models
As shown in the previous section, both the component and the application level include three automata: one for generating tasks or servers according to a given release pattern, one for generating the resources (servers or scheduling windows, respectively) on which the tasks and servers shall be executing, and one for scheduling. Notice that servers can be both schedulable entities (i.e., when referring to the application scheduler) and resources (i.e., for the component scheduler). For this reason, in the rest of this section they are referred to simply as tasks and resources, respectively. This plurality of roles also implies that the timed automaton generating tasks for the application level is the same as the one generating resources for the component level; this automaton can therefore be deduced immediately from the task generator and resource generator automaton types. The rest of the section gives detailed descriptions of each of the three types of automata. In addition, the model also includes a Timer automaton which uses a single continuous clock t; each time this clock ticks, it sends a tick signal to the task generator and resource generator automata. We first introduce some notation. Let W(i), P(i), D(i), R(i) and E(i) denote the worst case execution time, the period, the deadline, the next release time and the current execution time, respectively, of each task τ_i. For each task τ_i a status variable status(i) is defined, initialized to idle, meaning that a task instance has not been released yet. The value status(i) = ready denotes that an instance of τ_i is ready for execution (i.e., it has just been released or was preempted). Let status(i) = running stand for the fact that
an instance of τ_i is currently running on one of the active resources. To denote that an instance of task τ_i has finished or has missed its deadline we use status(i) = finished and status(i) = overrun, respectively.
3.1 Task Generator Automaton
Model checking of preemptive scheduling algorithms could be done using a stopwatch model, but schedulability of such models has been proved undecidable. Therefore, in order to support task preemption, a discrete time formalism is adopted for the model proposed in this paper. This imposes a limitation: all task parameters (i.e., worst case execution time, period, deadline, release time) must have integer values. In order to determine the actual execution time of a task, a variable E(i) keeps track of the time task τ_i has executed since its last release. Each time the task is released, E(i) is set to 0 while R(i) is set to the time of its next release. When the task generator automaton receives a tick signal from the Timer automaton, it increases E(i) for all tasks with status(i) = running and decreases R(i) for all tasks by a value MIN representing the minimum between the time to the next release of a task or resource and the time to the next termination of a task or deactivation of a resource. In other words, E(i) acts like a discrete clock which can be suspended and resumed. Instead of using a single task generator to release all n tasks of a component according to some pattern, it would have been possible to define a timed automaton for each of the n tasks, each with its own clock, leading to a total of n clocks. Since the state space of timed automata grows exponentially with the number of clocks in the model, the approach taken in this paper is preferable. Figure 1(a) shows the main locations and transitions of the task generator automaton, leaving out some self-loop transitions. The white locations in the figure have the semantics that the system cannot delay in them and the next transition must involve an outgoing edge from one of them.
Fig. 1. Task and resource generators: (a) the task generator automaton; (b) the non-preemptive resource generator automaton
The task generator automaton uses a variable next_release to remember the time until the next task release. At start-up this variable is initialized with the smallest R(i), and if next_release = 0 the automaton goes to the Ready location, selects a task τ_i with R(i) = 0, updates next_release, sets the shared variable ready_task = i and sends the ready signal to the scheduler automaton. Once next_release becomes greater than 0, the generator moves to the Idle location, where it waits for the next tick of the Timer. When the tick signal arrives, the transition to the Increment location is taken and inc_time() updates status(i), E(i) and R(i) as follows:
- for all tasks τ_i with status(i) = running, E(i) = E(i) + MIN, and if E(i) = W(i) then status(i) = finished,
- for all tasks τ_i, R(i) = R(i) − MIN and next_release = min(R(i)),
- for all tasks τ_i running or ready for execution with E(i) < W(i) and P(i) − D(i) = R(i), status(i) is set to overrun.
Next, for all tasks τ_j that have finished, the variable finished_task is set to j and the finished signal is sent to the scheduler, which frees the resources used by these tasks. If any task τ_j has missed its deadline, an overrun signal notifies the scheduler, which as a result goes to an Error location. After signaling all task finish events, the generator checks whether any task is ready for execution and goes back to the Ready location.
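The update rules of inc_time() can be sketched as follows. This is an illustrative Python rendering of the rules above, not the UPPAAL model itself; the dictionary representation of tasks is our assumption.

```python
def inc_time(tasks, MIN):
    """One 'Increment' step of the task generator: advance all discrete
    clocks by MIN and update the task statuses."""
    for t in tasks:
        # rule 1: advance execution clocks of running tasks
        if t['status'] == 'running':
            t['E'] += MIN
            if t['E'] == t['W']:
                t['status'] = 'finished'
        # rule 2: count down to the next release
        t['R'] -= MIN
        # rule 3: a deadline D after release is P - D before next release
        if (t['status'] in ('running', 'ready')
                and t['E'] < t['W'] and t['P'] - t['D'] == t['R']):
            t['status'] = 'overrun'
    return min(t['R'] for t in tasks)   # new value of next_release

tasks = [
    {'status': 'running', 'E': 1, 'W': 2, 'R': 7, 'P': 10, 'D': 6},
    {'status': 'ready',   'E': 0, 'W': 3, 'R': 5, 'P': 10, 'D': 6},
]
next_release = inc_time(tasks, 1)
assert tasks[0]['status'] == 'finished'   # E(i) reached W(i)
assert tasks[1]['status'] == 'overrun'    # R(i) hit P(i) - D(i) with E < W
assert next_release == 4
```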
3.2 Resource Generator Automaton
The task generator automaton presented above can be used to generate the servers which act as resources for the component level. By adding just two signals, active and inactive, which notify the scheduler about the availability of the resources, the task generator automaton becomes a resource generator automaton for preemptible resources. If resources are not preemptible (i.e., the scheduling windows of a time partition), the resource generator automaton is a simplified version of the task generator. Figure 1(b) presents the non-preemptive version of the resource generator automaton. The automaton keeps a discrete clock RE(k) for each resource r_k. RR(k) remembers the time until the next activation of resource r_k, and two variables named next_release and next_finish hold the time until the next resource activation and deactivation, respectively. When resource r_k is activated, RE(k) = L(k), where L(k) denotes the length of the resource's activation period. At every tick signal received from the Timer, RE(k) is decreased by MIN for all active resources r_k, and the variables next_release and next_finish are also decreased by the same value. When next_release reaches 0, all resources r_k with RR(k) = 0 are activated. If next_finish becomes 0, all resources r_k with RE(k) = 0 are deactivated.
3.3 Scheduler Automaton
As can be seen from the definitions in the previous sections, the component scheduler and the application scheduler have rather similar behavior. Both of
them must schedule a set of periodic tasks/servers with deadlines less than or equal to their period. The component tasks are scheduled on execution time servers, which may be active or inactive. Two or more servers may be active simultaneously, which implies that two or more tasks may run in parallel. For the application scheduler, the tasks to be scheduled are actually the servers used by the component scheduler as resources. The servers are scheduled for execution on the scheduling windows of a time partition; the scheduling windows represent the resources allocated to the application by the system. As more scheduling windows can be active simultaneously, parallel execution of the servers is also possible. A scheduler automaton for a service (i.e., application or component) contract has the following characteristics:
- it has a queue holding the tasks ready for execution,
- it implements a preemptive scheduling policy Sch, represented by a sorting function for the task queue,
- it maintains a map between active resources (servers or scheduling windows) and the tasks using those resources, and
- it has an Error location which is reached when a task misses its deadline.
To record the status of a resource, let rt_map(j) be a map where rt_map(j) = inactive denotes that resource j is inactive, rt_map(j) = active means that resource j is active but no task is executing on it, and rt_map(j) = i denotes that resource j is active and currently used by task τ_i. Figure 2 shows the scheduler automaton. The locations of the automaton have the following interpretations:
1. Idle - no task is ready for execution or no resources are active,
2. Prepare - a task has been released and a resource is active after a period during which either there were no tasks to schedule or no active resources,
3. Running - at least one task is currently executing,
4. AssignTask - a task has just finished and as a result an active resource can be used to schedule another ready task,
5. AssignResource - a task has just been released, or a resource has just become inactive, leaving its assigned task with no resource on which to execute; consequently the task has to be enqueued, and if it has the highest priority in the queue according to Sch then an active resource is assigned to it,
6. Check - a resource has become inactive,
7. Error - the task set is not schedulable with Sch.
The scheduler enters the Idle location when there are no ready tasks, no active resources, or both. As long as new tasks are released for execution but there are no active resources on which to execute them (i.e., task_no > 0 and res_no == 0), or as long as there are available resources but no ready tasks (i.e., task_no == 0 and res_no > 0), the scheduler stays in the Idle location. If the scheduler receives a ready signal, meaning that task τ_{ready_task} has been released, and res_no > 0, the scheduler goes to the Prepare location. Leaving the Prepare location for the Running location, it assigns the task to one of the active resources by setting rt_map(j) = i, sets the variable activated_task = i and sends a go signal to announce to the task generator automaton that task τ_i is running. After the scheduler has reached the Running location, it will leave it when one of the following situations occurs:
- resource r_k becomes active (signaled by the active signal and activated_resource = k): this is marked by updating rt_map[k] = ACTIVE on the transition to the AssignTask location. If tasks are ready for execution, the scheduler will assign the highest priority task τ_j to resource r_k by setting rt_map[k] = j and will notify the task generator with the go signal on a transition back to the Running location.
- a new task τ_i has been released (signaled by the ready signal and ready_task = i): the task is enqueued by setting status(i) to ready on the transition to the AssignResource location. If task τ_i is the highest priority released task and there are active resources, then τ_i must start executing. If there is a free active resource, task τ_i is assigned to it; otherwise the lowest priority task is chosen from the running tasks and preempted, and the automaton goes to the AssignTask location. On the transition from AssignTask to Running, the resource is assigned to τ_i and a go signal is sent to the task generator to notify it that task τ_i has started running.
- resource r_k becomes inactive (signaled by the inactive signal and deactivated_resource = k): this is marked by updating rt_map[k] = INACTIVE on the transition to the Check location. If the deactivated resource was free and there are still running tasks but no tasks in the queue, the transition back to the Running location is taken. If a task τ_i was using resource r_k, the scheduler must set status(i) = ready and go to the AssignResource location. Should resource r_k be the last active resource, the scheduler simply preempts task τ_i and goes back to the Idle location; otherwise an active resource is searched for, analogously to the situation when a new task is released.
- task τ_j finishes (signaled by the finish signal and finished_task = j): the resource used until now by τ_j can be assigned to the highest priority task waiting in the queue, if there is such a task.
- task τ_i misses its deadline (signaled by overrun): the scheduler automaton goes into the Error location.
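The event handling above can be sketched in Python. This is an illustrative rendering of the scheduler's bookkeeping, not the UPPAAL automaton: names such as on_ready and _dispatch are ours, and the explicit priority comparison before preemption is our reading of the AssignResource rule.

```python
ACTIVE, INACTIVE = 'active', 'inactive'

class Scheduler:
    """Sketch of the scheduler's queue and rt_map bookkeeping.
    Sch is the policy: a key function sorting ready tasks by
    decreasing priority (smaller key = higher priority)."""
    def __init__(self, resources, Sch):
        self.rt_map = {r: INACTIVE for r in resources}
        self.queue = []          # tasks ready for execution
        self.Sch = Sch

    def on_ready(self, task):
        """AssignResource: enqueue; run the task if it is the head."""
        self.queue.append(task)
        self.queue.sort(key=self.Sch)
        free = [r for r, t in self.rt_map.items() if t == ACTIVE]
        if self.queue[0] is task:
            if free:
                self._dispatch(free[0])
            else:
                running = [(r, t) for r, t in self.rt_map.items()
                           if t not in (ACTIVE, INACTIVE)]
                if running:
                    # lowest priority running task is the candidate victim
                    r, victim = max(running, key=lambda rt: self.Sch(rt[1]))
                    if self.Sch(task) < self.Sch(victim):
                        self.queue.append(victim)   # preempted -> ready
                        self.queue.sort(key=self.Sch)
                        self._dispatch(r)

    def on_resource_active(self, r):
        """AssignTask: a fresh resource takes the head of the queue."""
        self.rt_map[r] = ACTIVE
        if self.queue:
            self._dispatch(r)

    def _dispatch(self, r):
        # corresponds to sending the 'go' signal to the task generator
        self.rt_map[r] = self.queue.pop(0)

# Rate Monotonic as the policy: smaller period = higher priority.
sched = Scheduler(['s1', 's2'], Sch=lambda t: t['P'])
sched.on_resource_active('s1')
sched.on_ready({'name': 'a', 'P': 10})
sched.on_ready({'name': 'b', 'P': 5})     # outranks and preempts 'a'
assert sched.rt_map['s1']['name'] == 'b'
sched.on_resource_active('s2')            # preempted 'a' resumes here
assert sched.rt_map['s2']['name'] == 'a'
```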
4 Performance Analysis
This section presents an evaluation of the performance and scalability of model checking the contract-based scheduling model. The experiments were run on a machine with an Intel Core 2 Quad 2.40 GHz processor and 4 GB RAM running Ubuntu. The analysis of the model was automated using UPPAAL, and the utility program memtime [13] was used to measure model checking time and memory usage. Although the proposed model addresses scheduling at two levels, namely the task level and the server level, experiments were conducted only for the server level, as we consider the analysis of the task level to be a replica of the server level due to the similarities between the two levels. In all experiments, to verify schedulability we checked whether the property A[] not Error holds. In order to observe the behavior of the model for different numbers of application servers, we used randomly generated sets of servers with periods in the range [10, 100] and utilizations (i.e., budget/period) generated with a uniform distribution in the range [0.05, 1]. The offset of each server was set to its period multiplied by a randomly generated number in the interval [0, 0.3]. The server sets were accommodated by a time partition with 9 scheduling windows and a total utilization of 4.5. Figure 3 shows how the model checking time and memory usage increase with the number of servers in the set. It can also be noticed that for the same server set size the performance of the model checking can vary between rather large limits (e.g., for sets of 30 servers the model checking time ranges from 7 seconds to approximately 25 seconds). This is due to the size of the hyper-period of the server sets: the larger the hyper-period, the larger the model checking time and memory consumption. Next, we analyzed the scalability and performance of model checking when the number of scheduling windows in the time partition accommodating the servers varies.
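The random generation of server sets described above can be reproduced with a short sketch; the integer rounding of budgets and offsets is our assumption.

```python
import random

def generate_server_set(n, seed=0):
    """Generate n random servers (q, p, o) with period p in [10, 100],
    utilization q/p uniform in [0.05, 1], and offset o = p * U(0, 0.3)."""
    rng = random.Random(seed)
    servers = []
    for _ in range(n):
        p = rng.randint(10, 100)
        u = rng.uniform(0.05, 1.0)
        q = max(1, round(u * p))            # budget (server capacity)
        o = round(p * rng.uniform(0, 0.3))  # first release
        servers.append((q, p, o))
    return servers

servers = generate_server_set(25)
assert len(servers) == 25
assert all(1 <= q <= p and 10 <= p <= 100 for q, p, o in servers)
```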
Model-Based Analysis of Contract-Based Real-Time Scheduling

[Figure 3: (a) model checking time [sec] and (b) model checking memory usage [MB] plotted against server set size (0-35 servers)]
Fig. 3. Influence of server set size on model checking performance

[Figure 4: (a) model checking time [sec] and (b) model checking memory usage [MB] plotted against time partition size (0-10 scheduling windows)]
Fig. 4. Influence of time partition size on model checking performance

[Figure 5: model checking time [sec] plotted against task set size (0-35 tasks) for the RM, EDF, and T-C scheduling policies]
Fig. 5. Influence of scheduling policy on model checking time

[Figure 6: success rate [%] plotted against task set utilization (1.00-4.50) for the RM-20 and RM-30 task sets]
Fig. 6. Schedulability of task sets

For this, sets with 25 servers each and parameters in the same
limits as for the first experiment were generated, and time partitions with 2, 3, 5, 7 and 9 scheduling windows were tested. In Figure 4 it can be seen that both the model checking time and the memory usage grow with the number of scheduling windows in the time partition. In the first two experiments the server sets were scheduled using the Rate Monotonic (RM) priority scheduling policy. The goal of our next experiment is to determine the impact of the scheduling policy on the model checking time and peak memory usage. The same time partition configuration as in the first experiment was used, and sets of 5, 10, 15, 20, 25 and 30 servers were scheduled using the Rate Monotonic, the Earliest Deadline First (EDF), and the T-C (i.e., the higher the difference between the period and the budget of a server, the lower its priority) scheduling policies. As can be seen in Figure 5, the scheduling policy has little influence on the performance of the model checking. In the last experiment we are interested in the influence of the task set utilization on the schedulability analysis. We used the same time partition as in the first experiment, with a total utilization of 4.5, and task sets of 20 and 30 tasks with utilizations between 1 and 4.5 scheduled using the Rate Monotonic policy. Figure 6 depicts the number of schedulable task sets identified by our analysis. It can be noticed that even if the total utilization of a task set is maximal with respect to the available resources, our analysis is able to determine its schedulability, which is a clear advantage over the pessimistic schedulability bounds presented in [4].
G. Macariu and V. Creţu

5 Conclusions
In this paper we have presented a compositional approach, based on the timed automata formalism, for the schedulability analysis of component-based real-time applications that utilize multi-processor resource partitions. Starting from the assumption that the resource requirements of each application and component are stipulated in a service contract, we have defined a timed automata model for specifying the contracts and shown how to use model checking as a technique for analyzing the preemptive schedulability of a hierarchy of such contracts. The performance analysis of our technique using the UPPAAL model checker showed that, even with just one real-time clock used for the entire model, the applicability of the technique is limited by the state-explosion problem.
Acknowledgment. This research is supported by eMuCo, a European project funded by the European Union under the Seventh Framework Programme (FP7) for research and technological development.
References
1. Lipari, G., Bini, E.: A methodology for designing hierarchical scheduling systems. Journal of Embedded Computing 1(2), 257–269 (2005)
2. Harbour, M.G.: Architecture and contract model for processors and networks. Technical Report D-AC1, Universidad de Cantabria (2006)
3. Brandenburg, B.B., Anderson, J.H.: Integrating hard/soft real-time tasks and best-effort jobs on multiprocessors. In: ECRTS 2007: Proceedings of the 19th Euromicro Conference on Real-Time Systems, Washington, DC, USA, pp. 61–70. IEEE Computer Society, Los Alamitos (2007)
4. Chang, Y., Davis, R., Wellings, A.: Schedulability analysis for a real-time multiprocessor system based on service contracts and resource partitioning. Technical Report YCS-2008-432, Computer Science Department, University of York (2008)
5. Kaiser, R.: Combining partitioning and virtualization for safety-critical systems. White Paper, SYSGO AG (2007)
6. Alur, R., Dill, D.L.: A theory of timed automata. Theoretical Computer Science 126(2), 183–235 (1994)
7. Larsen, K.G., Pettersson, P., Yi, W.: UPPAAL in a nutshell. International Journal on Software Tools for Technology Transfer 2(1), 134–152 (1997)
8. Cassez, F., Larsen, K.G.: The impressive power of stopwatches. In: Palamidessi, C. (ed.) CONCUR 2000. LNCS, vol. 1877, pp. 138–152. Springer, Heidelberg (2000)
9. Henzinger, T.A., Kopke, P.W., Puri, A., Varaiya, P.: What's decidable about hybrid automata? In: STOC 1995: Proceedings of the 27th Annual ACM Symposium on Theory of Computing, pp. 373–382. ACM, New York (1995)
10. Fersman, E., Krcal, P., Pettersson, P., Yi, W.: Task automata: Schedulability, decidability and undecidability. Information and Computation 205(8), 1149–1172 (2007)
11. Krcal, P., Stigge, M., Yi, W.: Multi-processor schedulability analysis of preemptive real-time tasks with variable execution times. In: Raskin, J.-F., Thiagarajan, P.S. (eds.) FORMATS 2007. LNCS, vol. 4763, pp. 274–289. Springer, Heidelberg (2007)
12. Guan, N., Gu, Z., Deng, Q., Gao, S., Yu, G.: Exact schedulability analysis for static-priority global multiprocessor scheduling using model-checking. In: Obermaisser, R., Nah, Y., Puschner, P., Rammig, F.J. (eds.) SEUS 2007. LNCS, vol. 4761, pp. 263–272. Springer, Heidelberg (2007)
13. Memtime utility, http://freshmeat.net/projects/memtime/
Exploring the Design Space for Network Protocol Stacks on Special-Purpose Embedded Systems

Hyun-Wook Jin and Junbeom Yoo
Department of Computer Science and Engineering, Konkuk University, Seoul 143-701, Korea
{jinh,jbyoo}@konkuk.ac.kr

Abstract. Many special-purpose embedded systems such as automobiles and aircraft consist of multiple embedded controllers connected through embedded network interconnects. Such network interconnects have particular characteristics and thus different communication requirements. Accordingly, we frequently need to implement new protocol stacks for embedded systems. Implementing new protocol stacks on embedded systems involves a significant design space, but this space has not been explored in detail. In this paper, we aim to explore the design space of network protocol stacks for special-purpose embedded systems. We survey several design choices very carefully so that we can choose the best design for a given network with respect to performance, portability, complexity, and flexibility. More precisely, we discuss design alternatives for implementing new network protocol stacks over embedded operating systems, methodologies for verifying the network protocols, and designs for network gateways. Moreover, we perform case studies for the design alternatives and methodologies discussed in this paper.

Keywords: Embedded Networks, Embedded Operating Systems, Network Protocol Stacks, Formal Verification, Protocol Verification, Network Gateway.
Therefore, we need to consider several design choices very carefully so that we can choose the best design for a given network with respect to performance, portability, complexity, and flexibility. In this paper, we present various design alternatives and compare them in several aspects. Moreover, we perform case studies for the design alternatives. The rest of the paper is organized as follows: Section 2 discusses the design alternatives for implementing new network protocol stacks over embedded operating systems. Section 3 describes the methodologies for verifying the network protocols. Section 4 addresses the network interoperability issue and discusses designs for network gateways. Finally, we conclude the paper in Section 5.
2 Protocol Stacks on Embedded Nodes

In this section, we explore the design and implementation alternatives of network protocol stacks on embedded nodes. The designs can depend highly on operating systems and their task models, but we try to generalize this discussion as much as possible so that the designs described can be applied to most embedded operating systems. One of the most important issues when implementing new network protocol stacks is who takes charge of multiplexing and demultiplexing of network packets. Accordingly, we classify the possible designs into two: i) user-level design and ii) kernel-level design.

2.1 User-Level Design

In this design alternative, the protocol stacks are implemented as a user-level thread or process, which performs (de)multiplexing across networking tasks. The user-level protocol stacks can be portable across several embedded operating systems as far as they follow standard interfaces such as POSIX. The overall designs are shown in Figure 1. As we have mentioned, the way to implement new network protocol stacks depends on the task models of operating systems. Many embedded operating systems such as VxWorks [18] and uC/OS-II [19] define thread-based tasks on top of a flat memory model, in which the user-level protocol stacks are implemented as a user thread. On the other hand, some other embedded operating systems such as Embedded Linux and QNX [20] define isolated memory spaces between tasks. In such systems, the user-level protocol stacks are implemented as a user process in general. Though most of these process-based task models also support multiple threads, the design of process-based protocol stacks is still attractive. This is because, in this task model, if we implement the protocol stacks as a thread, they can only support the threads belonging to the same process. That is, thread-based protocol stacks over process-based task models are not suitable for supporting multiple networking processes.
In either the thread- or process-based design, the protocol stacks send the network packets by accessing the network device driver directly. Thus the device drivers have to provide interfaces (e.g., APIs or system calls) for the network protocol stacks. The user-level tasks request the protocol stacks to send their packets through Inter-Process Communication (IPC). In the case of the thread-based design, since the protocol stacks share the memory space with other tasks, the network data can be accessed directly from the protocol thread without a data copy, as far as the synchronization is
guaranteed by an IPC mechanism such as a semaphore. On the other hand, the process-based protocol stacks need to pass the network data between the networking tasks and the protocol stacks explicitly by using an IPC mechanism such as a message queue. This can add message passing overhead because the messaging IPCs usually require memory copy operations to move data between two different memory spaces. On the receiver side, the operation is similar to the sender side; however, there is a critical design issue of how to detect incoming new packets. Since the protocol stacks are implemented at the user level, there is no proper asynchronous signaling mechanism at the device driver to notify the user-level protocol stacks of a new packet arrival. Thus, the interfaces provided by the device driver are the only way to check for new packet arrivals. However, if the interface has blocking semantics, then the protocol stacks cannot handle other requests (e.g., sending requests) from the tasks while waiting for a new packet to arrive. There are two solutions to overcome this issue. One is to use an asynchronous interface and the other is to have multithreaded protocol stacks. The asynchronous interface is easy to use, but it is hard to come up with an optimal strategy for calling the interface in terms of when and how frequently. Thus it is likely to achieve lower performance than what the actual network can provide or to waste processor resources. Instead, the multithreaded protocol stacks can have separate threads to serve the sending and receiving operations respectively. That is, for both the thread- and process-based designs, the protocol stacks consist of a set of threads. The only difference is that the multiple threads belong to the same process in the case of the process-based design. The receiving thread can block waiting for a new packet while the sending thread handles the requests from the tasks.
Once a new packet has been recognized by returning from the blocked function, the receiving thread of the protocol stacks interprets the header and passes the packet to the corresponding process through an IPC. Since the protocol stacks are implemented at the user level, they are scheduled like other user-level tasks by the task scheduler. If we give the protocol stacks the same priority as other tasks, the execution of the protocol stacks can get delayed, which results in high network latency. Thus it is desirable that the protocol stacks have higher priority than general user-level tasks and block waiting for newly received packets or sending requests, which allows other tasks to utilize the processor resources if there are no pending jobs for the protocol stacks. As a case study, we have implemented the Network Service of Media Oriented System Transport (MOST) [1] at the user level over Linux [2]. MOST is an automotive high-speed network supporting multimedia data streaming. The current MOST standard specifies 25-150 Mbps network bandwidth with QoS support. To meet the demands of various automotive applications, MOST provides three different message channels: control, stream, and packet message channels. Network Service is the transport protocol for the control messages, which covers layer 3 through parts of layer 7 of the OSI 7-layer model. In order to implement Network Service, we have applied the process-based design, where the protocol stacks consist of sending and receiving threads. We have utilized the ioctl() system call to provide interfaces between the protocol stacks and the device driver. We have also implemented a library for applications, which provides interfaces to interact with the protocol stacks using a POSIX message queue. The performance results show a one-way latency of 0.9 ms with an 8-byte control message.
2.2 Kernel-Level Design

In this design alternative, the protocol stacks are implemented as a part of the operating system. Thus we do not need to move data between the device driver and the protocol stacks, because both use the kernel memory space and can share the network buffer. In addition, since the kernel context has higher priority than the user context, the kernel-level protocol stacks can guarantee the network performance. Accordingly, this design has more potential for achieving better performance than the user-level protocol stacks. However, it may require modifications of the kernel, which are not portable across operating systems. As shown in Figure 2, we classify the kernel-level design into the bottom half based design and the device driver based design, according to where the protocol stacks are implemented (especially for the receiver side). The traditional protocol stacks are generally implemented as a bottom half. In such a design, when a packet has been received from the network controller, the interrupt handler simply appends it to a queue shared with the bottom half. Then the bottom half takes care of most of the protocol processing, including demultiplexing. The bottom half is scheduled by the interrupt handler when there are no interrupts to be processed. On the other hand, in the device driver based design, the entire protocol stacks are implemented in the device driver. Therefore, if the protocol stacks are heavyweight like TCP/IP, the device driver based design may not be suitable. In the case of the kernel-level design, the user tasks request a sending operation through a system call. The system call eventually passes the request to the device driver. On the sender side, the main difference between the two design alternatives is that, in the case of the bottom half based design, the kernel performs most of the protocol processing before passing down the user request to the device driver.
It is to be noted that the data copy operation between the user and kernel spaces should be carefully designed. With either a synchronous or an asynchronous interface, we can copy the user data into the kernel and return immediately; however, this results in copy overhead. On the contrary, we can avoid the copy operation by delaying the notification of completion, but this can hinder the application's progress.
Fig. 2. Kernel-level protocol stacks: (a) bottom half based design and (b) device driver based design
On the receiver side, once a new packet comes in from the network controller, the interrupt handler performs urgent processing before passing it to the upper layer. In the bottom half based design, as we have mentioned earlier, the bottom half takes care of interpreting the header and demultiplexing. Some operating systems such as Embedded Linux provide an interface to insert a new bottom half (more precisely, a tasklet in Embedded Linux) without kernel modification. Microkernel-based operating systems such as QNX also allow adding new protocol stacks in a similar manner. In the device driver based design, the bottom half is not taken into account at all. In this design alternative, the protocol stacks are implemented in the system call and the interrupt handler. The distribution of weight between the system call and the interrupt handler can vary in terms of which does more protocol processing, but usually the interrupt handler does the majority of it. This is because doing the demultiplexing in the interrupt handler is more efficient. Otherwise, the system call needs to search the incoming packet queue internally, which requires exhaustive searching time and locking overhead between tasks. However, doing more work in the interrupt handler is not desirable because it is supposed to finish its work very quickly. Therefore, this design is valuable when the overhead of protocol processing is low. As a case study of the kernel-level design, we have implemented a device driver based protocol called RTDiP (Real-Time Direct Protocol) in Embedded Linux over Ethernet [3, 4]. RTDiP is a new transport protocol that can provide priority-aware communication, communication semantics for synchronization, and low communication overhead. In the synchronous semantics, the communication protocols do not queue the packets but keep only the last packet received, which is suitable for distributed synchronization over relatively small-area embedded networks.
The performance results show that RTDiP achieves a one-way latency of 48 us with an 8-byte message and provides better overhead prediction. We are currently implementing RTDiP over Controller Area Network (CAN) as well. In addition, we plan to implement it on other embedded operating systems such as QNX.
3 Verification Methodologies

Protocol verification [5] is an activity to assure the correctness of network communication protocols. The design alternatives we have studied in Section 2 should be verified
thoroughly before proceeding to the implementation. Formal verification is known as a prominent but costly technique. This section introduces formal verification techniques for verifying network protocol stacks. We briefly overview formal verification techniques and then review them from the perspective of network protocol stack verification. We then share our experience of verifying the protocol stacks of a system air conditioning system.

3.1 Formal Verification

Formal verification and formal specification together are called formal methods [6]. Formal specification [7] is a technique for specifying a system on the basis of mathematics and logic. It has various techniques and notations, e.g., algebra, logic, tables, graphics, and automata. After completing the formal specification, we can apply formal verification techniques to the specification to prove that the system satisfies the required properties. There are two main approaches to formal verification: deductive reasoning and algorithmic verification. Deductive reasoning is a verification methodology using axioms and proof rules to establish the reasoning. Experts construct the proofs by hand, and this usually requires greater expertise in mathematics and logic. Even though tools called theorem provers have been developed to provide a certain degree of automation, the inherent characteristics of deductive reasoning make it difficult to use widely for verifying recent network protocol stacks. The second methodology is algorithmic verification, usually called model checking [8]. Model checking is a technique for verifying finite-state systems by exhaustively searching the entire state space to check whether a specified correctness condition is satisfied. It is carried out automatically, almost without any intervention by experts, but it is restricted to the verification of finite-state systems. Deductive reasoning, on the other hand, has no such limitation.
With respect to protocol verification, model checking is more efficient and cost-effective than theorem proving. Theorem proving's main drawback, requiring considerable expertise, makes model checking techniques better suited for protocol stack verification. Indeed, as the performance of model checking techniques has increased rapidly, they can now perform various verifications more efficiently than when they were first proposed.

3.2 Formal Verification Techniques for Network Protocol Stacks

The formal verification techniques for network protocol stacks fall into several categories. General-purpose model checkers such as Cadence SMV [9] and SPIN [10] can verify protocols efficiently. General-purpose verification tools such as UPPAAL [11] are useful too. We can also use specialized protocol analysis tools (e.g., Meadows' NRL [12] and Cohen's TAPS [13]). A formal specification should be prepared before conducting formal verification. Finite State Machine (FSM) based formal specification techniques have been widely used for specifying network protocols and stacks. An FSM mainly consists of a set of transition rules. In the traditional FSM model, the environment of the FSM consists of two finite and disjoint sets of signals, input signals and output signals. A number of papers
using FSM-based formal specification have been reported. In particular, network protocols can be well specified using communicating FSMs or extended FSMs, as reported in [14, 15]. With respect to the formal verification of network protocol stacks, we have to consider two tasks: specification of the protocol stacks and modeling of the system implementing the protocol. In the first step, we have to model the protocol algorithm and stack hierarchy using a formal specification method. Then the modeling of the whole embedded network system, which includes the implementations of the protocol stacks, can proceed. Therefore, verifying the network protocol stacks requires formally specifying not only the protocol stacks but also the encompassing environment where the protocol stacks are implemented and used. Formal verification of network protocol stacks depends entirely on the formal specification developed beforehand. If we use FSM-based formal specifications (e.g., Statecharts [16] and the Specification and Description Language (SDL) [17]), most general-purpose model checkers are available. In cases where exact timing constraints should be preserved, a timed automata based formal specification as used by UPPAAL is a good choice. We can also use specialized protocol verification tools, but it is not easy to model the whole system with them. Therefore, the combination of FSM-based formal specification and general-purpose model checking tools will be more effective than the others.

3.3 SDL-Based Verification of Protocol Stacks

SDL is a formal specification language and tool suite widely used to model systems consisting of a number of independently communicating subsystems. An SDL specification can be translated into FSM form and then used as an input for general-purpose model checkers such as SMV and SPIN. Figure 3 describes the architecture of the system air conditioning system. We performed the formal verification of the
Fig. 3. The architecture of system air conditioning system
network protocol between distributed controllers called DMSs (Distributed Management Systems) and a personal controller called an MFC (Multi-Function Controller). A DMS manages all indoor air conditioners, outdoor compressors, and network routers under its control. An MFC is a touch-screen based personal controller like a PDA. In our experience, a special-purpose embedded network system such as the above can be well specified with SDL and verified formally through general-purpose model checkers such as SPIN. We implemented an automatic translator from SDL into PROMELA, SPIN's input language, and conducted SPIN model checking. We verified several properties, categorized as feasibility tests, responsiveness, environmental assumptions, and consistency checking. In addition to the SPIN model checking, the SDL tool has its own validation tool, which checks syntax errors and the completeness of the specification.
4 Network Interoperability

Since various network interconnects can be utilized in a distributed embedded system, network interoperability is a critical requirement in such a system. For example, in modern automobile systems, several network interconnects such as CAN, LIN, FlexRay, and MOST are widely used in an integrated manner. In such systems, we need a gateway for interoperation between different networks [2, 21, 22], similar to bridges or routers on the Internet. Thus the gateway needs to understand several network protocols and convert one into another. In this section, we explore the design alternatives for embedded network gateways. In particular, we classify the gateway designs into two, based on how the operating system is run on the gateway.

4.1 Single OS Based Gateway

In this design alternative, the gateway architecture has a single MCU or multiple homogeneous Micro-Controller Units (MCUs) that run a single operating system image. The MCU can include the network controllers for the different network interconnects supported by the gateway, or it can be connected with network controllers on the same board through buses such as Serial Peripheral Interface (SPI), Inter-Integrated Circuit (I2C), etc. The protocol stacks can be designed and implemented following any of the design choices described in Section 2, but a layer of the protocol stacks is required to perform the gateway functions. If the network layer performs the gateway functions, it can be transparent to the networking processes running on the embedded nodes. The network protocols for embedded systems, however, usually have no strict distinction between the network and transport layers, because their network layers are not supposed to allow arbitrary transport layers, while the Internet Protocol (IP) layer does.
In addition, even if the gateway performs the protocol conversion at the network layer, in many cases it is hard to preserve the end-to-end communication semantics due to significant differences between the transport protocols of embedded networks.
Fig. 4. Single OS based gateway: (a) transport layer based design and (b) global addressing layer based design
Another solution is to introduce a gateway module at the transport layer, as shown in Figure 4(a). In this design alternative, the gateway has to manage protocol conversion tables that map between the message headers of both the network and transport layers for the different networks. Since the transport layer translates the protocols internally, legacy applications do not need any modifications. A drawback of this design is its limited scalability. The number of possible header patterns can be numerous in some embedded systems, which can result in memory shortage on the gateway node. Therefore, this design is useful only when the number of entries in the protocol conversion tables is predictable. Fortunately, in many embedded systems, we can figure out at the design phase the number of embedded devices that need to collaborate (i.e., communicate) with each other across different networks. We can also add a new layer on top of the transport layer, as shown in Figure 4(b). The new layer defines global addressing and APIs for the applications to access the layer. If the gateway uses global addressing, the networking processes on every embedded node have to be aware of it. Thus the applications need to be modified, but once this is done, they can run transparently on any network into which the additional layer is inserted. In this case, the gateway only has to manage the routing table, and thus the scalability in terms of memory space can be better than in the previous design. However, if most of the embedded nodes perform intra-network communication, the overhead of the additional layering can harm performance. Therefore, the decision among the design alternatives described can vary based on the system requirements and characteristics. As a case study of a single OS gateway, we have implemented a gateway between MOST and CAN networks based on the transport layer based design [2]. In this case study, we utilize the MOST Network Service implemented in Section 2.1.
The communication semantics of MOST control messages are very different from the traditional send/receive semantics. A MOST control message invokes a function on a MOST device. CAN, however, does not provide such communication semantics, while it provides multicast-like communication semantics, which are absent from the MOST Network Service. Thus, simple message forwarding with address translation at the network layer does not work. To provide transparent conversion of communication semantics, we have suggested a gateway module. In addition, we have implemented the protocol conversion table and defined some entries for performance measurement. The performance results show that the suggested design adds hardly any additional
overhead, which is about 15% of the pure communication overhead, and can deliver control messages very efficiently.

4.2 Multi-OS Based Gateway

Since the embedded nodes on different networks can have different requirements, the desirable operating systems can vary. For example, the automobile gateway node can have many kinds of peripheral interfaces, such as USB and wireless network, for supporting infotainment applications over MOST. Therefore, an operating system with rich device driver support, such as Embedded Linux, is highly desirable. On the other hand, the electronic units such as chassis, powertrain, and body controllers connected to CAN or LIN demand guaranteed real-time behavior, and thus an RTOS is desirable. Since the gateway needs to meet such various requirements, we can consider having multiple operating systems on the gateway node. The address translation issue discussed in Section 4.1 still applies in a similar manner in this design alternative. However, an efficient scheme for communication between the operating systems has to be taken into account. A gateway node can be equipped with multiple heterogeneous MCUs that have different network controllers, as shown in Figure 5(a). Each MCU can run its own operating system that satisfies the requirements of the networks it is responsible for. The MCUs on the gateway node can collaborate by communicating with each other through a bus or a shared memory module. Since a single MCU may not have all the network controllers required for a specific embedded system, several MCUs may be needed, which makes the connection architecture between the MCUs very complicated. Thus the architecture based on multiple MCUs can be applied only in limited cases. Another approach is to exploit virtualization technology, which allows running several operating systems on the same MCU, as shown in Figure 5(b). Virtualization technology can isolate system faults from propagating to other operating systems and provide service assurance.
In addition, state-of-the-art virtualization technologies enable low-overhead virtualization and better resource scheduling, which lead to high scalability. Beyond the existing optimization technologies, a lighter-weight I/O virtualization can be suggested, because the network controllers on the gateway
Fig. 5. Multi-OS based gateway: (a) multiple MCUs based design and (b) system virtualization based design
250
H.-W. Jin and J. Yoo
node may not need to be shared between operating systems. An important issue is how efficiently the operating system domains can communicate with each other. In general, the portion of inter-domain communication on a virtualized node is small compared with inter-node communication. On the gateway node, however, many network messages cause inter-domain communication because they must be forwarded to another network interface, which may be managed by a different operating system domain. As a case study of a gateway with multiple operating systems, we are implementing a MOST-CAN gateway using the virtualization technology provided by Adeos [23]. Adeos provides a flexible environment for sharing hardware resources among multiple operating systems by forwarding hardware events to the appropriate operating system domain. We run Linux and Xenomai [24], a parasitic operating system for Linux, over Adeos. The Linux operating system takes charge of the MOST interface while Xenomai handles the CAN interface. The gateway processes run on each operating system and communicate with each other through the inter-domain communication interface provided by Xenomai. Since the protocol stacks for MOST and CAN run on different operating systems, we perform the protocol conversion above the transport layer, but we do not use global addressing. Instead, we define a protocol conversion table that maps network connections across the different networks.
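The protocol conversion table just described can be sketched as a pair of lookups between transport-level connections on the two networks. The class name, the shape of the MOST connection label, and the CAN identifier below are illustrative assumptions, not the actual gateway's data structures.

```python
# Hypothetical sketch of a MOST<->CAN protocol conversion table:
# instead of global addressing, the gateway maps transport-level
# connections on one network to connections on the other.

class ConversionTable:
    def __init__(self):
        self._most_to_can = {}   # MOST connection label -> CAN identifier
        self._can_to_most = {}   # CAN identifier -> MOST connection label

    def add_mapping(self, most_conn, can_id):
        self._most_to_can[most_conn] = can_id
        self._can_to_most[can_id] = most_conn

    def most_to_can(self, most_conn):
        return self._most_to_can[most_conn]

    def can_to_most(self, can_id):
        return self._can_to_most[can_id]

# The Linux domain (MOST side) and the Xenomai domain (CAN side) would
# each consult this table when forwarding a payload across the
# inter-domain communication interface.
table = ConversionTable()
table.add_mapping(most_conn=("node7", "channel2"), can_id=0x120)

def forward_most_message(most_conn, payload):
    """Convert a MOST transport message into a (CAN id, payload) pair."""
    return (table.most_to_can(most_conn), payload)

print(forward_most_message(("node7", "channel2"), b"\x01\x02"))
```

Because the table is symmetric, the Xenomai side can use `can_to_most` to forward CAN frames in the opposite direction without either side knowing the other network's addressing scheme.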
5 Conclusions

In this paper, we have explored the design space of network protocol stacks for special-purpose embedded systems. We have surveyed several design choices carefully so that the best design can be chosen for a given network with respect to performance, portability, complexity, and flexibility. More precisely, we have discussed design alternatives for implementing new network protocol stacks over embedded operating systems, methodologies for verifying network protocols, and designs for network gateways. Moreover, we have performed case studies for the design alternatives and methodologies.

Acknowledgments. This work was partly supported by grants #NIPA-2009-C1090-0902-0026 and #NIPA-2009-C1090-0903-0004 from the MKE (Ministry of Knowledge Economy) under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency), and by grant #R33-2008-000-10068-0 from MEST (Ministry of Education, Science and Technology) under the WCU (World Class University) support program.
References

1. MOST Cooperation: MOST Specification, Rev 3.0 (2008)
2. Lee, M.-Y., Chung, S.-M., Jin, H.-W.: Automotive Network Gateway to Control Electronic Units through MOST Network (2009) (under review)
3. Lee, S.-H., Jin, H.-W.: Real-Time Communication Support for Embedded Linux over Ethernet. In: International Conference on Embedded Systems and Applications (ESA 2008), pp. 239–245 (2008)
4. Lee, S.-H., Jin, H.-W.: Communication Primitives for Real-Time Distributed Synchronization over Small Area Networks. In: IEEE International Symposium on Object/component/service-oriented Real-Time distributed Computing (ISORC 2009), pp. 206–210 (2009)
5. Palmer, J.W., Sabnani, K.: A Survey of Protocol Verification Techniques. In: Military Communications Conference - Communications-Computers, pp. 1.5.1–1.5.5 (1986)
6. Peled, D.: Software Reliability Methods. Springer, Heidelberg (2001)
7. Wing, J.M.: A Specifier's Introduction to Formal Methods. IEEE Computer 23(9) (1990)
8. Clarke, E., Grumberg, O., Peled, D.: Model Checking. MIT Press, Cambridge (1999)
9. SMV, http://w2.cadence.com/webforms/cbl_software/index.aspx
10. SPIN, http://spinroot.com/spin/whatispin.html
11. UPPAAL, http://www.uppaal.com/
12. Meadows, C.: Analysis of the Internet Key Exchange Protocol Using the NRL Protocol Analyzer. In: SSP 1999, pp. 216–231 (1999)
13. Cohen, E.: TAPS: A First-Order Verifier for Cryptographic Protocols. In: 13th IEEE Computer Security Foundations Workshop, pp. 144–158 (2000)
14. Aggarwal, S., Kurshan, R.P., Sabnani, K.: A Calculus for Protocol Specification and Verification. In: Int. Workshop on Protocol Specification, Testing and Verification (1983)
15. Sabnani, K., Wolper, P., Lapone, A.: An Algorithmic Procedure for Protocol Verification. In: Globecom (1985)
16. Harel, D.: Statecharts: A Visual Formalism for Complex Systems. Science of Computer Programming 8, 231–274 (1987)
17. SDL, http://www.telelogic.com/products/sdl/index.cfm
18. Wind River, http://windriver.com
19. Labrosse, J.: MicroC/OS-II: The Real-Time Kernel. CMP Books (1998)
20. QNX Software Systems, http://www.qnx.com
21. Hergenhan, A., Heiser, G.: Operating Systems Technology for Converged ECUs. In: 7th Embedded Security in Cars Conference (2008)
22. Obermaisser, R.: Formal Specification of Gateways in Integrated Architectures. In: Brinkschulte, U., Givargis, T., Russo, S. (eds.) SEUS 2008. LNCS, vol. 5287, pp. 34–45. Springer, Heidelberg (2008)
23. Yaghmour, K.: Adaptive Domain Environment for Operating Systems (2001), http://www.opersys.com/adeos
24. Xenomai, http://www.xenomai.org
HiperSense: An Integrated System for Dense Wireless Sensing and Massively Scalable Data Visualization

Pai H. Chou 1,2,4, Chong-Jing Chen 1,2, Stephen F. Jenks 2,3, and Sung-Jin Kim 3

1 Center for Embedded Computer Systems, University of California, Irvine, CA
2 Electrical Engineering and Computer Science, University of California, Irvine, CA
3 California Institute for Telecommunications and Information Technology, Irvine, CA
4 Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
Abstract. HiperSense is a system for sensing and data visualization. Its sensing part comprises a heterogeneous wireless sensor network (WSN) enabled by infrastructure support for handoff and bridging. Handoff support enables simple, densely deployed, low-complexity, ultra-compact wireless sensor nodes operating at non-trivial data rates to achieve mobility by connecting to different gateways automatically. Bridging between multiple WSN standards is achieved by creating virtual identities on the gateways. The gateways deliver collected data over Fast Ethernet for post-processing and visualization. Data visualization is done on HIPerWall, a 200-megapixel display wall consisting of 5 rows by 10 columns of 30-inch displays. The system is designed to minimize complexity on the sensor nodes while retaining high flexibility and high scalability.
1 Introduction
Treating the physical world as part of the cyber infrastructure is no longer just a desirable feature. Cyber-physical systems (CPS) are now the mandate of many national funding agencies worldwide. CPS entails more than merely interfacing with the physical world. The goal is to form synergy between the cyber and the physical worlds by enabling cross-pollination of many more features. A wireless sensor network (WSN) is an example of a cyber-physical interface. A sensor converts a physical signal into a quantity that enables further processing and interpretation by a computing machine. However, it is still mostly an interface, rather than a system in the CPS sense. Most WSNs today lack the cyber part, which would leverage the vast amount of information available on the network to synthesize new views of data in ways never possible before. An example of a system that is one step towards CPS is SensorMap [1], which offers a GoogleEarth-style application augmented with sensor data collected at the corresponding positions. SensorMap provides an abstraction in the form of the sensor-to-map programming interface (API), so that data providers can

S. Lee and P. Narasimhan (Eds.): SEUS 2009, LNCS 5860, pp. 252–263, 2009. © IFIP International Federation for Information Processing 2009
HiperSense: An Integrated System
253
leverage the powerful cloud-computing backend without having to re-invent yet another tool for each highly specialized application. However, data visualization can be much more than merely superimposing data on geographical or topological maps by a cloud-computing system to be rendered on a personal computer. In fact, the emergence of large, high-resolution displays, high-performance workstations, and high-speed interconnection interfaces gives rise to large display walls as state-of-the-art visualization systems. An example is the HIPerWall, a 200-megapixel tiled screen driven by 25 PowerMac G5s interconnected by two high-speed networks [2, 3]. Such a system has found use in ultra-high-resolution medical imaging and appears to be a prime candidate for visualization of a wide variety of sensor data as well. This paper describes work in progress on such a massive-scale sensing and visualization system, called HiperSense. On the sensing side, we aim to further develop a scalable support infrastructure, called EcoPlex [4]. It consists of a tiered network, where the upper tier contains the gateways and the lower tier includes the sensor nodes. The gateways support handoff for mobility and bridging of identity for integrating heterogeneous radio and networking standards. On the data visualization side, we feed the data to the HIPerWall, which can then render the data in an interactive form across 50 screens acting as one logical screen. This paper reports on the technologies developed to date and discusses practical issues that we have encountered.
2 Related Work
Several multiple-access protocols that use multiple frequency channels have been proposed for wireless sensor networks [5, 6]. Some have been evaluated only by simulation, while others have been adopted by researchers in the ad hoc networking domain. Many protocols for WSNs have been implemented on the popular MicaZ or TelosB platforms. Y-MAC [7] is a TDMA-based protocol that schedules the receivers in the neighborhood by assigning available receiving time slots. Its lightweight channel-hopping mechanism can enable multiple pairs of nodes to communicate simultaneously, and it has been implemented on the RETOS operating system running on a TmoteSky-class sensor node. However, a problem with Y-MAC is its relatively low performance due to time synchronization and channel-switching overhead. Based on previous experimental results from another team [8] on the Chipcon CC2420 radio transceiver, which implements the IEEE 802.15.4 MAC as used on MicaZ and TelosB, the channel-switching time is nearly equal to the time it takes to transmit a 32-byte data packet. Therefore, changing to another frequency channel dynamically and frequently can become a damper on system performance. Le et al. proposed and implemented another multi-channel protocol on the MicaZ [8]. Their protocol design does not require time synchronization among the sensor nodes. They also take the channel-switching time into consideration for sensor nodes that are equipped with only a half-duplex radio transceiver. A
254
P.H. Chou et al.
distributed control algorithm is designed to assign accessible frequency channels to each sensor node dynamically to minimize the frequency-changing time. The compiled code is around 9.5 KB in ROM and 0.7 KB in RAM on top of TinyOS. Although it is smaller than other solutions, it is still too big to fit in either the Eco or µPart sensor node, both of which have much smaller RAM and ROM. After data collection, displaying sensor data in a meaningful way in real time is another emerging issue. Several solutions have been proposed to integrate sensing data with a geographic map [9, 10]. SensorMap [1] from Microsoft Research displays sensor data from SenseWeb [11] on a map interface and provides tools to query sensor nodes and visualize sensing data in real time. Google Maps with traffic can show not only live traffic but also historical traffic by day and time [12]. However, these works currently assume limited screen resolution and have not been scaled to the 200-megapixel resolution of the HIPerWall.
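The channel-switching claim cited above can be sanity-checked with a short calculation: IEEE 802.15.4 in the 2.4 GHz band runs at 250 kbit/s, so a 32-byte payload takes about 1 ms on the air, which is the same order as the reported CC2420 channel-switching time. The bitrate figure is the standard's, not a new measurement.

```python
# Back-of-the-envelope check: airtime of a 32-byte 802.15.4 payload.
BITRATE_BPS = 250_000       # IEEE 802.15.4, 2.4 GHz band
PACKET_BYTES = 32

tx_time_ms = PACKET_BYTES * 8 / BITRATE_BPS * 1000
print(f"32-byte packet airtime: {tx_time_ms:.3f} ms")  # about 1.024 ms
```

A channel switch that costs roughly one packet time halves the effective throughput of a node that must hop on every packet, which is why frequent dynamic hopping dampens performance.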
3 System Architecture
Fig. 1 shows the architecture of HiperSense. It consists of HIPerWall as the visualization subsystem and EcoPlex as the sensing infrastructure. This section summarizes each subsystem and their interconnection.

3.1 HIPerWall
HIPerWall is a tiled display system for interactive visualization, as shown in Fig. 2(d). The version we use consists of fifty 30-inch LCD monitors arranged in five rows by ten columns. Each monitor has a pixel count of 2560 × 1600 at 100 dots per inch, and therefore the entire HIPerWall has a total resolution of 204.8 million pixels. The tiled display system is driven by 25 PowerMac G5 workstations interconnected by two networks to form a high-performance computing cluster.

Fig. 1. The HiperSense architecture: 50 tiled displays (5 rows × 10 columns) driven by 25 PowerMac G5s, connected by Myrinet and Gigabit Ethernet to a front-end node; EZ-Gates (gateways for ZigBee and Eco nodes, with Ethernet uplinks) connect ZigBee meshes and Eco nodes to the front-end node over Fast Ethernet

One network uses Myrinet for very high-speed
Fig. 2. Components of HiperSense: (a) Eco Node; (b) Base Station; (c) EZ-Gate; (d) HIPerWall
data transfer, and the other network uses Gigabit Ethernet for control. The HIPerWall software is portable to other processors and operating systems, and it can be configured for a wide variety of screen dimensions. A user controls the entire HIPerWall from a separate computer called the front-end node. It contains a reduced view of the entire display wall, enabling the user to manipulate the display across several screens at a time. The front-end node also serves as the interface between the sensing subsystem and the visualization subsystem.

3.2 EcoPlex
EcoPlex is a tiered, infrastructure-based heterogeneous wireless sensor network system. Details of EcoPlex can be found in another paper [4]; here we highlight the distinguishing features. The bottom tier consists of the wireless sensor nodes; the top tier consists of a network of gateway nodes.

Lower Tier: Sensor Nodes. EcoPlex currently supports two types of nodes with different communication protocols: ZigBee and Eco. Our platform allows other protocols such as Bluetooth and Z-Wave to be bridged without any inherent difficulty. ZigBee is a wireless networking protocol standard primarily targeting low-duty-cycle wireless sensing applications [13]. In recent years, it has also been targeting the home automation domain. ZigBee is a network protocol that supports ad hoc mesh networking, although it also defines roles for not only end devices but also
routers and coordinators. ZigBee is built on top of the IEEE 802.15.4 media access control (MAC) layer, which is based on carrier-sense multiple access with collision avoidance (CSMA/CA). Currently, many wireless sensor applications are built on top of 802.15.4, though not necessarily with the ZigBee protocol stack, since the stack occupies about 64–96 KB of program memory. Another type of wireless sensor node supported in EcoPlex is Eco [14, 15], our own ultra-compact wireless sensing platform, shown in Fig. 2(a). It is 1 cm³ in volume including the MCU, RF, antenna, and sensor devices. It contains 4 KB RAM and 4 KB EEPROM. The radio is based on Nordic VLSI's ShockBurst protocol at 1 Mbps, a predecessor of Wibree, also known as Bluetooth Low Energy Technology, a subset of Bluetooth 3.0 [16]. Eco is possibly the world's smallest self-contained, programmable, expandable platform to date. A flex-PCB connector enables an Eco node to be connected to other I/O devices and power. Eco is meant to complement ZigBee in that Eco nodes can be made much smaller and cheaper than ZigBee ones, and thus they can be deployed where ZigBee nodes cannot, especially in some highly wearable or size-constrained applications.

Upper Tier: Gateways. The upper tier of EcoPlex consists of a network of gateway nodes called EZ-Gates. An EZ-Gate is essentially a Fast Ethernet router based on an ARM-9-core network processor running Linux 2.6. It is augmented with the radio transceivers needed to support the protocols used by the wireless sensor nodes. In this case, one ZigBee transceiver and two Eco transceivers are added to each EZ-Gate. For ZigBee support, the EZ-Gate implements the protocol stack of a ZigBee coordinator. Since Eco is more resource-constrained and can afford to implement only a much simpler protocol, the gateway provides relatively more support in the form of handoff and virtual identity.
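The virtual-identity mechanism can be sketched as follows: the owner gateway registers a ZigBee-space address for each Eco node, so ZigBee peers can address Eco nodes without the Eco nodes running a ZigBee stack. The class name, the address range, and the frame shapes below are our illustrative assumptions, not EZ-Gate internals.

```python
# Illustrative sketch of virtual-identity bridging on an EZ-Gate.
class EZGate:
    def __init__(self):
        self._virtual = {}       # virtual ZigBee short address -> Eco node id
        self._next_addr = 0x1000 # assumed start of the virtual address range

    def register_eco_node(self, eco_id):
        """Create a ZigBee-space identity for an Eco node joining EcoPlex."""
        addr = self._next_addr
        self._next_addr += 1
        self._virtual[addr] = eco_id
        return addr              # this address is what ZigBee peers see

    def deliver(self, zigbee_addr, payload):
        """Relay a ZigBee-addressed frame to the owning Eco node."""
        eco_id = self._virtual[zigbee_addr]
        return (eco_id, payload)  # would go out over the Eco radio

gate = EZGate()
addr = gate.register_eco_node(eco_id=42)
print(gate.deliver(addr, b"sample-request"))
```

From a ZigBee node's point of view, the Eco node is just another ZigBee peer; the gateway absorbs the entire heavy stack on its behalf.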
Eco nodes connect to the gateways and not to each other, the same way cellular phones connect to base stations but not to each other. Just as cellular towers perform handovers based on the proximity of the mobile, our EZ-Gates perform handoffs based on the link quality of the Eco nodes. Unlike cell phones, which are treated as independently operated units, EcoPlex supports the concept of clusters, which are groups of wireless sensor nodes that work together, are physically close to each other, and move as a group [17]. Instead of performing handoff for each node individually, cluster handoff relies on one node as a representative for the entire cluster and has been shown to be effective, especially in dense deployments. EZ-Gates also support bridging in the form of virtual identity. That is, for every Eco node connected to EcoPlex, the owner gateway maintains a node identity in the ZigBee space. This way, an Eco node appears just like any other ZigBee node and can communicate logically with other ZigBee nodes, without having to be burdened with the heavy ZigBee stack. A simpler base station without the handoff and virtual identity support was also developed. It is based on the Freescale DEMO9S12NE64 evaluation board connected to a Nordic nRF24L01 transceiver module, as shown in Fig. 2(b). It has a Fast Ethernet uplink to the front-end node. It was used for the purpose
of developing code between the Ethernet and Eco sides before porting to the EZ-Gate for final deployment.
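The link-quality-driven handoff with cluster support described above can be sketched as a small decision function: the representative node's per-gateway link quality decides the owning gateway for the whole cluster. The hysteresis margin and the numeric link-quality scale are invented for illustration; the real EZ-Gate policy may differ.

```python
# Minimal sketch of cluster handoff driven by link quality.
HYSTERESIS = 5  # assumed margin to avoid ping-ponging between gateways

def choose_gateway(current, link_quality):
    """Pick the owning gateway given {gateway: link quality} readings."""
    best = max(link_quality, key=link_quality.get)
    if current is not None and best != current:
        # only hand off if the new gateway is clearly better
        if link_quality[best] < link_quality.get(current, float("-inf")) + HYSTERESIS:
            return current
    return best

def cluster_handoff(current, representative_lq):
    # the representative's readings stand in for the whole cluster,
    # so one decision moves every node in the cluster at once
    return choose_gateway(current, representative_lq)

print(cluster_handoff("G1", {"G1": 40, "G2": 43}))  # stays on G1
print(cluster_handoff("G1", {"G1": 40, "G2": 48}))  # hands off to G2
```

Making one decision per cluster rather than per node is what keeps handoff cheap in dense deployments, where dozens of co-located nodes would otherwise each negotiate with the gateways.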
4 System Integration
HiperSense is more than merely connecting the EcoPlex sensing subsystem with the HIPerWall tiled display system. It entails the design of a communication and execution scheme to support the needs of sensing and visualization. This section first discusses considerations for HiperSense to support CPS-style visualization. Then, we describe the communication scheme for system integration.

4.1 Visualization Styles and Support
Visualization is the graphical rendering of data in a form that helps the user gain new insights into the system from which the data is collected. Unlike many other visualization systems that render only static data that has been collected and stored in advance, HiperSense is designed to support visualization of both static data files and live data streams. More importantly, we envision a visualization system that synthesizes views from static or live data and other sources available on the Internet. As an example, consider a WSN that collects vibration data from sensor nodes on a pipeline network. The user is not interested in vibration per se but wants to non-invasively measure the propagation speed of objects traveling inside the pipeline based on the peak vibration. In this case, time-history plots of raw vibration data are not so meaningful to the user; instead, the data streams must be compared and processed to derive the velocity. The visualization system then renders the velocity by superimposing color-coded regions over high-resolution images of the pipeline network and its surroundings. Moreover, the user may want the ability to navigate not only spatially, similar to GoogleEarth, but also temporally, by seeing how the peak velocity shifts over time. To support smooth spatial navigation, HIPerWall relies on replication or prefetching of large data (e.g., patches of GoogleEarth photos) from adjacent nodes. The data can be information in the local database system, images, or videos. The data shown on the screens are treated as independent objects and can be zoomed in, zoomed out, or rotated arbitrarily. The front-end node simply sends out commands to every computing node indicating which object should be displayed, where the object is positioned, and other properties to control the object. This mechanism reduces the traffic between the front-end node and the cluster of computing nodes.
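The pipeline example above can be made concrete: the propagation speed follows from the times at which the vibration peak passes two sensor junctions. The sensor spacing, sampling rate, and sample values below are invented for illustration.

```python
# Worked example: deriving propagation speed from peak-vibration times.
def peak_time(samples, t0, dt):
    """Return the timestamp of the largest-magnitude vibration sample."""
    idx = max(range(len(samples)), key=lambda i: abs(samples[i]))
    return t0 + idx * dt

def propagation_speed(distance_m, t_peak_a, t_peak_b):
    return distance_m / (t_peak_b - t_peak_a)

dt = 0.01                          # 100 Hz sampling, assumed
a = [0.1, 0.2, 3.0, 0.2]           # junction A: peak at t = 0.02 s
b = [0.1, 0.1, 0.2, 0.1, 2.8]      # junction B: peak at t = 0.04 s
v = propagation_speed(5.0, peak_time(a, 0.0, dt), peak_time(b, 0.0, dt))
print(f"propagation speed: {v:.0f} m/s")  # 5 m in 0.02 s -> 250 m/s
```

This is exactly the kind of cross-stream processing that raw time-history plots cannot convey: the derived velocity, not the vibration amplitude, is what gets color-coded onto the pipeline imagery.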
To support real-time access to data, supervisory control and data acquisition (SCADA) systems, which are commonly found in factories with up to thousands of sensing and actuation channels per site, have used real-time databases for logging and retrieval of data. A similar setup can be built for HiperSense. Historical data can also be retrieved from the real-time database via a uniform interface. However, one difference between a conventional SCADA system and
HiperSense is that the former is commonly handled by one or a small number of computers, while the latter relies on a cluster of 25 computers to parallelize the handling of the massive graphical bandwidth. For the purpose of tiled visualization of live sensor data, we program the front-end node of the HIPerWall to also be a fat client for data collection from wireless sensor nodes via the gateways in EcoPlex. The front-end node then broadcasts the collected data to the cluster of computing nodes inside HIPerWall. Every computing node decodes the whole data packet but shows only the portion that is visible on the two LCD screens that it controls. This broadcasting mechanism removes the need for time synchronization among the workstations and ensures that all sensing data can be shown on the tiled displays at the same time. If the backbone of the intranet and the cluster of workstations both support jumbo frames [18], we can increase the overall system performance and deploy more wireless sensor nodes at once.
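The broadcast-and-cull scheme just described can be sketched with the paper's wall geometry (5 × 10 tiles of 2560 × 1600, two tiles per computing node). The row-major tile-to-node assignment is our assumption for illustration; the actual HIPerWall mapping may differ.

```python
# Sketch: each node receives the full packet but renders only the
# region covered by its two horizontally adjacent tiles.
TILE_W, TILE_H = 2560, 1600
COLS, ROWS = 10, 5

assert TILE_W * TILE_H * COLS * ROWS == 204_800_000  # 204.8 megapixels

def node_region(node_id):
    """Pixel rectangle covered by one node's two tiles (assumed layout:
    node i drives tiles 2i and 2i+1, numbered row-major)."""
    first = node_id * 2
    row, col = divmod(first, COLS)
    return (col * TILE_W, row * TILE_H, 2 * TILE_W, TILE_H)

def visible(region, x, y):
    rx, ry, rw, rh = region
    return rx <= x < rx + rw and ry <= y < ry + rh

region = node_region(0)           # top-left node: two leftmost tiles
print(region)                     # (0, 0, 5120, 1600)
print(visible(region, 100, 100))  # True: this node renders the point
print(visible(region, 6000, 100)) # False: another node's tiles
```

Because every node can evaluate `visible` locally from the broadcast data, no per-node unicast or synchronization traffic is needed from the front-end node.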
4.2 Protocols for the Tiers
EcoPlex currently supports both ZigBee and Eco as two complementary wireless protocols. The ZigBee standard is designed for sporadic, low-bandwidth communication in an ad hoc mesh network, whereas Eco is capable of high-bandwidth, data-regular communication on ultra-compact hardware in a star network. Of course, it is possible for each platform to implement the other's characteristics, but they would be less efficient. For the purpose of integration with HiperSense, ZigBee does not pose a real problem due to its lack of timing constraints. We therefore concentrate our discussion on the integration of Eco nodes and gateways. On many wireless sensor nodes, the software complexity is dominated by the protocol stack. The code sizes of typical protocol stacks for Bluetooth, ZigBee, and Z-Wave are 128 KB, 64–96 KB, and 32 KB, respectively, whereas the main program of a sensing application can be as small as a few kilobytes. Our approach to minimizing complexity on the Eco nodes is to externalize it: we implement only the protocol-handling mechanisms on Eco and move most policies out to the infrastructure. This can be accomplished by making the nodes passive, with a thin-server, fat-client organization. That is, the sensor nodes are passive and the host actively pulls data from them. This effectively takes care of arbitration1 and provides effective acknowledgment. The core mechanism can be implemented in under 40 bytes of code. After adding support for multi-gateway handoff and joining, channel switching, and a number of other performance-enhancement policies (e.g., amortizing the pulling overhead by returning multiple packets), the code size can still be kept around 2 KB. Once the reliable communication primitives are in place, we can add another layer of software for dynamic code update and execution [19] on these nodes. Our vision is host-assisted dynamic compilation, where the host or its delegate dynamically generates optimized code to be loaded into the node for execution.
This will be much more energy efficient than a general-purpose protocol stack that must anticipate all possibilities. An example is the protocol stack

1 No intra-network collision unless there is a malfunctioning node.
for network routing. Since our gateway, as well as many commercially available gateways, runs Linux and has plenty of storage available, the gateways should also be able to run the synthesizer, compiler, and optimizer without difficulty. The front-end node of the HIPerWall acts as a client to query data from all gateways. Each gateway is connected to the front-end node via a wired interface with higher bandwidth than the wireless interface. At the beginning, all wireless sensor nodes communicate with the gateway via the control channel. The front-end node issues frequency-switching commands to the wireless sensor nodes based on the sampling rate of each wireless sensor node and the available bandwidth of each wireless frequency channel. Later on, the front-end node issues a command packet to the gateway to get data from the wireless sensor nodes controlled by that gateway. The gateway in turn broadcasts the command packet to all Eco sensor nodes on the gateway's own frequency channel. The gateway packs all pulled data together and forwards the data to the front-end node, which in turn broadcasts the data to the real-time databases for visualization. ZigBee nodes push data rather than being pulled. This way, the ZigBee network can coexist with the Eco network by sporadically taking up bandwidth only when necessary. For HiperSense, the front-end node could resend the command packet to a wireless sensor node if it does not get any reply packet within a certain amount of time. To improve system performance, however, we place the retransmission mechanism inside the gateways instead of at the front-end node. A gateway resends the pulling command packet that it received from the front-end node if it does not receive any reply packet from a wireless sensor node within a pre-defined timeout period.
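The pull-based scheme above can be sketched end to end: sensor nodes are passive thin servers, the gateway is the fat client that pulls on their behalf, and retransmission on timeout lives in the gateway rather than the front-end node. All class names, the retry bound, and the lossy-node model are illustrative assumptions, not the actual frame formats or firmware.

```python
# Minimal sketch of the thin-server pull with gateway-side retries.
class EcoNode:
    """Passive node: never initiates traffic, only answers pulls."""
    def __init__(self, node_id, lossy=0):
        self.node_id = node_id
        self.queue = []          # buffered sensor readings
        self._drop = lossy       # number of pulls to "lose" (demo only)

    def on_pull(self, count):
        if self._drop > 0:       # simulate a lost reply over the radio
            self._drop -= 1
            return None
        # return up to `count` packets per pull (amortizes pull overhead)
        out, self.queue = self.queue[:count], self.queue[count:]
        return out

class Gateway:
    """Fat client: pulls each node in turn; retries on timeout."""
    MAX_TRIES = 3                # assumed retry bound per pull

    def pull(self, node, count=1):
        for _ in range(self.MAX_TRIES):
            reply = node.on_pull(count)
            if reply is not None:
                return reply     # packed and forwarded to the front end
        return []                # report the node unreachable this round

node = EcoNode(7, lossy=1)
node.queue = [b"r1", b"r2", b"r3"]
gw = Gateway()
print(gw.pull(node, count=2))    # succeeds on the second try
```

Keeping the retry loop in the gateway means a single radio loss costs one wireless round trip, not a full Ethernet round trip back to the front-end node.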
5 Evaluation
This section presents experimental results on a preliminary version of HiperSense. The experimental setup consists of 100 Eco nodes (Fig. 3(a)) and two gateways densely deployed in an area of 2 to 16 m². The larger setup is for a miniature-scale water-pipe monitoring system, where nodes measure vibration at different junctions. The gateways are connected to the HIPerWall's front-end node (Fig. 3(b)) via Fast Ethernet. We compare the performance of our system with other works in terms of measured throughput, latency, and code size for different sets of included features.

5.1 Latency and Throughput
Fig. 4 shows the measured aggregate throughput and per-node latency for one and two gateways over different numbers of reply packets per pull. Returning more packets per pull amortizes the pulling overhead, though at the expense of increased latency. The lower and upper curves in each chart show the results for one and two gateways, respectively. In the case of one gateway, the throughput ranges from 6.9 KB/s for one reply packet per pull to 15.8 KB/s for 20 reply packets per pull, though the latency increases linearly from around 100 ms to over 1.5 seconds.
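The amortization effect can be captured by a simple model: each pull pays a fixed per-transaction overhead, so returning n packets per pull raises throughput toward an asymptote while latency grows roughly linearly with n. The two time constants below are fitted loosely to the one-gateway curve (about 6.9 KB/s at n = 1), not measured values.

```python
# Simple amortization model for throughput vs. reply packets per pull.
PAYLOAD_BYTES = 32
PACKET_TIME_S = 0.0018    # assumed per-packet air/handling time
PULL_OVERHEAD_S = 0.0028  # assumed fixed cost per pull transaction

def throughput_bps(n):
    """Aggregate bytes/s when each pull returns n reply packets."""
    return n * PAYLOAD_BYTES / (PULL_OVERHEAD_S + n * PACKET_TIME_S)

print(f"n=1:  {throughput_bps(1):7.0f} B/s")   # near the 6.9 KB/s point
print(f"n=20: {throughput_bps(20):7.0f} B/s")
# throughput saturates at PAYLOAD_BYTES / PACKET_TIME_S as n grows,
# while per-pull latency grows roughly as PULL_OVERHEAD_S + n * PACKET_TIME_S
```

The model explains the measured shape: most of the gain comes from the first few extra packets per pull, which is why the latency cost of very large n buys diminishing throughput.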
Fig. 3. Experimental Setup: (a) 100 Eco Nodes; (b) Front-End Node and HIPerWall

Fig. 4. Performance Results: (a) Throughput (Bytes/s) and (b) Latency (Response Time, ms), each plotted against the number of reply packets per pull (1–20), with one curve for 1 base station / 1 transceiver and one for 2 base stations / 2 transceivers
By doubling the number of gateways, the aggregate throughput ranges from 13.5 KB/s for one reply packet per pull to 27.9 KB/s for 20 reply packets per pull. In the latter case, the total throughput increases by 81% while the latency increases by only 9%. The data rate is rather low compared to the bandwidth within HIPerWall. With non-overlapping frequencies, EcoPlex can scale up to 50 nodes/channel × 124 channels = 6200 nodes with only 0.1% utilization of the bandwidth of Gigabit Ethernet, or 1% of Fast Ethernet.
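The scaling arithmetic above can be checked directly: the 6200-node total corresponds to 124 non-overlapping channels at 50 nodes each, and the stated Ethernet utilizations are consistent with an aggregate on the order of 1 Mbit/s. The aggregate-rate figure is our assumption for the check, not a measurement.

```python
# Arithmetic check of the scaling claim.
NODES_PER_CHANNEL = 50
CHANNELS = 124                   # non-overlapping channels, as implied
total_nodes = NODES_PER_CHANNEL * CHANNELS
print(total_nodes)               # 6200

AGGREGATE_BPS = 1_000_000        # assumed ~1 Mbit/s total sensor traffic
gige, fast = 1_000_000_000, 100_000_000
print(f"{AGGREGATE_BPS / gige:.1%} of Gigabit Ethernet")  # 0.1%
print(f"{AGGREGATE_BPS / fast:.1%} of Fast Ethernet")     # 1.0%
```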
5.2 Comparison
The closest work to ours in the area of communication protocols for wireless sensor nodes is the Multi-Channel MAC (MCM) protocol proposed by Le et al. from the Cyber-Physical Computing Group [8]. Their protocol was built on top of TinyOS v2.x, whose minimum code size is around 11 KB [20]. Depending
Table 1. Comparison between HiperSense for Eco nodes and the Multi-Channel MAC protocol for TinyOS 2 [8]

Code Size            HiperSense               MCM Protocol
MAC layer            31 bytes                 9544 bytes
Runtime Support      1.1 KB                   11–20 KB
Dynamic Execution    430 bytes                N/A
Total Code Size      1.56 KB + loaded code    20.5–29.5 KB
on the hardware and software configurations, the compiled code size of TinyOS v2.x can exceed 20 KB [20]. Table 1 shows the required code size in ROM after compilation. In HiperSense, the gateways and the front-end node handle most of the protocol policies originally handled by the sensor nodes. This enables the sensor nodes to be kept minimally simple, with only the essential routines. Moreover, the processing time on a sensor node can be shortened, and the firmware footprint in ROM is also minimized. We implemented a dynamic loading/dispatching layer, which occupies 430 bytes and enables a node to dynamically load and execute code fragments that can be highly optimized for the node in its operating context [19]. In contrast, the MCM protocol occupies 9544 bytes on top of TinyOS, which can increase the total code size to 29.5 KB. That is over an order of magnitude larger than our code size.
6 Conclusion and Future Work
This paper reports the progress on the HiperSense sensing and visualization system. The sensing aspect is based on EcoPlex, an infrastructure-based, tiered network system that supports heterogeneous wireless protocols for interoperability and handoff for mobility. We keep node complexity low by implementing only the bare-minimum mechanisms and either externalizing the policies to the host side or making them dynamically loadable. The visualization subsystem is based on HIPerWall, a tiled display system capable of rendering 200 megapixels of display data. By feeding the data streams from EcoPlex to the front-end node of the HIPerWall and replicating them among the nodes within HIPerWall, we are making possible a new kind of visualization system. Unlike previous applications that use static data, we can now visualize both live and historical data. Scalability in a dense area was shown with 100 wireless sensor nodes in a 2 m² area. By utilizing all frequency channels, we expect HiperSense to handle 6200 independent streams of data. Applications include crowd tracking, miniature-scale pipeline monitoring, and a wide variety of medical applications. Future work includes making the protocol more adaptive and power-manageable on the wireless sensor nodes. Dynamic code loading and execution has been implemented but still relies on manual coding, and it is a prime candidate for automatic code synthesis and optimization.
Acknowledgments

The authors would like to thank Seung-Mok Yoo, Jinsik Kim, and Qiang Xie for their assistance with this work on the Eco protocol, and Chung-Yi Ke, Nai-Yuan Ko, Chih-Hsiang Hsueh, and Chih-Hsuan Lee for their work on EcoPlex. The authors also would like to thank Duy Lai for his assistance with HIPerWall. This research project is sponsored in part by the National Science Foundation CAREER Grant CNS-0448668, UC Discovery Grant itl-com05-10154, the National Science Council (Taiwan) Grant NSC 96-2218-E-007-009, and Ministry of Economy (Taiwan) Grant 96-EC-17-A-04-S1-044. HIPerWall was funded through NSF Major Research Instrumentation award number 0421554. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Applying Architectural Hybridization in Networked Embedded Systems Antonio Casimiro, Jose Rufino, Luis Marques, Mario Calha, and Paulo Verissimo FC/UL {casim,ruf,lmarques,mjc,pjv}@di.fc.ul.pt Abstract. Building distributed embedded systems in wireless and mobile environments is more challenging than when fixed network infrastructures can be used. One of the main issues is the increased uncertainty and lack of reliability caused by interferences and fading in the communication, dynamic topologies, and so on. When predictability is an important requirement, the uncertainties created by wireless networks become a major concern. The problem may be even more severe if safety-critical requirements are also involved. In this paper we discuss the use of hybrid models and architectural hybridization as one of the possible alternatives to deal with the intrinsic uncertainties of wireless and mobile environments in the design of distributed embedded systems. In particular, we consider the case of safety-critical applications in the automotive domain, which must always operate correctly in spite of the existing uncertainties. We provide the guidelines and a generic architecture for the development of these applications in the considered hybrid systems. We also address interface issues and describe a programming model that is “hybridization-aware”. Finally, we illustrate the ideas and the approach presented in the paper using a practical application example.
1 Introduction
Over the last decade we have witnessed an explosive use of wireless technologies to support various kinds of applications. Unfortunately, when considering real-time systems, or systems that have at least some properties whose correctness depends on timely and reliable communication, the communication delay uncertainty and unreliability characteristic of wireless networks become a problem. It is not possible to ignore uncertainty and simply wait until a message arrives, hoping it will arrive soon enough. Our approach to address this problem is to consider a hybrid system model, in which one part of the system is asynchronous, namely the part that encompasses the wireless networks and the related computational subsystems, and another part is always timely, with well-defined interfaces to the asynchronous
Faculdade de Ciências da Universidade de Lisboa. Navigators Home Page: http://www.navigators.di.fc.ul.pt. This work was partially supported by FCT through the Multiannual Funding and the CMU-Portugal Programs.
S. Lee and P. Narasimhan (Eds.): SEUS 2009, LNCS 5860, pp. 264–275, 2009. © IFIP International Federation for Information Processing 2009
subsystem. In this paper we discuss applicability aspects of this hybrid model, considering in particular safety-critical applications in the automotive domain. In vehicles, safety-critical functions related to automatic speed and steering control are implemented using real-time design approaches, with dedicated controllers that are connected to car sensors and actuators through predictable networks. Despite all the advances in wireless communication technologies, relying on wireless networks to collect information from external sources and using this information in safety-critical control processes seems too risky. We argue that this may become possible if a hybrid system model and architecture are used. The advantage is the following: with the additional information it may be possible to improve some quality parameters of the control functions, possibly optimizing speed curves and fuel consumption, or even improving the overall safety parameters. One fundamental aspect in making the approach viable is to devise appropriate interfaces between the different parts of the architecture. On the other hand, special care must be taken when programming safety-critical applications, as we illustrate by providing the general principles of a “hybridization-aware” programming model. The presented ideas and principles have been explored in the HIDENETS European project [9], in which a proof-of-concept prototype platooning application has been developed. We use this example to briefly illustrate the kind of benefits that may be achieved when using a hybrid system model and architecture to build a networked embedded system. The paper is structured as follows. Some related work is addressed in the next section. Then, Section 3 motivates the idea of using hybrid distributed system models and highlights their main advantages.
In Section 4 we discuss the applicability of the model in the automotive context; Section 5 addresses interface issues and introduces the hybridization-aware programming model. The platooning example case is then provided in Section 6, and we end the paper with some conclusions and future prospects.
2 Related Work
The availability of varied and increasingly better technologies for wireless communication explains the pervasiveness of these networks in our everyday life. In the area of vehicular applications, new standards like the one being developed by the IEEE 802.11p Task Group for Wireless Access in Vehicular Environments (WAVE) will probably become the basis of many future applications. The 802.11p standard provides a set of seven logical communication channels, among which one is a special dedicated control channel that specifically aims at allowing more critical vehicular applications to be developed [1]. In fact, improving the baseline technologies and standards is one of the ways to be able to implement safety-critical systems that operate over wireless networks. There is also a large body of research concerned with studying and proposing solutions to deal with the reliability and temporal uncertainties of wireless communication.
A line of research consists in devising new protocols for the MAC level using specific support at the physical level (e.g., Dedicated Short-Range Communications, DSRC) [17] or adopting decentralized coordination techniques, such as using rotating tokens [6]. In fact, the possibility of characterizing communication delays with a reasonable degree of confidence is sufficient for a number of applications that provide safety-related information to the driver, for instance to avoid collisions [5,14]. However, these applications are not meant to autonomously control the vehicles, and therefore the involved criticality levels are just moderate. In general, and in spite of all improvements, we are still a few steps away from ensuring the levels of reliability and timeliness that are required for the most critical systems. A recent approach that indeed aims at dealing with safety requirements and allows autonomous control of vehicles in wireless and mobile environments is proposed in [3]. The approach relies on the cooperation and coordination between the involved entities, and defines a coordination model that builds on a real-time communication model designated as the Space Elastic model [2]. The Space Elastic model is defined to represent the temporal uncertainties associated with real wireless communication environments. The work presented in [13] also addresses the coordination of automated vehicles in platoons. In particular, it focuses on the feasibility of coordination scenarios where vehicles are able to communicate with their closest neighbors. The authors argue that in these scenarios, communication between a vehicle and its leader/follower is possible, as supported by simulation results presented in [7]. In contrast with these works, we consider a hybrid system model, which accommodates both the asynchrony of the wireless environments and the predictable behavior of the embedded control systems and local networks.
In the area of wireless sensor networks, efforts have also been made in devising architectures and protocols to address the temporal requirements of applications. One of the first examples is the RAP architecture [11], which defines query and event services associated with new network scheduling policies, with the objective of lowering deadline miss ratios. More recent examples include VigilNet, for real-time target tracking in large-scale sensor networks [8], and TBDS [10], a mechanism for node synchronization in cluster-tree wireless sensor networks. Our focus is at a higher conceptual level, abstracting from the specific protocols, network topologies and wireless technologies that are used.
3 Hybrid System Models
Classical distributed system models range from purely asynchronous to fully synchronous, and assume different failure models, from crash to Byzantine. But independently of the particular synchrony or failure model that is assumed, they are typically homogeneous, meaning that the assumed properties apply to the entire system and do not change over time. However, in many real systems and environments, we observe that synchrony or failure modes are not homogeneous: they vary with time or with the part of the system being considered.
Therefore, in the last few years we have been exploring the possibility of using hybrid distributed system models, in which different parts of the system have different sets of properties (e.g. synchronism [16] or security [4]). Using hybrid models has a number of advantages when compared to approaches based on homogeneous models. The main advantages include more expressiveness with respect to reality, the provision of a sound theoretical basis for crystal-clear proofs of correctness, the possibility of being naturally supported by hybrid architectures and, finally, the possibility of enabling concepts for building totally new algorithms. One example of a hybrid distributed system model is the Wormholes model [15]. In essence, this model describes systems in which it is possible to identify a subsystem that presents exceptional properties, allowing fundamental limitations of the overall system to be overcome. For instance, a distributed system in which nodes are connected by a regular asynchronous network, but in which there is also a separate real-time network connecting some synchronization subsystem in each node, can be well described by the Wormholes model. Another very simple example, in which the wormhole subsystem is only local to a node, is a system equipped with a watchdog. Despite the possible asynchrony of the overall system, the watchdog is synchronous and always resets the system in a timely manner whenever necessary. We must note that designing systems based on the Wormholes model is not just a matter of assuming that uncertainty is not ubiquitous or does not last forever. The design philosophy also builds on the principle that predictability must be achieved in a proactive manner, that is, the system must be built in order to make predictability happen at the right time and in the right place.
4 Application in Automotive Context
The wormhole concept can in fact be instantiated in different ways, and here we discuss the possible application of the concept to car systems. Therefore, we first provide an overview of system components that are found in modern cars, and then we explain how a hybrid architecture can be projected over these systems.
4.1 In-Vehicle Components
Modern cars include a wide set of functions to be performed by electronics and microcontrollers, complementing and in many cases totally replacing the traditional mechanical and/or hydraulic mechanisms. These functions include both hardware and software components and are usually structured around Electronic Control Units (ECUs), using the terminology of the automotive industry, which are subsystems composed of a microcontroller complemented with an appropriate set of sensors and actuators. The functions taken over by these components are quite diverse and aim to assist the driver in the control of the vehicle. They range from critical functions associated with the control of the power train (e.g. engine and transmission related
functions), traction (e.g. driving torque), steering or braking, to less critical ones that control the different devices in the body of the vehicle, such as lights, wipers, doors and windows, seats, and climate systems, just to name a few. Recently, a slightly different set of functions has also been incorporated, related to information, communication and entertainment (e.g. navigation systems, radio, audio and video, multimedia, integrated cellular phones, etc.). The implementation of these functions is supported by specialized ECUs. However, many of these functions are distributed along the car infrastructure. Thus, there is a need for those functions to be distributed over several ECUs that exchange information and communicate through in-vehicle networking. Furthermore, it may be necessary to exchange information between ECUs implementing different functions. For example, the vehicle speed obtained from a wheel rotation sensor may be required for gearbox control or for the control of an active suspension subsystem, but it may also be useful for other subsystems. Given that the different functional domains have different requirements in terms of safety, timeliness and performance guarantees, the interconnection of the different ECUs is structured along several networks, classified according to their available bandwidth and function. There are four classes of operation, including one (Class C) with strict requirements in terms of dependability and timeliness, and another (Class D) for high-speed data exchanges such as those required for mobile multimedia applications. The combination of the functions typically provided by each of those four networking classes involves network interconnection through gateways, as illustrated in Figure 1.
Fig. 1. Typical In-Vehicle Networking
The in-vehicle ECUs provide support for the different functions implemented in today's cars. Each ECU is composed of a computing platform where the ECU software is executed. The computing platform is typically complemented with some specific hardware: a set of sensors for gathering data from the system under control, and a set of actuators for acting on the given car subsystem. The support of drive-by-wire functions, integrating a set of sensors (e.g. a proximity sensor) and actuators (e.g. speed and brake control), is just one example with relevance for the platooning application that we refer to in Section 6. Other ECUs may exhibit a slightly different architecture because they are intended to support different functions. One example is illustrated in Figure 2,
Fig. 2. Example of In-Vehicle Infotainment Functions
intended to support the integration of infotainment functions. In this case, the architecture of the computing platform is designed to interface with and integrate the operation of multiple gadgets (radio, cellular phone) and technologies.
4.2 Architectural Hybridization in Vehicles
Given the description provided above, it is clear that there is a separation between what may be called a general computing platform, able to run general purpose local and distributed applications connected through wireless networks, and embedded systems dedicated to the execution of specific car functions. Interestingly, there exist gateways between these different subsystems, which allow information to flow across their boundaries. For example, the information provided by a proximity sensor in the car electronic subsystem may be highly relevant for a driver warning application running in the general computing platform. However, sending information from the general purpose system to a critical control system is not so trivial and, as far as we know, is typically avoided. We argue that in this context it is interesting and useful to apply the wormholes hybrid model, in order to explicitly assume the existence of a general (payload) system, asynchronous, but in which complex applications can be executed without special concern for timeliness, and a wormhole subsystem, which is timely, reliable, and in which it is possible to execute critical functions to support interactions with the payload system. The wormhole must provide at least one Timely Timing Failure Detection (TTFD) service, available to payload applications, to allow the detection of timing failures in the payload part or in the payload-wormhole interactions. This TTFD service must also be able to timely trigger the execution of fault handling operations for safety purposes. These handlers have to be implemented as part of the wormhole subsystem and will necessarily be application-dependent.
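As a concrete illustration of the TTFD concept, the sketch below simulates the service as a small state machine driven by a logical clock. This is not the HIDENETS implementation: the start/stop names echo the startTFD/stopTFD interface presented in Section 5.2, but the clock-driven tick, the handler mechanism and all parameters are assumptions made for this sketch.

```python
# Illustrative simulation of a Timely Timing Failure Detection (TTFD)
# service driven by a logical clock. The start/stop names follow the
# paper's startTFD/stopTFD interface; everything else is an assumption.

class TTFD:
    def __init__(self):
        self.deadline = None   # absolute deadline of the monitored action
        self.failed = False    # shared failure indicator (cf. Figure 3)
        self.handler = None    # safety handler, programmed a priori

    def start_tfd(self, now, max_duration, handler=None):
        """Begin monitoring a timed action with a maximum duration."""
        self.deadline = now + max_duration
        self.handler = handler

    def tick(self, now):
        """Invoked timely by the synchronous subsystem: raises the
        failure condition as soon as the deadline is crossed."""
        if self.deadline is not None and now > self.deadline:
            self.failed = True
            if self.handler is not None:
                self.handler()   # trigger the fault handling operation

    def stop_tfd(self, now):
        """Stop monitoring; True iff the action terminated timely."""
        timely = (self.deadline is not None
                  and now <= self.deadline and not self.failed)
        self.deadline = None
        return timely
```

In this simulation, stopping at logical time 5 an action started at time 0 with a maximum duration of 10 is reported as timely, while crossing the deadline first raises the failure flag and invokes the handler, mirroring the timely triggering of fault handling described above.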
With these settings it is possible to deal with information flows from the payload side to the critical subsystems, thus allowing the development of applications that run in the general computing platform, that are able to exploit the availability of wireless communication, and that are still able to control critical systems
in a safe way. Of course, for this to be possible, the applications must be programmed in a way that is “hybridization-aware”, explicitly using the TTFD service provided by the wormhole subsystem and being complemented by safety functions that must be executed on predictable subsystems. In the following section we describe the architectural components that constitute the hybrid system, focusing on these interfacing and programming issues.
5 Designing Applications in Hybrid Systems
5.1 Generic Architecture
In the proposed approach for the design of safety-critical applications in hybrid systems, the system architecture must necessarily encompass the two realms of operation: the asynchronous payload and the synchronous real-time subsystem, as illustrated in Figure 3.
Fig. 3. System architecture for asynchronous control
A so-called asynchronous control task executes in the payload part, possibly interacting with external systems through wireless or other non-real-time networks. Interestingly, this asynchronous control task can perform complex calculations using varied data sources in order to achieve improved control decisions. On the real-time (or wormhole) part of the system, several tasks will be executed in a predictable way, always satisfying (by design) any required deadline. In order to exploit the synchronism properties of the wormhole part of the system, the interface to access wormhole services must be carefully designed. The solution requires the definition of a wormhole gateway, much like the gateways between the different network classes that are defined in car architectures. This wormhole gateway includes an admission layer, which restricts the patterns of service requests as a means to secure the synchrony properties of the wormhole subsystem (we assume that the payload system can be computationally powerful, and the number of requests sent to the wormhole subsystem is not
bounded a priori). Some service requests may be delayed, rejected or simply not executed because of lack of resources. This behavior is admissible because, from the perspective of the asynchronous system, no guarantees are given anyway. Several interface functions may be made available, some of which specifically related to the application being implemented (e.g., functions for control commands to be sent to actuators or ECUs, and for sensor information to be read). At a minimum, it is necessary to provide a set of functions to access and use the TTFD service. The role of the TTFD service is fundamental: in simple terms, it is a kind of “enhanced watchdog” programmed by the payload application, and it works as a switching device that gives control to the wormhole part when the payload becomes untimely. A more detailed description of the TTFD service and how it must be used is provided in Section 5.2. A control task is defined within the gateway, which will implement the specific functions and will also interact with the TTFD service, forwarding start and stop commands received from the payload. The task may also decide whether an actuation command can effectively be applied or not, depending on the timeliness status of the payload. A safety task must also be in place, which will be in charge of ensuring safe control whenever the asynchronous control task is unable to exercise control. This safe control will be done using only the locally available information, collected from local sensors. This task can be designed to keep the system in a safe state, but it provides a pessimistic control in the sense that it is based only on local information. The effective activation of this task is controlled by the TTFD service, using a status variable in a shared-memory structure, or some equivalent implementation. Quite naturally, each specific application must have its own associated safety task.
Therefore, although the architecture is generically applicable to safety-critical applications in hybrid systems, some components must be instantiated on a case-by-case basis. In Figure 3 we also represent the sensors and actuators, which are necessarily part of the real-time subsystem.
5.2 Using the TTFD Service
A fundamental idea underlying the approach is to use the TTFD service to monitor the timeliness of a payload process. The TTFD service provides the following functions: startTFD, stopTFD and restartTFD. The startTFD function specifies the instant at which monitoring of a timed action starts, and the maximum duration allowed for that action. The handling functions that are executed when a timing failure is detected must be programmed a priori as part of the wormhole. A specific handler may be specified when starting a timing failure monitoring activity. The stopTFD function stops the on-going monitoring activity, returning an indication of whether the timed execution terminated timely or not. The restartTFD function allows the atomic execution of a stopTFD request followed by a startTFD request. Before starting a timed execution, the TTFD service is requested to monitor this execution and a deadline is provided. If the execution is timely, then the
TTFD monitoring activity will be stopped before the deadline. Otherwise, when the deadline is reached the TTFD service will raise a timing fault condition (possibly a boolean variable in the shared memory, as shown in Figure 3). From a programmer's perspective, and considering that we are concerned with the development of asynchronous control applications, there are two important issues to deal with: a) determining the deadline values provided to the TTFD service; b) using the available functions in a way that ensures that either the execution is timely (thus allowing control commands to be issued) or else a timing failure is always detected (so that the safety handler can at least be executed). The deadline must be such that the application is likely to be able to perform the necessary computations within it. In control, there is a tradeoff between reducing the duration of the control cycle and the risk of not being able to compute a control decision within the allowed period. On the other hand, specifying large deadlines will have a negative influence on the quality of control. The other restriction on the deadline is determined by safety rules and by the characteristics of a fail-safe real-time control task that will be activated when the deadline is missed. The deadline must be such that the fail-safe control task, when activated, is still able to fulfill the safety requirements. The second issue concerns the way in which interactions between the payload and the wormhole must be programmed, which we discuss in what follows.
5.3 Payload-Wormhole Interactions
In the proposed architecture, TTFD requests are in fact directed to the control task, along with actuation commands. That is, when the asynchronous control task sends an actuation command, it is implicitly finishing an on-going timed action, and it must explicitly start a new one by specifying a deadline for the next actuation instant. The idea is the following: when an actuation command is sent from the payload to the wormhole, it is supposed to be sent before a previously specified deadline. Therefore, when the command is received by the control task, this task first has to stop the on-going TTFD monitoring activity. Depending on the returned result, the control task will know if the actuation command is timely (and hence can be safely used and applied to actuators) or if it is untimely (in which case, it will just be discarded). In the latter case, the TTFD must already have triggered the failure indication. In fact, this indicator is used by the safety task to decide if it should indeed become active and take over the control of the system. As soon as a timing failure occurs, the indicator is activated, and the safety task will take over the next time it is released. This means that a late command received from the payload will be ignored, and it will be ensured that the safety task will already be in control. In a steady state, the asynchronous control task will be continuously sending commands to the wormhole, timely stopping the on-going TTFD monitoring activity, atomically restarting the TTFD for a future point in time (the next actuation deadline), and applying the control command.
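The steady-state pattern just described — stop the ongoing monitoring, decide on the command's timeliness, atomically restart the TTFD for the next actuation deadline, and apply or discard the command — can be sketched as follows. The class name, the logical clock and the fixed actuation period are our own assumptions for illustration, not the HIDENETS gateway code; the decision logic is the point.

```python
# Sketch of the gateway control task: a command is applied only if it
# arrives before the current TTFD deadline; a late command is discarded
# and the failure indicator hands control to the safety task.
# The 100-unit actuation period is an illustrative assumption.

PERIOD = 100

class Gateway:
    def __init__(self):
        self.deadline = PERIOD   # deadline for the first actuation
        self.failure = False     # shared failure indicator (cf. Figure 3)
        self.applied = []        # commands actually forwarded to actuators

    def on_command(self, now, command):
        """stopTFD + restartTFD, executed atomically in the wormhole."""
        timely = now <= self.deadline and not self.failure
        self.deadline = now + PERIOD   # restart for the next actuation
        if timely:
            self.applied.append(command)   # safe to actuate
        return timely                      # late commands are discarded

    def safety_task_release(self, now):
        """Periodic release of the safety task: it takes over as soon
        as the failure indicator was (or is now) raised."""
        if now > self.deadline:
            self.failure = True
        return self.failure   # True => pessimistic local control active
```

Note that once the failure indicator is raised it stays raised, so a late command arriving after the safety task has taken over is never applied, exactly as the text requires.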
6 Platooning Example
Let us consider the example of a platooning application, in which the objective is to achieve a better platoon behavior (keep cars close together at the maximum possible speed), using not only the information available from local car sensors, but also information collected from other cars or from fixed stations along the road. The hybrid architecture will encompass an asynchronous platooning control task running on some on-board general purpose computer, processing all the available information and producing control decisions that must be sent to the vehicle ECUs. The information exchanged between vehicles (through the wireless network) includes time, position and speed values. This information is relevant for the platooning control application, since it will know where to position each of the other cars on a virtual map and hence what to do regarding its own speed. Clocks are assumed to be synchronized through GPS receivers, and accelerations (positive and negative) are bounded. In this way, worst-case scenarios can be considered when determining the actuation commands. Every car in the platoon periodically retrieves the relevant information from local sensors (through the wormhole interface), disseminates this information, and hopefully receives the same kind of information from the other cars. In the platooning application case, failures in the communication will not have serious consequences. In fact, if a car does not receive information from the preceding car, it will use old information and will “see” that car closer than it is in reality. The consequence is that the follower car will stop, even if this is not necessary. Given the periodic nature of the payload message exchanges, the asynchronous control tasks may become aware of lost or very delayed messages (if timeouts are used) and refrain from sending actuation commands to the wormhole.
In this case, or if the payload becomes too slow (remember that this is a general purpose computing environment), the actuation commands expected by the wormhole will not be received or will arrive too late, and meanwhile the safety task is activated to take over the control of the car. From the platooning application perspective, the proposed implementation provides some clear improvements over a traditional implementation. The latter is pessimistic in the sense that it must ensure larger safety distances between cars, in particular at high speeds, since no information is available about the surrounding environment and in particular about the speed of the preceding car. In contrast, in the prototype we implemented it is possible to observe that, independently of the platoon speed, the distance between every two cars is kept constant, because follower cars are able to know the distance to the preceding car, and also its speed. We implemented a prototype of this platooning application, which was demonstrated using emulators for the physical environment and for the wireless network. Figure 4 illustrates some of the hardware and a graphical view of the platoon in the physically emulated reality. The interested reader can refer to [12], which provides additional details about this demonstration.
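The fail-safe effect of stale data described above can be made concrete with a small sketch: the follower positions the preceding car at its last reported position, so old information shrinks the perceived gap and the follower brakes pessimistically. All numeric values and names below are illustrative assumptions of ours, not parameters of the actual prototype.

```python
# Follower-gap sketch: decide the follower's speed from the last
# received state of the preceding car. With stale data the perceived
# gap is smaller than the real one, so the follower errs on the safe
# side and stops. SAFE_GAP and all values are illustrative.

SAFE_GAP = 10.0  # minimum admissible inter-vehicle distance (metres)

def follower_speed(own_pos, last_leader_pos, leader_speed, cruise_speed):
    gap = last_leader_pos - own_pos
    if gap < SAFE_GAP:
        return 0.0   # perceived gap too small: stop, even if unnecessary
    # Keep close at the maximum possible speed, never outrunning the leader.
    return min(cruise_speed, leader_speed)
```

With a fresh report placing the leader 20 m ahead, the follower simply matches the leader's speed; if the last report is stale and places the leader only 8 m ahead, the follower stops, even though the leader has in reality moved further away.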
A. Casimiro et al.
Fig. 4. Platooning demonstration
7 Conclusions
The possibility of using wireless networks for car-to-car or car-to-infrastructure interactions is very appealing. The availability of multiple sources of information can be used to improve the quality of control functions and, indirectly, safety or fuel consumption. The problem we addressed in this paper concerns the potential lack of timeliness and the unreliability of wireless networks, which make it difficult to consider their use when implementing real-time applications or safety-critical systems. We propose an approach based on the use of a hybrid system model and architecture. The general idea is to allow applications to be developed on a general-purpose, typically asynchronous, part of the system, while providing the means to ensure that safety-critical properties are always secured. Since we focus on applications for the vehicular domain, typically control applications, we first explain why the considered hybrid approach is very reasonable in this context. We then provide guidelines for designing asynchronous control applications, explaining in particular how the interactions between the payload and the wormhole subsystems should be programmed. From the experience we gained in developing the platooning example application and from the observations we made while executing our demonstration system, we conclude that the proposed approach constitutes a potentially interesting alternative for the implementation of optimized safety-critical systems in wireless environments.
References

1. IEEE P802.11p/D3.0, Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: Amendment: Wireless Access in Vehicular Environments (WAVE), Draft 3.0 (July 2007)
2. Bouroche, M., Hughes, B., Cahill, V.: Building reliable mobile applications with space-elastic adaptation. In: WOWMOM 2006: Proceedings of the 2006 International Symposium on a World of Wireless, Mobile and Multimedia Networks, Washington, DC, USA, pp. 627–632. IEEE Computer Society, Los Alamitos (2006)
Applying Architectural Hybridization in Networked Embedded Systems
3. Bouroche, M., Hughes, B., Cahill, V.: Real-time coordination of autonomous vehicles. In: Proceedings of the IEEE Intelligent Transportation Systems Conference 2006, September 2006, pp. 1232–1239 (2006)
4. Correia, M., Veríssimo, P., Neves, N.F.: The design of a COTS real-time distributed security kernel. In: Bondavalli, A., Thévenod-Fosse, P. (eds.) EDCC 2002. LNCS, vol. 2485, pp. 234–252. Springer, Heidelberg (2002)
5. Elbatt, T., Goel, S.K., Holland, G., Krishnan, H., Parikh, J.: Cooperative collision warning using dedicated short range wireless communications. In: VANET 2006: Proceedings of the 3rd International Workshop on Vehicular Ad Hoc Networks, pp. 1–9. ACM, New York (2006)
6. Ergen, M., Lee, D., Sengupta, R., Varaiya, P.: WTRP - Wireless Token Ring Protocol. IEEE Transactions on Vehicular Technology 53(6), 1863–1881 (2004)
7. Halle, S., Laumonier, J., Chaib-Draa, B.: A decentralized approach to collaborative driving coordination. In: Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems, October 2004, pp. 453–458 (2004)
8. He, T., Vicaire, P., Yan, T., Luo, L., Gu, L., Zhou, G., Stoleru, R., Cao, Q., Stankovic, J.A., Abdelzaher, T.: Achieving real-time target tracking using wireless sensor networks. In: RTAS 2006: Proceedings of the 12th IEEE Real-Time and Embedded Technology and Applications Symposium, Washington, DC, USA, pp. 37–48. IEEE Computer Society, Los Alamitos (2006)
9. HIDENETS, http://www.hidenets.aau.dk/
10. Koubâa, A., Cunha, A., Alves, M., Tovar, E.: TDBS: a time division beacon scheduling mechanism for ZigBee cluster-tree wireless sensor networks. Real-Time Syst. 40(3), 321–354 (2008)
11. Lu, C., Blum, B.M., Abdelzaher, T.F., Stankovic, J.A., He, T.: RAP: A real-time communication architecture for large-scale wireless sensor networks. In: Eighth IEEE Real-Time and Embedded Technology and Applications Symposium, Washington, DC, USA, pp. 55–66. IEEE Computer Society, Los Alamitos (2002)
12. Marques, L., Casimiro, A., Calha, M.: Design and development of a proof-of-concept platooning application using the HIDENETS architecture. In: Proceedings of the 2009 IEEE/IFIP Conference on Dependable Systems and Networks, pp. 223–228. IEEE Computer Society Press, Los Alamitos (2009)
13. Michaud, F., Lepage, P., Frenette, P., Letourneau, D., Gaubert, N.: Coordinated maneuvering of automated vehicles in platoons. IEEE Transactions on Intelligent Transportation Systems 7(4), 437–447 (2006)
14. Misener, J.A., Sengupta, R.: Cooperative collision warning: Enabling crash avoidance with wireless. In: 12th World Congress on ITS, New York, NY, USA (November 2005)
15. Verissimo, P.: Travelling through wormholes: a new look at distributed systems models. SIGACT News 37(1), 66–81 (2006)
16. Veríssimo, P., Casimiro, A.: The Timely Computing Base model and architecture. IEEE Transactions on Computers - Special Issue on Asynchronous Real-Time Systems 51(8) (August 2002); a preliminary version appeared as Technical Report DI/FCUL TR 99-2, Department of Computer Science, University of Lisboa (April 1999)
17. Xu, Q., Mak, T., Ko, J., Sengupta, R.: Vehicle-to-vehicle safety messaging in DSRC. In: VANET 2004: Proceedings of the 1st ACM International Workshop on Vehicular Ad Hoc Networks, pp. 19–28. ACM Press, New York (2004)
Concurrency and Communication: Lessons from the SHIM Project

Stephen A. Edwards
Columbia University, New York, NY, USA
[email protected]
Abstract. Describing parallel hardware and software is difficult, especially in an embedded setting. Five years ago, we started the SHIM project to address this challenge by developing a programming language for hardware/software systems. The resulting language describes asynchronously running processes and has the useful property of scheduling independence: the I/O of a SHIM program is not affected by any scheduling choices. This paper presents a history of the SHIM project with a focus on the key things we have learned along the way.
1 Introduction
SHIM, an acronym for "software/hardware integration medium," started as an attempt to simplify the challenges of passing data across the hardware/software boundary. It has since turned into a language development effort centered around a scheduling-independent (i.e., race-free) concurrency model and static analysis. The purpose of this paper is to lay out the history of the SHIM project with a special focus on what we learned along the way. It is deliberately light on technical details (which can be found in the original publications) and instead tries to contribute intuition and insight. We begin by discussing the original motivations for the project, how it evolved into a study of concurrency models, how we chose a particular model, and how we have added language features to that model. We conclude with a section highlighting the central lessons we have learned, along with open problems.
2 Embryonic SHIM
We started developing SHIM in 2004 after observing the difficulties our students were having building embedded systems [1,2] that communicated across the hardware/software boundary. The central idea was to provide variables that could be accessed equally easily by either hardware processes or software functions, both written in a C-like dialect. Figure 1 shows a simple counter in this dialect of the language. The count function resides in hardware; the other two are in software. When a software function like get_time read the hardware register counter, the compiler would automatically insert code to fetch its value from the hardware and synthesize VHDL that could send the data on a bus when requested.

S. Lee and P. Narasimhan (Eds.): SEUS 2009, LNCS 5860, pp. 276–287, 2009. © IFIP International Federation for Information Processing 2009

    module timer {
      shared uint:32 counter;   /* Hardware register visible from software */

      hw void count() {         /* Hardware function */
        counter = counter + 1;  /* Direct access to hardware register */
      }

      out void reset_timer() {  /* Software function */
        counter = 0;            /* Accesses register through bus */
      }

      out uint get_time() {     /* Software function */
        return counter;         /* Accesses register through bus */
      }
    }

Fig. 1. An early fragment of SHIM [1]

We wrote a few variants of an I²C bus controller in the language, starting with an all-software version and ending with one that implemented byte-level communication completely in hardware. The central lesson of this work was that the shared-memory model, while simple, was a very low-level way to implement such communication. Invariably, it is necessary to layer another communication protocol (e.g., some form of handshake) over it to ensure coherence. We had not included an effective mechanism for implementing communication libraries that could hide this fussy code, so it was back to the drawing board.
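The kind of handshake that must be layered over raw shared variables can be sketched in C. This is a generic single-writer/single-reader valid-flag protocol of our own devising, not code from the SHIM compiler; a real hardware/software implementation would also need memory barriers or bus-level atomicity.

```c
#include <stdbool.h>
#include <stdint.h>

/* A valid flag layered over a shared data word: the writer raises `valid`
 * after storing the datum, the reader lowers it after consuming, so a
 * value can be neither lost nor read twice. Single writer, single reader,
 * cooperative (non-preemptive) steps assumed. */

static uint32_t shared_data;
static bool     valid;   /* true: data written and not yet consumed */

bool try_put(uint32_t v)       /* writer side: succeeds if slot is free */
{
    if (valid) return false;   /* previous value not yet consumed */
    shared_data = v;
    valid = true;              /* publish only after the data is stored */
    return true;
}

bool try_get(uint32_t *out)    /* reader side: succeeds if data present */
{
    if (!valid) return false;
    *out = shared_data;
    valid = false;             /* release the slot for the next value */
    return true;
}
```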
3
Kahn, Hoare, and the shim Model
We decided we wanted reliable communication, including any across the hardware/software boundary, to be a centerpiece of the next version of SHIM. Erroneous communication is a central source of bugs in hardware designs: our embedded-systems students' favorite mistake was to generate a value in one cycle and read it in another. This rarely produces even a warning in usual hardware simulation, so it can easily go unnoticed. We also found the inherent nondeterminism of the first iteration of SHIM a key drawback. The speed at which software runs on processors is rarely known, let alone controlled. Since software and hardware run in parallel and communicate using shared variables, the resulting system was nondeterministic, making it difficult to test. It also ran counter to what we had learned from Esterel [3]. Table 1 shows our wishlist. We wanted a concurrent, deterministic (i.e., independent of scheduling) model of computation and started looking around. The synchronous model [4] was unsuitable because it generally assumes either a single clock or harmonically related clocks and would not work well with software.
Table 1. The SHIM Wishlist

Trait                                      Motivation
Concurrent                                 Hardware/software systems fundamentally parallel
Mixes synchronous and asynchronous styles  Software slower and less predictable than hardware; need something like multirate dataflow
Only requires bounded resources            Fundamental restriction on hardware
Formal semantics                           No arguments about meaning or behavior
Scheduling-independent                     I/O should not depend on program implementation
Steve Nowick steered us toward the body of work on delay-independent circuits (e.g., van Berkel's handshake circuits [5]). We compared this class of processes to Kahn's networks [6] and found them to be essentially the same [7]. We studied how to characterize such processes [8], finding that we could characterize them as functions that, when presented with more inputs or output opportunities, never produce less or different data. In their classic form, the unbounded buffers of Kahn networks actually make them Turing-complete [9] and difficult to schedule [10], so we decided on a model in which Kahn networks are restricted to CSP-like rendezvous [11]. Others, such as Lin [12], had also proposed using such a model. In 2005, we presented our new SHIM model and a skeleton of the language, "Tiny-SHIM," and its formal semantics [13]. It amounted to read and write operations sewn together with the usual C-like expressions and control-flow statements. We later extended this work with further examples, a single-threaded C implementation, and an outline of a hardware translation [14]. In 2006, we published our first real research result with SHIM: a technique for very efficient single-threaded code generation [15]. The centerpiece of this work was an algorithm that could compile arbitrary groups of processes into a single automaton whose states abstracted the control states of the processes. Our goal was to eliminate synchronization overhead, so the automaton captured which processes were waiting on which channels, but left all other details, such as variable values and details of the program counters, out of the automaton. Figure 2 demonstrates the algorithm from Edwards and Tardieu [15]. The automaton's states are labeled with a number (e.g., S0), the state of each channel in the system (ready "-", blocked reading "R", or blocked writing "W"), and, for each process, whether it is runnable (√) or blocked on a channel (×), together with a set of possible program counters.
From each state, the automaton generator (a.k.a. the scheduler) nondeterministically chooses one of the runnable processes to execute and generates a successor state by considering each possible pc value for the process. The code generated for a state with multiple pc values begins with a C switch statement that splits control depending on the pc value.
    process sink(int32 B) { for (;;) B; }

    process buffer(int32 &B, int32 A) { for (;;) B = A; }

    sink (intermediate code):
      0: PreRead 1
      1: PostRead 1 tmp3
      2: goto 0

Fig. 2. An illustration of the SHIM language and its automaton compilation scheme from Edwards and Tardieu [15]. A source program (a) is dismantled into intermediate code (b), then simulated to produce an automaton (c). Each state is labeled with its name, the state of each channel (blocked on read, blocked on write, or idle), and the state of each process (runnable, and possible program counter values).
At this point, the language fairly closely resembled the Tiny-SHIM language of the EMSOFT paper [13]. A system consisted of a collection of sequential processes, assumed to all start when the system began. It could also contain networks—groups of connected processes that could be instantiated hierarchically. One novel feature of this version, which we later dropped, was the ability to instantiate processes and networks without supplying explicit connections. Instead, the compiler would examine the interface to each instantiated process and make sure its environment supplied such a signal. Connections were made implicitly by name, although this could be overridden. This feature arose from observing that in VHDL it is often necessary to declare and mention each channel many times: once for each process, once for each instantiation of each process, and once in the environment in which it is instantiated. However, in the process of writing more elaborate test cases, such as a JPEG decoder [16], we decided that this connection-centric specification style (which we adopted from hardware description languages) was inadequate for any sort of interesting software. We wanted function calls.
4 Recursion
In 2006, we introduced function calls and recursion to SHIM, making it very C-like [17]. Our main goal was to make basic function calls work, allowing the usual re-use of code, but we also found that recursion, especially bounded recursion, was a useful mechanism for specifying more complex structures.

    void buffer(int i, int &o) {
      for (;;) {
        recv i;
        o = i;
        send o;
      }
    }

    void fifo(int i, int &o, int n) {
      int c;
      int m = n - 1;
      if (m)
        buffer(i, c) par fifo(c, o, m);
      else
        buffer(i, o);
    }

Fig. 3. An n-place FIFO specified using recursion, from Tardieu and Edwards [17]
Figure 3 illustrates this style. The recursive fifo procedure calls itself repeatedly in parallel, effectively instantiating buffer processes as it goes. This recursion runs only once, when the program starts, to set up a chain of single-place buffers.
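The idea of recursion that runs only once, at start-up, to materialize a static structure can be mimicked in plain C, where each recursive step allocates a one-place buffer node instead of spawning a process. This analogue is ours, not from the paper; the names are hypothetical.

```c
#include <stdlib.h>

/* Each recursive call creates one link in a chain, the way fifo() in
 * Figure 3 instantiates one buffer process per level of recursion. */

typedef struct buf {
    int full, val;        /* one-place buffer: occupied flag and datum */
    struct buf *next;     /* the process this buffer feeds */
} buf;

/* Mirrors fifo(i, o, n): build an n-place FIFO as a chain of one-place
 * buffers. The recursion runs once, at construction time. */
buf *make_fifo(int n)
{
    if (n == 0) return NULL;
    buf *b = calloc(1, sizeof *b);
    b->next = make_fifo(n - 1);   /* "par fifo(c, o, n-1)" becomes a link */
    return b;
}

int fifo_depth(const buf *b)      /* capacity of the resulting chain */
{
    int d = 0;
    for (; b; b = b->next) d++;
    return d;
}
```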
5 Exceptions
Next, we added exceptions [18], certainly the most technically difficult addition we have made. Inspired by Esterel [3], where exceptions are used not just for occasional error handling but as widely as, say, if-then-else, we wanted our exceptions to be widely applicable and to be concurrent and scheduling-independent. For sequential code, the semantics of exceptions were clear: throwing an exception immediately sends control to the most-recently-entered handler for the given exception, terminating any functions that were called in between. For concurrently running functions, the right behavior was less obvious. We wanted to terminate everything leading up to the handler, including any concurrently running relatives, but we insisted on maintaining SHIM's scheduling independence, meaning we had to carefully time when the effect of an exception was felt. Simply terminating siblings when one threw an exception would be nondeterministic: the behavior would then depend on the relative execution rates of the processes and thus not be scheduling-independent. Our solution was to piggyback the exception mechanism on the communication system, i.e., a process would only learn of an exception when it attempted to communicate, the only point at which processes agree on the time. To accommodate exceptions, we introduced a new, "poisoned," state for a process that represents when it has been terminated by an exception and is waiting for its relatives to terminate. Any process that attempts to communicate with a poisoned process will itself become poisoned. In Figure 5, the first thread throws
    void main() {
      int i;
      i = 0;
      try {
        i = 1;
        throw T;
        i = i * 2;    // is not executed
      } catch(T) {
        i = i * 3;
      }               // i = 3
    }
    (a)

    void main() {
      int i;
      i = 0;
      try {           // thread 1
        throw T;
      } par {         // thread 2
        for (;;) i = i + 1;   // runs forever
      } catch(T) {}
    }
    (b)

Fig. 4. (a) Sequential exception semantics are classical. (b) Thread 2 never feels the effect of the exception because it never communicates. From Tardieu and Edwards [18].
    void main() {
      chan int i = 0, j = 0;
      try {           // thread 1
        while (i < 5) next i = i + 1;
        throw T;      // poisons itself
      } par {         // thread 2
        for (;;) next j = next i + 1;   // poisoned by thread 1
      } par {         // thread 3
        for (;;) recv j;                // poisoned by thread 2
      } catch (T) {}
    }

Fig. 5. Transitive poisoning: throw T poisons the first process, which poisons the second when the second attempts next i. Finally, the third is poisoned when it attempts recv j and the whole group terminates.
an exception; the second thread is poisoned when it attempts to rendezvous on i, and the third is poisoned by the second when it attempts to rendezvous on j. The idea was simple enough, and the interface it presented to the programmer could certainly be used and explained without much difficulty, but implementing it turned out to be a huge challenge, despite there being a fairly simple set of structural operational semantics rules for it. The real complexity came from having to consider exception scope, which limits how far the poison propagates (it does not propagate outside the scope of the exception), and the behavior of multiple, concurrently thrown exceptions.
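The poison-through-communication rule can be captured in a toy model. The sketch below is ours, not the compiler's implementation: poison spreads only through attempted communication, never spontaneously, mirroring the transitive poisoning of Fig. 5.

```c
#include <stdbool.h>

/* Toy model of poisoning: a process that throws becomes poisoned, and a
 * rendezvous with a poisoned party fails and poisons the other side. */

enum { NPROC = 3 };

static bool poisoned[NPROC];   /* all start healthy (false) */

void throw_exception(int p) { poisoned[p] = true; }

/* Attempted rendezvous between processes a and b: succeeds only if both
 * are healthy; otherwise both end up poisoned (poison propagates at the
 * communication point, the only point at which processes agree on time). */
bool communicate(int a, int b)
{
    if (poisoned[a] || poisoned[b]) {
        poisoned[a] = poisoned[b] = true;
        return false;
    }
    return true;
}
```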
6 Static Analysis
SHIM has always been designed for aggressive compiler analysis. We have attempted to keep its semantics simple and scheduling-independent, and to restrict it to finite-state models. Together, these properties have made it easier to analyze.
We developed a technique for removing bounded recursion from SHIM programs [19]. One goal was to simplify SHIM's translation into hardware, where general recursion would require memory for a stack and choosing a size for it, but the technique has found many other uses. In particular, if a program has only bounded recursion, it is finite-state, which simplifies other analysis steps. The basic idea of our work was to unroll recursive calls by exactly tracking the behavior of the variables that control the recursion. Our insight was that for a recursive function to terminate, the recursive call must be within the scope of a conditional. Therefore, we need to track the predicate of this conditional, see what can affect it, and so forth. Figure 6 illustrates what this procedure does to a simple FIFO. To produce the static version in Figure 6(b), our procedure observes that the n variable controls the predicate around fifo's recursive call of itself. It then notices that n is initially bound to 3 by fifo3 and generates three specialized versions of fifo—one each with n = 3, n = 2, and n = 1—simplifies each, then inlines each function, since each is called only once. Of course, in the worst case our procedure could end up trying to track every variable in the program, which would be impractical, but in many examples we tried, recursion control involved only a few variables, making it easy to resolve. A key hypothesis of the SHIM project has been that scheduling independence should be a property of any practical concurrent language because it greatly simplifies reasoning about a program, both by the programmer and by automated tools. Our work on static deadlock detection reinforces this point. SHIM is not immune to deadlocks (e.g., { recv a; recv b; } par { send b; send a; } is about the simplest example), but they are simpler in SHIM because of its scheduling independence: deadlocks in SHIM cannot occur because of race conditions.
    void fifo3(chan int i, chan int &o) {
      fifo(i, o, 3);
    }

    void fifo(chan int i, chan int &o, int n) {
      if (n > 1) {
        chan int c;
        buf(i, c); par fifo(c, o, n-1);
      } else
        buf(i, o);
    }

    void buf(chan int i, chan int &o) {
      for (;;) next o = next i;
    }
    (a)

    void fifo3(chan int i, chan int &o) {
      chan int c1, c2;
      buf(i, c1); par buf(c1, c2); par buf(c2, o);
    }

    void buf(chan int i, chan int &o) {
      for (;;) next o = next i;
    }
    (b)

Fig. 6. Removing bounded recursion, controlled by the n variable, from (a) gives (b). After Edwards and Zeng [19].

For example, because SHIM does not have races, there are no race-induced deadlocks, such as the "grab locks in opposite order" deadlock present in many other languages. In general, SHIM does not need to be analyzed under an interleaved model of concurrency, since most properties, including deadlock, are the same under any schedule. So all the clever partial-order tricks used by model checkers such as SPIN [20] are unnecessary for SHIM. We first used the synchronous model checker NuSMV [21] to detect deadlocks in SHIM [22]—an interesting choice, since SHIM's concurrency model is fundamentally asynchronous. Our approach was to abstract away data operations and choose a specific schedule in which each communication event takes a single cycle. This reduced a SHIM program to a set of communicating state machines suitable for the NuSMV model checker. We continue to work on deadlock detection in SHIM. Most recently [23], we took a compositional approach in which we build an automaton for a complete system piece by piece. Our insight is that we can usually abstract away internal channels and simplify the automaton without introducing or removing deadlocks. The result is that even though we are doing explicit model checking, we can often do it much faster than a state-of-the-art symbolic model checker such as NuSMV. We have also used model checking to search for situations where buffer memory can be shared [24]. In general, each communication channel needs storage for any data being communicated over it, but in certain cases it is possible to prove that two channels can never be active simultaneously. We use the NuSMV model checker to identify these cases, which allows us to share potentially large buffers across multiple channels. Because this is an optimization, if the model checker becomes overloaded, we can safely analyze the system in smaller pieces.
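The simplest deadlock above ({ recv a; recv b; } par { send b; send a; }) is a cycle in the "waits-for" relation between blocked processes. The following sketch is our own illustration of that textbook view, not the paper's algorithm (which works on channel automata); names are hypothetical.

```c
#include <stdbool.h>

/* waits_for[i] is the process that must move before process i can proceed,
 * or -1 if process i is runnable. A deadlock is a cycle in this relation. */
bool has_deadlock(const int waits_for[], int n)
{
    for (int start = 0; start < n; start++) {
        /* Follow the waits-for chain from `start` for at most n+1 hops.
         * A chain without a cycle reaches a runnable process (-1) within
         * n hops; if we are still on a process after n+1 hops, the chain
         * must have looped back on itself. */
        int p = start, steps = 0;
        while (p >= 0 && steps <= n) {
            p = waits_for[p];
            steps++;
        }
        if (p >= 0) return true;   /* walked n+1 hops: a waiting cycle */
    }
    return false;
}
```

For the two-process example, process 0 (blocked at recv a) waits for process 1, and process 1 (blocked at recv b after send b fails to rendezvous) waits for process 0, so waits_for = {1, 0} and the cycle is reported.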
7 Backends
We have developed a series of backends for the SHIM compiler; each works off a slightly different intermediate representation. First, we developed a code generator that produced single-threaded C [14] for a variant of Tiny-SHIM, which had only point-to-point channels. The runtime system maintained a linked list of runnable processes and, for each channel, tracked which process, if any, was blocked on it. Each process was compiled into a separate C function, which stored its state as a global integer and used a switch statement to restore it. This worked well, although we could improve runtimes by compiling away communication overhead through static scheduling [15]. To handle multi-way rendezvous, exceptions, and recursion on parallel hardware we needed a new technique. Our next backend [25] generated C code that made calls to the POSIX thread library to ask for parallelism. The challenge was to minimize overhead. Each communication action would acquire the lock on a channel, check whether every process connected to it had also blocked (i.e., whether the rendezvous could occur), and then check whether the channel was connected to a poisoned process (i.e., whether a relevant exception had been thrown). All of these checks ran quickly; actual communication and exceptions took longer.
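The switch-on-pc resume style of the single-threaded backend can be illustrated in C. The sketch below is our own, not the compiler's actual output: the process becomes a function whose saved integer program counter selects the resume point, and returning models blocking at a communication.

```c
/* A process that repeatedly "sends" an incrementing value, compiled in the
 * style described above: control state lives in an integer `pc`, and each
 * case label is a point at which the process may block and later resume. */

static int pc;    /* saved program counter of the process (starts at 0) */
static int acc;   /* the process's local variable */

/* Runs the process until it would block on a send; returns the value it
 * hands to the scheduler at that communication point. */
int producer_step(void)
{
    switch (pc) {
    case 0:              /* initial entry: run the prologue */
        acc = 0;
        /* fall through to the communication point */
    case 1:              /* resume point after a completed send */
        acc = acc + 1;
        pc = 1;          /* block at the send; re-enter here next time */
        return acc;      /* "send acc" */
    }
    return -1;           /* unreachable in this sketch */
}
```

Each call advances the process by one loop iteration, so successive calls yield 1, 2, 3, …; a real runtime would interleave such steps for all runnable processes.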
We also developed a backend for IBM's Cell processor [26]. A direct offshoot of the pthreads backend, it allows the user to assign computationally intensive tasks to the Cell's synergistic processing units (SPUs); the remaining tasks run on the Cell's PowerPC core (PPU). Our technique replaces the offloaded functions with wrappers that communicate across the PPU-SPU boundary. Cross-boundary function calls are technically challenging because of data alignment restrictions on function arguments, which we would have preferred to be stack-resident. This, and many other fussy aspects of coding for the Cell, convinced us that such heterogeneous multicore processors demand languages at a higher level than C.
8 Lessons and Open Problems

8.1 Function Calls
Early versions of the language did not support classical software-like function calls. However, these are so useful, even in dataflow-centric descriptions, that they really need to be part of just about any language. We were initially deceived by the rare use of function calls in VHDL and Verilog, but we suspect this is because they do not fit easily into the register-transfer model.

8.2 Two-Way vs. Multi-Way Rendezvous
Initial versions of SHIM used only two-way rendezvous, but after a discussion with Edward Lee, we became convinced that multi-way rendezvous was useful to provide at the language level. Debugging was one motivation: with multi-way rendezvous, it becomes easy to add a monitor that can observe data flowing through a channel; modeling the clock of a synchronous system was another. Unfortunately, implementing multi-way rendezvous is much more complicated than implementing two-way rendezvous, yet we found that most communication in SHIM programs is point-to-point, so we are left with a painful choice: slow down the common case to accommodate the uncommon case, or do aggressive analysis to determine when we can assume point-to-point communication. We would like to return SHIM to point-to-point communication only but provide multi-way rendezvous as a sort of syntactic sugar, e.g., by introducing extra processes responsible for communication on channels. How to do this correctly and elegantly remains an open question, unfortunately.

8.3 Exceptions
Exceptions have been an even more painful feature than multi-way rendezvous. They are extremely convenient from a programming standpoint (e.g., SHIM's rudimentary I/O library wraps each program in an exception to allow it to terminate gracefully; virtually every compiler test case includes at least one exception), but extremely difficult both to implement and to reason about. We have backed away from exceptions for now (all our recent work addresses the exception-free version of SHIM); we see two possibilities for how to proceed.
One is to restrict the use of exceptions so that the complicated case of multiple, concurrent exceptions is simply prohibited. This may preclude some interesting algorithms, but should greatly simplify the implementation, and probably also the analysis, of exceptions. Another alternative is to turn exceptions into syntactic sugar layered on the exception-free SHIM model. We always had this in the back of our minds: an exception would just put a process into an unusual state in which it would communicate its poisoned status to any process that attempts to communicate with it. The problem is that the complexity tends to grow quickly when multiple, concurrent exceptions and scopes are considered. Again, exactly how to translate exceptions into a simpler SHIM model remains an open question.

8.4 Semantics and Static Analysis
We feel we have proven one central hypothesis of the SHIM project: that simple, deterministic semantics helps both programming and automated program analysis. That we have been able to devise truly effective mechanisms for clever code generation (e.g., static scheduling) and analysis (e.g., deadlock detection) that gain deep insight into the behavior of programs vindicates this view. The bottom line: if a programming language does not have simple semantics, it is really hard to analyze its programs quickly or precisely. We have also validated the utility of scheduling independence. Our test suite, which consists of many parallel programs, produces reproducible results that let us sleep at night. We have found few cases where the approach has limited us. Algorithms with a large number of little, variable-sized, but independent pieces of work do not mesh well with SHIM's scheduling-independent philosophy as it currently stands. The obvious way to handle them is to maintain a bucket of tasks and assign a task to a processor whenever it has finished its last one. The order in which the tasks are performed therefore depends on their relative execution rates, but this does not matter if the tasks are independent. It would be possible to add scheduling-independent task distribution and scheduling to SHIM (i.e., provided the tasks are truly independent or, equivalently, confluent); exactly how is an open research question.

8.5 Buffers
That buffering is mandatory for high-performance parallel applications is hardly a revelation; we confirmed it anyway. The SHIM model has always been able to implement FIFO buffers (e.g., Figure 3), but we have come to realize that they are sufficiently fundamental to deserve being a first-class type in the language. We are currently working on a variant of the language that replaces pure rendezvous communication with bounded, buffered communication. Because buffers will be part of the language, they will be easier to map to unusual environments, such as the DMA mechanism for inter-core communication on the Cell processor.
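A bounded, buffered channel of the kind described above can be sketched as a fixed-capacity ring buffer. This is our guess at the flavor of the semantics, not the language's final design: a writer would block only when the buffer is full, a reader only when it is empty (here, "would block" is modeled by returning false).

```c
#include <stdbool.h>

#define CAP 4   /* assumed fixed channel capacity */

/* A bounded channel as a ring buffer: head is the oldest element,
 * count the number of buffered values. */
typedef struct {
    int data[CAP];
    int head, count;
} chan;

bool chan_send(chan *c, int v)      /* fails (would block) when full */
{
    if (c->count == CAP) return false;
    c->data[(c->head + c->count) % CAP] = v;
    c->count++;
    return true;
}

bool chan_recv(chan *c, int *out)   /* fails (would block) when empty */
{
    if (c->count == 0) return false;
    *out = c->data[c->head];
    c->head = (c->head + 1) % CAP;
    c->count--;
    return true;
}
```

Rendezvous is then the degenerate CAP = 1 case in which the sender also waits for the value to be consumed.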
8.6 Other Applications
The most likely future role of SHIM will be as inspiration for other languages. For example, Vasudevan has ported its communication model into the Haskell functional language [27] and proposed a compiler that would impose its scheduling-independent view of the world on arbitrary programs [28]. Certain SHIM ideas, such as scheduling analysis [29], have also been used in IBM's X10 language.
Acknowledgments. Many have contributed to SHIM. Olivier Tardieu created the formal semantics, devised the exception mechanism, and instigated endless (constructive) arguments. Jia Zeng developed the static recursion removal algorithm. Nalini Vasudevan has pushed SHIM in many new directions; Baolin Shao has just started pushing. The NSF has supported the SHIM project under grant 0614799.
References

1. Edwards, S.A.: Experiences teaching an FPGA-based embedded systems class. In: Proceedings of the Workshop on Embedded Systems Education (WESE), Jersey City, New Jersey, September 2005, pp. 52–58 (2005)
2. Edwards, S.A.: SHIM: A language for hardware/software integration. In: Proceedings of SYNCHRON, Schloss Dagstuhl, Germany (December 2004)
3. Berry, G., Gonthier, G.: The Esterel synchronous programming language: Design, semantics, implementation. Science of Computer Programming 19(2), 87–152 (1992)
4. Benveniste, A., Caspi, P., Edwards, S.A., Halbwachs, N., Le Guernic, P., de Simone, R.: The synchronous languages 12 years later. Proceedings of the IEEE 91(1), 64–83 (2003)
5. van Berkel, K.: Handshake Circuits: An Asynchronous Architecture for VLSI Programming. Cambridge University Press, Cambridge (1993)
6. Kahn, G.: The semantics of a simple language for parallel programming. In: Information Processing 74: Proceedings of IFIP Congress 74, Stockholm, Sweden, pp. 471–475. North-Holland, Amsterdam (1974)
7. Edwards, S.A., Tardieu, O.: Deterministic receptive processes are Kahn processes. In: Proceedings of the International Conference on Formal Methods and Models for Codesign (MEMOCODE), Verona, Italy, July 2005, pp. 37–44 (2005)
8. Tardieu, O., Edwards, S.A.: Specifying confluent processes. Technical Report CUCS-037-06, Columbia University, Department of Computer Science, New York, USA (September 2006)
9. Buck, J.T.: Scheduling Dynamic Dataflow Graphs with Bounded Memory Using the Token Flow Model. PhD thesis, University of California, Berkeley (1993); available as UCB/ERL M93/69
10. Parks, T.M.: Bounded Scheduling of Process Networks. PhD thesis, University of California, Berkeley (1995); available as UCB/ERL M95/105
11. Hoare, C.A.R.: Communicating sequential processes. Communications of the ACM 21(8), 666–677 (1978)
12. Lin, B.: Software synthesis of process-based concurrent programs. In: Proceedings of the 35th Design Automation Conference, San Francisco, California, June 1998, pp. 502–505 (1998)
Concurrency and Communication: Lessons from the SHIM Project
Location-Aware Web Service by Utilizing Web Contents Including Location Information YongUk Kim, Chulbum Ahn, Joonwoo Lee, and Yunmook Nah Department of Computer Science and Engineering, Dankook University, 126 Jukjeon-dong, Suji-gu, Yongin-si, Gyeonggi-do, 448-701, Korea {yukim,ahn555,jwlee}@dblab.dankook.ac.kr, [email protected]
Abstract. Traditional search engines are usually based on keyword retrieval, where location information is simply treated as text data, resulting in incorrect search results and a low degree of user satisfaction. In this paper, we propose a location-aware Web Service system that adds location information to web contents, which usually consist of text and multimedia information. For this purpose, we describe the system architecture needed to enable such services, explain how to extend web browsers, and propose the web container and the web search engine. The proposed methods can be implemented on top of traditional Web Service layers. Web contents that include location information can use it as a parameter during the search process and can therefore increase search correctness by using actual location information instead of simple keywords.

Keywords: Location-Based Service, Web Service, GeoRSS, search engine, GeoWEB.
Search engines usually return huge volumes of unnecessary sites, because those sites merely contain the same keyword as the given positional information. In this paper, we propose how to extend web contents so that their own location information can be built into them. The proposed method prevents location information from being treated as simple text and allows only the web contents strongly related to the given location to be retrieved. Here, web contents means the general extension of HTML/CSS contents [2]. The proposed system can collect location information from extended web contents and search for and provide web contents related to the given query location. To support and utilize location-aware web contents, we propose how to extend web browsers and how to build web containers and web search engines.

The remainder of this paper is organized as follows. The common problems of current search systems are described in Section 2. Section 3 shows the overall structure of the location-aware Web Service system and describes the detailed structures and behaviors of the proposed system. Section 4 describes some implementation details and experimental results. Finally, Section 5 concludes the paper.
2 Overview

In location-related web contents retrieval by keyword matching, the location of the contents is included in the search keyword or input form, and the query is processed by simple keyword matching. Consider the query 'Kangnam station BBQ house.' The term 'Kangnam station' is positional information describing the location of the contents, and 'BBQ house' is a normal query term. In current search engines, the terms 'Kangnam station' and 'BBQ house' are both handled as text, and documents including both query terms are included in the search result. The term 'Kangnam station' has a meaning related to position, but it is treated as a simple keyword, without any special consideration of its meaning during search processing. In current web contents services, there is no difference between location-related web contents retrieval and keyword-based retrieval.

The problems become more severe when multiple location-related keywords are randomly contained in keyword-based retrievals. In such cases, it is very difficult to eliminate unrelated documents. For example, documents about Subway Line 2 will contain the names of its 44 subway stations, and such documents can be treated as having 44 location-related keywords even though they are not directly related to 44 specific locations. Therefore, for a query asking about one of the 44 stations of Subway Line 2, search engines will return all documents containing information about the line, resulting in incorrect search results and a low degree of user satisfaction.

Local search services are services provided by content providers that show contents related to a specific location on a map. Portal sites, such as Naver and Daum, provide such services by showing a map on one side of the screen while listing neighboring stores on the other side.
The listed stores are the ones that have direct contracts with the portal sites. When users select a link, summary information containing the store name, address, and telephone number of the selected store appears in a pop-up layer [3, 4].
The local search service of Naver shows the map of 'Kangnam station' on the left side of the screen and displays the store list, with telephone numbers and grades, in alphabetical order with balloon symbols on the right side of the screen. When users select a link, review information is displayed. The search results consist of information provided by contents providers, and only short summary information is provided instead of full web contents. The contents of the search results depend on the contents providers, and, therefore, the information provided by local search services is not sufficient for users in terms of volume and quality. To relieve this problem, some providers also provide links to review information posted by users in blog services. But the main problem of this approach is that it only provides information intentionally prepared by the contents providers, and it shows only a quick summary and reviews instead of full web pages.

Research on the Geospatial Web (or Geoweb), which means the combination of location-based information and the Web, was started in 1994 by the U.S. Army Construction Engineering Research Laboratory [5]. This research can be divided into subareas, such as geo-coding, geo-parsing, GCMS (Geospatial Contents Management Systems), and retrieval. Geo-coding is a technique that lets other tools verify the location of a document by recording the location information within the document, as shown in Figure 1. For geo-coding, the EXIF information of image files, the meta information of web sites, and meta tag information on text or picture data can be used [6, 7].

GPS Latitude  : 57 deg 38' 56.83" N
GPS Longitude : 10 deg 24' 26.79" W
GPS Position  : 57 deg 38' 56.83" N, 10 deg 24' 26.79" W

Fig. 1. Example of EXIF information of a JPEG picture
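As a concrete illustration of the geo-coding step, the EXIF values shown in Figure 1 use degree/minute/second notation, which must be converted to signed decimal degrees (south and west negative) before spatial queries can use them. The following Python sketch is ours, not part of the paper's system; the function name and the regular expression are assumptions about the EXIF string format.

```python
import re

def dms_to_decimal(dms):
    """Convert an EXIF-style string like `57 deg 38' 56.83" N`
    to signed decimal degrees (S and W hemispheres are negative)."""
    m = re.match(r"""(\d+)\s*deg\s*(\d+)'\s*([\d.]+)"\s*([NSEW])""", dms)
    if not m:
        raise ValueError("unrecognized DMS string: %r" % dms)
    deg, minutes, seconds, hemi = m.groups()
    value = int(deg) + int(minutes) / 60.0 + float(seconds) / 3600.0
    return -value if hemi in ("S", "W") else value

# The two coordinates from Figure 1:
lat = dms_to_decimal("57 deg 38' 56.83\" N")   # ~57.6491
lon = dms_to_decimal("10 deg 24' 26.79\" W")   # ~-10.4074
```

In a real pipeline these values would be read from the image's EXIF block (e.g. with an EXIF library) rather than from strings.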
Geo-parsing is a technique that translates location information in text into real spatial coordinates, thus enabling spatial query processing [8, 9]. For example, if a user inputs 'Hannam-dong Richensia North 50m', that location information is parsed into a real coordinate. Geospatial Contents Management Systems support location information by extending traditional CMS. Geospatial Web technologies support coordinate-based and region-based retrieval, and they provide location information for individual documents or domains. However, these technologies depend on specific documents or specific domains, and they focus on region-based retrieval.
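The geo-parsing step described above can be approximated, in its simplest form, by a gazetteer lookup that maps known place names found in text to coordinates. This Python sketch is purely illustrative; the gazetteer entries, coordinates, and function name are our own placeholders, not the paper's data or implementation.

```python
# Toy gazetteer: place name -> (lat, lon) in WGS84. Illustrative values only.
GAZETTEER = {
    "Kangnam station": (37.4979, 127.0276),
    "Hannam-dong": (37.5340, 127.0026),
}

def geo_parse(text):
    """Return (place, (lat, lon)) for the first gazetteer place name
    appearing in `text`, or None if no entry matches."""
    for place, coord in GAZETTEER.items():
        if place in text:
            return place, coord
    return None

hit = geo_parse("Hannam-dong Richensia North 50m")
```

A production geo-parser would additionally resolve offsets such as 'North 50m' and disambiguate place names by context, which this sketch does not attempt.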
3 Structure of Location-Aware Web Service Systems

The overall structure of the system, which changes and serves general web contents according to location information, is shown in Figure 2. The system consists of the Client, the Web Container, and the Search Engine.

The Web Container module manages and transfers web contents with location information in order to provide location-aware web contents to users. It supports both static pages and dynamic web applications. It actively notifies the Search Engine of content updates made by web applications using pingback methods. The correctness of the information returned by the Search Engine is improved by this mechanism.
Location-Aware Web Service by Utilizing Web Contents
291
Fig. 2. The location-aware Web Service system structure
The Search Engine module collects information from the Web Container and allows users to retrieve location-aware web contents. It recognizes changes in dynamic web contents by using the Pingback Retriever and periodically collects static documents by invoking Web Crawlers via the Timer. The information is updated by the Information Updater. Location-based queries are delivered through the Query Handler, and the Information Finder finally processes these queries.

The Client is the module through which users use location-aware web contents, and it provides tools to support easy retrieval. It is an extension of common web browsers and includes modules such as the Map Generator, Location Information Parser, Query Generator, and Location Detector. It can show the location of contents intuitively to users by using the Location Information Parser and the Map Generator. The Query Generator allows users to perform location-aware retrieval more conveniently. The Location Detector is a module that captures location information by utilizing external hardware or Web Services. It can get location information directly by using GPS or devices supporting a Global Navigation Satellite System, such as Beidou [10]. It can also get location information indirectly by using Web Services supporting the Geolocation API specification [13], such as Mozilla Firefox Geode [11] and Yahoo FireEagle [12]. Currently, utilization of the Location Detector is not high because general PCs do not provide location information. However, it will be possible to provide more exact location information when IP and GPS information are more commonly available.

3.1 Web Contents Extension to Support Location-Aware Information

In the specifications of traditional Web contents standards, such as HTML and XHTML [14], markups to specify location information are not included, and the
namespaces for such an extension are not predetermined either. Therefore, the format of web contents needs to be extended to deal with location-aware information. One method is to add new markups to the HTML/XHTML format; another is to include links to external documents. In previous cases, the formats of web contents were usually extended by linking external documents using the <link> tag [15]. However, the standards committees have recently become more conservative, and they are eliminating indiscriminately added markup tags, such as <marquee> and <embed>, in order to manage namespaces cleanly [16]. A method that adds new markup tags would therefore face difficulties in maintaining future web contents. In this paper, we instead extend the <link> markup tag to include location information in GeoRSS [17] format, as shown in Figure 3.

<georss:where>
  <gml:Point>
    <gml:pos>45.256 -71.92</gml:pos>
  </gml:Point>
</georss:where>

Fig. 3. Location information format
The <where> markup tag, in the 'georss' namespace, represents the object containing the location information. The markup tags located within the <where> tag describe the location using geospatial languages, such as GML (Geography Markup Language) [18]. Documents holding location information are referenced within HTML/XHTML documents using the <link> markup tag, and the file type of such documents is 'application/geoweb.' Documents with this file type are interpreted as data in GeoRSS format. We represent coordinates using the WGS84 format [19], which is one of the coordinate formats supported by GeoRSS. The WGS84 format is consistent with the GPS format, so it can be used effectively to support interoperability with mobile devices. It can also be easily transformed into the KATECH TM128 coordinate system developed by the National Geographic Institute.

3.2 Web Client

The Web Client provides facilities for web contents retrieval and visually shows documents stored in the Web Container along with their location information. Figure 4 shows the interaction between the Web Client and the Web Container. The Web Client, an extension of a web browser, consists of the HTML Renderer, the Location Information Parser, and the Map Generator, as shown in Figure 4. When a user provides the URI of contents (1 of Figure 4), the HTML Renderer sends a request to, and receives the required contents from, the Web Container (2 and 3). The Web Client then receives the location information (5 and 6) and generates the appropriate map (7, 8, 9). Steps 5 to 9 are repeated if there are more external links in the web contents.
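The core task of the Location Information Parser, extracting the WGS84 coordinate pair from a GeoRSS <where> element like the one in Figure 3, can be sketched with Python's standard xml.etree module. The namespace URIs below are the standard GeoRSS and GML ones; the helper name is our own, not part of the paper's system.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for the standard GML namespace.
GML = "{http://www.opengis.net/gml}"

# A GeoRSS <where> fragment such as Figure 3's, with namespaces declared.
FRAGMENT = """\
<georss:where xmlns:georss="http://www.georss.org/georss"
              xmlns:gml="http://www.opengis.net/gml">
  <gml:Point><gml:pos>45.256 -71.92</gml:pos></gml:Point>
</georss:where>"""

def extract_point(xml_text):
    """Return the (lat, lon) pair from a GeoRSS-GML <where> element.
    GeoRSS/GML lists latitude first in <gml:pos>."""
    where = ET.fromstring(xml_text)
    pos = where.find(GML + "Point/" + GML + "pos")
    lat, lon = pos.text.split()
    return float(lat), float(lon)
```

In the full system, the fragment would first be located via the 'application/geoweb' link from the HTML page rather than supplied inline.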
Fig. 4. Basic operations of the Web Client (sequence diagram: the User types a URI (1); the HTML Renderer requests and receives the web page from the Web Container (2, 3) and asks for map information (4); for each external link, the Location Information Parser requests and receives the location (5, 6), and the Map Generator is asked for, generates, and returns a map (7, 8, 9); finally, all contents are rendered and returned to the User (10, 11))
Fig. 5. Information collection by the Search Engine (sequence diagram: the Web Container notifies the Pingback Retriever of an update (1); the Web Crawler is asked to crawl (2), requests the web contents (3), receives the HTML page and location information (4), and passes them to the Information Updater (5); in a periodic loop, the crawler requests the changed set (6), receives HTML/XHTML and location information (7), and updates the database (8))
3.3 Search Engine

When information is updated by web applications, the Web Container notifies the Search Engine (1 of Figure 5). The Pingback Retriever then asks the Web Crawler to check the updated web contents. The Web Crawler visits the Web Container to fetch the updated information (2, 3, 4) and updates the information in the database (5). The Web Crawler also continuously visits the Web Container, checks for newly updated information, and downloads such updates according to a timer event (6, 7, 8).
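The two update paths of Figure 5, a pingback-driven update for dynamic contents and a timer-driven crawl of a changed set, can be sketched as follows. The class and method names are illustrative assumptions, not the paper's implementation; the Web Container is simulated by a plain dictionary.

```python
class SearchIndex:
    """Stands in for the Search Engine's database."""
    def __init__(self):
        self.docs = {}  # uri -> (html, (lat, lon))

    def update(self, uri, html, location):
        self.docs[uri] = (html, location)

class WebCrawler:
    def __init__(self, container, index):
        self.container = container  # dict simulating the Web Container
        self.index = index

    def crawl(self, uri):
        # Steps 2-5 of Figure 5: fetch contents plus location, then update.
        html, location = self.container[uri]
        self.index.update(uri, html, location)

    def crawl_changed(self, changed_uris):
        # Steps 6-8: a periodic timer event pulls the changed set.
        for uri in changed_uris:
            self.crawl(uri)

class PingbackRetriever:
    def __init__(self, crawler):
        self.crawler = crawler

    def notify_update(self, uri):
        # Step 1: the Web Container announces an update via pingback.
        self.crawler.crawl(uri)

# Minimal walk-through of the pingback path:
container = {"http://example.org/a": ("<html>A</html>", (37.5, 127.0))}
index = SearchIndex()
pingback = PingbackRetriever(WebCrawler(container, index))
pingback.notify_update("http://example.org/a")
```

The real system would fetch over HTTP and parse the location from the page's GeoRSS link rather than receive it pre-parsed, but the control flow is the same.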
The Search Engine can handle keyword-based queries, coordinate-and-keyword-based queries, location-and-keyword-based queries, and keyword-plus-keyword-based queries.
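A coordinate-and-keyword query of the kind listed above amounts to keeping documents that contain the keyword and lie within some radius of the query point. This sketch, with illustrative documents and coordinates of our own choosing, uses the haversine great-circle distance over WGS84 coordinates:

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) WGS84 points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def location_keyword_query(docs, keyword, center, radius_km):
    """Return URIs of documents containing `keyword` within `radius_km`."""
    return [uri for uri, (text, coord) in docs.items()
            if keyword in text and haversine_km(center, coord) <= radius_km]

# Illustrative index: URI -> (text, (lat, lon)).
docs = {
    "A": ("Kangnam station BBQ house page", (37.4985, 127.0280)),
    "B": ("BBQ house on Subway Line 2", (37.5500, 126.9700)),
    "C": ("Kangnam station flower shop", (37.4980, 127.0275)),
}
center = (37.4979, 127.0276)  # query coordinate near 'Kangnam station'
```

Here, document B matches the keyword but is several kilometers away, so a 1 km radius excludes it, which is exactly the filtering the paper's location-aware retrieval performs that plain keyword matching cannot.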
4 Implementation and Experiments

Both the Web Container and the Search Engine are implemented on a single computer with an AMD Athlon 64 Dual Core 4000+, 1 GB of memory, Ubuntu (4.1.2-16) with the Linux 2.6.22-14-server kernel, Apache 2, and MySQL 5.0.45. The Web Client is developed on a computer with a Pentium 4, 1 GB of memory, Windows XP, and Firefox 3.0.8. The proposed functions are implemented using NPAPI [20] and extended features [21]. We compared the results of location-and-keyword-based queries on general search engines and on the proposed Search Engine. The query asks for information related to 'Subway Line 2.' We first obtained results using general search engines and then obtained improved results using the location-aware web contents search engine.

Table 1. Filtering rate and error rate
                  Filtering rate   Error rate
N search engine       66.67%          3.33%
D search engine       47.22%         10.56%
Y search engine       53.33%         15.56%
As shown in Table 1, 66.67% of the unnecessary search results can be eliminated with the N search engine. However, wrong answers (3.33%) still remain that cannot be eliminated by location information, because they contain the given keywords in meaningless contexts. The most common meaningless results were caused by keywords appearing in the titles of bulletin boards included in the web pages.
5 Conclusion

In this paper, we proposed a location-aware Web Service system that adds location information to traditional web contents. We explained the overall structure of the location-aware Web Service system, which consists of the Web Client, the Web Container, and the Search Engine, and described the detailed structures and behaviors of the proposed system. The Web Container module manages and transfers web contents with location information in order to provide location-aware web contents to users. The Search Engine module collects information from the Web Container and allows users to retrieve location-aware web contents. The Web Client provides facilities for web contents retrieval and visually presents documents stored in the Web Container along with their location information. The proposed methods can be implemented on top of traditional Web Service layers. Web contents that include location information can use it as a parameter during the search process and can therefore increase search correctness by using actual location information instead of simple keywords. To show the usefulness of our schemes, some experimental results were briefly provided.
Location-Aware Web Service by Utilizing Web Contents
295
Acknowledgments. This research was supported by the Ministry of Knowledge Economy, Korea, under the Information Technology Research Center support program supervised by the Institute of Information Technology Advancement (grant number IITA-2008-C1090-0801-0031). This work was supported by the Korea Science and Engineering Foundation (KOSEF) grant number R01-2007-000-20958-0 funded by the Korea government (MOST). This research was also supported by Korea SW Industry Promotion Agency (KIPA) under the program of Software Engineering Technologies Development and Experts Education.
References

1. Millard, D., Ross, M.: Web 2.0: Hypertext by Any Other Name? In: Proc. ACM Conference on Hypertext and Hypermedia, pp. 22–25. ACM Press, New York (2006)
2. HTML Spec., http://www.w3.org/TR/REC-html40/
3. Daum local information, http://local.daum.net/
4. Naver local information, http://local.naver.com/
5. An Architecture for Cyberspace: Spatialization of the Internet, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.4604
6. geo-extension-nonWGS84, http://microformats.org/wiki/geo-extension-strawman
7. Geographic registration of HTML documents, http://tools.ietf.org/id/draft-daviel-html-geo-tag-08.txt
8. NGA GEOnet Names Server, http://earth-info.nga.mil/gns/html/
9. U.S. Board on Geographic Names, http://geonames.usgs.gov/domestic/index.html
10. Beidou, http://www.globalsecurity.org/space/world/china/beidou.htm
11. Mozilla Firefox Geode, http://labs.mozilla.com/2008/10/introducing-geode/
12. Yahoo FireEagle, http://fireeagle.yahoo.net
13. Geolocation API Spec., http://dev.w3.org/geo/api/spec-source.html
14. XHTML Spec., http://www.w3.org/TR/xhtml11/
15. Important change to the LINK tag, http://diveintomark.org/archives/2002/06/02/important_change_to_the_link_tag
16. How to upgrade markup code in specific cases: <embed>,