Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board

David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
4388
David Hutchison Spyros Denazis Laurent Lefevre Gary J. Minden (Eds.)
Active and Programmable Networks IFIP TC6 7th International Working Conference, IWAN 2005 Sophia Antipolis, France, November 21-23, 2005 Revised Papers
Volume Editors

David Hutchison
University of Lancaster, Faculty of Science and Technology
Computing Department, InfoLab21, Lancaster, LA1 4WA, UK
E-mail: [email protected]

Spyros Denazis
University of Patras, Department of Electrical and Computer Engineering
Patras, Greece
E-mail: [email protected]

Laurent Lefevre
INRIA RESO / LIP - University of Lyon, Ecole Normale Supérieure de Lyon
46 Allée d'Italie, 69364 Lyon Cedex 07, France
E-mail: [email protected]

Gary J. Minden
The University of Kansas, Information & Telecommunication Technology Center
2335 Irving Hill Road, Lawrence, KS 66045-7612, USA
E-mail: [email protected]
Library of Congress Control Number: Applied for
CR Subject Classification (1998): C.2, D.2, H.3.4-5, K.6, D.4.4, H.4.3
LNCS Sublibrary: SL 5 – Computer Communication Networks and Telecommunications
ISSN 0302-9743
ISBN-10 3-642-00971-9 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-00971-6 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2009 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12642465 06/3180 543210
Preface
This volume contains the proceedings of the 7th International Working Conference on Active and Programmable Networks (IWAN 2005), held during November 21–23, 2005, in Sophia Antipolis, Côte d'Azur, France, and jointly organized by Hitachi Europe and INRIA. IWAN 2005 took place against a backdrop of questions about the viability and necessity of a conference dealing with an area perceived by many as having run its full course. The Organizing Committee took these concerns seriously during the preparation of the conference, reflecting them in the theme of this year's event, "Re-incarnating Active Networking Research," and expanding the scope of past calls for papers to topics that have emerged from active and programmable networks. The result was a success: we received 72 submissions, a number that exceeded our expectations and is in fact one of the highest in the history of the conference. The distinguished Technical Program Committee set high standards for the final program; each of the submitted papers received three peer reviews with detailed comments and suggestions for the authors. In total, 13 papers were accepted for the main program sessions: 9 unconditionally, and the remaining 4 conditionally, with shepherding by selected Program Committee members. The Program Committee also noted that a considerable number of the papers not selected were of high quality; a small committee was therefore formed to suggest which of these could be accepted as short papers, resulting in the final selection of an additional 13 short papers, also included in these proceedings. The full-length papers were organized, according to their content, into four sessions: "Programmable Networks and Heterogeneity," "Network Architectural Frameworks," "Node Architectures," and "Services," with the short papers providing the material for two further sessions.
We have kept the same paper order and structure in this volume. To address the issues implied by the IWAN theme, we invited two distinguished keynote speakers who have been at the forefront of active and programmable networks research since its beginning: Ken Calvert of the University of Kentucky and Gísli Hjálmtýsson of Reykjavik University. In his talk "Reflections on the Development of Active and Programmable Networks," Ken Calvert discussed the past and present of the field. Gísli Hjálmtýsson, speaking about "Architecture Challenges in Future Network Nodes," addressed future directions. Finally, the program concluded with a panel, "The Guises of Active Networking––Strategy or Destiny?", chaired by Lidia Yamamoto, in which invited panellists evaluated the shortcomings and the impact of active networks on the computer networking field. We thank the Technical Program Committee for their thorough work in reviewing, selecting and shepherding the papers. Special thanks go to Robin Braun of the University of Technology, Sydney and Jean-Patrick Gelas of INRIA for their outstanding work as Publicity Chairs, to Mikhail Smirnov for selecting and organizing a Tutorial Day with truly state-of-the-art tutorials, and, last but not least, for the secretarial support of
Beatrice Dessus and colleagues of the Hitachi Sophia Antipolis Lab, Danièle Herzog of INRIA, and Jean-Christophe Mignot of the LIP Laboratory, the hidden heroes of every conference. Above all, we would like to thank all the authors who honored IWAN 2005 by submitting their work, and the 55 participants of the conference; they are the ones who really made the conference a success.
November 2005
Spyros Denazis Laurent Lefevre Gary Minden David Hutchison
Introduction
Over the past several years, active and programmable networking has laid the foundations for an easy but robust introduction of new network services to devices such as routers and switches, by adding dynamic programmability to network equipment. Network programmability and service deployment architectures are necessary to bring the right services to the customer at the right time and in the right location. However, research focused exclusively on the field has been declining over the last couple of years and is currently carried out in the context of other emerging or more "fashionable" research areas instead. Under these circumstances, the 7th International Working Conference on Active Networks (IWAN 2005), through its call for papers and consequently its program, was called upon to explore whether active networking (AN) research can be re-incarnated within these new research fields. This motivation was inspired by the fact that methods and technologies explored in active and programmable networking research have helped realize the trend toward various research initiatives, including ad-hoc networks, autonomic networks and communications, overlays, sensor networks and content-aware distribution. Furthermore, the issues that AN technology has tried to address find themselves at the center of any future research agenda that touches upon service and network operations at large, and in this respect AN will always be relevant. It is also our belief that many of the problems identified by the AN and programmable networks research agenda are far from being solved in a satisfactory, scalable and secure way; in this respect, research in the new fields is likely to be haunted by the lack of appropriate solutions unless it embraces a programmable networking – if not an AN – approach. Having reached such a stage, we included in our program two keynotes that covered the past, present and future of AN.
These talks were chosen with two contrasting observations in mind. First, the lack of wide acceptance of AN derives at least partly from the inability to identify truly compelling example applications (not necessarily killer applications), an aspect that should not be neglected by future researchers. Second, AN and programmable networks have been quite successful in helping define simple and expressive reference models and elegant solutions to persistent problems like security, QoS, and multicasting; perhaps their lack of adoption can be attributed to the fact that they have generally not been presented with a strong business model in mind. A message conveyed by this year's IWAN – which may be the last of its kind – is to bear in mind the likely utility of the technology developed by this community, which goes beyond hype and buzzwords like active and programmable networks, and to keep its essence alive in new research areas such as those already mentioned, including ad-hoc networks, autonomic networking and content-aware distribution.

November 2005
Organization
Organizing Committee

General Chair: David Hutchison, Lancaster University, UK
General Co-chair: Akira Maeda, Hitachi, Japan
Program Committee Chairs: Spyros Denazis, Hitachi Europe, France / University of Patras, Greece
    Laurent Lefevre, INRIA, France
    Gary J. Minden, The University of Kansas, USA
Publication Chair: Alessandro Bassi, Hitachi, France
Publicity Chairs: Jean-Patrick Gelas, INRIA, France
    Robin Braun, University of Technology of Sydney, Australia
Tutorial Chair: Mikhail Smirnov, Fraunhofer FOKUS, Germany
Local Arrangements Committee: Beatrice Dessus, Hitachi Europe, France
    Daniele Herzog, INRIA, France
Local Technical Support: Jean Christophe Mignot, LIP, Ecole Normale Superieure de Lyon, France
Program Committee

Bobby Bhattacharjee, University of Maryland, USA
Christian Bonnet, Eurecom, France
Elisa Boschi, Hitachi Europe, France
Matthias Bossardt, ETH, Switzerland
Raouf Boutaba, University of Waterloo, Canada
Marcus Brunner, NEC, Germany
Ken Calvert, University of Kentucky, USA
Ken Chen, University Paris 13, France
Hermann DeMeer, University of Passau, Germany
Simon Dobson, University College of Dublin, Ireland
Takashi Egawa, NEC Corporation, Japan
Alex Galis, University College of London, UK
Erol Gelenbe, Imperial College, UK
Peter Graham, University of Manitoba, Canada
Jim Griffioen, University of Kentucky, USA
Robert Haas, IBM, Switzerland
Toru Hasegawa, KDDI R&D Laboratories, Japan
Gisli Hjalmtysson, Reykjavik University, Iceland
Doan Hoang, University of Technology, Sydney, Australia
Javed Kahn, Kent State University, USA
Andreas Kind, IBM, Switzerland
Guy Leduc, University of Liege, Belgium
Dave Lewis, Trinity College Dublin, Ireland
John Lockwood, Washington University, USA
Laurent Mathy, Lancaster University, UK
Douglas Maughan, U.S. Department of Homeland Security, USA
Eckhart Moeller, Fraunhofer FOKUS, Germany
Sandy Murphy, Trusted Information Systems Labs, USA
Scott Nettles, University of Texas – Austin, USA
Naomichi Nonaka, Hitachi Ltd., Japan
Cong-Duc Pham, University of Pau, France
Guy Pujolle, LIP6, France
Danny Raz, Technion, Israel
Paul Roe, Queensland University of Technology, Australia
Lukas Ruf, ETH, Switzerland
Joan Serrat, UPC, Spain
Nadia Shalaby, Princeton University, USA
Yuval Shavitt, Tel Aviv University, Israel
Vijay Sivaraman, CSIRO (ICT Centre), Sydney, Australia
James Sterbenz, University of Kansas (USA) / Lancaster University (UK)
Toshiaki Suzuki, Hitachi Ltd., Japan
Yongdong Tan, Southwest Jiaotong University, China
Dirk Trossen, Nokia, USA
Christian Tschudin, University of Basel, Switzerland
John Vicente, Intel, USA
Tilman Wolf, University of Massachusetts, USA
Miki Yamamoto, Kansai University, Japan
Krzysztof Zielinski, University of Mining and Metallurgy Krakow, Poland
Martina Zitterbart, University of Karlsruhe, Germany
Table of Contents
Programmable Networks and Heterogeneity

Validating Inter-domain SLAs with a Programmable Traffic Control System . . . 1
    Elisa Boschi, Matthias Bossardt, and Thomas Dübendorfer

Cross-Layer Peer-to-Peer Traffic Identification and Optimization Based on Active Networking . . . 13
    I. Dedinski, H. De Meer, L. Han, L. Mathy, D.P. Pezaros, J.S. Sventek, and X.Y. Zhan

Towards Effective Portability of Packet Handling Applications across Heterogeneous Hardware Platforms . . . 28
    Mario Baldi and Fulvio Risso

Architectural Frameworks

Architecture for an Active Network Infrastructure Grid – The iSEGrid . . . 38
    T.K.S. LakshmiPriya and Ranjani Parthasarathi

Network Services on Service Extensible Routers . . . 53
    Lukas Ruf, Károly Farkas, Hanspeter Hug, and Bernhard Plattner

A Network-Based Response Framework and Implementation . . . 65
    Marcus Tylutki and Karl Levitt

Towards Resilient Networks Using Programmable Networking Technologies . . . 83
    Linlin Xie, Paul Smith, Mark Banfield, Helmut Leopold, James P.G. Sterbenz, and David Hutchison

Node Architectures

Towards the Design of an Industrial Autonomic Network Node . . . 96
    Martine Chaudier, Jean-Patrick Gelas, and Laurent Lefèvre

A Web Service- and ForCES-Based Programmable Router Architecture . . . 108
    Evangelos Haleplidis, Robert Haas, Spyros Denazis, and Odysseas Koufopavlou

An Extension to Packet Filtering of Programmable Networks . . . 121
    Marcus Schöller, Thomas Gamer, Roland Bless, and Martina Zitterbart

Services

SAND: A Scalable, Distributed and Dynamic Active Networks Directory Service . . . 132
    M. Sifalakis, A. Mauthe, and D. Hutchison

A Programmable Structured Peer-to-Peer Overlay . . . 145
    Marius Portmann, Sébastien Ardon, and Patrick Sénac

Interpreted Active Packets for Ephemeral State Processing Routers . . . 156
    Sylvain Martin and Guy Leduc

Short Papers

A Secure Code Deployment Scheme for Active Networks . . . 168
    Leïla Kloul and Amdjed Mokhtari

Securing AODV Routing Protocol in Mobile Ad-Hoc Networks . . . 182
    Phung Huu Phu, Myeongjae Yi, and Myung-Kyun Kim

Extensible Network Configuration and Communication Framework . . . 188
    Todd Sproull and John Lockwood

A Model for Scalable and Autonomic Network Management . . . 194
    Amir Eyal and Robin Braun

Intelligibility Evaluation of a VoIP Multi-flow Block Interleaver . . . 200
    Juan J. Ramos-Muñoz, Ángel M. Gómez, and Juan M. Lopez-Soler

A Web-Services Based Architecture for Dynamic-Service Deployment . . . 206
    Christos Chrysoulas, Evangelos Haleplidis, Robert Haas, Spyros Denazis, and Odysseas Koufopavlou

The Active Embedded Ubiquitous Web Service Framework . . . 212
    Dugki Min, Junggyum Lee, and Eunmi Choi

Framework of an Application-Aware Adaptation Scheme for Disconnected Operations . . . 218
    Umar Kalim, Hassan Jameel, Ali Sajjad, Sang Man Han, Sungyoung Lee, and Young-Koo Lee

Kinetic Multipoint Relaying: Improvements Using Mobility Predictions . . . 224
    Jérôme Härri, Fethi Filali, and Christian Bonnet

The Three-Level Approaches for Differentiated Service in Clustering Web Server . . . 230
    Myung-Sub Lee and Chang-Hyeon Park

On the Manipulation of JPEG2000, In-Flight, Using Active Components on Next Generation Satellites . . . 236
    L. Sacks, H.K. Sellappan, S. Zachariadis, S. Bhatti, P. Kirstein, W. Fritsche, G. Gessler, and K. Mayer

TAON: A Topology-Oriented Active Overlay Network Protocol . . . 247
    Xinli Huang, Fanyuan Ma, and Wenju Zhang

A Biologically Inspired Service Architecture in Ubiquitous Computing Environments . . . 253
    Frank Chiang and Robin Braun

Author Index . . . 259
Validating Inter-domain SLAs with a Programmable Traffic Control System

Elisa Boschi¹, Matthias Bossardt², and Thomas Dübendorfer²

¹ Hitachi Europe, Sophia Antipolis Lab, France
[email protected]
² Computer Engineering and Networks Laboratory, Swiss Federal Institute of Technology, ETH Zürich, Switzerland
{bossardt,duebendorfer}@tik.ee.ethz.ch
Abstract. For network users and service providers it is important to validate the compliance of network services with the guarantees given in Service Level Agreements (SLAs). This is particularly challenging in inter-domain environments. In this paper, we propose a novel solution for inter-domain SLA validation, based on programmable traffic processing devices that are attached to routers and located in several autonomous systems. Using our service management infrastructure, the measurement logic is deployed on the traffic processing devices in a flexible and secure way. We safely delegate partial network management capability from network operators to network users, who are enabled to configure service logic on the traffic processing devices. At the same time, the management infrastructure guarantees that a network user's configuration cannot negatively influence network stability or other users' traffic. Via the flexible configuration of service logic, our system gives network users powerful means to observe the quality of service parameters agreed upon in SLAs. We present a detailed scenario of the SLA validation service and its deployment across several administrative domains.

Keywords: Inter-domain measurement, programmable networks, SLA validation, network service, management delegation.
1 Introduction

The need for verifiable quality differentiation of network services is one major trigger for the deployment of measurements in IP networks. Services like VoIP, multimedia streaming, video telephony or e-gaming require a minimum guaranteed level of network performance. Internet Service Providers (ISPs) negotiate a contract with their customers, called a Service Level Agreement (SLA), in which they specify in measurable terms the service to be furnished. One of the main problems faced by ISPs is how to deploy SLAs that cross ISP boundaries (inter-domain SLAs) to achieve end-to-end SLA enforcement. The problem stems from the fact that although ISPs can control and monitor their own network, which allows them to validate their intra-domain SLAs, they have only minimal information about the characteristics and performance of other networks. Also, customers that stipulate SLAs with a single ISP have concerns that the agreed

D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 1–12, 2009. © IFIP International Federation for Information Processing 2009
Quality of Service has been met, and are therefore interested in end-to-end, inter-domain measurements. Classical measurement architectures determine end-to-end or edge-to-edge performance by comparing ingress and egress reports from two measurement devices located at the end points of a flow. These architectures, though, are not sufficient to determine the performance of specific path portions, or to determine which segments failed to provide the expected Quality of Service (QoS) in case the end-to-end guarantees are not met. If, for instance, the delay is higher than agreed in the SLA, it is not possible to determine in which administrative domain the higher delay occurred (or, in other words, which ISP is responsible for not meeting the requirements). Another problem with such architectures is that they require configuring two edge devices and retrieving information from them. This configuration is difficult when the devices are not located in the same administrative domain, since ISPs have major security-related concerns about delegating any management function to third parties. These concerns are based on the risk that third-party configurations may negatively affect network stability or other users' network traffic. In this paper, we present a novel solution for inter-domain SLA validation that allows deploying measurement logic on distributed devices in a flexible and secure way. The system is flexible in that it allows the deployment of almost arbitrary service logic, with just a few restrictions that we specify later in this paper. These restrictions, together with the concept of traffic ownership [9,5], are used in our system to address the ISPs' security concerns. The goal of our architecture is to configure on demand several measurement devices along a flow path in a multi-domain environment, and to process the raw measurement data in order to determine the QoS experienced by the flow on several nodes of its path.

With measurements of this kind, an end user or a monitoring application can determine not only the end-to-end QoS, but also the QoS provided on different path segments, even if these belong to different administrative domains. This paper is organized as follows. Section 2 discusses the state of the art in inter-domain measurements. Section 3 presents our distributed architecture for end-to-end SLA validation, describing the underlying traffic control system and how it can be used for effective, flexible and secure inter-domain measurements. A detailed scenario is presented in Section 4. Finally, we draw our conclusions in Section 5.
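The per-segment evaluation described above can be illustrated with a short sketch. Assuming each measurement device along the path reports a (node, timestamp) observation for a packet, per-segment delays fall out of pairwise differences; the report format and names below are our own illustration, not part of the system described in this paper, and a real deployment would additionally have to correct for clock skew between devices:

```python
# Illustrative sketch: derive per-segment one-way delays from timestamped
# observation reports collected along a flow path. The report format is
# hypothetical; real measurement devices must be clock-synchronized.

def segment_delays(reports):
    """reports: list of (node_id, timestamp_seconds), ordered along the path."""
    delays = {}
    for (n1, t1), (n2, t2) in zip(reports, reports[1:]):
        delays[(n1, n2)] = t2 - t1          # delay on the segment n1 -> n2
    return delays

def check_sla(reports, max_e2e_delay):
    """Check an end-to-end delay bound; also report the worst segment."""
    delays = segment_delays(reports)
    total = sum(delays.values())
    worst = max(delays, key=delays.get)     # segment contributing most delay
    return total <= max_e2e_delay, worst

# Example: three administrative domains along the path (times in seconds).
path = [("ingress-AS1", 0.000), ("egress-AS1", 0.004),
        ("egress-AS2", 0.019), ("egress-AS3", 0.023)]
ok, worst = check_sla(path, max_e2e_delay=0.020)
# End-to-end delay is 23 ms > 20 ms, and the AS2 segment (15 ms) is the
# largest contributor, so the responsible domain can be pinpointed.
```

This is exactly the capability that ingress/egress-only architectures lack: with observations from intermediate nodes, a violated end-to-end bound can be attributed to a specific domain.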
2 State of the Art

2.1 Inter-domain QoS Models

The authors of [17] distinguish three different models being pursued in the attempt to provide inter-domain QoS: bilateral, cooperative and third-party. In the bilateral approach, two providers interconnect at one or more points and agree on a set of metrics, measurement methods, service classes, settlements and issue resolution processes in a customized way. Generally, the agreed solutions have very low reusability for other peering agreements. Moreover, these contracts involve just two parties (two neighboring ISPs), limiting the feasibility of end-to-end SLA validation (and QoS provision) to very "simple" cases.
Cooperative approaches extend bilateral ones by defining a set of rules that a group of cooperating ISPs has to follow to provide inter-domain QoS within that group. These rules include the definition of common metrics, common SLA monitoring and reporting methodologies, and common tools. This approach requires standardized measurement and reporting techniques, metrics and data formats. While the IETF [13] is partly working towards this goal, e.g. [16,14], a fully standardized inter-domain QoS provision process is unlikely to come in the near future. A more flexible approach is provided by the third-party model, in which a third party composes an end-to-end service offer out of the single provider offerings and is responsible for the site-to-site measurement and the metric definitions. Our approach falls into this category.

2.2 Inter-domain Measurement

The National Internet Measurement Infrastructure (NIMI) [20] is a software system for building network measurement infrastructures consisting of measurement servers (called NIMI probes) and measurement configuration and control software running on separate hosts. Each probe reports to, and is configured by, a Configuration Point of Contact (CPOC), typically one per administrative domain. The probes are protected by access control checks, and communications between all NIMI components are encrypted using public key credentials. NIMI does not require a particular set of measurement tools; new tools can be included in the infrastructure by writing a wrapper for them and propagating both tool and wrapper to all NIMI probes. The IP Measurement Protocol (IPMP) [15] is a protocol based on packet probes suited to measuring packet delay at router level. This active measurement protocol operates as an echo protocol, allowing hosts in a domain to gather information from router units along the end-to-end path. However, only a limited set of measurements can be taken: packet loss, one-way packet length, round-trip time, and one-way delay.

In [3], inter-domain measurements are configured by sending XML-based documents called Specifications of Monitoring Service (SMS) to a controller located in each administrative domain crossed by the flow to be monitored. The configuration is sent to the controller in the source domain; then, following a cascade model, each domain configures its intra-domain measurement and forwards the part of the request pertaining to the rest of the path to the controller of the subsequent autonomous system. The Diameter protocol [11] is used for secure inter-domain communication. All the above systems lack a flexible mechanism to automatically deploy the requested services (i.e. a measurement technique for a particular metric) to the appropriate measurement devices in the network. Moreover, they do not provide adequate guarantees for network data privacy or against intended or unintended misuse of the system once a user has been authorized to configure it. This considerably lowers the acceptance of such a system, especially if multiple network operators are involved.
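The cascade model just described can be sketched in a few lines. The classes, message fields and naming convention below are purely illustrative and do not reproduce the actual SMS schema or controller interface of the cited work:

```python
# Illustrative sketch of the cascade configuration model: each domain's
# controller configures the intra-domain part of a monitoring request, then
# forwards the remainder of the path to the next autonomous system's
# controller. All names and structures are hypothetical.

class Controller:
    def __init__(self, domain, next_controller=None):
        self.domain = domain
        self.next = next_controller      # controller of the downstream AS
        self.configured = []             # intra-domain measurements set up

    def configure(self, flow_id, remaining_path):
        # Keep the hops belonging to this domain, forward the rest.
        local = [h for h in remaining_path if h.startswith(self.domain)]
        self.configured.append((flow_id, local))
        rest = [h for h in remaining_path if not h.startswith(self.domain)]
        if rest and self.next:           # cascade continues downstream
            self.next.configure(flow_id, rest)

# A flow crossing three domains, configured from the source domain onward.
as3 = Controller("AS3")
as2 = Controller("AS2", as3)
as1 = Controller("AS1", as2)
as1.configure("flow-42", ["AS1-r1", "AS1-r2", "AS2-r1", "AS3-r1"])
```

Each controller ends up holding only the configuration for its own routers, which is the property the cascade model is designed to provide.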
3 Traffic Control System

To validate the conformance of a service to the guarantees given in an SLA involving several domains, it is necessary to set up an inter-domain QoS measurement. In this
section we describe the traffic control system that we use to configure, deploy, and perform the measurement. The measurement results are exported to a collector and used as input for the QoS metric computation done in a component called the evaluator. Collector and evaluator are not part of the traffic control system and are therefore not further described in this section.

3.1 Network Model
The network model of the traffic control system distinguishes four different roles: Internet number authority, Traffic control service provider (TCSP), Internet service provider (ISP), and Network user. The TCSP manages traffic control (TC) services. It sets up contracts with many ISPs, which subsequently attach Traffic Processing Devices (TPDs) to some or all of their routers and enable their network management system to program and configure these devices (see Figure 1). The introduction of a TCSP helps to scale the management of our service: a network user needs only a single service registration with the TCSP instead of a separate one with each ISP. A network user must first register with the TCSP before using the traffic control system. The TCSP checks the identity of the network user, performing actions similar to those of a digital certification authority (CA), e.g. offline verification of an official identity card or online verification of a digital certificate issued by a trusted CA. To verify the claimed ownership of the IP addresses the user wants to control traffic for, the TCSP checks with Internet number authorities whether the IP addresses are indeed owned by the network user. Ownership of (ranges of) IP addresses is maintained in databases of organizations such as ARIN, RIPE NCC, etc. Upon successful user identification, access to the traffic control system is granted. The binding of a network user to the set of IP addresses owned, and its subsequent verification when using the TC service, is implemented with digital certificates signed by the TCSP. After successfully registering for the basic TC service, a network user can initiate the deployment of a specific service (e.g. QoS traffic monitoring), which is implemented on top of the TC service.

3.2 Node Architecture
The node architecture is based on a legacy Internet router with basic filtering and redirection mechanisms. The router is extended with a programmable Traffic Processing Device (TPD), as shown in Figure 1. Network user traffic can be permanently redirected to the traffic processing device based on the source or destination IP address of the transported IP packet, processed according to the service requested by the network user, and then sent further along its path. Services are composed of components arranged as directed graphs [18,6], each of which performs some well-defined packet processing. When the TPD processes a network packet, it first executes traffic control, such as e.g. monitoring, on behalf of the owner of the source IP address (first processing stage) and subsequently on behalf of the owner of the destination IP address (second processing stage). The functionality of service components is restricted as specified in Section 3.3. For instance, service components
Fig. 1. Node architecture
that match traffic by header fields, payload (or payload hashes), or timing characteristics can be installed, configured, and activated instantly.

3.3 Security Considerations
For the proposed distributed traffic control service to be accepted by ISPs, it is vital that the traffic processing devices keep the network manageable by the network operators and cannot themselves be misused for an attack. This is addressed by the core concept of the traffic control system: traffic ownership. We restrict traffic control for each network address owner to his/her own traffic, i.e. packets to/from owned IP addresses. This allows our service to ensure that traffic owned by other parties is not affected. Hence, collateral damage caused by misconfiguration or malicious behavior of users having access to such devices can be prevented. In addition, ISPs do not lose control over their network. As any misuse of such a novel service must be prevented from the very beginning to gain acceptance by network operators, we restrict it even further. We do not allow the traffic processing device to modify the source or the destination IP address of a packet. Such rerouting could easily wreak havoc (causing routing loops, interference with other routing mechanisms, transparent source spoofing, or "forwarding" of attack traffic). The TTL (time to live) field of IP packets must likewise remain unmodified, as it sets an upper bound on the network resources a packet is able to use. Furthermore, we need to prevent the service from causing amplifying, network-wide effects. The traffic control must not allow the packet rate to increase. In addition, the amount of network traffic leaving the traffic processing device must be equal to or less than¹ the amount of traffic entering it, i.e. packet size may only stay the same or become smaller. New service components for the traffic processing devices must be checked for security compliance before deployment.
The security concerns of ISPs regarding the delegation of partial network control from the operator to customers are adequately addressed, as countermeasures against the effects of misconfiguration and misuse were taken into consideration when designing the traffic control system.

¹ For services such as logging, statistics, or trigger events, we will allow a reasonable amount of additional traffic.
E. Boschi, M. Bossardt, and T. Dübendorfer
3.4 Deployment Process

The deployment process is subdivided into TCSP, ISP, TCU, and Device layers. A Traffic Control Unit (TCU) is defined as the combination of a router interface and all the TPDs that traffic from that interface can be redirected to. The TPDs can be physically separate, even located at different sites, or integrated into future routers. The complete deployment process is carried out at the management stations of the TCSP and ISPs. For each service a layer offers, a service descriptor specifies the following:

– The mapping of the service to sub-services offered by the layer below.
– The set of mandatory and optional parameters, their default values, and their mapping to parameters of sub-services.
– Restrictions that direct the placement of service logic.

For each layer, a database contains context information about the infrastructure relevant to that layer. These logical databases may be merged into two physical databases located at the TCSP and ISP management stations. Information at the TCSP layer includes the identities of contracted ISPs and properties of their networks, e.g. whether they transport transit traffic or provide a stub network, or BGP information. At the ISP layer, relevant information includes the location of the TCUs, e.g. whether a TCU is located at the border of the network or in its core. At the TCU layer, details about the pairing of TPDs and routers are kept as context information. Finally, at the Device layer, information about the make and version of TPDs and routers and their configuration interfaces must be kept. Additionally, context information can contain dynamic state information about managed objects and deployed services. Deployment logic on each layer maps the service request from the layer above to services provided by the layer below, based on information provided by the service descriptors.
Taking into account restrictions specified in the service descriptor and context information from the databases, sub-services are placed on the managed objects of the corresponding lower layer (ISPs, TCUs, TPDs and routers, respectively). The deployment process ends with the configuration of the devices that were selected to run part of the service logic.
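The layer-by-layer mapping just described can be pictured as a recursive walk over service descriptors. The descriptor contents below are illustrative (loosely modelled on the jitter service of Section 4), not the actual descriptor language; the device-level configuration step is elided:

```python
# Layers follow the paper's hierarchy; descriptor fields are assumptions.
LAYERS = ["TCSP", "ISP", "TCU", "Device"]

DESCRIPTORS = {
    # service -> (sub-service offered by the layer below, placement restriction)
    "jitter":                ("jitterOnEgressRouters", "on_bgp_path"),
    "jitterOnEgressRouters": ("jitterComponents",      "egress_only"),
    "jitterComponents":      (None,                    None),  # device config
}

def deploy(service, layer_index=0, trace=None):
    """Recursively map a service request down the layer hierarchy,
    recording which layer placed which sub-service under which restriction."""
    trace = [] if trace is None else trace
    sub_service, restriction = DESCRIPTORS[service]
    trace.append((LAYERS[layer_index], service, restriction))
    if sub_service is not None:
        deploy(sub_service, layer_index + 1, trace)
    return trace
```

Each layer only needs its own descriptors and context database; the recursion is what lets the TCSP remain ignorant of device-level details.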
4 Delay Variation Measurement Scenario

This section describes how the traffic control architecture can be used to perform end-to-end QoS measurements for SLA validation. We describe the deployment of the delay variation measurement service; other QoS measurements (e.g. one-way delay, one-way loss, round-trip time) could be performed similarly.

4.1 Scenario Description

Suppose that a corporate Internet user, e.g. a video streaming company (from now on referred to simply as "user"), wants to verify that the performance parameters agreed in an SLA with ISP1 have been met. The SLA specifies guarantees on the jitter of a video stream the user sends from point A in the network of ISP1 to point Z in
Validating Inter-domain SLAs with a Programmable Traffic Control System
the administrative domain of ISPn (cf. Figure 2). The user employs our traffic control system to measure the jitter of the video flow and verify the SLA conformance of the service he is providing.

IP delay variation, or jitter, is defined as the difference of one-way delay values for selected packets [8]. This metric can be calculated by passively measuring the one-way delay [1] of subsequent packets (e.g. of a flow) and then taking the differences. Jitter is particularly robust with respect to offsets and skews between the clocks of the measurement points; this allows jitter measurements even if the TPDs are not synchronized. As described in [8], indications of the reciprocal skew of the clocks can be derived from the measurement, and corrections are possible.

The measurement requires the collection of data at a minimum of two measurement points (in our case the TPDs) situated at the end points of the flow. If more measurement points are involved, detailed information on jitter values along the different path segments (or autonomous systems) can be obtained. The data returned by the TPDs need to be collected and post-processed to calculate the jitter. These actions are performed at a collector, to which the data recorded at the TPDs are exported, and an evaluator, where the delay variation is computed.

In an inter-domain environment it is crucial to have standard formats and protocols for exporting measurement results. The IP Flow Information eXport (IPFIX) [7] protocol is about to become a standard for exporting flow information from routers and probes, while standardized methods for packet selection and the export of per-packet information will be provided by the IETF working group on packet sampling (PSAMP) [21].

4.2 Measurement Service Deployment at the TCSP Layer

The network user requests the TCSP to deploy the jitter monitoring service in the network, selecting the service from among those that the TCSP has made available (see Figure 2).
The service and the parameters it requires are described in the service description (see Figure 3a), while the user provides the necessary parameter values in the service request shown in Figure 2. In the service request, the user identifies itself as the owner of the source address of the flow and specifies the parameters necessary for the measurement service. The parameters listed are the source and destination addresses of the flow, the addresses of the uplink interface from source A to ISP1 and of the downlink interface from ISPn to the flow destination Z, and the address of the data collector the measurement results have to be sent to. A set of optional fields allows the flow to be measured to be specified further; in this example, we also provide source and destination ports. The user can specify the start and end time of the measurement. These parameters are defined as optional in the TCSP layer service description (see Figure 3a): if no value is provided, defaults are used, i.e. the service is started immediately, as specified with the default element, and/or ended when the given flow ends, using a flow termination criterion that is hard-coded into the service component. The TCSP maps the request to the sub-service component jitterOnEgressRouters and selects appropriate ISPs according to the restriction defined in the restrictionDefinition. In our case, this restriction limits the ISPs to those on the BGP path between A and
Fig. 2. Service request and deployment
Z². The restriction yields true if the AS number taken from the context database is one that can be found on the BGP path. The TCSP obtains the BGP path using the function getBGPPath specified in the service description for the TCSP layer (see Figure 3a). The required and optional parameters needed by the jitterOnEgressRouters sub-service complete the description. The parameters can be defined as fixed values, taken from the network user's service request, or calculated using a function (e.g. getNextAS).

4.3 Measurement Service Deployment at the ISP Layer

The ISPs in turn select appropriate TCUs and TPDs to deploy and configure the jitter service components according to the service descriptor in Figure 3b. The restrictions are to deploy the service "only on egress routers on the path from previous autonomous system prevAS to next autonomous system nextAS (refID 1) and at the uplink (refID 2) and downlink interfaces (refID 3)". The sub-services, or service components, to be deployed on all selected TPDs are specified in the service descriptor: pktSelection, timestamp, IDGeneration, jitterRecordGeneration, ipfixExport. The deployed components are shown in Figure 4. Packets belonging to the flow to be measured are first selected and then timestamped. Timestamping should be done as early as possible in order to obtain the best possible accuracy for the arrival time and to avoid further variable delay effects such as variations in packet processing time. The selection function could either select all subsequent packets in a²

² In case of BGP route changes the service must be deployed again using the same descriptors.
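The restrictionDefinition at the TCSP layer can be pictured as a filter over the contracted ISPs. In this sketch, getBGPPath is mimicked by a lookup into a made-up BGP table, and the AS numbers are private-use examples:

```python
def get_bgp_path(src, dst, bgp_table):
    """Stand-in for the descriptor's getBGPPath(): the AS path
    announced for dst as seen from src's network (illustrative data)."""
    return bgp_table[(src, dst)]

def select_isps(src, dst, contracted_isps, bgp_table):
    """Apply the restriction: keep only contracted ISPs whose AS number
    lies on the BGP path between source and destination."""
    path = get_bgp_path(src, dst, bgp_table)
    return [isp for isp, asn in contracted_isps.items() if asn in path]

bgp = {("A", "Z"): [64500, 64501, 64502]}   # AS path from A to Z
isps = {"ISP1": 64500, "ISP2": 64999, "ISPn": 64502}
print(select_isps("A", "Z", isps, bgp))     # ['ISP1', 'ISPn']
```

ISP2 is contracted but off-path, so no measurement component is deployed there; this is exactly the pruning the restriction is meant to achieve.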
(a)
(b)
Fig. 3. Jitter measurement service description at the TCSP layer (a) and at the ISP layer (b)
Fig. 4. Jitter measurement service components
given time interval or be a sampling function. In our scenario, the pktSelection function selects all packets matching the parameters specified in the request: source and destination address, and source and destination port. To correlate packet arrival events, the same packets captured at different measurement points must be recognized. To recognize a packet, parts of its header and possibly payload need to be captured. To reduce the amount of measurement data, a unique packet ID can be calculated from the header and part of the content, e.g. by using a CRC or hash function [12,10,23]. This identifier must be unique during a relatively long period of the flow measurement in order to avoid duplicate packet identification. At a minimum, the timestamp and packet ID need to be exported. The packet size should be reported as well, since it influences the measurement: the delay measurement starts with the first bit of the packet sent from the source and ends with the last bit received at the destination. These data are exported with IPFIX using the solution proposed in [4], which optimizes the export of per-packet information and is therefore particularly suited to jitter measurements.

4.4 Scalability Considerations
The scaling factors that our distributed traffic control service depends on are 1) the number of service subscribers (i.e. network users), 2) the total number of ISPs deploying our service, 3) the number of service components installed per network user, and 4) the bandwidth of the network links. These scaling factors influence several parameters:

Service logic and state per TPD. Following the estimation made in [5] of the number of users and the number of services run on a TPD per user, the memory needed is a rather modest requirement.

Signalling effort. Even if we assume that each AS corresponds to one ISP and that all ISPs offer our distributed traffic control service, the signalling overhead due to the secure distribution of the small service deployment messages by the TCSP to a few thousand ISPs is not a bottleneck.

Traffic processing capacity. A hardware-based solution for our traffic processing devices is favorable. Research prototypes of FPGA-based devices exist that can
concurrently filter 8 million flows [22] on a 2.5 Gbps (OC-48) link. According to [2], faster FPGAs allow advanced packet filtering at 10 Gbps (OC-192). A more detailed scalability analysis can be found in [5].
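Pulling the measurement pieces of this section together, the following sketch (illustrative record format and hash choice, not the IPFIX encoding) derives a packet ID from invariant header fields plus a payload prefix, joins the records exported by two TPDs on that ID, and computes the delay variations; a constant clock offset between the unsynchronized TPDs cancels out in the differences:

```python
import zlib

def packet_id(header: bytes, payload: bytes, prefix_len: int = 20) -> int:
    """Compact packet ID from invariant header fields and the first bytes
    of the payload (CRC-32 here for brevity; [12,10,23] discuss hash
    functions with better collision behaviour over long measurements)."""
    return zlib.crc32(header + payload[:prefix_len])

def jitter(records_a, records_b):
    """records_*: lists of (packet_id, timestamp) exported by each TPD.
    Joins the two traces on the packet ID and returns differences of
    consecutive one-way delays, i.e. the delay variation of RFC 3393."""
    t_a, t_b = dict(records_a), dict(records_b)
    common = sorted(set(t_a) & set(t_b), key=lambda i: t_a[i])
    delays = [t_b[i] - t_a[i] for i in common]
    return [d2 - d1 for d1, d2 in zip(delays, delays[1:])]

# Three packets of the measured flow; TPD B's clock is 5 s ahead of A's.
pkts = [(b"hdr%d" % i, b"payload-bytes") for i in range(3)]
ids = [packet_id(h, p) for h, p in pkts]
rec_a = [(ids[0], 0.00), (ids[1], 0.10), (ids[2], 0.20)]
rec_b = [(ids[0], 5.03), (ids[1], 5.14), (ids[2], 5.23)]
print(jitter(rec_a, rec_b))   # approx. [0.01, -0.01]: the 5 s offset cancels
```

Note how the evaluator never needs synchronized clocks: only the differences of consecutive delays are reported, which is precisely why the text can leave the TPDs unsynchronized.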
5 Conclusions

In this paper, we have shown how guarantees given in an SLA spanning multiple autonomous systems can be validated by setting up inter-domain QoS measurements. We described a scenario where an end-to-end jitter measurement service is deployed and automatically configured by our management system using a service description language. The jitter measurement service is executed on a programmable traffic control system that is able to safely delegate partial control over traffic processing devices to network users. Our system allows network users to deploy almost arbitrary measurement logic on distributed traffic processing devices attached to routers located in different autonomous systems. The service deployment has just a few restrictions, which guarantee that neither network stability nor other users' traffic is negatively affected. The measurement service is highly modular, i.e. composed of functional components. This modularity allows components to be reused in other services and simplifies restriction compliance tests. Thus, we provide a network user with the means to place his measurement logic on the end-to-end path to be monitored, and ISPs with guarantees against collateral damage due to users' intended or unintended misbehavior.

Jitter measurement is by no means the only service that can be provided by our TCS: one-way delay, one-way loss, RTT, and flow volume are other metrics that can be measured, just to cite a few. Nor is inter-domain measurement the only application of our system: mitigation of DDoS attacks, as well as other emerging applications based on the presented TCS, has already been investigated in [9,5] with promising results. Gaining acceptance by ISPs is vital.
We think that our traffic control system offers many incentives for ISPs and at the same time a high level of security against misuse, which was a major concern with other approaches in the field of active and programmable networks and is still one of the major concerns in inter-domain data exchange or control delegation. Currently, we are implementing the measurement service components for Click router-based [18] and Field Programmable Gate Array (FPGA)-based [19] traffic processing devices.
References

1. Almes, G., Kalidindi, S., Zekauskas, M.: RFC 2679, A One-way Delay Metric for IPPM (September 1999), ftp://ftp.rfc-editor.org/in-notes/rfc2679.txt
2. Attig, M., Lockwood, J.W.: A Framework for Rule Processing in Reconfigurable Network Systems. In: Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), Napa, USA (April 2005)
3. Boschi, E., Denazis, S., Zseby, T.: A Measurement Infrastructure for Inter-domain SLA Validation. Elsevier Journal of Computer Communications: Special Issue on End-to-end QoS Provision Advances (to appear)
4. Boschi, E., Mark, L.: Use of IPFIX for Export of Per-Packet Information, Internet-draft, work in progress (2005)
5. Bossardt, M., Dübendorfer, T., Plattner, B.: Enhanced Internet Security by a Distributed Traffic Control Service Based on Traffic Ownership. Elsevier Journal of Network and Computer Applications: Special Issue on DDoS and Intrusion Detection (to appear, 2005)
6. Bossardt, M., Hoog Antink, R., Moser, A., Plattner, B.: Chameleon: Realizing Automatic Service Composition for Extensible Active Routers. In: Wakamiya, N., Solarski, M., Sterbenz, J.P.G. (eds.) IWAN 2003. LNCS, vol. 2982. Springer, Heidelberg (2004)
7. Claise, B., Bryant, S., Sadasivan, G., Leinen, S., Dietz, T.: IPFIX Protocol Specification, Internet-draft, work in progress (2005)
8. Demichelis, C., Chimento, P.: RFC 3393, IP Packet Delay Variation (November 2002), ftp://ftp.rfc-editor.org/in-notes/rfc3393.txt
9. Dübendorfer, T., Bossardt, M., Plattner, B.: Adaptive Distributed Traffic Control Service for DDoS Attack Mitigation. In: IEEE Proceedings of IPDPS, International Workshop on Security in Systems and Networks (SSN) (2005)
10. Duffield, N., Grossglauser, M.: Trajectory Sampling for Direct Traffic Observation. In: ACM SIGCOMM 2000 (2000)
11. Calhoun, P., et al.: RFC 3588, Diameter Base Protocol (September 2003), ftp://ftp.rfc-editor.org/in-notes/rfc3588.txt
12. Graham, I.D., Donnelly, S.F., Martin, S., Martens, J., Cleary, J.G.: Nonintrusive and Accurate Measurement of Unidirectional Delay and Delay Variation on the Internet. In: INET 1998 Proceedings (1998)
13. Internet Engineering Task Force, http://www.ietf.org/
14. IP Performance Metrics (IPPM), http://www.ietf.org/html.charters/ippm-charter.html
15. IPMP homepage, http://watt.nlanr.net/AMP/IPMP/
16. IP Flow Information Export (IPFIX), http://www.ietf.org/html.charters/ipfix-charter.html
17. Jacobs, P., Davie, B.: Technical Challenges in the Delivery of Interprovider QoS. IEEE Communications Magazine, 112–118 (June 2005)
18. Kohler, E., Morris, R., Chen, B., Jannotti, J., Kaashoek, M.F.: The Click Modular Router. ACM Transactions on Computer Systems 18(3), 263–297 (2000)
19. Lockwood, J., Naufel, N., Turner, J., Taylor, D.: Reprogrammable Network Packet Processing on the Field Programmable Port Extender (FPX). In: Proceedings of the ACM International Symposium on Field Programmable Gate Arrays (FPGA 2001) (February 2001)
20. NIMI National Internet Measurement Infrastructure, http://www.ncne.nlanr.net/nimi/
21. Packet SAMPling (PSAMP), http://www.ietf.org/html.charters/psamp-charter.html
22. Schuehler, D.V., Lockwood, J.W.: A Modular System for FPGA-based TCP Flow Processing in High-Speed Networks. In: Becker, J., Platzner, M., Vernalde, S. (eds.) FPL 2004. LNCS, vol. 3203, pp. 301–310. Springer, Heidelberg (2004)
23. Zseby, T., Zander, S., Carle, G.: Evaluation of Building Blocks for Passive One-way-delay Measurements. In: Proceedings of the Passive and Active Measurement Workshop (PAM) (2001)
Cross-Layer Peer-to-Peer Traffic Identification and Optimization Based on Active Networking*

I. Dedinski¹, H. De Meer¹, L. Han², L. Mathy³, D.P. Pezaros³, J.S. Sventek², and X.Y. Zhan²

¹ Department of Mathematics and Computer Science, University of Passau, 94032 Passau, Germany {dedinski,demeer}@fmi.uni-passau.de
² Department of Computing Science, University of Glasgow, Glasgow G12 8QQ, Scotland, UK {lxhan,joe,xyzhan}@dcs.gla.ac.uk
³ Computing Department, Lancaster University, Lancaster LA1 4WA, UK {laurent,dp}@comp.lancs.ac.uk
Abstract. P2P applications appear to be emerging as the ultimate killer applications due to their ability to construct highly dynamic overlay topologies with rapidly varying and unpredictable traffic dynamics, which can constitute a serious challenge even for significantly over-provisioned IP networks. As a result, ISPs are facing new, severe network management problems that are not guaranteed to be addressed by statically deployed network engineering mechanisms. As a first step toward a more complete solution to these problems, this paper proposes a P2P measurement, identification and optimisation architecture, designed to cope with the dynamicity and unpredictability of existing, well-known and future, unknown P2P systems. The purpose of this architecture is to provide ISPs with an effective and scalable approach to controlling and optimising the traffic produced by P2P applications in their networks. This can be achieved through a combination of different application- and network-level programmable techniques, leading to a cross-layer identification and optimisation process. These techniques can be applied using Active Networking platforms, which are able to quickly and easily deploy architectural components on demand. This flexibility of the optimisation architecture is essential to address the rapid development of new P2P protocols and the variation of known protocols.
1 Introduction and Motivation

P2P overlays do not adopt any notion of centralised management, nor do they employ the traditional static client/server paradigm. Most of the peers in a P2P network are symmetric and can allow their resources to be shared amongst other peers to deliver a common service [Ora01]. Consequently, within a file sharing P2P overlay every peer*

* This work has been supported by the grant EPSRC GR/S69009/01 and the EuroNGI NoE.
D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 13–27, 2009. © IFIP International Federation for Information Processing 2009
I. Dedinski et al.
can simultaneously act as a server and a client, fetching and providing data objects that range from a few megabytes to approximately one gigabyte in size [GDS03]. By exploiting a user-configurable, arbitrary amount of peers' end-system resources, a P2P overlay can perturb the Internet in new ways, rendering random nodes and portions of the network highly loaded for unpredictable amounts of time. Internet Service Providers (ISPs) can hence experience rapidly varying traffic patterns at non-statically-provisioned portions of their networks, which can adversely impact the network performance experienced by all traffic flows. For example, a recent study that analysed traffic collected from a 1 Gb/s link connecting an ISP backbone to several ADSL areas revealed that at least 49% of the overall traffic is due to P2P applications, as identified by well-known transport port numbers [AG03]. In addition, it has also been reported that the proportion of P2P traffic on Tier-1 networks has been steady, if not increasing, over the last two years [KBB04]. This, coupled with the dynamicity of P2P traffic, can impact not only the peering relationships among ISPs, but also the volume-based charging imposed by upstream providers.

Traditional tools for network management support quite static forms of network and traffic engineering, usually based on offline post-analysis of monitored data and estimated approximations of path, traffic and demand matrices [GR02]. However, the rapidly varying traffic patterns generated by P2P flows are not addressed by such tools, since P2P requests are not guaranteed to be addressed to a few popular servers, as is the case in the client-server environment [SW02]. Rather, the main dynamic of P2P systems is the advertisement of new data objects, which can appear at arbitrary peers [GDS03]; hence operators are in need of more dynamic (real-time) mechanisms to provide fine-grained control over network-wide P2P traffic flows.
A longer-term perspective on P2P dynamics is the constant evolution of P2P protocols and the creation of new P2P applications, which are rapidly spreading over the Internet. The P2P phenomenon is still relatively recent and does not conform to any standards or rules regarding program interfaces, connection behaviour, etc. The mutation of P2P protocols, as well as the appearance of new protocols, makes tracking of P2P traffic steadily more complicated and static planning of network resources less successful. The P2P community is averse to ISP control of any kind and invents protocols that attempt to prohibit and avoid traffic identification, shaping and blocking. An ISP, therefore, needs to track actual P2P development and to adapt to new techniques and protocols quickly.

The store-compute-and-forward model of operation facilitated by network node programmability is particularly suitable for such dynamic, non-standardised and highly unpredictable systems. The additional intelligence and control integrated with the network's main forwarding operation can be exploited to provide dynamic identification of P2P traffic, and consequently network performance optimisation at the onset of P2P activity. Such traffic control enforcement can also employ application-aware programmable mechanisms that do not simply shape and block P2P traffic, but favour well-behaved P2P systems and optimise the overall resource utilisation of the network. A comparative study of different programmable network architectures can be found in [CMK99].

This paper focuses on the investigation and deployment of a synergistic network-layer and application-aware programmable framework aimed at measuring, managing, and optimising the performance of networks that support P2P applications. Section 2
discusses existing P2P identification and optimisation approaches and their limitations. Section 3 describes the architectural properties of an always-on programmable system that exploits network- and application-level knowledge to synergistically detect the onset of P2P activity and employ traffic optimisation algorithms over both the overlay and the physical network topologies. Preliminary analysis has initially focused on the network-level identification of P2P flows based on their internal traffic behaviour, and a comparison between the performance characteristics of P2P and non-P2P flows is presented in Section 4. In addition, wavelet analysis results that demonstrate the existence of discriminating information within P2P traffic behaviour are presented. Section 5 concludes the paper and outlines future work directions.
2 P2P Traffic Identification and Optimization Challenges

Currently, there are three widely used approaches for passive P2P traffic identification: application signatures, transport-layer ports, and heuristic network/transport-layer P2P pattern recognition.

The application signature approach [SSW04] searches for protocol-specific patterns inside packet payloads. The simplicity of this method is obvious, but it also introduces some important problems. First, it cannot be adapted automatically to unknown, recently introduced P2P protocols; enhancements to existing protocols, as well as the appearance of new protocols, occur frequently. Second, application-level pattern search in each transport packet creates a higher load than other network- and transport-layer-based approaches. Finally, some P2P protocols defeat payload inspection by using encryption.

Transport-layer port identification [SW02] solves the last two problems. It is easy to use, does not produce much load at the measurement nodes, and does not rely on inspecting application payloads. This method, however, suffers from an inability to adapt to modified or recently introduced protocols. Furthermore, many P2P applications have begun using variable or non-P2P port numbers (HTTP, FTP, etc.) to deliberately avoid port-based identification and allow P2P communication through firewalls. As a result, port-based P2P identification greatly underestimates the actual P2P traffic volume [KBB03].

Heuristic-based network/transport-layer approaches [KBB03, KBF04] use simple network/transport-layer patterns, e.g. the simultaneous usage of UDP and TCP ports and the packet size distribution of a P2P flow between two peers. This method performs well for existing P2P protocols and can even be used to discover unknown ones. The problem, however, is that it is straightforward to construct a new P2P protocol that effectively avoids the proposed heuristics.
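The first two approaches can be contrasted in a few lines; the signature strings and port numbers below are illustrative examples, not a production rule set:

```python
# Illustrative examples only: real classifiers use much larger rule sets.
SIGNATURES = {b"GNUTELLA CONNECT": "gnutella",
              b"BitTorrent protocol": "bittorrent"}
P2P_PORTS = {6346, 6881, 4662}   # historic Gnutella / BitTorrent / eDonkey ports

def classify_by_payload(payload: bytes):
    """Application-signature approach: scan the payload for known patterns.
    Accurate when the pattern matches, but blind to encrypted or
    unknown protocols, and costly on every packet."""
    for pattern, proto in SIGNATURES.items():
        if pattern in payload:
            return proto
    return None

def classify_by_port(src_port: int, dst_port: int):
    """Port-based approach: cheap and payload-agnostic, but blind to
    P2P applications using HTTP/FTP or randomized ports."""
    return "p2p" if {src_port, dst_port} & P2P_PORTS else None

print(classify_by_payload(b"GNUTELLA CONNECT/0.6\r\n"))  # gnutella
print(classify_by_port(50000, 6881))                     # p2p
```

A BitTorrent flow on port 80 illustrates the blind spots: classify_by_port misses it entirely, while classify_by_payload catches it only as long as the handshake is unencrypted.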
Recently, it has been suggested that the observation of host behaviour and the identification of social, functional and application-level patterns therein can lead to accurate traffic classification that obviates the aforementioned concerns [KPF05]. Active P2P traffic identification approaches (active probing) have been used to traverse and gather topological information about different types of P2P networks [Gnu, Tuts04, KaZ]. These approaches use probing peers, called crawlers, to connect to a desired P2P network. The crawlers then issue search requests and collect the IP addresses of the answering peers. By collecting these addresses, one can
reconstruct the overlay topology of the P2P network. One obvious advantage of constructing such a topology is that subsequent P2P traffic measurement and identification only needs to consider flows coming from or directed to IP addresses collected by the crawler. This improves identification performance considerably and is an example of how application-aware active probing can support passive P2P identification approaches. On the other hand, active probing has its limitations. In the eDonkey network, for example, only the eDonkey superpeers can be discovered efficiently. Identifying eDonkey clients can then be done efficiently with passive identification approaches that track flows coming from or directed to eDonkey superpeers. A combination of application-aware active probing and network-level passive identification techniques is therefore a promising strategy.

As already mentioned, network-layer control techniques do not consider preserving or even improving the functionality and performance of the P2P network. This goal can be achieved by using application-layer optimisation approaches [ADD04, Fry99, DeMeer03, GDS03, LHK04, LBB02, THH04], which all rely on P2P traffic redirection, shaping or proxy caching. These approaches work well for the large number of P2P protocols that are still widely open (not encrypted, and amenable to reverse engineering), so that redirection and shaping are possible. Application-aware programmable mechanisms can transparently provide micro-services such as application-level routing and application-specific resource discovery and differentiation. An ISP-controlled overlay mesh can be established in this way to manage the associated (P2P) traffic flows, through an understanding of application-specific data transfers, without knowing the details of the underlying physical network. However, application-level approaches are strongly dependent on application-specific semantics.
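Returning to active probing: a crawler's traversal of the overlay amounts to a breadth-first search in which querying a peer stands in for issuing search requests and recording the addresses of the answering peers. A toy sketch over a made-up topology:

```python
from collections import deque

def crawl(seed_peers, neighbours, max_peers=1000):
    """Breadth-first crawl of a P2P overlay. `neighbours(peer)` stands in
    for issuing search requests to a peer and collecting the addresses
    of the peers that answer; here it is a lookup into a toy topology."""
    seen, queue = set(seed_peers), deque(seed_peers)
    edges = []
    while queue and len(seen) < max_peers:
        peer = queue.popleft()
        for other in neighbours(peer):
            edges.append((peer, other))       # overlay link observed
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen, edges  # addresses to watch passively + overlay topology

topology = {"10.0.0.1": ["10.0.0.2", "10.0.0.3"],
            "10.0.0.2": ["10.0.0.3"],
            "10.0.0.3": []}
peers, overlay = crawl(["10.0.0.1"], lambda p: topology.get(p, []))
```

The returned address set is exactly what the text proposes feeding to the passive identifiers, so that they only inspect flows touching known peers.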
A programmable networking infrastructure that enables the deployment of specific application-aware optimiser and identification components for new, recently reverse-engineered P2P protocols is therefore required. On the other hand, network-layer control techniques (shaping and blocking) do not depend on application protocol internals. The combined use of network-level and application-level optimisation techniques to enforce control mechanisms that optimise the overall network operation opens promising new grounds for research and, at the same time, raises many integration issues. A straightforward example of such synergy is to force P2P clients to use a certain application-layer optimisation service provided by the ISP by blocking and shaping non-conforming P2P traffic at the network layer [MTT03].
3 Architectural and Experimental Design

This paper describes an always-on Monitoring, Measurement and Control (MMC) architecture for P2P identification and network optimisation, deployed on programmable nodes at strategic points in the network. The choice of these points depends on numerous factors such as programmable node performance, network topology and load. Instead of statically specifying strategic points, MMC relies on the dynamic instantiation of ALAN proxylets [Fry99] to allow the on-demand installation and removal of components at different programmable network nodes. Such an approach also
allows the fast deployment of application-specific modules for P2P protocols that have been newly reverse-engineered. The ALAN infrastructure operates synergistically with the LARA++ active router framework, essentially offering an additional application-specific programmable layer. LARA++ is a software implementation of a programmable router that augments the functionality of a conventional router/host by exposing a programmable interface, allowing active programs (referred to as active components) to provide network-level services on any packet-based network [SFS01].

Figure 1 shows the coarse structure of the proposed architecture. It is divided into three processing planes spread across the network and application layers. These planes synergistically address the identification and optimisation challenges presented in Section 2. Additionally, a communication module is used to exchange locally collected data among programmable nodes. Its purpose is to enable global identification and optimisation of P2P traffic. The communication module can, for example, be implemented in a centralised manner, with programmable nodes exchanging information through a central database server. Other communication approaches are possible, such as the construction of a decentralised overlay of programmable nodes. This paper does not rely on any particular design of the communication module.
Fig. 1. Architecture – Two Layer Programmability
Measurement and Identification Planes

The measurement plane takes as input the traffic passing through its network node. It captures and aggregates relevant microflow patterns used for traffic clustering. A microflow can be easily identified at the network layer by a 5-tuple comprising the source and destination IP addresses, the transport protocol, and the transport-layer source and destination port numbers, if not encrypted [CBP95, Cla94]. In contrast to common passive flow measurement systems that only record aggregate flow indicators [Bro97, NFL], the flow-based classification and measurement employed by this architecture needs to keep per-packet state in order to compute performance properties such as packet inter-arrival time and packet size distributions. Such state needs to be captured continuously but at the same time reduced to a minimum by periodically substituting per-packet information with aggregate statistics. Packet timestamps and lengths kept for each active flow are periodically aggregated by the pattern detection measurement modules into distribution summary statistics. The raw indicators are subsequently
I. Dedinski et al.
removed from the flow table. Further state reduction through sampling is considered, with systematic count-based sampling schemes being appropriate candidates due to the simplicity of the sampling algorithm, but also due to their ability to capture the traffic's burstiness and produce accurate approximations of the parent population for both single- and multi-point performance metrics [CPB93, Zse05].
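As a concrete illustration of the state-reduction scheme described above, the following sketch (with hypothetical names, not the authors' implementation) keeps a flow table keyed by the 5-tuple, applies systematic count-based sampling, and periodically replaces raw per-packet state with distribution summary statistics:

```python
from collections import defaultdict
from statistics import mean, pstdev

SAMPLING_INTERVAL = 10  # systematic count-based sampling: keep every 10th packet

class FlowRecord:
    def __init__(self):
        self.seen = 0            # packets observed (drives the systematic sampler)
        self.timestamps = []     # raw per-packet state (periodically reduced)
        self.lengths = []
        self.summary = {}        # aggregate statistics replacing raw state

    def add(self, ts, length):
        self.seen += 1
        if self.seen % SAMPLING_INTERVAL == 0:   # count-based sampling decision
            self.timestamps.append(ts)
            self.lengths.append(length)

    def aggregate(self):
        """Replace raw per-packet state with distribution summaries."""
        if len(self.timestamps) >= 2:
            iats = [b - a for a, b in zip(self.timestamps, self.timestamps[1:])]
            self.summary = {
                "iat_mean": mean(iats), "iat_std": pstdev(iats),
                "len_mean": mean(self.lengths), "len_std": pstdev(self.lengths),
                "pkts": self.seen,
            }
        self.timestamps.clear()   # raw indicators removed from the flow table
        self.lengths.clear()

flows = defaultdict(FlowRecord)

def on_packet(src, dst, proto, sport, dport, ts, length):
    flows[(src, dst, proto, sport, dport)].add(ts, length)  # 5-tuple key
```

A periodic timer would call `aggregate()` on each active record, leaving only the summaries for the pattern detectors to consume.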
(a)
(b)
Fig. 2. (a) Identification-Measurement Plane and (b) Optimization Plane
Figure 2(a) presents a more detailed view of the components of the identification and measurement planes. Plug-in measurement modules, implemented as ALAN proxylets, act as microflow pattern detectors and periodically compute properties of the identified microflows in order to classify them into similarity classes, according to some per-packet and/or inter-packet characteristics. Estimates of the packet size distribution, for example, can be used to distinguish bulk from interactive and signalling TCP flows. Although this can prove more challenging than exploiting simple heuristics, interactive flows' dependence on user behaviour can be revealed from the periodicity of their time series as well as from their distribution's heavy tails. The main dynamic behaviour of the measurement process lies in the ability to load new instances of measurement-based pattern detector ALAN proxylets on demand to compute additional metrics. This also influences the operation of the traffic monitor and shaper, which can be dynamically configured to record and deliver additional per-packet information to the corresponding microflow pattern detector. The microflow patterns collected at the measurement layer are stored in a flow database. The microflow classifier component, which is located at the identification plane, searches for correlations between microflows passing through this access point node. The microflows are clustered into similarity classes according to the patterns collected at the measurement plane. Supervised and unsupervised adaptive techniques for flow classification can be applied to discover similarity classes; a comparative study of classification (clustering) methods is presented in [Zai97]. Unsupervised techniques have the advantage of detecting new, unknown traffic classes. The addresses of all source and destination hosts producing traffic in the same similarity class are collected in the database.
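As an illustration of the unsupervised clustering step, a plain k-means over per-flow summary features could group flows into similarity classes. This is only a sketch under assumed feature choices (mean packet size, mean inter-arrival time); the architecture does not prescribe a particular algorithm:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: cluster flow feature vectors into similarity classes."""
    rnd = random.Random(seed)
    centroids = rnd.sample(points, k)
    for _ in range(iters):
        # Assign each flow to its nearest centroid (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # Recompute centroids as cluster means (keep old centroid if empty).
        centroids = [
            tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Each flow summarised as (mean packet size, mean inter-arrival time):
flow_features = [(1480, 0.002), (1490, 0.003), (90, 0.5), (85, 0.6)]
centroids, classes = kmeans(flow_features, k=2)
```

Here the bulk-transfer flows (large packets, small inter-arrival times) separate cleanly from the interactive ones, matching the bulk-vs-interactive distinction discussed above.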
With this information, a topology can be constructed containing all hosts that produce traffic belonging to that similarity class. The micro and
macroflow information (patterns, similarity classes) is stored with some history, which allows the correlation of flows that are not necessarily passing through the instrumented node at the same time. The macroflow classifier uses the topology information to distinguish between P2P-like traffic and non-P2P traffic. For example, in P2P systems the participating nodes mostly act as both client and server; a P2P topology collected by the macroflow aggregator would thus contain incoming and outgoing flows for most of the nodes. On the other hand, a topology collected for the HTTP protocol would have a two-level hierarchical structure, with each node uniquely identifiable as a server or a client, and a DNS topology would have a multi-level hierarchical structure. Knowledge of the topology is a powerful traffic identification criterion, which can help to identify even unknown traffic. The macroflow aggregator exports its macroflow knowledge to the other programmable nodes via the communication module. Conversely, the macroflow classifier uses macroflow information coming from the communication module to construct a local view of a certain traffic topology and to decide whether it is P2P-like. Finally, active crawler ALAN proxylets are dynamically loaded at the application layer to traverse and discover the overlay networks of reverse-engineered P2P protocols. The results of the crawlers are stored in the flow database and are compared with the results of the identification components at the network layer in order to improve and verify the performance of the latter.

Optimization Plane

Based on the information collected and produced by the measurement and identification planes, optimization and manipulation actions regarding identified P2P protocols can be taken at the optimisation plane (Figure 2(b)). The blocking and shaping component for unknown P2P traffic initiates network-level actions without semantic knowledge about a P2P protocol.
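The client/server role-symmetry criterion used by the macroflow classifier above can be made concrete with a small sketch (hypothetical code, not the authors' implementation): within one similarity class, count the fraction of hosts that act as both traffic source and destination.

```python
def p2p_likelihood(flow_pairs):
    """flow_pairs: iterable of (src_host, dst_host) pairs within one similarity
    class. Returns the fraction of hosts acting as both client (source) and
    server (destination) -- high for P2P overlays, low for client/server
    protocols such as HTTP, where each host has exactly one role."""
    sources = {s for s, _ in flow_pairs}
    dests = {d for _, d in flow_pairs}
    hosts = sources | dests
    dual_role = sources & dests
    return len(dual_role) / len(hosts) if hosts else 0.0

# P2P-like topology: every peer both uploads and downloads.
p2p = [("a", "b"), ("b", "c"), ("c", "a")]
# HTTP-like topology: clients only initiate, the server only responds.
web = [("c1", "srv"), ("c2", "srv"), ("c3", "srv")]
print(p2p_likelihood(p2p))   # 1.0
print(p2p_likelihood(web))   # 0.0
```

A threshold on this fraction would be one simple way to flag a similarity class as P2P-like, even for an unknown protocol.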
These network-level actions can include priority-based routing, complete blocking, or bandwidth limits for certain traffic flows (similarity classes), and may have regional or global character. The P2P optimizer components do not block or shape P2P traffic, but instead redirect it, thus avoiding network congestion while at the same time improving P2P network performance. Different application-level optimization techniques are applicable to different P2P protocols, so the application optimizer component has to be adjusted to a predefined set of supported applications. Some of the application-level techniques need to install blocking or shaping strategy components at the network layer to prevent P2P traffic from bypassing the optimizing entities (P2P caches, proxies).

Experimental Design

A critical aspect of the methodology described above is to determine the network-level characteristics of P2P application traffic that are relevant to the different microflow pattern detectors. An isolated network tracing environment has been constructed to capture traces of synthetic traffic from a number of P2P applications. An eDonkey-specific setup using this environment is shown in Figure 3 below.
Fig. 3. Experimental Packet Capture Environment
Isolated experimental configurations have been set up at three participating sites (the universities of Glasgow, Passau, and Lancaster), and initial tests using the eDonkey protocol have been conducted. No other network traffic was active on the sub-nets behind the isolation routers, although the traced traffic was subject to variable delays due to congestion in the campus intra-networks and the Internet. The analysis discussed below concerns a single content-providing peer interacting with a single downloading peer at a remote site. After a short search for the content at the superpeer, the downloading peer initiated the download of a 600 MB file from the providing peer. Full packet traces were recorded at the edges of each isolated configuration by GPS-synchronised GigEMON passive monitoring systems, which are engineered to perform lossless, full-packet capture of traffic in both directions to disk storage [End].

Pattern Detection Methodology

The initial approach to detecting network-level packet patterns is to look for specific temporal behaviours associated with the packets in a P2P micro-flow. For time-dependent processes that are stationary, the traditional approach is to perform a Fourier analysis of the signal, thus converting the large number of experimental data points to a small, bounded number of coefficients for the Fourier basis functions in the Fourier expansion of the signal. Due to the time-varying nature of the Internet, one does not expect the temporal behaviour of a micro-flow to be stationary. Wavelet analysis techniques [Chu92] have been developed to address temporal behaviour that is non-stationary. Wavelet techniques exhibit good time resolution in the high-frequency domain (implying good localization in time) as well as good frequency resolution in the low-frequency domain [AV98]. For a non-stationary signal, wavelet analysis can determine sharp transitions simultaneously in both frequency and time domains.
This property of wavelet analysis makes it possible to detect hidden but highly regular traffic patterns in packet traces. The result of wavelet analysis is a small, bounded number of coefficients for scaling and wavelet basis functions.
Initially, the collected eDonkey traces have been subjected to wavelet analysis to understand whether such analysis provides the ability to distinguish eDonkey traffic from non-eDonkey traffic. To that end, analysis results for an FTP session transferring the same file in the experimental environment are provided for comparison with the eDonkey analysis results in the following discussion.
4 Preliminary Analysis and Results

Before discussing the results of wavelet analysis, it is informative to first look at various statistical characterizations of the measured traces. The initial focus has been on packet inter-arrival time and packet size distributions. Each flow consists of control packets used by the applications to locate and initiate data transfers, and data packets that correspond to the actual download of the requested content. The packet patterns for these two different sub-flows are expected to exhibit significantly different characteristics, since the control/signalling traffic is an RPC-style interaction at the application level, while the data traffic is more characteristic of an asynchronous, reliable flow from the server to the client. Therefore, signalling and data traffic are considered separately below.

Inter-Arrival Time Distributions

Figures 4 and 5 below show the probability/frequency distribution functions for the inter-arrival time distributions of the data and signalling sub-flows, respectively. Two observations are immediately obvious from these figures:

• The data streams exhibit resonances at the same values of packet inter-arrival time; even though the resonance at 10^-4 seconds for the p2p data sub-flow is more pronounced than for the ftp flow, it is not sufficiently significant to confidently discriminate between eDonkey and FTP based upon this evidence alone.

• The signalling sub-streams, on the other hand, exhibit significant differences in their inter-arrival time spectra, especially for large inter-arrival times. If these differences persist over different congestion regimes of the intervening networks (to be established experimentally in future work), then it is feasible that high-confidence discrimination can be achieved with appropriate pattern matching filters.
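For reference, distributions like those in Figures 4 and 5 can be approximated from raw packet timestamps using logarithmic binning, which suits the wide dynamic range of inter-arrival times (resonances near 10^-4 s versus gaps of seconds). This is a hypothetical sketch, not the tooling used to produce the figures:

```python
import math
from collections import Counter

def iat_histogram(timestamps, bins_per_decade=10):
    """Log-binned probability distribution of packet inter-arrival times.
    Returns {log10(bin center): probability}; log bins give even coverage
    across the several orders of magnitude an IAT spectrum spans."""
    iats = [b - a for a, b in zip(timestamps, timestamps[1:]) if b > a]
    counts = Counter(round(math.log10(x) * bins_per_decade) for x in iats)
    total = sum(counts.values())
    return {k / bins_per_decade: v / total for k, v in sorted(counts.items())}

ts = [i * 1e-4 for i in range(1000)]   # a steady 10 kHz packet train
hist = iat_histogram(ts)               # single resonance at 10^-4 s
```

A real capture would show several such resonances plus a heavy tail, which is exactly the structure the pattern matching filters would look for.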
(a)
(b)
Fig. 4. Probability distribution as a function of packet inter-arrival time (in seconds) for (a) eDonkey data and (b) ftp data flows
(a)
(b)
Fig. 5. Frequency distribution as a function of packet inter-arrival time (in seconds) for (a) eDonkey control and (b) ftp control flows
Packet Size Distributions

Figures 6 and 7 below show the frequency distribution functions for the packet size distributions of the data and signalling sub-flows, respectively. Two observations are immediately obvious from these figures:

• The data streams exhibit strong resonances at essentially the same values of packet size; note that the p2p data sub-flow exhibits a small number of packets interspersed between the two strong resonances. It is not clear whether the presence of these intermediate packet size values is sufficiently significant to confidently discriminate between eDonkey and FTP based upon this evidence alone.

• The signalling sub-streams, on the other hand, exhibit differences in their packet size spectra, especially for large packet sizes. If these differences persist over different congestion regimes of the intervening networks (to be established experimentally in future work), then it is feasible that high-confidence discrimination can be achieved with appropriate pattern matching filters.
(a)
(b)
Fig. 6. Frequency distribution as a function of packet size (in bytes) for (a) eDonkey data and (b) ftp data flows
(a)
(b)
Fig. 7. Frequency distribution as a function of packet size (in bytes) for (a) eDonkey control and (b) ftp control flows
Note that analysis by others has yielded similar insights [Nla]. The results for the signalling sub-stream, if they hold across congestion regimes, augur well for developing pattern matching filters for detection of control sub-flows even when the packets are encrypted, as most encryption schemes are packet-size preserving, modulo padding introduced to make the packet size a multiple of 4 or 8 bytes.

Wavelet Analysis

Despite the fact that the distributions for the p2p and ftp data flows shown in Figures 4 and 6 do not show significant differences, scatter plots of these traces with respect to both attributes do show significantly more variation in the p2p trace than in the ftp trace (not shown due to space reasons). This indicates that there is scope for discriminating between such traces. The first attempt at such discrimination has been through the use of wavelet analysis. Only aspects of wavelet analysis that are critical to this application are discussed below; interested readers are urged to consult [Dau92] for more details. In terms of wavelet theory, a signal $Y_0(t)$ (e.g. bursty traffic) can be represented as:

$$Y_0(t) = Y_J(t) + \sum_{j=1}^{J} \mathrm{detail}_j\{Y(t)\} = \sum_{k} a_Y^J(k)\,\varphi_{J,k}(t) + \sum_{j=1}^{J} \sum_{k} d_Y^j(k)\,\psi_{j,k}(t)$$

where $k$ and $j$ denote time and frequency indices, respectively. The $a_Y^J(k)$ are scaling (approximation) coefficients, and the $d_Y^j(k)$ are wavelet (detail) coefficients. A scaling function $\varphi_{J,k}$ with low-pass filter properties is used to capture an approximation signal (low-frequency signal), and a wavelet function $\psi_{j,k}$ with band-pass filter properties is used to extract the detailed information (high-frequency signal).
Wavelet signal analysis consists of three primary phases:

• The analysis phase decomposes the data into a hierarchy of component signals by iteration. Starting with a signal S, the first step of the transform decomposes S into two sets of coefficients, namely approximation coefficients $a_Y^1(k)$ and detail coefficients $d_Y^1(k)$. The input S is convolved with the low-pass filter to yield the approximation coefficients; the detail coefficients are obtained by convolving S with the band-pass filter. This procedure is followed by down-sampling by a factor of 2, and the process is then applied to the down-sampled signal. At each iteration of this phase, the input is a signal of length N, and the output is a collection of two or more derived signals which are all of length N/2. The result is the approximation signal at the highest level J and the collection of detail coefficients at each level until the end of the decomposition.

• The signal processing phase compresses, de-noises and detects the underlying signal by modifying some wavelet coefficient values, and then reconstructs the signal using these altered wavelet coefficients.

• The synthesis phase is the inverse of the analysis phase.
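The analysis phase can be illustrated with the Haar filter pair, the simplest wavelet choice. The paper does not state which wavelet family was used, so this is purely illustrative:

```python
import math

def haar_step(signal):
    """One analysis-phase iteration with the Haar filter pair:
    convolve with the low-pass (sum) and band-pass (difference) filters,
    then down-sample by 2 -- input length N, each output length N/2."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def dwt(signal, levels):
    """Iterate the analysis phase: collect the detail coefficients at each
    level plus the final (coarsest) approximation signal."""
    details = []
    for _ in range(levels):
        signal, d = haar_step(signal)
        details.append(d)
    return signal, details

# Decompose a length-8 signal to level 3: details of length 4, 2, 1 and
# a single approximation coefficient remain.
approx, details = dwt([4.0, 4.0, 2.0, 2.0, 1.0, 1.0, 0.0, 0.0], levels=3)
```

Because the filter pair is orthonormal, the energy of the coefficients equals the energy of the input, which is a handy sanity check on the decomposition.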
The wavelet coefficients are key to matching the spike pattern of a signal. By focusing on the probability density function (PDF) of the wavelet coefficients, one can determine the algorithm for selecting a suitable threshold and dropping non-significant coefficients when reconstructing the signal. Tools to perform this analysis on micro-flow inter-arrival time distributions obtained from the experimental environment described in section 3 above have been developed and validated. These tools have been applied to the data sub-flow distributions shown in Figure 4 above. Exploration of the information resulting from these analyses is in its beginning stages; the initial focus has been on measures of the significance of wavelet coefficients at each decomposition level, denoted by the index j. Table 1 below shows the significance for each decomposition level in the analysis of the p2p and ftp data sub-flow traces.

Table 1. Level-specific coefficient significance from wavelet decomposition
        j=1       j=2       j=3       j=4       j=5       j=6
p2p   -5.3490   -4.5921   -4.0269   -5.0712    6.7115   -1.1379
ftp   -9.6785   -5.3787    4.5109   -4.5486    4.8186   -6.0644
A particularly useful way to visualize the relationships between decompositions for different applications is to plot these values on a radar diagram. Such a diagram is shown in Figure 8 below. From this diagram, it is immediately apparent that the p2p trace is significantly different from the ftp trace at levels 1, 3 and 6. While Figure 4 did not provide sufficient information to discriminate between the two data flows, it is apparent that sufficient discriminating information is contained in the traces.
Fig. 8. Wavelet coefficient significance per level for eDonkey and ftp data sub-flows
Real-Time Considerations

A micro-flow detector must detect patterns in real-time to feed the optimization layer in the architecture described in section 3. It is important to estimate the ability of a detector based upon wavelet analysis to function in real-time. In order to perform a wavelet analysis to level N, a sufficient number of packets must be accumulated in the inter-arrival time histogram for a micro-flow such that 2^N bins have non-zero frequency counts. For example, if analysis to level 3 is sufficient to discriminate the flows of interest, then 8 non-zero bins are required. In addition to considerations regarding the minimum number of observations, one must also ensure that the time duration of the observed portion of the flow is sufficient to mask any startup transients and to yield a distribution that is representative of the entire flow. As with any predictive technique, the expected accuracy of predictions will improve if more data is available over which to perform the analysis. Given that most p2p sessions are operational for a long period of time, there is a tradeoff to consider between rapidity of prediction and accuracy of prediction. Reduction of false positives, consistent with rapid enough prediction, is the long-term goal.
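The readiness condition described above can be expressed as a simple online check, sketched here with assumed parameter names:

```python
def ready_for_level(histogram, level, min_duration=None, elapsed=None):
    """Decide whether a micro-flow's inter-arrival histogram supports
    wavelet analysis to the given level: 2**level bins must be non-zero,
    and optionally enough of the flow must have been observed to mask
    start-up transients."""
    nonzero = sum(1 for count in histogram if count > 0)
    enough_bins = nonzero >= 2 ** level
    enough_time = min_duration is None or (elapsed or 0) >= min_duration
    return enough_bins and enough_time

hist = [3, 0, 5, 1, 0, 2, 7, 4]          # 6 non-zero bins
print(ready_for_level(hist, 2))          # True: level 2 needs 4 non-zero bins
print(ready_for_level(hist, 3))          # False: level 3 needs 8
```

A detector would run this test on each flow's accumulating histogram and trigger the wavelet analysis only once the condition holds, balancing rapidity against accuracy as discussed above.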
5 Conclusions and Future Work

This paper has introduced an architecture for exploiting active/programmable networking techniques to manage p2p applications. Crucial to the success of an infrastructure based upon this architecture is the ability to detect the onset of p2p activity by passively observing network-level micro-flows. Application-level probing mechanisms can support the network-layer identification process, which can in turn be the basis for application-layer optimisation techniques that improve P2P performance.
The project has constructed an experimental infrastructure that enables the full packet capture of synthetic micro-flow traffic. The traces resulting from this synthetic traffic enable the assessment of a number of p2p pattern detectors for driving such management activities. The first analysis technique to be assessed has been based upon the use of wavelets. Preliminary results indicate that these techniques may prove useful for constructing real-time p2p pattern detectors. Future work will focus on extensive measurement and analysis of further invariant factors that can be measured in real-time to identify P2P activity on short timescales. Traces of a number of p2p and non-p2p applications will be captured and analysed to gain confidence in the efficacy of wavelet analysis.
References

[ADD04] Andersen, F.U., De Meer, H., Dedinski, I., Kappler, C., Mäder, A., Oberender, J., Tutschku, K.: Enabling Mobile P2P Networking. In: Kotsis, G., Spaniol, O. (eds.) Euro-NGI 2004. LNCS, vol. 3427, pp. 219–234. Springer, Heidelberg (2005)
[AG03] Azzouna, N.B., Guillemin, F.: Analysis of ADSL traffic on an IP Backbone link. In: Proceedings of IEEE Globecom 2003, San Francisco, USA, December 1-5 (2003)
[AH01] Akansu, A.N., Haddad, R.A.: Multiresolution Signal Decomposition: Transforms, Subbands, and Wavelets. Academic Press, London (2001)
[AV98] Abry, P., Veitch, D.: Wavelet Analysis of Long Range Dependent Traffic. IEEE Transactions on Information Theory 44(1), 2–15 (1998)
[Bro97] Brownlee, N.: Traffic Flow Measurement: Experiences with NeTraMet. IETF, Network Working Group, RFC 2123 (March 1997)
[CBP95] Claffy, K.C., Braun, H.-W., Polyzos, G.C.: A Parameterizable Methodology for Internet Traffic Flow Profiling. IEEE Journal on Selected Areas in Communications 13(8), 1481–1494 (1995)
[Chu92] Chui, C.K.: An Introduction to Wavelets. Academic Press, London (1992)
[Cla94] Claffy, K.C.: Internet Traffic Characterization. PhD thesis, University of California, San Diego, CA (1994)
[CPB93] Claffy, K., Polyzos, G., Braun, H.-W.: Application of Sampling Methodologies to Network Traffic Characterisation. In: ACM SIGCOMM 1993, San Francisco, California, USA, September 13-14 (1993)
[Dau92] Daubechies, I.: Ten Lectures on Wavelets. SIAM (1992)
[EH96] Erlebacher, G., Hussaini, M.Y., Jameson, L.M. (eds.): Wavelets: Theory and Applications. Oxford University Press, Oxford (1996)
[End] http://www.endace.com
[Fry99] Fry, M., Ghosh, A.: Application Level Active Networking. Computer Networks 31(7), 655–667 (1999)
[GDS03] Gummadi, K.P., Dunn, R.J., Saroiu, S., Gribble, S.D., Levy, H.M., Zahorjan, J.: Measurement, modeling, and analysis of a peer-to-peer file-sharing workload. In: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, Boston, October 19-22 (2003)
[Gnu] Gnutella, http://www.gnutella.com/
[GR02] Grossglauser, M., Rexford, J.: Passive Traffic Measurement for IP Operations. In: Park, K., Willinger, W. (eds.) The Internet as a Large-Scale Complex System. Oxford University Press, Oxford (2002)
[Kaz] KaZaa, http://www.kazaa.com/
[KBB03] Karagiannis, T., Broido, A., Brownlee, N., Claffy, K., Faloutsos, M.: File-sharing in the Internet: A characterization of P2P traffic in the backbone. Technical report (November 2003)
[KBB04] Karagiannis, T., Broido, A., Brownlee, N., Claffy, K.C., Faloutsos, M.: Is P2P dying or just hiding? In: IEEE Global Internet and Next Generation Networks (Globecom 2004), Dallas, Texas, USA, 29 November - 3 December (2004)
[KBF04] Karagiannis, T., Broido, A., Faloutsos, M., Claffy, K.: Transport layer identification of P2P traffic. In: Proceedings of the 4th ACM SIGCOMM Conference on Internet Measurement (2004)
[KPF05] Karagiannis, T., Papagiannaki, D., Faloutsos, M.: BLINC: Multilevel Traffic Classification in the Dark. In: ACM SIGCOMM 2005, Philadelphia, PA, USA (August 2005)
[LBB02] Leibowitz, N., Bergman, A., Ben-Shaul, R., Shavit, A.: Are File Swapping Networks Cacheable? Characterizing P2P Traffic. In: 7th International Workshop on Web Content Caching and Distribution (WCW 2002), Boulder, CO (2002)
[LHK04] Le Fessant, F., Handurukande, S., Kermarrec, A.-M., Massoulie, L.: Clustering in peer-to-peer file sharing workloads. In: Voelker, G.M., Shenker, S. (eds.) IPTPS 2004. LNCS, vol. 3279, pp. 217–226. Springer, Heidelberg (2005)
[Mal01] Mallat, S.: A Wavelet Tour of Signal Processing. Academic Press, San Diego (2001)
[MTT03] de Meer, H., Tutschku, K., Tran-Gia, P.: Dynamic Operation in Peer-to-Peer Overlay Networks. Praxis der Informationsverarbeitung und Kommunikation, Special Issue on Peer-to-Peer Systems (PIK Journal) (June 2003)
[NFL] Cisco IOS NetFlow, on-line resource, http://www.cisco.com/warp/public/732/Tech/nmp/netflow/index.shtml
[Nla] http://www.nlanr.net/NA/Learn/packetsizes.html
[Ora01] Oram, A. (ed.): Peer-to-Peer: Harnessing the Benefits of a Disruptive Technology. O'Reilly (2001)
[SFS01] Schmid, S., Finney, J., Scott, A.C., Shepherd, W.D.: Component-based Active Network Architecture. In: IEEE Symposium on Computers and Communications (July 2001)
[SSW04] Sen, S., Spatscheck, O., Wang, D.: Accurate, scalable in-network identification of p2p traffic using application signatures. In: Proceedings of the 13th International Conference on World Wide Web (2004)
[SW02] Sen, S., Wong, J.: Analyzing peer-to-peer traffic across large networks. In: Second Annual ACM Internet Measurement Workshop (2002)
[THH04] Tagami, B., Hasegawa, T., Hasegawa, T.: Analysis and Application of Passive Peer Influence on Peer-to-Peer Inter-Domain Traffic. In: Proceedings of the Fourth International Conference on Peer-to-Peer Computing. IEEE, Los Alamitos (2004)
[Tuts04] Tutschku, K.: A measurement-based traffic profile of the eDonkey filesharing service. In: Barakat, C., Pratt, I. (eds.) PAM 2004. LNCS, vol. 3015, pp. 12–21. Springer, Heidelberg (2004)
[Zai97] Zait, M., Messatfa, H.: Comparative study of clustering methods. Future Gener. Comput. Syst. 13(2-3), 149–159 (1997)
[Zse05] Zseby, T.: Sampling Techniques for Non-Intrusive QoS Measurements: Challenges and Strategies. In: Computer Communications, Special Issue on Monitoring and Measurement (to appear, 2005)
[CMK99] Campbell, A.T., de Meer, H., Kounavis, M.E., Miki, K., Vicente, J.B., Villela, D.: A Survey of Programmable Networks. ACM SIGCOMM Computer Communication Review 29(2) (April 1999)
Towards Effective Portability of Packet Handling Applications across Heterogeneous Hardware Platforms* Mario Baldi and Fulvio Risso Politecnico di Torino, Dipartimento di Automatica e Informatica, Torino, Italy {mario.baldi,fulvio.risso}@polito.it
Abstract. This paper presents the Network Virtual Machine (NetVM), a virtual network processor optimized for implementation and execution of packet handling applications. As a Java Virtual Machine virtualizes a CPU, the NetVM virtualizes a network processor. The NetVM is expected to provide a unified layer for networking tasks (e.g., packet filtering, packet counting, string matching) performed by various network applications (firewalls, network monitors, intrusion detectors) so that they can be executed on any network device, ranging from high-end routers to small appliances. Moreover, the NetVM will provide efficient mapping of the elementary functionalities used to realize the above mentioned networking tasks onto specific hardware functional units (e.g., ASICs, FPGAs, and network processing elements) included in special purpose hardware systems possibly deployed to implement network devices.
1 Introduction

An increasing number of network applications performing some sort of packet processing are being deployed on current IP networks. Well-known examples are firewalls, intrusion detection systems (IDS), and network monitors, whose execution must take place in a specific location within the network (e.g., backbone, network edge, end systems) or, in some cases, be distributed across different devices. In general, such network applications must be deployed on very different (hardware and software) platforms, ranging from routers to network appliances, personal computers, and smartphones. In some cases, the whole range of potential target platforms is not even precisely known at development time. A development and execution platform for packet handling applications with features comparable to those of Java and CLR has thus far not been available. This paper reports on work aiming at designing, implementing, and assessing such a platform based on a Network Virtual Machine (NetVM), a new architecture for a (virtual) network processor in which execution of packet handling related functions is *
This work has been carried out within the framework of the QUASAR project, funded by the Italian Ministry of Education, University and Research (MIUR) as part of the PRIN 2004 Funding Program. Its presentation has been supported by the European Union under the E-Next Project FP6-506869.
D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 28–37, 2009. © IFIP International Federation for Information Processing 2009
optimized. Specifically, when the NetVM is deployed on network processors or hardware architectures, packet handling related functions can be mapped directly onto underlying special-purpose hardware (such as ASICs, CAMs, etc.) thanks to their virtualization in what are called NetVM coprocessors. This virtual device is programmed with an assembly language, or NetVM bytecode, that supports a set of interactions among the various blocks (e.g. memory, execution units) inside the NetVM. The project reported by this work also addresses the interaction between the NetVM and the external environment, e.g. how to download code to the NetVM and how to get the results of code execution.
Fig. 1. NetVM framework
Virtual machines are the basis for the “write once, run anywhere” paradigm, thus enabling the realization and deployment of portable applications. Even though from certain points of view the NetVM has a more limited scope than the Java and CLR virtual machines (i.e., the NetVM targets a smaller range of applications), its goals are somewhat more ambitious. In fact, the latter aim at application portability across platforms that, while different from both the hardware and software (i.e., operating system) points of view, are similar in being designed to support generic applications. Instead, the NetVM must combine portability and performance; this translates into the capability of effectively deploying available hardware resources (such as processing power, memory, functional units) notwithstanding the significantly different architectures and components of the various hardware platforms targeted. The efficiency and portability of the NetVM has a significant by-product: it makes it a potential candidate for becoming a universal application development platform for network processing units (NPUs). Network processors combine high packet processing rates and programmability. However, programming NPUs is a complex task requiring detailed knowledge of their architecture. Moreover, due to the significant architectural differences, applications must be re-written for each NPU model. Deploying a virtual machine could help deal with the diversity of network processors by offering a common platform for writing and executing portable applications. On
the one hand, the NetVM hides the architectural details of the underlying NPU from the programmer. On the other hand, being designed specifically for network packet processing, the NetVM has unmatched potential for effective execution on a hardware platform specifically designed for the same purpose. NetVM programming is further simplified by the definition of a high-level programming language that operates according to packet descriptions written in NetPDL (Network Packet Description Language) [3] and is compiled into native NetVM bytecode, as shown in the top part of Fig. 1. Once NetVM support is provided by commonly deployed network gear, distributed applications could be based on downloading NetVM code onto various network nodes and possibly collecting the results deriving from its execution. This paper is structured as follows. Alternatives for the implementation of the NetVM are presented in Section 2. Section 3 outlines the proposed NetVM architecture, discussing its main components; performance issues are tackled in Section 4. Section 5 draws some conclusions and summarizes current and future work.
2 NetVM Implementation The NetVM aims at providing programmers with an architectural reference, so that they can concentrate on what to do with packets, rather than how to do it. This has been dealt with once and for all during the NetVM implementation. This section focuses on how to implement the NetVM on both end systems and network nodes. Several choices are available, ranging from software emulation — NetVM bytecode is interpreted and for each instruction a piece of native code is executed to perform the corresponding function — optionally with specific hardware support (selected instructions can be mapped to specific hardware available on that platform), to recompilation techniques — e.g. an ahead-of-time (AOT) or just-in-time (JIT) compiler can translate NetVM bytecode into assembly code specific to the given platform (e.g. x86, IXP2400, etc.), therefore making use of the processor registers instead of operating on a stack. A further option is to implement the NetVM architecture in hardware, i.e., the proposed architecture can be used as the basis for the design of a hardware device for network processing (e.g. VHDL can be used to create a new chip that implements the NetVM). Taking this option a step further, the NetVM code implementing a set of functionalities (e.g., a NetVM program that tracks the amount of IPv6 traffic) could be compiled into the hardware description of a (possibly integrated) hardware system that implements such functionality (e.g., an ASIC or an FPGA configuration). In other words, the NetVM could support fast prototyping, specification, and implementation of network-oriented hardware systems. Since the NetVM design has been modeled after modern network processor architectures, perhaps the most appropriate implementation option for the NetVM is an AOT/JIT compiler that maps NetVM assembly into a network processor’s native code.
This approach also mitigates one of the main problems of network processors: their complexity from the programmability point of view.
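As a concrete illustration of the software emulation option mentioned above, the sketch below interprets a tiny stack-based bytecode with a dispatch loop that runs a piece of native (here, Python) code per instruction. The opcode names are invented for the example; they are not the official NetIL instruction set.

```python
# Illustrative sketch of software emulation: a dispatch loop executes a
# piece of native code for each bytecode instruction. The opcodes below
# are invented for the example, not the official NetIL instruction set.

def emulate(program):
    """Interpret a tiny stack-based bytecode program."""
    stack = []
    for op in program:
        if op[0] == "push":          # push an immediate value
            stack.append(op[1])
        elif op[0] == "add":         # add the two topmost values
            stack.append(stack.pop() + stack.pop())
        elif op[0] == "ret":         # return the topmost value
            return stack.pop()
    return None

# A toy program: compute 2 + 3.
result = emulate([("push", 2), ("push", 3), ("add",), ("ret",)])
```

Hardware-assisted emulation would differ only in that selected branches of the dispatch loop would invoke dedicated hardware rather than native code.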
Towards Effective Portability of Packet Handling Applications
31
3 NetVM Architecture and Components The main architectural choices of the NetVM were driven by the goal of achieving flexibility, simplicity, and efficiency, and built upon the experience matured in the field of Network Processing Unit (NPU) architectures, since these are specifically targeted at network packet processing. The resulting NetVM architecture is modular and built around the concept of Processing Element (NetPE), which virtualizes (or, it could be said, is inspired by) the actual micro-engine of an NPU. Processing Elements deal with only a few tasks, but they have to perform them very fast: they have to process data at wire speed and in real time, and they have to process variable-size data (e.g. an IP payload) and/or fragmented data (e.g. an IP payload fragmented over several ATM cells). In addition, they should execute specific tasks, such as binary searches in complex tree structures and CRC (Cyclic Redundancy Code) calculations, under stringent time constraints. Multithreading is an expected feature of an NPU, and hence an objective of our architectural design: in fact, packets are often independent from each other and suitable to be processed independently. For example, one of the first network processors — the Intel IXP1200 — is composed of six processing elements called Packet Engines. The larger the number of Processing Elements, the higher the achievable degree of parallelism, since independent packets can be distributed to these units.
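The parallelism argument can be sketched as follows: since packets are often mutually independent, they can be dispatched to a pool of processing elements. The function names and the per-packet task below are invented for the illustration; this is not a NetVM API.

```python
# Illustrative sketch (not a NetVM API): mutually independent packets
# are dispatched to a pool of processing elements, much like an NPU's
# micro-engines. The per-packet task here is a trivial header check.
from concurrent.futures import ThreadPoolExecutor

def process(packet):
    # Stand-in per-packet task: does the frame carry a full
    # 14-byte Ethernet header?
    return len(packet) >= 14

def run_parallel(packets, n_engines=6):   # six, as in the Intel IXP1200
    with ThreadPoolExecutor(max_workers=n_engines) as pool:
        return list(pool.map(process, packets))
```

`pool.map` preserves packet order in the results even though the per-packet work proceeds concurrently.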
Fig. 2. NetVM configuration example. [The figure shows a NetVM running on a general-purpose CPU: a network packet enters NetPE1 (e.g. filtering) through an input port, is handed over via Exchange Buffer 1 to NetPE2 (e.g. session statistics), and leaves through Exchange Buffer 2 and an output port; the NetPEs share a memory and CRC, crypto, and classification coprocessors.]
A NetPE is a virtual CPU (with an instruction set and local memory) that executes an assembly program performing a specific function and maintains private state. A NetVM application is executed by several NetPEs (for example, Fig. 2 shows an application deploying two NetPEs), each of which may implement a simple functionality; complex structures can be built by connecting different NetPEs together. Moreover, NetPEs use specialized functional units (coprocessors, shown in Fig. 2) and various types of memories to exchange data. This modular view derives from the observation that many packet-handling applications can be decomposed into simple functional blocks, which can then be connected in complex structures. These structures can exploit parallelism or sequentiality to achieve higher throughput. 3.1 Processing Element (NetPE) Architecture The general architecture of a NetPE includes six registers (Program Counter, Code Segment Length, Data Segment Length, Packet Buffer Length, Connection Table
Length, Stack Pointer) supporting the processor operation, a stack used for instruction operands, a connection table whose purpose is outlined in Section 3.2, and a memory encompassing four independent segments (Section 3.3). Like most existing virtual processors, the NetVM has a stack-based design where each NetPE has its own stack. A stack-based virtual processor does not encompass general-purpose registers: instructions that need to store or process a value make use of the stack. This grants portability, a plain and compact instruction set, and a simple virtual machine; the performance implications of this choice are discussed in Section 4. The execution model is event-based. This means that the execution of a NetPE is activated by external events, each one triggering a particular portion of code. Typical events are the arrival of a packet from an input, the request of a packet from an output, or the expiration of a timer. 3.2 Internal and External Connections Connections are used to connect a NetPE with other NetPEs, with the physical network interfaces, and possibly with user applications. A NetPE can have a number of input and output exchange ports (or ports for the sake of brevity), each coupled to an exchange buffer. Each connection connects an output port of a NetPE to an input port of another one and is used to move data, usually packets, between the two. Although the meaning of a connection is different, the connection model of the NetVM is similar to that of Click1. In particular, two types of connections are defined:
• Push connection: the upstream NetPE passes data to the NetPE at the other end of the connection. This is how packets usually move from one processing function to the next in network devices.
• Pull connection: the downstream NetPE initiates the data transfer by requesting the NetPE at the other end of the connection to output a packet. Two options are provided for the downstream NetPE in case no packet is available: (i) it enters a wait state, or (ii) an empty exchange buffer is returned. For example, a NetPE that extracts packets from a buffer and sends them on an output interface uses a pull connection.
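A minimal sketch of these two connection semantics follows; the class and method names are invented for illustration, since the paper defines no concrete runtime API.

```python
# Hedged model of push and pull connections; all names are invented
# for illustration, this is not the NetVM runtime interface.
from collections import deque

class Connection:
    def __init__(self):
        self.queue = deque()      # exchange buffers in transit

    def push(self, buf):
        # Push semantics: the upstream NetPE hands data over.
        self.queue.append(buf)

    def pull(self):
        # Pull semantics: the downstream NetPE requests data; when
        # none is available, an empty exchange buffer is returned
        # (option (ii) in the text).
        return self.queue.popleft() if self.queue else b""
```

Option (i), a wait state, would instead block the caller until the upstream NetPE produces a packet.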
Ports, too, can be either push or pull. The NetVM runtime environment checks the validity of a NetPE interconnection configuration at creation time, since some configurations are illegal, such as a connection between a push port and a pull port. The number and type of ports of a NetPE are defined by the NetVM application and are maintained in the Connection Table within the NetPE, which is a read-only memory portion. The NetVM runtime environment fills out the connection table during configuration instantiation. Programs can use it to obtain, for every connection, the ID inside the NetVM environment, the type (push/pull), and the direction (incoming or outgoing). The NetVM communicates with external entities through NetVM sockets. For example, if a NetVM is deployed inside the operating system of a desktop PC, external entities could be network devices, file streams or user applications that rely on the NetVM for low-level operations like filtering or network monitoring. 1
In Click [5] a connection is a direct call to a C++ method, while in NetVM it is a communication channel between two independent entities.
Applications that are intended to receive packets from a NetVM deploy a socket connected, through a push connection, to the push output port of a NetPE. The transfer of packets is initiated by the virtual machine (i.e., by the connected NetPE) and the application receives them through a libpcap-style [2] callback function. Alternatively, an application that is supposed to request data from a NetVM deploys a socket connected to the pull output port of a NetPE. Pull connections are appropriate for applications that retrieve tables, counters, flows, and other similar data. An advantage of the socket/exchange port model is that the transferred data is generic, since exchange buffers are simple data containers; it follows that the application does not have any implicit information about the data it receives, i.e., about the data type, which must be provided in some other way. 3.3 Memory Architecture A NetPE has four types of memory: one shared among all NetPEs (shared memory), one for private data (data memory), one (local to the NetPE) that contains the program being executed (code memory), and one that contains the data (usually a network packet) being processed (exchange buffer). Shared memory can be used to store data that is needed concurrently by more than one NetPE (e.g., routing tables or state information). A NetPE is not compelled to use the shared memory: if it needs only local storage, only the Data Memory segment is used. This architecture makes it possible to better isolate the different kinds of memory and to increase efficiency through better parallelization of memory accesses. Memory addresses are 32 bits wide, although we do not expect to have such an amount of memory (4 GB) in network devices. Since the NetVM may potentially be mapped onto embedded systems and network processors, the use of high-level memory management systems like garbage collectors is not feasible. Therefore, the bytecode has a direct view of the memory.
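The four memory areas and the direct load/store access they afford can be modeled roughly as follows; the class and method names are invented for illustration and are not part of the NetVM specification.

```python
# Rough model of the four NetPE memory areas with direct, unmediated
# load/store access (no garbage collector). Names are invented for
# illustration; this is not the NetVM specification.
class NetPEMemory:
    def __init__(self, data_size, shared):
        self.code = b""                   # code memory (program, read-only)
        self.data = bytearray(data_size)  # private data memory
        self.shared = shared              # memory shared among all NetPEs
        self.exchange = bytearray(0)      # exchange buffer being processed

    def store32(self, seg, off, value):   # direct store at a byte offset
        seg[off:off + 4] = value.to_bytes(4, "big")

    def load32(self, seg, off):           # direct load, no intermediation
        return int.from_bytes(seg[off:off + 4], "big")
```

Two NetPEs constructed over the same `shared` buffer see each other's writes, mirroring the shared-memory segment; writes to `data` stay private.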
Furthermore, the memory is statically allocated during the initialization phase: the program itself, by means of appropriate opcodes, specifies the amount of memory it needs to work properly. Obviously, these instructions can fail if not enough physical memory is present. The flexibility lost with this approach is balanced by higher efficiency: the program can access the memory without intermediation thanks to ad-hoc load and store instructions. Specific instructions for buffer copies (a recurrent operation in network processing; some platforms even have ad-hoc hardware units for it) are provided as well, either inside the same memory or between different ones. Moreover, knowing the position and the amount of memory before program execution allows very fast accesses when an AOT/JIT compiler is used, because memory offsets can be pre-computed. 3.4 Exchange Buffers Packets are stored in specific buffers, called exchange buffers, which are shared by the two NetPEs that are on the same processing path in order to minimize race conditions (and avoid bottlenecks) when exchanging data. For instance, NetPE1 in Fig. 2 will copy output data (e.g. the filtered packet) into the exchange buffer, which is then made accessible to NetPE2 for further processing (e.g. computing session statistics). Although, in principle, data can be moved from one NetPE to another through the
shared memory, this could lead to very poor performance because this memory could become the bottleneck. Conversely, exchange buffers provide a very efficient exchange mechanism between NetPEs that are on the same processing path. In order to increase packet-handling efficiency, network-specific instructions (e.g. string search) and coprocessors may have direct access to exchange buffers. Instructions for data transfer (to, from and between exchange buffers) are provided as well. Furthermore, instead of moving packet data around, NetPEs can operate on the data contained in the exchange buffer, which is then “moved” from one NetPE to another. This is very efficient because exchange buffers are not really moved; the NetVM guarantees exclusive access to them, so that only the NetPE currently involved in the processing can access that data. The size of an exchange buffer is usually limited to a few kilobytes; for larger data the shared memory can be used. This stems from the fact that this memory is often used to transport packets, although it can also contain generic data (e.g. fields, statistics or some generic state). In some cases, exchange buffers can also contain sub-portions of packets, since some network processors break packets into separate cells for internal transmission. Usually, a NetPE has a single exchange buffer (i.e. it processes one packet at a time), although the NetPE specification does not preclude multiple exchange buffers. Exchange buffers are readable and writeable, although some particular virtual machine implementations could provide read-only access for performance reasons or because of hardware limitations. On such platforms an AOT/JIT compiler will refuse to build NetPEs that perform write operations on packet memory. 3.5 Coprocessors The NetVM instruction set is complemented by additional functionalities specifically targeted at network processing. Such functionalities are provided by coprocessors that, as shown in Fig.
2, are shared among the NetPEs. Making coprocessor functionalities explicitly available to the NetVM programmer is beneficial both when the NetVM is executed on general-purpose processors and when it runs on network processors or special-purpose hardware systems. On general-purpose systems coprocessors are realized by native code, possibly implementing optimized algorithms. Code and data structures can be shared among different modules, thus granting efficient resource usage. For example, in a NetVM configuration with several NetPEs using the CRC32 functionality, the same coprocessor code can be used by all the NetPEs. If the implementation of the CRC32 coprocessor is improved, every NetPE benefits from it without any change in the NetVM implementation or in the application code. Also, more complex functionalities, such as string search or classification, can share data structures and tables among different modules for even better efficiency and resource usage. An example is the Aho-Corasick string-matching algorithm, which can build a single automaton to search for multiple strings as requested by different NetPEs. On special-purpose hardware systems, such as network processors, coprocessors can be mapped onto functional units or ASICs, where present. Consequently, on the one hand the efficiency of NetVM programs is significantly increased when the target platform provides the proper hardware. On the other hand, writing NetVM programs
represents a simple way of programming network processors or other special-purpose hardware systems without having to know their hardware architectural details, yet exploiting the benefits of their hardware specificities. Communication between NetPEs and coprocessors is based on a well-defined, generic (i.e., not specific to a given processor) interface built on the IN and OUT assembly primitives, with parameters pushed on top of the stack. This guarantees a generic invocation method for any coprocessor without the need for dedicated instructions; therefore coprocessors can be added without modifying the NetIL bytecode. A “standard” coprocessor library (which includes a classification, a connection tracking, a string search and a checksum coprocessor, although some are still under development) is defined in the NetVM specification: a valid NetVM implementation should implement this library, and each program using only coprocessors of the standard library should work on any valid NetVM. Additional coprocessors can be added to the library by NetVM implementations, or third-party libraries can be “linked” to a NetVM and used by applications that have been written to deploy the functionalities of non-standard coprocessors. 3.6 High Level Programming Language NetVM programs are generally written in a high-level programming language designed for networking applications, specifically for packet processing. One such language, NetPFL, enables manipulation of packets and header fields whose format is described through the Network Packet Description Language (NetPDL) [3]. Although a detailed description of NetPDL and NetPFL is outside the scope of this paper, a sample is shown in Fig. 3 to give a glimpse of the complexity of using the NetVM. The code instructs the NetVM to return on its exchange port number 1 all packets that, when parsed as Ethernet frames, contain the value 0x0800 in their EtherType field. In other words, this code implements a filter for IPv4 packets. Fig.
3 shows both the syntax in the NetPFL language and the equivalent in the widely known tcpdump [2] packet filtering application. The comparison shows that, even though the NetVM provides the flexibility of a generic packet processing engine, programming a packet filter is not more complicated than specifying it for tcpdump, i.e., a utility specifically targeted and optimized for packet filtering. Hence, the increased flexibility of the NetVM is not traded for increased programming complexity, nor for (significantly) lower performance, as discussed in the next section.

NetPFL:   ethernet.type == 0x800 ReturnPacket on port 1
tcpdump:  ether proto 0x800
Fig. 3. High-level code to filter IPv4 packets, in both NetPFL and tcpdump syntax
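For clarity, the semantics of the Fig. 3 filter can be restated as a host-side predicate; this is only an illustration of what the filter computes, not of how the NetVM executes it.

```python
# The Fig. 3 filter restated as a plain predicate: an Ethernet frame
# matches when its EtherType field (bytes 12-13) equals 0x0800 (IPv4).
ETHERTYPE_OFFSET = 12

def ipv4_filter(frame):
    """True if the frame should be returned on exchange port 1."""
    return frame[ETHERTYPE_OFFSET:ETHERTYPE_OFFSET + 2] == b"\x08\x00"
```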
4 Performance Evaluation Although the current implementation of the NetVM is still in the early stages, a few numerical results are reported in this section in order to provide a first evaluation of
NetVM assembly:

    segment .push            ; Push Port Handler
    .locals 5
    .maxstacksize 10
        pop                  ; pop the "calling" port ID
        push 12              ; push the location of the ethertype
        upload.16            ; load the ethertype field
        push 2048            ; push 0x800 (=IP)
        jcmp.eq send         ; cmp the 2 topmost values; jump if true
        ret                  ; otherwise do nothing and return
    send:
        pkt.send out1        ; send the packet to port out1
        ret                  ; return
    ends

BPF assembly:

    0) ldh [12]              ; load the ethertype field
    1) jeq #0x800 jt 2 jf 3  ; jump to 2) if true, else 3)
    2) ret #1514             ; return the packet length
    3) ret #0                ; return false

Fig. 4. NetVM and BPF code to filter IPv4 packets
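To make the stack traffic visible, the following sketch mimics, step by step, the NetVM push-port handler of Fig. 4; it is an interpretation of the reconstructed listing, not a reference implementation, and it shows the operand-stack shuffling that a register machine such as the BPF avoids.

```python
# Step-by-step Python mimicry of the NetVM push-port handler in Fig. 4,
# showing the operand-stack traffic a register-based VM avoids.
def push_handler(frame, port_id=0):
    stack = [port_id]
    stack.pop()                            # pop: discard the calling port ID
    stack.append(12)                       # push 12: ethertype offset
    off = stack.pop()                      # upload.16: load 16 bits from
    stack.append(int.from_bytes(frame[off:off + 2], "big"))  # the buffer
    stack.append(2048)                     # push 2048 (0x800 = IPv4)
    b, a = stack.pop(), stack.pop()        # jcmp.eq send
    return "out1" if a == b else None      # pkt.send out1 / plain ret
```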
the proposed architecture. To this purpose the NetVM is compared against the Berkeley Packet Filter (BPF) [1], probably the best-known virtual machine in the network processing arena. Fig. 4 shows the assembly code required to implement the filter shown in Fig. 3, for both the NetVM and BPF virtual machines. A first comparison shows that the NetVM assembly is definitely richer than the BPF one, which gives an insight into the expressiveness of the NetVM assembly. However, the resulting program is far less compact (the “core” is six instructions against three in BPF). This highlights one of the most important characteristics of the NetVM architecture: a stack-based virtual machine is less efficient than a competing register-based VM (such as the BPF) because it cannot rely on a set of general-purpose registers. Hence, the raw performance obtained by the NetVM cannot directly compete against that obtained by the BPF.

Table 1. NetVM Performance Evaluation

Virtual Machine    Time to execute the “IPv4” filter (clock cycles)
NetVM              392
BPF                64
Table 1 shows the time needed to execute the programs reported in Fig. 4: as expected, the BPF outperforms the NetVM, mainly due to the additional instructions (related to the stack-based architecture) and the immaturity of the code. However, the NetVM is intended as a reference design, and we do not expect its code to be executed as it is. In order to achieve better performance, NetVM code must be translated into native code (through recompilation at execution time, i.e., AOT/JIT compiling) according to the characteristics of the target platform. This justifies the choice of a stack-based machine, which is intrinsically slower, but whose instructions are much simpler to translate into native code. Performance is expected to be much better after dynamic recompilation. The implementation of an AOT/JIT compiler is part of our future work on the NetVM.
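One way such an AOT/JIT translation could work (a speculative sketch with invented instruction names, not the compiler announced above) is to resolve the operand stack at compile time, assigning each stack slot to a virtual register so that no stack exists at run time:

```python
# Speculative sketch of AOT/JIT translation: the operand stack is
# resolved at compile time, so each stack slot becomes a virtual
# register and no stack manipulation survives into native code.
# Opcode and register names are invented for the example.
def translate(program):
    regs, code, nxt = [], [], 0
    for op in program:
        if op[0] == "push":                     # constant -> fresh register
            code.append(f"mov r{nxt}, {op[1]}")
            regs.append(nxt); nxt += 1
        elif op[0] == "load16":                 # packet load at an offset
            src = regs.pop()
            code.append(f"ld16 r{nxt}, pkt[r{src}]")
            regs.append(nxt); nxt += 1
        elif op[0] == "eq":                     # compare two topmost slots
            b, a = regs.pop(), regs.pop()
            code.append(f"cmpeq r{nxt}, r{a}, r{b}")
            regs.append(nxt); nxt += 1
        elif op[0] == "ret":
            code.append(f"ret r{regs.pop()}")
    return code
```

Translating an IPv4-filter-like stack program this way yields straight-line register code, and because offsets are known before execution they can be pre-computed, as noted in Section 3.3.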
5 Conclusions This paper presents the architecture and a preliminary performance evaluation of the NetVM, a virtual machine optimized for network programming. The paper discusses the motivations behind the definition of such an architecture and the benefits stemming from its deployment on several hardware platforms. These include simplifying and speeding up the development of packet handling applications, whose execution can be efficiently delegated to specialized components of customized hardware architectures. Moreover, the NetVM provides a unifying programming environment for various hardware architectures, thus offering portability of packet handling applications across different hardware and software platforms. Further, the proposed architecture can be deployed as a reference architecture for the implementation of hardware networking systems. Finally, the NetVM can be a novel tool for specification, fast prototyping, and implementation of hardware networking systems. Some preliminary results on the performance of a simple NetVM program show that other, simpler virtual machines targeted at networking applications outperform the NetVM, which, in turn, provides higher flexibility. Ongoing work on the implementation of a JIT compiler for NetVM code aims at reversing, or at least reducing, this performance discrepancy. Since writing NetVM bytecode by hand is not very practical, work is being done towards the definition of a high-level programming language and the implementation of the corresponding compiler into NetVM bytecode. Finally, in order to fully demonstrate the benefits, also in terms of performance, brought by the NetVM, further work includes the implementation of the virtual machine and its AOT/JIT compiler for a commercial network processor.
References [1] McCanne, S., Jacobson, V.: The BSD Packet Filter: A New Architecture for User-level Packet Capture. In: Proceedings of the 1993 Winter USENIX Technical Conference, San Diego, CA (January 1993) [2] Jacobson, V., Leres, C., McCanne, S.: Libpcap, Lawrence Berkeley Laboratory, Berkeley, CA. Initial public release (June 1994), http://www.tcpdump.org [3] Risso, F., Baldi, M.: NetPDL: An Extensible XML-based Language for Packet Header Description. In: Computer Networks (COMNET), vol. 50(5), pp. 688–706. Elsevier, Amsterdam (2006) [4] Degioanni, L., Baldi, M., Buffa, D., Risso, F., Stirano, F., Varenni, G.: Network Virtual Machine (NetVM): A New Architecture for Efficient and Portable Network Applications. In: 8th IEEE International Conference on Telecommunications (ConTEL 2005), Zagreb, Croatia (June 2005) [5] Morris, R., Kohler, E., Jannotti, J., Kaashoek, M.F.: The Click modular router. In: Proceedings of the 1999 Symposium on Operating Systems Principles (1999)
Architecture for an Active Network Infrastructure Grid – The iSEGrid T.K.S. LakshmiPriya and Ranjani Parthasarathi Department of Computer Science and Engineering, College of Engineering, Guindy, Anna University, Chennai, India
[email protected],
[email protected]
Abstract. Although the net processing power in the network is increasing steadily, it is heterogeneous. Hence the immense compute power may be underutilized at certain points while it remains inadequate at others. This paper proposes an active network-based framework that views the entire network as a single entity in order to effectively utilize the network resources. The single-entity model is enabled by establishing an infrastructure grid at the network layer. Such a grid has the advantage of supporting a wide range of application-layer services in the network. Network processors and Active Network technology work in tandem to facilitate this. The network processors, with their deep-packet-processing capabilities, allow offloading of application-level processing into the network. Active Network technology allows this to take place on demand. We present the design and architecture of the infrastructure grid, called iSEGrid, and illustrate its use for streaming services. We provide experimental results to indicate the potential and scope of the concept. Keywords: Network Layer Grid, Network Infrastructure Grid, Active Networking, Network Processors, and Grid Architecture.
1 Introduction and Motivation Active Networks (AN) technology has been proposed to support dynamic deployment of services in the network. This involves execution of code, carried along with the data packets, at the intermediate nodes of the network. Researchers have extensively studied the potential benefits of this approach for various performance issues in the network [1,2,3]. Application-specific tasks such as providing QoS, security, policy management, network resource management, translation, etc., have been shown to benefit from this approach. However, a significant challenge to this technology is the requirement for programmable network elements, especially in a scenario where the routers and switches in the network are built using ASICs and custom hardware. Custom hardware is used to provide higher performance; however, it lacks the flexibility required for active networks. In this context, the advent of Network Processors (NPs), which provide programmability without compromising on performance, serves as a boost to the AN technology. The benefits of this marriage of NPs and AN have just begun to be explored [4]. D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 38–52, 2009. © IFIP International Federation for Information Processing 2009
The entire spectrum of services, from basic packet processing operations (such as classification and routing), to QoS-specific operations (such as scheduling and queue management) [5], to application-specific processing in the network (such as deep-packet inspection, filtering, and caching), can be supported actively using NPs. NPs, with their multi-core, multi-threaded architecture targeted at network processing functions, have the potential to efficiently perform these operations and much more, at wire speed. Recently, even application-layer functions have been ported onto NPs [19]. NPs can be positioned at the network core, at the network edge, or as an attached processor at the end systems (both client and server). The functionality at these points may vary in complexity due to the heterogeneity of the end systems and the traffic in the network. Thus, with NPs pervading the network, the processing power in the network is bound to increase manifold. However, it may not be uniformly distributed. The processing power may be underutilized at certain points, but inadequate at certain others. We propose that this imbalance be exploited by viewing the entire network of intermediate elements as a single, coordinated entity. To this end, we propose a grid framework that pools the in-network resources and makes the network services available as a commodity. In this framework we propose the use of AN technology for dynamic deployment of network services on the NPs to suit the varying demands of applications. It is to be noted that this proposed grid framework operates at the network layer, as opposed to conventional grids (computational grids, data grids, etc.), which focus on the application layer [20]. Even the Active Grid framework of the RESO project [13,14], which is aimed at providing network services for the conventional grids using AN technology, focuses on the higher layers.
Thus, our proposed grid framework is different from existing grids in that it is an infrastructure-level grid of active NPs. The grid features that we exploit are: use of idle resources, large-scale sharing of heterogeneous resources spanning different administrative domains, and a single-system view of the network. The different network devices play the roles of service providers, resource brokers and coordinators depending on their processing capability and resource availability. The single-system view emphasizes end-to-end performance as opposed to the localized solutions in conventional networks, and benefits both high-end and low-end clients of the network. Thus, the significant benefits that we foresee are the handling of high-bandwidth applications with reduced burden on the end systems, and the offering of customized value-added services in the network, to low-end clients like handheld devices, in a transparent manner. The proposed grid operations are facilitated by the use of NPs and AN technology. The NPs, with their deep-packet-processing capabilities, enable application-aware processing and allow offloading of application-level services into the network. AN technology allows this to take place on demand. The primary goal of the AN technology is to decouple the network services from the networking element, thereby enabling on-demand code deployment. The re-programmable nature of the NPs qualifies them to be Active Nodes. Thus NPs and AN technology together enable on-demand deployment of application-aware services in the network. This paper presents the conception of this infrastructure grid, describes the proposed architecture in detail, and illustrates its use for a specific application. It examines the suitability of NPs and AN technology and provides a proof-of-concept implementation of select key components. The organization is as follows. Section II
presents the design of the proposed grid, its architectural components and the mode of operation. In Section III various scenarios of the iSEGrid are illustrated for multimedia traffic. This is followed by the evaluation of the iSEGrid in Section IV. Section V presents a comparison with related work and Section VI concludes the paper with a brief account of work in progress.
2 Design of the iSEGrid – A Network Infrastructure Grid The proposed infrastructure grid consists of network entities which are in-network-service-aware entities (iSEs); hence this grid is named ‘iSEGrid’. The purpose of this network infrastructure grid is to harness the tremendous network-processing power and offer it as a commodity to the grid users, the end-systems. Here, the term ‘end-systems’ includes the server applications (iSE_user_SAs) and the client applications (iSE_user_CAs). The servers associated with the Internet service providers, media service providers, mail-service providers and content providers, along with their clients, are the iSE_user_SAs. The iSE_user_CAs that benefit from the services of the iSEGrid may run on PCs, laptops, mobile phones or any other computational gadget. The grid environment is depicted in Figure 1.
Fig. 1. iSEGrid Environment and components. [The figure shows the iSEGrid portal as the entry point through which the iSE_User_SAs and iSE_User_CAs reach the grid’s service providers (iSEs), resource brokers (iSE_RBs), directories (iSE_Dir) and active-code repository (iSE_ACR).]
The iSEGrid spans the entire Internet, edge to edge, with all sorts of edge nodes as well as core nodes as its grid resources, the iSEs. These resources possess diverse characteristics in terms of processing power, memory, data rate, type of protocol handled, QoS characteristics, data medium, type of interface, etc. In addition, being part of different administrative domains, these resources follow different policies and practices. The requirements for an in-network node to be an iSE are the availability of ‘excess’ resources that it can volunteer to the grid and the ability to be an Active Node. Resources that may be volunteered are computational threads, CPU time, memory or buffers, and the ability to handle an additional flow of packets. The providers of the network infrastructure who volunteer their resources to the iSEGrid constitute the iSEGrid service providers.
Architecture for an Active Network Infrastructure Grid – The iSEGrid
41
The iSEGrid coordinates its heterogeneous, distributed resources to solve a common problem – end-to-end performance. It employs resource brokers (iSE_RBs), which are powerful intermediate nodes, for typical resource-brokering operations such as managing idle resources, delegating tasks to the iSEs, aggregating services from individual iSEs, enforcing policies, resource accounting and charging, and triggering the grid activities. With the help of a rule base, the iSE_RBs make intelligent decisions on the aggregate information collected from other iSEs. The iSE_RBs may also handle issues relating to fault tolerance, reliability and availability (especially of transient nodes) of the iSEGrid. They may cooperate with each other or form a hierarchical resource-broker structure if necessary to provide a service. In this paper, however, these extensions are not dealt with.

The iSE_RBs and the iSEs must possess the necessary code modules to offer the iSEGrid services. These modules are developed as active software components and are stored in the iSE Active-Code Repositories (iSE_ACRs). In addition to the storage available at the iSEGrid Portal, nodes like storage servers may volunteer storage to this repository. The active components are deployed at the iSEGrid nodes either during registration or on demand.

From the kind of operations performed at the various iSEGrid nodes, it is obvious that these nodes maintain a variety of information for normal operation. The data structures that hold these data and metadata are collectively maintained as ‘directories’ (iSE_Dirs), located at various strategic points for use by the grid nodes.

The entry point into the iSEGrid is a publicly accessible portal, which advertises the iSEGrid services. The in-network node owners can register their nodes as iSEs via this portal, while the server applications and the client applications can register themselves as iSE_user_SAs and iSE_user_CAs respectively.
This portal maintains the static part of the iSE_Dirs, while the rest is maintained at the iSEs that volunteer storage resources to the iSEGrid. Thus the iSEGrid is seen to consist of five major functional components: the iSEGrid Portal, the iSEGrid Resource Brokers (iSE_RBs), the iSEGrid service providers (iSEs), the iSEGrid Active Code Repositories (iSE_ACRs) and the iSEGrid Directories (iSE_Dirs), as shown in Figure 1.

2.1 iSEGrid Service Architecture

The iSEGrid service architecture is a four-layered one, which replaces the network layer in a typical layered network architecture. For instance, in the TCP/IP model, this grid can be viewed as an extended IP layer, sitting below the TCP layer and above the MAC layer. The four layers of the proposed architecture are the Basic Network-processing (BNp) layer, the Local Decision-making (LDm) layer, the Aggregate Decision-making (ADm) layer and the iSEGrid services layer, as shown in Figure 2. Of the four layers, the lower two, namely the BNp and LDm layers, perform the normal network-processing or IP functions. The other two, namely the ADm and iSEGrid services layers, are the grid extensions to the IP layer. The BNp layer includes services like packet processing, classification, header processing, flow identification, etc., that are performed at the individual in-network entities.
42
T.K.S. LakshmiPriya and R. Parthasarathi
Above the BNp layer is the LDm layer. The services of this layer include the local policy and decision-making services. By ‘local’ services, we mean the consolidation of the BNp services that are performed at an individual iSE without a global view or interaction with other iSEs. The LDm services may be general purpose or specific to applications. This layer exposes the network resources to the iSEGrid. For this purpose, in addition to the LDm services, it includes the communication and authentication protocols associated with the resources. The services of this layer include the policies for the analysis of packets, the decision-making rules for operating on a particular type of flow, and the access-control policies and authentication mechanism for each iSE.

The LDm services of the iSEs are aggregated by the coordination of the iSEs. These collective operations constitute the ADm layer and are performed by the iSEs that coordinate the network-processing services of individual iSEs, namely the iSE_RBs. This layer exhibits intelligence in the network. The aggregated services of the ADm layer are customized and offered as ‘iSEGrid Services’ to the iSEGrid consumers as per their requirements. These services form the iSEGrid Services Layer. It is via this layer that the iSEGrid communicates its services to its consumers.

The iSEGrid architecture is hourglass shaped, with the LDm layer at the neck of the hourglass. Given the development of a diverse range of in-network nodes, it is preferable to keep the LDm layer thin, so that the set of core LDm services is small and a broad range of ADm services can be implemented on top of it. Services at each layer or across layers are developed as active code components and are made available at the iSE_ACRs. The granularity of the code varies with the requirement.
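The layering just described — per-packet BNp processing, a local LDm decision at one iSE, and an ADm aggregation of many local decisions at an iSE_RB — can be illustrated schematically. All function names, the policy table and the majority-vote aggregation rule are our own illustrative assumptions, not a prescribed API.

```python
def bnp_classify(packet: dict) -> dict:
    """BNp layer: basic per-packet processing, e.g. flow identification."""
    packet["flow_id"] = (packet["src"], packet["dst"], packet["proto"])
    return packet

def ldm_decide(packet: dict, local_policy: dict) -> str:
    """LDm layer: a local decision taken at one iSE, without a global view."""
    packet = bnp_classify(packet)
    return local_policy.get(packet["proto"], "forward")

def adm_aggregate(decisions: list) -> str:
    """ADm layer (at an iSE_RB): aggregate the local decisions of
    several iSEs into one coordinated, grid-wide action."""
    return "throttle" if decisions.count("throttle") > len(decisions) / 2 else "forward"

policy = {"rtp": "throttle"}           # an example local policy
pkts = [{"src": "a", "dst": "b", "proto": "rtp"} for _ in range(3)]
print(adm_aggregate([ldm_decide(p, policy) for p in pkts]))  # throttle
```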
Fig. 2. iSEGrid Service Architecture (the figure shows the layer stack: the transport layer and above; the iSEGrid layers – the iSEGrid Services Layer, the Aggregate Decision-making (ADm) Layer, the Local Decision-making (LDm) Layer and the Basic Network-processing (BNp) Layer; and the MAC and physical layers)
2.2 iSEGrid Operations

The iSEGrid operations can be explained in two phases, namely the iSEGrid setup phase, which involves registration and module deployment, and the iSEGrid-in-service phase, in which the iSEGrid services are offered. The interactions between the iSEGrid nodes and consumers during these phases are depicted in Figure 3.
Phase 1 – iSEGrid Setup Phase

The iSEGrid is set up as the grid nodes (i.e., iSEs, iSE_RBs, iSE_ACRs and iSE_Dirs) and the grid consumers register (Figure 3). Network providers, the owners of a large variety of in-network entities including base stations, access points and CDNs, approach the iSEGrid Portal to register their nodes as iSEs. Information regarding the configuration, capability and constraints of these nodes is conveyed to the iSEGrid. Negotiations regarding security, accounting, and service type and parameters are carried out. As each iSEGrid node is registered, the active code modules required for the services sought are deployed at the respective nodes. The iSE_Dir is updated and initialization procedures are executed at the new iSEGrid node. At registration time, each new iSE_RB is associated with a set of iSE_user_SAs. Similarly, when a new iSE_user_SA registers, the corresponding startup iSE modules are deployed and a specific iSE_RB is associated with it. This iSE_RB becomes its first point of contact to the iSEGrid; all further communications from the iSE_user_SA to the iSEGrid take place via this iSE_RB.

Initially, the known client groups of the SA are made the iSE_user_CAs. However, the iSEGrid also permits adding client groups dynamically. This can be done in two ways: the iSE_RBs may automatically detect these clients by monitoring traffic at the server edge, or receive intimation from the server on the arrival of requests from the clients. Figure 3 indicates registrations occurring in a particular order, but in practice this varies. Participation in the iSEGrid is transient; by this we mean that grid nodes and consumers are permitted to register and later deregister. However, a node may deregister only after completing, or migrating, its committed services.
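The setup-phase flow described above — registration via the portal, active-module deployment, iSE_Dir update, and association of a first-contact iSE_RB with each iSE_user_SA — can be sketched as follows. The class, its methods and the round-robin broker assignment are hypothetical illustrations, not interfaces defined by the paper.

```python
class ISEGridPortal:
    """Sketch of the iSEGrid setup phase: nodes register via the
    portal, active code modules are deployed, and the iSE_Dir is
    updated (all names and policies here are illustrative)."""
    def __init__(self):
        self.ise_dir = {}   # iSE_Dir: node -> metadata
        self.rb_of_sa = {}  # iSE_user_SA -> its first-contact iSE_RB
        self.brokers = []

    def register_ise(self, node, capabilities, modules):
        # Deploy the active code modules required for the services sought.
        deployed = [f"deploy:{m}" for m in modules]
        self.ise_dir[node] = {"caps": capabilities, "modules": deployed}
        return deployed

    def register_rb(self, rb):
        self.brokers.append(rb)

    def register_sa(self, sa):
        # Associate a specific iSE_RB as the SA's entry point to the grid.
        rb = self.brokers[len(self.rb_of_sa) % len(self.brokers)]
        self.rb_of_sa[sa] = rb
        return rb

portal = ISEGridPortal()
portal.register_rb("rb1")
portal.register_ise("edge1", {"rate_mbps": 100}, ["transcode"])
print(portal.register_sa("media_server"))  # rb1
```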
Fig. 3. iSEGrid – 2-phase Operation (message-sequence diagram between the iSEGrid Portal, the iSE_RBs, the iSEs, the iSE_ACR, the iSE_Dir, the iSE_User-SA and the iSE_User-CA: the setup phase covers iSE, iSE_RB, iSE_User-SA and iSE_User-CA registration with active deployment; the in-service phase covers the media request, triggering the iSEGrid service, initiating the iSEs and offering the services)
Phase 2 – iSEGrid-in-service Phase

Soon after registration, the iSEGrid nodes enter the iSEGrid-in-service phase. Step 2 is a typical client/server request. The arrival of the client request at the server triggers the iSEGrid service. This can happen in two ways: explicitly or implicitly. Explicit triggering occurs when the iSE_user_SA requests the iSE_RB for an iSEGrid service. Implicit triggering occurs when the iSE_RB intercepts the flow at the server edge (i.e., via deep packet processing) and detects the need for an iSEGrid service.

‘Triggering’ an iSEGrid service involves identifying and initiating the iSE(s). Identifying the right iSE(s) necessary to service the request involves intelligent decisions. Typical rules for this purpose may be based on proximity to the client group, the iSEGrid services offered by the iSE, the ability to provide the service at that point in time, or the expected response time. The identification of iSEs may be performed by iSE_RBs in isolation or in coordination with others. Once the iSEs are identified and initiated, the parameters for the service, including the location of the iSE_ACRs, are sent to these iSEs (Step 4). The iSEs begin offering the services (Step 5) after the ‘on-demand’ deployment of the iSE code modules from the iSE_ACR. Once triggered, the iSE_RBs periodically probe the iSEs and maintain the services. When a service terminates, wind-up operations are done at the iSEs and the iSE_Dirs are updated to reflect this change.

2.3 iSEGrid – Modes of Usage

It can be seen from the above description that a global view and coordinated functions are two key characteristics of the iSEGrid. We now present different modes of usage of the iSEGrid that exploit these two characteristics.

1. Integration of services
A straightforward application of the iSEGrid would be to integrate the currently available localized mechanisms from a global perspective.

2.
Code and service movement
Since the grid allows dynamic deployment of services and code movement, any of the existing network services can be moved to the appropriate location (sometimes even to multiple locations) to provide efficient service. That is, services provided at the network edge may be moved into the network, those at the end systems may be moved to the edge, and vice versa. Thus the network resources can be utilized effectively and the burden on the end systems can be reduced. An added advantage is that services that were hitherto available only to powerful end-systems can now be provided to less powerful end systems, such as hand-held devices, too.

3. Novel in-network service
To tap the full potential of the iSEGrid, novel solutions that exploit the in-network capabilities can be identified. One such solution is setting up a chain of services at the intermediate iSEs along the path. These services may be aggregated or used in isolation. Even though this requires a paradigm change in the networking domain, it can be seamlessly integrated using the various features of the iSEGrid. The next section illustrates each of these modes of usage as applied to multimedia services.
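Code and service movement (mode 2 above) amounts to choosing, along the transport path, the node best placed to host a service such as transcoding. A minimal sketch, under our own assumptions about what each node advertises (free CPU, free bandwidth, hop distance from the end system) and a simple "move work off the end-systems" preference:

```python
def place_service(path_nodes, cpu_needed, bw_needed_kbps):
    """Pick the node along the path (end system, edge or intermediate
    iSE) suited to host a service; hypothetical attributes and policy."""
    candidates = [n for n in path_nodes
                  if n["free_cpu"] >= cpu_needed
                  and n["free_bw_kbps"] >= bw_needed_kbps]
    # Prefer nodes deeper in the network, relieving the end-systems.
    return max(candidates, key=lambda n: n["hops_from_end"], default=None)

path = [
    {"name": "client",   "hops_from_end": 0, "free_cpu": 10, "free_bw_kbps": 500},
    {"name": "edge_ise", "hops_from_end": 1, "free_cpu": 60, "free_bw_kbps": 5000},
    {"name": "core_ise", "hops_from_end": 3, "free_cpu": 20, "free_bw_kbps": 9000},
]
choice = place_service(path, cpu_needed=30, bw_needed_kbps=2000)
print(choice["name"])  # edge_ise
```

The same rule set could equally encode the iSE-identification criteria from Phase 2 (proximity, offered services, availability, expected response time).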
3 iSEGrid for Multimedia Services – An Illustration

Multimedia traffic requires special attention, especially because its characteristics differ from those of other traffic in a network. Researchers have been working on various issues to improve the performance of multimedia applications by making the underlying network services streaming-aware [6-10]. One of the major issues is the timely arrival of media packets at the client node. Mechanisms like prefetching and caching, media-specific packet scheduling, network congestion avoidance using multipath transmission, rate adaptation, and minimizing end-to-end retransmission have been proposed to reduce the latency of the packets. Yet another issue, namely the high bandwidth requirement, is being handled by mechanisms like transcoding and multiple-source streaming. Each of these mechanisms operates at various points along the transport path, i.e., some at the end systems, some at the network edge, and some at the core. Typically they provide solutions based on a localized view of the problem; hence they do not guarantee an end-to-end solution that adapts to varying network conditions. It is in this situation that the iSEGrid provides an alternative. Any of the three modes of usage mentioned in the previous section could be applied. We take a few instances to illustrate each of these modes.

1. Integration of Services
We consider a full-fledged application, namely flash crowd control in a p2p network serviced by media servers. Flash crowds occur when an unexpected number of requests hit the server within a very short duration of time. One solution to this is the maintenance of a coordinated cache at the client end, which serves the clients locally during flash crowds [11]. Here, one or more of the clients performs the coordination. The physically distributed cache, which is the key component of this service, is made up of portions of memory volunteered by each client in the peer group.
An iSEGrid-based approach to this solution employs a server-side iSE_RB (SS_RB) and a client-side iSE_RB (CS_RB). The SS_RB monitors the load on the server and communicates peak-load conditions to the CS_RB. The CS_RB, along with the peers in the network, performs the pre-flash-crowd operations, i.e., caching the recently viewed clips at the clients and transparently maintaining their indices at the CS_RB. On the occurrence of flash crowd conditions at the media server, the SS_RB sends an intimation to the CS_RB. This intimation provides timely detection of flash crowds. The CS_RB immediately redirects further media requests to the locally cached objects, ensuring continuous delivery. Since the iSE_RB takes up most of the maintenance tasks, the load on the client is reduced. The iSEGrid-specific messages either have no payload or are lightweight and can be piggybacked. Thus, this application illustrates the unification of the load-monitoring service, which is typically performed at the server, and the coordinated cache service implemented at the client end for undisturbed service.

2. Code and Service Movement
We consider three different mechanisms that can be enhanced by code/service movement – feedback-based rate adaptation, QoS mechanisms for wireless networks, and transcoding.
Feedback-based rate adaptation at the server: feedback is normally obtained from the end system or the network edge. In the iSEGrid, this feedback generation can be moved to the iSEs in the network and an aggregated feedback can be obtained at the server. The advantage gained is that information about the entire path is available at the server for rate adaptation, and adverse conditions along the path are detected earlier.

Similarly, QoS mechanisms are normally deployed at the core of the network, for both wired and wireless networks. However, for a wireless network it is useful to move this service to the edge, where the wired network meets the wireless one. The nodes at the junction of wired and wireless networks, which experience the varying characteristics of two different networks, are ideal iSEs to enforce QoS. These iSEs can perform QoS-specific scheduling, classification and queue management on the flows and adaptively cater to the changes in the wireless applications. The dynamic deployment feature of the iSEGrid can be used to enable on-demand loading of the desired algorithm [5]. An iSE_RB can be used to detect the change in flow and initiate code transfer from the code repository.

Transcoding is normally employed at the edges – either the server or the client edge – to adapt to the client requirements in terms of bandwidth and resolution. In the iSEGrid environment, this service can be moved to any position in the path – server edge, client edge or any volunteering intermediate iSE – to dynamically accommodate variation in service and demand. Prefetching and caching services can be offered in a similar manner. Here again, an iSE_RB will be used to coordinate this adaptive service movement.

3. Novel in-Network Solution
The in-network feedback generation described above is an example of an in-network chain of services.
Similarly, a chain of link-caches can be maintained at the iSEs to cache high-priority media packets at each link until the subsequent iSE (link) acknowledges. This provides early detection of link-level packet loss, thereby avoiding end-to-end packet retransmission [12].
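The aggregated, path-wide feedback mentioned above can be sketched as follows. The per-hop report fields and the aggregation rules (bottleneck bandwidth, worst loss) are our own illustrative assumptions.

```python
def aggregate_feedback(hop_reports: list) -> dict:
    """Each iSE on the path reports local conditions; the aggregate
    handed to the server reflects the whole path (bottleneck rate,
    worst loss), enabling earlier detection of adverse conditions."""
    return {
        "bottleneck_kbps": min(r["avail_kbps"] for r in hop_reports),
        "worst_loss_pct": max(r["loss_pct"] for r in hop_reports),
    }

reports = [
    {"avail_kbps": 2000, "loss_pct": 0.1},  # server-edge iSE
    {"avail_kbps": 800,  "loss_pct": 0.5},  # congested core iSE
    {"avail_kbps": 1500, "loss_pct": 0.2},  # client-edge iSE
]
print(aggregate_feedback(reports))  # {'bottleneck_kbps': 800, 'worst_loss_pct': 0.5}
```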
4 Evaluation of iSEGrid

To evaluate the proposed iSEGrid, we consider two aspects – the underlying technology, and the performance benefits to an application that uses the iSEGrid. Since the idea of the grid is motivated by the use of AN and NP technologies, it is important to study the feasibility of using the NP as an iSE and as an Active Node. The evaluation is based on the scenarios described in the previous section.

1. Using NP as an iSE_RB
The iSE_RB functionality illustrated in the previous section, which coordinates transcoding, prefetching and caching, has been developed on an Intel IXP1200 NP [17]. This iSE_RB is assumed to be located on a Base Station Controller (BSC) at the wired-wireless junction. The iSE_RB is responsible for detecting, scheduling and allocating volunteer iSEs to offer the streaming service to mobile clients. This involves a sequence of events E1 to E6 as follows. The volunteers first register with the iSEGrid (E1). The media requests from the mobile are intercepted by the iSE_RB (E2). The iSE_RB then probes the volunteers for their availability (E3). On receiving a
response (E4), it allocates the volunteer for the service (E5) and intimates the client (E6). The volunteer iSE then begins prefetching, caching and transcoding before streaming the object to the mobile client. The ability of the IXP1200 NP-based iSE_RB to handle these requests has been analyzed under two different scenarios: (1) a volunteering iSE is available till the end of the service (Figure 4a), and (2) the iSE leaves the system before completing the service, which requires reallocation (Figure 4b). It is assumed that all packets arrive on 100 Mbps lines with an inter-packet gap of 960 nanoseconds. The µ-engines of the NP operate at 232 MHz. Figure 4 shows the time line diagram of the events that occur for scenarios 1 and 2; the clock cycles at which the events occur are given.
Fig. 4. Time line (in microengine cycles) showing the events during the services of two requests: (a) Scenario 1 – servicing a new video request (E1–E6) followed by a request for an existing video; (b) Scenario 2 – servicing a new video request followed by a reallocation after the iSE quits. Events: iSEGrid setup phase – E1, volunteering iSE registers; iSEGrid-in-service phase (explicit triggering) – E2, streaming request packet arrives from a mobile; E3, iSE_RB sends probe packet to the iSE; E4, iSE sends response to the iSE_RB; E5, iSE_RB sends start-service packet to the iSE; E6, iSE_RB sends service-intimation packet to the mobile
Scenario 1: Initially, the volunteer iSE registers (E1 at cycle 6818). The sequence of events (E2 at 7398 to E6 at 18403) occurs during the service of a video object for the first time (i.e., a new video). This is followed by another request for the same video (E2 at 19100 to E6 at 22430). The iSE_RB requires 11,005 µ-engine cycles (47.4 µs) for the first request, while for a subsequent request to the same object it is only 3,330 µ-engine cycles (14.3 µs).

Scenario 2: Registration and service of a new video request (i.e., E1 to E6) are the same as in the previous scenario. When the iSE leaves the system, it intimates the iSE_RB (E4 at 19356) of its unavailability. The interval between E6 (at cycle 18403) and E4 (at cycle 19356) indicates the partial streaming interval. Since the iSE's service has not been completed, the iSE_RB performs a reallocation (from cycle 19356 onwards) and sends the service-intimation (E6 at 24523) to the new iSE. The reallocation latency is 5167 µ-engine cycles (22.7 µs) for this scenario. The overall latency involved when a volunteer leaves is found to be 16172 µ-engine cycles (70.1 µs).

The overheads of using an NP as an iSE_RB are viewed in terms of the resources (i.e., microengines of the NP) required and in terms of the message exchanges specific to iSEGrid operation. The iSE_RB design utilizes all six microengines of the IXP1200. Hence it is recommended either to use the IXP1200 as an attached processor on the BSC or to choose a higher-version NP that can house both the functionality of the BSC and that of the iSE_RB. The iSEGrid-specific message exchanges have been listed above as ‘events’. Most of these messages either have no payload, and hence may be piggybacked, or are lightweight. The probe and start-service messages have no payload. The probe-response message contains parameters, like the channel quality between the iSE and the mobile, and hence requires a few tens of bytes as payload.
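The reported latencies follow directly from the cycle counts and the 232 MHz microengine clock; a quick check (the helper function is ours):

```python
def cycles_to_us(cycles: int, clock_mhz: float = 232) -> float:
    """Convert IXP1200 microengine cycles to microseconds."""
    return cycles / clock_mhz

# First request: E2 at cycle 7398 through E6 at cycle 18403
print(round(cycles_to_us(18403 - 7398), 1))   # 47.4
# Repeated request for the same object: E2 at 19100 through E6 at 22430
print(round(cycles_to_us(22430 - 19100), 1))  # 14.4 (the text truncates to 14.3)
```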
This experiment evaluates the effectiveness of NPs as RBs on the iSEGrid and shows the reduction in service latency for repeated requests.

2. Using NP as an Active iSE
The adaptive QoS service applied at the edge of the wireless network, as illustrated in the previous section, has been developed on an NP-based WLAN Access Point (AP) [5]. This receives active packets (in the ANEP format) and dynamically loads the embedded QoS function. Active modules for various classification, queue management and scheduling algorithms have been developed on the active framework for IXP1200-based NPs. An active code handler module has been developed specifically to handle the active code and load it onto the microengines. The system has been tested with various active code modules. The NP-based active test-bed was found to receive the active packets, stop the currently running algorithm and load the new one appropriately. Normally, the switching time is a major overhead of dynamic loading operations. However, the iSE_RB at the WLAN AP has been designed with two sets of queues, one for the currently running algorithm and another for the incoming algorithm to be loaded. Here, the feasibility of an active NP-based iSEGrid component has been established.

3. An Application on the iSEGrid
The flash crowd control application, as illustrated in the previous section, has been tested in an iSEGrid environment consisting of a CS_RB, an SS_RB and a P2P group of 12 clients sharing 20 media files. The effectiveness of the service is shown in the graph in Figure 5.
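Returning to the double queue-set design of item 2 above: while a new QoS algorithm is being loaded, arriving traffic is buffered in a standby queue set, and the swap activates it without stalling the data path. A minimal sketch under our own naming; the actual implementation runs on the NP's microengines, not in software like this.

```python
from collections import deque

class DualQueueLoader:
    """Illustration of the two-queue-set idea: buffer packets in the
    standby set while new active code loads, then swap atomically."""
    def __init__(self, algorithm):
        self.active_q, self.standby_q = deque(), deque()
        self.algorithm = algorithm
        self.loading = None

    def enqueue(self, pkt):
        (self.standby_q if self.loading else self.active_q).append(pkt)

    def begin_load(self, new_algorithm):
        self.loading = new_algorithm  # active code arrives (e.g. via ANEP)

    def finish_load(self):
        self.algorithm = self.loading
        self.loading = None
        self.active_q, self.standby_q = self.standby_q, self.active_q

q = DualQueueLoader("fifo")
q.enqueue("p1")
q.begin_load("wfq")
q.enqueue("p2")   # buffered in the standby set during loading
q.finish_load()
print(q.algorithm, list(q.active_q))  # wfq ['p2']
```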
The percentage of requests that were not serviced by the server during a flash crowd is plotted for five different bandwidth reservations at the server (data series 1). These server-rejected requests are handled by the CS_RB. The percentage of the total requests serviced by the CS_RB is indicated by the second data series in the graph. The difference between the first and the second data series indicates the usefulness of this service. The usefulness is calculated as the Effective Service Percentage (ESP) of the CS_RB:

ESP = (No. of requests serviced by CS_RB / No. of requests rejected by the server) × 100
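The ESP formula, applied to the figures reported below (67 of 167 rejected requests served at a reservation of three connections), reproduces the quoted 40%:

```python
def esp(served_by_cs_rb: int, rejected_by_server: int) -> float:
    """Effective Service Percentage of the CS_RB."""
    return 100.0 * served_by_cs_rb / rejected_by_server

# With a reservation of three connections, 67 of the 167
# server-rejected requests were serviced by the CS_RB:
print(round(esp(67, 167)))  # 40
```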
Fig. 5. Flash Crowd Control service of the CS_RB component (bar chart of the percentage of the total number of messages communicated versus the server bandwidth in number of connections, with three data series: the percentage of total requests NOT serviced by the server, the percentage of total requests serviced by the CS_RB, and the Effective Service Percentage (ESP))
The ESP is the percentage of the server-rejected requests that are serviced by the CS_RB; the third data series gives the ESP. It can be seen that the ESP value is high when a large bandwidth is reserved at the server, and lower otherwise. The ESP is the parameter that determines the usefulness of the CS_RB for a given bandwidth reservation at the server. We find that the CS_RB is more useful when the bandwidth reservation at the server is low: about 67 of the 167 rejected requests (i.e., 40%) have been serviced by the CS_RB with a reservation of three connections. During such situations the CS_RB experiences maximum load. The CS_RB for this service has been developed on an Intel IXP1200 NP to test its performance under various scenarios and varying flash crowd durations. The design of this service requires only two microengines, indicating that any NP-based edge device that can volunteer two microengines and 12 × ‘n’ bytes of SRAM for table lookup can play the role of such a CS_RB with a client group size of ‘n’ peers.
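The evaluated CS_RB behaviour — indexing peer-cached clips and redirecting requests locally once the SS_RB signals a flash crowd (Section 3) — can be sketched as follows. The class and method names are hypothetical; the actual component runs on the IXP1200's microengines.

```python
class CSRB:
    """Sketch of the client-side broker's flash-crowd handling: it
    indexes clips cached at peers and redirects requests to them
    once the SS_RB intimates peak load at the server."""
    def __init__(self):
        self.index = {}           # clip -> peer holding a cached copy
        self.flash_crowd = False

    def cache_clip(self, clip, peer):  # pre-flash-crowd operation
        self.index[clip] = peer

    def on_peak_load(self):            # intimation from the SS_RB
        self.flash_crowd = True

    def route(self, clip):
        if self.flash_crowd and clip in self.index:
            return ("peer", self.index[clip])  # served locally by a peer
        return ("server", clip)                # normal path to the server

cs_rb = CSRB()
cs_rb.cache_clip("news.mp4", "peer3")
print(cs_rb.route("news.mp4"))  # ('server', 'news.mp4')
cs_rb.on_peak_load()
print(cs_rb.route("news.mp4"))  # ('peer', 'peer3')
```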
The above evaluation also showcases two of the three modes of use of the iSEGrid – integration of services, and code and service movement. The third aspect – novel in-network service – has also been demonstrated, but the results have been presented elsewhere [12].
5 Comparison with Related Work

The goal of the infrastructure grid proposed in this paper is to pool the network resources and to exploit the imbalance in their processing power, to bring about better end-to-end performance and to enable value addition for the applications. To do this effectively, the iSEGrid employs the positive aspects of NP and AN technologies. The idea of combining active networking with powerful network resources is not entirely new and is very close to the idea of active grids proposed by the RESO project [13, 14]. Active grids focus on offering intelligent adaptive transport services to support computational grids (grid environments and applications). While the RESO project concentrates on developing application-aware components (AACs), deployed at the edge nodes of the core network and functionally specific to computational grids and their applications, the iSEGrid concentrates on implementing application-level services at the network layer, thereby offering a common base for a wide range of technologies operating at the higher layers – computational grids, peer-to-peer networks, overlays, the Internet, and web services. It can be viewed as complementing the Active Grid by providing support at the network layer.

Another architecture that is similar to the iSEGrid is the Open Pluggable Edge Services (OPES) architecture [15, 16], which brings application awareness into the network. However, OPES is an overlay of application-level network intermediaries, while the iSEGrid is an overlay of all network intermediaries. Hence the iSEGrid has a wider scope for in-network functionalities. It is a complementary technology in the sense that it can also be used to support OPES.

In terms of the underlying concepts of resource sharing and coordination, the P2P computing paradigm and conventional grids are similar to the iSEGrid. However, while P2P networks operate at the end systems, the iSEGrid spans the network.
In that sense, the P2P concept is embedded in the iSEGrid. Similarly, conventional grids treat the entire network as an individual resource, whereas the iSEGrid goes deeper and focuses on the network components themselves. Thus the iSEGrid presents another dimension in the grid space.
6 Conclusion and Future Work

This paper has introduced the iSEGrid framework, which pools the in-network resources of NPs and makes the network services available as a commodity. This infrastructure grid allows dynamic deployment of application-aware network services using ANs to suit the varying demands of applications. In essence, it has been shown that the synergy of three different technologies, namely grid technology, ANs and NPs, can be exploited cleverly to expand the capabilities of today's networks for the future. The
layered architecture and the design of the proposed iSEGrid have been presented. A few multimedia-specific scenarios have been illustrated to bring out the usage of the iSEGrid. Even though the use of the iSEGrid has been showcased for multimedia applications, it should be noted that all the above services benefit non-streaming applications as well. The evaluation of the iSEGrid, in terms of the underlying technology and of the different usages of the iSEGrid, has been presented.

To conclude, the iSEGrid has exposed a whole new paradigm for enabling networking services and solutions. Our architecture implies that an intelligent, efficient underlying grid is available to application developers. The challenge now lies in finding issues that can be solved better using this paradigm, and in exploring services that can be provided ‘in the network’. We have initiated activity in this direction for both wired and wireless media streaming applications. We also plan to explore support for text-processing applications. Work in progress includes the development of a network-level simulator for the proposed grid. Further analysis is required in terms of security issues at the iSEGrid components, storage issues at the resource brokers, and the development of effective protocols for communication between the iSEGrid components with reduced overheads.
Acknowledgement
We thank M. Nithish, C. Ramakrishnan, J. Ramkumar, R. Sharmila, and V. Krishnan for their contributions to the implementation of the iSEGrid modules.
References
1. Decasper, D., Plattner, B.: DAN: distributed code caching for active networks. In: IEEE INFOCOM 1998, Proceedings of the Seventeenth Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 2, pp. 609–616 (March/April 1998)
2. La Cholter, W., Narasimhan, P., Sterne, D., Balupari, R., Djahandari, K., Mani, A., Murphy, S.: IBAN: intrusion blocker based on active networks. In: Proceedings of the DARPA Active Networks Conference and Exposition, DANCE 2002, pp. 182–192 (2002)
3. Subramaniam, S., Komp, E., Kannan, M., Minden, G.J.: Building a reliable multicast service based on composite protocols for active networks. In: Minden, G.J., Calvert, K.L., Solarski, M., Yamamoto, M. (eds.) Active Networks. LNCS, vol. 3912, pp. 101–113. Springer, Heidelberg (2007)
4. Kind, A., Pletka, R., Waldvogel, M.: The Role of Network Processors in Active Networks. In: Wakamiya, N., Solarski, M., Sterbenz, J.P.G. (eds.) IWAN 2003. LNCS, vol. 2982, pp. 18–29. Springer, Heidelberg (2004)
5. Sharmila, R., LakshmiPriya, M.V., Parthasarathi, R.: An active framework for a WLAN access point using Intel's IXP1200 network processor. In: Bougé, L., Prasanna, V.K. (eds.) HiPC 2004. LNCS, vol. 3296, pp. 71–80. Springer, Heidelberg (2004)
6. Hefeeda, M.M., Bhargava, B.K., Yau, D.K.Y.: A hybrid architecture for cost-effective on-demand media streaming. Computer Networks: The International Journal of Computer and Telecommunications Networking 44(3), 353–382 (2004)
7. Keller, R., Choi, S., Dasen, M., Decasper, D., Fankhauser, G., Plattner, B.: An active router architecture for multicast video distribution. In: Proceedings of the Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies, INFOCOM 2000, vol. 3, pp. 1137–1146 (2000)
8. Nguyen, T.P.Q., Zakhor, A.: Distributed Video Streaming Over Internet. In: Multimedia Computing and Networking 2002, Proceedings of SPIE, San Jose, California, vol. 4673, pp. 186–195 (January 2002)
9. Chen, X., Heidemann, J.: Flash Crowd Mitigation via Adaptive Admission Control based on Application-level Observations. Technical Report ISI-TR-2002-557 (May 2002, updated March 25, 2003)
10. Korkmaz, T., Krunz, M.M.: Routing Multimedia Traffic With QoS Guarantees. IEEE Transactions on Multimedia 5(3) (September 2003)
11. Stavrou, A., Rubenstein, D., Sahu, S.: A Lightweight, Robust Peer-To-Peer System to Handle Flash Crowds. In: 10th IEEE International Conference on Network Protocols (ICNP 2002), ACM SIGCOMM Computer Communication, vol. 32(3), p. 17 (2002)
12. Nithish, M., Ramakrishna, C., Ramkumar, J., LakshmiPriya, T.K.S.: Design and Evaluation of Intermediate Retransmission and Packet Loss Detection Schemes for MPEG-4 Transmission. In: Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC 2004), vol. 1, p. 742 (2004)
13. RESO Project, http://www.inria.fr/recherche/equipes_ur/reso.en.html
14. Bouhafs, F., Gaidioz, B., Gelas, J.P., Lefevre, L., Maimour, M., Pham, C., Primet, P., Tourancheau, B.: Designing and Evaluating an Active Grid Architecture. The International Journal of Future Generation Computer Systems (FGCS), Special issue: Advanced grid technologies 21(2), 315–330 (2005)
15. IETF OPES, http://ietf-opes.org/
16. Nurmela, T.: Analysis of Open Pluggable Edge Services. In: Seminar on Hot Topics in Internet Protocols (2004)
17.
Intel IXP1200 Network Processor, http://www.intel.com/design/network/ products/npfamily/ 18. Krishnan, V.: InGRA-Intelligent Grid resource allocation for Mobile Clients, Project Report, Dept. of CSE, CEG, Anna University, Chennai, India (May 2005) 19. Hvamstad, O., Griwodz, C., Halvorsen, P.: Offloading Multimedia Proxies using Network Processors. In: International Network Conference 2005 (2005) 20. Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of Supercomputer Applications and High Performance Computing 15(3) (2001)
Network Services on Service Extensible Routers Lukas Ruf, K´aroly Farkas, Hanspeter Hug, and Bernhard Plattner Computer Engineering and Networks Laboratory (TIK) Swiss Federal Institute of Technology (ETH) Zurich CH-8092 Zurich/Switzerland {ruf,farkas,hhug,plattner}@tik.ee.ethz.ch
Abstract. Service creation on extensible routers requires a concise specification of component-based network services to be deployed and extended at node runtime. The specification method needs to cover the data-flow oriented nature of network services with service-internal control relations. Hence, it needs to provide the concept of functional service composition that hides the complexity of a distributed, dynamically code-extensible system. We propose the PromethOS NP service model and its Service Programming Language to answer this challenge. They provide the concepts and methods to specify a network service as a graph of service chains with service components, and service-internal control relations. In this paper, we present the concepts of our service model, the syntax and semantics of its Service Programming Language, and demonstrate their applicability by an exemplary service specification.
1 Introduction and Motivation

One of the most significant problems of the Internet today is the lack of non-disruptive service creation and extension on demand on access and border routers. Non-disruptive service creation and extension have become key requirements due to the following trends:

– Function shift from the end-systems to the access networks: Function is moved from the end-systems towards the network to ease site operation and to benefit from the economy of scale; user-centric network services are deployed on access routers for the three planes [3] of data-path, control and management functionality. Examples are the protection of network sites [15] and the alleviation of network management and control [1].

– Router consolidation: To reduce the costs of network management and operation, routers are consolidated. Larger devices are hence needed to satisfy the demands for the interconnection of different networks. Equipped with programmable network interfaces, these network nodes¹ provide suitable locations for new extended network services.
* This work is partially sponsored by the Swiss Federal Institute of Technology (ETH) Zurich, the Swiss Federal Office for Education and Science (BBW Grant 99.0533), and the Intel IXA University Program (Research Grant 22919). PromethOS v1 has been developed by ETH as a partner in the IST Project FAIN (IST-1999-10561).
¹ We refer interchangeably to a router device by the term network node.
D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 53–64, 2009.
© IFIP International Federation for Information Processing 2009
– Enabling Technologies: Network Processors [7, 8] (NPs) have recently appeared on the market. They generally provide an asymmetric programmable chip-multiprocessor architecture with functional units optimized for specific network operations. Usually, they are built of two different processor types, namely packet and control processors² (PPs and CPs, respectively), that reside, in a conceptual view, at two different levels. CPs are commonly built of a general purpose processor (GPP), while PPs provide the architecture of a stripped-down RISC processor supported by specialized co-processors to process packets at line speed. Network interface cards with embedded NPs (so-called NP-blades) thus provide the required flexibility to extend large network nodes easily and simultaneously increase the processing capacity of a network node.

Router devices with NP-blades and multiple GPPs provide a powerful hardware platform. Service management and node operation, however, are complicated by the nature of these large heterogeneous network nodes. PromethOS NP [13, 14] provides the flexible router platform that enables non-disruptive service creation and extension on any programmable processor at node run-time according to an extended plugin [5] model. For the definition of network services, however, a concise specification method is required. This specification method needs to export the capabilities of the underlying service infrastructure while it must abstract from the complexity of large multi-port router devices. Moreover, it must be general enough to cope with the large variety of network nodes. Thus, it needs to describe control and data relations among service components, and needs to specify additions to previously deployed network services with specific resource constraints in a flexible way.
Therefore, we present in this paper the PromethOS NP service model and propose its Service Programming Language (SPL) that is used to define and specify network services on the PromethOS NP router platform. The service model provides the concepts to deploy new network services composed of distributed service components on code-extensible routers and to extend previously deployed services with additional functionality. The SPL supplies the service programming interface of the router platform for the installation and basic configuration of new network services.

We structure the remainder of this paper as follows. In Sec. 2, we review related work in the area of service models and specifications. Then, in Sec. 3, we introduce our service model and present our Service Programming Language (SPL) with the definition of the relevant key productions. For proof of concept, we evaluate the SPL by its application to an exemplary service program in Sec. 4. The imaginary service program illustrates part of the capabilities and the flexibility of the SPL. In Sec. 5, this paper is completed by a summary and conclusion, followed by a brief outlook on further fields of application.
² NP vendors do not use a consistent naming scheme to refer to the code-extensible processors: the Intel IXP architecture refers to the first-level processors as microengines, while the IBM PowerNP identifies them as picoprocessors or core language processors. Second-level processors are named differently as well. For this reason, we refer to the first level of processing engines as packet processors and to those of the second level as control processors.
2 Related Work

Network service creation on active router platforms and deployment of services within the network have been a research area for quite a while. Research has been carried out at various levels of abstraction. We restrict our review of related work to four different projects in the area of service specification on active network nodes: Click [9] and NetScript [4] due to their service models and specification languages, Chameleon [2] due to its service model and process of service creation, and CORBA [11] because of its component model.

2.1 Click

The Click modular router [9] approach defines two code execution environments (EEs) on Linux: one is the in-kernel EE and the other is a Linux user space EE. Both support service creation according to a service specification of interconnected Click elements. Click elements provide the functions of network services. Network services are defined by the specification of their inter-connection. Click uses so-called compound elements that allow for user-specified service class definitions. A compound element consists of simple elements that specify the functions. The Click specification language defines the elements and their inter-connection. Sub-classes of elements can easily be extended with own functionality, since new elements can be specified to create new functions in a C++-like style. The functions are then statically linked into the Linux kernel. Both the in-kernel and the user space EE accept a Click service specification, resolve dependencies and create new network services. While Click defines arbitrary service graphs by its specification language, it lacks the expressiveness to specify resource limits. Moreover, Click cannot extend previously deployed network services, owing to the architectural limitations of the Click EEs.
2.2 NetScript

NetScript [4] defines a framework for service composition in active networks that is programmed by three domain-specific languages: 1.) the dataflow composition language, 2.) the packet declaration language, and 3.) the rule-based packet classification language. The first defines a method to specify data path services as a composition of interconnected service components called boxes. The second defines the packet structure of network protocols, and the third defines the packet classification rules that are installed in the NetScript kernel. Boxes in NetScript provide a container for code or hardware-based service components, or for other boxes in a recursive manner. For our vision of service composition on a high-performance router, the first language, the dataflow composition language, is relevant. As a linear XML [16] specification of subsequent and interconnected data path boxes that may be code or hardware elements, the NetScript dataflow composition language provides an interesting approach to our problem. However, it lacks the ability to define control relations among control service components that control other service components, as well as
signalling conditions among subsequent service components. Moreover, it cannot extend previously deployed network services, and it does not provide the expressiveness to specify resource and placement constraints of components.

2.3 Chameleon

Chameleon [2] is a node-level service creation and deployment framework which provides an XML-based service description specifying a network service in an abstract way. The description is based on a recursive service model with containers, so-called abstract service components (ASCs). The ASCs group the functional entities and describe dependencies. In Chameleon, service descriptions define network services as a composition of ASCs. A node-local service creation engine (SCE) resolves the service description according to the local capabilities of the node into implementation service components and creates a tree-like representation of them. These components are then deployed on the node with the help of a node configurator that provides the required interface towards the SCE to manage and control the platform. Service components in Chameleon are modelled as functional entities supporting two different types of interfaces with push and pull call semantics for control and data path communication. Depending on the underlying NodeOS [10], Chameleon supports the interconnection of different EEs. In its current implementation, Chameleon makes use of a Click Linux kernel EE and a proprietary Java-based EE. Chameleon focuses on the deployment of network services onto different network nodes. Thus, it provides the mechanisms and architecture to cope with a priori unknown network nodes. PromethOS NP, however, defines a specific architecture of a powerful router platform.
Hence, for the modelling of network services, we need a service model that meets our needs and provides the capabilities to define the service infrastructure for heterogeneous NP-based network nodes, and that thus resides at a different level of service modelling.

2.4 CORBA

CORBA [11] has defined the Common Object Request Broker Architecture to interconnect various, heterogeneous, distributed components by the mechanisms of the object request broker (ORB). The CORBA component model (CCM) [12] defines a component as a meta-type with the respective encapsulated function. For component description and interface specification, the Interface Definition Language (IDL) is used. The CCM provides four different component interfaces named facets, receptacles, event sources and event sinks. Facets are named interfaces for client interaction, receptacles are connection points, event sources are points that emit events to one or more interested event consumers, and event sinks are the corresponding event targets. By the mechanisms of stubs and skeletons, a client-server architecture for distributed components is specified. The ORB provides the communication among distributed components in a way transparent to the creator of the CORBA service. Due to its level of abstraction, however, CORBA suffers from too much overhead for an efficient router platform.
3 Network Services

After the review of four specific projects of related work, we introduce in this section our service model and then present its Service Programming Language (SPL) that is used to export the concepts of the service model.

3.1 Service Model

The goal of our service model is the modelling of a flexible service infrastructure that provides the mechanisms needed for the seamless integration of new service elements. Services are modelled as a graph of edges and vertices, with edges representing chains of service components and vertices defining the interconnection between them. The definition of network services is based on six constituent concepts: data path service components, control service components, service chains, guards, hooks, and name spaces that identify the service on an extensible router.
Fig. 1. PromethOS NP Service Component: (a) Data Path Service Component, (b) Control Service Component
In Fig. 1, the models of a data path and a control service component are visualized; we refer to both by the term service component if no specific distinction is required. The service component defines a function according to the plugin model [5], but extends the interfaces provided. In addition to the data in- and output ports, our service component defines Service Control Bus (SCB) in- and output ports, and component control in- and output interfaces (CCIs). By the data in- and output ports, network traffic is received and sent out. The SCB serves for the propagation of service-internal signals between subsequent service components. The semantics of the signals on the SCB are service specific, except for three signals (ACCEPT, ABORT, CHAINEND) that cause the service infrastructure to accept a packet, abort the current service processing, or signal the end of the service chain. The SCB interfaces allow for multiple read operations but only for a single write operation of the signal. Optionally, a service component exports control in- and output ports. CCIs provide the control interfaces to configure and retrieve control information of a service component at run-time. Service components are defined for two different purposes: they provide data path service functionality (cf. Fig. 1(a)) or they provide service-internal control functionality (see Fig. 1(b)). Control service components are either separate control components, or they are inserted into the service path of data path service components, too. Control service components may be periodically triggered by timed events, providing thus the
required flexibility of control functionality. They offer the same interfaces but, in addition, export two controller interfaces (Ctrl_in and Ctrl_out in Fig. 1(b)) that define a multiplexing semantic to control multiple other service components, provided the control service component implements the required functionality. Both types of service components have specific resource requirements and characteristics. Resource requirements specify the amount of resources they need for their instantiation and their processing of network traffic, while resource characteristics identify the type of resources needed. For example, different memory types exist on an NP-blade of which a service component consumes a specific amount or, as another example, different instruction set architectures (ISAs) are available on an NP depending on the processor cores implemented.

Service chains provide an aggregation of one or more service components that are strongly linked. A chain of strongly linked service components refers to the fact that only signals along the SCB are propagated between service components, and between service components and the service infrastructure. No demultiplexing of network traffic takes place between the elements of a service chain, allowing for fast pipeline-style processing by subsequent service components.

Guards provide the demultiplexing functions that control the acceptance of network traffic to enter service chains. Their definition has been inspired by the concept of guarded commands [6]. In our service model, guards are represented by service components that signal the acceptance or rejection of network traffic by the mechanisms of their SCB output port. Hence, they are the first service components of a service chain.

Hooks are key elements of the respective name space. Within a name space, they are identified by their label. They are created in the service program on demand. At creation time, the dispatching semantics are specified.
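To make the interplay of guards, strongly linked components and SCB signals concrete, the following Python sketch models a service chain as a pipeline. The class and function names are our own illustration and assumptions, not the PromethOS NP API.

```python
from enum import Enum, auto

class Signal(Enum):
    """The three platform-defined SCB signals; all further signals are service specific."""
    ACCEPT = auto()    # accept the packet for (further) processing
    ABORT = auto()     # abort the current service processing
    CHAINEND = auto()  # signal the end of the service chain

def run_chain(packet, components):
    """Pipeline a packet through a service chain.

    components: ordered callables, each returning (packet, Signal); the first
    component acts as the guard that accepts or rejects the traffic.
    Returns the processed packet, or None if the chain rejected or aborted it.
    """
    for component in components:
        packet, signal = component(packet)
        if signal is Signal.ABORT:
            return None               # guard rejection or service abort
        if signal is Signal.CHAINEND:
            break                     # end of the chain reached
    return packet
```

A guard is simply the first component of the chain: it emits its accept or abort decision on the SCB output, and no demultiplexing happens between the remaining, strongly linked components.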
If ingress hooks are created, they must be bound to a network interface. Otherwise, they must refer to previously created ones. Egress hooks may be dangling if required, thus implying the discard of arriving packets. The purpose of dangling outbound links is the provisioning of a hook for later service additions to extend the provided functionality. Moreover, hooks serve for the embedding of service chains: they initiate and terminate a service chain, and multiple service chains may be attached to a hook. Hooks provide the dispatching of network traffic to service chains, while guards steer the demultiplexing of network traffic per service chain. Dispatching semantics have been defined by two different methods, to which we refer by the terms copy and first-match-first-consume, respectively. The dispatching semantics are important for the specification of network services since it is a service design decision how network traffic is processed by different service chains.

Fig. 2. Hooks, Guards and Service Chains

We explain the difference between the two dispatching methods with the help of Fig. 2. In this figure, five service chains, enumerated from 1 to 5, are embedded between two
hooks labelled hook_in and hook_out. The order of service chains is defined by the service program created by the means of our SPL that is presented next. In case of the copy method, the initiating hook dispatches network traffic to all five service chains, creating copies of the packets on acceptance by the guards. On the other hand, if the first-match-first-consume method has been specified, packets are presented to the guards in the order of service chain specification. Upon acceptance of a packet, the processing at hook_in is finished. For both methods, packets are discarded if no guard accepts a packet.

Name spaces are abstract constructions of our service model that are used to avoid name collisions between services. Name collisions would occur, for example, if hooks were labelled identically for different services and then reused for extending a previously deployed service with service additions.
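The two dispatching semantics can be sketched as follows; the helper below is our own illustration, not platform code, with a hook modelled as an ordered list of guarded chains.

```python
import copy

def dispatch(packet, chains, method="first-match-first-consume"):
    """Dispatch a packet arriving at a hook to its attached service chains.

    chains: ordered list of (guard, chain) pairs, in the order of service
    chain specification; guard(packet) -> bool models the guard's accept
    signal. Returns the list of chain results (empty list: packet discarded).
    """
    results = []
    for guard, chain in chains:
        if guard(packet):
            if method == "copy":
                # every accepting guard receives its own copy of the packet
                results.append(chain(copy.deepcopy(packet)))
            else:
                # first-match-first-consume: first accepting chain consumes it
                return [chain(packet)]
    return results
```

With copy, every accepting chain processes its own copy; with first-match-first-consume, the first accepting guard ends the dispatching at the hook, and a packet no guard accepts is discarded.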
Fig. 3. Control and Data Path Relations Among Service Components
In Fig. 3, a service graph is presented that consists of four service components named F1, F2, F3 and Fc embedded between four hooks, as well as of a guard labelled G that controls the packet acceptance for its service chain. It illustrates the data path and control relations between service components, with Fc controlling F2. In Fig. 3, this controlling functionality is indicated by the letter C. Moreover, the figure visualizes the SCB that accompanies service chains.

3.2 The Service Programming Language

The SPL specifies a network service for the service infrastructure of the PromethOS NP router platform. It defines the Service Programming Interface (SPI) exported by the router platform for the creation and extension of new network services. EBNF 1 presents the key elements³ of the definition of the PromethOS NP Service Programming Language (SPL). The language definition is based on a modified form of the Extended Backus Naur Form (EBNF) [17] that deviates from Wirth's definition regarding the repetition-operator of elements denoted by braces ({..}). According to [17], the repetition-operator contains zero or more elements. For our purposes, we redefined the repetition-operator to produce one or more elements since
³ Self-explanatory productions like, for example, BW, CYCLES or RAM are not included due to space constraints. Note that we refer to the key = value pair by the term production.
ID          = "#" VALID_NAME .
TIMED       = "timed=" DELAY .
BW_RES      = "bwmin=" BW "bwmax=" BW [ "pps=" NUMBER ] .
CPU_RES     = "cpumin=" CPU "cpumax=" CPU .
RAM_RES     = "type=" ID "rammin=" RAM "rammax=" RAM .
PROC_TYPE   = ( "ia32" | "ia64" | "np4" | "np4_pp" | "ixp2400" | "ixp2400_pp" | ... ) .
CTRL_INFO   = ( STRING | "file=" VALID_NAME ) .
COMP_SPEC   = ( "src" [ ID ] | "bin" ( PROC_TYPE | ID ) ) [ "|" CPU_RES ] [ { "|" RAM_RES } ] .
COMP_IDENT  = ( [ "(" COMP_SPEC ")" ] VALID_NAME ID | ID ) .
SERV_COMP   = COMP_IDENT [ ":" ID ] "(" [ CTRL_INFO ] ")" .
CTRL_COMP   = [ TIMED ] SERV_COMP { "!" ID "@" NUMBER } .
CTRL_CHAIN  = "{" { CTRL_COMP } "}" .
COMP_STRING = "{" { SERV_COMP } "}" .
GUARD       = "[" [ "|" BW_RES ] [ SERV_COMP ] "]" .
HOOK_IN     = ( ID | ">" ID [ "copy" ] "?" INTF ) .
HOOK_OUT    = ( ID | ">" ID [ "copy" ] [ "?" INTF ] ) .
SERV_CHAIN  = HOOK_IN "@" [ TIMED ] [ GUARD ] COMP_STRING "@" HOOK_OUT .
SERVICE     = "{" ID [ "!" CTRL_CHAIN ] { SERV_CHAIN } "}" .

EBNF 1. The PromethOS NP Service Programming Language
the optionality-operator is already defined by pairs of brackets ([..]). Thus, the semantics of the original zero-to-many repetition-operator is expressed as [{..}] by our EBNF variant. The fundamental concept of the SPL is the linear specification of arbitrary service graphs consisting of service and control chains. Based on the concept of hooks to which service chains are attached, graphs are created out of the linear specification. The key element of the SPL is the service component, specified by the SERV_COMP production. It starts with the component identifier COMP_IDENT. Part of the component identifier is the specification of the resources (COMP_SPEC) required for its instantiation and the data format of the component. If it is specified as a reference to a source code file (src), the platform assumes a component for the PromethOS NP processing environment for GPPs [13], and creates the respective binary component. Otherwise, in case of a binary component specification (bin), the SPL demands the definition of the processor core type. This specification is relevant since different, incompatible ISAs may be available on a node. In both cases, the processor core can be specified (ID) on which the service component must be installed. This ID identifies a particular core per processor, and is required, for example, if not all processor cores are able to access particular hardware accelerators. The service component is then identified by the name of an object followed by its component instance identifier (ID). In case a service component instance is reused, the ID of a previously created instance is defined in the string of components. The router platform provides three pseudo components named NIL, DROP and CLASSIFY that exploit respective platform-internal capabilities. Conceptually, they provide the same interfaces as other service components, and their instances are identified by the same methods. Service components export CCIs optionally.
In the SPL, they are specified by the ":ID" term. Control information (CTRL_INFO) to initialize a service component at service configuration time
is specified then. It represents either a string of ASCII⁴ characters or a reference to an arbitrary object. Control service components (CTRL_COMP) are service components that may be triggered by a timer event (TIMED), and that are bound to the control interfaces of other service components by the !ID@NUMBER statement. There, an ID references the control port exported by another service component, and NUMBER provides the control port multiplexing functionality needed to bind controlled service components to specific control mechanisms of a control component. Guards are defined by the respective GUARD production according to the model introduced above. Please note that the bandwidth limits and the maximal number of packets per second (BW_RES) are specified as part of the guard production, since the dispatching function of hooks needs to control these limits already for the packet dispatching to guards, so as to separate control from service functionality. Hooks are specified by their respective productions (HOOK_IN and HOOK_OUT). Reuse of a hook is specified by the notion of a previously created hook identifier (ID). The creation of hooks is initiated by the literal ">", followed by the hook identifier (ID), the optional specification of the "copy" method for the hook's dispatching semantics, and the binding of a hook to an interface ("?INTF"). In case no "copy" method is specified at hook creation time, the dispatching semantics follow the first-match-first-consume method. Note that the definition of the dispatching semantics for outbound hooks (HOOK_OUT) is needed since they may potentially be reused for further service chains. Ingress hooks bound to network interfaces receive packets from the router platform following the copy method, i.e., all hooks bound to a network interface receive every incoming network packet.
Analogous to the service components, the router platform provides a pseudo hook named NIL that is used to satisfy the SPL syntax for dangling hooks that are never extended, or for service chains that do not receive but only generate data. Service chains are then specified by the SERV_CHAIN production that provides the aforementioned semantics of the service chain concept. Note the optional definition of a maximal delay (TIMED) the service chain is allowed to add to a packet processed by the service chain. The optional definition of the guard production allows for the specification of catch-all service chains as required for fall-back service paths if no previously defined guard accepts a packet. The service (SERVICE) is identified by its service identifier (ID) that specifies the service name space. Optionally, a service consists of a control chain (CTRL_CHAIN) that contains the control service components, followed by the definition of the constituent service chains for data path packet processing.
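To illustrate how an SPL front end might consume these productions, the following sketch parses the HOOK_IN production. The concrete token shapes for VALID_NAME and INTF are our assumptions, since the corresponding self-explanatory productions are omitted in EBNF 1.

```python
import re

# One regex alternative per lexeme; '#name' (ID) and '?NIF1' (interface
# binding) shapes are assumptions for illustration only.
TOKENS = re.compile(r">|#\w+|copy|\?\w+")

def parse_hook_in(text):
    """Recursive-descent style sketch of HOOK_IN = ( ID | ">" ID [ "copy" ] "?" INTF )."""
    toks = TOKENS.findall(text)
    if toks and toks[0] == ">":                      # hook creation
        if len(toks) < 3 or not toks[1].startswith("#") or not toks[-1].startswith("?"):
            raise SyntaxError('expected > ID [ "copy" ] ? INTF')
        return {"create": True, "id": toks[1],
                "copy": "copy" in toks, "interface": toks[-1][1:]}
    if len(toks) == 1 and toks[0].startswith("#"):   # reuse of an existing hook
        return {"create": False, "id": toks[0]}
    raise SyntaxError("not a HOOK_IN production")
```

The sketch mirrors the two alternatives of the production: reuse of an existing hook by its identifier, or creation of a new ingress hook that must be bound to an interface.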
4 Evaluation

For proof of concept of our SPL, we illustrate its capabilities by a service program and its corresponding visualization hereafter.

⁴ ASCII – American Standard Code for Information Interchange, as defined by ISO/IEC standard 646.
Table 1. Three Parallel Service Chains

The visualization column of the table shows the service graph: network interface NIF1 feeds hook1, which dispatches to three parallel service chains of one component each, all leading into hook2, which is bound to NIF2. The corresponding service program reads:

{ #threeparallel
  /* Chain 1 */
  >#hook1 ?NIF1 @ /* HOOK */
  [ /* DEMUX1 */ ]
  { /* COMP_STRING */ (bin ia32) component1 #instance1ID ( /* CTRL_INFO */ ) }
  @ /* HOOK */ >#hook2 ?NIF2

  /* Chain 2: extend hook1 */
  #hook1 @ /* HOOK */
  [ /* DEMUX2 */ ]
  { /* COMP_STRING */ (bin ia32) component2 #instance2ID ( /* CTRL_INFO */ ) }
  @ /* HOOK */ #hook2

  /* Chain 3: extend hook1 */
  #hook1 @ /* HOOK */
  [ /* DEMUX3 */ ]
  { /* COMP_STRING */ (bin ia32) component3 #instance3ID ( /* CTRL_INFO */ ) }
  @ /* HOOK */ #hook2
} /* Service End */
Three Parallel Service Chains. Table 1 presents a simple exemplary service program that defines a network service with three parallel service chains. The service program illustrates the linear specification of a service graph with parallel service chains. The service identifier (#threeparallel) is followed by the creation of hook1. No copy method is specified. Hence, its packet dispatching semantics follow the first-match-first-consume method in the top-down order of the specified service chains. Hook1 is bound to one network interface (NIF) that is symbolized by the term NIF1. The service chain that consists of component1 is attached to hook1 first. While the figure in Table 1 illustrates the demultiplexing of flows to the particular service chains by attaching abstract demux conditions to the links between hook1 and the respective service chains, no real demultiplexing is specified in the service program. However, demultiplexing conditions are indicated in the service program by the respective comments. All service chains lead into hook2, which is bound to the second NIF (NIF2). The second and third service chains follow the same principle. Their specification differs from the first service chain in that hooks are re-used, i.e., the newly defined service chains are attached to the existing hooks.
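The hook re-use that drives this linear graph construction can be sketched as follows; the data structures are hypothetical, not the PromethOS NP implementation.

```python
class Hook:
    """A labelled attachment point; chains dispatch in order of attachment."""
    def __init__(self, label, interface=None):
        self.label, self.interface = label, interface
        self.chains = []   # first-match-first-consume order (no copy method)

def build_threeparallel():
    """Linearly build the #threeparallel graph of Table 1, reusing hooks."""
    hooks = {}                                   # per-service name space
    def get_hook(label, interface=None):         # create on first use, reuse after
        if label not in hooks:
            hooks[label] = Hook(label, interface)
        return hooks[label]
    for comp in ("component1", "component2", "component3"):
        h_in = get_hook("#hook1", "NIF1")
        h_out = get_hook("#hook2", "NIF2")
        h_in.chains.append((comp, h_out))        # attach the service chain
    return hooks
```

After the first chain creates both hooks, the second and third chains find them in the name space and merely attach themselves, yielding the parallel graph of Table 1.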
5 Summary and Conclusions

In this paper, we have introduced the PromethOS NP service model and presented its Service Programming Language (SPL) that is used to specify network services. The SPL provides, hence, the Service Programming Interface (SPI) of PromethOS NP to create new network services and to define additions to previously deployed ones on extensible routers. The service model provides the concept of a name space that is used to create the environment for network services, of which multiple may reside in parallel on an extensible router platform like PromethOS NP. Within a name space, services are defined as a graph of service chains with constituent service components for data path processing. They are controlled by the service control chain, realizing distributed, service-internal control relations. Service chains are embedded between pairs of hooks. Hooks provide
the dispatching functionality of network traffic to service chains that accept packets depending on their guards. Hooks are dynamically created within a service and serve from thereon as the reference points for service additions to extend previously deployed network services. The SPL has been proposed as a context-free service programming language of our service model. Its syntax has been defined in a modified EBNF notation, and the semantics of the important productions have been introduced extensively. For a proof of concept, we have applied our SPL to define an exemplary service program that illustrates the fundamental concept of a linear specification of arbitrary service graphs and their internal data path and control communications. We are convinced that our service model and the SPL provide a suitable way to specify distributed network services for service extensible routers. The model contributes to research by three novelties: 1.) flexible service extensibility based on hooks that are dynamically created, 2.) the 1:n bi-directional control relation between a control component and multiple controlled service components, and 3.) the service control bus that propagates signals between subsequent data path service components. The SPL proposes a concise method to specify network services that are based on our service model. Our SPL extends previous work by the concept of resource constraints assigned to service chains. The definition of the pseudo component NIL provides the methods to define syntactically correct service programs with cut-through channels, the DROP element supports explicit packet dropping, and the CLASSIFY component is used to exploit platform-internal classification mechanisms like hardware-supported packet classification engines.
Moreover, the CLASSIFY component, together with the instance re-use method, provides the capability to exploit mechanisms of advanced network processors, in which multiple disjoint rules that all lead to the same service component may be compiled into an advanced matrix-based packet classification. Based on our service model, the service infrastructure of the PromethOS NP router platform has been designed and implemented. The SPL is currently used as the SPI of the PromethOS NP router platform for service creation and extension. However, we are convinced that our service model with its SPL provides concepts applicable in a broader scope than node-local network service creation alone. As an example, we envision their use in other distributed component-based data processing applications, such as staged image processing, that need service-internal data and control relations.
A Network-Based Response Framework and Implementation

Marcus Tylutki and Karl Levitt

University of California, Davis, CA 95616, USA
{tylutki,levitt}@cs.ucdavis.edu
Abstract. As the number of network-based attacks increases and system administrators become overwhelmed with Intrusion Detection System (IDS) alerts, systems that respond to these attacks are rapidly becoming a key area of research. Current response solutions are either localized to individual hosts, or focus on a refined set of possible attacks or resources, emulating many features of low-level IDS sensors. In this paper, we describe a modular network-based response framework that can incorporate existing response solutions and IDS sensors. The framework combines these components by uniting models that represent: events that affect the state of the system, the detection capabilities of sensors, the response capabilities of response agents, and the conditions that represent system policy. Linking these models provides a foundation for generating responses that best satisfy policy, given the perceived system state and the capabilities of sensors and response agents.

Keywords: Autonomic response, response modeling, response framework.
1 Introduction
The first intrusion detection systems were developed as low-level sensors that detected attacks by applying attack signatures to low-level event logs [1,2,3]. Since these sensors lacked the context of a high-level system policy, correlation-based intrusion detection systems were developed, which allowed a broader context for interpreting the higher-level effect of an observed event. Similarly, current response systems operate on a relatively small scope of possible responses and of state assessment with respect to policy. As an example, the Intrusion Detection and Isolation Protocol [7] uses a simple cost model for each link and modifies firewall rules to isolate infected hosts from the rest of the network. The Light Autonomic Defense System (LADS) [8] is an effective host-based solution, but does not incorporate a system-wide policy or system-wide responses. CIRCADIA [9] is a network-wide solution that uses a simple cost model with a table lookup to determine appropriate responses. Toth and Kruegel [10] present a useful dependency-based response model that can create and modify firewall rules, kill and restart processes on individual hosts, and reset a user profile to a predetermined template. However, their attack domain is mainly limited to resource management problems, and they do not present a model capable of incorporating other response systems. In addition, these response systems assume that the sensors providing their alerts are infallible. Some response agents, such as Honeynet [14,13] and the Deception Toolkit [11,12], are extremely useful response agents for a larger response model. These response agents can be refined with information from higher-level events. If a high-level IDS predicts an attacker's goals or future targets, this information could be used to reconfigure these agents to better deceive the attacker. Other response agents, such as DDoS mitigation systems [4,5,6], can also be used and configured based on their requirements and expected performance. This paper presents the Autonomic Response Model (ARM), an expressive model that unites sensor capabilities, response agent capabilities, attack and state related events, and system policy. This allows the model to generate a policy-based optimal response. An implementation and framework based on this model are discussed in detail, along with some experimental results.

This work was sponsored by NSF grant ITR-0313411.

D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 65–82, 2009. © IFIP International Federation for Information Processing 2009
2 Autonomic Response Model
The decision agent for ARM receives alerts from intrusion detection systems (i.e., sensors), in addition to agents that report policy changes and model-based changes. Any resulting response set is submitted to the corresponding response agents. The decision agent also recalculates the optimal sensor configuration with respect to policy. If this differs from the previous global sensor configuration, sensor configuration updates are submitted to the appropriate sensors.

2.1 Basic Components
This model uses several components as building blocks. Event classes contain attribute/value pairs that are used to describe attack events, policy events, and state events. Each event class also has a predetermined set of policy-derived detection constraints that must be satisfied by the current sensor configurations. Event instances are instances of event classes and describe an aspect of the current perceived state of the system. Alerts from sensors are translated into event instances. Each event instance has an associated false positive probability (FPP) that represents the probability that the event the instance represents does not exist. This probability is determined directly by the sensor configuration that reported it, or by the highest FPP among the event instance's prerequisite event instances. Rules describe the relationships between event classes. If event instances of the prerequisite event classes exist that satisfy all of the preconditions of a rule, postrequisite event instances are generated. These postrequisite event instances are initialized from postrequisite conditions associated with the rule, which refer to attributes of the prerequisite event instances or to rule-specified constant values. In addition, prerequisite event instances may be modified by a postcondition.
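The building blocks above (event classes, event instances with inherited FPPs, and rules that generate postrequisite instances) can be sketched in Python. This is an illustrative simplification, not the paper's implementation: the class names are invented, and the matcher picks only the first candidate instance per prerequisite class rather than searching all combinations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EventClass:
    name: str

@dataclass
class EventInstance:
    cls: EventClass
    attrs: dict
    fpp: float  # false positive probability of this instance

@dataclass
class Rule:
    prereqs: list           # required EventClass objects
    precondition: callable  # predicate over the matched instances
    make_post: callable     # builds the postrequisite instance

def fire(rule, instances):
    """If instances of all prerequisite classes satisfy the
    precondition, generate the postrequisite event instance.
    The new instance inherits the highest FPP of its
    prerequisites, as described in the text."""
    matched = []
    for cls in rule.prereqs:
        cands = [i for i in instances if i.cls == cls]
        if not cands:
            return None
        matched.append(cands[0])  # simplification: first candidate only
    if not rule.precondition(matched):
        return None
    post = rule.make_post(matched)
    post.fpp = max(i.fpp for i in matched)  # FPP inheritance
    return post
```

A rule correlating a scan and an exploit against the same target into a hypothetical "HostCompromise" instance would then fire only when both prerequisites are present and agree on the target.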
Sensor configurations are represented by the detection thresholds for each event class the sensor detects. A detection threshold is represented as a threshold for the false positive probability (FPP), the false negative probability (FNP), and the timeliness (T). FPP represents the probability that an alert produced by the sensor configuration is based on an event that does not exist. FNP represents the probability that an event that should have been reported as an alert was not reported. T represents the estimated time the sensor configuration takes to report the alert from the moment the event takes place. This information can be obtained from receiver operating characteristic (ROC) curves for each event class and sensor pair¹. Many attack modeling languages can be used for these components [15,16,17,18]. In addition, to translate sensor alerts into event instances, a common report language adapted from CIDF [19] or IDMEF [20] can be used, despite the difficulties these formats have encountered in gaining widespread acceptance.
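Checking whether a sensor configuration meets the policy-derived detection constraints of an event class reduces to comparing the (FPP, FNP, T) values it achieves against the required thresholds. A minimal sketch, with invented names; in practice the achieved values would be read off the sensor's ROC curve for that event class:

```python
from dataclasses import dataclass

@dataclass
class DetectionThreshold:
    fpp: float  # max acceptable false positive probability
    fnp: float  # max acceptable false negative probability
    t: float    # max acceptable reporting delay (seconds)

@dataclass
class SensorConfig:
    # event class name -> (FPP, FNP, T) this configuration achieves
    detects: dict

def satisfies(config, event_class, required):
    """True if this sensor configuration meets the policy-derived
    detection thresholds for the given event class."""
    achieved = config.detects.get(event_class)
    if achieved is None:
        return False  # this configuration does not detect the class at all
    return (achieved.fpp <= required.fpp and
            achieved.fnp <= required.fnp and
            achieved.t <= required.t)
```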
2.2 Prevention and Recovery Response Model
Alerts are translated into event instances and are processed one at a time by the decision engine. System policy is represented using event classes, rules, and event instances that represent key aspects of system state. If an event instance is created that belongs to an event class labeled as a policy violation, the decision engine searches for an optimal response set to handle the problem. This response set recovers from the effects of the policy violation and prevents attempts from an identical attack vector from resulting in another policy violation. Each policy violation event class also has a field that determines the acceptable FPP threshold that the corresponding event instance's FPP must remain below to be considered a valid policy violation. If no policy violation event instance (PVEI) exists, or if the FPPs of existing PVEIs surpass their corresponding thresholds, the decision engine waits for more alerts.

Finding Solutions. Each PVEI has at least one prerequisite event instance that matched a rule to create it, even if it was generated from a one-to-one rule matching. Each prerequisite may have additional prerequisites that contributed to the generation of the PVEI. This yields a tree of event instances that resulted in the generation of the PVEI. A satisfying response solution set is able to recover all event instances on a path within this tree, as well as to prevent at least one of these event instances from reverting to the state that resulted in the generation of the PVEI. Recovery responses are therefore highly relative to the context of the event instances they recover within the policy violation tree. Recovery responses are designed to break the conditions of the rule in which the corresponding event instance was used to generate the PVEI. Prevention responses are designed to ensure that newly changed values do not easily revert to their previous values. Prevention responses on event instances at leaf nodes of the policy violation tree, which are not translated from an attack alert², are resistant to attacker influence from previous attack vectors. If a prevention response is unavailable due to the lack of information on the origin of an event instance, backchaining can be applied. Suppose a Tripwire [21] sensor reports that a filesystem has been compromised, but no other alerts can identify the service that resulted in the compromised filesystem. Backchaining obtains all services with access to the compromised filesystem that may be at fault. If prevention responses are initiated for all of these services, the effect is the same as initiating a prevention response for the compromised filesystem event instance. Similarly, an anomaly-based sensor may not be able to pinpoint the origin of an event instance as well as a signature-based sensor can. Alerts from a signature-based sensor typically identify the specific vulnerability that the attack attempted to exploit. By comparison, an anomaly-based sensor may only be able to report generic attack behavior from a particular source against a particular host and/or service. Backchaining can then be used to cover all possible attack behaviors against the targeted service.

Evaluating Solutions. The response set for a particular path represents the response event classes that are associated with rules or other event classes. Each generic response event class can be initialized into a response event instance based on the values of the corresponding event instance to which it is responding, or based on the prerequisite event instances and prerequisite conditions of the corresponding rule matching within the policy violation tree.

¹ It is acknowledged that most current ROC curves for sensors describe their average overall detection capabilities, rather than being associated with specific alert types or categories. It is also acknowledged that these probabilities and values are highly dependent upon the environment in which they are recorded.
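The backchaining step described above (respond to every service that could have produced an event whose origin is unknown) can be sketched as a lookup over a service-to-filesystem access map. The map contents and function name here are invented for illustration:

```python
def backchain(event_target, access_map):
    """Given an event whose origin is unknown (e.g. a compromised
    filesystem reported by an integrity checker), return all services
    that could have produced it, so prevention responses can be
    applied to each of them instead of to the unknown origin.

    access_map: service name -> set of filesystems it can write to."""
    return sorted(svc for svc, fs_set in access_map.items()
                  if event_target in fs_set)
```

Initiating prevention responses for every returned service has the same net effect as a prevention response on the compromised-filesystem event instance itself.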
Within a specific path in the policy violation tree, different response sets are tested by temporarily adding the corresponding response event instances. After all combinations of a path have been tested, new paths in the tree are tested. The response set that produces the best state assessment when tested is the one that is initiated. A simple metric for assessing the state is the sum of all state assessment values (SAVs) of the current event instances. Rather than maintaining a SAV for every possible state, this approach assigns a SAV to each event instance, corresponding to that event instance's influence on the assessment of the overall state of the system. Event instances that are critical with respect to policy, such as critical services, have positive SAVs. Event instances that represent penalties to the system policy and impact the availability or integrity of the system have negative SAVs. Each event instance has the SAV associated with the event class of which it is a member. Rules and policies can define exceptions that modify or override these default values. This assessment method can be enhanced through the addition of assessment rules. These rules modify the overall SAV of a state based on the presence or absence of particular event instances, and can be synergistic (i.e., two event instances result in a net SAV greater than their sum) or dyssynergistic (i.e., two event instances result in a net SAV less than their sum). Cost models based on risk analysis [22] can also be adapted to determine these values. Once a global optimal response set is found, the response event instances are added to the decision agent's current state, and each response event instance is submitted to the appropriate response agent. Each response agent that receives a response makes the appropriate changes to its local configuration.

² Alerts from system state sensors, such as sensors that scan available services for their current versions, are acceptable.
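The SAV-based state assessment with synergistic or dyssynergistic assessment rules can be sketched as below. All instance names and adjustment values are invented; this is a toy version of the metric, not the paper's implementation:

```python
def assess_state(instances, assessment_rules=()):
    """Sum the state assessment values (SAVs) of all current event
    instances, then let assessment rules adjust the total based on
    the joint presence of particular instances (synergistic when the
    adjustment is positive, dyssynergistic when negative).

    instances: iterable of (instance name, SAV)
    assessment_rules: iterable of (required name set, adjustment)"""
    total = sum(sav for _, sav in instances)
    present = {name for name, _ in instances}
    for required, adjustment in assessment_rules:
        if required <= present:  # all named instances are present
            total += adjustment
    return total
```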
2.3 Sensor Retargeting
Sensors can become reconfigured based on policy changes. As the critical tasks of a system change, so should its policy. Some of these critical tasks may be time-dependent, short-term tasks, while others are longer-term. These tasks are represented in the policy model and correspond to individual event instances with corresponding SAVs. In addition to their use in assessing the state of the system, SAVs can be used to prioritize the detection of particular event classes. The SAV of an event class can be used to directly determine its allowable detection thresholds. In addition, sensors have resource costs. If costs were ignored for integrity scanners such as Tripwire, the constant operation of these scanners would likely impact the performance of the scanned hosts. Instead, the traditional tradeoff between performance and stability (or, in this case, security) is acknowledged, allowing these scanners to run only periodically. The balance of this tradeoff can be shifted depending on the SAV of the event class that the sensor attempts to detect. A higher-SAV event class is more critical, and therefore results in lower detection thresholds.

S_i = \sum_{r=1}^{n} \left( RL_r - R_k(i) \right) \qquad (1)
The overall cost for a set of sensor configurations can also be assessed with a more complex load-balancing metric, as shown in Equation 1, where the solution set with the highest value of S_i is considered the most efficient with respect to policy. A similar metric can be adapted that measures the distance between the current detection capabilities of the sensor configurations and the event class detection thresholds. This alternative metric prefers better-detecting sensor configurations over resource-conserving ones. Sensors can also be preemptively retargeted. Suppose an event instance (EI_n) is generated from a rule matching of other event instances. If EI_n belongs to an event class with very low detection thresholds due to its SAV, then these detection thresholds are passed down to the prerequisite event classes of the rule that generated EI_n. The first step of this detection threshold propagation is to obtain γ(P, r), which represents how close the prerequisite event instances in P are to creating a successful match with rule r and is defined in Equation 2. Each event class possesses an α value, which represents the event class's relative importance
compared to other event classes for rule matchings³. γ is initialized with the sum of all α values of the prerequisite event classes, multiplied by a β factor⁴, which is an attribute of the matched rule. If a prerequisite event instance matches all preconditions of the rule, the entire α value of that prerequisite's event class is subtracted from γ. Partial matches subtract only α/2. These values are also weighted by the false positive probabilities of the prerequisite event instances, as shown in Equation 2.

\gamma(P, r) = \beta_r \sum_{i \in P} \alpha_i \;-\; \sum_{j \in Full(P,r)} \alpha_j (1.0 - FPP_j) \;-\; \sum_{k \in Partial(P,r)} \frac{\alpha_k}{2} (1.0 - FPP_k) \qquad (2)
New detection thresholds are propagated to the prerequisite event classes from each detection threshold of each postrequisite event class. The k-th new propagated FPP detection threshold (NDFPP_{i,k}) for prerequisite event class i is defined in Equation 3, where DFPP_{j,n} represents the n-th FPP detection threshold of postrequisite event class j. False negative probability thresholds and timeliness thresholds are propagated using the same equation, applied to their respective threshold values.

NDFPP_{i,k} = \left( 1.0 + \frac{\gamma(P, r)}{\alpha_i} \right) DFPP_{j,n} \qquad (3)
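Equations 2 and 3 can be computed as in the following sketch. The α values, β, and FPPs are invented example numbers; `full` and `partial` stand for the Full(P, r) and Partial(P, r) sets, mapping each matched event class to the FPP of its matching instance:

```python
def gamma(prereqs, beta, full, partial):
    """Equation 2: how close the prerequisite instances are to a
    successful match with rule r.
    prereqs: event class -> alpha value
    full/partial: fully/partially matched class -> instance FPP"""
    g = beta * sum(prereqs.values())
    for cls, fpp in full.items():
        g -= prereqs[cls] * (1.0 - fpp)          # full match: whole alpha
    for cls, fpp in partial.items():
        g -= (prereqs[cls] / 2.0) * (1.0 - fpp)  # partial match: alpha/2
    return g

def propagate_fpp(dfpp_post, g, alpha_i):
    """Equation 3: FPP detection threshold propagated to a
    prerequisite event class from a postrequisite threshold."""
    return (1.0 + g / alpha_i) * dfpp_post
```

Note that with β < 1, γ can become negative once enough prerequisites match, so the propagated threshold is stricter than the postrequisite threshold, consistent with the remark in footnote 4.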
2.4 Attacks and Countermeasures
Attacks against ARM exploit the expressiveness of event classes and the established relationships between them. Even with a perfect model of event classes and their relationships, attacks may exploit the time it takes for the model to respond to an event by flooding it with trivial decisions that take the decision agent a significant amount of time to resolve. In addition, poorly designed models may cause infinite loops for exhaustive decision agents. When control systems perpetually overestimate or underestimate the current or future state of the system, increasingly inaccurate responses can occur, resulting in an unsteady state. Control systems used in chemical and electrical engineering have adopted meta-control agents that observe the behavior and results of their control system to enhance its reliability, and have been able to prevent unsteady states from occurring within limited domains. Through the application of a similar meta-control agent to this model, many of the effects of these attacks can be mitigated.

³ As an alternative, α values could be tied to a specific prerequisite event class and rule pair, which would be more precise but would likely result in more α values to calibrate.
⁴ β is considered to be greater than 0. This can result in propagating stricter detection thresholds when β is less than 1, since γ can then be negative. Conversely, if β is always greater than 1, the propagated detection thresholds are always less strict.
3 Implementation
Despite the previously mentioned response systems, and a response testbed [23], no modular response framework and testbed is readily available. Our modular response framework was therefore developed and used in the Emulab [24] environment, and it is freely available upon request.
Fig. 1. Overall Response Framework
The implementation of this response framework, presented in Figure 1, comprises four types of agents. The host-based agent is responsible for: simulating the effects of host- and network-based intrusion detection sensors, executing responses received from the response agent, and storing local host-based vulnerability profiles and sensor configurations. The aggregation agent is responsible for aggregating reports received from all host-based agents and submitting the relevant new reports to the response agent. Reports that have a higher false positive probability than a previously received report are not forwarded to the response agent. The response agent receives all sensor reports from the aggregation agent. Each report is processed individually. If a new policy violation occurs, the response agent searches for an optimal response set. If one is found, the corresponding responses are submitted to all applicable host-based agents. In addition, new sensor configurations are evaluated to determine whether a new global sensor configuration better satisfies system policy. New sensor configurations are sent to the corresponding host-based agents. The controller agent is responsible for initializing all the other agents and for initiating any external attacks against the network.
3.1 Document Types
Messages passed between agents are in the form of XML documents. This subsection briefly describes the XML schemas used in the implementation. The event class schema contains the unique identifier of the event class, as well as all fields that are associated with the class. Fields can be of type integer, float, or string. This schema also supports detection thresholds for the event class, as well as the α value discussed in Subsection 2.3. The alert schema contains a unique identifier of the host and of the attack event that triggered it, as well as an FPP value giving the probability that the alert is a false positive. Alert schemas contain fields that represent specific values that differ from the defaults of the event class from which the alert is derived. In addition, alert schemas are used for passing response messages from the response agent to the host-based agents. The host profile schema contains information about a host, including its IP address and the services available on that IP address. This schema also records which filesystems each service has access to, as well as the version number of the service and its dependencies on other services. The IDS profile schema lists the identifiers of the event classes that the specific configuration detects and their corresponding detection values for FPP, FNP, and T. This sensor configuration schema also contains a generic resource cost for operating the sensor with this configuration, and it supports a NULL configuration for sensors that are disabled. The rule schema includes prerequisite event class identifiers, postrequisite event class identifiers, preconditions, and postconditions. Preconditions are represented by referencing the local rule identifier of the prerequisite event instance along with the name of the field that is being compared. For example, position 3 and field "Port" would refer to the port value of the rule's third prerequisite event instance.
Comparisons can be made to a constant value, or to a field value of another prerequisite event instance. The supported operators are equal, not equal, greater than or equal, less than or equal, greater than, and less than. Postconditions support only the equal operator for the initialization of postrequisite values. Postconditions can also be used to modify prerequisite event instance field values by referencing negative identifier values. In addition, the rule schema includes the β value of the rule, which is used for the preemptive sensor retargeting described in Subsection 2.3. The response map schema maps event classes to recovery and prevention response event class sets. It references the event class identifiers of each event class involved in the mapping, additional fields for the response event classes that specify further initialization information, and additional fields for the source event class that limit the applicability of the mapping. For example, a response map for an infected-filesystem event class could be restricted to a specific filesystem. In addition, the response map specifies whether the corresponding response sets are prevention response sets, recovery response sets, or both. The next two schemas were adapted from Joseph McAlerney's thesis [25], which presented a framework for simulating and recording worm behavior using
agents and XML documents. The event profile schema represents the vulnerability profile of the host-based agent with respect to an individual attack event class. This schema was modified to support a requirements field, which lists the services or filesystems required for the corresponding attack event to succeed. The event properties schema is used for documents that represent attack events. It specifies propagation details, including the rate and number of attacks that are initiated, if the attack event is intended to propagate. This schema was modified from the worm simulation version by adding fields that represent the effects of the attack, which correspond to creating specific event instances. The schema was also modified to represent the filesystem (including memory) in which the attack resides, if it is persistent.
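Since the paper does not reproduce its schemas, the following sketch shows only the general shape of translating an alert document into a local event-instance tuple; every element and attribute name in the example document is invented, not the paper's actual alert schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical alert document: host/attack identifiers, an FPP value,
# and fields overriding the defaults of the derived event class.
ALERT_XML = """
<alert host="hostA" attack="worm-1" fpp="0.15">
  <field name="Port" type="integer">80</field>
  <field name="Service" type="string">httpd</field>
</alert>
"""

def parse_alert(text):
    """Translate an alert document into (host id, attack id, FPP,
    field overrides), ready to become a local event instance."""
    root = ET.fromstring(text)
    fields = {f.get("name"): f.text for f in root.findall("field")}
    return (root.get("host"), root.get("attack"),
            float(root.get("fpp")), fields)
```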
3.2 Host-Based Agent
The code used for the host-based agents was adapted from the worm simulation project [25] to support responses, sensor reconfigurations, and sensor simulation of false positive, false negative, and timeliness values. The code was also adapted to support the changes to event profiles and event properties discussed at the end of Subsection 3.1⁵. Host-based agents are the only agents in the implementation that are multi-threaded. When a new document is received, a new thread is created to parse and process it. Mutexes ensure that shared data structures are not modified concurrently while a thread operates in a critical section. A host-based agent is initialized with: all event class definitions used in an upcoming experiment, the current IDS configurations, a host profile document describing the simulated services running on the host, and an event profile document for each attack class used in the upcoming experiment. Once an event properties document is received from either another host-based agent or the controller agent, the document is parsed into a local structure and the current IDS configurations are checked to determine whether any sensors successfully detect the attack. If a randomly generated probability is above the detecting sensor configuration's FNP value for the attack's corresponding event class, a new thread is spawned that generates the corresponding alert and submits it to the aggregation agent after sleeping for a period of time derived from the T value of the sensor configuration. If a sensor configuration fails to detect the attack, it is locally blacklisted from reporting future occurrences with the same attack identifier⁶. If the host-based agent is not vulnerable to the received attack event, the thread terminates. Otherwise, the effects of the attack are added as local event instances, and sensor configurations that detect the corresponding event classes of these effects are examined. As above, if a sensor configuration detects an event instance, a new thread is spawned that submits the alert after sleeping. False positives are represented by receiving a non-attack event property document that mirrors a specific attack event property in every other way. If the randomly generated probability is below the FPP⁷ for a given sensor configuration, an alert is generated and submitted as above, representing a false positive that appears identical to a true positive. Host-based agents are also responsible for updating local sensor configurations to those received from the response agent. In addition, host-based agents process responses from the response agent. Some responses, such as filesystem recovery responses, require a delay, which is specified in the appropriate delay-related fields of the response. Similar to alert reporting, a new thread is spawned that sleeps for the amount of time the response takes to complete. Other responses, such as firewall rule changes, are made instantaneously. In addition, since dependencies between filesystems and services are represented, responses that temporarily disable services or filesystems also disable the services that require them. These filesystems and services are restored when the corresponding response is complete. Some responses require a service or filesystem to be available before it can become available again. Responses that have overlapping requirements are initiated sequentially on a first-come, first-served basis. If a response recovers the filesystem in which an attack event was residing, the attack event ceases propagation. Prevention responses prevent future attack events from succeeding by making the host no longer satisfy the requirements for the attack event that are specified in the event profile.

⁵ All other agents, with the exception of the controller agent, which sends arbitrary XML files to a designated host and port, were not associated with the worm simulation project.
⁶ To clarify attack identifiers, suppose an experiment consisted of two worms. Each worm would have a different attack identifier. If a host-based agent received the same worm from multiple hosts, each attack event would have the same attack identifier. However, if it received a different worm from the same host, it would have a different attack identifier.
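The detection decision made by the host-based agent can be sketched as follows. This is a simplified, synchronous version of the mechanism described above (the actual implementation spawns a thread and sleeps for T before submitting); the data layout and names are invented:

```python
import random

def simulate_detection(sensor, event_class, rng=random.random):
    """Decide whether a simulated sensor configuration reports an
    attack event: detection succeeds when a random draw clears the
    configuration's FNP for that event class. A successful detection
    yields the alert's FPP and the reporting delay T (returned here
    rather than slept on)."""
    fnp, fpp, t = sensor[event_class]  # (FNP, FPP, T) per event class
    if rng() < fnp:
        return None                    # missed: the configuration would
                                       # be blacklisted for this attack id
    return {"fpp": fpp, "delay": t}
```

Passing a deterministic `rng` makes the simulated outcome reproducible, which is convenient for experiments.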
3.3 Aggregation Agent
When the aggregation agent receives an alert, it compares the host and attack identifiers of the alert to previously seen alerts. If it does not find a match, it records the alert and forwards it to the response agent. If it finds a match, it compares the new alert's FPP to the recorded alert's FPP. If the new alert's FPP is lower than the previous matching alert's FPP, it passes this alert on to the response agent and overwrites the old alert with the new alert. Otherwise, the alert is dropped.
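This de-duplication policy can be sketched compactly (hypothetical structures and callback; the implementation exchanges XML alert documents):

```python
class AggregationAgent:
    """Sketch of the aggregation logic described above: alerts are keyed
    by (host identifier, attack identifier), and only an alert with a
    strictly lower FPP than the recorded one is forwarded on."""

    def __init__(self, forward):
        self.seen = {}          # (host_id, attack_id) -> lowest FPP so far
        self.forward = forward  # callback to the response agent

    def receive(self, alert):
        key = (alert["host_id"], alert["attack_id"])
        if key not in self.seen:
            # First alert for this host/attack pair: record and forward.
            self.seen[key] = alert["fpp"]
            self.forward(alert)
        elif alert["fpp"] < self.seen[key]:
            # Lower FPP than the recorded alert: overwrite and forward.
            self.seen[key] = alert["fpp"]
            self.forward(alert)
        # Otherwise the alert is dropped.
```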
3.4 Response Agent
The response agent is initialized by receiving: event class documents defining all event classes to be used in the upcoming experiment, all IDS configuration profiles for all available sensors, the host profiles of all hosts, an event profile document for each host and attack event pair, rule documents including backchain
⁷ Recall that the definition provided for FPP is the probability that a given alert is a false positive, rather than the probability that a false positive alert will be generated from non-attack events. As a result, each sensor configuration could also contain these alternative false positive probabilities for this purpose.
A Network-Based Response Framework and Implementation
rules as described near the end of Subsection 2.2, and response map documents that map event classes to available responses.

When an alert document is received, it is first translated to a local event instance (EI_l). If EI_l's values are identical to a currently existing event instance (EI_p), and EI_l's FPP is lower than EI_p's FPP, then EI_p's FPP is updated to EI_l's FPP, detection propagation thresholds are recalculated, and the response agent skips attempting to match the new event instance with other existing event instances. Otherwise, rules that require EI_l's event class are then checked with EI_l and all current event instances. As event instance combinations are tested, preemptive detection threshold propagation occurs, as discussed previously at the end of Subsection 2.3. Newly generated event instances inherit an FPP equal to the highest FPP of their prerequisite event instances. New event instances and modified event instances are added to a queue. Once all rules are checked for EI_l, event instances from the queue are added one at a time, just as EI_l was added, until the queue is empty. Rules that modify currently existing event instances must be designed carefully: incorrect versions of these rules may result in an infinite loop where an event instance is constantly changed back and forth, or where the queue never becomes empty. All newly added and modified event instances are appended to a rollback queue as transactions.

SAV_overall = Σ_{i ∈ EI_Current} SAV_i (1 − FPP_i)    (4)
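Equation (4) weights each event instance's system assessment value by the probability that the instance is a true positive. A minimal computation with hypothetical values:

```python
def sav_overall(event_instances):
    # Equation 4: SAV_overall = sum over current event instances of
    # SAV_i * (1 - FPP_i), i.e. each instance's assessment value weighted
    # by the probability that the instance is a true positive.
    return sum(ei["sav"] * (1.0 - ei["fpp"]) for ei in event_instances)

# Hypothetical current event instances: a near-certain compromise
# contributes most of its (negative) assessment value, while a likely
# false positive contributes little.
ei_current = [
    {"sav": -10.0, "fpp": 0.05},  # high-confidence policy violation
    {"sav": -4.0,  "fpp": 0.90},  # probably a false positive
]
print(sav_overall(ei_current))  # -10*0.95 + -4*0.10 = -9.9
```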
If a policy violation event instance is generated, the response agent searches for a response solution set as described previously in Subsection 2.2. Before testing a response set, the rollback queue is cleared. After the response event instances are added to the system state and the resulting state is assessed, the system state is rolled back by rolling back each transaction on the rollback queue. A tested response set's resulting system state assessment (SAV_overall) is defined in Equation 4, where EI_Current represents the set of currently existing event instances. If a response set is found to provide a state that is estimated to be better than the current state, the responses are translated to alerts that are then submitted to the corresponding host-based agents.

After the response phase, sensor configurations are analyzed with respect to all event class detection thresholds that must be upheld. If it is found that a detection threshold cannot be satisfied by any existing sensor configuration, the response agent generates a local alert and the detection threshold is flagged as impossible. Some detection thresholds can also be virtual, representing the notion that they are intended to be inherited through preemptive detection threshold propagation rather than satisfied for the parent event class. All sensor configurations are then tested to determine the global sensor configuration that satisfies all detection thresholds but has the lowest impact on resources. For the purposes of this implementation, resource impact is assessed by summing the resource costs of all sensor configurations into a single value, which could be enhanced with a more thorough cost model [22] or with the more advanced metrics previously discussed in Subsection 2.3. If a new sensor configuration is found, the IDS profile representing the new configuration is sent to the affected host-based agents.
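The global reconfiguration step described above amounts to a constrained minimisation: pick one configuration per sensor so that every detection threshold is satisfied at the lowest summed resource cost. A brute-force sketch under simplifying assumptions (flat dict structures rather than IDS profiles, and a threshold deemed satisfied if some chosen configuration's detection probability meets it — all hypothetical names):

```python
from itertools import product

def choose_global_configuration(sensor_options, thresholds):
    """Brute-force sketch of global sensor configuration selection.

    sensor_options: {sensor_name: [config, ...]}, each config a dict with
    per-event-class detection probabilities ("detect") and a resource
    cost ("cost").
    thresholds: {event_class: minimum required detection probability}.
    Returns (best_assignment, best_cost); best_assignment is None if no
    combination satisfies every threshold.
    """
    best, best_cost = None, float("inf")
    sensors = list(sensor_options)
    for combo in product(*(sensor_options[s] for s in sensors)):
        # A threshold holds if some selected configuration detects the
        # event class with at least the required probability.
        ok = all(
            any(cfg["detect"].get(ec, 0.0) >= p for cfg in combo)
            for ec, p in thresholds.items()
        )
        cost = sum(cfg["cost"] for cfg in combo)
        if ok and cost < best_cost:
            best, best_cost = dict(zip(sensors, combo)), cost
    return best, best_cost
```

The exhaustive search is exponential in the number of sensors, which is consistent with the scaling behaviour reported in Subsection 4.3; a production system would need pruning or a smarter search.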
4 Experiments
A worm buffer overflow scenario was used for the experiments with this implementation. The majority of experiments were executed on a 7-node network on Emulab [24], where one node provided the aggregation agent and the response agent, and the remaining 6 nodes provided the host-based agents.

4.1 Setup
The experiments used a host-based anomaly IDS, a network-based signature IDS, and a host-based integrity IDS. The host-based anomaly IDS is similar to an anomaly-based IDS presented by Wenke Lee and Salvatore Stolfo [26]. In this case, a sliding window is used to observe anomalies in traffic patterns. The larger the traffic window, the lower the false negative and false positive probabilities, but the larger the timeliness value, which is based strictly on the size of the window; the available window sizes are 5, 10, 30, 60, and 90 seconds. Since this sensor takes a traffic stream as input, if there exists a temporary cache of this traffic, the retargeted sensor could process old traffic with a new sensor configuration for an additional attempt to detect an attack, or to provide more evidence to a correlation-based sensor. The network-based signature IDS is loosely based upon Snort [27] or Bro [28] and only has default and NULL configurations. The timeliness value for this sensor is estimated to be approximately 80 milliseconds, based on a report presented in [29]. The host-based integrity IDS is Tripwire [21], which can scan a filesystem every 10, 15, 20, 30, 45, 60, 120, 240, or 720 minutes. The more frequent filesystem checks are intended for small but critical filesystems. Available responses included upgrading a service, disabling a service, and restoring a filesystem.

The worms tested were run at propagation speeds of one scan per 5 microseconds (fast), one scan per 50,000 microseconds (medium), and one scan per 80,000 microseconds (slow). In most experiments the vulnerability density was set to 0.5, representing 3 vulnerable nodes and 3 invulnerable nodes in the 7-node experiments. Experiments that exhibit the retargeting capabilities of the implementation used a vulnerability density of 0.83, which resulted in 5 vulnerable nodes and one invulnerable node in the 7-node experiments.
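The anomaly sensor's window-size trade-off can be made concrete as a small configuration table, where timeliness grows with the window while the false positive probability shrinks. The window sizes below match the experiment; the probabilities are made-up placeholders, not the experiment's actual profiles:

```python
# Illustrative anomaly-IDS configurations: larger sliding windows lower
# the false positive/negative probabilities but delay detection, since
# timeliness is tied directly to the window size. FPP values are
# hypothetical placeholders.
anomaly_configs = [
    {"window_s": 5,  "fpp": 0.20, "timeliness_s": 5},
    {"window_s": 10, "fpp": 0.12, "timeliness_s": 10},
    {"window_s": 30, "fpp": 0.06, "timeliness_s": 30},
    {"window_s": 60, "fpp": 0.03, "timeliness_s": 60},
    {"window_s": 90, "fpp": 0.02, "timeliness_s": 90},
]

def fastest_config(configs, max_fpp):
    """Pick the smallest-window (fastest) configuration whose FPP meets
    the requirement, or None if no configuration qualifies."""
    ok = [c for c in configs if c["fpp"] <= max_fpp]
    return min(ok, key=lambda c: c["timeliness_s"]) if ok else None

print(fastest_config(anomaly_configs, max_fpp=0.10)["window_s"])  # 30
```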
4.2 General Results
Figure 2 presents the average of 10 runs of a 7-node experiment with a vulnerability density of 0.5 and a propagation speed of one scan per 5 microseconds. The y-axis represents the number of hosts, and the x-axis represents time in seconds on a logarithmic scale. Each graph is composed of three lines: one for the number of infected hosts, one for the number of clean hosts, and one for the number of contained hosts. Five of the experiments under these conditions resulted in one node that failed to detect the attack with an anomaly or signature-based sensor, but where the Tripwire sensor succeeded in detecting
[Figure: "Performance of the Response Engine with a Fast Worm (Average)" — number of infected, clean, and contained hosts vs. time in seconds (logarithmic scale, 0.001–1000 s).]

Fig. 2. Average Performance of Fast Experiments
the attack⁸. In this case, the node was only recovered after sending the Tripwire sensor alert, which took ~1800 seconds (30 minutes) with the default sensor configuration, compared to the ~0.1 seconds or less for nodes that detected the attack with an anomaly or signature-based sensor.

Figure 3 presents the results of one experiment using the same experimental setup but with a propagation speed of one scan per 50,000 microseconds. In this case, the worm is caught before being able to spread to a vulnerable host, resulting in only one infection. About half of the remaining medium experiments exhibited this behavior, with the remainder matching the behavior shown for the fast worms. All but one of the slow experiments exhibited behavior similar to that of Figure 3.

One of the rules used represents the generation of an unknown worm event class. This rule requires three event instances that have the same compromised filesystem but reside on different hosts. In experiments with a vulnerability density of 0.83, as this rule received more partial matches, and eventually a full match, the preemptive detection threshold propagation discussed at the end of Subsection 2.3 resulted in new sensor configuration changes, which included
⁸ Even though this allows for the case that a host-based anomaly IDS can miss the attack against one host but catch the attack against another, these discrepancies can be due to different background traffic observed on each host at the time of the attack. Similarly, this allows for the network-based signature IDS to catch an attack against some hosts but miss it against others. This can be due to polymorphic worms, where the available signature is able to detect some variants of the worm but not all, and the worm changes form as it spreads.
[Figure: "Performance of the Response Engine with a Medium Worm (Trial 1)" — number of infected, clean, and contained hosts vs. time in seconds (logarithmic scale, 0.001–100 s).]

Fig. 3. Medium Experiment #1
lowering the Tripwire timeliness values from one scan per 30 minutes to one scan per 20 minutes. For some nodes that failed to detect the attack with a signature or anomaly-based sensor, this reduced the time before the alert was sent from the infected node to the response agent from ~1800 seconds to ~1200 seconds.

4.3 Scalability Test Results
To assess scalability, the experiment was also executed with 15 and 31 nodes at a 0.5 vulnerability density. On average, the 7-node experiments resulted in the response agent calculating the last optimal response set within 0.019993 seconds. The 15-node experiment calculated the last optimal response set within 0.280744 seconds, while the 31-node experiment took 2.49284 seconds, roughly a 10-fold increase each time the number of nodes is doubled. However, it should be noted that these results were obtained with extensive debug logging, printing timestamps and data for key points within the decision engine, including partial and full rule match information and new detection threshold additions from propagation. With the trimming of just the rule match notifications (but not the information about newly generated event instances), the 31-node experiment reduced the calculation time to 0.700421 seconds, which could likely be reduced further with less precise timestamps and further trimming. Note that decreasing the number of nodes or lowering the vulnerability density increases performance.
4.4 Advanced Scenario
The following scenario can be adapted into an experiment with this implementation with relatively minor adjustments. In this scenario, the attacker utilizes multiple attack vectors with the primary goal of obtaining access to a relatively secure workstation, halfdome. Halfdome runs a local firewall and does not provide any services that are externally visible. The network halfdome is on (N) does not utilize any firewalls and is externally accessible.

In the first step, the attacker initiates a worm similar to the worm described in the experiments within Subsections 4.2 and 4.3. Although halfdome is not compromised, pinatubo, which resides on network N, is vulnerable and becomes compromised. This attack goes undetected by quick intrusion detection sensors, but will be detected by an upcoming integrity scan. The attacker then attempts to sniff passwords from network N, but is unable to find unencrypted traffic involving halfdome. However, the attacker is able to discover that halfdome uses hawkeye for DNS requests, which happens to be a Windows 2000 DNS server. The attacker then launches a Distributed Denial of Service (DDoS) attack against hawkeye using attacks from remote hosts, as well as attempting to steal hawkeye's IP address using pinatubo and other local, compromised hosts by spoofing ARP requests and replies. The DDoS attack is easily detected by sensors and is reported to the response agent, which initiates pushback on cooperating routers [4]. Pushback provides some mitigation of the attack, but is unable to provide complete protection due to the limited domain of routers that support pushback. After the response agent receives a status alert on the partial success of the pushback response, it activates a proportional-integral-derivative controller-based response [6]. This combined response to the DDoS attack sufficiently mitigates it to allow critical services to satisfy the availability levels determined by policy.
During the external attack, the response agent also receives an alert of spammed ARP spoofs from hosts trying to steal hawkeye's IP address. This event is detected, but the response agent does not initiate a response since it is unsure which ARP replies are spoofs and which are genuine. A correlation sensor is able to suggest a common service the ARP-spoofing hosts shared as a possible source of infection. The response agent also receives this report, but due to the high false positive probability, it does not issue a response.

During the DDoS attack, the attacker is able to poison the DNS cache of halfdome by spoofing DNS replies [30,31] from hosts that were able to temporarily steal hawkeye's IP address. Because the attacker spams the DNS replies to halfdome in the hope that one will get through, an anomaly-based sensor detects the attack and forwards an alert to the response agent. A correlation sensor sees the anomaly-based sensor's alert and correlates it with the ARP spoofs and the possible common compromised service to produce a low false positive report on the compromised hosts. This results in a Tripwire integrity scan of all suspicious hosts while the traffic from halfdome to external sites is temporarily throttled at halfdome's local firewall. All suspicious hosts are soon confirmed to be infected, which results in the restoration of infected filesystems and the disabling of the previously correlated service on the infected hosts. Additionally, backups of the recovered systems are made with the service disabled. Once complete, an attempt is made to upgrade the vulnerable service to a more recent version. An automated testing procedure is executed to detect whether dependent services are still able to function with the upgraded service. If successful, the other previously infected hosts attempt to upgrade their service as well and test for problems. If a problem occurs, the previous image is rolled back with the service disabled and the local administrator is notified to remedy the problem.

In order to use this scenario with the currently developed framework, a few key changes would have to be made. First, response feedback would have to be added by preserving policy violations that were responded to, observing the results of previous responses by retargeting sensors to observe the corresponding event classes, and initiating alternative responses that are stored along with the previous policy violation. Second, the response agent must be able to correlate separate alerts into an overall attack vector, which is a problem many correlation systems have attempted to solve. Third, the event properties schema must be modified to be able to encapsulate other event properties documents, allowing for any attack scenario.
5 Conclusions and Future Work
In this paper we presented a modular, extensible response framework along with an implementation of a response system that utilizes this framework. The framework allows for various simulated or real intrusion detection systems, response agents, and aggregation agents. The response model and implementation presented demonstrated the benefits of sensor retargeting and of supporting an expressive model that encompasses a wide variety of attacks, sensors, response agents, and policies. The experimental results presented could be compared against those of other high-level response systems on specific response scenarios for purposes of evaluation. Although the experimental scenarios were relatively simple, a detailed scenario is presented that can be executed with minor modifications to the current implementation. Although attacks such as infinite loops are possible through poor rule design as described in Subsection 3.4, they can be prevented or mitigated with loop timeouts, or with the integration of a professional expert system that is designed to catch such loops.

There are many other approaches for extending and enhancing this work, in addition to those proposed in Subsection 4.4, including the following:
– Bayesian inferencing can be used to more accurately calculate the false positive probabilities of an event instance by taking several additional conditional probabilities into account.
– This probabilistic model can be used to create a metric that assesses the detection or response capability of a system by comparing the probabilities
that a system can be recovered and future attacks prevented for a given general scenario within a specified timeframe.
– The integration of a professional expert system into the response agent would greatly increase the efficiency of the implementation, but would make the preemptive detection threshold propagation discussed in Subsection 2.3 more difficult.
– A model for making sensor configuration detection threshold values a function of state properties would make these values much more realistic and dynamic.
– By modifying rules to allow for any type of computation, rather than straightforward expert system rules, entire sensors/response engines or their components can be included in the model.
References

1. Snapp, S., Brentano, J., Dias, G., Goan, T., Heberlein, T., Ho, C., Levitt, K., Mukherjee, B., Smaha, S., Grance, T., Teal, D., Mansur, D.: DIDS (Distributed Intrusion Detection System) - Motivation, Architecture, and an Early Prototype. In: Proc. 14th National Computer Security Conference (1991)
2. Heberlein, L., Dias, G., Levitt, K., Mukherjee, B., Wood, J., Wolber, D.: A Network Security Monitor. In: Proc. IEEE Symposium on Security and Privacy (1990)
3. Sun Microsystems, Inc., 901 San Antonio Road, Palo Alto, CA 94303, USA: SunSHIELD Basic Security Module Guide, Solaris 7, Part No. 805-2635-10 (October 1998)
4. Ioannidis, J., Bellovin, S.M.: Implementing Pushback: Router-based Defense against DDoS Attacks. In: Proc. Network and Distributed System Security Symposium (2002)
5. Sterne, D., Djahandari, K., Wilson, B., Babson, B., Schnackenberg, D., Holliday, H., Reid, T.: Autonomic response to distributed denial of service attacks. In: Lee, W., Mé, L., Wespi, A. (eds.) RAID 2001. LNCS, vol. 2212, p. 134. Springer, Heidelberg (2001)
6. Tylutki, M., Levitt, K.: Mitigating distributed denial of service attacks using a proportional-integral-derivative controller. In: Vigna, G., Krügel, C., Jonsson, E. (eds.) RAID 2003. LNCS, vol. 2820, pp. 1–16. Springer, Heidelberg (2003)
7. Rowe, J.: Intrusion Detection and Isolation Protocol: Automated Response to Attacks. In: Recent Advances in Intrusion Detection (1999)
8. Kreidl, O., Frazier, T.: Feedback Control Applied to Survivability: A Host-Based Autonomic Defense System. IEEE Transactions on Reliability 52(3) (2003)
9. Musliner, D.: CIRCADIA Demonstration: Active Adaptive Defense. In: Proc. DISCEX 2003 (2003)
10. Toth, T., Kruegel, C.: Evaluating the Impact of Automated Intrusion Response Mechanisms. In: Proc. 18th Annual Computer Security Applications Conference (2002)
11. Cohen, F., Lambert, D., Preston, C., Berry, N., Stewart, C., Thomas, E.: A Framework for Deception (July 2005) (accessed July 2005), http://www.all.net/journal/deception/Framework/Framework.html
12. Cohen, F.: Leading Attackers through Attack Graphs with Deceptions. Computers and Security 22(5), 402–411 (2003)
13. The Honeynet Project (accessed June 2005), http://www.honeynet.org
14. Spitzner, L.: The Honeynet Project: Trapping the Hackers. In: Proc. IEEE Symposium on Security and Privacy (2005)
15. Templeton, S., Levitt, K.: A Requires/Provides Model for Computer Attacks. In: Proc. 2000 New Security Paradigms Workshop, pp. 31–38 (2000)
16. Cheung, S., Lindqvist, U., Fong, M.: Modeling Multistep Cyber Attacks for Scenario Recognition. In: Proc. DISCEX 2003 (2003)
17. Michel, C., Mé, L.: AdeLe: An Attack Description Language for Knowledge-Based Intrusion Detection. In: Trusted Information: The New Decade Challenge: IFIP TC11 16th International Conference on Information Security (IFIP/SEC 2001), pp. 353–368 (2001)
18. Cuppens, F., Ortalo, R.: LAMBDA: A language to model a database for detection of attacks. In: Debar, H., Mé, L., Wu, S.F. (eds.) RAID 2000. LNCS, vol. 1907, pp. 197–216. Springer, Heidelberg (2000)
19. Staniford-Chen, S., Tung, B., Schnackenberg, D.: The Common Intrusion Detection Framework (CIDF). In: Information Survivability Workshop (1998)
20. Debar, H., Curry, D., Feinstein, B.: The Intrusion Detection Message Exchange Format. Internet Draft (July 2004) (accessed July 2005), http://xml.coverpages.org/draft-ietf-idwg-idmef-xml-12.txt
21. Kim, G., Spafford, E.: The Design and Implementation of Tripwire: A File System Integrity Checker. Technical Report CSD-TR-93-071, Purdue University, West Lafayette, IN 47907-1398
22. Lee, W., Fan, W., Miller, M., Stolfo, S., Zadok, E.: Toward Cost-Sensitive Modeling for Intrusion Detection and Response. Journal of Computer Security, 5–22 (2002)
23. Rossey, L., Cunningham, R., Fried, D., Rabek, J., Lippmann, R., Haines, J., Zissman, M.: LARIAT: Lincoln Adaptable Real-time Information Assurance Testbed. In: Recent Advances in Intrusion Detection (2001)
24. White, B., Lepreau, J., Stoller, L., Ricci, R., Guruprasad, S., Newbold, M., Hibler, M., Barb, C., Joglekar, A.: An Integrated Experimental Environment for Distributed Systems and Networks. In: Proc. 5th USENIX Operating Systems Design and Implementation Symposium (2002)
25. McAlerney, J.M.: An Internet Worm Propagation Data Model. M.S. thesis, University of California, Davis (2004)
26. Lee, W., Stolfo, S.: Data Mining Approaches for Intrusion Detection. In: Proc. 7th USENIX Security Symposium (1998)
27. Roesch, M.: Snort - Lightweight Intrusion Detection for Networks. In: Proc. 13th Systems Administration Conference, USENIX (1999)
28. Paxson, V.: Bro: A System for Detecting Network Intruders in Real-Time. Computer Networks 31(23-24), 2435–2463 (1999)
29. Kruegel, C., Toth, T.: Flexible, Mobile Agent Based Intrusion Detection for Dynamic Networks. In: Proc. European Wireless (2002)
30. DNS Poisoning Summary (March 2005) (accessed July 2005), http://isc.sans.org/presentations/dnspoisoning.php
31. How to Prevent DNS Cache Pollution, Article ID 241352 (accessed July 2005), http://support.microsoft.com/default.aspx?scid=kb;en-us;241352
Towards Resilient Networks Using Programmable Networking Technologies

Linlin Xie¹, Paul Smith¹, Mark Banfield³, Helmut Leopold³, James P.G. Sterbenz¹,², and David Hutchison¹

¹ Computing Department, InfoLab21, Lancaster University, Lancaster, LA1 4WA, UK
{linlin.xie,p.smith,jpgs,dh}@comp.lancs.ac.uk
² Information Technology and Telecommunications Research Center, Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, Kansas 66045-7621, USA
[email protected]
³ Telekom Austria AG, Lassallestraße 9, A-1020, Vienna, Austria
{mark.banfield,helmut.leopold}@telekom.at
Abstract. Resilience is arguably the most important property of a networked system, one of the three quality of service (QoS) characteristics along with security and performance. Now that computer networks are supporting many of the applications crucial to the success of the emerging Information Society – including business, health care, education, science, and government – it is particularly important to ensure that the underlying network infrastructure is resilient to the events and attacks that will inevitably occur. Included in these challenges are flash crowd events, in which servers cannot cope with a very large onset of valid traffic, and denial of service attacks, which aim to damage networked systems with malicious traffic. In this paper, we outline the case for mechanisms to deal with such events and attacks, and we propose programmable networking techniques as the best way ahead, illustrated by a flash crowd example.

Keywords: Resilience, Survivability, Disruption Tolerance, Programmable and Active Networking, Flash Crowd and Distributed Denial of Service (DDoS) Detection and Remediation, Quality of Service (QoS).
1 Introduction
Networks have become increasingly important in our daily lives, to the extent that we depend on them for much of what we do, and we are significantly disrupted when they cease to operate properly. Current networks in general, and
http://www.comp.lancs.ac.uk/resilinets
D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 83–95, 2009. c IFIP International Federation for Information Processing 2009
the Internet in particular, do not provide the resilience that will be needed, especially when more critical applications depend on proper network operation.

Resilience is the ability of the network to provide and maintain an acceptable level of service in the face of various challenges to normal operation. These challenges include natural faults of network components (fault-tolerance); failures due to mis-configuration or operational errors; large-scale natural disasters (e.g., hurricanes, earthquakes, ice storms, tsunami, floods); attacks against the network hardware, software, or protocol infrastructure (from recreational crackers, industrial espionage, terrorism, or warfare); unpredictably long delay paths either due to length (e.g., satellite) or as a result of episodic connectivity; weak, asymmetric, and episodic connectivity of wireless channels; and high mobility of nodes and subnetworks. Addressing these challenges is required for network survivability [22]. We define resilience as survivability plus the ability to tolerate unusual but legitimate traffic load. Note that while attack detection is an important endeavour, it is in some sense futile, since a sufficiently sophisticated distributed denial of service (DDoS) attack is indistinguishable from legitimate traffic. Thus traffic anomaly detection that attempts to detect and resist DDoS attacks simply incrementally raises the bar over which attackers must pass. Since both cases adversely affect servers and cross traffic, as well as exhaust network resources, the goal is resilience regardless of whether or not an attack is occurring.

Resilient networks aim to provide acceptable service to applications, including the ability for users and applications to access information when needed (e.g., Web browsing and sensor monitoring), maintenance of end-to-end communication associations (e.g., a video- or teleconference), and operation of distributed processing and networked storage.
Resilient network services must remain accessible whenever possible, degrade gracefully when necessary, ensure correctness of operation (even if performance is degraded), and rapidly and automatically recover from degradation. We believe that to realise resilient services it is necessary to have programmable networks – in particular, the ability of the network to dynamically adapt in response to learnt context information – and providing the motivation for this need is the main contribution of this paper.

In Section 2, we discuss in more detail the programmable networking features that are necessary for resilience and why they are necessary. We present in Section 3 an example resilient networking scenario – a flash crowd event – and show how programmable networking can be used to detect the onset of the ill effects of such an event and how these effects can be mitigated. Recently, a number of important initiatives have emerged that aim to modify the Internet architecture and that could be used to realise resilient services; the rest of this section presents an overview of these initiatives.

1.1 Resilient Networking Initiatives
A knowledge plane (KP) [19] has been proposed to supplement the Internet architecture, which self-organises to discover and solve problems automatically. The principle is that a knowledge plane could reason based on collected information from all levels of the protocol stack to optimise applications, diagnose and tolerate faults and attacks, and make the network reconfigurable. The KP would use cognitive AI to work on incomplete, inconsistent, or even misleading information, behave properly in the face of inconsistent high-level goals, and proactively work with new technologies and services. The KP can be considered a way of building resilient networks in the long-term future – the development of cognitive technology is still in its early stages, and the KP depends heavily on it. Furthermore, challenges need to be addressed in areas such as knowledge sharing (trust issues) and reasoning over vast amounts of information (scalability issues).

Work in the area of autonomic computing has largely focused on developing self-configuring, self-managing, and self-healing networked server systems [15]. There are now initiatives that consider making communications systems autonomic (e.g., [18,17]). These communication systems aim to understand the context in which they operate, such as user requirements and network status, and then automatically adapt to meet service goals. Clearly, techniques for enabling autonomic communication systems are relevant for building resilient network services.

The COPS (Checking, Observing, and Protecting Services) project [20] aims to protect networks with devices called iBoxes, which perform observation and action operations at the network edge. COPS proposes to extend checking into the protocol domain, so that iBox functionality would migrate into future generations of routers. An annotation layer resides between the IP and transport layers for network management, which will allow annotated traffic to be appropriately processed.
2 Programmable Networks
Resilient networks need to be engineered with emergent behaviour to resist challenges to normal operation, recognise when challenges and attacks occur so as to isolate their effects, ensure resilience in the face of dependence on other infrastructure such as the power grid, rapidly and autonomically recover to normal operation, and refine future behaviour to better resist, recognise, and recover. We believe that programmable networking technologies will be a key enabler of the emergent and autonomic behaviour necessary for resilience.

The need for programmable networking technology [24,25,26] for building resilient networks stems from the nature of the challenges that will affect normal operation. These challenges will rapidly change over time and space. In other words, the moments at which these challenges threaten normal service operation will differ rapidly and arbitrarily, and over time new challenges will emerge, such as new application traffic loads, forms of DDoS attacks, deployment environments, and networking technologies. Furthermore, the affected organisational entities and network services will change in an unpredictable manner. These characteristics preclude the use of a set of prescribed solutions to resilience and mandate the use of a dynamically extensible infrastructure that can be aware of its environment.
86
L. Xie et al.
The following subsections further catalogue and motivate the need for the programmable networking facilities that are required for resiliency.

2.1 Dynamic Extensibility and Self-organisation
Programmability allows the network to respond to challenges by dynamically altering its behaviour and re-programming itself. This key ability of networks to change means that nodes do not need to be hard-coded or pre-provisioned with all the algorithms that may be needed to detect and respond to the challenges to normal operation. In fact, attempting to pre-program the complete set of resilience solutions is a futile exercise because of the dynamic and adaptive nature of the challenges to normal operation, as discussed earlier. Furthermore, we believe the network must be able to alter its behaviour without the intervention of network operators, because of the increasingly short timescales at which traffic patterns change (e.g., flash crowds) and attacks spread. Thus, it is essential that the network be self-monitoring, self-diagnosing, self-reorganising, and self-managing.

In light of this, programmable networking devices must expose interfaces that allow their behaviour to be extended in a safe manner by appropriately privileged entities. Furthermore, a service that can rapidly determine the most suitable programmable network locations at which to deploy resilience components must be available. For example, it should be possible for a resilient networking service to request the deployment of mitigation code in proximity to the source of a DDoS attack, even when the location of the source may be mobile. Approaches to this have been proposed in [28,29], but much further work is required.

By introducing dynamic extensibility and self-organisation into the network, there is a risk of making the network unstable and potentially worsening the effect of any disruption to normal service provisioning. Furthermore, exposing interfaces that enable third-party services to understand and manipulate the operation of the network introduces a new entry-point for misuse.
With this in mind, programmability and dynamic behaviour should be introduced carefully, and exposed interfaces must be stealthy (i.e., not expose more functionality than strictly necessary). This is consistent with moderate active networking [16], in which the ability to inject and transport dynamic programming extensions is tightly controlled by the network service provider. Inter-provider AS relationships will have to be based on authentication and trust mechanisms.

2.2 Context Awareness
Understanding the characteristics of traffic and the topology in a resilient network is important. For example, when a DDoS attack occurs it is useful to learn the source addresses of the perpetrators so that remediation services can be invoked in appropriate network locations, including toward the source. In other words, it is important to understand the network context so that the correct remediation services can be invoked with the correct parameters. To understand network context it must be possible to inspect packets at line speed, as well as be aware of topology state and network signalling messages.
Towards Resilient Networks Using Programmable Networking Technologies
87
However, understanding network context is only one part of the picture. A resilient network should use context from a range of layers. Arguably, the deeper one can look into a packet at higher-layer protocol headers and data, the more information can be obtained, and the more targeted any remediation service can be. Edge network devices are commercially available that are capable of application-level packet inspection at line speed (e.g., [13,14]). So that applications and services operating at different layers can understand one another's context and work in harmony, interfaces that enable cross-layer interaction are necessary. Without an understanding of context across a range of layers, actions taken at one layer may not be complementary to those at another. While the motivation for cross-layer interaction is clear, and there has been work in the context of specific parameters and protocols, there is no fundamental understanding of how it should be undertaken or what the benefits (performance and functional improvements) and costs (complexity and stability) are. A basic understanding of the nature of cross-layer interaction, the resulting control loops, and its effect on the network needs to be gained [23].
3 Programmable Flash Crowd Detection and Mitigation
As an example to demonstrate how programmable networks can be used to build resilient services, in this section we describe an approach to detecting and mitigating the effects of a flash crowd event. To detect the ill-effects of a flash crowd event (e.g., a reduction in server response rate), we employ a mechanism that uses application and network-level information at a programmable edge device to detect a mismatch in anticipated and actual response rates from a server. We also discuss a number of approaches to mitigating the effects of flash crowd events by using the extensible nature of programmable networks.

3.1 Flash Crowd Detection
A flash crowd event [1] is characterised by a dramatic increase in requests for a service over a relatively short period of time, e.g., the sharp increase in requests for content on the CNN website immediately after the 9/11 attacks of 2001 [2]. These events can lead to a degradation or complete loss of service. It is important to detect the onset of a flash crowd event so that techniques to mitigate its effect can be invoked before a loss of service occurs.

A surge in service requests could cause a bottleneck to occur in the access network to the service provider, the systems providing the service, or both. In either case, one would expect to see a significant increase in request rate in a relatively short period of time and an associated levelling off or reduction in the response rate as the network queues or server resources become saturated with requests. This behaviour is what we aim to detect and use to trigger programmable mechanisms to protect the network.

The mechanism we propose detects flash crowd events that are targeted at Web servers. It makes use of application-level information, but performs the
detection at the network level, and executes on a programmable edge router attached to the network that is providing the service. The mechanism inspects the volume of response traffic from a server and, based upon a difference between the expected volume of response traffic and the actual traffic, suggests the presence of a flash crowd event. In other words, if there is less response traffic than expected, we deduce that the effects of a flash crowd event are beginning. Proposals in [21] also compare estimated traffic volume to the actual volume to detect the onset of traffic volume anomalies. We use a similar idea, but do not aim to detect the presence of flash crowd events per se, but rather the onset of any ill-effects they cause.

In [11], it is shown that Web traffic has self-similarity characteristics; in other words, the requested objects follow a power-law distribution. We use this fact and the content-size distribution of requested objects, learnt from sampling the content-length field in HTTP response headers, to estimate the volume of response traffic. Normally, the sum of the sizes of the requested objects would form the response traffic volume, as shown in Equation 1, where v is the volume of response traffic, r is the number of requests, and S_i is the size of the object associated with a request r_i:

    v = \sum_{i=1}^{r} S_i    (1)

We maintain the average incoming HTTP request rate for a server and use this along with the learnt content-size distribution to estimate the expected volume of response traffic. Equation 2 shows how we calculate the Exponentially Weighted Moving Average (EWMA) incoming request rate f, where c is the request rate at a given point in time:

    f(t) = (1 - \alpha) \times f(t-1) + \alpha \times c(t), \quad \text{with } \alpha > 0    (2)

Equation 3 describes how we use the integer value of this average, f, to calculate the expected volume of response traffic e at time t, where G_i is the estimated content-size for a request f_i. By selecting an appropriate value for \alpha, we aim to obtain a close estimate of the response traffic volume:

    e(t) = \sum_{i=1}^{f(t)} G_i    (3)

The ratio of the observed response traffic volume a to the estimated traffic volume e should follow a normal distribution N(\mu, \sigma^2), as stated in Equation 4:

    \{\, a(t)/e(t) \mid t = 1 \ldots n \,\} \sim N(\mu, \sigma^2)    (4)

The value of \mu should be slightly greater than one, because we did not include the TCP/IP header size in our calculations. We use the EWMA of the ratio to smooth fluctuations caused by inaccuracies in the estimation mechanism. In Section 3.2, we show how we test the ratio distribution, calculate the parameters of the distribution, and obtain a confidence range (95% in this case), from which we can ascertain whether the effects of a flash crowd are occurring. If consecutive points are observed beyond the confidence range, this suggests the occurrence of an abnormality.
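To illustrate Equations 2 and 3, the EWMA update and the expected-volume estimate can be sketched in Python. The function names and the Pareto content-size sampler below are our own illustrative choices (consistent with the power-law observation cited from [11]), not part of the original mechanism.

```python
import random

def ewma_update(f_prev, c_now, alpha=0.2):
    # Equation 2: smoothed incoming request rate.
    return (1 - alpha) * f_prev + alpha * c_now

def expected_volume(f_now, sample_size):
    # Equation 3: sum int(f) draws from the learnt content-size distribution.
    return sum(sample_size() for _ in range(int(f_now)))

def pareto_size(avg_kb=12.0, shape=1.2):
    # Hypothetical power-law content-size sampler; the Pareto mean is
    # scale * shape / (shape - 1), so we solve for the scale.
    scale = avg_kb * (shape - 1) / shape
    return random.paretovariate(shape) * scale

f = ewma_update(150.0, 160.0)        # observe 160 requests this interval
e = expected_volume(f, pareto_size)  # expected response volume in KB
print(f)                             # 152.0
```

Comparing `e` against the observed volume then yields the ratio tested in Equation 4.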
3.2 Simulation of Flash Crowd Mechanism
To give an indication of the effectiveness of the flash crowd detection mechanism, we simulated such an event using ns-2. HTTP traffic was generated using the PagePool/WebTraf application in ns-2. The parameters used for generating HTTP sessions follow the distributions presented in [9]. The request rate for background traffic was set to approximately 150 requests/sec and that of flash traffic to 1200 requests/sec. The request rate of flash traffic was thus almost eight times greater than that of background traffic, which is modest for a flash crowd event, as the hit-rate for CNN just after 9/11 was twenty times its normal rate [2]. The parameters used for the background and flash traffic are shown in Table 1.

Table 1. Simulation parameters

Traffic Type | Number of Sessions | Inter-session Time [s] | Pages per Session | Inter-page Time [s] | Objects per Page | Inter-object Time [s] | Object Size [KB]
Background   | 1000               | 1                      | 15                | 1                   | 10               | 0.01                  | Avg: 12, Shape: 1.2
Flash Crowd  | 20000              | 0.025                  | 10                | 1                   | 10               | 0.01                  | Avg: 12, Shape: 1.2
The simulations ran for 1200 seconds; flash traffic started at 500 seconds. We used a simple network topology, which included twenty clients, an ingress edge router, an egress edge router, and a server. The bandwidth of the links between the clients and the ingress router was set to 10 Mb/sec, the two routers were connected by a 50 Mb/sec link, and the egress router was connected to the server by a 15 Mb/sec link. The detection interval (how often we checked the ratio of actual (a) to expected (e) response traffic volumes) was set to 30 seconds and the α value was set to 0.2. The simulation with the same configuration was run three times and the mean values were used for generating the graphs.

The onset of the flash crowd event can be seen at 500 seconds into the simulation in Figure 1. At 1020 seconds, the request rate starts to drop as the sessions run out. Figure 2 shows that the normal response rate is around 1.6 Mb/sec and that during the flash crowd event it reaches and stabilises at 1.8 Mb/sec. The stabilisation of the response rate is caused by the buffers on the server's ingress link becoming saturated and subsequently dropping incoming requests.

Fig. 1. Request rate during a flash crowd event starting at 500 seconds
Fig. 2. Response traffic rate during a flash crowd event starting at 500 seconds

To gain an estimate of μ and σ for the normal distribution of the ratio, we ran ten thousand background traffic sessions using the parameters shown in Table 1. The value of μ was set to the average of the samples, 1.10817, and σ to the standard deviation of the samples, 0.227477. Figure 3 shows that the sample distribution appears close to the normal distribution N(μ, σ²). The 95% confidence range of this distribution is [0.662315, 1.554025], which means that the probability of the ratio value going beyond this range is 5%. We use more than two consecutive values outside the confidence range to detect saturation on the server side.

Figure 4 shows that the ratio drops shortly after the onset of the flash crowd event and subsequently oscillates around 0.2 with a small amplitude. Recall that two consecutive ratio values outside the confidence range [0.662315, 1.554025] are used to diagnose that the effects of a flash crowd are being felt. Given this, with a detection interval of 30 seconds, saturation can be confirmed at 570 seconds.
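The quoted confidence range can be checked directly: it is the two-sided 95% interval μ ± 1.96σ of the fitted normal distribution.

```python
mu, sigma = 1.10817, 0.227477          # fitted from the background samples
z = 1.96                               # two-sided 95% quantile of N(0, 1)
lo, hi = mu - z * sigma, mu + z * sigma
print(round(lo, 6), round(hi, 6))      # 0.662315 1.554025
```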
3.3 Flash Crowd Mitigation Mechanism
To protect a Web server and the cross traffic in the network, we propose two strategies. The first is to drop requests that the server side is not able to manage at the ingress points of a provider's network. The ingress points are discovered by routers that perform a pushback mechanism, the basic concept and mechanism of which are presented in [7,8]. In summary, with a slight variation, our mechanism is invoked on the server's edge router, which identifies the incoming interfaces
of aggregates of high volumes of requests to the server. The router then sends messages to the immediate upstream routers (from which the high aggregate request volumes came) to recursively carry out this procedure and push back requests until the provider's ingress router is reached. The second strategy is to re-route response traffic inside the network to improve the traffic distribution and reduce the possibility of links becoming congested. The reason for the first action is straightforward: to push request traffic that cannot be served outside of the network and so save network resources. The reason for the other action, and the mechanisms for it, are described below.

Fig. 3. The distribution of sampled ratios in normal situation
Fig. 4. Ratio of actual response traffic amount over estimated traffic

According to [12], an important metric for measuring how well traffic is distributed in a network is the maximum utilisation. Larger maximum-utilisation values indicate that links are more sensitive to bursts. Large amounts of flash crowd traffic would cause a heavily skewed distribution in the network, which could reduce the quality of service for cross traffic. To achieve a better distribution we need
92
L. Xie et al.
to reduce the maximum utilisation. These strategies are the subject of future work, as discussed in Section 4.
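The first strategy, the pushback variant described above, can be sketched as follows. The topology representation, volume figures, and threshold are illustrative assumptions of ours, not details taken from [7,8].

```python
def pushback(router, threshold, upstream_volume, ingress_routers, actions=None):
    # Recursively identify upstream interfaces carrying high aggregate
    # request volumes and push rate-limiting towards the provider's
    # ingress routers, where excess requests are finally dropped.
    if actions is None:
        actions = []
    for upstream, volume in upstream_volume.get(router, {}).items():
        if volume > threshold:
            if upstream in ingress_routers:
                actions.append((upstream, "drop-excess-requests"))
            else:
                actions.append((upstream, "rate-limit"))
                pushback(upstream, threshold, upstream_volume,
                         ingress_routers, actions)
    return actions

# Hypothetical topology: request volume (requests/sec) per upstream neighbour.
volumes = {"server-edge": {"r1": 900, "r2": 50}, "r1": {"ingress-a": 800}}
print(pushback("server-edge", 100, volumes, {"ingress-a"}))
# [('r1', 'rate-limit'), ('ingress-a', 'drop-excess-requests')]
```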
3.4 Related Flash Crowd Detection and Mitigation Work
The implications of flash crowd events and DoS attacks for Web sites and content distribution networks (CDNs) are discussed in [3]. The authors propose enhancements to CDNs that make them more adaptive and subsequently better at mitigating the effects of flash crowd events. Collaborative caching mechanisms that can be used to redirect requests to appropriate caches in light of a flash crowd event are proposed in [4]. The challenge here is to make sure that the appropriate content is cached, which may be difficult to predict. The authors of [5] describe a mechanism that breaks content up into small pieces and returns a piece in response to each request; clients need to talk to each other to obtain the remaining pieces of the content. This mechanism requires servers to perform the content manipulation, and requires modifications to Web browser applications and the HTTP protocol. In [6], the problems associated with flash crowds are addressed by making changes to the architecture of Web servers; approaches that allow dynamic resource allocation across multiple servers are proposed.

We address the flash crowd problem from the point of view of a network service provider (and potentially also a third-party application service provider), and make no assumptions about the Web server architecture in use. An approach to dropping requests at the server's ingress point to a network is proposed in [9]. The rate at which requests are dropped is set dynamically. A major drawback of this approach is that it requires the inspection of the application-layer headers of each packet. We have shown here that detection can be performed at the network level while only occasionally sampling the application-layer headers.
4 Future Work
Because our flash crowd detection mechanism uses hints to determine the onset of the ill-effects of a flash crowd event – it guesses the expected volume of response traffic – there is a possibility that it could give false positives. Further investigation is necessary to determine under what conditions this could occur and what effects a false positive may have. In our simulations, we set the detection interval to 30 seconds; whether we could effectively reduce this interval to enable faster detection is something we plan to investigate through further simulation.

As part of future work into mitigating the effects of flash crowd events, we propose to improve the distribution of response traffic by instigating multi-path routing for traffic that is tolerant of packet mis-ordering. A way to approach this is for a server's edge router to build a multi-route database, in which all possible routes between the server's edge router and all the other edge routers, along with the available bandwidths, are held. The database is built by deploying active
code to collect routing information and available bandwidth information from programmable routers. When the server's edge router observes or is informed that the response traffic is consuming too much bandwidth on one of the links, it could distribute the traffic over a number of routes. An approach such as this removes the need to change existing routing protocols, as is required in [12], which manipulates the link weights in the OSPF routing database.

Investigating aspects of resilience in the context of computer networks is an emerging research topic. In the recently funded Autonomic Networking Architecture (ANA) EU research project [18], we will investigate the use of resilience techniques and mechanisms to support autonomic networks.
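The multi-path distribution of response traffic proposed above can be sketched as a proportional split over the routes held in the multi-route database; the route names and bandwidth figures below are illustrative. Splitting in proportion to available bandwidth equalises, and thereby lowers, the maximum link utilisation that [12] identifies as the key metric.

```python
def split_traffic(demand_mbps, routes):
    # routes: list of (route_id, available_bandwidth_mbps) pairs taken
    # from the multi-route database built by the deployed active code.
    total = sum(bw for _, bw in routes)
    return {rid: demand_mbps * bw / total for rid, bw in routes}

shares = split_traffic(20.0, [("via-r1", 30.0), ("via-r2", 10.0)])
print(shares)  # {'via-r1': 15.0, 'via-r2': 5.0}
```

With this split, each route's utilisation rises by the same fraction (here 50% of its available bandwidth), instead of one link absorbing the whole burst.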
5 Conclusions
In this paper, we have presented work in progress in the important area of the resilience of networked systems. In addition to presenting the basic argument that resilience is really needed in the modern networked world, we argue for programmable networking techniques as an appropriate way ahead to build resilience mechanisms. By means of a modest flash crowd example, we outline simulation results that aim to show the promise of programmable networking in this crucial area. Furthermore, the mechanism demonstrates that multi-layer cooperation is a useful tool to enable resilient networks. The simulation results indicate that our detection mechanism for flash crowd events has potential. Future work will focus on the mitigation of flash crowd events and also DDoS detection and repair. By focusing on a particular application scenario we aim to develop and prove a resilient network architecture that uses programmable networking technologies.
Acknowledgements

Linlin Xie and Paul Smith are supported by Telekom Austria. We are grateful to Steven Simpson for his help and contributions with the simulations. We also appreciate the comments from the anonymous reviewers.
References

1. Niven, L.: Flash Crowd. In: Flight of the Horse. Ballantine Books (September 1973)
2. LeFebvre, W.: CNN.com: Facing A World Crisis (2001), http://www.tcsa.org/lisa2001/cnn.txt
3. Jung, J., Krishnamurthy, B., Rabinovich, M.: Flash Crowds and Denial of Service Attacks: Characterization and Implications for CDNs and Web Sites. In: Proceedings of The Eleventh International ACM World Wide Web Conference (ACM WWW 2002), Hawaii, USA (May 2002)
4. Stading, T., Maniatis, P., Baker, M.: Peer-to-peer caching schemes to address flash crowds. In: Druschel, P., Kaashoek, M.F., Rowstron, A. (eds.) IPTPS 2002. LNCS, vol. 2429, p. 203. Springer, Heidelberg (2002)
5. Patel, J.A., Gupta, I.: Overhaul: Extending HTTP to Combat Flash Crowds. In: Proceedings of the 9th International Workshop on Web Caching and Content Distribution (WCW 2004), Beijing, China (October 2004)
6. Chandra, A., Shenoy, P.: Effectiveness of Dynamic Resource Allocation for Handling Internet Flash Crowds. University of Massachusetts Technical Report, TR0337 (2003)
7. Mahajan, R., Bellovin, S.M., Floyd, S., Ioannidis, J., Paxson, V., Shenker, S.: Controlling High Bandwidth Aggregates in the Network. ACM SIGCOMM Computer Communication Review 32(3), 62–73 (July 2002)
8. Ioannidis, J., Bellovin, S.M.: Implementing Pushback: Router-Based Defense Against DDoS Attacks. AT&T Technical Report (December 2001)
9. Chen, X., Heidemann, J.: Flash Crowd Mitigation via Adaptive Admission Control Based on Application-Level Observation. USC/ISI Technical Report, ISI-TR2002-557 (revised version) (March 2003)
10. Mirkovic, J., Reiher, P.: A Taxonomy of DDoS Attack and DDoS Defense Mechanisms. ACM SIGCOMM Computer Communications Review 34(2), 39–53 (April 2004)
11. Crovella, M.E., Bestavros, A.: Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes. IEEE/ACM Transactions on Networking 5(6), 835–846 (1997)
12. Fortz, B., Thorup, M.: Internet Traffic Engineering by Optimizing OSPF Weights. In: Proceedings of the 19th Conference on Computer Communications (INFOCOM 2000), Tel-Aviv, Israel (March 2000)
13. Bivio Networks, http://www.bivio.net/
14. IBM BladeCenter, http://www-03.ibm.com/servers/eserver/bladecenter/
15. IBM Autonomic Computing, White Paper: An architectural blueprint for autonomic computing, 3rd edn. (June 2005), http://www-03.ibm.com/autonomic/pdfs/AC%20Blueprint%20White%20Paper%20V7.pdf
16. Jackson, A.W., Sterbenz, J.P.G., Condell, M.N., Hain, R.R.: Active Network Monitoring and Control: The SENCOMM Architecture and Implementation. In: 2002 DARPA Active Networks Conference and Exposition (DANCE 2002), p. 379 (2002)
17. The Autonomic Communications Forum, http://www.autonomic-communication-forum.org/
18. The Autonomic Networking Architecture (ANA) research consortium, http://www.ana-project.org/
19. Clark, D., Partridge, C., Ramming, J., Wroclawski, J.: A Knowledge Plane for the Internet. In: Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication (SIGCOMM 2003), Karlsruhe, Germany (August 2003)
20. Katz, R., Porter, G., Shenker, S., Stoica, I., Tsai, M.: COPS: Quality of service vs. Any service at all. In: de Meer, H., Bhatti, N. (eds.) IWQoS 2005. LNCS, vol. 3552, pp. 3–15. Springer, Heidelberg (2005)
21. Lakhina, A., Crovella, M., Diot, C.: Diagnosing Network-wide Traffic Anomalies. In: Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication (SIGCOMM 2004), Portland, Oregon, USA (August 2004)
22. Sterbenz, J.P.G., Krishnan, R., Hain, R.R., Jackson, A.W., Levin, D., Ramanathan, R., Zao, J.: Survivable Mobile Wireless Networks: Issues, Challenges, and Research Directions. In: Proceedings of the ACM Wireless Security Workshop (WiSE) 2002 at MobiCom, Atlanta, GA, September 2002, pp. 31–40 (2002)
23. Sterbenz, J.P.G., Hutchison, D.: Towards a Framework for Cross-Layer Optimisation in Support of Survivable and Resilient Autonomic Networking. Dagstuhl Seminar 06011 (January 2006)
24. Calvert, K., Bhattacharjee, S., Zegura, E., Sterbenz, J.P.G.: Directions in Active Networks. IEEE Communications 36(10), 72–78 (1998)
25. Tennenhouse, D.L., Wetherall, D.J.: Towards an Active Network Architecture. ACM Computer Communication Review 26(2), 5–17 (1996)
26. Tennenhouse, D.L., Smith, J.M., Sincoskie, W.D., Wetherall, D.J., Minden, G.J.: A Survey of Active Network Research. IEEE Communications Magazine 35(1), 80–86 (1997)
27. Schmid, S.: A Component-based Active Router Architecture. PhD Thesis, Lancaster University (November 2002)
28. Smith, P.: Programmable Service Deployment with Peer-to-Peer Networks. PhD Thesis, Lancaster University (September 2003)
29. Spence, D., Crowcroft, J., Hand, S., Harris, T.: Location Based Placement of Whole Distributed Systems. In: Proceedings of the ACM Conference on Emerging Network Experiment and Technology (CoNEXT 2005), Toulouse, France, pp. 124–134 (October 2005)
Towards the Design of an Industrial Autonomic Network Node

Martine Chaudier, Jean-Patrick Gelas, and Laurent Lefèvre

INRIA/LIP (UMR CNRS, INRIA, ENS, UCB 5668), École Normale Supérieure de Lyon, 46 allée d'Italie, 69364 Lyon Cedex 07, France
[email protected],
[email protected],
[email protected]
Abstract. Programmable and active networks allow specified classes of users to deploy dynamic network services adapted to the requirements of data streams. Currently, most research on active networks is conducted in research laboratories. In this paper, we explore the design of IAN 2, an Industrial Autonomic Network Node that can be deployed in an industrial context. Performance, dynamic programmability, and fault-tolerance issues of software and hardware components have been explored. First experimental evaluations on local platforms are presented.1
1 Introduction
Research on active and programmable networks, and the evaluation of experimental prototypes, takes place mostly in academic research laboratories. Currently, no "plug and process" active equipment is available on the market. In the framework of a cooperative industrial maintenance and monitoring project (the TEMIC project [3]), in which we are currently involved with different academic and industrial partners, we design devices that can be easily and efficiently deployed in an industrial context. Once the hardware is deployed and used, it must also be easily removable at the end of the maintenance or monitoring contract.

In this project, we deploy our devices in secured industrial departments, restricted areas, or out-of-the-way locations. These devices must act as auto-configurable and re-programmable network nodes. Thus, the equipment must be autonomic and must not require direct human intervention. The design of an autonomic network equipment must take into account the specific requirements of active equipment in terms of dynamic service deployment, auto settings, self-configuration, and monitoring, but also in terms of hardware specification1
This project is supported by the French RNRT Temic [3] project with the SWI company, INRIA, GRTC, and LIFC.
D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 96–107, 2009. c IFIP International Federation for Information Processing 2009
(limited resources, constraints on mechanical parts, dimension constraints), reliability, and fault tolerance.

This paper presents our current work on the design and adaptation of an industrial autonomic network node. We propose an adaptation of a generic high-performance active network environment (Tamanoir [8]) in order to deploy it on network boxes with limited resources and to increase reliability and scalability. The implementation is based on a hardware solution provided by the Bearstech [2] company. Through this approach we propose the architecture of an Industrial Autonomic Network Node (called IAN 2) that can be deployed on industrial platforms. We evaluate the capabilities of IAN 2 in terms of computing and networking resources and dynamic re-programmability.

This paper is organized as follows. The hardware and software are described in Sections 2 and 3 respectively. Section 4 shows a first performance evaluation of IAN 2. Section 5 briefly covers other work on industrial active nodes, and finally the paper concludes in Section 6.
2 Hardware Platform
This section briefly describes the hardware used to implement the IAN 2 industrial autonomic network node. To provide a transportable solution, we use a small compact aluminium case which hosts a small motherboard (200x150 mm) featuring a 1 GHz VIA C3 CPU (supporting the x86 instruction set), 256 MB of DDR RAM, three Gigabit Ethernet LAN ports, two PCMCIA slots, four USB ports, and one serial port. To reduce the risk of failure, we chose a fanless hardware solution. Moreover, the box does not embed a mechanical hard disk drive; the operating system, file system, and execution environment are stored on a memory card (e.g., Compact Flash).

Figure 1 shows, on the left, an inside view of the case, where we can see the small motherboard and its large passive cooling system (the white part) hiding
Fig. 1. Internal and connections views of the industrial autonomic network node
chipset and CPU. A second picture shows a backside view of the case with all connectors.
3 Software Execution Environment

3.1 Operating System
The industrial autonomic network node environment runs over Btux, provided by the Bearstech [2] company. Btux is based on a GNU/Linux operating system running kernel version 2.6.12. However, the whole system has been rebuilt from scratch and designed for embedded systems (small memory footprint, selected command set available). The operating system respects standards and is remotely upgradeable, so patches and updates can easily be applied without human intervention. For the IAN 2 node, we worked in tight collaboration with the Bearstech engineers to add a secured wireless network connection and support for multimedia sensors (audio and video). We also use information returned by internal sensors (i.e., temperature) in order to take intelligent decisions and avoid exposing our hardware to critical situations (e.g., overheating).
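The sensor-driven decision making described above can be sketched as a simple policy function; the temperature thresholds and action labels here are illustrative assumptions of ours, not values from the IAN 2 implementation.

```python
def thermal_action(temp_c, critical_c=85.0):
    # Decide an autonomic action from an internal temperature sensor
    # reading, shedding load before the hardware reaches a critical state.
    if temp_c >= critical_c:
        return "suspend-services"
    if temp_c >= critical_c - 10.0:
        return "throttle-services"
    return "normal"

print(thermal_action(60.0))  # normal
```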
3.2 Programmable Dynamic Environment
This section describes the software used on top of the node operating system (described above). This software is called an Execution Environment (EE), and is used to dynamically plug in and run Active Applications (AA), also called active services. A service is deployed on demand and applied to one or several data streams. Services can run in parallel and are all executed in the EE.

IAN 2 software architecture. We propose the IAN 2 Industrial Autonomic Network Node architecture (Fig. 2). This node supports switching and routing
Fig. 2. IAN2: Industrial Autonomic Network Node (a sandbox for deploying network services, storage facilities, and switching and routing protocols)
protocols through wired and wireless connection hardware. The limited CPU facilities are available for dynamically deploying autonomic services. Some limited storage capabilities are available to support heterogeneous classes of services.

Our Execution Environment, called Tamanoir_embedded, is based on the Tamanoir [7,8] software suite written by L. Lefèvre and J.P. Gelas (from INRIA, France). The Tamanoir suite is a high-performance execution environment for active networks designed to be deployed in either local area networks or wide area networks. It is prototype software with features too complex for an industrial purpose (cluster-based approach, Linux modules, multi-level services [11]. . . ). Due to typical industrial constraints (e.g., code maintenance), we reduced the code complexity and removed all classes and methods that are unused or useless for the Temic [3] project. This allows us to reduce the overall size of the software suite and makes the maintenance and improvement of the code easier for service developers (Figure 3).
Fig. 3. From a generic Active Network environment (Tamanoir: cluster with distributed CPU and storage, Java Execution Environment, OS/kernel programmable modules, programmable NIC) to an Industrial Autonomic Environment (Tamanoir_embedded: Java Execution Environment, OS)
Tamanoir_embedded is a dedicated software platform fully written in Java and suitable for heterogeneous services. Tamanoir provides various methods for dynamic service deployment. The first method allows services to be downloaded from a service repository to a Tamanoir Active Node (TAN). The second method allows a TAN to request the service from the previous active node crossed by the active data stream (Figure 4).
Autonomic Service Deployment
Fig. 4. Autonomic Service Deployment on wire connections
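The two deployment methods can be sketched as a simple fallback chain. The sketch below is illustrative Python (the actual Tamanoir code is Java, and all class and method names here are hypothetical, not the Tamanoir API):

```python
class CodeSource:
    """Anything service code can be fetched from: a service repository
    (method 1) or the previous active node on the stream's path (method 2)."""
    def __init__(self, store):
        self.store = store
    def fetch(self, name):
        return self.store.get(name)  # None when the service is unknown here

def deploy_service(name, repository, previous_node, local_cache):
    """Fetch the code for `name`: try the repository first, then fall back
    to the previous active node crossed by the data stream."""
    if name in local_cache:                      # already deployed on this TAN
        return local_cache[name]
    for source in (repository, previous_node):   # method 1, then method 2
        code = source.fetch(name)
        if code is not None:
            local_cache[name] = code             # keep it for later streams
            return code
    raise LookupError(f"service {name!r} not available from any source")
```

Method 1 (repository download) is tried first; method 2 (fetching from the previous active node) only runs when the repository cannot serve the service.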
M. Chaudier, J.-P. Gelas, and L. Lefèvre
Tamanoir-embedded also supports autonomic deployment and service updates through mobile equipment (Figure 5). In automatic maintenance projects, we deploy wireless IAN² nodes in remote industrial environments where no wired connections are available. To download maintenance information, human agents can approach IAN² nodes and request information. During this step, mobile equipment (PDAs, tablets, cell phones) is also used as mobile repositories to push new services and software into the autonomic nodes (Figure 5).
Fig. 5. Autonomic Service Deployment through mobile nodes
4 Experimental Evaluation
In this section, we present first evaluations of the industrial autonomic network node. Experimental results are divided into three categories. First, we present results in terms of network performance (wired and wireless networks). Then, we explore preliminary results obtained with the software of the IAN² autonomic execution environment. Last, we present experimental results obtained in an industrial multimedia context.

4.1 Network Performance
We evaluate the performance of the IAN² wired and wireless network interfaces. We used the iperf [1] tool to measure TCP bandwidth; iperf can report bandwidth, delay jitter, and datagram loss. We measured network performance within two topologies (Figure 6).
Fig. 6. Back-to-back and gateway experimental local platforms (stream generation, forwarding IAN², stream exchange)
We call the topology back-to-back when one IAN² is connected directly to another IAN² through a short (50 cm) category-6 Ethernet cable (Figure 6). We call it a gateway topology when we connect two IAN² nodes through a third one; in this case we enable IP forwarding on the middle node. We set the TCP no-delay iperf option, which disables Nagle's algorithm, but did not notice any significant difference. Table 1 shows bandwidth results and the corresponding CPU usage for the two topologies. We observe that back-to-back IAN² nodes fail to reach full Gigabit bandwidth with TCP streams. When a third node is involved as a gateway, throughput drops further. These results stem mainly from the limited CPU embedded in the IAN², which limits its ability to send large streams of data.

Table 1. Raw performance reported by iperf with default values of buffer length and TCP window size

Configuration         Throughput   CPU send   CPU recv   CPU gateway
back-to-back          488 Mbps     90%        95%        N/A
gateway (1 stream)    195 Mbps     29%        28%        50%
gateway (8 streams)   278 Mbps     99%        65%        70%
For the next experiment, we use one industrial autonomic network node (IAN² 1) to transmit two streams to another industrial autonomic network node (IAN² 2) over a single Gigabit Ethernet link. We obtain 312 Mbps and 229 Mbps (541 Mbps total), and each CPU was used to its maximum (2x50% on transmitter and receiver). We also tried the full-duplex feature of our card by sending one stream from IAN² 1 to IAN² 2 and vice versa (bidirectional connection). We obtain 196 Mbps and 247 Mbps (443 Mbps total); about 2x50% of CPU was used on each side (transmitter and receiver). We notice that there are as many iperf processes running as data streams on the link, and the processes share the CPU load equally. Figure 7 shows how throughput decreases as the number of streams grows in the back-to-back topology.
Fig. 7. Throughput reduction as the number of streams increases between two IAN² nodes connected back-to-back
We also ran tests with a PCMCIA 802.11b wireless card (using the Orinoco Linux modules) plugged into the IAN². The best throughput obtained, with the IAN² 10 meters from the wireless access point and no external antenna, was only 4.45 Mbps. Figure 8 shows how throughput decreases as the number of streams grows in this wireless context. We also ran bidirectional tests and, surprisingly, obtained an average throughput equal to the maximum speed. Finally, we removed the TCP no-delay option (re-enabling Nagle's algorithm) and obtained slightly lower performance (3.92 Mbps). These experiments show that IAN² nodes must be dedicated to specific platforms (wireless environments, xDSL, Fast Ethernet), which is compatible with many current industrial deployments.
Fig. 8. Throughput reduction as the number of streams increases in a wireless context
4.2 Evaluating Autonomic Performance
We present results obtained with the Tamanoir-embedded Execution Environment. We ran two different active services: a lightweight service (in terms of CPU usage), called MarkS, used to count and mark the packets crossing the Tamanoir node, and a heavyweight service, called GzipS, which compresses packet payloads on the fly using Lempel-Ziv coding (LZ77). The Execution Environment and services run in a Sun JVM 1.4.2. Table 2 shows performance results for both services with different payload sizes.

Table 2. Throughput (Mbps) of Tamanoir applying a lightweight service (MarkS) and a CPU-consuming service (GzipS)

Service   4 kB   16 kB   32 kB   56 kB
MarkS     96     144     112     80
GzipS     9.8    14.5    15.9    16.6
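The two services can be approximated with a minimal sketch. Python's zlib stands in here for the Java implementation (both gzip and zlib use LZ77-based DEFLATE compression); the function names and the one-byte marking scheme are assumptions for illustration, not the actual Tamanoir service code:

```python
import zlib

def gzips(payload: bytes) -> bytes:
    """Heavyweight service: compress the packet payload on the fly
    (DEFLATE, the LZ77-based scheme behind gzip)."""
    return zlib.compress(payload, 6)

def marks(payload: bytes, counter: dict) -> bytes:
    """Lightweight service: count packets crossing the node and mark
    them by prepending a one-byte tag; the payload is left untouched."""
    counter["packets"] = counter.get("packets", 0) + 1
    return b"\x01" + payload
```

This also suggests why GzipS throughput in Table 2 grows with payload size: per-packet processing overhead is amortized over larger payloads, while the compression cost itself dominates the CPU budget.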
We compare these results with a high-performance active network node platform (embedded in a Compaq ProLiant DL360 G2, dual PIII 1.4 GHz, 66 MHz PCI bus). We used different network interfaces (Fast and Gigabit Ethernet) and protocols (UDP/TCP).
Fig. 9. Throughput comparison for a lightweight service (1 stream, Fast/Gigabit Ethernet P3 vs. IAN², payload sizes 128 B to 32 kB)

Fig. 10. Throughput comparison for a heavy service (1 stream, TCP, Fast/Gigabit Ethernet P3 vs. IAN², payload sizes 1 kB to 32 kB)
The IAN², based on a reliable fan-less, diskless node with a lightweight Execution Environment, shows results comparable to those of a slow desktop machine with a slow hard disk drive. We can see the limit of the IAN² CPU with the GzipS service (the CPU is 100% used at a bandwidth of 16 Mbps). For a lightweight service like MarkS, we observe a combined limitation of the CPU and the IAN² network interface cards. The main disappointment concerns the network interfaces, which are announced to support 1 Gbps but sustain only half of that bandwidth with difficulty. Moreover, because of the simplification of the Tamanoir high-performance Execution Environment, the autonomic node does not benefit from some of its improvements (lightweight Linux modules, efficient JVM, etc.), so all data packets are processed inside the Java Virtual Machine. Still, an industrial deployment of the autonomic network node can benefit from this trade-off between performance and reliability.

4.3 Performance of the IAN² Node in a Multimedia Application Context
In our architecture, the industrial autonomic node is the point where all active services are performed, so it is a critical point. To evaluate its performance, we measured the processor load while adapting and transmitting a video file (Table 3). Results show that the CPU of the IAN² (VIA C3, 1 GHz) is used intensively during video adaptation. There is no improvement even when the video size decreases. In Table 3, the transmission in MJPEG format is performed by the same active service, but with no adaptation step. In this case, we observe
Table 3. CPU load on the IAN² when adapting and transmitting a video file

Format / Size     Usr CPU load
MJPEG / 720x480   < 1%
H263 / 352x288    98.7%
H263 / 176x144    99.3%
H263 / 128x96     99%
Table 4. Output data rate when adapting and transmitting a video file on the IAN²

Output Format / Resolution   Entry File / Output File   Transmitting time   PDA loading time
MJPEG / 720x480              14794 KB / 14794 KB        4 min 50 sec        5 min 10 sec
H263 / 352x288               14794 KB / 1448 KB         22 sec              2 min 55 sec
H263 / 176x144               14794 KB / 365 KB          8.5 sec             1 min 30 sec
H263 / 128x96                14794 KB / 179 KB          3.8 sec             1 min 18 sec
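The compression ratios and effective output rates implied by Table 4 can be checked with a few lines of arithmetic (all sizes and times are taken directly from the table; no new measurements are involved):

```python
# Sizes (KB) and transmitting times (s) copied from Table 4.
KB = 1024
ENTRY_KB = 14794  # input MJPEG file, identical for every adaptation

table4 = {  # output format: (output file size in KB, transmitting time in s)
    "MJPEG/720x480": (14794, 4 * 60 + 50),
    "H263/352x288":  (1448, 22.0),
    "H263/176x144":  (365, 8.5),
    "H263/128x96":   (179, 3.8),
}

results = {}
for fmt, (out_kb, t) in table4.items():
    results[fmt] = {
        "ratio": ENTRY_KB / out_kb,            # size-reduction factor
        "mbps": out_kb * KB * 8 / t / 1e6,     # effective output data rate
    }

for fmt, r in results.items():
    print(f"{fmt}: {r['ratio']:.1f}x smaller, {r['mbps']:.2f} Mbps")
```

The strongest adaptation (SQCIF 128x96) shrinks the file by roughly two orders of magnitude, which is what makes the short transmitting times in Table 4 possible despite the limited CPU and wireless link.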
Fig. 11. Output data rate (kbit/s over time) when adapting and transmitting a video stream over RTP (H263 128x96, H263 352x288, MJPEG 352x288)
that the CPU load is negligible. This proves that the load is entirely due to the data adaptation and resizing performed by the autonomic service: adaptation is computation-intensive for the IAN² equipment. To evaluate the impact of data adaptation on the network, we measured the output data rate of the active node (over the wireless network) while transmitting an adapted video file to a PDA (Table 4) and while transmitting an adapted video stream over RTP to a laptop (Figure 11). Table 4 shows the link between video file adaptation (requiring intensive CPU usage) and the transmission performance of the adapted file. The IAN² node adapts an MJPEG file and sends the result to a remote PDA. Even with a limited CPU, the industrial autonomic network node provides
efficient adaptation, which reduces the amount of transported data and globally improves the application's performance. Figure 11 shows the same effect: the adapted stream consumes less bandwidth when the video format changes (from MJPEG to H263) and when the video is resized (from CIF 352x288 to SQCIF 128x96). Resizing the video considerably decreases the amount of data transmitted on the network. When transmitting a multimedia file, the transmission time is thus shortened, and so is the occupation time on the network. These results show that adaptation on the autonomic network node is beneficial for saving network bandwidth.
5 Related Work
Various research projects linking academic and industrial partners have explored the design of industry-targeted active networks and environments. GCAP [15] describes a prototype of a commercial active router, while ANDROID [4] and FAIN [5,6] propose models and prototypes of generic active network architectures. Some companies have also explored active infrastructures to address their own needs: NTT [13] proposes the A-BOX, an active network node mixing hardware and software to support Gigabit networks, and Hitachi [6,14] presents gateways mixing active networks and Web services. A few network operators have experimented with deploying active environments on their network equipment: Nortel proposed a software execution environment running on an Accelar Gigabit switch [9,10], and Alcatel [12] proposed a prototype active node based on an Alcatel OmniSwitch router.
6 Conclusion and Future Work
In this paper, we described the design of IAN², a prototype industrial autonomic network node. We discussed the hardware choices made to fit typical industrial requirements in terms of space usage, limited accessibility, and hence remote maintenance. We then discussed the software solution used to provide a reliable network device based on a minimal open-source operating system that is easily upgradeable remotely. We proposed the Tamanoir-embedded suite, a simplified execution environment able to dynamically deploy active services and apply them to data streams in order to adapt them to terminal clients (e.g., tablet PC, cell phone, or PDA). In this way, we propose a reliable "plug and route" autonomic node. This paper also provides a performance evaluation of IAN² in terms of processing power, networking, and Execution Environment performance. Results show that performance is far from that of a current desktop machine, so IAN² cannot be deployed on high-performance (Gigabit) networking platforms. But for lower-bandwidth architectures (Fast Ethernet, xDSL, or wireless networks), IAN² nodes can perfectly support a large class of reliable autonomic services.
Switching from an experimental, academic project to an industrial project providing equipment that runs in a production context is a real challenge. However, it is a mandatory step if we want to one day see a large number of active and programmable devices deployed. The next step concerns the development of a set of autonomic services to be deployed on IAN² nodes.
Acknowledgments. The authors would like to thank the members of the RNRT TEMIC project: the SWI company, the Université de Franche-Comté (LIFC), and the Université de Haute-Alsace (GRTC). The authors would also like to thank L. Haond and L. Montagne from the Bearstech [2] company, who provided valuable help with Btux.
References

1. Iperf, http://dast.nlanr.net/Projects/Iperf/
2. Bearstech company (2005), http://bearstech.com
3. Tobiet, et al.: Panorama des réseaux utilisés et services à valeur ajoutée Temic. Deliverable D2.1 (March 2005), http://temic.free.fr
4. Fisher, M.: ANDROID: Active Network Distributed Open Infrastructure Development. Technical report, University College London (2001)
5. Galis, A., Denazis, S., Brou, C., Klein, C.: Programmable Networks for IP Service Deployment. Artech House (May 2004)
6. Galis, A., Plattner, B., Smith, J.M., Denazis, S.G., Moeller, E., Guo, H., Klein, C., Serrat, J., Laarhuis, J., Karetsos, G.T., Todd, C.: A flexible IP active networks architecture. In: Yasuda, H. (ed.) IWAN 2000. LNCS, vol. 1942, pp. 1–15. Springer, Heidelberg (2000)
7. Gelas, J.-P., Hadri, S.E., Lefèvre, L.: Tamanoir: a software active node supporting gigabit networks. In: ANTA 2003: The Second International Workshop on Active Network Technologies and Applications, Osaka, Japan, pp. 159–168 (May 2003)
8. Gelas, J.-P., Hadri, S.E., Lefèvre, L.: Towards the design of a high-performance active node. Parallel Processing Letters 13(2) (June 2003)
9. Jaeger, R., Bhattacharjee, S., Hollingsworth, J., Duncan, R., Lavian, T., Travostino, F.: Integrated active networking and commercial-grade routing platforms. In: USENIX: Intelligence at the Network Edge, San Francisco (March 2000)
10. Lavian, T., Wang, P.: Active networking on a programmable networking platform. In: IEEE OpenArch 2001, Anchorage, Alaska (April 2001)
11. Lefèvre, L.: Heavy and lightweight dynamic network services: challenges and experiments for designing intelligent solutions in evolvable next generation networks. In: Workshop on Autonomic Communication for Evolvable Next Generation Networks, The 7th International Symposium on Autonomous Decentralized Systems, Chengdu, Jiuzhaigou, China, pp. 738–743 (April 2005)
12. Marcé, O., Clevy, L., Drago, C., Moigne, O.L.: Toward an industrial active IP network. In: Third International Working Conference on Active Networks (IWAN), Philadelphia, USA (September 2001)
13. Murooka, T., Hashimoto, M., Takahashi, N., Miyazaki, T.: High-speed active network node: its concept and implementation for gigabit-network applications. In: The 47th IEEE International Midwest Symposium on Circuits and Systems (2004)
14. Nishikado, T., Koizumi, M., Oochi, H.: Large-scale high-quality communication service solution using active network technology. In: Hitachi Review, vol. 49 (December 2000)
15. Urueña, M., Larrabeiti, D., Calderón, M., Azcorra, A., Kristensen, J.E., Kristensen, L.K., Exposito, E., Garduno, D., Diaz, M.: An active network approach to support multimedia relays. In: Joint International Workshop on Interactive Distributed Multimedia Systems / Protocols for Multimedia Systems (IDMS-PROMS), Coimbra, Portugal (November 2002)
A Web Service- and ForCES-Based Programmable Router Architecture

Evangelos Haleplidis1, Robert Haas2, Spyros Denazis1,3, and Odysseas Koufopavlou1

1 University of Patras, ECE Department, Patras, Greece
{ehalep,sdena,odysseas}@ee.upatras.gr
2 IBM Research, Zurich Research Laboratory, Rüschlikon, Switzerland
[email protected]
3 Hitachi Sophia Antipolis Lab, France
[email protected]
Abstract. Programmable networks have accentuated the need for a clear separation of the control and forwarding planes. The IETF ForCES protocol allows control elements to be connected to logically separated forwarding elements. The FlexiNET IST project relies on dynamic service deployment, which requires router programmability in the control and/or forwarding planes. Moreover, to shorten the implementation and deployment time of control elements, there is a need for simple, higher-level APIs that shield such elements from the ForCES protocol and model details. This paper proposes a ForCES CE Gateway (ForCEG) architecture that fulfills these requirements and maps Web Service interfaces to ForCES messages while checking the validity of commands to ensure the consistency of the router state.
1 Introduction

The need for programmable networks stems from the demand to rapidly create, deploy, and manage new services in a dynamic way. As stated in [1], "a programmable network is distinguished from any other networking environment by the fact that it can be programmed from a minimal set of APIs from which one can ideally compose an infinite spectrum of higher level services".

A network can be divided into three distinct planes: forwarding, control, and management. The forwarding plane is the time-critical processing path. It performs operations directly on data packets, such as forwarding, header modification, content-based filtering, classification, encryption, etc. Forwarding-plane components are usually realized in specialized hardware such as ASICs or network processors. The scope of functions handled by such hardware is constantly expanding, e.g., TCP stack termination, RDMA support, etc. [2].

The control-plane processing path is less time-critical and handles redirections from the data path (such as IP-options handling) as well as packets destined to the router itself, such as routing protocol updates. The control plane also includes tasks such as routing-table maintenance. Control-plane components are typically based on general-purpose processors.

D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 108–120, 2009. © IFIP International Federation for Information Processing 2009
The management plane, which provides an administrative interface to the overall system, consists of both software executing on a general-purpose processor and probes and counters in the hardware [3].

ForCES (FORwarding and Control Element Separation) is a new IETF protocol currently being standardized. The ForCES working group aims at defining a model and a protocol to standardize the information exchange between the control and forwarding planes [4], [5], [6]. Simultaneously, the Network Processing Forum (NPF) is developing APIs for services that run on the router control point, which may be based on the ForCES protocol [7]. However, the increasing number of APIs increases the complexity of the control software, because selecting and using the appropriate API(s) becomes far from trivial. Similarly, the diversity of representations of forwarding-plane capabilities offered by the flexible ForCES modeling introduces additional complexity if control-plane services are directly exposed to it. For these reasons, we introduce an intermediate level that provides simple, specific, and dynamically loadable Web Service interfaces through which control-plane services configure the forwarding plane.

Another issue relates to maintaining the consistency of the router state while multiple control-plane services are executing and changing the state of the forwarding-plane components using ForCES, possibly originating as NPF, Web Services, or other APIs. The ForCES gateway (ForCEG) introduced here is responsible for detecting and resolving such conflicts, and is logically placed between the control software components and the forwarding hardware. The ForCEG processes configuration commands expressed in XML originated by control-plane components, determines inconsistencies with commands from other control-plane components, and translates them into ForCES configuration commands to be sent to the forwarding-plane hardware, while providing an open interface based on Web Services. Fig. 1 depicts the ForCEG.
Fig. 1. ForCEG Overview
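The consistency check performed by the ForCEG can be sketched as follows. This is a minimal illustration in Python; the command shape and the bookkeeping are assumptions, since the paper does not specify the actual conflict-resolution logic at this level of detail:

```python
def detect_conflict(pending, new_cmd):
    """Flag a configuration command that would overwrite, with a different
    value, an (FE, LFB, attribute) target already set by another
    control-plane component; otherwise record it as pending."""
    key = (new_cmd["fe"], new_cmd["lfb"], new_cmd["attribute"])
    if key in pending and pending[key] != new_cmd["value"]:
        return True   # conflicting write: must be resolved, not applied
    pending[key] = new_cmd["value"]
    return False      # consistent: safe to translate into ForCES
```

A command that sets the same target to the same value is harmless; only diverging writes to one target from different components need arbitration.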
The ForCEG architecture is developed as part of the FlexiNET IST research project. An objective of this project is the ability to dynamically deploy new services in a distributed router environment, taking advantage of the router's programmability in the control plane and, whenever possible, also in the forwarding plane. In the FlexiNET project, the router is based on a distributed architecture consisting of a collection of off-the-shelf boxes interconnected in a LAN, which appears as a single router to the outside.
New services will be installed on various physical boxes, and these services should be able to configure the treatment of packets in the router [8]. This paper is organized as follows. Section 2 reviews related work. Section 3 describes the ForCES protocol and model. Section 4 provides an overview of the FlexiNET architecture and the rationale behind the development of the ForCEG. Section 5 focuses on the proposed concept. Section 6 outlines the proposed architecture. Section 7 gives an outlook on future work and summarizes the contributions of this paper.
2 Related Work

The IEEE P1520 standardization effort [9], [10] addressed the need for a set of standard software interfaces for programming networks, in terms of rapid service creation and open signalling. The open interfaces were structured in a layered fashion, each layer offering its services to the layers above. Each layer comprised a number of entities, in the form of algorithms or objects, representing logical or physical resources depending on the layer's scope and functionality. Four layers were distinguished: the Physical Element layer, consisting of entities such as hardware; the Virtual Network Device (VND) layer, which logically represents resources in the form of objects; the Network Generic Services layer, which logically binds entities from the VND layer into specific network functionalities; and the Value-added Services layer, which includes entities in the form of end-to-end algorithms.

The NPF (Network Processing Forum) has developed APIs that can make use of ForCES to deliver messages to the network processing elements. Examples of available APIs are the IPv4 API, the MPLS API, and the Classification API [11]. The NPF distinguishes two layers: the System Abstraction Layer and the Element Abstraction Layer. The first exposes vendor-independent system-level functionality. APIs at this layer are unaware of the existence of multiple forwarding planes and provide an abstraction of the underlying system that presents the functionality being manipulated by the control plane without regard to the physical topology of the system. These APIs are used by protocol-stack and other Internet software vendors (ISVs) that do not require access to implementation details and prefer a "black-box" view of the packet-processing elements of a system. The second layer exposes vendor-independent, forwarding-element-aware functionality. APIs at this layer expose which forwarding element they are addressing.
These APIs hide fewer of the system details, thus exposing the presence of multiple forwarding devices and their individual capabilities. They hide the vendor-specific details of the network-processing devices and are vendor-neutral [11]. The NPF APIs provide the necessary abstraction to developers of control-plane software components but require knowledge of the various APIs and their architecture.

The Netconf IETF working group is currently standardizing a new protocol for router configuration. The Netconf protocol defines a simple mechanism by which a network device can be managed, configuration data can be retrieved, and new configuration data can be uploaded and manipulated. It uses an Extensible Markup Language (XML) based data encoding for both the configuration data and the protocol messages. The Netconf protocol operations are realized on top of a simple Remote Procedure Call (RPC) layer. The protocol allows the device to expose a full, formal application programming interface (API); applications can use this straightforward API to send and receive full and partial configuration datasets [12].
3 ForCES Description

Netconf and ForCES have similar aims but use different models. As stated in [4], the reasons for separating the control plane from the forwarding plane are to provide increased scalability and to allow the two planes to evolve independently, promoting innovation and extending the scope of applications compared with GSMP [13] or COPS [14].

Forwarding Elements (FEs) are logical entities in the forwarding plane that implement the ForCES protocol. FEs expose their capabilities and state to their assigned Control Element (CE) using a standardized model. A key design choice of the modelling is to avoid an extensive, and hence complex, modelling of the FE and instead use a coarse model combined with runtime error reporting. CEs are logical entities that implement the ForCES protocol in the control plane and use it to instruct one or more FEs how to process packets. Having standard mechanisms allows CEs and FEs to become logically, and potentially physically, separated standard components. ForCES focuses on the communication and model necessary to separate control-plane functionality, such as routing protocols, signalling protocols, and admission control, from forwarding-plane per-packet activities, such as packet forwarding, queuing, and header editing [4], [5], [6].

The modelling is based on Logical Function Blocks (LFBs), which are blocks encapsulating fine-grained operations of the forwarding plane. Each LFB is responsible for completing only a single, specific task, such as decrementing the TTL field of an IPv4 packet [5]. FEs may have a static LFB topology, in which only the state of some LFBs can be changed using the ForCES protocol, or a dynamic topology that allows the linkage between LFBs to be logically reorganized at runtime, hence allowing new functions to be created.
Note that the LFB-based model is independent of the actual implementation of the FE; it only provides a view of its capabilities and state that can be acted upon using the ForCES protocol. The model can be described in XML, but the protocol uses a more compact representation on the wire between the CE and FE. Only the modelling of the forwarding plane is within the scope of the ForCES working group. Also, an FE can be controlled by only a single active CE, hence different services that require state changes in LFBs belonging to the same FE have to go through the same CE. In this paper, we further separate the functionalities of the control point into the Main Control Programs (MCPs), which execute control-plane services (corresponding to the Software Modules in Fig. 1), and the ForCES CE Gateway (ForCEG), which implements, among other things, the CE side of the ForCES protocol.
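The LFB abstraction described above can be illustrated with a sketch: an FE viewed as a chain of single-task blocks, here a TTL decrementer. The class names, the dict-based packet, and the drop-on-expiry behavior are simplifications for illustration, not the ForCES model's actual schema:

```python
class DecTTL:
    """LFB performing exactly one task: decrement the IPv4 TTL,
    dropping the packet when the TTL has expired."""
    def process(self, packet):
        if packet["ttl"] <= 1:
            return None          # expired: drop (a real node would signal ICMP)
        packet["ttl"] -= 1
        return packet

class ForwardingElement:
    """An FE as a (here static) chain of LFBs. A CE changes LFB state
    via ForCES without seeing the implementation behind this view."""
    def __init__(self, lfbs):
        self.lfbs = lfbs
    def process(self, packet):
        for lfb in self.lfbs:
            packet = lfb.process(packet)
            if packet is None:   # some LFB dropped the packet
                return None
        return packet
```

A dynamic-topology FE would additionally let the CE rewire the `lfbs` chain at runtime, which is how new functions can be composed from existing blocks.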
4 FlexiNET Architecture

As stated in Section 1, the ForCEG is developed in the context of the FlexiNET project. The main objective of the FlexiNET project is to define and implement a scalable, modular network architecture incorporating adequate network elements (FlexiNET Node Instances) offering cross-connect control, switching/routing control, and
advanced services management/access functions at the network access points, which currently only support connectivity between user terminals and network core infrastructures [15], [16], [17]. The FlexiNET network architecture consists of the node instances, communication buses, and data repositories shown in Fig. 2.
Fig. 2. FlexiNET Architecture
The FlexiNET UMTS Access Node (FUAN) provides functions to the FlexiNET interfaces such as switching/routing control, access to application data and service logic, etc. The FUAN complements the existing access nodes (RNC, BSC) of UMTS networks.

The FlexiNET WLAN Access Node (FWAN) acts both as a services access gateway (user authentication, service authorization, service discovery, etc.) and as a connection gateway between WLAN infrastructures and the FlexiNET WAN. The FWAN provides user and service roaming capabilities between different providers and service programmability through dynamic service deployment. Service programmability functions are provided over the Hitachi distributed router architecture [18], which performs dynamic service deployment and, consequently, configuration of the forwarding plane using the ForCES protocol [16].

The FlexiNET Data Gateway Node (DGWN) acts as the gateway between a generic Storage Area Network (SAN) infrastructure and the FlexiNET network. It is used by other FlexiNET instances to access subscriber data (profile, location, etc.) and application data required for service execution. It provides a Generic Data Interface that other FlexiNET elements may use to access data stored in the SAN.
The Generic Applications Interface Bus is used for the implementation of application-related functions and the communication of information flows pertaining to the execution of application and service logic, including a framework allowing service registration, discovery, and binding across the FlexiNET network. The FlexiNET Applications Server (FLAS) is a physical entity that hosts service application logic. These services are called remotely from other entities and executed locally. Using the DGWN services, the FLAS can retrieve specific information needed for service execution.

The ForCEG is developed as part of the FWAN. The FWAN architecture is shown in more detail in Fig. 3: it is a distributed router consisting of two functional blocks, the basic and the extensible function blocks. The basic function block executes general packet-forwarding functions that are common to all services; e.g., it classifies received packets and handles routing-table retrieval, packet switching, and node management. The extensible function block may comprise modules that lie in the control plane, hosting control functionality such as OSPF routing protocols, or modules that lie in the forwarding plane, hosting packet-forwarding functions for specific protocols and services, such as packet filtering for firewalls, layer-7 processing for content switching, address translation, encryption, etc. The control-plane modules are in fact the hosts of the MCPs, provided that enough resources are available.

The FWAN prototype is composed of a network processor as the basic function block and two PCs as extensible function blocks. A user accesses the FWAN through an access point using either a laptop or a mobile phone. The FWAN is responsible for authenticating native and roaming users through the FLAS using an AAA proxy.
Fig. 3. FWAN Architecture
By default, the FWAN contains at least two software modules: a Dynamic Service Deployment (DSD) module, responsible for the dynamic deployment of new services, and at least one ForCEG module. At boot-up, the DSD module is responsible for dynamically deploying the AAA-Proxy module: it retrieves the AAA-Proxy service code through the DGWN and deploys it on one of the two PCs based on dynamic allocation algorithms. In addition, based on user profiles, the DSD module deploys a Quality of Service (QoS) module, which is responsible for providing preferential traffic treatment to specific users. These two modules act as MCPs and issue commands to the ForCEG in order to configure the network processor to provide the functionality offered to users.
114
E. Haleplidis et al.
5 ForCEG Concept
While the current scope of ForCES is the communication between one CE and one FE in a local area network, and the configuration of the FE by the CE, as stated in Section 3, the ForCEG extends the availability of ForCES: it provides a Web Service interface, facilitates the coexistence of multiple MCPs using a central Control Logic, and manages the state of the forwarding plane elements by staying aware of the current state of all the FEs. As stated in Section 4, any MCP may potentially be deployed on any hardware module in the FWAN, so the API should be based on a technology that provides accessibility through a network (local or the Internet). Another requirement for the ForCEG is a versatile interface, i.e., an easy-to-use and easy-to-update interface. These requirements point toward Web Service technology. Web Services allow applications to communicate in a platform- and programming-language-independent manner. A Web Service is a software interface that describes a collection of operations that can be accessed over the network through standardized XML messaging. It uses protocols based on XML to describe an operation or data exchanged with another Web Service. Because XML can describe any data in a truly platform-independent way for exchange across systems, Web Services move applications toward loose coupling [19], [20].
The protocol stack that the ForCEG is based on is shown in Fig. 4. ForCES protocol messages are exchanged over TCP. Web Services use XML, which is encapsulated in SOAP messages. Any MCP must first discover the ForCEG using WSDL files, which define the endpoint of the Web Service and describe the Web Service operations, and which are located in a registry such as a UDDI registry [21]. The MCP sends a Web Service request to the ForCEG, which attempts to translate the request into a ForCES message.
When an FE wants to send a message to an MCP, the ForCEG attempts to translate the ForCES message into a Web Service response that the MCP will recognize.
Fig. 4. ForCEG Protocol Stack
In addition to translating XML messages to ForCES messages and back, the ForCEG provides consistency checking: since it is placed between the MCPs and the hardware, it is the checkpoint for all exchanged messages. Messages from one MCP may disrupt the normal functionality of other MCPs. The ForCEG monitors such messages and either informs the affected MCP which functionality is about to be changed, or disallows the message.
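The consistency check just described could be sketched as follows. In this sketch, the gateway records which MCP last configured each (LFB, attribute) pair and rejects a SET from a different MCP that would overwrite it; the class and field names (Command, ConsistencyChecker) are illustrative assumptions, not the paper's interfaces.

```python
from dataclasses import dataclass

@dataclass
class Command:
    mcp: str        # issuing Module Control Process
    lfb: str        # target Logical Function Block
    attribute: str  # LFB attribute to modify
    operation: str  # "SET" or "GET"

class ConsistencyChecker:
    def __init__(self):
        self.owners = {}  # (lfb, attribute) -> MCP that configured it

    def check(self, cmd: Command) -> bool:
        """Return True if the command may pass, False if it would
        disrupt configuration installed by another MCP."""
        if cmd.operation == "GET":
            return True  # reads never conflict
        key = (cmd.lfb, cmd.attribute)
        owner = self.owners.get(key)
        if owner is not None and owner != cmd.mcp:
            return False  # would overwrite another MCP's configuration
        self.owners[key] = cmd.mcp
        return True

checker = ConsistencyChecker()
assert checker.check(Command("QoS", "Classifier", "rules", "SET"))
assert not checker.check(Command("Firewall", "Classifier", "rules", "SET"))
assert checker.check(Command("Firewall", "Classifier", "rules", "GET"))
```

A real ForCEG would presumably apply finer-grained rules than strict ownership; the point here is only the checkpoint placement between MCPs and the hardware.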
A Web Service- and ForCES-Based Programmable Router Architecture
115
Fig. 5. “Target” Concept
To conceal the details of the ForCES model, higher-layer functions such as Firewall, Routing, QoS-related functions, etc., can be introduced and associations made between these functions and LFBs. The ForCEG can advertise the higher-layer functions in a WSDL [22] file for all MCPs to discover; the ForCEG can then recognize which LFBs the MCP needs to modify. An MCP "targets" a higher-layer function, provides the necessary attributes according to the "target" field, and then inserts an operation (such as SET or GET). The ForCEG is then able to construct the necessary ForCES messages to be sent to the appropriate LFBs. Fig. 5 depicts the "Target" concept. The ForCES FE Start/Termination Point is a necessary component for the ForCES protocol, as it is the source and the destination of ForCES messages. Since the QoS software module needs to set up classifier rules, it "targets" a QoS-related function, provides the necessary classification rules, and issues a SET command. A translator module inside the ForCEG translates the attributes of the message, and a ForCES CE Start/Termination Point transmits the ForCES message to the FE as described in Section 6. From the WSDL file, an MCP can acquire the URL for calling the API, the available higher-layer functions, and an XML schema that defines which data types each of these higher-layer functions requires. Making the ForCEG Web Service-enabled allows the advertisement and discovery of all supported operations through one URL. In addition, the ForCEG is able to dynamically download further mappings between higher-layer functions and LFBs from an external source and re-publish them to the UDDI registry. Such dynamic configuration provides additional programmability to the ForCEG.
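The "target" resolution step above can be sketched as a table lookup that expands one MCP request into per-LFB messages. The mapping table and the message fields below are assumptions for illustration; the actual ForCES encoding is not shown in the paper at this level of detail.

```python
# Assumed mapping from higher-layer functions to the LFBs implementing them
TARGET_TO_LFBS = {
    "QoS": ["Classifier", "Meter", "Scheduler"],
    "Firewall": ["Classifier", "Dropper"],
}

def build_forces_messages(target, operation, attributes):
    """Expand one MCP "target" request into per-LFB ForCES-style messages."""
    if target not in TARGET_TO_LFBS:
        raise ValueError(f"unknown higher-layer function: {target}")
    return [
        {"lfb": lfb, "op": operation, "attrs": attributes}
        for lfb in TARGET_TO_LFBS[target]
    ]

msgs = build_forces_messages("QoS", "SET", {"dscp": 46})
assert [m["lfb"] for m in msgs] == ["Classifier", "Meter", "Scheduler"]
```

Dynamically downloading a new mapping, as described above, would then amount to adding an entry to the table and re-publishing the corresponding WSDL operation.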
6 The ForCEG Architecture
This section describes the proposed ForCEG architecture realizing the concepts presented above.
Fig. 6. Proposed ForCEG Architecture
The proposed ForCEG architecture and its major architectural components are depicted in Fig. 6. The Web Services Server hosts the interfaces between the MCPs and the remainder of the ForCEG. It is responsible for sending messages to and receiving messages from the different MCPs. All interfaces between MCPs and the Web Services Server are realized through Web Services. Besides the interfaces for sending/receiving messages from the MCPs, the Web Services Server hosts additional interfaces for interaction with a UDDI registry. This allows the ForCEG to publish its own interfaces as well as to discover interfaces of MCPs in order to provide packets or events back to the MCPs. Another function of the Web Services Server is the ability to provide dynamically downloaded interfaces from an external source in order to satisfy requirements of MCPs that require a previously unavailable higher-layer function; e.g., if an MCP requires a firewalling function that is not currently present in the ForCEG, the server will request the appropriate operation from an external source and publish it for the MCP to discover. Together with the operation details, the Web Services Server will download the XML-to-LFB mappings required by the ForCES Translator.
The Message Parser is a central module that has two functions, depending on the direction of a message. If the message arrived from an MCP, the Message Parser receives the XML message from the Web Services Server, parses it, and instructs the Command Control Logic (CCL) to check the validity of the message. When it receives a positive response (i.e., the message will not create any conflict with the LFB configuration), it forwards the message to the ForCES Translator. If the message arrived from an FE, the Message Parser receives the XML message from the ForCES Translator and instructs the Subscribed Events & Pending Responses (SEPR) module to identify the MCP(s) that should receive the response message(s).
Once the MCP(s) have been identified, the Message Parser sends the message(s) to the appropriate MCP(s) through the Web Services Server. The CCL module performs coordination functions to prevent contradictory commands from different processes: it examines each command issued by each MCP and makes sure that a new message does not affect the current functionality of existing MCPs in any way. It takes the data it requires from the "Current FE & LFBs' state" module.
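The Message Parser's two-way dispatch described above might be sketched as follows, with the CCL, ForCES Translator, and SEPR modules stood in for by plain callables. All names are placeholders, not the prototype's actual interfaces.

```python
def parse(message, direction, ccl_ok, translate, find_subscribers):
    """Route a message depending on whether it came from an MCP or an FE."""
    if direction == "from_mcp":
        if not ccl_ok(message):          # CCL rejects conflicting commands
            return ("rejected", message)
        return ("to_translator", translate(message))
    elif direction == "from_fe":
        # SEPR names the MCP(s) that should receive the response
        return ("to_mcps", find_subscribers(message))
    raise ValueError(f"unknown direction: {direction}")

result = parse({"op": "SET"}, "from_mcp",
               ccl_ok=lambda m: True,
               translate=lambda m: ("forces", m),
               find_subscribers=lambda m: [])
assert result[0] == "to_translator"
```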
The ForCES Translator module has two functions, depending on the direction of a message. If the message arrived from an MCP, the ForCES Translator receives an XML message from the Message Parser and translates it into a ForCES protocol message based on the higher-layer function the MCP wants to configure. It obtains the correlation data from the "Current FE & LFBs' state" module. If the message arrived from an FE, the ForCES Translator receives a ForCES message from the FCSTP and translates it into XML. First the ForCES Translator reads the ForCES message and determines the message type (e.g., a packet redirection from an FE to an MCP, or a notification message) in order to add an XML tag which will be identified by the Parser; then the ForCES Translator inserts the packet into the XML message. Depending on the message type, the treatment of the packet varies: if the message is a packet redirection, the packet is encapsulated as an array of bytes in the XML message, whereas in the case of a notification message, the notification is processed and transmitted as string values.
The FCSTP module likewise has two functions, depending on the direction of a message. If the message arrived from an MCP, the FCSTP receives a ForCES protocol message from the Translator; it is responsible for sending the message to the appropriate FEs by checking the specific field in the ForCES protocol message. If the message arrived from an FE, the FCSTP sends the ForCES message directly to the ForCES Translator. In addition to these message-based functions, the FCSTP has functions regarding the ForCES protocol itself: it is responsible for creating, maintaining, and destroying associations with FEs.
The SEPR module contains information about which MCP any response arriving from an FE should be redirected to. MCPs may also subscribe themselves to specific events.
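The translator's redirect-versus-notification rule described above can be sketched as follows: a redirected packet is carried as an array of bytes (base64-encoded here so it can live in XML text), while a notification is flattened to string values. The element names and the base64 choice are illustrative assumptions, not the prototype's actual XML schema.

```python
import base64
import xml.etree.ElementTree as ET

def forces_to_xml(msg_type, payload):
    """Translate an FE-originated message into an XML document for an MCP."""
    root = ET.Element("forceg-message", type=msg_type)
    if msg_type == "packet-redirect":
        # encapsulate the raw packet as an array of bytes
        ET.SubElement(root, "packet").text = base64.b64encode(payload).decode()
    elif msg_type == "notification":
        # notifications are processed and transmitted as string values
        for key, value in payload.items():
            ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")

xml = forces_to_xml("packet-redirect", b"\x45\x00\x00\x54")
assert 'type="packet-redirect"' in xml and "<packet>" in xml
```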
The event notification messages that arrive at the FCSTP are redirected to the required MCPs based on information the SEPR holds about which MCP has subscribed to which event(s). The CFLS ("Current FE & LFBs' State") module holds information about all of the associated FEs and their LFBs, including all configurations of the LFBs. This data is used both by the ForCES Translator and by the Command Control Logic.
6.1 Use Case
As an illustration of the functions provided by the above submodules, the sequence of events involved in the execution of a typical MCP command (a packet-classification configuration command originated by the AAA Proxy) is shown in Fig. 7. When the ForCEG is instantiated, it registers itself with a UDDI registry. This action occurs upon instantiation and thereafter only when the ForCEG needs to update the registered information, e.g., when a new mapping has been downloaded. The AAA Proxy is dynamically downloaded and instantiated using the methods and operations described in [23]. Once the AAA Proxy has been instantiated, it discovers the ForCEG endpoint through an external UDDI registry, as well as the required operation name and the parameters needed to make a valid call to the ForCEG. Once the above operations have completed, the AAA Proxy is ready to make calls to the ForCEG. The AAA Proxy sends a configuration message to the Web Services Server, which passes the XML file to the Parser. The Parser reads the XML file and requests validation from the CCL that the command issued by the AAA Proxy will not affect any other MCP. The CCL validates the command with regard to the current
state of the FE and the other MCPs' configurations, and returns a response. If the response is positive, the Parser passes the XML file to the Translator. The ForCES Translator, based on the current state of the FE and the "target" of the AAA Proxy, translates the XML file into a ForCES message by correlating data from the CFLS, and passes it to the FCSTP. The FCSTP receives the message and transmits it to the appropriate FE as specified in the ForCES Destination ID protocol field. When an acknowledgment arrives from the FE, it is transmitted back to the AAA Proxy over the Web Service.
Fig. 7. ForCEG Use Case for AAA Proxy
The first ForCEG prototype is being developed in Java, while the FE counterpart is being developed in C++. The ForCEG configures Intel's IXP 2400 network processor.
7 Evaluation and Conclusion
This paper has discussed the issues raised by distributed router programmability and the requirements for an architecture built on top of a ForCES framework. The desire to accelerate the introduction of new services and to achieve service roaming at the edges of telecommunication networks requires the introduction of dynamic service deployment methods, as well as effective solutions to integrate services provided, for instance, by ISVs. This paper has presented an architecture fitting the FlexiNET network and acting as an hourglass between ForCES-controlled forwarding-plane components and services accessing a Web Service-based interface to control the forwarding-plane datapath. The current prototype is able to create, maintain, and destroy associations with FEs, as well as receive simple requests from MCPs through a simple Web Service API. As
the prototyping effort continues, it will help define the appropriate Web Services APIs for the relevant services in FlexiNET. The effectiveness of the solution in terms of performance, versatility, and ease of use will also be evaluated. More precisely, we are interested in measuring the delay to deploy and configure a new service, and the overhead incurred by our architecture as a service executes. Since the FWAN at this stage is expected to operate under the complete control of a single administrative entity, and deployed services are not expected to be malicious, security aspects have not been addressed here. However, Web Services provide extensive security measures, which will be evaluated in the future. Integration of other control protocols such as NETCONF may extend the versatility of the ForCEG and is an item for further research.
References
1. Campbell, A., De Meer, H., Kounavis, M., Miki, K., Vicente, J., Villela, D.: A Survey of Programmable Networks. ACM SIGCOMM Computer Communication Review (1999)
2. Haas, R., Jeffries, C., Kencl, L., Kind, A., Metzler, B., Pletka, R., Waldvogel, M., Freléchoux, L., Droz, P.: Creating Advanced Functions on Network Processors: Experience and Perspectives. IEEE Network (July 2003)
3. Yang, L., Dantu, R., Anderson, T.A., Gopal, R.: Forwarding and Control Element Separation (ForCES) Framework. IETF RFC 3746 (April 2004)
4. Khosravi, H., Anderson, T.A.: Requirements for Separation of IP Control and Forwarding. IETF RFC 3654 (November 2003)
5. Doria, A.: ForCES Protocol Specification. IETF draft, work in progress (June 2005)
6. Yang, L., Halpern, J., Gopal, R., DeKok, A., Haraszti, Z., Blake, S.: ForCES Forwarding Element Model. IETF draft, work in progress (February 2005)
7. Deval, M., Khosravi, H., Muralidhar, R., Ahmed, S., Bakshi, S., Yavatkar, R.: Distributed Control Plane Architecture for Network Elements. Intel Technology Journal 04(04) (November 2003)
8. FP6-IST1 507646 FlexiNET Technical Annex (2004)
9. Denazis, S., Miki, K., Vicente, J.B., Campbell, A.: Designing Interfaces for Open Programmable Routers. In: Covaci, S. (ed.) IWAN 1999. LNCS, vol. 1653, pp. 13–24. Springer, Heidelberg (1999)
10. Biswas, J., et al.: The IEEE P1520 Standards Initiative for Programmable Network Interfaces. IEEE Communications Magazine, Special Issue on Programmable Networks 36(10) (October 1998)
11. Putzolu, D.M.: Network Processing Forum Software Work Group, Software API Framework Implementation Agreement (2002)
12. Enns, R. (ed.): NETCONF Configuration Protocol. IETF draft, work in progress (February 2005)
13. Doria, A., Hellstrand, F., Sundell, K., Worster, T.: General Switch Management Protocol (GSMP) V3. IETF RFC 3292 (June 2002)
14. Durham, D., Boyle, J., Cohen, R., Herzog, S., Rajan, R., Sastry, A.: The COPS (Common Open Policy Service) Protocol. IETF RFC 2748 (January 2000)
15. FP6-IST1 507646 FlexiNET D21 Requirements, Scenarios and Initial FlexiNET Architecture (2004)
16. FP6-IST1 507646 FlexiNET D22 Final FlexiNET Network Architecture and Specifications (2004), http://www.ist-flexinet.org/deliverables/FlexiNET_alcatel_wp2_d22_final.zip
17. Aladros, R.L., Kavadias, C.D., Tombros, S., Denazis, S., Kostopoulos, G., Soler, J., Haas, R., Dessiniotis, C., Winter, E.: FlexiNET: Flexible Network Architecture for Enhanced Access Network Services and Applications. In: IST Mobile & Wireless Communications Summit, Dresden, Germany (2005)
18. Hirata, T., Mimura, I.: Flexible Service Creation Node Architecture and its Implementation. In: IEEE Computer Communications Workshop (October 2003)
19. New to SOA and Web Services, http://www-106.ibm.com/developerworks/webservices/newto/websvc.html
20. Booth, D., Haas, H., McCabe, F., Newcomer, E., Champion, M., Ferris, C., Orchard, D.: Web Services Architecture. W3C Working Group Note (February 2004), http://www.w3.org/TR/ws-arch/
21. Clement, L., Hately, A., von Riegen, C., Rogers, T.: UDDI Version 3.0.2 (October 2004), http://uddi.org/pubs/uddi_v3.htm
22. Booth, D., Liu, C.K.: Web Services Description Language (WSDL) Version 2.0 Part 0: Primer. W3C Working Draft (May 2005), http://www.w3.org/TR/2005/WD-wsdl20-primer-20050510/
23. Chrysoulas, C., Haleplidis, E., Haas, R., Denazis, S., Koufopavlou, O.: Applying a Web-Service-Based Model to Dynamic Service Deployment. In: The International Conference on Intelligent Agents, Web Technologies, and Internet Commerce (IAWTIC) (November 2005) (to be published)
An Extension to Packet Filtering of Programmable Networks
Marcus Schöller, Thomas Gamer, Roland Bless, and Martina Zitterbart
Institut für Telematik, Universität Karlsruhe (TH), Germany
Abstract. Several projects have proposed using active or programmable networks to implement attack detection systems for detecting distributed denial-of-service attacks or worm propagation. In order to distinguish legal traffic from attack traffic, passing packets need to be inspected deeply, which is resource-consuming. Such an inspection can be realized either with additional, expensive special-purpose hardware or in software. Due to resource limitations, however, inspecting all passing packets in software is not feasible if the packet rate is high. Therefore, we propose to add packet selection mechanisms to the NodeOS reference architecture for programmable networks. A packet selector reduces the rate of packets which are inspected. In this paper we detail various packet selectors and evaluate their suitability for an attack detection system. The results of our implementation show significant advantages of packet sampling methods over packet filtering. Keywords: Programmable Networks, NodeOS, Packet Selection.
1 Introduction
Distributed denial-of-service (DDoS) attacks are still a major threat to the Internet today. This is a long-known problem to network researchers [8,9] and has attracted public attention since the attacks against Yahoo, CNN, eBay, and many more in recent years. In a major threatening type of DDoS attack, the attacker does not exploit a weakness of the victim's operating system or applications but aims to overload resources like link capacity or memory by flooding the system with more traffic than it can process. The attack traffic is generated by several slave systems which the attacker has compromised before. The attacker only has to coordinate all these slave systems to start the attack nearly at the same time against a single victim. Since the slave systems are scattered over the Internet, the attack flows are hard to identify near a slave system, because a single attack flow consumes a relatively small portion of the overall bandwidth and is therefore indistinguishable from a regular communication flow. On their way towards the victim system the attack flows are aggregated. At this point the attack might be detectable if traffic analysis can be applied. Due to the high bandwidth of backbone links, a deep packet inspection of all packets is infeasible even with today's standard router hardware.
Another threat to the Internet today are worms [16,12]. A worm is a piece of software which automatically exploits security holes in operating systems or applications
D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 121–131, 2009. © IFIP International Federation for Information Processing 2009
122
M. Schöller et al.
to infiltrate a system. After a successful break-in, the worm starts to propagate itself to as many other systems as possible. One side effect of this propagation is the increasing bandwidth consumption, since more and more worm instances try to propagate themselves to other systems. In the extreme, this can lead to a denial-of-service attack if the traffic caused by worm propagation overloads the link capacity. Secondly, worms can easily be used to create slave systems for a subsequent DDoS attack: due to their ability to spread automatically to other systems, a large number of slave systems can be aggregated in a relatively short period of time. Today's countermeasures to worms are signature-based detection systems scanning for well-known worms. These systems are typically located at the edge of the Internet, preventing worm propagation into a specific network. An earlier detection of such a worm propagation would be possible if the detection system were located in the backbone network. But again, a deep packet inspection at backbone rate is needed, which is impossible with standard router hardware.
1.1 An Attack Detection System
One way to realize traffic analysis with deep packet inspection is to enhance the router with special-purpose hardware. Another way to implement traffic analysis functionality with deep packet inspection is to use a programmable network like FlexiNet. Such a system allows creating instances of an attack detection system dynamically within the network, and the detection system itself can be multi-level. We designed such an attack detection system to detect DDoS or worm attacks by monitoring the passing traffic and retrieving statistical data of predefined aggregates. For each of these aggregates the system computes the average packet count and derives a threshold based on the packet count average and deviation.
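The per-aggregate threshold check just described can be sketched as follows. The detector keeps past per-interval packet counts for an aggregate and flags an interval whose count exceeds mean plus k times the deviation; the factor k and the exact threshold rule are illustrative assumptions, not the paper's calibration.

```python
from statistics import mean, stdev

def is_suspicious(history, current, k=3.0):
    """Flag the current interval's packet count against past intervals."""
    if len(history) < 2:
        return False  # not enough data for a deviation estimate
    threshold = mean(history) + k * stdev(history)
    return current > threshold

counts = [1000, 1040, 980, 1010, 995]   # packets per interval, normal mode
assert not is_suspicious(counts, 1060)  # within the derived threshold
assert is_suspicious(counts, 5000)      # excess would trigger loading analysis modules
```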
In normal mode, meaning that no particular attack is currently suspected, only this simple traffic analysis module is running. An excess of an aggregate's threshold indicates an ongoing attack within this aggregate. In this case, specialized modules are loaded to analyze the suspicious traffic. These modules can apply various anomaly- or signature-based tests to the packet stream to verify the attack hypothesis. As mentioned before, the traffic analysis module cannot deeply inspect all packets flowing through the system in a backbone network due to resource limitations. Therefore we use a packet selection mechanism [17] to reduce the number of packets requiring deep packet inspection. Based on the statistical data of such samples, the same procedures can be applied to find hints of an ongoing attack if the estimation error in each sampling interval remains small. In this paper we detail packet selectors in Section 2, present an evaluation of their suitability for an attack detection system, and compare the suitable packet selectors with regard to their estimation accuracy.
1.2 Extending the NodeOS Specification
The FlexiNet platform was designed according to the NodeOS specification [1], which is the standard architecture for programmable networks. NodeOS specifies that packet filters for the incoming channel inChan must conform to the IPFIX flow definition [14]. This definition allows filtering of packets depending on the packet's content, but it prohibits the implementation of many packet selectors for the incoming channel. The only
way to implement a packet selector according to the NodeOS specification is to forward all packets to the execution environment, apply the packet selector there, and further process only the selected packets. This introduces a high overhead to the system. This overhead can be reduced if the NodeOS specification is extended to allow applying packet selectors in the incoming channel. Therefore, we propose to extend the NodeOS specification to allow packet filtering according either to the IPFIX flow definition or to the PSAMP definition, packet sampling according to the PSAMP definition, and every combination of these two. The goal of the IETF working group PSAMP is to propose various packet selectors for the Internet, especially against the background of traffic measurement. In contrast to other approaches, which are described in Section 1.3 and use programmable networks to build an attack detection system, we base our system on a packet selector in the incoming channel. Besides building an attack detection system, the enhancement of the incoming channel enables the creation of services like trajectory sampling, traffic accounting, and measurement with reduced overhead. In Section 3 we present implementation details and an evaluation of the most promising selectors for the FlexiNet platform, which we use to detect DDoS attacks and worm propagation in high-speed networks. The results show clearly that the usage of packet selection in the incoming channel significantly reduces the overhead on the system. We therefore conclude that extending the NodeOS architecture for programmable networks to include the presented sampling methods is useful and necessary.
1.3 Related Work
There are some existing approaches which design an attack detection system without using packet selection. In [15], deep packet inspection of all observed packets in a backbone network is achieved by programmable network nodes built of host and network processors.
The network processors in these network nodes are able to process packets at line rate, so packet selection is not required; but this is an expensive approach due to the extra special-purpose hardware. The approach in [10] does not need deep packet inspection but uses packets which are dropped by a router due to congestion to identify a suspicious aggregate of packets having a certain property, and rate-limits this high-bandwidth aggregate. Furthermore, a pushback mechanism is proposed in which a router can ask upstream routers to control an identified aggregate. This pushback mechanism is achieved by programmable networking.
Packet selection is used in various areas to infer knowledge about an observed packet stream without inspecting all packets. Two examples are charging from sampled network usage [4], which estimates the user's network usage on the basis of a sampled subset of packets, and trajectory sampling for direct traffic observation [6], which uses packet filtering to determine the path of a subset of packets through a network. A performance study of some packet selectors is presented in [3]. The paper compares packet-triggered with time-triggered methods and analyzes the differences between systematic and random selectors; filtering schemes and probabilistic selectors are not taken into account. The fact that the packet rate of an observed packet stream is not constant over time is addressed in [2]. This paper proposes the usage of an adaptive sampling probability to restrict the sampling error to a predefined tolerance level. This
paper does not try to find the optimal choice of a packet selector but addresses a useful optimization in case an optimal packet selector has been chosen.
2 Packet Selectors
The IETF PSAMP working group defined two types of packet selectors: filtering and sampling [17], [5]. Filtering is used if only a particular subset of packets is of interest. Filtering schemes are always deterministic and are based on packet content or router state. In contrast to filtering, sampling is used to infer knowledge about an observed packet stream without inspecting all packets. Therefore, only a representative subset of packets is selected, which enables an estimation of properties of the unsampled traffic. Sampling methods are either nondeterministic or do not depend on packet content or router state. The sampling methods are further grouped into two categories: random sampling and systematic sampling. First, a very brief summary of the filtering schemes and sampling methods is given. A rationale for which of the presented methods are suitable for an attack detection system is presented in Section 2.3. This subset of methods is examined in Section 2.4 with regard to estimation accuracy.
2.1 Filtering Schemes
Currently, the following three filtering schemes are defined by the IETF PSAMP working group [17]:
– Field match filtering—This filtering scheme is based on the IPFIX flow definition. If a specific field of an IP packet matches a predefined value, the packet is selected.
– Hash based selection—The content of the IP packet, or a portion of it, is mapped to a hash range using a hash function. A subset of this hash range is defined to be the selection range. A packet is selected if the hash of the current packet is mapped into this selection range.
– Router state filtering—A packet is selected if one or more specific states of the router match predefined values. Example states of the router are: ingress interface id, egress interface id, or no route found for the packet.
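Hash based selection, the second scheme above, can be sketched as follows; zlib.crc32 stands in for the (here unspecified) hash function, and the range sizes are illustrative.

```python
import zlib

HASH_RANGE = 1 << 16
SELECTION_RANGE = range(0, 6554)  # about 10% of the hash range

def hash_select(packet_bytes):
    """Select the packet if its content hashes into the selection range."""
    h = zlib.crc32(packet_bytes) % HASH_RANGE
    return h in SELECTION_RANGE

# With a well-mixing hash, roughly 10% of distinct packets are selected.
selected = sum(hash_select(f"packet-{i}".encode()) for i in range(10000))
assert 500 < selected < 1500
```

Because the same packet content always hashes to the same value, this scheme is deterministic, which is exactly why it is grouped with filtering rather than sampling.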
2.2 Sampling Methods
Among the sampling methods, three nondeterministic methods are defined as well as two methods which are deterministic but independent of packet content and router state.
– n-out-of-N sampling—For this method, n different numbers must be randomly generated in the range of 1 to N. All packets with a packet position equal to one of the n numbers are selected. This procedure has to be repeated for every interval of N packets.
– Uniform probabilistic sampling—Each packet is selected with a fixed uniform probability 1/p.
– Non-uniform probabilistic sampling—This method allows weighting the sampling probabilities. Different fixed probabilities can be assigned to different aggregates in order to increase the probability of selecting rare packets.
– Systematic time based sampling—A sampling interval is defined, consisting of a selection interval and a non-selection interval. A start trigger defines the beginning of a selection interval. All packets arriving after this trigger are selected until the stop trigger fires. No packets are selected thereafter until the new sampling interval starts. After this non-selection interval, a new start trigger restarts the method. The unit of the intervals is time based.
– Systematic count based sampling—Like systematic time based sampling, a selection interval and a non-selection interval are defined. The unit of the intervals is count based. This means that n consecutive packets are selected and the next m packets are not.
2.3 Determining Suitable Packet Selectors for an Attack Detection System
It is obvious that the presented filtering schemes are not suitable for an attack detection system. Any attacker who knows the filtering rules can adapt his attack in a way that his attack packets are not selected by the system. This makes bypassing the detection system easy. Non-uniform probabilistic sampling was not taken into account because it needs deep packet inspection, since the selection probability depends on packet content. Systematic time based sampling was not considered either, because the estimation accuracy varies with the number of packets during a sampling interval. Additionally, the estimation accuracy drops dramatically if the number of packets during the selection interval falls below a threshold, and no guarantees about the estimation accuracy can be made. To implement the n-out-of-N sampling method, a list of n unique random numbers must be generated. Therefore, a random number generator must be implemented, memory to save these n numbers must be allocated, and an algorithm to detect duplicate random numbers is needed, as well as a sorting algorithm.
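As an illustration of the mechanics of the three candidate methods, minimal sketches might look as follows; the parameter values are illustrative, not the ones used in the evaluation.

```python
import random

def n_out_of_N_positions(n, N, rng=random):
    """n-out-of-N: draw n distinct packet positions in 1..N per interval."""
    return sorted(rng.sample(range(1, N + 1), n))

def uniform_probabilistic(p, rng=random):
    """Select the current packet with a fixed probability p."""
    return rng.random() < p

def count_based_selected(position, select_len, skip_len):
    """Systematic count based: select the first select_len packets of
    every (select_len + skip_len)-packet cycle."""
    return position % (select_len + skip_len) < select_len

positions = n_out_of_N_positions(30, 100)
assert len(set(positions)) == 30 and all(1 <= q <= 100 for q in positions)
# 3-in-10 pattern: positions 0,1,2 selected, 3..9 skipped, repeating
assert [count_based_selected(i, 3, 7) for i in range(10)] == [True] * 3 + [False] * 7
```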
In contrast, uniform probabilistic sampling requires only a random number generator and memory to save the selection probability. Finally, systematic count based sampling requires the least resources of all methods: memory for the start trigger, the stop trigger, and the packet counter. A problem of uniform probabilistic sampling can be the varying number of selected packets in consecutive intervals. This problem vanishes if enough packets are selected during the selection interval. A problem of the systematic count based sampling method is its deterministic approach: if the sampled traffic contains an inherent periodic pattern, a detection might fail if the pattern always falls into the non-selection interval. In summary, we selected the following three sampling methods as interesting candidates for suitable packet selectors and investigated their estimation accuracy: n-out-of-N sampling, uniform probabilistic sampling, and systematic count based sampling.
2.4 Estimation Accuracy
We compared the suitable packet selectors described in Section 2.3 with regard to estimation accuracy. The examination was carried out with an empirically determined sampling probability of 30%, which produced acceptable deviations with the used network traces and interval lengths. One packet selector was additionally investigated with
126    M. Schöller et al.
sampling probabilities of 20% and 40% to collect some comparative values. The examination used the following configurations and parameter sets:

i. 30-out-of-100 sampling
ii. 300-out-of-1000 sampling
iii. Uniform probabilistic sampling with a sampling probability of 20%
iv. Uniform probabilistic sampling with a sampling probability of 30%
v. Uniform probabilistic sampling with a sampling probability of 40%
vi. Systematic count based sampling with a selection interval of 3 packets and a non-selection interval of 7 packets
vii. Systematic count based sampling with a selection interval of 30 packets and a non-selection interval of 70 packets

We applied network traces originating from the NLANR passive measurement and analysis project [11] to these packet selectors. The observed packet stream was divided into intervals of fixed length, and the observed number of packets per interval was examined. The suitable packet selectors were used to infer knowledge about different aggregates (TCP packets, UDP packets, etc.) without inspecting all packets. The network traces used had a packet rate of about 20 000 packets per second and a duration of about 90 seconds. To make the examined packet selectors comparable, the average X̄ of the number of packets per interval over all observed intervals of one network trace was calculated for every aggregate. Afterwards the deviation between the original trace (that is, when considering all packets) and the sampling run was computed:

deviation = sqrt( (1 / (n - 1)) * Σ_{i=1}^{n} (Y_i - X_i)^2 )    (1)

Y_i represents the estimated number of packets in interval i of the sampling run, X_i is the number of packets in interval i of the original trace, and n is the number of observed intervals. To obtain unbiased results, 10 sampling runs were carried out per examined packet selector and the average of these 10 runs was calculated. Finally, the resulting value was normalised by the calculated average X̄ of the original trace,

deviation_rel = (1 / X̄) * ( (1/10) * Σ_{i=1}^{10} deviation_i )    (2)
to derive a relative deviation. These relative deviations were used in the following examination to compare the different suitable packet selectors. Table 1 lists the relative deviations of all examined packet selectors for chosen aggregates and interval lengths. In the first examination an interval length of 5 seconds which corresponds to about 100 000 packets per interval was used. In this scenario we were able to show that with a sampling probability of 30% low bandwidth aggregates like the ICMP aggregate (ICMP 2), which are the worst case in sampling scenarios, only have a relative deviation of about 5% from the original trace’s values. High bandwidth aggregates like TCP packets (TCP 2) have an even lower relative deviation of under 1% which is an excellent estimation accuracy. In case of a lower interval length of 0.5 seconds, which corresponds to about 10 000 packets per interval, the relative deviation of
An Extension to Packet Filtering of Programmable Networks
127
Table 1. Relative deviations of all examined packet selectors from original traces for chosen aggregates and interval lengths

Columns: Average | 30-out-of-100 | 300-out-of-1000 | Uniform 20% | Uniform 30% | Uniform 40% | Systematic 3/7 | Systematic 30/70

Interval length: 0.5 seconds
ICMP 1:     87.9   | 16.25% | 16.8%  | 21.81% | 16.47% | 13.5%  | 16.85% | 16.11%
UDP 1:   1 041.95  |  4.7%  |  4.47% |  6.15% |  4.77% |  3.8%  |  4.6%  |  4.47%
TCP 1:   9 343.11  |  1.03% |  0.61% |  2.11% |  1.54% |  1.27% |  0.51% |  0.7%

Interval length: 5 seconds
ICMP 2:    890.82  |  5.29% |  5.11% |  6.84% |  5.04% |  4.39% |  4.09% |  4.0%
UDP 2:  10 451.29  |  1.68% |  1.53% |  2.15% |  1.52% |  1.27% |  1.58% |  1.62%
TCP 2:  93 423.88  |  0.9%  |  0.2%  |  0.69% |  0.49% |  0.42% |  0.19% |  0.21%
low bandwidth aggregates degrades to about 16% (ICMP 1), which we found acceptable. In this case the relative deviation of the TCP aggregate is about 1% (TCP 1). Using a packet selector with a sampling probability of just 20% (column iii.), the relative deviation for the ICMP aggregate degrades to over 21%, which we did not find acceptable for our attack detection system. With an interval length of 5 seconds, the relative deviation of this packet selector would be acceptable, too.

The values in table 1 and the previous results show that the estimation accuracy can be improved by enlarging the interval length if the packet rate remains constant. With an attack detection system in mind, however, this is not a feasible solution, since such a system has to choose the interval length according to detection needs rather than sampling accuracy needs. Our examination also showed that a packet selector with 40% sampling probability (column v.), although it has a lower relative deviation than the same selector with a sampling probability of 30% (column iv.) in all aggregates, does not improve the estimation accuracy significantly. From these results we concluded that a higher sampling probability does not justify the higher overhead incurred by the attack detection system having to inspect more packets.

Table 1 also shows that for high bandwidth aggregates, 300-out-of-1000 sampling (column ii.) performs slightly better than 30-out-of-100 sampling (column i.) and uniform sampling with the same sampling probability (column iv.). Systematic count based sampling has an estimation accuracy similar to that of 300-out-of-1000 sampling. Because all suitable packet selectors have similar estimation accuracies, we based our decision on the most suitable packet selector for an attack detection system on the resources, such as memory and processor time, required by the different sampling methods.
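Equations (1) and (2) can be implemented directly. A short sketch (function names are ours; we read equation (1) as a root-mean-square deviation, which is consistent with the dimensionless relative deviations reported in table 1):

```python
import math

def deviation(estimates, originals):
    # Equation (1): RMS difference between the estimated per-interval
    # packet counts Y_i (sampling run) and the true counts X_i (trace).
    n = len(originals)
    return math.sqrt(sum((y - x) ** 2 for y, x in zip(estimates, originals)) / (n - 1))

def relative_deviation(runs, originals):
    # Equation (2): average the deviation over the sampling runs and
    # normalise by the mean per-interval packet count of the original trace.
    x_bar = sum(originals) / len(originals)
    avg_dev = sum(deviation(r, originals) for r in runs) / len(runs)
    return avg_dev / x_bar
```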
These decision criteria resulted in the use of systematic count based sampling for packet selection in an attack detection system, since this selector needs fewer resources than n-out-of-N sampling and uniform probabilistic sampling, as analyzed in section 2.3. Because for systematic count based sampling large
selection intervals increase the probability of biased results, we always chose the smallest possible selection interval. For a sampling probability of 30% this results in a selection interval of 3 packets and a non-selection interval of 7 packets.
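Choosing the smallest selection interval for a given sampling probability amounts to reducing the fraction n/(n+m). A sketch (function name ours):

```python
from fractions import Fraction

def minimal_intervals(probability: str) -> tuple:
    # Express the sampling probability as a reduced fraction n/(n+m):
    # the numerator is the smallest selection interval n, and the
    # remainder of the denominator is the non-selection interval m.
    f = Fraction(probability)
    return f.numerator, f.denominator - f.numerator
```

For 30% this yields (3, 7); for the 0.6% probability mentioned later for backbone rates it would yield a selection interval of 3 packets and a non-selection interval of 497 packets.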
3 Implementation and Evaluation

To implement an attack detection system we used the programmable networking platform FlexiNet [7]. This platform is designed according to the NodeOS specification (see fig. 1a). A service module can install iptables filter rules [13] according to the IPFIX flow definition, which select all matching packets in the incoming channel inChan and forward them to the FlexiNet execution environment through a netfilter callback function. Given these requirements, we implemented in a first approach a NodeOS-conforming FlexiNet service module which used systematic count based sampling with a selection interval of 3 packets and a non-selection interval of 7 packets to select packets from an observed packet stream. The attack detection system then processed the selected packets, and at the end of processing every packet was reinjected into normal packet processing through netfilter. The problem with this approach according to the NodeOS specification is that, despite the packet selector, all packets of the observed packet stream have to pass through the FlexiNet execution environment, since filter rules based on the IPFIX flow definition do not enable packet selection within iptables.
Fig. 1. Proposed extension to the NodeOS reference architecture: (a) packet filter for inChan; (b) packet filter and packet selector for inChan
A second approach (see fig. 1b) takes the aforementioned problem into account and changes the standard architecture for programmable networks in such a way that packet selection is already possible in the incoming channel inChan, that is, before the packets are forwarded to the FlexiNet execution environment. To this end, the iptables target implementing systematic count based sampling was enabled to apply packet selection instead of forwarding all packets to the FlexiNet execution environment. A copy of every selected packet is queued for later processing, while the packet itself is forwarded normally, preserving the overall packet ordering. Packets which are not selected no longer have to pass through the FlexiNet execution environment, which results in a significant performance improvement.
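The effect of the architectural change can be quantified by counting how many packets reach the execution environment under each approach. A back-of-envelope sketch (function names are ours):

```python
def packets_to_ee_standard(n_packets: int) -> int:
    # First approach (NodeOS-conformant): every observed packet must pass
    # through the execution environment; sampling happens only inside it.
    return n_packets

def packets_to_ee_extended(n_packets: int, select: int = 3, nonselect: int = 7) -> int:
    # Second approach: systematic count based sampling in inChan, so only
    # packets falling into a selection interval are handed to the EE.
    period = select + nonselect
    full_periods, rest = divmod(n_packets, period)
    return full_periods * select + min(rest, select)
```

For 2 000 observed packets the standard architecture hands all 2 000 to the execution environment, while the extended one hands over only 600 (30%).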
3.1 Evaluation
Using the implementations of the two packet selection approaches described in the previous section, we measured the time required to process a single packet of the observed packet stream. The processing time starts with the check whether the observed packet matches the iptables filter rules of the service module and ends with the drop of the copy forwarded to the execution environment. If the packet was not forwarded to the FlexiNet execution environment, the processing time ends after the iptables check. The processing time was measured in processor ticks by reading out a CPU register through an available C function. The evaluation was executed on a 2.4 GHz machine, so 1 000 processor ticks correspond to about 0.42 µs and 1 ms corresponds to about 2 400 000 processor ticks.
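As a sanity check on the units: at 2.4 GHz one tick is 1/2.4 ns, so 1 000 ticks are about 0.42 µs and a full millisecond corresponds to 2.4 million ticks. The helper below is ours:

```python
def ticks_to_seconds(ticks: float, clock_hz: float = 2.4e9) -> float:
    # Convert a processor tick count to wall-clock time for a given clock rate.
    return ticks / clock_hz
```

The 250 000-tick and 750-tick figures quoted in the comparison below thus correspond to roughly 104 µs and 0.31 µs per packet.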
(Axes: processing time [in 1000 processor ticks] vs. packet index. Panels: (a) packet selection by the execution environment; (b) packet selection by iptables.)
Fig. 2. Comparison between processing time of an attack detection system with sampling according to the standard architecture (a) and to an extended architecture (b)
Figure 2(a) shows the processing time needed when systematic count based sampling is part of a FlexiNet service module according to the NodeOS reference architecture. In this case, packets which are not selected by the sampling method nevertheless have to pass through the FlexiNet execution environment. Figure 2(b) shows the processing time needed in the second approach, which implements systematic count based sampling as part of the inChan processing. Because of this change to the standard architecture, only packets which are selected by the packet selector have to be forwarded to the FlexiNet execution environment; packets which are not selected can immediately be reinjected into normal IP packet processing. This results in a significantly lower processing time for packets which are not selected, compared to the processing times of the first approach, and causes the visible gaps between the selection periods in figure 2(b). Selected packets still need a processing time similar to that of the first approach. The comparison of the minimal processing times for packets which are not selected, about 250 000 processor ticks in the standard architecture versus about 750 processor ticks in the extended architecture, clearly shows that packet selection makes it feasible to deploy an attack detection system even in backbone networks. This holds
since the estimation accuracy of the sampling methods is good enough and the extended architecture saves significant processing time for packets which are not selected. The sampling probability of 30% used so far is still quite large for backbone networks, but was chosen due to the relatively low packet rate and short duration of the analyzed network trace. Given the much higher packet rates in backbone networks, the sampling probability can clearly be decreased without worsening the estimation accuracy if the interval length remains constant. For example, if we observe a packet stream with a packet rate of 500 000 packets per second and use an interval length of 0.5 seconds, we could obtain approximately the same deviations with a sampling probability of 0.6% as in section 2.4 with a sampling probability of 30% and an interval length of 0.5 seconds.
4 Summary

In this paper we presented various packet selection methods proposed by the IETF PSAMP working group and reasoned about which of these are suitable for building an attack detection system in high speed networks. Three sampling methods were considered suitable and were compared with respect to their estimation accuracy. Since the estimation accuracy of all three methods was similar, we preferred the sampling method with the lowest resource requirements in terms of CPU and memory. Further, we argued that the current NodeOS specification lacks the possibility to implement packet selection in the incoming channel domain, which introduces unnecessary overhead to the system. In our opinion the NodeOS specification should thus be extended to include sampling methods in the incoming channel domain. This extension makes it possible to build active and programmable networks for attack detection systems as well as for traffic measurement or other applications of packet selection. We further described such an extended programmable network and presented implementation results demonstrating the advantages of our proposal.
References

1. Active Networking NodeOS Working Group: NodeOS Interface Specification (January 2002), http://www.lancs.ac.uk/postgrad/bourakis/papers/an_node_.pdf
2. Choi, B.-Y., Park, J., Zhang, Z.-L.: Adaptive random sampling for load change detection. In: SIGMETRICS 2002: Proceedings of the 2002 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pp. 272–273. ACM Press, New York (2002)
3. Claffy, K.C., Polyzos, G.C., Braun, H.-W.: Application of sampling methodologies to network traffic characterization. SIGCOMM Comput. Commun. Rev. 23(4), 194–203 (1993)
4. Duffield, N., Lund, C., Thorup, M.: Charging from sampled network usage. In: IMW 2001: Proceedings of the 1st ACM SIGCOMM Workshop on Internet Measurement, pp. 245–256. ACM Press, New York (2001)
5. Duffield, N.G.: A framework for packet selection and reporting. Internet Draft, draft-ietf-psamp-framework-10.txt, Work in Progress, Internet Engineering Task Force (January 2005)
6. Duffield, N.G., Grossglauser, M.: Trajectory sampling for direct traffic observation. In: Proceedings of SIGCOMM, pp. 271–282 (2000)
7. Fuhrmann, T., Harbaum, T., Schöller, M., Zitterbart, M.: AMnet 3.0 source code distribution, http://www.flexinet.de
8. Garber, L.: Denial-of-service attacks rip the Internet. Computer 33(4), 12–17 (2000)
9. Hussain, A., Heidemann, J., Papadopoulos, C.: A framework for classifying denial of service attacks, extended. Technical Report ISI-TR-2003-569b, USC/Information Sciences Institute (June 2003) (original TR February 2003, updated June 2003)
10. Mahajan, R., Bellovin, S., Floyd, S., Ioannidis, J., Paxson, V., Shenker, S.: Controlling high bandwidth aggregates in the network (2001)
11. NLANR Passive Measurement and Analysis Group, http://pma.nlanr.net
12. Moore, D., Shannon, C., Claffy, K.C.: Code-Red: a case study on the spread and victims of an Internet worm. In: Internet Measurement Workshop, pp. 273–284 (2002)
13. The netfilter/iptables project, http://www.iptables.org
14. Quittek, J., Bryant, S., Claise, B., Meyer, J.: Information model for IP flow information export. Internet Draft, draft-ietf-ipfix-info-07.txt, Work in Progress, Internet Engineering Task Force (May 2005)
15. Ruf, L., Wagner, A., Farkas, K., Plattner, B.: A detection and filter system for use against large-scale DDoS attacks in the Internet backbone. In: Minden, G.J., Calvert, K.L., Solarski, M., Yamamoto, M. (eds.) IWAN 2004. LNCS, vol. 3912, pp. 169–187. Springer, Heidelberg (2007)
16. Shannon, C., Moore, D.: The spread of the Witty worm. IEEE Security and Privacy 2(4), 46–50 (2004)
17. Zseby, T., Molina, M., Raspall, F., Duffield, N.G.: Sampling and filtering techniques for IP packet selection. Internet Draft, draft-ietf-psamp-sample-tech-07.txt, Work in Progress, Internet Engineering Task Force (July 2005)
SAND: A Scalable, Distributed and Dynamic Active Networks Directory Service

M. Sifalakis, A. Mauthe, and D. Hutchison

Computing Department, Lancaster University, LA1 4WA, U.K.
{mjs,andreas,dh}@comp.lancs.ac.uk
Abstract. In the past, a significant amount of work has been invested in architecting active node platforms that solve problems in various application areas by means of programmability. Yet much less attention has been paid to the deployment aspects of these platforms in real networks. One open issue in particular is how active resources can be discovered and deployed. In this paper we present SAND, a scalable, distributed and dynamic architecture that enables the discovery of active resources along and alongside a given network path. One of the main strengths of SAND is its customisability, which renders it suitable for a multitude of network environments. As an active service, SAND does not depend on any particular active platform, and at the same time it enables an active node to become part of a global infrastructure of discoverable active resources.
1 Introduction

Probably the most important application area for active and programmable networks, and a fundamental requirement of future networks, is the ability to deliver flexible and customisable network infrastructures. Research in active and programmable networks over the last decade has led to new developments in a number of areas, ranging from secure programming languages [1,2] to mobile code techniques [3], execution environments [3,4], active node platforms [5,6,7,8,9], service composition models [9,10,11], etc. However, in spite of the unquestionable value of these advances, the equally important aspect of their real world deployment has received significantly less attention. This often provides fertile ground for criticism and scepticism regarding the practical usability of active and programmable networks. Aspects such as the discovery of active resources, interoperability and interfacing of heterogeneous platforms (adhering to different programming paradigms), and cooperation of different platforms at the control and signalling level have been neglected for years. This paper addresses the issue of service/function discovery in a network. It introduces SAND (Scalable Active Networks Directory), a distributed, scalable, and dynamic architecture that enables discovery and browsing of active resources¹ (subject to administrative policies) along or close to a network path, or within a given network neighbourhood. The two fundamental design goals of SAND are customisability
¹ In SAND, “active resource” refers to any type of resource related to network programmability, such as software (active services, execution environments, NodeOS), hardware (memory and CPU capacity), or policies conditioning the use of an active node.
D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 132–144, 2009. © IFIP International Federation for Information Processing 2009
SAND: A Scalable, Distributed and Dynamic Active Networks Directory Service
133
which renders it suitable for a broad range of network environments (sensor networks, MANETs, mobile networks, fixed infrastructures, and Grids), and its cluster-based architecture, which promotes scalability for different network sizes. The remainder of this paper is organised as follows: Section 2 examines the requirements of such a service in an autonomic network environment. In section 3 the SAND architecture is presented, and in section 4 the deployment and operational aspects of the service are considered. Finally, the paper is concluded in section 5, which discusses related work and provides an outlook. In this work we have focused on the design aspects of the SAND service in order to deliver the required functionality. An implementation of SAND is under way, along with a simulated testbed that will allow us to evaluate its performance. The results will be published in future work.
2 Requirements Analysis

In programmable router platforms, as well as most capsule-based active network architectures, an active node must be able to dynamically find and load/fetch active services from the network (if the service code is not already available locally). This reveals the critical need for service (whole program) or function (library code) discovery. In order to understand the type of functionality a resource discovery service should provide, the needs and expectations of such a service have to be understood.

Let P = <a_1, ..., a_k> be a vector of network locations where a user needs active networks support. This may be expressed as a set of IP addresses (assuming a uniform network transport), a sequence of autonomous system (AS) numbers identifying locations as administrative domains, a set of coordinates in the NPS [12] system, a sequence of abstract namespace/object identifiers (as in OO languages), etc. Given vector P, the ultimate goal of a client is to obtain the available active resources along P. A simple but naïve interface between the client and the SAND service could be a function of the type:

ResourceList FindResources (<P>),    (1)

where ResourceList is a list of the available resources along P, together with information about the hosting nodes of the resources. However, such an ill-defined interface is not very functional and creates the following issues:

• The amount of information in ResourceList is proportional to the dimension k of vector P, consuming the end device’s time and resources for filtering.
• The active nodes queried along P will have to generate larger amounts of data than what is actually needed, thus wasting useful network resources.
A better approach is to use a more expressive interface, which allows for a more accurate and fine grained description of the information requested:

ResourceList FindResources (<P>, <FilterSpec>),    (2)
The FilterSpec parameter corresponds to a filter description that (a) imposes constraints on the type of information requested and (b) permits a metric-assisted ranking of the returned information. The objective here is that only a reduced subset of the active nodes in a_i needs to respond to a query, and that the amount of information returned is minimised and ranked by relevance to the query. On the other hand, very
134
M. Sifalakis, A. Mauthe, and D. Hutchison
often the active resources will be situated close to a specific data path rather than along it, thus necessitating the re-routing of the data path through them in order to perform active processing. This motivates the ability to discover resources that are within an acceptable proximity of the data path. Moreover, knowing "how far", in network distance, the specific active resources lie from the actual data path allows us to assess the cost of a rerouting decision (number of hops, delay, congestion, etc.). As a result, it should be possible to support proximity metrics (hop count, path delay, etc.) as heuristics to specify the range around a data path D within which available active resources should be solicited. This can easily be facilitated in the aforementioned interface as part of the FilterSpec parameter, e.g.:

ResourceList FindResources (<P>, FilterSpec (ResDistanceHops() = 5)),

Note also that it is possible to facilitate discovery in a "neighbourhood" instead of along a path if P consists of only one entry, the current locus. Furthermore, information exchange between diverse and potentially non-compatible network transports has to be possible. Although addressing nodes across custom networks might be possible (using some sort of universal naming scheme), direct communication between them will not be feasible without "speaking" a common network transport protocol, or (dynamically) employing a protocol translator or adaptation layer. This condition can easily lead to a deadlock situation if the discovery process has to take place atomically along the complete end-to-end path before deploying services (as is the case with other approaches). The SAND architecture needs to be flexible enough to allow the interplay between discovery and deployment steps, so that the discovery process completes "gradually as a path is opened" towards a destination, by deploying appropriate service elements.
Although the cost (delay, overhead) of such a process seems high, there will often be cases where feasibility outweighs the cost, thus justifying such a design. In summary:

• If the discovery process cannot be completed atomically for P, it should not fail.
• The service nodes should support query chaining, whereby they are able to (recursively) proxy queries across different network transports.
• The query protocol supported by our architecture must be transport independent.
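The interface of equation (2), with a FilterSpec carrying both constraints and a ranking metric, could be sketched as follows (all names are illustrative, not from the paper):

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Resource:
    node: str           # hosting node (transport-specific address)
    kind: str           # e.g. "execution-environment", "cpu", "policy"
    distance_hops: int  # network distance from the queried path

@dataclass
class FilterSpec:
    predicate: Callable = lambda r: True   # (a) constraints on the requested information
    rank_key: Callable = lambda r: 0.0     # (b) metric for ranking by relevance

def find_resources(path: List[str], spec: FilterSpec,
                   directory: List[Resource]) -> List[Resource]:
    # Only resources hosted along P that satisfy the filter are returned,
    # ranked by the metric, so queried nodes ship less data and the
    # client does less post-filtering.
    hits = [r for r in directory if r.node in path and spec.predicate(r)]
    return sorted(hits, key=spec.rank_key)
```

A proximity constraint like ResDistanceHops would map onto the predicate, e.g. `lambda r: r.distance_hops <= 5`.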
3 The SAND Architecture

SAND² is an active service that enables an active node to participate in a scalable, distributed and dynamic directory-lookup service, storing information about resources, services, policies and access control information (to govern the way the information is used and distributed). Being an active service itself, it is generic and does not impose any special requirements on the platforms it is deployed on. The SAND front-end interface is based on LDAP [14], which provides a simple, flexible and lightweight information access mechanism. LDAP allows easy interoperability and cooperation between heavily customised instances of SAND. The standard interface conforms to the one described earlier, using two interchangeable APIs. Filter descriptions can be used to optimise a search. From a client perspective, SAND is an opaque and ubiquitous service that can be used to discover active resources in the environment.
² Due to space limitations several details are omitted. A detailed description is available in [13].
Fig. 1. The main concepts in SAND and their organisation in the Network
From the viewpoint of an active node, SAND is a peer-based system which it may join and register its resources with. Each participating node undertakes an "equal" role in the maintenance of the directory and is responsible for responding to client queries on behalf of the whole directory. This makes the service fault-tolerant and more resilient in episodic environments.

3.1 The SAND Framework

A SAND node n ∈ S (S being the set of all SAND nodes) is any active node that runs the SAND service. Specifically, such a node has the following capabilities:

• It supports the join/leave/peering functions of the SAND system and advertises its resources and services through it.
• It provides the client SAND APIs for answering client requests.
• It maintains a directory for its active resources, services and management/access policies.
• It maintains (unique) identification information and a set of hash functions that can generate a set of keys. These keys can be used to address the node.
• It supports an API for integrating information indexing functions and is able to communicate the index information to other SAND nodes.
These capabilities provide a SAND node with two very important properties: (a) identification within the SAND system (and potentially beyond), which makes it discoverable, and (b) accessibility to its directory store for browsing and listing its resource and service information. These properties essentially provide the knowledge of how to reach the SAND system at a network location and see what resources are available there. In SAND, the smallest functional entity is the SAND node. However, the smallest opaque service entity is an "area" A, which refers to a group of SAND nodes coupled in a peering relationship (figure 1):

A = { ∪_i n_i | |i| ≥ 1, n_i ∈ S }    (3)
A SAND node can simultaneously be a member of more than one area, participating in each one of them independently. For an area the following postulates hold:

• Each node can communicate with every other node in the same area. This implies a uniform network (overlay) transport across the area.
• The area members cooperate to provide a distributed, externally opaque directory base that stores information collectively for the whole area.
• Externally, the area is represented by (unique) identification information, and there are functions used to generate area-wide IDs from that information.
• There are common procedures and methods among area members for summarising the directory-stored information and communicating it to other SAND nodes.
• Area members that are also members of other areas may (individually or collaboratively) volunteer to proxy requests and selectively advertise summary directory information from one area to another. Such nodes are area border nodes (BNs). The area border representation can be uni- or bi-directional.
It follows from definition (3) that for i = 1 a single SAND node also constitutes an area. This also complies with the postulates for a SAND node and an area. Some very interesting consequences derive from this observation. First of all, every SAND node is a member of at least one SAND area, the self-area, and therefore it is self-sufficient. Seen the other way round, every area can be represented as a node within another area (by means of its BNs). This creates a mirage of locally concentrated resource information about a large number of networks (running in other areas) in a scalable way, thereby hiding complexity from the service users. If in definition (3) we replace n_i with A_i, we can produce the recursive definition of a SAND hyper-area (area of areas, figure 1) A^d of dimension d:

A^d = { ∪_i A_i^{d-1} | |i|, |d-1| ≥ 1 }    (4)
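Definitions (3) and (4) suggest a simple composite structure. A sketch (class names are ours):

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class SandNode:
    name: str

@dataclass
class Area:
    # Members are SAND nodes or, recursively, other areas: an area of
    # areas models a hyper-area A^d built from entities of dimension d-1.
    members: List[Union[SandNode, "Area"]]

    def nodes(self) -> List[SandNode]:
        # Flatten the (hyper-)area down to its constituent SAND nodes.
        out: List[SandNode] = []
        for m in self.members:
            out.extend([m] if isinstance(m, SandNode) else m.nodes())
        return out

# A single SAND node also constitutes an area (its "self-area"):
self_area = Area([SandNode("n1")])
```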
The concept of a hyper-area is very important, as it describes the construction of a global SAND service recursively, starting from the basic entity, the SAND node, and using the same primitives and functions recurring at different scales. The concept of an area is of key importance for the SAND service, as it promotes scalability by dividing the network into independently configurable clusters, enables network and/or host mobility without impacting the stability and opaqueness of the service, and confines the distribution of the directory content. Another abstraction used in SAND is the SAND domain (figure 1). This is used to signify the boundaries of a network administration authority. A domain is nothing more than a set of SAND areas under the same administrative management, and in that sense it can serve as a unique common identification attribute for a group of areas.

3.2 SAND, under the Hood

The functionality of the service discovery mechanism can be decomposed into two abstract notions: (a) the location of the required service and (b) the type of the required service. A client must have sufficient flexibility in describing what resources and services it needs, and where within the network. On the other hand, the server side must be able to locate service points at those network locations and browse the available resources.
Fig. 2. The SAND architecture and the structure of the DIT in the directory store (inset: DIT organisation in a SAND node)
Internally, SAND addresses these aspects independently by two different yet complementary mechanisms, perceived as two distinct layers. Figure 2 is a block diagram of the SAND architecture in an active node, showing the most important components, their interactions and their relative positions within the architecture.

The DHT Layer. DHTs refer to a technology [15] that has become popular because it can provide a sophisticated and scalable (key) lookup facility based on extensive use of hashes. In SAND we deployed DHTs as a means of looking up SAND entities. The DHT layer accommodates a set of structures that map keys to network-transport-specific node addresses, thus providing a generic, abstract and scalable way of identifying, addressing and locating SAND entities, irrespective of network transport. These keys, called s-keys, are somewhat different from conventional hash keys (referred to here as h-keys). They derive from, and thus relate to, the identification information of a SAND entity. If an s-key corresponds to a SAND node, a transport address of the node is stored in the key map. If the s-key corresponds to a hyper-area, the transport addresses of one or more BNs are kept in the key map. Different sets of key maps allow a SAND node to partake in more than one area. The concept of an s-key is the fundamental primitive of customisability at this layer. In contrast to h-keys, s-keys are produced by a so-called S-function, which combines a hash function H(x) and a "normalisation" function N(x):

S(x) = a · N(x) ⊕ (1 − a) · H(x),  where 0 ≤ a ≤ 1    (5)
Here x indicates the network ID of a SAND node or hyper-area (e.g. the network address). The algebraic operation ⊕ determines how the two terms are combined to form the s-key. The coefficient a is the main amortising factor and expresses the relative weight of the two terms in the generation of the s-key. Essentially, a determines how random, or how relevant to the network ID, the s-key is. This is best illustrated by the following example. Assume IPv4-based transport, ⊕ equal to arithmetic addition, H(x) a randomising hash function that returns a random number from an IP address, and N(x) a function that masks an IP address to return its /28 netmask. Assuming x represents a contiguous subset of IP addresses, then in figure 3 graph (a) shows the distribution of the H-term (a = 0), and graph (b) shows the distribution of the N-term (a = 1). If a regression line were drawn, the correlation coefficient r for the
M. Sifalakis, A. Mauthe, and D. Hutchison
[Figure 3 panels: (a) s-keys for a = 0; (b) s-keys for a = 1; (c) s-keys for a = 0.3; (d) s-keys for a = 0.75; x-axis: x (0–60)]
Fig. 3. Relationship between s-keys and network address for different values of a
H-term would be close to 0, showing the independence of the h-keys from the corresponding IPs, whereas for the N-term r is close to 1, showing the dependence on the IP addresses. Graphs (c) and (d) show the distribution of the s-keys S(x) for intermediate values of a, implying a stronger (d) or weaker (c) relationship to the IP addresses. It is therefore possible to customise the representation of s-keys in the DHT layer, in order to include/exclude locality information and control the number of address resolution steps. This, combined with the flexibility to select a search algorithm, allows the customisation of the DHT layer for various environments.

The Directory Access Layer. On top of the DHT layer in the SAND system is the directory layer, providing access to information about the active resources and services in a SAND node or area, their exact locations and their access permissions. Although it may be possible to integrate this function in the DHT layer, the not-always-structured nature of the s-keys constrains the ability to aggregate and perform effective indexing. This creates scalability problems due to the amount of information that needs to be accumulated at the BNs, increasing the memory requirements and the search cost. For this reason a more structured form of identifier is adopted for this layer, namely the ASN.1 object identifiers (OIDs) [16], which lend themselves well to directory-based solutions (fast searching, aggregation).

As the data model for the resource information, the LDAP system [14] has been selected. It builds upon the notion of OIDs and provides a lightweight, flexible and extensible solution, optimised for fast read operations. In LDAP, information is organised in objects and attributes describing those objects. Objects belong to classes defining their type. An extensible schema language describes how to interpret the information related to an object class and how to perform operations between attribute types.
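Returning to the S-function of equation (5), the following sketch instantiates the example above: ⊕ taken as arithmetic addition, H(x) a randomising hash, and N(x) a /28 mask. The helper names are illustrative, and values are folded into a small 0–63 range purely to mirror the plots in figure 3.

```python
import hashlib

def h_term(ip: str) -> int:
    """Randomising hash H(x): digest of the address, folded to 6 bits."""
    return int.from_bytes(hashlib.sha1(ip.encode()).digest()[:4], "big") % 64

def n_term(ip: str) -> int:
    """Normalisation N(x): mask the IPv4 address to its /28 prefix, folded to 6 bits."""
    octets = [int(o) for o in ip.split(".")]
    addr = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]
    return ((addr >> 4) << 4) % 64   # zero the 4 host bits of a /28

def s_key(ip: str, a: float) -> int:
    """S(x) = a*N(x) + (1-a)*H(x), with the operator taken as arithmetic addition."""
    return round(a * n_term(ip) + (1 - a) * h_term(ip))
```

With a = 1 the s-key is fully determined by the network prefix (graph (b)); with a = 0 it is fully random (graph (a)); intermediate values blend the two.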
Objects follow a hierarchy (imposed by the OIDs) for efficient searching, called the directory information tree (DIT). To improve search performance, the scope of a search operation can be limited to specific parts of the DIT through search filters.
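A toy illustration (hypothetical OIDs, attribute names and entries) of how the OID-imposed hierarchy lets a search be scoped to a sub-tree of the DIT and then filtered on attributes:

```python
# Toy DIT: entries keyed by dotted OID, each holding a dict of attributes.
dit = {
    "1.3.6.1.4":     {"objectClass": "area"},
    "1.3.6.1.4.1":   {"objectClass": "sandNode", "name": "K"},
    "1.3.6.1.4.1.7": {"objectClass": "resource", "type": "javaEE"},
    "1.3.6.1.4.2":   {"objectClass": "sandNode", "name": "L"},
}

def search(base_oid, attr, value):
    """Scope the search to the sub-tree under base_oid, then apply an attribute filter."""
    scope = {oid: e for oid, e in dit.items()
             if oid == base_oid or oid.startswith(base_oid + ".")}
    return [oid for oid, e in scope.items() if e.get(attr) == value]
```

Restricting `base_oid` to a sector's root is what confines a query to a single area's part of the DIT.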
In SAND each node maintains its own complete, locally rooted DIT, which is partitioned into a number of sectors equal to the number of areas the SAND node is a member of (figure 2). Each sector represents a sub-tree dedicated to one area, holding specific information related to that area. All sectors converge at the RootDN of the SAND node's³ DIT. The sub-tree structure of each sector needs to be shared by all area members and is determined on a per-area basis. This sub-tree forms the area virtual DIT and provides the SAND nodes in the area with a common reference structure for their intra-area communications.

Since all members of an area need a synchronised view of the virtual DIT structure, we need an exchange mechanism for the area-wide graph, along with s-key information indicating the nodes responsible for each part of the DIT. As of version 3 of LDAP, each entity maintains within the directory so-called subschema information that encodes, among other things, its DIT structure, so this can be accessed through the standard LDAP interface. In SAND the subschema is extended to include s-key based attributes. The exchange/refresh of the virtual DIT across the whole area is facilitated by sequenced updates to a subset of s-keys in the key tables.

Interfacing the directory layer with the DHT layer requires the representation of a node in the LDAP data model. A SAND entity (node or area) is readily described as a SandIdObject in the schema, storing all the node-specific information. The DIT structure, as well as the information exchange model (push/pull, periodic/immediate), is customisable per area.

SAND Information Aggregation and Indexing. Consider the two virtual DIT structures in sectors 1 and 2 in figure 2, and a SAND node that has received a query for a Java EE.
If not enough information is available locally to answer the query by consulting the virtual DIT map, then in case (a) the node will propagate the query to node K (or respond with a referral to K). In case (b), though, the process would be significantly more expensive, as the node would have to send the query in parallel to all active nodes and wait for their responses. If, however, some index information were available about the actual contents of the area in case (b), the SAND node could either know that only L has a Java EE and so answer the query itself, or have hints that only node L might have a Java EE and so prune and propagate the query only to L. In either case the amount of consumed resources would be significantly smaller.

SAND provides an optional indexing framework, customisable to match different configurations, trading off memory for search efficiency. Index datasets, i.e. record-sets of indexed data ("hints" indicating where more explicit information can be found), are stored within a node-local directory as directory objects. The effectiveness of the indexing mechanism relies largely on the ability to produce small and quickly searchable datasets.

The index aggregation mechanism, i.e. the process of combining index records in a dataset, is a two-step operation on a dataset D: (a) collecting index information and (b) reducing it. The more important step is the Reduce operation, which comes in two types (and therefore yields two types of aggregation), namely implicit and explicit (algebraic) reduction. In implicit aggregation the Reduce operation is defined as the intersection of the indexes in a dataset D:
³ Note the differentiation between a "SAND node", which refers to the hosting device, and a "DIT node", which refers to a location in the DIT.
If D = {I1, I2, …, Id},

IReduce(D) = I1 ∩ I2 ∩ … ∩ Id    (6)
In practice this means that the reduced index will have as its OID the common prefix of the OIDs of all the indexes in the dataset, and as attributes only the attribute types common to all the indexes in the dataset (an example is given in [13]). In explicit aggregation (EReduce), on the other hand, one can define any custom algebraic operation to modify the indexes in a dataset. In practice this may involve combining indexes, replacing one index with another, changing the OID type, etc. Occasionally an explicit aggregation operation will simply modify the information in an index so as to make it implicitly reducible. Functions that perform explicit aggregation can be loaded at run time as dynamic library plugins. Information on how an index can be aggregated with other indexes in a dataset is provided through the eReduce and iReduce attributes of the IndexObject class, and may be area-wide or area-specific.

Finally, for the index exchange one can use the standard LDAP interface, since the datasets reside within the directory. It is more efficient, however, if this process occurs at a lower priority than actual query servicing, and so SAND uses the Common Indexing Protocol [17], a generic mechanism for establishing indexing topologies.
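The implicit Reduce of equation (6) — common OID prefix, common attribute types — might be sketched as follows. The dict-based index representation is an assumption made for illustration, not SAND's actual data layout.

```python
def common_oid_prefix(oids):
    """Longest common dotted prefix of a collection of OIDs."""
    parts = [oid.split(".") for oid in oids]
    prefix = []
    for column in zip(*parts):          # walk the OID arcs position by position
        if len(set(column)) != 1:       # stop at the first disagreement
            break
        prefix.append(column[0])
    return ".".join(prefix)

def i_reduce(dataset):
    """Implicit Reduce: intersect the indexes in the dataset —
    common OID prefix as the OID, common attribute types as the attributes."""
    oid = common_oid_prefix([idx["oid"] for idx in dataset])
    common_attrs = set.intersection(*[set(idx["attrs"]) for idx in dataset])
    return {"oid": oid, "attrs": sorted(common_attrs)}
```

The reduced index is deliberately lossy: it is a hint pointing to where more explicit information can be found, which is exactly what keeps the datasets small.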
4 Deployment and Operation

4.1 Start Up and Initialisation

Once an active node has the SAND framework installed and initialised, it will typically try to do one or more of the following (depending on the policy configuration):

a) Generate s-keys from its identification information for every area in which it participates (defined in the bootstrap configuration). This will involve at least the self-area.
b) Build the DHT structures in the bottom layer for every area it is a member of.
c) Construct the DIT and populate the directory with local resource information.
d) If indexing is enabled, initialise its local index structures, datasets, and plugins.
e) Start "beaconing" for the areas it is a member of, so as to announce its existence and invite other nodes to join.
f) Start listening for join invitations from other areas.

The mechanisms for performing the last two steps are outside the scope of this work, since semantically they are aspects of self-association mechanisms.

4.2 Join and Leave Operations

When a SAND node in listen mode receives a "beacon" about the existence of an area, it needs to (a) decide whether it "wants" to join and (b) find out whether it is allowed to join. Both are subject to bilateral policies. First there is a negotiation/authentication phase, during which a set of preconditions has to be checked. To comply with the overall philosophy, in the current prototype these preconditions are expressed as a set of attributes in a JoinInfoObject object at a fixed location in the DIT. The joining node can contact the beacon node using the standard LDAP interface to query and check the attributes of interest in this object and decide whether joining complies with its local policies. The beaconing node can optionally do the same.
Once the joining node has decided to proceed, it issues a join request at the DHT level, using the standard join method of whatever DHT system is in use. At this point it is possible for the area node to reject the join request. If the newcomer completes the join successfully, this means that it has already initialised its DHT-layer data structures and has a valid s-key. In order to participate in the SAND area at the directory level, it initially undergoes a quality assessment period during which it operates simply as a replicating node. During this process a supervisor node (typically the one that sent the "invite" beacons) is responsible for "keeping an eye" on the new member and assessing it, by occasionally issuing requests as a client in order to verify the validity of its contents. After the assessment period is over (as decided by the supervising area node), and depending on the area-specific configuration, the new member may be assigned responsibility for a specific part of the DIT or remain a replicator.

If a node wants to leave the area gracefully, it deregisters its resources from the directory, removes its s-key from the virtual DIT and triggers an update, and finally informs its peers at the DHT level accordingly.

Finally, if a node is to become a BN or a DN, the following actions are taken:
- As a BN, it establishes an index topology for the area's contents, and generates and advertises an area s-key so as to respond to external requests on behalf of the area.
- As a DN, it builds domain ID information so as to be able to respond to queries originating outside the domain.
4.3 Service Operation

When an entity wants to find an active resource, it issues a request to its nearest SAND point through the abstract interface described in (2), or otherwise using the standard LDAP interface. The request is then decoupled into three parts: (a) where the service must be provided, (b) what service needs to be found, and (c) heuristics for making the request more or less specific. Each of the first two parts produces a component of a unique request identifier:

Request-ID ≡ location-ID │ resource-ID
location-ID ≡ s-key = S(location)
resource-ID ≡ resource OID = ASN.1(resource)

The location-ID (s-key) is resolved in the DHT (bottom) layer of the SAND architecture by contacting the SAND node responsible for the hyper-area and domain that correspond to that s-key. This node, being part of that location-ID namespace, can then produce the correct resource-ID (OID) and perform an internal query for it at the directory level. The resource-ID is looked up in the directory (upper) layer. Note that, for scalability and flexibility, the resource-ID has local scope only within the location-ID namespace and is therefore late-bound to an OID only after the location-ID has been resolved. If there is not enough information available locally, there should be enough index information to know whom to contact next, and so a recursion of s-key and OID lookup steps follows within the hyper-area until the information can be obtained. The search "halts" either when the information has been found or when there is not enough information to continue the lookup process.
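The request-identifier decomposition above can be sketched as follows. The location expression and the hash-based stand-in for the S-function are illustrative assumptions; the point is only that the two halves come from different namespaces and that the resource half is late-bound.

```python
import hashlib

def location_skey(location: str) -> str:
    """Stand-in S-function: here simply a short hash of the abstract location."""
    return hashlib.sha1(location.encode()).hexdigest()[:8]

def make_request_id(location: str, resource_oid: str) -> str:
    """Request-ID = location-ID | resource-ID. The resource OID is late-bound:
    it is only meaningful inside the namespace the location-ID resolves to."""
    return location_skey(location) + "|" + resource_oid
```

A resolver would first route on the left half at the DHT layer, and only then hand the right half to the directory layer of the responsible node.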
At every lookup step of the search process, the server side may respond with a referral to a new location where the search should continue, or act as a proxy in chained mode and perform the next search on behalf of the requester.
5 Related Work and Outlook

This paper presented the design of a service discovery architecture called SAND. It provides a distributed and dynamic architecture that lends itself well to non-uniform and multi-transport network environments. It enables the discovery of active resources along or alongside a network path, or within a given network neighbourhood. The main strengths of SAND are its customisability and its cluster-based architecture, which promotes scalability for different network sizes.

So far there has not been much work in the area of active resource discovery in heterogeneous networks. None of the approaches we are aware of considers a resource discovery infrastructure that is independent of, or customisable to, diverse network environments. Moreover, they have limited flexibility, as they require the discovery process to complete "atomically" before service deployment. SAND, on the other hand, is very flexible since:
- The discovery process can precede and succeed partial deployment of services.
- It can facilitate location discovery by using abstract and fuzzy expressions of a network location, which can be independent of a specific network transport.
Service discovery mechanisms such as those advocated in [19, 20, 21] are not suitable for the application environments we consider, because they are either non-generic (not applicable to any network), non-scalable (they assume deployment of multicast), or fairly static (they rely on centralised registries in fixed locations). In [22] the authors consider a more dynamic approach, which enables the discovery of active nodes that are not on the data path. However, it relies on external services such as DNS to map the network domain topology consistently into the domain namespace, thus assuming that proximity in the network is reflected in the domain names.

An approach which has inspired SAND in some aspects is presented in [23]. The HIGS algorithm combines the discovery and deployment processes in one architecture. In HIGS, SAND could provide the mechanism for performing the solicitation and summarisation steps. The authors in [24] provide a mechanism for discovering active nodes in close vicinity. Although not a resource discovery mechanism per se, this work can be used in conjunction with SAND as a self-association mechanism for discovering SAND areas and nodes in close range. In [25] the authors devised a sophisticated approach for optimally selecting service points in a network between two end nodes. Although the deployment of composite services in the network is the focus of that work, the authors do not address the discovery aspect, thus assuming an external mechanism like SAND to provide a number of available service nodes.

SAND is currently being implemented and refined. On-going work based on simulation aims to compare the performance of SAND against a fully DHT-based solution. Future work will study more extensively the various aspects of unattended creation of
index topologies, in order to further improve its customisability. We envision that the proposed architecture will provide a sustainable solution for autonomic infrastructures.

Acknowledgements. Special thanks to Stefan Schmid (NEC Labs Europe) for the insightful discussions on the topic.
References

[1] Wakeman, I., Jeffrey, A., Owen, T., Pepper, D.: SafetyNet: A Language-Based Approach to Programmable Networks. Computer Networks and ISDN Systems 36(1) (2001)
[2] The Caml Language. Online Reference, INRIA, http://caml.inria.fr/
[3] Wetherall, D.J., Guttag, J., Tennenhouse, D.L.: ANTS: A toolkit for building and dynamically deploying network protocols. In: Proc. of IEEE OPENARCH (April 1998)
[4] Hicks, M.W., Kakkar, P., Moore, J.T., Gunter, C.A., Nettles, S.: PLAN: A Packet Language for Active Networks. In: Proceedings of the 3rd ACM SIGPLAN International Conference on Functional Programming, pp. 86–93 (1998)
[5] Peterson, L., Gottlieb, Y., Hibler, M., Tullmann, P., Lepreau, J., Schwab, S., Dandekar, H., Purtell, A., Hartman, J.: An OS Interface for Active Routers. IEEE Journal on Selected Areas in Communications 19(3), 473–487 (2001)
[6] Merugu, S., Bhattacharjee, S., Zegura, E., Calvert, K.: Bowman: A Node OS for Active Networks. In: Proceedings of IEEE INFOCOM 2000, Tel Aviv, Israel, March 26-30 (2000)
[7] Keller, R., et al.: An Active Router Architecture for Multicast Video Distribution. In: Proc. of IEEE INFOCOM, vol. (3), pp. 1137–1146 (2000)
[8] Keller, R., et al.: PromethOS: A dynamically extensible router architecture supporting explicit routing. In: Sterbenz, J.P.G., Takada, O., Tschudin, C.F., Plattner, B. (eds.) IWAN 2002. LNCS, vol. 2546, pp. 20–31. Springer, Heidelberg (2002)
[9] Schmid, S., Finney, J., Scott, A.C., Shepherd, W.D.: Component-based Active Network Architecture. In: IEEE Symposium on Computers and Communications (July 2001)
[10] Merugu, S., Bhattacharjee, S., Chae, Y., Sanders, M., Calvert, K., Zegura, E.: Bowman and CANEs: Implementation of an Active Network. In: Proc. of 37th Conference on Communication, Control and Computing (September 1999)
[11] Bossardt, M., Antik, R.H., Moser, A., Plattner, B.: Chameleon: Realising Automatic Service Composition for Extensible Active Routers. In: Wakamiya, N., Solarski, M., Sterbenz, J.P.G. (eds.) IWAN 2003. LNCS, vol. 2982. Springer, Heidelberg (2004)
[12] Eugene Ng, T.S., Zhang, H.: A Network Positioning System for the Internet. In: USENIX Annual Technical Conference (2004)
[13] Sifalakis, M., Mauthe, A., Hutchison, D.: SAND: A Scalable, Distributed and Dynamic Active Networks Directory Service. Technical Report TR-COMP-008-2005, Lancaster University (July 2005)
[14] Wahl, M., Howes, T., Kille, S.: Lightweight Directory Access Protocol (v3). RFC 2251 (December 1997)
[15] Plaxton, C.G.: On the network complexity of selection. In: Proc. of Annual Symposium on Foundations of Computer Science (October 1989)
[16] Abstract Syntax Notation One (ASN.1) and ASN.1 Encoding Rules. ITU-T Rec. X.680–683 and X.690–693 (2002)
[17] Allen, J., Mealling, M.: The Architecture of the Common Indexing Protocol (CIP). RFC 2651 (August 1999)
[18] Sifalakis, M., Schmid, S., Chart, T., Hutchison, D.: A Generic Active Service Deployment Protocol. In: Proc. of ANTA 2003 (May 2003)
[19] Veizades, J., Guttman, E., Perkins, C., Kaplan, S.: Service Location Protocol. RFC 2165 (June 1997)
[20] Microsoft Corporation: Windows Server 2003 Active Directory (2003)
[21] Gulbrandsen, A., Vixie, P., Esibov, L.: A DNS RR for specifying the location of services (DNS SRV). RFC 2782 (February 2000)
[22] Karrer, R., Gross, T.: Location Selection for Active Services. Cluster Computing: Journal of Networks, Software and Applications (March 2002)
[23] Haas, R., Droz, P., Stiller, B.: Autonomic service deployment in networks. IBM Systems Journal 42(1), 150–164 (2003)
[24] Martin, S., Leduc, G.: Dynamic Neighbourhood Discovery Protocol for Active Overlay Networks. In: Wakamiya, N., Solarski, M., Sterbenz, J.P.G. (eds.) IWAN 2003. LNCS, vol. 2982. Springer, Heidelberg (2004)
[25] Keller, R., et al.: Active Pipes: Service Composition for Programmable Networks. In: Proc. of IEEE MILCOM 2001 (2001)
A Programmable Structured Peer-to-Peer Overlay

Marius Portmann¹, Sébastien Ardon², and Patrick Sénac²

¹ School of Information Technology and Electrical Engineering, University of Queensland, Brisbane QLD 4064, Australia
[email protected]
² DMI / ENSICA, 1 place Emile Blouin, 31000 Toulouse, France
{sardon,senac}@ensica.fr
Abstract. Structured peer-to-peer (P2P) overlays are scalable, robust and self-organizing in nature, and provide a promising platform for a range of large-scale distributed applications. Applications proposed to date utilize a similar key-based routing service but "re-invent the wheel" by deploying their own dedicated structured P2P overlay network. This is highly inefficient and results in a significant duplication of work in terms of development, deployment and maintenance of the overlays. To address this problem, we propose a PROgrammable STructured P2P infrastructure (PROST), which allows the dynamic and incremental deployment of multiple applications over a single structured P2P overlay. In this paper, we outline the PROST architecture and discuss the implementation of our prototype.

Keywords: Programmable networks, peer-to-peer.
1 Introduction

In recent times, structured peer-to-peer (P2P) overlay networks have attracted a lot of attention in the research community. These systems are also referred to as Distributed Hash Tables (DHTs), since hash tables are a common service abstraction implemented by structured P2P overlays. All the proposed structured P2P systems (e.g. [1], [2], [3]) share the features of an efficient lookup mechanism, fault tolerance, scalability and a self-organizing nature. These characteristics make structured P2P systems an ideal building block for a wide range of large-scale distributed applications.

All currently proposed applications using structured P2P overlays are tightly integrated with their own implementation of a dedicated structured P2P overlay network. Obviously, this results in an undesirable duplication of effort and, more importantly, in a higher cost of operation, in particular in terms of network traffic, as every application with its own P2P overlay separately incurs the cost involved in deploying and maintaining it.

D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 145–155, 2009. © IFIP International Federation for Information Processing 2009

To address this problem, we present a PROgrammable STructured P2P infrastructure (PROST), an overlay architecture which allows multiple applications to share a common overlay network. This is achieved by allowing the dynamic and incremental deployment of applications and services onto overlay nodes. Our proposed infrastructure allows P2P applications to be run much more efficiently, since the cost for
deployment and maintenance of the overlay network is amortized over multiple applications. Finally, the development of new applications is also greatly facilitated by providing developers with a simple API to access the basic services provided by structured P2P systems.

The remainder of this paper is organized as follows: Section 2 gives a brief background on structured P2P systems. In Section 3 we introduce PROST, our programmable P2P platform, and outline its architecture. Section 4 discusses the dynamic deployment of applications in PROST. In Section 5 we present our proof-of-concept implementation of PROST. Sections 6 and 7 discuss remaining challenges and related work, and Section 8 concludes the paper.
2 Structured P2P Overlays – Background

Structured P2P overlays have recently gained popularity due to their ability to locate objects in a network very efficiently, with a cost of typically O(log N) messages exchanged, where N is the number of nodes in the overlay. This is in contrast to unstructured P2P systems, where lookup operations are based on flooding messages in the network, resulting in a cost that scales linearly with the number of nodes [20].

At the heart of every structured P2P system is a key-based routing (KBR) mechanism [8]. Every node in the overlay is assigned an identifier or nodeID from a large identifier space, typically 128 or 160 bits. Application-specific objects (files, database records, etc.) are assigned unique identifiers called keys, chosen from the same identifier space. Keys are typically assigned to an object by hashing the name of the object. Based on its key, each object is mapped to a node that is responsible for it, called its root node. The details of how this mapping is done vary between P2P systems. In Pastry [2], for example, keys are mapped to the node with the closest nodeID. This key-to-node mapping is implemented by the KBR mechanism through the routing of lookup messages to their destination nodes, i.e. the root nodes. To achieve this efficiently, the overlay topology is tightly controlled: each node has a small routing table with a number of carefully chosen links to peer nodes. Lookup messages are routed to their destination along these overlay links, typically in O(log N) hops.

The key-to-node mapping of the KBR mechanism forms the basis of all structured P2P systems, upon which a range of higher-layer services and applications can be implemented. Figure 1 shows a layered model of structured P2P overlays as proposed in [8]. The key-based routing (KBR) layer represents the greatest common denominator of all structured P2P overlay systems.
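A minimal sketch of the Pastry-style key-to-node mapping just described. The numeric-closeness rule and the small identifier space are illustrative simplifications; real systems use 128- or 160-bit identifiers and ring arithmetic.

```python
import hashlib

def node_id(name: str, bits: int = 32) -> int:
    """Assign an identifier from the shared space by hashing the node's name."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % (1 << bits)

def root_node(key: int, nodes) -> int:
    """Pastry-style mapping: the root is the live node whose ID is numerically
    closest to the key (ties broken towards the lower ID)."""
    return min(nodes, key=lambda n: (abs(n - key), n))
```

The KBR layer's job is then to route a lookup message for `key` towards `root_node(key, …)` using only each node's small routing table, rather than global knowledge as in this sketch.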
As explained previously, this layer is responsible for routing messages to keys' root nodes. Built on top of it are higher-layer communication abstractions, such as Distributed Object Location and Routing (DOLR), Distributed Hash Tables (DHT), and group anycast/multicast (CAST), which essentially provide different communication primitives on which to build applications. Finally, applications reside on the third layer. The applications shown in Figure 1 are distributed storage systems (CFS [7], PAST [12], OceanStore [4]), group communication/multicast systems (Scribe [9], Bayeux [10]), content distribution (SplitStream [5]) and a generic Indirection Infrastructure (I3) [6].
[Figure 1: Layer 3 – Applications (CFS, PAST, I3, Scribe, SplitStream, Bayeux, OceanStore); Layer 2 – Higher Layer Abstractions (DHT, CAST, DOLR); Layer 1 – Key-based Routing Layer (KBR)]
Fig. 1. A Layered Model of Structured P2P Systems
The higher-layer abstractions shown at layer 2 of Figure 1 are quite diverse, and future applications will require an even wider range of functionality and service abstractions. This makes the implementation of layer 2 as a static service layer, in our view, difficult if not impossible. With PROST, we propose to implement the basic KBR layer as a shared static infrastructure, while layers 2 and 3 are implemented via a programmable layer that allows dynamic and on-demand deployment of applications and services. The following sections describe the PROST architecture of our programmable structured P2P system in more detail.
3 PROST – A Programmable Structured P2P Overlay

Even though all current structured P2P systems provide the same basic service of key-to-node mapping via the key-based routing mechanism, they all export different APIs with slightly different semantics. In order to implement a generic structured P2P infrastructure that is not tied to one particular system, we also need a generic API to access the basic services of structured P2P overlays. Such an API has been proposed by Dabek et al. in [8]. The authors show that the API can easily be implemented by all current structured P2P systems and that it is rich enough to allow the implementation of all current higher-layer service abstractions and applications. This makes the API an ideal building block for our proposed infrastructure.

[Figure 2 components: user interface components and peerlets on top of the Peerlet Manager (Programmable Peer Layer), above the Key-based Routing Layer (KBR)]
Fig. 2. PROST Node Architecture
Figure 2 illustrates the two-tiered architecture of a node in PROST. The KBR layer forms the basis of the infrastructure and is used to map keys onto nodes. Its API is similar to that proposed in [8]. On top of the KBR layer resides the Programmable Peer Layer, which comprises the functionality of layers 2 and 3 of Figure 1. Applications and services are deployed in this layer as dynamically loaded peerlets. Each of these components is discussed in more detail in the following sections.

3.1 Key-Based Routing Layer

The key-based routing layer forms the base of the PROST infrastructure and is used to map keys onto nodes. This is implemented via the API's route() method, which delivers a message to the node responsible for a given key (its root node). To enable multiple applications to share a common KBR layer, every message sent over the overlay network needs to include an application identifier (app id), allowing the demultiplexing of messages and their delivery to the appropriate application, i.e. peerlet, similarly to the port number in transport protocols. In addition, the route() API primitive can take a first node ID as a parameter. This mechanism allows the application to bypass the default KBR routing process and specify the first hop to use when sending a message (useful for some application-level multicast applications). The application identifier and application type parameters are necessary for the correct handling of messages and are therefore included in the header of each message routed by the KBR layer, as shown in Figure 3. The message further contains the destination key as well as a peerlet code locator parameter, which informs nodes about the mechanism and location for downloading the peerlet code. Applications running over structured peer-to-peer systems can be broadly classified into two categories.
The first type, which we call end-node applications, only requires the destination or end nodes of the routing path to perform application-specific operations. A typical example of an end-node application is any application based on the DHT concept, i.e. the storage of {key, value} pairs. The second type, which we call per-hop applications, involves intermediary nodes in the routing path performing per-hop operations on "their" messages before they are forwarded. This may include changing the next-hop ID (i.e. changing the routing of a message).
dst key │ app id │ app type │ peerlet code locator │ application data
Fig. 3. PROST message format
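A sketch of this message format as a data structure, together with the app-id demultiplexing it enables. The field types, the demux helper and the example locator URL are assumptions made for illustration, not part of PROST's actual implementation.

```python
from dataclasses import dataclass

END_NODE, PER_HOP = 0, 1   # the two values of the binary "app type" field

@dataclass
class ProstMessage:
    dst_key: int               # destination key, resolved by the KBR layer
    app_id: int                # demultiplexing id, akin to a transport port number
    app_type: int              # END_NODE or PER_HOP handling
    peerlet_code_locator: str  # how/where nodes can fetch the peerlet code
    payload: bytes             # opaque application data

def demux(msg: ProstMessage, peerlets: dict):
    """Deliver the message to the peerlet whose id matches, if any."""
    return peerlets.get(msg.app_id)
```

A node receiving a message for an unknown app id can consult the peerlet code locator to fetch and install the missing peerlet before delivering the message.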
To this end, the API defines a forward() callback method that is invoked at each node that forwards a message. This upcall informs the application that the message M with key k is about to be forwarded to the node with ID nextHopNode. Examples of applications requiring per-hop treatment are multicast routing [9], [10], application-level multicast infrastructures [9] and result aggregation [11]. For example, in the SCRIBE event notification service [9], nodes require the underlying key-based routing layer to invoke
A Programmable Structured Peer-to-Peer Overlay
149
its forward upcall in order to build the multicast event dissemination tree on the way from the subscriber to the rendezvous point. PROST uses the binary message parameter application type to differentiate between these two classes of applications, as shown in Figure 3. Note that intercepted messages are only sent to applications with an ID matching that of the message.

The KBR layer also provides mechanisms that deal with the transient and unreliable nature of individual nodes. To deal with node failures, the KBR layer defines a set of replication nodes, called a replica set, for each key. In case a root node is unavailable, the KBR layer is responsible for routing a message to one of the available replica nodes. Furthermore, the API's replicaSet() method gives the application access to the current set of replication nodes, in order to keep the replication nodes synchronized. Finally, the routing tables of peer nodes need to be maintained and updated appropriately in case of node failure, or in case of nodes joining and leaving the overlay. Applications can be informed of such events via the API's update() callback method. We refer to [8] for a detailed discussion of the API.

3.2 Programmable Peer Layer

Above the KBR layer, the programmable peer layer performs all the operations concerning peerlet deployment, execution and termination. As mentioned previously, peerlets are mobile code modules, which are dynamically loaded and installed, similarly to any plug-in architecture. Peerlets implement the actual P2P applications. The Peerlet Manager, shown in Figure 2, is responsible for the loading and instantiation of peerlets. It also enforces the node's local security and access control policy by controlling and limiting the peerlets' access to local resources such as CPU, storage and the network. It further provides security by isolating peerlets from the host system to limit the impact of malicious or faulty peerlets.
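The Peerlet Manager's role, instantiating peerlets and enforcing the node's local policy before granting installation, can be sketched as follows. This is an illustrative Python sketch (the actual prototype, described in Section 5, is Java-based); the class names and the policy callback are assumptions:

```python
class Peerlet:
    """Interface that peerlets implement; names are illustrative (the
    Java prototype of Section 5 defines its own Peerlet interface)."""
    def deliver(self, key, msg): ...
    def forward(self, key, msg, next_hop): ...

class PeerletManager:
    """Sketch of the Peerlet Manager: instantiates peerlets and consults
    the node's local policy before allowing an installation."""
    def __init__(self, policy=lambda app_id: True):
        self.policy = policy
        self.running = {}                 # app id -> peerlet instance

    def install(self, app_id, peerlet_class):
        if not self.policy(app_id):
            raise PermissionError("local policy refuses this peerlet")
        self.running[app_id] = peerlet_class()
        return self.running[app_id]

class EchoPeerlet(Peerlet):
    def deliver(self, key, msg):
        return (key, msg)

# A node whose (toy) policy refuses one particular application id.
mgr = PeerletManager(policy=lambda app_id: app_id != 13)
echo = mgr.install(7, EchoPeerlet)
try:
    mgr.install(13, EchoPeerlet)
    refused = False
except PermissionError:
    refused = True
```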
Currently, resource management in PROST is kept to a basic sandbox model, as explained in Section 5. We are looking at applying existing research results, e.g. from the active network community, to further improve on this.

Applications on peer nodes in PROST consist of two components: a peerlet and a user interface component. The peerlet implements the server component of the application. Again, this functionality can be as simple as a DHT, but it can be arbitrarily complex, depending on the application. The user interface component of an application allows end-users to access the services provided by the peerlet. This is typically a graphical user interface, but this is determined by the application and not restricted by our infrastructure. While peerlets can be deployed without a user interface component on a node, user interface components require a peerlet to be deployed on the same node.
4 Dynamic Application Deployment

In this section, we discuss how multiple applications can co-exist in PROST and how new applications can be dynamically deployed.

4.1 End-Node Applications

As mentioned previously, end-node applications only need to invoke application-specific functionality at the destination node of a message in the KBR layer. This is
M. Portmann, S. Ardon, and P. Sénac
Fig. 4. Message flow in End-node (a) and Per-hop Applications (b)
illustrated in Figure 4a. A simple DHT is an example of such an application. The intermediary nodes in the routing path are not involved in the application and simply forward messages to their destination. In a typical scenario, a peer application initiates an operation on another node designated by a key by invoking the route() method (cf. Figure 3), for example a put(key, data) operation in a DHT. This causes the KBR layer to route a message to the root node in charge of the key. In this case, the application type field tells intermediary nodes in the routing path that no application-specific code needs to be invoked, and the message is therefore simply forwarded according to the default routing rules of the KBR layer. Upon receiving the message, the root node looks up the message's application id and checks if the corresponding peerlet is installed. If this is the case, the message is delivered to the peerlet; otherwise, the node automatically downloads the necessary code from the source specified by the Peerlet Code Locator field in the message. Once the peerlet is loaded and instantiated, it is passed the message and can perform the necessary tasks.

4.2 Per-Hop Applications

As mentioned in Section 3.1, the second category of applications considered in PROST is per-hop applications, which require per-hop operations to be performed at the intermediary nodes along the routing path. This is illustrated in Figure 4b. Per-hop applications, in contrast to end-node applications, therefore require the corresponding peerlets to be installed at the intermediary nodes of a message's routing path. The process of dynamic application deployment is therefore slightly different from the previous case: at every node in the path, the application's peerlet is invoked via the forward() upcall of the KBR layer's standard interface (cf. Section 3.1).
This call passes the message up to the application and allows it to override the standard routing behavior or to perform application-specific tasks. As in the case of end-node applications, an operation begins with an application invoking the route() method to send a message to the node responsible for a key k. The message is forwarded to the first node in the path to its destination. In this case, the application type field specifies that application-specific functionality needs to be invoked at intermediary nodes. Therefore, the node calls the forward() method of the
peerlet with the specified application identifier. If the corresponding peerlet is not installed, the node's peerlet manager automatically downloads the code and installs it, in the same way as described in the previous section. Then, the peerlet's forward() method is called, which gives the application the opportunity to perform any per-hop operations. After the forward() method returns, the message is sent to the next node on its path and the process is repeated until it reaches its final destination.

4.3 Application Deployment Policing

In a general use case, most nodes will want to control which types of peerlets, and which particular peerlets, they run. This can be due to security or legal reasons. It is therefore possible that some nodes are not able or not willing to install and run certain peerlets. Note that this problem is irrelevant if we consider the use of PROST in a corporate context, or more generally when a trust relationship exists between nodes and peerlet providers. This problem of non-cooperating nodes is different for per-hop applications than for end-node applications. PROST requires per-hop applications to tolerate a certain number of non-cooperating intermediary nodes and to implement a graceful degradation of service. The impact of this limitation on the performance of several existing per-hop applications is currently under investigation. On the other hand, the situation where a message's destination node refuses to install a peerlet is more severe. This problem is treated in PROST in the same way as general node failures are treated in most P2P systems, by means of replication. As briefly discussed in Section 3.1, the KBR layer provides applications with the necessary mechanisms for implementing replication.
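The per-hop behaviour described in Sections 4.2 and 4.3, invoking forward() at every cooperating hop, installing the peerlet on demand, and degrading gracefully at non-cooperating nodes, can be sketched like this. Python is used for brevity; all names are illustrative, and installation is faked by a constructor call standing in for the Peerlet Code Locator download:

```python
def route_per_hop(path, msg, make_peerlet, installed, refusing=()):
    """Sketch of per-hop processing. `path` lists node ids up to the
    root; `installed` maps node id -> peerlet instance; `make_peerlet`
    stands in for downloading and instantiating the peerlet code.
    Nodes in `refusing` model non-cooperating hops: they perform no
    per-hop operation and simply forward the message."""
    for node in path[:-1]:                   # intermediary nodes
        if node in refusing:
            continue                         # graceful degradation
        if node not in installed:            # install on demand
            installed[node] = make_peerlet()
        msg = installed[node].forward(node, msg)
    root = path[-1]
    if root not in installed:
        installed[root] = make_peerlet()
    installed[root].deliver(msg)             # final delivery at root
    return msg

class CountingPeerlet:
    """Toy per-hop application: counts the hops that processed msg."""
    def forward(self, node, msg):
        return msg + 1
    def deliver(self, msg):
        self.final = msg

installed = {}
out = route_per_hop(["n1", "n2", "n3", "root"], 0,
                    CountingPeerlet, installed, refusing={"n2"})
```

Here the non-cooperating node "n2" never installs the peerlet, so only two of the three intermediary hops process the message.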
5 Implementation

We have implemented a proof-of-concept prototype of PROST in Java. The KBR layer of our prototype is based on the implementation of the CHORD [1] protocol from the XML-Store project [21], with a few minor modifications. We replaced the UDP-based RPC mechanism with Java's Remote Method Invocation (RMI). This increased the code's stability and allowed us to easily detect node failures in the CHORD overlay, which was not implemented in the original XML-Store code. The downside of using RMI for inter-node communication in a peer-to-peer overlay is the relatively high overhead. However, the focus of our first proof-of-concept implementation was simplicity and ease of development rather than performance.

We also implemented the standard API for structured P2P systems. In addition to the methods defined in [8], we implemented a lookupPeerlet() method. This recursive method takes a message (see Figure 3) and the corresponding parameters and delivers it to the root node responsible for the given key k, in the same way as the API's route() method, including the invocation of per-hop operations if required. In contrast to route(), the lookupPeerlet() method is blocking and returns an RMI reference to the peerlet the message was delivered to. With such a reference, peerlets can invoke any application-specific operations on each other, using Remote Method Invocation. For example, in the case of a DHT, the application would simply call the put() and get() methods of the remote peerlet.
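The lookupPeerlet() usage pattern, block until the message reaches the root node and then invoke the remote peerlet directly through the returned reference, might look like the following local stand-in. A plain object replaces the RMI stub, and the key-to-node mapping is a toy; none of this is the actual prototype code:

```python
class DhtPeerlet:
    """Local stand-in for the remote peerlet that lookupPeerlet() would
    return an RMI reference to; put/get mirror the DHT example above."""
    def __init__(self):
        self.store = {}
    def put(self, key, data):
        self.store[key] = data
    def get(self, key):
        return self.store.get(key)

def lookup_peerlet(key, overlay):
    # Blocking lookup: return a reference to the peerlet at the root
    # node responsible for `key`. A trivial modulo mapping stands in
    # for the overlay's key-based routing.
    return overlay[key % len(overlay)]

overlay = [DhtPeerlet() for _ in range(4)]
remote = lookup_peerlet(0x1F2E3D4C, overlay)
remote.put("song.mp3", b"...")      # invoke the "remote" peerlet directly
```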
Java's support for mobile code greatly facilitated the implementation of the Programmable Peer Layer. Peerlets are implemented as Java classes implementing the Peerlet interface. The corresponding class files are dynamically loaded by the Peerlet Manager from remote sources via Java's URLClassLoader. In our implementation, we used a standard web server as a peerlet code server, which can be authenticated and secured using standard methods. We are currently working on implementing the peerlet code server functionality in a distributed fashion as an integral part of PROST.

We currently employ two basic methods to protect a node from malicious (or faulty) third-party code. Firstly, peerlets executing on a PROST node are confined to a restricted environment, also referred to as a sandbox. The sandbox provides a separate name and address space for each peerlet and limits its access to functionality and resources on the host node. Secondly, PROST supports the concept of code signing, where each peerlet is digitally signed by its producer. When dynamically loading peerlet code, a node can verify its authenticity and the identity of its producer. This allows the loading of peerlets to be restricted to trusted sources only, depending on a node's local security policy.

To evaluate the design and usability of our infrastructure, we implemented two sample applications: an instant messaging (IM) application and a yellow pages-style service directory that maps service categories to a list of service providers. We successfully deployed and tested these applications simultaneously on a small overlay of 20 nodes on 5 physical machines. Deploying new P2P applications, even on small overlays, is typically a very tedious and costly operation. Application deployment in PROST is relatively easy and involves the following steps: first, a 128-bit application identifier is assigned pseudo-randomly, with a negligibly small probability of a collision.
Then, the peerlet code needs to be made available on a code server. An initial peerlet and the corresponding user interface component must be installed manually on one of the overlay nodes. Further deployment of peerlets is done automatically, as described in Section 4. For example, the creation of new service categories in our service directory application results in the automatic deployment of peerlets on the nodes responsible for these categories, i.e. the corresponding keys. Reliability through replication has not been implemented yet for our two sample applications. This is one of the issues that we will address in our future work.
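Two pieces of the deployment machinery above lend themselves to a short sketch: drawing the pseudo-random 128-bit application identifier, and the code-signing check of Section 5 that each automatic peerlet download can be subjected to. Everything here is illustrative (the prototype uses real digital signatures on Java class files; an HMAC stands in for them, and the trust store is invented):

```python
import hashlib
import hmac
import secrets

def new_app_id():
    """A fresh 128-bit application identifier, drawn uniformly at
    random. With n deployed applications the collision probability is
    roughly n^2 / 2^129, i.e. negligible for any realistic n."""
    return secrets.randbits(128)

# Hypothetical trust store: producer name -> verification key.
TRUSTED = {"acme-corp": b"demo-verification-key"}

def verify_peerlet(code, producer, signature):
    """Reject peerlet code unless it carries a valid signature from a
    producer the node's local policy trusts."""
    key = TRUSTED.get(producer)
    if key is None:
        return False                      # unknown producer: reject
    expected = hmac.new(key, code, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)

app_id = new_app_id()
code = b"class ServiceDirectoryPeerlet { ... }"
sig = hmac.new(b"demo-verification-key", code, hashlib.sha256).digest()
accepted = verify_peerlet(code, "acme-corp", sig)
tampered = verify_peerlet(code + b"!", "acme-corp", sig)
unknown = verify_peerlet(code, "mallory", sig)
```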
6 Future Work

A large number of issues remain to be addressed in PROST. The first challenge that comes to mind is the security of the KBR layer, as identified in [16]. These security problems are shared by all structured P2P systems. Some ideas proposed in [17] can be applied in the context of PROST and provide partial solutions to some of the problems. However, we believe the basis of any security mechanism in P2P overlays is strong and verifiable node identities. We are currently exploring the use of crypto-based IDs [19] for our infrastructure. We are also investigating the use of threshold cryptography [18] to implement a distributed admission control mechanism. Secondly, the resource management and resource policing aspects of PROST also need to be further developed. We believe some existing results from the active
network (AN) research can be re-used in this context. Thirdly, there are a number of performance aspects to be investigated, such as quantifying the impact of a larger overlay network on delays and on the probability of failures/cache misses. Finally, we plan to develop a range of applications to further evaluate our infrastructure and to carry out performance measurements on a medium-scale deployment of PROST.
7 Related Work

The lack of a generic shared infrastructure for structured P2P systems has also been identified as a problem in [13]. As a solution, the authors propose OpenHash (subsequently renamed OpenDHT), an open, publicly accessible DHT service that runs on a set of infrastructure hosts. It provides applications with the simple put/get interface of a hash table. Applications for which this interface is sufficient are called Lite Applications, and can be implemented directly on top of OpenHash. For the majority of applications, for which a simple DHT interface is not adequate, OpenHash serves as a distributed rendezvous mechanism that allows discovery and coordination between nodes implementing the same applications. The actual P2P applications are deployed on arbitrary nodes outside the OpenHash infrastructure. This represents a departure from the pure P2P paradigm, which stipulates symmetric roles for all nodes, and is one of the main points in which OpenHash differs from our approach. Finally, OpenHash does not provide a mechanism for the dynamic deployment of applications and services.

In [8], a standard API for the KBR layer, which is common to all structured P2P systems, is proposed. The authors go further and state their intention to define static APIs for a range of higher-layer abstractions (Layer 2 in Figure 1) such as DHTs, DOLR, CAST etc. This is in contrast to our approach, which only defines a static API for the KBR layer, but provides maximum flexibility and ease of deployment for higher-layer functionality via a programmable platform. In [3], the authors mention that their structured P2P system has an extensible API and can be shared by multiple applications. However, the paper does not address the details of this, and neither does it provide a mechanism to dynamically deploy new functionality and applications.
JXTA [22] is a project that defines a set of protocols and concepts for P2P computing, such as peer and resource discovery, peer communication and peer group management. One of the main points in which JXTA implementations differ from our proposed infrastructure is the way in which message routing is implemented. JXTA uses adaptive source-based routing, where routes are initially computed by the sender. This is in contrast to our proposal, which specifically focuses on the key-based routing mechanism of structured P2P systems. Furthermore, JXTA does not provide a mechanism for the dynamic deployment of applications.

Finally, the idea of a programmable overlay architecture is not new and has been proposed previously by a number of authors [14], [15]. However, to the best of our knowledge, our proposed architecture is the first to apply the idea of programmability in the context of structured P2P overlays to relieve overlay applications of maintaining their own dedicated overlay networks.
8 Conclusions

In this paper, we outlined the idea of PROST, a programmable structured P2P platform that allows the dynamic deployment of new distributed applications and services. It is built on top of a key-based routing layer that provides a scalable and efficient lookup service. The programmability of our proposed infrastructure is a new concept in the context of structured P2P networks. It allows the basic P2P routing infrastructure to be shared by multiple applications, thereby also sharing the cost of deployment and maintenance, while providing a maximum degree of flexibility to accommodate the requirements of a wide range of current and future applications. We believe that the availability of a shared and programmable infrastructure, as proposed in this paper, would greatly encourage and facilitate the innovation and development of new large-scale distributed applications using structured P2P functionality. The work presented here is in its early stages, with many unresolved issues. However, we believe our novel approach provides a promising solution that can serve as a versatile platform for a wide range of distributed applications.
References

1. Stoica, I., Morris, R., Liben-Nowell, D., Karger, D.R., Kaashoek, M., Dabek, F., Balakrishnan, H.: Chord: A Scalable P2P Lookup Protocol for Internet Applications. In: ACM SIGCOMM 2001, San Diego, CA (August 2001)
2. Rowstron, A., Druschel, P.: Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In: Guerraoui, R. (ed.) Middleware 2001. LNCS, vol. 2218, p. 329. Springer, Heidelberg (2001)
3. Zhao, B.Y., Huang, L., Stribling, J., Rhea, S.C., Joseph, A.D., Kubiatowicz, J.D.: Tapestry: A Resilient Global-scale Overlay for Service Deployment. IEEE Journal on Selected Areas in Communications (January 2004)
4. Kubiatowicz, J., Bindel, D., Chen, Y., Eaton, P., Geels, D., Gummadi, R., Rhea, S., Weatherspoon, H., Weimer, W., Wells, C., Zhao, B.: OceanStore: An Architecture for Global-Scale Persistent Storage. In: Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Cambridge, MA. ACM, New York (2000)
5. Castro, M., Druschel, P., Kermarrec, A., Nandi, A., Rowstron, A., Singh, A.: SplitStream: High-bandwidth multicast in cooperative environments. In: 19th ACM Symposium on Operating Systems Principles (2003)
6. Stoica, I., Adkins, D., Zhuang, S., Shenker, S., Surana, S.: Internet indirection infrastructure. In: Proceedings of ACM SIGCOMM (August 2002)
7. Dabek, F., et al.: Wide-area cooperative storage with CFS. In: Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP 2001), Banff, Canada (October 2001)
8. Dabek, F., et al.: Towards a common API for structured P2P overlays. In: Proceedings of IPTPS 2003, Berkeley, CA (February 2003)
9. Rowstron, A., Kermarrec, A.-M., Castro, M., Druschel, P.: SCRIBE: The design of a large-scale event notification infrastructure. In: Crowcroft, J., Hofmann, M. (eds.) NGC 2001. LNCS, vol. 2233, p. 30. Springer, Heidelberg (2001)
10. Zhuang, S.Q., Zhao, B.Y., Joseph, A.D.: Bayeux: An architecture for scalable and fault-tolerant wide-area data dissemination. In: Proceedings of the 11th ACM/IEEE NOSSDAV, New York (June 2001)
11. Huebsch, R., Hellerstein, J.M., Lanham, N., Thau Loo, B., Shenker, S., Stoica, I.: Querying the Internet with PIER. In: Proceedings of the 29th International Conference on Very Large Data Bases (VLDB 2003), Berlin, Germany, September 9-12 (2003)
12. Druschel, P., Rowstron, A.: PAST: Persistent and anonymous storage in a peer-to-peer networking environment. In: Proceedings of the 8th IEEE Workshop on Hot Topics in Operating Systems (HotOS-VIII) (2001)
13. Karp, B., Ratnasamy, S., Rhea, S., Shenker, S.: Spurring adoption of DHTs with OpenHash, a public DHT service. In: Voelker, G.M., Shenker, S. (eds.) IPTPS 2004. LNCS, vol. 3279, pp. 195–205. Springer, Heidelberg (2005)
14. Fry, M., Ghosh, A.: Application Level Active Networking. Computer Networks 31(7), 655–667 (1999)
15. Ardon, S., Gunningberg, P., Ismailov, Y., Landfeldt, B., Portmann, M., Seneviratne, A., Thai, B.: Mobile Aware Server Architecture: A distributed proxy architecture for content adaptation. In: Proceedings of the 11th Annual Internet Society Conference (INET 2001), Stockholm, Sweden (June 2001)
16. Sit, E., Morris, R.: Security considerations for peer-to-peer distributed hash tables. In: Druschel, P., Kaashoek, M.F., Rowstron, A. (eds.) IPTPS 2002. LNCS, vol. 2429, p. 261. Springer, Heidelberg (2002)
17. Castro, M., Druschel, P., Ganesh, A.J., Rowstron, A., Wallach, D.S.: Secure Routing for Structured Peer-to-Peer Overlay Networks. In: Proceedings of the 5th Symposium on Operating System Design and Implementation (OSDI 2002), Boston, Massachusetts, USA, December 9-11, 2002. USENIX Association (2002)
18. Saxena, N., Tsudik, G., Yi, J.H.: Admission Control in Peer-to-Peer: Design and Performance Evaluation. In: Proceedings of the 1st ACM Workshop on Security of Ad Hoc and Sensor Networks, Fairfax, Virginia (2003)
19. Montenegro, G., Castelluccia, C.: Crypto-based identifiers (CBIDs): Concepts and applications. ACM Transactions on Information and System Security (TISSEC) 7 (2004)
20. Lv, Q., Cao, P., Cohen, E., Li, K., Shenker, S.: Search and Replication in Unstructured Peer-to-Peer Networks. In: Proceedings of the 16th Annual ACM International Conference on Supercomputing (ICS 2002), New York, USA (2002)
21. Thorn, T., Fennestad, M., Baumann, A.: A Distributed, Value-Oriented XML Store. Master's Thesis, IT University of Copenhagen (2002)
22. Traversat, B., Abdelaziz, M., Duigou, M., Hugly, J., Pouyoul, E., Yeager, B.: Project JXTA Virtual Network. Sun Microsystems Inc. (October 28, 2002), http://www.jxta.org
Interpreted Active Packets for Ephemeral State Processing Routers Sylvain Martin and Guy Leduc Research Unit in Networking, Université de Liège, Institut Montefiore B28, 4000 Liège 1, Belgium {martin,leduc}@run.montefiore.ulg.ac.be http://www.run.montefiore.ulg.ac.be/
Abstract. We propose WASP (lightweight and World-friendly Active packets for ephemeral State Processing), a new active platform based on ephemeral state, designed to allow bytecode interpretation on programmable datapath elements. We designed WASP to be a good compromise between flexibility (e.g. offering solutions for quality-adaptive multimedia flows, service discovery or mobility support) and safety (i.e. protection of router and network resources).
1 Introduction and Motivations

With the emergence of network processors, the way we design active networks has evolved [14]. Older active platforms like ANTS [1] offer fully featured environments supporting complex functions like transcoding video flows, redistributing a packet towards a collection of targets, etc. In contrast, SNAP (Safe and Nimble Active Packets, from the SwitchWare project [5]) and, more recently, ESP (the Ephemeral State Processing router designed at the University of Kentucky [4]) stress that active processing should remain safe and efficient in addition to being flexible (in a word, practical [2]). WASP is an attempt to merge the benefits of these two platforms to offer a better compromise between users' and network operators' expectations about active networking. The WASP router keeps the focus on routing packets, with the option of performing very simple tasks (e.g. tasks that are "too cheap to measure" compared to packet forwarding) that can help applications in end-systems to take better decisions or locally improve flow efficiency. In other words, we can express the constraints we put on the WASP design as follows:

User-Friendly: WASP should offer significant programmability while allowing the end-user to know to what extent he can trust what he gets from the active network.

Network-Friendly: WASP should not become a nuisance to networks (and operators). It should require no configuration from the operator, and the network load it produces should be predictable.

Router-Friendly: Active packets should not be able to harm a router nor degrade its performance.
Sylvain Martin is a Research Fellow of the Belgian National Fund for Scientific Research (FNRS). This work has been partially supported by the Belgian Science Policy in the framework of the IAP program (Motion PS/11 project) and by the E-Next European Network of Excellence.
D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 156–167, 2009. c IFIP International Federation for Information Processing 2009
1.1 The Need for Speed

The speed at which our active platform can process active packets will determine the network locations where it can actually be deployed. With the appearance of network processor devices that offer datapath programmability at rates of up to 1 Gbps, we may expect a network environment where programmable nodes are not simply end-systems in an overlay but even border routers in a transit domain.

A simple active packet crossing an active node will incur different types of time-consuming operations, among which delivery to the software component that will evaluate it and unmarshalling of high-level language abstractions can be a significant share of the total processing time (up to 42% and 32% of the smallest ANTS capsule respectively, according to [10]). Considering these results, second-generation active platforms have migrated from user-level space to kernel level and tend to operate directly on packet data rather than requiring serialization/deserialization between the representation the active code uses and the one stored in the packet. Because it is extremely small, the WASP microbytes interpreter can be made safe enough to run with the same privileges as the "fast path" of the router. Moreover, it requires no marshalling cost, as it operates directly on the packet as if it were flat memory rather than trying to assign strong types to data objects.

1.2 How Network Processors Differ from General-Purpose Processors

We developed the WASP platform with two target platforms in mind: regular PC systems (where it could, for instance, run as a Linux kernel module) and routers equipped with network processors, where it is possible to implement efficiently solutions that require custom operations on packets. Network processors typically include a general-purpose processor (for control plane purposes), specialized co-processors (for hashing or trie lookup) and dedicated RISC cores for the programmable datapath on a single chip.
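The "flat memory" point from Section 1.1 can be illustrated concretely: fields are read in place from the raw packet bytes, with no unmarshalling into typed objects. The 12-byte layout below (a 64-bit key followed by two 16-bit counters) is invented purely for illustration, not WASP's actual packet format:

```python
import struct

# A fake packet: 64-bit key, then two 16-bit counters, network byte order.
PACKET = struct.pack("!QHH", 0xDECAFBAD, 3, 7)

def read_u16(buf, offset):
    """Read a 16-bit field in place, without copying or deserializing
    the rest of the packet."""
    return struct.unpack_from("!H", buf, offset)[0]

key = struct.unpack_from("!Q", PACKET, 0)[0]   # key at offset 0
hops = read_u16(PACKET, 8)                     # first counter at offset 8
```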
These chips try to avoid SDRAM lookup latencies¹ by providing closer, faster (and smaller) storage for both data and code, and often access multiple words in a row. These hardware considerations have an impact on the designs we can implement efficiently on such systems. In addition, in the Intel IXP family [9], each execution unit (called a microengine) has a very small instruction memory (2K-4K) whose content cannot be reprogrammed while the microengine is in use. It is thus quite impractical to implement a platform that would cache native code at the microengine level. Moreover, if one wishes to implement a bytecode interpreter, it will have to be simple enough to fit in a few microengines.

1.3 Third-Party Services: A Hybrid Networking Approach

The literature on active applications has shown how end-users could benefit from high-level operations such as hierarchical HTTP caches and multimedia flow transcoding depending on terminal/network abilities. A network operator who is willing to offer such "value-added services" to customers will however face the problem of client-side configuration and service advertisement. In our previous work [8], we have shown
¹ According to [6], it will take up to 750 cycles of the IXP1200 microengine.
how the WASP platform could be used to help deploy such services through a fully decentralized, configuration-less discovery scheme. The key idea of that service location facility is that, if sufficient WASP nodes are deployed in a network domain, nodes that are willing to offer a given service S simply need to leave information in WASP routers so that, when customers establish a new connection, they can automatically detect whether a given service is available or not, and which "proxy" node should be contacted to get the service. In terms of speed improvement, this hybrid approach allows the operator to draw on the full power of the nodes that host services with limited overhead on routers, taking benefit both of component-based active networks for service implementation and of active packets for service discovery.
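The discovery scheme of [8] summarized above can be caricatured as follows: providers leave a tag for service S in WASP routers, and a customer's connection-setup packet checks for it along the path. Everything here (names, the key derivation, the data structures) is an invented illustration, not the actual protocol:

```python
import hashlib

def service_key(name):
    """Derive a 64-bit tag key from a service name. Purely illustrative;
    the real scheme in [8] may derive keys differently."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest()[:8], "big")

class WaspRouter:
    def __init__(self):
        self.tags = {}                    # stand-in for the ephemeral store
    def advertise(self, service, proxy):
        self.tags[service_key(service)] = proxy
    def check(self, service):
        return self.tags.get(service_key(service))

def discover(path, service):
    """A connection-setup packet crossing `path` reports the first
    proxy advertised for `service`, if any."""
    for router in path:
        proxy = router.check(service)
        if proxy is not None:
            return proxy
    return None

r1, r2 = WaspRouter(), WaspRouter()
r2.advertise("http-cache", "proxy.example.net")
found = discover([r1, r2], "http-cache")
missing = discover([r1, r2], "transcoder")
```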
2 A Router-Friendly Platform

Even though WASP is based on active packets, it is much more restricted than general-purpose capsules, so that a user's bytecode cannot waste the router's resources. For instance, the WASP bytecode language prohibits backward jumps, and all instructions have predictable execution times, which makes packet processing time trivial to control (as shown in SNAP [5]). While such a restriction is impractical for general-purpose services, it perfectly fits the lightweight control tasks that WASP will have to perform. While alternative solutions exist, such as associating a counter with any backward jump in the code, we believe the benefit we could get in WASP is not worth the additional management required.

2.1 Memory and Storage

Most active protocols need information to be stored temporarily on intermediate nodes, so that it can later be retrieved by other active packets. For instance, we could drop the B frames of an MPEG video stream if the I frame they refer to has been dropped by the node, or if it is likely to be dropped, for instance due to a congested output link. This requires the I frame to leave information about the router status for subsequent frames. It is important for network availability and performance that this local storage remains easy to manage and can automatically discard information that is no longer pertinent. ANTS [1] and many other platforms use soft-state-based memory management to release memory that has not been used by packets for a given amount of time. Unfortunately, soft-state-based managers make it hard for admission control to determine whether there will be sufficient memory to accept a flow. It has been shown with ESP [4] that memory is much easier to manage in the ephemeral state approach, that is, if the store only keeps data for a constant period (10 seconds), regardless of how frequently the data is referenced during that period.
If we also ensure that all the data slots in the store have the same size, collecting free-for-reuse slots becomes simple enough to execute without disturbing packet forwarding tasks on the router, and checking whether the router will have sufficient resources to process an additional flow simply requires the router to check how many different slots are used by the flow. Those small, fixed-size data slots that the node associates with a key
I-frame code: load(@queue_state); test(heavy_loaded_bit); bnz(+1); FORWARD; insert $key; FORWARD;
B-frame code: lookup $key; bundef(+1); DROP; FORWARD;
(both frame types carry the same source-generated key, e.g. $key=0xdecafbad)

Fig. 1. WASP code attached to MPEG I frames and B frames
for a fixed amount of time are called tags in the ESP terminology. Note that no access control is required for tags. It is simply assumed that each source picks a random 64-bit value and uses it as a key. The chance that two sources randomly pick the same tag and send packets over routes that cross the same router in a 10-second timeslice (otherwise no collision occurs) is extremely small.

2.2 A First Example

The above decisions may sound excessively restrictive, but the platform we build on them can still be sufficiently expressive. Figure 1 illustrates a WASP implementation of the selective frame filter [12]. The code for I frames checks a bit of the interface state to evaluate the load on the outgoing interface (set by a RED queue manager, for instance) and, depending on the result, leaves (insert) a tag demanding that B frames be dropped. All the B frames depending on a given I frame will carry the same unique identifier (e.g. 0xdecafbad) generated by the source and use it to check (lookup) the presence of the tag in the router's store. Note that this scheme could easily be extended so that, for instance, a B frame that faces a 'drop request' propagates this information backwards to the source. We could also use an I frame code that compares the local time with the reception time of the last I frame to see if the deadline is still achievable.
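The two router-friendly ingredients of this section, a forward-only instruction loop with bounded execution time and an ephemeral store whose tags live for a fixed period, can be combined in a toy model that runs Fig. 1-style programs. The opcode names mimic Fig. 1, but their exact semantics here are invented for illustration, and a clock variable replaces real time so that tag expiry is observable:

```python
class WaspNode:
    """Toy WASP router: a forward-only bytecode loop (the pc only
    increases, so at most len(program) instructions ever execute) plus
    an ephemeral store whose tags expire LIFETIME seconds after
    insertion, however often they are read."""
    LIFETIME = 10.0                       # seconds, as in ESP

    def __init__(self, heavy_loaded=False):
        self.heavy_loaded = heavy_loaded  # stand-in for @queue_state bit
        self.tags = {}                    # 64-bit key -> expiry time
        self.now = 0.0                    # injected clock, for testing

    def _insert(self, key):
        self.tags[key] = self.now + self.LIFETIME

    def _lookup(self, key):
        expiry = self.tags.get(key)
        if expiry is None or expiry <= self.now:
            self.tags.pop(key, None)      # lazily reclaim expired slot
            return False
        return True

    def run(self, program, key):
        pc, acc = 0, 0
        while pc < len(program):
            op, arg = program[pc]
            pc += 1
            if op == "load_heavy":
                acc = 1 if self.heavy_loaded else 0
            elif op == "bnz" and acc != 0:
                pc += arg                 # forward-only branch (arg >= 0)
            elif op == "lookup":
                acc = 1 if self._lookup(key) else 0
            elif op == "bz" and acc == 0:
                pc += arg                 # branch if tag was undefined
            elif op == "insert":
                self._insert(key)
            elif op in ("FORWARD", "DROP"):
                return op
        return "FORWARD"

I_FRAME = [("load_heavy", None), ("bnz", 1), ("FORWARD", None),
           ("insert", None), ("FORWARD", None)]
B_FRAME = [("lookup", None), ("bz", 1), ("DROP", None), ("FORWARD", None)]

node = WaspNode(heavy_loaded=True)
i_verdict = node.run(I_FRAME, 0xDECAFBAD)   # leaves a 'drop B frames' tag
b_verdict = node.run(B_FRAME, 0xDECAFBAD)   # sees the tag, drops itself
node.now = 11.0                             # the tag has expired meanwhile
b_later = node.run(B_FRAME, 0xDECAFBAD)     # tag gone: B frame forwarded
```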
3 A Network-Friendly Framework

Even if we take care of router resources, an ill-intentioned active packet could easily create an avalanche of clones to overload its destination. Among 'first generation' active platforms, PLAN [11] was the only project that addressed this issue, by making sure all children's resource counters receive a portion of the parent's counter. Unfortunately, knowing what initial bound should be allowed remains a complex issue. In the case of the WASP platform, packets cannot create child packets unless they are targeted at a multicast address, but they can drop themselves or return to their source. If we focus on applications like service discovery or server load balancing, we have no real need for more: we can store the new destination in the active
160
S. Martin and G. Leduc
packet, return the packet towards its source and let the source issue a new connection attempt to the real destination. In the case of content distribution networks, however, being able to change the destination address could have a great impact on performance, especially if we are allowed to swap a multicast address for unicast addresses. Using this scheme, it becomes possible to take advantage of multicast routing where it is deployed rather than requiring consistent deployment along the whole path. From the network operator's point of view, such rerouting needs to be carefully handled so that it does not lead to packets looping endlessly within a domain or "ping-ponging" between two domains. For instance, rerouting should never require an additional IP table lookup for a WASP packet, and it may never lead to sending a packet back on the interface it came from. The problem of rerouting is further discussed in Section 6.
4 The WASP Platform

WASP is derived from the ESP router [4], which consists of the router logic and Ephemeral State Stores (ESS) containing tags that packets access through 64-bit keys. As shown in Fig. 2, each ESS is either bound to a network interface or to the "center" location, and active packets that cross the router can only request interpretation at three logical locations: incoming interface, center and/or outgoing interface. Each ESP packet requests the execution of one of the pre-defined operations on certain tags. Unfortunately, those operations remain tightly bound to a specific application domain (multicast), and non-trivial protocols require almost dedicated operations. The WASP platform thus keeps the overall design of ESP but replaces the pre-defined operations by a virtual processor interpreting a bytecode language inspired by SNAP [5]. We also extended the packet control operations, the access control semantics on tags and the access to node-specific state.

4.1 WASP Packets

WASP uses the active packets paradigm: each packet contains its own code and the data on which it can operate. The evaluation of packet code (up to 256 bytecoded
[Fig. 2. A WASP router and WASP execution environment. Each network interface (e.g. eth0) and the "center" location holds an Ephemeral State Store (ESS); the VPU at the incoming, center or outgoing location evaluates the packet code, with lookup/insert access to the ESS and load/store access to the packet variables, packet header and environment variables. Gray items mean the VPU has read-only access to the resource.]
micro-instructions called microbytes) terminates when a packet control microbyte is encountered, which tells the router what to do with the packet. In addition to "forward" and "drop" semantics, WASP allows the packet to be sent back to the source at any router, which can be useful when quick feedback of a discovered state is required (e.g. filtering more packets in the router ahead of a congestion point). During interpretation, the data part of the WASP packet is available as a 128-byte region of random-access memory, and only the IP header is readable. Other parts (WASP bytecode, other IP options, transport payload) are unavailable to WASP code.

4.2 The WASP Node

Each WASP node has a certain number of Ephemeral State Stores that associate 64-bit keys with small, fixed-size data, forming tags. Each ESS is associated with a Virtual Processing Unit (VPU) that processes the WASP packets. Since all exchanges between packets occur in the ESS, there is no need to store VPU state between the evaluation of two packets. This greatly simplifies the synchronization problems, even on a multiprocessor system, since it means we can bind VPU data to one real CPU rather than to an ESS. Before a VPU starts evaluating a packet, it retrieves the node and interface environment variables and exports them as banks of read-only memory to WASP code. Those variables will typically contain the node IP address, netmask, local time, etc., plus statistics about the current interface (recent packet transmission statistics, queue status, etc.), which can be useful for applications sensitive to network conditions. Considering the restricted resources of network processors, we tried to keep the design of WASP's virtual processor as simple as possible, which makes it look, from an architectural point of view, more like an embedded microcontroller than a modern microprocessor.
– Work registers are organized following the 'accumulator and stack' approach, leading to smaller encoding and better use of hardware registers.
– Once the index register has been loaded, any memory reference can either stay at that index or advance to the next word. With an appropriate ordering of data according to the code sequence, this may save up to 50% of code size for the ESP instructions involved in the data aggregation service [15].
– Only simple ALU and shifting operations are allowed. The VPU has, for instance, no support for floating point values, multiplications/divisions, or signed arithmetic.

4.3 More Efficient Access to ESS

Among all microbytes, interactions with the ephemeral state store will be the most important to tune, as they will likely be the most costly operations the VPU has to handle. It is clear that we would like to avoid repetitive hashing in a lookup-then-update cycle, for instance. In native implementations, once the hash table has been looked up, the resulting memory references are kept for further updates. In the WASP VPU, those "resolved pointers" are stored in a cache transparent to the bytecode programmer. We tested two cache policies: small caching, where only the last resolved pointer is kept, and full caching, which keeps every resolved pointer.
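A minimal Python model of the small caching policy may help fix ideas. The class names and the resolutions counter are ours, for illustration only; a real ESS is a hash table over SDRAM rather than a Python dict, and "resolve" stands in for the costly hash-and-search step.

```python
class TagStore:
    """Hash-table ESS where a lookup yields a mutable slot ('resolved pointer')."""
    def __init__(self):
        self._slots = {}

    def resolve(self, key):
        # Stands in for the expensive hash + SDRAM search of a real node.
        return self._slots.setdefault(key, [0])

class SmallCachingVPU:
    """'Small caching': only the last resolved pointer is remembered, which
    suffices for the common lookup-then-update pattern of ESP operations."""
    def __init__(self, store):
        self.store = store
        self._last = (None, None)   # (key, slot) of last resolution
        self.resolutions = 0        # number of full hash-table searches

    def _slot(self, key):
        cached_key, cached_slot = self._last
        if cached_key == key:
            return cached_slot      # cache hit: no re-hashing
        self.resolutions += 1
        slot = self.store.resolve(key)
        self._last = (key, slot)
        return slot

    def load(self, key):
        return self._slot(key)[0]

    def store_value(self, key, value):
        self._slot(key)[0] = value

vpu = SmallCachingVPU(TagStore())
vpu.store_value(0xDECAFBAD, 7)   # resolves the key once
print(vpu.load(0xDECAFBAD))      # prints 7; cache hit, no second resolution
print(vpu.resolutions)           # prints 1
```

A "full caching" variant would replace `_last` with a dict of all resolved pointers; as the measurements below suggest, the extra bookkeeping is rarely worth it.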
Table 1. Comparing caching policies. Timings in CPU cycles on a 1 GHz Pentium III.

              no caching   full   small   mapping   native
  count            721      637     592       586      349
  collect         1245     1082     958       842      633
  rchild          2058     1830    1845      1509      775
  rcollect        2980     2438    2394      2020     1091
Moreover, previous work with ESS-based programmable nodes has shown that non-trivial operations (including rchild and rcollect, used for the robust aggregation service) may require 3 or 4 logical variables in the ESS, leading to increased processing cost due to repeated searches in the ESS and repeated accesses to SDRAM. WASP solves these issues by allowing larger values (namely 32 bytes) to be stored in the ESS, and through a map microbyte that makes a whole ESS value appear as a memory bank for the VPU. The packet can then access individual bytes/words of that bank with no extra key resolution until another map forces the bank to be written back to the ESS. Even on the PC implementation, mapping larger memory banks has allowed us to reduce execution time by about 30%, and we expect even higher improvements on the IXP architecture thanks to burst transfers with SDRAM. It is also interesting to note that, due to the overhead of ESS entries, allowing 8 times more storage leads to better memory consumption as soon as we have an average of at least 2 related values.

4.4 Preliminary Performance Results

To validate our assumptions, we compared the execution time of ESP operations on a Pentium III processor, interpreted by WASP under different access policies, against ESP's native implementation (see Table 1). Note that the small caching policy behaves better than full caching here. This can be explained by the fact that most ESP operations (as described in [15]) do not look up a given variable more than once, and that updates can be done before another lookup is issued. The cost of setting up and maintaining a more complex policy such as full caching is thus greater than the benefit one can expect from the cache hit ratio. The mapping scheme clearly gives even better results once the bytecode has been rewritten to use the map instruction, approaching an execution time of 150 to 200% of the native implementation provided by the University of Kentucky.
Note that even if interpretation with WASP takes twice as long as execution of native code, this represents only a small fraction of the code that is actually executed to process a packet. For instance, on a Linux router running at 300 MHz featuring the ESP/WASP module (with the small caching policy), the ESP:count packet took an average of 69.8 µs (see footnote 2) while processing WASP:count took 77.6 µs – a penalty of only 10%, compared to the 69% suggested by Table 1. Comparatively, the same packets take respectively 23.6 µs and 24.3 µs to be simply forwarded. Moreover, a more complex packet like WASP:collect took 80.0 µs for processing and 25.0 µs for plain forwarding. Finally,
(2) Those timings are obtained using timestamps returned by libpcap on a machine running Linux kernel 2.4.18, which forwards a ping with an average latency of 48 µs and takes 99.8 µs to reply to a ping.
a WASP packet of the same size as WASP:count that simply contains the forward microbyte took 66.6 µs, which means most of the overhead lies in packet checking, initialization and finalization rather than in instruction processing.
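The map microbyte of Sect. 4.3 can be modelled as follows. Only the 32-byte value size, the single key resolution per mapping and the write-back-on-remap behaviour come from the text; the class and method names, big-endian word layout and dict-backed ESS are our assumptions.

```python
VALUE_SIZE = 32  # bytes per ESS value in WASP

class MappingVPU:
    """Sketch of the 'map' microbyte: one key resolution exposes a whole
    32-byte ESS value as a random-access bank, written back on remap."""
    def __init__(self, ess):
        self.ess = ess              # dict: key -> bytes of length VALUE_SIZE
        self.mapped_key = None
        self.bank = None

    def map(self, key):
        self._writeback()           # flush the previously mapped bank
        self.mapped_key = key
        self.bank = bytearray(self.ess.get(key, bytes(VALUE_SIZE)))

    def _writeback(self):
        if self.mapped_key is not None:
            self.ess[self.mapped_key] = bytes(self.bank)

    def load_word(self, offset):
        # Accesses inside the bank need no further key resolution.
        return int.from_bytes(self.bank[offset:offset + 4], "big")

    def store_word(self, offset, value):
        self.bank[offset:offset + 4] = value.to_bytes(4, "big")

ess = {}
vpu = MappingVPU(ess)
vpu.map(0x1234)
vpu.store_word(0, 42)   # several logical variables share one value...
vpu.store_word(4, 99)   # ...so no repeated hashing between accesses
vpu.map(0x5678)         # remap forces write-back of the previous bank
print(int.from_bytes(ess[0x1234][0:4], "big"))  # prints 42
```

Packing the 3–4 logical variables of an operation like rcollect into one mapped value replaces several hash lookups with one, which is where the reported ~30% saving comes from.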
5 Trustworthy Storage for the End-User

As soon as WASP is used to locate services, packets need to use a well-known key to access information other participants might have left in routers. Such a well-known key can for instance be produced by hashing a service name, which makes it easier to guess for an external attacker than the random keys of Section 2.1. WASP therefore introduces protected tags that can only be modified by super packets. If the domain operator ensures that no super packets can come from outside, the end user can be sure that the information bound to the tag has been set up by the domain operator. The node determines whether a tag is protected by checking its key against a specific pattern (see footnote 3), and will allow writes to such tags only from packets that are marked 'super' in their WASP header. All a network manager has to do in this case is (1) filter out WASP super packets coming from the outside at ingress nodes and (2) use super packets to advertise services within his own network.

5.1 Hash-Requesting Packets and Private Tags

When participants and attackers can come from the same domain, protected tags are no longer helpful. For such cases, WASP offers private tags, which work like protocol-private data in ANTS. Unlike with other tags, the application programmer has no direct control over the key that will be used for private tags. Instead, the WASP node will hash the code contained in the packet and use the result as the private key for that packet, which is kept secret by the router. To make sure that regular packets cannot attempt a brute-force scan, private tags have an identifiable prefix and any attempt to use keys with that prefix explicitly will abort packet execution. If the hash method is carefully chosen (e.g. a one-way hash like MD5), packets will have access to the same private space only if they have the same code, which means we are sure they play the same game with the same rules.
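The protected-tag write check described at the start of this section can be sketched as follows. The key pattern (highest 8 bits all set) is the one the implementation uses; the function names are ours.

```python
PROTECTED_PREFIX = 0xFF << 56   # highest 8 bits all 1 (current implementation)

def is_protected(key: int) -> bool:
    return (key & PROTECTED_PREFIX) == PROTECTED_PREFIX

def may_write(key: int, packet_is_super: bool) -> bool:
    # Writes to protected tags are reserved for 'super' packets, which the
    # operator filters out at ingress when they come from outside the domain.
    return packet_is_super or not is_protected(key)

service_key = PROTECTED_PREFIX | 0xDECAFBAD
print(may_write(service_key, packet_is_super=True))    # prints True
print(may_write(service_key, packet_is_super=False))   # prints False
```

Reads remain unrestricted, so any packet can still discover the operator-advertised service information.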
Under those circumstances, an attacker can only hope to break the protocol by sending more (or fewer) packets than expected by the protocol – which a properly designed protocol should handle anyway. Note that the implementer of a WASP node is free to use any hash method that best suits its hardware, as the resulting hashes are used only on the computing node. The only rule is that packets with the same hashed part operate on the same private tag and that packets with different hashed parts operate on different private tags. Note that routers that cannot afford MD5 could be allowed to use cheaper algorithms (e.g. CRC) by mixing the bytecode with a locally-generated random number, and still be able to safely assume that the protocol's privacy cannot be broken.
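A sketch of private-key derivation under our own assumptions: the 0xFE prefix value and the exact key layout are invented for illustration; only the use of a one-way hash of the packet code (e.g. MD5) and the abort-on-explicit-prefix rule come from the text.

```python
import hashlib

PRIVATE_PREFIX = 0xFE            # hypothetical prefix marking private keys

def private_key_for(code: bytes) -> int:
    # The node derives the key from a one-way hash of the packet's code, so
    # only packets carrying the very same code share a private tag.
    digest = hashlib.md5(code).digest()
    return (PRIVATE_PREFIX << 56) | int.from_bytes(digest[:7], "big")

def check_explicit_key(key: int) -> None:
    # A packet naming a private-prefixed key explicitly is aborted,
    # which forecloses brute-force scans of the private space.
    if (key >> 56) == PRIVATE_PREFIX:
        raise RuntimeError("abort: explicit use of private-tag prefix")

proto_a = b"lookup; bundef(+1); DROP; FORWARD;"
proto_b = b"insert; FORWARD;"
print(private_key_for(proto_a) != private_key_for(proto_b))  # prints True
```

Two packets obtain the same private key only when their hashed code parts are identical, which is exactly the "same game, same rules" guarantee.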
(3) The highest 8 bits are all 1 in the current implementation.
5.2 World-Readable, Protocol-Writable Tags

While private tags guarantee that a collection of participants will modify state in the router following a common set of rules (e.g. the protocol), their cost may not be acceptable for packets that just need to follow the decision without altering the state (e.g. a multimedia stream). Each packet would also have to carry the whole protocol so that it receives the same hash value, regardless of what part of the code is useful to itself. As a result, the expose opcode allows a hash-requesting packet to have its private state accessible read-only as a protected tag. The result is a new ESS tag that contains a link to the private tag, which is transparently resolved by the VPU when a packet tries to read it. Writing to an exposed tag via a link is of course not allowed. Note that the presence of a link only reveals that it exposes private data, not what protocol exposes them. It is thus up to the protocol designer to ensure that the key used for exposing the data cannot be guessed by an attacker before the link is created. A simple way to achieve this is to generate the key from a random number on the router and inform participants of its value after the data has been exposed.
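A possible model of the expose mechanism; the class and method names are ours, and only the link-tag semantics (transparent read resolution, refusal of writes through the link) come from the text.

```python
class ExposeNode:
    """Sketch of 'expose': a tag holds a link to a private tag; reads
    follow the link transparently, writes through it are refused."""
    def __init__(self):
        self.tags = {}       # key -> ("value", v) or ("link", private_key)
        self.private = {}    # private_key -> value (hash-derived, secret)

    def expose(self, private_key, public_key):
        # Creates a read-only view of the private tag under public_key.
        self.tags[public_key] = ("link", private_key)

    def read(self, key):
        kind, payload = self.tags[key]
        # The VPU transparently resolves links on read.
        return self.private[payload] if kind == "link" else payload

    def write(self, key, value):
        kind, _ = self.tags.get(key, ("value", None))
        if kind == "link":
            raise PermissionError("exposed tags are read-only")
        self.tags[key] = ("value", value)

node = ExposeNode()
node.private[0xFEED] = "chosen-server"     # state set by the protocol
node.expose(0xFEED, public_key=0xABCD)     # key ideally hard to pre-guess
print(node.read(0xABCD))                   # prints chosen-server
```

A stream packet only needs the public key to read the protocol's decision, without carrying the protocol code itself.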
6 Rerouting Packets

While considerably augmenting the flexibility of the WASP platform, packet rerouting raises a number of issues for both the network operator and end-users. The main concern for the end-user is to ensure that packets still reach the expected destination, even when rerouting applies. When the sequence of destination addresses to the final target is given explicitly in the packet's variables, the offered service has the same security semantics as loose source routing in IP. When that sequence is retrieved from the ESS, however, we need to ensure that no one is trying to abuse the end-systems to gain an intermediate position on a specific data flow. We are confident that protected/private tags should help the end-user build reliable rerouting-based applications. From the operator's point of view, the main difficulty comes from the fact that, generally speaking, we do not want to allow any packet to be received from any link. Business agreements with other peer domains, for instance, may only allow a domain A to use its link to domain B to reach B's clients, but not to reach other peer domains or providers of B (see Fig. 3a). Nowadays, most of these agreements are enforced by filtering routes
[Fig. 3. Typical interdomain policies for domain A (a), broken by blind rerouting (b). Panel (a) shows A's customer-to-provider ($) and peering (=) relationships with its Provider, peer B and the domains X, Y and U: a regular "A to U" path goes through A's provider, while the link to B only serves B's clients X and Y. Panel (b) shows a packet leaving A addressed "to: X,Y" ("fake A to X") and rerouting itself at B towards U ("rerouted A to U").]
advertised by BGP rather than by filtering packets, but blindly enabling rerouting of WASP packets could lead to situations where a packet leaves A with a destination address falling in one of B's clients and then reroutes itself to another peer domain at B's ingress router, thus cheating the business model (see Fig. 3b). We propose to solve those problems by means of invitations left in the ESS by former packets. When a WASP packet executes on an interface VPU, it can create a new tag carrying its source address by means of the invite opcode. The binary pattern of the key used with invite tells the VPU that the value can be safely used as a redirection target. Depending on how the interface is configured, the reroute opcode will accept either any target value (e.g. for customer-ingress links) or will be restricted to protected tags and invitations (e.g. for any other link). This way, "ping-ponging" between domains is no longer possible if all egress interfaces restrict rerouting to invited destinations. An invitation to address Y present on an interface means that the peer router for that interface has once sent a packet coming from Y, and thus should be able to route another packet towards Y properly. We can also prevent cheating on the business model if non-client (guest) packets can leave invitations only on the incoming VPU of ingress routers. More precisely, if we make sure that WASP packets from peers and providers are tagged as guest when they reach the center (see Fig. 2) of their ingress router and that the invite opcode is not allowed for guest packets, then a packet p received on a non-client ingress interface that is targeted to a client domain can only be delivered to a client domain. By contradiction, suppose V is the first VPU where rerouting changes packet p's destination towards a non-client destination U.
This is only possible if an invitation to U is present in V. However:
– if V is on a core router, it cannot have the invitation, since guest packets can only leave invitations on their ingress VPU (unless source addresses are spoofed);
– if V is on a border router, then V is bound to an outgoing interface, say itf, towards domain U. Note that p can only reach that hypothetical interface if itf connects to both client and non-client domains (likely a configuration error).

In Fig. 3, note that B is not protected if A allows blind rerouting on its egress interface to B (we could easily disable blind rerouting on output interfaces anyway). If both A and B configure rerouting properly, malicious clients of A cannot put A in a situation where it unwillingly misroutes packets through B. Note that even if rerouting does not, by itself, allow source spoofing (which is the root of most DDoS attacks [13]), it might disturb tools based on header-hashing for packet traceback used to react to those attacks. Making rerouting and traceback interoperate might involve storing the previous destination in a field of the WASP packet or keeping traces of applied rerouting on routers, and will be an interesting challenge for future work.
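The invitation mechanism above can be sketched as follows. Class names and flags are ours; only the invite/reroute semantics (guests cannot invite, non-client interfaces accept only invited targets) come from the text.

```python
class InterfaceVPU:
    """Sketch of invitation-restricted rerouting on one interface."""
    def __init__(self, accept_any=False):
        self.accept_any = accept_any   # True only on customer-ingress links
        self.invitations = set()       # sources invited by earlier packets

    def invite(self, packet_source, packet_is_guest):
        # Guest (peer/provider) packets may not leave invitations here,
        # which prevents them from opening arbitrary redirection targets.
        if not packet_is_guest:
            self.invitations.add(packet_source)

    def reroute_allowed(self, target):
        return self.accept_any or target in self.invitations

egress = InterfaceVPU()                  # non-client link: restricted
egress.invite("Y", packet_is_guest=False)  # a packet from Y invited replies
print(egress.reroute_allowed("Y"))       # prints True
print(egress.reroute_allowed("U"))       # prints False: blind reroute refused
```

With every non-client interface configured this way, a packet can only be redirected towards an address that has already proven reachable through that interface.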
7 Conclusion and Future Work

We proposed a new platform that combines the safety of ephemeral state processing with the flexibility of a bytecode language. While imposing strong restrictions on the programs one can write, WASP can still address a large range of problems and can be used as a
“helper” tool for more complex solutions that require a more “applicative layer” approach. Still, for the network operator, WASP is safe and offers strong guarantees on processing time, memory requirements and network link consumption. For the end-user, WASP gives enough programmability for most per-packet control operations. Moreover, WASP is clear concerning what may happen and what may not: the active network will never, for instance, alter source addresses or payloads, nor will it reroute packets unless explicitly requested. It would be interesting to investigate further how the receiver can be given more control over which WASP functions can or must be supported by received flows, and whether this could help hosts protect themselves against DDoS, spam, etc. The preliminary benchmarks and the design choices of this platform give us good hope that an implementation on network processors will be able to sustain a high throughput of active packets, even if a real implementation on an IXP network processor is still required to measure up to which proportion of active packets wire speed can be achieved.
Acknowledgment We would like to express special thanks to Jiangbo Li, from Kenneth L. Calvert's team, for so kindly replying to all our questions related to ESP.
References
1. Wetherall, D., Whitaker, A.: ANTS - an Active Node Transfer System, version 2.0, http://www.cs.washington.edu/research/networking/ants/
2. Moore, J., Nettles, S.: Towards Practical Programmable Packets. In: Proc. of the 20th IEEE INFOCOM, Anchorage, Alaska (April 2001)
3. Nygren, E., Garland, S., Kaashoek, M.: PAN: A High-Performance Active Network Node Supporting Multiple Mobile Code Systems. In: Proc. of IEEE OPENARCH, New York, pp. 78–89 (March 1999)
4. Calvert, K., Griffioen, J., Wen, S.: Lightweight Network Support for Scalable End-to-End Services. In: Proc. of ACM SIGCOMM, Pittsburgh, PA, pp. 265–278 (August 2002)
5. Moore, J.T.: Safe and Efficient Active Packets. Technical Report MS-CIS-99-24, University of Pennsylvania (October 1999)
6. Calvert, K., Griffioen, J., Imam, N., Li, J.: Challenges in Implementing an ESP Service. In: Wakamiya, N., Solarski, M., Sterbenz, J.P.G. (eds.) IWAN 2003. LNCS, vol. 2982, pp. 3–19. Springer, Heidelberg (2004)
7. Martin, S., Leduc, G.: A Dynamic Neighbourhood Discovery Protocol for Active Overlay Networks. In: Wakamiya, N., Solarski, M., Sterbenz, J.P.G. (eds.) IWAN 2003. LNCS, vol. 2982, pp. 151–162. Springer, Heidelberg (2004)
8. Martin, S., Leduc, G.: An Active Platform as Middleware for Services and Communities Discovery. In: Sunderam, V.S., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds.) ICCS 2005. LNCS, vol. 3516, pp. 237–245. Springer, Heidelberg (2005)
9. Intel Corporation: The IXP2400 Hardware Reference Manual (November 2003)
10. Wetherall, D.: Active Network Vision and Reality: Lessons from a Capsule-Based System. Operating Systems Review 33, 64–79 (1999)
11. Hicks, M., Kakkar, P., Moore, J.T., Gunter, C.A., Nettles, S.: PLAN: A Programming Language for Active Networks. In: Proc. of ACM ICFP 1998, pp. 86–93 (September 1998)
12. Bhattacharjee, S., Calvert, K., Zegura, E.: On Active Networking and Congestion. Technical Report GIT-CC-96/02, Georgia Institute of Technology, ftp://ftp.cc.gatech.edu/pub/coc/tech_reports/1996/GIT-CC-96-02.ps.Z
13. Dübendorfer, T., Bossardt, M., Plattner, B.: Adaptive Distributed Traffic Control Service for DDoS Attack Mitigation. In: Proc. of SSN 2005, Denver, USA (April 2005)
14. Sterbenz, J.P.G.: Intelligence in Future Broadband Networks: Challenges and Opportunities in High-Speed Active Networking. In: Proc. of IEEE IZS 2002, Zürich, pp. 2-1 – 2-7 (February 2002)
15. Calvert, K., et al.: ESP Packet & ESP Instruction Specification. Technical Report, University of Kentucky, http://protocols.netlab.uky.edu/~esp/documents/esp_spec.pdf
A Secure Code Deployment Scheme for Active Networks
Leïla Kloul and Amdjed Mokhtari
PRiSM, Université de Versailles, 45 av. des Etats-Unis, 78000 Versailles, France
{kle,amok}@prism.uvsq.fr
Abstract. Active networking is an innovative technology which can open the network and make it more flexible. But introducing active code within the network increases the network's vulnerability from the security point of view. Security is usually considered as a layer separate from the other layers of the active network architecture. In this paper, we develop a global security architecture for safe code distribution. A three-level mechanism is defined to provide unique identification, authentication and classification of a code, according to its developer and its users.
Index Terms: Active networks, Code distribution and identification, Network security.
1 Introduction
Besides the advantages of opening the network, such as quickly introducing new protocols and applications, injecting a smart program in the network nodes can affect their performance. Moreover, this new task for the routers can degrade their packet forwarding functions and dramatically increase security failures. Usually, security enforcement is added to active network architectures as an upper layer at the end of the design process. This enforcement is typically an admission control interface or a simple authentication mechanism based on a Public Key Infrastructure (PKI). However, the security layer is not integrated into the lower layers like code distribution and execution. Our goal is to build a global architecture where the security system is included in the design of the code distribution and execution systems. In our architecture, the security mechanism interacts with all actors concerned by the injection of active programs within the network and their execution. These actors or entities can be identified as the code provider or developer, the code user and the code itself. Because they are external to the network, these three entities lead us to define a three-level security mechanism providing strong and unique identification, authentication and classification of the entities. The development of a global security mechanism implies defining at the same time a global architecture for code distribution. Both aspects are strongly linked and must be addressed carefully to implement a secure active network.
Corresponding author.
D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 168–181, 2009. © IFIP International Federation for Information Processing 2009
The code distribution is the main key to the introduction and implantation of active networks in current Intranets and the Internet. Without an appropriate code deployment scheme, in terms of performance (time to download, load and execute the code) and security, active networking stays in the state of a project without an effective implementation. We divide code distribution into two main phases: code identification and code deployment. Code identification consists of a universal mechanism which gives the code a unique and non-ambiguous identifier, independent of its developer and its users, making it easy to share. Code identification has not yet been standardised in active networks; each project defines its own identification method. The code deployment phase consists of the policy which allows the active programs to reach the desired network nodes. Securing code distribution allows the right developer to publish its code, the network to check the safe execution of the new code on the nodes and, finally, the right user to deploy the code through the network nodes and execute it. The three-pole relation between the code, the developer and the user leads us to define a three-pole security architecture. The security function at the developer level must at least check whether this developer is authorised to publish its code. The developers of active codes must be authenticated and classified into different groups. The distinction between the groups is based on their relation with the network, the node resources and the user applications. Different classes can be defined in order to give several levels of grants. The differentiation between the developers creates a set of code categories; each category of program corresponds to one class of users. The security function at the code level verifies whether the active program adheres to the security rules defined by the network.
The safety requirements relate mainly to resource consumption and data access. Finally, the security function at the user level must allow the nodes to formally verify, with little computation and communication, whether the user, through its data packets, has the authorisation to execute the referenced code. In this article, we define a new approach to code deployment and identification in active networks. Our objective is to provide a secure mechanism which guarantees a unique identification of the code and allows its use by other users in addition to its owner. This approach combines different techniques and is mainly based on the notion of a code server. This server is able to achieve other tasks such as attributing a unique identifier to each application, verifying the conformity of the code, authenticating the code developer and user, and storing and publishing codes. We also study the security issues to establish the minimum requirements for a secure active network. We propose a new approach which places our code server at the heart of a global architecture interacting with the developer, the code, the user and the node to set up the previous security functions. In the following section, we give an overview of code distribution in current active networks. Based on the minimal requirements that are fundamental to building a secure active network environment, in Section 3, we define a new
approach for code security and distribution. We investigate the performance of our approach, in terms of throughput and latency, in Section 4. The results obtained are compared with those of the hop-by-hop technique developed for the ANTS platform [15]. Related works are presented in Section 5. Our conclusions and possible extensions of this work are pointed out in Section 6.
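The role of the code server outlined above might be sketched as follows. This is a hedged illustration only: the use of a content hash as the unique identifier and all names are our assumptions; only the server's tasks (authorising developers, assigning a unique developer-independent identifier, storing and publishing codes) come from the text.

```python
import hashlib

class CodeServer:
    """Sketch of a code server that authorises publication and assigns each
    code a unique, developer-independent identifier."""
    def __init__(self, authorised_developers):
        self.authorised = set(authorised_developers)
        self.registry = {}   # identifier -> (code, developer)

    def publish(self, code: bytes, developer: str) -> str:
        if developer not in self.authorised:
            raise PermissionError("developer not authorised to publish")
        # Deriving the identifier from the code itself makes it unique and
        # non-ambiguous regardless of who developed or deployed it.
        identifier = hashlib.sha256(code).hexdigest()
        self.registry[identifier] = (code, developer)
        return identifier

    def fetch(self, identifier: str) -> bytes:
        # Nodes and users retrieve published code by its identifier.
        return self.registry[identifier][0]

server = CodeServer(authorised_developers={"alice"})
ident = server.publish(b"active code", "alice")
print(server.fetch(ident) == b"active code")   # prints True
```

Two developers publishing the same code would obtain the same identifier, and two different codes can never collide, which addresses the multiple-identifier problem discussed in Section 2.2.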
2 Code Distribution Overview
The code distribution consists of injecting active programs in a semi-closed system constituted of the network routers. This operation must provide a homogeneous mechanism for identifying the active programs inside and outside this system. We distinguish two independent steps in the distribution of the code. The first step is the effective deployment of the new application or active service in the concerned active nodes. The second is the identification of the code.

2.1 Effective Deployment of Active Code
The main key of active networking is to dynamically load new code into the network nodes. The way the code reaches the node determines the chosen policy of code deployment in the network. Recent active network projects rely on two mechanisms to deploy their code: capsule-based code distribution and switch-based code distribution.

Capsule based code distribution: In this approach, the code is transmitted in band, at the same time as the data; the capsules carry both the active code and the data. When an active packet arrives at an active node, the node immediately loads the code part of the packet and executes it using the corresponding execution environment. As the code is loaded in the nodes along the packet path, only the concerned nodes are programmed. Thus this scheme adapts itself dynamically to changes in the packets' path. The drawback of this approach is that the presence of the active code in the data packets increases the packet length [4]. Consequently the response time (treatment and propagation time) increases sharply, and network protocols are limited in terms of packet size. To improve this aspect, a hop-by-hop approach has been proposed in [15]. In this technique, a node receiving a capsule which does not contain the active code extracts the reference of this code and checks its presence in the cache. If the code is in the cache, the node downloads it and executes it on the data. Otherwise, the node requests it from the active node previously visited by the packet (a previous-hop address field is updated at each node). This process is repeated hop by hop.

Switch based code distribution: This approach is characterised by two distinct phases. The first phase consists of sending, out of band, the active code to all nodes through the possible data packet paths, generally one potential path per user session. The second phase is the sending of the data packets.
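The hop-by-hop variant of capsule-based distribution described above can be modelled as follows; this is a toy model under our own naming, keeping only the behaviour from the text (cache check, fetch from the previous hop, update of the previous-hop field).

```python
class ActiveNode:
    """Sketch of hop-by-hop capsule distribution: a node missing the code
    requests it from the previous hop recorded in the capsule."""
    def __init__(self, name, cache=None):
        self.name = name
        self.cache = dict(cache or {})   # code reference -> code

    def receive(self, capsule, previous_node):
        ref = capsule["code_ref"]
        if ref not in self.cache:
            # Not cached: fetch from the node the capsule just came from,
            # which itself obtained it the same way, hop by hop.
            self.cache[ref] = previous_node.cache[ref]
        capsule["previous_hop"] = self.name   # updated at each node
        return self.cache[ref]               # code to execute on the data

src = ActiveNode("src", cache={"ref1": "bytecode"})
mid = ActiveNode("mid")
capsule = {"code_ref": "ref1", "previous_hop": "src"}
print(mid.receive(capsule, src))   # prints bytecode
```

Only the first capsule of a flow pays the download cost at each node; subsequent capsules hit the cache.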
Here, the nodes are pre-programmed, because the code is sent before the data. With this
approach, several, possibly unknown, paths can be taken by the user data packets; thus we cannot know exactly which nodes to pre-program. Both approaches have the following characteristics in common:
- An active code cannot be stored indefinitely in the active node.
- It is difficult to share the active code. The user must ask for the code from its developer, or develop it again himself and deploy it in the network.
- An active node cannot itself check the integrity and authentication of the received code and its owner. Program verification requires a lot of processor time and memory space, and its execution may penalise the node's performance.
- With the switch-based approach we must consider two delays: the time to pre-program the nodes and the session time, i.e. the time to receive the user's active data and execute the corresponding active code. The session time may be small, but the total time may be considerable. Similarly, with the capsule approach (hop by hop), the code downloading time and the session time are mixed: the session time includes the downloading time, which may be considerable.

2.2 The Code Identification Issue
Because of the multiple code sources and the multilevel node architecture, a unique and multilevel identification is essential. Code identification must allow a code to be shared and reused by other users. Code identification has not yet been standardised in active networks; each project defines its own identification method. Currently several projects, such as ANTS [16] and Switchware [2], use the three-layer architecture developed in [5]. In this architecture, the identification of the application data is specific to each user session. A data packet must contain the identifier of the active application (AA) and also the identifier of the corresponding execution environment (EE) able to interpret the AA. The reference of the EE is given once and for all by the Active Networks Assigned Number Authority (ANANA). The ANEP (Active Network Encapsulation Protocol) packet, a standard packet format, includes a dedicated 16-bit field for the EE identifier. When a data packet arrives at an active node, the node loads the EE whose identifier is specified in the packet. Once loaded, the EE must then trigger the loading of the corresponding active application. Each EE is basically able to manage several active applications at the same time, to load them in memory and to remove them according to users' needs. This implies that, like the EE, each application must have a unique identifier. However, as the identifier of an application is currently defined by its developer, and each application can be developed by an independent user, identifier conflicts can hardly be avoided. Indeed, because of the multitude of code sources, their independence and the absence of a central regulation authority, situations where two active applications have the same identifier cannot be ruled out.
Moreover, as the application identifier is known only to its developer, the code of this application can hardly be shared with other users.
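As an illustration, the fixed part of an ANEP header, including the 16-bit Type ID that carries the EE identifier, can be parsed in a few lines. We assume the draft layout of version, flags, Type ID, header length and packet length fields; the EE identifier value used below is arbitrary, not a real ANANA assignment:

```python
import struct

def parse_anep_header(packet: bytes) -> dict:
    """Parse the fixed part of an ANEP header, assuming the draft layout:
    8-bit version, 8-bit flags, 16-bit Type ID (the EE identifier),
    16-bit header length (in 32-bit words), 16-bit packet length."""
    version, flags, type_id, hdr_words, pkt_len = struct.unpack("!BBHHH", packet[:8])
    return {"version": version, "flags": flags,
            "ee_id": type_id,            # the dedicated 16-bit EE identifier
            "header_words": hdr_words, "packet_length": pkt_len}

# Build a minimal header announcing (arbitrary) EE identifier 42.
example = struct.pack("!BBHHH", 1, 0, 42, 2, 8)
```

A node would dispatch on `ee_id` to select the execution environment before handing over the payload.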
L. Kloul and A. Mokhtari
Identification also depends on the deployment technique. In the capsule based technique, every data packet is active and identifies the AA-EE pair that processes it. In the switch based technique, on the other hand, a filter installed at the NodeOS level is responsible for recognising the potentially passive flows to which the active code should be applied. The filter uses packet header information (packet type, source address and/or destination address, . . . ) to direct the packets. In the following section, we present a new approach which integrates code deployment and security. This approach responds to the problems of identification, verification and sharing of the code, and allows the authentication and authorisation of the entities manipulating this code.
3 A New Security Approach for Code Distribution
The requirements for code deployment must consider security at the level of three entities: the code developer, the code user and the code itself. The minimal requirements can be summarised as follows:
- A unique developer identification, developer authentication, and hierarchical developer rights to publish a code according to the code's effect on the network; this requires a certificate.
- A unique user identification, user authentication, and hierarchical user permissions to execute a code; this also requires a certificate.
- A unique code identification, a safety control mechanism and a key in the form of a hash code.
To satisfy these security requirements, we define a Code Identification and Storage Server (CISS). To manage our three-pole structure, a central authority acting as a code server plays a crucial role in verifying code safety and authenticating both the developer and the user. The CISS associates a unique identifier with an application and makes this identifier and its code available to all network users. With this technique, sharing the code becomes easier, because a user who wants to use an existing code has only to reference it. In our approach we assume that the certificates are delivered by a CAAN (Certificate Authority for Active Networks), an autonomous authority which delivers certificates based on a PKI (such as X.509). The certificates must contain the public key, information about the holder, and the holder's class. Moreover, we consider a publication web site where users can learn about the existing active applications. This web site allows visitors to compose their applications by choosing different code modules. Access to this web site takes the user's rights into account: only applications of the same or lower class are shown to a user.
A further security requirement is that all routers, the CISS, the CAAN and the publication web site server must be authenticated, by giving each of them a certificate or at least a pair of keys.
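A minimal sketch of the publication side of such a server follows, assuming the hierarchical classes listed in the requirements above, sequentially assigned ACIs and a hash-derived code key Kc. The class names, method names and use of SHA-256 are our assumptions, and the signing of Kc with the CISS private key is elided:

```python
import hashlib
import itertools

class CISS:
    """Sketch of the code publication step: check the developer's class
    against the code's declared level, assign a unique ACI, and derive the
    code key Kc as a hash of the code content."""

    CLASSES = {"user": 1, "network": 2}   # two hierarchical classes

    def __init__(self):
        self._next_aci = itertools.count(1)
        self.store = {}                   # ACI -> (code, level, Kc)

    def publish(self, code: bytes, developer_class: str, code_level: str) -> int:
        # A developer may only publish code at or below his own class.
        if self.CLASSES[developer_class] < self.CLASSES[code_level]:
            raise PermissionError("developer class too low for this code")
        aci = next(self._next_aci)
        kc = hashlib.sha256(code).digest()  # code key, doubling as integrity check
        self.store[aci] = (code, code_level, kc)
        return aci
```

The returned ACI is what the publication web site would expose and what users reference when requesting a deployment key.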
3.1 User and Developer Classification and Authentication
We can expect that most end users will not program network nodes; the code users and the code developers can thus be different entities. For this reason we differentiate between those who publish codes and those who use them. We classify developers and users into hierarchical classes according to their permissions and the effects of the executed code on the network. To simplify the model, we consider only two classes: the user application class and the network application class. A typical developer of the first class is the Internet Service Provider (ISP) manager, whose customers are the users of this class; code of this class can be executed on user data only. The second class gathers the developers and users who are allowed to operate at the network level and can change node behaviour by introducing new protocol and routing algorithm codes. This is the case, for example, of the network administrator. Note that each class can be divided into several subclasses. Node resource access thus depends on this classification. The distinction is a first firewall against malicious codes, as it allows a first control of code execution; it is managed by the CISS during the code publication phase.
3.1.1 The Identification. The identification of a developer or a user must allow us to determine exactly which class he belongs to. The CISS requests the certificate from the CAAN. This certificate contains information about the holder (name, address, . . . ), his class and his public key. The attribution of the developer identifier can be performed by this trusted authority.
3.1.2 The Authentication. The authentication process is based on the PKI. At the beginning of this process, the CISS sends private information (a simple random text, to strengthen the process) to the developer or to the user.
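The full challenge-response exchange (random challenge, response computed with the private key, verification with the public key) can be illustrated with textbook RSA. The tiny key below is for illustration only; a real deployment would use a standard PKI library with full-size keys:

```python
import secrets

# Toy RSA parameters, illustration only: n = 61 * 53 = 3233,
# e = 17, d = 2753 with e * d = 1 (mod phi(n)).
N, E, D = 3233, 17, 2753

def ciss_issue_challenge() -> int:
    # The CISS picks random private information (here, a small integer).
    return secrets.randbelow(N - 2) + 1

def developer_respond(challenge: int, private_key: int = D) -> int:
    # The developer encrypts (signs) the challenge with his private key.
    return pow(challenge, private_key, N)

def ciss_verify(challenge: int, response: int, public_key: int = E) -> bool:
    # The CISS recovers the value with the public key and compares it
    # with the original challenge.
    return pow(response, public_key, N) == challenge
```

A tampered or replayed response decrypts to a different value and verification fails, which is exactly the property the prose below relies on.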
When the developer receives this information, he encrypts it with his own private key and sends the encrypted information to the CISS. Using the developer's public key, the CISS decrypts the received information. If the decrypted information matches the original, the developer is authenticated; otherwise, another entity is acting on behalf of the real one. The developer or user is thus strongly identified and authenticated, and cannot deny or repudiate sending or executing the code. Moreover, an untrusted entity which is not identified (i.e. does not hold a certificate) cannot send, publish or deploy any code.
3.2 Code Distribution Phases
Our approach consists of the following three phases:
1. Code publication phase: The application developer sends its active code to the CISS in order to publish it. This phase is denoted 1 in Fig. 1. After verifying the code safety using the Proof-Carrying Code (PCC) technique and authenticating the developer's certificate with the CAAN (by exchanging a Certificate Check (CC) message), the CISS sends an Active Code Identifier (ACI) to the developer (2 in Fig. 1).
The PCC technique [13] is used for the secure execution of untrusted code. In our context, the CISS establishes a set of safety rules that guarantee the safe behaviour of programs. The CISS requires different rules according to the developer's class: the rules for user data level code developers are more restrictive than the rules for network level code developers. The code developer, on his side, creates a formal safety proof of the code's adherence to the required safety rules. The CISS can then use a simple and fast proof validation system to check, with certainty, that the proof is valid and hence that the code is safe to be executed by the nodes [13]. The CISS may additionally perform a heavier code analysis to check memory and CPU use. If one of these tasks fails, the CISS sends a negative message to the developer. Once the ACI is sent to the developer, the CISS generates a unique key for the code (Kc), to be used when checking whether a given user has the right to execute this code. Once all information about the application service is available, a notification is sent to the publication web site (3 in Fig. 1), which generates a web page containing the application identifier and a description of all its modules.
2. Code referencing phase: When a user wants to customise the treatment applied by the network nodes to its data packets, it connects to the publication web site to choose the application service and notes its ACI (4 in Fig. 1). With the ACI and its own certificate, the user requests from the CISS (5 in Fig. 1) the generation of a symmetric key (K) allowing it to deploy the code. Once the CISS is sure of the user's rights to deploy and execute the code, by checking the user certificate (a CC message exchanged with the CAAN), it generates a key K and sends it to the user.
We adapt a credential technique based on distributed symmetric key generation, used in [10], in which the key is generated as a combination of the code key, the user key, the source address, the authorisation time and a validity period. The key is hashed by the CISS and sent to the user. With this technique, the active node can verify the user's authentication and authorisation without contacting any third authority, without additional computation, and without holding any further user list. In [10], the code key is generated and shared periodically by the authorisation authority and the different code servers. To secure this generation, Kc is here generated by the CISS itself as a hash code signed with the CISS private key; this hash code allows the node to verify both the code integrity and the CISS authenticity at the same time. The symmetric key offers good performance because the node performs neither heavy computations nor further communications.
3. Code deployment phase: The user sends its packets (6 in Fig. 1) through the network with the reference of the application service and the generated symmetric key K. A node receiving such packets and holding the referenced code and its corresponding code key reconstitutes a symmetric key K' using the code key sent by the CISS, the user key, the source address, the session time and the validity period sent by the user. If the node does not hold the referenced code and its code key, it must send a code request (7 in Fig. 1) to the CISS, which provides the node with the desired code and key (8 in Fig. 1). If both keys match, the node can execute the code on the user data.
Fig. 1. Active code distribution phases
This approach can be considered as intermediate between the integrated and discrete approaches. Indeed, the active code is downloaded out of band (discrete approach) from a server, while the data packets (at least the first ones) may arrive at the nodes before the code and may reference a code not yet present in the node cache (integrated approach). This method solves the problem of the multiple paths that may be taken by packets of the same application, which is a characteristic of IP networks. It also allows a node not to keep a code indefinitely, because codes can always be retrieved from the CISS. It is important to note that, using an event based system, it is easy to consider the arrival of standard packets as an event and to treat them as active packets.
3.3 The Multi-CISS Approach
The solution with a single code server has a major drawback: the CISS can become a bottleneck of the network. Indeed, it can be congested if many nodes send requests at the same time. One approach to avoiding this congestion is to add further CISS servers at appropriate places in the network. The first step performed when a passive node becomes active is to detect all the CISSs present in its neighbourhood. Based on the RTT (Round Trip Time) measured from the set of CISS responses, the new active node orders its CISS neighbours according to their distance. When a code has to be searched for (the active code is not available in the active node), the node requests it from the first CISS in its list. If this CISS responds with a negative message, or if the active node receives no response within the expected RTT from this server, the active node sends a new request to the second CISS. This process is repeated for the other CISSs until the active node receives the appropriate active modules. To avoid saturating the network with multiple code packets sent by the CISSs, an active node which wants an active module can begin a negotiation protocol by sending first a request-for-code message or a notification of code
existence to the set of CISSs. Each CISS able to satisfy the code request sends an Ack packet to the active node; otherwise, it sends a Nack packet. When the current node receives the Ack packets, it can send a downloading request to one chosen CISS, and only one. A CISS receiving the downloading request message then sends the code to the node. We assume that the negotiation protocol messages are small packets and that their transfer in the underlying network is as fast as that of 'syn' messages. The next section presents another approach, a mixed solution between the CISS and hop by hop approaches.
3.4 The Mixed Approach
We expect the CISSs to be placed at the edge of the global network, in the sub-domains. For the nodes close to a CISS server (belonging to the same domain), the number of routers separating them is very small and so is the RTT. For a node in another domain, however, the path to the CISS is longer, and downloading the code from a nearer node is preferable. The CISS approach combines easily with other techniques; in particular it can be combined with the hop by hop scheme, for its connectionless nature and flexibility. In this combination, the CISS plays its classic role of attributing and storing the active codes and is also the first and permanent source of codes; either one or several CISSs can be considered. The hop by hop code migration intervenes to reduce the load on the CISS and also to improve the code transfer time by choosing the nearest source. Moreover, the combination is an alternative when an active node cannot receive a code, or receives a negative response from the CISS. This can be due, for example, to one of the following reasons: the CISS is congested, the link is temporarily broken or bottlenecked, the server is down, or it does not hold the right code (in the case of several CISSs managing a distributed code base). This approach requires two steps and a distinction between the first active node and the subsequent nodes.
1. Code injection: the user sends its data packets, with references to the code, to the first attached active node. At this moment, the "previous hop" field of the data packets is empty. This first visited node must download the code and the corresponding key (Kc) from the CISS if the active code is missing. Before forwarding the data packets, this active node puts its address in the previous hop field.
2. Code migration: the next active nodes visited by the data packets look at the previous hop field, which determines the new source of the code; the code request is sent to the previous node. These nodes use this same code migration technique to find the code, and ask the CISS for the code only in case of error. The previous hop field is updated at each node. When it receives the code and its key from the previous node, the current node can also perform the symmetric key verification to authorise the code execution. With this technique, the CISS is requested only once for the whole data session.
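The two steps (code injection at the first active node, then hop by hop code migration with a CISS fallback on error) can be sketched as follows; the class, field and callback names are illustrative:

```python
class MixedNode:
    """Sketch of the mixed scheme: the first active node on the path fetches
    the code and its key Kc from the CISS; subsequent nodes fetch from the
    previous hop and fall back to the CISS only on error."""

    def __init__(self, address, ask_ciss, ask_node):
        self.address = address
        self.cache = {}              # code reference -> (code, Kc)
        self.ask_ciss = ask_ciss     # callback: ref -> (code, Kc)
        self.ask_node = ask_node     # callback: (addr, ref) -> (code, Kc) or None

    def receive(self, packet):
        ref, prev = packet["code_ref"], packet["prev_hop"]
        if ref not in self.cache:
            if prev is None:
                # Code injection: first active node, the only source is the CISS.
                self.cache[ref] = self.ask_ciss(ref)
            else:
                # Code migration: try the previous hop, CISS on failure.
                fetched = self.ask_node(prev, ref)
                self.cache[ref] = fetched if fetched else self.ask_ciss(ref)
        packet["prev_hop"] = self.address  # update for the next hop
        return self.cache[ref]
```

In the normal case the CISS serves a given flow exactly once, at the first node; every later node pulls the code one hop back.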
4 Performance Analysis
In this section, we evaluate the performance of the code deployment techniques presented above. We compare the single CISS and multi-CISS approaches, which we developed for our platform ARFANet (Active Rule Framework for Active Networks) [3], with the hop by hop approach. We are particularly interested in the network throughput for active packets and the end-to-end latency of this type of packet. The network topology considered is composed of one CISS, three active nodes and three passive nodes; in the multi-CISS approach, a second CISS is added to this topology. We varied the packet arrival rate from 250 packets/s to 4500 packets/s, for a packet size of 200 bytes. Following our analysis of the impact of introducing active packets into a network on the standard packets, several proportions of active packets were considered [9]. That work showed that a proportion K = 20% of active packets does not degrade the performance of the active nodes with respect to the standard packets, and we retained the same proportion in this study. This proportion includes the data packets, the program packets and the code request packets sent to the CISS. The data packets sent in the network can reference four different types of applications. These applications are published on the server for the CISS approaches and are provided by the source node for the hop by hop approach. Our tests were performed on Linux machines with an AMD Athlon(tm) XP 1500+ processor at 1339 MHz, 256 KB of cache memory and 256 MB of RAM.
4.1 The Throughput
Figure 2(a) shows the network throughput for active packets as a function of the total packet arrival rate (active and passive) for the three techniques. These results show that, although the throughput of the single CISS technique is slightly higher than that of the hop by hop technique, both remain very similar. Both throughputs stabilise at 8500 active packets/s beyond a high arrival rate (2000 packets/s). These results are interesting because the proportion of active packets arriving at the network is only 20%, i.e. 400 active packets/s when the total arrival rate is 2000 packets/s. This throughput, high compared to the real arrival rate of the active packets, is due to the fact that the first data packets of an application arriving at a node have to wait until the node receives the corresponding code from the CISS (single CISS) or from the previous node (hop by hop). Once the code is received, the data packets accumulated in the queue are processed immediately, causing a burst of processed packets to be sent towards the following node. This phenomenon is repeated at every active node forwarding the packets, leading to a throughput much higher than the arrival rate. Consequently, along the data path, the pace of migration of the active packets does not necessarily follow the pace of the source.
Fig. 2. Numerical results: (a) packet throughput (packets/s) and (b) end-to-end latency (ms), as functions of the packet arrival rate (packets/s), for the One CISS, Hop by Hop, Several CISS and Mixed approaches.
In addition, we observe that the throughput obtained with the multi-CISS approach is significantly lower than those obtained with the two previous approaches, in particular the single CISS approach. This is due to the presence of the additional CISS, which reduces the load on the first CISS by absorbing the code requests sent by the nodes in its immediate neighbourhood. The first data packets of an application arriving at a node therefore wait less time for the code to arrive than in the single CISS approach, and packets tend to accumulate less in the node queues. The throughput is thus lower than in the other approaches, although it remains higher than the arrival rate. The difference between the network throughput for active packets and the arrival rate of this type of packet stays smaller than for the other approaches: for an arrival rate of 400 active packets/s, the throughput is 800 packets/s, i.e. roughly ten times lower than in the single CISS approach. As the arrival rate increases, the throughput increases too, reaching 5000 packets/s for a total arrival rate of 4500 packets/s (900 active packets/s). In conclusion, the more the arrival rate increases, the more both CISSs are requested and the longer the data packets wait in the nodes.
4.2 The Latency
In this section, we investigate the end-to-end latency experienced by active packets in the three approaches. As Figure 2(b) shows, although the nodes must request the code from the CISS, which can be several routers away, the latency of the active packets is better in the single CISS approach than in the hop by hop approach. This can be explained by the fact that, in the second technique, the active nodes must also manage the code requests emanating from other nodes; this additional task has an impact on latency. The figure also shows that the introduction of an additional CISS (multi-CISS approach) improves the end-to-end latency of active packets. As explained for the throughput, the second CISS takes charge of part of the code requests, those coming from the active nodes in its neighbourhood. Clearly, the management of the codes by
one or more dedicated entities yields better performance than when this task is performed by the nodes themselves.
5 Related Work
Beyond the two main techniques, integrated and discrete, there are few works in the code distribution area. The current contributions can be divided into two main categories, according to whether or not a code server is used. The first category gathers the works in which a code server is used for out-of-band code deployment (discrete approach). The first technique introducing the code server notion for the storage and publication of active programs is DAN (Distributed Code Caching for Active Networks) [6], which was proposed for high performance active routers. This work, however, addresses neither universal active code identification nor safety and security verification mechanisms. The Future Active IP Networks (FAIN) project [11] is a component based approach: the components are stored in a code server and composed into a service. The service is identified by the concatenation of the service provider name and the service name. Note that with this identification method the service remains strongly linked to its provider, which must guarantee the uniqueness of the service name. The user data are directed to the corresponding service by a channel installed at the NodeOS level, without the need for a service reference; this limits the users of a service to the customers of its provider. The main difference between our code distribution system and FAIN is that the latter is dedicated to a component based architecture rather than a packet based one. The Code Distribution Scheme for active networks (CDS), proposed in [17], is a set of recommendations for the design of a Code Distribution Protocol (CDP). The authors recommend out-of-band rather than in-band code transport, arguing on the basis of the large capsule size. They also suggest storing an AA together with its code on the code server. Considering the naming and name resolution of an AA as a basic function of a CDP, the authors suggest that the name of an AA should be universal.
The projects of the second category mainly consider in-band code deployment. The basic idea is the use of a unique key [15], held by the code developer and provided by a trusted certificate authority, to create a unique reference. With this key, the developer computes a one-way 128-bit MD5 hash of the code content. This approach can be useful to limit the number of code developers authorised to deploy their codes in the network. However, it makes it hard for other users to share and re-use the code, because the code reference is strongly linked to the code and its owner. Security must rely on how the code is distributed and on the elements which take part in the process of injecting the code into the network (the developer and the user). Several works, prompted by the security working group [7], lean towards secure execution rather than secure deployment. Projects like SANE [1] and SANTS [12] provide a secure framework for active nodes. Another technique
adopts a different method to secure the execution of the code, using restrictive active languages such as PLAN [8] and SafetyNet [14]. ROSA [10] defines a user authorisation scheme based on symmetric-key generation: the generated key combines a key related to the code itself, shared between the code server and the authentication authority, with the session time and duration. The task can be performed only if the user has been granted the right to deploy the code. However, ROSA defines neither identification nor publication mechanisms and does not secure the developer side. We have adapted and improved this technique as a part of our global security scheme.
6 Conclusion
The processes of code identification and deployment define, respectively, how the code is identified by the different entities of the network (nodes and users) and how it reaches the nodes. In this paper, we developed a new approach based on the classification of developers and users according to their rights in the network. Two main categories can be distinguished: the first is the application level, related particularly to user data; the second is the network level, which is more restrictive and reserved for the network manager. We also developed a new mechanism for active code distribution based on the use of a code server (CISS), which is the central point of a three-pole security architecture. The CISS attributes a unique identifier and checks the developer's rights and the code safety before publication. Moreover, this server supports user authentication and authorisation through the generation of a symmetric key which involves the code key, the user key and a valid session time. This technique spares the nodes a heavy verification task that would require many computations and communications. The CISS combines the advantages of the two code distribution approaches: the approach based on pre-programming certain active nodes, and the hop by hop approach which uses code capsules. We compared the CISS performance to the hop by hop performance in terms of latency and throughput; the results showed that the use of the code server gives better performance. Although our approach was applied in the context of our framework ARFANet [3], it can be used for any kind of active application and in other existing platforms. The use of a publication web site allows easy access to all published applications and their composition.
An extension of this work would analyse the different management techniques for distributed databases, with a view to the case of several distributed CISSs maintaining a consistent and reliable security protocol. In particular, we are interested in the number of CISSs, their location in the network and the management of the distributed code base.
References
[1] Alexander, D., Arbaugh, W., Keromytis, A., Smith, J.: Safety and security of programmable network infrastructures (1998)
[2] Alexander, D.S., Arbaugh, W.A., Hicks, M., Kakkar, P., Keromytis, A.D., Moore, J.T., Gunter, C.A., Nettles, S., Smith, J.M.: The SwitchWare active network architecture. Department of Computer and Information Science, University of Pennsylvania (July 1998)
[3] Bouzeghoub, M., Kloul, L., Mokhtari, A.: A new active network framework based on active rules. Research Report 21, PRiSM (2002)
[4] Braden, R., Lindell, B., Berson, S., Faber, T.: The ASP EE: An active network execution environment. In: Proceedings of the DARPA Active Networks Conference and Exposition (DANCE 2002). IEEE CS Press, Los Alamitos (2002)
[5] Calvert, K.: Architectural framework for active networks. Technical report, AN Architecture Working Group (July 1998)
[6] Decasper, D., Plattner, B.: DAN - distributed code caching for active networks. In: IEEE INFOCOM, San Francisco (April 1998)
[7] AN Security Working Group: Security architecture for active nets (2001)
[8] Hicks, M., Kakkar, P., Moore, J.T., Gunter, C.A., Nettles, S.: PLAN: A packet language for active networks. Department of Computer and Information Science, University of Pennsylvania (July 1998)
[9] Hillston, J., Kloul, L., Mokhtari, A.: Towards a feasible active networking scenario. Telecommunication Systems 27(2-4), 413-438 (2004)
[10] Calderon, M., Bagnulo, M., Alarcos, B., Sedano, M.: Providing authentication and authorization mechanisms for active service charging. In: Stiller, B., Smirnow, M., Karsten, M., Reichl, P. (eds.) QofIS 2002 and ICQT 2002. LNCS, vol. 2511, pp. 337-346. Springer, Heidelberg (2002)
[11] Otsuki, H., Bossardt, M., Egawa, T., Plattner, B.: Integrated service deployment for active networks. In: Sterbenz, J.P.G., Takada, O., Tschudin, C.F., Plattner, B. (eds.) IWAN 2002. LNCS, vol. 2546, pp. 74-86. Springer, Heidelberg (2002)
[12] Murphy, S., Lewis, E., Puga, R., Watson, R., Yee, R.: Strong security for active networks (2001)
[13] Necula, G.C., Lee, P.: Research on proof-carrying code for untrusted-code security. In: Proceedings of the IEEE Symposium on Security and Privacy, Oakland (1997)
[14] Wakeman, I., Jeffrey, A., Owen, T., Pepper, D.: SafetyNet: A language-based approach to programmable networks. Computer Networks 36(1), 101-114 (2001)
[15] Wetherall, D., et al.: ANTS: A toolkit for building and dynamically deploying network protocols. In: IEEE OPENARCH, San Francisco (April 1998)
[16] Wetherall, D.J.: Active network vision and reality: lessons from a capsule-based system. In: 17th ACM Symposium on Operating Systems Principles (SOSP 1999), Operating Systems Review 34, pp. 64-79 (December 1999)
[17] Zhou, Y., Zhang, Y., Lu, J.: CDS: A code distribution scheme for active networks. Computer Communications 27(3), 315-321 (2004)
Securing AODV Routing Protocol in Mobile Ad-Hoc Networks Phung Huu Phu, Myeongjae Yi , and Myung-Kyun Kim Network-based Automation Research Center and School of Computer Engineering and Information Technology University of Ulsan, Ulsan Metropolitan City, 680-749, Republic of Korea [email protected], {ymj,mkkim}@mail.ulsan.ac.kr
Abstract. In this paper, we propose a security schema for the Ad-hoc On-Demand Distance Vector (AODV) routing protocol. In this schema, each node in a network keeps a list of its neighbor nodes, including a shared secret key obtained by executing a key agreement when joining the network. One key principle in our schema is that, before executing the route discovery steps of the AODV protocol, each node executes a message authentication process with the sender to guarantee the integrity and non-repudiation of routing messages, and can therefore prevent attacks from malicious nodes. Compared with other recently proposed secure routing protocols, our security schema needs less computation power in routing transactions and does not need any centralized element in mobile ad-hoc networks.
1
Introduction
The AODV routing protocol [1] is being considered by the Internet Engineering Task Force (IETF) for Mobile Ad-hoc Network (MANET) routing protocol standardization. AODV evolved from the DSR [2] routing protocol, and both are reactive routing protocols. In general, the AODV routing protocol fulfills the requirements of a MANET routing protocol and is efficient in terms of network performance. However, security aspects were not considered in the protocol; attackers can therefore mount many kinds of attacks via the route discovery or path maintenance processes, such as advertising falsified route information, redirecting routes, launching denial-of-service attacks, and sending falsified error reports. Recently, a number of secure routing protocols for MANETs have been investigated. Some focus on AODV and others examine general solutions for secure routing in MANETs. Most routing security solutions make unrealistic assumptions about the availability of key management infrastructures that are in contrast with the very nature of ad hoc networks. The integrity of transactions between neighbor nodes, which is required to protect against fabrication attacks, is not examined in most of these protocols.
Corresponding author.
D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 182–187, 2009. c IFIP International Federation for Information Processing 2009
In this paper, we examine and discuss recent secure routing protocols in order to identify the flaws of current security approaches. Based on this analysis, a security schema for the AODV routing protocol is proposed to eliminate the security flaws in the protocol and to compensate for the identified security weaknesses of recent secure routing approaches. The remainder of this paper is organized as follows. Section 2 examines current approaches to secure routing and states the scope of the problems to be solved in this research. In Section 3, we detail our proposed schema to secure AODV. We discuss and analyze our schema in Section 4. Finally, we conclude and specify future work in Section 5.
2
Problem Statement
Recently, a number of solutions have been proposed for securing routing protocols in MANETs. In this section, however, we briefly describe only two schemas, ARAN [4] and SAODV [5], since they are closely related to our approach. In [4], the authors categorized three kinds of threats in AODV and DSR: modification, impersonation and fabrication. On the basis of this analysis, the authors proposed a protocol called ARAN (Authenticated Routing for Ad hoc Networks) that uses cryptographic certificates to bring authentication, message integrity and non-repudiation to the route discovery process, based on the assumption that a trusted certificate server exists. This is not appropriate for ad hoc networks because it introduces a centralized element. Moreover, in this protocol, because the source node cannot authenticate intermediate nodes in the routing path, intermediate malicious nodes can launch error message attacks on the network. In [5], the authors extend the AODV routing protocol to guarantee security based on a key management scheme in which each node must have certified public keys of all nodes in the network. This work uses two mechanisms to secure AODV messages: digital signatures to authenticate the fixed fields of the messages and hash chains to secure the hop count field. This protocol relies on public key distribution in the ad hoc network; it is therefore difficult to deploy and computationally heavy, since it requires both asymmetric cryptography and hash chains when exchanging messages. The protocol also does not consider the authentication of intermediate nodes; hence it cannot prevent the attack of falsifying error messages in ad hoc networks. In general, the existing schemas for secure routing are based on assumptions about the availability of key management infrastructures which are unrealistic and contrary to the ad-hoc network concept.
Moreover, these schemas do not authenticate intermediate nodes during the routing steps; such nodes may therefore perform fabrication attacks. Given these weaknesses of current approaches, our goal is to design a schema which performs point-to-point message authentication without a deployed key management infrastructure.
3
The Proposed Security Schema for AODV
The principle of our schema is that messages in AODV must be authenticated to guarantee integrity and non-repudiation, so that the protocol can be protected
against several kinds of attacks. Each node in a network has its own self-generated pair of public key e and private key d following the RSA public-key cryptosystem [6], and each node keeps a list of neighbor nodes, with each record containing the information of one neighbor node: the neighbor address, the neighbor public key, and a shared secret key. This information is formed after the key agreement between two neighbor nodes, which negotiates a pair of keys and a shared secret key. The details of our security schema for AODV are described in the following sections. 3.1
Key Agreement Process between Neighbor Nodes
A node joining a network is required to send key agreement messages to its neighbors to negotiate a shared secret key. The concept of this process is based on the HELLO message in ad-hoc routing protocols. The node broadcasts a message indicating the negotiation request to neighbor nodes. On receiving this request, each node replies with a message (where eS and eN are the public keys of the sender node and the replying node, respectively; request id is a sequence number generated by the sender node) to acknowledge the request and to indicate that it is ready for the key agreement process. For each received reply, the requesting node creates a new record in its neighbor list, with the neighbor address and neighbor public key fields filled in; the other fields of the record are empty. For each new record in the list, the requesting node (A) negotiates a secret key with the neighbor node (B) by the following steps:
1. Generate a key Ks by using a secure random number generator,
2. Encrypt Ks with eB (node B's public key) = encrypteB(Ks),
3. Send an offer message to B,
4. Wait for an ACK from B and check message integrity to finish the negotiation.
When node B receives the offer message, it decrypts encrypteB(Ks) with its private key (dB) to obtain the shared key Ks. Node B then sends the ACK message to indicate successful shared secret key negotiation, where hKs(request id) is the hashed digest of request id under the shared key Ks. Since the RSA algorithm is used in the negotiation, the confidentiality of the shared key between the two nodes is guaranteed. The shared key is later used for authenticating messages between the two adjacent nodes in the AODV routing protocol. If a node does not have a shared key with its neighbor nodes, it cannot participate in routing transactions. 3.2
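The negotiation above can be sketched as follows. This is only an illustration, not the authors' implementation: it uses a textbook-sized RSA key pair with small primes (a real node would self-generate a full-size pair as in [6]) and HMAC-SHA1 in place of the unspecified hash function; all variable names are invented for the example.

```python
import hmac, hashlib, secrets

# Toy RSA parameters for illustration only (p=61, q=53, e=17, d=2753).
N, E, D = 61 * 53, 17, 2753

def rsa_encrypt(m: int, e: int, n: int) -> int:
    return pow(m, e, n)

def rsa_decrypt(c: int, d: int, n: int) -> int:
    return pow(c, d, n)

# Step 1: node A generates a shared secret key Ks with a secure RNG.
Ks = secrets.randbelow(N - 2) + 2

# Step 2: A encrypts Ks with B's public key eB and sends the offer.
offer = rsa_encrypt(Ks, E, N)

# Step 3: B decrypts the offer with its private key dB.
Ks_at_B = rsa_decrypt(offer, D, N)
assert Ks_at_B == Ks

# Step 4: B acknowledges with hKs(request id), a keyed hash of the
# request id; A recomputes it to check the integrity of the negotiation.
request_id = b"42"
key_bytes = Ks.to_bytes(4, "big")
ack = hmac.new(key_bytes, request_id, hashlib.sha1).hexdigest()
assert hmac.compare_digest(
    ack, hmac.new(key_bytes, request_id, hashlib.sha1).hexdigest())
```

Because only B's private key can recover Ks from the offer, an eavesdropper on the broadcast medium learns nothing about the shared key.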
Route Request
A route request (RREQ) is initiated by a source node (S) and then propagated by intermediate nodes until the message reaches its destination node (D). On receiving a RREQ, an intermediate node I, according to the AODV routing protocol, checks whether the message should be re-broadcast. If the message needs to be re-broadcast and the sender is in node I's neighbor list,
it will send (unicast) a message to the sender requesting the authentication process. When receiving the authentication request, the sender creates an authentication reply message containing hashKs(RREQ), the hashed value of the RREQ message under the shared key Ks between the two nodes. The authentication reply message is unicast back to node I. On receiving the message, node I checks the integrity of the RREQ message by hashing it using the shared key Ks and comparing the result with the received hashed digest. If the comparison succeeds (the integrity of the RREQ message is guaranteed), node I continues with the AODV steps, such as setting up the reverse path, increasing the hop count, and rebroadcasting the message; otherwise, the RREQ is discarded. The process continues until the message reaches the destination. The destination also authenticates the sender of the RREQ (a neighbor of the destination) by the same procedure.
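The check performed by node I can be sketched as below. This is an illustrative sketch rather than the paper's code: the keyed hash is realized with HMAC-SHA1 and the RREQ byte layout is invented for the example.

```python
import hmac, hashlib

def make_auth_reply(shared_key: bytes, rreq: bytes) -> bytes:
    """Sender side: hashKs(RREQ), the keyed digest returned on request."""
    return hmac.new(shared_key, rreq, hashlib.sha1).digest()

def verify_rreq(shared_key: bytes, rreq: bytes, digest: bytes) -> bool:
    """Node I: recompute the keyed hash, compare with the received digest."""
    expected = hmac.new(shared_key, rreq, hashlib.sha1).digest()
    return hmac.compare_digest(expected, digest)

# Example exchange between two neighbors sharing key Ks.
Ks = b"negotiated-shared-secret"
rreq = b"RREQ|src=S|dst=D|bcast_id=7|hops=2"

digest = make_auth_reply(Ks, rreq)
assert verify_rreq(Ks, rreq, digest)            # genuine RREQ accepted

tampered = rreq.replace(b"hops=2", b"hops=0")   # a malicious modification
assert not verify_rreq(Ks, tampered, digest)    # modified RREQ discarded
```

A node that does not hold the shared key cannot produce a valid digest, so a spoofed or modified RREQ fails the comparison and is dropped.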
Fig. 1. Illustration of the message authentication
3.3
Route Reply and Route Maintenance
Route replies (RREP) in AODV are also targets for attacks by malicious nodes. In our schema, when receiving a RREP, a node requests the sender to prove the integrity and non-repudiation of the message by sending an authentication request; the reply contains hashKs(RREP), the hashed value of the RREP message under the shared key Ks between the two nodes. After the authentication process succeeds, the node continues with the AODV steps; otherwise, the node drops the RREP since it is invalid. In the route maintenance process, only the route error report message (RERR) is a target for attacks in the AODV protocol. Our schema requires the same authentication process when sending route error messages, which prevents attacks from malicious nodes: RERR messages are authenticated with the same request and response exchange.
4
Security Analysis
Our schema proposes a new, fully distributed authentication process which does not require any third parties. The schema does not provide confidentiality, but it provides integrity and non-repudiation of messages. It takes a similar approach to SAODV and ARAN in supplying integrity and non-repudiation; however, it uses a point-to-point authentication process, so it can authenticate intermediate nodes during routing and does not require a certificate server (as ARAN does) or an assumption of key distribution (as SAODV does). By supplying integrity of the exchanged messages, our schema can prevent attacks from malicious nodes. A malicious node cannot form loops by spoofing nodes, thanks to the authentication process between neighbor nodes. Based on the integrity of the exchanged messages, the schema can also prevent falsified error messages and modification attacks during the route discovery process. However, end-to-end authentication has not yet been considered in our current work, so some kinds of attacks, such as impersonating a source node, a destination node, or a neighbor of the destination, cannot be prevented if the malicious nodes comply with the proposed procedure. It is assumed that after a period of operation, a trust relationship between two neighbor nodes will be established; a node that continues to perform malicious activities will be detected and excluded from the trust list, so it cannot participate in future routing. This aspect is to be studied in future work. In general, the proposed security schema needs heavy computation when a node joins a network, but during routing transactions it needs less computation power than existing approaches, since it only uses a hash algorithm to authenticate between two nodes and does not need a centralized element in the network, which causes heavy computation in existing secure routing protocols.
In our opinion, the proposed schema is more appropriate for MANETs since there is no centralized element. This approach differs from most routing security solutions, which make unrealistic assumptions about the availability of key management infrastructures that contrast with the very nature of ad hoc networks.
5
Conclusions and Future Work
Our work focuses on the AODV routing protocol, which is under consideration as a standard for routing in MANETs. A security schema for AODV has been proposed to prevent common kinds of attacks and compensate for the security flaws of recent related works. The approach of our work is that messages exchanged in AODV are required to be authenticated point-to-point, using hashed digests, during a transaction. When joining a network, a node must execute a key agreement with its neighbor nodes so that every two neighbor nodes share a secret key. Before executing the AODV steps, each node performs a message authentication process with the sender by requesting the hashed digest value
of the message and then checking the integrity and non-repudiation of routing messages from the hashed digest, in order to prevent attacks from malicious nodes. However, some kinds of attacks, such as tunneling attacks or selfishness problems (which, so far, no security schema has been able to detect [5]), have not been considered in this work. This work has proposed a point-to-point and fully distributed authentication approach for securing AODV. It can compensate for the above-mentioned weaknesses of SAODV and ARAN, since it can authenticate intermediate nodes in transactions and does not need any centralized element such as a certificate server. Currently, this work focuses only on the AODV routing protocol; future work will investigate applying the schema to other routing protocols in MANETs. The implementation and simulation of the schema will also be investigated, to compare its security features with those of similar approaches against particular kinds of attacks. An end-to-end authentication procedure will be added to improve the current schema.
Acknowledgments The authors would like to thank the Ministry of Commerce, Industry and Energy, Ulsan Metropolitan City, the University of Ulsan, and the Network-based Automation Research Center (NARC), which partly supported this research. The authors also thank Prof. Hoon Oh (University of Ulsan) and the anonymous reviewers for their careful reading of and comments on this paper.
References
1. Perkins, C.E., Royer, E.M.: Ad hoc On-Demand Distance Vector Routing. In: Proc. of the 2nd IEEE Workshop on Mobile Computing Systems and Applications, New Orleans, LA, pp. 90–100 (1999)
2. Johnson, D., Maltz, D.: Dynamic source routing in ad hoc wireless networks. In: Imielinski, T., Korth, H. (eds.) Mobile Computing. Kluwer Academic Publishers, Dordrecht (1996)
3. Molva, R., Michiardi, P.: Security in ad hoc networks. In: Conti, M., Giordano, S., Gregori, E., Olariu, S. (eds.) PWC 2003. LNCS, vol. 2775, pp. 756–775. Springer, Heidelberg (2003)
4. Sanzgiri, K., et al.: Authenticated routing for ad hoc networks. IEEE Journal on Selected Areas in Communications 23(3), 598–610 (2005)
5. Zapata, M.G., Asokan, N.: Securing Ad hoc Routing Protocols. In: Proc. of the ACM Workshop on Wireless Security, Atlanta, USA, pp. 1–10 (2002)
6. Rhee, M.Y.: Internet Security: Cryptographic Principles, Algorithms and Protocols. Wiley, Chichester (2004)
7. Papadimitratos, P., Haas, Z.J.: Secure Routing for Mobile Ad hoc Networks. In: SCS Communication Networks and Distributed Systems Modeling and Simulation Conference (CNDS 2002), San Antonio, TX (2002)
Extensible Network Configuration and Communication Framework Todd Sproull and John Lockwood Applied Research Laboratory Department of Computer Science and Engineering: Washington University in Saint Louis 1 Brookings Drive, Campus Box 1045 St. Louis, MO 63130 USA http://www.arl.wustl.edu/arl/projects/fpx/reconfig.htm
Abstract. The effort to manage network security systems has increased in complexity over the past years. Network security for a company, university, or government agency can no longer be provided using a single Internet firewall or Intrusion Prevention System (IPS). Today, network administrators must deploy multiple intrusion detection and prevention nodes, traffic shapers, and firewalls in order to effectively protect their network. As the number of devices increases, maintaining a secure environment becomes difficult. This paper presents an infrastructure for control, configuration, and communication between heterogeneous network devices. The approach presented uses a Publish/Subscribe model built on top of a peer-to-peer overlay network in order to distribute information between network intrusion detection and prevention devices.
1
Introduction
Network administrators have become overwhelmed by the task of securing their networks against attacks on the hosts in their network. End hosts are difficult to protect because they can be subverted via attacks against flaws in their operating systems, trojan programs, and misconfiguration by users. Firewalls, Network Intrusion Detection Systems (NIDS) and Intrusion Prevention Systems (IPS) integrated within the network help protect against exploitation of such flaws. Many organizations also deploy traffic shapers to help apportion bandwidth more fairly. Today, network administrators spend much of their time patching or changing device configurations to prepare for or recover from the latest computer worm or system exploit. The manual configuration and administration of each device contributes to down time of the network. It is difficult to manage a group of firewalls, NIDS, IPS, and traffic shapers because today's different devices each use a proprietary management infrastructure. Developing a common framework for these devices to communicate would greatly reduce the time associated with control and configuration of all systems in the network. In this work, we propose a unified management network that exchanges information using the eXtensible Markup Language (XML). D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 188–193, 2009. c IFIP International Federation for Information Processing 2009
Extensible Network Configuration and Communication Framework
189
The proposed infrastructure allows heterogeneous nodes to communicate via XML messages sent over an ad hoc, Peer-to-Peer (P2P) overlay network. Experimental results demonstrate the performance of the overlay when forwarding high-level security alerts.
2
System Architecture
2.1
Services
Nodes in the overlay network subscribe to services of interest. Nodes subscribe to services by issuing discovery queries to the network. Nodes implementing services respond with a list of the services they offer as well as advertisements of other peers in the group. Services include, but are not limited to: rule updates for intrusion detection and prevention, signature distribution for viruses and trojan horses, anomaly detection, and SPAM filters. In general, these services run in hardware or software in the network infrastructure and communicate through an overlay network. 2.2
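The subscribe-by-discovery pattern described above can be sketched as follows. The class and message names are invented for illustration; they do not come from the paper or from the JXTA API.

```python
from collections import defaultdict

class OverlayNode:
    """Toy overlay peer: advertises services and answers discovery queries."""
    def __init__(self, name, services=()):
        self.name = name
        self.services = set(services)
        self.subscribers = defaultdict(list)   # service -> interested peers

    def discover(self, network, service):
        """Query all peers; subscribe to those implementing the service."""
        providers = [p for p in network if service in p.services]
        for p in providers:
            p.subscribers[service].append(self)
        return providers

    def publish(self, service, alert):
        """Forward an alert to every peer subscribed to the service."""
        return [(s.name, alert) for s in self.subscribers[service]]

# A firewall node discovers and subscribes to an IDS rule-update service.
ids = OverlayNode("ids-server", services={"rule-updates"})
fw = OverlayNode("firewall")
assert fw.discover([ids], "rule-updates") == [ids]
assert ids.publish("rule-updates", "new-rule") == [("firewall", "new-rule")]
```

In the real system the discovery query and the alert would travel as XML messages over JXTA pipes rather than direct method calls.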
Overlay Network
The P2P overlay network is developed using JXTA [11]. JXTA provides an open infrastructure that allows developers to create P2P networks operating across a variety of platforms. The original implementation of JXTA was developed in Java; JXTA-C and JXTA for mobile devices also exist. The default message distribution mechanism broadcasts messages to communicate with nodes. JXTA also supports a two-tiered hierarchy for the overlay network, which could be used to reduce the amount of traffic forwarded between nodes when discovering services.
Connecting JXTA with Services
In order for JXTA to interface with existing software running on hosts that provide network services, a mechanism was needed to hook into the applications that need to communicate. This was accomplished through a generic wrapper around the process implementing the security service. Updates sent from an administrator or authorized node are processed by the peer. The peer reformats the control and configuration commands for the specific application executing the service. The commands are redirected to a configuration file and the application-level process is restarted.
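The wrapper's rewrite-then-restart cycle can be sketched as below. This is a minimal illustration, not the authors' code: the update format, the key=value config layout, and the restart callback are all assumptions made for the example (a real wrapper would re-exec the wrapped process, e.g. Snort).

```python
import os, tempfile

class ServiceWrapper:
    """Toy generic wrapper: translate a management update into the wrapped
    application's config file, then restart the process so it is re-read."""
    def __init__(self, config_path, restart_cb):
        self.config_path = config_path
        self.restart_cb = restart_cb        # would re-exec the service

    def apply_update(self, update: dict):
        # Reformat the commands for the specific application: here we just
        # dump key=value lines, as many Unix services expect.
        with open(self.config_path, "w") as f:
            for k, v in update.items():
                f.write(f"{k}={v}\n")
        self.restart_cb()                   # restart the application process

restarts = []
path = os.path.join(tempfile.mkdtemp(), "service.conf")
w = ServiceWrapper(path, restart_cb=lambda: restarts.append(True))
w.apply_update({"ruleset": "emerging-threats", "max_rules": 500})

assert open(path).read() == "ruleset=emerging-threats\nmax_rules=500\n"
assert restarts == [True]
```

The design keeps the wrapped application unmodified: all protocol logic lives in the peer, and the application only ever sees its native configuration file.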
3
Implementation
Table 1 identifies several types of security services and how each maps to a particular platform. Services were implemented for three types of network processing platforms: a Linksys wireless router (model WRT54GS) [5], a workstation running Linux, and a Global Velocity GVS1500 extensible network switch [1]. The applications which run on these platforms include Intrusion Detection or Prevention, Quality of Service, and Anomaly Detection.
Table 1. Security applications for each respective platform

                                 Wireless Router             Workstation    Extensible Switch
Intrusion Detection/Prevention   Snort with limited ruleset  Snort or Bro   FPGA Snort Lite
Quality of Service               Linksys QoS Support         HTB            FPGA Queue Manager
Anomaly or Worm Detection        None                        SPADE          FPGA Event Detector
In this work we modified a Linksys wireless router to serve as a node in the network-wide managed infrastructure. The wireless router consists of a 54 Mbit/sec WiFi connection, a 4-port 100 Mbit/sec Ethernet switch and one external 100 Mbit/sec uplink connection. The Linksys router uses a 200 MHz embedded processor with 32 MBytes of RAM. An experiment demonstrating the IDS software Snort [10] running on the Linksys router has been discussed in [3]. This implementation of Snort on the Linksys router is limited in terms of the number of rules it supports and the rate at which it processes requests. The router provides native support for application- and port-based Quality of Service. Performing anomaly detection at the wireless router was not the intended purpose of the device and only works to a marginal degree because of limited CPU and memory resources; anomaly services are better implemented on more powerful nodes. A workstation configured with the Linux operating system is another platform we support for distributed management. Linux workstations are commonly used by network administrators implementing IDS. Networks running IDS software operate at rates measured in the 100's of Megabits/sec range. Two open source software tools perform IDS: Snort and Bro [9]. Linux systems can be used to provide some types of Quality of Service; the Hierarchical Token Bucket (HTB) packet scheduler [2] is found in standard Linux distributions from 2.4.20. Anomaly detection can be performed using Linux with the Statistical Packet Anomaly Detection Engine (SPADE) [4]. SPADE is a Snort preprocessor plugin which sends alerts through the standard Snort reporting mechanisms. The Global Velocity GVS1500 system uses the FPX hardware platform [7] to process packets in reconfigurable hardware at Gigabit/second rates. The FPX is a reconfigurable hardware device able to process traffic at gigabit link rates using FPGAs.
The device also contains a single board computer executing software applications. FPGA circuits have been developed for FPX modules that process Snort rules operating in both Intrusion Detection and Prevention modes. This work was presented in [6], and demonstrated Gigabit/sec header and payload processing of TCP streams. An application to detect worm activity was presented in [8]; that approach demonstrated the use of Bloom filters to maintain statistics on commonly occurring content.
4
Experimental Results
4.1
High Level
In order to test the capabilities of the overlay network, experiments were performed to measure the overhead of JXTA and the publish/subscribe model for nodes; specifically, the time to process alerts and forward relevant information to nodes or groups of nodes interested in the alerts or in aggregations of alerts. Figure 1 illustrates the network created to test the overlay network. The topology consists of an XML generator host, a host which advertises and publishes alerts (the server), and clients interested in subscribing to the server. The XML generator sends alerts to the server, which are forwarded to subscribers.

Fig. 1. Network topology constructed in Emulab (an XML generator, a server/publisher, and clients/subscribers connected through two 100 Mbit/sec workgroup switches on subnets 10.1.1.X and 10.1.2.X)
4.2
Experiment Setup
The Emulab [12] environment is used to test the performance of the P2P overlay network. The nodes in the experiment consist of three 2 GHz Pentium 4 computers executing the Linux 2.4.20 kernel. The links between each node are set to 100 Mbits/sec with 0 ms delay. The XML generator injects 1000 144-byte XML alert packets at various link rates. JXTA version 2.3.3 is used on all hosts. Communication between the nodes is through the JXTA API, using unreliable unicast pipes on top of TCP. JXTA pipes provide a mechanism to connect the input of one service (or node) to the output of another. 4.3
Results
Figure 2 shows the number of packets per second the generator is able to inject while avoiding packet loss at the client. The client is subscribed to one service, which consists of every alert from the XML generator. The number of dropped alerts increases as the packet rate from the XML generator increases. At approximately 55 packets per second, JXTA is able to forward all 1000 alerts to a client. Figure 3 illustrates the percentage of alert traffic dropped as the number of subscriptions increases. For each subscription an additional copy of the alert is generated; for example, when seven clients are subscribed, JXTA attempts to send out 1000 alerts to each of the seven clients (7000 total alerts). In this example, the clients are individual nodes; however, they are capable of acting as
Fig. 2. Percentage of packets dropped as the number of packets per second increases
Fig. 3. Percentage of packets dropped as the number of subscribers increases
rendezvous peers connected to a large group of nodes, similar to the top tier of a two-tier overlay network. From the graph, we observe that with two subscriptions the number of alerts the server processes and forwards is cut almost in half. As the number of subscriptions increases, the total number of alerts generated increases, but the number received at each client continues to decline. During both experiments the CPU utilization of the client and the server was around 60-70%. The main reason for the low performance is the amount of overhead required for a single JXTA alert message. To send a message, the server first creates a JXTA unidirectional output pipe to connect to the client; one reason for this is to ensure that the client is alive, as JXTA makes no assumptions about node reliability. The overhead associated with this, however, is around 40 ms per alert. The overhead of transmitting the JXTA alert was observed with tcpdump: nine packets were exchanged between the client and server to create the output pipe and deliver the message. Despite this drawback, optimizations exist to increase performance. The authors chose an implementation consistent with the examples provided in the JXTA tutorials. Maintaining individual state per client should increase overall performance.
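The trend in Figure 3 follows from simple fan-out arithmetic. The sketch below is our illustration, not a model from the paper: it assumes the server's total forwarding capacity is fixed (roughly the 55 alerts/sec at which a single client sees no loss in Figure 2) and is shared equally among subscribers.

```python
def received_fraction(num_subscribers, capacity_pps=55, offered_pps=55):
    """Fraction of offered alerts each client receives when the server's
    total forwarding capacity is shared among all subscribers."""
    total_demand = offered_pps * num_subscribers
    delivered = min(capacity_pps, total_demand)
    return delivered / total_demand

# One subscriber: everything arrives; two: roughly half; loss grows with n.
assert received_fraction(1) == 1.0
assert received_fraction(2) == 0.5
assert round(received_fraction(7), 3) == round(1 / 7, 3)
```

This matches the qualitative observation above: doubling the subscriptions roughly halves what each client receives, and the per-client share keeps shrinking as subscribers are added.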
5
Conclusion
The goal of this initial research is to investigate techniques for deploying services in the network for heterogeneous communication. Migrating to an open, XML-based solution imposes a fair amount of overhead, as observed in the experiments. Characterizing this overhead and investigating appropriate uses of the technology is necessary. Moving forward, the goal is to develop more automated network management using open standards targeting network security.
References
1. Global Velocity, http://www.globalvelocity.info/
2. Hierarchical token bucket, http://luxik.cdi.cz.devik/qpos/htb/
3. IETF Simple public key infrastructure (spki) charter (September 2003), http://www.batbox.org/wrt54g.html
4. Spade - statistical packet anomaly detection engine (2004), http://www.computersecurityonline.com/spade
5. Linksys (2005), http://www.linksys.com
6. Attig, M., Dharmapurikar, S., Lockwood, J.: Implementation results of bloom filters for string matching. In: FCCM, Napa, CA (April 2004)
7. Lockwood, J.W.: Evolvable Internet hardware platforms. In: The Third NASA/DoD Workshop on Evolvable Hardware (EH 2001), pp. 271–279 (July 2001)
8. Madhusudan, B., Lockwood, J.: Design of a system for real-time worm detection. In: Hot Interconnects, Stanford, CA, pp. 77–83 (August 2004)
9. Paxson, V.: Bro: a system for detecting network intruders in real-time. Computer Networks 31(23–24), 2435–2463 (1999)
10. Roesch, M.: SNORT - lightweight intrusion detection for networks. In: LISA 1999: USENIX 13th Systems Administration Conference, Seattle, Washington (November 1999)
11. Traversat, B.: Project jxta 2.0 super-peer virtual network
12. White, B., Lepreau, J., Stoller, L., Ricci, R., Guruprasad, S., Newbold, M., Hibler, M., Barb, C., Joglekar, A.: An integrated experimental environment for distributed systems and networks. In: Proc. of the Fifth Symposium on Operating Systems Design and Implementation, Boston, MA, pp. 255–270. USENIX Association (2002)
A Model for Scalable and Autonomic Network Management Amir Eyal and Robin Braun Institute of Information and Communication Technology, Faculty of Engineering, University of Technology, Sydney {Amir.Eyal,Robin.Braun}@uts.edu.au Tel.: +61 2 9514 2460; Fax: +61 2 9514 2435
Abstract. Current telecommunication network management systems rely extensively on human intervention. They are also prone to fundamental changes as the managed network evolves. These two attributes, combined with the growing complexity of networks and services, make the cost of network management very high. In recent years, we have witnessed the emergence of artificial intelligence applications, some aimed at the creation of autonomic network management systems. This paper offers a novel approach to the design of a network management system that incorporates intelligent agents. As a benchmark for this model, we use the two approaches most widely used in network management systems today. The focus of this paper is on synchronization issues, service discovery and policy enforcement. Keywords: Autonomous Network, Network Management, Scalability, Information Model.
1
Introduction
Telecommunication networks are by definition complex systems [5]. They consist of large numbers of components, and the number of services and activities taking place is even larger. The network behaves according to a changing set of rules that is hard to predict accurately. Given this complexity, maintaining and managing these systems is a difficult task. Moreover, the complexity has been growing and is expected to grow vastly with the introduction of new types of devices, new services and networks, and with the standardization of service differentiation [4, 6]. Substantial efforts are being put into automating management tasks, including the development of algorithms, protocols and management tools. By having some of the labour done automatically, the network operator can cut down on the resources spent on maintenance and management [1, 2]. It will still
The authors are with the Institute for Information and Communications Technology, Faculty of Engineering, University of Technology, Sydney. They are members of the Teleholonic Systems Research Group. http://teleholonics.eng.uts.edu.au. Amir Eyal is a doctoral candidate. Robin Braun is Group Leader.
D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 194–199, 2009. c IFIP International Federation for Information Processing 2009
be required for network managers to perform some manual tasks, and to coordinate and supervise the automated tasks. By definition, the managed component has to behave in a certain pattern in order for the management algorithms to be effective. Any deviation from this pattern would mean efficiency degradation or a malfunction. To overcome that, the algorithm needs to be adaptive. This attribute would transform the management system from automatic to autonomic [5, 4]. A central part of the network management system (NMS) is the information layer (IL). The IL's purpose is to represent the state of the network. Parameters in the information layer correspond to settable parameters in the network. The meta-structure is an information model. It consists of two parts: an informational part, containing the network status, and an operational part, which has the ability to affect the status of the network. We use two opposite approaches in network management as benchmarks: the distributed model and the centralized model. The distributed model, with a distributed information base, is epitomized by SNMP, used to control and monitor network devices like switches. The CIM [3] is a comprehensive example of the centralized model. This paper offers a novel model for the information layer that enables a strong binding by utilizing autonomic agents that perform the synchronization tasks. We show that it bridges the differences between the standards in use, allowing the best-suited standard to be utilized for each component in a natural way. It also enables the use of intelligent agents that perform several duties. The paper proceeds as follows: Section 2 surveys current information models. Section 3 describes our hybrid model. Section 4 discusses the issues of information binding, and Section 5 presents a possible implementation, followed by a detailed description of the information layer (IL) and how it addresses the challenges presented above.
2 The Information Model
Information models divide into two groups or models. One model is the distributed information model. It can be seen in management systems like SNMP [7], where each network element contains its own parameters. A management system can perform a "SET" operation to change the state of the network, and a "GET" to check a certain parameter. Due to the differences between network elements, the structure, i.e. the hierarchy, of the information may differ from one element's storage to another's. Thus, the system is also required to know the structure of the information model contained in each network element. Another constraint of this model is that information must be related to the physical element it exists on. This means that a parameter that cannot be associated with a specific network element does not get the appropriate representation. On the other hand, this approach is scalable, since the larger the network, the more distributed the MIB is. With the centralized model, network management tools like CIM provide more versatile capabilities. CIM is a common definition of management information for
196
A. Eyal and R. Braun
systems, networks, applications and services [3]. Since it is a centralized database (or one distributed between a set of management stations), we can map logical entities properly. However, issues like synchronization and scalability arise, since all the weight of the network management falls on a small number of management stations and their corresponding databases.
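The contrast between the two models can be illustrated with a toy simulation of the distributed case (names and parameter layouts here are illustrative, not a real SNMP API):

```python
# Toy simulation of the distributed information model: each network
# element stores its own MIB-like parameter tree, and the manager
# issues per-element GET/SET operations. The differing key layouts
# illustrate why the manager must know each element's structure.
class NetworkElement:
    def __init__(self, mib):
        self.mib = mib            # element-local information store

    def get(self, oid):
        return self.mib[oid]

    def set(self, oid, value):
        self.mib[oid] = value

# Two elements with different internal structures (hypothetical OIDs).
switch = NetworkElement({"ifTable.1.speed": 1000})
router = NetworkElement({"interfaces.eth0.mtu": 1500})

router.set("interfaces.eth0.mtu", 9000)   # "SET" operation
assert switch.get("ifTable.1.speed") == 1000   # "GET" operation
assert router.get("interfaces.eth0.mtu") == 9000
```

In the centralized model, by contrast, both parameter trees would live in a single database maintained by the management station, which makes logical (element-independent) parameters easy to represent but concentrates all synchronization load in one place.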
3 The Hybrid Model
Our model's contribution is twofold. Firstly, it remaps the network components into the TMF 4-layer structure [4], combining it with 3 columns (vertical layers) of the management system. Secondly, it adds the concept of intelligent agents performing most of the duties in the management system. The usage of artificial intelligence concepts and tools, like JADE [8], which provides mobile agents, can alleviate accuracy and scalability issues, reduce delays and spare most of the effort expended by the human network administrator. It may also reduce the development cost of network management tools. The block representation of our two-dimensional autonomic network management model is presented in Figure 1. By dividing the network into 4 layers, the network can be fully represented in the information layer [4]:
– Resource Layer - storage devices and available bandwidth.
– Services Layer - services offered by the system. The services rely on resources in the layer below.
– The products and their components are mapped to the uppermost layers.
Fig. 1. Block Diagram of the Layered Model
In addition to the 4 horizontal layers, there are 3 vertical management layers. The first provides objectives to the system. An objective is an ambition that produces management actions. The middle layer translates those objectives into network activity. The rightmost layer is split into two parts. One stores the state of the network, corresponding to the 4 horizontal layers, while the other is in charge of maintaining integrity between the information storage and the network.
4 Issues with the Information Binding

4.1 Requirements
The flow of commands in the management system is as follows. Any activity initiated by the objectives layer and translated by the management layer into changes in the information store has to be reflected correctly in the network. It has to happen with as little delay and as much accuracy as possible. This is the IL ⇒ Network binding. On the other hand, any event that changes the state of the network needs to be visible to the management and objective layers. This means we need a binding in the other direction, IL ⇐ Network. The following discusses the inner structure of the synchronization block and the ways to achieve the IL ⇔ Network goal. In order for a network management system to become autonomous, it needs a number of prerequisites.
– It has to be capable of incorporating specially tailored artificial intelligence algorithms.
– It has to interface with intelligent agents.
– It has to have a strong binding between the management system's information layer and the network parameters.

4.2 Structure
As shown in the logical representation in Figure 2(a), the management system has two parts, the management activity and the synchronization activity. The management activity is making and carrying out management decisions. This may be done by an autonomous decision-making process or by a network administrator. Figure 2(b) shows the actual layout of the network management system with a physical network. The network consists of both physical elements (resources) and logical ones (services and products). Furthermore, the physical elements can be of any type. The synchronization activities taking place in the management system are separated from the actual management activity. The synchronization block is divided into three parts. The upper part is responsible for collecting data from the network. It is home to foraging agents that roam the network performing service-discovery activities. The lower part of this block is in charge of enforcing those management decisions taken in the management block on the network, the IL ⇒ Network binding. It uses multiple types of agents and algorithms. The middle part interfaces between the two former parts and the management block. In the event of an initiative from the management block, the interface part has to translate it into input for the algorithms that run in the lower part. Whenever the upper part comes up with newly discovered data, the interface will perform the corresponding changes in the information layer. It is important to say that parts of this block are likely to be mobile agents, swarming the network devices themselves. The combination of an information layer and a synchronization layer would form a "terrain" or "environment" for the agents in the management layer, which take on the role of inhabitants of that terrain.
Fig. 2. The hybrid system
4.3 A Hybrid System
A comparison of the two leads us to the conclusion that in the hybrid model, we would want to include as many high-level parameters as possible, while having the bulk of the lower-level ones distributed. That way we can enjoy the possibility of optimization in the centralized management layer and keep the IL scalable. The main performance issue we want to measure in the model for the autonomous management system is the bandwidth used for management. It is closely related to the number of messages passed between the IL and the network. In such a case, the more centralized the model, the smaller the number of low-level commands and the total number of messages passed will be.
5 Possible Implementation
We need to implement two classes of intelligent agents. One class will contain agents capable of collecting the state of the relevant parameters. The other class is of agents that affect the parameters within the domain. They synthesize changes in the information store and execute them on the network. Each class may consist of multiple agent types. In some cases, one type of agent may be responsible for more than one parameter, or for both collecting a parameter's state and setting it. Finally, after establishing a kind of sandbox that is able to perform network management duties autonomically, we can set out to explore ways to transform the management itself into an autonomic process. We might end up using a third class of intelligent agents, a genetic algorithm, or any combination of those. At this time, we have started with an email service. We mapped the email box into the resources and services that it consists of, shown in Figure 2. We designed agents that perform the setting and the retrieving of parameters related to the service level of the email on multiple mail servers. For that particular service, we need to control on each server the list of users and passwords, and to be able to manage the quota allocated to each user. We are going to evaluate the management system using numerous measurements. The most obvious parameter would be the integrity of the system. Here,
we want to see that the network stabilizes in the right state and that changes in the network's state are reflected properly in the IL. Secondly, we will look at the delay, the period of time it takes the system to reach a stable state. We will also measure the resources used in the process of performing a management act (i.e. number of messages, number of procedures, storage used, etc.). An important issue to check is the scalability of the management system. This will be measured by experimenting with different scales of networks and growing numbers of operations and user requirements, and checking the rate of growth in used resources and the performance degradation experienced.
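The two agent classes described above (collectors for the IL ⇐ Network direction, setters for IL ⇒ Network) can be sketched minimally for the email-quota example; all class names and the dictionary-based "server" below are hypothetical illustrations, not the paper's implementation:

```python
# Hypothetical sketch of the two agent classes for the email service.
class CollectorAgent:
    """IL <= Network: gathers parameter state from a mail server."""
    def __init__(self, server):
        self.server = server

    def collect(self):
        # In a real deployment this would query the server remotely.
        return {"users": dict(self.server["users"]),
                "quota": dict(self.server["quota"])}

class SetterAgent:
    """IL => Network: enforces values from the information store."""
    def __init__(self, server):
        self.server = server

    def set_quota(self, user, megabytes):
        self.server["quota"][user] = megabytes

# A mail server modelled as a plain dictionary for the sketch.
server = {"users": {"alice": "pw1"}, "quota": {"alice": 100}}
SetterAgent(server).set_quota("alice", 250)
state = CollectorAgent(server).collect()
assert state["quota"]["alice"] == 250
```

A single agent type could combine both roles when one agent is responsible for both collecting and setting a parameter, as the text notes.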
6 Conclusions
We have discussed the different model structure options in the design of an autonomic network management system, with the emphasis on the Information Layer. We suggested that the appropriate model might be of a hybrid nature between the distributed and centralized models. We are going to test the effectiveness and efficiency of the model in supporting autonomic algorithms according to a set of parameters that includes the efficiency and grade of the results.
References
[1] Bonabeau, E., Dorigo, M., Theraulaz, G.: Swarm Intelligence: From Natural to Artificial Systems. Santa Fe Institute (1999)
[2] Holland, J.H.: Hidden Order: How Adaptation Builds Complexity (1996)
[3] DMTF: Common Information Model (CIM). Distributed Management Task Force
[4] Braun, R.: Towards a Management Paradigm for Autonomic Communications (2005)
[5] Amaral, L.A.N., Ottino, J.M.: Complex Networks: Augmenting the Framework for the Study of Complex Systems. The European Physical Journal B 38, 147–162 (2004)
[6] Magrath, S., Braun, R., Cuervo, F.: Policy Interoperability and Network Autonomics. In: 7th International Symposium on Autonomous Decentralized Systems (2005)
[7] Strauß, F., Klie, T.: Integrating SNMP Agents with XML-based Management Systems. IEEE Communications Magazine (2004)
[8] Bellifemine, F., Caire, G., Poggi, A., Rimassa, G.: JADE: A White Paper (September 2003)
Intelligibility Evaluation of a VoIP Multi-flow Block Interleaver
Juan J. Ramos-Muñoz, Ángel M. Gómez, and Juan M. Lopez-Soler
Signal Theory, Telematics and Communications Department, University of Granada, Spain
[email protected], [email protected], [email protected]
Abstract. This work helps demonstrate the perceptual benefits that can be expected from adding processing capabilities to the network nodes for the class of interactive audio streaming applications. In particular, we propose a new voice-stream multi-flow block interleaver and we show that it provides an intelligibility performance very close to the reference end-to-end interleaver, even under conditions where end-to-end interleaving is unfeasible. Keywords: VoIP subjective evaluation, interleaving, speech recognition.
1 Introduction
It is well established that in streaming voice applications packet losses are more harmful when they are consecutive, since the subjective quality degradation increases as the burst length increases [1]. Based on this fact, to improve the perceived VoIP quality, procedures should be considered to combat the undesirable bursty-error-prone nature of the Internet. To this end, by using active technology, we aim to scatter the pattern of losses without increasing the bandwidth consumption and, ideally, the end-to-end delay as well. Traditionally, error control techniques operate end-to-end [2]. However, with the advent of Active Networks technology [3], new promising router functionalities can be envisaged. In this work, we evaluate our active procedure in terms of intelligibility, and we experimentally show that the quality of the received audio stream increases by using active routers. More precisely, we take up again the packet interleaving problem, but now using the processing capabilities of the network elements. Since intermediate network nodes can use different multimedia flows [4], we propose an interleaver algorithm that takes advantage of this [5]. For comparison purposes, we simulate a reference single-flow end-to-end interleaver and we experimentally demonstrate that, under some circumstances, our algorithm outperforms the reference system in terms of intelligibility, with light impact on the end-to-end packet delay.
This work has been partially financed by the Spanish Science and Technology Ministry under Research Project TIC2002-02978 (with 70% of FEDER funds).
D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 200–205, 2009. c IFIP International Federation for Information Processing 2009
2 Basic and Multi-flow Block Interleaver Algorithms
Given an input packet sequence, denoted by {a_i}, and the output sequence, denoted by {b_i}, the interleaver defines a permutation π : Z → Z such that a_i = b_π(i). An interleaver is said to be periodic if it verifies that π(i+e) = π(i)+e, e being its period. An interleaver has a spread s if any two input symbols in an interval of length s are separated by a distance of at least s at the output. In interactive VoIP applications, the end-to-end delay must be bounded. Thus, to obtain the minimum-delay packet reallocation, given a spread s, the basic block interleaver algorithm [6] (hereafter referred to as Type I (s)) is:
1. Arrange the symbols associated to the input packets in an (s × s) matrix in rows, from left to right and from top to bottom.
2. Read the matrix by columns from bottom to top and from left to right, and accordingly send the packets.
In this case, D_max, defined as the maximum number of symbols (in the worst case) that any packet will wait at the interleaver, will be equal to D_max = s·(s−1). Note that for a periodic VoIP flow, given the imposed end-to-end delay constraint, the Type I (s) interleaver will be limited to s such that s·(s−1)·t_f < d_max, where t_f is the inter-packet period, and d_max is the maximum end-to-end delay that any packet can tolerate. For example, for typical values of d_max = 300 ms and t_f = 20 ms, it cannot be assured that bursts longer than 4 packets will be scattered. However, if the router jointly interleaves n_f > 1 different flows, a reduction in the interleaver packet delay can potentially be achieved. In this work, we will assume that all the flows have the same period t_f. To fill the interleaver matrices, each flow will maintain the relative order with respect to the others. In addition, for a given flow each row will be written from left to right according to the packet sequence number. Let (f^1, f^2, ..., f^{n_f}) be the n_f available audio flows, and let s be the maximum expected burst length.
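The two steps of the Type I (s) interleaver can be sketched directly (a minimal illustration; the function name is ours):

```python
def type_i_interleave(packets, s):
    """Type I (s) block interleaver: write an s x s matrix row by
    row (left to right, top to bottom), then read it column by
    column, each column from bottom to top."""
    assert len(packets) == s * s
    out = []
    for c in range(s):                  # columns, left to right
        for r in range(s - 1, -1, -1):  # rows, bottom to top
            out.append(packets[r * s + c])
    return out

# With s = 3, any two inputs within distance s of each other end up
# at least s positions apart at the output (spread s):
print(type_i_interleave(list(range(9)), 3))
# [6, 3, 0, 7, 4, 1, 8, 5, 2]
```

A burst of up to s consecutive losses in the output stream therefore hits packets that are non-adjacent in the original sequence, at the price of the D_max = s·(s−1) buffering delay quantified above.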
To simplify, let us additionally define R_j^i, with i = {1, ..., n_f} and j = {1, ..., n_m}, as the number of consecutive rows that the flow f^i will be assigned for filling the interleaver matrix j, n_m being the number of matrices. Depending on n_f and s, we will consider two different cases.
1. Whenever n_f ≥ s, the interleaver will be based on just one (n_f × 1) matrix (n_m = 1), in which R_1^i = 1, ∀i = {1, ..., n_f}. For this case, the interleaver output will be given by ..., f^1_i, f^2_j, ..., f^{n_f}_k, f^1_{i+1}, f^2_{j+1}, ..., f^{n_f}_{k+1}, ..., where the subscripts i, j, ..., k denote the sequence numbers for flows f^1, f^2, ..., f^{n_f}. We refer to this interleaver as Type II (n_f).
2. If n_f < s, we will refer to this interleaver as Type II (n_f, s). Under this condition, two different cases will be considered.
– If s = (n_f · i), i ∈ IN ⇒ n_m = 1. That is, only one interleaver (s × s) matrix will be used;
– Otherwise, n_m = n_f square (s × s) matrices will be required.
Going ahead, if we denote rem(x, y) as the remainder of the integer division x/y, the writing matrices algorithm will be as follows:
– For the first matrix, we will set R_1^i = ⌊s/n_f⌋, for i = {1, 2, ..., (n_f − rem(s, n_f))}. Similarly, we will set R_1^j = ⌊s/n_f⌋ + 1, for j = {(n_f − rem(s, n_f) + 1), ..., (n_f − 1), n_f}.
– If applicable, for the next j = {2, ..., n_f} additional matrices, and for the i = {2, ..., n_f} flows, if R_{j−1}^i = (⌊s/n_f⌋ + 1) and R_{j−1}^{i−1} = ⌊s/n_f⌋, then R_j^i = ⌊s/n_f⌋ and R_j^{i−1} = (⌊s/n_f⌋ + 1).
As can be checked, any burst of length less than or equal to s will be scattered at the de-interleaver output. In this case, if we define r = rem(s, n_f) and d = (s−r)/n_f, the maximum delay D_max will obey the following expressions:
- If r ≤ (n_f − r) ⇒ D_max = s · (r · (d + 1) − 1 − (r − 1) · d).
- If r > (n_f − r) ⇒ D_max = s · (r · (d + 1) − 1 − ((r − 1) · d + 2 · r − n_f − 1)).
For a given s, the lowest maximum delay is achieved when n_f = (s−1), and when s/n_f = 2 and r = 0. This delay corresponds to D_max = s. Therefore, the maximum tolerated s, given a flow with a maximum per-packet time to live d_max and a period of t_f, must satisfy s < d_max/t_f. For the provided numerical example, in which t_f = 20 ms and d_max = 300 ms, it yields that s < 15, which is significantly less demanding compared to the upper bound of s < 5 for the Type I (s) end-to-end interleaver. The period of the proposed Type II (n_f, s) interleaver is equal to s²/n_f if s ≡ 0 (mod n_f), and s² otherwise.
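Case 1 above, the Type II (n_f) interleaver for n_f ≥ s, reduces to a plain round-robin merge of the flows, which introduces no extra buffering delay (an illustrative sketch; the function name is ours):

```python
def type_ii_interleave(flows):
    """Type II (nf): merge nf equal-period flows round-robin,
    preserving each flow's internal packet order. No packet waits
    at the interleaver, so the added delay is zero."""
    out = []
    for k in range(len(flows[0])):   # assumes equal-length flows
        for f in flows:
            out.append(f[k])
    return out

f1, f2, f3 = ["a0", "a1"], ["b0", "b1"], ["c0", "c1"]
print(type_ii_interleave([f1, f2, f3]))
# ['a0', 'b0', 'c0', 'a1', 'b1', 'c1']
```

A burst of up to n_f consecutive losses in the merged stream then destroys at most one packet per flow, which is why this variant only scatters bursts effectively when n_f ≥ s.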
3 Quality and Intelligibility Evaluation
To evaluate our VoIP interleavers we plan to use a high-level criterion. In noise-free conditions, the Automatic Speech Recognition (ASR) rate is highly correlated with human intelligibility [7]. Based on that, we propose to use this score as the performance measure. We feel that this methodology should definitively be considered to evaluate any VoIP service enhancement. Compared to MOS subjective tests, ASR has lower cost and is more reproducible. In addition, in terms of the intelligibility perceived by the user, the ASR rate can be more suitable than other quality measures like PESQ (ITU-T Recommendation P.862) or the E-model [8]. The speech recognizer's performance is measured in terms of the Word Error Rate (WER), defined by:

WER = (n_i + n_s + n_d) / n_t × 100    (1)

where n_s is the number of substituted words, n_i is the number of spurious words inserted, n_d is the number of deleted words and n_t is the overall number of words. Prior to the count of substitution, deletion and insertion errors, dynamic programming is used to align the recognized sentence with its correct transcription.
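The dynamic-programming alignment underlying Eq. (1) is a standard edit distance over words; a compact sketch (function name ours):

```python
def wer(reference, hypothesis):
    """Word Error Rate, Eq. (1): minimum number of substitutions,
    insertions and deletions (found by dynamic programming) over
    the number of reference words, times 100."""
    r, h = reference.split(), hypothesis.split()
    # dp[i][j] = min edits to turn r[:i] into h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i                      # i deletions
    for j in range(len(h) + 1):
        dp[0][j] = j                      # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return 100.0 * dp[len(r)][len(h)] / len(r)

print(round(wer("one two three", "one three"), 2))  # one deletion
# 33.33
```

The same alignment also yields the individual n_s, n_d and n_i counts if the backtrace is recorded.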
4 Experimental Results
Experimental results are provided by means of simulation. A simple scenario is set: n_f periodic flows arrive at the active router with a period equal to t_f = 20 ms. For the Type I (s) case, just one flow (n_f = 1) is considered. Ideally, we assume no switching or any other routing delay.
Fig. 1. Bursts length CDF (cumulative probability vs. burst length, 1–15), for the generated losses and for Type I(10) (Dmax = 1.80 s), Type II(5) (Dmax = 0.00 s) and Type II(5,10) (Dmax = 0.20 s)
We adopt a single error model. It is based on a Markov chain trained with the collected traces described in [9]. The bursts length CDF of the trace obtained from the trained model is shown in Fig. 1. The overall probability of loss is 8.2%. In the same figure, as an illustrative example (n_f = 5 and s = 10), we plot the bursts length CDFs obtained by using the simulated interleavers. Note that TypeI(10) is not practically applicable; TypeII(5), although it does not introduce extra delay, has less scattering capability; and, finally, TypeII(5,10) exhibits a balance between the introduced delay and the loss isolation capacity. To enhance the quality of the received flow, before ASR evaluation, whenever a lost packet is detected, the previously received packet is artificially repeated. For ASR evaluation we use the connected-digit Aurora 2 database [10]. After transmission, in order to reduce its inherent variability, the speech signal is processed. A feature extractor segments the received speech signal into overlapped frames of 25 ms every 10 ms. Each speech frame is represented by a feature vector containing 13 Mel Frequency Cepstrum Coefficients and the log-energy. Finally, the feature vectors are extended with their first and second derivatives. The speech recognizer is based on Hidden Markov Models (HMM). We use eleven 16-state continuous HMM word models (plus silence and pause, which have 3 and 1 states, respectively), with 3 Gaussians per state (except silence, with 6
Gaussians per state). The HMM models are trained from a set of 8440 noise-free sentences, while the out-of-train test set comprises 4004 noise-free sentences. In Table 1, WER and Dmax values are summarized for the TypeI(s), TypeII(nf) and TypeII(nf, s) interleavers with different nf and s. For a given nf, the s value was chosen such that Dmax, expressed in seconds, would be lower than 0.300 for the TypeII(nf, s) interleaver. Note that for TypeII(nf), Dmax is not shown because its theoretic delay is equal to 0. It can be observed that the WER performance of the TypeII(nf) interleaver strongly depends on nf. For nf = 2, TypeII(nf, s) can reduce the TypeII(nf) WER without breaking the end-to-end constraint. However, as more flows nf become available, the WER difference is less noticeable. Note that although TypeI(s) outperforms both TypeII interleavers, it can only be used for s < 5, given the VoIP end-to-end delay constraint.

Table 1. WER (%) and Dmax in seconds for the three simulated interleavers

nf | II(nf) WER | s  | II(nf,s) WER | Dmax  | I(s) WER | Dmax
2  | 5.401      | 3  | 4.610        | 0.060 | 3.080    | 0.120
2  | 5.401      | 5  | 2.892        | 0.200 | 1.595    | 0.400
2  | 5.401      | 6  | 2.255        | 0.240 | 1.540    | 0.600
3  | 4.333      | 7  | 1.831        | 0.280 | 1.307    | 0.840
4  | 3.419      | 8  | 1.848        | 0.160 | 1.289    | 1.120
5  | 2.875      | 10 | 1.567        | 0.200 | 1.325    | 1.800
6  | 2.420      | 12 | 1.449        | 0.240 | 1.263    | 2.640
7  | 1.968      | 14 | 1.381        | 0.280 | 1.340    | 3.640
8  | 1.819      | 9  | 1.626        | 0.180 | 1.304    | 1.440
9  | 1.613      | 10 | 1.526        | 0.200 | 1.285    | 1.800
10 | 1.611      | 11 | 1.533        | 0.220 | 1.282    | 2.200
12 | 1.534      | 13 | 1.455        | 0.260 | 1.322    | 3.120
Additionally, note that the TypeII(4, 8) maximum delay (160 ms) is lower than the TypeII(3, 7) delay (280 ms), although the s value is greater. This is due to the peculiarity of the interleaver, which results in a nonlinear dependence of the delay on the number of flows and the burst length considered. Summing up, both Type II interleavers diminish the packet interleaving delay. Although Type II (nf) is designed to work properly when nf ≥ s, it can also be suitable when nf ≈ s, without introducing any additional delay. Compared to Type II (nf), the Type II (nf, s) interleaver improves the VoIP intelligibility. It scatters a high percentage of loss patterns, and reduces the maximum length of the bursts at the de-interleaver output. Furthermore, although TypeII(nf, s) introduces some additional delay, it can be used under conditions that TypeII(nf) does not tolerate (long burst lengths and a low number of different flows). On a separate note, given the processing capabilities of the active router, it would always be possible to select which interleaver algorithm to use (TypeII(nf) or TypeII(nf, s)). In this case, the network dynamics and the number of available flows should be taken into account. As a rule of thumb, we would suggest using the TypeII(nf) interleaver instead of TypeII(nf, s) whenever nf ≈ s.
5 Conclusion
In this paper the block interleaving problem for audio applications is revisited. To increase the final audio quality we aim to scatter long bursts of packet losses. We propose a new VoIP interleaver algorithm which not only diminishes the per-packet delay, but also allows its use under conditions where end-to-end approaches are unfeasible. Our algorithm interleaves packets from different flows. To work properly, the interleaver must be placed in a common node before the path where losses are expected to occur. We show that the resulting speech intelligibility is maximized, especially when the number of available flows is small. In this work, because of its reproducibility and low cost, we have considered automatic speech recognition in order to assess the intelligibility improvements of the proposed VoIP active service. This procedure can be extended to evaluate any other VoIP enhancement. As future work, a mapping function from machine to human recognition rate remains to be established. Similarly, the mapping functions between recognition rate and MOS score should be studied as well. By using these mapping functions, enhanced VoIP active services can be envisaged.
References
[1] Liang, Y.J., Farber, N., Girod, B.: Adaptive playout scheduling and loss concealment for voice communication over IP networks. IEEE Transactions on Multimedia 5(4), 532–543 (2003)
[2] Towsley, D., Kurose, J., Pingali, S.: A comparison of sender-initiated and receiver-initiated reliable multicast protocols. IEEE Journal on Selected Areas in Communications 15(3), 398–406 (1997)
[3] Tennenhouse, D.L., Smith, J.M., Sincoskie, W.D., Wetherall, D.J., Minden, G.J.: A survey of active network research. IEEE Communications Magazine 35(1), 80–86 (1997)
[4] Ott, D.E., Sparks, T., Mayer-Patel, K.: Aggregate Congestion Control for Distributed Multimedia Applications. In: IEEE INFOCOM 2004, vol. 1, pp. 13–23 (March 2004)
[5] Ramos-Muñoz, J.J., Lopez-Soler, J.M.: Low delay multiflow block interleavers for real-time audio streaming. In: Lorenz, P., Dini, P. (eds.) ICN 2005. LNCS, vol. 3420, pp. 909–916. Springer, Heidelberg (2005)
[6] Andrews, K., Heegard, C., Kozen, D.: A Theory of Interleavers. Technical Report 97-1634, Computer Science Department, Cornell University (1997)
[7] Jiang, W., Schulzrinne, H.: Speech recognition performance as an effective perceived quality predictor. In: Tenth IEEE International Workshop on Quality of Service, pp. 269–275 (May 2002)
[8] Cole, R.G., Rosenbluth, J.H.: Voice Over IP Performance Monitoring. SIGCOMM Comput. Commun. Rev. 31(2), 9–24 (2001)
[9] Yajnik, M., Kurose, J., Towsley, D.: Packet loss correlation in the MBone multicast network: experimental measurements and Markov chain models. Tech. Rep. UM-CS-1995-115 (1995)
[10] Hirsch, H.G., Pearce, D.: The AURORA Experimental Framework for the Performance Evaluations of Speech Recognition Systems under Noisy Conditions. In: ISCA ITRW ASR 2000, France (2000)
A Web-Services Based Architecture for Dynamic-Service Deployment Christos Chrysoulas1, Evangelos Haleplidis1, Robert Haas2, Spyros Denazis1,3, and Odysseas Koufopavlou1 1 University of Patras, ECE Department, Patras, Greece {cchrys,ehalep,odysseas}@ee.upatras.gr 2 IBM Research, Zurich Research Laboratory, Rüschlikon, Switzerland [email protected] 3 Hitachi Sophia Antipolis Lab, France [email protected]
Abstract. Due to the increase in both heterogeneity and complexity in today's networking systems, there arises a demand for an architecture for network-based services that gives flexibility and efficiency in the definition, deployment and execution of the services and, at the same time, takes care of the adaptability and evolution of such services. In this paper we present an approach that applies a component model to GT4, a Web-service based Grid environment, which enables the provision of parallel applications as QoS-aware (Grid) services, whose performance characteristics may be dynamically negotiated between a client application and service providers. Our component model allows context dependencies to be explicitly expressed and dynamically managed with respect to the hosting environment, computational resources, as well as dependencies on other components. Our work can be seen as a first step towards a component-based programming model for service-oriented infrastructures utilizing standard Web services technologies.
1 Introduction

In recent years, Web-service technology has gained more and more importance in the area of Grid computing. The Open Grid Services Architecture [1] has motivated Grid architects to build environments based on a service-oriented architecture utilizing Web-service technology. The evolution of the Globus Toolkit 4 [2] towards the Web Services Resource Framework [3] was the outcome of that effort. Grids are mostly built following a service-oriented architecture using Web-services technology, which has not been designed to fit the idea of a component-based plug-and-play client programming framework. Services are typically discovered dynamically, using technologies like the Monitoring and Discovery System (MDS) [4] in Globus Toolkit 4, rather than created; they further do not provide means to describe D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 206–211, 2009. © IFIP International Federation for Information Processing 2009
dependencies, for example on other services running outside the Grid. Web-service technology provides a versatile messaging facility but lacks an extensive component model applicable to service composition. In this paper we present a component-based architecture in order to address the above issues. Our architecture is based not only on the Globus Toolkit 4 environment, a Grid architecture for the provision of parallel applications as Grid services over standard Web-service technology, but also makes heavy use of a component-based architecture, trying to solve the problem of creating new services and of the dependencies on other services and components outside the architecture we present. Our proposed Dynamic Service-Deployment Architecture is being developed as part of the FlexiNET [5] IST research project, and particularly as a sub-module of the FlexiNET Wireless Access Node (FWAN) module. The remainder of the paper is organized as follows: Section 2 describes FlexiNET. Section 3 describes the architecture for Dynamic Service Deployment. A discussion of related work is given in Section 4. Conclusions and future work are presented in Section 5.
2 FlexiNET Architecture As stated in Section 1, the DSD module is developed for the FlexiNET project. The primary aim of the project is to define and implement a scalable and modular network architecture incorporating adequate network elements (FlexiNET Node Instances) that offer roaming connection control, switching/routing control, and advanced service management/access functions to network access points that currently only support connectivity between user terminals and network core infrastructures [6], [7], [8]. The FlexiNET network architecture consists mainly of node instances, communication buses and data repositories. The DSD module is part of the FWAN. The FWAN architecture, shown in Figure 1, is based on Hitachi's distributed router, which consists of two functional blocks: the basic and the extensible function block.
Fig. 1. FWAN architecture
C. Chrysoulas et al.
The FWAN has a network processor as its basic functional block and two PCs as its extensible functional block. A user accesses the FWAN through an access point using either a laptop or a mobile phone. The FWAN is responsible for authenticating native and roaming users through the FLAS using an AAA proxy. The Dynamic Service Deployment Module (DSD) must be deployed on the FWAN before boot-up. The Bootstrap Process is responsible for booting up the FWAN with the AAA proxy module. In order to accomplish its task, it reads a static configuration file, stored in a local directory of the Bootstrap Process, which contains a series of commands to be sent to the DSD Module in order to create the FWAN node's fundamental functionalities. The Bootstrap Process will mainly trigger the installation of the AAA proxy through the DSD module. The AAA Proxy Module forwards authentication packets to the FLAS server, encapsulating the EAP packets [9] into XML messages that are passed over Web services (and vice versa) in order to authenticate and authorize the user. The AAA proxy service is deployed in the FWAN at boot-up time. It is stored in a local directory and deployed by the DSD module. The code is requested from the DGWN through Web services. On boot-up the DSD module is requested by the Bootstrap Process to deploy the AAA proxy module. The DSD module retrieves the AAA proxy service code through the DGWN and deploys it on either of the two PCs based on specific algorithms. Also, based upon the user profiles, the DSD module will deploy a Quality of Service (QoS) Module, which is responsible for providing QoS to specific users. The required configuration of the network processor is handled by the ForCEG module, which receives Web service requests from the AAA Proxy and the QoS Module and translates them into ForCES protocol messages [10].
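The encapsulation step performed by the AAA Proxy can be illustrated with a small sketch. The element names and Base64 transport encoding below are illustrative assumptions, not the actual FlexiNET message schema:

```java
import java.util.Base64;

// Sketch of wrapping a raw EAP packet into an XML message body, as the
// AAA Proxy does before sending it to the FLAS server over Web services.
// The element names are illustrative; the real FlexiNET schema differs.
public class EapXmlCodec {

    public static String wrap(byte[] eapPacket, String sessionId) {
        String payload = Base64.getEncoder().encodeToString(eapPacket);
        return "<eapMessage session=\"" + sessionId + "\">"
             + "<payload>" + payload + "</payload>"
             + "</eapMessage>";
    }

    public static byte[] unwrap(String xml) {
        // Naive extraction, sufficient for the sketch; a real module
        // would use a proper XML parser (e.g., JAXP).
        int start = xml.indexOf("<payload>") + "<payload>".length();
        int end = xml.indexOf("</payload>");
        return Base64.getDecoder().decode(xml.substring(start, end));
    }

    public static void main(String[] args) {
        byte[] eap = {0x02, 0x01, 0x00, 0x05, 0x01}; // an EAP Response header, for illustration
        String xml = wrap(eap, "s42");
        System.out.println(xml);
        System.out.println(java.util.Arrays.equals(unwrap(xml), eap)); // round-trip check
    }
}
```

The round trip (wrap, then unwrap) must recover the original EAP bytes, since the proxy on the far side reconstructs the packet before handing it to the authentication back end.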
3 DSD Architecture 3.1 DSD Definition By Dynamic Service Deployment we refer to the sequence of steps that must be taken in order to deploy a service on demand. The necessary steps are service code retrieval, selection of the installation destination according to matchmaking algorithms, and service deployment. The matchmaking algorithms aim at the most efficient use of system resources by comparing the available resources of the FWAN with the resources required by the service to be deployed. 3.2 Proposed DSD Architecture Figure 2 depicts the currently proposed DSD architecture. As the figure shows, the DSD consists of the following sub-components:
Fig. 2. DSD Architecture
Web Services Server
The Web Services Server sub-component hosts the interfaces with the AAA Proxy and the Bootstrap Process; at this stage only these two processes interact with the DSD Module. This server is responsible for exchanging messages between the DSD Module on the one hand and the AAA Proxy Module and the Bootstrap Process on the other. The Web Services Server sub-component provides the functionality necessary to register a Web service in a UDDI directory, and it is also capable of finding other Web service interfaces.
DSD Manager
The DSD Manager sub-component has two functions, depending on whether the user's profile is required. When the AAA Proxy communicates with the DGWN, the DSD Manager must download the user profile in order to determine which services must be deployed, and passes the request to the DSD Controller. In the case of the Bootstrap Process, the DSD Manager passes the bootstrap services required for deployment to the DSD Controller. The DSD Manager is also responsible for checking whether a user has terminated the connection and for undoing the user's personal configuration.
DSD Controller
The DSD Controller sub-component receives the service request from the DSD Manager, communicates with the DGWN in order to download the service code and the service requirements, retrieves the available resources from the Node Model, performs the matchmaking algorithm in order to find the most suitable resources, and finally deploys the service. The DSD Controller is thus responsible for services in three dimensions: Download, Deploy, and Configure.
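The DSD Manager's two entry points can be sketched as follows. The collaborator interfaces (ProfileStore as a stand-in for the DGWN-backed profile repository, Controller for the DSD Controller) are illustrative assumptions, not FlexiNET APIs:

```java
import java.util.List;

// Sketch of the DSD Manager's two cases: a user-profile-driven request
// coming via the AAA Proxy, and a fixed service list coming from the
// Bootstrap Process. Both paths hand the services to the DSD Controller.
public class DsdManager {

    interface ProfileStore { List<String> servicesFor(String userId); } // backed by the DGWN
    interface Controller { void deploy(String serviceName); }           // the DSD Controller

    private final ProfileStore profiles;
    private final Controller controller;

    DsdManager(ProfileStore profiles, Controller controller) {
        this.profiles = profiles;
        this.controller = controller;
    }

    /** AAA Proxy path: download the user profile, deploy what it lists. */
    void onUserAuthenticated(String userId) {
        for (String service : profiles.servicesFor(userId)) {
            controller.deploy(service);
        }
    }

    /** Bootstrap path: deploy the fixed set of boot-time services. */
    void onBootstrap(List<String> bootServices) {
        bootServices.forEach(controller::deploy);
    }
}
```

Keeping the two trigger paths behind one manager means the DSD Controller sees a uniform stream of deployment requests, regardless of whether a user profile was involved.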
Resource Manager
The Resource Manager sub-component performs discovery and monitoring of resources. It collects information, with the help of the Resource Manager Interface, from all components of the Node Model, and also from the DSD Controller. All the collected information is made available to the other sub-components through the WebMDS interface it provides; only the necessary information is passed to the Node Model.
Node Model
The Node Model is responsible for keeping all the information about the FWAN and provides a complete view of it. The Node Model contains information regarding physical resources, available and used, and data about running services.
User Profile
The User Profile is the data storage where the downloaded user profile is kept.
Service Code and Requirements
The Service Code and Requirements data storage holds the downloaded code and the requirements (in terms of physical resources) that describe a service.
Running Services and Configuration
The Running Services and Configuration data storage holds data about running services and their current configuration.
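The matchmaking step described in Section 3.1, which compares the Node Model's per-host resources with a service's requirements, can be sketched as below. The resource keys and the tie-breaking rule (most free memory wins) are illustrative assumptions; the paper does not specify the actual algorithm:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a matchmaking algorithm: pick a host (one of the FWAN's two
// PCs) whose available resources cover the service's requirements.
public class Matchmaker {

    /** Returns the chosen host name, or null if no host satisfies the requirements. */
    public static String choose(Map<String, Map<String, Integer>> hosts,
                                Map<String, Integer> required) {
        String best = null;
        int bestFreeMem = -1;
        for (Map.Entry<String, Map<String, Integer>> h : hosts.entrySet()) {
            Map<String, Integer> avail = h.getValue();
            boolean fits = required.entrySet().stream()
                .allMatch(r -> avail.getOrDefault(r.getKey(), 0) >= r.getValue());
            int freeMem = avail.getOrDefault("memMB", 0);
            if (fits && freeMem > bestFreeMem) {   // prefer the host with most free memory
                best = h.getKey();
                bestFreeMem = freeMem;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> hosts = new LinkedHashMap<>();
        hosts.put("pc1", Map.of("cpuPct", 40, "memMB", 128));
        hosts.put("pc2", Map.of("cpuPct", 70, "memMB", 512));
        // A hypothetical QoS module needing 50% CPU headroom and 256 MB:
        System.out.println(choose(hosts, Map.of("cpuPct", 50, "memMB", 256))); // pc2
    }
}
```

In the real system the `hosts` map would be populated from the Node Model, which the Resource Manager keeps current.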
4 Discussion of Related Work Distributed component models such as CORBA [11] and DCOM [12] are widely used, mostly in the context of commercial applications. The Common Component Architecture developed within the CCA Forum [13] defines a component model for high-performance computing based on interface definitions. XCAT3 [14] is a distributed framework that accesses Grid services (e.g., OGSI [1]) based on CCA mechanisms; it uses XSOAP for communication and can use GRAM [2] for remote component instantiation. The Vienna Grid Environment (VGE) [15] is a Web-service-oriented environment for supporting high-performance computing applications, realized on the basis of state-of-the-art Grid and Web-services technologies, Java and XML. Globus Toolkit 4 [2] is an environment that mostly deals with the discovery of services and resources in a distributed environment rather than the deployment of the services themselves.
5 Conclusion and Future Work We presented an architecture that adds a dynamic perspective to Web-service-based Grid infrastructures. Our component-based model addresses the issue of the
dynamic deployment of new services in a distributed environment and the way they announce themselves in that environment. We expect this work to be relevant not only to the Grid community but also to the Web-service and networking communities, as we did not only address concerns related to Grid computing but also discussed architectural issues regarding Web-service configuration and deployment. Our implementation of the model is still at the prototype stage and requires further refinement and analysis. For future work we plan to provide a more sophisticated model for service deployment and selection based on QoS properties.
References 1. Foster, I., Kesselman, C., Nick, J., Tuecke, S.: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration, Globus Project (2002), http://www.globus.org/research/papers/ogsa.pdf 2. The Globus Alliance, http://www.globus.org 3. The Web Service Resource Framework, http://www.globus.org/wsrf/ 4. The Globus Alliance, http://www.globus.org/toolkit/docs/development/3.9.4/info/ wsmds.html 5. FP6-IST1 507646 FlexiNET Technical Annex 6. FP6-IST1 507646 FlexiNET D21 Requirement, Scenarios and Initial FlexiNET Architecture 7. FP6-IST1 507646 FlexiNET D22 Final FlexiNET Network Architecture and Specifications 8. Aladros, R.L., Kavadias, C.D., Tombros, S., Denazis, S., Kostopoulos, G., Soler, J., Haas, R., Dessiniotis, C., Winter, E.: FlexiNET: Flexible Network Architecture for Enhanced Access Network Services and Applications. In: IST Mobile & Wireless Communications Summit 2005, Dresden (2005) 9. RFC 3748: Extensible Authentication Protocol (EAP) (June 2004) 10. Haleplidis, E., Haas, R., Denazis, S., Koufopavlou, O.: A Web Service- and ForCES-based Programmable Router Architecture. In: IWAN 2005, France (2005) 11. CORBA Component Model, v3.0, OMG, http://www.omg.org/technology/ documents/formal/components.htm 12. COM Component Object Model Technologies, Microsoft, http://www.microsoft.com/com/default.mspx 13. The CCA Forum, http://cca-forum.org/ 14. Krishnan, S., Gannon, D.: XCAT3: A Framework for CCA Components as OGSA Services. In: Proceedings of the Ninth International Workshop on High-Level Parallel Programming Models and Supportive Environments, pp. 90–97 (April 2004) 15. Benkner, S., Brandic, I., Engelbrecht, G., Schmidt, R.: VGE - A Service-Oriented Environment for On-Demand Supercomputing. In: Proceedings of the Fifth IEEE/ACM International Workshop on Grid Computing (Grid 2004), Pittsburgh, PA, USA (November 2004)
The Active Embedded Ubiquitous Web Service Framework Dugki Min1,*, Junggyum Lee1, and Eunmi Choi2,** 1
School of Computer Science and Engineering, Konkuk University Hwayang-dong, Kwangjin-gu, Seoul, 143-701, Korea [email protected], [email protected] 2 School of Business IT, Kookmin University Chongnung-dong, Songbuk-gu, Seoul, 136-702, Korea [email protected]
Abstract. We develop an active embedded middleware framework, called the EUWS (Embedded Ubiquitous Web Service), in WinCE.NET. The EUWS seamlessly integrates home network services and the Web Services on the Internet and provides a management framework for ubiquitous Web services. As the initial stage of our project, our current focus has been on designing and implementing a prototype of the EUWS in WinCE.NET. The architecture of the EUWS prototype system includes an extensible and reconfigurable Embedded Ubiquitous Web Service (EUWS) framework and a UPnP2WS processing module that seamlessly integrates the UPnP standard with the Web Service standard.
1 Introduction Recently, a number of middleware standards have been proposed to implement home networks: UPnP (Universal Plug and Play) [1] for easy interoperability among devices, HAVi (Home Audio and Video Interoperability) [2] for interoperability between video and audio devices, Jini [3] for interoperability among Java applications, and OSGi [4] as a middleware framework between networked services. These recent home network middleware technologies are used to connect, integrate, and manage services provided by devices in a restricted area. However, none of them considers seamless interconnection and integration with external Internet services, i.e., accessing home network services from an external client or reaching out to external Internet services from home network services. As the standard technology for integrating Internet services, the Web Service [5] has become the major trend. The Web Service is platform-independent and programming-language-independent, and it is the XML-based middleware standard determined and developed by
* This paper was supported by Konkuk University in 2005, and also by Microsoft Research Asia under the Grant of MSRA Joint Research Project in 2004. ** Corresponding author: This work was supported by the Korea Research Foundation (KRF) under Grant No. D00021 (R04-2003-000-10213-0), and also by the research program and research center UICRC of Kookmin University in 2005. D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 212–217, 2009. © IFIP International Federation for Information Processing 2009
most IT vendors. The Web Service provides the fundamental middleware standards of Internet distributed computing: SOAP [6] as the communication standard between services, WSDL [7] as the description standard for defining a Web Service in XML, and UDDI [8] as the service discovery standard over the Internet. Moreover, it extends up to the SOA-based application middleware standard by considering security, transactions, events, and business process management. We develop an active embedded middleware framework, called the EUWS (Embedded Ubiquitous Web Service), in WinCE.NET, that can be applied to home gateways. The EUWS seamlessly integrates home network services and the Web Services on the Internet, and provides a management framework for ubiquitous Web services. Through the EUWS home gateway, device services are converted to Web Services for external access, so that a remote client can control a device via the Web Services. At the same time, the EUWS gateway converts external Web Services into device-specific services, so that internal devices can access and use external services according to the device-specific protocol. As the initial stage of our research, our current focus has been on designing and implementing a prototype of the EUWS in WinCE.NET for the UPnP middleware protocol. The architecture of the EUWS prototype system includes an extensible and reconfigurable Embedded Ubiquitous Web Service (EUWS) framework and a UPnP2WS processing module that seamlessly integrates the UPnP standard with the Web Service standard. The EUWS technology will be used as a core technology to build the Advanced Home Gateway. We have deployed the EUWS prototype into a home gateway in our demo system, where a digital audio and a digital TV can be controlled by an external remote controller.
While a traditional gateway focuses on the connection between internal and external segments at the network level, the EUWS focuses on the connection between internal and external services at the service level. The EUWS integrates the various services that exist in the home network environment, and integrates internal and external services. The next section presents the EUWS architecture. Section 3 presents our currently implemented demo prototype system. We conclude in the last section.
2 The EUWS Architecture In this section, we introduce the architecture of the active EUWS (Embedded Ubiquitous Web Service) framework. As shown in Figure 1, the active EUWS framework has two major parts: the Protocol Abstraction Sub-framework and the Service Orientation Sub-framework. The Protocol Abstraction Sub-framework seamlessly integrates various devices, each of which uses a different middleware protocol. The Service Orientation Sub-framework creates a virtual service-oriented space where everything in a ubiquitous environment, e.g., devices or services, is a standard service. 2.1 Protocol Abstraction Sub-framework In order to support various ubiquitous devices that use different communication protocols, the Protocol Abstraction Sub-framework contains an Active Protocol
Fig. 1. The EUWS Framework Architecture
Reconfigurable architecture that can dynamically deploy protocol processing modules on demand. The active protocol reconfigurable architecture is composed of two dedicated modules and one or more pluggable modules. The two dedicated modules are the Protocol Detection Module and the Dynamic Protocol Deployment Module. The Protocol Detection Module is used for detecting new devices whose protocol processing module is not yet plugged in. While the EUWS framework is running, the Protocol Detection Module periodically broadcasts a sequence of protocol-specific discovery messages, one at a time. When it receives a response from a new device whose protocol processing module is not yet plugged in, the related protocol processing module is downloaded and deployed automatically by the Protocol Deployment Module, if it is available; otherwise, an error message is sent to the newly detected device. The Protocol Detection Module itself can be upgraded whenever a new version is available. 2.2 Service Orientation Sub-framework The Service Orientation Sub-framework provides a service-orientation environment where every distinctive device or service is recognized and treated as the same type of standard service. This framework also provides common middleware services, such as resource management, event, and security services. In our project, we employ the Web service as our service-orientation standard, since the Web service has become the de facto standard in the business service domain, and also offers good extensibility and self-description. The Service Orientation Sub-framework has three components: the Service Manager, the Service Container, and the Middleware Services. The Service Manager is the manager of service orientation; it is in charge of transforming everything into a standard service, registering and searching registered services in the service directory, and providing metadata information about the registered services.
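The Protocol Detection Module's cycle described in Section 2.1 can be sketched as follows. The Network and Deployer interfaces are illustrative stand-ins for the discovery broadcast and the Dynamic Protocol Deployment Module; they are assumptions, not EUWS APIs:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the detection cycle: broadcast each known protocol's
// discovery message in turn; when a device answers with a protocol
// whose processing module is not yet plugged in, ask the deployment
// module to fetch and deploy it, or report an error if unavailable.
public class ProtocolDetector {

    interface Network { boolean broadcastDiscovery(String protocol); } // true if some device answered
    interface Deployer { boolean deploy(String protocol); }            // false if no module available

    private final Map<String, Boolean> pluggedIn = new ConcurrentHashMap<>();
    private final Network net;
    private final Deployer deployer;

    ProtocolDetector(Network net, Deployer deployer) {
        this.net = net;
        this.deployer = deployer;
    }

    /** One detection round over the protocols this node knows how to probe. */
    void detectOnce(List<String> knownProtocols) {
        for (String p : knownProtocols) {
            if (pluggedIn.getOrDefault(p, false)) continue;  // module already plugged in
            if (net.broadcastDiscovery(p)) {
                if (deployer.deploy(p)) {
                    pluggedIn.put(p, true);
                } else {
                    System.err.println("no processing module available for " + p);
                }
            }
        }
    }

    boolean isPluggedIn(String protocol) {
        return pluggedIn.getOrDefault(protocol, false);
    }
}
```

A periodic scheduler would call `detectOnce` repeatedly; the framework described above broadcasts the discovery messages one protocol at a time for exactly this reason.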
Let us suppose that a UPnP device newly arrives in the managed space. Then the UPnP Processing Module detects the UPnP device and registers it with the Service Manager. The Service Manager creates a service proxy that acts as the corresponding service object for the device and uploads
it to the Service Container. At the same time, the WSDL of the service proxy is automatically generated. After the service proxy is deployed with its WSDL into the Service Container, the Service Manager registers the device as a Web service in the UDDI. Other Web services, located outside of the framework, consider the UPnP device a registered Web service. Within our framework, this device is also treated as a standard service; that is, the Protocol Execution Module or the Service Manager accesses a registered device through the corresponding service proxy. However, external devices that use their own protocols access and use the registered device through those protocols via the related Protocol Execution Modules. The other functions of the Service Orientation Sub-framework are the Middleware Services related to the Service Container. The Service Container is in charge of system resource management and, if necessary, dynamic service deployment into memory. The Service Container also performs various management functions by detecting service invocations and generating events.
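The registration flow just described can be sketched in a few lines. The in-memory maps below stand in for the Service Container and the UDDI directory, and the one-line WSDL string stands in for the auto-generated description; all of these are illustrative assumptions:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the Service Manager's registration flow: create a service
// proxy for a newly detected device, generate its WSDL, deploy the
// proxy into the Service Container, and advertise it in the UDDI.
public class ServiceManager {

    static class ServiceProxy {
        final String deviceId;
        final String wsdl;
        ServiceProxy(String deviceId, String wsdl) { this.deviceId = deviceId; this.wsdl = wsdl; }
    }

    private final Map<String, ServiceProxy> container = new HashMap<>(); // Service Container stand-in
    private final Map<String, String> uddi = new HashMap<>();           // UDDI directory stand-in

    /** Called by a protocol processing module when it detects a device. */
    ServiceProxy register(String deviceId, String deviceType) {
        String wsdl = "<definitions name=\"" + deviceType + "\"/>"; // auto-generated in the real system
        ServiceProxy proxy = new ServiceProxy(deviceId, wsdl);
        container.put(deviceId, proxy);   // deploy the proxy into the Service Container
        uddi.put(deviceType, deviceId);   // register the device as a Web service
        return proxy;
    }

    /** External clients resolve the device like any other Web service. */
    ServiceProxy lookup(String deviceType) {
        return container.get(uddi.get(deviceType));
    }
}
```

After `register("tv-1", "DigitalTV")`, an external client resolving "DigitalTV" through the directory reaches the same proxy object that internal modules use, which is the point of treating every device as a standard service.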
3 EUWS Prototype Implementation In this section, we introduce the implemented EUWS prototype. The EUWS prototype is a system operating on the .NET framework and contains the initial version of the EUWS Framework with the UPnP execution module. In order to deploy the EUWS into a device, the Win CE .NET Platform Builder is used to upload a kernel image onto the board. 3.1 Development Environment The EUWS is implemented on an embedded board on top of Win CE .NET containing the .NET Compact Framework. Two embedded boards are used: one for a device and the other for the home gateway. The prototype device and home gateway are implemented on the similar boards X-Hyper255B and X-Hyper255BTKUIII, respectively. These boards have a 400 MHz Intel XScale PXA255 CPU, 64 MB SDRAM, 32 MB flash memory, a 10Base-T CS8900A and a PCMCIA slot. Detailed information is given in Table 1. As the external control client, an HP iPAQ PDA is used. As a design tool, IBM Rational Rose XDE is used, and as a development tool, MS Visual Studio .NET. Table 1. Device Platform & Home Gateway Platform
3.2 Devices of Demo System As our demo systems, a digital audio and a digital TV are developed as UPnP devices (see Figure 2). These devices contain UPnP device modules deployed on top of the .NET
Compact Framework. When they initially start, their device information is transferred to the home gateway. They also have UPnP control point modules, so that they can access Web services provided by the home gateway as UPnP services. For example, the digital TV can receive a channel broadcasting information service from the home gateway, and the digital audio can receive new media information.
Fig. 2. UPnP Devices (Digital TV & Digital Audio)
The home gateway contains the active EUWS framework explained in Section 2. It discovers internal devices and converts the services of the internal devices into Web services; it also converts external Web services into UPnP services. At the same time, the home gateway reads events from UPnP devices and transfers them to external clients. In particular, when an external Web service is accessed by a UPnP device, the external Web service is perceived as a UPnP service by the EUWS framework of the home gateway; that is, a UPnP device including a UPnP control point can access external Web Services as easily as UPnP services via the home gateway. In order to control the digital audio and TV devices within the home area, a PDA is used as an external device that contains the home control application. When the PDA boots, the control application automatically starts and receives service information about home devices from the designated home gateway. The device information received by the home control application is written in terms of the Web services of internal UPnP devices exported by the home gateway. Thus, when a user selects and invokes a service shown on the screen of the PDA, it invokes a Web service provided by the home gateway, which consequently invokes a UPnP service provided by an internal device. Figure 3 shows the GUI of the PDA.
Fig. 3. External Devices
4 Conclusion In order to establish a ubiquitous home network, we developed an active embedded middleware framework, the EUWS, in the WinCE.NET environment. The EUWS seamlessly integrates home network services and the Web Services on the Internet, and also provides a management framework for ubiquitous Web services. In this paper, we presented a prototype of the EUWS in WinCE.NET for the UPnP middleware protocol, so that we can seamlessly integrate the UPnP standard with the Web Service standard and work with UPnP devices and a home gateway for the home network. Through our demo system, we could control a digital audio and a digital TV with an external remote controller. As an active embedded middleware framework, the EUWS succeeded in integrating various existing services, both internal and external.
References [1] Miller, B.A., Nixon, T., Tai, C., Wood, M.D.: Home networking with Universal Plug and Play. Communications Magazine 39(12), 104–109 (2001) [2] HAVi, http://www.havi.org [3] Allard, J., Chinta, V., Gundala, S., Richard III, G.G.: Jini meets UPnP: an architecture for Jini/UPnP interoperability. In: Proceedings, 2003 Symposium on Applications and the Internet, pp. 268–275, January 27-31 (2003) [4] Dobrev, P., Famolari, D., Kurzke, C., Miller, B.A.: Device and service discovery in home networks with OSGi. Communications Magazine 40(8), 86–92 (2002) [5] Hung, P.C.K., Ferrari, E., Carminati, B.: Towards standardized Web services privacy technologies. In: Proceedings IEEE International Conference on Web Services, p. 174, July 6-9 (2004) [6] Box, D.: SOAP 1.1., http://www.w3.org/TR/SOAP/ [7] WSDL Version 2.0 Part 3: Bindings, http://www.w3.org/TR/wsdl20bindings/ [8] UDDI Technical WhitePaper, UDDI.org, http://www.uddi.org/whitepapers.html
Framework of an Application-Aware Adaptation Scheme for Disconnected Operations Umar Kalim, Hassan Jameel, Ali Sajjad, Sang Man Han, Sungyoung Lee, and Young-Koo Lee Department of Computer Engineering, Kyung Hee University Sochen-ri, Giheung-eup, Yongin-si, Gyeonggi-do, 449-701, South Korea {umar,hassan,ali,i30000,sylee}@oslab.khu.ac.kr, [email protected]
Abstract. The complex software development scenarios for mobile/hand-held devices operating in wireless environments require adaptation to variations in the environment (such as fluctuating bandwidth). This translates to maintenance of service availability in preferably all circumstances. In this paper we propose that a mobile computing system (for hand-held, wireless devices) must be based on the combination of reflection, remote evaluation and code mobility mechanisms, such that the communication framework1 allows developers to design disconnection-aware applications which maintain service availability under varying circumstances by automatically redeploying essential components to appropriate locations. This allows the application to continue executing not only in varying conditions, but also in entirely disconnected modes.
1 Introduction
The complexity, size and distribution of software today are rapidly increasing. This has complemented the ubiquity of hand-held devices and promoted the growth of distributed systems and applications. One can thus think of numerous complex software development scenarios which utilize a large number of hand-held devices, such as environment monitoring and surveying, postal services, patient monitoring, etc. Such scenarios present intricate technical challenges for middleware technologies [1]. In particular, the middleware must adapt to variations in the environment (e.g., fluctuating bandwidth) of the mobile device, and service availability must be maintained in (preferably) all circumstances [1]. Conventional middleware, being heavy and relatively inflexible, fails to properly address such requirements. The fundamental reason is that traditional middleware systems have been designed adhering to the principle of transparency. Although this design [2] [3] has proved successful for traditional distributed systems, the concept has limitations in a mobile environment, where it is neither possible nor preferred to hide the implementation details from the user. Also, applications may possess information
This research work has been partially supported by KOSEF, Korea, for which Professor Sungyoung Lee is the corresponding author.
D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 218–223, 2009. © IFIP International Federation for Information Processing 2009
that could facilitate the middleware in performing efficiently. Thus, to cope with such limitations, numerous research efforts have been made [4] [5] to design middleware systems suitable for such environments. However, the solutions developed to date do not fully support the level of middleware configurability and reconfigurability that is required to facilitate mobile computing and disconnected operations. Thus, in our opinion, a simple but dynamic solution is the need of the hour. We propose that a mobile computing system for hand-held devices must be based on the combination of reflection [6], remote evaluation [7] and code mobility [8]. The remote evaluation paradigm not only enables clients to out-source resource-intensive tasks, but also allows more client-side applications (because of their smaller footprint). Reflection is a fundamental technique that supports both introspection and adaptation. In order to maintain service availability in a distributed system under varying circumstances, the middleware can utilize code mobility and reflection to automatically redeploy essential components to appropriate locations defined by the application policy. Moreover, such systems can be implemented optimally if the adaptation scheme is application-aware, i.e., the framework allows the developer to determine the policies, such as the constraints the components may have, dependencies among components, restrictions on the collocation of components, and how the system should react to different circumstances.
1.1 Code Mobility and Autonomy of Components
Under favourable circumstances, remote evaluation is one of the most appropriate solutions for mobile, hand-held devices. However, in unfavourable conditions, code mobility overcomes the limits of fluctuating bandwidth and disconnections, as it allows complex computations to move across a network from the server to the client end. This way, services that need to be executed at a platform residing in a portion of the network reachable only through an unreliable link can be relocated, and Quality of Service maintained. In particular, the relocated code would not need any connection with the node that sent it, except for the transmission of the final results of its computation. Also, autonomy of application components brings improved fault tolerance as a side-effect. In conventional client-server systems, the state of the computation is distributed between the client and the server. A client program is made of statements that are executed in the local environment, interleaved with statements that invoke remote services on the server. The server contains (copies of) data/code that belongs to the environment of the client program, and will eventually return a result that has to be inserted into the same environment. This structure leads to well-known problems in the presence of partial failures, because it is very difficult to determine where and how to intervene to reconstruct a consistent state. The action of migrating code, and possibly sending back the results, is not immune from this problem. In order to determine whether the code has been received, and to avoid duplicated or lost mobile code, an appropriate protocol must be in place. However, the action of executing code that embodies a set of
interactions that should otherwise take place across the network is actually immune from partial failure and can permit execution even in the face of complete disconnection. An autonomous component encapsulates all the state involved in a distributed computation, and can easily be traced, check-pointed, and possibly recovered locally, without any need for knowledge of the global state. Thus, to introduce the capability of dynamic reconfiguration and achieve the above-mentioned objectives, the system must possess certain characteristics: it should be based on a distributed object framework, it must be able to redeploy/replace2 components3, it should be able to recover gracefully in the case of faults, and there should be a procedure to reconcile components upon reconnection.
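The checkpoint-and-resume mechanism the argument relies on can be sketched with Java object serialization. The Counter class is a toy stand-in for an autonomous component; the method names are illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch: a component that encapsulates its whole computation state can
// be serialized, shipped to the client, and resumed there without any
// further contact with the server.
public class ComponentMigration {

    static class Counter implements Serializable {
        private static final long serialVersionUID = 1L;
        private int value;
        void increment() { value++; }
        int value() { return value; }
    }

    /** Marshal the component state, as would happen before relocation. */
    static byte[] checkpoint(Counter c) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(c);
        }
        return bos.toByteArray();
    }

    /** Recover the component at the destination and continue locally. */
    static Counter resume(byte[] state) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(state))) {
            return (Counter) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Counter server = new Counter();
        server.increment();                       // state built up on the server side
        Counter client = resume(checkpoint(server));
        client.increment();                       // keeps running while disconnected
        System.out.println(client.value());       // 2
    }
}
```

Because all state travels inside the serialized component, the resumed copy can be check-pointed again locally at any time, which is exactly the fault-tolerance property argued for above.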
2 Framework
In order to narrow down the scope of the problem, we distinguish between voluntary and involuntary disconnection. The former refers to a user-initiated event that enables the system to prepare for disconnection, and the latter to an unplanned disconnection (e.g., due to network failure). The difference between the two cases is that for involuntary disconnection the system needs to be able to detect disconnection and reconnection, and it needs to be pessimistically prepared for disconnection at any moment, hence requiring it to proactively reserve and obtain redundant resources (if any) at the client. Here we focus only on voluntary disconnections and defer the task of predicting and dealing with involuntary disconnections to future work. Note that the steps for the remedy will be the same whether the disconnection is voluntary or involuntary.
2.1 Characteristics of the Components and Their Classification
As the system uses components as its building blocks, we propose that the primary components participating in the reconfiguration must be serializable and must implement the DisconnectionManagement interface shown in figure 1. This interface advertises two primary methods, disconnect and reconnect, which are invoked by the framework on disconnection and reconnection. The first requirement (serializability) allows the components to relocate themselves while maintaining their state. The second requirement enables the application to prepare for disconnection. The role of disconnect is to compile the component state and transfer it, in marshaled form, over the network along with the code to be executed locally at the client. Similarly, reconnect is used to perform the process of reconciliation among components upon reconnection, details of which are explained in [9]. The components are classified with respect

² The use of a component with reduced (or similar) functionality but the same interface as the substitute for a (remote) component with reference functionality.
³ Analogous to an object, here a component refers to a self-contained entity that comprises both data and procedures. Data access has not been considered separately because data is always encapsulated in a component.
Framework of an Application-Aware Adaptation Scheme

[Figure 1 shows the interfaces: Remote Object, Disconnection Management, Application Logic (Server Implementation), and Application Logic]
Fig. 1. Hierarchy of interfaces for disconnection-aware components
[Figure 2 shows the states Connected, Disconnecting, Disconnected, and Reconnecting, with transitions for traversing the reference graph, identifying references for relocation, transferring state (as per policy), swapping local and remote references, downloading reference implementations (.class files), and creating local instances]
Fig. 2. State-transition diagram for disconnection/reconnection management
to disconnection (Log, Substitute and Replica) and reconnection (Latest, Revoke, Prime and Merge), details of which are specified in [9].
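To make the contract of section 2.1 concrete, here is a minimal sketch of the disconnect/reconnect life-cycle. The interface and method names (DisconnectionManagement, disconnect, reconnect) come from the paper; the PatientRecord component, its fields, and the Merge-style reconciliation policy are hypothetical, and Python's pickle stands in for Java serialization:

```python
import pickle
from abc import ABC, abstractmethod

class DisconnectionManagement(ABC):
    """Interface every relocatable component implements (cf. Fig. 1)."""

    @abstractmethod
    def disconnect(self):
        """Marshal component state for transfer to the client."""

    @abstractmethod
    def reconnect(self, client_state):
        """Reconcile client-side state with the server copy."""

class PatientRecord(DisconnectionManagement):
    # Hypothetical component from the patient-monitoring prototype.
    def __init__(self, readings):
        self.readings = readings

    def disconnect(self):
        # pickle stands in for Java's Serializable marshalling.
        return pickle.dumps(self.readings)

    def reconnect(self, client_state):
        # Naive 'Merge'-style reconciliation: union of both copies.
        self.readings = sorted(set(self.readings) | set(pickle.loads(client_state)))

rec = PatientRecord([1, 2])
blob = rec.disconnect()             # state marshaled for the client
rec.reconnect(pickle.dumps([2, 3])) # client returns with new readings
print(rec.readings)                 # [1, 2, 3]
```

In the actual framework these calls are made by the event-notification machinery on disconnection and reconnection, not by the application itself.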
3 Disconnection Management
When it comes to maintaining service availability in the face of a disconnection, the required server code must be relocated (partially or completely) to the client in order to make local processing possible.

3.1 Working
The state-transition diagram in figure 2 summarizes the working [9] of the framework. The mechanisms of reflection, dynamic class loading and linking, and serialization (provided by Java [10]) are employed to achieve code mobility. Once a disconnection event is fired, the framework propagates the event to all disconnection-aware references, which then invoke the disconnect method. This method prepares the reference for the disconnection. Using introspection, each component and each of its contained objects is traversed recursively, and a list of references to be relocated is prepared. This list is prioritized with respect to the policy determined by the application designer, and each reference is treated as per its classification [9]. The framework maintains sufficient state for each reference in order to restore the system to its state before disconnection.
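The recursive introspection step can be sketched as follows. This is an illustrative Python analogue of the Java Reflection traversal; the Monitor/Sensor classes and the name-based priority policy are made up for the example:

```python
def collect_references(component, seen=None):
    """Recursively walk a component and its contained objects via
    introspection, returning the candidate references for relocation."""
    if seen is None:
        seen = set()
    # Plain values (strings, numbers) have no attribute dictionary.
    if id(component) in seen or not hasattr(component, "__dict__"):
        return []
    seen.add(id(component))
    refs = [component]
    for attr in vars(component).values():
        refs.extend(collect_references(attr, seen))
    return refs

class Sensor: pass

class Monitor:
    def __init__(self):
        self.sensor = Sensor()
        self.label = "icu-3"     # plain data: not traversed further

m = Monitor()
refs = collect_references(m)
# Prioritize per an application-defined policy (hypothetical: class name).
refs.sort(key=lambda r: type(r).__name__)
print([type(r).__name__ for r in refs])   # ['Monitor', 'Sensor']
```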
4 Related Work
A substantial debt is owed to Coda [4]. Its authors were among the first to demonstrate that client resources could be effectively used to insulate users and applications from the hurdles of mobile information access. Coda treats disconnection as a special case of network partitioning, where the client may continue to use the data present in its cache even if it is disconnected. Odyssey [5], inspired by Coda [4], proposed the concept of application-aware adaptation. The essence of this approach is to have a collaborative partnership between the application and the system, with a clear separation of concerns. FarGo-DA [11], an extension of FarGo, a mobile component framework for distributed applications, proposes a programming model with support for designing the behaviour of applications under frequent disconnection conditions. The programming model enables designers to augment their applications with disconnection-aware semantics that are tightly coupled with the architecture, and are automatically carried out upon disconnection.
5 Prototype Implementation
The framework is developed using J2SE [10] as the fundamental platform for both the client and server components. Java RMI [12] is used for remote evaluation, whereas the Reflection classes are used for introspection and reference management when objects are relocated (from the server to the client or vice versa) and references are swapped. Components are notified about disconnection or reconnection via an event-notification mechanism. We have implemented a prototype application for a patient monitoring and diagnosis service, along with the framework libraries, in order to verify the feasibility of our proposal. This implementation [9] is part of our ongoing research [13]. The module layout of the framework along with the application is shown in figure 3. Note that the framework comprises two sub-systems, one operating at the server end and the other at the client end. Unlike [14], our approach, being simple and discreet, avoids the computational overhead required to determine the component distribution in different circumstances. This is primarily due to the application-aware approach, which allows the developer to determine the application policies.

[Figure 3 modules: Core Application; Java (J2SE) API; Java RMI; Reference Manager; Component Relocator; Resource Monitor; Event Listener; Java Virtual Machine (JVM)]
Fig. 3. Module layout of the prototype implementation
6 Conclusion
In this paper we proposed a mobile computing middleware framework for hand-held devices based on the combination of reflection [6], remote evaluation [7] and code mobility [8]. We have implemented a prototype application [9] along with the framework libraries in order to demonstrate the feasibility of the approach. The results show that significant benefits may be obtained by maintaining service availability even in the face of a disconnection.
References
1. Satyanarayanan, M.: Pervasive computing: Vision and challenges. IEEE Personal Communications, 10–17 (2001)
2. Reference-model: ISO 10746-1 - Open Distributed Processing. International Standardization Organization (1998)
3. Emmerich, W.: Engineering Distributed Objects. John Wiley and Sons, Chichester (2000)
4. Kistler, J., Satyanarayanan, M.: Disconnected operation in the Coda file system. In: 13th ACM Symposium on Operating Systems Principles, pp. 213–225. ACM, New York (1991)
5. Noble, B., Satyanarayanan, M.: Agile application-aware adaptation for mobility. In: 16th ACM Symposium on Operating Systems Principles. ACM, New York (1997)
6. Maes, P.: Concepts and experiments in computational reflection. In: 2nd Conference on Object Oriented Programming Systems, Languages and Applications (1987)
7. Stamos, J., Gifford, D.: Remote evaluation. In: Transactions on Programming Languages and Systems, pp. 537–564. ACM, New York (1990)
8. Fuggetta, A.: Understanding code mobility. In: Transactions on Software Engineering, vol. 24, pp. 342–361. IEEE, Los Alamitos (1998)
9. Kalim, U.: Technical report: Framework of an application-aware adaptation scheme for disconnected operations, http://oslab.khu.ac.kr/xims/mgrid/techreport-disconn-umar.pdf
10. Sun-Microsystems: Java, http://java.sun.com/j2se/
11. Weinsberg, Y., Israel, H.: A programming model and system support for disconnected-aware applications on resource-constrained devices. In: 24th International Conference on Software Engineering, pp. 374–384 (2002)
12. Sun-Microsystems: Java RMI, http://java.sun.com/products/jdk/rmi/
13. Kalim, U., Jameel, H.: Mobile-to-grid middleware: An approach for breaching the divide between mobile and grid environments. In: Lorenz, P., Dini, P. (eds.) ICN 2005. LNCS, vol. 3420, pp. 1–8. Springer, Heidelberg (2005)
14. Mikic-Rakic, M.: Improving availability in large, distributed, component-based systems via redeployment.
Technical Report USC-CSE-2003-515, Center for Software Engineering, University of Southern California (2003)
Kinetic Multipoint Relaying: Improvements Using Mobility Predictions

Jérôme Härri, Fethi Filali, and Christian Bonnet
Institut Eurécom, Department of Mobile Communication
B.P. 193, 06904 Sophia-Antipolis, France
{Jerome.Haerri,Fethi.Filali,Christian.Bonnet}@eurecom.fr
Abstract. Multipoint Relaying (MPR) is a technique to reduce the number of redundant retransmissions when diffusing a broadcast message in the network, where only a subset of nodes is allowed to forward packets. The selection is based on nodes' instantaneous degrees, and is periodically refreshed. We propose in this chapter a novel heuristic to select kinetic multipoint relays based on nodes' overall predicted degree, which is updated solely on a per-event basis. We illustrate that this approach significantly reduces the number of messages needed to operate the protocol, yet with broadcast properties similar to those of the regular MPR, such as network coverage, number of multipoint relays, and flooding capacity.
1 Introduction

Multipoint relaying (MPR, [1]) provides a localized way of reducing flooding in a mobile ad hoc network. Using 2-hop neighborhood information, each node determines a small set of forward neighbors for message relaying, which avoids multiple retransmissions and blind flooding. MPR has been designed to be part of the Optimized Link State Routing algorithm (OLSR, [2]), specifically to reduce the flooding of TC messages sent by OLSR to create optimal routes. Yet, the election criterion is solely based on nodes' instantaneous degrees. The network's global state is then kept coherent through periodic exchanges of messages. Some studies have shown the impact of periodic beacons on the probability of transmission in 802.11, and on battery life [4,3]. This indicates that these approaches have major drawbacks in terms of reliability, scalability and energy consumption. The next step in their evolution should therefore be designed to improve channel occupation and energy consumption. In this chapter, we propose to improve the MPR protocol by using mobility predictions. We introduce the Kinetic Multipoint Relaying (KMPR) protocol, whose heuristic selects kinetic relays based on nodes' actual and predicted future nodal degrees. Based
An extended version of this chapter is available as a technical report under the reference RR 05 148 at http://www.eurecom.fr/people/haerri.en.htm Institut Eurécom's research is partially supported by its industrial members: Bouygues Télécom, France Télécom, Hitachi Europe, SFR, Sharp, ST Microelectronics, Swisscom, Texas Instruments, Thales. D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 224–229, 2009. © IFIP International Federation for Information Processing 2009
on this, periodic topology maintenance may be limited to the instants when a change in the neighborhood actually occurs. Our objective is to show that this approach is able to significantly reduce the number of messages needed to maintain the backbone's consistency, thus saving network resources, yet with flooding properties similar to those of the regular MPR. The rest of the chapter is organized as follows. Section 2 describes the heuristic to compute kinetic degrees. Then, in Section 3, we describe the KMPR protocol. Finally, Section 4 provides simulation results, while Section 5 draws some concluding remarks.
2 Kinetic Nodal Degree in MANETs

We explain in this section the method for modeling kinetic degrees in MANETs. We model nodes' positions as piece-wise linear trajectories and, as shown in [5], the corresponding trajectory durations are long enough to make kinetic degrees worth their cost. Over a relatively short period of time¹, one can assume that each node, say i, follows a linear trajectory. Its position as a function of time is then described by

\[
\mathrm{Pos}_i(t) = \begin{pmatrix} x_i + dx_i \cdot t \\ y_i + dy_i \cdot t \end{pmatrix},
\]

where Pos_i(t) represents the position of node i at time t, the vector [x_i, y_i]^T denotes the initial position of node i, and the vector [dx_i, dy_i]^T its initial instantaneous velocity. Let us consider node j as a neighbor of i. The squared distance between nodes i and j is defined as

\[
D_{ij}^2(t) = D_{ji}^2(t) = \left\| \mathrm{Pos}_j(t) - \mathrm{Pos}_i(t) \right\|_2^2
= \left\| \begin{pmatrix} x_j - x_i \\ y_j - y_i \end{pmatrix} + \begin{pmatrix} dx_j - dx_i \\ dy_j - dy_i \end{pmatrix} \cdot t \right\|_2^2
= a_{ij}\,t^2 + b_{ij}\,t + c_{ij}.
\]

Considering r as the nodes' maximum transmission range, nodes i and j are neighbors as long as D_{ij}^2(t) \le r^2. Therefore, solving

\[
D_{ij}^2(t) - r^2 = 0
\]

gives t_{ij}^{from} and t_{ij}^{to}, the bounds of the time interval during which nodes i and j remain neighbors. Consequently, we can model a node's kinetic degree as two successive sigmoid functions, where the first jumps to one when a node enters another node's neighborhood, and the second drops to zero when that node effectively leaves that neighborhood. Considering nbrs_i as the total number of neighbors detected in node i's neighborhood at time t, we define

\[
\mathrm{Deg}_i(t) = \sum_{k=0}^{\mathrm{nbrs}_i} \frac{1}{1 + \exp(-a \cdot (t - t_k^{from}))} \cdot \frac{1}{1 + \exp(a \cdot (t - t_k^{to}))} \tag{1}
\]
The time required to transmit a data packet is orders of magnitude shorter than the time the node is moving along a fixed trajectory.
Fig. 1. Illustration of nodes' kinetic degrees: (a) node i's kinetic neighborhood; (b) node i's kinetic nodal degree
as node i's kinetic degree function, where t_k^{from} and t_k^{to} represent respectively the times at which a node k enters and leaves i's neighborhood. Thanks to (1), each node is able to predict its actual and future degree, and thus to proactively adapt its coverage capacity. Fig. 1(a) illustrates the situation for three nodes. Node k enters i's neighborhood at time t = 4s and leaves it at time t = 16s. Meanwhile, node j leaves i's neighborhood at time t = 20s. Fig. 1(b) then illustrates the evolution of the kinetic degree function over t. Finally, the kinetic degree is obtained by integrating (1):

\[
\widetilde{\mathrm{Deg}}_i(t) = \int_t^{\infty} \sum_{k=0}^{\mathrm{nbrs}_i} \frac{1}{1 + \exp(-a \cdot (t - t_k^{from}))} \cdot \frac{1}{1 + \exp(a \cdot (t - t_k^{to}))} \, dt \tag{2}
\]
For example, in Fig. 1(b), node i's kinetic degree is ≈ 32.
3 Kinetic Multipoint Relays

In this section, we describe our Kinetic Multipoint Relaying protocol. It is mainly derived from the regular MPR protocol, adapted to deal with kinetic degrees. To select the kinetic multipoint relays for node i, let us call the set of 1-hop neighbors of node i N(i), and the set of its 2-hop neighbors N²(i). We first start by giving some definitions.

Definition 1 (Covering Interval). The covering interval is a time interval during which a node in N²(i) is covered by a node in N(i). Each node in N²(i) has a covering interval per node i, which is initially equal to the connection interval between its covering node in N(i) and node i. Then, each time a node in N²(i) is covered by a node in N(i) during some time interval, this covering interval is reduced accordingly. When the covering interval is reduced to ∅, we say that the node is fully covered.

Definition 2 (Logical Kinetic Degree). The logical kinetic degree is the nodal degree obtained with (2), but considering covering intervals instead of connection intervals. In that case, t_k^{from} and t_k^{to} represent the time interval during which a node k ∈ N²(i) starts and stops being covered by some node in N(i).
Kinetic Multipoint Relaying: Improvements Using Mobility Predictions
227
The basic difference between MPR and KMPR is that, unlike MPR, KMPR does not work on time instants but on time intervals. Therefore, a node is not periodically elected, but is instead designated KMPR for a time interval. During this interval, we say that the KMPR node is active, and the time interval is called its activation. The KMPR protocol elects as KMPR the node in N(i) with the largest logical kinetic degree. The activation of this KMPR node is the largest covering interval of its nodes in N²(i).

Kinetic Multipoint Relaying (KMPR). The KMPR protocol applied to an initiator node i is defined as follows:
– Begin with an empty KMPR set.
– First Step: Compute the logical kinetic degree of each node in N(i).
– Second Step: Add to the KMPR set the node in N(i) that has the maximum logical kinetic degree. Compute the activation of the KMPR node as the maximum covering interval this node can provide. Update all other covering intervals of nodes in N²(i) considering the activation of the elected KMPR, then recompute all logical kinetic degrees. Repeat this step until all nodes in N²(i) are fully covered.

Each node having elected a node KMPR for some activation is then a KMPR selector during the same activation. Finally, KMPR flooding is defined as follows:

Definition 3 (KMPR flooding). A node retransmits a packet only once, after having received the packet for the first time from an active KMPR selector.
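The greedy structure of the two steps above can be sketched as follows. For brevity, this sketch reduces the covering intervals of Definition 1 to scalar covering durations, so the logical kinetic degree becomes the total covering time a 1-hop neighbor still provides; the full protocol would operate on interval sets and also compute activations:

```python
def select_kmpr(cover):
    """Greedy KMPR election (simplified sketch).

    `cover` maps each 1-hop neighbor to {two_hop_node: covering_seconds},
    a scalar stand-in for Definition 1's covering intervals. Repeatedly
    pick the neighbor with the largest logical kinetic degree (total
    covering time over still-uncovered 2-hop nodes) until every 2-hop
    node is covered.
    """
    uncovered = {n for intervals in cover.values() for n in intervals}
    kmpr = []
    while uncovered:
        best = max(cover, key=lambda nb: sum(t for n, t in cover[nb].items()
                                             if n in uncovered))
        gain = {n for n in cover[best] if n in uncovered}
        if not gain:
            break                # remaining 2-hop nodes are unreachable
        kmpr.append(best)
        uncovered -= gain
    return kmpr

# 1-hop neighbors A, B, C covering 2-hop nodes x, y, z for given durations
cover = {"A": {"x": 12, "y": 3}, "B": {"y": 8}, "C": {"z": 20}}
print(select_kmpr(cover))   # ['C', 'A']
```

C is elected first (degree 20), then A covers the remaining x and y, so B is never needed — the same pruning effect the interval-based protocol achieves over time.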
4 Simulation Results

We implemented the KMPR protocol under ns-2 and used the NRL MPR [7] implementation for comparison with KMPR. We measured several significant metrics for MANETs: the effectiveness of flooding reduction, the delay before the network receives a broadcast packet, the number of duplicate packets, and the routing overhead. The following metrics were obtained with a population of 20 nodes uniformly distributed on a 1500 × 300 grid. Each node has a transmission range of 250 m. The mobility model we used is the standard Random Mobility Model, where we made the nodes' average velocity vary from 5 m/s to 30 m/s. Finally, we simulated the system for 100 s. Figure 2 illustrates the flooding reduction of MPR and KMPR. Although MPR performs slightly better than KMPR, both protocols are close together and achieve a fairly good flooding reduction, both in terms of duplicate and forwarded packets. Note that the low fraction of relays in Fig. 2(b) comes from the rectangular topology, where only a couple of MPRs are used as a bridge in the center of the rectangle. In Fig. 3, we depict the broadcast efficiency of MPR and KMPR. In our simulations, we measured the broadcast efficiency as the time a packet takes before being correctly delivered to the entire network. As we can see, KMPR's delivery time is 50% faster than MPR's. This may come from two properties of KMPR. Firstly, as described in [6], MPR suffers from message decoding issues, which we corrected in KMPR. Secondly, as we will see in the next figure, KMPR's backbone maintenance is significantly lighter than MPR's. Therefore, the channel access is faster and the probability of collisions is decreased.
Fig. 2. Illustration of the flooding reduction of MPR and KMPR: (a) duplicate reception; (b) forwarding nodes (duplicate and forwarded packet ratios vs. average speed)
Fig. 3. Illustration of the broadcast efficiency of MPR and KMPR (delivery delay [s] vs. average speed)
Fig. 4. Illustration of the network load for MPR and KMPR: (a) routing overhead [bytes]; (b) number of Hello packets (both vs. average speed)
In the two previous figures, we have shown that KMPR has properties similar to MPR in terms of flooding reduction and delay. Now, in Fig. 4, we illustrate the principal benefit of KMPR: its low routing overhead. Indeed, by using mobility predictions, the routing overhead is reduced by 75%, as may be seen in Fig. 4(a). We also show in Fig. 4(b) that the number of hello messages drops dramatically with KMPR, while still preserving the network's consistency.
5 Conclusions

In this chapter, we presented an original approach for improving the well-known MPR protocol by using mobility predictions. We showed that the Kinetic Multipoint Relaying (KMPR) protocol is able to meet the flooding properties of MPR, while reducing the MPR channel access overhead by 75% and the MPR broadcast delay by 50%. We consequently illustrated that, after having been studied in other fields of mobile ad hoc networking, mobility predictions are also an interesting technique for improving broadcasting protocols.
References
1. Laouiti, A., et al.: Multipoint Relaying: An Efficient Technique for Flooding in Mobile Wireless Networks. In: 35th Annual Hawaii International Conference on System Sciences (HICSS 2001), Hawaii, USA (2001)
2. Clausen, T., Jacquet, P.: Optimized Link State Routing Protocol (OLSR), Project Hipercom, INRIA, France (October 2003), www.ietf.org/rfc/rfc3626.txt
3. Bianchi, G.: Performance analysis of the IEEE 802.11 distributed coordination function. IEEE Journal on Selected Areas in Communications 18(3), 535–547 (2000)
4. Toh, C.-K., Cobb, H., Scott, D.A.: Performance evaluation of battery-life-aware routing schemes for wireless ad hoc networks. In: 2001 IEEE International Conference on Communications (ICC 2001), June 2001, vol. 9, pp. 2824–2829 (2001)
5. Haerri, J., Bonnet, C.: A Lower Bound for Vehicles Trajectory Duration. In: IEEE VTC Fall 2005, Dallas, USA (September 2005)
6. Haerri, J., Bonnet, C., Filali, F.: OLSR and MPR: Mutual Dependences and Performances. In: Proc. of the 2005 IFIP Med-Hoc-Net Conference, Porquerolles, France (June 2005)
7. NRLOLSR, http://pf.itd.nrl.navy.mil/projects.php?name=olsr
The Three-Level Approaches for Differentiated Service in Clustering Web Server

Myung-Sub Lee and Chang-Hyeon Park
School of Computer Science and Electrical Engineering, Yeungnam University
Kyungsan, Kyungbuk 712-749, Republic of Korea
{skydream,park}@yu.ac.kr
Abstract. This paper presents three-level approaches for differentiated Web QoS. A kernel-level approach adds a realtime scheduler to the operating system kernel to keep the priority of the user requests determined by the scheduler in the Web server. An application-level approach, which uses IP-level masquerading and tunneling technology, improves the reliability and response speed of the Web services. A dynamic load-balancing approach uses the parameters related to the MIB-II of SNMP and the parameters related to the load of system resources, such as memory and CPU, to perform load balancing dynamically. These approaches are implemented using a Linux kernel 2.4.7 and tested in three different situations. The results of the tests show that the approaches support differentiated services in a clustering Web server environment. Keywords: Differentiated QoS, Dynamic load balancing, SNMP, MIB-II, Realtime scheduler.
1 Introduction
Recently, the technologies related to Web QoS (Quality of Service), which guarantees the quality of Web services, have become more important [1,2,3]. In particular, for differentiated quality of Web services, Web servers are required to be able to classify contents depending on the importance of the information and the priority of the customer, and to perform scheduling among the classified contents. However, most Web servers currently provide best-effort services on a FIFO (First In First Out) basis only. This means that, when they are overloaded, the servers cannot provide appropriate services for premium users [4,5]. Hence, a new server model is needed that guarantees the quality of services by classifying services according to specific criteria and providing differentiated services. Despite the rapid expansion in Web use, the capacity of current Web servers is unable to satisfy the increasing demands. Consequently,
This research was supported by the MIC(Ministry of Information and Communication), Korea, under the ITRC(Information Technology Research Center) support program supervised by the IITA(Institute of Information Technology Assessment).
D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 230–235, 2009. c IFIP International Federation for Information Processing 2009
even if a Web server providing differentiated services is developed, it cannot guarantee perfect service. As a resolution for Web QoS, Web server technologies employing load balancing have been proposed. However, the existing load balancing technologies for Web servers still have some problems, such as incompatibility between different client application programs [6], inability to handle overloaded servers [7], overhead when processing HTTP requests/replies [8,9,10,11], packet conversion overheads [12,13], etc. This paper proposes three-level approaches for implementing load balancing Web servers that can guarantee differentiated Web QoS. In the first approach, a scheduling module is added to the Web server, which assigns a priority to each client request according to its importance, and a realtime scheduler is inserted into the OS kernel so that the assigned priority is kept in the OS, thereby providing a more efficient differentiated service. In the second approach, the load balancing Web server is configured using masquerading and tunneling technologies to distribute the load by class, thereby improving the reliability and response time of the Web services. The third approach uses the parameters related to the MIB-II of SNMP and the parameters related to the load of system resources, such as memory and CPU, to perform load balancing dynamically.
2 A Differentiated Web Service System
The proposed system uses three-level approaches: a kernel-level approach, an application-level approach, and a load-balancing approach.

2.1 Kernel-Level Approach
For the client requests, this approach maintains in the OS kernel the priority order determined by the Web server. It is implemented by mapping the scheduling processes in the Apache Web server to realtime scheduling processes in the OS kernel. When client requests come in through a Network Interface Card (NIC), the Web server receives them from port 80 in the TCP listening buffer, classifies them by connection according to specific classification policies (client IP, URL, file name, directory, user authentication, etc.), assigns the proper priority, and then inserts them into the corresponding queues. Thereafter, while the requests are being scheduled, the scheduling processes in the Web server are mapped one-to-one to the processes in the realtime scheduler (Montavista in this paper) in the Linux OS kernel.
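The classification-and-queueing step can be sketched as follows. The policy values (premium client IPs, URL prefixes) and priority classes are hypothetical examples; in the actual system each queue maps to a realtime-priority kernel process rather than a user-space heap:

```python
import heapq

# Hypothetical classification policy: premium clients get priority 0,
# requests for static content priority 2, everything else priority 1.
PREMIUM_IPS = {"10.0.0.5"}

def classify(client_ip, url):
    if client_ip in PREMIUM_IPS:
        return 0
    if url.startswith("/static"):
        return 2
    return 1

class RequestScheduler:
    """Priority queues like those the modified Apache server maintains."""
    def __init__(self):
        self._heap = []
        self._seq = 0       # preserves FIFO order within a priority class

    def submit(self, client_ip, url):
        heapq.heappush(self._heap,
                       (classify(client_ip, url), self._seq, (client_ip, url)))
        self._seq += 1

    def next_request(self):
        return heapq.heappop(self._heap)[2]

s = RequestScheduler()
s.submit("10.0.0.9", "/static/logo.png")
s.submit("10.0.0.9", "/checkout")
s.submit("10.0.0.5", "/index.html")
print(s.next_request())   # ('10.0.0.5', '/index.html') — premium served first
```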
2.2 Application-Level Approach
The load balancing Web server proposed in this paper achieves high performance and expansibility by enhancing the packet transmission rate and by resolving the bottleneck in the load balancer through the use of IP-level masquerading
and tunneling. In the proposed system, a single load balancer distributes the requests to several real servers, which share a common IP address, using a masquerading technique so that they look like a single server from the outside. IP masquerading hides the real servers behind a virtual server that acts as a gateway to external networks.

2.3 Dynamic Load Balancing Approach
The load balancer analyzes the load of each real server by processing the MIB-II values of SNMPv2 that are related to load, analyzing both the ethernet utilization and the system load rate. The total load proposed in this paper is the value in which the system utilization is added to the ethernet utilization. Equation (1) for the load analysis is as follows:

\[
\mathit{Total\_load}_i = \mathit{Ethernet\_Utilization}_i + \mathit{Sys\_Utilization}_i \tag{1}
\]
The ethernet utilization, given by equation (2), covers all the input and output traffic of the load balancer: the sum of all the bits transmitted from the sending side and all the bits received at the receiving side, divided by the whole bandwidth of the network. The variables used to measure the utilization of the ethernet in this paper are listed in Table 1.

\[
\mathit{Ethernet\_Utilization}_i = (\mathit{total\_bit\_sent}_i + \mathit{total\_bit\_received}_i) / \mathit{bandwidth} \tag{2}
\]

Table 1. The variables of ethernet utilization

  x            Previous polling time.
  T            Polling interval.
  ifInOctets   The number of received octets.
  ifOutOctets  The number of transmitted octets.
  sysUpTime    System boot time.
  ifSpeed      Current network bandwidth.
In order to determine the utilization of the ethernet network, the input and output traffic must be added, and the sum then divided by the maximum transmission speed. The ethernet traffic analysis equation is given by equation (3):

\[
\frac{(\mathit{ifInOctets}_{(x+T)} - \mathit{ifInOctets}_{(x)} + \mathit{ifOutOctets}_{(x+T)} - \mathit{ifOutOctets}_{(x)}) \times 8 \times 100}{(\mathit{sysUpTime}_{(x+T)} - \mathit{sysUpTime}_{(x)}) \times \mathit{ifSpeed} \times 10} \tag{3}
\]
The utilization of the system is the sum of the memory utilization, the CPU average utilization, and the disk utilization, as shown in equation (4). The variables used to measure the utilization of the system in this paper are listed in Table 2.
Table 2. The variables of system utilization

  memTotalSwap  The total space of swap memory.
  memAvailSwap  The available space of swap memory.
  memTotalReal  The total space of physical memory.
  memAvailReal  The available space of physical memory.
  memTotalFree  The total space of free memory.
  laLoad x      The average load of CPU for x minutes.
  dskTotal      The total disk space.
  dskAvail      The available disk space.
  dskUsed       The used disk space.
\[
\mathit{Sys\_Utilization}_i = \mathit{memSwapLoad}_i + \mathit{laLoad}_i + \mathit{dskLoad}_i \tag{4}
\]
In order to determine the utilization of the system, the memory utilization, CPU utilization, and disk utilization must be added. The calculation equations of the individual loads are given below:

\[
\mathit{memSwapLoad}_i = \frac{\mathit{memTotalSwap}_i - \mathit{memAvailSwap}_i}{\mathit{memTotalSwap}_i} \times 100 \tag{5}
\]

\[
\mathit{memRealLoad}_i = \frac{\mathit{memTotalReal}_i - \mathit{memAvailReal}_i}{\mathit{memTotalReal}_i} \times 100 \tag{6}
\]

\[
\mathit{laLoad} = \mathit{laLoad}_x \times 100 \tag{7}
\]

\[
\mathit{dskLoad}_i = \frac{\mathit{dskUsed}_i}{\mathit{dskTotal}_i} \times 100 \tag{8}
\]
Equation (5) calculates the utilization of swap memory using the memTotalSwap and memAvailSwap values. Equation (6) calculates the utilization of physical memory using the memTotalReal and memAvailReal values. Equation (7) calculates the average utilization of the CPU over x minutes as a percentage. Equation (8) calculates the utilization of the disk using the dskTotal and dskUsed values.
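Equations (1)-(8) can be checked with a small sketch. The function names and the sample SNMP readings below are ours, and the constants in ethernet_utilization reproduce equation (3) exactly as printed:

```python
def ethernet_utilization(in_octets_now, in_octets_prev,
                         out_octets_now, out_octets_prev,
                         uptime_now, uptime_prev, if_speed):
    """Eq. (3): octets in both directions converted to bits (x8),
    scaled by the 100/10 constants of the paper's formula."""
    bits = (in_octets_now - in_octets_prev
            + out_octets_now - out_octets_prev) * 8 * 100
    return bits / ((uptime_now - uptime_prev) * if_speed * 10)

def sys_utilization(mem_total_swap, mem_avail_swap, la_load_x,
                    dsk_used, dsk_total):
    """Eqs. (4), (5), (7), (8): swap-memory load + CPU load + disk load,
    each as a percentage.  Eq. (6) (physical memory) is analogous to (5)
    but does not appear in the sum of Eq. (4)."""
    mem_swap_load = (mem_total_swap - mem_avail_swap) / mem_total_swap * 100
    la_load = la_load_x * 100
    dsk_load = dsk_used / dsk_total * 100
    return mem_swap_load + la_load + dsk_load

# Eq. (1): total load of one real server, with made-up SNMP readings
total_load = (ethernet_utilization(2_000_000, 1_000_000, 500_000, 0,
                                   16_000, 10_000, 10_000_000)
              + sys_utilization(1000, 600, 0.25, 30, 100))
print(round(total_load, 3))   # 40% swap + 25% CPU + 30% disk + ethernet term
```

The load balancer would compute this value for every real server at each polling interval T and dispatch new connections to the least-loaded one.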
3 Implementation and Experiment
The differentiated Web service system proposed in this paper is implemented using a Linux kernel 2.4.7 and PCs with a Pentium-III 800 MHz processor and 256 MB of RAM, while the test environment is built by networking three clients, one load balancer, two servers, and one monitoring server. An Apache Web server 2.4.17 is modified for the Web server, and a Montavista realtime scheduler is added to the Linux kernel. HP's httperf program and AB (Apache HTTP server Benchmark tool), which measure the response speed of an Apache server, are used to evaluate the capability of the Web server.
Fig. 1. Experimental graphs of the ethernet and system utilization
Tests are carried out for three cases: when the servers are not overloaded (test 1), when the servers are loaded (test 2), and when the servers are overloaded and some requests are subsequently stopped (test 3). In test 1, the virtual IP address is 165.229.192.14, the total number of connections is 50000, the number of concurrent users per session is 1, and the number of calls per session is 50. Fig. 1(A) presents the results of the ethernet and system utilization, which shows the reply changes of the Web servers for the three clients. As the servers are not overloaded, the graphs are almost the same. In test 2, however, the proposed mechanism showed a capability 1.3 times better than least-connection scheduling, and 1.5 times better than round-robin scheduling. In test 3, which uses the same conditions as test 2, the script code was written so that the CPU load of real server 1 would increase. As shown in Fig. 1(C), the proposed mechanism achieved 1.3 - 1.6 times better capability than the other scheduling algorithms, because load balancing was precise owing to the periodic measurement of the present load of every real server.
4
Conclusion
To implement a differentiated Web service system that provides differentiated services according to information importance and user priority, this paper proposed three-level approaches: a kernel-level approach, an application-level approach and a dynamic load-balancing approach. In the kernel-level approach, a real-time scheduler is added to the kernel, while in the application-level approach, the load balancer is implemented using an IP-level masquerading technique and
The Three-Level Approaches for Differentiated Service
235
tunneling technique. The performance of the load balancing system was tested in three different situations, and the results confirmed that the system supported differentiated Web services.
References

1. Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Berners-Lee, T.: Hypertext Transfer Protocol HTTP/1.1, IETF (1997)
2. Bhatti, N., Bouch, A., Kuchinsky, A.: Integrating User-Perceived Quality into Web Server Design. In: Proc. of the 9th International World Wide Web Conference, Amsterdam, Netherlands, pp. 92–115 (2000)
3. Vasiliou, N., Lutfiyya, H.: Providing a Differentiated Quality of Service in a World Wide Web Server. In: Proc. of the Performance and Architecture of Web Servers Workshop, Santa Clara, California, USA, pp. 14–20 (2000)
4. Apache Group, http://www.apache.org/
5. Bhatti, R., Friedrich, R.: Web Server Support for Tiered Services. IEEE Network, 64–71 (1999)
6. Yoshikawa, C., Chun, B., Eastham, P., Vahdat, A., Anderson, T., Culler, D.: Using Smart Clients to Build Scalable Services. In: USENIX 1997 (1997), http://now.cs.berkeley.edu/
7. Kwan, T.T., McGrath, R.E., Reed, D.A.: NCSA's World Wide Web Server: Design and Performance. IEEE Computer, 68–74 (1995)
8. Dahlin, A., Froberg, M., Walerud, J., Winroth, P.: EDDIE: A Robust and Scalable Internet Server (1998), http://www.eddieware.org/
9. Engelschall, R.S.: Load Balancing Your Web Site: Practical Approaches for Distributing HTTP Traffic. Web Techniques Magazine 3 (1998), http://www.webtechniques.com
10. Walker, E.: pWEB - A Parallel Web Server Harness (1997), http://www.ihpc.nus.edu.sg/STAFF/edward/pweb.html
11. Andresen, D., Yang, T., Ibarra, O.H.: Toward a Scalable Distributed WWW Server on Workstation Clusters. In: Proc. of 10th IEEE Intl. Symp. of Parallel Processing (IPPS 1996), pp. 850–856 (1996)
12. Anderson, E., Patterson, D., Brewer, E.: The Magicrouter: an Application of Fast Packet Interposing (1996), http://www.cs.berkeley.edu/~eanders/magicrouter/
13. Zhang, W.: Linux Virtual Server Project (1998), http://proxy.iinchina.net/~wensong/ippfvs
14. MontaVista Software, http://www.montavista.com
On the Manipulation of JPEG2000, In-Flight, Using Active Components on Next Generation Satellites

L. Sacks1, H.K. Sellappan1, S. Zachariadis2, S. Bhatti2, P. Kirstein2, W. Fritsche3, G. Gessler3, and K. Mayer3

1 Department of Electronic & Electrical Engineering, University College London, London, UK
2 Department of Computer Science, University College London, London, UK
3 IABG mbH, Ottobrunn, Germany
1 Introduction

This paper describes two approaches to manipulating JPEG2000 frames with programmable and active networks. The first approach is the use of transcoding and the second is intelligent dropping. These two approaches were considered, in particular, for possible deployment on space-based platforms; specifically, communication satellites which are not only IP enabled but may host active components. Each approach offers different possibilities and may be suitable for solving overlapping but different problems.

The work presented here brings together a number of background technical developments from the communications satellite world, video coding and intelligent programming models. A detailed look at the development of satellite-based communication platforms shows that there is the possibility of a fully IP enabled system, supporting multicast and quality of service in space. This not only opens a range of possibilities, but presents new challenges. Further, emerging coding schemes open up new possibilities for manipulation of content within the network. JPEG2000 was used as an example of the next generation of scalable codecs, and it has been found to lend itself easily to the kind of problems considered here, although many of these developments can be applied to other coding schemes.

The two scenarios considered in detail in this paper – intelligent dropping and transcoding – show two approaches to coping with varying available bandwidth, as will occur with DVB-S2. They also illustrate two approaches to programmable networks. Intelligent dropping is best performed by dedicated systems where queuing is performed (for example routers at the input to link modulators) and so is best managed through policies. Transcoding, in contrast, is codec specific and needs to be performed with a procedural language; it is thus an excellent example of an application-level active networking technology.
The possibility of network-level active networking is not demonstrated by these, but is considered as a general issue in this context. The work presented here was initiated and funded by the European Space Agency (ESA), European Space Research and Technology Centre (ESTEC) ARTES 1 programme. The complete project report will be made available in due course. The brief was to undertake a "study addressing the use of Active and Programmable Networking in Space and Ground Segment, to improve the delivery of Multimedia Services over Satellite". The project covered a wide range of issues including architectural issues,

D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 236–246, 2009. © IFIP International Federation for Information Processing 2009
candidate technologies, standards, performance and security. It also included two demonstrations. The work presented here reflects this project, with an emphasis on the demonstrations developed (it should be noted that the intelligent dropping scenario discussed in this paper was developed with Mr. Sellappan as part of his MSc, and is not identical with that used in the study). Section 2 discusses the architectural issues, both from the high-level and business-model perspective and considering some details of the space segment. Section 3 reviews some technologies used in this study. In particular it looks at the evolution of satellite platforms for broadcast media, JPEG2000, and SATIN, the mobile code technology used here. Section 4 reviews a number of applications for active networking in this context and outlines in more detail the two target scenarios of intelligent dropping and transcoding.
2 Architectural Issues

To understand the application of active networking to satellite telecommunications it is important to develop a high-level architecture and business model, describing both the ways in which the technologies may be applied and the roles of the organisations and players involved. It is also important to consider the detailed architecture of the space platform, as this is one of the least versatile components and has the most stringent performance requirements.

2.1 The High Level Architecture

The high-level architecture and context of the technologies discussed here is not well defined, but is open to a number of scenarios depending on business and application needs. To understand the overall concept, we define the basic service scenario in which some media (still images or a video stream) is sent from a single source to one or several users, and transits a satellite link. There may or may not be wide area or local area networks on the user side: the users may, for example, be within an organisation owning a satellite receiver, they may be using a VSAT device, or the satellite link may be providing a distribution system between two Internet service providers.

In our context we define the following roles (which, of course, may be taken by one or several players depending on the details of the business model). The Media Provider owns the content, which may be in any format. Optionally, the media provider may participate 'actively' in the service by having a server which can encode the content in JPEG2000 format. As an alternative, the server may provide content in other formats which are encoded on the fly at the satellite link ingress. For the intelligent dropping scenario, the ingress point has to be involved in order to prioritise packets and add information to the data stream to support this.
In either case, the satellite is considered to be an active component in which specific flows can be directed to programmable components for treatment (e.g. transcoding or intelligent dropping). Finally, the stream may either be re-coded to its original format at an egress active component, or may be sent directly to terminal devices which can decode the content through resident or downloaded components. The final role in this high-level model is the Active Component Provider, which hosts the active components to be deployed in each appropriate node as required. In practice this role may be played by a third party, the content provider or the satellite service provider.
Fig. 1. High-Level Architecture (roles: Media Provider, Ingress Nodes, Satellite Node, Egress Node, Active Component Provider, Users)
2.2 The Satellite Platform Architecture

For this study our primary focus was on the functional description of the satellite platform. For this we defined a detailed architecture as a starting point for developing a standard and the appropriate functional requirements. This architecture was developed so as to define the full context for active services on a satellite platform. However, we did not define control and management concepts for this.

The active components are expected to be hosted in a virtual machine run-time environment such as the Java VM. Within this context we defined four types of component. The Dynamic Components are those which constitute the active service and are deployed on demand. To support these, we defined two sets of components which interact via the same semantics and interfaces as the dynamic components: the Interface and Resident components. The Interface components allow the service to interact with the infrastructure of the satellite, and the Resident components allow the service to receive and send the stream data. The final class of components in the execution environment are those which manage the location and deployment of the active components themselves.

Around the execution environment are the transport system and the platform infrastructure. The transport system comprises the DVB and MPEG (multiplex) capabilities, and some sense of switching or routing capability. The transport system has to be capable of re-directing appropriate flows to the active components as required, and so would integrate with an active routing technology (which was not explored in this study). Further, the transport system can be controlled through policies from the active components. Control through policies represents a second, widely accepted, view of active networking, and thus our system facilitates both application-level active networking and policy-based programmability. Both of these are useful and have been explored in this study.
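The four component classes described above can be sketched as Java interfaces. This is an illustration only: the study does not publish these interfaces, and all names and method signatures here are our own assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the four component classes; not the study's API.
interface ActiveComponent { String id(); }

// Deployed on demand; constitutes the active service (e.g. a transcoder).
interface DynamicComponent extends ActiveComponent {
    void process(byte[] streamData);
}

// Exposes restricted views of the satellite infrastructure to the service.
interface InterfaceComponent extends ActiveComponent {
    double downlinkCapacityBps();
    int ipQueueLength();
}

// Receives and sends the stream data on behalf of dynamic components.
interface ResidentComponent extends ActiveComponent {
    byte[] receive();
    void send(byte[] data);
}

// Manages the location and deployment of the active components themselves.
interface DeployerRegistrar {
    void deploy(DynamicComponent c);
    DynamicComponent lookup(String id);
}

// Trivial in-memory registrar, to show how the pieces fit together.
class SimpleRegistrar implements DeployerRegistrar {
    private final Map<String, DynamicComponent> table = new HashMap<>();
    public void deploy(DynamicComponent c) { table.put(c.id(), c); }
    public DynamicComponent lookup(String id) { return table.get(id); }
}
```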
The infrastructure of the satellite platform is accessed to retrieve and, possibly, manipulate key capabilities. For example the action of an active
Fig. 2. Satellite Active Architecture (deployer/registrar and components over a JVM (J2ME:CDC) with JNI to the satellite platform; interface, resident and dynamic components bound to platform, PAN and processing resources; an 'IP'-layer comms control with encapsulation/decapsulation, demodulators, switching and modulators)
component may need to know the quality of the down link (discussed in detail below), the state of IP queues in the switch/router, or the current loading on the execution environment resources (CPU and memory). The function of the resident components is to provide consistent interfaces to the dynamic components, while preserving the security and integrity of the satellite's systems. Although components to be deployed may be checked 'off line' and will be certified, it is still important that the platform is capable of protecting itself. Thus the Interface components would not just provide APIs, but should restrict and schedule access to the platform, and should monitor for integrity issues such as deadlock and livelock situations.
3 Technologies This section discusses the technologies used to demonstrate transcoding and intelligent dropping with JPEG2000. The overall project reviewed a range of possible technologies, both for media transport and programmable network implementation. The SATIN mobile code platform was selected as an interesting example of a framework for reasons discussed below. JPEG2000 was selected as an interesting scalable codec, compatible with a wide range of future applications. 3.1 The Programmable Component System: SATIN We used the SATIN [1] platform to implement our active network system. SATIN is a component metamodel, which can be used to build adaptable systems. It is instantiated
as a middleware system, which can adapt itself or the component-based applications which run on top of it. Its use as an active networking platform was initially outlined as a case study in [2]. SATIN uses the principles of reflection and logical mobility to offer adaptation primitives, and offers them as first-class citizens to applications. Reflection offers a meta-interface to the system that allows SATIN applications to discover which components are available locally, to be notified of changes in component availability, to load and discard components, to discover the interfaces that they offer, and to read and write metadata attached to a component. Logical mobility allows SATIN systems to send and receive components, classes, objects and data. SATIN was designed for mobile devices and devices with limited resources, such as mobile phones and PDAs, and is implemented using Java 2 Micro Edition (Connected Device Configuration / Personal Profile). It occupies 150329 bytes of memory, and provides the following services which are relevant to an active networking system:

- Reflection. The use of reflection allows the system to reason about what it can currently do, dynamically invoke components, use an event service which notifies when there are changes in component availability, and discover new APIs that are offered by components.
- Logical Mobility. The use of logical mobility allows components to be dynamically transferred from one SATIN node to another.
- Dynamic Component Instantiation. Components can be dynamically discovered, instantiated and invoked.
- Advertising and Discovery. SATIN provides an adaptable framework for component advertising and discovery.
- Security. SATIN nodes can employ digital signatures and trust mechanisms to verify incoming components. Moreover, SATIN instantiates components in the Java sandbox.
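Since SATIN is implemented in Java, dynamic component instantiation ultimately rests on standard Java class loading. As a plain-Java analogue (illustrative only, not SATIN's actual API), an active node might instantiate a newly received component class by name and invoke it through a known interface:

```java
// Plain-Java analogue of dynamic component instantiation (not SATIN's API):
// load a component class by name at runtime and invoke it through a known
// interface, as an active node would after receiving new code.
public class DynamicLoad {
    public interface Component { String describe(); }

    // Stands in for a component class shipped to the node.
    public static class EchoComponent implements Component {
        public String describe() { return "echo"; }
    }

    static Component instantiate(String className) {
        try {
            return (Component) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("component load failed: " + className, e);
        }
    }
}
```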
In conclusion, SATIN is a small-footprint middleware system that can be used to build adaptable systems. It provides a minimal runtime API that can be used to discover, request, deploy and invoke new components, and an interface to reason about what individual components or the system itself can do.

3.2 The JPEG2000 Codec

The final component used in this study was the JPEG2000 scalable codec. The fast expansion of multimedia and Internet applications led to the development of a new still image compression standard, JPEG2000 [3]. This image coding system uses the latest compression techniques, based on wavelet technology. Designed to complement the older JPEG standard, JPEG2000 provides low bit-rate operation, error resilience and superior compression performance. Some of the important features of JPEG2000 are lossless and lossy compression, progressive transmission by resolution or component, and region-of-interest (ROI) coding.

In JPEG2000, images are typically divided into multiple tiles and encoded independently. This is done to avoid the need for complex and powerful processors when loading and encoding huge images in hardware. The Discrete Wavelet Transformation (DWT) is designed for this purpose. A tile-component refers to a tile consisting of only one colour component. Each tile-component is then further divided down to different
resolutions and sub-bands with the use of the DWT. Each resolution is then divided into multiple precincts, which identify a geometric position within a tile-component of an image. Furthermore, each sub-band at each resolution is divided into multiple code-blocks, which are coded into individual packets. The packets are arranged in the codestream following a particular progression order specified at the encoder.

Motion JPEG2000 (MJ2) has been defined in Part III of the JPEG2000 standard [5] for compression and encoding of time sequences of images, such as video streams. The standard was developed to generate highly scalable compressed video which can be easily edited. Thus, MJ2 does not include motion compensation, and every frame in the video stream is individually encoded [6]. The concept of intra-frame coding reduces the complexity of the inter-frame interdependencies found in other video codecs. Since each frame is treated like a still image, a single codec can be used for both JPEG2000 still pictures and MJ2 video compression and encoding.
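The dyadic DWT behind this decomposition halves the image dimensions at each level (rounding up for odd sizes), so the frame size at each resolution level can be computed directly. A small sketch (method name is ours):

```java
// Frame dimensions at a given resolution level of a dyadic DWT decomposition:
// level 0 is the full-size image; each further level halves width and height,
// rounding up for odd dimensions.
public class Resolutions {
    static int[] sizeAtLevel(int width, int height, int level) {
        for (int i = 0; i < level; i++) {
            width = (width + 1) / 2;
            height = (height + 1) / 2;
        }
        return new int[]{width, height};
    }
}
```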
Fig. 3. JPEG2000 Coding by Resolution and SNR [4] (the code stream sequence is ordered by pixel depth (SNR) coding and progressive resolution coding)
A key feature of JPEG2000 is that it supports two forms of scalability. The first, SNR scaling, can progressively reduce the amount of information per pixel. This may be performed in a number of ways, on the various coding layers. The effect of SNR scaling is shown in Figure 4. The second form of scaling is progressive resolution coding. This effectively changes the size of the transmitted frame. Illustrated in Figure 3 is the organisation of the JPEG2000 code stream. Four levels of resolution coding are shown. The dark square is a low-resolution image; when combined with the neighbouring three code blocks, a higher-resolution image is formed. This is then repeated with the next two sets of three code blocks. At the same time, the number of bits per pixel (bpp) represents the SNR coding depth.

3.3 Satellite Transport

At present, the trend of satellite communication is heading towards deployment of next generation broadband satellites that provide multimedia applications with high demands for quality and availability of service. In line with the anticipated advancement, and to meet the growing demand for high data rates, much research has been focused on regenerative and full on-board processing (OBP) payloads, advanced mesh
antennae, high-speed communication systems, optical satellite links, high-speed transponders and miniaturisation of satellite components [7, 8]. The first regenerative satellite payloads are operational or under construction, performing on-board MPEG-2 cell multiplexing and switching (e.g. Skyplex or AmerHis) or ATM cell switching (e.g. the WEST and WEB 2000 architectures), and the UK-DMC satellite even implements an IP stack. It is foreseeable that future communication satellites will implement an IP stack and, on top of this, a programmable network platform.

Future generations of communication satellites may support a range of new capabilities. The most immediate is the emergence of DVB-S2 [9], an effective upgrade of the current Digital Video Broadcast over Satellite (DVB-S) transport technology, which is the dominant means of transporting most broadcast video to date. DVB-S2 includes adaptive Forward Error Correction (FEC), which means that the up- and down-link data capacities can adapt to changes in fading, for example from varying cloud cover. This also means, however, that both the up- and down-link transport capacities may vary independently, increasing with weaker FEC and contracting as stronger FEC is required. Each FEC frame has a fixed size of either 64,800 bits (normal FEC frame) or 16,200 bits (short FEC frame); depending on the FEC code rate, the information capacity of a frame varies accordingly.

Further, work is already progressing on IP-enabled capabilities for satellites. Although this may not be an obvious thing by itself, combined with developments in tuneable, multiple-footprint platforms, we see a model of a satellite which may have to perform not only routing, but multicast and some amount of Quality of Service management. Standards for DVB over IP are already in progress, and so we see the emergence of a true IP-based broadcast architecture.
4 Active Services in Satellites

A wide range of application scenarios can be considered within the architecture described above. The following is a list of applications considered in some detail in the study.

- Reliable Multicast (RM) scenario: active networking may be used to load RM protocol instances and FEC codecs onto relevant network nodes on demand at runtime.
- Intelligent Dropping (ID) scenario: intelligent dropping processes (or just policies) that evaluate the priority of packets or streams are loaded dynamically via PAN onto satellite nodes that have to drop packets.
- Transcoding (TC) scenario: via active networking, transcoders are controlled and loaded dynamically by satellite network nodes in order to adapt parameters (e.g. codec, data rate, etc.) of multimedia streams to network conditions or user requirements.
- MHP scenario: in the Multimedia Home Platform (MHP) scenario, an intelligent process in the satellite routes certain MHP content only to specific spot beams and performs user feedback aggregation. Via active networking, the intelligent process can be loaded by the satellite and adapted to new applications.
- MPEG-4 scenario: in the MPEG-4 multiplexing scenario, content from different sources composing a single MPEG-4 scene is multiplexed in the satellite. Via active networking, the composition software can be loaded by the satellite and controlled from the base station.
- Advanced network management: management or monitoring software is adapted on demand to operator needs via an active networking system.
- Advanced caching scenario: caching software is loaded, updated, and controlled on demand via an active networking system in order to adapt the caching software to applications and services.
These scenarios represent a broad range of application-, network- and policy-level programmable network problems. The scenarios described below focus on application-level and policy-level approaches. Both cases are to do with manipulating JPEG2000 images. Both were implemented on PC-based test-beds and used the Java JJ2000 implementation [10] available from EPFL. At this time, there is no reference satellite platform defined for developing and testing the kind of applications discussed here, and one of the outputs from the ARTES 1 project was the recommendation that such a platform be developed.

4.1 The Transcoding Scenario

The issue of transcoding has been widely explored (see for example [11]). This transcoding scenario considers a stream of JPEG2000 images, possibly forming a moving image, which requires that a constant frame rate is preserved (or considers cases where the download latency is just too slow for 'full fat' images). Issues impeding this can arise in a satellite situation when, for example, the down-link FEC reduces the available link bandwidth. It may also occur if there is routing on the satellite platform, which may have multiple footprints. There are then two options: reduce the resolution, or change the information depth (SNR) of the picture, as illustrated in Figure 4. The former may be preferred if, for example, the terminal device has a small screen.
Fig. 4. SNR Progressive coding (3.55bpp, .355bpp, .0355bpp and -0.009bpp)
To implement this application the following functions are performed, with regard to the reference architecture (shown in Figure 1):

- The Media Provider may re-code the images in JPEG2000 format. This may, optionally, be performed on the Ingress Node.
- The Satellite Node hosts the active components, loaded from the Active Component Provider.
- If the stream is encoded at the Ingress Node, it should be decoded back to the originating format at the Egress Node. Otherwise the User should have the appropriate decoder.
On the satellite node, the active component is loaded as a Dynamic Component. It should then have access to the link capacity from the modulator (Figure 2) to discover the available down-link capacity and decide how to transcode the image. It should, of course, also know the rate at which the images are being sent for the target stream, and the overall down-link load.

Implementation of the transcoding component may be approached in a number of ways. Within the scope of the ARTES project, an expedient approach was taken and the encoder and decoder provided with [10] were used. However, the code streams of JPEG2000 are so designed that significant efficiency gains may be had. Resolution transcoding is the most straightforward: as can be seen from Figure 3, it requires simply that the code stream is truncated at the right place. In this respect, it is similar to the intelligent dropping scenario below. SNR progressive scaling is more complex, and detailed information about each tile is required. Nevertheless, these are all integer operations and so can be quite efficient from the performance perspective.

4.2 The Intelligent Dropping Scenario

As with the transcoding scenario, intelligent dropping may be appropriate when constant frame rates are required in the presence of congestion or reduced link capacity. However, it may be that the queuing mechanisms involved by default drop packets randomly, where by randomly we mean without regard to the contents of the packets. The impact of this can be seen in Figure 5, in which the first image is the reference graphic and the second has 1% of packets lost at random. Of course, it is possible that dropping packets has little impact; the figure here shows a bad case, although not an unusual one.
Fig. 5. Intelligent Dropping (0% loss, 1% Random loss, 80% intelligent dropping)
The effect seen can be understood with reference to Figure 3. If a packet containing the lowest-resolution 'thumbnail' of the coded image is lost, recovering the graphic will be almost impossible, while the impact becomes progressively less severe as the outer code blocks are lost. Thus, prioritising loss progressively, following the flow of the codec, would allow a more graceful information loss; allowing for this was part of the initial intent of the JPEG2000 technology. The last image in Figure 5 has 80% of its data dropped, but using only low-priority packets.
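The drop policy this implies can be sketched simply: tag each packet with a priority that follows the codec's progression order, then discard the least important packets first until the frame fits the available budget. The packet layout and priority convention below are our illustration, not the study's implementation:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative priority-based dropper: packets of one frame are tagged with a
// priority (0 = the low-resolution base that must survive; higher numbers =
// outer code-blocks). The least important packets are dropped until the frame
// fits the byte budget available for this transmission interval.
public class IntelligentDropper {
    static class Packet {
        final int priority, size;
        Packet(int priority, int size) { this.priority = priority; this.size = size; }
    }

    static List<Packet> fitToBudget(List<Packet> frame, int budgetBytes) {
        List<Packet> kept = new ArrayList<>(frame);
        // Most important first, so the least important sit at the tail.
        kept.sort(Comparator.comparingInt((Packet p) -> p.priority));
        int total = kept.stream().mapToInt(p -> p.size).sum();
        while (total > budgetBytes && !kept.isEmpty()) {
            total -= kept.remove(kept.size() - 1).size; // drop least important
        }
        return kept;
    }
}
```

Note that truncating at resolution-layer boundaries, as in the transcoding scenario, is the special case where priorities follow resolution levels exactly.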
Architecturally, the implementation of intelligent dropping is similar to the transcoding scenario. The two key changes are the introduction of an intelligent, adaptive encapsulator at the satellite ingress point and the control of the dropping mechanism in the space segment. These are shown in Figure 6.
Fig. 6. The Intelligent Dropping Architecture (video recorder (source) → Motion JPEG 2000 encoder → active encapsulator → satellite TX terminal → satellite RF link → satellite (active node) → satellite RX terminal → active decapsulator → Motion JPEG 2000 decoder → video player (sink); the intelligent dropping services receive link-quality and congestion feedback, plus LDPC and FEC feedback (max DFL))
In this architecture [12], the active encapsulator has two jobs. The first is to decide, depending on factors such as link quality, how to place the JPEG2000 code stream into the IP packets. There is an important trade-off here: the finer the granularity at which this is done, the more precisely the available bandwidth can be matched, but the overheads are increased; too coarse a granularity, and the quality of the image will be degraded too quickly. The second job is to add a header which describes the priority of the packet, as well as the first, last and sequence numbers of the packets of a given video frame. The ingress point can drop packets in response to the up-link quality, as can the satellite platform itself. On the satellite platform, the conditions for packet dropping can be controlled with policies transported via any policy management framework. Finally, the satellite egress point has to extract and reassemble the code stream.
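The text specifies only which fields the encapsulator's header carries (priority, first/last-of-frame markers, sequence number), not their encoding. A minimal model, with field widths that are purely our assumption (6-bit priority, two flag bits, 24-bit sequence number):

```java
// Minimal model of the active encapsulator's per-packet header. The study
// specifies only which fields exist; the 4-byte packing here (6-bit priority,
// first/last flags, 24-bit sequence number) is an illustrative assumption.
public class DropHeader {
    static byte[] encode(int priority, boolean first, boolean last, int seq) {
        byte flags = (byte) ((priority & 0x3F)
                | (first ? 0x80 : 0)
                | (last ? 0x40 : 0));
        return new byte[]{flags, (byte) (seq >> 16), (byte) (seq >> 8), (byte) seq};
    }
    static int priority(byte[] h)   { return h[0] & 0x3F; }
    static boolean isFirst(byte[] h) { return (h[0] & 0x80) != 0; }
    static boolean isLast(byte[] h)  { return (h[0] & 0x40) != 0; }
    static int seq(byte[] h) {
        return ((h[1] & 0xFF) << 16) | ((h[2] & 0xFF) << 8) | (h[3] & 0xFF);
    }
}
```

A dropper on the satellite then needs only to read the first header byte to rank a packet, without parsing the JPEG2000 payload itself.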
5 Conclusions

This paper has reviewed the issues which arise when considering the role of programmable and active networking in the context of satellite-based telecommunications. We have motivated the plausibility of such an architecture by considering the evolutionary path of telecommunications satellites. We have also tried to motivate the advantages of placing active components on the satellites themselves. Indeed, throughout our project, trying to understand where the business advantage comes from has been more difficult than establishing technical viability. However, if we consider the combination of factors such as adaptive FEC and multiple, tuneable footprints with on-board routing or switching, the importance of on-board processing
becomes more apparent. Finally, it is important to note that the timescales of satellite deployment are long compared to those in the fixed network world, so it is very difficult to know which application-level protocols will be used once the satellites are commissioned. Thus, if it is important to have intelligence in the satellite at all, it is important that that intelligence is adaptable both to protocols and codecs which might emerge in the future, and can adapt to local conditions (such as link capacity variation and congestion). To emphasise these issues of adaptation, we have both used a codec which is very much in development and explored how adapting to local conditions can be used to improve application performance in the face of degraded transport (down-link) quality.
References

[1] Zachariadis, S., Mascolo, C., Emmerich, W.: SATIN: A Component Model for Mobile Self-Organisation. In: International Symposium on Distributed Objects and Applications (DOA), Agia Napa, Cyprus. Springer, Heidelberg (2004)
[2] Zachariadis, S., Mascolo, C., Emmerich, W.: Exploiting Logical Mobility in Mobile Computing Middleware. In: Proceedings of the IEEE International Workshop on Mobile Teamwork Support, collocated with ICDCS 2002, pp. 385–386 (July 2002)
[3] Boliek, M., Christopoulos, C., Majani, E. (eds.): JPEG 2000 Part I Final Draft International Standard (ISO/IEC FDIS15444-1), ISO/IEC JTC1/SC29/WG1 N1855 (August 2000)
[4] Marcellin, M., Gormish, M., Bilgin, A., Boliek, M.: An Overview of JPEG 2000. In: Proceedings of IEEE Data Compression Conference, Snowbird, Utah (March 2000)
[5] Information technology - JPEG 2000 image coding system - Part 3: Motion JPEG 2000, ISO/IEC 15444-3 (2002)
[6] Dagher, J., Bilgin, A., Marcellin, M.: Resource-Constrained Rate Control for Motion JPEG 2000. IEEE Transactions on Image Processing 12(12), 1522–1529 (2003)
[7] Iida, T., Suzuki, Y.: Communications satellite R&D for next 30 years. Space Communications 17, 271–277 (2001)
[8] Verma, S., Wiswell, E.: Next Generation Broadband Satellite Communication Systems. In: 20th AIAA International Communication Satellite Systems Conference and Exhibit, Montreal, Quebec, May 12-15 (2002)
[9] ETSI: Final Draft EN 302 307 (v1.1.1) Digital Video Broadcasting (DVB); Second generation framing structure, channel coding and modulation systems for Broadcasting, Interactive Services, News Gathering and other broadband satellite applications (DVB-S2) (January 2005), http://www.etsi.org
[10] http://jj2000.epfl.ch/
[11] Gibson, J.D.: Multimedia Communications: Directions and Innovations. Academic Press, London (2000)
[12] Sellappan, H.K.: Active Networks in Satellite Communications: Intelligent Dropping Scenario for Motion JPEG 2000 Transmission. MSc dissertation, UCL (2005)
TAON: A Topology-Oriented Active Overlay Network Protocol Xinli Huang, Fanyuan Ma, and Wenju Zhang Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, P.R. China, 200030 {huang-xl,ma-fy,zwj03}@sjtu.edu.cn
Abstract. Built upon overlay topologies, Peer-to-Peer (P2P) networks behave in an ad-hoc way, conduct application-layer routing, enable user-customized decentralized resource sharing, and thus can be taken as an emerging representative of Active Networks. However, an important problem in current unstructured P2P networks is that existing search mechanisms do not scale well: they either flood the network with queries or know very little about the nature of the network topology. In this paper, we propose the Topology-oriented Active Overlay Network (TAON), an efficient, scalable yet simple protocol for improving decentralized resource sharing in P2P networks. TAON consists of three novel components: a Desirable Topology Construction and Adaptation algorithm to guide the evolution of the overlay topology towards a small-world-like graph, a Semantic-based Neighbor Selection scheme to conduct an online neighbor ranking, and a Topology-aware Intelligent Search mechanism to forward incoming queries to deliberately selected neighbors. We deploy and compare TAON with a number of other distributed search techniques over static and dynamic environments, and the results indicate that TAON outperforms its competitors by achieving a higher recall rate while using far fewer network resources in both environments.
1
Introduction
In contrast to traditional data networks, Active Networks not only allow the network nodes to perform computations on the data but also allow their users to inject customized programs into the nodes of the network, that may modify, store or redirect the user data flowing through the network [1]. These programmable networks open many new doors for possible applications that were unimaginable with traditional data networks. For example, Peer-to-Peer (P2P) overlay networks conduct application-layer routing in an ad-hoc way, enable user-customized decentralized resources sharing, and thus can be taken as an emerging representative of Active Networks.
This research work is supported in part by the National High Technology Research and Development Program of China (863 Program), under Grant No. 2004AA104270.
D. Hutchison et al. (Eds.): IWAN 2005, LNCS 4388, pp. 247–252, 2009. c IFIP International Federation for Information Processing 2009
The dominant application on current P2P networks is large-scale distributed file sharing, especially Web-based search applications. YouSearch [2] maintains a centralized search registry for query routing, making it difficult to adapt the search process to the heterogeneous and dynamic contexts of the peer users. A more distributed approach is to decentralize search completely, as in Gnutella [3], where queries are sent and forwarded blindly by each peer. But since the peer network topology is uncorrelated with the interests of the peer users, peers are flooded by requests and cannot effectively manage the ensuing traffic. Adaptive, content-based routing has been proposed to overcome this difficulty in the file-sharing setting. NeuroGrid [4] employs a learning mechanism to adjust metadata describing the contents of nodes. A similar idea has been proposed to distribute and personalize Web search using a query-based model and collaborative filtering [5]. Search, however, is disjoint from crawling, making it necessary to rely on centralized search engines for content. The major limitation of these systems lies in their relatively poor search performance and their ignorance of the nature of the underlying topology, which results in fatal scaling problems. To address these scalability limitations, in this paper we consider the Web information retrieval problem and propose a Topology-oriented Active Overlay Network (TAON) protocol. TAON allows for symbiotic interactions of Web crawling and searching, whereby a peer can vertically adapt to its users' search interests, while horizontally peers can achieve better coverage by learning to collaboratively route and respond to queries.
TAON consists of three key components: a Desirable Topology Construction and Adaptation algorithm to guide the evolution of the overlay topology towards a power-law graph, a Semantic-based Neighbor Selection scheme to conduct an online neighbor ranking, and a Topology-aware Intelligent Search mechanism to forward incoming queries to deliberately selected neighbors. We predict that the resultant topology for such a network is a small world, allowing any two peers to reach each other via a short path (small diameter) while maximizing the efficiency of communications within clustered peer communities. To evaluate the performance gains of TAON, we deploy and compare it with a number of other distributed search techniques over static and dynamic environments, through extensive simulations. The rest of this paper is organized as follows: in Section 2, we detail the design of TAON; in Sections 3 and 4, we present the experimental setup and the simulation results, respectively; we conclude in the last section.
2
TAON Design
The objective of TAON is to help the querying peer find the most relevant answers to its query quickly and efficiently, rather than the largest number of answers. To achieve this goal, TAON exploits both semantic and geographical locality to construct small-world-like peer communities, by incorporating the three novel techniques presented below (briefly, due to space limitations).
2.1 Desirable Topology Construction and Adaptation
The Desirable Topology Construction and Adaptation algorithm is the core component that connects the TAON node to the rest of the network. To obtain a desirable overlay topology and subsequently adapt it dynamically towards a better one, we prefer to keep the out-degree of the network following a power-law distribution, and expect that this topological property, together with the other two novel techniques (addressed in Sections 2.2 and 2.3), will produce the desired "small-world" phenomena [6]. We achieve this by adding and deleting links in a way that conserves the total number of outgoing links at each node: we choose a node A at random, build a link from A to a new node B chosen by a certain metric, and then immediately delete an existing link, say to C, to conserve links at A. In addition, by increasing the fraction of links rewired we obtain the required short path length. If the fraction of links deleted and rewired is p, then even for very small p the average path length L(p) comes down by orders of magnitude and is close to that of a random graph, whereas the clustering coefficient C(p) remains large, similar to that of a regular graph [7].
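The degree-conserving rewiring step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the overlay is a dict of out-neighbour sets, and `choose_target` is a placeholder for the unspecified selection metric.

```python
import random

def rewire_step(adj, choose_target, rng):
    """One adaptation step: node A gains a link to a new node B chosen by
    some metric, and drops an existing link, so its out-degree is conserved."""
    a = rng.choice(list(adj))
    candidates = [n for n in adj if n != a and n not in adj[a]]
    if not candidates or not adj[a]:
        return
    b = choose_target(a, candidates)   # metric-driven choice (e.g. semantic rank)
    c = rng.choice(sorted(adj[a]))     # existing link to delete
    adj[a].remove(c)
    adj[a].add(b)

def adapt(adj, p, choose_target, seed=0):
    """Rewire roughly a fraction p of all outgoing links."""
    rng = random.Random(seed)
    total_links = sum(len(v) for v in adj.values())
    for _ in range(int(p * total_links)):
        rewire_step(adj, choose_target, rng)
    return adj
```

Because every step removes exactly one outgoing link and adds exactly one, the out-degree distribution of the initial power-law graph is preserved while the wiring evolves.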
2.2 Semantic-Based Neighbor Selection
To decide to which peers a query will be sent, a node ranks all its neighbors with respect to the given query. To do this, each node maintains a profile for each of its neighbor peers. The profile contains the list of the most recent past queries, which peers provided an answer for a particular query, and the number of results that a particular peer returned. Following [8], we use the Cosine Similarity function below to compute the similarity between different queries:

sim(q, qi) = cos(q, qi) = (q · qi) / (‖q‖ · ‖qi‖),   (1)

where sim(q, qi) is the similarity of the query q and the query qi, calculated as the cosine of the angle between the two vectors q and qi. Based on this similarity between queries, we then use the relevance function of [9] to rank the neighbor peers of a node P0:

RP0(Pi, q) = Σj sim(qj, q)^α · S(Pi, qj),   (2)
where α is a parameter allowing us to add more weight to the most similar queries, j ranges over the IDs of the queries answered by Pi, S(Pi, qj) is the number of results returned by Pi for query qj, and RP0(Pi, q) denotes the relevance rank function of Pi, used by P0 to perform an online ranking of its neighbors. The R function allows us to rank higher the peers that returned more relevant results, and thus realizes the semantic-based neighborhood optimization. In addition, we make this semantic-based neighbor selection strategy orthogonal to the physical-proximity-based strategy that is integrated into the Topology-aware Intelligent Search mechanism (Section 2.3).
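Equations (1) and (2) can be computed directly once queries are represented as term-weight vectors. The sketch below is illustrative, not from the paper: queries are dicts mapping terms to weights, and a neighbour's profile is a list of (past query, result count) pairs.

```python
import math

def sim(q, qi):
    """Cosine similarity between two queries given as term-weight dicts (Eq. 1)."""
    dot = sum(q[t] * qi.get(t, 0.0) for t in q)
    norm = (math.sqrt(sum(w * w for w in q.values()))
            * math.sqrt(sum(w * w for w in qi.values())))
    return dot / norm if norm else 0.0

def relevance(profile, q, alpha=1.0):
    """Rank a neighbour Pi for query q (Eq. 2): sum over past queries qj answered
    by Pi of sim(qj, q)**alpha times the number of results S(Pi, qj) returned."""
    return sum(sim(qj, q) ** alpha * s for qj, s in profile)
```

A node would evaluate `relevance` against every neighbour's profile and forward the query to the highest-ranked ones.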
2.3 Topology-Aware Intelligent Search
To make a TAON node in the overlay topology aware of physical proximity in the underlying network, we divide its neighbors into local neighbors and global neighbors. The fraction of neighbors that are local, called the proximity factor (α), is a key design parameter that governs the overall structure of the topology; different values of α let us span the spectrum of this class of overlay topologies. In between the two ends of the spectrum, we foresee that topologies with many local links and a few global links have desirable properties: they not only have low diameter, a large search space and connectedness, but are also aware of the underlying network and can utilize their links better than either extreme topology. We aim to find a suitable balance between these advantages by simulating across the range of α values. The combination of Semantic-based Neighbor Selection and physical-proximity-based neighbor discrimination ensures that an increasing number of queries from a node P can be answered by its neighbor nodes or their nearby nodes in the overlay topology, and that many such answerers may be geographically close to the requester. These properties are especially useful for reducing response time and network resource consumption. Based on these techniques, we then develop a novel search mechanism, called Topology-aware Intelligent Search, which conducts a bi-forked, directed search as follows:
– flooding the incoming queries to all local neighbors with a much smaller TTL value than that of the standard Gnutella protocol;
– forwarding the incoming queries to the k best global neighbors using multiple random walks, coupled with the mechanisms for termination checking and duplication avoidance proposed in [10].
Here, all local and global neighbors are selected and optimized beforehand using the Semantic-based Neighbor Selection strategy, and are discriminated by their physical proximity.
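A toy version of the bi-forked search might look as follows. The `Node` class, its `local`/`globals` neighbour lists and the substring-based `match` are stand-ins for the paper's actual data structures, and the directed branch is simplified to one forwarding hop per global neighbour rather than full random walks; duplication avoidance is modelled with a shared `seen` set.

```python
class Node:
    """Illustrative peer: holds documents plus ranked local/global neighbour lists."""
    def __init__(self, docs):
        self.docs, self.local, self.globals = docs, [], []
    def match(self, query):
        return [d for d in self.docs if query in d]

def taon_search(node, query, ttl_local=2, k=2, seen=None):
    """Bi-forked, directed search sketch: flood local neighbours with a small TTL,
    and forward the query to the k best-ranked global neighbours."""
    seen = seen if seen is not None else set()
    if id(node) in seen:               # duplication avoidance
        return []
    seen.add(id(node))
    hits = node.match(query)
    if ttl_local > 0:
        for n in node.local:           # flood branch, TTL-limited
            hits += taon_search(n, query, ttl_local - 1, 0, seen)
    for n in node.globals[:k]:         # directed branch: k best global neighbours
        hits += taon_search(n, query, 0, k, seen)
    return hits
```

The flood branch disables further global forwarding (k = 0), and the directed branch disables flooding (TTL = 0), so the two forks stay distinct, as in the description above.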
3
Experimental Setup
TAON is designed to perform efficient, scalable yet simple Web-based search by exploiting semantic and geographic locality. Hence, our experimental evaluation focuses on the four performance metrics below:
– recall rate, that is, the fraction of documents that the search mechanism retrieves;
– search efficiency, that is, the number of messages used to find the results, as well as the time required to locate them;
– utilization of the underlying network, measured by the traffic load on the links of the underlying network, following [11];
– small-world statistics, an indicator of whether the network topology is evolving towards a "small-world" graph, measured via the clustering coefficient and the diameter.
Based on the PLOD topology generator [12], we create a simulator (in which the TAON protocol is implemented and deployed) that initializes a power-law overlay topology and allows users to run their queries over real indexes obtained from actual distributed Web crawlers. Our simulator takes a snapshot of the network at every time step, during which all peers process all of their buffered incoming messages and send messages following the TAON protocol. This may include the generation of a local query as well as forwarding and responding to queries received from other peers.
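The per-time-step loop of such a simulator can be sketched as below. The `Peer` interface (`inbox`, `handle`) is assumed for illustration only; a real TAON peer's `handle` would forward or respond to queries per the protocol.

```python
class Peer:
    """Minimal illustrative peer; `handle` is a placeholder for protocol logic."""
    def __init__(self, pid):
        self.id, self.inbox, self.handled = pid, [], []
    def handle(self, msg):
        self.handled.append(msg)   # would forward/respond in a real peer

def run_simulation(peers, steps):
    """At every time step, each peer drains its buffer and processes every
    buffered message; the simulator then records a snapshot of the network."""
    snapshots = []
    for _ in range(steps):
        for peer in peers:
            buffered, peer.inbox = peer.inbox, []   # drain buffered messages
            for msg in buffered:
                peer.handle(msg)
        snapshots.append({p.id: len(p.inbox) for p in peers})
    return snapshots
```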
4
Simulation Results
In this section, we describe a series of experiments that investigate the performance gains of TAON over its competitors: a) Breadth-First Search (BFS, i.e., Gnutella), b) Random Walks (RW), and c) Most Results in Past (MRP, i.e., the technique proposed in [13]).
Fig. 1. Comparisons of Recall Rate (a), Messages (b), and Physical Latency to Results (c) between the four search techniques, and Small-World Statistics of the TAON network (d)
Fig. 1(a) and (b) indicate that BFS requires almost three times as many messages as its competitors, with around 1,230 messages per query. In contrast, RW, MRP and TAON all use dramatically fewer messages, but TAON is the one that finds the most documents. In addition, the curves in Fig. 1(c) show clearly that TAON results in smaller physical latency than the other three techniques, which means a better utilization of the underlying physical network.
Fig. 1(d) shows that the diameter remains roughly equal to the initial random-graph diameter, while the clustering coefficient increases rapidly and significantly, stabilizing at a value 100∼125% larger than that of the initial random graph. These conditions define the emergence of a "small-world" topology in the TAON network. This is a very interesting finding, indicating that peer interactions cause peers to route queries in such a way that communities of users with similar interests cluster together to find qualified results quickly, while it is still possible to reach any peer in a small number of steps.
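The two small-world statistics reported here can be computed with standard graph routines; a minimal sketch for an undirected graph stored as a dict of neighbour sets:

```python
from itertools import combinations

def clustering_coefficient(adj):
    """Average local clustering coefficient: for each node, the fraction of
    neighbour pairs that are themselves connected."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)

def diameter(adj):
    """Longest shortest path, via a BFS from every node (connected graph assumed)."""
    best = 0
    for src in adj:
        dist, frontier = {src: 0}, [src]
        while frontier:
            nxt = []
            for u in frontier:
                for w in adj[u]:
                    if w not in dist:
                        dist[w] = dist[u] + 1
                        nxt.append(w)
            frontier = nxt
        best = max(best, max(dist.values()))
    return best
```

A rising clustering coefficient at a roughly constant diameter, as in Fig. 1(d), is exactly the small-world signature of [6].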
5
Conclusions
The TAON protocol proposed in this paper results in significant performance gains, both enhanced search efficiency and reduced traffic load, by explicitly guaranteeing desirable topological properties such as the small-world property, and by exploiting semantic and geographical locality to dynamically form better neighborhoods and peer communities.
References
1. Tennenhouse, D.L., Smith, J.M., Sincoskie, W.D., Wetherall, D.J., Minden, G.J.: A Survey of Active Network Research. IEEE Communications Magazine 35(1), 80–86 (1997)
2. Bawa, M., et al.: Make it fresh, make it quick: searching a network of personal webservers. In: Proc. of 12th WWW (2003)
3. http://rfc-gnutella.sourceforge.net
4. Joseph, S.: NeuroGrid: Semantically routing queries in Peer-to-Peer networks. In: Proc. of Intl. Workshop on P2P Computing (2002)
5. Pujol, J., Sangüesa, R., Bermúdez, J.: Porqpine: A distributed and collaborative search engine. In: Proc. of 12th WWW (2003)
6. Watts, D., Strogatz, S.: Collective dynamics of 'small-world' networks. Nature 393, 440–442 (1998)
7. Puniyani, A.R., Lukose, R.M., Huberman, B.A.: Intentional Walks on Scale Free Small Worlds. LANL archive: cond-mat/0107212 (2001)
8. Baeza-Yates, R.A., Ribeiro-Neto, B.A.: Modern Information Retrieval. ACM Press/Addison Wesley, New York (1999)
9. Zeinalipour-Yazti, D., Kalogeraki, V., Gunopulos, D.: Exploiting locality for scalable information retrieval in peer-to-peer networks. Information Systems 30(4), 277–298 (2005)
10. Lv, Q., et al.: Search and replication in unstructured peer-to-peer networks. In: Proc. of ACM International Conference on Supercomputing (ICS) (June 2002)
11. Ripeanu, M., Foster, I., Iamnitchi, A.: Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design. IEEE Internet Computing, Special Issue on Peer-to-Peer Networking (2002)
12. Palmer, C.R., Steffan, J.G.: Generating Network Topologies That Obey Power Laws. In: Proc. of Globecom 2000, San Francisco (November 2000)
13. Yang, B., Garcia-Molina, H.: Efficient Search in Peer-to-Peer Networks. In: Proc. of Int. Conf. on Distributed Computing Systems (2002)
A Biologically Inspired Service Architecture in Ubiquitous Computing Environments Frank Chiang and Robin Braun Faculty of Engineering, University of Technology Sydney, Broadway, NSW 2007, Australia [email protected]
Abstract. This paper describes the design of a scalable bio-mimetic framework for the management domain of complex Ubiquitous Service-Oriented Networks. An autonomous network service management platform, SwarmingNet, is proposed. In this SwarmingNet architecture, the required network service processes are implemented by a group of highly diverse and autonomic objects. These objects are called TeleService Solons (TSSs), elements of TeleService Holons (THSs), analogous to individual insects as particles of the whole colony. A group of TSSs has the capability of fulfilling complex tasks related to service discovery and service activation. We simulate a service configuration process for the Multimedia Messaging Service, and a performance comparison is made between the bio-agents scheme and a normal multi-agent scheme.
1
Introduction
The operational management of Next Generation Network (NGN) services is expected to be autonomous, scalable, interoperable and adaptable to the diverse, large-scale, highly distributed, and dynamically ever-changing network environment of the future. Their functional management is also desired to be as simple as possible from the perspective of both design and implementation. Current network management infrastructure is struggling to cope with these challenges. In contrast, after many years of evolution and natural selection, social insects and biological organisms have developed relatively easy and efficient mechanisms to thrive in hostile, dynamic and uncertain environments. Hence, learning from the collective organization of biological societies is of vital importance in achieving autonomic management in the future Ubiquitous Service-Oriented Network (USON) [1], which will dynamically connect human beings and home/office electronic appliances via distributed devices (e.g., cell phones, notebooks, PDAs) and the applications running on these devices, enabling services at any time and any place, without constraints on quantity or frequency [2]. The aim of this paper is to propose a bio-swarming framework, SwarmingNet, for network service management. The biological platforms proposed by Suda [3] and Suzuki [4] place more emphasis on the evolutionary behaviors of agents: the status of network services (mutation, cloning, reproduction and replication) depends on
multi-agents' internal states. These platforms are rather complicated from the standpoint of rapid practical application. We distinguish our framework from theirs by applying the TeleSolon hierarchy concept to a hierarchical network management system, analogous to ecosystems in nature and the colony structure of ants. We apply stigmergic ant-foraging behaviors in the threshold-based self-organization algorithm, which is much easier to apply in practical embedded systems with industry. This paper is organized as follows: Section 2 specifies the design principles and our self-organized provisioning algorithm for the event-based autonomic management architecture, with Section 2.1 covering the definitions and theorem of the threshold-based self-organized algorithm and Section 2.2 the system-level architecture. The service configuration process of MMS is simulated as an illustrative application, and the simulation results are shown in Section 3. Finally, we conclude with performance comparisons and future work in Section 4.
2
Design Principles
The incorporation of the social-insect paradigm into autonomic service configuration is believed to be the right solution to meet these requirements. This can be achieved by modelling networks as a distributed aggregation of self-organized autonomous TSS solons, similar to a social insect colony (the network) consisting of a large number of individual insects (the solons).
2.1 Self-organized Service Provisioning and Algorithm
Future networks should not only provide basic connectivity but also intelligently and immediately enable on-demand services in pervasive computing environments, anywhere and anytime. Those services must be provisioned in a flexible and distributed way on a highly dynamic runtime infrastructure. Service deployment and management for devices in USON is thus extremely difficult, since a provisioning infrastructure must cope with a high level of heterogeneity and mobility while taking into account limited device resources. In this context of self-organization, we describe service provisioning as the ability to create, remove, reproduce and reconfigure instances of services at runtime. Moreover, the bio-mimetic agents running at particular network nodes (1) autonomically measure the local demand for network services, independently of other nodes; (2) reconfigure/reproduce local services when demand is detected; and (3) remove services when demand disappears. Self-organization Algorithm — This section defines the algorithm that enables self-organized bio-networks. The algorithm also follows scalable design principles, such as adaptivity and robustness in a distributed environment, and localized
(Due to page limitations, details of the THSs and TSSs, and of this algorithm and its proof, can be found in the full version of this paper, available upon request.)
decision making based on neighborhood information. Hence, it is also a practical principle to consider only local customers who are not too far from the service center. The content-based event messaging system categorizes and stores the messages for different services in different spaces of an information store (e.g., a database). The following definitions together define the problem domain of our service provisioning process; our self-organization algorithm considers both time and space, as follows:

Definition 1. A dynamic threshold θ is configured for each requested service; a parameter η evaluates how keen customers are on certain services; a parameter d represents the Euclidean distance between an available server of a service and a customer; the localized service zone ω is designated as ω(d) ≤ 10% · D, where D is the diameter of the whole service area.

Definition 2. A parameter τ evaluates the intensity of the digital pheromone placed by previous bio-agents along the traces to servers in the dispersed area. The intensity of the digital pheromone measures how easily the service is available at the particular server the path leads to. Moreover, τ is related to the Euclidean distance d.

Lemma 1. (Time) When customer requests accumulate to the threshold θ, service provisioning starts. This threshold is a dynamic value varying with the nature of the specific service: some services should be activated as soon as there is any demand, while others are activated only when there are enough requests. (Space) When the customer and the service are both inside the localized service zone, the service can be provisioned.

Theorem 1. The self-organized service provisioning process is activated successfully iff the time and space conditions of Lemma 1 are satisfied simultaneously.
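Theorem 1's activation rule reduces to a simple conjunctive check. The sketch below uses the stated 10%-of-diameter zone; the function and parameter names are ours, not the paper's.

```python
def should_provision(num_requests, theta, d, D):
    """Theorem 1 sketch: provisioning triggers only when accumulated requests
    reach the dynamic threshold theta (time condition) AND the request lies
    inside the localized service zone, d <= 10% of the area diameter D
    (space condition)."""
    time_ok = num_requests >= theta
    space_ok = d <= 0.10 * D
    return time_ok and space_ok
```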
2.2 System-Level Architecture
The architecture we propose here is partly depicted in Figure 1 of our paper [5]; we omit the figure here due to page limitations. The system-level architecture illustrates the combinatorial links among the three inseparable parts of the autonomic service activation process: 1) users; 2) instrumentation support, measurement and monitoring; 3) an enhanced 4-layer TMF management model. Analogous to biological society, we introduce the concept of an ecosystem into the whole system, acting as the environment where agents are created, live and die. We designate energy exchange as the "currency" between ecosystem components (e.g., swarm agents) and the eco-environment.
3
Simulation and Experimental Measurement
We choose the Multimedia Messaging Service (MMS) as the evaluation application. Within the system-level architecture, the managed object in this context is the MM Box (Multimedia Messaging Box); the product components are 1)
Gold MM Box (capacity = 1000 MB), 2) Silver MM Box (capacity = 100 MB), and 3) Bronze MM Box (capacity = 10 MB). The event messages contain on-demand service provisioning requests from clients; these messages include information about 1) the creation and deletion of users' MM accounts in a product, or 2) the migration of Multimedia Messaging (MM) accounts among the products (Gold, Silver and Bronze boxes). The service provisioning results indicate that the bio-inspired network management paradigm maintains SLA compliance as well as efficient transaction times. The digital pheromone evaluates the degree of difficulty in activating or migrating MM accounts stored in MM servers (a large digital-pheromone intensity means MM boxes are easier to migrate, from silver to gold, from bronze to silver, etc.). The effectiveness of the digital pheromone in the MMS server configuration process of the framework has been tested, and the service-configuration performance of the bio-agents and normal agents is analyzed. Java classes are built on the hybrid modelling platform AnyLogic®. The experiment scenario is summarized here: event messages with service requests from client PCs in our testbed trigger the service configuration process whenever the service requests approach a service threshold θ_ij, where i represents the service ID and j the client's ID. We argue this is an autonomic process rather than an automatic one because the θ_ij value changes according to the requested service profiles, and autonomic agents learn and decide the best threshold. Our adaptation strategies do not depend on a set of preconfigured rules as in an automatic system; on the contrary, autonomy is achieved by goal setting and suggestion, through learning and modification of the existing adaptation strategy.
The product ω of the digital-pheromone intensity and the customer keen index is used as an important index in an exponential formula (e.g., exp(ω)), which determines which MM account will be configured for activation on a certain MM server. The service lifetime is calculated as the product of these two factors. Moreover, network vendor agents go through a cache database for updated information, synchronized with our 4-layer structure, which covers the specification files for products and services and all the configuration files for resources (e.g., devices, equipment, etc.). Specifically, taking into account that service requests are usually provisioned by local servers, the factor d, a Euclidean distance, measures the virtual distance between a service request and an MM server in the coordinate plane. If multiple servers meet the requirement simultaneously, we randomly pick one of them. Figure 1 describes the overall simulation configuration in detail, and Figure 2 shows the parameters of the three experimental scenarios that test the service provisioning. As shown in Figure 2 for Experiment 3, the number of MM servers is decreased to 100 while the other parameters remain the same as above.
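The server-selection rule just described might be coded as below. This is a sketch under our own assumptions: pheromone intensity is taken as γ·d per the experiment setup, and the preference for the highest exp(ω) score is our reading (the text does not state which direction of the score is preferred); all names are illustrative.

```python
import math
import random

def pick_server(request_xy, servers, keen=0.5, gamma=0.6, max_d=30.0, rng=None):
    """Pick an MM server for a request: only servers inside the localized zone
    (Euclidean d <= max_d) are eligible; each is scored by exp(omega), where
    omega = pheromone_intensity * keen and pheromone intensity = gamma * d.
    Ties among equally scored servers are broken at random."""
    rng = rng or random.Random(0)
    x, y = request_xy
    scored = []
    for sx, sy in servers:
        d = math.hypot(sx - x, sy - y)
        if d <= max_d:                       # localized service zone
            scored.append((math.exp(gamma * d * keen), (sx, sy)))
    if not scored:
        return None
    best = max(s for s, _ in scored)
    return rng.choice([srv for s, srv in scored if s == best])
```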
3.1 Experimental Results
Based on the three experimental test scenarios described in the previous subsection, Figure 3 presents the performance comparison between bio-agents and normal multi-agents without biological behaviors.
Topology: 1000 virtual MM servers are uniformly distributed over an area [0, 280].

Event Parameters: Requests for on-demand service are randomly generated by clients with a fixed seed = 1 over a time interval of 20 days. Service lifetimes are not permanent: a service is removed whenever there is no longer any need for it or the customer wishes to terminate it (we set maximum_service_lifetime = 1 day).

Space/Time Dynamics: The transaction time for each service request changes dynamically according to A = Customer_Keen_Index and B = Digital_Pheromone_Intensity. To simplify the simulation, we set A = 0.5 and B = γ × d (γ = 0.5 or 0.6), where d = √((x − x1)² + (y − y1)²) is the Euclidean distance between a particular service request and a particular MM server. The maximum duration of any service provisioning is λ = 0.4 days. Service configuration happens on servers close to customers, as in the real world: we only provision a service when d ≤ 30.

Fig. 1. Experiment Description

Test     Num_of_MM_Servers  Initial_Value_Provisioned_Servers  Digital_Pheromone_Intensity  Customer_Service_Keen_Index
Test 1   1000               0                                  0                            0.5
Test 2   1000               0                                  0.6                          0.5
Test 3   100                0                                  0.6                          0.5

Fig. 2. Parameters for Experimental Tests 1, 2 and 3
[Fig. 3: line chart "Bio-inspired Service Provisioning Scheme"; x-axis "Time in days", y-axis "Number of Current Provisioned Servers for Service Requests", with curves for the biological agents scheme and the normal agents scheme.]
Fig. 3. Performance Comparison for Number of Servers Configured between the Biologically Inspired Agents Scheme and the Normal Agents Scheme

[Fig. 4: chart; y-axis "Service Provisioned for Service Requests (%)", grouped by low and high client node density, for the biological agents scheme and the normal agents scheme.]
Fig. 4. Service Configuring Percentage in a Heterogeneous and Dynamic Network Environment with Different Client Densities
As we can see from Figure 3, the number of servers engaged in the service provisioning process is larger for the bio-inspired agent framework when the same quantity of service requests (11,692) arrives: service configuration tasks (the workload) are distributed uniformly across servers with a shorter response time, and load balancing is also improved by the bio-inspired framework. We calculate the percentage of configured services out of the total number of services finally provisioned at low client-node density (100 in the fixed area) and at high client-node density (1000 in the fixed area), respectively. In the low-client-request environment, the service-configured percentage is 72%, higher than in the high-client-request environment. Moreover, our biological agents scheme achieves better service-configured percentages than the normal multi-agent scheme in the same environment. Details are illustrated in Figure 4.
4
Conclusion and Future Work
Firstly, we conclude that our bio-inspired multi-agent framework provides a promising basis for future autonomic service management systems. The framework outperforms the current normal multi-agent based system in terms of service discovery and service assurance for future IP networks. Secondly, the framework does not rely on particular types of insect societies or colonies; agents could be entities in USON ranging from hardware devices to robotic agents or biologically inspired software elements. Finally, our future work will focus on the performance comparison among Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), and Genetic Algorithms (GA) with regard to efficient service configuration on the basis of this framework.
References
1. Yamazaki, K.: Research directions for ubiquitous services. In: Proceedings of the International Symposium on Applications and the Internet, p. 12 (2004)
2. Suzuki, K.: Ubiquitous services and networking: monitoring the real world. In: Proceedings of ISAI, p. 11 (2004)
3. Nakano, T., Suda, T.: Self-organizing network services with evolutionary adaptation. IEEE Transactions on Neural Networks 16(5), 1269–1278 (2005)
4. Makinae, N., Suzuki, H., Minamoto, S.: Communication platform for service operations systems in advanced intelligent network. In: IEEE International Conference on Communications, vol. 2, pp. 872–877 (1997)
5. Chiang, F., Braun, R., Magrath, S., Markovits, S.: Autonomic service configuration in telecommunication MASs with extended role-based Gaia and Jadex. In: Proceedings of the IEEE International Conference on Service Systems and Service Management, vol. 2, pp. 1319–1324 (2005)
Author Index

Ardon, Sébastien 145
Baldi, Mario 28
Banfield, Mark 83
Bhatti, S. 236
Bless, Roland 121
Bonnet, Christian 224
Boschi, Elisa 1
Bossardt, Matthias 1
Braun, Robin 194, 253
Chaudier, Martine 96
Chiang, Frank 253
Choi, Eunmi 212
Chrysoulas, Christos 206
Dedinski, I. 13
De Meer, H. 13
Denazis, Spyros 108, 206
Dübendorfer, Thomas 1
Eyal, Amir 194
Farkas, Károly 53
Filali, Fethi 224
Fritsche, W. 236
Gamer, Thomas 121
Gelas, Jean-Patrick 96
Gessler, G. 236
Gómez, Ángel M. 200
Haas, Robert 108, 206
Haleplidis, Evangelos 108, 206
Han, L. 13
Han, Sang Man 218
Härri, Jérôme 224
Huang, Xinli 247
Hug, Hanspeter 53
Hutchison, David 83, 132
Jameel, Hassan 218
Kalim, Umar 218
Kim, Myung-Kyun 182
Kirstein, P. 236
Kloul, Leïla 168
Koufopavlou, Odysseas 108, 206
LakshmiPriya, T.K.S. 38
Leduc, Guy 156
Lee, Junggyum 212
Lee, Myung-Sub 230
Lee, Sungyoung 218
Lee, Young-Koo 218
Lefèvre, Laurent 96
Leopold, Helmut 83
Levitt, Karl 65
Lockwood, John 188
Lopez-Soler, Juan M. 200
Ma, Fanyuan 247
Martin, Sylvain 156
Mathy, L. 13
Mauthe, A. 132
Mayer, K. 236
Min, Dugki 212
Mokhtari, Amdjed 168
Park, Chang-Hyeon 230
Parthasarathi, Ranjani 38
Pezaros, D.P. 13
Phu, Phung Huu 182
Plattner, Bernhard 53
Portmann, Marius 145
Ramos-Muñoz, Juan J. 200
Risso, Fulvio 28
Ruf, Lukas 53
Sacks, L. 236
Sajjad, Ali 218
Schöller, Marcus 121
Sellappan, H.K. 236
Sénac, Patrick 145
Sifalakis, M. 132
Smith, Paul 83
Sproull, Todd 188
Sterbenz, James P.G. 83
Sventek, J.S. 13
Tylutki, Marcus 65
Xie, Linlin 83
Yi, Myeongjae 182
Zachariadis, S. 236
Zhan, X.Y. 13
Zhang, Wenju 247
Zitterbart, Martina 121