Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis and J. van Leeuwen
Berlin Heidelberg New York Barcelona Hong Kong London Milan Paris Singapore Tokyo
Series Editors
Gerhard Goos, Karlsruhe University, Germany
Juris Hartmanis, Cornell University, NY, USA
Jan van Leeuwen, Utrecht University, The Netherlands
Volume Editor
Stefan Covaci
GMD FOKUS
German National Research Center for Information Technology
Research Institute for Open Communications Systems
Kaiserin-Augusta-Allee 31, D-10589 Berlin, Germany
E-mail: [email protected]

Cataloging-in-Publication data applied for
Die Deutsche Bibliothek - CIP-Einheitsaufnahme
Active networks : first international workshop ; proceedings / IWAN '99, Berlin, Germany, June 30 - July 2, 1999. Stefan Covaci (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Hong Kong ; London ; Milan ; Paris ; Singapore ; Tokyo : Springer, 1999
(Lecture notes in computer science ; Vol. 1653)
ISBN 3-540-66238-3
CR Subject Classification (1998): C.2, K.6, K.4, K.5, D.2
ISSN 0302-9743
ISBN 3-540-66238-3 Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 1999
Printed in Germany
Typesetting: Camera-ready by author
SPIN: 10703977 06/3142 - 5 4 3 2 1 0
Printed on acid-free paper
The First International Working Conference on Active Networks — IWAN ’99

The network model is changing from the traditional “store-and-forward” model towards a “store-compute-forward” one. We are moving from a passive network, where the level of abstraction is the protocol (service model), towards an active network, where the level of abstraction is raised to APIs (programming model) for programming the new network resources (communication, processing, and storage). The network becomes a distributed computer capable of routing at gigabit and terabit per second speeds and hosting several programming environments. This is going to change the whole network service and application design paradigm, enabling a new generation of network-aware software following the model of “compute while travelling.” For example, packets will carry their custom network service, which will be computed in the active nodes. The implications of the active network infrastructure go beyond the technical issues and also address the way business will be created and managed in the future networked environment. Service creation, deployment, and operations will no longer be the sole business of the owners and manufacturers of the infrastructure, but will become a customized business shared with the users of this infrastructure. In this sense, Active Network Operators will adopt a new outsourcing-based business model in order to host an increased number of network-aware applications and also to manage an application-aware network. The 1st International Working Conference on Active Networks – IWAN ’99 has set itself the goal of pulling together the main streams of activity in the area of Active Networks in order to strengthen synergies and create a common view of the domain and its most difficult problems.
Although no widely agreed taxonomy exists to this end, the contributions have demonstrated that a general architecture will include several dimensions, such as active code distribution, active code execution (EE - execution environment, programming language, safety), active code communication (cooperation), active node resource control (NodeOS), and active network security. Each of these subarchitectures is component-based and includes a management component that could offer an API to be used by applications. Important aspects like autonomy (degree of self-management) and asynchronicity are starting to be addressed in the context of scalability and fault tolerance, and technologies such as mobile agents and CORBA are the first to provide the right support. Another important aspect is the integration of active networks with legacy networks, as well as interoperability at the level of active networks. Issues related to active network architecture are well represented in the program, and this is reflected in 9 papers in this volume. The challenge in implementing such architectures is to find the right balance between flexibility, usability, security, robustness, and performance. Solutions to this problem are assessed on a number of prototype platforms, and results are presented in 8 papers in this volume. As one of the main advantages of active networks is the rapid introduction of customized network services and applications, we continue to see a growing number of examples, primarily related to the Internet. The next generation of the Internet will provide its users with dynamically managed QoS, multicast and security
and could rely on active networking as an ideal infrastructure for such implementations. Most of the papers about applications (14) relate to infrastructure-owner applications (management, control), but some also address the class of customer-owned applications. The papers also give a clear indication that service and application creation paradigms, and supporting methods and tools, will come into focus once the field matures. Proof that active networks are gaining momentum is apparent in the large number and high quality of papers submitted to this first international working conference: some 80 submissions, of which the best 30 have been accepted and published in this volume. This book provides a unique state-of-the-art account of architectural approaches, technologies, and prototype systems that will impact the way future networked businesses are created and managed. It is unique not only in that it reflects all relevant achievements to date from every continent, but also because, via its cooperative preparation, it has led to a truly Active Network of authoritative persons in the field. I hope you will benefit from reading it.

June 1999
Stefan Covaci
IWAN ’99 Program Chair
Acknowledgements

This volume resulted from the papers accepted for presentation at the 1st International Working Conference on Active Networks – IWAN ’99. First, we would like to thank the authors for providing their material, and we would also like to acknowledge the effort of the many other authors whose submitted papers were not accepted. We would like to thank the members of the Technical Program Committee, listed below, for their quality reviews and useful suggestions regarding the technical contents and organization of the book. We are also indebted to our own reviewers, namely Hui Guo, Bharat Bhushan, and Ascan Morlang. Special thanks go to Dr. Eckhard Moeller and Ms. Cynthia Hardey for their continuous help in coordinating the review process and for their support in preparing and organizing this book. Our general chairs, Professor Dr. h. c. Radu Popescu-Zeletin and Mr. Masanori Kataoka, and our sponsors, Hitachi Ltd., European Commission - ACTS Programme, OMG, IBM Zurich Research Lab, Deutsche Telekom Technologiezentrum, and IKV++, were a continuous source of help and encouragement.

General Chair
R. Popescu-Zeletin, Technical University of Berlin, Germany

General Co-Chair
M. Kataoka, Hitachi Ltd, Japan

Technical Program Committee
Chair: S. Covaci, GMD FOKUS, Germany
Vice Chair: A. Lazar, Columbia University, USA
Vice Chair: H. Yasuda, The University of Tokyo, Japan
M. Bonatti, ITALTEL, Italy
I. Busse, GMD FOKUS, Germany
M. Campolargo, European Commission, Belgium
H. Dibold, Deutsche Telekom, Germany
P. Dini, CRIM, Canada
A. Gavras, Sprint, USA
M. Gien, Sun Microsystems, France
G. Le Lann, INRIA, France
T. Magedanz, IKV++, Germany
W. Marcus, Bellcore, USA
I. Marshall, British Telecom, United Kingdom
G. Minden, The University of Kansas, USA
H. Miyahara, Osaka University, Japan
E. Moeller, GMD FOKUS, Germany
K. Nakane, Hitachi Ltd, Japan
F. Schulz, Deutsche Telekom, Germany
J. Smith, University of Pennsylvania, USA
R. Soley, OMG, USA
L. Svobodova, IBM, Switzerland
A. Tantawi, IBM, USA
S. Weinstein, NEC, USA
A. Wolisz, Technical University of Berlin, Germany

Steering Committee
A. Casaca, INESC, Portugal
E. Raubold, Deutsche Telekom, Germany
O. Spaniol, RWTH, Germany
D. Tennenhouse, MIT, USA
Organizing Committee
Local Arrangements: C. Hardey
Publicity: B. Intelmann
IWAN ’99 - Message from the Conference General Chairs

Internet technologies have begun to affect our lives and work. We are still talking about network protocols, reservation mechanisms, performance, and applications. It is time, based on the experience we have and the lessons we are learning from the large-scale deployment of the Internet, to consider new research and development directions for the future. One of the most attractive areas is that of ACTIVE NETWORKS. Welcome to the first International Working Conference on Active Networks, IWAN ’99, in Berlin, a city which, like this new domain, is in the process of developing its structure and image for the future. We hope that this first workshop will become a forum for the exchange of ideas and results in this domain at an international level. A network of R&D activities in Active Networks can already be identified worldwide, and most of them are presented at this workshop. The positive answers we received during the organization of the workshop from authors, industry, and research groups from all over the world guarantee that this event will continue and grow in the future. We are sure that you will find the program stimulating and that you will take advantage of the opportunity to meet your colleagues from around the world. The success of the conference depends on the dedication and contribution of many volunteers: committee members, authors, reviewers, invited speakers, and sponsors. Our personal thanks go to all of them and to you who, we are confident, will make IWAN ’99 a success.

Prof. Dr. Dr. h.c. Radu Popescu-Zeletin
Technical University of Berlin, Germany
Masanori Kataoka Hitachi Ltd, Japan
Table of Contents

Architectures

The Architecture of ALIEN
  D. Scott Alexander, Jonathan M. Smith ... 1

Designing Interfaces for Open Programmable Routers
  Spyros Denazis, Kazuho Miki, John Vicente, Andrew Campbell ... 13

RCANE: A Resource Controlled Framework for Active Network Services
  Paul Menage ... 25

The Protean Programmable Network Architecture: Design and Initial Experience
  Raghupathy Sivakumar, Narayanan Venkitaraman, Vaduvur Bharghavan ... 37

A Dynamic Pricing Framework to Support a Scalable, Usage-Based Charging Model for Packet-Switched Networks
  Mike Rizzo, Bob Briscoe, Jérôme Tassel, Konstantinos Damianakis ... 48

Active Information Networks and XML
  Ian Marshall, Mike Fry, Luis Velasco, Atanu Ghosh ... 60

Policy Specification for Programmable Networks
  Morris Sloman, Emil Lupu ... 73

A Self-Configuring Data Caching Architecture Based on Active Networking Techniques
  Gaëtan Vanet, Yoshiaki Kiriha ... 85

Interference and Communications among Active Network Applications
  Luca Delgrossi, Giuseppe Di Fatta, Domenico Ferrari, Giuseppe Lo Re ... 97

Platforms

The Grasshopper Mobile Agent Platform Enabling Short-Term Active Broadband Intelligent Network Implementation
  C. Bäumer, T. Magedanz ... 109

LARA: A Prototype System for Supporting High Performance Active Networking
  R. Cardoe, J. Finney, A.C. Scott, W.D. Shepherd ... 117

A Programming Interface for Supporting IP Traffic Processing
  Ariel Cohen, Sampath Rangarajan ... 132

New Generation of Control Planes in Emerging Data Networks
  Nelu Mihai, George Vanecek ... 144

An Active Networks Overlay Network (ANON)
  Christian Tschudin ... 156

Autonomy and Decentralization in Active Networks: A Case Study for Mobile Agents
  Ingo Busse, Stefan Covaci, André Leichsenring ... 165

Towards Active Hardware
  David C. Lee, Mark T. Jones, Scott F. Midkiff, Peter M. Athanas ... 180

The Impact of AN on Established Network Operators
  Arto Juhola, Ian Marshall, Stefan Covaci, Thomas Velte, Seppo Parkkila, Mike Donohoe ... 188

Active Management and Control

Using Active Processes as the Basis for an Integrated Distributed Network Management Architecture
  Dominic P. A. Greenwood, Damianos Gavalas ... 199

ANMAC: An Architectural Framework for Network Management and Control Using Active Networks
  Samphel Norden, Kenneth F. Wong ... 212

An Active Network Approach to Efficient Network Management
  Danny Raz, Yuval Shavitt ... 220

Virtual Networks for Customizable Traffic Treatments
  Jens-Peter Redlich, Masa Suzuki, Steve Weinstein ... 232

Flexible Network Management Using Active Network Framework
  Kiminori Sugauchi, Satoshi Miyazaki, Kenichi Yoshida, Keiichi Nakane, Stefan Covaci, Tianning Zhang ... 241

Managing Spawned Virtual Networks
  Andrew T. Campbell, John Vicente, Daniel A. Villela ... 249

Active Organisations for Routing
  Steven Willmott, Boi Faltings ... 262

A Dynamic Interdomain Communication Path Setup in Active Network
  Jyh-haw Yeh, Randy Chow, Richard Newman ... 274

Active Network Challenges to TMN
  Bharat Bhushan, Jane Hall ... 285

Survivability of Active Networking Services
  Amit Kulkarni, Gary Minden, Victor Frost, Joseph Evans ... 299

Security

A Secure PLAN
  Michael Hicks, Angelos D. Keromytis ... 307

Control on Demand
  Gísli Hjálmtýsson, Samrat Bhattacharjee ... 315

Agent Based Security for the Active Network Infrastructure
  Stamatis Karnouskos, Ingo Busse, Stefan Covaci ... 330

Author Index ... 345
The Architecture of ALIEN

D. Scott Alexander (1) and Jonathan M. Smith (2)

(1) Bell Labs, Lucent Technologies, Murray Hill, NJ 07974, USA
    [email protected]
(2) University of Pennsylvania, Department of CIS, Philadelphia, PA 19104
    [email protected]
Abstract. The alien architecture exposes all node-resident features to modification by a module loader, with the exception of the loader itself. As a structuring principle, alien divides its loadable portions into a privileged loader-initiated Core Switchlet and an unprivileged collection of libraries which use the Core Switchlet and are loaded by it. The loader, Core Switchlet and libraries comprise the network-resident functionality of alien. We make three claims. First, by dint of a library for Active Packets written in Caml, alien is the first system to support both Active Packets and Active Extensions. Second, by use of language features such as module thinning, alien can provide multiuser security within a single address space. Third, by isolating only a small set of functions with privilege, the system achieves security, flexibility and good performance, with a measured throughput of about 60 Mbps when used for LAN bridging.
1 Introduction
There are a variety of conceptual models or “visions” of what “Active Networking” means and what it could be. These are centered around the programming model. We believe that the distinction between the “programmable switch” (active extension) and “capsule” (active packet) models outlined in [1] is a distraction rather than a central question. The real question is what the programmer of the active extension or active packet can expect from the active network. This question resembles the debate and exploration of what should be resident in an operating system. One model from early computing history was that of the “I/O Control System,” loaded with the compiler in the card deck preceding the program itself. The active network analogy would be active packets which carry everything they need to be “activated” along with them, in some form recognized by the universal “computing machine.” The other historical model was the operating system — a universally available, but privileged “kernel” of resource management services enhancing the raw machine, coupled with standard compilers and run time environments. The active network analogy in this case would be the set of services available in network elements as well as any
This work was supported by DARPA under Contract #N66001-96-C-852.
Stefan Covaci (Ed.): IWAN’99, LNCS 1653, pp. 1–13, 1999. c Springer-Verlag Berlin Heidelberg 1999
structuring principles used to organize them. The tension between the two historical approaches continues, as there has been a proliferation of micro-, nano-, and exo-kernels which place less and less behind a privilege boundary to provide greater flexibility to unprivileged applications. The analogy in active networking is the tension between the programmer’s freedom, which when maximized offers great flexibility, and the services available at a network element, such as concurrent multiprocessing, which add functionality in exchange for reducing certain freedoms. We believe that we have made significant progress towards showing what can be done to balance concerns of flexibility, usability, security and performance in an active network element. Sect. 2 discusses the design of the alien architecture for active networking. Sect. 3 suggests reasons why many security and performance issues can be addressed by an appropriate programming language. Sections 4, 5 and 6 detail the three layers of the alien architecture, and Sect. 7 explains how choices are made in locating functionality. Sect. 8 briefly addresses performance issues stemming from the use of byte-coded languages. Sect. 9 makes our point that the active extensions vs. active packets distinction is of secondary concern by describing implementations of each within alien, and Sect. 10 summarizes our contributions and outlines challenges for the future.
2 The Design of alien
In designing alien, our goal was a system which would allow experimentation and prototyping to test Active Networks ideas. In particular, we wanted to be able to implement experiments built on either active extensions or active packets. At the same time, we felt that it was important that alien have sufficient performance to allow a realistic understanding of the compromises of various designs. An important element of our design is the choice of a computing model. We have chosen to use a full Turing machine model for computation by picking a Turing-equivalent language. We then provide the ability to control which shared resources are available to which active program. This allows us to build security into the system. One element of the alien design intended to ensure performance was our choice to use a single address space for alien. This has security implications that we will discuss below, but it allows processes to communicate very quickly using a shared memory model. Additionally, we did not feel that it was realistic to expect a hardware router to contain a memory management unit. This affected our choice of language, as we will describe in Sect. 3. Another goal was to make the system flexible. This means that it should be possible to change the system at runtime to the greatest reasonable extent. To achieve this, we used a layered design with only a small kernel of unloadable functionality. Similarly, for security reasons, we attempted to minimize the size of the trusted code. The combination of these considerations leads to a three-layer design, as shown in Fig. 1.
Fig. 1. alien layers

  unprivileged, loadable:    libraries
  privileged, loadable:      Core Switchlet
  privileged, non-loadable:  Loader, runtime (Caml), OS (Linux)
The lowest layer is the alien loader which is invoked to start the system. This is described in Sect. 4. The loader loads in the Core Switchlet as described in Sect. 5. Both of these elements are privileged. Finally, the Core Switchlet loads in various libraries (Sect. 6) which are unprivileged, but provide “expected” services.
3 The Choice of a Language
The tradeoffs between security and performance are critical in the choice of a language. If one were working in a completely trusted environment and failures due to programming errors were not a concern, any modern programming language that was capable of dynamically loading programs would be sufficient. Our need for security in an environment where resources must be shared by competing interests leads to the need for more restrictions on our choice of programming language. When combined with our desire to run in a single address space, we exclude any language that allows direct manipulation of pointers. Additionally, because different principals may have different access to the system, we need to be able to control the view of the system granted to each principal. This led us to identify six characteristics that we feel are useful to build alien:

1. strong typing,
2. garbage collection,
3. module thinning,
4. dynamic loading,
5. homogeneous representation of active programs, and
6. performance.
3.1 Primary Considerations
Strong typing and garbage collection were chosen to aid security. Strong typing ensures that any readable memory location has a type and that memory locations can only be accessed by functions that operate on the appropriate type. Moreover, conversion functions are carefully regulated to ensure that these properties hold. For further discussion of why we chose strong typing, see [2]. Garbage collection supports strong typing. If an active program can explicitly deallocate memory using a function like free(), the chunk of memory freed loses its type, but the user can still access it. This becomes most important when the memory is reallocated. If the memory is allocated to a different active program, it is now possible for the first program to read (and possibly modify) the second’s data. Module thinning is a technique which allows us to maintain multiple interfaces for a single module. Thus, we might have a module which allows access to the filesystem. One interface would allow access to the entire set of functions normally offered by the filesystem. A second interface would offer access to the functions which allow one to read data, but thin out the functions which would allow modification of the filesystem. A third interface might allow access only to functions which would interpret and return information from the files in the filesystem without providing access to the files themselves. These three interfaces would be appropriate, for example, to a node administrator, to a program gathering status information, and to an application. Dynamic loading is clearly necessary. Our definition of active networking requires that it be possible to load functions while the system is running; dynamic loading is a name for this ability. While it would be possible to represent active programs heterogeneously [3], it adds considerable complexity to the system.
For this reason, we have chosen to require that programs be represented in the same form regardless of the hardware present at any node that the program transits. In particular, this means that we will be transmitting an intermediate representation of the programs and that this will need to be translated or interpreted to be executed. Finally, performance is a concern. If alien ran too slowly, we could not draw useful conclusions about active networking systems. At the same time, the experimental nature of alien means that performance is one concern among many.

3.2 Secondary Considerations
In addition to the features in the previous section, we found threads and static typing to be useful characteristics that were available to us. Having a thread system allows us to structure the system more naturally. Each active program in alien consists of at least one thread. The thread scheduler is allowed to mediate access to the CPU. If a thread system were not present in the language we chose, we suspect that we would have ended up implementing one of our own. Static typing allows all types to be determined at compile time. Thus, the only type-related checks which occur at runtime are array bounds checks. Additionally, certain errors are caught at compile time instead of runtime. Since an
active network can be a complex distributed system, this can be an aid to debugging. There can be a tradeoff in a system like alien, though. Types must be checked at link time and when an object is unmarshaled. With dynamic loading, these are essentially runtime checks. The potential advantage is that these types are checked only once, regardless of frequency of use. The potential disadvantage is that if a module is sent with unused functions, or a data object is sent with unused values, dynamic typing would never have checked those types.
3.3 The Caml Programming Language
In our implementation of the alien architecture, we chose to use an existing language, Objective Caml [4], as it implements all of the properties identified, to varying degrees. We will discuss it and some of the other choices that we considered and discarded in the following sections. Objective Caml is a language from the ML [5] family of languages. It is a strongly typed, garbage collected, functional language. The compilers provided can produce byte code for a wide variety of Unix variants as well as for Microsoft Windows 95. Additionally, native code compilers are provided for the Digital Alpha [6] and Intel x86 [7], amongst others. Both types of compilers use static type checking. Caml also has a dynamic loader which allows byte code to be loaded into a running byte code program. Since the byte code is machine independent, our active programs are composed of byte code files which may be shipped around the network without regard for the underlying machine architecture. Additionally, module thinning is provided. Currently, we use the ability to define one unrestricted interface for the internal components of our system and a second, restricted, interface for active programs that are loaded. This ability is discussed further in Sect. 5. Caml performs a set of checks when a byte code file is loaded. In particular, it checks that each interface required by the new byte code file matches an interface provided by the running system. To facilitate this check, the compiler stores an MD5 hash [8,9] of each interface that it compiles or compiles against. At link time, these hashes are compared.¹ Because Caml is not designed as a network language, per se, the dynamic loader is designed to load byte code files from disk. This was an area in which an extension of the Caml library was required to make Caml suitable for our purposes. (See Sect. 9 for more details.)
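As a rough sketch of this loading path, OCaml's standard Dynlink library loads a byte-code file into the running program, performing the interface (MD5 hash) checks at link time. The file name and wrapper function below are hypothetical illustrations, not alien's actual interface; in alien the byte code would arrive over the network rather than from local disk.

```ocaml
(* Sketch: dynamic loading of byte code with OCaml's Dynlink library.
   The interface checks described above happen inside loadfile; a
   hash mismatch is reported as a Dynlink.Error. *)
let load_active_program (file : string) : unit =
  try
    Dynlink.loadfile file           (* link-time MD5 interface checks *)
  with Dynlink.Error e ->
    (* e.g. an interface mismatch between the module and the node *)
    prerr_endline (Dynlink.error_message e)

let () = load_active_program "active_program.cmo"  (* hypothetical file *)
```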
3.4 Other Possible Languages
We considered other possibilities before choosing Caml. There are other languages which meet the requirements listed, and alien could have been implemented in most if not all of them. Nonetheless, there were some factors which inclined us toward Caml and away from some other languages. Java [11] has been a popular language for implementing active networking systems. While Java meets our requirements, meeting our security requirements would mean using the SecurityManager to implement a scheme similar to module thinning. This need, along with the need to use native methods to implement some of the network access we require (i.e., access to Ethernet frames), destroys the “write once” advantage that Java enjoys for applets. Thus, we could have implemented alien in Java, but such an implementation would not have had any clear technical advantage over Caml. Another approach would be to design a new language. This approach has been taken with PLAN [12] and Netscript [13]. The advantage of such an approach is that the language can be tailored to active networking. The difficulty is that if the designer leaves out a feature needed by a user, it can become difficult or impossible to implement desirable active programs in the new language. Nonetheless, it would have been possible to design a new language and use it to implement alien.

¹ Obviously, the hashes can be trivially forged to attack this system. Rouaix [10] suggests using a well-known, certified compiler which digitally signs its output. A verifier such as the one used in Java [11] would be another approach to this problem.
4 The alien Loader
The Loader provides the core of alien’s functionality. It provides the interface to the operating system (through the language runtime) plus some essential functions to allow system startup and loading of active programs. Thus, it defines the “view of the world” for the rest of alien. Moreover, since security involves interaction with either the external system or with other active programs, the Loader provides the basis of security. As its name implies, another role of the Loader is to load active programs into the system. Therefore, to simplify implementation of the architecture, we have made the Loader non-loadable. We also expect this to improve efficiency in some implementations. The other side of this decision, though, is that we attempt to make the Loader as small as possible because we would like to have components of alien replaceable (which means loadable) whenever practical. Also ameliorating this inflexibility of the Loader is the fact that it is often possible to overlay functionality in the Loader with a different interface or sometimes even a different mechanism at a higher level. The Loader provides mechanisms rather than policy; policies are implemented in the Core Switchlet and can be changed by changing it. In addition to the functionality provided by the language runtime, there are three areas in which added capabilities are needed, as shown in Table 1. The first of these areas is a set of startup functionality. This consists of performing any initializations needed by either the runtime or alien itself to bring alien to a stable state. The second area is active program loading. Dynamic loading of code is obviously crucial to an Active Network node. By placing this functionality in the Loader, we are able to make the Core Switchlet loadable. The third area is what we call the system console. This provides a way for the operator to provide
commands to the system. This allows maintenance and diagnostic operations to be performed before the network is fully available or in the event of network failure.
Table 1. Loader functionality

  startup routines         initialize system
  active program loading   load active programs consistent with alien security
  system console           console read loop
5 The Core Switchlet
Above the Loader is the Core Switchlet. It is responsible for providing the interface that active programs see. It relies upon the Loader for access to operating system resources and then layers additional mechanisms to add security, and often, utility. In providing an interface to active programs, it determines the security policies of the system. By including or excluding any function, it can determine what active programs can or cannot do. Additionally, it is loadable, so the administrator can change or upgrade its pieces as necessary. This can also allow changes in the security policy. The security policies of the Core Switchlet are enforced through a combination of module thinning and type safety. Type safety ensures that an active program can only access data or call functions that it can name. Module thinning assures that the system controls which data and functions the active program can name so that the security policy is enforced.
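The interplay of module thinning and type safety described above can be sketched in OCaml, the language alien is built in. The module and function names here are hypothetical illustrations, not alien's actual interfaces: a single implementation is exported under a full signature for privileged code and under a thinned signature that simply omits the mutating function, so an unprivileged active program cannot even name it.

```ocaml
(* Sketch of module thinning: one module, two interfaces of
   differing privilege. All names are hypothetical. *)

(* Full interface, as privileged code might see it. *)
module type FULL_FS = sig
  val read_file : string -> string
  val write_file : string -> string -> unit
end

(* Thinned interface: write_file is omitted, so code given only
   this view cannot refer to it at all. *)
module type READONLY_FS = sig
  val read_file : string -> string
end

(* A toy in-memory "filesystem" implementing the full interface. *)
module Fs : FULL_FS = struct
  let table : (string, string) Hashtbl.t = Hashtbl.create 16
  let read_file name = try Hashtbl.find table name with Not_found -> ""
  let write_file name data = Hashtbl.replace table name data
end

(* The same module, constrained to the thinned signature. *)
module ReadonlyFs : READONLY_FS = Fs

let () =
  Fs.write_file "config" "hello";
  assert (ReadonlyFs.read_file "config" = "hello")
  (* ReadonlyFs.write_file ... would be a compile-time error. *)
```

Type safety guarantees that an active program can only call what it can name; thinning controls what it can name, which together enforce the policy.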
5.1 The Facilities of the Core Switchlet
In many ways, the interface that the Core Switchlet presents to active programs and libraries is like the system call interface that a kernel presents to applications. Through the design of the interface, the system can control access to underlying resources. With a well-designed interface, the caller can combine the functions provided to get useful work done. Table 2 shows the functionality provided by the Core Switchlet.

Language Primitives, Operating System Access, and Thread Access

The language primitives category encompasses those functions that one expects to find in a programming language, such as +, boolean “and,” and simple type conversions. These functions are implemented in the runtime and thus are part of the Loader. However, the Core Switchlet is responsible for maintaining the policy regarding which functions are available to active programs.
8
D. Scott Alexander and Jonathan M. Smith
Table 2. Core Switchlet functionality

  language primitives       policy for access to the basic functions of the language
  operating system access   policy for access to the operating system calls
  network access            policy and mechanism for access to the network
  thread access             policy for access to thread primitives
  loading support           policy and mechanism to support loading of active programs
  message logging           policy and mechanism for adding messages to the log file
Thus, for example, open_in, which opens a file for reading, is among the language primitives made available by the Loader, but the Core Switchlet might omit it if there were a policy forbidding active programs access to the disk. Similarly, operating system access functions are those which allow access to a system call. Thread access functions allow access to operations such as the creation or deletion of a thread, to mutual exclusion, and to condition variables. Again, these are implemented in the runtime, but the interface seen by active programs is thinned by the Core Switchlet in accordance with the system policies.

Network Access. Because we are implementing a network node, access to the network is particularly important. Generally this consists of allowing active programs to discover information about the interfaces on the machine and the attached networks, to receive packets, and to send packets. One particularly important element of this task is the demultiplexing of incoming packets. The Core Switchlet must be able to determine whether zero, one, or more than one active program is interested in an arriving packet. If more than one active program is interested in the packet, policy should dictate which active program or programs receive a copy. Security is an important element of this decision: an active program should be able to be certain that it will get all packets that the policy entitles it to, and it should not be able to get any packets that the policy forbids it. Without such security, denial-of-service attacks and information stealing are quite easy.

Loading Support. Loading support includes the loading functionality from the Loader, with thinning appropriate to control which active programs may be loaded by whom. Additionally, the Core Switchlet adds mechanism for tracking which active programs have been loaded and which functions those active programs wish to make available to other active programs.
This mechanism is important because it gives active programs a way to make use of functions found on a switch. In conjunction with a uniform naming scheme [14], it becomes possible for active programs moving through the network to make use of facilities that are present without failing if the facilities are not provided.
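A minimal sketch of such a tracking mechanism, again in Python with hypothetical names (`FunctionTable` and its methods are ours, not alien's), might look like this:

```python
# Hypothetical sketch of the Core Switchlet's tracking of loaded active
# programs and the functions they export, so programs arriving later can
# look up and call facilities already present on the node.

class FunctionTable:
    def __init__(self):
        self._exports = {}   # program name -> {function name: callable}

    def register(self, program, name, fn):
        """Record a function that a loaded program wishes to export."""
        self._exports.setdefault(program, {})[name] = fn

    def lookup(self, program, name):
        """Return the exported function, or None if the facility is
        absent, letting mobile code degrade gracefully rather than
        fail outright."""
        return self._exports.get(program, {}).get(name)

table = FunctionTable()
table.register("ipmod", "checksum", lambda data: sum(data) % 256)
```

The `lookup`-returns-`None` convention mirrors the point in the text: a program moving through the network can probe for a facility and proceed without it rather than failing.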
The Architecture of alien
9
Message Logging. Message logging is a generalized facility which allows an active program to attempt to leave a message for human consumption. Because we expect policies limiting access to persistent storage to be common, we believe that it is important to provide such a facility. Our facility allows the active program to request that a string be logged. This simplicity is important because the facility may be implemented by appending the string to a file, by sending it to an output device such as a terminal, or by using a more powerful logging mechanism. A simple solution is easily mapped onto any of these means. Additionally, no guarantees are made about what will happen to the log message. If, for example, a policy exists limiting the number of messages per unit time produced by an active program, the Core Switchlet may silently discard messages after that limit has been reached.
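To make the policy concrete, here is a hedged Python sketch (all names hypothetical) in which a per-program message budget leads to silent discard, as the text allows:

```python
# Hypothetical sketch of the message-logging facility: an active program
# may request that a string be logged, but no delivery guarantee is made;
# here a per-program message budget causes silent discard once exceeded.

class Logger:
    def __init__(self, limit, sink):
        self.limit = limit      # max messages per program (per time unit)
        self.sink = sink        # e.g., file append, terminal, or syslog
        self.counts = {}

    def log(self, program, message):
        n = self.counts.get(program, 0)
        if n >= self.limit:
            return              # silently discarded, as the policy allows
        self.counts[program] = n + 1
        self.sink.append("%s: %s" % (program, message))

out = []
logger = Logger(limit=2, sink=out)
for i in range(5):
    logger.log("probe", "msg %d" % i)
```

Because the caller gets no feedback either way, the same interface works whether the sink is a file, a terminal, or a richer logging service.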
6 The Library
The library is a set of functions which provide useful routines that do not require privilege to run. The proper set of functions for the library is a continuing area of research. Some of the things that are in the library for the experiments we have performed include utility functions and implementations of IP [15] and UDP [16].
7 Locating Functionality
When expanding our implementation, it is not always obvious in which layer the new functionality belongs. In this section, we present the principles we use to make this determination. Our first principle is that if the functionality can be implemented in a library, it should be. Said another way, if the functions exposed by the Core Switchlet or available from other libraries provide the infrastructure needed to implement the new functionality, a library is warranted. If the new functionality relies on some element of the runtime not made available to unprivileged code, then either the Loader or the Core Switchlet must be expanded. Because these elements define the common, expected interface available at the switch, we attempt to keep them small to minimize the required resources. Thus, our second principle is that we prefer to break off the smallest reasonable portion of the new functionality (consistent with security) that can be implemented in the privileged parts of the system. The remainder becomes a library. In our experience this also aids generality, as the privileged portion is often useful to other libraries developed later. For example, to implement IP, we built a small module inside the Core Switchlet which reads Ethernet frames from the operating system. It also demultiplexes the frames based on the Ethernet type field to increase generality. The remainder of IP, which processes headers, could then be made a non-privileged library. Our third principle is that if this privileged functionality sets policy, it needs to go into the Core Switchlet. As discussed above, policy must be set in the Core Switchlet so that the loading mechanism can be used as needed to change
policy. Our final principle is that any functionality providing pure mechanism is placed in the Core Switchlet, unless it is needed before the Core Switchlet can be running.
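The IP example above can be sketched as follows; this is illustrative Python with hypothetical names, not the Caml module alien actually uses. The privileged portion only reads the Ethernet type field and hands off; IP header processing itself stays in an unprivileged library:

```python
# Hypothetical sketch of the IP split described above: a small privileged
# module demultiplexes Ethernet frames on the type field, while header
# processing lives in an unprivileged library registered as a handler.

ETHERTYPE_IP = 0x0800

handlers = {}            # privileged demux table: ethertype -> handler

def register_handler(ethertype, fn):
    handlers[ethertype] = fn

def demux(frame):
    """Privileged: read the type field and hand off the payload."""
    ethertype = int.from_bytes(frame[12:14], "big")
    payload = frame[14:]
    fn = handlers.get(ethertype)
    return fn(payload) if fn else None

# Unprivileged library: processes the IP header (sketched here as just
# extracting the version field from the first byte).
def ip_input(payload):
    return {"version": payload[0] >> 4}

register_handler(ETHERTYPE_IP, ip_input)
```

Demultiplexing on the type field, rather than on IP alone, is what makes the privileged piece reusable by later libraries (ARP, for instance) without further privileged code.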
8 Performance and Byte Code
While we have followed the principles outlined in Sect. 7 closely in our implementation, we did find that the performance penalty was too great in one instance. As we describe in [2], implementing SHA-1 (a hash algorithm) in Caml was too slow, so we resorted to a C implementation. We also resort to C to extend the runtime (e.g., to add access to Ethernet frames). Any C extension in Caml appears as part of the runtime and thus is part of the Loader.
9 Active Programs
Of course, the goal of the infrastructure is to be able to use code which is not known ahead of time or is not generally used. Thus, the success of the infrastructure is based on how well it handles an active program or group of interacting active programs sent by a user. The next two sections describe how each type of active program works in alien.

9.1 Active Extensions
Active extensions can be loaded either from the local disk or over the network via the TFTP [17] protocol. TFTP and the underlying UDP [16] and IP [15] services are all implemented as active programs. When an active extension is loaded, Caml first checks to ensure that the interfaces it requires are satisfied by the set of (thinned) interfaces that alien is willing to provide to this extension. Next, the interface exported by the module is added to the symbol table. The extension is also given the opportunity to register name-to-function mappings in a table; this allows a module in alien to make calls into extensions even if the callee was loaded after the caller. The Active Bridge [18] is an example of a system that we built using active extensions. It has demonstrated bridging throughput of 57 Mbps [2] when bridging two 100 Mbps Ethernets. Please see the references for further details.

9.2 Active Packets
For active packets to be processed in alien, a set of libraries must be loaded to receive and process the packets. Our active packets are ANEP [19] encapsulated (and currently UDP encapsulated for convenience) as shown in Fig. 2. Thus, the first of the libraries receives an ANEP packet and performs header processing including determination of the execution environment for the packet. Also, some authentication can occur at this level as described in [14].
Fig. 2. An Active Packet (fields: link layer header | ANEP header | code portion | data portion | function name)
Assuming the active packet is to be executed in the alien environment, the next library is responsible for marshaling and unmarshaling of active packets. In alien, each active packet contains a code portion, a data portion, and a function name. The code is dynamically loaded from memory as an active extension would be. It is responsible for using Func.register to register a function under the function name listed. That function is then invoked with an argument which is an encoded form of the code, data, and function name. We have used a linked/procedure-call model for communication with other active programs. When programming one of our active packets, we assume that alien plus some set of active extensions will be on the node. If this is not the case, an error will occur during linking and the packet will be (silently) dropped. (The issue of error handling is left for future work.) When the code is loaded, it is linked against those extensions and alien, and it makes use of resources via function calls. This implies that active packets have to trust the services they call to a substantial extent.
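The code/data/function-name model can be illustrated with a small Python sketch. Func.register is alien's Caml facility; `func_register`, `run_packet`, and the packet layout below are hypothetical stand-ins:

```python
# Hypothetical sketch of alien's active-packet model: each packet carries
# a code portion, a data portion, and a function name; "loading" the code
# registers the named function, which is then invoked with the packet.

registry = {}

def func_register(name, fn):       # stand-in for Caml's Func.register
    registry[name] = fn

def run_packet(packet):
    # Loading the code portion registers the entry function by name.
    exec(packet["code"], {"func_register": func_register})
    fn = registry.get(packet["fname"])
    if fn is None:
        return None                # linking failed: packet silently dropped
    return fn(packet)

packet = {
    "code": "func_register('double', "
            "lambda pkt: [x * 2 for x in pkt['data']])",
    "data": [1, 2, 3],
    "fname": "double",
}
```

Handing the invoked function the whole encoded packet, as alien does, lets the packet's code reach its own data portion without any further node-side convention.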
10 Conclusions
We have described the alien architecture for active networking. This architecture has made three contributions. First, it organizes the three interesting cases of crossing privilege with loadability: the privileged and immutable Loader, the privileged and loadable Core Switchlet, and the unprivileged and loadable libraries. Second, it shows how the use of a modern programming language (Caml), coupled with a set of design principles, can result in a usable active networking system which is flexible and fast while preserving security. Finally, using the active extension model as a basis, it has demonstrated active packet service as a library. This proof of concept makes alien the first active networking system to support both models, and makes the case that the distinction between the models is probably not important. The design principles we have proposed are generally useful, and can be applied in other active networking environments such as ANTS when issues such as concurrent multiprocessing and privilege are addressed in depth. The "take-home message" lies in what goes where in the organization, rather than in any details of the alien implementation. alien suggests two promising directions for exploration. First, while alien provides effective control, and thus security, for objects it manages, there are a variety of resources it does not manage. In particular, the heap managed by
the Caml runtime and the multiplexing of the system hardware managed by (in the current instantiation of alien) Linux expose a number of denial-of-service attacks that alien can do nothing about. These examples suggest that a promising area for the support of realistic systems will be schemes for providing better user isolation in the time domain, such as operating systems with support for Quality of Service. The second direction for exploration is global support for security properties which alien can enforce locally. Some results along these lines were presented in [14], which used the idea of granting cryptographic credentials for access to particular modules to extend module thinning to remotely executing code. Over the long term, however, the issue will become one of mapping security policies onto active network elements, and that will require thinking about scalable trust management in an active network.
References

1. D. L. Tennenhouse, J. M. Smith, W. D. Sincoskie, D. J. Wetherall, and G. J. Minden. A survey of active network research. IEEE Communications Magazine, January 1997.
2. D. Scott Alexander. alien: A Generalized Computing Model of Active Networks. PhD thesis, University of Pennsylvania, Philadelphia, December 1998.
3. F. C. Knabe. Language Support for Mobile Agents. PhD thesis, CMU, December 1995.
4. Xavier Leroy. The Caml Special Light System (Release 1.10). INRIA, France, November 1995.
5. R. Milner, M. Tofte, and R. Harper. The Definition of Standard ML. MIT Press.
6. Richard L. Sites and Richard T. Witek. Alpha AXP Architecture Reference Manual. Digital Press, 2nd edition, 1995.
7. Don Anderson and Tom Shanley. Pentium Processor System Architecture. Addison-Wesley, 2nd edition, 1995.
8. Ron Rivest. The MD5 message-digest algorithm. RFC 1321, April 1992.
9. Bruce Schneier. Applied Cryptography, pages 436-441. Wiley, 2nd edition, 1996.
10. F. Rouaix. A web navigator with applets in Caml. Fifth WWW Conference, May 1996.
11. Ken Arnold and James Gosling. The Java Programming Language. Java Series. Sun Microsystems, 1996. ISBN 0-201-63455-4.
12. Michael Hicks, Pankaj Kakkar, Jonathan T. Moore, Carl A. Gunter, and Scott Nettles. PLAN: A packet language for active networks. In Proceedings of the International Conference on Functional Programming (ICFP), September 1998.
13. Y. Yemini and S. daSilva. Towards programmable networks. In IFIP/IEEE International Workshop on Distributed Systems: Operations and Management, October 1996.
14. D. Scott Alexander, William A. Arbaugh, Angelos D. Keromytis, and Jonathan M. Smith. A secure active network architecture: Realization in SwitchWare. IEEE Network, 12(3):37-45, May/June 1998. Special issue on Active and Programmable Networks.
15. Jon Postel. Internet Protocol. RFC 791, 1981.
16. Jon Postel. User Datagram Protocol. RFC 768, 1980.
17. Karen R. Sollins. The TFTP protocol (revision 2). RFC 1350, 1992.
18. D. Scott Alexander, Marianne Shaw, Scott M. Nettles, and Jonathan M. Smith. Active bridging. In Proceedings of the 1997 ACM SIGCOMM Conference, September 1997.
19. D. Scott Alexander, Bob Braden, Carl A. Gunter, Alden W. Jackson, Angelos D. Keromytis, Gary J. Minden, and David Wetherall. Active network encapsulation protocol (ANEP). http://www.cis.upenn.edu/~angelos/ANEP.txt.gz, August 1997.
Designing Interfaces for Open Programmable Routers

Spyros Denazis 1, Kazuho Miki 2, John Vicente 3, and Andrew Campbell 4

1 Centre for Communications Systems Research (CCSR), University of Cambridge, UK; Industrial Research Fellow, Hitachi Europe Ltd., UK
[email protected]
2 Hitachi Ltd., Japan
3 Intel Corporation, USA
4 Center for Telecommunications Research (CTR), Columbia University, NY, USA
{miki,jvicente,campbell}@comet.columbia.edu

Abstract. The ability to rapidly create and deploy new network services and architectures in response to new user demands is a driving force behind the emergence of programmable networks. The goal of open network control is being addressed in the IEEE P1520 Working Group (http://www.ieee-pin.org/) through the definition of a set of open network programming interfaces. These interfaces would allow service providers to manipulate the state of the network through high-level languages and abstractions in order to construct and manage new network services with quality of service support. In this paper, we provide an overview of the IEEE P1520 reference model and a detailed framework for the development of low-level, open programmable interfaces for IP-based router and switch networks.

Keywords: Open Programmable Routers, Router Interfaces, Differentiated Services
1 Introduction
Over the past several years, we have witnessed a growing amount of work in the area of open programmable networks [1,2,14]. The aim of this work is the design of new network architectures which facilitate the rapid creation and deployment of new network services. Central to this goal has been the emergence of open interfaces, enabling control, management and composition of network resources through the introduction of new innovative services which cannot otherwise be realized with today's proprietary (i.e., closed) network technologies. The need for programmable networks is becoming more apparent as open programmability has become a central theme of a number of standardization efforts and consortia [3,4,5]. As technology for open programmable networks matures, well-designed interfaces become more important for flexible customization, operation and extensibility.

2 Kazuho Miki is a Visiting Scientist at CTR, Columbia University.
3 John Vicente is a Visiting Scientist at CTR, Columbia University.

Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 13-24, 1999. Springer-Verlag Berlin Heidelberg 1999

14
Spyros Denazis et al.

The development of open programmable interfaces has mainly been the result of academic projects, which have fairly specific objectives. In addition, open programmable interfaces have been constructed in an ad-hoc manner to support the introduction of services as a proof of concept for network programmability. As a result, a design framework for open interfaces in the context of programmable networks has not been addressed. Most existing proposals found in the literature have focused primarily on the control and management of ATM network elements, motivated by the proprietary limitations of existing switching technology [6,7,8]. Through the activities of the DARPA initiative for Active Networks [9], [3], there has been an attempt to enable programmability within router-based networks. In parallel to the Active Networks initiative, OPENSIG [10] has been investigating open signaling and programmable network architectures.

The open interfaces of a programmable network architecture are structured in a layered fashion whereby a higher interface relies on the services of the interface below it, while exposing its own services to the layer above it. Hence, an interface is characterized by its scope and the services it offers. For example, the scope of an interface may be such that it distinguishes between node and network-wide services that it can offer. In this paper, we define a set of router (node) interfaces for programmable router-based networks. We believe that in order to define a node interface, it is first important to describe a framework that assists in the design process and consequently the use and maintenance of the interface. The basic principles that underpin our proposed framework are driven by experiences with existing router technology.
The proposed model, terminology and interface definitions may serve related initiatives (e.g., active networks) by offering a generalized, yet well-structured, interface model in support of programmable network architectures. The contribution presented in this paper has been submitted to the IEEE P1520 for consideration. The paper is structured as follows. In Section 2, we present an overview of the IEEE P1520 reference model and terminology. Following this, in Section 3 we discuss a number of requirements for open router interfaces. Section 4 introduces our framework and its three basic components. Sections 5 and 6 present resource and service-specific abstractions, respectively. Section 7 presents an example scenario for the realization of the proposed interfaces in support of Differentiated Services [12,13,14]. Finally, we present some concluding remarks in Section 8.
2 P1520 Reference Model
The IEEE P1520 standardization effort addresses the need for a set of standard software interfaces for the programming of networks, in terms of rapid service creation and open signaling [5]. The technology under consideration spans ATM switches, IP routers, and circuit or hybrid switches. The interfaces are structured in a layered fashion, each offering its services to the layer above. Each layer defines what is termed a level. Each level comprises a number of entities in the form of algorithms or objects representing logical or physical resources, depending on the level's scope and functionality. This approach gives rise to the reference model depicted on the left of Fig. 1.
Fig. 1: The P1520 Reference Model and mapping to IP routers. (Left: the four levels of the P1520 Reference Model — the Value-added Services Level, with algorithms for value-added communication services created by network operators, users, and third parties; the Network Generic Services Level, with algorithms for routing, connection management, directory services, etc.; the Virtual Network Device Level, a software representation of the physical elements; and the PE Level of physical elements (hardware, name space) — separated by the V, U, L, and CCM interfaces. Right: the mapping of the P1520 RM to IP routers, with applications invoking methods on objects below; differentiated services scheduling, customised routing, RSVP or other per-flow protocols, and routing algorithms above the L-interface; the software representation of routing resources and routing table data below it; and controller hardware and other resources beneath the CCM interface.)
More specifically, we can distinguish the four levels as follows:

• The physical element (PE) level consists of entities such as hardware and the device architecture that actually reflects upon the supported capabilities.
• The virtual network device level (VNDL) logically represents resources in the form of objects (entities), isolating the upper layers from hardware dependencies or other proprietary interfaces.
• The network generic services level (NGSL) consists of entities in the form of distributed algorithms that bind (interconnect) the objects of the VNDL according to specific network functionality, e.g., routing or connection setup.
• Finally, the value-added services level (VASL) includes entities in the form of end-to-end algorithms that enhance the generic services of the NGSL, providing user-oriented features and capabilities in the applications.
The four levels give rise to four interfaces, namely the CCM (Connection Control and Management), L (lower), U (upper), and V (value-added) interfaces. The CCM interface is actually a collection of protocols that enable the exchange of state and control information at a very low level between the device and an external agent. The L-interface defines an API that consists of methods for manipulating local network resources abstracted as objects. The CCM and L-interfaces fall under the category of node interfaces. The U-interface mainly provides an API that deals with connection setup issues. As in the case of the L-interface, the U-interface isolates the diversity of connection setup requests from the actual algorithms that implement them. Finally, the V-interface (not shown in Fig. 1) provides a rich set of APIs to write highly customized software, often in the form of value-added services. Additionally, the U- and V-interfaces constitute network-wide interfaces. The P1520 Reference Model (RM) provides a general framework for mapping programming interfaces and operations of networks, over any given networking technology. Mapping diverse network technologies and their corresponding
functionality to the P1520 RM is essential. The right side of Fig. 1 illustrates a mapping of the P1520 RM to IP routers. Given this mapping, it is important to establish an L-interface definition that abstracts router resources and functionality such that it satisfies the service requirements above it, while it is flexible enough to accommodate future service requirements.
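To make the L-interface idea concrete, here is a hedged sketch (Python, with hypothetical names; the paper itself defines no such class) of a local router resource abstracted as an object whose methods upper-level algorithms invoke:

```python
# Hypothetical illustration of the L-interface idea: a local router
# resource (here a routing table) is abstracted as an object whose
# methods are the only way upper-level algorithms manipulate it.

class RoutingTableObject:
    """L-interface-style abstraction of the routing-table resource."""
    def __init__(self):
        self._routes = {}

    def add_route(self, prefix, next_hop):
        self._routes[prefix] = next_hop

    def delete_route(self, prefix):
        self._routes.pop(prefix, None)

    def lookup(self, prefix):
        return self._routes.get(prefix)

# An NGSL-level routing algorithm programs the node through the object,
# never through the device's proprietary interface directly.
table = RoutingTableObject()
table.add_route("10.0.0.0/8", "192.0.2.1")
```

The object boundary is what isolates upper levels from hardware dependencies: the same three methods could be backed by a kernel table, a line-card ASIC, or a CCM-level protocol exchange.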
3 L-Interface Requirements
The objective of the L-interface is to create programmable abstractions of the underlying resources of the router or switch, enabling third-party service providers, network administrators, network programmers or application developers to influence or extend router or switch control through the use of high-level APIs. The requirements of the L-interface design are driven from the perspective of the users of the L-interface. It is therefore relevant to detail and understand these as fundamental to the basis of the L-interface. We enumerate the following requirements:

1. Open programmability - This is an enabling requirement for the L-interface, where separation is achieved between hardware and software, fostering third-party service creation and open competition.
2. Operational support - Management and administrative functions must be improved or facilitated via the L-interface, where over slow time scales greater control and intelligence data gathering is achievable by network administrators and architects.
3. Service provisioning - Through open APIs, third-party service providers must have the ability to modify existing network services as well as the ability to introduce entirely new network services to the router or switch functionality.
4. Extensibility - The associated L-interface model must provide flexibility for extension without intrinsically creating a proprietary format, which would limit the extensibility and virtualization process.
5. Programmable abstraction granularity - Support for granularity in router object programmability and service provisioning is essential to flexible customization.
6. Timescale flexibility - Access and control of router resources through the L-interface must be achievable over different time scales (i.e., control, management and transport).
7. Resource partitioning - Through management and control plane operations, the L-interface should support partitioning of router resources, allowing network operators or service providers to deploy and operate their network architectures on the same physical infrastructure, confined to the allocated portion of the router(s) resources.

Finally, it is imperative that the L-interface requirements are not hindered or otherwise restricted at the abstracted resources level (VNDL) by limitations of lower-level protocols (e.g., GSMP) or proprietary design features of a router kernel or hardware. The basic tenet argues that not only should the concept of open network programmability be supported by lower-level protocols/interfaces, e.g. CCM, but also that new router architectures should be designed to support these requirements through the above interfaces. We view this as critical for the success and widespread acceptance of open programmable routers.
4 Generalized L-Interface Model
The framework for designing the L-interface, or more generally node interfaces, comprises three components. The first is a two-layer model representing the L-interface separation structure for abstracting router resources. The second is a hierarchical decomposition approach to representing router resources in the form of a tree-like structure with their corresponding inter-relationships. Finally, the third component describes how the interface definition structure of each abstracted resource, namely each node of the tree, should support control, transport, and management administrative operations on the router resources. In what follows, we further elaborate on each of these model components.
4.1 Two-Layer Abstractions
Fig. 2: Two-layer model for abstracting resources
The process of defining the L-interface through the abstraction of node resources is conceived as the two-layer model depicted in Fig. 2. At its core lies the process of abstracting resources that are considered generic, in the sense that they are not used in a service-specific context. At the outer circle lies the process of abstracting resources that only have meaning when they are used within a specific service context, e.g. Differentiated Services. To this end, we view router resources from two different perspectives, each corresponding to a particular layer of the abstraction model. The first viewpoint is to consider router resources as general-purpose facilities, the abstraction of which leads to general-purpose interfaces that may be used and combined simultaneously by different service domains. The second viewpoint is to identify resources with the functionality for which they are intended. In this context, resources of the latter kind represent a partition of the general-purpose resources eventually used for certain tasks. In addition, L-interfaces that result from the generic abstraction process will form the basis on which service-specific interfaces may be defined for a variety of purposes. These in turn may also become part of the L-interface definition. The advantage of the two-layer model is that it allows for true interface openness and resource reusability under different service contexts. Thus, we are suggesting that such an abstraction model allows, for example, upper level
interfaces (e.g., the U-interface) to create or program completely new network services using generic abstractions, or to modify existing services using service-specific abstractions, which are themselves built on generic abstractions. This makes the L-interface flexible rather than static, in the sense that as new services are conceived, they can form their interface representation in a seamless fashion.
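The two viewpoints can be sketched as two layers of classes. The following Python is illustrative only (`GenericQueue` and `EFQueue` are hypothetical names; DSCP 46 is the standard Expedited Forwarding code-point), showing a service-specific abstraction built on, rather than beside, a generic one:

```python
# Hypothetical sketch of the two-layer model: a generic queue abstraction
# is service-neutral; a DiffServ-specific abstraction (an EF per-hop-
# behavior queue) is defined on top of it rather than beside it.

class GenericQueue:
    """Inner layer: generic, service-neutral resource abstraction."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def enqueue(self, pkt):
        if len(self.items) < self.capacity:
            self.items.append(pkt)
            return True
        return False                 # tail drop

class EFQueue(GenericQueue):
    """Outer layer: service-specific abstraction reusing the generic
    one, admitting only packets marked with the EF code-point."""
    EF_CODEPOINT = 46                # standard DSCP value for EF

    def enqueue(self, pkt):
        if pkt.get("dscp") != self.EF_CODEPOINT:
            return False
        return super().enqueue(pkt)

q = EFQueue(capacity=2)
```

Because the DiffServ layer only refines the generic `enqueue`, the same `GenericQueue` remains usable, unchanged, by any other service domain on the node.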
4.2 Hierarchical Resource Decomposition

Abstracting resources requires knowledge of an IP router architecture and its corresponding physical resources, identification of those resources pertaining to our objective, followed by some form of classification. Consequently, this activity will result in an IP router reference model that may be used as a guiding model for resource abstraction.
Fig. 3: IP Router Architecture with example resource abstractions. (A generalized IP router — processor, line cards, forwarding engines, and connection module, with route control, traffic control, and capacity — is decomposed across the P1520 interfaces: generic abstractions below the L-interface (buffers, bandwidth, scheduler, packet classifier, routing table, routing calculation database component, CPU capacity, CPU scheduling, and various tables, forming a software representation of routing resources) and service-specific abstractions above it (e.g., DiffServ PHB and EF PHB buffers, OSPF hop cost, and the OSPF routing table), with the CCM interface beneath.)
Fig. 3 depicts a generalized IP router architecture and an example decomposition of the router resources, superimposed onto the P1520 interface model. As shown, the architecture is composed of a number of distinct functional elements, specifically: the line cards, whereon input/output ports are instantiated; the forwarding engines, wherein forwarding of packets takes place, with routing and traffic control elements influencing the forwarding policy and traffic control services, respectively; a general-purpose processor, which executes the router kernel and network-level services and is responsible for hosting a number of processes such as routing protocols and other housekeeping or special-purpose functions; and finally the connection module, which is
used for transport interconnection among the other elements and may represent a switching fabric or a bus architecture. Each of these elements can be viewed as a container of resources that should eventually be reflected upon the L-interface and thus become available to the user or consumer of the resource. Router capacity, not unlike the other resources, plays a part in the functional architecture of the router, where local computation capacity, network bandwidth and static configuration are key abstractions for managing proper utilization of local processing and network-level control and transport services. To this end, capacity constitutes the quantitative representation of the router, and as such it may be viewed as orthogonal to the actual router resource representation, as illustrated in Fig. 3. The importance of such a distinction will become clearer in later sections. Deriving generic abstractions requires viewing resources irrespective of functional service domain. Hence, we would identify generic router resources, translate them into generic L-interfaces, and further provide methods for resource partitioning and methods to forge partitions according to the specifications of the caller of the method.
[Fig. 4 shows the resource hierarchy. Generic abstractions comprise base abstractions (capacity, controller, transporter) and functional abstractions (examples: connection module, line or port, routing services, traffic control, capacity regions). Service-specific abstractions comprise component abstractions (examples: queues, tables, classifiers, databases, schedulers, paths, flows or flow aggregates, threads, addresses, filters) and service binding abstractions (examples: algorithms, protocols, profiles, code-points, policies, index types, entries). Control, management and transport operations cut across all layers of the hierarchy.]

Fig. 4: Hierarchical Abstraction Model
The core router abstractions can be viewed hierarchically. Fig. 4 depicts this hierarchy and provides a layered mapping of the generic and service-specific abstractions. In this section, we focus on the generalized abstractions of the router, namely:

i) Base abstractions. The base abstractions serve as the major stems of the binding hierarchy for core router services and represent the highest-level router binding abstractions. They are the fundamental services provided by the router, specifically transporter, controller and capacity.
Spyros Denazis et al.
ii) Functional abstractions. Base abstractions are composed of functional element abstractions. For example, switch fabric and line card resources are functional router elements that serve the transporter abstraction for forward processing.

iii) Component abstractions. Below the functional layer, one or more component abstractions form the router or switch functional abstractions. Through network service binding, static components (e.g., queues, schedulers) are realized through the creation or binding of tables, or of software components composed via programmable instantiation.

iv) Service binding abstractions. The service binding interface realizes or binds new or existing network service abstractions to the generic component resources (e.g., a scheduler) supporting the router functional elements (e.g., a line card port). These tightly coupled interfaces cast service-specific abstractions onto the component implementations, thereby binding service-specific policies, algorithms, protocols and the like to local router component resources.
4.3 Operational Aspects of the L-Interface

The structure of each L-interface adds one more dimension to the framework, necessary to account for the control, management and transport aspects of the services offered through the L-interface. Generally, each resource abstraction comprises data structures and methods, reflecting an object-oriented approach. In this context, we characterize data structures and methods according to the type of operation in which they originate or are used. As a caveat, we treat the operational aspects as a flexible requirement to support control, management and transport (i.e., not all resources require all of these services). Finer granularity within each of these basic categories is possible; for instance, configuration operations may be considered a sub-category of management. In this manner, it is possible to map control, management and transport services onto the hierarchical abstraction model. Finally, by allowing a resource abstraction interface to reflect operation types explicitly (e.g., through standard method invocation naming formats), the L-interface can assist its consumer or designer in the use or development of services.
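As a toy illustration of the last point, a consumer of the L-interface could infer operation categories from a standard method-naming format. The prefixes and helper below are invented for illustration; they are not part of the P1520 specification.

```python
# Hypothetical naming convention: a prefix on each method name encodes its
# operation type, so that tools can group methods without extra metadata.
PREFIX_TO_TYPE = {"ctl": "control", "mgmt": "management", "xport": "transport"}

def operation_type(method_name: str) -> str:
    """Infer the operation category from an assumed method-name prefix."""
    prefix = method_name.split("_", 1)[0]
    return PREFIX_TO_TYPE.get(prefix, "unknown")

# A consumer can then group a resource's methods by operation type:
methods = ["ctl_set_route", "mgmt_read_config", "xport_enqueue", "reset"]
by_type = {}
for m in methods:
    by_type.setdefault(operation_type(m), []).append(m)
```

A designer extending the interface would simply follow the same naming format, keeping the grouping logic unchanged.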
5 Generic Resource Abstractions
Fig. 5 depicts the interface inheritance hierarchy for a Differentiated Services IP router. As shown, the base abstractions are the highest-level interfaces for the Transporter, Controller and Capacity abstractions of the router. The Transporter is primarily responsible for forwarding functional abstractions; the Controller abstraction serves routing and traffic control functional abstractions; and finally, the Capacity abstraction provides abstractions for local computation, QoS and traffic control support, e.g., bandwidth scheduling. As mentioned previously, the Capacity branch of the router hierarchy is orthogonal to the Transporter and Controller base abstractions. It represents router resources which are quantitatively limited, and thus provides methods that exert
partitioning and shaping of the resource capacity. For example, we can partition the memory resource as a queue with a specific structure and size. An advantage of this approach is that it creates measurable, hence comparable, specifications for the router resources. This, in turn, can be used to perform operations such as bandwidth management and admission control before actually committing router resources to specific data transportation duties. Similar approaches have been proposed in [8,10].
[Fig. 5 shows the Differentiated Services abstraction model as an interface inheritance hierarchy rooted at the IP router. The Transporter branch covers the forwarding engine (classifier, marker, meter, shaper, dropper), the connection module (queues, scheduler), line and port resources (queues, scheduler) and forward mapping. The Controller branch covers the route controller (routing tables, forward policy database), the traffic controller (traffic policy databases, traffic tables, flows) and transport configuration mapping. The Capacity branch covers computation (threads), storage (memory, disk space), configuration, name and address spaces, forward paths, traffic descriptors and QoS parameters. The Diff-serv-specific resource abstractions attached to this hierarchy include PHB configuration (DSCP, scheduling and queuing parameters), traffic conditioning configuration (customer TCA, AF/MF filters, profiles), monitoring data, code-point mapping (DSCP code-point, PHB name, description), traffic profiles (filter, name, qualitative service level, quantitative parameters) and traffic treatment (traffic profile, in-profile treatment, out-profile treatment).]

Fig. 5: Differentiated Services Abstraction Model
The functional and component resource layers compose the generic resources of the router from the base abstractions. A functional abstraction of the router may have one or more service implementation scenarios (e.g., routing: RIP, OSPF, multicast) and may require specific component resource abstractions (e.g., queue structures, tables, databases) whose implementations are statically resident or created dynamically by way of a network service-specific implementation. These interfaces abstract the generic resources, allowing the consumer to manipulate, update, read or modify the implementation of the component resource.
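The layering just described can be sketched as a small containment hierarchy. The class and resource names below are illustrative stand-ins, not the actual P1520 interface definitions.

```python
# Minimal sketch: base abstractions at the root, functional elements
# composing them, and component resources beneath. All names assumed.
class Resource:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def walk(self):
        """Yield this resource and all resources beneath it, depth-first."""
        yield self.name
        for child in self.children:
            yield from child.walk()

# component abstractions -> functional abstraction -> base abstraction
queues = Resource("queues")
scheduler = Resource("scheduler")
line_card = Resource("line_card", [queues, scheduler])   # functional element
transporter = Resource("transporter", [line_card])       # base abstraction
```

Traversing `transporter.walk()` enumerates the resources an L-interface consumer could reach from the Transporter base abstraction.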
6 Service-Specific Abstractions
Service domains are realized, and can operate, in any of these elements by accessing, manipulating and consuming resources. It is therefore a requirement not to associate generic component resources with a particular network service domain. The binding of network services to router component resources occurs within the service binding layer of the abstraction model. Static or dynamic component abstractions are influenced by way of service-domain-specific abstractions. Algorithms, protocols, policies, code-points, etc., which are proprietary to the network service, are instantiated through the interfaces created by
this layer of the abstraction model. An important aim of this layer is to abstract the context of the service being instantiated such that the proprietary nature of the service is hidden, or at least mapped to what appears to be a general router resource abstraction. As an example, consider the Differentiated Services functional domain illustrated in Fig. 5, and let us again assume that a generic resource is a memory module. We may wish to reserve a portion of the memory for use by Differentiated Services. In addition, we should be able to impose a structure on this portion according to the specifications of a Diff-serv table, namely a table with fields for the packet ID tuple and the DS field. A number of buffers is reserved and a specific (queue) structure is imposed. This is further complemented with a buffer management scheme; finally, a portion of bandwidth is allocated and a code-point algorithm is invoked that implements the per-hop behavior. The purpose of the hierarchical abstraction model is to facilitate and guide the process of using the L-interface, as well as extending it as new services are required from the router resources. This can be achieved by starting from a generic abstraction of resources resulting in a number of generic interfaces, which in turn can be used to create service-specific abstractions of resources, which are then realized as specialized interfaces, e.g., Differentiated Services interfaces.
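The memory example above can be sketched as follows. The class names, row size and DSCP value are assumptions made for illustration; the paper does not prescribe an implementation.

```python
# Sketch: partition a generic memory resource, then impose a Diff-serv
# table structure (packet ID tuple + DS field) on the partition.
class Memory:
    """A generic, quantitatively limited capacity resource."""
    def __init__(self, total_bytes):
        self.free = total_bytes

    def partition(self, size):
        if size > self.free:
            raise ValueError("insufficient capacity")
        self.free -= size
        return bytearray(size)      # the reserved region

class DiffServTable:
    """A service-specific structure imposed on a memory partition."""
    def __init__(self, region, row_size=16):            # row_size assumed
        self.capacity = len(region) // row_size
        self.rows = []              # (packet_id_tuple, ds_field) entries

    def add(self, packet_id, ds_field):
        if len(self.rows) >= self.capacity:
            raise MemoryError("partition full")
        self.rows.append((packet_id, ds_field))

mem = Memory(4096)
table = DiffServTable(mem.partition(1024))              # 64 rows of 16 bytes
table.add(("10.0.0.1", "10.0.0.2", 6, 80, 1234), 0x2E)  # DSCP EF code-point
```

The partition is measurable (its size is known), so admission control can be performed before committing the memory to the Diff-serv domain.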
7 Scenario – Differentiated Services
The Differentiated Services architectural model [12,13] is positioned as a collection of service mechanisms which allow network service providers to offer "differentiated" service levels to different customer traffic aggregates. This is achieved by packet marking, traffic forwarding using Per-Hop Behaviors (PHB), and conditioning of traffic through traffic conditioning mechanisms and/or policies [13]. The policies and router service mechanisms are realized and enforced at boundary routers or edge devices of different provider Diff-serv domains, where Service Level Agreements (SLA) are deemed necessary. Fig. 5 illustrates an abstraction model for Diff-serv which maps onto our proposed generalized resource abstraction model (presented earlier), separating core generic router resources from those that are specific to the realization of Diff-serv mechanisms and policies. As such, the service implementation, or binding, of Diff-serv services to generic router resources supports the service programming or provisioning, administration and operational aspects required of the L-interface in a hierarchical manner. This is necessary because the user (e.g., a programmer or administrator) of the L-interface may require operational services at several granularities: to manage or provision services over (i) an entire functional service (e.g., the traffic controller); (ii) a generic resource (e.g., a classifier); (iii) a service-specific resource (e.g., a shaper); or (iv) an entire network service (e.g., Diff-serv) covering multiple service-specific resources. The hierarchical model may also apply when a 'new' network service (e.g., a routing service) is being deployed, requiring service binding to the existing network infrastructure through collective instantiation of service-specific 'resources' onto generic router resources.
Diff-serv building blocks are instantiated at the lowest tier of the resource abstraction hierarchy through the three major base abstractions, namely: i) PHB and traffic conditioning mechanisms are employed through the Transporter; ii) PHB
configuration, traffic conditioning configuration and monitoring are structured through the Controller; and finally, iii) traffic profiles, code-point mappings, traffic treatment policies and parameters are captured under the Capacity base abstraction. Operational services (e.g., installing, enabling or disabling a PHB configuration) are invoked through the methods of the Diff-serv L-interface abstractions. The object class definitions are structured in a manner that is consistent with our proposed L-interface abstraction model. We describe the following semantics of a generalized interface definition for resource abstractions and show in Table 1 an example of the general interface scheme applied to the Per-Hop Behavior object associated with Differentiated Services.

General semantics:
- name/ID identifies the abstraction with a unique ID
- type identifies the abstract data type
- structures defines one or more structures supporting the local abstraction
- status implementation status, either new (requiring instantiation) or resident
- parent abstraction associates the local abstraction with a parent abstraction
- peer abstraction defines peer abstractions as required within a hierarchical level
- methods defines methods for operation, management or control of the abstraction

Table 1. Example: Differentiated Services Per-Hop Behavior service-specific abstraction

Per-Hop Behaviour (PHB): The per-hop behavior mechanisms are used to forward different traffic types with differing behavior. These are implemented via parameterization and policies that affect the queues and schedulers on the router's egress interfaces.

Interface definition specification:
  name/ID: PHB_configuration; type: database; status:
  parent abstraction: traffic_policy_database
  peer abstractions: LP_interface(); LP_queue(); LP_scheduler()
  Structures:
    struct { code-point; *LPqueue_number; struct scheduling {parameters}; struct queuing {parameters}; };
    struct { *LPinterface_number; interfacestatus; EFstatus; AFstatus; MaxEFrate; MaxAFrate; };
  Methods: InstallPHBdB(); EnablePHB(); DisablePHB(); ReaddB(); WritedB(); UpdatedB();
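A rough sketch of how the Table 1 abstraction might look as an object follows. The method names are taken from the table, while the stored state and behaviour are illustrative assumptions, not the paper's implementation.

```python
# Sketch of the PHB_configuration service-specific abstraction.
class PHBConfiguration:
    def __init__(self):
        self.name = "PHB_configuration"
        self.parent = "traffic_policy_database"   # parent abstraction
        self.db = {}        # code-point -> scheduling/queuing parameters
        self.enabled = False

    def InstallPHBdB(self, entries):
        """Install the PHB database (step performed at service binding)."""
        self.db = dict(entries)

    def EnablePHB(self):
        self.enabled = True

    def DisablePHB(self):
        self.enabled = False

    def ReaddB(self, code_point):
        return self.db.get(code_point)

    def WritedB(self, code_point, params):
        self.db[code_point] = params

    def UpdatedB(self, code_point, params):
        self.db.setdefault(code_point, {}).update(params)

phb = PHBConfiguration()
phb.InstallPHBdB({0x2E: {"scheduling": "priority", "queue": "EF"}})
phb.EnablePHB()
```

An administrator working through the L-interface would invoke these methods to provision and operate the PHB without seeing the underlying component resources directly.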
8 Conclusion
In this paper, we have presented an overview of the IEEE P1520 reference model and discussed fundamental requirements for open router interfaces. We proposed an L-interface framework, introduced through a two-layer model consisting of generic resource and service-specific resource abstractions. In addition, we discussed a hierarchical model for resource inheritance and the operational aspects supported by the methods, structures and semantics of individual resources. Finally, we presented a simple scenario using Differentiated Services as an illustrative example of the use of the L-interface.
References
1. IEEE Communications Magazine, Special Issue on Programmable Networks, Vol. 36, No. 10, October 1998.
2. IEEE Network, Special Issue on Active and Programmable Networks, Vol. 12, No. 3, May/June 1998.
3. Calvert, K., et al., "Architectural Framework for Active Networks", Version 0.9, Active Networks Working Group, August 1998. http://www.dyncorp-is.com/darpa/meetings/anets98jul/anets-arch.html
4. Multiservice Switching Forum (MSF). http://www.msforum.org/
5. Biswas, J., et al., "The IEEE P1520 Standards Initiative for Programmable Network Interfaces", IEEE Communications Magazine, Special Issue on Programmable Networks, Vol. 36, No. 10, October 1998. http://www.ieee-pin.org/
6. Buckley, W., "Virtual Switch Interface (VSI) Specification", MSF Contribution Document MSF98.002, November 1998.
7. Newman, P., Edwards, W., Hinden, R., Hoffman, E., Liaw, F.C., Lyon, T., and Minshall, G., "Ipsilon's General Switch Management Protocol Specification Version 2.0", RFC 2297, Internet Engineering Task Force, March 1998.
8. Adam, C., Chan, M.C., Huard, J.-F., Lazar, A.A., and Lim, K.-S., "Binding Interface Base Specification: Revision 2", OPENSIG Draft, April 1997. http://comet.ctr.columbia.edu/xbind/documentation/
9. DARPA Active Network Programs. http://www.darpa.mil/ito/research/anets/index.html
10. OPENSIG Working Group. http://comet.columbia.edu/opensig/
11. van der Merwe, J.E., Rooney, S., Leslie, I.M., and Crosby, S.A., "The Tempest: A Practical Framework for Network Programmability", IEEE Network, Vol. 12, No. 3, pp. 20-28, May/June 1998. http://www.cl.cam.ac.uk/Research/SRG/dcan/
12. Blake, S., et al., "An Architecture for Differentiated Services", RFC 2475, December 1998.
13. Bernet, Y., et al., "A Framework for Differentiated Services", Internet Draft, October 1998 (work in progress); Nichols, K., et al., "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, December 1998.
14. Campbell, A.T., Kounavis, M.E., Vicente, J., Villela, D., Miki, K., and De Meer, H., "A Survey of Programmable Networks", ACM SIGCOMM Computer Communication Review, April 1999.
RCANE: A Resource Controlled Framework for Active Network Services

Paul Menage
University of Cambridge Computer Laboratory
Pembroke Street, Cambridge, CB2 3QG, UK
[email protected]
Abstract. Existing research into active networking has addressed the design and evaluation of programming environments. Testbeds have been implemented on traditional operating systems, deferring issues regarding resource control. This paper describes the architecture, resource models and prototype implementation of the Resource Controlled Active Network Environment (Rcane). Rcane supports an active network programming model over the Nemesis Operating System, providing robust control and accounting of system resources, including CPU and I/O scheduling, and garbage collection overhead. It is thus resistant to many classes of denial of service (DoS) attack.
1 Introduction
Adding programmability to a network greatly increases its flexibility. However, with this flexibility comes greater complexity in the ways that network resources, including CPU, memory and bandwidth, may be consumed by end-users. In a traditional network, the resources consumed by an end-user at a network node are roughly bounded by the bandwidth between that node and the user; in most cases, the buffer memory and output link time consumed in storing and forwarding a packet are proportional to its size, and the CPU time required is likely to be roughly constant. Thus, limiting the bandwidth available to a user also limits the usage of other resources on the node. In an active network, hostile (or greedy, or careless) forwarding code could potentially consume all available resources at a node. Even in the absence of specific denial of service (DoS) attacks, the task of allocating resources according to a specified Quality of Service (QoS) policy is complicated by lack of knowledge about the behaviour of the user-supplied code.

The resources consumed by untrusted code need to be controlled in two ways. The first is to limit the execution qualitatively – to limit what the code can do. This involves restricting either the language in which the code can be written, or the (possibly privileged) services which it can invoke. The second is to limit the code quantitatively – to limit how much effect its activities can have on the resources available to the system. This requires fine-grained scheduling and accounting.

Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 25-37, 1999.
© Springer-Verlag Berlin Heidelberg 1999
This paper discusses the design and implementation of a framework to permit such quantitative and qualitative control. Section 2 outlines the design of the framework. Section 3 discusses its implementation on the Nemesis Operating System. Section 4 presents experiments to demonstrate the validity of the approach. Section 5 surveys related approaches to active networking and resource control.
2 RCANE Design
This section provides an outline of the design of Rcane. The architecture follows the principles given in [1] to partition the system:

– The Runtime is written in native code and provides access to, and scheduling for, the resources on the node, and services such as garbage collection (GC).
– The Loader is written in a safe language (OCaml [2] in the current implementation of Rcane), as are all higher levels. The Loader is responsible for system initialisation and loading/linking other code.
– The Core, loaded at system initialisation time, provides safe access to the Runtime and the Loader and performs admission control for the resources on the node.
– Libraries may be loaded both at system initialisation and by the actions of remote applications. They have no direct access to the Runtime or the Loader, except where permitted by the Core.

2.1 Discussion – Hardware or Software Protection?
Since an active network node is expected to execute untrusted code, there needs to be a layer of protection between each principal and the node, and between principals. It is possible to utilise the memory protection capabilities of the node's hardware, allowing principals to execute programs written in arbitrary languages [3]. However, at the time-scales over which active network applications are likely to execute, this paradigm is too heavyweight. The alternative, taken by Rcane, is to require principals' code to be written in a safe, verifiable language. This allows much of the protection checking to be done statically at compile or load time, and allows much lighter-weight barriers between principals. In particular, it means that interactions between principals can be almost as efficient as a direct procedure call.

2.2 Sessions
A Session represents a principal with resources reserved on the node. Sessions are isolated, so that activity occurring in one session should have no effect on the QoS received by other sessions, except where explicit interaction is requested (e.g. due to one session using services provided by another session). Figure 1 shows part of the Session interface provided by the Core to permit control over a session and its resources. createSession() requests the creation
of a new session. Credentials to authenticate the owner of the new session for both security and accounting purposes are supplied, along with a specification of the required resources and the code to be executed to initialise the session. destroySession() releases any resources associated with the current session. loadModule() requests that a supplied code module be loaded and linked for the session. linkModule() requests that an existing code module (possibly loaded by a different session) be made available for use by this session. The code may be specified simply by the interface which it exports, or by a digest of the code implementing the module, to prevent module-spoofing attacks. bindDevice() reserves bandwidth and buffers on the specified network device. Other functions concerning modification of resource requirements are not shown.
bool createSession (c : Credentials, r : ResourceSpec, code : CodeSpec);
void destroySession (void);
bool loadModule (l : LoadRequest);
bool linkModule (l : LinkRequest);
bool bindDevice (d : Device, bu : BufferSpec, bw : BandwidthSpec);

Fig. 1. Part of the Session interface

At system initialisation time two sessions are created:

– The System session represents activity carried out as housekeeping work for Rcane. It has full control over the Runtime. Many of the control-path services exported from the Loader and the Core are accessed through communication with the System session.
– The Best-Effort session represents activity carried out by all remote principals without resource reservations. Packets processed by the Best-Effort session supply code written in a restricted language and are given minimal access to system resources. Access to createSession() is permitted, to allow code to initiate a new session; further packets may then be processed by the newly created session.

2.3 Resource Accounting
Resources used by sessions running on Rcane are accounted to the appropriate session, and charged to the principal who authorised the creation of the session. Pricing and charging policies will be system-dependent. Resource requests are processed by the System session and, if accepted, are communicated to the Runtime's schedulers. In general, data-path activity, e.g. sending packets, is carried out within the originating session. System modules in the Core are linked against entry points in the (unsafe) Runtime; these are then exported through safe interfaces to which the untrusted sessions can link directly. The Runtime performs a policing function on the use of the node's resources.
2.4 CPU Scheduling
Rcane uses the following abstractions to control CPU usage by sessions:

– A virtual processor (VP¹) represents a regular guaranteed allocation of CPU time, according to some scheduling policy. A session may have one or more VPs. All activities carried out within a single VP share that VP's CPU guarantee.
– A thread is the basic unit of execution, and at any time is either runnable (working on a computation), blocked (e.g. on a semaphore, or awaiting more resources to become available) or idle (in a quiescent state, awaiting the arrival of further work items).
– A thread pool is a collection of one or more threads. Each thread is a member of exactly one pool. Associated with each pool is a queue of packets and a queue of events. Each pool is associated with a single VP; its threads are only eligible to run when that VP receives CPU time.

Incoming packets (see Sect. 2.5) are routed to the associated pool and added to its packet queue. Events (functions for execution at a given time in the future) may be added to a pool's event queue. Whenever there is work to be done in a pool (either newly arrived packets, or events whose timeouts have passed), any idle threads in the pool are dispatched to process the work. When a running thread has finished its task, it returns to the idle state. This allows sessions flexibility in how they map their work onto threads. For tasks that must be processed serially (e.g. routing a stream of packets), a single thread might be bound into a pool, to perform all processing required for that pool. For network services where it is desirable to service multiple requests at a time, several threads can be bound into a single pool, and as packets come in they will be dispatched to an idle thread.
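The pool mechanics described above might be modelled as follows. This is bookkeeping only, with no real concurrency; all names are assumptions rather than Rcane's actual implementation.

```python
# Sketch of a thread pool: packet and event queues, plus a count of idle
# threads that are dispatched whenever work is pending.
from collections import deque

class Pool:
    def __init__(self, n_threads):
        self.packets = deque()
        self.events = deque()
        self.idle = n_threads   # threads currently in the idle state
        self.busy = 0

    def enqueue_packet(self, pkt):
        self.packets.append(pkt)
        self._dispatch()

    def _dispatch(self):
        # Dispatch idle threads while there is pending work.
        while self.idle and (self.packets or self.events):
            work = self.packets.popleft() if self.packets else self.events.popleft()
            self.idle -= 1
            self.busy += 1
            self.process(work)

    def process(self, work):
        # Placeholder for per-flow processing; on completion the
        # thread returns to the idle state.
        self.busy -= 1
        self.idle += 1

pool = Pool(n_threads=2)
pool.enqueue_packet(b"pkt1")
```

A serial task would use a pool with one thread; a service handling concurrent requests would bind several threads into one pool, as described in the text.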
Alternatively, sessions wishing to perform both event-driven and packet-driven activities can choose between running two threads in separate pools (to prevent interference between the two activities), or saving resources by having a single thread in one pool (but risking occasional interference between packet and event activity). Similarly, for even better isolation between activities, the session could associate each activity with a separate VP (i.e. give each activity its own CPU guarantee).

2.5 Network I/O
Sessions running under Rcane can pass demultiplexing specifications to the Runtime, associating incoming flows of packets with specified pools and processing functions. To prevent crosstalk between the network activity of different principals, all packets are demultiplexed to their receiving pools by the Runtime at the lowest possible level. As little work as possible is carried out on those packets before demultiplexing. Once the VP associated with the receiving pool is given CPU time, one of the pool's idle threads can be used to invoke the flow's processing functions. This gives each session full control over decisions such as whether, and what kind of, authentication is used for packets on a given flow. For non-authenticated flows, a session can specify a function which processes the packet's payload immediately; should authentication be required, the session's favoured authentication routines may be invoked with the relevant authentication data from the packet.

A session may request a guaranteed allocation of buffers for receiving packets from a given network device. Incoming packets demultiplexed to the session will be accounted to this allocation, and returned to it when packet processing is completed. Packets for sessions without a guaranteed allocation are received into buffers associated with the Best-Effort session. Thus, although such sessions can receive packets, they will be competing with other sessions on the node. Similarly, a session may request its own allocation of guaranteed transmission bandwidth and buffers for a specified network device, or may use the transmission resources of the Best-Effort session.

¹ For those familiar with the Nemesis Operating System, over which this work is based, this abstraction is distinct from the normal Nemesis notion of a VP.
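A minimal model of such demultiplexing specifications, with invented APIs standing in for the Rcane Runtime calls:

```python
# Sketch: a session binds a flow pattern to a pool and a processing
# function; the runtime-side demux does minimal work and queues the
# packet for the pool. All function names here are assumptions.
demux_table = []   # entries of (match_fn, pool, handler)

def bind_flow(match_fn, pool, handler):
    """Session-side: register a demultiplexing specification."""
    demux_table.append((match_fn, pool, handler))

def deliver(packet):
    """Runtime-side: route the packet to its pool with minimal work."""
    for match_fn, pool, handler in demux_table:
        if match_fn(packet):
            pool.append((handler, packet))  # queued until the VP gets CPU
            return True
    return False    # no match: would fall to the Best-Effort session

session_pool = []
bind_flow(lambda p: p.get("port") == 7, session_pool,
          lambda p: p["payload"].upper())     # non-authenticated handler
delivered = deliver({"port": 7, "payload": "ping"})
```

An authenticated flow would register a handler that first verifies the packet's authentication data before touching the payload.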
Memory
The memory managed by Rcane falls into four categories: network buffers (discussed in Sect. 2.5), thread stacks, dynamically-loaded code and heap memory. Network buffers and thread stacks are accounted to the owning session in proportion to the memory consumed. Charging for keeping code modules in memory is likely to be a system specific policy e.g. it might be the case that linking to a commonly used module would be less expensive than loading a private module. Heap memory presents more challenges. Since safe languages generally require GC to prevent malicious (or careless) programmers exploiting dangling pointers, Rcane needs to provide GC services. The framework must be able to support the following features: – Efficient tracking of the memory usage of each session. – Ability to revoke references from other sessions when a session is deleted. – Prevention of crosstalk between sessions due to GC activity. These three requirements suggest giving each session its own independently garbage-collected heap. Tracking the allocations made by a session is straightforward; deciding to whom to refund garbage-collected memory is difficult to perform efficiently without separate heaps. Sessions which have completed their tasks (or whose authorisation/credit has expired) are destroyed – if other sessions have pointers to their data, it is impossible to safely release the session’s memory. Finally, deciding to whom to account the time spent on GC activity is difficult without separate heaps. Rcane uses an incremental garbage collector to prevent excessive interruptions to execution. Each session reserves a maximum heap size, and tunes the parameters of the GC activity – such as frequency and duration of collection
slices – to allow it to trade off responsiveness against overhead. Charging can then be based on the size of the reserved memory blocks that comprise the heap, rather than the amount of live memory within those blocks, simplifying the accounting process.

2.7 Service Functions
The use of separate heaps and garbage collectors for each session requires Rcane to prevent the existence of pointers between different sessions' heaps. In general this does not present a problem, since applications will not generally rely on shared servers to perform data-path activities. However, in some situations it may be necessary or desirable to communicate with other sessions:

– When talking to the System session to request a change in reserved resources, or to make use of services provided by the System session (such as default routing tables).
– Some sessions may wish to export services to other sessions running on the node (e.g. extended routing tables, or access to proprietary algorithms).

In each of these cases, a client executing in one session requires a local reference to a service function implemented in a different session. This reference is opaque to the client, and enables the runtime to identify the service associated with the reference. Invoking this service involves the following steps:

1. Copying the function's parameters into the server session's heap.
2. Invoking the underlying function in the context of the server's heap.
3. Copying the results back into the client session's heap.

During both the invocation and return copying phases, the runtime notes when a copied value is itself a service, and creates a new reference (or reuses an existing reference) to the same service which is available in the destination session. Services can thus be passed from session to session. Server-specified policy can limit such copying, to allow additional control over which sessions can utilise a service. Any work carried out by the server during the invocation is performed using the client's thread, and accounted to the client's CPU allocation. Figure 2 shows the interface provided for the creation and manipulation of services.
create() takes an ordinary function and returns a service function – invoking the returned function will cause the session switch described above. Thus invoking a service appears the same as invoking an ordinary function. Other parameters to create() specify the maximum amount of memory to be copied when invoking the service and whether the service may be passed from one client to another. The memory limit is currently a rather crude method of preventing DoS attacks by clients on servers. Ideally, the server would be able to inspect the data before it was copied, but this could result in untracked pointers from the server’s heap to the uncopied data in the client’s heap. destroy() withdraws a service – clients attempting to invoke it in future will experience a Revoked exception.
RCANE: A Resource Controlled Framework
type α → β service
exception Revoked
α → β service create (func : α → β, limit : int, shared : bool);
void destroy (s : α → β service);
Fig. 2. The Service interface
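A toy OCaml model may make the semantics of Fig. 2 concrete. It captures only revocability — the real create() also installs the heap-switching wrapper and enforces the copy limit, which are merely accepted and ignored here — and the names are illustrative rather than the actual Rcane API:

```ocaml
(* Toy model of the Service interface: a service is a revocable
   wrapped closure. [limit] and [shared] are accepted but, in this
   sketch, unused. *)

exception Revoked

type ('a, 'b) service = { mutable impl : ('a -> 'b) option }

(* create: wrap a function as a revocable service. *)
let create ?(limit = 4096) ?(shared = false) f =
  ignore limit; ignore shared;
  { impl = Some f }

(* Invoking a destroyed service raises Revoked. *)
let invoke s x =
  match s.impl with Some f -> f x | None -> raise Revoked

(* destroy: withdraw the service. *)
let destroy s = s.impl <- None

let () =
  let s = create ~limit:1024 ~shared:true (fun n -> n * n) in
  assert (invoke s 7 = 49);
  destroy s;
  assert (try ignore (invoke s 7); false with Revoked -> true)
```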
3 Implementation
A prototype of Rcane has been implemented over the Nemesis Operating System [4]. The Runtime is based on the OCaml system from INRIA [2], with support for real-time CPU scheduling, multiple isolated heaps and access to Nemesis I/O. The Best-Effort session uses the PLAN interpreter [5] to provide a limited execution environment for unauthenticated packets, with PLAN wrappers around the Session interface to permit authentication and session creation. Rcane interoperates with PLAN systems running on standard (non resource-controlled) platforms, allowing straightforward control of an Rcane system. Support for demand-loaded code in the style of ANTS [6] is also provided. In general, data-path operations such as network I/O and CPU scheduling are implemented in native code in the runtime for efficiency. Most control-path operations (such as bytecode loading and session creation) are implemented in OCaml for flexibility and ease of interaction with clients.

3.1 CPU Scheduling
CPU scheduling is accomplished using a modified EDF [7] algorithm similar to that described in [8]. Each VP’s guarantee is expressed as a slice of time and a period over which the time should be received (e.g. 300µs of CPU time in each 40ms period). Whenever the Rcane scheduler is entered, the following sequence of events occurs:
1. If there was a previously running VP, the elapsed time since the last reschedule is accounted to it.
2. The next VP to be run, and the period until its next pre-emption, are calculated. From this point onwards, all work carried out is on behalf of the new VP, and hence can be accounted to it.
3. If there are packets waiting on the owning session’s incoming channels (see Sect. 3.3) they are retrieved and transferred to the appropriate pool’s packet queues. Any idle pools with pending events are marked as runnable.
4. The next pool and thread to be run are selected.
5. If the selected thread is active in a heap that is currently in a critical GC phase (see Sect. 3.2) then the thread carrying out the critical GC is activated instead, until the phase has completed.
6. The selected thread is resumed.
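The accounting and VP-selection logic of steps 1–2 can be sketched as follows. This is a deliberately simplified model of EDF with per-period slices — it omits best-effort time, thread pools and the GC interaction of steps 3–6 — and all types and names are illustrative:

```ocaml
(* Sketch of EDF-based VP selection: each VP has a guaranteed [slice]
   per [period]; at each reschedule we account elapsed time to the
   previous VP, replenish VPs whose period has ended, and pick the
   runnable VP with the earliest deadline. *)

type vp = {
  name : string;
  slice : int;              (* guaranteed time per period, in us *)
  period : int;             (* period, in us *)
  mutable deadline : int;   (* end of the current period *)
  mutable remaining : int;  (* unused slice in the current period *)
}

let make name slice period =
  { name; slice; period; deadline = period; remaining = slice }

let reschedule now elapsed prev vps =
  prev.remaining <- prev.remaining - elapsed;      (* step 1 *)
  List.iter (fun v ->
      if now >= v.deadline then begin              (* new period *)
        v.deadline <- v.deadline + v.period;
        v.remaining <- v.slice
      end)
    vps;
  (* step 2: earliest deadline among VPs with slice remaining *)
  List.fold_left (fun best v ->
      if v.remaining <= 0 then best
      else match best with
        | Some b when b.deadline <= v.deadline -> best
        | _ -> Some v)
    None vps

let () =
  let a = make "A" 300 4000 and b = make "B" 1000 2000 in
  (* B has the earlier deadline, so it runs first *)
  match reschedule 0 0 a [ a; b ] with
  | Some v -> assert (v.name = "B")
  | None -> assert false
```

When the fold returns None, all guarantees are exhausted for the current periods; the real scheduler would then distribute best-effort time.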
Paul Menage

3.2 Memory
The garbage-collector is based on the OCaml collector. When tracing the roots of a heap, it is necessary to suspend all threads that might access that heap. To ensure that all appropriate threads are stopped during such critical GC activity, each thread has associated with it a stack of heaps. When a thread makes a service call through to a different session, a pointer to the server’s heap is pushed on to the thread’s heap stack. When returning, the server’s heap pointer is popped from the heap stack. The top heap pointer on each stack is the thread’s active heap. (For brief periods of time, while transferring control between two sessions, a thread will actually have both of the top two heaps marked as active.) Whenever critical activity is being carried out on a heap, all threads which are active in the heap are suspended, other than to carry out the GC work. The majority of the GC work can be carried out without suspending threads. Tracking the threads which have access to each heap minimises the number of threads’ stacks which must be traversed to identify roots, and prevents QoS crosstalk between principals which are not interacting. Additionally, a thread executing a service call in a different session need not be interrupted (possibly whilst holding important server resources) due to critical GC work in its own heap. Since no pointers to the client’s heap can be carried through to the server, the code running in the server cannot access that heap, and so the thread need not be suspended. Upon returning from the service call it is suspended if the activity is still in progress. When a session is destroyed, any references to its exported services are marked as revoked; attempts to invoke them generate an exception.
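The heap-stack bookkeeping can be sketched as follows. This is a simplified model — it ignores the brief window in which both of the top two heaps are active, and heaps and threads are bare records with illustrative names:

```ocaml
(* Sketch of per-thread heap stacks: a service call pushes the
   server's heap, returning pops it, and critical GC on a heap need
   suspend only the threads whose *active* (top) heap is that heap. *)

type heap = { id : int }

type thread = { mutable heap_stack : heap list }  (* head = active heap *)

let active_heap t = List.hd t.heap_stack

let service_call t server_heap f =
  t.heap_stack <- server_heap :: t.heap_stack;    (* enter server *)
  let r = f () in
  t.heap_stack <- List.tl t.heap_stack;           (* return to client *)
  r

(* Threads that must be suspended for critical GC on [h]. *)
let must_suspend h threads =
  List.filter (fun t -> (active_heap t).id = h.id) threads

let () =
  let client_heap = { id = 1 } and server_heap = { id = 2 } in
  let t = { heap_stack = [ client_heap ] } in
  (* During the service call the thread is active in the server's
     heap, so critical GC on the client's heap need not suspend it. *)
  let during = service_call t server_heap (fun () ->
      must_suspend client_heap [ t ]) in
  assert (during = []);
  assert (must_suspend client_heap [ t ] = [ t ])
```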
3.3 Network I/O
Rcane flows map directly to Nemesis I/O channels. A channel is a connection to a device driver associated with a particular set of flows, specified by a packet filter. The current implementation of Rcane supports channels for UDP packets and Ethernet frames, allowing interaction both on a local physical active network and on a larger virtual network tunnelled over UDP/IP. Link-level frames are classified on reception by the network device drivers via a packet filter, which maps the frame to the appropriate channel. In the case of sessions without guaranteed resources allocated on a given device, the frame is mapped to the Best-Effort session’s channel for that device. If the channel has free buffers available, the frame is placed in the channel – no protocol processing is performed at this point. If the channel has no free buffers, the packet is dropped. Thus, if a session is not keeping up with incoming traffic, its packets will be discarded in the device driver, rather than queueing up within a network stack as might happen in a traditional kernel-based OS. At some later point, when the appropriate VP is scheduled by Rcane to receive CPU time, the packets are extracted from the channels and demultiplexed to the appropriate thread pools for processing. Transmit scheduling is performed by the device drivers following a modified EDF algorithm, as described in [9].
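The receive path just described — classify, fall back to Best-Effort, enqueue or drop — can be sketched as follows. Channels are modelled as bounded lists and the filter as a plain function; the names are illustrative, not the Nemesis driver API:

```ocaml
(* Sketch of in-driver frame classification: a packet filter maps a
   frame to a channel; unmatched frames go to the Best-Effort
   channel; a frame is dropped if its channel has no free buffers. *)

type channel = { capacity : int;
                 mutable frames : string list;
                 mutable dropped : int }

let make_channel capacity = { capacity; frames = []; dropped = 0 }

(* [classify] returns Some channel for a guaranteed flow, or None
   for frames that fall back to [best_effort]. *)
let receive classify best_effort frame =
  let ch = match classify frame with Some c -> c | None -> best_effort in
  if List.length ch.frames < ch.capacity
  then ch.frames <- ch.frames @ [ frame ]  (* no protocol processing *)
  else ch.dropped <- ch.dropped + 1        (* session not keeping up *)

let () =
  let guaranteed = make_channel 1 and be = make_channel 8 in
  let classify f = if f = "udp:5000" then Some guaranteed else None in
  receive classify be "udp:5000";
  receive classify be "udp:5000";   (* channel full: dropped in driver *)
  receive classify be "eth:misc";
  assert (guaranteed.frames = [ "udp:5000" ]);
  assert (guaranteed.dropped = 1);
  assert (be.frames = [ "eth:misc" ])
```

The point of dropping at classification time is that the cost of an overloaded session's backlog never reaches the shared network stack.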
4 Evaluation
This section presents the results of various test scenarios run to verify the QoS guarantees and resource isolation provided by Rcane.

4.1 CPU Isolation
To demonstrate the isolation of multiple VPs from one another, three separate sessions, each with a single VP, were started at 5 second intervals. In each case an OCaml bytecode module was loaded over the network to be used as the entry point for the session. Session A runs on best-effort time only. Session B requests a 1ms slice in each 4ms period. Session C begins running on best-effort time. After 3 seconds it requests a CPU guarantee of 400µs each 2ms. It makes further changes to its allocation and then exits. For this experiment, requests for guaranteed CPU allocation also specified that they did not wish to additionally receive a portion of the best-effort time. Figure 3 (a) shows the amount of CPU time actually received by each session over the course of each scheduler period. Initially session A receives all the CPU; later B arrives and receives a constant 25% of the CPU. When C arrives, it initially shares the remaining 75% best-effort time with A; then it switches to guaranteed CPU time (initially 20%, then 40%, then 10%). It can be seen that the guarantees requested from the system were accurately respected at fine timescales.

4.2 Network Transmission
Figure 3 (b) shows a trace of network output from three sessions, each attempting to transmit flat-out. Session D has no guaranteed bandwidth. E has an allocation of 33% (on a 100Mb/s link). F starts with a guarantee of 25%. After about 12s, it requests 45%, thus reducing the best-effort bandwidth available to D. After another 2s, it requests 65%. Now the link is saturated and there is no best-effort transmission time available. After a further 2s it returns to 25%, allowing D to begin transmitting again. It can be seen from the trace that the desired resource isolation is achieved.

4.3 Memory Isolation
To demonstrate the utility of running different principals’ sessions in their own heaps, two scenarios were considered. In (a), VPs G and H are running in the same session. Initially both are generating small amounts of garbage. After a period of time, G begins generating large amounts of garbage. Scenario (b) is the same, but with the two VPs running in separate sessions (and hence having separate heaps). Figure 4 shows the outcome of these scenarios. In (a), both VPs are initially doing small amounts of GC work. G is running best-effort, H has a guarantee of 1ms in each 4ms period. When G switches to generating large
amounts of garbage, the time it spends garbage collecting increases substantially. However, as shown by the noisy region at the bottom right of the graph, H also ends up doing an irregular but substantial amount of GC work. Although H has its own independent CPU guarantee, critical GC activity (such as root tracing) is sometimes taking place when its thread is due to run; it must complete this GC work before normal execution can be resumed. In (b), H is unaffected by the extra GC activity caused by G, since it is running in a separate session and hence does not share its heap.

Fig. 3. (a) Dynamically changing CPU guarantees. (b) Network output
Fig. 4. Avoiding QoS crosstalk due to garbage collection: (a) single heap, (b) separate heaps
5 Related Work

5.1 Active Networks
Many approaches to loading user-supplied code onto a network node are built on an existing safe language such as Java (including ANTS [6] and Hollowman [10]) or OCaml [2] (including ALIEN [11] and the PLANet service loader [12]). An alternative is to start with a very restricted language, and rely on the limitations of the language to bound the resources consumed by the user’s code. This approach is taken by PLAN [5] and Smart Packets [13]. PLAN also extends the concept of the hop count found in IP to apply to recursive or remote invocations, bounding the resources that a packet can consume globally. The Active Networks Working Group NodeOS Interface Specification [14] aims to standardise on an API addressing similar issues to Rcane, although at a lower level of abstraction.

5.2 Resource Control and Isolation in Safe Languages
JRes [15] provides Java resource control using minimal runtime support. This provides portability, but with high accounting overheads. The J-Kernel [16] gives Java support for multiple protection domains and capabilities to allow revocation of services, but does not fully partition the JVM heap. The Java Sandboxes [17] project allows separate heaps in a modified JVM, by preventing stores of inter-heap references at run-time. This has serious efficiency consequences, and also fails to address the issue of QoS crosstalk due to critical GC activity.

5.3 Resource Control in Operating Systems
Nemesis [4] aims to provide reliable resource guarantees to applications. It is based on the principles that applications should perform as much of their own work as possible, without relying on shared servers for data-path activities, and that applications should have full control over their own resources. The Exokernel [18] takes a similar approach, but motivated by performance gains, rather than provision of QoS guarantees. Scout [19] seeks to associate resources with data paths rather than with users or applications.
6 Conclusions and Future Work
This paper has presented the design for Rcane, a Resource Controlled Active Network Environment, and its implementation over the Nemesis Operating System. Rcane supports the execution and accounting of untrusted code written in a safe language. Direct interference between principals is prevented through the use of a safe language. QoS interference is prevented through scheduling and accounting. Experiments showed that principals running on Rcane do experience isolation with respect to CPU time, network bandwidth and GC activity. Areas for future work include: a more developed charging and accounting model, resource control and transfer on a network-wide scale, and allowing principals more flexibility in specifying scheduling and memory usage policies.
Acknowledgements The author wishes to thank Jonathan Smith at the University of Pennsylvania, where part of this work was carried out, and Jonathan Moore and Michael Hicks for developing the PLAN infrastructure.
References
1. D. Scott Alexander. ALIEN: A Generalized Computing Model of Active Networks. PhD thesis, University of Pennsylvania, September 1998.
2. Xavier Leroy. Objective Caml. INRIA. http://caml.inria.fr/ocaml/.
3. Dickon Reed, Ian Pratt, Paul Menage, Stephen Early, and Neil Stratford. Xenoservers: Accountable Execution of Untrusted Programs. In Seventh Workshop on Hot Topics in Operating Systems (HOTOS-VII), March 1999.
4. I. M. Leslie et al. The Design and Implementation of an Operating System to Support Distributed Multimedia Applications. IEEE Journal on Selected Areas in Communications, 14(7):1280–1297, September 1996.
5. Michael Hicks, Pankaj Kakkar, Jonathan T. Moore, Carl A. Gunter, and Scott Nettles. PLAN: A Packet Language for Active Networks. In Third ACM SIGPLAN International Conference on Functional Programming (ICFP), 1998.
6. David J. Wetherall, John Guttag, and David L. Tennenhouse. ANTS: A Toolkit for Building and Dynamically Deploying Network Protocols. In 1st IEEE Conference on Open Architectures and Network Programming (OPENARCH), April 1998.
7. C. Liu and J. Layland. Scheduling Algorithms for Multiprogramming in a Hard Real-time Environment. Journal of the Association for Computing Machinery, 20(1):46–61, February 1973.
8. Timothy Roscoe. The Structure of a Multi-Service Operating System. Technical Report 376, University of Cambridge Computer Laboratory, August 1995.
9. Richard Black, Paul Barham, Austin Donnelly, and Neil Stratford. Protocol Implementation in a Vertically Structured Operating System. In 22nd IEEE Conference on Local Computer Networks (LCN), 1997.
10. Sean Rooney. Connection Closures: Adding Application-defined Behaviour to Network Connections. Computer Communications Review, April 1997.
11. D. Scott Alexander, Marianne Shaw, Scott M. Nettles, and Jonathan M. Smith. Active Bridging. In ACM SIGCOMM Conference on Applications, Technologies, Architectures and Protocols for Computer Communication, September 1997.
12. Michael Hicks, Jonathan Moore, D. Scott Alexander, Carl Gunter, and Scott Nettles. PLANet: An Active Internetwork. In IEEE INFOCOM ’99, 1999.
13. Beverley Schwartz, Alden Jackson, Timothy Strayer, Wenyi Zhou, Dennis Rockwell, and Craig Partridge. Smart Packets for Active Networks. In 2nd IEEE Conference on Open Architectures and Network Programming (OPENARCH), 1999.
14. Active Networks NodeOS Working Group. NodeOS Interface Specification. Draft.
15. Grzegorz Czajkowski and Thorsten von Eicken. JRes: A Resource Accounting Interface for Java. In ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), November 1998.
16. C. Hawblitzel, C.-C. Chang, G. Czajkowski, D. Hu, and T. von Eicken. Implementing Multiple Protection Domains in Java. In 1998 USENIX Annual Technical Conference, June 1998.
17. Philippe Bernadat, Dan Lambright, and Franco Travostino. Towards a Resource-safe Java. In IEEE Workshop on Programming Languages for Real-Time Industrial Applications (PLRTIA), December 1998.
18. Dawson R. Engler, M. Frans Kaashoek, and James O’Toole Jr. Exokernel: An Operating System Architecture for Application-level Resource Management. In 15th ACM Symposium on Operating Systems Principles (SOSP), 1995.
19. A. Montz, D. Mosberger, S. W. O’Malley, L. Peterson, and T. Proebsting. Scout: A Communications-Oriented Operating System. Technical report, Department of Computer Science, University of Arizona, June 1994.
The Protean Programmable Network Architecture: Design and Initial Experience

Raghupathy Sivakumar, Narayanan Venkitaraman, and Vaduvur Bharghavan

University of Illinois, Urbana-Champaign, IL 61801, USA
{sivakumr,murali,bharghav}@timely.crhc.uiuc.edu
http://timely.crhc.uiuc.edu
Abstract. This paper presents Protean, a programmable network architecture for future networks. Protean is an event-driven network architecture that allows service providers, applications, and even individual flows to customize the network services, while at the same time providing efficient data paths for flows that use default services. A key feature of Protean is the support for state management. A service that is invoked at one node has the ability to access and update non-local state, and the management of distributed network state is achieved by a core-based self-configuring infrastructure in Protean.
1 Introduction
The next generation Internet is expected to support very diverse environments (commercial heterogeneous wireline/wireless networks), applications (multimedia, WWW, telnet), and workloads (heterogeneous unicast and multicast streams with different quality of service requirements). The problem with supporting such diversity in a single network infrastructure is that different applications have very different requirements from the network. Consequently, it is clear that the network must play a more active role in supporting the needs of the applications and end users. To this end, there has been a lot of recent discussion regarding the design and deployment of active networks. Unlike traditional network architectures wherein the network provides only best effort datagram service and all the smarts reside in the end hosts, applications in an active network have the ability to inject specialized functionality into the routers of the network. In this paper, we present an overview of the active router architecture and state management in the PROTEAN (PROgrammable TEchnology for Active Networks) active network that is being developed at the University of Illinois. Protean is similar to other programmable network approaches in that it provides for the dynamic injection of services, advertisement of services, and a programmable abstraction of the network to service providers and even applications. However, Protean is distinct in terms of its focus on state management. While related work has typically focused on the mechanisms for injecting and executing customized services in the network, to our knowledge there has been very little study on how services can access and manipulate non-local state.

Stefan Covaci (Ed.): IWAN’99, LNCS 1653, pp. 37–47, 1999. © Springer-Verlag Berlin Heidelberg 1999
Of course, this is a critical issue in the practical deployment and use of active networks, since services must be able to access and update non-local network state to make intelligent decisions about packet handling. At the same time, the state management needs to be low overhead, and the state that is monitored needs to be extensible. Protean allows services invoked in a router to access and update non-local state, e.g. making reservations along an entire path or accessing the routing tables of other routers. In essence, services in Protean are provided with a ‘distributed shared memory’ abstraction of the network. This has three key advantages: (a) it makes writing services much easier, (b) it allows the network to arbitrate resources and access among competing services, and (c) it provides a uniform framework for the dissemination of both network state (such as link bandwidth, expected delays, and resource availability) and available services in the network. On the other hand, aggregating and maintaining ‘network state’ efficiently is a challenging task. We discuss the Protean architecture and state management in subsequent sections. Section 2 presents the architectural framework of Protean and Section 3 describes the Protean state management. Section 4 presents an illustrative case study of the Protean architecture and Section 5 concludes this paper.
2 The PROTEAN Active Network Architecture
As shown in Figure 1, the Protean architecture has three key components: (a) the router architecture (services, virtual network contexts, etc.), (b) the state management architecture (network state monitoring, propagation and access, distributed services management), shown in two separate modules, and (c) the programming and runtime framework. We focus on the first two components in this paper.

2.1 The Protean Active Router Architecture
The Protean router is based on an event-driven model. An event is a fundamental entity in the router architecture. Events are associated with event-handlers, and an (event, event-handler) pair is termed a service. A set of services associated with the data path of a packet in a router is contained in a virtual network context (VNC). A VNC in Protean consists of the data path a packet traverses from the point it enters a router until it leaves the router, the probable events along the data path, the handlers for the events (services) and finally the state space for the VNC. A virtual network context is typically populated with a number of flows that reside within the context. The router architecture allows for the creation of customized VNCs by a service provider, application, user, or even individual flows. VNCs are hierarchically structured, and child VNCs inherit services from their parents by static scoping rules. Figure 1 shows the architecture of a typical router in Protean. Although the Protean router architecture is fundamentally similar to the programmable router approach [1,2], it differs from existing approaches in two key aspects: (i) the way virtual network
contexts are created and maintained and (ii) the nature of the state space that is provided to virtual network contexts. Since Protean maintains a hierarchy of VNCs, individual flows are allowed to create their customized virtual network contexts by either inheriting from an existing VNC, by building a VNC from scratch (by injecting handlers for all events at the trigger points along its data path), or by a hybrid of the two approaches (inheriting an existing VNC, modifying existing services, adding new services, etc.). Also, a VNC in Protean has access not only to local state but also to non-local state. Moreover, the state space in Protean is itself programmable, thus allowing flows to inject their own state variables into their state spaces, which would from then on be maintained by the Protean state manager. All services in Protean are injected and executed at the kernel level to improve efficiency, while the state management is done at user level in order to limit the complexity of the kernel. The rest of this section describes what a virtual network context in Protean is and how a virtual network context is set up by a flow.

Fig. 1. (a) Protean Active Network Architecture: The rectangular boxes show components of the architecture while the shaded planes indicate levels of abstraction. (b) Protean Router Architecture: For each event in the data path of a packet, there exists a mapping from VNC-ids to event handlers. For a given packet, a maximal prefix match of the VNC-id of the packet is used to retrieve the appropriate event handler (A). The set of event handlers along the data path constitutes the Virtual Network Context of the flow (B)

2.2 The Protean Virtual Network Context
A virtual network context in Protean can be defined as a set of services. Each flow at a router is associated with a particular VNC and all data packets belonging to that flow are processed within this VNC. Each VNC is associated with a unique VNC-Id. Switches are made programmable by allowing flows to create their own customized virtual network contexts. The following is the list of components forming a virtual network context for a given flow:
– The data path for the flow within the router. While in conventional routers the data path is typically the same for all flows, flows in an active network might potentially want to traverse different data paths based on their requirements. For example, flow 1 may choose to be routed by the standard routing algorithm while flow 2 may choose to be routed by a QoS routing algorithm.
– Events, handlers and services. We use the term service to signify an (event, handler) pair. Events are of two types: (a) basic events, which are predefined by the network, and (b) user-defined events. Basic events are of two types: those whose handlers are non-programmable, and those whose handlers are programmable. Note that only the top level for these services is non-programmable. Within a virtual network context, any scheduler or resource allocation policy can be used among its flows/children contexts. User-defined events are not a part of the default network event set. Thus, user-defined events can only be triggered by user-defined event-handlers.
– The VNC state space. A VNC in Protean includes a programmable state space that the particular VNC has access to. While the state space contains some default state variables (routing table, CPU utilization, etc.), Protean allows flows to program the state space to include non-local state and even newly defined state variables. For example, if a flow needs to have access to the congestion in all of its next hop routers, it can introduce new state variables in its virtual network contexts that indicate the levels of congestion in neighboring routers. The Protean state manager would then be responsible for monitoring this state and keeping the state consistent. Section 3 describes in detail how this is achieved.
The Protean VNC can thus be expressed as follows:

VNC := (dataPath, (Service1, Service2, ...), stateSpace)

where services are of the form ((event1, eventHandler1), (event2, eventHandler2), ...)
and the stateSpace is a union of the predefined default state variables and the state variables programmed by the particular flow. Each of the above components is made programmable in Protean, paving the way for a programmable router. The next section explains how a VNC is set up by a flow.

2.3 Programming a Protean Router
As mentioned before, at a given router, each flow is associated with its own virtual network context. Switches are made programmable by allowing flows to customize their virtual network contexts. While one extreme of customization would involve building from scratch each of the components that compose the VNC, the other extreme would be to inherit an entire VNC from the set of existing VNCs. Setting up a VNC with customized services is a one-time effort; if all the services selected to compose the VNC are already in the router,
then the overhead for creating a VNC is negligible; if some of the services need to be downloaded over the network, then the overhead for creating a VNC is significant. In Protean, flows inherit by default the VNC created by the closest ancestor. Thus, if a service provider creates its own virtual network context VNCi, all flows originating from hosts subscribing to that service provider would by default inherit VNCi in the absence of other VNCs belonging to closer ancestors in the organization hierarchy. Thus, VNC setup in Protean can be classified into two categories:
– Default VNC setup. A flow that does not want to incur the overhead of setting up a customized VNC can start transmitting its packets without going through the VNC setup process. At each of the routers the flow goes through, it is associated with the VNC belonging to the closest ancestor of the flow in the organization hierarchy. The closest ancestor is identified by performing a max-prefix match of the flowId and the ids of the available VNCs at the router.
– Customized VNC setup. For flows that do want to set up their own customized VNCs, Protean offers three choices: (i) inherit an entire VNC from the available set of VNCs, (ii) customize only portions of an inherited VNC, or (iii) build the entire VNC using injected modules.
It is important to note here that setting up a VNC does not involve an explicit setup phase with a high overhead. Rather, all it involves is the creation of mappings between events on the data path and appropriate event handlers. Hence the VNC setup phase in Protean is implicit rather than explicit. The Protean state management architecture plays a key role in enabling flows to customize existing VNCs.
The state management architecture provides the new flows with information about existing VNCs, including the various options available to build the components of the VNCs (data paths, trigger zones, services, etc.), from which the flow chooses portions of existing VNCs or an entire VNC.
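The default-VNC selection by max-prefix match can be sketched as follows, continuing in OCaml. The dotted-id scheme (e.g. flow "1.1.2" falling back to VNC "1.1", with "*" as the root default) follows Figure 1 (b); the representation and names are illustrative:

```ocaml
(* Sketch of max-prefix VNC classification: a flow with no custom
   VNC inherits the VNC whose id is the longest prefix of its own
   dotted id; "*" matches any flow. *)

let components id = String.split_on_char '.' id

(* Length of the prefix [vnc_id] contributes to [flow_id], or None
   if it is not a prefix at all. *)
let prefix_len vnc_id flow_id =
  if vnc_id = "*" then Some 0
  else
    let rec go = function
      | [], _ -> Some (List.length (components vnc_id))
      | v :: vs, f :: fs when v = f -> go (vs, fs)
      | _ -> None
    in
    go (components vnc_id, components flow_id)

(* Pick the VNC whose id is the longest prefix of the flow's id. *)
let classify vncs flow_id =
  List.fold_left (fun best (id, vnc) ->
      match prefix_len id flow_id, best with
      | Some n, Some (m, _) when n <= m -> best
      | Some n, _ -> Some (n, vnc)
      | None, _ -> best)
    None vncs
  |> Option.map snd

let () =
  let vncs = [ "1.1.1", "A"; "1.1", "B"; "1.2", "C"; "*", "D" ] in
  assert (classify vncs "1.1.2" = Some "B");  (* closest ancestor  *)
  assert (classify vncs "1.1.1" = Some "A");  (* exact match       *)
  assert (classify vncs "2.5" = Some "D")     (* root default      *)
```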
3 The State Management Architecture
The goal of state management in Protean is to provide services access to non-local state in a consistent and available manner. At the same time, the state management needs to be scalable, low-overhead, and robust. These are contradictory goals, because the more non-local state a router caches, the more consistency management it needs to perform. Likewise, the more dynamic the network becomes, the more likely it is for cached non-local state to become outdated, thus leading to either lower availability or more consistency management overhead. In order to balance these issues, Protean adopts a 2-level approach for state management: (a) it hierarchically clusters the network, and a node in a cluster only maintains non-local state about the cluster, and (b) it creates and maintains a self-configuring core infrastructure in each cluster, that is responsible for aggregating the network state within the cluster, and providing the ability for individual nodes to access and update this state.
In this paper, we focus on the intra-cluster state management infrastructure of Protean. Specifically, we describe how the core is formed, and how it propagates the cluster state to the nodes in the core. The mechanisms for inter-cluster state propagation and abstraction are still ongoing work, and are not discussed further. State management within the cluster has two key components: (a) generation of the core nodes, and (b) propagation of link state. We describe each in turn.

3.1 Generation and Maintenance of the Core
The core nodes of the network, together with tunnels that interconnect core nodes, form an infrastructure called the core network. The core network serves to maintain and propagate state in the active network. In Protean, only autonomous networks (or stub networks that hang off transit networks) use a core network for state management. For transit networks, which typically are bigger and serve a larger number of flows, the use of a core network might not be a scalable and effective option; in related work, we propose a scalable and low-overhead state management mechanism for transit networks [3]. In this paper, we focus on the use of core networks for state management. Core networks in Protean are constructed by approximating the minimum dominating set of the underlying network [4] and hence satisfy two properties: (a) each node is either a core node or has low latency access to a core neighbor in order to access non-local state, and (b) the number of core nodes is minimized, thus reducing the consistency management overhead. Each core node establishes tunnels with all of its nearby core nodes. Once the core nodes have been chosen and tunnels established among nearby core nodes, the resulting infrastructure can capture the state of the cluster and propagate it among the core nodes. State is propagated to the nodes in the cluster via the state propagation mechanism described below. Also, nodes that are not in the core set access non-local state via a transparent mechanism that allows a non-core node to maintain a strongly consistent copy of the cached state with its dominating core node.
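Minimum dominating set is NP-hard, so an approximation is used; the paper defers the actual construction to [4], but the standard greedy approximation conveys the idea: repeatedly pick the node that covers the most still-uncovered nodes until every node is a core node or adjacent to one. The graph representation and names below are illustrative:

```ocaml
(* Greedy approximation of a minimum dominating set over an
   adjacency list: the chosen nodes form the core; every node is
   either in the core or has a core neighbour. *)

let neighbours graph n = try List.assoc n graph with Not_found -> []

(* Choosing [n] as a core node covers itself and its neighbours. *)
let covers graph n = n :: neighbours graph n

let greedy_core graph =
  let nodes = List.map fst graph in
  let rec loop core uncovered =
    if uncovered = [] then List.rev core
    else
      (* pick the node covering the most still-uncovered nodes *)
      let gain n =
        List.length
          (List.filter (fun m -> List.mem m uncovered) (covers graph n)) in
      let best =
        List.fold_left (fun b n -> if gain n > gain b then n else b)
          (List.hd nodes) nodes in
      let covered = covers graph best in
      loop (best :: core)
        (List.filter (fun m -> not (List.mem m covered)) uncovered)
  in
  loop [] nodes

let () =
  (* A star: node 0 is adjacent to 1..4, so {0} dominates it. *)
  let star = [ 0, [1;2;3;4]; 1, [0]; 2, [0]; 3, [0]; 4, [0] ] in
  let core = greedy_core star in
  assert (core = [ 0 ]);
  (* every node is a core node or has a core neighbour *)
  assert (List.for_all (fun (n, ns) ->
      List.mem n core || List.exists (fun c -> List.mem c ns) core) star)
```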
3.2 Propagation of State in the Core
The key remaining issue is how the cluster state is propagated among the core nodes. As a concrete example of ‘state’, we take the available link bandwidth. Available link bandwidth is a particularly dynamic piece of state, because it changes every time a new flow starts or an ongoing flow terminates. For this example, we assume that a monitoring process is available to compute the available bandwidth. Thus, the focus of this section is only on how this state is propagated, and what level of consistency we can expect among the core nodes.
We define nearby core nodes of a node u as core nodes that are at most 3 hops from u. It can be shown that every core node has at least one other core node within 3 hops of it [4].
The Protean Programmable Network Architecture
43
The state propagation mechanisms in Protean are motivated by three requirements: (a) each core node must maintain up-to-date local state; (b) if a node is aware of a non-local resource, it is a potential contender for that resource, so the state corresponding to a resource must not be propagated far into the cluster if the available resource is small; and (c) if a resource is fluctuating, the consistency overhead of maintaining an up-to-date value is unacceptably high, so state corresponding to fluctuating resources must be kept local. In essence, the goal of the state management algorithm is to propagate stable, abundant resource state throughout the core nodes, and to restrict the propagation of unstable or scarce resource state. We achieve this goal by creating two types of waves: slow-moving increase waves that signal an increase in a resource, and fast-moving decrease waves that signal a decrease. The basic idea is that for fluctuating resources, the fast-moving decrease wave triggered by a resource decrease will quickly overtake and kill the slow-moving increase wave triggered by a previous resource increase. Conversely, a stable, high-bandwidth resource will eventually propagate to all core nodes by virtue of the increase wave. Increase waves have a time-to-live, i.e. a maximum distance to which they can be advertised, which is a function of the available resource.
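A toy model of the wave mechanism, assuming a TTL that grows linearly with the fraction of the resource available (the actual TTL function is not given in the paper, so this sketch is purely illustrative):

```java
import java.util.*;

// Toy model of Protean-style state waves: an increase in available
// bandwidth is advertised with a TTL proportional to the new value
// (abundant, stable state travels far), while a decrease is advertised
// with the maximum TTL so it overtakes any in-flight increase wave.
// The TTL function and all names are illustrative assumptions.
public class StateWaves {
    public static final int MAX_TTL = 8;

    // TTL grows with the available resource: scarce state stays local.
    public static int increaseTtl(double availableMbps, double capacityMbps) {
        return (int) Math.round(MAX_TTL * (availableMbps / capacityMbps));
    }

    // View of a remote link's bandwidth, keyed by hop distance reached by the wave.
    public static Map<Integer, Double> propagate(double value, int ttl) {
        Map<Integer, Double> view = new HashMap<>();
        for (int hop = 1; hop <= ttl; hop++) view.put(hop, value);
        return view;
    }

    public static void main(String[] args) {
        // abundant resource: advertised far into the core
        System.out.println("increase TTL at 90/100 Mbps: " + increaseTtl(90, 100));
        // scarce resource: kept local
        System.out.println("increase TTL at 10/100 Mbps: " + increaseTtl(10, 100));
        // a decrease always uses MAX_TTL, so even distant core nodes see the drop
        Map<Integer, Double> afterDrop = propagate(10.0, MAX_TTL);
        System.out.println("hop " + MAX_TTL + " sees: " + afterDrop.get(MAX_TTL));
    }
}
```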
3.3 Using the Core to Perform State Management
Having described the core infrastructure to aggregate and propagate intra-cluster state, we are faced with five important issues: (a) how do services access the state, (b) what are the consistency semantics for distributed state, (c) what is the trade-off between providing read-only versus read-write access to state, (d) what are the trade-offs between intra-cluster and inter-cluster state management, and (e) how are services propagated in this infrastructure? While (d) is still part of ongoing research, we discuss the other issues below.
1. Event-handlers are instantiated as kernel-level modules, but the state manager process runs at user level. We have examined two ways for event-handlers to access non-local state managed by the state manager: (a) via upcalls, and (b) via memory pages shared between the user-level and kernel-level modules. We chose the latter approach because of its simplicity and efficiency, though the former is more scalable when the state manager manages large amounts of state.
2. The consistency semantics of each state element depend on several factors, most importantly the granularity at which waves are triggered. Between a non-core node and its core dominator, state is cached on demand at the non-core node with strong consistency semantics. Among the core nodes, the consistency semantics are weak, and Protean currently provides no guarantee that the accessed state is indeed correct. For guaranteed state updates, we expect that a service will treat the available non-local state as a read-only resource and directly propagate a state update request to the node that controls the state element.
44
Raghupathy Sivakumar et al.
3. For read-write access to state, Protean supports two consistency models. The first supports weak consistency semantics: reads are served from the local copy of the state, and writes are made to the local copy with a lazy update of the owner’s copy (the primary copy of the state). The second supports strong consistency semantics: all reads and writes propagate to the owner’s copy and block until the access or update completes. The two models evidently trade off the strictness of consistency against the overhead of providing it. We believe that most services in the network do not require strong guarantees, so the weak consistency model will usually suffice. In both models, once an update is made, the state manager is responsible for propagating the updated state.
4. It is easy to see how service dissemination is achieved using the core. When a cluster node has access to an event handler, it advertises itself as a node that can be contacted to obtain a copy of the handler. The directory service is thus an element of state, and it is updated at a core node whenever that node acquires or relinquishes a copy of an event handler. We also have efficient mechanisms for propagating services across clusters: we aggregate the services available in a cluster and nominate a ‘clusterhead’ that acts as the repository of these services. Service dissemination is then carried out hierarchically across the different cluster levels.
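The two read-write models above can be sketched as follows; the class and method names are illustrative, not part of Protean:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the two read-write consistency models: a weakly consistent
// replica serves reads and writes from a local copy and lazily pushes
// writes to the owner, while a strongly consistent replica forwards
// every access to the owner's (primary) copy. Names are illustrative.
public class StateReplica {
    private final AtomicLong ownerCopy;   // primary copy held by the owning node
    private long localCopy;               // cached copy at this node
    private boolean dirty;                // local write not yet flushed

    public StateReplica(AtomicLong ownerCopy) {
        this.ownerCopy = ownerCopy;
        this.localCopy = ownerCopy.get();
    }

    // Weak model: cheap, but the owner's copy may lag behind local writes.
    public long weakRead()          { return localCopy; }
    public void weakWrite(long v)   { localCopy = v; dirty = true; }
    public void lazyFlush()         { if (dirty) { ownerCopy.set(localCopy); dirty = false; } }

    // Strong model: every access goes to (and returns from) the owner.
    public long strongRead()        { localCopy = ownerCopy.get(); return localCopy; }
    public void strongWrite(long v) { ownerCopy.set(v); localCopy = v; }

    public static void main(String[] args) {
        AtomicLong owner = new AtomicLong(100);
        StateReplica r = new StateReplica(owner);
        r.weakWrite(80);
        System.out.println(owner.get());  // owner still holds 100: the update is lazy
        r.lazyFlush();
        System.out.println(owner.get());  // 80 after the flush
        r.strongWrite(60);
        System.out.println(owner.get());  // 60 immediately
    }
}
```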
4 The Active Router - A Case Study
In this section, we present a case study of the Protean active network architecture that serves two purposes: (i) it acts as a proof of concept for the Protean router architecture, and (ii) it provides a way of evaluating the architecture. The case study implements an Active Dropping Router that allows flows to dynamically reconfigure the dropping policies for their respective queues. It uses a prototype implementation of the Protean router architecture and illustrates the router-level behavior: specifically, it shows how services are instantiated in the router, and it performs measurements to calibrate both the overhead of service instantiation and the improvement in functionality due to the introduction of application-specific services. More case studies of the Protean architecture are presented in [3]. We customize the “packet dropping” behavior of a Protean router. In a typical Protean event sequence, when a packet is ready to be queued in a designated queue at the output link, the “packet-level admission test” event is invoked. The event handler for this event determines whether the new packet can be enqueued without causing some packet to be dropped (e.g. if the buffer is full or above a threshold value). If the packet-level admission test fails, the “packet drop” event is invoked. The default event handler for packet drop uses the tail-drop policy, i.e. the incoming packet is dropped. Thus, by default, the incoming packet is dropped if the queue is full. However, the event handlers for both the admission test and packet drop events can be replaced. We focus on the packet drop
event, and compare application-specific services that can replace the default tail drop with head drop, random drop, or priority drop services. For example, an application may introduce head drop for its feedback queue (where more recent feedback takes precedence), random drop to ensure fairness among flows when multiple flows of the application share the same queue, and priority drop for packet flows with some application-specific structure built into them (e.g. MPEG flows, in which packets corresponding to I-frames, P-frames, and B-frames are in descending order of importance). In a conventional router, any change to the dropping policy would involve updating the kernel code to implement the new policy, recompiling the kernel, and finally shutting down the router to boot the modified kernel. In our active router, by contrast, we show that using the Protean active network architecture, the dropping policy can be modified on the fly without taking the router offline. In terms of performance analysis, the latter part of this section analyses the active router’s performance on three counts: (i) functional correctness, whether the active router performs according to its current configuration after the configuration has been changed on the fly; (ii) throughput, the throughput of the active router compared with a conventional router that has the new policies built into the kernel; and (iii) latency, where the latency suffered by packets in our active router is compared to that in a conventional router. We now discuss the precise mechanisms for instantiating application-specific services in the Protean router. When an application wants to customize an event-handler for a VNC, it first notifies the active router of which event-handler it wants to instantiate. Event-handlers are identified via unique service names.
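The pluggable drop policies discussed in this section can be sketched as follows; this is a hypothetical in-memory model, not the kernel implementation:

```java
import java.util.*;

// Sketch of pluggable drop policies for the "packet drop" event. A packet
// here is just (sequence number, priority); the policy picks the victim
// when the queue is full. Tail drop is the default described in the text;
// the others model the application-specific replacements.
public class DropPolicies {
    public static final class Packet {
        public final int seq, priority;
        public Packet(int seq, int priority) { this.seq = seq; this.priority = priority; }
    }

    public interface DropPolicy { Packet pickVictim(Deque<Packet> queue, Packet incoming); }

    public static final Random RNG = new Random(42);

    public static final DropPolicy TAIL = (q, in) -> in;             // drop the newcomer
    public static final DropPolicy HEAD = (q, in) -> q.pollFirst();  // drop the oldest
    public static final DropPolicy RANDOM = (q, in) -> {             // drop uniformly at random
        int k = RNG.nextInt(q.size() + 1);
        if (k == q.size()) return in;
        Packet victim = new ArrayList<>(q).get(k);
        q.remove(victim);
        return victim;
    };
    public static final DropPolicy PRIORITY = (q, in) -> {           // drop the lowest priority
        Packet victim = in;
        for (Packet p : q) if (p.priority < victim.priority) victim = p;
        if (victim != in) q.remove(victim);
        return victim;
    };

    // Enqueue with a capacity check: on overflow, invoke the drop policy
    // (the stand-in for the "packet drop" event handler) to pick a victim.
    public static Packet enqueue(Deque<Packet> q, int capacity, Packet in, DropPolicy policy) {
        if (q.size() < capacity) { q.addLast(in); return null; }
        Packet victim = policy.pickVictim(q, in);
        if (victim != in) q.addLast(in);
        return victim;
    }

    public static void main(String[] args) {
        Deque<Packet> q = new ArrayDeque<>(List.of(new Packet(1, 2), new Packet(2, 0)));
        // an I-frame (priority 2) arrives at a full 2-slot queue: the B-frame (priority 0) is dropped
        Packet dropped = enqueue(q, 2, new Packet(3, 2), PRIORITY);
        System.out.println("dropped seq " + dropped.seq);
    }
}
```

Swapping the policy object corresponds to installing a different event handler for the packet drop event, without touching the enqueue path.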
If the active router already has the event-handler, it updates the (event, event-handler) association for the corresponding event, adding one additional entry for the VNC id in the event’s event-handler table (see Figure 2). Otherwise, the router contacts the state manager, which is responsible for service dissemination. Service dissemination is achieved through a hierarchy of DNS-like state managers, which maintain the locally available services within a network cloud (and where those services are available), along with pointers for forwarding queries for services that are not available locally. Eventually, an active router seeking a service discovers where it can obtain the service and downloads the corresponding event-handler via an FTP-like bulk transfer. Event handlers are loadable modules that are dynamically loaded into kernel space. The pointer to the event handler in the event table is then updated. Instantiating a new service thus incurs a one-time seek-fetch-load overhead; subsequent service invocations occur in kernel space and are highly efficient. For this experiment, the initial configuration of the kernel uses a tail-drop policy for all queues (as most routers in the Internet do). Each of the three queues in the router is then reconfigured on the fly with a different drop policy: random drop for oq1, head drop for oq2, and priority drop for oq3. The dropping schemes use two pieces of local state, the flow’s queue and the incoming packet, to decide which packet to drop. We
[Figure: bar charts of the number of I-, P-, and B-frames received by Flow1 and Flow2.]
Fig. 2. Priority Dropping

[Figure: bar charts of the number of packets delivered to Flow1 and Flow2, against their fair share, with tail drop and with random drop.]
Fig. 3. Random Dropping
now present the performance evaluation through two sets of results, one showing the functional correctness of the ‘programmed router’ and the other showing the latency overhead induced by the active network component in the router. For the first part, we present three graphs showing the functional correctness of the router programmed with priority dropping, random dropping, and head dropping respectively. For the priority dropping policy, two MPEG streams (with the priorities of I, P, and B frames set to 2, 1, and 0 respectively) were used. The first MPEG stream (flow1) does not use priority dropping, while the second stream (flow2) programs the router with a priority dropping mechanism (by injecting the appropriate code). Graph 2 shows the difference between the number of I, P, and B frames received by the receivers of flows 1 and 2. To show the functional correctness of the random dropping mechanism, the test measured the fairness that two flows enjoy when they share the same queue in the bottleneck router. Graph 3 shows the number of packets that got through for the two flows when the tail drop mechanism (the router’s default policy) was used and when a random drop mechanism was used. The graph shows that fairness improves when the router is programmed to perform random dropping rather than tail dropping. Graph 4 illustrates the performance of two flows, one using tail drop and the other using head drop, in terms of the effectiveness of the packets that get through. For the purposes of this test, the effectiveness of packets increases with the sequence number. The graph shows the net effectiveness observed by the two flows under the two drop policies. As expected, the effectiveness for the flow using head drop is much higher than that of the other flow.
Graph 5 shows the latency observed by a flow traversing a conventional router employing a priority drop mechanism and the latency observed by a flow traversing a Protean active router that has been programmed to perform priority dropping. Since Protean event handlers are instantiated in the kernel, once the handler is installed the latency difference is close to zero. The separation between the two curves represents the time taken to instantiate the priority drop event handler in the Protean router, which was observed to be around 230 ms.
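The per-event handler table used in the mechanism above (one entry per VNC id, with a default handler as fallback) can be sketched as follows; the names and string-valued handlers are illustrative:

```java
import java.util.*;

// Sketch of the per-event handler table: each event maps a VNC id to its
// customized handler, falling back to the default (e.g. tail drop) when a
// flow's VNC has not installed one. All names are illustrative.
public class EventTable {
    public interface Handler { String invoke(); }

    private final Map<String, Map<Integer, Handler>> table = new HashMap<>();
    private final Map<String, Handler> defaults = new HashMap<>();

    public void setDefault(String event, Handler h) { defaults.put(event, h); }

    // Called after the seek-fetch-load step: bind a handler for one VNC.
    public void install(String event, int vncId, Handler h) {
        table.computeIfAbsent(event, e -> new HashMap<>()).put(vncId, h);
    }

    public String dispatch(String event, int vncId) {
        Map<Integer, Handler> perVnc = table.getOrDefault(event, Collections.emptyMap());
        Handler h = perVnc.getOrDefault(vncId, defaults.get(event));
        return h.invoke();
    }

    public static void main(String[] args) {
        EventTable t = new EventTable();
        t.setDefault("packet-drop", () -> "tail-drop");
        t.install("packet-drop", 7, () -> "priority-drop"); // VNC 7 reprograms its queue
        System.out.println(t.dispatch("packet-drop", 7));   // priority-drop
        System.out.println(t.dispatch("packet-drop", 3));   // tail-drop (default)
    }
}
```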
[Figure: effectiveness factor over time for a flow using head drop and a flow without head drop.]
Fig. 4. Head Dropping

[Figure: per-packet latency in microseconds versus packet sequence number for a flow traversing a normal router and a flow traversing the active router; the curves are separated by the roughly 230 ms handler instantiation time.]
Fig. 5. Active Component’s Latency
5 Summary
In this paper, we have described elements of the Protean active router and state management architecture, and presented an illustrative case study of Protean. The key aspects of Protean are its event-driven architecture, the ability to create hierarchical virtual network contexts, and the ability to access and update non-local state from specialized services invoked at a router. At this point, Protean is still in the preliminary stages of design and development, and several important issues remain to be resolved. However, we believe that the architecture has some features of interest and may offer new perspectives on the key issues of state management and flexible service creation in active networks.
References
1. D. L. Tennenhouse and D. J. Wetherall. Towards an Active Network Architecture. Computer Communication Review, 26(2), April 1996.
2. Jonathan M. Smith et al. The SwitchWare Active Network Architecture. IEEE Network, Special Issue on Active and Controllable Networks, 12(3):29–36.
3. R. Sivakumar, N. Venkitaraman, and V. Bharghavan. A Scalable Architecture for Active Networks. TIMELY Group Research Report, 1999.
4. S. Guha and S. Khuller. Approximation Algorithms for Connected Dominating Sets. Tech. Rep. 3660, Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, College Park, June 1996.
A Dynamic Pricing Framework to Support a Scalable, Usage-Based Charging Model for Packet-Switched Networks
Mike Rizzo, Bob Briscoe, Jérôme Tassel, and Konstantinos Damianakis
Distributed Systems Group, BT Labs, Martlesham Heath, Ipswich IP5 3RE, England
{michael.rizzo,bob.briscoe,jerome.tassel,konstantinos.damianakis}@bt.com
Abstract. We describe a dynamic pricing framework designed to support a radical approach to usage-based charging for packet-switched networks. This approach addresses various scalability issues by shifting responsibility for accounting and billing to customer systems. The ultimate aim is to create an active multi-service network which uses pricing to manage the supply and demand of resources. In this context, the role of the dynamic pricing framework is to enable a provider to establish ‘active tariffs’ and communicate them to customer systems. These tariffs take the form of mobile code for maximum flexibility, and the framework uses an auditing process to provide a level of protection against incorrect execution of this code on customer systems. In contrast to many active network proposals, the processing load is moved away from routers to the edge of the network.
1 Introduction
As the Internet continues to grow and evolve into a global, multi-service network, the issue of how to charge fairly and sensibly for network services is becoming increasingly relevant. The flat-rate charging model, currently used by virtually all ISPs worldwide, relies heavily on characteristics of the present best-effort Internet which may no longer hold in the near future. For example, the inability of the present Internet to offer differential services means that it does not make sense to speak of higher prices for better services. And the bandwidth limitations associated with dial-up connections provide a convenient cap on resource usage by any one individual, thereby protecting providers’ routers from being hogged by a single user at the expense of other users. It is envisaged that the current best-effort Internet will gradually be replaced by a network that can offer differential levels of network service better suited to the individual needs of specific applications. For example, a video-on-demand application might use a high-bandwidth, low-jitter, reservation-based service, whilst email would continue to use the best-effort service. Furthermore, access bandwidth for end-users is expected to increase in order to enable the provision of high-quality multimedia services. In this scenario, the flat-rate charging model gives rise to the anomalous situation wherein

Stefan Covaci (Ed.): IWAN’99, LNCS 1653, pp. 48–60, 1999. © Springer-Verlag Berlin Heidelberg 1999
A Dynamic Pricing Framework
49
users that make heavy demands on network resources are charged the same amount as users that make lighter demands. Moreover, the service offered to lighter-demand users is likely to be impaired by the provision of services to heavier-demand users. This situation is likely to result either in convergence back towards a single-service network (where users will always request the best service possible), or in denial of service to users whenever network resources are operating at maximum capacity. It makes sense, therefore, to abandon flat-rate charging in favour of a usage-based model in which there is a relationship between price and resource usage. Indeed, such a model might also have been considered for the current best-effort Internet, were it not for the substantial increase in operational complexity involved. Whilst flat-rate charging is extremely easy to implement, usage-based charging requires that some form of usage accounting be carried out before a charge can be computed. It is generally accepted that the additional operational cost associated with such accounting is substantial, due to the increase in processing power required, not only to cope with the accounting processes as such, but also to compensate for the blocking nature of the measurement process, which has a negative effect on throughput. Consequently, usage-based models are not yet considered viable, and many proposals have focused on compromise solutions based on aggregation [3,9,8,11]. As part of a project investigating radical approaches to operational support systems, we are investigating the possibility of lowering the operational cost of usage-based charging by shifting responsibility for billing to the users themselves. We propose that users measure their own traffic and compute their own bills using tariffs supplied by the provider.
This spreads the load so that each processing unit uses a near-negligible amount of resources for billing purposes, all the more so when one considers that most users’ machines spend much of their time idle. It also allows network routers to focus on their principal function without sacrificing throughput. Using this approach it is possible to charge users differentially on the basis of both volume and quality of service. This gives providers a degree of control over resource usage, because tariff structures can be designed to give users an incentive to use the minimal amount of resources that meets their requirements. Furthermore, finer-grained control over the supply and demand of network resources can be achieved by price variation along the lines of established economic supply-and-demand principles. At times when resources are in short supply, demand is curbed by raising service prices. Conversely, when resources are under-utilised, demand is stimulated by lowering prices. If the decision-making involved in changing prices is (partially) automated, then the result is an intelligent active network which performs its own supply and demand management.
In this paper we limit ourselves to describing a general framework which can support this concept. It is beyond the scope of the paper to present arguments related to the desirability or extent of automated decision-making in this regard.
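A minimal sketch of such an automated price controller, assuming a simple multiplicative update rule around a target utilisation (the rule, constants, and names are our own, not from the paper):

```java
// Toy supply-and-demand price controller: raise the price when measured
// utilisation is above a target, lower it when below, clamped to the
// bounds of a pre-defined contract. The update rule and all constants
// are illustrative assumptions.
public class PriceController {
    public static double nextPrice(double price, double utilisation, double target,
                                   double gain, double floor, double ceiling) {
        double p = price * (1.0 + gain * (utilisation - target));
        return Math.max(floor, Math.min(ceiling, p));  // respect contract bounds
    }

    public static void main(String[] args) {
        double p = 1.0;
        p = nextPrice(p, 0.9, 0.6, 0.5, 0.1, 10.0);  // congested: price rises, curbing demand
        System.out.println(p);
        p = nextPrice(p, 0.3, 0.6, 0.5, 0.1, 10.0);  // under-utilised: price falls, stimulating demand
        System.out.println(p);
    }
}
```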
50
Mike Rizzo et al.
This approach immediately raises several questions, particularly with respect to trust, stability, security, and user acceptance. Some of these questions are briefly covered in Sections 2 and 3, and are expanded on in another paper [6]. The principal focus of this paper, however, is the framework that enables a provider to communicate tariffs and price variations to its customers. Following a broad overview of our approach to charging in Section 2, Section 3 outlines the issues of specific concern to tariff representation, dissemination, and application. Section 4 describes a prototype that was developed to demonstrate the approach and to gain experience in the choice of suitable implementation techniques. Section 5 follows with indications for further work. Finally, Section 6 concludes with some general implications for active networks.
2 Background
We assume a packet-switched network in which a variety of network services are made available to users. The exact nature of these services, and the specific characteristics that form the basis upon which they might be differentiated, are not important for our purposes, and may vary from one provider to another. However we assume that, in general, service usage may be measured on the basis of packet counts, and may be classified using some notion of quality of service, irrespective of whether this is reservation-based [4] or class-based [2].
Fig. 1. Radical charging model: processes and flow of information
In the proposed charging model, the customer system is responsible for accounting for usage under the instruction of the provider. The provider supplies tariffs for each of the available services, along with other information pertaining to their application e.g. how frequently they should be applied. The customer’s system measures and categorizes both inbound and outbound traffic, applies the
appropriate tariffs for each category of traffic, and periodically sends accounting reports to the provider. The customer’s system might also be responsible for making payments, although this may be delegated to some other entity. The various processes and data flows are depicted in Fig. 1. This model clearly places a lot of trust in customer systems. Our view is that this does not pose a problem, as long as the provider is able to check up on a sample of its customers from time to time. A random audit function may be employed by the provider’s accounting process to make measurements pertaining to a particular customer at the provider end, and to verify that the customer’s accounting reports tally with the observations made. The model is not targeted solely at the edge of a packet network, but is intended to be applied recursively throughout the network. Thus an access provider might be the customer of a larger provider, which may in turn be the customer of a backbone provider. A multi-host edge customer might also employ a similar model within its network in order to recover costs. Whilst charging for network use is likely to be uni-directional at the edge of the network, this is not the case in general. The distinction between provider and customer becomes somewhat blurred as one approaches the core of the network. There is also a charging issue related to the direction of traffic: should a chargeable entity pay for packets sent, packets received, or both? In general there are four possible charges between two entities A and B:

– A charges B to send packets to it;
– A charges B to receive packets from it;
– B charges A to send packets to it;
– B charges A to receive packets from it.

An entity, therefore, can assume the roles of both provider and customer with respect to some other entity. We allow charging for any combination of the above, to give maximum flexibility with respect to traffic direction when establishing charging policies and tariffs.
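The customer-side accounting loop described in this section can be sketched as follows; the category names and per-packet tariffs are illustrative, not from the paper:

```java
import java.util.*;

// Sketch of the customer-side accounting loop: traffic is metered per
// service category, the provider-supplied tariff for that category prices
// each measurement interval, and the totals accumulate into a periodic
// accounting report sent back to the provider. Names are illustrative.
public class CustomerAccounting {
    public interface Tariff { double charge(long packetsIn, long packetsOut); }

    private final Map<String, Tariff> tariffs = new HashMap<>();
    private final Map<String, Double> report = new HashMap<>();

    public void setTariff(String category, Tariff t) { tariffs.put(category, t); }

    // Called once per measurement interval with locally observed packet counts.
    public void account(String category, long packetsIn, long packetsOut) {
        double c = tariffs.get(category).charge(packetsIn, packetsOut);
        report.merge(category, c, Double::sum);
    }

    // The periodic report sent to the provider; local totals are then reset.
    public Map<String, Double> popReport() {
        Map<String, Double> r = new HashMap<>(report);
        report.clear();
        return r;
    }

    public static void main(String[] args) {
        CustomerAccounting acc = new CustomerAccounting();
        acc.setTariff("best-effort", (in, out) -> 0.001 * (in + out));
        acc.setTariff("reserved",    (in, out) -> 0.01 * out);
        acc.account("best-effort", 500, 500);
        acc.account("reserved", 0, 200);
        System.out.println(acc.popReport());
    }
}
```

The provider's random audit would meter the same traffic at its end and compare the result with the reported totals.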
3 The Tariffing Subsystem
Having outlined the general principles of our charging model, this section focuses on the role of the tariffing subsystem, which comprises:

– establishment and adjustment of tariffs on the provider side;
– dissemination of tariffs and adjustments to customer systems;
– application of tariffs by customer systems to local measurements.

The remainder of this section characterizes the requirements attached to this role, setting the scene for the subsequent section.
3.1 The Nature of Tariffs
We assume that a provider may set a separate tariff for each category of service that it offers, and that for a particular category of service, a tariff may change periodically. We do not exclude the possibility that a tariff may change frequently e.g. in response to changing traffic patterns. However, we distinguish between two kinds of change, namely replacing a tariff and adjusting a tariff. The former implies substitution of an old tariff by a new one, whilst the latter involves ‘tuning’ an existing tariff. We envisage that tariff adjustments will occur more frequently than tariff replacements. There is a clear distinction between ‘tariff’ and ‘price’. A tariff is responsible for determining a price with respect to a set of given contextual parameters. It is therefore possible for a price to change without there being a change to the tariff that determines it. For example, a traditional PSTN tariff might offer one price for peak hours, and another price for off-peak hours. Here price varies according to the time of day, but the tariff remains constant. Continuing with this example, a tariff adjustment might involve changing the off-peak price, or perhaps the times of day at which off-peak is considered to start. If an altogether different tariff structure is required e.g. due to the introduction of a new discount scheme, then a tariff replacement is required. It is a goal of the model to allow maximum flexibility in the structure of tariffs. Ideally it should be possible for tariffs to be modelled on complex rules. For example, a provider may wish to deploy a tariff for best-effort traffic which operates such that customers are penalized if their systems do not back off in the presence of congestion. It is also desirable to put as much intelligence as possible into tariffs, so as to avoid frequent transmission of tariff changes to customers. 
This is particularly relevant at times when network congestion is high, in which case a tariff should be capable of making price adjustments without having to receive explicit instruction from the provider.
3.2 Supply and Demand Management
Our model allows on-the-fly changes to prices, and can be used to support supply and demand management wherein prices fluctuate on the basis of current demand. This concept may come across as too radical to some. However, it is worth pointing out that many people are quite happy to purchase variable-rate mortgages, or to invest in the stock market. And just as other people pay a fee for a fixed-rate mortgage, or are prepared to commit themselves to a safer long-term savings plan, it is quite conceivable that they will be prepared to pay for their price to be kept fixed, or for price variations to be constrained in accordance with some pre-defined contract. The notion that there may be different charging schemes for a given service category leads us to the concept of a product. For example, a product A might offer best-effort service at a fixed price, whilst another product B might offer best-effort service at a variable price. It is envisaged that a provider will adjust product prices on the basis of observations it makes with respect to:
– the prices it is being offered by its own providers;
– competitors’ prices;
– current resource utilisation;
– relative demand for different products, e.g. the price for a particular product might be lowered so as to entice users to switch to it.
Price adjustments can be effected in one of three ways:

– A tariff may be able to adjust prices on the basis of observations made by local monitoring, without necessitating explicit communication from the provider. This requires foresight at the time the tariff is designed, and is limited to those price variations which depend exclusively on observations local to the customer system.
– The provider may tune a tariff by adjusting some of its parameters. This kind of adjustment is required when the decision depends on observations which cannot be made by customer systems, e.g. variations in the prices offered to the provider by its own providers, and the changes required can still be accommodated by the present tariff.
– The provider may replace a tariff. This is required when the present tariff cannot accommodate the changes that are required.

The first of these is by definition an automated decision. The second may be performed either manually or by an agent that issues adjustments on the basis of observations made by the provider system. The third is likely to be performed manually, as replacement of a tariff represents a major change in business strategy. In particular, creation of a new tariff involves an element of design which can only sensibly be carried out by a human with expertise in economics. However, given the availability of a repertoire of tariffs, an agent might be employed to switch tariffs for a product automatically on the basis of a set of specified rules. Given the possibility of frequent, on-the-fly changes, it is important that customers have some way of knowing what is going on. It is difficult to construct a customer user interface that can convey the workings of a tariff if the tariff is not known at the time the customer software is deployed. It is therefore desirable that the rules that define tariffs are accompanied by user interfacing suggestions that can somehow be used by the customer system.
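The distinction between local adjustment, provider-issued tuning, and tariff replacement can be sketched with a hypothetical time-of-day tariff; all rates and names below are our own:

```java
// Sketch of tariff adjustment vs. replacement: this tariff prices packets
// at a peak or off-peak rate, can adjust the off-peak rate on provider
// instruction (a "tune"), and adds a penalty under locally observed
// congestion without any message from the provider. Rates are illustrative.
public class TimeOfDayTariff {
    private final double peakRate;
    private double offPeakRate;            // tunable parameter
    private final double congestionPenalty;

    public TimeOfDayTariff(double peak, double offPeak, double penalty) {
        this.peakRate = peak;
        this.offPeakRate = offPeak;
        this.congestionPenalty = penalty;
    }

    // Provider-issued adjustment: tunes the existing tariff in place.
    // Replacing the whole class would correspond to a tariff replacement.
    public void tuneOffPeak(double newRate) { offPeakRate = newRate; }

    // Price per interval; congestion is observed locally by the customer system.
    public double charge(long packets, int hourOfDay, boolean congested) {
        double rate = (hourOfDay >= 8 && hourOfDay < 18) ? peakRate : offPeakRate;
        if (congested) rate += congestionPenalty;
        return packets * rate;
    }

    public static void main(String[] args) {
        TimeOfDayTariff t = new TimeOfDayTariff(0.02, 0.005, 0.01);
        System.out.println(t.charge(100, 12, false)); // peak rate applies at noon
        t.tuneOffPeak(0.004);                          // an adjustment, not a replacement
        System.out.println(t.charge(100, 23, false)); // new off-peak rate applies
    }
}
```

Note that the price paid can change (by hour, or under congestion) while the tariff itself stays constant, matching the tariff/price distinction drawn above.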
4 Implementation
This section describes a prototype that we implemented to demonstrate the tariff subsystem outlined above. The key features of our design include:
This is intended for feedback purposes only. We envisage that the customer system will also employ an agent (potentially supplied by a regulator) to monitor tariffs for the purposes of verifying that they are within the bounds of the contract.
– using mobile code to represent tariffs and associated graphical user interface (GUI) components;
– use of a repeated multicast announcement protocol to communicate tariffs and tariff adjustments efficiently;
– using dynamic class loading and reflection in order to receive and tune tariffs.

The prototype comprises two applications, namely:

– a provider system which allows the provider to introduce, replace, and tune tariffs for a number of products;
– a customer system that enables customers to keep track of the charges being applied for the products they are using.

The provider system is intended to serve multiple instances of the customer system running on different hosts in a multicast-enabled network. A multicast protocol is used to communicate tariff data to customer systems.
4.1 Tariff Representation
Fig. 2. UML description of tariff definition framework
In order to maximize flexibility with respect to the definition of tariffs, we chose to represent tariffs using Java classes. This technique also proved useful for supplying custom-built GUI components to support visualisation of tariffs. Figure 2 illustrates the framework within which tariffs are defined. The Tariff interface acts as the base type for all tariffs. It defines a single operation, getGUI(), which returns a Java Swing component that can be incorporated into the customer’s GUI. The intention is that this GUI component will enable the customer to visualise the behaviour of the tariff using the most appropriate user interfacing techniques for that tariff. Interfaces derived from Tariff establish a set of tariff types, each of which is associated with a different set of measurement parameters. These parameters are identified by listing them in the signature of the getCharge() method. For example, the interface RSVPTariff defines getCharge() as receiving an RSVP TSPEC, allowing for the definition
A Dynamic Pricing Framework
55
of tariffs that compute price on the basis of the characteristics of an RSVP reservation [1]. Another interface, PacketCountTariff, defines getCharge() as receiving measurements of packets in, packets out, and current congestion (typically measured as a function of packet drop), allowing for the definition of tariffs that are dependent on packet counts and sensitive to congestion. Tariffs are defined by providing implementations of tariff interfaces. For example, PacketCountLinear implements PacketCountTariff to compute charges in proportion to packet counts. CongestionSensitiveLinear works on a similar basis, but adds a penalty charge if the customer does not stay within specified traffic limits in the presence of congestion. A tariff implementation may make use of other ‘helper’ classes to assist it in its operation, as well as one or more GUI component classes for customer visualisation purposes. A GUI may also be required to enable the provider to make tariff adjustments. A complete tariff description, then, consists of a set of Java classes, some of which are destined for the customer system and others which are intended for use by the provider system. The customer-side classes are bundled into a Java JAR file to facilitate loading by the provider system.
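The framework of Figure 2 can be sketched as Java interfaces roughly as follows. The operation names (getGUI, getCharge) and the class names come from the text; the exact parameter types, the rate parameters, and the placeholder GUI are our assumptions.

```java
import javax.swing.JComponent;
import javax.swing.JLabel;

// Sketch of the tariff definition framework (Figure 2). Signatures
// are assumed; the paper only names the operations and classes.
interface Tariff {
    JComponent getGUI();   // tariff-specific Swing visualisation
}

// A tariff type: its measurement parameters are fixed by the
// signature of getCharge().
interface PacketCountTariff extends Tariff {
    double getCharge(long packetsIn, long packetsOut, double congestion);
}

// A concrete tariff: charge proportional to packet counts.
class PacketCountLinear implements PacketCountTariff {
    private final double rateIn, rateOut;

    PacketCountLinear(double rateIn, double rateOut) {
        this.rateIn = rateIn;
        this.rateOut = rateOut;
    }

    public double getCharge(long packetsIn, long packetsOut, double congestion) {
        return rateIn * packetsIn + rateOut * packetsOut;  // congestion ignored
    }

    public JComponent getGUI() {
        return new JLabel("linear packet-count tariff");   // placeholder GUI
    }
}
```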
4.2 Tariff Dissemination and Adjustment
In order to deploy a new tariff, the provider system first loads the tariff classes which it requires into its execution environment. It then loads the customer-side bundle, serializes it, signs it with a private key (to enable authentication by customers), and uses an announcement protocol to distribute it to customer systems. Upon receiving the bundle, each customer system verifies the signature, unpacks the bundle, and loads the classes into its execution environment using a purpose-built dynamic class loader. An instance of the received tariff class is created and installed in place of the previous tariff. If the tariff has a GUI component (obtained by calling the tariff object’s getGUI() method), then it replaces the GUI of the previous tariff. The change in GUI serves to notify the user that the tariff has changed.
Tariff adjustment involves the remote invocation of an operation which is specific to the tariff currently in force. This means that a customer system cannot know the signature of this operation in advance of receiving the tariff, i.e. the operation will not be listed in any of the tariff interfaces known to the customer system. To get around this problem, we make use of the reflection feature supported by Java. In order to disseminate a tariff adjustment, the provider creates an Invocation object, which stores the name of the operation to be called, together with the parameters that are to be supplied to it. This object is then serialized, signed, and announced using the announcement protocol. When an adjustment is received and verified by a customer system, the Invocation object is de-serialized and applied to the current tariff by using reflection to invoke the described operation.
In order to simplify the announcement protocol, adjustments are required to be idempotent and complete. Idempotency guarantees that a tariff will not be adversely affected if an adjustment is applied more than once. Completeness
implies that an adjustment determines the entire parameter set of a tariff object, so that an adjustment completely removes the effect of any previous adjustments.
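The reflection-based adjustment mechanism can be sketched as below. The field layout of Invocation and the method-matching strategy are our assumptions; the paper specifies only that the object carries an operation name plus parameters and is applied via reflection.

```java
import java.io.Serializable;
import java.lang.reflect.Method;

// Sketch of the Invocation object described above (layout assumed).
// A production implementation would also match parameter types, not
// just the argument count.
class Invocation implements Serializable {
    private final String operation;
    private final Object[] args;

    Invocation(String operation, Object... args) {
        this.operation = operation;
        this.args = args;
    }

    // Apply this adjustment to the tariff currently in force, using
    // reflection to invoke an operation whose signature the customer
    // system could not have known in advance.
    void applyTo(Object tariff) {
        try {
            for (Method m : tariff.getClass().getMethods()) {
                if (m.getName().equals(operation)
                        && m.getParameterCount() == args.length) {
                    m.invoke(tariff, args);
                    return;
                }
            }
            throw new NoSuchMethodException(operation);
        } catch (Exception e) {
            throw new RuntimeException("adjustment failed", e);
        }
    }
}
```

Because adjustments must be idempotent and complete, applying the same Invocation twice, or applying only the latest one, must leave the tariff in the same state.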
4.3 Tariff Application
The customer system applies a tariff by invoking the getCharge() operation supported by that tariff once every second, and adding the returned value to the cumulative charge. The parameters supplied to getCharge() depend on the kind of tariff currently in force. For example, if the tariff is an implementation of PacketCountTariff, then measurements of inbound packets, outbound packets and congestion over the past second are required. However, if the tariff is an implementation of RSVPTariff, then only a TSPEC describing the current reservation is required³. Each invocation of getCharge() also results in an update to the tariff-specific GUI, e.g. in CongestionSensitiveLinear, the usage parameters supplied to getCharge() are used to update the graphical displays of traffic and congestion.
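The per-second charging loop can be sketched as follows. This is a self-contained illustration, not code from the prototype: the simplified tariff interface and the accumulator class are stand-ins.

```java
// Sketch of the customer-side charging loop: once per accounting
// interval (one second in the prototype) getCharge() is invoked with
// fresh measurements and the result is accumulated.
interface MeteredTariff {
    double getCharge(long packetsIn, long packetsOut, double congestion);
}

class ChargeAccumulator {
    private double cumulative = 0.0;

    // One accounting tick, using the measurements gathered over the
    // last interval.
    void tick(MeteredTariff tariff, long in, long out, double congestion) {
        cumulative += tariff.getCharge(in, out, congestion);
        // ...here the prototype would also refresh the tariff's GUI...
    }

    double total() { return cumulative; }
}
```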
4.4 Announcement Protocol
The announcement protocol is used to communicate serialized tariffs and adjustments from a provider system to multiple customer systems. The number of customer systems is assumed to be large, and a repeated multicast solution in the vein of SAP [10] is adopted. Each product supported by a provider is assigned a multicast channel for announcement purposes. Customer systems listen to the channels corresponding to the products that they are using. For each product channel, the provider repeatedly announces the current tariff and the most recent adjustment made to it (if any). Each announcement carries a version number, which is incremented each time the announcement is changed. Customer systems only process announcements when a version number change is detected. If a new customer joins a channel, it waits until it receives a tariff before processing any adjustment announcements. Furthermore, an adjustment is only applied if its announcement version is greater than that of the current tariff, thereby ensuring that a missed tariff announcement does not result in the application of a subsequent adjustment to an old tariff.
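The version-number rules above amount to a small piece of filtering logic. The class below is our own illustration of that logic, not code from the prototype:

```java
// Customer-side announcement filtering: tariff announcements are
// processed only on a version change, and an adjustment is applied
// only if it is newer than the installed tariff.
class AnnouncementFilter {
    private long tariffVersion = -1;   // -1: no tariff received yet

    // Returns true if this tariff announcement should be processed.
    boolean acceptTariff(long version) {
        if (version == tariffVersion) return false;  // repeated announcement
        tariffVersion = version;
        return true;
    }

    // Returns true if this adjustment announcement should be applied.
    boolean acceptAdjustment(long version) {
        if (tariffVersion < 0) return false;  // wait for a tariff first
        return version > tariffVersion;       // never adjust a stale tariff
    }
}
```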
4.5 Illustration
Figure 3 shows the GUI for the customer system with the GUI component for the CongestionSensitiveLinear tariff embedded within it. The latter displays information about traffic and congestion levels, and indicates traffic limits which must be observed when congestion is above a specified threshold. The formula used to compute the current price is displayed in the bottom right corner. In the
³ Mention of this tariff is intended purely for illustration purposes, and does not necessarily represent a realistic or sensible way to charge for RSVP reservations.
Fig. 3. Customer interface
case depicted, congestion is above the threshold and the incoming traffic level is above its limit, with the result that a penalty of 0.2 is added to the price. The upper part of the customer GUI displays the current charge being applied (per second), and the total charge accumulated by the customer. The GUI also allows the user to specify the public key to be used for authentication purposes, and shows details of the multicast address being listened to for announcements. The provider system GUI consists of a set of product windows, each allowing control over a particular product. Figure 4 shows the provider-side window corresponding to the product being used by the customer system shown earlier. The lower part of the window contains the provider-side GUI component for the CongestionSensitiveLinear tariff. Using this interface, the provider can manually adjust the parameters associated with the current tariff. Any adjustments are communicated to customer systems using the announcement protocol, and are immediately reflected in customer-side GUI components. The upper part of the product window allows the provider to replace the tariff currently associated with that product. Each tariff is fully described by a policy, which contains such details as a tariff name, a descriptive string, and more importantly the location of the JAR file on the provider’s file system. Once a new policy has been selected, the ‘Activate’ button injects the corresponding tariff into the network, instantly replacing the existing tariff for that product.
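For concreteness, the charge rule that the CongestionSensitiveLinear GUI visualises might look roughly like the sketch below. All parameter values and the exact functional form are illustrative assumptions; the paper specifies only a usage charge linear in packet counts plus a fixed penalty (0.2 in the depicted case) when the incoming traffic limit is exceeded under congestion.

```java
// Hypothetical sketch of the CongestionSensitiveLinear charge rule
// illustrated in Figure 3. Parameter values are illustrative only.
class CongestionSensitiveLinear {
    double rate = 0.001;              // price per packet (in + out)
    double congestionThreshold = 0.5; // congestion level that arms the penalty
    long trafficLimit = 1000;         // per-second incoming packet limit
    double penalty = 0.2;             // flat penalty, as in the figure

    double getCharge(long in, long out, double congestion) {
        double charge = rate * (in + out);
        if (congestion > congestionThreshold && in > trafficLimit) {
            charge += penalty;        // limit exceeded under congestion
        }
        return charge;
    }
}
```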
Fig. 4. Product window (in provider GUI)
5 Further Work
To date we have focused primarily on the establishment of general principles related to dynamic pricing, and the technical infrastructure required to support these principles. However, we have not identified those specific configurations of the framework which are economically viable or socially acceptable. This can only be achieved by a combination of rigorous modelling and experimentation with user trials. We intend to continue to develop the existing infrastructure into a testbed for experimenting with different tariffing schemes and for conducting user trials, in order to gain experience with relevant human factors issues.
We are currently working on improving a number of aspects of our current implementation, particularly with respect to the announcement protocol. Currently this makes a number of assumptions which are not valid in general. For example, it assumes that a tariff will fit in a single datagram. At best this leads to a packet fragmentation problem, but at worst it means that larger tariffs cannot be announced. There is also a problem in that well-known announcement addresses are expected to be known in advance by customers. This does not give the provider any flexibility with respect to channel assignments, e.g. the provider may wish to move a product to share a channel with another product if it observes that a large number of customers are using both products simultaneously. Last but not least, there are a host of timing issues which need to be addressed, e.g. working with multiple independent physical clocks.
6 Concluding Remarks
The dynamic pricing framework described in this paper demonstrates an active network approach to demand and supply management of network resources. This is relevant to the debate over whether overprovisioning is likely to be more cost-effective than rationing of resources in a multi-service network [12,5]. By lowering the operational cost of usage-based charging, and by providing an infrastructure
within which resource rationing mechanisms can be adjusted and fine-tuned as required, many of the arguments against resource rationing are invalidated.
Additionally, our experience with the dynamic pricing framework has some interesting bearings on the general areas of active networks and mobile code. One important point relates to the fact that active networks need not rely solely on the processing capacity of provider equipment. In particular, for applications with high processing loads, it may be possible to shift much of this load right up to the very edge of the network, using multicast technology for efficient deployment of mobile code. Furthermore, it may be possible to exercise some control over core network elements as a side-effect of mobile code deployed to the edge of the network. In this respect, the dynamic pricing framework provides an interesting contrast to mainstream thinking on active networks, where the emphasis is normally on deploying mobile code to network routers.
Another point relates to the well-known security problem concerning the protection of mobile code from malicious or erroneous execution platforms. The sample-based auditing approach adopted in our charging model does not represent a complete solution to this problem, but is a reasonable compromise which can detect some cases of abuse whilst acting as a deterrent in general.
The class loader used for deployment of mobile code in our implementation differs substantially from other approaches to dynamic loading of remote classes in Java, as exemplified by Bursell et al. [7]. Instead of loading each class individually from a remote class repository using a request-reply ‘pull’ protocol, we employ a ‘push’ approach in which a bundle of classes is delivered to the receiver in a single transaction. The receiver can then load all the classes without having to access the network.
This is useful in situations where the sender determines which classes the receivers should be loading, and has the advantages that the effects of network latency are minimized, and that multicast may be employed to push bundles to several receivers simultaneously.
References
1. S. Berson, R. Lindell, and R. Braden. An architecture for advance reservations in the internet. Technical report, USC Information Sciences Institute, July 1998.
2. S. Blake et al. An architecture for differentiated services. Request for Comments (Proposed Standard) 2475, Internet Engineering Task Force, December 1998.
3. Roger Bohn et al. Mitigating the coming internet crunch: multiple service levels via precedence. Technical report, University of California, San Diego, Nov 1993.
4. R. Braden et al. Integrated services in the internet architecture: an overview. Request for Comments (Proposed Standard) 1633, IETF, Jun 1994.
5. Lee Breslau and Scott Shenker. Best-effort versus reservations: A simple comparative analysis. In Proceedings of SIGCOMM ’98, Vancouver, 1998.
6. Bob Briscoe et al. Lightweight, end to end usage-based charging for packet networks, 1999. http://www.labs.bt.com/projects/mware/charging.htm.
7. M. H. Bursell et al. A mobile object workbench. In Mobile Agents ’98, 1998.
8. David D. Clark. A model for cost allocation and pricing in the internet. In MIT Workshop on Internet Economics, March 1995.
9. Jon Crowcroft. Pricing the internet. In IEE Colloquium on Charging for ATM (ref. no. 96/222), pages 1/1–4, November 1996.
10. M. Handley. SAP: Session announcement protocol. IETF Draft, Nov 1996.
11. Frank P. Kelly. Charging and Accounting for Bursty Connections, pages 253–278. MIT Press, 1997.
12. Andrew Odlyzko. The economics of the internet: Utility, utilization, pricing, and quality of service. In Proceedings of SIGCOMM ’98, Vancouver, 1998.
Active Information Networks and XML
Ian Marshall¹, Mike Fry², Luis Velasco¹, and Atanu Ghosh²
¹ BT Labs, Martlesham Heath, Ipswich, IP5 3RE
[email protected]
² UTS, Sydney, NSW 2007, Australia
atanu,[email protected]
Abstract. Future requirements for a broadband multimedia network are discussed and a vision of the future network is presented. Three key needs are identified: rapid introduction of new services, dynamic customisation of services by clients, and minimal management overhead. Application layer active networking, perhaps the most pragmatic and immediately realisable active network proposal, is a potential solution to all three. Combining the eXtensible Markup Language (XML) and application layer active networking yields strong benefits for networked services. A wide range of applications can be developed based on the flexibility of XML and the richness of expression afforded by the metadata. A system of network intermediaries based on caches, which are also active and driven by XML metadata statements, is described.
1 Introduction
The characteristics and behaviour of future network traffic will be different from the traffic observed today, generating new requirements for network operators. Voice traffic will become another form of data, most users will be mobile, the amount of traffic generated by machines will exceed that produced by humans, and the data traffic will be dominated by multimedia content. In the medium term the predominant multimedia network application will probably be based around electronic commerce capabilities. Operators will therefore need to provide a low cost service, which offers an advanced global trading environment for buyers and sellers of any commodity. The e-trading environment will be equipped with all the instruments to support the provision of a trusted trading space. Most important is the ability to support secure transactions over both fixed and mobile networks. Networks will thus need to be robust, contain built-in security features, and be sufficiently flexible to address rapidly evolving demands as other unforeseen applications become predominant.
Existing networks are very expensive, and the deployment of new communication services is currently restricted by slow standardisation, the difficulties of integrating systems based on new technology with existing systems, and the overall system complexity. The biggest cost is management. The network of the future will
Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 60-72, 1999. © Springer-Verlag Berlin Heidelberg 1999
Active Information Networks and XML
61
need to be kept as simple as possible by using as few elements as possible, removing duplication of management overheads, minimising signalling, and moving towards a hands-off network. The simplest (and cheapest) current networks are multiservice networks based on powerful ATM or IP switches. New transport networks are designed on the basis that nearly all applications will eventually use internet-like connectionless protocols. The difficulties of adding services such as multicast and QoS to the current internet demonstrate that even these simpler IP-based networks will require additional mechanisms to enhance service flexibility. The simple transport network will thus need a flexible service surround. The service surround will provide a trusted environment, with security features (for users, applications and hosts), QoS support, application-specific routing, and automatic registration and upgrade for devices connected to the transport network. It will also enable Network Computing facilities such as secure gateways, application layer routers, cache/storage facilities, transcoders, transaction monitors and message queues, directories, profile and policy handlers. Such a service surround will likely be based on some form of distributed middleware, enabling the features to be modular and interoperable. The service surround must enable rapid introduction of new features by the operator. In order to minimise the management overhead, clients will directly control which features should be used for a particular session, without operator intervention. The network will thus need to know nothing of the semantics of the session. To achieve this a middleware based on some form of active services or active networks will be required.
2 Active Networks (Tennenhouse)
Active networking was originally [TENN] a proposal, by Tennenhouse at MIT, to increase network flexibility by adding programmes to the packet header, intended to run on the network devices that the packet encounters. This is referred to as the capsule approach. There are a number of problems.
The maximum transmission unit (MTU) size in the internet is typically 576 bytes. This will likely be upgraded to 1500 bytes in the near future; however, it is clear that if there is to be a programme embedded in every packet, the programmes must be very small, even if the programme is not confined to the header. This severely restricts the flexibility that can be offered, although it has been shown that copy instructions to emulate multicast can be embedded in packet headers in some circumstances.
It has been proposed that only those packets initiating flows should carry programmes. However, it is common for the packets in an individual flow (such as a document retrieval) to use multiple routes across the internet. This is a result of the routers having the freedom to use the best available route at any time, so as to maximise network resiliency. Therefore, in order for a programme to be applied to all packets in a flow, either the route for all subsequent packets in the flow must be pinned, so that all packets flow through the node where the programme was loaded, or the programme must be copied to all nodes on valid routes. The second option is clearly impractical. The first option is currently not possible, and in any case creates an undesirable reduction in network resilience.
62
Ian Marshall et al.
The proposal envisages programmes being supplied by network clients. However, service operators will never permit third-party programmes to run on their equipment without a strong guarantee that the programme will not degrade performance for other users. Such a guarantee requires the programmes to be written in a language in which behaviour is verifiable through pre-run checks, resource usage can be tightly controlled, and termination is guaranteed. The Safetynet [WAK] project at Sussex University in the UK is designing a promising language, but the research is still at a very early stage. Since it will be extremely hard to create interesting programmes in a language which is simple enough to enable resource control and termination guarantees, the flexibility offered by this approach is probably somewhat limited, even when the language is mature.
The programmes are intended to be added to the switch control kernel in the router. All the known approaches to making the kernel extensible degrade performance. Packets which do not require router programming will thus suffer an unacceptable performance penalty. Undesirable interactions between programmes and network features are almost impossible to predict and control. For example, a mobile client will potentially send programmes to several routers where they are used once; the programmes then never receive the acknowledgement packets that would terminate them, as the acks are routed to the client's current location. Standards for the interface offered by active routers must be developed before any service based on this proposal could be offered. Appropriate standards are not even being discussed at present.
Despite the manifest difficulties inherent in this proposal, it has succeeded in highlighting an important requirement, and in stimulating discussion amongst a previously disparate community of researchers attempting to develop more immediately realisable means to resolve the requirement.
The main threads are summarised in the next section.
3 Active Networks and Services
The first response to Tennenhouse was a somewhat different flavour of active networking, in which the packets do not carry programmes but transport layer header flags indicating the desirability of running a programme [ALEX97]. This approach attempts to resolve the issue of restricted programme size, and potentially gives network operators the freedom to choose an appropriate, tested programme of their own. However, the proposal makes no progress on the last three issues, and the range of flags could not be large, as the space available in the transport layer header is tiny. This proposal has impacted the IETF diffserv activity, which is enabling QoS in IP networks by adding flags to transport headers, one of which is an active tag.
The second response came from the programmable network community, who had for some time been looking to make networks more active with respect to operators [LAZ] and had progressed to a concept called “switchlets” [ROO] in order to avoid requiring operator intervention for all programmable changes. Switchlets enable clients to control their own VPNs by downloading their own control software onto a designated subset of the switch. This is only a partial solution as it only provides
flexibility within a VPN for a single large customer. Programmable interfaces are being standardised in IEEE P1520 [BIS]. Smith and co-workers at the University of Pennsylvania and Bellcore are working on a proposal (switchware [ALEX98]) which combines programmable packets, switchlets and a safe language (PLAN). This could be regarded as a realisable version of Tennenhouse, but only in the long term.
We have proposed a third alternative, known as application layer active networking [ALAN], which is perhaps the most immediately realisable. Similar proposals [AMIR,PARU] were described as active services. In these systems the network is populated with active nodes referred to as service nodes, or dynamic proxy servers. These can be thought of as equivalent to the HTTP caches currently deployed around the internet, but with a hugely increased and more dynamic set of capabilities. They are logically end systems rather than network devices. This approach relies on redirecting selected packets into an application layer protocol handler, where user space programmes can be run to modify the content, or the communication mechanisms. Packets can be redirected using a single active packet tag in the transport layer header, or on the basis of the MIME type in the application layer header. There is no need for additional flags or for any new standards (indeed many implementations use the ubiquitous HTTP), and an arbitrarily large number of programmes, of arbitrary size, can be used. The programmes can be selected from a trusted data source (which may itself be a cache) containing only well tested or specified programmes, and can be run without impacting router performance or requiring operator intervention, since they do not impact the control kernel for normal packets. Programmes can be chosen to match the MIME type of the content (in the application layer header), so again no additional data or standards are required.
Alternatively, a more detailed specification can be supplied in XML metadata, if desired. There is a small performance penalty associated with the redirect operation, but this is acceptable for most applications. The major outstanding issue is the interaction between dynamically loaded programmes, and this should be a priority for ongoing research.
There is a further proposal [CAO] that allows servers to supply cache applets attached to documents, and requires proxies to invoke the cache applets. Although this provides a great deal of flexibility, it lacks important features such as a knowledge sharing system among the nodes of the network (it only allows interaction between the applets placed in the same page). The functionality is also severely restricted by the limited tags available in HTML (HyperText Markup Language). Most importantly, the applets must be supplied by the content source and cannot necessarily be trusted. Clients do not have the option of invoking applets from trusted third-party servers. Using MIME types (as in ALAN) provides more flexibility than HTML tags, but still restricts the range of applications that can be specified by content providers, as different operations are often required for content with identical MIME types. It is therefore necessary to find a better way to specify new services. XML provides a very promising solution, since the tags are extensible and authors can embed many different types of objects and entities inside a single XML object. For example, policies describing resource and security requirements can be expressed and transferred with the object. In this paper we present a design for a modified ALAN based on XML, and describe how it could be used to provide a customer-driven QoS
routing capability. We also demonstrate the feasibility of using XML by implementing and measuring a simple example service.
4 ALAN and XML
Our design is built in several layers and is based on existing technology. Figure 1 shows the architecture of the prototype. The first layer is a fully populated cache hierarchy, with caches placed at all domain boundaries and network bottlenecks. We envisage active nodes and caches being co-located since the optimal sites for most of the activities proposed for active networks are domain boundaries. In addition it is advantageous to maintain a cache of programmes required by the active services at an active node and a web cache is a convenient implementation. For the prototype, we have used squid v1.19 [WESS] for the cache. The second layer and the upper layers constitute the core of our system and will be discussed thoroughly within this paper. An Application Layer Active Network Platform (ALAN) implements the active services. One of these services is an XML parser that provides the functionality to handle metadata associated with objects.
Fig. 1. Architecture of the prototype (layers, top to bottom: XML active network; active services; XML parser; active network (ALAN); cache network (Squid)).
The ALAN Platform is a Java RMI based system built by the co-authors from the University of Technology, Sydney in collaboration with BT-Labs to host active services. It provides a host program (Dynamic Proxy Server) that will dynamically
load other classes (Proxylets) that are defined with the following interface: Load, Start, Modify, Stop [ALAN]. The platform provides a proxylet that analyses the HTTP headers and extracts the MIME types of the objects passing through the machine (the HTTP parser). After determining the MIME type of the object, the program chooses a content handler, downloads the appropriate proxylet from a trusted host to handle that MIME type, and starts the proxylet with several parameters extracted by the HTTP parser. Using this model, a wide range of interesting services can be provided. However, this original model cannot support the whole range of services we plan to implement. There is a need for additional data (not included in the HTTP headers) to manage interoperability among the services and to expand the flexibility and range of applications that can be developed. XML provides a mechanism to implement these improvements and appears to be a perfect complement to the architecture.
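The proxylet life cycle (Load, Start, Modify, Stop) can be sketched as a Java interface. The operation names come from the text; the parameter types and the trivial implementation below are our assumptions, since the [ALAN] interface is not reproduced here.

```java
import java.util.Properties;

// Sketch of the proxylet life-cycle interface named above.
// Signatures are assumed; only the operation names are given
// in the text.
interface Proxylet {
    void load(Properties params);    // fetch and initialise resources
    void start();                    // begin handling redirected traffic
    void modify(Properties params);  // reconfigure at run time
    void stop();                     // terminate and release resources
}

// A trivial proxylet used only to illustrate the life cycle.
class CountingProxylet implements Proxylet {
    boolean running = false;
    int starts = 0;

    public void load(Properties params)   { /* nothing to fetch */ }
    public void start()                   { running = true; starts++; }
    public void modify(Properties params) { /* nothing to change */ }
    public void stop()                    { running = false; }
}
```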
Fig. 2. Functionality of an active node (components: active cache machine/server with cache program, Dynamic Proxy Server, HTTP parser, content handler proxylet; trusted proxylet server).
The BT co-authors built a simple XML parser in Java that works in collaboration with a new HTTP parser designed to utilise all the metadata needed in the active applications. The original HTTP parser [ALAN] has been completely rewritten by BT in order to integrate the XML parser seamlessly into the processing. The functionality of an active node (Figure 2) is described as follows. Upon the arrival of an object into the node, the HTTP parser examines the header and gets its corresponding MIME type. If the object is an XML object, then the XML parser is
66
Ian Marshall et al.
called and it will extract the metadata. The metadata specifies which proxylets should be invoked, in which order, with which parameters, and under what circumstances. The parser then makes the appropriate calls to the DPS, which loads the proxylets.
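As a concrete illustration of the XML parser's job, metadata could be reduced to an ordered proxylet list along the following lines. This is entirely hypothetical: the paper defines no metadata schema, so the element and attribute names here are invented.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: pull the ordered list of proxylets to invoke
// out of an object's XML metadata. The <proxylet name="..."/> schema
// is our invention.
class MetadataParser {
    static List<String> proxyletsToInvoke(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("proxylet");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            return names;   // in document order, as the metadata specifies
        } catch (Exception e) {
            throw new RuntimeException("bad metadata", e);
        }
    }
}
```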
5 QoS Routing
Modern networks must optimise the management of communication among nodes that are interconnected by diverse and alternate paths. These paths may be based on heterogeneous technologies with divergent properties, which makes smart path choice essential for providing quality of service (QoS). QoS management can be based on in-band datagrams (e.g. class-of-service based routing) or on out-of-band reservations (flow-based routing); both approaches have their adherents in the network research community. QoS-based routing has been recognised as a missing piece in the evolution of QoS-based service offerings in the Internet and is the subject of a range of standardisation efforts in the IETF, including:
- Integrated Services (QoS) plus ISSLL
- Resource Reservation (RSVP)
- Traffic Engineering
- Differentiated Services (class-of-service based)
An interesting application that highlights many of the issues is the aircraft services application illustrated in figure 3. There are two basic services: communication with the flight deck, and e-commerce/WWW access for the passengers. The aircraft (Node 1) has a 64 kbit/s radio-based bi-directional link to the control tower and a VSAT-based downlink (2 Mbit/s). The control tower can use wide-area connectivity to establish a high-bandwidth link to the aircraft using ATM and satellite combined. Path selection is performed at the control tower by examining the meta-information of the packets, deciding which is the most appropriate path and adding security if needed. QoS management is also needed in the plane to ensure that life-critical information from the flight deck is delivered onto the bandwidth-restricted downlink before any traffic originating from passengers. To illustrate the power of an active information network of the kind we have described, we have designed a QoS routing scheme suitable for the above application.
The scheme requires no new standards and could be implemented entirely in the user space of network nodes. For this application QoS can be characterised in terms of bandwidth, latency, security, strength of guarantee and uni- or bi-directionality. Current Internet routing protocols, e.g. OSPF and RIP, use "shortest path routing", i.e. routing that is optimised for a single arbitrary metric such as administrative weight or hop count. These routing protocols are also "opportunistic", using the current shortest path or route to a destination [CRAW]. Our aim is to enable route decisions to be made on multiple metrics and fixed for the duration of the flow. For example, a node could have a connection via landline with low latency, low bandwidth and high security, and a satellite connection with high bandwidth, high latency and low security. The choice of best route will be application specific. Given access to local route information obtained through link
Active Information Networks and XML
67
state adverts, nodes can make QoS decisions if the application requirements are also available. In our design the application requirements are expressed in XML metadata. The XML metadata is rich enough to express the application-layer requirements and force a correct choice. Our scheme also allows return traffic to be correctly routed using an appropriate proxylet at the head end of the satellite link.
[Figure 3 diagram: Node 1 (aircraft) with a radio link and a satellite uplink/downlink; Node 2 (control tower, Brussels) with ATM and terrestrial links via Plymouth.]
Fig. 3. The aircraft application is an excellent example of diverse communication paths based on different physical layers. The aircraft (Node 1) has a radio-based bidirectional link to the control tower, while the control tower can use COIAS wide-area connectivity to establish a high-bandwidth link to the aircraft using ATM and satellite combined. Path selection is performed at the control tower by examining the meta-information of the packets, deciding which is the most appropriate path and adding security if needed. Users can use meta-information to mark their packets with their requirements for bandwidth, latency, degree of security and degree of guarantee. This meta-information helps the system classify the packets and make a smart path selection, so that packets are routed along the optimum path.
The policy syntax is based on the syntax for IPSEC security policies, where a set of fields is associated with a particular security association (or degree of security). This enables routing policies and security policies to be handled in the same way. For the routing policies the fields are: APP_TYPE, CONTENT_TYPE (MIME), BANDWIDTH, LATENCY, GUARANTEE, DUPLEX. The fields are associated with six tuples, each containing the value of the field required by the associated content and the priority (on a scale of 1-10) of obtaining that value. The QoS router will intercept all socket requests from application-layer processes, read all policies in the policy database relevant to that process, check the current path data and choose the output port which matches the most policy criteria. The criterion weighting is used to distinguish between alternates matching equal numbers of different criteria. Security criteria are regarded as mandatory; if no match is available the QoS router will either
directly request user intervention (in an end system) or will deny the request (in an intermediate node). In the latter case the proxylet will request input from the session source or the preceding active node (which may just use an alternate route). In an initial implementation the QoS router would be a proxylet and would be invoked by direct calls from other proxylets requiring QoS routing services. We anticipate that, for performance reasons, if it proved popular it would be rapidly reengineered as a layered protocol intercepting all socket calls. In a retrieval session the content source would supply an XML object with its policies. The XML object will be parsed at any active nodes in the path, where the routing policies will be extracted and any other necessary proxylets will be started. The QoS router will then be able to choose the best available route for the associated flow. Any downstream active nodes can obviously perform further local route optimisations in a similar manner. For an interactive session the initiator would supply a session definition formatted as an XML object.
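The selection logic described above can be sketched as follows. The field names, matching rules and weighting scheme here are illustrative assumptions; the actual policy database and socket interception are not shown.

```python
# Sketch of QoS route selection: each candidate output port advertises its
# path data; each policy criterion carries a required value and a priority
# (1-10). The port matching the most criteria wins, with the summed
# priorities of the matched criteria breaking ties. Security is mandatory.

def matches(criterion, path):
    field, required = criterion["field"], criterion["value"]
    if field == "BANDWIDTH":             # need at least this much
        return path.get(field, 0) >= required
    if field == "LATENCY":               # need at most this much
        return path.get(field, float("inf")) <= required
    return path.get(field) == required   # exact match otherwise

def choose_port(ports, policy):
    best = None
    for name, path in ports.items():
        # Mandatory (security) criteria: reject the port if any fails.
        if not all(matches(c, path) for c in policy if c.get("mandatory")):
            continue
        optional = [c for c in policy if not c.get("mandatory")]
        count = sum(matches(c, path) for c in optional)
        weight = sum(c["priority"] for c in optional if matches(c, path))
        key = (count, weight)
        if best is None or key > best[0]:
            best = (key, name)
    return best[1] if best else None

# The landline/satellite example from the text (bandwidth in Mbit/s).
ports = {
    "landline": {"BANDWIDTH": 0.064, "LATENCY": 20, "SECURE": True},
    "satellite": {"BANDWIDTH": 2.0, "LATENCY": 500, "SECURE": False},
}
policy = [
    {"field": "SECURE", "value": True, "priority": 10, "mandatory": True},
    {"field": "BANDWIDTH", "value": 0.05, "priority": 5},
    {"field": "LATENCY", "value": 100, "priority": 8},
]
```

With the security criterion mandatory, only the landline qualifies here; dropping it and demanding 1 Mbit/s would instead select the satellite path.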
[Figure 4 diagram: a DPS hosting an application/proxylet, an XML parser and a QoS router with path data and policy data, above TCP/IP with IPSEC.]
Fig. 4. Active QoS routing node. The design of a QoS routing node is illustrated in figure 4. The path data is information about the QoS available on all local output addresses/ports. It is essentially a local routing table with added fields for measurements of delay, occupancy, loss rates, etc. The measurements can be obtained by filtering link state adverts or by using ping measurements. The policy data is a collection of policies regarding application requirements, extracted by the XML parser from XML metadata.
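The path data described above is essentially an augmented routing table. A minimal sketch, with hypothetical field names:

```python
# Sketch of path data: a local routing table whose entries carry extra QoS
# measurement fields (delay, occupancy, loss rate), updated from link-state
# adverts or ping measurements. All names are illustrative.

from dataclasses import dataclass

@dataclass
class PathEntry:
    destination: str
    port: str
    delay_ms: float = 0.0      # e.g. from ping round-trip times
    occupancy: float = 0.0     # fraction of link capacity in use
    loss_rate: float = 0.0     # observed packet loss

class PathData:
    def __init__(self):
        self.entries = {}
    def update(self, destination, port, **measurements):
        # Create the entry on first sight, then merge new measurements in.
        e = self.entries.setdefault((destination, port),
                                    PathEntry(destination, port))
        for k, v in measurements.items():
            setattr(e, k, v)
        return e

pd = PathData()
pd.update("node2", "satellite", delay_ms=500.0, loss_rate=0.01)
e = pd.update("node2", "satellite", occupancy=0.4)
```

A QoS router would consult these entries, alongside the policy data, when choosing an output port for a flow.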
6 Implementation of an Example &amp; Results
In order to get some preliminary performance measurements of the architecture, we implemented advert rotation driven by XML metadata policies. The objectives were to demonstrate the feasibility of XML policies and to expose the weak parts of the implementation so that future releases can be improved.
Studies show that advert banners which are dynamically rotated, so that the same HTML page can show different adverts on each request, are very popular. These dynamically created HTML pages are just slight modifications of an original template page. The changes usually consist of sets of graphics of the same size that appear consecutively in the same position on the page. To achieve this, the server executes a CGI program that generates the HTML text dynamically. This dynamic behaviour tends to make the content un-cacheable. It is preferable to make simple dynamic pages containing rotating banners cacheable, since this allows a distributed service, eliminates the critical failure points and improves the usage of the existing bandwidth.
[Figure diagram: the original server supplies an XML object to the active node's Dynamic Proxy Server (HTTP parser, XML parser proxylet and advert proxylet); the advert proxylet is loaded by the XML object from a trusted proxylet server, further objects are loaded on demand from a redirected server, and the final object is returned over HTTP.]
The XML parser extracts the information embedded in the object: the URL of the object to be treated, the URL of the proxylet needed, and the commands for it.
Fig. 5. As the object is requested and passes through the active node, it is parsed by the HTTP parser and then by the XML parser. This analysis extracts the generic HTML page that will serve as a static template for the rotated adverts, the list of images to be rotated, and the rotation policy. This information is used as a parameter list in the invocation of a rotator proxylet, which downloads the objects as needed. Subsequent requests for the page are passed to the proxylet by the cache at the active node, and the proxylet executes the rotation policy.
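Rotation driven by such metadata can be sketched as follows. The metadata format, element names and proxylet interface are illustrative assumptions, not the ALAN API.

```python
# Sketch of an advert-rotator proxylet: XML metadata names a template page,
# the images to rotate and a rotation policy; each request substitutes the
# next image into the cached static template.

import xml.etree.ElementTree as ET
from itertools import cycle

METADATA = """
<advert template="http://server.example/page.html" policy="round-robin">
  <image>http://server.example/ad1.gif</image>
  <image>http://server.example/ad2.gif</image>
</advert>
"""

class AdvertRotator:
    def __init__(self, metadata_xml, template_html):
        root = ET.fromstring(metadata_xml)
        self.template = template_html              # cached static template
        self.images = cycle(i.text for i in root.findall("image"))
    def serve(self):
        # Round-robin policy: substitute the next image into the template.
        return self.template.replace("%AD%", next(self.images))

rotator = AdvertRotator(METADATA, '<img src="%AD%">')
first, second = rotator.serve(), rotator.serve()
```

Because only the image substitution is dynamic, the template itself stays cacheable at the active node, which is the point of the exercise.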
Our experiment consisted of running an active node on a Sun SPARCstation 10 with 64 MB of memory running SunOS release 5.5.1. The Java version for both programming and testing was JDK 1.1.6. The active node program was started with the proxylets needed for HTTP and XML parsing already running. We conducted 20 experiments. For the first ten, the whole process ran each time: whenever a new advert rotation was requested, the advert proxylet was loaded. The subsequent ten utilised caching: when a new request arrived, a proxylet that was already running was used. We measured the times needed to accomplish the different tasks. The numerical results of these experiments are shown in figures 6 and 7 below. The analysis shows the times needed to
perform the processes and tasks during normal operation of the system. The functionality of these processes is as follows:
- HTTP Parsing: time needed to analyse the HTTP header and determine the MIME type.
- XML Parsing: time needed to get the XML object, parse it and extract all the embedded metadata.
- URL Download: time needed to download the HTML.
- Proxylet Call: time needed to generate the query to load the proxylet in our active node.
- Proxylet Download: time needed to download and start the proxylet; this takes a long time because of the ALAN platform design.
- Advert Rotation: time needed to perform the demanded task, in this case the advert rotation.
Fig. 6. The graph shows the results of the experiments when the service was cached. The proportion of time spent downloading the proxylet has disappeared. The URL download time can vary depending on the object to be downloaded and the bandwidth to the server. In our testing, all the objects were available on our LAN, so we can expect greater values for this part of the process in wide-area tests. However, this increase will only matter for the first request; thereafter the URL object is cached and is available locally.
The most important variable is the time due to the additional processing of the proxylets. It appears that the XML parser proxylet and the advert rotator proxylet take most of the time. Nevertheless, the total delay is below one second. We can expect better results if a faster computer is used as a server, together with a non-interpreted language. However, the purpose of this paper was to demonstrate the feasibility of active caching nodes based on XML, and throughput was not a priority. This prototype shows that it is possible to provide active services with delays of just a few hundred milliseconds.
Fig. 7. The difference in delay between the experiments in which the proxylet had to be downloaded and started, and the experiments in which the proxylet was already downloaded and running in the active node.
7 Future Work
In the immediate future we intend to build and test the QoS routing service outlined in this paper. In addition, there are three major longer-term issues on which we are concentrating our efforts. First, it would be beneficial to specify behaviour and other non-functional aspects of the programs requested in the metadata, using a type-safe specification language. One possibility is to use SPIN, which is C-like and has some useful performance- and time-related primitives. The use of a language of this kind would provide greater flexibility and interoperability, and go some way towards solving our second issue. Second, for complex services it will be necessary to invoke several proxylets. At present the question of how multiple proxylets interact, and how interactions such as order dependencies can be resolved, is open. We anticipate attempting to use metadata specifications (in the language from the first issue) to maximise the probability of avoiding problems. Third, the performance and scalability of the DPS is currently far from ideal. The Sydney co-authors [ALAN] are addressing these issues with a new implementation, and we anticipate that significant improvements will be available shortly.
8 Conclusions
Application layer active networks will play a crucial role in networked applications that can tolerate a delay of around a hundred milliseconds. They will extend the
functionality and versatility of present networks to cover many future customer needs. HTTP caches help reduce bandwidth use in the Internet and improve responsiveness by migrating objects and services closer to the client. They are also ideally placed to evolve into the active nodes of the future. XML is a perfect complement to application-layer active networks based on HTTP caches, since it allows active nodes to be driven by enriched metadata requests and at the same time introduces mechanisms for sharing knowledge between nodes. We have implemented a prototype which demonstrates that XML offers greater flexibility and expressivity than HTTP tags or MIME types, without significant performance penalty. The performance of our system is not yet ideal; however, the results can easily be improved by using a non-interpreted language and a more powerful server.
References
[ALAN] M. Fry and A. Ghosh, "Application Layer Active Networking", Fourth International Workshop on High Performance Protocol Architectures (HIPPARCH '98), June 1998. http://dmir.socs.uts.edu.au/projects/alan/prog.html
[ALEX97] Alexander, Shaw, Nettles and Smith, "Active Bridging", Computer Communication Review, 27(4), 1997, pp. 101-111.
[ALEX98] D. S. Alexander et al., "A secure active network environment architecture", IEEE Network, 1998.
[AMIR] E. Amir, S. McCanne and R. Katz, "An active service framework and its application to real time multimedia transcoding", Proc. SIGCOMM '98, pp. 178-189.
[BIS] J. Biswas et al., "The IEEE P1520 standards initiative for programmable interfaces", IEEE Communications, Oct. 1998, pp. 64-72.
[CAO] P. Cao, J. Zhang and K. Beach, "Active Cache: Caching Dynamic Contents (Objects) on the Web", Proc. Middleware '98 (Ambleside).
[CRAW] E. Crawley, R. Nair, B. Rajagopalan and H. Sandick, "A Framework for QoS-based Routing in the Internet", RFC 2386, The Internet Society, 1998. ftp://ftp.isi.edu/in-notes/rfc2386.txt
[ERIK] H. Eriksson, "MBone - The Multicast Backbone", INET 1993.
[LAZ] A. Lazar, "Programming Telecommunication Networks", IEEE Network, Oct. 1997, pp. 2-12.
[PARU] G. Parulkar et al., "Active Network Node Project", Washington University, St. Louis.
[ROO] S. Rooney et al., "Tempest: A framework for safe programmable networks", IEEE Communications, Oct. 1998, pp. 42-53.
[TENN] D. Tennenhouse and D. Wetherall, "Towards an active network architecture", Computer Communication Review, 26(2), 1996.
[WAK] I. Wakeman et al., "Designing a Programming Language for Active Networks", HIPPARCH '98.
[WESS] D. Wessels, "Configuring Hierarchical Squid Caches", AUUG '97, Brisbane, Australia.
[XML] Extensible Markup Language (XML), W3C Recommendation, 10 February 1998. http://www.XML.com/aXML/testaXML.htm
Policy Specification for Programmable Networks Morris Sloman and Emil Lupu
Department of Computing, Imperial College London SW7 2BZ, UK {m.sloman,e.c.lupu}@doc.ic.ac.uk
Abstract. There is a need to be able to program network components to adapt to application requirements for quality of service and specialised application-dependent routing, to increase efficiency, and to support mobility and sophisticated management functionality. There are a number of different approaches to providing programmability, all of which are extremely powerful and can potentially damage the network, so there is a need for clear specification of authorisation policies, i.e., who is permitted to access programmable network resources or services. Obligation policies are event-triggered rules which can perform actions on network components and so provide a high-level means of 'programming' these components. Both authorisation and obligation policies are interpreted, so they can be enabled, disabled or modified dynamically without shutting down components. This paper describes a notation and framework for specifying policies related to programmable networks and for grouping them into roles. We show how abstract, high-level policies can be refined into a set of implementable ones and discuss the types of conflicts which can occur between policies.
1 Introduction
Networks have to become more adaptable to cater for the wide range of user devices, ranging from powerful multimedia workstations to hand-held portable devices. A convergence is taking place between telecommunications and computing, so networks are increasingly being used to transport voice, video and fax as well as data traffic. Future personal digital assistants will include mobile phones, and Web-enabled mobile phones are beginning to appear. There is a need to reconcile the perspectives of the telecommunication and computing communities in new dynamically programmable network architectures that support fast service creation and resource management through a combination of network-aware applications and application-aware networks.
Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 73-85, 1999. Springer-Verlag Berlin Heidelberg 1999
It is necessary to be able to dynamically program the resources within a network to permit adaptive quality of
74
Morris Sloman and Emil Lupu
service management, flexible multicast routing from multiple sources for applications such as video conferencing, intelligent caching and load distribution for Web servers, or to perform compression and filtering when traversing low-bandwidth wireless links. These types of application-specific functions need to be dynamically programmed within the network components in order to support flexible and adaptive networks. The main objective is to speed up the slow evolution of network services by building programmability into the network infrastructure itself [1]. There are a number of approaches to supporting programmable networks:
Active Networks – the packets traversing the network contain normal data plus programs which may invoke switch and router operations [2]. Example uses include setting up multicast routing groups or fusion of data from many different sensors into larger messages to traverse the network to the data sink. This is essentially programming at the IP level and is often limited to routing or filtering. It has inherent security risks, which can be alleviated by the use of 'safe' languages or by executing the programs in a controlled environment such as an associated processor rather than the main processor within a network component.
Mobile Agents – agents containing code and state information traverse multiple nodes within a network in order to perform functions on behalf of users, e.g., an email-to-voice converter which follows a mobile phone user [3]. This type of programming is generally associated with hosts or servers connected to the network rather than switches or routers, but could also be used to set up specific routing tunnels [4].
Management Interface – network components provide a management interface which facilitates a limited form of programming of components by invoking operations to change their behaviour [5].
This is really provided for the use of network managers, but some operations may be made available to managers of value-added, third-party service providers or even to user applications. For example, there could be service creation and service operation interfaces to support various virtual network, multicast or multimedia services. The IEEE is standardising an Application Programming Interface for Networks [http://www.ieee-pin.org/].
Management by Delegation – a means of downloading management code to be executed within network components to perform functions such as complex diagnostic tests on specific nodes [6]. This is an extension of the Management Interface approach, as it supports remote execution of code rather than just remote operation invocation. Code delegation is usually performed by network managers but could be used to load specific filtering or compression code onto an access gateway on behalf of an application or user. The advent of Java has made it easier to implement portable 'elastic agents' into which code can be loaded dynamically.
Interpreted Policy – there has been recent interest in bandwidth management policies which specify who can use network resources and services based on time of day, network utilisation or application-specific constraints [7]. Most of the previous work on policy has been related to the management of distributed systems and networks [8,9]. Authorisation policies specify what actions a subject is permitted or forbidden to perform on a set of target objects. Obligation policies specify what actions must be performed by a subject on a target. Policies can be used to modify the behaviour of network components and so can be considered a 'constrained' form of programming [8].
Policy Specification for Programmable Networks
75
There is no single universal solution to programmability of networks and the various approaches can be used to perform complementary functions, although there is some overlap between them as a particular functionality could be implemented using more than one approach. In addition, these are all very powerful facilities which can easily destroy the normal working of the network so it is necessary to specify authorisation policies to define who can program specific components and what programming operations they can access. The obligation policies are event triggered rules which result in actions being performed. This can be considered a ‘constrained’ form of programming in that policies can be dynamically modified but can only call predefined actions. Policies can be used to define the event conditions and constraints for invocations on a management interface, or loading or executing code in an elastic agent. Thus, policies are complementary to the other approaches described above. This paper focuses on the specification of policies for the adaptability and security needed in programmable networks. Section 1 outlines how objects can be grouped in domains in order to apply a common policy. Sections 2 and 3 discuss the policy notation and implementation, followed by some of the conflict detection and resolution issues. Section 5 introduces roles as a means of grouping policies which specify the rights and duties of managers. Policies for the configuration and management of network devices are not specified in isolation but derived from business objectives and requirements, so section 6 addresses the refinement of policies from an abstract description to implementable rules. Related work and conclusions are presented in sections 7 and 8.
1 Domains &amp; Directories
In large-scale systems it is not practical to specify policies for individual objects, so there is a need to group the objects to which a policy applies. For example, a bandwidth management policy may apply to all routers within a particular region or of a particular type. An authorisation policy may specify that all members of a department have access to a particular service. Domains provide a means of grouping objects to which policies apply and can be used to partition the objects in a large system according to geographical boundaries, object type, responsibility and authority, or for the convenience of human managers [8,10]. A domain does not encapsulate the objects it contains but merely holds references to object interfaces. It is thus very similar in concept to a file system directory, but may hold references to any type of object, including a person. A domain which is a member of another domain is called a sub-domain of the parent domain. Objects and sub-domains may be members of multiple parent domains and may have different local names in each of them. For example, in Fig. 1, the two 'bean people' and sub-domain E are members of both the B and C domains, which therefore overlap. Details of domains are described in [8,10].
[Figure 1 diagram: left panel, "Sub-Domains and Overlapping Domains", shows domains A-E with B and C overlapping; right panel, "Domain Hierarchy (without member objects)", shows the corresponding tree.]
Fig. 1 Domains
Path names are used to identify domains, e.g., domain E can be referred to as /A/B/E or /A/C/E, where ‘/’ is used as a delimiter for domain path names. Policies
normally propagate to members of sub-domains, so a policy applying to domain B will also apply to members of domains D and E. Domain scope expressions can be used to combine domains to form a set of objects for applying a policy, using union, intersection and difference operators, e.g., a scope expression @/A/B + @/A/C - @/A/B/E would apply to members of B plus C but not E, and @/A/B ^ @/A/C applies only to the direct and indirect members of the overlap between B and C. The ‘@’ symbol selects all non-domain objects in nested domains. An advantage of specifying policy scope in terms of domains is that objects can be added and removed from the domains to which policies apply without having to change the policies. However, objects have to be explicitly included in domains. It is not practical to specify domain membership in terms of a predicate based on object attributes but a policy can select a subset of members of a domain, to which it applies, by means of a constraint in terms of object attributes (see section 2). We have implemented our own domain service but we are redoing this for an LDAP (Lightweight Directory Access Protocol) directory service [11]. However, although LDAP supports the concept of an alias as a reference to an object in another domain, it does not permit objects to be members of multiple directories.
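The scope operators above can be sketched with set operations. This is a toy model under simplifying assumptions; the real domain service resolves path names and object references.

```python
# Toy model of domain scope expressions: domains hold direct members
# (objects or sub-domains); '@' collects all non-domain objects in nested
# domains, and scope expressions combine these sets with union (+),
# intersection (^) and difference (-). The class structure is illustrative.

class Domain:
    def __init__(self, name, members=()):
        self.name = name
        self.members = list(members)

def at(domain):
    """The '@' operator: all non-domain objects in nested domains."""
    objs = set()
    for m in domain.members:
        if isinstance(m, Domain):
            objs |= at(m)           # recurse into sub-domains
        else:
            objs.add(m)
    return objs

# The overlapping-domain example of Fig. 1: E is a member of both B and C.
E = Domain("E", ["e1"])
B = Domain("B", ["b1", E])
C = Domain("C", ["c1", E])

# @/A/B + @/A/C - @/A/B/E : members of B plus C but not E.
scope1 = (at(B) | at(C)) - at(E)
# @/A/B ^ @/A/C : direct and indirect members of the overlap of B and C.
scope2 = at(B) & at(C)
```

Because membership is resolved when a policy is evaluated, objects can move in and out of domains without the scope expressions themselves changing.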
2 Policy Notation
A precise notation is needed for system administrators and (technical) users to specify the network policies related to the applications or services for which they are responsible. This notation is the means of ‘programming’ the automated agents in network components which interpret policy but can also be used to specify higher level abstract policies or goals which are interpreted by humans or are refined into implementable policies [12,13,14]. Another reason to have a precise notation is that policies may be specified by multiple distributed administrators so conflicts between policies can arise. Our notation can be analysed by tools to detect and, in some cases, resolve conflicts. Implementable policies are directly interpreted by automated manager and access control agents, which are (potentially) distributed, so we do not use logical deduction in order to analyse the state of the system.
Authorisation policies define what activities a subject can perform on a set of target objects and are essentially access control policies to protect resources from unauthorised access. Constraints can be specified to limit the applicability of both authorisation and obligation policies based on time or on values of the attributes of the objects to which the policy refers.
x1 A+ @/NetworkAdmin {PolicyObjType: load(); remove(); enable(); disable()} @/Nregion/switches
Members of the NetworkAdmin domain are authorised to load, remove, enable or disable policies in Nregion/switches. The ';' separates permitted actions.
x2 A- n: @/test-engineers {performance_test()} @/routers when n.status = trainee
Trainee test engineers are forbidden to perform performance tests on routers. Note the use of a constraint based on subject state information.
x3 A+ @/Agroup + @/Bgroup {VideoConf(BW=2, Priority=3)} USAStaff - NYgroup when (16:00 &lt; time &lt; 18:00)
Members of Agroup plus Bgroup can set up a video conference (bandwidth = 2 Mb/s, priority = 3) with USA staff except the New York group, between 16:00 and 18:00. Note the use of a time-based constraint.
Obligation policies define what activities a manager or agent must or must not perform on a set of target objects. Positive obligation policies are triggered by events.
x4 O+ on video_request(bw, source) @/USGateway {router: bwreserve(bw); log(bw, source)} @/routers/US
This positive obligation is triggered by an external event signalling that a video channel has been requested. The object in the USGateway domain first performs a bwreserve operation on all objects of type router in the /routers/US domain and then logs the request (assumed to be to an internal log file), i.e., operations specified in a policy can be on external objects or internal operations in the agent. The ';' is used to separate a sequence of actions in a positive obligation policy.
x5 O- n:@/test-engineers {DiscloseTestResults()} @/analysts + @/developers when n.testing_sequence == in-progress
This negative obligation policy specifies that test engineers must not disclose test results to analysts or developers while the testing sequence being performed by that subject is still in progress, i.e., a constraint based on the state of subjects.
The general format of a policy is given below, with optional attributes within brackets. Some attributes of a policy, such as trigger, subject, action, target or constraint, may be comments (e.g. /* this is a comment */), in which case the policy is considered high-level and not able to be directly interpreted.
identifier mode [trigger] subject '{' action '}' target [constraint] [exception] [parent] [child] [xref] ';'
The identifier is a label used to refer to the policy. The mode of the policy distinguishes between positive obligations (O+), negative obligations (O-), positive authorisations (A+) and negative authorisations (A-). The trigger only applies to positive obligation policies. It can specify an internal timer event using an at clause, as in x5 above, or an every clause for repetitive events. An external event is defined using an on clause, as in x4 above, where the video_request event passes parameters bw and source to the agent. These events are
detected by a monitoring service. The policy notation only specifies simple events as a generalised monitoring service can be used to combine complex event sequences to generate simple events [16]. The subject of a policy, defined in terms of a domain scope expression, specifies the human or automated managers to which the policies apply. The target of a policy, also defined in terms of a domain scope expression, specifies the objects on which actions are to be performed. Security agents at a target’s node interpret authorisation policies and manager agents in the subject domain interpret obligation policies. The actions specify what must be performed for obligations and what is permitted for authorisations. It consists of method invocations or a comment and may list different methods for different object types. An authorisation policy indicates the set of operations which are permitted or forbidden while the multiple actions in a positive obligation policy are performed sequentially after the policy is triggered. The constraint, defined by the when clause, limits the applicability of a policy, e.g. to a particular time period as in policy x3 above, or making it valid after a particular date (when time > 1/June/1999). In addition, the constraint could be based on attribute values of the subject (such as in policy x2 above) or target objects. In x2, the label n, prepended to the subject, is referenced in the constraint to indicate a subject attribute. An action within an obligation policy may result in an operation on a remote target object. This could fail due to remote system or network failure so an exception mechanism is provided for positive obligations to permit the specification of alternative actions to cater for failures which may arise in any distributed system. High-level abstract policies can be refined into implementable policies. In order to record this hierarchy, policies automatically contain references to their parent and children policies. 
In addition, a cross-reference (xref) from one policy to another can be inserted manually, e.g., so that an obligation policy can indicate the authorisation policies granting permission for its activities (see Section 6).
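The notation elements described above (modality, subject, action, target, when-constraint, refinement references and cross-references) can be modelled as a simple data structure. The following is a minimal sketch; all class and field names are illustrative assumptions, not the authors' implementation.

```python
import datetime
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Policy:
    """Illustrative model of a policy: modality O+/O-/A+/A-, subject and
    target domain scope expressions, actions, a 'when' constraint, plus
    parent/children refinement links and manual cross-references."""
    identifier: str
    modality: str                 # one of "O+", "O-", "A+", "A-"
    subject: str                  # domain scope expression, e.g. "/NetworkManagers"
    actions: List[str]
    target: str                   # e.g. "/routers/edge"
    when: Callable[[dict], bool] = lambda env: True   # constraint predicate
    parent: Optional[str] = None                      # refinement hierarchy
    children: List[str] = field(default_factory=list)
    xrefs: List[str] = field(default_factory=list)

# A policy valid only after a particular date, as in the text's example
# (when time > 1/June/1999):
p = Policy("x4", "A+", "/NetworkManagers", ["enable"], "/policies",
           when=lambda env: env["time"] > datetime.date(1999, 6, 1))
assert p.when({"time": datetime.date(1999, 7, 2)})
assert not p.when({"time": datetime.date(1999, 5, 1)})
```

The constraint is kept as an opaque predicate here; the actual notation expresses it declaratively so that it can also be analysed, not just evaluated.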
3 Policy Implementation Issues
The policy service provides tool support for defining and disseminating policies to the agents that will interpret them. Policies are implemented as objects which can be members of domains, so that authorisation policies can be used to control which administrators are permitted to specify or modify the policies stored in the policy service.
Fig. 2 Policy Enforcement
Policy Specification for Programmable Networks
An overview of the approach to policy enforcement is given in Fig. 2. An administrator creates and modifies policies using a policy editor, checks for conflicts and, if necessary, modifies policies to remove them (see Section 4). Authorisation policies are then disseminated to target security agents, as specified by the target domains, and obligation policies to manager agents, as specified by the subject domains. Policies may subsequently be enabled, disabled or removed from the agents. Manager agents register with the monitoring service to receive relevant events generated from the managed objects. On receiving an event which triggers one or more obligation policies, the agent queries the domain service to determine the target objects and performs the policy actions, provided no negative obligations restrain it.
Fig. 3 shows a policy agent which interprets obligation policies. It is application specific, in that the agents used for quality of service management, for example, differ from those used for security management. Each class of agent has predefined management functions which are accessible from the policies. These functions may result in operations on remote target objects or may be internal to the agent. The functionality of an agent could be dynamically modified using Management by Delegation techniques to load new code, but this has not been implemented in our prototype. More details on the syntax and implementation issues of the policy service can be found in [12,13,14].
Fig. 3 Obligation Policy Agent
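The enforcement cycle of Figs. 2 and 3 — load and enable policies, receive events from the monitoring service, resolve targets through the domain service, perform actions — can be sketched as follows. All interfaces and names here are illustrative assumptions, not the prototype's actual API.

```python
class ObligationAgent:
    """Sketch of a manager agent: policies are loaded and enabled, events
    trigger them, and actions run against targets resolved through a
    domain service (modelled as a callable)."""
    def __init__(self, domain_service):
        self.domain_service = domain_service
        self.policies = {}            # policy id -> {event, action, enabled}

    def load(self, pid, event, action):
        self.policies[pid] = {"event": event, "action": action, "enabled": False}

    def enable(self, pid):
        self.policies[pid]["enabled"] = True

    def notify(self, event, log):
        """Called by the monitoring service; runs every enabled policy
        triggered by this event against its resolved targets."""
        for pid, p in self.policies.items():
            if p["enabled"] and p["event"] == event:
                for target in self.domain_service(pid):
                    log.append((pid, p["action"], target))   # perform action

log = []
agent = ObligationAgent(domain_service=lambda pid: ["routerA", "routerB"])
agent.load("p1", "link_down", "reroute")
agent.enable("p1")
agent.notify("link_down", log)
assert log == [("p1", "reroute", "routerA"), ("p1", "reroute", "routerB")]
```

A real agent would also evaluate when-constraints and check negative obligations before acting, which is omitted here for brevity.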
4 Policy Conflicts
In any large inter-organisational distributed network, policies are specified by multiple managers, possibly within different organisations. Objects can be members of multiple domains, so multiple policies will typically apply to an object, and it is quite possible that conflicts will arise between them. There are two types of conflicts which we will consider – modality and semantic conflicts [15].
Modality Conflicts − are inconsistencies which may arise when several policies with modalities of opposite sign refer to the same subjects, actions and targets. These conflicts can therefore be detected by syntactic analysis of policies. There are three types of modality conflicts:
§ O+/O- subjects are both required and required not to perform the same actions on the target objects.
§ A+/A- subjects are both authorised and forbidden to perform the actions on the target objects.
§ O+/A- subjects are required but forbidden to perform the actions on the target objects.
Note that O-/A+ is not a conflict, but may occur when subjects must refrain from performing certain actions, as specified by a negative obligation, even though they are permitted to perform them, as in policy X5 in Section 2.
It is possible to resolve these conflicts automatically by assigning a priority to individual policies, but meaningful priorities are notoriously difficult for users to assign and may result in arbitrary priorities which do not really relate to the importance of the policies. Inconsistent priorities could easily arise in a distributed system with several people responsible for specifying policies and assigning priorities. Our approach has been to give more specific policies precedence – a policy applying to a sub-domain overrides more general policies applying to an ancestor domain. Our tools analyse the policies within a domain to indicate conflicts for an administrator to resolve, and allow precedence to be enabled or disabled. We are investigating techniques for specifying other forms of precedence – in some situations negative authorisation policies should take precedence over positive ones, more recent policies over older ones, or perhaps policies applying to short time-scales over longer (background) ones.
Semantic Conflicts and Metapolicies − while modality conflicts can be detected purely by syntactic analysis, application-specific conflicts arise from the semantics of the policies. For example, a conflict may arise if there are two policies which respectively increase and decrease the bandwidth allocation when the same event occurs. Similarly, policies for differentiated services, which define the queues to which specific types of packets should be allocated, must not allocate the same packet to two different queues.
These conflicts for resources or conflicts of action are application specific and cannot be detected automatically without a specification of what constitutes a conflict, i.e., the conflicts are specified in terms of constraints on the attribute values of permitted policies. We call these constraints metapolicies, as they are policies about which policies can coexist in the system, or about the permitted attribute values for a valid policy.
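The syntactic modality check described above — flag any pair of policies whose modalities form a conflicting pair (O+/O-, A+/A-, O+/A-) and whose subjects, actions and targets overlap — can be sketched as follows. The dictionary representation and set-based overlap test are simplifying assumptions; the real analysis works on domain scope expressions.

```python
# Conflicting modality pairs from the text: O+/O-, A+/A-, O+/A-.
# O-/A+ is deliberately absent: it is not a conflict.
CONFLICTS = {frozenset({"O+", "O-"}), frozenset({"A+", "A-"}),
             frozenset({"O+", "A-"})}

def modality_conflicts(policies):
    """Purely syntactic check: two policies conflict when their modalities
    form a conflicting pair and their subject, action and target sets all
    overlap. Returns the conflicting policy-id pairs."""
    found = []
    for i, a in enumerate(policies):
        for b in policies[i + 1:]:
            if (frozenset({a["mod"], b["mod"]}) in CONFLICTS
                    and a["subj"] & b["subj"]
                    and a["act"] & b["act"]
                    and a["tgt"] & b["tgt"]):
                found.append((a["id"], b["id"]))
    return found

ps = [
    {"id": "p1", "mod": "O+", "subj": {"mgr"}, "act": {"backup"}, "tgt": {"db"}},
    {"id": "p2", "mod": "A-", "subj": {"mgr"}, "act": {"backup"}, "tgt": {"db"}},
    {"id": "p3", "mod": "A+", "subj": {"mgr"}, "act": {"backup"}, "tgt": {"db"}},
]
# p1/p2 is an O+/A- conflict, p2/p3 an A+/A- conflict; p1/p3 (O+/A+) is fine.
assert modality_conflicts(ps) == [("p1", "p2"), ("p2", "p3")]
```

Semantic conflicts cannot be found this way; they require metapolicy constraints over policy attribute values, as the text explains.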
5 Roles
Organisational structure is often specified in terms of organisational positions such as regional, site or departmental network manager, service administrator, service operator, or company vice-president. Specifying organisational policies for people in terms of role-positions rather than persons permits the assignment of a new person to the position without re-specifying the policies. The tasks and responsibilities corresponding to the position are grouped into a role associated with the position (which is essentially a static concept in the organisation). The position could correspond to a manager or a user of a network or services. A role is thus the position, together with the set of authorisation and obligation policies defining the rights and duties for that position. Organisational positions can be represented as domains, and we consider a role to be the set of policies (the arrows in Fig. 4) with the Position Domain as
subject. A person or automated agent can then be assigned to or removed from the position domain without changing the policies, as explained in [17].
Fig. 4 Management Roles
Although the concept of role was originally defined to apply to people, it can also be used to group the authorisation and obligation policies that apply to a particular type of network component as subject, e.g., an edge-router that interconnects the local network to the service provider, or a core-router providing a backbone service. Similar hardware and software may be used for both core and edge routers, so assigning a particular router to a role defines the set of policies which are loaded onto that router. Another example is a mobile agent, which is assigned to a visiting-agent role when it is received at a network node. This role could specify what resources the agent can access and what actions it must perform on arrival and departure. There are additional extensions to the concept of roles, described in [18,19]. These define inter-role relationships in terms of interaction protocols and concurrency constraints on the ordering of obligation actions. Furthermore, an object model for the specification of policy templates and role classes, which uses inheritance to implement specialisation, has also been defined. However, these issues will not be discussed further in this paper.
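The key property of a role — that occupants of the position domain can change while the policies stay fixed — can be sketched as a small data structure. The class and domain names below are illustrative assumptions.

```python
class Role:
    """A role, per the text, pairs a position domain (the subject of the
    policies) with the authorisation/obligation policies for that
    position. Occupants change; the policies do not."""
    def __init__(self, position_domain, policy_ids):
        self.position_domain = position_domain    # e.g. "/roles/visiting-agent"
        self.policy_ids = policy_ids              # policies with this subject
        self.occupants = set()

    def assign(self, agent):
        self.occupants.add(agent)     # the role's policies now apply to agent

    def remove(self, agent):
        self.occupants.discard(agent)

# A mobile agent assigned to a visiting-agent role on arrival at a node:
visiting = Role("/roles/visiting-agent",
                ["report-on-arrival", "limit-cpu", "report-on-departure"])
visiting.assign("mobile-agent-42")
assert "mobile-agent-42" in visiting.occupants
visiting.remove("mobile-agent-42")
assert not visiting.occupants
# The policy set itself never changed:
assert len(visiting.policy_ids) == 3
```

The same structure applies equally to an edge-router or core-router role: assigning a router to the role selects the policies loaded onto it.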
6 Policy Refinement
High-level abstract policies are often specified as part of the business process and express requirements on the communication network. These requirements are specified as management goals which cannot be directly interpreted by automated components and hence must be refined into functional policy specifications or be implemented manually by human managers. We express abstract policies in the same notation as implementable policies; however, the policy attributes (subjects, actions, etc.) may be written in natural language. For example, a high-level policy may be written as:
T1 O+ @/NetworkManagers {/* provide adequate video conference set up */} @/users/groupA when 14:00 < time < 15:00
Network managers must provide an adequate video conference set up for groupA users between 14:00 and 15:00.
In order to achieve this goal it is necessary to refine policy T1 into bandwidth management policies, authorisation policies and further administrative policies to enable or disable special policies which might apply during these hours. For example:
Administrative policies
T2 O+ at 13:55 @/NetworkManagers { enable() } @/policies/BandwidthControl + @/policies/QoSmonitoring
T3 O+ at 15:00 @/NetworkManagers { disable() } @/policies/BandwidthControl + @/policies/QoSmonitoring
Network managers must enable at 13:55 (T2) and disable at 15:00 (T3) special bandwidth control and QoS monitoring policies.
Authorisation policies
T4 A+ @/Agroup {VideoConf (BW=2, Priority=3)} @/USAStaff when (14:00 < time < 15:00)
Group A users must be able to set up the video connections (similar to policy x3).
T5 A+ @/NetworkManagers { enable(); disable() } @/policies/BandwidthControl + @/policies/QoSmonitoring
Network managers are authorised to enable and disable bandwidth control and QoS monitoring policies.
Bandwidth Control
T6 O+ on req(bw,chanId) edgeRouter {reduceReservation(bw)} channels/chanId when bw < getReservation(chanId)
Edge routers should decrease the bandwidth reservation on a channel when the request is for less than the amount currently reserved.
T7 O+ on req(bw, chanId) edgeRouter {increaseReservation(min(bw, x))} channels/chanId when bw > getReservation(chanId)
Edge routers should increase the bandwidth when the request is for more than the amount currently reserved; however, the amount reserved should not exceed x.
The refinement of abstract policies into implementable ones must be done by human managers. A positive obligation policy requires related authorisation policies giving subjects the necessary access rights to perform their tasks. Similarly, the refinement of an authorisation policy may include obligation policies defining the measures and counter-measures to be taken in case of security violations. Thus the refinement of a policy does not preserve the policy modality, nor does it necessarily apply to the same subjects or targets. For example, while network managers are responsible for ensuring that an adequate quality of service is provided (policy T1), the edge routers are responsible for performing the bandwidth reservations (T6, T7). We currently maintain pointers from an abstract policy to the policies derived from it (omitted from the above examples for clarity), but we do not have tools to support the refinement process. We are investigating the use of requirements engineering tools and techniques for the refinement and analysis of policies.
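The reservation logic captured by obligation policies T6 and T7 can be made concrete as a small function: reduce the reservation when the request is below the current amount, increase it (capped at x) when above. This is a sketch of the behaviour the policies specify, not of the edge-router implementation.

```python
def handle_request(bw, current_reservation, cap_x):
    """Bandwidth reservation per T6/T7: returns the new reservation for
    a channel after a request for bw, given the current reservation and
    the upper bound x from policy T7."""
    if bw < current_reservation:        # T6: reduceReservation(bw)
        return bw
    if bw > current_reservation:        # T7: increaseReservation(min(bw, x))
        return min(bw, cap_x)
    return current_reservation          # neither policy triggers

assert handle_request(2, 5, 10) == 2     # T6 fires: request below reservation
assert handle_request(8, 5, 10) == 8     # T7 fires, under the cap
assert handle_request(20, 5, 10) == 10   # T7 fires, capped at x
assert handle_request(5, 5, 10) == 5     # request equals reservation
```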
7 Related Work
There are a number of groups working on policies for network and distributed systems management [9,20,21]. Some of this work has been based on our early proposals for a policy notation. Another approach is to define policies using the full power of a general-purpose scripting or interpreted language (e.g., TCL) and load these into network components. Bos [22] takes this approach to specify application policies for resource management in netlets, which are small virtual networks within a larger virtual network.
There is considerable interest in the Internet community in using policies for bandwidth management. Policies are assumed to be objects stored in a directory service [7]. A policy client (e.g. a router) makes policy requests on a server, which retrieves the policy objects, interprets them and responds with policy decisions to the client. The client enforces the policy by, for example, permitting or forbidding requests, or allocating packets from a connection to a particular queue. The IETF are defining a policy framework that can be used for classifying packet flows as well as specifying authorisations for network resources and services [23,24,25]. They do not explicitly differentiate authorisation and obligation policies. A simple policy rule defines a set of policy actions which are performed when a set of conditions becomes true. These conditions correspond to a combination of our events and constraints for obligation policies. Their policy may be an aggregation of policy rules. They have realised that policy conflicts can occur, but have not distinguished between modality and semantic conflicts, nor do they say how conflicts will be detected. Directories are used for storing policies but not for grouping subjects and targets. They use dynamic groups, which can be specified by enumeration or by characterisation, i.e., a predicate on object attributes. We can achieve this by means of a constraint on policies within the scope of a domain expression, which is a defined set.
Defining a group in terms of an arbitrary predicate can be impractical. For example, determining the group of all Pentium II workstations with memory > 128 Mbytes would require checking millions of workstations on the Internet to determine whether they are members, which is not feasible. The IETF drafts also have the concept of a role, defined as a label indicating a function that a network device serves. Roles enable administrators to group the interfaces of multiple devices for applying a common policy. This is similar to our domains, although it is not clear how it will be implemented. There is a restriction that their role can be associated with only a single policy (which can be as complex as necessary). We consider this very restrictive and unnecessary. In the IETF approach, a policy enforcement point queries a decision point to find out which policies apply. Our notation, with explicit subjects and targets, permits us to propagate policies to where they are required, so we combine decision and enforcement at subjects for obligation policies and at targets for authorisation policies. Our policy service disseminates policies to the relevant distributed agents.
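The contrast drawn above — a group defined by an arbitrary predicate over an unbounded population, versus the same predicate applied as a policy constraint within an already-defined domain — can be sketched as follows. The data and names are invented for illustration.

```python
def characterise(population, predicate):
    """IETF-style group by characterisation: a predicate over object
    attributes, evaluated against a whole population. Impractical when
    the population is 'every workstation on the Internet'."""
    return {name for name, attrs in population.items() if predicate(attrs)}

workstations = {
    "ws1": {"cpu": "PentiumII", "mem_mb": 256},
    "ws2": {"cpu": "PentiumII", "mem_mb": 64},
    "ws3": {"cpu": "Sparc",     "mem_mb": 512},
}
big_p2 = lambda a: a["cpu"] == "PentiumII" and a["mem_mb"] > 128

# The text's alternative: use the same predicate as a constraint, but
# only over a domain that is a defined (finite, explicit) set.
domain = {"ws1", "ws2"}
in_scope = {w for w in domain if big_p2(workstations[w])}

assert characterise(workstations, big_p2) == {"ws1"}
assert in_scope == {"ws1"}
```

The result is the same here, but the second form only ever evaluates the predicate over known domain members, never over an open-ended population.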
8 Conclusions
We have shown that our management policy and role approach is also very useful for programmable networks. A clear specification of authorisation policy is essential, whatever implementation techniques are being used. The obligation policies can be
used to ‘program’ the network components, or combined with other programming approaches to define the events and constraints for performing actions. In any large-scale system, conflicts between policies will occur. We distinguish between modality and semantic conflicts, and indicate an approach for specifying what constitutes a semantic conflict as a metapolicy. Where possible, conflicts should be detected at specification or load-time (c.f. type conflicts detected by a compiler), although some conflicts can only be detected at run-time. We have also shown the use of roles for specifying policies for network managers, service users and network components. We have a prototype toolkit which can be used to specify roles and policies; it also performs static analysis for conflicts. We are currently working on extending this to run-time analysis, and are investigating the applicability of requirements engineering approaches to the refinement of high-level goals into detailed policy specifications. The requirements engineering community also has more sophisticated consistency analysis tools which may be applicable.
Acknowledgements We gratefully acknowledge financial support from the Fujitsu Laboratories and British Telecom and acknowledge the contribution of our colleagues to the concepts described in this paper – in particular Nicholas Yialelis and Damian Marriott.
References
1. Wetherall, D., Legedza, U., Guttag, J.: Introducing New Internet Services: Why and How. IEEE Network, Special Issue on Active and Programmable Networks, July 1998.
2. Tennenhouse, D., Smith, J., Sincoskie, D., Wetherall, D., Minden, G.: A Survey of Active Network Research. IEEE Communications Magazine, 35(1):80-86, 1997.
3. Bieszczad, A., Pagurek, B., White, T.: Mobile Agents for Network Management. IEEE Communications Surveys, 1(1), 1998. www.comsoc.org/pubs/surveys.
4. de Meer, et al.: Agents for Enhanced Internet QoS. IEEE Concurrency, 6(2):30-39, 1998.
5. Lazar, A.: Programming Telecommunication Networks. IEEE Network, Sep/Oct 1997, 8-18.
6. Goldszmidt, G., Yemini, Y.: Evaluating Management Decisions via Delegation. In Hegering, H., Yemini, Y. (eds.): Integrated Network Management III, Elsevier Science Publishers (1993), 247-257.
7. 3COM: Directory Enabled Networking and 3COM's Framework for Policy Powered Networking. http://www.3com.com/, 1998.
8. Sloman, M.: Policy Driven Management for Distributed Systems. Journal of Network and Systems Management, 2(4):333-360, Plenum Press, 1994.
9. Magee, J., Moffett, J. (eds.): Special Issue of IEE/BCS/IOP Distributed Systems Engineering Journal on Services for Managing Distributed Systems, 3(2), 1996.
10. Sloman, M., Twidle, K.: Domains: A Framework for Structuring Management Policy. In Sloman, M. (ed.): Network & Distributed Systems Management. Addison-Wesley (1994), 433-453.
11. Wahl, M., Howes, T., Kille, S.: Lightweight Directory Access Protocol (v3). IETF RFC 2251, Dec. 1997. Available from http://www.ietf.org.
12. Marriott, D., Sloman, M.: Management Policy Service for Distributed Systems. 3rd IEEE Int. Workshop on Services in Distributed and Networked Environments, Macau, 2-9, 1996.
13. Marriott, D., Sloman, M.: Implementation of a Management Agent for Interpreting Obligation Policy. IEEE/IFIP Distributed Systems Operations and Management Workshop (DSOM'96), L'Aquila, Italy, Oct. 1996.
14. Marriott, D.: Management Policy for Distributed Systems. Ph.D. Dissertation, Imperial College, Department of Computing, London, UK, July 1997.
15. Lupu, E., Sloman, M.: Conflicts in Policy-Based Distributed Systems Management. To appear in IEEE Trans. on Software Engineering, Special Issue on Inconsistency Management, 1999.
16. Mansouri-Samani, M., Sloman, M.: GEM: A Generalised Event Monitoring Language for Distributed Systems. IEE/BCS/IOP Distributed Systems Engineering, 4(2):96-108, 1997.
17. Lupu, E., Sloman, M.: Towards a Role-Based Framework for Distributed Systems Management. Journal of Network and Systems Management, 5(1):5-30, Plenum Press, 1997.
18. Lupu, E., Sloman, M.: A Policy-Based Role Object Model. 1st IEEE Enterprise Distributed Object Computing Workshop (EDOC'97), Gold Coast, Australia, Oct. 1997, pp. 36-47.
19. Lupu, E.: A Role-Based Framework for Distributed Systems Management. Ph.D. Dissertation, Imperial College, Dept. of Computing, London, UK, July 1998.
20. Koch, T., et al.: Policy Definition Language for Automated Management of Distributed Systems. 2nd IEEE Int. Workshop on Systems Management, Toronto, June 1996, 55-64.
21. Wies, R.: Policies in Integrated Network and Systems Management: Methodologies for the Definition, Transformation and Application of Management Policies. Ph.D. Dissertation, Fakultät für Mathematik der Ludwig-Maximilians-Universität, München, Germany, 1995.
22. Bos, H.: Application Specific Policies: Beyond the Domain Boundaries. IFIP/IEEE Integrated Management Symposium (IM'99), Boston, May 1999.
23. Strassner, J., Ellesson, E.: Terminology for Describing Network Policy and Services. IETF draft, work in progress, Feb. 1999. Available from http://www.ietf.org.
24. Strassner, J., Ellesson, E., Moore, B.: Policy Framework Core Information Model. IETF draft, work in progress, Feb. 1999. Available from http://www.ietf.org.
25. Strassner, J., Schleimer, S.: Policy Framework Definition Language. IETF draft, work in progress, Nov. 1998. Available from http://www.ietf.org.
A Self-Configuring Data Caching Architecture Based on Active Networking Techniques Gaëtan Vanet and Yoshiaki Kiriha C&C Media Research Laboratories, NEC Corporation 1-1, Miyazaki 4-Chome, Miyamae-Ku Kawasaki, Kanagawa 216-8555, Japan {vanet,kiriha}@nwk.cl.nec.co.jp
Abstract. This paper presents the design of a new Web cache architecture that uses active network capabilities to provide a solution for caching dynamic data throughout the network. In our proposal, objects, viewed at the smallest granularity, are cached together with a timestamp. But instead of considering dates individually, we define time classes which specify the level of objects' time-sensitiveness. Each intermediate node is specialised in a unique time class, according to its location within the network, and caches the dynamic data which belong to the corresponding time class. Nodes are divided into two types (manager and cache) and are bound together to define a hierarchical time-sensitive cache tree, the timestamp tree. In our proposed cache architecture, the timestamp tree is automatically reconfigured according to the users' access history, the applications' load and the network conditions. To achieve such a self-configuring cache architecture, we have designed five types of capsules.
1 Introduction
The Internet was created 20 years ago by universities and research centres to make their work easily available. At that time, the Internet was a medium for exchanging static data, accessed by a small number of users. This network was viewed as a world-wide database containing the current state of the art of the scientific community. In recent years, however, the Internet has become very popular and millions of people browse the Web every day. To limit the overload of networks and application sites, cache technology was introduced a few years ago: accessed Web pages are stored in cache proxies to avoid permanent reloading from origin servers. The last few years have led designers of Web cache infrastructures to develop schemes to store static data, ignoring dynamic data. In fact, current cache solutions are unsuitable for caching dynamic data, which are always reloaded from their origin server. However, the appearance of new types of services (online auctions, stock quotes, sensor mixing, online video, …) raises a new problem to be studied.
Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 85-96, 1999. © Springer-Verlag Berlin Heidelberg 1999
All these new services require
dynamic data exchange, and the structure of the current Internet has not been designed to meet this requirement. Because of this, the key idea of our study is to provide a solution for caching dynamic data through a self-configuring architecture.
Two cache solutions became popular a few years ago and are still widely deployed in the Internet: the Harvest [3] and the Squid [5,12] solutions. In both cases, network administrators must manually configure the cache proxies taking part in the cache architecture by defining the parent-sibling relationships [12]. These relationships are fixed and valid for all types of Internet applications. Configuring neighbour caches requires co-ordination of both parties and becomes a burden as membership grows. In practice, configurations are prone to human error and lack adaptability to changes in the network and in users' access patterns. Figure 1 shows a basic cache configuration of the Squid approach. In this scheme, proxy 4 is configured as the parent of each of the three other proxies. Proxies 2 and 3 are defined as siblings. When proxy 3 receives a request it cannot answer, it can forward the request to proxies 2 and 4 to get the answer. But proxy 1 is never contacted: whatever information it stores, it does not co-operate with proxies 2 and 3, unless a network manager manually defines this association. Obviously, this static configuration cannot suit the caching of dynamic data, where cache co-operation should be more flexible.
Fig. 1. Model of current cache architecture
The remainder of this paper is organized as follows. Section 2 presents some issues of dynamic data caching. Based on these facts, section 3 details our cache architecture. Then, section 4 explains the reconfiguration algorithm of our cache architecture and discusses the set of capsules we designed to achieve our proposal. Section 5 presents related work. Finally, section 6 ends with concluding remarks.
2 Issues of Dynamic Data Caching
Caching dynamic data is not trivial, and designers face various difficulties. One of them is the nature of dynamic objects themselves. Basically, caching dynamic data seems useless because the content of caches becomes out-of-date very quickly and loses consistency with the origin server. However, one of the purposes of dynamic data caching is to reduce the load of application servers during a high-demand period. Even if the time span of data is limited, caching can help significantly as long as caches are appropriately located.
The definition of the time to live of dynamic objects also presents some difficulty. This parameter cannot be the same for all cached objects [13]: it should depend on the type of application and its users, and the cache configuration should take this parameter into consideration.
Another problem is the format of cached objects [9]. Current cache solutions store entire Web pages in cache proxies, but for dynamic data this scheme should be changed. In fact, “dynamic Web pages” are mainly composed of static data (explanations for customers or advertisements) and some dynamic data: these applications provide dynamic information but mainly exchange static data. Such “dynamic applications” also give users the opportunity of selecting a few dynamic data items for their Web page. Storing whole pages is then impossible, due to the huge number of possible combinations arising from the variety of users' interests. Thus, it is worthwhile to make the granularity of cached objects smaller, from a whole page down to a single object.
The last problem is finding the most suitable place to cache dynamic data. In the case of static data, objects tend to be stored close to end users to reduce transmission delays as much as possible; the distance to the origin server is not taken into account. However, this scenario is not possible for caching dynamic data because of their short time to live.
As noted in [13], if an object changes more frequently than it is accessed, caching it is pointless. This property must be the basis of every dynamic data cache model. If we cache dynamic objects closer to their origin server, we obviously increase the number of hits. But if we cache items farther into the network, their time to live must be rather long for caching to remain useful. So there is a tight relationship between the time to live of cached objects and the distance between the cached data and their origin server.
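The rule quoted from [13] can be made concrete as a simple predicate on an object's change rate versus its access rate. This is a minimal sketch of the principle only; real placement decisions would also weigh the distance to the origin server, as the text argues.

```python
def worth_caching(change_rate, access_rate):
    """Per the rule cited from [13]: caching an object pays off only when
    it is accessed more often than it changes (both rates per second)."""
    return access_rate > change_rate

assert worth_caching(change_rate=0.1, access_rate=5.0)      # popular, slow-changing
assert not worth_caching(change_rate=2.0, access_rate=0.5)  # volatile, rarely read
```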
3 Basic Concept
We consider the situation where many caches are scattered throughout the Internet. Our approach addresses the problem of caching dynamic data in intermediate nodes rather than reloading them from their origin server. From this viewpoint, the following requirements must be satisfied. Firstly, we must ensure that dynamic data are stored according to their time-sensitiveness. Secondly, we must ensure that dynamic data are cached close to the shortest paths between users and origin servers; this limits the latency when the cache infrastructure cannot answer. Finally, we must take the distance between the location of a cached object and its origin server
into consideration. Indeed, data with a short life span should be stored rather close to their origin servers, while data with longer time spans can be cached farther into the network.
3.1 Class Categorisation
Each dynamic object can be defined by a pair of “value” and “date”, representing the value the object had at a specific date. For a given object, the number of such pairs becomes huge and the probability of a cache hit becomes small. Therefore, instead of considering a single date as the basis of reasoning, we consider the difference between two dates: the date of the request and the requested date for the object. For instance, if we request, at t=10:00:00, the value an object had at t=9:59:55, this difference equals 5 seconds. The more dynamic an object becomes, the smaller this difference will be. Considering such a time difference significantly reduces the number of possible combinations of value-date pairs. If the difference in time-sensitiveness between a 4-second value and a 6-second value is not significant, it can be interesting to merge them into the same “category”. We therefore define time classes as groups of objects having the same level of time-sensitiveness. For instance, if the time-sensitiveness is between 0 and 3 seconds, the object belongs to the first class (class1); if it is between 3 and 10 seconds, the object belongs to the second class (class2), and so on. The “time to class” conversion is done by intermediate nodes, based on a mapping table common to all Internet applications. Each node is specialised in a unique class and is not allowed to store objects belonging to other classes.
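The “time to class” conversion described above can be sketched with a mapping table of class boundaries. The paper fixes only the first two boundaries (3 s and 10 s); the remaining bounds below are illustrative assumptions.

```python
import bisect

# Upper bounds (seconds) of each class: class1 covers [0, 3), class2
# covers [3, 10), etc. Bounds beyond 10 s are assumed for illustration.
CLASS_BOUNDS = [3, 10, 60, 600]

def time_to_class(request_time, requested_date):
    """'Time to class' conversion done by an intermediate node: map the
    difference between the request date and the requested object date
    (both in seconds) to a time class, 1 being the most time-sensitive."""
    diff = request_time - requested_date
    return bisect.bisect_right(CLASS_BOUNDS, diff) + 1

# A request at 10:00:00 for the value at 9:59:55 -> 5 s difference -> class2.
assert time_to_class(36000, 35995) == 2
assert time_to_class(36000, 35999) == 1   # 1 s difference -> class1
```

Because the mapping table is common to all Internet applications, every node converts a request to the same class, which is what lets a node specialise in a single class.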
3.2 Manager and Cache Nodes
Like the Squid approach, ours is based on a hierarchical architecture where nodes are bound together to make a cache tree. The relationships are based on the location of nodes and the class of data the nodes mainly process. Each application delivering dynamic data maintains such a tree, with the server as the source. In our proposal, intermediate nodes are divided into two types: manager and cache nodes. These two entities are joined together to make up a group; each group is composed of one manager and many cache nodes. The size of groups is determined by the source of each cache tree (the application server) during cache reconfiguration. This decision takes into account the load of the application server and some historical knowledge. The load is estimated by considering, for example, the number of active connections, the number of requests per second or the CPU load. The function of the manager is to register, through pointers, all resources (cached objects) available in its group, so that a group stores each object only once. These pointers are distributed dynamically, according to the number of requests the nodes have received. Cache nodes, for their part, store objects and their associated values. Each cached object is defined by a tuple: object identifier - value - validity time. The validity time is the date until which the object value is valid, and is specified by the definition of the time class.
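The manager/cache split described above — cache nodes holding ⟨object identifier, value, validity time⟩ tuples, the manager holding only pointers so that each object is stored once per group — can be sketched as follows. All class and method names are illustrative.

```python
class CacheNode:
    """Stores <object id, value, validity time> tuples for its class."""
    def __init__(self):
        self.store = {}

    def put(self, obj_id, value, valid_until):
        self.store[obj_id] = (value, valid_until)

    def get(self, obj_id, now):
        entry = self.store.get(obj_id)
        if entry and now <= entry[1]:
            return entry[0]
        return None                       # missing or past its validity time

class ManagerNode:
    """Registers a pointer to every object cached in its group, so each
    object is stored only once group-wide and no multicast is needed."""
    def __init__(self):
        self.pointers = {}                # obj_id -> cache node holding it

    def register(self, obj_id, node):
        self.pointers.setdefault(obj_id, node)

    def lookup(self, obj_id, now):
        node = self.pointers.get(obj_id)
        return node.get(obj_id, now) if node else None

cache = CacheNode()
mgr = ManagerNode()
cache.put("quote:NEC", 1420, valid_until=100)
mgr.register("quote:NEC", cache)
assert mgr.lookup("quote:NEC", now=50) == 1420
assert mgr.lookup("quote:NEC", now=150) is None   # expired
```

Routing a miss through the manager's pointer table, rather than multicasting to all siblings, is exactly the replication-avoidance argument made in the text.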
Fig. 2. Cache configuration for a two-classes application
In our architecture, each node maintains a request history which stores the number of requests, per time class, received since the last cache reconfiguration. This history is used when the cache architecture is updated. Furthermore, we have chosen to introduce the manager, as a pointer repository, to avoid the systematic multicast within groups that is widely implemented in current cache solutions [3,5,12]. This limits the problem of data replication and increases the frequency of cache hits.
3.3 The Timestamp Tree
Our cache proposal uses two types of inter-node relationships, sibling and parent, to configure a hierarchical cache tree called the timestamp tree. All nodes belonging to the same group are defined as siblings. A parent is a node belonging to an upper class in the cache tree. In our case, cache-tree levels are related to object classes: the more dynamic the class, the higher it is placed in the timestamp tree. All nodes have a unique parent, except that the parent of first-class nodes is the origin server. An example of cache configuration is illustrated in figure 2: the origin server configures a cache tree comprising two cache classes for a stock quote service aimed at both professionals and ordinary people. Professionals are only interested in the latest quotations (class1 objects), while the others are satisfied with somewhat out-of-date data (class2 objects). Figure 3 shows the timestamp tree corresponding to figure 2; the class2 group containing nodes 9, 10, 11 and 12 has been omitted so as not to overload the figure. Whenever a user request is received by node 7, the latter performs the "time to class" mapping explained in section 3.1. If the request is a class2 message which node 7 cannot answer, the request is forwarded directly to its group manager, node 6.
Gaëtan Vanet and Yoshiaki Kiriha
Fig. 3. TimeStamp Tree example (origin server at the root; class 1 nodes 1 to 4 and class 2 nodes 5 to 8, linked by parent and sibling relationships; manager and cache nodes are distinguished)
If the request is still unanswered, the message is sent directly to the origin site; node 6 then creates a pointer recording that the object value is stored by node 7. If the request is a class1 message, node 7 forwards it to its parent, node 4, and waits for the answer.
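The forwarding decision of sections 3.2-3.3 can be sketched as follows; the data structures and return conventions are hypothetical stand-ins, and only same-class and more-dynamic (lower-numbered) requests are handled, as in the node 7 example.

```python
def route_request(local_cache, group_pointers, node_class, request_class, obj_id):
    """Routing decision of a node in the timestamp tree (a sketch).
    local_cache maps obj_id -> value; group_pointers, held by the
    group manager, map obj_id -> id of the node caching the object."""
    if request_class == node_class and obj_id in local_cache:
        return ("hit", local_cache[obj_id])
    if request_class == node_class:
        holder = group_pointers.get(obj_id)       # ask the group manager
        if holder is not None:
            return ("forward", holder)
        return ("forward", "origin-server")       # nobody in the group has it
    return ("forward", "parent")                  # request for an upper class
```

In the figure 2 scenario, node 7 (class 2) would answer a cached class2 request itself, redirect an uncached one via manager node 6's pointers, and hand a class1 request to its parent, node 4.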
4 Self-Configuring Cache Architecture
To meet the requirements of dynamic data storage, a cache architecture must be dynamic and self-organising, based on user requests and data time-sensitiveness. In our proposal, the reconfiguration of timestamp trees is invoked immediately after the detection of an overload by intermediate nodes or origin servers. The depth of the reconfiguration depends on which entity detects this state.
4.1 Algorithm
If an application server detects an overload, a cache reconfiguration must be performed. During the first stage, the application server defines the rules of class membership as well as the maximum average inter-node distance within groups. The rules of membership are based on two criteria: the number of requests received by the node since the last reconfiguration, and the distance (in number of hops) between the node and the application server. The application server then sends this information to all nodes of the network, either through a flooding algorithm or based on a list of registered nodes maintained by the server. All nodes determine their new position in the architecture. Afterwards, intermediate nodes exchange packets to define the content of all groups. Groups must be defined as wide as possible to limit the data replication problem; at the same time, the average distance between the nodes must be as small as possible to limit the transmission delay between members. The balance between these two parameters is specified by the origin server, based on its historical knowledge. Managers are chosen so as to give the widest node groups while keeping the
average distance between members below the limit defined by the origin server. After each reconfiguration, pointers and caches are re-initialised. If the overload is detected by an intermediate node, the latter notifies its neighbours about its current state, avoiding a complete reconfiguration of the cache architecture. As the node can no longer provide the cache function, the corresponding group must be updated. If this node was a cache node, the pointers referring to it must be removed from nearby nodes' caches. If the node was a manager node, a local cache reconfiguration is performed, following the algorithm detailed above; this stage defines a new structure for the group and, obviously, a new manager. When a new node is inserted into the network, it does not yet belong to any existing timestamp tree. It has to wait for messages, sent by application servers or nearby nodes, before taking part in any cache architecture. If a node does not receive any messages from its neighbours, it cannot participate in the cache architecture.
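The class-membership decision each node takes during a reconfiguration can be sketched as below. The rule table mirrors the example thresholds used in the capsule program of section 4.2.2 (50 %, 4 hops, 8 hops), but its exact shape is an assumption.

```python
def new_position(requests_by_class, hops_to_server, rules):
    """Return the class a node joins after a reconfiguration, or 0 if it
    takes no part in the tree. `rules` maps a class id to a pair
    (minimum share of requests, maximum hops to the server)."""
    total = sum(requests_by_class.values()) or 1
    for class_id, (min_share, max_hops) in sorted(rules.items()):
        share = requests_by_class.get(class_id, 0) / total
        if share > min_share and hops_to_server < max_hops:
            return class_id
    return 0

# Hypothetical rules matching the capsule program's thresholds.
rules = {1: (0.5, 4), 2: (0.5, 8)}
```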
4.2 Active Networking Based System Design
Implementing such a protocol is not easy, and current IP-based networking technology does not allow us to achieve this purpose: we would have had to replace routers already deployed in the network, update them with our protocol, and change the structure of packets. Through the use of active network technology [1,2,14], however, it becomes possible to realise our proposal. This technology allows the capabilities of routers to be extended beyond classical IP forwarding and routing; it is a way to insert new functionality into the network without changing the entities composing it. Moreover, monitoring functions can be placed at the network level, whereas before they were only present at the application layer. In the case of dynamic data caching, this feature is important because it reduces the overall latency of the cache infrastructure. This paper does not discuss the security problems due to the incursion of malicious code into the network; the work of the University of Pennsylvania [1,2] gives a suitable answer to this problem.
4.2.1 The Designed Capsules
We designed five types of capsules to implement our proposal:
• A cache reconfiguration capsule, for the application server to invoke a cache architecture reconfiguration (cf. 4.2.2).
• A congestion notification capsule, for a congested node to notify its neighbours about its state.
• A get capsule, for nodes to request an object.
• A manager election capsule, exchanged between nearby nodes, to define the content of the groups and the manager of each of them.
• A new position capsule, for nodes to notify their position to the application server.
All these messages have the basic structure represented in figure 4. The content of the common header is rather similar to an IP header, but in the active network context it must also contain a protocol identifier. The IP address of the previous node allows
missing functions to be reloaded. The sender of the capsule is the node which performs the "time to class" mapping. The destination field can contain the address of the origin server or of other nodes. Finally, the maximum hops number field specifies the time to live of the capsule, in number of hops. The data field of the capsule can contain information such as the distance to the application server or the mapping table. The program part can perform arbitrary computations, store information in soft state, and create and send packets back out into the network. Depending on the function of each capsule, some fields may be present or not.

Fig. 4. Basic format of capsules. Common header: Protocol Id. (4 bits), IP address of previous node (32 bits), IP address of sender (32 bits), IP address of destination (32 bits), Max hops number (6 bits), Header checksum (16 bits); followed by the Data and Program parts.
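The common header could be serialised as follows. This is a hypothetical byte-aligned sketch: the 4-bit protocol identifier and the 6-bit maximum-hops field are each widened to a full byte for simplicity, and the checksum is passed in rather than computed.

```python
import socket
import struct

# Byte-aligned layout of the common header of Fig. 4 (assumed encoding).
HEADER_FMT = "!B4s4s4sBH"     # proto, prev, sender, dest, max_hops, cksum
HEADER_LEN = struct.calcsize(HEADER_FMT)   # 16 bytes in this layout

def pack_common_header(proto, prev_ip, sender_ip, dest_ip, max_hops, cksum=0):
    return struct.pack(HEADER_FMT,
                       proto & 0x0F,            # keep to 4 significant bits
                       socket.inet_aton(prev_ip),
                       socket.inet_aton(sender_ip),
                       socket.inet_aton(dest_ip),
                       max_hops & 0x3F,         # keep to 6 significant bits
                       cksum)
```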
4.2.2 Example: Cache Reconfiguration Capsule
Let us consider the cache reconfiguration capsule as an example. This capsule is sent by application servers to all active nodes of the network to notify them that a cache architecture reconfiguration must be done. The format of this capsule is described in figure 5.

Fig. 5. Format of a cache reconfiguration capsule. Common header, followed by a data part containing Hops to server (6 bits), Mapping table (21 bits per class) and Maximum inter-nodes distance (32 bits), and by the Program part.
The hops to server field, incremented node by node, computes the distance between the server and the destination node. The mapping table is used to perform the conversion between the date requested for the object and the date of the request. The maximum inter-nodes distance defines the maximum allowed average distance between any two members of each group. These data are specified by the application server when the capsule is created. The program part is shown below. The different thresholds embedded in the program part of the capsule (50%, 4 hops, 8 hops, ...) are defined by the application servers and constitute the rules of class membership. The other configuration parameters (the maximum average inter-nodes distance and the distance between the node and the application server) are contained in the data part of the capsule; the execution of the program stores them in the node's cache.
Example of program embedded in a cache reconfiguration capsule:

hops_to_server++;
if Already_Received_Capsule then
    // The node has already received the reconfiguration capsule: drop it.
    Remove( This )
else if ( hops_to_server < max_hop_number + 1 ) then {
    // The node has not received the reconfiguration capsule yet and is
    // located at a suitable distance from the application server.
    forward( This, Nearby_Nodes );
    // Get the request history stored by the node.
    Object Class_Requests = getCache( );
    // Class_Id is the class of the node.
    // Conditions related to class1 requests
    if (Class_Requests[1] > 50%) AND (hops_to_server < 4) then
        Class_Id = 1;
    // Conditions related to class2 requests
    else if (Class_Requests[2] > 50%) AND (hops_to_server < 8) then
        Class_Id = 2;
    else
        Class_Id = 0;
    if Class_Id <> 0 then {
        Node_Type = manager;
        store( mapping_table );
        store( maximum_inter_nodes_distance );
        Send_To_Nearby_Nodes( Manager_Election_Capsule );
    }
}
else
    // The node is located too far away from the application server.
    Remove( This );
5 Related Work
This section presents previous approaches to distributed and self-organised cache architectures. Even though the Harvest and Squid solutions [3,5,12] are widely deployed, new research is being carried out to improve them. Povey and Harrison [10] proposed an architecture where each lowest-level node caches objects, and the upper levels register sets of pointers to the lowest nodes containing the real cached resources. This approach is interesting in the sense that it differentiates nodes depending on their position in the network. Some research focuses on the communication and co-operation between nodes. Cache Digests [11] present a novel protocol for co-operative Web caching; it can be a good alternative to the Internet Cache Protocol [6,7], since the request-response delay is reduced by compressing information. This approach is based on Bloom filters (arrays of bits), which compress the set of all keys while retaining lookup capabilities. Nodes load the digests of their neighbours to know the content of their caches.
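As an illustration of the Bloom-filter idea (not the exact Cache Digest encoding), a minimal filter could look like this; the sizing and hashing scheme are assumptions.

```python
import hashlib

class BloomFilter:
    """A tiny Bloom filter: set membership with false positives but
    no false negatives, using k salted hashes over a fixed bit array."""
    def __init__(self, size_bits=1024, hashes=4):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        # Derive `hashes` bit positions from salted SHA-256 digests.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```

A node can thus publish a compact digest of its cached URLs, and a neighbour can test membership locally instead of issuing an ICP query per request.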
The appearance of active network technology has given a new alternative for the development of cache policies: it is now possible to implement new protocols in the network without changing already deployed routers, putting the cache function at the network level. Georgia Tech [4] proposes an architecture where each node is associated with a small cache, rather than having big caches in a few hand-chosen locations. Nodes on the path between a client and a server are partitioned into equivalence classes; objects are similarly partitioned and can only be cached at nodes belonging to the same equivalence class. Moreover, each node keeps track, through a set of pointers, of the resources cached in its nearby nodes. The M.I.T. proposal [8] is based on pointers which redirect requests to the places containing cached objects. The deployment of pointers relies on the active network concept, and each active node exchanges information to know the cache content of its neighbours. Another M.I.T. proposal [9] deals with a protocol designed for caching dynamic data. In this approach, data values, associated with a timestamp, are cached along the network path from the application server to the customers. This approach is interesting because it addresses dynamic data caching, but it does not provide any mechanism for inter-node co-operation and raises the problem of data replication in the network: the same data with various timestamps, differing by only one or two seconds, will be cached in many nodes throughout the network. The Adaptive Web Caching project [15,16] is developing an architecture for an adaptive, scalable Web caching system, with mechanisms that could be realised using active networking. Even though this proposal is currently being developed on top of IP multicast and does not focus on dynamic data caching, it is interesting to compare it with ours. This approach is based on overlapping groups defined by IP multicast.
Each node periodically exchanges its cache content with its neighbours belonging to the same group; each node thus maintains a set of pointers to nearby nodes. When a request is received by a node, the latter checks whether it can answer it. If not, it verifies whether one of its neighbours can. If the request is still unanswered, the node decides whether to forward it to another node or directly to the application server. This decision is made according to the URL routing table maintained by each node, in which each URL prefix is associated with one or more identifiers of next-hop caches or cache groups. When the request hits a group where one node has the requested page, the latter multicasts the response to the group. If we apply this solution to dynamic data caching, we come up against two difficulties: the validity of pointers and the problem of data replication. First, our proposal updates pointers through object requests, so the overhead due to this exchange is limited. In the Adaptive Web Caching project, however, nodes exchange their cache content periodically. In the case of static data, this approach is suitable; but for dynamic data, nodes would spend a large part of their processing time notifying their neighbours about their content. Otherwise, pointers could become out of date and requests would be sent to unsuitable nodes, increasing the overall latency. Moreover, the proposed system replicates Web pages towards end users: the more popular Web pages are, the closer to end users they are stored. Dynamic data could thus flood the network all the way to the end users. In our proposal, this effect is limited by the fact that only specific nodes can store an object, according to its location in the network.
6 Conclusion
The development of dynamic data cache solutions is one of the keys to wide acceptance of new types of applications, including online dynamic data services. Currently deployed cache solutions, quite efficient for caching static data, are helpless when caching dynamic data. In this paper, we have detailed a new solution to fill this gap. Our protocol can be viewed as an adaptive protocol in an active network: adaptive, because the parameters taken into consideration during configuration can change based on the profile of applications. Our approach proposes a cache hierarchy where dynamic objects are cached according to their time-sensitiveness. The concepts of group and manager significantly reduce the problem of data replication. Furthermore, the cache reconfiguration is dynamic and is invoked after the detection of an overload. Our cache architecture, based on the timestamp tree concept, improves user response time, especially during high-demand periods, by achieving a good balance of data locations based on their level of time-sensitiveness. We plan to implement a prototype to check the validity of our proposal for dynamic data caching. We believe that dynamic data cannot be cached in the same spirit as static data. Considering Web pages as a whole was a good solution for storing static data, but in the case of dynamic data it prevents the development of suitable solutions. For dynamic applications, Web pages should be considered as a static support for dynamic data rather than as a completely dynamic object, as is done in the current cache approaches. The static canvas of a Web page could be cached using current cache solutions, and the dynamic part with our proposal; intermediate nodes could then combine these data to construct the final page displayed on the user's screen.
References
1. D. S. Alexander, W. A. Arbaugh, M. W. Hicks, P. Kakkar, A. D. Keromytis, J. T. Moore, C. A. Gunter, S. M. Nettles, J. M. Smith: "The SwitchWare Active Network Architecture", University of Pennsylvania, IEEE Network, May-June 1998.
2. D. S. Alexander, W. A. Arbaugh, A. D. Keromytis, J. M. Smith: "A Secure Active Network Environment Architecture: Realization in SwitchWare", University of Pennsylvania, IEEE Network, May-June 1998.
3. M. Baentsch, L. Baum, G. Molter, S. Rothkugel, P. Sturm: "World-Wide Web Caching - The Application-Level View of the Internet", CS Department, University of Kaiserslautern (Germany), IEEE Communications Magazine, Vol. 35, No. 6, June 1997.
4. S. Bhattacharjee, K. L. Calvert, E. W. Zegura: "Self-Organizing Wide-Area Network Caches", Networking and Telecommunications Group, College of Computing, Georgia Tech.
5. A. Chankhunthod, P. B. Danzig, C. Neerdaels, M. F. Schwartz, K. J. Worrell: "A Hierarchical Internet Object Cache", Computer Science Department, University of Southern California; Computer Science Department, University of Colorado.
6. K. Claffy, D. Wessels: RFC 2187, "Application of Internet Cache Protocol - ICP (version 2)", National Laboratory for Applied Network Research, May 9th, 1997.
7. K. Claffy, D. Wessels: RFC 2186, "Internet Cache Protocol - ICP (version 2)", Technical Report, IETF Network Working Group, May 27th, 1997, draft-wessels-icp-v2-03.txt.
8. U. Legedza, J. Guttag: "Using Network-Level Support to Improve Cache Routing", Laboratory for Computer Science, M.I.T., 3rd International WWW Caching Workshop, Manchester, June 1998.
9. U. Legedza, D. Wetherall, J. Guttag: "Improving the Performance of Distributed Applications Using Active Networks", Laboratory for Computer Science, M.I.T., IEEE INFOCOM, April 1998.
10. D. Povey, J. Harrison: "A Distributed Internet Cache", School of Information, University of Queensland, Brisbane, Proceedings of the 20th Australasian Computer Science Conference, February 5-7, 1997.
11. A. Rousskov, D. Wessels: "Cache Digests", National Laboratory for Applied Network Research, Proceedings of the Third International Web Caching Workshop, April 17th, 1998.
12. D. Wessels: Tutorial "Configuring Hierarchical Squid Caches", AUUG'97, Brisbane, Australia, http://ircache.nlanr.net/~wessels/Papers/
13. D. Wessels: "Intelligent Caching for World-Wide Web Objects", University of Colorado, Boulder, Colorado, Proceedings of INET'95, Hawaii, April 28th, 1995.
14. D. J. Wetherall, J. V. Guttag, D. L. Tennenhouse: "ANTS: A Toolkit for Building and Dynamically Deploying Network Protocols", Laboratory for Computer Science, M.I.T., Proceedings of IEEE OPENARCH'98, San Francisco, CA, April 1998.
15. L. Zhang, S. Michel, K. Nguyen, A. Rosenstein, S. Floyd, V. Jacobson: "Adaptive Web Caching: Towards a New Global Caching Architecture", UCLA Computer Science Department and Lawrence Berkeley National Laboratories, Third International Caching Workshop, June 10th, 1998.
16. L. Zhang, S. Floyd, V. Jacobson: "Adaptive Web Caching", initial proposal, UCLA Computer Science Department and Lawrence Berkeley National Laboratories, February 1997.
Interference and Communications among Active Network Applications

Luca Delgrossi (1), Giuseppe Di Fatta (2,3), Domenico Ferrari (1), and Giuseppe Lo Re (3)

(1) CRATOS, Università Cattolica del Sacro Cuore, via Emilia Parmense 84, 29100 Piacenza, Italy {ldgrossi,dferrari}@pc.unicatt.it
(2) ICSI, International Computer Science Institute, 1947 Center Street, Suite 600, Berkeley, CA 94704-1198, USA [email protected]
(3) CERE, Centro di studio sulle Reti di Elaboratori, C.N.R., viale delle Scienze, 90128 Palermo, Italy {difatta,lore}@cere.pa.cnr.it
Abstract. This paper focuses on active network applications and, in particular, on the possible interactions among these applications. Active networking is a very promising research field which has developed recently and which poses several interesting challenges to network designers. A number of proposals for efficient active network architectures are already to be found in the literature; however, how two or more active network applications may interact has not been investigated so far. In this work, we consider a number of applications that have been designed to exploit the main features of active networks and we discuss the main benefits that these applications may derive from them. We then introduce some forms of interaction, including interference and communications among applications, and identify the components of an active network architecture that are needed to support these forms of interaction. We conclude by presenting a brief example of an active network application exploiting the concept of interaction.
1 Introduction
The last few years have seen the growth of active networks as an innovative technology in computer networking. Traditional computer networks allow their users to share network bandwidth as a common resource. Active networks focus not only on bandwidth but also on other network resources, such as the computing and storage capabilities at the end systems and intermediate nodes. They also provide the means to inject user code into these nodes, thus enabling user customisation of network protocols and services. The first consequence of active networks is that the entire network paradigm is destined to change. In this new scenario, traditional network protocols look static, rigid, and barely suitable. A new base protocol is required that allows for an efficient management
Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 97-109, 1999. © Springer-Verlag Berlin Heidelberg 1999
Luca Delgrossi et al.
of network resources and the delivery of user data and code. Traditional data packets are to be replaced by active packets (also called capsules), which can carry both user data and code. Intermediate nodes of an active network need the means to load and execute user code in an appropriate execution environment. This new network paradigm opens many interesting possibilities and, at the same time, poses a number of new challenges. Modularity and extensibility are the most general properties of active networks. Users can develop specific algorithms to be integrated into the network protocols, in order to achieve application-oriented network functions. Such functions can be injected into the network at run time, based on specific needs. Extreme flexibility of the network can be achieved because single packets can specify their own management functions; packets that do not contain code, like traditional ones, are forwarded by the default network functions. Although active networks do not increase the domain of deterministically solvable problems, they allow for technical solutions that are particularly suitable for some distributed applications. In several cases, this technology makes feasible network applications that would otherwise not be efficiently realisable. The extreme flexibility of active networks has caused the proliferation of many definitions for them, such as: programmable interface to the network [9], programmable network [15], adaptive protocols [13], platform for user-driven customisation of the infrastructure [12], network as a computational engine [7], and so on. Active networks require strict homogeneity of the basic software components in the network; thus, an active network can be considered as a global distributed system. Active network technology also introduces serious security problems.
As user code can be executed on intermediate systems, an active network architecture should guarantee safety and security for user applications, for the nodes (both end and intermediate systems), and for the network as a whole. Different architectures proposed in the recent past offer partial solutions to this problem [2], [5], [16]. The benefits that an active network implementation can produce are a subject that deserves study; an efficiency measure has been proposed in [6]. The aim of this work is to focus on the interactions among active network applications. This seems to be an interesting potentiality of the active network paradigm, currently not explored in the literature. We feel that an efficient active network architecture should provide the means to allow applications to exploit the advantages that can be derived from the desired forms of interaction, and the mechanisms to protect applications against undesired forms of interaction. In the rest of this paper, we discuss a number of active network applications and highlight the potential benefits they can gain from active networking (Section 2); we present some basic forms of interaction among applications, including interference and communications (Section 3); we discuss some issues related to the introduction of mechanisms to support application interactions in an active network architecture (Section 4); finally, we conclude by presenting an example of an application exploiting the concept of interference (Section 5).
2 Active Networks Benefits
The evolution of computer networks towards the active network paradigm strongly depends on the actual benefits that can be obtained by applications. We feel that many of these benefits fall into the following categories:
– availability of information held by intermediate nodes,
– data processing capability along the path,
– adoption of distributed strategies,
– easy development of new network services.
Availability of Information Held by Intermediate Nodes - Mobile agents can be encapsulated and transported in the active code of application capsules. They can retrieve and extract pieces of information held by intermediate nodes more effectively than through remote queries from the application itself. For instance, an agent could use active code to look up the routing tables of an intermediate node and select some entries according to a given criterion. It can either send the extracted information back to the application, or use it to take timely decisions autonomously. More examples can be found in network management, such as congestion control, error management, and traffic monitoring. A meaningful example is the customisation of the routing function: a mobile agent could be devoted to the evaluation of the path for the application's data flow, according to the user's QoS specification. Each application could set up its own control policy or exploit a common service (the default per-hop forwarding function). Data Processing Capability along the Path - Application-specific functions installed on an intermediate node can access and modify transient data addressed to other nodes. Such modifications could be due to the current state of the network or to particular receiver needs: data format translations, different compression levels, and document encryption/decryption are some examples. Multicast transmission is an instance where the benefits appear most evident. Functions dynamically deployed in intermediate nodes can manage the joining of new users, dynamically modify the multicast tree to optimise bandwidth utilisation, or adapt the data format to different user specifications. Audio and video conferencing systems have been proposed in [4], [5], in which agents are located in the crucial nodes where the transformation of the streams is needed.
Each agent is in charge of replicating the information for the different users; it can also adapt the data flow to different bandwidth requirements and to the network load. Adoption of Distributed Strategies - Active network applications can easily implement distributed strategies by spreading mobile agents in the network. Examples of this new potentiality are given by existing applications such as Web proxies [13], stock quote and on-line auction applications [15], distributed firewalls [13], and the distributed management of multicast trees [10].
A particular application proposed in [14] is an ad-hoc mobile firewall, whose aim is to inhibit the denial-of-service attack known as SYN flooding. The defender injects into the network a defence mobile agent that is able to recognise the intruder's packets and to stop them at intermediate nodes closer and closer to the attacker's node. A defence based on active networking is inherently more powerful than a common attack strategy; this case is interesting because it shows a typical interference that makes an active network such a powerful tool that it becomes potentially dangerous. If no limits were placed on the interference among applications, the resulting chaos would amount to the biggest possible denial of service. From these considerations, we see that the design of an active network architecture capable of providing mechanisms that allow applications to declare their own desired degree of interference is essential.
Easy Development of New Network Services - The code injection technique is the key to the flexibility of an active network, and makes the development of new protocols, services, and other network applications straightforward. Tests of new protocols can be quickly performed on the real network, not just simulated. The updating of network device software with complex dependencies can be accomplished remotely. The need for greater speed in delivering software to network devices arises from the observation that the difficulty of introducing several Internet enhancements (RSVP, MBone, IPv6) was also due to the impossibility of carrying out the necessary actions on the network devices. The IETF diffserv working group is proposing an architecture [8] to support services other than best-effort forwarding. Per-Hop Behaviours (PHBs) are the bricks with which these new services can be built; they are dynamically allocated in the network nodes, and the active network methodology could be very suitable for this purpose. New services that could easily be implemented in an active network are application-driven routing, already discussed, and parallel routing, where the path of a unicast transmission is replaced by several parallel paths.
3 Active Networks Applications
As we pointed out in the previous sections, researchers have already identified a number of applications that can potentially take advantage of the main active networking features to fulfil their tasks efficiently. Here, we try to sketch a rough classification of these applications, based on a number of relevant criteria. The goal is to identify the essential components of a generic active network architecture into which all types of applications fit well. The adopted criteria comprise the type of execution environment present in the intermediate nodes and the ability to share data among different applications.
Interference and Communications among Active Network Applications
3.1 Capsules
We start by describing what we feel is the simplest possible scenario. An application generates an active packet containing user data and executable code (a capsule). The capsule is sent over a computer network and traverses a series of nodes - some of which are active nodes - on its way to the final destination. When the capsule reaches an active node, the software on the node identifies the presence of executable code, loads it into memory, and executes it in a suitable environment. The results of this execution are then stored back into the capsule, and the capsule is forwarded to the next hop towards its destination. This behaviour may be replicated at each active node until the destination has been reached. During the execution, it is possible to access critical information stored in an active node. For instance, a capsule may need the current value of a timestamp, detailed information on the current traffic load, or the routes currently available for data delivery. In a reservation-based system, information on the amount of resources still available and on the quality of service obtainable under the current traffic conditions could be made available as well. Thus, a scheme is needed that provides the means for a peer application to extract critical information from the network in an effective way. Even though this scenario looks simple, we can draw some conclusions from it. The problem of defining an efficient way for capsules to deliver results to a peer application has not yet been investigated. A first approach consists of letting the capsule reach the peer application at the destination: the application may then read the results of the execution directly from the capsule. As an alternative, a capsule may occasionally send the collected results back to the peer application at the sender side.
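The per-node capsule processing just described can be condensed into a small sketch. This is an illustration of the execute-store-forward cycle only, not a real capsule format; all names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustrative sketch: a capsule carries code as a function from node state
// to a result, which is stored back into the capsule before forwarding.
public class CapsuleDemo {
    static class NodeState {
        final String name; final long timestamp;
        NodeState(String name, long ts) { this.name = name; this.timestamp = ts; }
    }
    static class Capsule {
        final Function<NodeState, String> code;          // the mobile code
        final List<String> results = new ArrayList<>();  // results stored back
        Capsule(Function<NodeState, String> code) { this.code = code; }
    }
    /** What an active node does when a capsule arrives. */
    static void process(Capsule c, NodeState node) {
        c.results.add(c.code.apply(node)); // execute the code, store the result
        // ... the capsule would then be forwarded to the next hop (omitted)
    }
    public static void main(String[] args) {
        Capsule c = new Capsule(n -> n.name + "@" + n.timestamp);
        process(c, new NodeState("nodeA", 100));
        process(c, new NodeState("nodeB", 105));
        System.out.println(c.results); // one entry per active node traversed
    }
}
```

On reaching the destination, the peer application reads the accumulated results directly from the capsule, as in the first approach above.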
If we push this idea a little further, we can imagine applications injecting capsules into the network that never really reach a final destination but keep travelling, reporting results and collected data back to the sender application. This last scenario calls for the definition of a communication protocol between the application and its capsules. Although it would be possible for each application to define its own protocol, it would be convenient to define a common protocol used by all applications, so that nodes in the network can help improve the communication: for instance, an active node could resend a packet generated by a capsule if it detects that the packet has been lost after the capsule has already left the node.
3.2 Interference among Applications
In a more sophisticated scenario, a capsule injects into an active node code able to modify the node's behaviour through the execution of one or more customised functions. These functions could be designed to manage (forward, discard, filter, modify) the data packets that traverse the node. This can be accomplished either by modifying the behaviour of a task run by the active node (assuming the active node provides the means to do this) or by creating a new task running in the node. The latter case corresponds to the activation of a new agent, e.g., a packet filter, on the node. This mechanism can be used by an application
Luca Delgrossi et al.
to modify the router's behaviour with respect to the data packets that the application itself will subsequently send. For example, it may be used to discard packets logically belonging to a substream when delivering hierarchically encoded digital video. An application thus has the means to differentiate the service it receives from the network on a per-packet basis and to adjust it dynamically. Although most of the proposed applications limit the agent's actions to packets belonging to the same application that installed the agent, in some cases the action might be executed on packets generated by other applications, which may be unaware of it. We call this inter-application interference. Interference among applications can be a very powerful way of exploiting the active networking paradigm. However, it is necessary to provide a framework with strict rules that regulate interference and prevent illegal use by unauthorised applications. Also, appropriate mechanisms have to be built into the network to enforce these rules. We feel that an active networks architecture should provide means to:
– uniquely identify applications and capsules within the network,
– associate a set of routines and a memory area in the active node with the incoming capsule,
– allow for secure management of application interactions.
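The second requirement above, binding an incoming capsule to a per-application set of routines and a memory area, can be sketched on a node as a small registry keyed by application identifier. The store and its API below are our own illustration, not part of any cited architecture:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the active node keeps, per application identifier,
// a private environment (routines plus a memory area) to which incoming
// capsules of that application are bound.
public class NodeEnvironments {
    static class Environment {
        final Map<String, Runnable> routines = new HashMap<>();
        final Map<String, Object> memory = new HashMap<>();
    }
    private final Map<String, Environment> byAppId = new HashMap<>();

    /** Bind a capsule, identified by its application id, to its environment. */
    Environment bind(String appId) {
        return byAppId.computeIfAbsent(appId, id -> new Environment());
    }

    public static void main(String[] args) {
        NodeEnvironments node = new NodeEnvironments();
        node.bind("appA").memory.put("counter", 1);
        // A later capsule of the same application sees the same memory area:
        System.out.println(node.bind("appA").memory.get("counter"));
    }
}
```

Because the environment is selected by the (authenticated) application identifier, one application's capsules cannot silently execute another application's routines.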
Some mechanisms have already been proposed in the literature. ANTS proposes the introduction of fingerprints to authenticate the application and the packet [16]. Such a mechanism is intended to guarantee that capsules are associated with the correct environment (functions and data) in the Java Virtual Machine. This authentication procedure is aimed at the binding of the programming environment, and only indirectly at security. SwitchWare provides a more general and complex authentication scheme, whose main goal is security [2]. Even in this case, no attention is paid to the regulation of interactions. In the example of the defence against the SYN attack, we can imagine that the defender agents present credentials to the intermediate nodes, whereas the attacker packets contain spoofed addresses. Although applications should be protected against undesired interference, safe and reliable interactions remain a powerful tool for building effective applications. A number of "good" interactions could be set up between ISPs, which could collaborate by sharing their services. Forms of interaction can be identified in web proxies. In the stock quotes and online auction applications, the fundamental requirement is that the user (client application) be able to trust the intermediate agent (injected by the server application), i.e., a mechanism of authentication is required. Any external action on an application's data flow should have been previously accepted or, even better, declared by the application itself. Only such strict rules can allow the correct use of interference as a useful network service. We propose three possible levels of interference, corresponding to the degree of intervention that an application is willing to accept:
– no interference: an application may require a high security level for its data and, consequently, does not accept any interference on its packets;
– intra-application interference: applications using interference as a tool to achieve their specific goals could require an authentication mechanism that guards against intrusions from and by other applications;
– inter-application interference: this is the general case, where an application may accept that other network entities access and modify its own packets. For instance, this is the case of network services shared by several applications, such as the traditional routing functions (able to access but not modify packets), policing functions (able to discard some packets), encryption functions (able to access and modify packets), and so on.
An efficient active networks architecture should support all three kinds of interference among applications.
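A node could enforce the three interference levels above with a check of the following shape. This is only a sketch of the policy decision under the assumption that each packet carries its owner's identifier and declared level; all names are illustrative:

```java
// Sketch: before another application's agent acts on a packet, the node
// consults the interference level declared by the packet's owner.
public class InterferencePolicy {
    enum Level { NONE, INTRA_APPLICATION, INTER_APPLICATION }

    /** May 'actor' operate on a packet owned by 'owner' declared at 'level'? */
    static boolean mayAct(String actor, String owner, Level level) {
        switch (level) {
            case NONE:
                return false;                 // no node-resident agent may touch it
            case INTRA_APPLICATION:
                return actor.equals(owner);   // only the owning application itself
            case INTER_APPLICATION:
                return true;                  // others too, subject to authentication
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(mayAct("routing", "video-app", Level.INTER_APPLICATION));
    }
}
```

In a real architecture the `actor` identity would of course be established by the authentication mechanisms discussed above, not taken on trust.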
3.3 Communication among Applications
Communication between two or more active networks applications is determined by a mutual will to exchange information. Two or more applications running on an active network can establish real communication sessions or, more simply, exchange messages with each other. An analogy with traditional communicating processes is possible. When two applications decide to exchange information, they can send messages to each other by adopting a common format. Many authors describe active networks applications as network protocols [1]. The communication between applications should, in turn, again be managed by a protocol, which will appear as a communication protocol between other protocols. Typically, each application provides some entry points, through which it can receive information, or information requests, from other active applications. Unlike the interference case, in the communication scheme a deterministic behaviour is entirely preserved and guaranteed. In this case, the provision and management of possible interactions is up to the application. Examples of applications engaging in communication could be general-purpose utilities providing services on the network, which can be used by means of well-known handles. As we did for interference, we propose three different levels of communication:
– no communication: an application executes its code without requiring external data;
– intra-application communication: this is the case of a multitasking application, where different components of the same application can exchange data among themselves;
– inter-application communication: two separate applications agree upon the exchange of data between them; to this end, they can use messages or they can set shared variables located on intermediate nodes of the active network.
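The shared-variable form of inter-application communication mentioned in the last level can be sketched as a small key-value store held on an intermediate active node. The store and its API are our own illustration; the variable name would be agreed upon by the two applications in advance:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a shared-variable store on an intermediate active
// node, through which two consenting applications exchange data.
public class NodeSharedStore {
    private final Map<String, Object> vars = new HashMap<>();

    public void set(String name, Object value) { vars.put(name, value); }
    public Object get(String name) { return vars.get(name); }

    public static void main(String[] args) {
        NodeSharedStore node = new NodeSharedStore();
        node.set("congestion-hint", 0.8);                // written by application A
        System.out.println(node.get("congestion-hint")); // read by application B
    }
}
```

Message exchange through well-known entry points would work analogously, with the node relaying a typed message instead of holding a variable.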
While communication is a mutual interaction, interference involves a passive behaviour on the part of one of the subjects.
4 Architectural Issues
4.1 Intermediate Nodes
Several active networks architectures are currently under development in industry and academia. Different guiding principles underlie these projects, and as a result some of them present diametrically opposed design characteristics. In this section we focus on the main characteristics of intermediate system execution environments. Following concepts derived from operating system design principles, the architecture of an intermediate system that provides processing capabilities for user code should be layered, to guarantee different levels of flexibility, security, performance, and usability. Only a few existing active network architectures have adopted such a criterion as their design principle. Most of them provide only an execution environment built on top of pre-existing architectures, and delegate security concerns to the language adopted for active code. Another main topic that differentiates existing architectures is the entity considered as the atomic object: either packets or streams can be adopted as the individual targets of the actions of active code. This corresponds to two different interpretations of the active networking concept:
– the first interpretation reconsiders the network protocol concept by extending the control information contained in the packets with small pieces of code. The packet code is executed at each node and processes the packet data along the path towards the destination;
– the second approach treats the intermediate systems in the same way as end-systems. Standard functions, or code previously installed at intermediate systems, constitute pipelines for data flows.
To take advantage of both sets of capabilities, some architectures adopt both approaches.
SwitchWare [2], for instance, provides active packets containing mobile programs and, at the same time, active extensions providing services on the network elements, which can be dynamically loaded. Many problems in the design of an active intermediate system remain unresolved. Nevertheless, most of them are similar to the problems of designing a multitasking operating system. Active technology transforms the intermediate systems from special-purpose devices into shared general-purpose computing engines. This evolution concentrates specific problems of the network and operating system fields into a more complex situation to be faced. Problems such as active program naming [3], active node resource management, protection of active applications, and system integrity are common subjects of both research fields. Furthermore, the introduction of interaction capabilities between active applications entails the classical problems of inter-process communication: a language capable of providing the communication primitives must be adopted, and the system has to provide the necessary abstractions to support them. The programming language adopted for the active code is another fundamental aspect of an active network architecture.
Its characteristics are, in some respects, complementary to those of the active node operating system, because the language compensates for missing operating system capabilities. These can be summarised as strong type checking and the ability to statically verify programs before the capsules are injected into the network.
4.2 Security
Security problems are closely related to interaction activities. Intrusion into a private data flow is an undesired form of interference. An active network architecture allowing for application interactions must provide strong forms of protection to guarantee correct and secure execution. Current active networks architectures propose different and sometimes complex solutions to the security problems. Some of them adopt traditional authentication methods to securely identify the packets or data flows that are allowed to perform safe operations on the intermediate nodes. Among the existing architectures that significantly take the security problems into account, the Secure Active Network Environment [3] presents the most effective solutions. More than the active network architecture, such an environment (or a secure active network) should provide strict rules limiting the capabilities of an application to interact with another. As a consequence, safe control of interactions imposes some differentiation as to who can act on whom. To this end, it is useful to draw a distinction between the two different interaction schemes proposed in this paper. The communication scheme resembles, in some respects, a message-passing operating system architecture, which has been proposed as a useful model for network and distributed operating systems, one able to encourage distribution and, to some extent, security [11]. The different interference levels require a policy that allows one stream to disclose itself to another. A potential solution is supplied by the definition of different levels of protection for an active application. An active application may allow reading, writing, and executing of its own capsules to nobody (no interference), only to itself (intra-application interference), or to some authenticated applications, or it can disable any form of protection (inter-application interference).
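The protection levels just described are reminiscent of Unix-style permission bits, and can be sketched that way. The class below is our own illustration of the idea (owner, authenticated applications, everybody else), not a mechanism from any cited architecture:

```java
// Hypothetical sketch: an application declares which read/write/execute
// rights on its capsules it grants to itself, to authenticated
// applications, and to everybody else.
public class CapsuleProtection {
    static final int READ = 4, WRITE = 2, EXEC = 1;

    final String owner;
    final int ownerBits, authenticatedBits, worldBits;

    CapsuleProtection(String owner, int ownerBits, int authenticatedBits, int worldBits) {
        this.owner = owner;
        this.ownerBits = ownerBits;
        this.authenticatedBits = authenticatedBits;
        this.worldBits = worldBits;
    }

    /** Does 'requester' hold the given right on the owner's capsules? */
    boolean allows(String requester, boolean authenticated, int bit) {
        if (requester.equals(owner)) return (ownerBits & bit) != 0;
        if (authenticated)           return (authenticatedBits & bit) != 0;
        return (worldBits & bit) != 0;
    }

    public static void main(String[] args) {
        // Intra-application interference: full rights for the owner only.
        CapsuleProtection p = new CapsuleProtection("appA", 7, 0, 0);
        System.out.println(p.allows("appA", false, READ)); // owner may read
        System.out.println(p.allows("appB", true, READ));  // others may not
    }
}
```

No interference corresponds to zero bits for everyone but the owner; inter-application interference to granting bits to authenticated applications or to the world.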
Authentication protocols based on public and private keys, together with digital signature algorithms, can be used to guarantee the security and coherence of communications. The impartiality of intermediate node operating systems guards against unrecoverable actions such as the complete malicious discarding of packets. A rigid per-packet or per-flow authentication, however, raises serious scalability problems. Consequently, the following different aspects of the security problem should be taken into account.
Active Node Security - This aspect regards the protection of an intermediate node from dangerous external actions. The means available to address these concerns are:
1. the adoption of a programming language with reduced capabilities: user code cannot directly access node resources;
2. the layered organisation of the intermediate node operating system, which is a precondition for its integrity, or an isolated user code execution environment like the JVM sandbox in [16].
Active Network Security - In this case, security problems concern the network as a whole. If an active code could generate and forward copies of a packet without any limitation, the network would be flooded in a short time. Such a denial of service could be avoided by introducing a mechanism like the TTL of standard IP packets. When a packet is duplicated, its copies share the original amount of TTL. On the one hand such a mechanism protects the network from flooding; on the other hand it constrains the capabilities of active networks applications.
Application Security - Active network applications must be protected against undesired interference performed unintentionally or maliciously by other applications. To this end, active networks architectures should adopt more restrictive security mechanisms such as: a) a naming strategy (for the routines installed in an active node); b) authentication and authorisation on a stream and packet basis.
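The TTL-sharing rule above (copies dividing the original TTL among themselves, so the total budget in the network never grows) can be illustrated with a small helper. This is our own sketch of the arithmetic, not code from any cited system:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of TTL sharing: when active code duplicates a packet into n copies,
// the copies divide the remaining TTL, keeping the total constant.
public class TtlBudget {
    static List<Integer> duplicate(int ttl, int copies) {
        List<Integer> out = new ArrayList<>();
        int base = ttl / copies, rem = ttl % copies;
        for (int i = 0; i < copies; i++) {
            out.add(base + (i < rem ? 1 : 0)); // spread the remainder fairly
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(duplicate(10, 3)); // e.g. [4, 3, 3] — sums to 10
    }
}
```

Since the shares always sum to the original TTL, no sequence of duplications can increase the number of hops the packet family may still consume.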
5 The "Counter" Application
In this section, we briefly describe an application designed to solve the problem of counting the number of intermediate nodes present in a large network. If we consider the Internet, the problem of determining how many routers are actually present does not lend itself to a simple answer: Internet addresses cannot help solve this problem, because nothing in the address structure allows an intermediate router to be distinguished from an end-system. This information could probably be obtained by means of a hypothetical agent located on a node of the network which recursively interrogates all adjacencies, detecting the Internet routers to be marked, explored, and counted. The active network philosophy may well offer more advantageous tools with which to address this problem. Nomadic agents could partition and explore the network more efficiently than a fixed, static agent. The idea is that explorer agents are capable of duplicating themselves whenever a switching point is encountered. The creation of such an application raises a problem due to the growth of the number of capsules. Safety and security considerations for the whole network require that all the capsules injected into the network have a limited TTL, in order to avoid uncontrolled flooding. The solution described here adopts three different types of active agents: Source, Base, and Scout:
– the Source agent generates the whole application. It is in charge of creating the initial Base agents, collecting the intermediate results, and co-ordinating the node marking and cleaning process;
– the Base agents are responsible for the local actions carried out by the Scout agents. They act as local collectors and have some control over Scout actions;
– finally, the Scout agents, which are light voyager capsules, discover, count, and mark the routers in the neighbourhood of the generating Base.
Once a Base agent has been injected into an active node, it sends a Scout agent to each adjacent node whose distance is greater than the current one, distributing its amount of TTL among them. It assigns to the Scouts a maximum exploration distance. It then waits to collect the partial counters obtained by the Scout agents. When a Base agent has obtained the partial results from all of its Scouts, it sends the collected value to the Source. The application must keep track of which nodes have been visited and which have not. To this end, the visited nodes are marked, which necessitates a subsequent cleaning phase. Each Scout agent injected into a node takes the following actions: if the node has not already been visited, it marks the node as counted, decreases its exploration distance value, and generates as many capsules as there are adjacent nodes at a greater distance from the Source. When the Scout has reached a limit node, it installs a new Base agent for the following expansion step, sending the address of this border node to the Source. If the node has already been visited, the Scout returns to the parent node, adds its own counter to the parent node's counter, and decreases the number of open paths. The TTL value determines the size of the explored area. This area, explored by the Scout agents generated by a single Base, is called a Zone. The above application, of which we are building a first implementation, uses intra-application communication. Scout agents send the collected results to the Base agents, which in turn communicate with the Source. The same application can also be partitioned in a spatial way.
Different Autonomous Systems can agree on the possibility of separately enumerating their own intermediate systems. Multiple instances of the same application can exchange their final results to collect the global result. This last case represents an inter-application communication example.
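Collapsed into a single process, the exploration logic amounts to a marked breadth-first expansion. The sketch below is our own simplification: it ignores TTL budgets, Zones, distance limits, and the cleaning phase, and just counts the nodes reachable from the Source over a given adjacency map:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified, single-process sketch of the Counter application: "scouts"
// expand from the source, marking visited nodes so each is counted once.
public class Counter {
    static int countNodes(Map<Integer, List<Integer>> adjacency, int source) {
        Set<Integer> marked = new HashSet<>();      // visited-node marks
        Deque<Integer> frontier = new ArrayDeque<>();
        frontier.add(source);
        marked.add(source);
        while (!frontier.isEmpty()) {
            int node = frontier.poll();
            for (int next : adjacency.getOrDefault(node, List.of())) {
                if (marked.add(next)) frontier.add(next); // spawn a "scout"
            }
        }
        return marked.size(); // partial counters summed at the Source
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> g = new HashMap<>();
        g.put(0, List.of(1, 2));
        g.put(1, List.of(0));
        g.put(2, List.of(0, 3));
        g.put(3, List.of(2));
        System.out.println(countNodes(g, 0)); // 4
    }
}
```

In the distributed version the marks live on the nodes themselves, the frontier is carried by Scout capsules, and Zone-by-Zone expansion via Base agents keeps each capsule's TTL small.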
6 Conclusions
Active networks move the control of some network functions to applications (the end-to-end argument) and at the same time allow for the execution of some application components in the network. Factors such as performance degradation and security may represent potential problems. The success of such a paradigm may depend on the fast diffusion of new network services and protocols, and on the development of new applications. We have analysed some characteristics of active networks that can produce a more efficient implementation of traditional applications. We feel that an important way to exploit active network features will be to write applications able to interact with each other. We discussed two forms of interaction: interference and communication. The former is a powerful tool, which requires ad-hoc security mechanisms. The latter does not impose strong security controls, although no applications employing such a concept have yet been proposed. Finally, the active networks paradigm
drives the static concept of a network protocol towards a network operating system, capable of guaranteeing efficient basic connectivity and adequate security levels, and of making the different network resources available to the applications.
References
1. Alexander, D.S., Arbaugh, W.A., Keromytis, A.D., Smith, J.M.: Safety and Security of Programmable Network Infrastructures. IEEE Communications Magazine, vol. 36, n. 10, October 1998, 84-92
2. Alexander, D.S., Arbaugh, W.A., Hicks, M.W., Kakkar, P., Keromytis, A.D., Moore, J.T., Gunter, C.A., Nettles, S.M., Smith, J.M.: The SwitchWare Active Network Architecture. IEEE Network Special Issue on Active and Controllable Networks, vol. 12, n. 3, May-June 1998, 29-36
3. Alexander, D.S., Arbaugh, W.A., Keromytis, A.D., Smith, J.M.: A Secure Active Network Architecture: Realization in SwitchWare. IEEE Network Special Issue on Active and Controllable Networks, vol. 12, n. 3, May-June 1998, 37-45
4. Baldi, M., Picco, G., Risso, F.: Designing a Videoconference System for Active Networks. Proceedings of the 2nd International Workshop on Mobile Agents, Stuttgart, September 1998
5. Banchs, A., Effelsberg, W., Tschudin, C., Turau, V.: Multicasting Multimedia Streams with Active Networks. Technical Report TR-97-050, International Computer Science Institute, Berkeley, CA
6. Bhattacharjee, S., Calvert, K.L., Zegura, E.W.: Active Networking and End-to-End Arguments. IEEE Network Special Issue on Active and Controllable Networks, vol. 12, n. 3, May-June 1998
7. Bhattacharjee, S., Calvert, K.L., Zegura, E.W.: On Active Networking and Congestion. Technical Report GIT-CC-96-02, College of Computing, Georgia Tech
8. Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., Weiss, W.: An Architecture for Differentiated Services. Internet RFC 2475, December 1998
9. Calvert, K.L., Bhattacharjee, S., Zegura, E.W., Sterbenz, J.: Directions in Active Networks. IEEE Communications Magazine, vol. 36, n. 10, October 1998, 72-78
10. Li-wei, H.L., Garland, S.J., Tennenhouse, D.L.: Active Reliable Multicast. IEEE INFOCOM'98, San Francisco, USA, 1998
11. Nutt, G.J.: Centralized and Distributed Operating Systems. Prentice Hall International, 1992
12. Tennenhouse, D.L., Smith, J.M., Sincoskie, W.D., Wetherall, D.J., Minden, G.J.: A Survey of Active Network Research. IEEE Communications Magazine, vol. 35, n. 1, January 1997, 80-86
13. Tennenhouse, D.L., Wetherall, D.J.: Towards an Active Network Architecture. Computer Communication Review, vol. 26, n. 2, April 1996
14. Van, V.: A Defense Against Address Spoofing Using Active Networks. MIT Master's thesis, May 1997
15. Wetherall, D.J., Legedza, U., Guttag, J.: Introducing New Internet Services: Why and How. IEEE Network Magazine Special Issue on Active and Programmable Networks, vol. 12, n. 3, May-June 1998
16. Wetherall, D.J., Guttag, J., Tennenhouse, D.L.: ANTS: A Toolkit for Building and Dynamically Deploying Network Protocols. IEEE OPENARCH'98, San Francisco, CA, April 1998
The Grasshopper Mobile Agent Platform Enabling Short-term Active Broadband Intelligent Network Implementation
C. Bäumer and T. Magedanz
IKV++ GmbH, Kurfürstendamm 173-174, D-10707 Berlin, Germany
{baeumer,magedanz}@ikv.de
Abstract. The emerging notion of Active Networks describes the general vision of communication network evolution. In this context, mobile agent (MA) technology and programmable switches are considered enabling technologies. This paper presents an overview of the Grasshopper agent platform, developed by GMD FOKUS and IKV++ GmbH, which provides a powerful middleware for the implementation of MA-based telecommunication services, including active network applications. Apart from a general overview of the Grasshopper platform, we briefly illustrate the usage of this platform for the short-term implementation of an Active Broadband Intelligent Network environment, which is currently being realised within the ACTS project MARINE (Mobile Agent-based Intelligent Network Environment).
1 Introduction
The emerging notion of "Active Networks" describes the general vision of communication network evolution, where the network nodes become active because they take part in the computation of applications and the provision of customised services. The basic idea of such Active Networks [1,2] is the movement of service code, which has traditionally been placed outside the transport network, directly to the network's switching nodes. Furthermore, this movement of service code should be possible in a highly dynamic manner. This allows the automated, flexible, and customised provision of services in a highly distributed way, thereby enabling better service performance and optimised control and management of transport capabilities. Taking into account the current research related to the developments toward open, active, and programmable networks within both the telecommunications and Internet communities, it becomes obvious that two enabling technologies are key in both worlds: programmable switches, which provide flexibility in the design of connectivity control applications [3], and mobile code systems/mobile agent platforms, which allow the dynamic downloading and movement of service code to specific network nodes [4]. In contrast to the integrated approach of active network research, where the transmitted packets (so-called capsules) contain, besides the data, program fragments responsible for processing the data at the switches, this paper concentrates on the discrete approach to active networking, where service deployment is performed separately (i.e., out of band) from service processing. Based on previous research activities and related publications [5][6], this paper introduces the mobile agent platform Grasshopper and illustrates its application to the implementation of an active Broadband Intelligent Network (B-IN) environment. This active B-IN environment enables the dynamic deployment and distribution of MA-based services onto enhanced broadband switching equipment and service nodes. These service agents can be provided in a time-dependent manner, i.e., installed for a limited time, and in a location-dependent manner, i.e., installed on dedicated switches or even on specific end user systems, thereby reducing the load on service control and service management systems and on the corresponding signalling and data networks. In the following section, a detailed overview of the capabilities of the Grasshopper agent platform is given. In Section 3, we illustrate how Grasshopper is used for realising an active B-IN environment, which is currently being implemented within the ACTS MARINE project. Section 4 concludes this paper.
Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 109-117, 1999. © Springer-Verlag Berlin Heidelberg 1999
2 Grasshopper – The Agent Platform
Grasshopper [7], which has been developed by GMD FOKUS and IKV++ GmbH, is a mobile agent development and runtime platform built on top of a distributed processing environment. It is written in Java (based on Java JDK 1.1), thereby achieving an integration of the traditional client/server paradigm and mobile agent technology [8]. Grasshopper conforms to the first agent standard¹ and can be used in various application domains, such as telecommunications, active networking, and electronic commerce. The first Grasshopper version was released in summer 1998. Since February 1999, Grasshopper Release 1.2 has been available.

Fig. 1. Grasshopper Distributed Agent Environment
¹ The first mobile agent standard was defined in 1998 by the Object Management Group (OMG) and is known as the Mobile Agent System Interoperability Facility (MASIF) [9].
In principle, Grasshopper realises a Distributed Agent Environment (DAE). The DAE is composed of regions, places, agencies, and different types of agents. Fig. 1 depicts an abstract view of these entities. Two types of agents are distinguished in Grasshopper: mobile agents and stationary agents. The runtime environment for both mobile and stationary agents is an agency: on each host at least one agency has to run to support the execution of agents. A Grasshopper agency consists of two parts: the core agency and one or more places. Core agencies represent the minimal functionality required by an agency in order to support the execution of agents. The following services are provided by a Grasshopper core agency:
• Communication Service: This service is responsible for all remote interactions that take place between the distributed components of Grasshopper, such as location-transparent inter-agent communication, agent transport, and the localisation of agents by means of the region registry. All interactions can be performed via CORBA IIOP, Java RMI, or plain socket connections. Optionally, RMI and plain socket connections can be protected by means of the Secure Socket Layer (SSL), which is the de-facto standard Internet security protocol. The communication service supports synchronous and asynchronous communication, multicast communication, as well as dynamic method invocation. As an alternative to the communication service, Grasshopper can use its OMG MASIF-compliant CORBA interfaces for remote interactions. For this purpose, each agency provides the interface MAFAgentSystem, and the region registries provide the interface MAFFinder [9].
• Registration Service: Each agency must be able to know about all agents and places currently hosted, on the one hand for external management purposes and on the other hand in order to deliver information about registered entities to hosted agents.
Furthermore, the registration service of each agency is connected to the region registry, which maintains information about agents, agencies, and places in the scope of a whole region.
• Management Service: The management services allow the monitoring and control of agents and places of an agency by (human) users. It is possible, among other things, to create, remove, suspend, and resume agents, services, and places, to get information about specific agents and services, to list all agents residing in a specific place, and to list all places of an agency.
• Security Service: Grasshopper supports two security mechanisms: external and internal security.
• External security protects remote interactions between the distributed Grasshopper components, i.e. between agencies and region registries. For this purpose, X.509 certificates and the Secure Socket Layer (SSL) are used. SSL is an industry standard protocol that makes substantial use of both symmetric and asymmetric cryptography. By using SSL, confidentiality, data integrity, and mutual authentication of both communication partners can be achieved.
C. Bäumer and T. Magedanz
• Internal security protects agency resources from unauthorised access by agents. In addition, it is used to protect agents from each other. This is achieved by authenticating and authorising the user on whose behalf an agent is executed. Based on the authentication/authorisation results, access control policies are activated. The internal security capabilities of Grasshopper are mainly based on the JDK security mechanisms.
• Persistence Service: The Grasshopper persistence service enables the storage of agents and places (i.e. the internal information maintained inside these components) on a persistent medium. In this way, it is possible to recover agents or places when needed, e.g. when an agency is restarted after a system crash.
A place provides a logical grouping of functionality inside an agency. The region concept facilitates the management of the distributed components (agencies, places, and agents) in the Grasshopper environment. Agencies as well as their places can be associated with a specific region by registering them within the accompanying region registry. The region registry also automatically registers all agents that are currently hosted by those agencies. If an agent moves to another location, the corresponding registry information is automatically updated.
The functionality of Grasshopper is provided on the one hand by the platform itself, i.e. by core agencies and region registries, and on the other hand by agents that are running within the agencies, in this way enhancing the platform's functionality. The following ways of accessing the Grasshopper functionality must be distinguished:
• Agents can access the functionality of the local agency, i.e. the agency in which they are currently running, by invoking the methods of their super classes Service, StationaryAgent, and MobileAgent, respectively.
These super classes are provided by the platform in order to build the bridge between individual agents and agencies; each agent has to be derived from either StationaryAgent or MobileAgent.
• Agents as well as other DAE or non-DAE components, such as user applications, are able to access the functionality of remote agencies and region registries. For this purpose, each agency and region registry offers an external interface, which can be accessed via the Grasshopper communication service.
• Agencies and region registries may optionally be accessed by means of the MASIF-compliant interfaces MAFAgentSystem and MAFFinder.
In the context of Grasshopper, each agent is regarded as a service, i.e. as a software component that offers functionality to other entities within the DAE. Each agent/service can be subdivided into a common and an individual part. The common (or core) part is represented by classes that are part of the Grasshopper platform, namely the classes Service, MobileAgent, and StationaryAgent, whereas the individual part has to be implemented by the agent programmer.
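The common/individual split and the automatic registry update on migration can be illustrated with a small sketch. Note that the class and method names below (move, live, the registry map) are stand-ins invented for illustration; the real Grasshopper API may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: stand-in classes mimicking the split described in
// the text (a common "core" part supplied by the platform, an individual part
// written by the agent programmer) and the automatic region-registry update
// on migration. Not the real Grasshopper API.
public class AgentSketch {
    // Toy stand-in for the region registry: agent name -> current location.
    static final Map<String, String> REGION_REGISTRY = new HashMap<>();

    // Common part: platform-provided base classes.
    static abstract class Service {
        String getName() { return getClass().getSimpleName(); }
    }
    static abstract class MobileAgent extends Service {
        private String location = "agency://home/place0";
        MobileAgent() { REGION_REGISTRY.put(getName(), location); }
        String getLocation() { return location; }
        // A real move() would serialise the agent and ship it to a remote
        // agency; here we only record the new location in the registry.
        void move(String destination) {
            location = destination;
            REGION_REGISTRY.put(getName(), location);
        }
        abstract void live();  // individual part: the agent's own task
    }

    // Individual part, implemented by the agent programmer.
    static class GreetingAgent extends MobileAgent {
        String result = "";
        @Override void live() {
            move("agency://remote/servicePlace");  // migrate, then work locally
            result = "hello from " + getLocation();
        }
    }

    public static String run() {
        GreetingAgent a = new GreetingAgent();
        a.live();
        return a.result + " / registry: " + REGION_REGISTRY.get("GreetingAgent");
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The point of the sketch is the division of labour: the base classes (the "common part") own location tracking and registry bookkeeping, so the agent programmer only supplies the task logic.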
The Grasshopper Mobile Agent Platform

3 Using Grasshopper for Implementing an Active B-IN

3.1 Intelligent Network Limitations and Evolution
Among the different solutions aimed at providing advanced telecommunication services, the Intelligent Network (IN) represents at the end of this century the most prominent architecture [10]. The reason for this is that the IN provides a uniform and extensible service platform which should enable the rapid introduction of customised telecommunication services across different bearer networks, such as PSTNs (Public Switched Telephone Networks), ISDNs (Integrated Services Digital Networks) and Broadband ISDN (B-ISDN). The main architectural principle of the IN is the separation of service switching and service control: a reduced number of centralised nodes (SCPs, Service Control Points) host service logic and data and control, via a dedicated IN Application Protocol (INAP) on top of the Signalling System No. 7 (SS7) network, a high number of distributed specialised switches (SSPs, Service Switching Points). In addition, special assistant devices (IPs, Intelligent Peripherals) provide additional capabilities for advanced user interactions, which for cost reasons cannot be accommodated in all switches. IN services are deployed and managed via a Service Management System (SMS), which obtains the services from a Service Creation Environment (SCE). Users gain access to the services via their terminal equipment. For more details see [10]. The increasing competition between network operators requires fast responses to users' needs. Service deployment times represent a key success factor for operators. In order to accomplish this objective, revisions of equipment available in the network, essentially concentrated on software changes, become a crucial requirement for the success of a technical solution. The quite centralised approach adopted in the IN architecture is a consequence of this requirement, since services have to be introduced only in the centralised control nodes and not in all switches.
However, in the face of an increasing number of IN services, the centralised service control nodes (and the signalling network) become performance bottlenecks. Since IN evolution also takes into account recent progress in the IT domain, new opportunities exist to tackle these threats. The availability of new software technologies such as DOT (Distributed Object Technology) and MAT (Mobile Agent Technology) makes it possible to evolve the current IN architecture towards systems where intelligence can be distributed where and when needed, while maintaining compatibility with current centralised architectures. This approach allows the design of dynamic / active IN architectures, where enhanced switching systems can take over the most adequate role from time to time, depending on the software capabilities they host in a given time frame. In the following we briefly describe the approach taken by the ACTS project MARINE (Mobile Agent Environments in Intelligent Networks) [11], which uses the Grasshopper platform for implementing an active B-IN.
3.2 The MARINE Project Implementing an Active B-IN
In the scope of the MARINE project, advanced broadband services, such as broadband video telephony and video on demand, are realised in an active B-IN environment. MARINE adopts an evolutionary approach for the IN, taking into account an interworking between the new Grasshopper-based service environment, comprising multiple switching nodes enabled to run services locally (i.e., broadband Service Switching and Control Points, B-SSCPs) and central service execution nodes (i.e., broadband SCPs, B-SCPs), and the traditional broadband IN environment based on B-SSPs and B-SCPs. This means that it should be possible for a new open switch to access transparently remote B-SCP based services if necessary. On the other hand, a traditional B-SSP should be able to access new MA-based services in a centralised service execution node. The overall scenario is depicted in Fig 2.

Fig. 2. The MARINE Reference Architecture (SMS/SCE, a B-SCP and a service node hosting agencies with service logic and data, a B-IP, B-SSPs and open switches, interconnected via B-INAP over SS7 and via IIOP)
The MARINE DAE is structured as follows. By means of the region concept, agencies belonging to a single network or service provider are grouped; i.e., each network operator has its own region. The place concept is used to separate IN related capabilities inside a single agency. For example, an agency within the B-SSCP hosts a place for SSF related agents as well as a co-located SCF/SDF place for the service agents (see also Fig 3). (Note that in this context we adopt the notion of IN functional entities, i.e., Service Switching Function (SSF), Service Control Function (SCF), Service Data Function (SDF), and Specialised Resource Function (SRF).) Furthermore, two kinds of agents are required in addition to the core agency services: Firstly, specific service agents related to the multimedia service environment/infrastructure (e.g., basic bearer connectivity, special resource access, interworking/gateway services, etc.) have to be provided, which are not mobile, since they are related to specific locations (e.g., a switch, special resources, etc.). Secondly, the
mobile agents implementing the actual B-IN service logic have to be provided. Looking briefly at the first class of environment agents, the B-SSP is enhanced by a Grasshopper agency, which provides at least two places, namely:
• a B-SSF place which provides a connectivity service to the broadband Switching State Manager (B-SSM) and a service trigger agent responsible for dispatching service requests to (local or remote) service agents, and
• a B-SCDF (SCF/SDF) place hosting all the local service agents and a housekeeper agent.

Fig. 3. Structure of the MARINE DAE (the provider system with its SMS place, the open switch (enhanced B-SS&CP) with B-SSF, B-SCDF and B-SRF places, and the central service node (enhanced B-SCP) with B-SCDF and B-SRF places, interconnected via an ORB-based communication channel and B-INAP)
Optionally, a B-SRF place provides service capabilities for the access to dedicated special resources, such as speech synthesis, video codecs, etc. Thus, this place hosts service agents providing customised announcements, video previews, etc. As there is a need for centralised service provision (e.g., for mobility services), the B-SCP will be enhanced by the aforementioned B-SCDF (SCF/SDF) place, featuring besides inter-ORB communication also a B-INAP stack and a corresponding B-INAP/CORBA gateway for allowing access from traditional B-SSPs. Furthermore, the enhanced B-SCP may accommodate an SRF place. Finally, an SMS place exists in the provider system, which serves as a service agent repository and in addition provides appropriate management services for the MARINE environment in order to control and monitor the mobile service agents. Service agents always start their itinerary from this agency. This itinerary may be preconfigured, based on the service type and service user locality. Fig 3 depicts the MARINE Reference Configuration in more detail. Via the B-SSM, the B-SS&CP is able to invoke the CORBA-based local and remote service agents (indicated as A1 - determination of local or remote service agents, A2 -
invocation of a local service agent, and A3 - invocation of a remote service agent), as well as traditional B-INAP-based services (B1 in Fig 3). Note that this B-INAP request is sent to a traditional B-SCP. As also indicated in Fig 3, the enhanced B-SCP may be accessed via CORBA and agent communication mechanisms as well as through B-INAP requests coming from traditional B-SSPs (B2). For a more detailed description of MARINE, readers are referred to [12].
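The preconfigured itineraries mentioned above, chosen from the service type and the service user's locality, can be sketched as a simple lookup table. All names and table entries below are invented for illustration and are not part of MARINE.

```java
import java.util.List;
import java.util.Map;

// Sketch of preconfigured itineraries: the SMS place picks a route for a
// mobile service agent from (service type, user locality). The table
// contents are hypothetical; a real system would manage this dynamically.
public class ItineraryPlanner {
    private static final Map<String, List<String>> ITINERARIES = Map.of(
        "videoTelephony:north", List.of("SMS", "openSwitchN/B-SCDF-Place"),
        "videoOnDemand:south",  List.of("SMS", "centralNode/B-SCDF-Place",
                                        "openSwitchS/B-SRF-Place"));

    // Agents always start at the SMS place; unknown combinations stay there.
    public static List<String> plan(String serviceType, String userLocality) {
        return ITINERARIES.getOrDefault(serviceType + ":" + userLocality,
                                        List.of("SMS"));
    }

    public static void main(String[] args) {
        System.out.println(plan("videoTelephony", "north"));
    }
}
```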
4 Conclusions
This paper has described the Grasshopper agent platform as an enabling technology for future active network and open service environments. Grasshopper provides a high degree of flexibility for software developers to realise their ideas. It is used today in several European projects. Most of these projects belong to CLIMATE [13], the Cluster for Intelligent Mobile Agents in Telecommunication Environments, which is part of the European ACTS Programme. The MARINE project can be considered a first important step towards an active telecommunications network implementation. In the coming months, Grasshopper will be tuned to meet the strong performance and real-time requirements of active networks, enabling also capsule implementations.
5 References
1. D. L. Tennenhouse, et al.: 'A Survey of Active Network Research', IEEE Communications Magazine, Vol. 35, No. 1, January 1997, pp. 80-85
2. MIT Active Networks homepage: http://www.tns.lcs.mit.edu/activeware
3. P1520 - Proposed IEEE Standard for Application Programming Interfaces for Networks: http://www.iss.nus.sg/IEEEPIN/
4. M. K. Perdikeas, F. G. Chatzipapadopoulos, I. S. Venieris, G. Marino: 'Mobile Agent Standards and Available Platforms', Computer Networks Journal, Special Issue on 'Mobile Agents in Intelligent Networks and Mobile Communication Systems', Elsevier, Netherlands, Vol. 31, Issue 10 (1999)
5. T. Magedanz, R. Popescu-Zeletin: 'Towards Intelligence on Demand - On the Impacts of Intelligent Agents on IN', Proceedings of the 4th International Conference on Intelligent Networks (ICIN), Bordeaux, France, December (1996) 30-35
6. M. Breugst, T. Magedanz: 'Mobile Agents - Enabling Technology for Active Intelligent Networks', IEEE Network Magazine, Vol. 12, No. 3, Special Issue on Active and Programmable Networks (1998) 53-60
7. IKV++ GmbH - Grasshopper homepage: http://www.ikv.de/products/grasshopper
8. M. Breugst et al.: 'On the Usage of Standard Mobile Agent Platforms in Telecommunication Environments', in: LNCS 1430, Intelligence in Services and Networks: Technology for Ubiquitous Telecom Services, S. Trigila et al. (Eds.), ISBN 3-540-64598-5, Springer-Verlag (1998) 275-286
9. OMG: Common Facilities RFP3, Request for Proposal, OMG TC Document 95-11-3, Nov. 1995, http://www.omg.org/; the MASIF specification is available at http://ftp.omg.org/pub/docs/orbos/97-10-05.pdf
10. T. Magedanz, R. Popescu-Zeletin: 'Intelligent Networks - Basic Technology, Standards and Evolution', International Thomson Computer Press, ISBN 1-85032-293-7, London (1996)
11. MARINE project homepage: http://www.italtel.it/drsc/marine/marine.htm
12. L. Faglia, T. Magedanz, A. Papadakis: 'Introduction of DOT/MAT into a Broadband IN Architecture for Flexible Service Provision', in: H. Zuidweg et al. (Eds.), IS&N 99, LNCS 1597, ISBN 3-540-65895-5, Springer-Verlag (1999) 469-481
13. CLIMATE homepage: http://www.fokus.gmd.de/research/cc/ima/climate/climate.html
LARA: A Prototype System for Supporting High Performance Active Networking

R. Cardoe, J. Finney, A.C. Scott, and W.D. Shepherd
Distributed Multimedia Research Group, Computing Department, Lancaster University, Lancaster, UK
{cardoe,joe,acs,doug}@comp.lancs.ac.uk

Abstract. There are a number of alternative directions in which active networking is progressing, each of which has its own advantages and disadvantages. This paper presents the Lancaster Active Router Architecture (LARA) as a means to integrate these distinct active network environments in a single system, to allow increased flexibility in network programming. There are a number of issues involved in accomplishing this, including scalability, resource management, performance and security. The remainder of this paper describes our prototype composite hardware/software solution, which takes the first steps towards tackling these issues.
1 Introduction
As the Internet and related internetworking architectures and protocols have started to mature in recent years, we have begun to gain a better understanding of some of the limits and problems that are still associated with these technologies. These issues range from technical problems, such as the real-time requirements driven by modern distributed multimedia applications and scalability on a global scale, to development limitations inherent in the lengthy standardisation process currently used to evolve new networking technologies. We have also seen an increasing research interest in mobile code and adaptive systems, which has enabled research into a new paradigm for digital communications - active networking. Active networking is a generic term that encompasses many different approaches to one essentially simple goal: that of leveraging some of those mobile code and adaptive systems concepts in order to bring vastly increased flexibility into our networked systems. Currently, network protocols are relatively static, and the behaviour of networks as they transport data from one host to another is hard to both predict and manipulate. Active networking aims to increase the flexibility of the network as a whole by allowing users or applications to 'program' the network, in much the same manner as end-systems are programmed. Active network technology allows the downloading of code into the routers, gateways, switches and bridges of the network in order to customise how it transports data.
Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 117-131, 1999. © Springer-Verlag Berlin Heidelberg 1999
This gives networks and
users the ability to respond quickly to changing requirements, and is well suited to highly dynamic environments such as ad-hoc networking, distributed group communication and wireless networks. Furthermore, it is foreseeable that active networking will help to speed up the process of developing and deploying new protocols and services, as they can be created, installed and tested on-the-fly in a controlled environment, rather than going through lengthy standardisation processes. Active networking is a field that is rapidly gaining strength, and there are research projects that focus upon almost every aspect of programmable networks. These range from application-level active networking, such as the proxylet [1] system developed at UTS, to link-layer experiments like the active bridging work [2] at UPenn. The scope of research in this area includes formal methods, security and safety, new languages and APIs for programming networks, in addition to novel architectures, applications and management tools and techniques. While there are many facets to the active networking field, the research documented in this paper is focused upon providing a composite hardware and software system capable of supporting high performance active networking, particularly at the periphery of internetworks. Specifically, a network node that can cope with the increased demands for resources that are required to perform active processing, routing and forwarding, whilst not compromising non-active routing functions and security. The remainder of this paper is structured as follows: Section 2 reviews some of the related work carried out in this field and the approaches that are being taken by other research groups. This is followed in Section 3 by a discussion of the requirements of active networking and those that have been focused upon during our investigations. Section 4 presents the architecture that comprises our prototype, both in hardware and software.
Section 5 describes the realisation of our prototype and finally, sections 6 and 7 present some preliminary results, conclusions and future work.
2 Related Work
In recent months, there has been an explosion of interest in active and programmable networks, and some of the “early bird” projects have started to come to fruition. Some projects have attempted to create general-purpose active network environments, in order to gain some understanding of the issues that are involved in this area. Examples of such environments include ANTS [3] at MIT and SwitchWare [4] at University of Pennsylvania. Both the ANTS and SwitchWare projects take a similar view to the method by which Active Network (AN) code is developed and deployed. Firstly, their adoption of high level languages such as Java and OCaml to carry active network code allows the development of a wide range of AN applications. This has several advantages, including the availability of existing programming experience, development tools and code libraries. Secondly, AN code can be injected from end systems by network user applications on demand, rather than a more static model which may involve explicit code downloads or network administration. This leads to a highly flexible system, with the need for stringent security policies. SwitchWare draws greatly on the provability of its primary language (ML), and therefore places the emphasis of its security on language verification techniques. In addition, certificate-based
authentication and cryptography are also supported when greater levels of security are required. ANTS, however, focuses primarily on providing functionality, and relies on other security models, such as the Java security model, to implement security/authentication and management policy. In addition, there has been work at other institutions which focuses more tightly upon specific problem domains. The NetScript project [5] at Columbia University is targeted more specifically at management applications and at providing administrators with enhanced control of network resources. NetScript's computational model is based on 'dataflow', a scheme in which the flow of data through a network node triggers computation on its behalf. This makes the system particularly well suited to network-based applications whose traffic is typically asynchronous and unpredictable. NetScript has been used to create programmable SNMP agents, protocol analysers, signalling and routing protocols, and firewalls. There has also been interest in experimenting with non-IP based systems, such as ATM. 'The Tempest' [6] is a system developed at the University of Cambridge that brings active network concepts to a switch rather than a router. By partitioning an ATM switch into 'switchlets', it enables distinct control architectures on a per-port basis. The Tempest has demonstrated the viability of service-specific control, providing tailored support for a distributed video-conferencing service. Furthermore, application-specific control is possible, giving individual applications the ability to dynamically load control policies into a switchlet controller. The CANES project [7] at Georgia Tech is an approach to active networking in which users can select from a set of pre-defined functions to be applied to their packets as they transit a network node.
In addition, users can supply parameters to those functions in order to modify their behaviour, though they cannot supply their own functions and inject them into the network. Although this method is much more restrictive than others in terms of what can be computed inside the network, it brings advantages such as security and efficiency. Functions can be thoroughly tested and verified before being made available, and can be optimised for the specific hardware architecture upon which they execute. This work has been used to investigate new approaches to dealing with network congestion [8], arguably a good candidate for active network techniques, due to the nature of the problem. In addition to individual projects, there has been an acceptance in the active networks research community that interoperability is an important issue, not only in terms of enabling development, but also because it is becoming increasingly clear that no single architecture can solve all the problems and address every need. To this end, the Active Network Encapsulation Protocol (ANEP) [9] was developed; this protocol allows multiple AN environments to co-exist on a network node and allows that node to correctly demultiplex packets to them. Active network research has converged on two methods for deploying code inside the network; these are essentially in-band and out-of-band. The in-band system, as exemplified by the capsules approach developed at MIT, suffers from high bandwidth
utilisation and possible duplication of code. This in turn has led to other research efforts such as SmartPackets [10], developed by BBN Technologies - a system that allows the efficient encoding of non-trivial programs to fit into only 1 Kbyte of an IPv4 or IPv6 packet. This avoids the need for packet fragmentation, which would necessitate the use of a transport layer protocol, thus increasing complexity and the processing requirement on active nodes. Out-of-band code injection uses a separate signalling channel to construct the active environment through which packets are processed, and as such suffers a penalty in terms of connection setup time. In either case, there is a recognition of the need to minimise these overheads, and in some cases protocols such as IPv6 are starting to be used, rather than the more common UDP datagram, as the carrier for active code. The level at which network components can be customised by applications, end-users and system administrators in an active network system is commonly the network or IP layer. There are, however, exceptions that show alternative types of network programmability. In a project related to SwitchWare at UPenn, the feasibility of MAC-level active networking was demonstrated in an 'active bridging' system. The active bridge could be dynamically upgraded from a programmable buffered repeater into a learning bridge, and then to a bridge supporting multiple spanning tree algorithms. Application-level active networking has also been prototyped with the proxylet system developed at UTS. This allows applications to use HTTP to select code modules from a server and associate them with a packet stream in order to apply some application-specific processing during transmission. It can be seen that there are a wide variety of research issues and directions in the field of active networking.
The projects described above cover only a small proportion of those interests, but demonstrate the diversity of requirements that exist and need to be fulfilled, at least in part, to build a usable, secure, robust and managed active network.
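The ANEP-based demultiplexing mentioned earlier, by which a node hands packets to the right execution environment, can be sketched as follows. The header layout assumed here (one version byte, one flags byte, then a 16-bit big-endian Type ID) follows our reading of the ANEP proposal; treat the exact offsets, and all class names, as illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of ANEP-style demultiplexing: a node reads the Type ID from the
// packet header and dispatches to the matching execution environment (EE).
public class AnepDemux {
    public interface ExecutionEnvironment { String handle(byte[] packet); }

    private final Map<Integer, ExecutionEnvironment> eeByTypeId = new HashMap<>();

    public void register(int typeId, ExecutionEnvironment ee) {
        eeByTypeId.put(typeId, ee);
    }

    // Assumed layout: byte 0 = version, byte 1 = flags, bytes 2-3 = Type ID.
    static int typeId(byte[] pkt) {
        return ((pkt[2] & 0xFF) << 8) | (pkt[3] & 0xFF);
    }

    public String dispatch(byte[] pkt) {
        ExecutionEnvironment ee = eeByTypeId.get(typeId(pkt));
        // A real node might forward or drop unknown types per the flags field.
        if (ee == null) return "no EE for type " + typeId(pkt);
        return ee.handle(pkt);
    }

    public static void main(String[] args) {
        AnepDemux node = new AnepDemux();
        node.register(1, p -> "capsule EE got packet");
        byte[] pkt = {1, 0, 0, 1, 0, 2, 0, 8};  // version 1, Type ID 1
        System.out.println(node.dispatch(pkt));
    }
}
```

The design point is the one the text makes: multiple environments co-exist on one node, and the header's Type ID is the only coupling between them.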
3 Requirements
It is clear that active networks have many requirements and that these vary depending upon the situation or problem domain at hand. For example, an application that wishes to inform the network which packets to drop in case of congestion is very different from a user or network administrator who wants to change the transport protocol that handles his packets. Networks are used for many different purposes, however, and an active network needs to be able to respond to as many of these requirements as possible. Flexibility is one of the most central requirements, as it is one of the primary motivations behind the concept of active networking: to allow the maximum choice in terms of how the network can be tailored to suit the needs of a particular service or application. This approach can be seen in projects that attempt to create a general-purpose programming environment and concentrate upon functionality as one of their prime objectives. This is not to exclude those research initiatives that have concentrated upon a particular application area such as management and are
experimenting with architectures that best suit those needs. Indeed, it is envisioned that an active network will have multiple execution environments that will allow users or applications to program the network with a paradigm that best suits their needs. Our view is that flexibility must be a primary element of an active network and that the ability to install multiple execution environments is vital. As stated in [6], it is becoming increasingly clear that no one monolithic protocol or protocol stack can cope or deal effectively with the demand for the multi-service networks that are being built and needed. Security is named again and again as an important property of an active network, and the opportunity is available now to re-examine network security and build it into networks from the ground up. Several different approaches are being taken, such as language-based verification and type checking, authentication and cryptography, and formal methods. There is general agreement that security is an integral part of networking in today's world and that this need can only increase with the development of active networks. The complexity and power of these new technologies need to be controllable and predictable, and security needs to be enforceable. A range of authentication policies is needed that can be mapped to specific user, application and network needs. Currently, the level at which programmability of the network is available is limited, but increasingly we are seeing active network concepts being applied in areas other than the network layer. We see active networking in the wider context of programmable networks, where all the constituent elements of a network, from end-systems through network nodes, and at multiple levels, should be customisable: from MAC-layer ad-hoc networking protocols to application-level proxies. One conspicuous requirement of a network is performance, and this is equally true in an active network.
Current network resources are already being stretched by the phenomenal growth of the Internet, and adding computation overheads to network nodes can only exacerbate this. Currently, functionality is one of the primary aims of research, aimed at developing and proving the feasibility of active networking. At Lancaster we see performance as a critical element in ensuring not only that a network can cope with the extra demands being made of it, but also the general acceptance of this technology. An active network must not retard the performance of standard network operations, and it should be possible to construct active nodes that can still process standard traffic as well as provide enhanced services. The need for management and administration in active networking is clearly evident. Not only can active networking help to create new network management solutions, but the active network itself needs to be managed. The paradigm shift from passive to active networks brings a massive increase in the complexity of these systems, which would surely collapse without administration. This implies the need for strict system policies, which specify the resources and APIs to which an active network program is permitted access.
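The kind of per-program policy just described, a node-level table stating which resources and APIs each active program may touch, can be sketched as follows. The program names, resource names, and policy shape are invented for illustration; they are not part of LARA.

```java
import java.util.Map;
import java.util.Set;

// Sketch of a per-program resource policy on an active node: each active
// program is granted an explicit set of resources/APIs; everything else is
// denied by default. Names and granularity are hypothetical.
public class NodePolicy {
    private final Map<String, Set<String>> allowed = Map.of(
        "congestionFilter", Set.of("read-packet", "drop-packet"),
        "statsCollector",   Set.of("read-packet"));

    // Deny-by-default: unknown programs and ungranted resources fail the check.
    public boolean permits(String program, String resource) {
        return allowed.getOrDefault(program, Set.of()).contains(resource);
    }

    public static void main(String[] args) {
        NodePolicy p = new NodePolicy();
        System.out.println(p.permits("statsCollector", "drop-packet"));
    }
}
```

A deny-by-default table like this is one simple way to make the enforcement decision explicit at every API call, rather than trusting the program's own behaviour.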
It has become clear even from early research that no one architecture or methodology will be able to completely satisfy all the requirements of an active network. Therefore, interoperability is a necessary requirement in order to allow end-users and applications to communicate effectively and to select appropriate services relative to their needs. Some of the projects described above collaborate with each other under the auspices of a DARPA [11] funded scheme and have created with others an 'Architectural Framework for Active Networks' [12], which outlines some common assumptions and objectives. This reference also serves as a basis for interoperability of their systems and defines a common base from which individual research directions can move. A common architecture has been agreed upon that delineates between the network node with its operating system and the services this should provide to the multiple active network execution environments that sit above. The ANEP protocol described above is also used as a common multiplexing tool for allowing interoperability. It can be observed that the requirements of an active network are many and varied, and include flexibility, security, performance, management and administration, multiple levels of programmability, and interoperability. Currently, there is no single system that fulfils all of these objectives; instead, research is focused upon particular areas. Lancaster's approach to this field has been to try to address as many of these requirements as possible, but with particular emphasis on flexibility and performance.
4 Architecture
This section describes the approach the LARA architecture takes to supporting the diverse range of requirements imposed upon network nodes by AN Execution Environments (EEs) and their respective applications. As already noted, a composite hardware/software solution was adopted. This architecture splits neatly into four parts: Cerberus, a first prototype of the LARA concept; the LARA Platform Abstraction Layer (LARA/PAL); the LARA Management and Policy Database (LARA/MAN); and the LARA runtime EE (LARA/RT). See Fig. 3 at the end of this section.
4.1 Cerberus
Designed to be a high performance, inexpensive "edge router", i.e. a router whose primary role is serving moderate sized networks and intranets, Cerberus provides an inexpensive doorway into active network technology. It is based around a dual bus backplane and the Dedicated Processor (DP) architecture (Fig. 1), and as such bears many similarities to existing commercial router implementations. The choice of a bus backplane vastly reduces the cost of the device, and thus allows larger scale experimentation to be undertaken. However, this architecture will only scale to a moderate number of ports (approximately four or five 100 Mbps ports). We plan to implement an ATM based switched backplane for higher performance systems, and have been careful to consider this whilst designing the overall architecture.
LARA: A Prototype System for Supporting High Performance Active Networking
Fig. 1. Dedicated Processor Architecture
The DP architecture was chosen because it transfers much of the computationally expensive forwarding process onto individual CPUs. Cerberus has at least one processor per network port, and is therefore well suited to carrying out extensive processing on its forwarding processors. These processors are more general purpose and substantially more powerful than the CPUs typically found in Cerberus' commercial cousins; see Section 5 for a more detailed description of the Cerberus implementation. In addition to the forwarding processors, a single management processor is present, which handles the majority of the control of the machine, including AN code compilation, authentication, and policy management decisions. Larger scale prototypes of LARA are also planned, as explained in the future work section of this paper.
4.2 LARA/PAL
As mentioned in the requirements section of this paper, no one AN execution environment is likely to be able to tackle the complete AN problem domain. The approach of this architecture is therefore to provide an abstraction layer over the physical operating system of the node and to export a set of primitives with which any execution environment can interface. These primitives include CPU scheduling and multithreading support, main memory management, network bandwidth management, and the enforcement of policy decisions on behalf of the system administrator. LARA/PAL provides this abstraction. LARA/PAL maintains several data structures on behalf of EEs to improve the reliability, performance and portability of the system. The two most important structures are the Capsule Information Table (CIT) (Fig. 2) and the Packet Filter Table (PFT). The term "capsule" is used throughout the rest of this section to refer to an active network program in any execution environment. The CIT contains one entry for every capsule in the system, regardless of execution environment, and holds the low level context and policy information of that capsule. Whenever a capsule is instantiated, an entry is created in this table, and
R. Cardoe et al.
initialised from the policy provided by LARA/MAN. A unique key, the capsule ID, is also generated at creation time. This key is passed to the relevant EE for later reference, and is used whenever the capsule wishes to allocate dynamic memory or transmit data, to ensure the capsule remains safely within the limits specified by its policy.
Fig. 2. Capsule Information Table
CPU Protection
LARA/PAL uses a split-level scheduling technique to achieve fair allocation of CPU time to AN programs. The first level of scheduling is managed by the host operating system, and contains one task per loaded execution environment. These tasks are scheduled using one of a number of scheduling algorithms, configurable by the LARA/MAN subsystem, giving a certain amount of flexibility to increase and decrease the priority of EEs. To enable the scheduling of active code, a second level of scheduling is managed by the EE itself, according to the safety of the programs that can be executed within that environment. For example, SwitchWare programs need little in the way of scheduling, as the OCaml programs used in that environment are stringently verified prior to execution. Other EEs, however, such as ANTS and LARA/RT, rely on more complex pre-emptive scheduling techniques to ensure fair processor access and to serve the needs of multithreaded AN programs. To provide the second level schedulers with as much help as possible, a timer callback is provided. This callback is generated directly from a timer interrupt, provides scheduling granularity down to a few milliseconds, and can be used to detect programs violating the CPU usage limits dictated by their policies. To improve the robustness and portability of the system, the thread context (that is, the data structure that holds a thread's current state and allows that thread to be safely descheduled and rescheduled) is managed by LARA/PAL and held in the CIT.

Memory Protection
As some EEs allow the use of virtually any language, particularly languages that allow explicit pointers, memory management becomes vitally important in maintaining the integrity of the router. LARA/PAL therefore maintains the memory management information on a per thread basis, along with the thread context information mentioned earlier, using a similar technique to modern UNIX operating
systems. This information is also stored in the CIT. LARA/PAL relies on router hardware to detect and trap invalid memory accesses; these violations can then be delivered to the relevant EE for servicing. Dynamic memory allocation is also managed by LARA/PAL. This ensures that a single AN program cannot consume more memory than its policy defines. By maintaining state on all allocated memory on a per capsule basis, the amount of memory allocated can be strictly controlled, and reclaimed if a capsule is unexpectedly removed. This state is also held in the Capsule Information Table.

Network Bandwidth Management
In order to ensure a fair distribution of network bandwidth between AN capsules, we have adopted a per capsule queuing strategy. Placed between LARA/PAL and the network interfaces of the router, the queues enforce policy decisions concerning maximum bandwidth utilisation. Periodically, a packet scheduler services these queues and delivers their packets to serialised (per device) transmission queues for delivery. AN capsules may decide not to specify a device when transmitting a packet, in which case the packet traverses a default routing table, if available, prior to delivery. This system employs a similar approach to bandwidth scheduling as other schemes, such as RSVP, which employ per flow queuing.

Input Packet Filter
Many AN capsule programs will need to intercept, modify and collect information from packets flowing through the router. To achieve this efficiently, a capsule must be able to specify accurately a pattern that can be matched against incoming packets in order to trigger delivery to the correct capsule. This pattern is associated with the capsule ID in the Packet Filter Table. Several software packages are already available for this purpose, the most common of which is the BPF (Berkeley Packet Filter), which is already in use in many operating systems.
We have adopted BPF as the standard approach for specifying these filters. A capsule can specify any number of filters, in accordance with the limits defined by its policy. However, the correct operation of the system when multiple overlapping filters are specified by different capsules is still an unresolved issue.
4.3 LARA/MAN
Before an active network capsule can be loaded into an EE, it must be identified and authenticated to ensure the program comes from a trusted source. All code to be installed into a LARA node must carry an ANEP encapsulation header. This header is used not only to identify the EE into which the capsule is to be loaded, but also allows any extra options contained within the header to be parsed, providing scope for future expansion. There are two stages to the authentication process. The first stage is a simple admission control policy, allowing the use of simple "black card/white card"
filters, again based upon BPF filters. This allows AN network administrators to easily block security attacks from known malicious sources and to apply blanket filters that improve security, for example by rejecting all AN programs sent from outside the local domain. The second stage of authentication is EE dependent, and is likely to involve much more complex authentication mechanisms, such as public key encryption, Kerberos, or more proprietary solutions such as the SANE [13] mechanism. If a capsule passes these authentication tests, control passes to the policy database. Based on information contained within the ANEP header, LARA/MAN attempts to find a policy matching the capsule. If a policy cannot be found, the capsule is administratively rejected and dropped. If a policy is found, this policy information is passed to LARA/PAL, where it forms the basis of a CIT entry for the capsule. The capsule is then launched into the relevant EE for initialisation and execution.
4.4 LARA/RT
LARA/RT was designed as an execution environment primarily for testing and developing the LARA system; it uses as many LARA primitives as possible and is still very much work in progress. While a detailed description of this environment is beyond the scope of this document, this section gives a brief overview of the structure of the EE. LARA/RT assumes that all active code is dangerous and should not be trusted. This is reflected in the strong process and memory protection provided within the LARA/RT environment, which creates strict boundaries for an active code module. This type of partitioning, similar in nature to the sand-boxing technique used in programming languages like Java, removes any language specific ties while providing the same functionality. It also means that LARA/RT is language independent, as long as the code can be compiled down to a kernel loadable module. This is viewed as an important feature of the architecture, removing another constraint on the flexibility of the system. Capsules may be multithreaded; a strict pre-emptive scheduler ensures that CPU access is controlled, and violating capsules can be suspended or removed. The LARA/RT environment uses the router alert option available in IPv6 to get active code into the router. A packet containing active code arriving at the node flags to the router that it contains code rather than data to be forwarded, using one of the option fields in the hop-by-hop extension header of the protocol. This forces the router to examine the packet more closely; in our architecture, this means that the code is extracted from the payload of the packet and submitted for admission control checking and authentication. Once verified, the code can be compiled to a kernel module and inserted into the LARA/RT environment, where it is registered with LARA/RT.
Fig. 3. LARA Architecture
5
Realisation
Our approach to constructing a network node with the capabilities described above involves two elements: one hardware, the other software. This is motivated by our belief that such a device has specific hardware and software requirements if it is to handle competently the increased resource demands of active networking. This section presents the implementation of that architecture, examining first the hardware platform, Cerberus, followed by the software platform, LARA. Cerberus comprises two types of component: an active router controller and one or more forwarding engines (Fig. 1). The controller module consists of a 350 MHz Pentium II processor coupled with 128 MB of main memory, a hard disk drive and a 100 Mbps Ethernet hub. A forwarding engine comprises two ports, currently 100BaseT Ethernet, each coupled to its own dedicated 266 MHz Pentium II with 128 MB RAM. There are two busses in the router: a 200 Mbps control bus that interconnects the controller and each of the ports, and a high bandwidth data bus that links the network ports. The data bus is based on SCSI hardware, running modified device drivers that permit speeds of over 0.5 Gbps. The current prototype contains an active router controller and one forwarding engine with two Ethernet ports (Fig. 4).
Fig. 4. Lancaster Active Router Prototype
The system is modular in nature, has been designed to be dynamically extensible, and is realised as multiple 19" rack mountable units. Each forwarding engine is diskless and, at boot time, requests via the control bus that the controller supply an O/S image using BOOTP. The forwarding engine then uses NFS to mount a root file system and runs shell scripts to configure the internal and external network interfaces and routing information, and to install the LARA execution environment. This gives a network administrator the ability to add another forwarding engine to the system, bringing another two network ports online without interrupting current operation. Similarly, if one forwarding engine or port goes down, this loose coupling prevents total system failure; only that element is affected. The resources of network routers and switches are already strained in current networks, even without any additional active processing taking place. Adding the complexity of an active network environment, with its management and security systems, in addition to the workload caused by AN programs within that environment, represents a significant increase in the resource requirements of such a device. The hardware design of Cerberus is thus motivated by the need to provide an acceptable level of performance whilst running this environment and dealing with active network traffic. Our design attempts to bring significant processing and memory resources to each network port to achieve this. A further consideration in the design was scalability; this was achieved in a dynamic manner by centralising the code with the controller and downloading it to forwarding engines as they come online and request it. The final design decision is reflected in the choice of hardware: off-the-shelf PC components constructed into a rack-mounted unit.
The data backplane is another example: SCSI is an inexpensive and widely available technology that could be tailored to provide sufficient bandwidth. The second element of our system is software; Cerberus runs a modified 2.1.125 development version of the Linux kernel. As before, a modular approach has been
taken in the design of LARA (see Fig. 3), which is based within the kernel and has been implemented as a kernel loadable module. This provides a clean separation between the active networking environment and the remainder of the kernel. It also introduces an element of safety: a Linux kernel module can be dynamically inserted and removed without affecting the rest of the system, so if trouble does ensue and requires the removal and re-initialisation of the LARA environment or one of its components, this can be accomplished without damaging the entire system. Essentially, LARA is a micro operating system inside the Linux kernel that provides scheduling, process and memory protection, policing and queuing facilities to execution environments (EEs). It is the responsibility of an individual EE to utilise these abstractions and provide an environment in which to run active code instances. Execution environments, too, are implemented as kernel modules. Much of the LARA system has been written in the C language, although several portions of the scheduler and memory management subsystems required some assembler for particularly low-level operations. In order to achieve the required level of performance, LARA has been designed and implemented as a zero-copy architecture. Other than those copy operations carried out by active processes, no copy operations are performed on packets by the router. Most modern network interface cards use DMA to avoid unnecessary overheads, and our SCSI data backplane has been implemented using the same techniques. This zero-copy requirement can also be seen in our decision to create the LARA environment inside the Linux kernel, avoiding crossing the user/kernel space boundary for every packet that must be processed. This has been a significant problem for the performance of other active network systems, which are, for the most part, implemented in user space.
The design of Linux socket buffers also aids our quest for performance: they have a clear linear layout that requires no extra copy operations or complex scatter-gather functions when moving or DMAing them from main memory into devices. The SCSI backplane represents an important element of our architecture, owing to the need for a high bandwidth interconnect between our distributed network ports. It is based upon on-board Adaptec 7895 Ultra Wide SCSI host controllers. These host adapters conform to the SCSI-II protocol standard, which specifies a mode of operation called 'target mode'. Target mode allows a host controller to act not only as the initiator of a transaction but also as its target; two host adapters can then communicate over the bus at speeds of over 500 Mbps. This required significant rewriting of the firmware that is downloaded to the SCSI controller at boot time, in addition to modifications of the low-level device drivers to permit socket buffers, rather than the usual filesystem blocks, to be transmitted. This version of the SCSI controller features two channels, which doubles the effective bandwidth of the bus from 40 to 80 MB per second. The current implementation has these arranged in a simple bus topology, as illustrated in Fig. 1; however, alternative strategies are possible. For example, a slotted ring could yield up to 300 Mbps between any two ports at a time. Another topology could be a variation of the DQDB fair queuing protocol, which uses two busses, one upstream and the other downstream. Research is ongoing in this area to make the most efficient use of this versatile technology.
6
Results and Conclusions
Cerberus and LARA form a prototype system that currently has the major components in place. The Cerberus infrastructure is complete, as is the basic LARA/PAL and LARA/RT functionality. Significant work has been carried out on the construction of the SCSI backplane, which must provide high bandwidth support for the distributed network ports, and some preliminary results have been obtained. As noted earlier, speeds of over 500 Mbps have been achieved using a simple dual bus topology. One unexpected result was also recorded, relating to the performance of the backplane under certain circumstances. SCSI is most often used as an access protocol for devices such as hard disks and CD-ROM drives, and as such is optimised for bulk or block transfers. It was discovered that throughput was very closely linked to the size of the packets transmitted over the bus. For example, 1 KB packets achieved a rate of 120 Mbps, while 8 KB packets could be transmitted at speeds approaching 280 Mbps over a single bus. The extent to which packet size affected throughput was surprising, and initially appears to be a great disadvantage. However, when we consider that a line rate of 120 Mbps per port is not unreasonable, and that packet sizes are likely to increase anyway [14], the use of this technology as an inexpensive solution becomes clear. If greater throughput or port density is required, however, a switch fabric solution would be more appropriate. Although the research described in this paper is very much work in progress, some conclusions can be drawn from our experiences. Firstly, the need for a well defined router abstraction layer, such as LARA/PAL, is evident in order to support a wide variety of network programming paradigms. Secondly, an inexpensive high performance AN router can be constructed, facilitating experimentation with active network technology.
Thirdly, an open interface approach as an integral part of an active network system is essential to providing long term flexibility and extensibility. We believe that the only way to truly support high flexibility in active network systems is to support many execution environments and to enforce a wide variety of customisable policies.
7
Future Work
We intend to extend the work presented in this paper in a number of ways. Firstly, development, testing and evolution of the LARA architecture will continue. Other, larger scale prototypes of the architecture are also planned, based on a switch fabric backplane, thereby increasing the capacity and scalability of the router. Continued work will also be carried out to discover the best mapping between the LARA architecture and the dedicated processor architecture; for example, on which port/processor should an AN program be executed? Plans are also underway to extend the authentication process used by LARA/MAN to accommodate per user authentication, thus allowing some degree of policy migration.
References
[1] Ghosh, A. and Fry: Javaradio: an application level active network. Proceedings of the Third International Workshop on High Performance Protocol Architectures (HIPPARCH'97), Uppsala University, Uppsala, Sweden, pp. 1-11.
[2] D. Scott Alexander, Marianne Shaw, Scott M. Nettles and Jonathan M. Smith. Active Bridging. CIS Department, University of Pennsylvania. Proceedings of the ACM SIGCOMM'97 Conference, Cannes, France, September 1997.
[3] David J. Wetherall, John V. Guttag and David L. Tennenhouse. ANTS: A Toolkit for Building and Dynamically Deploying Network Protocols. Submitted to IEEE OPENARCH'98, San Francisco, CA, April 1998.
[4] Carl A. Gunter, Scott M. Nettles and Jonathan M. Smith. The SwitchWare Active Network Architecture. CIS Department, University of Pennsylvania. November 1997. IEEE Network Special Issue on Active and Controllable Networks, vol. 12, no. 3, pp. 29-36.
[5] Yechiam Yemini and Sushil da Silva. Towards Programmable Networks. In IFIP/IEEE International Workshop on Distributed Systems, October 1996.
[6] J.E. van der Merwe, S. Rooney, I.M. Leslie and S.A. Crosby. The Tempest - A Practical Framework for Network Programmability. Computer Laboratory, University of Cambridge. November 1997. http://www.cl.cam.ac.uk/Research/SRG/dcan/dcan_html/papers/ieee-network.ps.gz
[7] Samrat Bhattacharjee, Kenneth L. Calvert, Ellen W. Zegura. Implementation of an Active Networking Architecture. College of Computing, Georgia Institute of Technology. 1997. Presented at Gigabit Switch Technology Workshop, Washington University, St. Louis, July 1996.
[8] Samrat Bhattacharjee, Kenneth L. Calvert, Ellen W. Zegura. On Active Networking and Congestion. Technical Report GIT-CC-96/02. College of Computing, Georgia Institute of Technology. 1996. ftp://ftp.cc.gatech.edu/pub/people/bobby/an/publications/git-cc-96-02.ps.gz
[9] D. Scott Alexander, Bob Braden, Carl A. Gunter, Alden W. Jackson, Angelos D. Keromytis, Gary J. Minden, David Wetherall. Active Network Encapsulation Protocol (ANEP). RFC Draft. http://www.cis.upenn.edu/~angelos/ANEP.txt July 1997.
[10] Beverly Schwartz, Wenyi Zhou, Alden W. Jackson, W. Timothy Strayer, Dennis Rockwell, Craig Partridge. Smart Packets for Active Networks. BBN Technologies. 1998. http://www.net-tech.bbn.com/smtpkts/smart.ps.gz
[11] Active and High Confidence Networks - DARPA BAA 97-04. http://www.darpa.mil/ito/Solicitations/PIP_9704.html
[12] Ken Calvert, et al. Architectural Framework for Active Networks, July 3, 1997. http://www.dyncorp-is.com/darpa/meetings/anets98jul/framework.html
[13] D. Scott Alexander, William A. Arbaugh, Angelos D. Keromytis and Jonathan M. Smith. Performance Implications of Securing Active Networks. CIS Department, University of Pennsylvania. 1998. http://www.cis.upenn.edu/~angelos/Papers/saneimp.ps.gz
[14] Craig Partridge, Gigabit Networking, Addison-Wesley, 1993. ISBN 0201563339.
A Programming Interface for Supporting IP Traffic Processing
Ariel Cohen and Sampath Rangarajan
Bell Laboratories, Lucent Technologies, Murray Hill NJ 07974, USA
Abstract. There are many possible applications for network elements which support packet routing along with programmable packet manipulation. The goal is to design a programmable network element which provides the right "building blocks" for programming a variety of applications while providing good performance and a powerful programming interface. We explore an approach where a "fast path" component called a dispatcher (implemented in an OS kernel or in hardware) performs common packet processing tasks on behalf of programs running on the network element. The goal of this approach is to perform the processing of most packets within the dispatcher, while sending only a few packets to the programs. Currently, the dispatcher can perform network address translations, TCP sequence number adjustments, and TCP window size adjustments on behalf of the programs. Additional dispatcher functionality, such as packet queueing and scheduling, is planned. This paper focuses on the programming interface for the dispatcher.
1
Introduction
In recent years, a number of new network elements have been introduced to address the emerging needs of the users of the Internet. These new products implement some combination of the following functionalities: firewalling, layer 4 switching, network address translation (NAT), bandwidth management, virtual private networking (VPN), load balancing, and transparent web caching. Such products typically support some IP routing functionality in addition to the new functionality that they provide. As with traditional IP routers and switches, these elements can be configured but not programmed. Thus, for example, there is no possibility of changing a load balancing algorithm by simply loading a different program into a load balancing unit, and there is no way of changing the transparent web caching mechanism utilized by a unit. Obviously, it is impossible to do something even more radical, such as turning a load balancing product into a transparent web caching unit, unless such functionality is already programmed into the unit. This state of affairs prompted us to study the possibility of providing programmability in a unit that could serve in the roles mentioned above. Hence, our focus is on adding programmability to network elements placed in front of clients or servers, or at the edge of enterprise networks. We call such elements programmable gateways. We use the term gateway to distinguish these elements

Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 132–144, 1999. © Springer-Verlag Berlin Heidelberg 1999
from the more general programmable routers studied by active networking researchers [8]. A programmable gateway would have a number of advantages when compared to the non-programmable specialized gateways mentioned above:
– Gateway capabilities can be easily changed or enhanced. This makes the gateway more flexible, customizable, and adaptable as needs change.
– New functionality can be provided quickly by loading software modules into the programmable gateway, even when the ultimate goal is to provide the new functionality in a hardware module. The specialized hardware (which takes longer to develop) can be provided at a later stage.
– Applications running on servers can inject programs into a gateway placed in front of the servers. For example, web server software may consist of software running on the servers as well as code running on the gateway. The gateway program will interact with the server software for the purpose of performing intelligent load balancing.
– A programmable gateway can serve as a test bed for novel uses of packet manipulation in the network for research purposes.
Programs for a programmable gateway may be provided by the developer of the gateway, third party developers of gateway software, developers of applications that can benefit from specialized packet processing (such as the web server software mentioned above), and even customers. Our research prototype of a programmable gateway is called NEPPI (Network Element for Programmable Packet Injection). The NEPPI project has two related directions: one is to study possible applications of a programmable gateway, and the other is to study programmable gateway architectures which provide an appropriate infrastructure for executing the applications. This paper focuses mostly on the second direction, but some discussion of the applications is provided to motivate the choices made when designing the infrastructure.
2
NEPPI’s Applications and Place in the Network
NEPPI is targeted at certain locations within an enterprise network. Figure 1 shows some possible locations for NEPPI. NEPPI1 is located in front of clients. In that location it may serve as a TCP traffic redirector. For example, it may redirect HTTP requests transparently to web caches. Transparency in this context means that the clients do not need to specify a proxy in their browser settings, and the HTTP requests received by a web cache are indistinguishable from those generated by a browser which specifies this web cache in its proxy setting. This application belongs to a class of applications where NEPPI is used to achieve an effect by intercepting and manipulating packets on the path between endpoints while keeping the endpoints oblivious to the existence of NEPPI. In this case, NEPPI manipulates both the headers and the payload of IP packets. Header changes are needed to achieve the redirection, while payload changes are needed to add the origin server name to packets carrying HTTP GET requests, since the origin server name (which is required by the proxy cache) is not included
in the URL requested by the browser when a proxy cache is not specified in the browser settings. The changes in payload result in additional changes in the headers: TCP sequence numbers must be modified to take into account the bytes added to the packets. Hence, NEPPI manipulates the packets in a way that will achieve the desired effect while satisfying the requirements of the endpoints both at the network protocol level and the application level. See [5] for more details about transparent HTTP redirection with NEPPI.
Fig. 1. Possible locations for NEPPI within an enterprise network
A Programming Interface for Supporting IP Traffic Processing

NEPPI2 is placed in front of a server farm. Its role may be that of a load balancer for TCP traffic such as HTTP, FTP, or telnet. It may implement any desired load-balancing algorithm along with mechanisms for server failure detection (so that traffic to a failed server can be directed to another server). In this role, NEPPI has to perform only packet header changes to redirect HTTP and telnet traffic, while active FTP may require both header and payload changes, since IP addresses and port numbers are carried inside the payload of certain packets. When NEPPI is placed in front of servers, one can imagine both NEPPI-aware applications running on the servers and applications which are oblivious to NEPPI. For example, standard web server software would be oblivious; NEPPI-aware web server software would load the load-balancing program into the NEPPI device and interact with this program throughout its operation (e.g., for the purpose of sending it load information). NEPPI3 can be viewed as a "virtual server": clients send requests to NEPPI3, and NEPPI3 redirects the requests to the actual servers, which may be located across the WAN. Such requests may include FTP, HTTP, telnet, rsh, etc. In the case of rsh (remote shell), NEPPI3 may load-balance the execution of programs across a number of machines in a cluster: clients execute programs by performing rsh on NEPPI3, and NEPPI3 redirects the rsh requests to machines in the cluster. All this is achieved by manipulating the IP packets arriving at NEPPI3 and injecting the modified packets back into the network. NEPPI4 sits at the edge of the enterprise network immediately behind the firewall. This location gives NEPPI the opportunity to perform more global functions, potentially for the entire enterprise. For example, it may serve as a transparent web cache redirector for all clients within the enterprise. In fact, the functions of an edge device such as a firewall or a VPN gateway may be included in the NEPPI device as well.
3 NEPPI Architecture
The architecture of NEPPI is illustrated in fig. 2. Gateway programs are user-level processes, while everything under the dashed line is in the kernel. The dispatcher is a loadable kernel module which is responsible for most of the NEPPI functionality. Gateway programs communicate with the dispatcher for the purpose of obtaining, manipulating, and injecting packets. Gateway programs send the dispatcher rules that specify the properties of packets on which they wish to operate. Such rules may include IP address ranges for the source and destination, TCP/UDP port number ranges, etc. Based on the rules obtained from the gateway programs, the dispatcher generates packet filter rules which are sent to the Linux packet filter (we use the new Linux firewall support, Linux IP Firewall Chains, which has been included in the official Linux kernel since version 2.1.102). An arriving packet which triggers a rule is sent from the packet filter to the dispatcher, which either sends it to the requesting gateway program, or manipulates it in accordance with a manipulation rule specified by the requesting gateway program (such manipulation rules include address translations, TCP sequence number changes, and TCP window size changes). It is currently required that no overlap exist among the sets of rules submitted by different gateway programs. Packets generated or modified by the gateway programs are sent to the dispatcher, which in turn modifies them further (if they match a manipulation rule) and injects them into the network. In the case where the destination address of a packet is the NEPPI device itself, the dispatcher hands the packet to the protocol stack instead of the network interfaces. In either case, the packet bypasses the packet filter; hence we avoid infinite loops in which packets sent by gateway programs keep matching packet filter rules and are repeatedly forwarded to gateway programs by the dispatcher. This also means that in our current architecture a packet produced by a gateway program cannot subsequently be further processed by any gateway program.
[Figure: block diagram — gateway programs in user space communicate through /dev/neppi with the in-kernel dispatcher; packet flow runs between the network interfaces, the packet filter, the dispatcher, the protocol stack, and the gateway programs, while rule flow runs from the gateway programs down to the dispatcher and the packet filter.]

Fig. 2. Packet flow and rule flow in NEPPI
The dispatcher acts as a device driver for the device /dev/neppi. Gateway programs communicate with the dispatcher by performing library function calls which operate on this device; these functions are described in section 5. One minor change to the packet filter was required to support our architecture: it enables the dispatcher to request that the packet filter call it back on packets which match the rules the dispatcher submitted, so that the dispatcher can process these packets or pass them to a gateway program.
4 Packet Processing
Packets are processed at up to three stages, as shown in the flowchart of fig. 3. A packet which does not match any of the rules submitted by gateway programs is processed in the standard way by the protocol stack. Other packets are given to the dispatcher. The dispatcher may manipulate a packet by carrying out operations requested earlier by gateway programs (currently, these operations are address translations, TCP sequence number changes, and TCP window size changes). Alternatively, the dispatcher may forward the packet to the requesting gateway program for processing. Once the packet is processed by the gateway program, it is received by the dispatcher, which may process it further before re-injecting it into the network.
[Figure: flowchart — on packet arrival, if the packet fires a dispatcher rule (a packet filter rule previously submitted by the dispatcher based on a gateway program request), it is either sent to the requesting gateway program for manipulation (a decision made mostly by examining the TCP flags) or manipulated by the dispatcher according to previously given instructions from a gateway program; otherwise the packet receives standard processing. The packet is then sent on if needed.]

Fig. 3. NEPPI packet processing
One of the goals of this architecture was to perform manipulations which are generic and which occur on a large number of packets (such as address translations and TCP sequence number changes) at the dispatcher, while performing manipulations which are more specialized and which occur only on isolated packets (including payload modifications) at the gateway programs. Since packet processing at the gateway programs, which reside in user space, incurs significantly more overhead than packet processing at the dispatcher, we have a "fast path" for most packets and a "slow path" for only a few packets. For cases where the performance penalty incurred by a user-level gateway program is unacceptable, NEPPI supports gateway programs in the form of loadable kernel modules. However, user-level gateway programs offer significant advantages, such as less potential for adverse effects and easier programming and testing.
An interest in a potential implementation of NEPPI on a hardware-based switch provided further motivation for studying the separation of the packet processing functionality into two tiers: a specialized tier implemented in software which examines and manipulates relatively few packets, and a more generic tier implemented in hardware which manipulates a large number of packets in accordance with instructions provided by the software tier. Alternatively, in the case of an intelligent switch with port processors and a central processor, the dispatcher functionality may be supported by port processors, while the gateway programs execute in the central processor.
As illustrated in fig. 3, the decision on whether a packet should be forwarded to a gateway program or should only be processed by the dispatcher is based mostly on the TCP flags within the packet. To see why, it is important to understand the structure of a typical gateway program. Gateway programs start by declaring flows of packets on which they wish to operate. A flow is defined by ranges of source and destination IP addresses, and source and destination TCP/UDP ports. Note that many TCP connections may belong to a single flow. A gateway program may request to receive all packets within a flow, or only packets which have certain TCP flags set (such as SYN, ACK, or FIN). Other packets within the flows declared by the gateway program will be processed by the dispatcher according to instructions given by the gateway program (if any); such packets will not be forwarded to the gateway program. As an example of this approach, consider a gateway program which performs redirection. When declaring the flows, the gateway program will request all packets with a set SYN, FIN, or RST flag, and no other packets. By examining these packets, the gateway program can detect the beginning and end of individual TCP connections within the flows that it declared. Once a SYN packet is received by the gateway program, a redirection decision can be made. The gateway program submits an address translation instruction to the dispatcher followed by the SYN packet. From that point onward, all packets belonging to this TCP connection other than SYN, FIN, or RST packets will be manipulated solely by the dispatcher. Since the gateway program monitors the FIN and RST packets, it can detect the end of a TCP connection and instruct the dispatcher to remove the corresponding address translation.
In addition to address translations, the dispatcher may perform TCP window size translations, which may be used for bandwidth management purposes, and TCP sequence number adjustments. The latter are needed when the gateway program modifies the payload of packets in a way that results in a packet size change. A gateway program for redirecting active FTP traffic is an example of such a program: since IP address and port number information is carried in ASCII within the payload of some packets, redirection may require packet payload modifications, which in turn may require adjusting the TCP sequence numbers on all following packets within the TCP connection.
5 Programming Interface
Gateway programs are linked with libgw, a library used to communicate with the dispatcher. In some cases gateway programs may also need to communicate with application programs. For example, a proxy program may accept a TCP connection from a client, establish a TCP connection to a server, and then request a gateway program to splice the two connections together [9]. Such communication is supported by NEPPI, but it is not discussed further here. The libgw interface is now described in some detail.

5.1 Structures
The relevant constant and structure definitions for libgw are shown in fig. 4. A description of the structures follows.

flow_desig_s. This structure describes the properties of a packet flow requested by a gateway program. src, src_mask, and src_ports contain the source IP address, an address mask designating the significant bits, and the source TCP/UDP port range for the packet flow. dst, dst_mask, and dst_ports similarly describe the destination IP address, address mask, and destination ports. protocol specifies the IP protocol. inv_flgs contains the logical or of flags which specify which of the flow properties should be inverted (for example, we may wish to request all packets which do not come from the specified port range). The following flags can be used: IP_FW_INV_SRCIP (invert the source addr/mask), IP_FW_INV_DSTIP (invert the destination addr/mask), IP_FW_INV_SRCPT (invert the source ports), IP_FW_INV_DSTPT (invert the destination ports), and IP_FW_INV_PROTO (invert the protocol). mark lets the gateway program assign marks to packet flows so that it can distinguish between packets belonging to different flows just by looking at the marks included with the packets it receives from the dispatcher (packets carry the mark assigned to the flow to which they belong). type can be set to one of three values: INPUT, FORWARD, or OUTPUT; it is usually set to INPUT. This value determines the stage in the Linux firewall operation at which the packets belonging to the flow are intercepted.

pkt_msg_s. Structures of this type carry the packets transferred from the dispatcher to the gateway programs, and from the gateway programs to the dispatcher. msg_type contains IP_PKT (a constant designating this message type). mark contains the mark associated with the flow to which this packet belongs; this mark is declared by the gateway program when it requests a packet flow from the dispatcher (see the description of flow_desig_s above). This field is ignored for packets sent from the gateway programs to the dispatcher; it is meaningful only for packets flowing from the dispatcher to the gateway programs. pkt_len is the total length in bytes of the packet which appears in the pkt field. This field is ignored for packets flowing from the gateway programs to the dispatcher (the dispatcher discovers the packet length by looking at the packet itself). trans_bypass is a flag which can be set by the gateway program to request that the dispatcher send the packet as it is (i.e., without performing any translations on it); the dispatcher always clears this flag before it sends a packet to a gateway program. pkt is the IP packet (including headers and payload).

trans_desig_s. Defines translations to be performed by the dispatcher on behalf of a gateway program. The translations apply to all packets within the flows requested by the gateway program which carry the source IP address, source port, destination IP address, and destination port which appear in the orig field of the address translation. The fields of trans_desig_s are as follows. addr_trans defines an IP address and TCP/UDP port translation from orig to new: all packets within the flows requested by the gateway program which carry the source IP address, source port, destination IP address, and destination port defined in orig are modified to carry the values defined in new. Note that values for the orig part of the address translation must be specified by the gateway program for all translations, even for translations which do not include an address translation; orig always serves to designate the packets to be translated. seq_trans and ack_seq_trans declare TCP sequence number or TCP acknowledgement sequence number translations. These fields declare a starting sequence number (start_seq_num), an offset (seq_offset), and a flag (add) specifying whether the offset should be added or subtracted. The sequence number in packets carrying a sequence number larger than start_seq_num is incremented or decremented by the specified offset; similarly, the acknowledgement sequence number may be adjusted. Currently, the dispatcher maintains a history of one sequence number (and acknowledgement sequence number) translation; hence, if the starting sequence number and sequence number offset are modified by the gateway program, the previous values are kept by the dispatcher for the purpose of applying this older translation to older packets which may still arrive. win_trans declares a TCP window size adjustment, which can be used to perform simple bandwidth management. which specifies which translations are being declared; it is the logical or of constants corresponding to the translation types (ADDR_TRANS, SEQ_TRANS, WIN_TRANS).
5.2 Function Calls
The libgw function calls used by gateway programs to interact with the dispatcher are as follows:
struct flow_desig_s {
    u_int32_t src;
    u_int32_t src_mask;
    u_int16_t src_ports[2];
    u_int32_t dst;
    u_int32_t dst_mask;
    u_int16_t dst_ports[2];
    u_int16_t protocol;
    u_int16_t inv_flgs;
    u_int32_t mark;
    char type;
};

struct pkt_msg_s {
    char msg_type;
    u_int32_t mark;
    u_int16_t pkt_len;
    u_int8_t trans_bypass;
    char pkt[MAX_PKT_LEN];
};

struct conn_s {
    u_int32_t src_addr;
    u_int16_t src_port;
    u_int32_t dest_addr;
    u_int16_t dest_port;
};

struct addr_trans_s {
    struct conn_s orig;
    struct conn_s new;
};

struct seq_trans_s {
    u_int32_t start_seq_num;
    u_int32_t seq_offset;
    char add;
};

struct win_trans_s {
    u_int16_t win_offset;
    char add;
};

struct trans_desig_s {
    struct addr_trans_s addr_trans;
    struct seq_trans_s seq_trans;
    struct seq_trans_s ack_seq_trans;
    struct win_trans_s win_trans;
    u_int8_t which;
};

Fig. 4. libgw structures
– int gwp_init(int, void (*)())
Initializes the gateway program by opening the log file, specifying signal handlers, and initiating the communication with the dispatcher. The first parameter specifies whether or not the gateway program should run as a daemon. The second parameter is a function which will be called in the event of a SIGINT or a SIGTERM signal.
– int gwp_start_flow(struct flow_desig_s *, u_int8_t)
This function is used by the gateway program when it wishes to request a new packet flow from the dispatcher. The first parameter specifies the properties of the packet flow as described in the previous section. The second parameter is the logical or of constants (F_FIN, F_SYN, etc.) denoting TCP flags; if one of these flags is set in a packet within the flow, the dispatcher will send the packet to the gateway program. In addition to a logical or of constants denoting TCP flags, there are two other possible values for this parameter: F_ALL, which requests all packets (regardless of the settings of their TCP flags), and F_NE, which requests all packets with a non-empty payload.
– int gwp_stop_flow(struct flow_desig_s *)
Removes the specified flow from the set of flows requested by the gateway program.
– int gwp_recv(void *, int)
Receives a packet (pkt_msg_s) from the dispatcher. The first parameter specifies the location for the packet, and the second parameter specifies the maximum number of bytes to be put there.
– int gwp_send(void *, int)
Sends a packet (pkt_msg_s) to the dispatcher.
– int gwp_start_trans(struct trans_desig_s *)
This is used by the gateway program to declare a new translation or to update an existing translation. The translation on the relevant packets will be performed by the dispatcher on behalf of the gateway program.
– gwp_stop_trans(struct trans_desig_s *)
Removes a translation.
– void gwp_norm_term(void)
Performs cleanup operations followed by an exit(0).
– void gwp_abnorm_term(void)
Performs cleanup operations followed by an exit(1).

6 Related Work
A distinction is made in [1] between two types of approaches to programming the nodes of active networks: active packets (also known as capsules [11]), and active extensions. The active packets approach is based on sending packets containing both a program and data, while the active extensions approach is based on sending the programs to the nodes separately from the passive data packets on which they operate. NEPPI can implement active extensions, and the applications of NEPPI mentioned in section 2 can be viewed as active extensions. Much of the work in the area of active networking studies the active packets approach with its challenging security [2] and programming language issues [7]. In the work presented here, our assumption is that the programs loaded by the parties which are allowed to load programs into the NEPPI node can be fully trusted, so security is not discussed, and the programming language used is C. The results of work on active networks security and programming languages can be applied in our context as well, but these issues are less critical in our more static environment. Some applications of the active packets approach are described in [8]; an active extension operating at the data link layer is described in [3]. Active networking in its most radical form strives to replace the current IP-based internet with an internet carrying a different kind of traffic (active packets) and different network elements (programmable routers executing and forwarding active packets). A more restricted form of active networking called active services is proposed in [4]. This approach preserves the current internet network protocols by allowing the router programs to operate at the application layer, but not at other layers. Similarly, NEPPI operates on standard IP traffic and it preserves the current internet protocols. 
However, unlike the case with active services, NEPPI applications are allowed to process and modify the IP packets passing through the node as long as the integrity of the transport and application layers at the endpoints is not compromised. A software architecture called router plug-ins, for dynamically loading packet processing code modules (plug-ins) into the kernel, is described in [6]. A plug-in is somewhat analogous to a NEPPI gateway program; as with gateway programs, different plug-ins may be bound to different flows. The purpose of the plug-ins is to support a modular and extensible network subsystem. Plug-ins implementing IPv6 options, packet scheduling, packet classification and routing, and IP security are mentioned in [6]. NEPPI gateway programs operate at a higher level than router plug-ins, since the NEPPI dispatcher (implemented in the kernel or in hardware) provides facilities for performing common packet manipulation tasks, such as address translations and TCP sequence number modifications, on behalf of gateway programs.
7 Conclusions
We are advocating a network element which allows programmable manipulation of IP packets. The main focus of our work is in identifying the building blocks of a variety of potential applications of such a network element. Once such building blocks are identified, they can be implemented for efficient execution within an operating system kernel, or in hardware. A program running on this programmable network element can then implement novel functionalities efficiently by using the provided building blocks in a way that requires only a small number of packets to be processed by the program itself. We see much promise in this approach as a feasible method for enabling the rapid development of specialized network elements based on a flexible generic infrastructure.
Acknowledgements The authors wish to thank Scott Alexander and Chandra Kintala for helpful discussions, Hamilton Slye for performing the Linux kernel programming for NEPPI, and Navjot Singh for developing a GUI for NEPPI applications.
References

1. Alexander, D.S.: ALIEN: A Generalized Computing Model of Active Networks. PhD thesis, University of Pennsylvania (1998)
2. Alexander, D.S., Arbaugh, W.A., Keromytis, A.D., Smith, J.M.: A Secure Active Network Architecture: Realization in SwitchWare. IEEE Network 12(3) (1998) 37–45
3. Alexander, D.S., Shaw, M., Nettles, S.M., Smith, J.M.: Active Bridging. In: Proceedings of ACM SIGCOMM (1997) 101–111
4. Amir, E., McCanne, S., Katz, R.H.: An Active Service Framework and its Application to Real-time Multimedia Transcoding. In: Proceedings of ACM SIGCOMM (1998) 178–189
5. Cohen, A., Rangarajan, S., Singh, N.: Supporting Transparent Caching with Standard Proxy Caches. In: Proceedings of the 4th International Web Caching Workshop (1999)
6. Decasper, D., Dittia, Z., Parulkar, G., Plattner, B.: Router Plugins: A Software Architecture for Next Generation Routers. In: Proceedings of ACM SIGCOMM (1998) 229–240
7. Hicks, M., Kakkar, P., Moore, J.T., Gunter, C.A., Nettles, S.: PLAN: A Packet Language for Active Networks. In: Proceedings of the Third ACM SIGPLAN International Conference on Functional Programming Languages (1998) 86–93
8. Legedza, U., Wetherall, D., Guttag, J.: Improving the Performance of Distributed Applications Using Active Networks. In: Proceedings of IEEE INFOCOM (1998) 590–599
9. Maltz, D.A., Bhagwat, P.: TCP Splicing for Application Layer Proxy Performance. IBM Research Report RC 21139 (1998)
10. Tennenhouse, D.L., Smith, J.M., Sincoskie, W.D., Wetherall, D.J., Minden, G.J.: A Survey of Active Network Research. IEEE Communications Magazine 35(1) (1997) 80–86
11. Wetherall, D.J., Guttag, J.V., Tennenhouse, D.L.: ANTS: A Toolkit for Building and Dynamically Deploying Network Protocols. In: Proceedings of IEEE OPENARCH (1998) 117–129
New Generation of Control Planes in Emerging Data Networks

Nelu Mihai and George Vanecek
AT&T Labs, Internet Platforms Technology, San Jose, USA
{nelu,vanecek}@ipo.att.com
Abstract. For a number of years, the network, software, and telecommunication industries have been working on a number of key technical issues dealing with the infrastructure support for hybrid services. It is now generally understood that there is great value in adding certain common service capabilities to the infrastructure supporting data networks and using them to facilitate hybrid services. Such networks would support the service capabilities at their edges, close to their points-of-presence (POPs), while maintaining control over all network components. This article describes the nature of the enhanced emerging networks and outlines the structure of the service platform that would support their functions.
Introduction

The telecommunication industry is cautiously navigating its way through a maze of technological crossroads that will ultimately determine what kind of networks the world will use in the 21st century. In this journey, the industry has already passed several essential crossroads. One was the movement from pure voice over the PSTN towards voice over data networks. It is now clear that voice, like video, is just another data type that can be transmitted through packet networks. While no one is proposing that the PSTN be retired, it is no longer a surprise to observe voice, video, and data running over the same high-speed data network. An important crossroad waiting to be passed concerns how intelligent the future data networks should be. A lesson learned in building the switched network is that there is real value in creating an enhanced network that does more than offer raw end-to-end communication to the users. In building the next generation data network, passing the crossroad will establish whether it is active or passive, smart, dumb, or intelligent, open or proprietary. Whatever the network will be, it will most likely be smarter and more active than the present-day Internet, and it will be less complex and more open than the present-day PSTN. Its exact implementation is not the issue, but its inherent nature is. Once the industry agrees on what kind it should be, the technology already exists to build it.
Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 144-154, 1999. Springer-Verlag Berlin Heidelberg 1999
Efforts are underway to use the Internet as a model of data internetworking, but to drastically reshape the infrastructure to implement a service-enabling environment. As an analogy to our human environments, the network has to support certain capabilities and possess functionality that is characteristic of a civil society. As a token example, consider the requirement for the network to know and keep track of its resources, who or what is using them, and who or what is controlling them. The network and its end-point devices should form a tightly coupled single resource space from which resources are used to achieve the desired goals. This space needs to be repartitioned to yield maximum utilization at the least overall cost when considering the function of any particular component. This follows from the telecommunication industry's observation that the full functionality and usefulness of data networks have been greatly underutilized. This functional underutilization tends to move all costs, responsibility, and complexity to the end-points. From this point of view, the resource space needs to be considered in its entirety, from the end-point devices to the service operating centers. Its overall use can be considered as enabling the creation, provisioning, and management of services. Someone creates a service using the functions of their local resources and the network-provided service functionality, provisions it, and offers it for users to subscribe to and use. The network can greatly offset the member and resource management overhead required. To this end, the industry has to recognize that there are great benefits in pushing some common service capabilities into the network (however that is implemented). It has to:
1. differentiate between application resources (e.g., GUIs), network service resources (e.g., usage recording and access control), and the physical network resources (e.g., switching and QoS), and
2. agree on the mandatory services (e.g., active directory) vs. voluntary services (e.g., hosting) that the network must guarantee.
Hybrid networks offer much higher functionality than any single homogeneous network. To the services, however, the physical differences should be transparent, hidden behind well-defined interfaces and protocols, and standardizing on a given IP-based platform seems an ideal solution.
Refocusing on Data Networks for Enabling Hybrid Services

In the early 1970s the Internet appeared along with the Ethernet, the Internet Protocol (IP), and later the standardization of the protocol stack with ISO, thus forming the TCP/IP transport mechanism. Over the past decade or so, the network industry has exerted most of its efforts on achieving higher bandwidth, faster access, lower cost, and greater participation in stemming the tide of increasing control and legislation from the federal government. From today's technology perspective, very little has changed in the basic framework of the Internet. The Internet is today, as it was in the beginning, a packet-delivery, connectionless, best-effort, unmanaged network of networks. Yet, the present rate of
innovation continues to change the network, and today the Internet serves as the base model for what the 21st century network should be. The demands and expectations of the network have changed dramatically, due mainly to the world-wide web and its discovery by the consumer and business markets, and to a massive investment in innovation. Businesses want security and reliability. Consumers want ease of use and advanced features. And everyone wants the ubiquity and services that are found in the telephone networks. Equally important is the move towards convergence of the telephone network, the Internet, the wireless networks, and many enterprise networks. Interestingly, the Internet was never designed with many of today's requirements in mind. It grew out of the idea that individual computers could talk to each other. The ideas of distributed information systems and network-based services came as an afterthought, and so contributed piecemeal to the ever-changing design and architecture. To many of the original contributors, the bare fact that the Internet is still used today, and in its present capacity, is a source of continued bewilderment and awe. As it is, it is well known that the pure client-server architecture does not scale, that it is hard if not impossible to make secure, and that there is little to no support for network-based services. In spite of all this, there is a slow but growing realization that the Internet should reform itself into a network of smart, service-supporting networks. Moreover, these networks should seamlessly interoperate with other types of networks. This restructuring can be achieved only if the basic underpinnings of packet delivery and DNS are upgraded with additional capabilities that are intrinsic to the support of the above objectives. There is no universal set of capabilities that everyone can agree upon; if there were, the work would already be done.
There is, however, a candidate set of capabilities that can be extracted from the capabilities commonly used in the services and applications that make up the consumer and business use of the Internet, and from the needs of university campuses and research centers. From the consumer and business sides, services require customer access, which presupposes membership management and tracking. This requires the ability to register and then authenticate users. Services require the safe and secure exchange of information, as well as protection from each other. This requires encryption of data, management and distribution of certificates, and the enforcement of access control. Furthermore, services require the tracking of events, usage, and purchases, as well as the generation of bills and the means of obtaining payment. In all these cases, some common service capabilities are required of the network. Classically, network-based services and applications can be perceived as using a two-tier hierarchy, with the network and access tier on the bottom and the application and service tier on top. In this model, all the costs and complexities of the software supporting common service capabilities reside in the top tier. This is the direct result of software replication and the lack of a common platform. Enhanced network support calls for a middle tier containing the common service capabilities. Thus, the new model is a multi-tier structure in which the lower tiers define what should commonly be referred to as the "smart" (or enhanced-service) network.
New Generation of Control Planes in Emerging Data Networks
147
One major issue in this approach is the level of mediation in the edge routers and switches, given the amount of data that flows through them. Consider that the capacity of the Internet has already crossed 10 trillion bits per day, and within a decade it will grow by another two orders of magnitude into the 1000-trillion-bit range. Today the backbones of many networks are built around a 100-megabit to low-gigabit range. Many companies are now aggressively moving towards the design of switches that can handle terabit capacity. To put this into perspective, a terabit capacity can send 300 years of daily newspapers in one second, carry 100,000 simultaneous television channels, carry 12,000,000 simultaneous telephone calls, or support 10,000,000 Internet users. Even so, few vendors can actually deliver terabit switching anytime soon. The problems are not in the technology required to build terabit switches: optical fiber, optical switching, and massively parallel architectures with thousands of processors. The problems rest in the functionality of those switches. The trend is towards high-end, non-blocking, dedicated, multi-port, layer-2-through-7, smart, terabit switches. Furthermore, it is now recognized that the network has to support hard VPNs, QoS, tunneling, access control, and resource management and monitoring through remotely controlled open interfaces. Basically, the network elements need help from a network-wide service-logic layer to appropriately switch packets. This interdependency between the network fabric components and the upper application and service functions needs to undergo major reengineering, requiring broad industry standardization.
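The per-stream rates implied by these figures can be sanity-checked with a little arithmetic. The script below simply divides one terabit per second by the quoted stream counts; the resulting per-stream rates are our own inference, not figures from the text.

```python
# Back-of-envelope check of the terabit figures above. A 1 Tbit/s
# switch shared by the quoted number of streams leaves roughly
# 10 Mbit/s per TV channel, ~83 kbit/s per telephone call (consistent
# with 64 kbit/s voice plus overhead), and 100 kbit/s per user.
TERABIT = 1_000_000_000_000  # 1 Tbit/s, in bits per second

tv_channel_rate = TERABIT / 100_000       # 100,000 TV channels
phone_call_rate = TERABIT / 12_000_000    # 12,000,000 telephone calls
user_rate       = TERABIT / 10_000_000    # 10,000,000 Internet users

print(f"per TV channel: {tv_channel_rate / 1e6:.0f} Mbit/s")
print(f"per phone call: {phone_call_rate / 1e3:.1f} kbit/s")
print(f"per user:       {user_rate / 1e3:.0f} kbit/s")
```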
Physical Network Architecture

Let us now look at the edge gateways and consider the technical issues. In today's network world there is a clear separation between the network elements, consisting of routers and switches, and the network nodes, consisting of the hosts that use the network. Except for the communications from some administrative hosts that configure, install, and monitor the network elements, no other kind of control and cooperation exists among the network elements and the hosts. The data networks of today do not contain much intelligence; they know how to transport data packets between the hosts by routing and switching the packets according to a configuration profile. In relation to the network topology, the path taken by a data packet is almost unpredictable and varies from moment to moment. The applications generating the packets are at the mercy of the routing protocols and have only an indirect method of influencing them. The protocols can only be statically configured and programmed. It is almost impossible for an application running on a host to control the "destiny" of its traffic, that is, to control the way the routing algorithms will route it. It is very difficult to make the association between data flows and the application to which they belong. One of the most difficult tasks is to implement a resource allocation policy in a network environment. To run properly, a multimedia application needs resources in the host, such as CPU time and I/O cycles, and in the network elements, such as bandwidth and I/O ports. In new QoS protocols
148
Nelu Mihai and George Vanecek
where a certain level of control is necessary, there is no real resource allocation support. An application can make a request for bandwidth reservation, for instance, and the proxied network element can satisfy or reject it, but it does not help in finding other means to satisfy that request, simply because it does not have information about the availability of resources in the network. It is up to the protocol itself (RSVP, for instance) to solve this problem. However, this is done at a penalty in performance and with the generation of extra traffic in the network.

[Figure 1 shows a Distributed Network Element (DNE): a gate network node coupled to a network element through a network element adaptation layer and a control and management API, with low-bandwidth traffic handled by the node and high-bandwidth traffic switched by the network element.]
Figure 1. Distributed Network Element
A Distributed Network Element (DNE) is a network element whose main functionality is implemented on a system formed by at least a network node (later referred to as a gate) and a network element (a switch or a router). This system behaves like a single virtual network device. The control of the distributed network element resides on both the node and the network element, and data traffic can be switched or routed through both the node and the network element, depending on the traffic requirements of the application. The communication between the two is done using a standard protocol (for instance, GSMP) and a dynamic API. The network node can dynamically control the network element's functionality, which includes quality of service, packet filtering, firewalls, authentication, resource allocation, encryption and decryption algorithms, routing tables, and data flows. We implemented an object-oriented abstract model of the network element. This model can be used for the prediction and modeling of traffic in large-scale networks. Using this dual system, the network can implement the separation of low-bandwidth and high-bandwidth traffic. Low-bandwidth traffic is mostly generated by the exchange of control messages, while high-bandwidth traffic carries mostly raw data such as video and audio. A very interesting observation is that for high-bandwidth data, the major bottleneck in the network is the host acting as an edge gateway. No matter how powerful the CPU is, just transferring the data from the input network interface to the output network interface in the gateway becomes a serious overhead for the CPU and a limitation on the sustained throughput of the gateway. The TCP/IP protocol itself is another bottleneck, owing to the problems with its sliding-window mechanism.
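The flow-separation idea behind the DNE can be sketched as follows. This is an illustrative model only: the class names, the bandwidth threshold, and the admit logic are our assumptions, not the paper's actual node/element API (which is GSMP-based and not detailed here).

```python
# Hypothetical sketch of DNE flow separation: control traffic and
# low-bandwidth flows go through the network node (gate), while
# high-bandwidth data flows are switched directly by the network
# element, bypassing the gate's CPU.
from dataclasses import dataclass

@dataclass
class Flow:
    src: str
    dst: str
    kind: str           # "control" or "data"
    bandwidth_bps: int

class DistributedNetworkElement:
    """A node (gate) and an element (hardware switch) acting as a
    single virtual network device."""
    HIGH_BANDWIDTH = 1_000_000  # assumed threshold, bits per second

    def __init__(self):
        self.node_flows = []     # routed through the gate's CPU
        self.element_flows = []  # switched in hardware

    def admit(self, flow: Flow) -> str:
        if flow.kind == "control" or flow.bandwidth_bps < self.HIGH_BANDWIDTH:
            self.node_flows.append(flow)
            return "node"
        # Command the network element to open a direct path,
        # bypassing the gate entirely.
        self.element_flows.append(flow)
        return "element"

dne = DistributedNetworkElement()
print(dne.admit(Flow("A", "B", "control", 10_000)))   # -> node
print(dne.admit(Flow("A", "B", "data", 5_000_000)))   # -> element
```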
The main idea here is to manage the traffic so that it avoids any CPU-based host in its path between the originating host and the network. The data must travel as much as possible using only the network elements. It is true that in this case we have shifted the performance problem from the network node to the network element, but this is desirable because the network element has by far better performance than any host does. The distributed network element can be implemented on a variety of network elements such as routers and IP and ATM switches. Furthermore, it is independent of the hosts' operating systems, and it exports a set of C/C++ and Java APIs. This can be the building block of the next generation of networks, in which the integration between network nodes and elements is very tight, in a way similar to how a personal computer's peripherals (disk, display, etc.) are integrated with its operating system. To illustrate a scenario, let us suppose that a host wants to connect to a video-streaming server. The traffic needed for the "negotiation phase" (e.g., registration and authentication) is forwarded in the DNE by the network element to the network node. The node verifies the identity of the host, sets the right privileges, and checks the host's traffic requests. If the specifications are for high bandwidth with QoS support, then it commands the network element to open a direct connection between the client and the server through the network element. This traffic does not need to be encrypted by the distributed network element, for it will bypass the network node and therefore cannot endanger the secure domain. Of course, the traffic can still be encrypted at the application level if needed. During the server-client connection the network node is able to monitor the connection, revalidate the server and client identities, and, in the "ending phase", redirect the traffic to itself (by controlling the switch) if necessary.
On the other side, the protocol is designed in such a way that either the host or the network node can decide the data path. This architecture is very scalable for multimedia applications and avoids an important bottleneck for high-bandwidth traffic: the network node. The dual system containing the network node and the network element behaves like a single virtual network element, but with its functionality implemented in a distributed manner. It promotes a new concept through which, using open and dynamic APIs and standard protocols, the network elements and nodes constitute a tightly coupled dual system. The determining feature of a Distributed Network Element is flow separation. Flow separation allows small-volume control flows to be routed through the gate and the proxies, while large-volume data flows are routed directly through the hardware (the network element). Flow separation can lead to a significant performance improvement while keeping the improved control over the network traffic expected in a service network. The main functions that need to be supported are the following.

• Filtering: Rejecting and forwarding packets based on the source and/or destination IP address, port, and protocol.

• Routing/switching: Redirecting a stream to a new destination IP address/port combination based on the source and/or destination IP address, port, and protocol.

• Distributed and dynamic control/management of network elements (distributed network elements): A Distributed Network Element is a network element whose main functionality is implemented on a system formed by at least a network node and a network element. This system behaves like a single virtual network device. The control of the distributed network element resides on both the node and the network element. Data traffic can be switched/routed through both the node and the network element, depending on the traffic requirements of the application. The communication between the two is done using a standard protocol (for instance, GSMP). The protocol is designed in such a way that either the client or the network node can decide the data path. This structure is very scalable for multimedia applications and avoids an important bottleneck for high-bandwidth traffic: the network node. The dual system containing the network node and the network element behaves like a single virtual network element, but with its functionality implemented in a distributed manner.

• Resource allocation and monitoring: This component engages in monitoring and collecting traffic statistics, utilization, and performance data, and in their display. As service-based network systems become more complex and distributed, it becomes critical that the network management system employ a real-time expert management tool that dynamically monitors and allocates system resources and adjusts the parameters of the network control algorithms.

• QoS control: Provide an API and a formal protocol (eventually evolving into a standard) to control the QoS supported by access control devices, network elements, and applications (today accessed through non-standard or proprietary functionality and APIs). Applications define "service spaces" that utilize QoS policies mapping to IP++ flow attributes. This allows layer-2-to-layer-5 switching/routing. The trend in the industry is to access network elements and nodes in a unified way using directory services and protocols such as LDAP.

• IP Multicast, tunneling protocols, and VPN technologies: Integrating IP Multicast and tunneling protocols with the SNetC increases scalability and creates virtual private networks using technologies at the network level (for performance reasons).
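The filtering function above can be sketched as a small rule-matching routine. The rule format (5-tuple patterns with "*" wildcards and a default action) is invented for illustration and is not taken from the paper.

```python
# Illustrative packet filter: accept or reject packets based on
# source/destination address, port, and protocol, as in the
# "Filtering" function described above.
def matches(pattern, value):
    """A rule field matches if it is the wildcard '*' or equal."""
    return pattern == "*" or pattern == value

def make_filter(rules, default="reject"):
    """rules: list of (src, dst, port, proto, action) tuples."""
    def decide(src, dst, port, proto):
        for r_src, r_dst, r_port, r_proto, action in rules:
            fields = ((r_src, src), (r_dst, dst), (r_port, port), (r_proto, proto))
            if all(matches(p, v) for p, v in fields):
                return action
        return default
    return decide

acl = make_filter([
    ("10.0.0.1", "*", "*", "*", "forward"),  # trusted host: forward anything
    ("*", "*", 80, "tcp", "forward"),        # web traffic: forward
])
print(acl("10.0.0.1", "10.0.0.9", 22, "tcp"))  # forward (trusted host)
print(acl("10.0.0.7", "10.0.0.9", 22, "tcp"))  # reject (no rule matches)
```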
Service Network Architecture

The architecture of a cloud is structured to concentrate most of the service logic as a distributed environment at the edges of the managed network. At the edges, physical Points-of-Presence (POPs) aggregate traffic from hosts and other networks, and send it into the cloud through a Service POP (SPOP), as shown in Figure 2.
[Figure 2 depicts hosts H1…Hm attaching to physical Points-of-Presence (POP1…POPn) over IP; the POPs connect to Service Points-of-Presence (SPOPs) on a secure IP backbone, which also hosts the System Operating Centers (SOC1…SOCn).]
Figure 2. SNetC's points of presence and system operating centers.
POPs are the physical points of presence that aggregate traffic from a variety of hosts and networks into the cloud. This consists of IP traffic ingress from LANs and WANs, terminating PPP connections from modems over the PSTN, or terminating voice and fax traffic from the POTS. POPs also map non-IP traffic, such as IPX, into IP. Once the traffic passes a physical POP, it is IP based. SPOPs are Service POPs that implement the middleware-layer service logic. In addition to other functionality, SPOPs act as gateways to the backbone, which provides high-bandwidth, low-latency IP transport to other SPOPs and POPs. SOCs (System Operating Centers) are SPOPs dedicated to the system monitoring and management purposes of the cloud. SOCs may or may not have POPs connected directly to them, as traffic to the SOCs may flow only through the backbone. For much larger throughputs and numbers of active users and services, the SPOP can be provisioned as a distributed network element in which a high-speed smart switch supports the network functions and a cluster of gates (without the router/switch functions) supports the service functions. These scenarios are shown in Figure 3.

[Figure 3 shows a PSTN voice-to-IP gateway and POP, a modem/LAN POP, and an enterprise network attaching to SPOPs on the IP backbone; one SPOP is configured as a DNE (switches plus gates) and another as a single node with a core, gates, and stores, with peers reached through the backbone.]
Figure 3. Several Service Points-of-Presence configured as a Distributed Network Element or as a single-node machine.
The service logic support within an SPOP consists of three different components referred to as core, gates, and stores.
The Network Core is a general term for a cluster of machines containing essential information, in directories and repositories, regarding user accounts and available services, to support user authentication, active directory access control, and usage recording. The authentication information is the data the secure network requires to authenticate a user's identity. A secure session can then be established and maintained for the duration of the user's connection to the network. Gates implement the security and service functions managed in the core. This includes access control, tightly coupled with dynamic firewall protection, and the mediation of traffic with the core and other external networks. These networks include peers, stores, and hops, all residing outside of the firewall, as well as noncompliant legacy and enterprise networks. Note that one or more gates can be configured within an SPOP and that the gate functionality may be distributed in the form of a DNE. Stores are peers that implement the services relating to the maintenance, provisioning, and daily use of the cloud. Directories, usage records, and mail are just some of the services handled by the stores. The cloud requires that the end-point devices, the traditional clients and servers, actively engage in the network's service logic functions. This creates a peer-network-peer rather than a simple client-server model. For a device to be an active member of the network, it must support IP-based communication and an active control channel to a gate. The control channel is then used to enable and control services between the peer and the network. An example of this is obtaining directory services and creating secure, authenticated connections. Peers access other peers through gates in the cloud. Access can be via encrypted or unencrypted data streams.
If a user on a client peer wishes to access the Internet, the user would first have to enter the cloud through the gate, be authenticated, and then access the gate and hop through which the Internet is connected.
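The authenticate-then-forward pattern of the gate can be sketched as below. The class, the credential check, and the session set are all illustrative stand-ins; the real gates use the core's authentication data and secure sessions rather than a plain password table.

```python
# Hedged sketch of the peer-network-peer access pattern: a peer must
# authenticate over its control channel to a gate before the gate
# will forward its traffic into or out of the cloud.
class Gate:
    def __init__(self, valid_users):
        self.valid_users = valid_users  # user -> credential (stand-in)
        self.sessions = set()           # currently authenticated peers

    def authenticate(self, user, credential):
        """Establish a session if the credential checks out."""
        if self.valid_users.get(user) == credential:
            self.sessions.add(user)
            return True
        return False

    def forward(self, user, destination):
        """Forward traffic only for authenticated peers."""
        if user not in self.sessions:
            raise PermissionError("no authenticated session")
        return f"{user} -> {destination} via gate"

gate = Gate({"alice": "s3cret"})
assert gate.authenticate("alice", "s3cret")
print(gate.forward("alice", "internet-hop"))
```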
Service and Application Support

An essential role of the SPOP is to provide a common base of computing services on top of which both IP services and service delivery and management applications can be built. These services include security, active registry, naming, and usage recording/retrieval. What ties everything together is an IP platform that provides APIs and standard protocols to services and applications. The platform naturally partitions its functions into the network, service, and application environments. We refer to these as the Network Operating Environment (NOE), the Service Operating Environment (SOE), and the Application and Operations Environment (AOE).
The SOE provides a common set of APIs and tools which can be used by applications to gain access to a set of common functions provided by the SOE. The services provided by the SOE are functions that are, for example, commonly needed across a carrier's internal service platform. Instead of creating multiple instances of essentially the same functionality for each service or offer, the SOE provides a single instance of those capabilities to all services and offers, thereby reducing cost and time to market. The specification of the SOE must be carefully designed to take into account technical issues of security, scalability, extensibility, manageability, programmability, interoperability, efficiency, and commercial viability. The AOE provides the end-user, value-added applications and the means to manage and maintain all of the customer and business data across the entire platform. The design goal of the AOE is to support a suite of implementation- and offer-agnostic end-user applications, a single Service Delivery & Management platform to support all Internet services, a seamless Customer Care experience, and the client functionality to enable customer use of the applications within the AOE. The AOE must be built around a customer-centric data architecture that provides an integrated customer view across the common Internet platform. The AOE functional architecture consists of:

• End-user applications such as e-mail, IP telephony, and customer care;

• Service Delivery and Management functions such as account management, billing management, and service management; and

• Network Care functions such as fault management, capacity management, configuration management, and performance management.
Summary

In this paper we described a general framework for supporting hybrid services in emerging data networks, based on an IP platform and architecture. AT&T Labs has been building an IP platform, called GeoPlex, to support such service network clouds. This platform is now in full use in both internal and external trials, demonstrating its benefits to both consumer and business services.
Biographies

DR. NELU MIHAI. Dr. Mihai joined AT&T Labs in 1995 and is currently the director of the network infrastructure and distributed multimedia group in the Internet Platform Technology Organization. In 1992 he joined Ready Systems, where he designed real-time kernels. He has served as a technology consultant for the United Nations Development Program, and as a research associate at CERN in Geneva, Switzerland, at the Heavy Ion Institute in Darmstadt, Germany, and at the University of London. In 1982 he joined the Institute of Atomic Physics in Bucharest, Romania, where his fields of interest were real-time distributed operating systems, network technologies, and distributed algorithms. Nelu Mihai received his B.S. in Software Engineering from the Polytechnic University of Bucharest, Romania, and his Ph.D. in Computer Science from the University of Bucharest.

DR. GEORGE VANECEK. Dr. Vanecek is serving as Chief Scientist for the AT&T Labs IPTO. Before joining AT&T Labs, he was on the Computer Science faculty and a research associate with the CAPO Project at Purdue University. In 1994, he was awarded the Teaching Award of the Purdue University School of Science. Prior to his tenure at Purdue, Dr. Vanecek worked at the National Institute of Standards and Technology (NIST) in the Engineering Design Lab and the Automated Manufacturing Research Facility. He also worked at IBM on large semantic network support. Dr. Vanecek received his B.S. and M.S. in Computer Science from Purdue University and his Ph.D. from the University of Maryland, College Park.
An Active Networks Overlay Network (ANON)

Christian Tschudin
Department of Computer Systems, Uppsala University
Abstract. In this paper we report on an overlay network for the Internet that was built to ease the interconnection of active nodes at a global scale. A simple "Active Networks Overlay Network" (ANON) protocol was defined and implemented that makes it possible to create virtual network segments. This overlay abstraction is complemented by simple tools for the automated management of a distributed ANON-based testbed. The HTTP protocol, enhanced with a CGI-script-based security protocol (ASD2), is used for code distribution, log-file inspection, and steering of the active network execution environments. We have set up a global ANON-based ABONE consisting of multiple segments in Europe, the USA, and Japan, enabling the first active packet ever to physically circumnavigate the globe. Keywords: Active Network Testbed, Overlay Networks, ABONE, ANON.
1 Introduction
Several active (or programmable) network programming environments existed at the end of 1997. But most of the AN experiments carried out so far were "in-house" tests where access to and control of the computers and network links is straightforward. It was clear that the research community had to confront more realistic challenges, in width as well as in depth, by setting up an "active bone" similar to the M-bone or 6-bone. First, setting up such an infrastructure would create the pressure to come up with solutions that can compete with and/or complement the Internet; second, and equally important, it would lead to a test environment of a size beyond what can be achieved within a single research group. What is required is an overlay network for active networking experiments.

1.1 Towards a Network of Active Network Segments
The appeal of an overlay network is that it introduces a clear border below which any magic trick is permitted to provide the overlay abstraction, and above which applications should not have to care about the underlying "real" network infrastructure. Two elements have to be considered: (a) techniques to efficiently map the overlay abstraction to the underlying (physical) resources, and (b) the management of the overlay, i.e., the mapping policy, (re-)configuration, performance monitoring, etc. In this paper we focus on the second aspect, more specifically on the self-configurability of an overlay network. We are interested in an open environment where an overlay segment for active networking "emerges" out of a simple join request without any further administrative intervention. The motivation for introducing an overlay segment abstraction is twofold: First, it is unrealistic to anticipate a single "IP-like" global overlay for active networking. Security concerns, as well as operational problems, can be better addressed by having many segments, possibly with different access policies. Second, a network segment is a convenient abstraction to capture the semantics of ethernet-like networks and IP subnets. We see the ANON overlay as a temporary vehicle to run active networking experiments over existing infrastructures and, in the future, to migrate active networking closer to the existing hardware. Eventually, an ANON overlay segment will coincide with a real piece of networking technology.

Parts of this work have been done while at the International Computer Science Institute (ICSI), Berkeley, USA, and at the Computer Science Department of the University of Zurich, Switzerland.

Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 156–165, 1999. © Springer-Verlag Berlin Heidelberg 1999
1.2 Related Work
Genesis [6] is a recent proposal for a high-level architecture that makes it possible to create a hierarchy of virtual networks. Network instances are "spawned" from a parent virtual network; they inherit the parent's signaling protocols and communication services, which they can redefine inside the bounds imposed by the parent. The Genesis kernel is the basis for this architecture: Like an operating system kernel, it acts as a resource allocator. "Spawned networks" in fact provide a quite rich, process-like overlay abstraction and aim at the refinement and nesting of virtual networks. ANON segments, on the other hand, try to be as close as possible to the wire. Relative to Genesis, ANON's role would be more to provide a microkernel-style hardware abstraction on top of which Genesis network personalities can run. A related proposal with similar goals, the X-Bone architecture, was introduced in [12]. The X-Bone "is a system for the rapid, automated deployment and management of overlay networks". The authors explicitly refer to active networking technology (programmable routers) as a means of implementing the X-Bone. At the same time, they point to AN as an application that could benefit from the various forms of overlays that an X-Bone may provide. In the paper, a first prototype was expected for Fall 1998, but no announcement has been seen so far on the net (http://www.isi.edu/x-bone). The paper contains an extensive list of overlay networks and related research. A "modest proposal" for a scalable ABone is made in [5]. The authors motivate the need for setting up a managed ABone that consists of a two-level architecture of core and edge nodes. Core nodes would run the most recent and stable AN execution environments and be subject to special administration rules, while edge nodes would be more under the responsibility of developers. Edge nodes are used to access the core ABone. The focus of the paper is more on
administrative matters (the creation of an ABone Coordination Center, ABOCC) and on the management of active nodes (see the paragraph on Anetd below), but it also covers the question of access networks, for which the authors favor more heavyweight network overlay technologies. While we agree that additional effort is needed to bring a scalable ABone to life, we remain sceptical about the plans to start this work by setting up central registries and administration rules. There are no intrinsic technical or administrative reasons to impose a core/edge architecture, especially since the protocols used to manage this infrastructure all operate on the flat Internet and thus can just as well be used for a highly decentralized ABone architecture. Instead of this centralistic approach, which also invites political struggles over who should control the central registry instances, ANON follows a federated style of internetworking. Anetd (Active NETworks Daemon) is an integrated software package for both active packet forwarding and active node management [9]. A daemon program is in charge of running the AN execution environments, to which it demultiplexes incoming active packets. Outgoing active packets are sent by the execution environments directly, without intervention of the daemon. The Anetd software can be steered by special command packets. Commands include the download of new execution environment binaries via the WEB, the fetching of (log) files, and the starting and stopping of an execution environment. The status of the Anetd-ABONE can be graphically examined on a WEB page (http://sequoia.csl.sri.com:7000/java/abonestat.html): It shows a single, fully meshed network segment with a dozen nodes in the USA, France, and Korea. The software is not public: Although usage is free, it is only available in binary form. There are considerable similarities in the goals of ANON and Anetd.
The major difference is that ANON attempts to reuse existing functionality: Lynx (for WEB access), PGP (encryption), and the UNIX toolbox are the ANON equivalent of the opaque Anetd monolith (moreover, all ANON software is available as source code). Another difference is that Anetd only looks at the incoming half of the datagram traffic. This means that it cannot create an overlay network, as it never sees outgoing datagrams but fully delegates the delivery of datagrams to the IP layer. Overlay-specific processing, like encryption or traffic shaping, is difficult to achieve with the Anetd architecture, but could easily be added to the ANON implementation.
1.3 Overview

We introduce the ANON semantics and protocol in Section 2. Section 3 provides a description of the implementation and of supplementary tools for managing the ANON nodes. We also report on some experiments that use this infrastructure and on ANON's limitations, before concluding with Section 4.
2 The ANON Architecture and Protocol
The network abstraction provided by ANON is that of a bus-like network which we call a segment. Such an ANON segment interconnects active nodes: It makes them direct neighbors and makes it possible to send unicast as well as multicast messages among them. Figure 1 shows such an ANON segment. A unicast message is sent from node A to node D; a multicast message is sent from node B to nodes C and E. The message format used for active packets inside ANON is ANEP, the Active Network Encapsulation Protocol [2].
[Figure 1 depicts, on the left, the ANON segment abstraction connecting nodes A through E and, on the right, its overlay implementation with internal hub nodes and attached leaf nodes.]
Fig. 1. The ANON segment abstraction, and overlay implementation.
ANON introduces its own addressing scheme. The purpose of this is to make addressing independent of the underlying transport technology. ANON addresses do not apply to the active nodes themselves but to communication endpoints. That is, each node registers the endpoints for which it would like to receive messages. This means that, unlike ethernet, where addresses are usually hardwired into the physical network interface, ANON offers the possibility of defining communication endpoints dynamically at run time. ANON addresses are 128 bits long, and a node may register several of them. The simplest case is that each node registers exactly one endpoint by choosing a unique ANON address (e.g., by using a good random number generator). Such unique addresses are useful for unicasts. Several nodes can also agree on a common address (endpoint value) that they all register for. This becomes a multicast address, because ANON will forward messages for such a destination only to the nodes that registered it. If all nodes register for a single common ANON address, it becomes a broadcast address.
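The endpoint-registration semantics can be sketched as follows. The in-memory `Segment` class is purely illustrative (the real protocol runs between leafs and hubs), but it captures the rule above: delivery goes to whoever registered an address, so a shared address is automatically a multicast group.

```python
# Sketch of ANON endpoint addressing: nodes register 128-bit
# endpoint addresses; a unique address gives unicast delivery,
# a shared address becomes a multicast group.
import secrets

class Segment:
    def __init__(self):
        self.endpoints = {}  # 128-bit address -> set of node names

    def register(self, node, address):
        self.endpoints.setdefault(address, set()).add(node)

    def send(self, address, payload):
        # Deliver to every node registered for this endpoint.
        return {(node, payload) for node in self.endpoints.get(address, set())}

seg = Segment()
addr_a = secrets.randbits(128)  # unique address -> unicast endpoint
group = secrets.randbits(128)   # shared address -> multicast group

seg.register("A", addr_a)
for node in ("C", "E"):
    seg.register(node, group)

assert {n for n, _ in seg.send(addr_a, "hi")} == {"A"}        # unicast
assert {n for n, _ in seg.send(group, "hi")} == {"C", "E"}    # multicast
```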
ANON Network Elements: Leafs and Hubs
ANON has two different kinds of network elements: Leaf nodes are the segment's clients; they host one or more execution environments for active networking, using one or more endpoints. The segment's internal network elements are called hubs:
Christian Tschudin
They form a fully meshed network and are responsible for forwarding active packets.
– An ANON segment can be realized by a single hub to which all leaf nodes attach in a star topology.
– More naturally, an ANON segment consists of several hubs that provide the ANON communication services in a distributed manner (see the right part of Figure 1).
– As a special case, and for convenience, it is also possible to create a “hub-less” ANON by putting two leaf nodes back to back.
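The forwarding rule that this design implies can be sketched as follows. This is only an illustration with invented names, not ANON's actual data structures: a hub forwards a packet to every node that registered the destination endpoint, so unique, shared, and universally shared addresses naturally become unicast, multicast, and broadcast addresses.

```python
# Hypothetical sketch of a hub's forwarding decision. All class and method
# names are invented; only the registration semantics come from the text.

class HubState:
    def __init__(self):
        # endpoint address (a 128-bit value) -> set of nodes that registered it
        self.registrations = {}

    def register(self, endpoint: int, node: str):
        self.registrations.setdefault(endpoint, set()).add(node)

    def forward_targets(self, endpoint: int):
        # Forward to every node that registered the destination endpoint;
        # packets for unknown endpoints have no targets.
        return self.registrations.get(endpoint, set())

hub = HubState()
hub.register(0x1234, "leafA")        # unique address -> unicast
hub.register(0xCAFE, "leafB")        # shared address -> multicast
hub.register(0xCAFE, "leafC")
assert hub.forward_targets(0x1234) == {"leafA"}
assert hub.forward_targets(0xCAFE) == {"leafB", "leafC"}
```

If every leaf registered one additional common endpoint, `forward_targets` on that endpoint would return all leaves, which is exactly the broadcast case described above.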
2.2 ANON Access and Self-Configuration Protocols
The ANON specification defines two tiny management protocols for coordinating the actions of the network elements. The leaf protocol (LPr) runs between leaf nodes and their hub; this access protocol mainly serves to register a leaf node's endpoints. The hub protocol (HPr) is used between the hubs that form the ANON segment: it serves to distribute and update the current list of hubs that belong to the segment as well as the list of registered endpoints.
The ANON management messages are part of the ordinary stream of ANEP packets that one network element directs to another, i.e., they also conform to the ANEP standard. Management messages influence the state of the receiving entity, while ordinary messages are subject to forwarding or internal delivery.
The protocols in place are shown in Figure 2. Two leaf nodes, both hosting an execution environment (EE) for active packets, attach to an ANON segment that consists of two hub nodes. Messages sent from one EE to the other are forwarded through the hubs until they reach the destination EE. Because the hubs that form a segment are fully interconnected, there will never be more than two hubs in any communication path between leaf nodes; hubs do not need to perform any advanced routing.
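The demultiplexing step this implies can be sketched as below. The message layout is invented for illustration and is not the actual ANEP encoding: management messages (LPr/HPr) update local state, while ordinary packets are delivered locally or forwarded.

```python
# Illustrative demultiplexer for incoming packets at a network element.
# Field names and message kinds are invented; the split between management
# and ordinary messages follows the text.

MGMT_LPR, MGMT_HPR, DATA = "lpr", "hpr", "data"

def handle(packet, state, forward, deliver):
    if packet["kind"] == MGMT_LPR:       # leaf <-> hub: endpoint registration
        state["endpoints"].update(packet["endpoints"])
    elif packet["kind"] == MGMT_HPR:     # hub <-> hub: hub-list update
        state["hubs"].update(packet["hubs"])
    elif packet["dst"] in state["endpoints"]:
        deliver(packet)                  # a locally registered endpoint
    else:
        forward(packet)                  # ordinary packet for another node

state = {"endpoints": set(), "hubs": set()}
forwarded, delivered = [], []
handle({"kind": MGMT_LPR, "endpoints": {42}}, state, forwarded.append, delivered.append)
handle({"kind": DATA, "dst": 42}, state, forwarded.append, delivered.append)
handle({"kind": DATA, "dst": 7}, state, forwarded.append, delivered.append)
assert state["endpoints"] == {42}
assert [p["dst"] for p in delivered] == [42]
assert [p["dst"] for p in forwarded] == [7]
```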
(Figure: within an ANON segment, a leaf node running an EE speaks LPr to its hub over ANON on Ethernet; the two hubs speak HPr to each other over ANON on UDP/IPv4; the second hub reaches the destination leaf node, also running an EE, over ANON on UDP/IPv6.)
Fig. 2. The layering of the ANON communication software.
Figure 2 also shows a possible mapping of an ANON segment to different transport infrastructures. In the example, the sender leaf node attaches to its hub by putting all messages into Ethernet frames. The first hub then uses UDP over IPv4 for its communications with the second hub. Finally, UDP over IPv6 is used to reach the destination leaf node.
Soft-State Coupling. Attaching a leaf node to an ANON hub and registering endpoints implies that some state is kept inside the ANON segment. The management messages are used for setting up this state and also for refreshing it. Inside the hubs, all state is linked to a timeout value after which it is removed unless it is periodically refreshed. Soft-state coupling has the advantage that it quickly adapts to changing situations without manual intervention. Furthermore, replacing a running hub (e.g., for a software update) does not require reinitializing the attached leaf nodes or neighbor hubs: within a few minutes, the freshly started hub will have automatically learned all connectivity information.
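A minimal soft-state table can be sketched as follows (the interface is invented): entries expire unless refreshed, which is why a restarted hub relearns all connectivity within a few refresh periods and stale registrations vanish on their own.

```python
# Sketch of soft-state bookkeeping with timeouts; names are illustrative.
import time

class SoftStateTable:
    def __init__(self, timeout: float):
        self.timeout = timeout
        self.entries = {}                # key -> time of last refresh

    def refresh(self, key):
        # Called whenever a management message re-announces this entry.
        self.entries[key] = time.monotonic()

    def alive(self):
        # Purge entries that missed their refresh, return the survivors.
        now = time.monotonic()
        self.entries = {k: t for k, t in self.entries.items()
                        if now - t < self.timeout}
        return set(self.entries)

table = SoftStateTable(timeout=0.05)
table.refresh("leafA")
assert table.alive() == {"leafA"}
time.sleep(0.1)                          # leafA fails to refresh in time
assert table.alive() == set()
```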
3 An ANON Implementation
ANON was implemented for Linux and Sun Solaris platforms. A UNIX daemon program implements the hub functionality. It listens on two UDP ports: one port is used for all communications with neighbor hubs, the other for the access protocol between the hub and its leaf nodes. The anond daemon can be steered by sending local UNIX signals (reload configuration, switch tracing on and off, etc.); logging is done in the proposed ULM format [1]. See the ANON web page at http://abone.ifi.unizh.ch/˜anon.

3.1 ANON Device Drivers and Active Services
A first ANON driver was written for the Messenger system M0 [13]. Later on, a device driver for ANON was written in Java. Only the leaf functionality had to be implemented, which mainly consists of periodically sending the registration request and filtering out incoming ANON management messages. For demonstration purposes, this driver was added to the ANTS system [15] such that ANTS could be operated on top of ANON. By using its own set of endpoint addresses, the ANTS system runs completely in parallel with the Messenger system.
Work is going on at the University of Geneva towards a connectionless WEB protocol for shipping the content of a WEB document in a single active packet [11]. For this, a Java-based server is used that attaches to an ANON segment. Any active network environment co-located on the same ANON segment can send it HTTP request messages and will receive the corresponding WEB page wrapped inside an active packet.
The ANON-based ABONE was used as a testbed for “unleashing” a self-deploying active service: A single service germ was injected into the messenger
active network and literally installed itself worldwide [14]. The service messengers use the broadcast facility of ANON to discover “unserved” nodes to which they extend their service.
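The self-deployment pattern can be sketched abstractly as below. All names are invented and the real service is the messenger-based election service of [14]; the point is only the discovery loop: a germ broadcasts a probe, unserved nodes reply, and the service clones itself onto them until no unserved node remains.

```python
# Toy sketch of self-deployment via broadcast discovery; not the actual
# Messenger code. On a bus-like segment, a broadcast probe reaches every
# node, so each round extends the service to all nodes that replied.

def deploy(nodes, start):
    served = {start}
    frontier = [start]
    while frontier:
        germ_at = frontier.pop()
        # Broadcast a probe from germ_at; every unserved node replies.
        for node in nodes - served:
            served.add(node)         # clone the service onto the replier
            frontier.append(node)
    return served

nodes = {"Zurich", "Geneva", "Berlin", "Berkeley", "Tsukuba"}
assert deploy(nodes, "Zurich") == nodes
```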
(Figure: the ANON segments abone.europe, abone.usa, abone.pacific, abone.atlantic and abone.eurasia connect nodes in Zurich, Geneva, Berlin, Hanover, Berkeley and Tsukuba; the packet started on the abone.eurasia segment.)
Fig. 3. In 455 milliseconds around the globe with an active packet: The topology of the ANON networks in place on Aug. 16, 1998.
Finally, we mention the first physical circumnavigation of the globe by an active packet, which was performed over an ANON infrastructure. The active packet needed about 500 milliseconds to make the six long-distance active hops, which the overlay network translated into a total of 160 Internet hops. Figure 3 shows the network topology in place for this experiment.

3.2 Steering of the ANON-Based ABONE
The WEB provides an invaluable service for steering an ANON node. We use HTTP to look at anond log files and for remote steering of AN software, including the download of new versions. For this, shell scripts are encoded inside a URL and sent to a remote CGI script that will execute them on behalf of the requesting user. Scripts are digitally signed for security reasons.
Our ASD2 software (Alternate Software Deployment Daemon) combines HTTP and PGP with a few lines of shell script and C code into a secure environment for remote script execution. Figure 4 shows the flow of arguments inside the ASD2 framework. Shell scripts are first signed with PGP. The resulting PGP message is encoded into a URL that is sent to the remote ASD2 CGI script. The signature is verified, the timestamp is checked to prevent replay attacks, and the list of allowed sending hosts is consulted in order to reject requests from non-authorized manager nodes. If all tests pass, the script is executed and the result is returned as the content of a WEB page.
ASD2 and the WEB interface have been used to deploy new execution environments and ANON hub software. Typically, the shell scripts sent to the node under management include downloading the code to install using Lynx, starting the C compiler, installing the binaries, setting up configuration files and starting the execution environments. Nodes in the ANON-based ABONE are currently added after bilateral agreement, resulting in mutual rights on the machines of the involved partners.
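The authorization checks can be sketched as follows. This is a hedged illustration only: an HMAC stands in for the PGP signature, and all names are invented, but the three tests (signature, timestamp freshness against replay, allowed-host list) mirror the ASD2 flow described above.

```python
# Sketch of ASD2-style request checks; HMAC replaces PGP for a
# self-contained example, and the key and host names are invented.
import hmac, hashlib, time

KEY = b"shared-secret"                   # stand-in for the PGP key material
ALLOWED_HOSTS = {"manager.example.org"}

def sign(script: bytes, ts: float) -> bytes:
    return hmac.new(KEY, script + str(ts).encode(), hashlib.sha256).digest()

def authorize(script, ts, sig, host, now, max_age=60.0):
    return (hmac.compare_digest(sig, sign(script, ts))   # signature verified
            and now - ts <= max_age                      # replay protection
            and host in ALLOWED_HOSTS)                   # known manager node

ts = time.time()
req = (b"echo hello", ts, sign(b"echo hello", ts))
assert authorize(*req, host="manager.example.org", now=ts + 1)
assert not authorize(*req, host="evil.example.org", now=ts + 1)
assert not authorize(*req, host="manager.example.org", now=ts + 3600)
```

Only if all three tests pass would the script be handed to the shell; in ASD2 itself the script then runs under /bin/sh and its output becomes the returned WEB page.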
(Figure: a script is signed with PGP using a passphrase and secret key; the ASCII-armored result is sent from netscape or lynx as “GET asd2.cgi?AbxGD...” to the remote httpd; the asd2 server verifies it with PGP against the allowed hosts, allowed clients (public keys) and timestamps, runs “/bin/sh < script”, and returns the result.)
Fig. 4. ASD2 operation: Secure remote execution of UNIX scripts via a WEB interface.
These rights mainly include a project-specific UNIX account and the obligation to be responsive to operational problems. Parts of the ANON infrastructure were built in parallel with the GLOMAT (GLObal Mobile Agent Testbed, http://www.cs.dartmouth.edu/ agent/network).
3.3 Limitations
A major issue for any overlay network for AN is the question of the Maximum Transfer Unit (MTU). ANON does not yet provide support for automatic fragmentation and reassembly of large active packets, so we simply support the 64k limit imposed by UDP. In a heterogeneous setting such as the one shown in Figure 2, however, some links may only provide MTUs of 1500 bytes. Having the hubs know the “MTU of the segment” is not a viable solution, as the MTU also depends on the final links to a remote leaf node. The current state is to let the AN applications discover the MTU themselves by watching for truncated packets. Providing a “magic” fragmentation service looks tempting, but has to be balanced against the distortions at the level of resource consumption and against special environments like slow wireless links, where large packets are impractical.
While speaking about wireless links, we note that the ANON segment abstraction may already be too rich for some environments, because we assume a fully meshed hub network. Partial connectivity and unidirectional links, however, are frequent situations that are difficult to hide. This problem has also been observed for the current ANON overlay over UDP because of routing problems inside the Internet: during the upgrade in Switzerland from the European TEN-34 to the TEN-155 infrastructure,
Sweden could not be reached directly although a “two-hop” link from Switzerland via Berlin worked fine! By moving overlays closer to the physical layer and by exposing real connectivity to the active applications, we enable them to handle such situations.
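The application-level MTU discovery mentioned in the limitations can be sketched as a probing loop. This is a toy model with invented names, not ANON code: the application sends probes of decreasing size and keeps the first size that arrives untruncated.

```python
# Toy sketch of application-level MTU discovery by probing; the "network"
# is replaced by a callback that reports whether a probe arrived intact.

def discover_mtu(send_probe, candidates=(65536, 8192, 1500, 576)):
    for size in candidates:
        if send_probe(size):         # True if the probe was not truncated
            return size
    raise RuntimeError("no usable packet size found")

# A toy segment whose smallest link only carries 1500-byte frames:
assert discover_mtu(lambda size: size <= 1500) == 1500
```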
4 Conclusions
The ANON protocol described in this paper provides simple Ethernet-like semantics for overlay network segments. The important aspect of ANON is self-configuration: new nodes may attach to an overlay segment at any time. Another contribution is the concept of network-unspecific communication endpoints, which provide a simple-to-manage demultiplexing scheme for parallel AN execution environments.
Overlay networks are an important element in the evolution of active networks. Their most important task today is to enable distributed AN experiments to run over current networks (horizontal expansion). This was also the motivation for the ANON work. Their long-term function is to help migrate active network functionality up and down the protocol layers (vertical expansion). Next-generation overlay networks will make use of AN techniques themselves.
Acknowledgements
I would like to thank Andres Albanese, Joachim Beer, Kurt Bauknecht, Bob Gray, Jarle Hulaas, Gerrit Kalkbrenner, Kazuhiko Kato, Björn Knutsson, David Kotz, Pierre-Antoine Queloz, Fritz Roesel and Alex Villazon for providing the resources necessary for setting up the ANON-based ABONE and for contributing to the global circumnavigation experiment.
References
1. Abela, J.: Universal Format for Logger Messages (ULM), Internet draft, July 1997.
2. Alexander, D. S., Braden, B., Gunter, C. A., Jackson, A. W., Keromytis, A. D., Minden, G. J. and Wetherall, D.: Active Network Encapsulation Protocol (ANEP), July 1997.
3. Braden, B.: Active Signaling Protocols. June 1997. ftp://ftp.isi.edu/rsvp/active signaling/ASP overview.ps
4. Braden, B., Hicks, M. and Tschudin, C.: Active Network Overlay Network (ANON), Dec. 1997. http://abone.ifi.unizh.ch/˜anon
5. Braden, B. and Ricciulli, L.: A Plan for a Scalable ABone – A Modest Proposal. Jan. 1999. http://www.isi.edu/ braden/ABone.ps
6. Campbell, A. T., De Meer, H. G., Kounavis, M. E., Miki, K., Vicente, J. and Villela, D. A.: The Genesis Kernel: A Virtual Network Operating System for Spawning Network Architectures. In Proc. IEEE OPENARCH'99, New York, March 1999.
7. DARPA ITO Research Area Active Networks, Feb. 1998. http://www.darpa.mil/ito/ResearchAreas/ActiveNetsList.html
8. Decasper, D., Parulkar, G., Choi, S., DeHart, J., Wolf, T. and Plattner, B.: A Scalable, High Performance Active Network Node. IEEE Network, Jan/Feb 1999.
9. Ricciulli, L.: Anetd – Active Network Daemon V1.0, Aug. 1998. http://www.csl.sri.com/ancors/anetd
10. Peterson, L. and the AN Node OS Working Group: NodeOS Interface Specification, Feb. 2, 1999.
11. Queloz, P.-A. and Villazon, A.: Personal communication.
12. Touch, J. and Hotz, S.: The X-Bone. 3rd Global Internet Mini-Conference in conjunction with Globecom'98, Sydney, Australia, Nov. 1998.
13. Tschudin, C.: The Messenger Environment M0 – A Condensed Description. In Vitek, J. and Tschudin, C. (Eds.): Mobile Object Systems – Towards the Programmable Internet. LNCS 1222, April 1997.
14. Tschudin, C.: A Self-Deploying Election Service for Active Networks. In Ciancarini, P. and Wolf, A. L. (Eds.): Coordination Languages and Models, LNCS 1594, April 1999 (Proc. 3rd Int. Conference on Coordination Models and Languages, Amsterdam).
15. Wetherall, D. J., Guttag, J. V. and Tennenhouse, D. L.: ANTS – A Toolkit for Building and Dynamically Deploying Network Protocols. IEEE OPENARCH'98, 1998.
Autonomy and Decentralization in Active Networks: A Case Study for Mobile Agents

Ingo Busse, Stefan Covaci, and André Leichsenring
GMD FOKUS, Kaiserin-Augusta-Allee 31, D-10589 Berlin, Germany
[email protected]
Abstract. This paper discusses the applicability of mobile and intelligent agent technology to the development of active networks (AN). Agent technology has the potential to enhance the autonomy and decentralization of both deployment and execution solutions in an AN environment. The roles involved in an active network business scenario, along with the actor-specific requirements, are briefly outlined. The main part of the paper introduces an agent-based active network architecture that can be built on top of existing programmable network nodes. The APIs offered by the proposed active node architecture are compared with the peer standardization efforts for open node interfaces. Two example applications demonstrate how ANs can improve the functionality of today's networks, and how mobile agents enhance the robustness and performance of the AN solution.
1 Introduction
To accommodate the complex nature of the new supply chains, the versatility of higher-level networked applications and the increasing number of users and user categories, all of these being characterized by heterogeneity and dynamically evolving requirements, a flexible, dynamically adaptable and extensible networking and communications infrastructure is needed. Most important in this context is the demand for rapid, dynamic and smooth creation/integration of new network services or service features, which is in contradiction with the slow standardization process that usually determines network evolution.
One category of emerging solutions in this context is called Active Networks (AN). Active Networks are composed of nodes that support dynamically programmed services, adapting their operation or configuring their internal resources based on user-injected programs (users are network administrators or consumer applications). In this way, an active network is in a position to offer the operator and consumers dynamic programming and customization facilities which can be used by network/communications services.
In this paper, we will show that mobile agent technology is a good candidate for the development of active networks because it offers the benefits of autonomy and
Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 165-180, 1999. Springer-Verlag Berlin Heidelberg 1999
decentralization to the users as well as to the developers of active network technology. The paper is structured as follows: Section 2 gives an overview of the roles in an active network business framework. Section 3 presents an active network architecture based on mobile agent technology and describes the open node interfaces; it outlines the expected benefits of this architecture for each of the roles involved. Section 4 presents application areas for active network technology. Section 5 summarizes the results and concludes the paper, also giving an outlook on further work.
2 Active Network Functional Frameworks
The impact of dynamic software download, and of its execution inside the nodes, on the overall performance and flexibility of the AN solution calls for new functional frameworks. Thus, active networks have to consist of two major parts: a deployment framework and an execution framework. Whereas the deployment framework introduces a model of how services can be installed on demand, the execution framework describes how the injected components execute and co-operate in order to provide a safe, secure, robust and consistent network. Unlike traditional networking, the deployment and execution frameworks have to provide open network interfaces accessible also to parties other than the network administrators.
AN technology has the potential to satisfy the needs of the various groups that exist in the market today. When considering the deployment and the execution frameworks, we can clearly distinguish between two roles:
• A network operator is the party that manages the active network (represents the owner of the AN)
• A customer is the party that uses and controls the service of data transport over this network (is the owner of the active networking service)
Both parties have different requirements and responsibilities resulting from their roles in a business case. Customers want to have absolute control of the services that they (create and) use. That means customization of services via customer programmability (from simple configuration to new protocol deployment), advanced security that they can control and modify, the capability to easily deploy new technology they possess via the injection of programs into the network, etc. This can be done, for instance, by active components which are installed on the AN nodes on demand in order to satisfy customer preferences/needs. AN is essentially a user-centric technology.
The main advantages AN technology provides to customers are:
• Adaptation of the network to the customers' demands
• Lower investment and higher reliability
• More freedom and creativity in devising their own communications solutions
Network operators, in contrast, want to have control not only over the services and nodes involved in a certain session. They require complete control over the current state of all resources inside the network and the possibility to control the usage of these resources by customers. AN can introduce an easy-to-use and high-performance
service management that allows network operators to update node software in a comfortable manner. The new flexibility an operator offers to customers also results in new issues for security, monitoring, and admission control. A network operator benefits from offering an AN by:
• Accommodating more customer services
• Outsourcing the service management overhead to the customer
• Decreasing its dependence on equipment, middleware and solution providers
Additionally, a network operator can benefit from an AN infrastructure mainly by implementing and operating system and network management solutions using the AN deployment and execution frameworks. As such, these solutions will offer:
• Reduced software maintenance effort (e.g. for software updates)
• Introduction of new capabilities without service interruption
• Better resource allocation, because resources are only used if required
• Lower network management overhead and computational load, because of the decentralization and delegation of monitoring and control functionality
AN technology will increase the capability that both roles offer/demand and at the same time will enhance the robustness (against functional, technological or requirements changes) and performance of such capability.
3 Agent Technology Support for Active Networking
Since the early 90s, the mobile agent paradigm has been a growing research area in software technology. Agent technology is showing a growing potential to satisfy the emerging requirements of the information and telecommunications markets and of their players. As such, a number of research and standardization organizations (e.g. European Commission, EURESCOM, DARPA, OMG, FIPA, IETF, W3C) as well as commercial companies (e.g. IBM, ObjectSpace, IKV++) are engaged in producing and offering a considerable number of theoretical and practical agent-based solutions.
Mobile agents are autonomous software entities that act on behalf of a person or organization and that are able to change their physical location by moving inside a network. A mobile-agent-based middleware provides mechanisms to migrate an agent's code and execution state among different networked hosts and to execute it there. Via the encapsulation of behavior definitions (i.e. the semantics of the agent itself) within the migrating agent, a mobile agent can support the delegation of large-grain operations [3].
Traditional RPC APIs are only of a static nature and as such not well adapted to a dynamic and changing environment. Additionally, the RPC design paradigm is based on a centralized operation scheme (around the client station) which requires concentration of the “co-operation knowledge” at the client side. Mobile agent technology, in turn, opens the way to a new, asynchronous, decentralized system and application design paradigm, as opposed to the traditional DOT client-server synchronous RPC approach. The implementation of migration-transparent services provides additional grounds for such designs by offering intrinsic support for higher
autonomy and flexible decentralization, efficiency (load reduction and balancing) and robustness (fault recovery).
Intelligence is another aspect of agent technology. For intelligent agents, the focus of concern is goal-orientation, social ability, learning capability/adaptability and personal/mental character. Intelligent agents are AI-based autonomous entities that co-operate with other entities to achieve the aggregated goal of the whole distributed system. Intelligent agents are capable of dealing with exceptional situations (flexibility/robustness), adapt to environmental changes (learning based on interactions with the environment) and rationally (based on their personal character) co-operate with other agents in order to achieve their (adapted) goals.
As shown above, the agent paradigm offers a powerful technology that has already been adopted by a considerable number of research projects. Why shouldn't this technology be used for developing active networks?
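The migration mechanism described above can be illustrated with a toy example. This is not any specific agent platform's API; serialization via pickle stands in for a real middleware shipping code and execution state together.

```python
# Toy illustration of mobile agent migration: the agent's state is
# serialized, "shipped", and resumed elsewhere. In a real middleware
# the code would travel together with the state.
import pickle

class Agent:
    def __init__(self, goal):
        self.goal = goal
        self.visited = []            # carried execution state

    def run_step(self, host):
        self.visited.append(host)

def migrate(agent) -> bytes:
    return pickle.dumps(agent)       # serialize for transfer to another host

def resume(blob: bytes) -> Agent:
    return pickle.loads(blob)        # reconstruct and continue execution

a = Agent("collect-load-stats")
a.run_step("hostA")
b = resume(migrate(a))               # "arrives" at the next host with its state
b.run_step("hostB")
assert b.visited == ["hostA", "hostB"]
```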
3.1 An Agent-Based Active Network Architecture
An active network is built as an overlay on a legacy network. It consists of a number of active nodes¹ connected to each other. Some of the legacy nodes may not be active. Such nodes are tunneled, i.e. all active-network-related data are ignored by them and forwarded to an active node.
In general, active components, i.e. mobile agents, can be inserted into an active network by end-user or network-operator applications. Whereas user-installed agents can adapt the network nodes in order to achieve the best utilization of the network for particular application-related data flow(s), administrator-installed agents are able to manage network nodes considering all data flows. Thus, administrator agents can be installed in any active node (user agents, in contrast, are only allowed to be installed in nodes involved in their data flow) and do not have to be deleted after flow termination. These aspects also have to be considered when developing an AN security facility.
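The installation rule just stated can be sketched as a small policy check. The function and role names are invented; only the rule itself comes from the text: administrator agents may go anywhere and persist, user agents only onto nodes of their own data flow.

```python
# Sketch of the agent placement policy; names are illustrative.

def may_install(agent_role, node, flow_nodes):
    if agent_role == "admin":
        return True                  # admin agents: any active node
    return node in flow_nodes        # user agents: only along their own flow

flow = {"edge1", "core3", "edge2"}
assert may_install("admin", "core7", flow)
assert may_install("user", "core3", flow)
assert not may_install("user", "core7", flow)
```

A security facility would enforce this check at insertion time and additionally remove user agents when their flow terminates.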
(Figure: users A and B communicate across a chain of nodes; four active nodes each combine a router with an API, an agent system and facilities, while one traditional node only routes. A MA installed by a user's application and a MA installed by the network administrator reside on active nodes.)
Fig. 1. Deployment of Active Components
¹ In the following, we focus on IP routers as one example node technology.
The technical realization of active component deployment by the network administrator is quite simple. Because the network operator's host belongs to the active network layer, i.e. the distributed middleware necessary for the AN realization, an administrator application can directly and securely connect to any active node of the AN. The insertion of user agents, in contrast, is a little more complicated. The problem arises from the fact that user hosts are usually not part of the network operator's domain and therefore should have no access to the active node configuration. How can this problem be solved?
A first solution would be that the user application has facilities that are developed with the active network in mind (active-network-capable applications). These additional facilities have to provide the connectivity to the active network, offering the possibility to be temporarily part of the operator's AN domain. Technically, this can be realized by spanning the distributed agent environment across the network domain and the user host. Over such a bridge, mobile agents are able to migrate into the network.
(Figure: sender and receiver user applications bridge directly into the Active Network Domain over the network.)
Fig. 2. Active Components Deployment – First Approach
Unfortunately, this solution requires an extension of the end-user applications, which makes it complicated to integrate existing applications. As an alternative to the development of AN-capable applications, network edge nodes may provide techniques for node management that consider the application data flows arriving from outside hosts. This means the AN domain provides dedicated entry points that are able to analyze the incoming stream and accordingly instantiate the appropriate active components, which they install in the AN nodes inside the operator's domain.
(Figure: sender and receiver user applications attach to edge nodes of the Active Network Domain; only the edge nodes inject active components into the network.)
Fig. 3. Deployment of Active Components – Second Approach
Thus, end-user applications can run without any changes. Another advantage of this approach is that the network administrator keeps complete control of its network and no foreign code is injected by users. A disadvantage of this approach is that applications can benefit from the AN only if related active code templates exist in the edge routers, i.e. the entry active routers can only provide AN capability for a fixed set of applications supported by the network administrator.
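The edge-node approach can be sketched as follows. All names and template types are invented: the entry router classifies an incoming flow and, if a matching active code template exists, instantiates it inside the operator's domain; unknown flow types fall back to plain forwarding, which is exactly the stated limitation.

```python
# Sketch of edge-node flow admission with a fixed template set; the
# template registry and flow types are illustrative only.

TEMPLATES = {"video": "adaptive-transcoder", "voice": "low-delay-queue"}

def admit_flow(flow_type, install):
    template = TEMPLATES.get(flow_type)
    if template is None:
        return "plain-forwarding"    # only a fixed set of applications benefits
    install(template)                # instantiate on nodes inside the domain
    return template

installed = []
assert admit_flow("video", installed.append) == "adaptive-transcoder"
assert admit_flow("ftp", installed.append) == "plain-forwarding"
assert installed == ["adaptive-transcoder"]
```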
3.2 Active Node Architecture
The proposed architecture of an active node is depicted in Fig. 4. Each active node consists of the following components:
• A programmable node (e.g. a programmable IP router), which provides an extended API for the dynamic programming/customization of its resources and services (like buffers, traps, the forwarding map).
• A dedicated Mobile Agent Platform, which provides a number of Execution Environments (EE) for the mobile agents that visit the active node and program its services. The platform also “integrates” the active router API facilities and offers the visiting mobile agents access to them via the MA agent system facilities.
• A number of Mobile Agents that reside in the agent system and utilize the agent system facilities in order to access and program the services of the active router.
(Figure: the active node stacks an Intelligent Mobile Agent Platform, hosting several Component Execution Environments (EE) with application-specific active components, on top of Active Node Facilities for configuration, resource management, security, wrappers, etc., exposed through the Active Node API; the underlying Programmable Node, with its routing functionality and MIB, is exposed through the Programmable Node API.)
Fig. 4. Active Node Architecture
In a typical scenario, a MA can be generated either by the customer/consumer (application) of the network services or by the network operator. The active network will have a management station or some Customer Network Management (CNM) interfaces, which enable the operator or the customers to specify and create such MAs for programming the active nodes.
A programmable node in this context distinguishes itself from conventional switches/routers via its enhanced programmability. A careful requirements analysis will be needed in order to identify the node programmability facilities that are required to support the emerging, novel telecommunications market structure and applications. These facilities will be made accessible to external entities via the extended node API. They support network management by mobile agents as well as conventional network management using SNMP. A Java-based Mobile Agent Platform is preferred because Java is a widely used, modern language with support for platform independence (an important factor ensuring portability within a heterogeneous switch or router environment), code mobility and security within an open network.
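The wrapping of the programmable node API by the agent platform can be sketched as below. All interfaces are invented for illustration: a visiting agent never touches the vendor-level API directly but goes through a facility that also enforces policy.

```python
# Illustrative layering sketch: the agent platform wraps the programmable
# node API and exposes it to visiting agents as a policed facility.
# Class names, methods and the policy are invented.

class ProgrammableNodeAPI:               # vendor-level interface
    def __init__(self):
        self.forwarding_map = {}

    def set_route(self, prefix, next_hop):
        self.forwarding_map[prefix] = next_hop

class NodeFacility:                      # wrapper offered to mobile agents
    def __init__(self, node_api, policy):
        self._api, self._policy = node_api, policy

    def set_route(self, agent_id, prefix, next_hop):
        if not self._policy(agent_id, prefix):
            raise PermissionError(agent_id)
        self._api.set_route(prefix, next_hop)

node = ProgrammableNodeAPI()
facility = NodeFacility(node, policy=lambda agent, prefix: agent == "admin-ma")
facility.set_route("admin-ma", "10.0.0.0/8", "eth1")
assert node.forwarding_map["10.0.0.0/8"] == "eth1"
```

The same wrapper is a natural place for the security and resource management facilities of Fig. 4, since every agent request passes through it.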
The proposed active node architecture includes several interfaces, each providing methods to manage the network node at a different abstraction level. The architecture is based on a programmable node providing a programmable API. The term “programmable” indicates that this API contains extensions that allow a more sophisticated configuration capability than today's network nodes. Several companies are still in the development and test phase of such hardware. Programmable network nodes may contain support for QoS guarantees or for improved packet forwarding using priorities and queues. As in traditional network management, the interface to control these mechanisms is only accessible to network administrators, via SNMP, CMIP or vendor-specific interfaces.
The introduction of an active network node on top of the programmable node provides an evolution from static network nodes to dynamic ones. Unlike with traditional or programmable networks, a node can be dynamically configured according to the demands of a certain application, depending on the current network state. This results from the idea of moving customer management functionality to the node whenever this is required and adapting the network layer to the demands of the application layer. In contrast to the traditional layered OSI model, the different requirements of today's applications call for a different handling of data flows (e.g. video, voice, data streams) also at the network layer. The active node provides an Active Node API that offers users methods for adapting the network exactly to their applications. From a user's view these changes are only of a temporary nature and related to a certain application, i.e. the network becomes “active”.
3.3 Open Interfaces for Programmable and Active Networks
Active network research started mainly with DARPA's initiative and has remained confined largely to the Internet community. A standardization effort mainly related to open APIs was established within the IEEE, which formed the P1520 working group. Another approach of interest is the work of the Multiservice Switching Forum, a forum of ATM switch vendors defining open APIs to ensure interoperability between components from different vendors.

3.3.1 IEEE P1520
The P1520 working group proposed a set of standard interfaces which aims at creating a distributed programmable interface to network abstractions that allows signaling service programming software to be realized in a distributed software environment. Via these interfaces, and the views of the network and hardware states they permit, it will be possible to have different types of signaling protocols and to offer access to all of them via a generic interface. The benefit is bi-directional: network service developers and application vendors will have the ability to implement services not specified in standard signaling protocols and to conduct quality-of-service negotiations with the network elements. On the other hand, hardware developers will also benefit from the enhanced and dynamic control of hardware by being released from the effort of complex software development.
172
Ingo Busse et al.
The P1520 Reference Model in Fig. 5 illustrates the different proposed levels of a programmable node. We can clearly distinguish levels, entities at each level, and interfaces between levels:

End User Applications
(V-interface)
Value-Added Services Level: algorithms for value-added communication services created by network operators, users, and third parties
(U-interface)
Network Generic Services Level: algorithms for routing, connection management, admission control, etc.
(L-interface)
Virtual Network Devices Level: virtual network devices (software representation)
(CCM-interface)
PE Level: physical elements (hardware, namespace)

Fig. 5. The P1520 Reference Model
The four interfaces provide a separation between end-user applications, value-added services, network generic services, and the underlying network resources. The high-level model described above can be mapped to existing network technologies. Currently three types of technologies have penetrated the market: ATM networks, circuit-switched networks based on the SS7 protocol, and networks of IP routers/switches. ATM technology addresses mainly backbone networks, IP technology is rapidly growing in a variety of networks from data to voice and even wireless, and SS7 dominates the telecommunication market and telephony applications. For IP routed/switched networks the main idea is to dissociate the maintenance of state information from the algorithms that manipulate state in a network. Since IP routers are designed to do fast routing without selective communication session control (as P1520 proposes), and the IP community believes that it is impractical to maintain per-flow state in large networks, one could wonder why the proposed P1520 interfaces are necessary, as they seem not to be desired by router manufacturers and users. A closer examination, however, reveals the necessity of those interfaces. When comparing the proposed agent-based node architecture shown in Fig. 4 with the P1520 reference model, the following correspondence can be derived: the PE level and the Virtual Network Device Level are covered by the Programmable Node, so that the L-interface equals the Programmable Node API. Consumer-injected agents will mainly use the U- and V-interface, possibly provided by other (pre-installed) agents.
3.3.2 Multiservice Switching Forum (MSF)
The goal of the MSF is to define an architecture that provides a separation of control and user/data plane aspects and to develop a framework which can be easily extended to support new user/data plane and control functions [6]. To this end, an agreement on a set of intra-switch interfaces should be reached.
These open interfaces allow service providers to deploy Multiservice Switching Systems composed of equipment from several vendors. A Virtual Switch Interface (VSI) protocol is suggested for the control of switches. Using this protocol, multiple independent network controllers can control a single ATM switch. The VSI protocol allows each controller to add and delete connections, to automatically discover resources, and to collect billing statistics. Additionally, the VSI protocol supports distributed processing, QoS, hitless resynchronization, and efficient flow control to implement high-performance, high-availability as well as multiservice switching.
3.4 Active Network Facilities Based on Mobile Agents
As depicted in Fig. 4, an active node consists of a number of Execution Environments (EEs), each supporting the deployment of mobile-agent-based active components and their execution frameworks. From inside an EE, an agent is able to access basic active node facilities and the programmable node API. Thus it can use either the basic programmable interface offered by the node or more sophisticated management facilities pre-installed by the active network administrator. Node facilities may themselves be implemented as (mobile) agents, which makes them easy for network operators to control and update (see example below). Example facilities are:
• Resource Management Facility: handles the installation, deletion and updating of installed services and facilities. Additionally, it can support billing functionality and versioning of the node components.
• Configuration Facility: can be used to configure the programmable node. It can act autonomously depending on the current network traffic (e.g. adaptation of the node’s loss rate), be called by user-installed agents, or be called remotely by the network operator.
• Security Facility: ensures the authentication of components granted access to node resources and provides AN-specific access control enhancements. Beside the security facility, the active node architecture avoids interactions between agents by isolating the EEs (sandbox concept). The sandbox concept, adapted from Java applets, protects local resources, the network node and other mobile agents from malicious agents and runtime errors. A particular sandbox defines, in cooperation with the security facility (security policy of the node owner), the particular security policy for an active component sent by a certain user.
• Command wrappers: can hide vendor-specific node APIs by translating standard-conformant control instructions (e.g. SNMP) into underlying vendor-specific programmable node API calls.
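The command-wrapper idea can be illustrated with a minimal sketch. Everything here is hypothetical (there is no real `VendorXNodeAPI`); the point is only the translation layer that keeps agents independent of the vendor interface.

```python
# Illustrative command wrapper: generic get/set operations are mapped
# onto an invented vendor-specific programmable node API, so that
# agents never see the proprietary interface directly.
class VendorXNodeAPI:
    """Stand-in for a proprietary programmable-node interface."""
    def __init__(self):
        self.log = []
    def vx_set(self, oid, value):
        self.log.append(("set", oid, value))
    def vx_query(self, oid):
        # Fixed sample value for illustration only.
        return {"ifSpeed": 155_000_000}.get(oid, 0)

class CommandWrapper:
    """Translates standard-conformant operations into vendor API calls."""
    def __init__(self, vendor_api):
        self.api = vendor_api
    def get(self, name):
        return self.api.vx_query(name)
    def set(self, name, value):
        self.api.vx_set(name, value)
```

A real wrapper would additionally map SNMP object identifiers and data types, but the indirection shown here is the essential mechanism.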
3.5 Expected Benefits
Mobile agent technology may play an important role in the development of active networks. We believe that the main characteristics of mobile agents, i.e. mobility and autonomy, provide functionality that enables networks to become more active and more powerful. The main areas where AN can benefit from agent technology are:
• The use of mobile agent technology makes the development of an active network facility much easier. Using existing mobile agent middleware and integrating it into existing networks avoids the need to develop new and proprietary transport mechanisms for the deployment of active components.
• As outlined in section 2, network operators require control of their network, including the installed active components. Usually, agent systems provide facilities to monitor, log and control distributed objects within a domain from a centralized point. In an active network architecture, such facilities can support the development of comprehensive network management solutions.
• The AN community can use the experience gained by agent technology developers and users. Especially research that addresses security issues in open, distributed object systems is important and may increase the speed of AN development.
• Active components realized as mobile agents are able to decide autonomously where they have to be installed in order to achieve an optimal network service configuration. Moreover, agent-based components can adapt themselves to changing network conditions (self-managed deployment itinerary). Alternative active network approaches using “dumb” components, e.g. capsules that only transport code, cannot provide such flexibility.
• The installation of autonomous components inside the network enables the development of distributed applications. Autonomous agents residing on several network nodes can interact and cooperate towards a common goal, i.e. providing a particular service.
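The self-managed deployment decision mentioned above can be sketched very simply. The selection metric here (least load among nodes offering an execution environment) is invented for illustration; a real agent might weigh bandwidth, hop count, or cost instead.

```python
# Sketch of an agent's autonomous placement decision: inspect the
# advertised state of candidate nodes and pick where to install.
def choose_node(nodes):
    """nodes: {name: {"load": float, "supports_ee": bool}}.
    Return the least-loaded node offering an execution environment,
    or None if no node qualifies."""
    candidates = {n: s for n, s in nodes.items() if s["supports_ee"]}
    if not candidates:
        return None
    return min(candidates, key=lambda n: candidates[n]["load"])
```

A capsule that only transports code has no such decision logic; the choice of installation point is exactly what the agent's autonomy adds.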
4
Example Scenarios
As mentioned in section 2, active network technology will provide benefits for network operators as well as for end-users. In this section, we describe applications for active networks and show how they extend the capabilities of operators and users. The first scenario focuses on the network operator role and shows how network management can be done inside an active network. Another key motivation for the development of active network technology is to improve widely-used Internet techniques in order to fulfill the demands of the growing Internet community. It is also of interest to break already existing router software down into separate parts in order to be able to replace components on the fly. Candidates for autonomous facilities are components responsible for Web caching, Web prefetching, multimedia filters, and converters. The latter is discussed in more detail in the second scenario.
4.1 Software Maintenance by the Network Operator
In today’s networks, the network operator is responsible for the configuration and maintenance of the network nodes’ software. Because a network usually consists of hosts from several vendors, or hosts from the same vendor running different software versions, this task becomes more and more complicated. The way it is done today can be characterized as uncomfortable: the software is updated on each node via remote login while the service is interrupted. Newer technologies, e.g. some ATM switches, partially offer the capability to update software in a pull manner, i.e. a new software version is downloaded on request. In an active node, the node software can also be realized as components. As opposed to customer-specific software, these components are pre-installed in the node and not movable during execution. Via an administrator API, the network operator can install and update such components during the in-service phase. In contrast to traditional nodes, the operator need not log in to the node, and vendor-specific APIs are hidden by the AN middleware. AN technology offers a way to delegate this task to a facility responsible for contacting all nodes under the operator’s control and sending agents containing the new software to those nodes. Thus, the software update can be performed in exactly those nodes where it is required, i.e. the software is pushed to the nodes.
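The push-style maintenance facility can be sketched as follows. All names and the integer versioning scheme are invented for illustration; the essential point is that the facility, not the operator, determines which nodes need the update and dispatches one carrier agent per node.

```python
# Minimal sketch of the push-based software maintenance facility.
def nodes_needing_update(inventory, component, new_version):
    """inventory: {node: {component: version}}; versions are ints.
    Return the nodes still running an older (or missing) version."""
    return [node for node, comps in inventory.items()
            if comps.get(component, 0) < new_version]

def push_update(inventory, component, new_version):
    """Dispatch an update agent to exactly the nodes that need it."""
    targets = nodes_needing_update(inventory, component, new_version)
    for node in targets:        # one agent per node carries the new code
        inventory[node][component] = new_version
    return targets
```

After the push, a second pass over the inventory finds no remaining targets, which is how the facility can confirm completion without the operator logging in anywhere.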
4.2 Multimedia Stream Filters
By nature, distributed multimedia applications require multi-peer communication support. Providing satisfactory end-to-end QoS (bandwidth, delay, jitter, and error rate) support within multicast groups for continuous-media data services is a major research issue. While a common quality agreement for data capture, transfer and display between the user and the provider can potentially be negotiated and maintained for peer-to-peer user communications, this task becomes overwhelming for peer-to-multi-peer connections. In current packet-switched networks, the issues related to resource allocation are addressed by reservation protocols. Filtering is an alternative method that has been proposed for effectively controlling QoS in point-to-point and multicast connections [5]. Traditionally, the filters proposed so far apply flow control at relatively low levels within a network, e.g. on the IP or TCP level. This low-level approach cannot be applied to multimedia streams: filtering of low-level packets would corrupt media streams because the frames depend on each other. One way to tie the use of filters to QoS control is to introduce high-level QoS filters. Depending on the current network state and the QoS parameters, application-specific filters can adapt the media data stream (in the case of video: height and width of the image, color depth and compression factor) to ensure delivery even in situations when the bandwidth decreases. To introduce the concept of application-level multimedia filtering for active networks, we start from a point-to-point multimedia connection. Traditional QoS reservation mechanisms make reservations at the network layer only. Thus, a QoS guarantee is only valid for a link between two nodes of the complete path.
Considering a network bottleneck in the last router of the connection, a higher reservation request would reserve higher bandwidth between the source and the last router, but lower bandwidth between the last router and the sink. The overall guaranteed bandwidth is only the lowest one in the entire path, i.e. equal to the bandwidth guaranteed by the last link (see Fig. 6).
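The composition rule stated here is simply a minimum over the per-link guarantees, which can be written out directly (values in Mbit/s, taken from the Fig. 6 scenario):

```python
# The bandwidth guaranteed end-to-end is the smallest guarantee on the
# path: any single bottleneck link caps the whole connection.
def end_to_end_guarantee(link_guarantees):
    return min(link_guarantees)
```

For the scenario described, a path with guarantees of 0.7, 0.7 and 0.5 Mbit/s yields an overall guarantee of only 0.5 Mbit/s, regardless of the higher reservations closer to the source.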
The idea of active networks allows filters to be placed in those network nodes, i.e. routers, where they can perform application-level data reduction in order to achieve the best network utilization. In the described scenario, it would be useful to place a filter in the first network node; this filter processes the media stream between source and sink and reduces the bandwidth of the stream to a value that can be guaranteed along the complete connection. Thus, the traffic inside the network is reduced by the difference between the source’s send rate and the sink’s receive rate (see Fig. 7). The saved bandwidth can now be used by another connection. Moreover, the use of application-data-specific filters preserves the integrity of the media data stream, i.e. filtering on the application level avoids the loss of stream information. In an MPEG video stream, for example, the B-frames can be dropped without corrupting the MPEG video stream.

[Figure: the application sends 1.1 Mb/s across three routers; the links guarantee 0.7 Mbit/s and 0.5 Mbit/s; the sink application is able to receive 0.7 Mb/s.]
Fig. 6. Waste of Bandwidth in Existing Networks and Packet Loss Because of a Bottleneck
[Figure: a filter injected at the first router reduces the 1.1 Mb/s stream sent by the application; a downstream link carries 0.5 Mbit/s; the sink application is able to receive 0.7 Mb/s.]
Fig. 7. Reducing Network Traffic Through Application Specific Filtering
This simple example can be extended to multicast communication, e.g. a video conference. The concept of filtering in active networks offers the possibility to inject filters into the network routers in such a way that an optimal reservation tree is built. In the described scenario, the first filter is placed in the router nearest to the source and reduces the bandwidth from 1.1 Mbit/s to a value equal to the highest bandwidth any sink can receive. For our scenario, this optimal network configuration is shown in Fig. 8.

[Figure: application A sends 1.1 Mb/s; a filter at the first router reduces the stream to 0.7 Mb/s for application B, which is able to receive 0.7 Mb/s; a second filter at the branching router reduces the stream to 0.5 Mb/s for application C, which is able to receive 0.5 Mb/s.]
Fig. 8. Optimized Multicast Tree
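The placement rule behind Fig. 8 can be sketched programmatically: each node needs to forward no more than the highest rate any sink below it can consume, and a filter belongs wherever that rate drops relative to the incoming rate. The tree encoding and function names below are invented for illustration.

```python
# Compute, for every tree node, the maximum rate any sink in its
# subtree can consume; then inject a filter wherever that rate is
# lower than the rate arriving at the node.
def required_rates(children, sink_rate, root):
    """children: {node: [child, ...]}; sink_rate: {leaf: Mbit/s}."""
    rates = {}
    def walk(n):
        if n in sink_rate:                      # leaf = sink application
            rates[n] = sink_rate[n]
        else:
            rates[n] = max(walk(c) for c in children[n])
        return rates[n]
    walk(root)
    return rates

def filter_placements(children, sink_rate, root, source_rate):
    rates = required_rates(children, sink_rate, root)
    placements = []
    def walk(n, incoming):
        if rates[n] < incoming:                 # reduce the stream here
            placements.append((n, rates[n]))
        for c in children.get(n, []):
            walk(c, rates[n])
    walk(root, source_rate)
    return placements
```

Applied to the Fig. 8 scenario (source 1.1 Mb/s, sinks at 0.7 and 0.5 Mb/s), this yields a 0.7 Mb/s filter at the first router and a 0.5 Mb/s filter at the router where the path to C branches off.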
Another disadvantage of resource reservation in traditional networks is the static behavior of the reservation tree. RSVP only supports the addition and deletion of sink applications, not the re-allocation of resources after a sink is added or removed. The combination of active routers with the filter concept enables the reservation tree to adapt the network to the requirements of the sinks. Several cases have to be considered in our scenario. To demonstrate these cases, the scenario from Fig. 7 is used again: assume an established reservation between A and B with a bandwidth of 0.7 Mbit/s. First, a sink (consumer application C) added to the reservation tree may be able to consume only a lower bandwidth (e.g. 0.5 Mbit/s) than application B. In this simple case, a new filter has to be injected into the router where the path A-C splits from the existing path from A to B. Thus, a configuration as shown in Fig. 8 is reached. Secondly, if the new consumer is able to receive a higher bandwidth than application B, two scenarios are possible. If the new consumer supports a bandwidth greater than the one the sender is able to deliver, the installed filter has to be moved to a router located nearer to application B (see Fig. 9). This provides the transmission of a full-quality data stream to the new consumer, while the old sink keeps receiving the same reduced quality.

[Figure: application A sends 1.1 Mb/s; the unfiltered 1.1 Mb/s stream reaches application C, which is able to receive 1.5 Mb/s; the filter, moved towards application B, reduces that branch to 0.7 Mb/s.]
Fig. 9. Self-Adapting Multicast Tree: Moving a Filter
In case the bandwidth that can be processed by the new sink is higher than that of application B but lower than the sending rate of application A, the installed filter has to be re-configured (to filter the stream down to the bandwidth value supported by sink C) and a new filter has to be installed. This latter filter is responsible for adapting the stream for consumer B (see Fig. 10).

[Figure: application A sends 1.1 Mb/s; the first filter is reconfigured from 0.7 to 1.0 Mb/s for application C, which is able to receive 1.0 Mb/s; a second filter reduces the branch to application B to 0.7 Mb/s.]
Fig. 10. Self-Adapting Multicast Tree: Reconfiguring a Filter
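The three cases just discussed reduce to a small decision function. This is a pure sketch (rates in Mbit/s, action names invented); a real implementation would also locate the split router and dispatch the filter agents.

```python
# Decision logic for a new sink joining an existing filtered path.
def join_action(source_rate, old_sink_rate, new_sink_rate):
    if new_sink_rate <= old_sink_rate:
        # Case 1: inject a new filter at the router where the paths split.
        return ("inject_filter_at_split", new_sink_rate)
    if new_sink_rate >= source_rate:
        # Case 2: the new sink gets full quality, so move the existing
        # filter towards the old sink (Fig. 9).
        return ("move_filter_towards_old_sink", old_sink_rate)
    # Case 3: reconfigure the existing filter for the new sink and add
    # a second filter for the old sink (Fig. 10).
    return ("reconfigure_and_add_filter", new_sink_rate)
```

With the paper's numbers (A sends 1.1, B receives 0.7), a new sink at 0.5 triggers case 1, at 1.5 case 2, and at 1.0 case 3.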
Another advantage of filters in active networks is the possibility to use different filter types and to combine filters. The size of an MPEG video stream can be reduced using the following filter types: selective filters, which drop specific frames (as described above); transforming filters, which transform a colored image into a gray-scaled image; etc. Furthermore, more than one filter operation can be performed inside a network router, e.g. dropping both the B-frames and the colors of an MPEG video stream. The high number of consumers in real-life environments will call for a dynamically adaptive network that is able to inject the filters into exactly the required network nodes. As outlined in the scenarios above, filters do not only have to be installed (Fig. 7 and 8); support for self-adapting behavior, e.g. moving filters from one network node to another (Fig. 9) or reconfiguring them (Fig. 10), is also required. For these purposes, the active network concept combined with mobile agent technology seems to offer a good solution that supports a generic behavior of complex filtering mechanisms customizable to specific application data streams.
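Filter combination can be sketched as simple function composition over a deliberately simplified frame-list representation of a stream (the representation and filter names are invented for illustration, not an MPEG implementation):

```python
# Selective filter: keep everything except B-frames.
def drop_b_frames(frames):
    return [f for f in frames if f["type"] != "B"]

# Transforming filter: strip colour information from every frame.
def to_grayscale(frames):
    return [{**f, "color": False} for f in frames]

# Chain several filters so that one router applies them in sequence.
def chain(*filters):
    def run(frames):
        for flt in filters:
            frames = flt(frames)
        return frames
    return run
```

A filter chain such as `chain(drop_b_frames, to_grayscale)` corresponds to the example in the text of dropping both the B-frames and the colors inside a single router.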
5
Conclusion
Mobile agents seem to be well suited to active networks. They provide the required functionality, such as the installation of components on the fly into a running system (service) as well as management capabilities, in a standardized manner. Combined with open interfaces they provide the flexibility required to allow network operators fully automated service provisioning and to allow consumers to customize the network behavior according to their needs. Security issues are addressed by a framework which allows the network operator to specify a policy with fine-grained access rules for the resources on the nodes. A centralized management console allows the network operator to manage and supervise the nodes according to its management policy, while consumers take over the responsibility of service management according to their particular network service management policy. The outlined scenarios show two applications that can benefit from the autonomy and decentralization provided by a mobile-agent-based AN. Management tasks like software maintenance by the network operator can be performed much more easily. Multimedia streams can be adapted by a set of distributed filter components. Mobile agents are able to bring filter instances to the required active node. Moreover, the autonomous character of agents enables filters to self-control their execution parameters and location depending on the current network and terminal state (throughput) and to decide about re-configurations of the reservation tree in order to achieve optimal network traffic. For this purpose, filter-agents have to interact with other filters or management components. In sophisticated scenarios this may result in the creation of filter chains, where several types of filtering (image resolution, color depth) are combined. Other AN research projects (SwitchWare/PLAN, ANTS, Netscript) mainly focus on the possibility to inject code for a certain application into the network. However, once a component is installed, the network behaves statically, because the components are neither able to reconfigure themselves autonomously nor to interact inside the distributed environment. Such capabilities are supported by the use of mobile and intelligent agent technology.
In our future work, we will validate the concepts introduced in this paper and build a prototypical AN framework on top of mobile agent technology. The implementation will be based on next-generation, programmable IP routers and will demonstrate how AN concepts can fulfill the growing demands of the Internet community. Furthermore, in order to exploit the full benefit of the autonomy and decentralization that mobile intelligent agents offer, a new application design framework offering resource location semantics (as opposed to current CORBA) will be addressed.
References
1. C. M. Adam, J.-F. Huard, A. A. Lazar, K.-S. Lim, M. Nandikesan, E. Shim, “Proposal for Standardization of ATM Binding Interface Base 2.1”, submitted to P1520, January 1999
2. J. Biswas, J.-F. Huard, A. A. Lazar, K. Lim, S. Mahjoub, L.-F. Pau, M. Suzuki, S. Rostensson, W. Weiguo, S. Weinstein, “Application Programming Interfaces for Networks”, IEEE P1520 Draft White Paper, July 1998
3. S. Covaci, T. Zhang, I. Busse, “Mobile Intelligent Agents for the Management of the Information Infrastructure”, Proceedings of the 31st Hawaii International Conference on System Sciences, January 1998
4. P. Newman, W. Edwards, R. Hinden, E. Hoffman, F. C. Liaw, T. Lyon, G. Minshall, “Ipsilon’s General Switch Management Protocol Specification Version 2.0”, RFC 2297, Internet Engineering Task Force, March 1998
5. D. Hoffman, M. Speer, G. Fernando, “Network Support for Dynamically Scaled Multimedia Data Streams”, Proceedings of the 4th International Workshop on Network and Operating System Support for Digital Audio and Video, Lancaster University, 1993
6. http://www.msforum.org
Acronyms
AN	Active Network
AI	Artificial Intelligence
API	Application Programmer’s Interface
ATM	Asynchronous Transfer Mode
CORBA	Common Object Request Broker Architecture
DOT	Distributed Object Technology
DPE	Distributed Processing Environment
EE	Execution Environment
IEEE	Institute of Electrical & Electronic Engineers
IETF	Internet Engineering Task Force
IP	Internet Protocol
MASIF	Mobile Agent System Interoperability Facility
MIB	Management Information Base
MPEG	Motion Picture Expert Group
MSF	Multiservice Switching Forum
OMG	Object Management Group
PE	Physical Element
QoS	Quality of Service
RFC	Request for Comments
RPC	Remote Procedure Call
RSVP	Resource Reservation Protocol
SNMP	Simple Network Management Protocol
TMN	Telecommunications Management Network
Towards Active Hardware
David C. Lee¹, Mark T. Jones², Scott F. Midkiff², and Peter M. Athanas²
¹ Technology Development Center, 3Com Corporation, 5400 Bayfront Plaza, M/S 3219, Santa Clara, California 95052, USA
² Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, 340 Whittemore Hall, Blacksburg, Virginia 24061, USA
Abstract. Active technologies have the potential to increase the intelligence and flexibility of modern networks – networks that are experiencing a consistently and exponentially increasing amount of traffic. Active technologies require flexibility in network routers and this typically means software. The flexibility offered by software solutions tends to work against the increasing performance requirements of future network routers. This paper proposes a solution to this problem and describes the result of efforts to build a prototype system. The proposed solution is to apply custom, reconfigurable computing technologies, new synthesis and compilation technology, and hardware models to active network devices. In short, to develop “active hardware” that integrates active software with adaptive computing. Elements of the proposed architecture, the execution environment, strategies for active hardware, and a stream-based hardware prototype are discussed in the context of a reconfigurable router. The reconfigurable router has the potential to allow protocol designers without hardware design experience to develop protocols that execute at hardware-level performance with software-level reconfigurability.
1
Introduction
Active and programmable networks [1,2] have evolved as an alternative approach to allow both generalized and custom solutions to problems in modern networks, such as network security, quality-of-service, multicasting, and routing. In an active network, solutions can be developed and deployed without the need for a global standardization committee or vendor upgrades. The potential flexibility allowed by active networks is typically achieved through the use of special software running in the router operating system. At the base level, this special software must use well-defined and standardized application programming interfaces (APIs), whether defined by a programming language such as [3-5] or a system interface specification. The system interface level of active networking is generally referred to as open signaling [6]. An active network node is a node that can perform custom computations that are directed by software delivered and installed from the network, typically using mobile software [7] techniques.

Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 180-188, 1999. Springer-Verlag Berlin Heidelberg 1999
2
Motivation and Operational Example
While both open signaling and active network technologies have the potential to radically change the way networks operate, there is concern about how these software technologies affect router performance. It may well be that these approaches significantly limit the performance of a router, which is particularly important given that future routers must process gigabits or terabits of data per second. At present, general-purpose processors cannot meet even some existing performance requirements without using application-specific integrated circuits (ASICs) [8]. Clearly, ASICs do not support the reconfiguration necessary for active network technology, and general-purpose processors may not provide sufficient performance. This paper describes a potential solution to help active and programmable networks achieve the goals of an intelligent and flexible network. This solution integrates newly emerging custom computing and active software technologies to achieve the conflicting goals of performance and flexibility in active hardware. Active hardware incorporates the flexibility and “active” elements of software into reconfigurable and adaptive computing “hardware”. Active hardware enables high-performance active networking with the potential to obtain orders-of-magnitude increases in performance when compared to general-purpose processor execution. The network device that illustrates the functionality of active hardware is the reconfigurable router. The reconfigurable router is based on active network technology that incorporates rapidly advancing field-programmable gate array (FPGA) technology combined with emerging compilation, synthesis, partitioning, and scheduling technologies to create a fully active node, both in hardware and software. FPGA-based algorithms can, at comparable costs, exceed the performance of conventional processors by 10 to 1000 times [9].
Assume that a reconfigurable router, being used for active network research, receives a packet containing the source code of an active program. The router’s operating system can schedule the program for 1) compilation and execution on the general-purpose CPU, 2) execution on the reconfigurable hardware via synthesis by the operating system, 3) execution via direct control of the hardware configuration by the software, or 4) analysis and partitioning of the source code into portions that can be scheduled for hardware execution and portions that can be scheduled for software execution. This strategy provides for the complete reconfiguration of the router without human intervention and with little knowledge of hardware required.
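The choice among the four scheduling options can be sketched as a toy dispatcher. The selection criteria and their ordering below are invented for illustration; the paper itself does not prescribe a priority among the options.

```python
# Toy dispatcher over the four scheduling options for an active
# program; the program descriptor's keys are hypothetical.
def schedule_active_program(p):
    """p: {"hw_config": bool, "partitionable": bool, "synthesizable": bool}."""
    if p.get("hw_config"):
        return "direct-hardware-configuration"      # option 3
    if p.get("partitionable"):
        return "hw-sw-partitioning"                 # option 4
    if p.get("synthesizable"):
        return "synthesis-to-reconfigurable-hw"     # option 2
    return "compile-and-run-on-cpu"                 # option 1
```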
Assume the previous packet added hardware support to the router for a label switching algorithm. As aspects of label switching are still active research and development areas, investigators will wish to test the performance of various algorithms. In a traditional workstation-based router, investigators would compile a new kernel with the new algorithm, install it, and run the experiments. Workstations, and not conventional routers in a wiring closet, must be used, as it is impractical for investigators to modify a router vendor’s code. This process is time-consuming and error-prone. In a software-based active network router, hardware control may be achieved through the use of well-defined open signaling APIs. This would be no different for a reconfigurable router, except that the reconfigurable router would have a performance advantage, as the label switching algorithm would be conceived for hardware and executed in hardware. If the label switching algorithm relied on a new ATM adaptation layer for its operation, it is unlikely that a software-based active network would be able to handle this situation. A hardware- and software-based reconfigurable router allows “low in the stack” adaptability, at the hardware layer, at run-time, and without requiring detailed knowledge of the hardware.

[Figure: an active program enters the active network-based operating system, which performs hardware/software partitioning and scheduling; incoming packets pass through the reconfigurable hardware and the general-purpose processor of the reconfigurable router before leaving as outgoing packets.]
Fig. 1. Block diagram of a reconfigurable router.
The proposed architecture and a number of possible implementation approaches for active hardware are discussed, including custom compilation and Active VHDL. A brief review of the stream-based implementation of active hardware for network routing [10] is then presented. The remainder of the paper discusses future directions for active hardware.
3
Proposed Nodal Architecture
Figure 1 is a block diagram of the proposed router architecture. Active code is delivered to the router from either the network or a local terminal. The active code performs custom computations on the packet stream moving through the router. Once the active network-based operating system verifies the completeness, access rights, and execution safety of the active code, it will either install the code as a module in the active network-based operating system or schedule the code for hardware/software partitioning. If the received active code is not complete, some type of library resolution strategy [11] will be used to obtain the missing portions. The partitioning process determines which parts of the active code will be scheduled for execution in software and which for execution in the reconfigurable hardware.
3.1 Active Network-Based Operating System
Any delivery mechanism can be used to send the active code to a reconfigurable router. Once the active network-based operating system receives the code, it must verify that the source of the code has the necessary access privileges, determine the available resource allocations, and ensure that the code has all supporting libraries. An active network-based operating system will probably be some form of extensible operating system [12,13]. Regardless of its form, any operating system with dynamic, kernel-level module/handler installation capabilities will work. At this point, the operating system has determined that the code may run on the router and needs to determine whether the code should execute only in hardware, only in software, or in both hardware and software.

[Figure: an active program is partitioned; the hardware scheduling path leads via synthesis and Active VHDL to a native hardware configuration loaded into the reconfigurable hardware; the software scheduling path leads via dynamic compilation or an interpreter to native code executed on the general-purpose CPU.]
Fig. 2. Partitioning and scheduling process.
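A crude stand-in for the partitioning step in Fig. 2 can be sketched as follows. The criterion (stream-oriented, per-packet operations go to reconfigurable hardware, control-heavy operations stay in software) is a simplification invented for illustration; real hardware/software co-design uses far richer cost models.

```python
# Toy hardware/software partitioner over a list of operation
# descriptors; the "per_packet" flag is a hypothetical criterion.
def partition(ops):
    hw, sw = [], []
    for op in ops:
        (hw if op["per_packet"] else sw).append(op["name"])
    return hw, sw
```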
Software execution scheduling may require mobile software system to be in use and is the topic of many active network research projects. As shown in Figure 2, the software components may be dynamically compiled into native code or sent as native code and loaded as a module or a library into the operating system. The last possible mechanism is to execute the software-partitioned code in an interpreter. All of these mechanisms operate in what is referred to as an execution environment [2]. Hardware execution may be achieved by sending VHSIC Hardware Description Language (VHDL) descriptions or other hardware descriptions that can be
David C. Lee et al.
synthesized and loaded into the hardware. Alternatively, hardware execution can be accomplished by directly configuring the hardware from software. Active hardware enables both efficient hardware and efficient software execution: if the appropriate hardware is present, it will be used; otherwise, an efficient software algorithm will be used. Further, active hardware provides for the execution of mobile, platform-independent programs. A variety of active hardware mechanisms are described below.
3.2 Active Hardware and Active VHDL
Active hardware interfaces with the execution environment of an operating system to perform the code execution. Once the code is loaded into the execution environment, it is partitioned and scheduled for execution on the various hardware units shown in Figure 2, based on algorithms developed for hardware/software co-design. Hardware/software co-design algorithms [14] are typically used to improve computational performance but can be adapted to support the reconfigurable router. The system partitions the functionality such that some portion runs on a general-purpose processor and the rest runs on reconfigurable hardware. One custom compilation approach [15] takes a programming language, such as ANSI C or Java, and converts the language statements into a behavioral graph. The graph is analyzed, bound to library elements, partitioned spatially, and scheduled for hardware execution where appropriate. The analysis results in a partitioning of the system into software and hardware components. The scheduling of the hardware components may result in direct synthesis of the necessary hardware. The use of an intermediate synthesis step, however, divides the complexity of the system into manageable portions and introduces a layer of abstraction: synthesis to VHDL first, and then synthesis of the necessary hardware. Alternatively, a hardware-level programming description, such as an FPGA bitfile, may be sent and used to program the hardware directly. The intermediate synthesis to VHDL can be assisted by the use of standard library elements and interface naming conventions, all of which allow conformity to an abstracted model of a generic router. The interface naming conventions uniformly model signal descriptions, such as clocks and input/output ports.
The library elements provide for the uniform operation of router modules such as Content Addressable Memories (CAMs), counters, adders, and so forth. The library elements and naming conventions, which can be viewed as a set of additions to VHDL, are called Active VHDL. The hardware model assists the partitioning and scheduling algorithms by providing a framework used to synthesize the VHDL code. The basic concept of firmware delivery over the network has been demonstrated by researchers at NTT who have shown that pre-synthesized hardware descriptions can be sent across the network and loaded into a switch [16]. Thus, the basic concept of a reconfigurable router is clearly possible. However, the NTT approach requires the vendor to develop the code and provide the user with firmware that can be installed
Towards Active Hardware
Fig. 3. Stream programming format: a stream interleaves packets with programming information.
on-site. The proposed architecture differs by generalizing the router model so that hardware may be reconfigured by the use of software-level source code or Active VHDL descriptions. Both provide the ability for the user or the administrator to install their own firmware into the router hardware, using standard interfaces described by Active VHDL.
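The co-design partitioning step described in this section, in which the operations of a behavioral graph are split between a general-purpose processor and reconfigurable hardware, might be sketched as a simple greedy heuristic. The function, the speedup/area numbers, and the heuristic itself are assumptions for illustration, not the algorithm of [15].

```python
# Illustrative greedy hardware/software partitioning: operations with the
# best hardware speedup per unit of area go to hardware until the area
# budget is exhausted; everything else stays in software.

def partition(operations, area_budget):
    """operations: list of (name, hw_speedup, hw_area). Returns (hw, sw)."""
    hw, sw, used = [], [], 0
    # Consider operations in order of speedup density (speedup / area).
    for name, speedup, area in sorted(
            operations, key=lambda op: op[1] / op[2], reverse=True):
        if used + area <= area_budget and speedup > 1.0:
            hw.append(name)
            used += area
        else:
            sw.append(name)
    return hw, sw

# Invented example: checksum is cheap and fast in hardware, route lookup
# is big, logging barely benefits.
ops = [("checksum", 8.0, 2), ("route_lookup", 5.0, 4), ("logging", 1.2, 3)]
hw, sw = partition(ops, area_budget=5)
```

A real co-design tool would of course schedule against a full behavioral graph with data dependencies; this only shows the shape of the decision.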
3.3 Reconfigurable Stream-Based Processing
A number of possible architectures may be used for the actual reconfigurable hardware, from a board of FPGAs to specialized custom computing machines (CCMs). Using Active VHDL, components can be synthesized from standard libraries and integrated into a whole system. A run-time reconfigurable CCM is a computing device that allows hardware to be dynamically configured within the flow of normal computation. This provides the benefits of hardware while maintaining the flexibility of software. CCMs are characterized by having both programmable datapath operators and programmable interconnection resources. Stream-based computing combines self-guiding streams of programming information and user data to perform a computational task [17]. A stream-based computing environment has a modular and configurable set of functional units that are connected together to perform custom computations: variable-grained pipeline processing on the input stream. The input stream can be packet data and programming information, as shown in Figure 3. Multiple pipelines (streams) may be used to process data in parallel, and multi-bit processing may occur. Because of the pipelining, computations can be performed at line speed, allowing performance to be maintained at the cost of chip area. Stream-based processing is similar in concept to Protocol Boosters [18,19]. Boosters are composable elements of protocols that may be strung together to form more complex protocols. Stream-based processing is both an existing adaptive computing paradigm and a more flexible hardware composition paradigm to which boosters can be transparently adapted. In a stream environment, the hardware-level program is sent in-band with the data, drawing a close parallel to the notion of an active network capsule [20]. A capsule is an execution model in which each packet is a program, as opposed to each packet carrying a program or data.
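The in-band combination of programming information and packet data can be illustrated with a small simulation. Everything here (the `StreamProcessor` class, the token format) is hypothetical and far simpler than a real CCM; it only shows how programming tokens reconfigure the pipeline that subsequent data tokens flow through.

```python
# Toy simulation of stream-based processing: a stream interleaves
# ("program", stage) tokens with ("data", packet) tokens. Programming
# tokens extend the pipeline; data tokens are processed by it.

class StreamProcessor:
    def __init__(self):
        self.pipeline = []   # ordered list of (name, function) stages

    def consume(self, stream):
        out = []
        for kind, payload in stream:
            if kind == "program":
                # In-band programming information reconfigures the pipeline.
                self.pipeline.append(payload)
            else:
                data = payload
                for _, stage in self.pipeline:
                    data = stage(data)
                out.append(data)
        return out

sp = StreamProcessor()
stream = [
    ("program", ("decrement_ttl", lambda p: {**p, "ttl": p["ttl"] - 1})),
    ("data", {"ttl": 5}),                      # processed by one stage
    ("program", ("mark", lambda p: {**p, "marked": True})),
    ("data", {"ttl": 9}),                      # processed by both stages
]
results = sp.consume(stream)
```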
The implication of this execution environment is that the data on the network can be treated as one large flow of information. The flow is composed of a program followed by data and is self-directed. Thus, the stream concept
can be repeated at the network architecture level, the router architecture level, and the hardware architecture level. Figure 4 is an example of a stream-based differential packet marking router. The example consists of modules that perform the required IP header checks, route the packet, examine the packet stream and apply differential packet marking as needed, and perform packet filtering and scheduling as necessary. If the marking algorithm requires modification, additional stream programming information can be sent to modify the packet marking module. Thus, a packet is received and verified, passed on to the route lookup module, passed on for marking, and so forth. In a fully configurable stream processor, one can dynamically add, remove, and modify modules and interconnects. For example, if the packet marking process were no longer needed, it could be removed on demand. Modules can be developed independently, greatly simplifying the implementation of a design. The cost of modularity is an increase in the number of gates required to implement a particular function and, potentially, a lower utilization of resources within each module. Note that the gate counts of FPGAs are rapidly increasing, suggesting that ease of design is becoming more important than squeezing out the last gate. A simple stream-based reconfigurable router was implemented in [10], showing that the concept is feasible. Further, the control logic for the router took less than 16% of the resources using older Xilinx 4028 FPGA technology, assuming routes are stored off-chip, and had a theoretical throughput of 160 Mbps using an 8-bit bus. The work in [10] was extended to newer FPGA technology to achieve a projected throughput of 576 Mbps on a 16-bit data bus [21]. New FPGA technology allows more complex designs to be implemented and faster throughput to be obtained using both faster clocking and wider buses.
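A hedged software sketch of the modular marking router of Figure 4: each module is an independent pipeline stage, and stages can be removed on demand, as described above. The module functions and packet fields are invented for illustration. (The quoted throughputs are consistent with throughput = bus width x clock rate: 160 Mbps over an 8-bit bus implies a 20 MHz clock, and 576 Mbps over a 16-bit bus implies 36 MHz, assuming one transfer per cycle.)

```python
# Invented module functions standing in for the pipeline stages of Fig. 4.

def verify_header(pkt):
    pkt["valid"] = pkt.get("ttl", 0) > 0      # stand-in IP header check
    return pkt

def route_lookup(pkt):
    pkt["next_hop"] = "if0" if pkt["valid"] else None
    return pkt

def mark(pkt):
    if pkt.get("rate", 0) > 100:              # mark out-of-profile packets
        pkt["ds"] = "low-priority"
    return pkt

class ModularRouter:
    def __init__(self, modules):
        self.modules = list(modules)          # ordered pipeline of stages

    def remove(self, module):
        self.modules.remove(module)           # e.g. drop marking when unneeded

    def process(self, pkt):
        for m in self.modules:
            pkt = m(pkt)
        return pkt

router = ModularRouter([verify_header, route_lookup, mark])
marked = router.process({"ttl": 4, "rate": 200})
router.remove(mark)                           # marking removed on demand
unmarked = router.process({"ttl": 4, "rate": 200})
```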
4 Discussion and Future Work
Active hardware provides for a complete and seamless integration of active networks and adaptive hardware. It enables high-performance active networking while maintaining the flexibility of active software. Active hardware technology will allow arbitrary, network-supplied mobile computation to be performed on network packet headers and data at line speed. FPGA processing for network hardware has been demonstrated in [19,22,23]. Firmware delivery of FPGA bitfiles has been shown to be possible in [16]. ANEP is the code delivery protocol of choice, and library resolution strategies have been proposed to help make an active network more robust [11]. Programming interfaces to the router are being specified [6], and various active network operating systems and execution environments have been built, such as [12,13]. Hardware and software code partitioning and execution scheduling have been shown to be possible, and the stream-based paradigm has been shown to work for software-based radio receivers and other applications [17].
Active networks, as a software technology for data networks, have significant potential to revolutionize how networks operate. To make this revolution successful in terms of real, deployable products that can work as core backbone routers, the authors believe that active hardware technologies such as the reconfigurable router must be developed. Active hardware technologies tighten the integration of operating systems and computational platforms and improve their overall performance and flexibility. Active software and hardware description language technologies increase the power and portability of the system while hiding the complexity.
Fig. 4. Example modular design for a packet marking reconfigurable router. Pipeline stages: IP header verification, route lookup, packet marking, packet filtering, packet scheduling.
References
1. D.L. Tennenhouse, J.M. Smith, W.D. Sincoskie, D.J. Wetherall, and G.J. Minden, “A Survey of Active Network Research,” IEEE Communications, Vol. 35, No. 1, January 1997, pp. 80-86.
2. K.L. Calvert, S. Bhattacharjee, E. Zegura, and J. Sterbenz, “Directions in Active Networks,” IEEE Communications, Vol. 36, No. 10, October 1998, pp. 72-78.
3. Y. Yemini and S. da Silva, “Towards Programmable Networks,” IFIP/IEEE International Workshop on Distributed Systems: Operations and Management, L'Aquila, Italy, October 1996. Available at http://www.cs.columbia.edu/~dasilva/pubs/dsom96.pdf
4. D.J. Wetherall, J. Guttag, and D.L. Tennenhouse, “ANTS: A Toolkit for Building and Dynamically Deploying Network Protocols,” IEEE OPENARCH'98, San Francisco, CA, April 1998. Available at http://www.tns.lcs.mit.edu/publications/openarch98.html
5. M. Hicks, P. Kakkar, J.T. Moore, C.A. Gunter, and S. Nettles, “PLAN: A Packet Language for Active Networks,” Proceedings of the International Conference on Functional Programming (ICFP) '98. Available at http://www.cis.upenn.edu/~switchware/papers/bibtex/plan.txt
6. “OPENSIG Fall '97 Workshop,” October 6-7, 1997, New York, NY, http://comet.ctr.columbia.edu/opensig/activities/fall.97.html
7. T. Thorn, “Programming Languages for Mobile Code,” ACM Computing Surveys, Vol. 29, No. 3, September 1997, pp. 213-239.
8. S. Keshav and R. Sharma, “Issues and Trends in Router Design,” IEEE Communications Magazine, Vol. 36, No. 5, May 1998, pp. 144-151.
9. J. Vuillemin, P. Bertin, D. Roncin, M. Shand, H. Touati, and P. Boucard, “Programmable Active Memories: Reconfigurable Systems Come of Age,” IEEE Transactions on VLSI Systems, Vol. 4, No. 1, March 1996, pp. 56-69.
10. D.C. Lee, S.J. Harper, P.M. Athanas, and S.F. Midkiff, “A Stream-based Reconfigurable Router Prototype,” International Conference on Communications, 1999, to appear.
11. D.C. Lee and S.F. Midkiff, “Active Libraries: A Flexible Strategy for Active Networks,” 8th IFIP Conference on High Performance Networking, Vienna, Austria, September 1998, pp. 284-298.
12. D.R. Engler, M.F. Kaashoek, and J. O'Toole, “The Operating System Kernel as a Secure Programmable Machine,” Operating Systems Review, Vol. 29, No. 1, January 1995, pp. 78-82.
13. B.N. Bershad, S. Savage, P. Pardyak, E.G. Sirer, M.E. Fiuczynski, D. Becker, C. Chambers, and S. Eggers, “Extensibility, Safety, and Performance in the SPIN Operating System,” Operating Systems Review, Vol. 29, No. 5, December 1995, pp. 267-284.
14. G. De Micheli and R.K. Gupta, “Hardware/Software Co-Design,” Proceedings of the IEEE, Vol. 85, No. 3, March 1997, pp. 349-365.
15. J. Peterson, R. O'Connor, and P. Athanas, “Scheduling and Partitioning ANSI C Programs onto Multi-FPGA CCM Architectures,” IEEE Symposium on Field-Programmable Custom Computing Machines, Napa, California, April 1996, pp. 178-187.
16. N. Yamanaka, E. Oki, H. Hasegawa, and T.M. Chen, “Active-ATM: User-Programmable Flexible ATM Network Architecture,” Workshop on Active Networking and Programmable Networks (at International Conference on Communications), Atlanta, GA, June 11, 1998.
17. R. Bittner and P. Athanas, “Wormhole Run-time Reconfiguration,” ACM/SIGDA International Symposium on FPGAs, Monterey, CA, February 1997, pp. 79-85.
18. W.S. Marcus, I. Hadzic, A.J. McAuley, and J.M. Smith, “Protocol Boosters: Applying Programmability to Network Infrastructures,” IEEE Communications, Vol. 36, No. 10, October 1998, pp. 79-83.
19. I. Hadzic and J.M. Smith, “P4: A Platform for FPGA Implementation of Protocol Boosters,” Field-Programmable Logic and Applications (FPL '97), Berlin, Germany, 1997, pp. 438-447.
20. D.L. Tennenhouse and D.J. Wetherall, “Towards an Active Network Architecture,” Computer Communication Review, Vol. 26, No. 2, April 1996, pp. 5-18.
21. J. Hess, D. Lee, S. Harper, M. Jones, and P.M. Athanas, “Implementation and Evaluation of a Prototype Reconfigurable Router,” IEEE Symposium on Field-Programmable Custom Computing Machines, Napa, California, 1999.
22. J.T. McHenry, P.W. Dowd, F.A. Pellegrino, T.M. Carrozzi, and W.B. Cooks, “An FPGA-Based Coprocessor for ATM Firewalls,” IEEE Symposium on Field-Programmable Custom Computing Machines, Napa, California, 1997, pp. 30-39.
The Impact of Active Networks on Established Network Operators
Arto Juhola3, Ian Marshall1, Stefan Covaci2, Thomas Velte2, Mike Donohoe4, and Seppo Parkkila3
1 BT Laboratories, Martlesham Heath, Ipswich, IP5 3RE
[email protected]
2 Deutsche Telekom
[email protected]
3 Helsinki Telephone Corporation/FINNET
[email protected]
4 Telecom Eireann
Abstract. A collaborative case-based study has established that active networks will have a very significant impact on network operators. Active networking appears to be the only route to adding integrated mobility, security, QoS, and management services to existing networks. In the short to medium term, operators will be keen to use application layer active networking, since the risks are relatively low. However, the benefits of moving to stronger forms as research progresses appear compelling.
1 Introduction
Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 188-199, 1999. Springer-Verlag Berlin Heidelberg 1999
One of the most serious operational problems facing large public network operators is the difficulty of adding new features and technologies to their large installed network base. Since active networking was originally proposed [1] as a means of overcoming this very problem, established network owners such as the European telcos are understandably extremely interested in using it to solve their problem. One result of this interest is that BT, DT, TE, and AF have collaborated in a 3-month strategic study examining the likely impact of active networks on network operators. The study was carried out under the Eurescom framework and is fully reported in the P844 deliverable. Eurescom is a Heidelberg-based co-operative research organisation of the member telcos; it organises research projects in "pre-competitive" telecommunication study areas. The main aim of the work was to establish the relevance of active networks for network operators and to determine what actions they should take to maximise the benefits. The operators are primarily interested in business impact within a five year planning cycle, but current research does not usually deliver business solutions within 3 years. The project team therefore chose to focus on business solutions that
The Impact of Active Networks on Established Network Operators
189
might emerge on a 3-5 year timescale and on the immediate research and development plans needed to ensure delivery. We explicitly did not consider using active networks to solve problems, such as IPv4 to IPv6 migration, that are expected to be solved in other ways within 18 months, nor did we address issues, such as service interaction, for which no complete solution is expected within the next ten years. Five usage cases were selected which illustrate the potential range of applicability of active networks and enable the elucidation of impact on the widest possible range of services and operational processes. The selected case studies range from extending the most basic network services (mobility) through value-added services (QoS routing, security) to operational support services (management). The cases were analysed and evaluated separately to see what advantages the introduction of active network technology would bring compared with known "non-active" solutions. All current active network proposals [1-8] were considered in the study. However, the project team felt that application layer active networking (ALAN) [5] was the most immediately realisable proposal, and also that it carried the lowest level of risk. In addition, it was considered relatively easy to introduce as an overlay, initially in a small area of the network. ALAN was therefore prominent in the case studies. In this paper we summarise the case studies and present the key conclusions and recommendations of the project regarding the impact of active networks on operators.
2 Case Study Results
2.1 QoS Routing
Modern networks must optimise the management of communication among nodes that are interconnected by diverse and alternate paths. These paths may be based on heterogeneous technologies with divergent properties, which makes smart path choice an essential feature in providing quality of service (QoS). For our purposes QoS can be characterised in terms of bandwidth, latency, security, strength of guarantee, and uni- or bi-directionality. Existing QoS routing schemes, based on load monitoring via the OSPF link state advertisement mechanism and diffserv packet priority markers in the DS byte, are only applicable within a domain due to the limitations of OSPF. Between domains the appropriate routing protocol is BGP. Even BGP4 (the latest version) does not explicitly provide a scheme to convey load information between domains (although user-defined schemes could be used). The same restriction applies to schemes based on estimating load using RSVP traffic, since RSVP is too state-intensive to deploy between domains on a large scale. If an appropriate extension to BGP4 were available, we would still be forced to use two network addresses in the application in order to enable return traffic to follow a different route, as in the scenario above. Existing applications would need to be recompiled (and extended with a QoS router) for dual-homed terminals.
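As an illustration of the kind of QoS-sensitive path choice discussed here, the following sketch scores alternate paths against two of the QoS attributes named above (bandwidth and latency). The paths, numbers, and selection rule are invented for the example.

```python
# Invented illustration: select among diverse alternate paths the one that
# meets the application's QoS requirement and leaves the most spare bandwidth.

def feasible(path, need):
    return (path["bandwidth"] >= need["bandwidth"]
            and path["latency"] <= need["latency"])

def choose_path(paths, need):
    """Pick the feasible path with the most spare bandwidth, else None."""
    candidates = [p for p in paths if feasible(p, need)]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p["bandwidth"] - need["bandwidth"])

paths = [
    {"name": "A", "bandwidth": 2.0, "latency": 40},    # Mbps, ms
    {"name": "B", "bandwidth": 10.0, "latency": 120},  # fast link, long path
    {"name": "C", "bandwidth": 6.0, "latency": 60},
]
best = choose_path(paths, {"bandwidth": 4.0, "latency": 80})
```

The point of the active network approach is that such a selection rule, trivial here, could itself be installed or replaced in intermediate nodes at run time.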
The only viable alternative is some form of active network solution. The capsule approach was dismissed as impractical in the time frame recommended in Section 1.3. We have therefore considered two cases: application layer active networking, and transport layer active networking using packet flags. Both approaches solve the basic problem with a low management overhead and good potential for scalability and ease of introduction. The transport layer solution could be thought ideal for this scenario, as the added functionality is mostly transport layer. However, at the current state of language development, any transport layer solution would be less flexible due to the difficulty of dynamically adding routing instructions to a router kernel safely (without degrading performance and security), and would not necessarily have better performance. The complexity in both cases depends on the complexity of the information-gathering system required to assess remote QoS availability. Table 2.1.1 summarises the evaluation of both approaches.
                                Transport layer                          ALAN
Utility                         High                                     High
Complexity                      Moderate                                 Moderate
Manageability                   High                                     High
Scalability                     Moderate                                 Moderate
Performance                     Potentially high                         Low
Flexibility                     Moderate (restricted due to risks)       High
Risks                           High (i.e. not good)                     Low (i.e. good)
Availability                    5-10 yr.                                 18 months
Integration with existing nets  Moderate (must change router software)   High (very easy)

Table 2.1.1. Evaluation of the transport layer and ALAN approaches.
2.2 Mobility
MANET based. One approach is to combine the use of Mobile IP [9] and MANET (Mobile Ad-Hoc NETworks) [10] protocols. Mobile IP would handle the mobility between the access points of the fixed network (no handover) and MANET would keep the connections alive while the end system is in motion. Since the MANET protocols are likely to remain in a state of constant change for a while, the ability of the active/programmable nodes to accommodate the latest improvements without service disruptions, and yet maintain good performance, would be an advantage.
Extended Mobile IP based. A second approach could be to try to "speed up" the Mobile IP Foreign Agent registration processes [11][12] and arrange functionality to prepare for the change of
Mobile IP Foreign Agent in advance. The programmable (active) networks could be used to implement the necessary additional functionality. What is required is an extension to the Mobile IP Home Agent capable of redirecting incoming messages between a user's several concurrent Foreign Agent tunnels. By using concurrent Foreign Agent tunnels, the traffic flow will remain uninterrupted even if the host is moving. Since the preparatory registration can be performed with the mobility agents in advance, the redirection of actual traffic can be immediate. There is also work going on to allow a user to dynamically perform the initial registration to Mobile IP Home Agent services [13]. In addition, there is a proposal for diminishing the security-related overhead in mobile agent registrations taking place across administrative domains [14]. In short, there will be means for operators to offer Home Agent services for users, if not according to the ideas of [13] and [14], then by some further development.
2.2.2 Evaluation
- Utility: With the presented handover solutions the users will achieve freedom of movement for their hosts, regardless of the network technology (within the limits of that technology). The presented handover solutions do not themselves benefit much from active networks. However, they facilitate the use of an "overlay" strategy in implementing many combined services, which certainly will require active network technology.
- Complexity: Neither of the handover solutions will require complex additional functionality (omitting the intrinsically complex MANET routing protocols).
- Manageability: The management overhead related to the introduction of handover alone (in addition to the roaming management) is not considerable. It is possible to achieve fault tolerance through redundancy, e.g. by having several alternative active nodes available for access routers.
- Scalability: In the initial phase, there is no need to deploy excessive numbers of (active) nodes with MANET/extended Mobile IP support. Although global-scale handover is a possibility with the extended Mobile IP, better performance is achieved when hosts register to the nearest extended Mobile IP agent offering the extended functionality.
- Performance: It is foreseen that satisfactory system performance can be achieved at the IP layer. From the network point of view, the redirection (long-distance) traffic can be minimised, because hosts can register to the nearest handover-supporting nodes available. The performance of the mobility-combined services may fluctuate while a host is in motion (e.g. QoS), but this is the case with all mobile systems.
- Flexibility achieved: Since overlays are possible, changes can be introduced promptly to both service and management functionalities.
- Fragility and risks involved: There is no need to introduce "active" code into the IP layer. In the case of MANET, the routing is executed by "rigid" software (which can be deployed on nodes with active capability). With extended Mobile IP, the redirection of traffic is essentially an end-system function.
- Migration strategies (ease of introduction): It is not necessary to introduce extended functionality into existing routers (just standard Mobile IP in the case of extended Mobile IP). Overlay solutions are possible.
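The extended Mobile IP approach can be sketched as follows, under the assumption (ours, not the cited proposals') that the Home Agent simply keeps a set of pre-registered Foreign Agent tunnels per host and switches the active one on handover, so redirection is immediate when the host moves.

```python
# Hypothetical sketch of an extended Mobile IP Home Agent with several
# concurrent, pre-registered Foreign Agent tunnels per mobile host.

class HomeAgent:
    def __init__(self):
        self.tunnels = {}    # host -> list of pre-registered FA tunnels
        self.active = {}     # host -> currently active tunnel

    def register(self, host, tunnel):
        # Preparatory registration: tunnels are set up in advance.
        self.tunnels.setdefault(host, []).append(tunnel)
        self.active.setdefault(host, tunnel)   # first tunnel starts active

    def handover(self, host, tunnel):
        # Because the tunnel already exists, the switch is immediate.
        assert tunnel in self.tunnels[host]
        self.active[host] = tunnel

    def redirect(self, host, packet):
        # Incoming traffic is redirected into the active tunnel.
        return (self.active[host], packet)

ha = HomeAgent()
ha.register("host1", "FA-1")
ha.register("host1", "FA-2")                   # concurrent tunnel, pre-built
first = ha.redirect("host1", "pkt")
ha.handover("host1", "FA-2")                   # host moved
second = ha.redirect("host1", "pkt")
```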
- (Dis)advantages when compared with non-active solutions: The complexity of the considered cases is not enough to warrant the use of programmable behaviour when considered in isolation.
The table below summarises the evaluation.

                                MANET protocol         Extended Mobile IP
Utility                         High                   High
Complexity                      Low (MANET excluded)   Low
Manageability                   High                   High
Scalability                     High                   High
Performance                     Moderate-high          Moderate-high
Flexibility                     High                   High
Risks                           Low (i.e. good)        Low (i.e. good)
Availability                    2-3 yr.                1-2 yr.
Integration with existing nets  High                   High (very easy)

Table 2.2.1. Evaluation of the two mobility approaches.
2.3 Management
Network management systems of today rely on relatively rigid ways of handling associated behaviour. Once deployed, there are no smooth ways to alter the management behaviour of the network elements, so in this sense they can be said to be "passive". This passivity is a major cause of the current interoperability problems. One solution is the NetScript scripting language developed at Columbia University. NetScript is an agent-based middleware for programming functions of intermediate network nodes. In particular, it is targeted at processing routing functions and as such can be used to manage network nodes. The idea behind this is that there is already a basic way of programming complex network nodes: the use of scripting languages to configure the very complex MIBs of routers, which can incorporate thousands of variables. NetScript is therefore designed to offer interfaces to conventional MIB scripts. NetScript agents can be dynamically dispatched to and executed at a remote system. The NetScript architecture consists of Virtual Network Engines (VNEs) and Virtual Links (VLs). The authors of NetScript have outlined two scenarios for network management based on NetScript:
• Remote Network Monitor (RMON)
• SNMP Agents
A remote monitor allows an organisation to watch the performance and status of a network and its protocols from a remote location. RMON supports only a rigid set of operations, and requires that filters be written with low-level bit manipulations. By dispatching NetScript agents throughout the network, the nodes can perform high-level filtering. NetScript can interoperate with, or even implement, existing standards
like SNMP. This is of special importance when a NetScript VNE resides on a network managed by an existing non-programmable infrastructure. NetScript agents could:
• Provide detailed access to MIB variables,
• Analyse and manipulate MIB status variables,
• Receive SNMP requests to invoke the appropriate program to get and set a MIB variable.
Another approach to using programmable networks for network management is the "policy-based management" work of Imperial College, London. Policies are a means of specifying and influencing management behaviour within a distributed system, without coding the behaviour into the manager agents; a policy may carry constraints (e.g. apply only when time > 1 June 1999). There are authorisation policies and obligation policies. Authorisation policies specify what activities a manager is permitted or forbidden to do to a set of target objects. Obligation policies specify what activities a manager must or must not do to a set of target objects and essentially define the duties of a manager. Automated manager agents interpret policies, so the behaviour of the agents can be modified dynamically by changing policy rather than by re-coding. Policies can be combined with directory systems, so that every node can be located with its related policies. The policies thus provide a constrained form of programming of automated agents to change management strategies without shutting down the management system. Web-based management is closely related to application layer networking, because in both cases the IP infrastructure is used to provide networking at the application layer. The web-based approach, in combination with scripting languages, enables active behaviour in network elements based on executable code that can move throughout the network. The same is true for policy-based management and management with NetScript.
All of these approaches are quite realistic compared to other active network systems for two reasons: they can be restricted to the application layer, and the movable parts of software can originate from trusted sources with well-known, tested, and reliable components. Moreover, as the code is mobile, it is possible to build some "plug and play" functionality by moving the management information of a new network node to all relevant parts of the network management system.
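A minimal sketch of the policy-based management idea, with invented class names: authorisation policies with constraints are interpreted by a manager agent, and behaviour changes by replacing policies rather than by re-coding the agent.

```python
# Illustrative sketch (not the Imperial College system): policies constrain
# what a manager agent may do; swapping the policy set changes behaviour
# without shutting down or re-coding the management system.

class Policy:
    def __init__(self, kind, action, targets, constraint=lambda ctx: True):
        self.kind = kind              # "auth+" = permitted activity
        self.action = action
        self.targets = set(targets)
        self.constraint = constraint  # e.g. only during a time window

class ManagerAgent:
    def __init__(self, policies):
        self.policies = list(policies)   # replaceable at run time

    def permitted(self, action, target, ctx):
        return any(p.kind == "auth+" and p.action == action
                   and target in p.targets and p.constraint(ctx)
                   for p in self.policies)

agent = ManagerAgent([
    # Hypothetical policy: rebooting router1 is permitted at night only.
    Policy("auth+", "reboot", {"router1"},
           constraint=lambda ctx: ctx["hour"] >= 22),
])
```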
2.4 Security
The biggest obstacle to the widespread use of the Internet for sensitive commercial business is the perceived lack of security. The resulting threat can arise not only from the casual hacker but also from more organised industrial espionage or crime. There is an increasing number of security breaches in corporate data networks, though most of these may go unreported. These breaches include:
• The reading and/or modification of sensitive corporate data
• The interception of transaction data (e.g. credit card transactions)
• The destruction or adding of data, or the planting of viruses
• 'Denial of service' attacks
The problem is exacerbated by the need for corporate Intranets to allow for (a) remote dial-in and (b) transactions across the public Internet. Exposing Intranets to the public Internet creates many opportunities for security breaches. Furthermore,
there is no single system that can be implemented to counter all threats. Nevertheless, the use of the Internet for electronic commerce will continue to grow, creating a need for better security systems that can be implemented speedily at many nodes and that do not add a sizeable additional processing overhead to transactions. Security management is set to become more complex as IP networks grow and access nodes multiply. Any technique that would help reduce this complexity without compromising security would be welcome. Existing security systems are many and varied. No single system can guarantee complete security, and interworking is not always possible. Furthermore, sophisticated security systems introduce additional processing overhead and hence greater potential delay in communications; this is currently the case, for instance, with advanced encryption methods. Active networks appear to offer the capability of having a more uniform security policy enforced across an Intranet, coupled with live upgrades when necessary. However, there would be a need for some standards to guarantee interworking between equipment from different vendors. The very act of putting more intelligence into routers may make them more vulnerable to security breaches. A programmable router could be more open to virus attack or unauthorised attempts at control. Careful consideration will have to be given to which active functionalities should be introduced and which should remain implemented as at present.
2.5 Benefits Arising from Combining the Scenarios
It was felt worthwhile to consider the application of active networking to a combined service. We have used the concept of a "dynamic managed Extranet" to combine the scenarios. The "dynamic Extranet" envisages a world where some companies are largely virtual, consisting almost entirely of highly mobile workers who are networked into several different companies for the duration of projects (which will typically be run as collaborations between companies). This scenario adds a strong mobility flavour to conventional Extranet requirements. Readers can hopefully visualise for themselves many of the combined mobility, security, management, and QoS features the scenario entails. The project team's view is that this service cannot be offered without some degree of active networking enabling programmes to move between end systems. For example, if a Java environment is assumed for user-level tools, new information processing components, enabling new working relationships, can be transferred and run during a session. With the addition of Jini join and discovery techniques, and some local resource management processes, the tools can safely be given access to local disks etc. However, network QoS cannot always be maintained for the new tools without some modification of the local stack enabling appropriate QoS decisions. A specific case is that of content streaming, which could use proprietary encryption, codecs, and packet scheduling protocols, all of which should be implemented in the stack for efficiency reasons. The packet scheduling in particular cannot be implemented as part of the Java VM, so the VM must support the use of a dynamic protocol stack, i.e. active networking.
The Impact of Active Networks on Established Network Operators
QoS and security can track users' needs even more efficiently if intermediates are provided in the network, because for mobile users return paths may not be the same as the initial path. If intermediate nodes understand these needs they can perform efficient alternate-path routing with appropriate security. Clearly these nodes also need to be able to dynamically acquire and manage the protocols, policies and programmes that are mandated by the client's needs; that is, strong guarantees of good QoS will require strong active networking. The most obvious combination benefit, then, is enabling services that would otherwise not be possible. However, there are other benefits which can only be obtained if active networks are applied to all services, rather than just to a particular feature as in the other case studies. Some of the most significant benefits not highlighted in the individual study evaluations are:
- Combining active QoS routing and active management enables risks to be ameliorated by dynamic distribution of management policies
- Combining active routing and security with mobility enables performance to be maximised through path-length minimisation
- Adding active management to reduce risks enables use of low-level active code and further performance enhancement
- Combining active management with any active service enables the complexity of the management system to be reduced to only what is currently needed at any node by live services. Active features in an active management system can therefore have specialised management requirements if required, without adding to complexity
- Active security and active mobility enable security to be maintained through handover between operators (e.g. when crossing national borders on the European mainland)
There are, of course, many other benefits for which there is not space here.
However, we believe that there is enough here to illustrate the idea that active networks should ideally be thought of as a total service package, and not just a neat way of adding one or two smart new network features. The evaluation is summarised in Table 2.5.1 below.

Table 2.5.1

Criterion                        Active networks    Existing networks
Utility                          High               Low
Complexity                       Low                Low
Manageability                    High               Low
Scalability                      High               High
Performance                      Moderate-high      Moderate-high
Flexibility                      High               Low
Risks                            Moderate           Low (i.e. good)
Availability                     2-3 yr.            1-2 yr.
Integration with existing nets   High               High (very easy)
Arto Juhola et al.

3 Implications for the TelCos
The key aspect of active networking is the ability to support the deployment and execution of network programmes as and where required by operators and users. The key benefit will be an improved ability to penetrate lucrative Internet-based markets. Active networks also promise diminished network operating costs. Some detailed implications of active networking for network operators are:
- Network traffic (and hence revenues) will increase. This is due to inherent mobility (users can be "connected" everywhere), better service availability (no need to stop working or consuming content for lack of the right tools), and more services being available.
- Decreased network operating costs. With active networks, the configuration of the management system to reflect the current network can be made automatic.
- Reduction of central management overhead. To simplify the management system, network devices may be given greater autonomy via rule/policy-based instructions that enable them to cope with errors, unexpected events, unusual demands, etc. Distribution of policy changes and additions is easily achieved using an active network mechanism.
- Simplified service management. Rather than being required to manage a huge range of Internet services, operators could simply offer a managed, virtual machine based processing platform together with a library of service components. Users would then be free to compose and manage their own services to run on the platform, and operators would only need to manage the platform access service and the library usage service. Clearly this significantly reduces complexity and cost. It also radically improves time to launch for new services, since the requirement for novel management development is minimised.
- Additional flexibility to facilitate fast service introduction and enhancements.
The service in question should be of sufficient complexity to warrant programmability (low-complexity services might still be best realised with "rigid" technology).
- 3rd-party development of value-added services will arise. This does not prevent operator control over such development if so wished (code authorisation).
- The range of (application-level) services will increase. This is a direct consequence of programmability and 3rd-party involvement: when everyone can develop services, rather than just those developers employed by operators, many more services can be developed in response to user needs.
- Possibilities to offer customised services will be improved. To support a high level of service flexibility it is crucial to use techniques that avoid undesirable interactions between the programmes. There are two main possibilities: use of a sandbox that eliminates any possibility of interaction, and extensive testing and monitoring of the programmes. The second possibility allows greater flexibility, but at the cost of increased management overhead.
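The sandbox option could, for instance, be approximated in a Java environment by handing each hosted programme only a narrow capability interface rather than direct system access, so that interaction with other programmes is impossible by construction. The sketch below is a hypothetical illustration of that idea; the interface and method names are invented, not taken from the study.

```java
// Sketch of the sandbox option: a hosted programme sees the device solely
// through a narrow capability interface, so it cannot interact with other
// programmes or touch unrelated resources. All names here are invented.
public class SandboxSketch {

    // The only operations a hosted programme may perform.
    public interface DeviceView {
        int readCounter(String name);
    }

    // A hosted programme: it can read counters, and nothing else.
    static int hostedProgramme(DeviceView view) {
        return view.readCounter("ifInOctets") + view.readCounter("ifOutOctets");
    }

    public static void main(String[] args) {
        DeviceView view = name -> 21; // stub device: every counter reads 21
        System.out.println(hostedProgramme(view)); // prints 42
    }
}
```

The alternative (testing and monitoring) would instead grant fuller access and observe behaviour at run time, which is where the extra management overhead arises.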
3.1 Research

Some outstanding research challenges the study identified in active networks are:
- Active network topology: deciding the location and number of active nodes, handling location change, configuring the topology.
- Protocols between active nodes and programs (routing etc.), measurement and diagnostics.
- The programming paradigm (languages, specification techniques, distribution model, co-operation models, frameworks).
- Methods to alleviate the feature-interaction problem.
- Development of standardised proofs for active network software. Proving the properties of active network software is a problem related to satisfying code safety and code/node security.
- Code development for active node operating systems: resource controls, scheduling, active kernels.
- Flexible hardware (firmware) in support of active nodes.
- Changes to operator workflows and processes (service provisioning, billing, etc.).
- Management of active networks, active FCAPS (Fault, Configuration, Accounting, Performance and Security management), agents.
- Test and debug tools.
- I/O for mobile agents, network-aware applications and humans.
- Integration with existing systems.
3.2 Marketing and Services

Active network dependent services may prove to be an invaluable marketing spearhead for TelCos, since they enable rapid deployment of 'value add' that is apparently part of the network, but which can, in fact, be sourced from 3rd parties. This will enable market response to be quickly tested, and market presence to be rapidly established with minimal development costs. There will also be real benefits to users compared to "plain vanilla" ISP services, which will tend to markedly increase network traffic. In addition there will be many opportunities for novel forms of commercial activity by both operators and their customers. Some key market possibilities are highlighted below:
- Provision of active network services
- Service and code brokerage, directories for active programs and their sources (advertisement/searching/advising)
- Service "operating system" leasing (processing time to be charged)
- Active network program distribution and storage (information logistics)
- Facilities management of third-party services, e.g. advertising, access control, usage logging, billing, revenue collection
3.3 Infrastructure/Roll-out

Our initial view of the probable evolution is as follows:
- First generation: active nodes at the edge of networks, code running in user space and at the application layer; 2 years from now.
- Second generation: sparse overlay of fixed inter-domain active nodes, some processing at kernel level, new routing technology at active nodes; 5 years from now.
- Third generation: dense overlay, dynamically re-configurable/self-configurable topology, active firmware, new technology in all nodes (not replacing old technology but extending it); 10-15 years from now.
4 Summary
The results from the case studies indicated that the application of active/programmable network ideas is useful for some individual network features, and necessary in creating flexible wide-area services such as the dynamic Extranet. There are also a large number of operational benefits for network owners who deploy active networks. Although much further research is needed, the near-term technological possibilities and the service provision prospects revealed are extremely promising. We are therefore confident that network operators will be enthusiastic participants in the research and willing customers of the results as the research matures. Early adoption is likely to follow the application-layer active networking approach, as this is both more immediately realisable and less risky than the alternatives.
Using Active Processes as the Basis for an Integrated Distributed Network Management Architecture

Dominic P. A. Greenwood¹ and Damianos Gavalas²

¹ Fujitsu Telecommunications Europe Ltd.
Northgate House, St. Peters Street, CO1 1HH, Colchester, UK
Tel: +44 (0)1206 363002
[email protected]

² Communications Networks Research Group
Electronic Systems Engineering Department, University of Essex
CO4 3SQ, Colchester, UK
Tel: +44 (0)1206 872425
[email protected]
Abstract. Active, mobile processes or agents are a technology whose features promise to have a significant impact on the distributed management of communications networks. Through the interaction of agents with one another and their device-based environments, active network architectures arise. This paper outlines a modular architecture that exploits these features to create a network-element management framework supporting both agents and conventional SNMP messaging. A process kernel is described which houses both pre-loaded and visiting agent processes, the pre-loaded processes having well defined functions including automated device registration and local fault diagnosis. An operational sequence is discussed, highlighting the beneficial usage of agents for certain automated tasks and an experiment measuring the response time of agents versus centralised SNMP polling is evaluated. Keywords: Active networks, distributed network device management, agents, convergence, SNMP, programmatic fragmentation.
1 Introduction
With the current trend towards the seamless integration of heterogeneous telecommunications and data networks with vendor-specific devices, the requirement for flexible and comprehensive management architectures is substantial. As the IP protocol begins to take a foothold in the telecommunications world, due to its low cost and ease of integration with local private networks, it is more than likely that a proliferation in the number of addressable network elements is imminent.

Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 199-212, 1999. © Springer-Verlag Berlin Heidelberg 1999

From such
growth arises the inevitable requirement for end-to-end QoS, which must be managed along a packet's entire network path if high-QoS transport is to be guaranteed. In addition, the user-driven demand for rapid evolution in service availability will ultimately lead to user-customisable service-level behaviour, which in turn will drive the deployment of new services. All this leads to one inescapable fact: current management architectures and protocols will not be able to scale to the required degree of complexity. The current telecommunications management stack is commonly viewed as three primary layers: the device/element layer, the network layer and the service layer. The last of these tends to operate as a superimposed market above the former two, which remain ingrained with simple message-based protocols such as SNMP [1] and CMIP [2]. Although these are widely accepted standards, their limitations become evident when compared with a hybrid of message and active process management environments, as shown by [3] and [4], and as presented in this paper. The traditional message-based communication mechanism used by both SNMP and CMIP relies heavily on frequent polling of devices by a central management station, which analyses all incoming data searching for problems and changes in condition. In by far the majority of cases, bandwidth is consumed to learn nothing other than that the network is operating within acceptable parametric boundaries. The argument against moving towards more distributed and active management architectures assumes that current networked devices have an adequate level of control. However, this requires that network administration remains a centralised, operator-dependent and message-based process, ignoring the many benefits of having devices capable of autonomous management and configuration.
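The scale of the polling overhead described above can be illustrated with a back-of-the-envelope model, written here in Java (the implementation language used for the architecture later in the paper). All figures are illustrative assumptions, not measurements from the paper.

```java
// Back-of-the-envelope comparison of centralised SNMP-style polling traffic
// versus a delegated scheme where devices evaluate thresholds locally and
// only report violations. All numbers are illustrative assumptions.
public class PollingCost {

    // Bytes per second of management traffic at the manager's link when
    // polling `devices` devices every `intervalSec` seconds, with
    // `varsPerPoll` variables of `bytesPerVar` bytes each (request + response).
    static double centralisedPollingBps(int devices, double intervalSec,
                                        int varsPerPoll, int bytesPerVar) {
        double bytesPerPoll = 2.0 * varsPerPoll * bytesPerVar; // request + response
        return devices * bytesPerPoll / intervalSec;
    }

    // Traffic when only a fraction `alarmRate` of polls would have produced
    // a report, and no request messages are needed (reports only).
    static double delegatedReportingBps(int devices, double intervalSec,
                                        int varsPerPoll, int bytesPerVar,
                                        double alarmRate) {
        return alarmRate * centralisedPollingBps(devices, intervalSec,
                                                 varsPerPoll, bytesPerVar) / 2.0;
    }

    public static void main(String[] args) {
        double polled = centralisedPollingBps(1000, 30.0, 10, 50);
        double delegated = delegatedReportingBps(1000, 30.0, 10, 50, 0.01);
        System.out.printf("centralised: %.0f B/s, delegated: %.0f B/s%n",
                          polled, delegated);
    }
}
```

With 1000 devices polled every 30 seconds, the hypothetical manager link carries roughly 33 kB/s of management traffic, almost all of it confirming normal operation; a 1% alarm rate under delegation reduces this by two orders of magnitude.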
Even if we do not consider the benefits of a seamless, element oriented management stack, there still remains the potential for improvement over traditional SNMP/CMIP as shown in [5], which discusses the use of active processes for performance management. Many of these issues are highlighted in [6], which discusses the utility of programmatic elements in the upper layers of the management stack, i.e. service and customer management (a superset of service management). At the more fundamental level of network element management, [3] and [4] consider the implications of migrating network management towards progressively more active architectures. In light of these factors and the extensive work conducted to date within the field of active networks, this paper proposes a modular architecture based on active processes, which adopt specific operational functions within a managed device. Section 2 outlines the agent process and highlights those features applicable to this topic. Section 3 introduces the modular, network element management architecture and discusses its components. Section 4 describes several operational sequences that the architecture may undergo subsequent to activation and presents some preliminary results comparing the use of agents to centralised SNMP polling for the calculation of an aggregated function. Finally, Section 5 draws conclusions on the implications of the proposed technology.
2 The Active Process Component
The mobile programmatic element, or agent for the purposes of this paper, has existed for many decades, initially in the form of computer viruses. In essence an autonomous entity, the agent interacts with its environment and other active processes to achieve an overall goal. Contemporary features have led to the agent finding a role in areas associated with distributed computing [7] and, by association, with inter-communicating networked devices. For the purposes of network device management, as discussed in this paper, the agent takes the abstract form described by Figure 1.

Figure 1. Abstract component-based structure of a generic process agent (an interface and process engine linked to Information, Intelligence, Strategy and Interaction components, with provision for component extensions)
The agent consists of a set of core functional components linked to the process engine. The interaction component controls the agent's interface to other agents and the network environment. In this model, only specific query and transaction primitives are defined, allowing the agent to confer with its local host regarding resource manipulation and device configuration. Additionally, semantic language constructs allow tiered interaction with peer agents, the level and complexity of communication relating to the intrinsic capabilities of the agent through its intelligence and strategy components, and to the specific management task. To clarify, agents may require the ability to negotiate for resource access, perhaps acting as a broker on behalf of a service provider. As such, the agent can barter, which in turn requires a concept of credit exchange. The FIPA-specified ACL [8] defines the syntactical and semantic clauses required for such interaction, and is the subject of current developmental work. The information component of Figure 1 defines any knowledge that the agent has concerning its specific task, and acts as the logical repository for any conveyed management information. Also shown is a facility to integrate component extensions into agent profiles. Achieved through the process of programmatic fragmentation, an agent may access the additional routines and data segments through the equivalent of dynamically linked object instantiation. This technology is related to the dynamic code integration described in [9] for the process-oriented APRIL language, but is implemented in Java for this architecture. Java has been chosen primarily due to its conceptual ethos as a portable, strongly typed language. Although there are problems currently with Java's processor-intensive support environments, it remains entirely suitable for an experimental framework.
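As a rough illustration of programmatic fragmentation, the Java sketch below links a component extension into a running agent by instantiating it reflectively from its class name, which is the mechanism underlying dynamically linked object instantiation. The class and interface names are invented for the example; in practice the class bytes could first be fetched from a remote code repository.

```java
// Sketch of "programmatic fragmentation": an agent acquires a component
// extension at run time by instantiating it reflectively from its class
// name. Class and interface names here are illustrative, not from the paper.
public class FragmentDemo {

    // Common contract every component extension must satisfy.
    public interface Extension {
        String service();
    }

    // A component extension; it could equally have been downloaded
    // before being linked in.
    public static class DiagnosticsExtension implements Extension {
        public String service() { return "diagnostics"; }
    }

    // Dynamically link an extension into the running agent by name.
    static Extension load(String className) {
        try {
            return (Extension) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot link component: " + className, e);
        }
    }

    public static void main(String[] args) {
        Extension ext = load("FragmentDemo$DiagnosticsExtension");
        System.out.println(ext.service()); // prints "diagnostics"
    }
}
```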
3 Process Oriented Device Management
Having briefly outlined the proposed architecture and discussed characteristics of the process agents employed within it, we now discuss the integrated device architecture itself. Shown in Figure 2, the proposed system is separated into two primary sections: the Device Resources and Device Kernel, and the Process Kernel. Between these two lies the Protocol Interface and, to the right of the diagram, the External Process Filter Interface (EPFI).

Figure 2. Agent-enabled device management architecture (Device Resources R1 to Rn grouped under consolidation units Rc; the Device Kernel; the Protocol Interface; and the Process Kernel hosting the REG, CFG, D&R and MIB agents plus visiting X#n agents behind the External Process Filter Interface)
Within the Device Resources section is found an arbitrary resource configuration, dependent upon the device function. In general, the Rc units are consolidation units, with R1 to Rn representing individual resource components. The Device Kernel is the local control processor specific to the type of device, responsible for hardware management and for controlling internal packet transfers and flows. The Process Kernel section resides logically adjacent to the Device Kernel, separated by the Protocol Interface. A Java virtual environment, the Process Kernel allows agent processes to execute as threads, accessing the device resources via the Device Kernel and communicating locally with one another. In the proposed architecture, each active device is pre-installed with four agent processes: the REGa, the CFGa, the D&Ra and the MIBa. These represent dedicated task agents for Registration, Configuration, Diagnose and Repair, and MIB Messaging respectively. The X#na agents represent non-specific process agents that may arrive from an external source, or be generated internally by an Agent Generator process, which is described later. The following sections describe each of these core components in detail.
3.1 The Protocol Interface (PI)
The Protocol Interface forms a security shield and protocol-translation boundary between the primary device components and the management process kernel. In its capacity as a security shield, all queries and configuration commands originating from the process kernel are authenticated to ensure validity. Additional protection could potentially include data encryption, although this is unlikely to be feasible under real-time operating constraints. It is more conceivable that a trust relationship would exist across the interface, local to the device. The protocol-translation function of the PI enables the managed device to interoperate with proprietary agents, whether they have common or proprietary interfaces. The aim of standardisation bodies, such as FIPA [10], is to ensure that mobile agents have a common interface with an accepted baseline communication protocol. It may however often be the case that a vendor requires restrictive control over device or agent access by defining a proprietary protocol. This ability, whilst reducing interoperability, is a potentially flexible and useful control of agent access to the device kernel and resources. Figure 3 illustrates a selection of interface configurations.

Figure 3. Protocol Interface, optional protocol configurations (a)-(d), combining a Standard Interface (St-Int) with one or more Proprietary Interfaces (Pr-Int-1, Pr-Int-2, Pr-Int-3)
As can be seen, subject to requirements, the PI may take any of the configurations shown in Figure 3. The most flexible of these is (d), where the underlying common interface has a number of optional proprietary protocol overlays.
3.2 The External Process Filter Interface (EPFI)
Not yet implemented, this component of the proposed architecture is an optional external interface that provides a protective barrier between external network devices and the process kernel, guarding against corrupt or malicious code. Whereas the PI is responsible for verifying individual resource access and control signals, the EPFI validates agent authenticity against accepted criteria before passage into the process kernel. This feature must remain optional due to the proprietary nature of many devices; it may not be possible to define widely accepted, trusted and reliable authentication procedures.
3.3 The Process Kernel
The Process Kernel is the host environment of the device management system and is specifically responsible for controlling the flow of information between resident agent processes and for managing its internal and external interfaces. In the current implementation, the kernel is a Java Virtual Machine with a native interface to the Device Kernel. Each resident agent process is coupled to the system via a thread link from which it may signal other agents within a transaction forum. A fundamental feature of agents is that they may be functionally extended at runtime by using either dynamic libraries or language-dependent extension features. This implies that functionality may be changed 'on-the-fly' without the need to replace the entire agent, although this may be necessary in certain cases. The essential agents included in all managed devices, as shown in Figure 2, are the Registration agent (REGa), the Configuration agent (CFGa), the Diagnostic and Repair agent (D&Ra) and the MIB Messaging agent (MIBa). These agents may be suspended, terminated, replaced or updated according to requirement and correct authority. As previously noted, the PI performs any necessary protocol translation and security authentication before transactions may take place between active agents.
3.3.1 The Registration Agent (REGa)
The REGa is a replaceable agent process, activated at device initialisation (boot time), and responsible for registering its host device with other active network devices. The REGa must contain a unique identifier (commonly a MAC or IP address) for the new device or the DHCP [10] server location. Using a local node discovery algorithm, such as the IPv6 anycast function [11], the REGa locates a nearest neighbour and migrates to its docking facility¹ via a recognised TCP/IP port. The REGa registers its host device with the device visited and, if specified, exchanges resource information. Ideally, this exchange would allow the REGa to identify the visited device's resource function and availability, its routing tables and, particularly, the location of any local management stations. Such information may of course be unavailable or restricted, but regardless, the REGa will continue to visit local devices until its terminate/return criteria have been satisfied. Dependent upon prevailing network conditions, or pre-specified behaviour, the REGa either messages or collates. In message mode, information is returned to the source device from each device visited, until termination criteria are reached. In collation mode, information is accumulated by the REGa process before it returns to its host. Evidently, a trade-off exists between assured arrival of information and bandwidth preservation.
¹ The IPv6 'anycast' function searches for any available node within a specified domain member set.
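The trade-off between the two reporting modes can be sketched with a toy transmission count in Java; the cost model below is an invented simplification (one transmission per agent hop, one per report), not taken from the paper.

```java
// Toy model of the REG agent's two reporting modes. In message mode each
// visited node triggers an immediate report back to the source; in
// collation mode the agent accumulates results and carries them home in
// one return trip (at the price of a growing agent and delayed delivery).
public class RegModes {

    // Transmissions for an itinerary of n nodes in message mode:
    // n agent hops plus n report messages back to the source.
    static int messageModeTransmissions(int n) {
        return n + n;
    }

    // Transmissions in collation mode: n agent hops plus one return trip
    // carrying all accumulated results.
    static int collationModeTransmissions(int n) {
        return n + 1;
    }

    public static void main(String[] args) {
        for (int n : new int[]{3, 10}) {
            System.out.println(n + " nodes: message=" + messageModeTransmissions(n)
                    + ", collation=" + collationModeTransmissions(n));
        }
    }
}
```

Message mode delivers each result as soon as it is gathered (assured arrival), while collation mode roughly halves the transmission count for long itineraries (bandwidth preservation).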
3.3.2 The Configuration Agent (CFGa)
The CFGa controls device self-configuration, primarily through patterned calibration of device parameters. A 'pattern' is an array of pre-determined parameter settings defining the configuration of both resource and management subsystems, which may be applied by the CFGa in response to operating conditions. The level of autonomous control varies from complete pattern selection to modification of individual pattern parameters. Considering the device architecture in Figure 2, CFGa configuration patterns control such features as:
- Device status.
- Resource configuration and access control.
- Thread and socket availability and prioritisation.
- Agent design, strategies and production.
- Security, authentication and encryption.
When the network device is installed, the CFGa specifies initial set-up conditions. These may be subject to change according to information returned by the REGa process. For example, a domain management controller may wish to limit the production of X#na agents according to specified criteria. The device may be forced to adopt this policy and hence update its internal configuration.
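A configuration 'pattern' of this kind can be sketched as a named map of parameter settings, with overrides covering the span from complete pattern selection to individual parameter modification. The pattern names, parameter names and values below are invented for illustration.

```java
import java.util.*;

// Sketch of a CFG-agent configuration "pattern": a named set of
// pre-determined parameter settings applied to the device in one step.
// Pattern and parameter names are invented for illustration.
public class ConfigPattern {

    static final Map<String, Map<String, String>> PATTERNS = Map.of(
        "default",  Map.of("status", "up", "threads", "8", "auth", "required"),
        "degraded", Map.of("status", "up", "threads", "2", "auth", "required"));

    // Apply a pattern, optionally overriding individual parameters
    // (complete pattern selection vs. pattern parameter modification).
    static Map<String, String> apply(String name, Map<String, String> overrides) {
        Map<String, String> cfg = new HashMap<>(PATTERNS.get(name));
        cfg.putAll(overrides);
        return cfg;
    }

    public static void main(String[] args) {
        Map<String, String> cfg = apply("degraded", Map.of("threads", "4"));
        System.out.println(cfg.get("threads")); // prints "4"
    }
}
```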
3.3.3 The Diagnose/Repair Agent (D&Ra)
The D&Ra module provides an internal service that monitors the behaviour of device resources and management components, watching particularly for faults and anomalies. A diagnosis may include the identification of an imminent resource failure based on observed behaviour. Such a situation would cause the repair mechanism to switch to a spare resource, if available, and to notify registered management stations of the situation. The specific processing capabilities of the D&Ra are device- and vendor-dependent, and hence the diagnosis function may range from a fault-location report to a complex diagnosis of the fault and its potential causes. In the latter case, processing may include the ability to draw hypotheses based on known facts and reasoned heuristics, enabling the device to draw its own conclusions and commence a repair programme.
3.3.4 The Messaging Agent (MIBa)
In order that the framework be compliant with SNMP and CMIP standards, the MIBa provides a standard Management Information Base and a query interface for basic messaging. For many management enquiries and actions, a simple message service is more appropriate than a relatively heavyweight mobile agent.
3.3.5 Non-Specific Function Agents (X#na)
In addition to the system agents, the management subsystem may create and host dynamic agents (X#na). These may either be created internally by the AGen (see Section 3.4) or arrive from an external source via the EPFI. Before activation, an X#na is assigned a process thread by the kernel which, when active, forms a conduit through which the agent may communicate with other resident agents and query managed resources. Some examples of the tasks an X#na may conduct are:
- Interrogate device resource availability.
- Negotiate for resource access.
- Supply code updates for resident agents.
- Deliver active processes to update the host device functionality.
- Provide a legacy management interface for SNMP/CMIP compliance.
The interface of any agent arriving from an external source must conform to the security and protocol requirements imposed by the PI; otherwise the agent will be terminated and its source node notified if required.
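The thread-per-agent arrangement described above can be sketched in Java as agent threads posting signals into a shared queue. This is a minimal stand-in for the kernel's transaction forum, not the actual implementation; the agent names and signal format are invented.

```java
import java.util.concurrent.*;

// Minimal sketch of the Process Kernel idea: each resident agent runs as a
// thread and signals peers through a shared "transaction forum" queue.
// Names and message format are simplified stand-ins for the paper's design.
public class KernelSketch {

    // The shared forum through which resident agents signal one another.
    static final BlockingQueue<String> forum = new LinkedBlockingQueue<>();

    // An agent thread that posts one signal into the forum.
    static Thread agent(String name, String signal) {
        return new Thread(() -> forum.add(name + ":" + signal), name);
    }

    // Start two resident agents, wait for them, report signal count.
    static int runAgents() {
        forum.clear();
        Thread reg = agent("REG", "registered");
        Thread dnr = agent("D&R", "no-faults");
        reg.start();
        dnr.start();
        try {
            reg.join();
            dnr.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return forum.size();
    }

    public static void main(String[] args) {
        System.out.println(runAgents() + " signals posted to the forum");
    }
}
```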
3.4 The Agent Generator (AGen)

This component is optional, depending on whether the vendor requires the device to be capable of creating its own agents. If resident, the AGen exists as a separate process within the process kernel, although for many managed devices it is normally sufficient that they accept X#na's from external sources without taking an active role in agent generation. The absence of an AGen component restricts the creation of agents within the managed network, thereby regulating their proliferation. If required, however, an AGen process may be uploaded from an external source subject to device authentication and publishing rights. The AGen creates an agent according to blueprint patterns it holds. Each blueprint describes in detail the framework, components and interfaces required to build the desired agent. If any components are unavailable at the local host, a message is dispatched to other known devices (including management stations and vendor sites) to request copies of the required code. This naturally creates a distributed component library in which core elements are present at all devices that require them, and more specialised ones are located at known repository sites. Agents may take on a variety of tasks ranging from basic SNMP-type operations to complex autonomous configuration and service management. The AGen must therefore have access to multiple component types, and be sufficiently complex to construct information channels between them.
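The blueprint step can be sketched as a simple lookup against a local component library, with missing components reported so that they can be requested from remote repositories. The component names below are invented for the example.

```java
import java.util.*;

// Sketch of the Agent Generator idea: a blueprint lists the components an
// agent needs; components missing from the local library are identified so
// they can be requested from remote repositories. Component names invented.
public class AgentGenerator {

    // Components already held at the local host.
    static final Set<String> localLibrary = new HashSet<>(
        List.of("interaction", "information", "engine"));

    // Returns the components that must be fetched before the agent
    // described by `blueprint` can be assembled locally.
    static List<String> missingComponents(List<String> blueprint) {
        List<String> missing = new ArrayList<>();
        for (String c : blueprint) {
            if (!localLibrary.contains(c)) missing.add(c);
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> blueprint = List.of("engine", "interaction", "strategy");
        System.out.println(missingComponents(blueprint)); // prints [strategy]
    }
}
```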
4 Experimental Network Architecture
The development of an experimental architecture for evaluating the concepts described here is in progress. Based on the generic device construct described by Figure 4, the hardware subsystems, including the Device Kernel and Device Resources, are currently abstracted into the software simulation framework. Several of these simulated devices may be interconnected to form a network, into which a traffic stream is introduced from an external source. Each actively managed device then filters the packet flow, searching for agent and message signatures in packet headers, sending normal traffic on to the resource and directing agent and message packet sequences into the Process Kernel for re-assembly.

Figure 4. A generic device (packet/agent filters on the traffic paths divert agent and message packets to the Process Kernel, while normal traffic passes through the Device Kernel to the resource)
4.1 An Operational Sequence
To illustrate the proposed architecture in operation, consider the abstract network configuration shown in Figure 5.

Figure 5. Simple network configuration model (Device A and Device B connect to a switch; Device B also connects to a local Management Station with a link to the vendor; Device X joins with a direct path to the switch)
Dominic P A Greenwood and Damianos Gavalas
Here two existing devices, Device A and Device B, are connected to a switch, with Device B also connected to a local Management Station (MS). At an arbitrary point a new device, Device X, is introduced to the network and given a direct path to the switch. The timing diagram in Figure 6 describes the interaction between these devices shortly after Device X is introduced and activated. The Agent and Msg tags indicate the transmission of an agent or a standard message, respectively.
Figure 6. Interaction timing diagram (exchanges between Device X, Device B, the MS and the vendor across five numbered sequences: registration via Tx/Forward/Return REG with a NodeB info message, a CFG update with a driver request and X#1 component return, a resource availability request and acknowledgement, a fault alarm to the MS with acknowledgement, and the upload of a new SNMP X#n to the MS and devices with receipt messages)
Following is a description of each of the listed event sequences, numbered 1 to 5:
① DEVICE REGISTRATION. Following activation of Device X, the device's REG a is emitted via anycast towards the first detectable device. In this sequence the first located node is Device B, which accepts the REG a and forwards it on to the known regional MS. In addition, the REG a requests that Device B (and any additional intermediate nodes before an MS is reached) return a message to the source device to confirm arrival. Once the MS receives the REG a, it registers the new device and notes its available resources. All known devices (except Device B) are then informed of the presence of the new device. The REG a then returns to its source, carrying MS management information relating to resource location, resource and service availability, and network device addresses.
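Sequence ① can be traced with a small sketch; the node names follow Figure 5, but the function signature and message strings are hypothetical:

```python
# Hypothetical trace of the sequence-① registration flow; node names
# follow Figure 5, but this is not an implementation of the actual agents.

def register_device(new_dev, neighbor, ms, known_devices):
    """Return the MS registry and the ordered message trace for a REG a."""
    trace = [(new_dev, neighbor, "Agent: Tx REG")]
    # Intermediate nodes confirm arrival back to the source device.
    trace.append((neighbor, new_dev, "Msg: confirm arrival"))
    trace.append((neighbor, ms, "Agent: Forward REG"))
    # The MS registers the device and informs all other known devices.
    registry = {new_dev: "registered"}
    for d in known_devices:
        if d != neighbor:
            trace.append((ms, d, "Msg: new device present"))
    # The REG a returns to its source carrying MS management information.
    trace.append((ms, new_dev, "Agent: Return REG + MS info"))
    return registry, trace

registry, trace = register_device("X", "B", "MS", ["A", "B"])
```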
② CFG a UPDATE. This sequence illustrates that during device configuration a resource is detected for which no driver is locally available. The MIB a therefore sends a message to the local MS requesting download of an extension component for the resident CFG a. The MS in turn resolves the vendor location of the required driver and requests that the update be sent as an installable X#n a, which is then transferred by return to the requesting device.
③ RESOURCE AVAILABILITY QUERY. Under normal operational conditions the device will regularly receive requests for management information or resource availability. This sequence illustrates a device requesting resource information from Device X.
④ FAULT DETECTION. This sequence illustrates the MS being alerted with an alarm notification after a resource fault is detected at the new device. The fault is initially detected by the D&R a system monitor, which attempts to diagnose it and, if no local repair is possible, informs the MS. The MS in turn acknowledges receipt of the alarm, attempts to resolve alternative resources for affected services, and takes appropriate action to repair the device.
⑤ X#n a UPLOAD. The final sequence describes the situation where a vendor releases an updated version of an X#n a that certain devices utilise. The replacement or update code is first sent by the vendor to all registered MSs, from where it is dispatched to all affected devices as an agent X#n a. Each device returns a receipt message to the MS upon successful installation.
The modular nature of the architecture allows any components to be updated or replaced as required, thus providing additional functionality alongside flexibility and reliability.
4.2 Example of an SNMP Management Task
In order to test the potential improvement over traditional SNMP centralised polling, this section presents some preliminary results of an agent-based calculation of an aggregated MIB function. Otherwise known as a health function, the aggregation combines multiple MIB variables to represent the percentage of discarded IP input datagrams over the total number of datagrams received during a specified time interval. Under a traditional centralised polling scheme, the value of this function would be computed by the manager upon return of all object values from the polled static management agents. Hence, if Sr is the average request/response size, and n devices are each polled for v operational variables, the wasted bandwidth (BW) over i polling intervals would be:
BWstandard = 2 * Sr * n * v * i        (1)
Alternatively, a sequential agent-based scheme may be applied, in which a single agent travels from the original source, visiting each device in turn before returning the calculated result from the last polled device. The health function calculation is performed at each local agent host, leading to a balanced distribution of the computational burden. Additionally, considerable compression of data is achieved, as a large number of observed operational variables are reduced to a single value. Thus, assuming an average agent code size Ac and state information size As, the resulting overhead would be:
BWagent = (Ac * n) + (As * n * i)        (2)
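Equations (1) and (2) can be compared numerically. The sizes below (request/response, agent code, agent state, all in bytes) are illustrative assumptions, not figures from the paper:

```python
def bw_standard(Sr, n, v, i):
    # Eq. (1): request + response (factor 2) for each of v variables
    # on each of n devices, over i polling intervals.
    return 2 * Sr * n * v * i

def bw_agent(Ac, As, n, i):
    # Eq. (2): one broadcast of the agent code (size Ac) to n devices,
    # plus a state transfer (size As) per device per polling interval.
    return Ac * n + As * n * i

# Illustrative sizes in bytes (assumptions, not the paper's values):
Sr, Ac, As = 100, 2000, 50   # request/response, agent code, agent state
n, v = 10, 5                 # devices, variables per device

one_shot = (bw_standard(Sr, n, v, 1), bw_agent(Ac, As, n, 1))
long_run = (bw_standard(Sr, n, v, 100), bw_agent(Ac, As, n, 100))
# For a single interval the agent's code broadcast dominates; over many
# intervals the aggregation makes the agent scheme far cheaper.
```

The crossover point depends on Ac and As relative to Sr and v; the paper's qualitative claim, that large v and i favour the agent scheme, follows directly from the terms that grow linearly in i.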
Here the v variables are aggregated into one. The first term of equation (2) describes the overhead imposed when broadcasting the agent code to all management servers, whilst the second represents the bandwidth consumed by the agent state transfers between the manager and the polled devices. Thus, for large v and i, agent-based polling is clearly less bandwidth intensive than centralised polling. In Figure 7, the agent-based polling scheme is compared with standard SNMP. The response time for acquisition of the specified health function is measured, with Figure 7a showing that applying a single agent to the task does not scale well as the number of devices increases. If, however, each agent is assigned a fixed number of devices, thereby partitioning the network into logical segments, it should be possible to improve the response times.

Figure 7. Response time against (a) number of polled devices, comparing SNMP-based polling with agent polling using 1, 2 and 3 agents, and (b) number of devices assigned per agent, for agent polling of 4, 6, 8 and 9 devices.
Figure 7b shows the results of applying such partitioning. Depending on the managed network size, the optimum number of segments may be determined from the minimum point of the corresponding curve. For instance, in a network of six devices the response time is minimised when each agent is assigned two devices, i.e. the network is segmented into three domains. Hence, an active domain manager should autonomously match the number of segments to the number of required polling agents, according to the number of managed devices. This issue is currently under investigation.
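Selecting the optimum from the minimum point of the measured curve amounts to a simple argmin over the per-assignment response times; the numbers below are placeholders, not the measurements of Figure 7:

```python
def optimal_partition(response_ms):
    # response_ms maps devices-per-agent -> measured response time (ms);
    # pick the assignment with the lowest response time.
    return min(response_ms, key=response_ms.get)

# Placeholder measurements for a six-device network (not from Figure 7).
measured = {1: 900, 2: 600, 3: 700, 6: 1100}
k = optimal_partition(measured)   # optimum devices per agent
segments = 6 // k                 # resulting number of logical segments
```

With these placeholder figures the minimum falls at two devices per agent, matching the paper's six-device example of three domains.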
5 Conclusions
The adoption of management architectures such as that proposed in this paper depends on device vendors integrating process execution environments into their devices. When such apparatus becomes readily available, active network architectures can be expected to evolve further. This paper has proposed a modular, process-based management architecture that highlights the benefits of active code agents, including flexibility, reliability and inherent scalability, all core issues in the convergence of the telecommunications and data-communications worlds. Preliminary results have also been described for an experiment highlighting the beneficial use of agent processes for a simple aggregated SNMP function calculation. Response time was shown to improve over that of traditional centralised polling, with an intrinsic reduction in bandwidth consumption also apparent due to the elimination of broadcast polling. Current work pursues further implementation of the described architecture and related issues concerning the function and interaction of agent processes.
ANMAC: An Architectural Framework for Network Management and Control using Active Networks

Samphel Norden and Kenneth F. Wong
Applied Research Laboratory, Department of Computer Science, Washington University in St. Louis, USA
Email: {samphel,kenw}@arl.wustl.edu

Abstract. In this paper we propose a new framework, Active Network Management and Control (ANMAC), for the management and control of high-speed networks. The software architecture in ANMAC allows routers to execute dynamically loadable kernel plug-in modules which perform diagnostic functions for network management. ANMAC uses mobile probe packets to perform efficient resource reservation (using our novel reservation scheme), to facilitate feedback-based congestion control, and to provide "distributed debugging" of complex anomalous network behaviour. ANMAC also provides security measures against IP spoofing and other security attacks. The network manager has the flexibility to install custom scripts in routers for tracking down anomalous network faults.

Key words: Active networks, network management, QoS, router plug-ins, resource reservation
1 Introduction
Technological advances in the network infrastructure (ATM, IPng) have led to the development of high-performance local and wide area networks. Although advanced networking technology has increased the range of potential applications to include video conferencing, remote collaboration, and metacomputing, it has also increased the potential for chaos. For example, most of us have been on the receiving end of a network-wide denial-of-service attack that could have been initiated from some anonymous, distant host. Even more prevalent is the helpless feeling of waiting for a web page without knowing the reason for the delay. Although network management alone will not solve these problems, these are two examples of problems that may be examined through network management functions and, in some cases, corrected. Network management becomes even more critical as user expectations and network usage rise with the increasing availability of high-bandwidth, QoS-based network technology.

As networking technology becomes more complex, there will be a need for management tools that can utilize dynamic functionality in the network itself. Complexity in networking systems naturally leads to an increased level of errors in the system implementation itself. Furthermore, the growth in the scale and number of networks presents additional problems. Network growth can make manual inspection impractical and increases the chance of strange network behavior caused by misconfigured devices. It is the unpredictable, anomalous behavior that is the most difficult to identify, characterize, and repair. For example, the BGP flapping problem [6] has plagued the Internet for years and is still not completely understood. Identifying and resolving unpredictable scenarios is often akin to program debugging because it is targeted at the rare cases.

In this paper, we explore the use of active networking ideas in managing networks. In particular, we examine how the addition of dynamic functionality to network devices (specifically routers) can aid the network manager, and how these same features can be leveraged to provide value-added network services to the user in the form of service setup, inspection and verification. We describe a novel architectural framework, the Active Network Management and Control Architecture (ANMAC). ANMAC uses the concept of dynamically loadable plug-ins [3] to install modules at active routers. The active routers incorporate functionality to perform feedback-based congestion control, resource control, and a variety of management functions.

The rest of this paper is organised as follows. Section 2 describes traditional approaches to network management and discusses the need for an active networking approach. Section 3 presents ANMAC, our active network framework for network management. Section 4 describes the key plug-in modules used in ANMAC. Section 5 concludes the paper.

Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 212-219, 1999. © Springer-Verlag Berlin Heidelberg 1999
2 Active Networking in Network Management
Traditional approaches to network management are built bottom-up on simple protocols such as SNMP and RMON [7]. SNMP is in wide use due to its operational simplicity, requiring very little prior knowledge to begin gathering interesting information. RMON gives a network-wide view and is concerned mainly with data from network probes (e.g., LAN analyzers). Thus it complements SNMP, which is concerned more with individual devices than with the network as a whole. RMON is based on the same paradigm as SNMP: 1) the manager has a relatively static set of management functions; and 2) control is centralized at the NOC (Network Operations Center). The manager sits at a console and issues primitive operations in the hope of finding a reason for poor network performance. The basic control structure is a master-slave one, with the NOC as master and the devices as slaves. In general, the traditional approach is static, requiring manual configuration, and uses a centralised scheme that is not scalable. Furthermore, it is impossible to respond to network faults in real time, which may be crucial for mission-critical applications.

Active Networking (AN) is an emerging area of research that enables the entire network to be a fully programmable computational environment. Traditionally, there have been two approaches for code injection in active networks:
capsules and programmable switches. Capsules [8] are miniature programs that are transmitted in-band and executed (through interpretation rather than direct execution) at each node along the capsule's path. The result of the computation can determine what happens to subsequent packets. Programmable switches are programmed out-of-band by the network manager, who injects code into the network. While the former approach is flexible in adapting to user requirements, the code fragments are limited in size and therefore in functionality. Also, virtual machines restrict the address space that a capsule can access for security purposes, limiting the capsule's applicability. For example, in [8] the code contained in a capsule is less than a kilobyte in size. The programmable switch approach is static and requires manual configuration of the router by the network manager. As explained subsequently, ANMAC uses a hybrid active networking approach that brings to network management increased efficiency, scalability, and flexibility.

Scalability: As the number and complexity of nodes increase, management stations become points of implosion and can face large amounts of redundant information. A scalable solution is to tailor the amount of information returned to the management center. This is done by a two-level hierarchical event-gathering mechanism that lets the routers themselves perform low-level management of their local devices, while providing specific, non-redundant information to the higher-level NOC.

Performance: Several performance enhancements can be realised in ANMAC, which can be categorised into pipelined and parallelised operations. For example, if an application requires a complex operation to be performed, we can fragment the operation by pipelining it across the different active routers on the application's QoS route.
Alternately, if packets belonging to different flows require different operations to be executed at the router, the router can handle several flows concurrently.

Dynamic functionality: Many opportunities in network management derive from the dynamic functionality offered by ANMAC. Some examples are:

Flexible monitoring and processing: ANMAC allows rapid and flexible installation of flow monitors at various levels of aggregation, which is crucial for maintaining QoS guarantees.

Mobile probes: Mobile probes are special control packets sent by the network manager to dynamically enable functions such as resource reservation, congestion control, and "distributed debugging" of complex network faults.

Handling security attacks: In active networks that use the capsule approach, security is a major issue: how does the router trust that the code sent by the user is authentic and non-malicious? ANMAC provides a secure, robust interface to the network manager.

Distributed debugging: Each active router can have some special storage where it stores its past actions and state. If a network fault occurs, this information is transmitted to the NOC. The NOC can then use the logged information to simulate network requests and trace the network fault in a stepwise manner.

Dynamic functionality can be a boon to network management. However, there is a need to bound the extent of the functionality one can place in routers. Processing power at the routers is constrained by the fact that they need to switch packets at gigabit and possibly terabit speeds. There is a fine line between how much computation can be performed at the router before a packet is switched and the actual benefit of that computation. However, the key point to note is that ANMAC is a flexible architectural framework that facilitates incremental deployment.
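The distributed-debugging store described above, where each router logs past actions and state for later stepwise replay at the NOC, might look like the following hypothetical sketch:

```python
from collections import deque

class RouterLog:
    """Bounded store of a router's past actions and state (illustrative)."""
    def __init__(self, limit=1000):
        self.entries = deque(maxlen=limit)   # oldest entries are dropped

    def record(self, action, state):
        self.entries.append({"action": action, "state": state})

    def on_fault(self):
        # On a network fault, ship the logged history to the NOC.
        return list(self.entries)

def replay(log_entries):
    # The NOC steps through logged actions to trace the fault.
    return [e["action"] for e in log_entries]
```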
3 The ANMAC Framework
The ANMAC framework consists of the active network subsystem (the active routers) and the network management subsystem, whose components manage and monitor the network, including agents and mobile probes.
Fig. 1. Network topology (users attach to a mesh of active routers overseen by the NOC; User = network probe, AR = active router)
3.1 The Active Network Subsystem
In ANMAC, active routers offer different services depending on the plug-ins that are locally available. Plug-ins are downloaded on demand from centralised components called code caches [1]. Users send packets whose headers contain function identifiers, which are indirect references to plug-in code modules at the router, together with input parameters. The overall topology resembles Figure 1.
There are two types of packets transmitted in ANMAC. The first kind is a control header packet that sets up state information at the routers for the rest of the application's packets. If the router has the code stored locally, it will execute the code on all packets that subsequently follow the header packet. If necessary, the control packet could trigger parallel downloads on all links of the route chosen for the data packets, reducing download latency. The other approach is one in which every packet carries an identifier, and no state need be reserved in routers for the application. Applications using this approach may need a variety of functions to be applied, and packets may take different routes depending on the plug-ins stored at the routers. The end-user could use the global knowledge of the network manager to query which routers have the necessary plug-ins. There is, however, a tradeoff between the two approaches: minimizing state information at the routers (scalability) versus the longer latencies of finding appropriate routes.

We further propose an additional performance enhancement in ANMAC. Active routers possess the ability to transfer a plug-in module to other routers that do not have the module, allowing load sharing. More importantly, this can have important consequences for resource allocation: when no existing QoS route is found initially, routers can adapt themselves through plug-in migration (by acquiring plug-ins) to satisfy service and management requirements.

3.2 The Network Management Subsystem
The network management subsystem comprises the Network Operations Center, which uses mobile probe packets, and Distributed Monitoring Entities installed at routers.

Network Operations Center: One of the key components is the Network Operations Center (NOC), which provides an interface for the network manager to comprehensively view the current network state and "debug" network faults. It transmits control packets for downloading new or updated plug-ins at the active routers. The NOC can remotely install filters at routers that modify the services provided to flows; this is suitable when the NOC reroutes flows during congestion. The NOC is also responsible for efficiently correlating events to derive the correct cause of a network fault.

Mobile probe: The next component is the mobile probe packet, which can be sent by both the active routers and the NOC. This special control packet can be utilized for the following purposes.

Resource reservation: Probes can be used to perform concurrent resource reservations across different routers for a given flow. We propose to deploy DRES, a novel resource reservation protocol that is described subsequently.

Distributed debugging: The probes participate in the distributed debugging of network faults, allowing state transfer between routers.

Handling alarms: A probe packet can be sent to congested network areas by the NOC. Once a router reports back to the NOC using signaling, a probe can also be sent simultaneously by the router to find secondary routes, leading to QoS renegotiation.
Fig. 2. Network Operations Center (NOC) (the network manager drives a display manager, event correlator, state debugger and feedback module over a programmable interface, exchanging events/alarms, network state information, step debugging, feedback to routers, and filter installs/upgrades)
Host administration: The NOC or a router can send a probe directly to the end-user to collect data. In a corporate network the NOC may have complete authority and could probe any host for information for security reasons. This can be useful when a fault occurs in the end-user's application program: since the problem is with the end-user and not the network, the NOC can verify this fact by sending a probe to the end-user. A further use of this feature is to prevent the sender from transmitting malicious flows.

Distributed Monitoring Entity: The third key component is the set of Distributed Monitoring Entities (DMEs). The DMEs are responsible for the actual monitoring of links and the collection of data, which can then be used for efficient computation of network management parameters such as the overbooking ratio and other QoS statistics.

Packet format: In [1], control packets carried a function identifier (<fi>) denoting the function to be executed on the flow's data packets. In ANMAC we propose context-dependent function identifiers (<cdfi>) that enable the execution of a plug-in depending on some predicate or context. If we consider encryption of data within an intranet before it goes out to the Internet, we need to perform the encryption only at the last (internal) hop and not at each router along the route. Thus a <cdfi> with a location predicate enables the selective execution of the encryption plug-in at the right location. Note that nesting <cdfi>s allows pipelining of the functions to be executed on a data stream, improving performance.

Fig. 3. Packet format (IP header followed by function identifiers 1, 2, ..., n)
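The layout of Figure 3, an IP header followed by a list of function identifiers, could be encoded as in this sketch; the 16-bit field widths and the predicate encoding are assumptions, since the paper does not fix an on-the-wire format:

```python
import struct

# Hypothetical wire layout for the ANMAC header that follows the IP
# header: a count, then n context-dependent function identifiers,
# each a (predicate, function-id) pair. Field widths are assumptions.

def pack_cdfis(cdfis):
    """cdfis: list of (predicate, fid) 16-bit pairs, in execution order."""
    out = struct.pack("!H", len(cdfis))
    for predicate, fid in cdfis:
        out += struct.pack("!HH", predicate, fid)
    return out

def unpack_cdfis(data):
    (n,) = struct.unpack_from("!H", data, 0)
    return [struct.unpack_from("!HH", data, 2 + 4 * k) for k in range(n)]

# Nested <cdfi>s: e.g. compress everywhere, encrypt only at the last
# internal hop (predicate values are illustrative).
ANY_HOP, LAST_INTERNAL_HOP = 0, 1
hdr = pack_cdfis([(ANY_HOP, 7), (LAST_INTERNAL_HOP, 9)])
```

A router would evaluate each predicate against its own context and execute only the plug-ins whose predicates hold, which is how the location-dependent encryption example above would be realised.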
4 The Plug-in Modules
As mentioned earlier, the dynamic-load feature of the plug-in modules can be exploited to perform network management and control. The plug-in components of ANMAC that we plan to implement are given below.

– Feedback-based congestion control: Congestion is an application-independent event and occurs within the network. This makes it suitable for active networks, especially since the time required for congestion notification to propagate back to the sender limits the speed at which an application can decrease its sending rate, or ramp it up, depending on the current state. ANMAC routers react dynamically to congestion. Each router maintains a congestion profile and the required QoS for a particular flow, calculated by the NOC using past history. The congestion control algorithm in the router (for example, RED) would use this profile to decide which packets to drop or shape. An alternative method is to inform the NOC, which then installs special packet filters to reroute QoS flows.

– Distributed Monitoring Entity handler: This module collects data from the distributed DMEs and forwards it in a coherent, coordinated fashion. The NOC can query the plug-in for information and install filters for specific monitoring.

– QoS and resource control (deferred reservation): In ANMAC, admission control is performed by active routers rather than by any centralised arbitration logic. When a user requests a QoS connection, the nearest active router uses the resource control plug-in to decide whether the new connection should be accepted. ANMAC uses DRES (Deferred REServation), our new resource reservation protocol, described in more detail in [5]. DRES is a sender-oriented, soft-state, two-phase resource reservation protocol that uses deferring (delaying) to increase call admissibility and lower latency, while keeping overhead competitive with RSVP and ATM signalling.
DRES uses the concept of a tentative resource reservation (TTR) for unrejected but unadmitted flows. In an end-to-end resource request involving n hops, a request is not rejected immediately when there are insufficient resources at a single hop. Rather, the request continues to propagate through the network, making TTRs. Furthermore, multiple requests can be propagating concurrently. During call setup, resources can become available (e.g., through call teardown or call rejection) that convert TTRs into permanent resource reservations, allowing a call to be admitted.

– Security: Traditional security concerns in active networks are avoided by having the NOC install plug-in code in routers. Consider, for the moment, a specialised IP spoofing attack called TCP SYN-ACK flooding [4]. Once a router determines that spoofing is taking place, it can inform the NOC, install a filter at its DME interface, and propagate the filter to the next-hop router, so that the neighbouring router can deal with spoofed or forged packets in a similar fashion. Thus, subsequent attacks will fail due to this collaborative filtering process.

More details of our implementation environment are available at [5].
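The deferred-reservation behaviour at a single hop, holding a TTR instead of rejecting and converting TTRs to permanent reservations when capacity is released, can be sketched as follows (a simplification under assumed semantics; [5] describes the actual protocol):

```python
class DresHop:
    """One hop's view of DRES-style deferred reservation (simplified)."""
    def __init__(self, capacity):
        self.free = capacity
        self.tentative = []          # (flow, bw) pairs waiting: the TTRs
        self.admitted = set()

    def request(self, flow, bw):
        if self.free >= bw:
            self.free -= bw
            self.admitted.add(flow)
            return "reserved"
        # Insufficient capacity: do not reject. Record a TTR and let the
        # request keep propagating through the network.
        self.tentative.append((flow, bw))
        return "tentative"

    def release(self, flow, bw):
        # Teardown or rejection frees capacity, which may convert
        # waiting TTRs into permanent reservations.
        self.admitted.discard(flow)
        self.free += bw
        still_waiting = []
        for f, b in self.tentative:
            if self.free >= b:
                self.free -= b
                self.admitted.add(f)
            else:
                still_waiting.append((f, b))
        self.tentative = still_waiting
```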
5 Conclusions
In this paper, we have proposed a new framework for performing network management using active networks. We have shown that providing routers with dynamic functionality yields a customizable interface that allows monitoring and management of the network at any level of granularity. We have described several dynamically loadable plug-in modules that tackle congestion, provide support for QoS traffic, and provide security. We have also shown the robustness of the framework by implementing mechanisms to prevent security attacks. We plan to implement and evaluate ANMAC in a QoS-enabled test-bed.
References

1. Decasper D. and Plattner B., 'DAN: Distributed code caching for active networks', Proceedings of INFOCOM'98, June 1998.
2. Dittia Z., Cox J.R. Jr. and Parulkar G., 'The APIC Approach to High Performance Network Interface Design: Protected DMA and Other Techniques', Proceedings of INFOCOM'97, Kobe, Japan, 1997.
3. Decasper D., Dittia Z., Parulkar G. and Plattner B., 'Router plugins: a software architecture for next generation routers', Proceedings of SIGCOMM'98, September 1998.
4. Cisco Systems, 'Defining strategies to protect against TCP-SYN denial of service attacks', http://www.cisco.com/warp/public/707/4.html.
5. Norden S., 'ANMAC: A novel architectural framework for network management and control using active networks', http://www.arl.wustl.edu/~samphel/iwan.ps.
6. Labovitz C., Malan G.R. and Jahanian F., 'Internet routing instability', Proceedings of SIGCOMM'97, September 1997.
7. Stallings W., 'SNMP, SNMPv2 and RMON: Practical network management', Addison-Wesley, 2nd edition, 1996.
8. Schwartz B., Zhou W., Jackson A.W., Strayer W.T., Rockwell D. and Partridge C., 'Smart packets for active networks', 2nd Active Nets Workshop, March 1997.
An Active Network Approach to Efficient Network Management

Danny Raz and Yuval Shavitt
Bell Laboratories, Lucent Technologies
101 Crawfords Corner Road, Holmdel, NJ 07733-3030
[email protected] [email protected]
Abstract. Active networks is a framework where network elements, primarily routers and switches, are programmable. Programs that are injected into the network are executed by the network elements to achieve higher flexibility and to present new capabilities. This work describes a novel active network architecture which primarily addresses the management challenges of modern complex networks. Its main component is an active engine that is attached to any IP router to form an active node. The active engine we designed and implemented executes programs that arrive from the network, and monitors and controls the router's actions. The design is based on standards (Java, SNMP, ANEP over UDP), and can be easily deployed in today's IP networks. The contribution of this paper is the introduction of novel architectural features such as: isolation of the active mechanism, the session concept, the ability of active sessions to control non-active packets, and blind addressing. Implementing these ideas, we built a system that enables the safe execution and rapid deployment of new distributed management applications in the network layer. This system can be gradually integrated into today's IP networks, and allows a smooth migration from IP to active networking.
1 Introduction
Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 220-231, 1999.
© Springer-Verlag Berlin Heidelberg 1999

The emerging next generation of routers exhibits both high performance and rich functionality, such as support for virtual private networks and QoS [11]. To achieve this, per-flow queueing and fast IP filtering are incorporated into the router's hardware [11]. The management of a network comprised of such devices, and efficient use of the new functionality, introduces new challenges. Active networks is a framework where network elements, primarily routers and switches, are programmable [14]. Programs that are injected into the network are executed by the network elements to achieve higher flexibility for networking functions, such as routing, and to present new capabilities for higher layer functions by allowing data fusion in the network layer. This work suggests a novel active network architecture which primarily addresses the network management challenges. At its center, an active engine executes programs that are received through the network. The active engine is
attached to an IP router, and together they form an active node. We introduce the notion of a session, which generalizes the soft-state mechanism that appears in many active network architectures [10,12]. This enables long-lasting applications, typical of network management, to reside in the active node. For scalable distributed operation we introduce a new addressing mode, blind addressing, in addition to the explicit mode available in IP. We explicitly allow sessions to access the router's management information base (MIB) using SNMP. Overall, we introduce a modular solution that enables easy deployment in current IP networks. This paper describes a working network prototype. The active engine is written mostly in C, and is demonstrated on a network comprised of software routers running on FreeBSD PCs and a Cisco 2514 router with a PC as an adjunct active engine. We put an emphasis on standard APIs and tools. The mobile code is written in Java; it is encapsulated together with data using the standard ANEP headers [2] over UDP. The engine communicates with the router using SNMP, which enables it to monitor and control the router's operation. Our approach can handle the entire range of active networking, from capsules to programmable switches. Capsule applications carry their code and terminate after execution. Programmable switches are implemented by "well-known" session ids that may receive data and act on it. We also allow authorized sessions to intercept non-active packets and manipulate their data, change their routing, drop them, etc. Such authorized sessions can also change the MIB variables in the router using SNMP. Another feature of the architecture is the ability of non-active packets to request a special service, e.g., routing, that is implemented by a transparent resident session that is mapped to this service at the router. Network management applications are traditionally centralized around some manager.
The manager queries the managed objects, builds a view of the network, and sends alerts if a problem is detected. The manager can also try to take corrective actions by sending configuration commands to network entities. The recent trend in network management architectures is to rely on multiple levels of abstraction, e.g., CORBA, Java ORB, Java RMI, Styx, DCOM, and Directory Enabled Networks (DENs). As a result, the cost of management is obscured from the application programmer, and thus neglected. If this trend continues, management may consume increasing portions of network resources (bandwidth, buffer space). As is well put in the last chapter of [15]: "When CORBA is used the wrong way, the implemented applications, although they are functionally complete, can have performance and scalability problems." We believe that our approach presents a better alternative to the current practice. It calls for the distribution of the management task in the network, enables shorter control loops, eliminates long-haul dissemination of redundant and unimportant information ("I'm OK" messages), and facilitates new exciting applications. The framework forces the programmer to be aware of efficiency issues, and will thus result in more efficient code, not only due to its intrinsic capabilities but also due to the change in the programmer's focus. Other agent-based approaches [8,9]
that enable distributed computing rely heavily on bandwidth-blind approaches such as Java RMI, and thus do not result in efficient usage of network resources. The rest of the paper is organized as follows. Section 2 gives a short overview of the system we built. A detailed comparison between our design and existing systems appears in Section 3. Section 4 describes the system architecture and flow of information. Section 5 briefly describes implementation examples. We discuss future work and give our concluding remarks in Section 6.
2 System Overview
Logically, an active node in our system is comprised of two entities: an IP router and an adjunct active engine (AE). The IP router component performs the IP forwarding, basic routing, and filtering that are part of the functions performed by today's commercial off-the-shelf (COTS) IP routers. The filtering is used to divert packets to the active engine. The active engine is an environment where user-written programs can be executed in close interaction with the router's data and control variables¹. Physically, the IP router and the active engine may either reside on different machines or co-reside inside the same box. This structure enables us to upgrade any COTS IP router to an active router simply by adding an adjunct active engine. The separation protects non-active traffic from the effects of erroneous operations of the active part of the network, and inflicts minimal additional delay on non-active traffic. It also makes gradual deployment of active nodes in current networks easy. A logical distributed task is identified by a globally unique number called a session id. When code associated with a non-existing session arrives, it is executed and creates a process that handles all the packets of that session. Such a process can either handle only a single data packet and terminate (capsule), or it can exist in the AE for a long period of time, handling many data packets, as required by many network management applications. To perform network layer tasks, sessions must have access to the router's network layer data, such as topological data (neighbor ids), routing data, performance data (packets dropped, packets forwarded, CPU usage, etc.), and more. We use SNMP as the interface between the router and the AE. Standard SNMP agents exist in all routers and enable a read/write interface to a standard management information base (MIB). In order to perform distributed tasks, an active node must have means to communicate with other active nodes.
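The session-id dispatch described above (code creates a session; data packets are delivered to an existing session or discarded) can be sketched as a simplified simulation. The actual engine is written in C and executes Java mobile code; the Python class names here are ours, chosen for illustration only:

```python
class Session:
    """A long-lived process handling all packets of one session id."""
    def __init__(self, sid, code):
        self.sid = sid
        self.code = code      # the mobile code that created the session
        self.handled = []     # data packets delivered so far

    def deliver(self, payload):
        self.handled.append(payload)

class ActiveManager:
    """Creates sessions when code arrives; discards data for unknown sessions."""
    def __init__(self):
        self.sessions = {}

    def receive(self, sid, kind, payload):
        if sid in self.sessions:
            self.sessions[sid].deliver(payload)          # existing session gets the packet
        elif kind == "code":
            self.sessions[sid] = Session(sid, payload)   # code triggers session creation
        # a data packet for a non-existing session is silently discarded

am = ActiveManager()
am.receive(42, "data", b"early")    # discarded: no session 42 yet
am.receive(42, "code", b"program")  # creates session 42
am.receive(42, "data", b"later")    # delivered to session 42
```

The same dispatch covers both ends of the spectrum: a capsule is a session that handles one packet and terminates, while a "well-known" session id acts as a programmable switch.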
Relying on full topology information being available at every node does not scale. To tackle this problem we support a topology-blind addressing mode that enables a node to send a packet to the nearest active node in a certain direction. This mode is useful for topology learning, robust operation, support of heterogeneous (active and non-active) environments, etc. We also support the explicit addressing mode, in which a packet is sent to a specific active node.
¹ The AE can be perceived as an execution environment in the context of [6].
Overall, we built a system that enables the safe execution and rapid deployment of new distributed management applications in the network layer. This system can be gradually integrated into today's IP network, and allows smooth migration from IP to active networking. To facilitate this, we introduce novel architectural features such as: isolation of the active mechanism, the session concept, the ability of active sessions to control non-active packets, and blind addressing.
3 Related Work
Research in active networking has recently been gaining popularity. Some of the research groups in this area are: MIT (ANTS) [16]; U. Kansas [10]; U. Penn. (SwitchWare) [1]; Georgia Tech. [4]; Columbia (NetScript) [17]; UC Berkeley (MeGa) [3]; Washington University (DAN) [5]; and more. In this section we compare our architecture with the menagerie of existing active network architectures and with other agent-based approaches. In our design, we separate the active engine, where active code is executed, from the router itself. This approach makes deployment easier and poses less threat to the non-active traffic in case the active engine breaks down. A similar approach was recently reported by Amir et al. [3]; however, they limit the scope of their active server to the application level and thus limit its capabilities. Bhattacharjee et al. [4] also suggested a similar approach, but for a very restricted active server that can support only a given set of functions. In most other works, the active part and the non-active part are not well separated. ANEP [2] is incorporated into several of the existing projects. It is already used in [10] and will be used in [16]. We can only speculate whether the current research projects will eventually converge to a unified environment using ANEP as the wire encapsulation. Interestingly, SNMP escaped the notice of all the active network projects but one [17]. We believe the use of SNMP is the most attractive option to integrate active network technology with existing routers. Most agent-based systems [8,9] reside in the application layer and thus do not have access to network layer information. A recent first step in addressing the need of agents to interface with network layer information is presented by Zapf et al. [18]. They allow their application layer agents to access router information through an intermediate resident application in the router, using an SNMP interface.
Another interesting work was presented by Hjálmtýsson and Jain [7]. They built a system where installed agents can manipulate data streams in a router. Though similar in flavor to this work, Hjálmtýsson's work suggests a new router design, while we emphasize the use of legacy routers.
4 Architecture

4.1 Design Principles
Since we target specifically the network management domain, the following principles guided our design:
Generality and Simplicity — Building applications should be easy for a large base of programmers. Thus, the system should not be limited to one language, and should support languages that are in general use. The node should also be general enough to support many levels of active networking, from capsules to programmable switches.

Modularity — We separate the active node into modules with clearly defined APIs between them. In particular, we chose to separate the forwarding mechanism of a regular router from the operating environment where the packets are executed. We also use, as much as possible, well-accepted standards, such as Java, SNMP, and ANEP [2], as the APIs through which the modules exchange information.

Inter-operability and Heterogeneity — Most likely, active nodes will co-exist with non-active routers. Furthermore, incremental deployment of active nodes alongside existing routers seems a natural evolution path. In such a scenario it is unrealistic to assume that an application running on an active node could explicitly know the addresses of its active neighbors. To this end we support "blind" addressing, in which the active node need not know the address or the location of other active nodes.

Long-Lasting Sessions — In many network management applications there is a natural need for an application to reside in a node for a long period of time (for example, to do monitoring and billing). This cannot be efficiently implemented with capsules. As we target the network management domain, we specifically design the system to support such applications. It is also very important for an application to have easy and standard access to the local information at a node, since in many applications the action taken by the packet depends on this information.
Cost Visibility — Although we wish to abstract most of the technical details in order to simplify the development of applications, we think that the application must be aware of the costs, both in terms of node resources (CPU, memory, etc.) and in terms of global network resources (bandwidth and delay). Therefore, we do not use advanced distributed tools such as CORBA and Java RMI, which in general hide much of the actual cost from the user.

Safety and Security — Non-active traffic should not be affected by the new active ability. Furthermore, an active application should not be able to affect any other application. The system should support security and robustness at all levels.

The above principles directed us to make the following architectural decisions: (1) the active node is composed of a regular router with a diverter, which detects and diverts active packets to the main separate component, the Active Engine; and (2) the AE is a separate entity (which may reside on a different card, or on a separate machine) that performs most of the active node's tasks. This simple modular structure supports inter-operability, and does not require that the specific address of the next active hop be known. The diverter part is fairly simple, and can be carried out using IP filtering, which is supported at the API level by most router vendors. This structure also allows easy incremental deployment in heterogeneous networks. Another advantage of this design is robustness: non-active traffic cannot be affected by active traffic.
Even if for some reason the active engine stops working, the router will still route non-active packets correctly. The second significant entity in our design is the session, which serves both as a mechanism that preserves soft state and as a rendezvous point for data fusion. Logically, a session is a distributed task performed in the network. A session has a unique network id; thus, different programs on various nodes can belong to the same session. These programs may exchange information using active data packets, and they can distribute (and/or update) their code by sending active programs. This notion of a session is general enough to support both long-lasting processes and short-term capsules. The fine details of the design are described in the following subsections.
4.2 Detailed Design
[Figure 1 depicts the general architecture: sessions 1 through n run inside the Active Engine, whose manager and security modules communicate with the router (forwarding path and diverter) via SNMP and IP.]
Fig. 1. The general architecture.

The main components of the system are (see Figure 1):

Diverter — A part of the router that enables it to divert packets to the AE based on their IP/UDP header. The new generation of high-performance IP routers [11] has this option implemented as part of the router hardware. Edge routers and our prototype perform this function in software.

Active Manager — The core of the AE is the Active Manager. This part generates the sessions, coordinates the data transfer to and from the sessions, and cleans up after a session when it terminates. While a session is alive, the Active Manager monitors the session's resource usage, and can decide to terminate its operation if it consumes too many resources (CPU time or bandwidth) or if it tries to violate its action permissions.

Security Stream Module — This module resides in kernel space, below the IP output routine. Every connection that a session wishes to open must be registered with this module, to allow monitoring of network usage by sessions. The registration is done by our supplied objects, transparently to the application developer. The module is not fully implemented yet.
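The diverter's decision can be sketched as a simple classifier over the UDP header, assuming the prototype's port conventions (one assigned port for blind addressing, another for explicit addressing; the actual diverter is an IP filter, and this Python illustration is ours):

```python
BLIND_PORT, EXPLICIT_PORT = 3322, 3323  # assigned active-network UDP ports

def divert(pkt, node_addr):
    """Return True if this active node should divert pkt to its AE."""
    if pkt["proto"] != "udp":
        return False                    # non-active traffic takes the fast track
    if pkt["dport"] == BLIND_PORT:
        return True                     # blind: first active node en route grabs it
    if pkt["dport"] == EXPLICIT_PORT:
        return pkt["dst"] == node_addr  # explicit: divert only at the addressed node
    return False

pkt = {"proto": "udp", "dport": 3322, "dst": "10.0.0.9"}
divert(pkt, "10.0.0.1")  # True at any active node on the path
```

Note how the two addressing modes differ only in the diversion test: a blind packet is caught by whichever active node sees it first, while an explicit packet rides the fast track until it reaches its named destination.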
Router Interface — This module allows sessions to access the router's management information base (MIB). It is implemented as a Java object that communicates with the router using SNMP. In the future we plan to enhance performance by caching popular MIB variables.

The design allows multiple languages to be supported simultaneously, but since the current implementation handles only Java packets, we will restrict the description to the details of the Java implementation. Implementation of other languages may require some adaptations according to the language specifics. In the following we describe the flow of packets through the system. Note that a non-active packet does not pass through the Active Engine, since the diverter recognizes it as such and the packet thus takes the fast track to its output port. All active packets include a default option that contains, among other things, the unique session id of the packet and a content description (data, language). All the diverted packets are sent to the active manager. If a packet does not belong to an existing session and it contains code, it triggers the creation of a session. If it is a data packet, it is discarded. Session creation involves, among other things, authentication (not implemented), creation of a control block for the session, creation of a protected directory to store session files, opening of a private communication channel through which the session receives and sends active packets, and execution of the code. Methods associated with the session object allow the Java program to easily send itself to another node, and to send and receive data. Newly arriving programs are passed to the session, to allow it to perform code updating without losing its state. Four UDP port numbers (3322-5) are assigned to active network research. The first, the blind addressing port, is used to send active packets to an unspecified node in a certain direction, i.e., towards some distant destination.
The diverter in the first active node that is on the route to that destination intercepts the packet and sends it to the active engine. Therefore, the sender is not required to know the address of the next active node. The second UDP port number (the explicit active port) is used to send an active packet to a specific active node. This packet is forwarded through the fast track of all the intermediate active nodes, and is not diverted until reaching its destination. The active manager keeps track of the resource consumption of the session in the node (CPU time, bandwidth, disk space). A session that consumes excessive resources is aborted. A session may also be aborted due to lack of activity. Since we expect most network programming to be stable, we do not try to optimize the capsule model. Thus, we are less concerned about program size, as programs are not going to be transmitted frequently. A mechanism to reassemble a program from a chain of UDP packets is currently implemented.
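Active packets carry the session id and content description in ANEP headers over UDP. A minimal encode/decode sketch in Python (the fixed-header layout, version, flags, type id, header length in 32-bit words, and packet length in octets, follows our reading of the ANEP draft [2] and should be treated as an assumption; options are omitted):

```python
import struct

# Fixed ANEP header: version, flags, type id, header length (32-bit words), packet length
ANEP_FIXED = struct.Struct("!BBHHH")

def anep_encapsulate(type_id, payload):
    """Wrap payload in a minimal ANEP header with no options."""
    hdr_words = ANEP_FIXED.size // 4          # 8-byte fixed header = 2 words
    pkt_len = ANEP_FIXED.size + len(payload)  # total length in octets
    return ANEP_FIXED.pack(1, 0, type_id, hdr_words, pkt_len) + payload

def anep_decapsulate(datagram):
    """Parse the fixed header and return (type_id, payload)."""
    version, flags, type_id, hdr_words, pkt_len = ANEP_FIXED.unpack_from(datagram)
    assert version == 1 and pkt_len == len(datagram)
    return type_id, datagram[hdr_words * 4:]

wire = anep_encapsulate(18, b"java bytecode...")
tid, body = anep_decapsulate(wire)
```

In the prototype, such a datagram would then be sent to the blind or explicit active UDP port; the receiving AE decapsulates it and hands the payload to the session named in the options.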
4.3 Security
Security and safety are of major concern in the deployment of active networks. A system is safe if no application can destroy or damage the appropriate execution
of other applications. In particular, the active engine as a whole should not affect the routing of non-active packets. A system is secure if all operations, including access to data, are authenticated, i.e., only authorized sessions can perform actions and/or access private data. Our architecture supports both security and safety, although currently it is not fully implemented. In any design, one faces the dilemma of choosing between the freedom to allow more sophisticated session behavior (e.g., setting MIB variables, diverting non-active packets) and the fear of a possible safety/security hole. Our approach allows multiple levels of security via authentication and session classification. Each session is authorized to use specific services (MIB access for read or write, diverting non-active packets) and resources (CPU time, bandwidth, memory). As it is important to ensure both safety and security in order to promote the use of active networks, one can initially be more restrictive in authorizing services, and gradually allow more sophisticated services. Our first concern is to make sure that non-active packets are not affected by the active packets. This is easily achieved by the logical (and sometimes physical) separation of the active engine from the router. The next step in safety is to ensure that a session will not corrupt, or even gain access to, other sessions' data. We achieve this through the use of the Java SecurityManager. It allows us to control the session's running environment; in particular, we prevent sessions from using native methods and restrict the use of the file system. Malicious or erroneous overuse of system resources is of great concern. To this end, we intend to monitor the use of CPU time by sessions. We implemented tight control over the usage of the communication channel to the outside world. TCP connections can be opened only by a permitted session, using our supplied methods that monitor the bandwidth consumption.
An attempt to use Java methods directly is blocked by controlling the IP layer in the active engine. An unauthorized connection will be dropped. UDP packets can be sent only through the manager, which again can monitor the bandwidth usage.
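The per-session bandwidth accounting might look roughly like the following sketch (our own Python illustration; in the prototype this enforcement lives in the supplied Java send methods and the kernel-level security stream module, and the quota policy shown here is invented):

```python
class SessionChannel:
    """Wraps a session's outbound channel and enforces a byte quota."""
    def __init__(self, quota_bytes):
        self.quota = quota_bytes
        self.sent = 0
        self.aborted = False

    def send(self, data):
        if self.aborted:
            return False
        if self.sent + len(data) > self.quota:
            self.aborted = True      # excessive consumption: the session is aborted
            return False
        self.sent += len(data)
        return True

ch = SessionChannel(quota_bytes=100)
ch.send(b"x" * 80)   # within quota: accepted
ch.send(b"x" * 40)   # would exceed quota: refused, session aborted
```

Routing all traffic through such a wrapper is what makes the accounting trustworthy: a session cannot open a raw socket of its own, so every byte it emits is counted.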
4.4 Test Bed Description
We built a small heterogeneous network, as shown in Figure 2. The network is comprised of both FreeBSD-based active routers and COTS routers (currently we use Cisco 2500 routers and a Lucent Technologies RABU PortMaster3) with adjunct active components. The FreeBSD routers are PCs running FreeBSD, using routed for routing. In these PCs, the active engine and the router co-reside in the same machine.

Performance — In building the prototype we did not aim at performance. Our first target was to build a concept system that would enable us to test design ideas and applications. Thus, many parts of the system were not optimized for performance. Nevertheless, we tested the capabilities of our prototype. The delay of an active packet through the system should therefore be treated as an upper bound on what can be accomplished, and the load measures as a lower bound.
[Figure 2 shows the prototype network; its nodes are named tishrey, heshvan, kislev, razcisco, adar, tevet, and shvat, with links to the Internet, and the legend distinguishes routers from active engines.]

Fig. 2. The prototype network architecture.
Our first experiment was to measure the delay of an active packet through one active node. To this end we used a session on heshvan that forwards every packet it receives to the destination indicated on the packet. Java applications on tishrey and kislev (see Figure 2) exchange UDP packets using either an active port that is diverted to heshvan's active engine, or a non-active port that is forwarded by heshvan's router. The packets were about 500 bytes long, and the network was kept without additional traffic to prevent queueing delay from affecting the results. The average round-trip delay (RTD) for a packet was 1.37 ms without diversion, and 11.20 ms with diversion. The 90% confidence intervals are 1.37±0.0047 ms (~0.68% wide) and 11.20±0.022 ms (~0.39% wide). Thus, the average delay through heshvan's active engine was (11.20 − 1.37)/2 = 4.915 ms. Heshvan is a Pentium machine with 64 Mbytes of memory that runs at 200 MHz. It is obvious that the active engine's performance is bounded by the memory access speed and not by the computation power of the processor. We did not conduct a full-scale stress test, but we repeated the above experiment with ten sessions active in heshvan, and the delay through the active engine did not change.
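The one-way delay through the active engine is derived from the two round-trip measurements; the diversion cost is paid once in each direction of the round trip, so halving the RTD difference isolates it:

```python
rtd_plain, rtd_active = 1.37, 11.20   # measured round-trip delays in ms

# The extra cost of diversion appears once per direction,
# so the one-way delay through the active engine is half the difference.
ae_delay = (rtd_active - rtd_plain) / 2
```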
5 Application Examples
To demonstrate the power of our system, we consider two problems, both related to network management, that represent basic building blocks usable in various applications. The first is bottleneck detection, which is a special case of collecting information or calculating a function along a route between two nodes. The second is message dissemination to a large group of receivers, useful for automatic configuration of network elements or any other application that requires disseminating messages to a large population. In this section, we briefly discuss the bottleneck detection application and refer the reader to [13] for more details. Bottleneck detection is an important problem faced in network management. It is a building block for higher-level applications, e.g., video conferencing, that require QoS routing. It is also an example of the general problem of gathering information along a given path between two network nodes.
In today's IP networks there is only one ad-hoc technique to examine one specific QoS parameter, namely the delay along a path: the well-known traceroute program, which enables a user at a host to obtain a list of all the routers on the route to another host, with the elapsed time to reach each of them. The use of the traceroute program for network management has several drawbacks: it can only retrieve the hostnames and delays along a path; it is extremely inefficient in its use of network resources; and it is slow.
Fig. 3. Three traceroute executions on a three-hop path: (A) the current program; (B) collect-en-route; and (C) report-en-route.
In an active network, and specifically in our architecture, there are several options to gather information along a given path between two network nodes, each optimizing a different objective. One option (collect-en-route, see Figure 3(B)) is to send a single packet that traverses the route and collects the desired information from each active node. When the packet arrives at the destination node, it sends the data back to the source (or to any management station). This design minimizes the communication cost, since a single packet travels along each link in each direction. Another option (report-en-route, see Figure 3(C)) is to send a single packet along the path. When the packet arrives at a node, it sends the required information back to the source and forwards itself to the next hop. This design minimizes the arrival time of each part of the route information, at the expense of communication cost. Note that the traceroute program (see Figure 3(A)) has time and communication complexities that are quadratic in the path length. Table 1 compares the three options. The use of general programs in the capsule enables the application programmer to query any available MIB variable from the router, rather than just the router's IP address. For example, for bottleneck detection we can collect statistics about TCP packet loss along a route to a certain host. It is easy to generalize the program to start the data collection from any active node in the network (not necessarily the originator), and to send the reports to any other node.
Algorithm         No. of messages used   Time of data arrival from node i
traceroute        n(n + 1)               i(i + 1)
collect-en-route  2n                     2n
report-en-route   n(n + 3)/2             2i

Table 1. Performance comparison (time is measured in hop count).
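The entries of Table 1 can be checked with a small simulation (our own sketch; each message is one hop traversal, time is measured in hop counts as in the table, and traceroute probes are assumed to be sent sequentially):

```python
def traceroute(n):
    # Probe node i with TTL i: i hops out, i hops back, one probe after another.
    msgs = sum(2 * i for i in range(1, n + 1))
    arrival = [sum(2 * j for j in range(1, i + 1)) for i in range(1, n + 1)]
    return msgs, arrival                  # n(n+1) messages, node i's data at i(i+1)

def collect_en_route(n):
    # One packet collects data over n hops; one report travels n hops back.
    return 2 * n, [2 * n] * n             # 2n messages, everything arrives at 2n

def report_en_route(n):
    # The packet forwards itself n hops; node i's report travels i hops back.
    msgs = n + sum(range(1, n + 1))
    return msgs, [2 * i for i in range(1, n + 1)]   # n(n+3)/2 messages, node i at 2i

n = 5
traceroute(n)   # (30, [2, 6, 12, 20, 30])
```

The simulation makes the trade-off concrete: collect-en-route is cheapest in messages but delivers everything at the end, while report-en-route pays roughly n²/2 messages to deliver each node's data as early as possible.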
6 Discussion and Future Work
Since the inception of the active network idea there has been a search for the "killer application", the one that will strongly require active network technology. We believe that network management is a domain where the use of active network technology could prove very significant. Applications like adaptive control, router configuration, element detection, network mapping, and security management (intruder detection, fighting denial-of-service attacks) are only some examples where active network technology can be successfully applied. Our architecture also supports solutions to other problems not necessarily part of network management, such as search worms, smart mail, multicast, hop-by-hop flow control, etc. The additional delay seen by packets in active networks is an issue that was addressed in the past, and, as others have pointed out, we believe the small slow-down is compensated by the big savings that can be achieved in traffic volume. The analysis presented in Section 5 is a first step towards a more formal framework to address this point. In our prototype, non-active packets suffer only the negligible additional delay of the diverter. And although the active engine is not optimized in the current version, as the emphasis is on functionality, the delay through the AE is reasonable. Altogether, we presented a prototype implementation of a network management engine built using active network technology. We believe our approach can enable new and efficient ways to manage today's (and tomorrow's) networks.
References 1. D. S. Alexander et al. The SwitchWare active network architecture. IEEE Network, 12(3):29–36, May/June 1998. 223 2. D. S. Alexander et al. The active network encapsulation protocol (ANEP). URL http://www.cis.upenn.edu/∼switchware/ANEP/docs/ANEP.txt, 1997. 221, 223, 224 3. E. Amir, S. McCanne, and R. Katz. An active service framework and its application to real-time multimedia transcoding. In SIGCOMM’98, Sept. 1998. 223 4. S. Bhattacharjee, K. Calvert, and E. W. Zegura. An architecture for active networking. In HPN’97, Apr. 1997. 223 5. D. Decasper and B. Plattner. DAN: Distributed code caching for active network. In INFOCOM’98, Mar. 1998. 223
6. AN Working Group. Architectural framework for active networks. URL http://www.cc.gatech.edu/projects/canes/arch/arch-0-9.ps, August 31, 1998. Version 0.9. 222 7. G. Hjálmtýsson and A. Jain. Agent-based approach to service management towards service independent network architecture. In IFIP/IEEE IM’97, pages 715–729, May 1997. San Diego, CA, USA. 223 8. G. Karjoth, D. B. Lange, and M. Oshima. A security model for aglets. IEEE Internet Computing, 1(4):68–77, July/August 1997. 221, 223 9. J. Kiniry and D. Zimmerman. A hands-on look at Java mobile agents. IEEE Internet Computing, 1(4):21–30, July/August 1997. 221, 223 10. A. B. Kulkarni, G. J. Minden, R. Hill, Y. Wijata, S. Sheth, H. Pindi, F. Wahhab, A. Gopinath, and A. Nagarajan. Implementation of a prototype active network. In OPENARCH’98, pages 130–143, Apr. 1998. 221, 223 11. V. P. Kumar, T. V. Lakshman, and D. Stiliadis. Beyond best effort: Router architectures for the differentiated services of tomorrow’s Internet. IEEE Communications Magazine, 36(5):152–164, May 1998. 220, 225 12. E. L. Nygren. The design and implementation of a high-performance active network node. Master’s thesis, Massachusetts Institute of Technology, Feb. 1998. 221 13. D. Raz and Y. Shavitt. An active network approach to efficient network management. Technical Report 99-25, DIMACS, May 1999. 228 14. D. L. Tennenhouse et al. A survey of active network research. IEEE Communications Magazine, 35(1):80–86, Jan. 1997. 220 15. A. Vogel and K. Duddy. Java Programming with CORBA. Wiley, 2nd ed., 1998. 221 16. D. Wetherall et al. ANTS: A toolkit for building and dynamically deploying network protocols. In OPENARCH’98, pages 117–129, Apr. 1998. 223 17. Y. Yemini and S. da Silva. Towards programmable networks. In IFIP/IEEE Intl. Workshop on Distributed Systems Operations and Management, Oct. 1996. 223 18. M. Zapf, K. Herrmann, K. Geihs, and J. Wolfgang. Decentralised SNMP management with mobile agents. In IFIP/IEEE IM’99, May 1999. 223
Virtual Networks for Customizable Traffic Treatments

Jens-Peter Redlich, Masa Suzuki, and Steve Weinstein
C&C Research Laboratories, NEC USA, Inc.
4 Independence Way, Princeton, NJ 08540, USA
{redlich,masa,sbw}@ccrl.nj.nec.com
Abstract. Selective treatments are needed for different types of traffic and different user groups, even in the Internet. Virtual networks can help to partition physical network resources, where the resulting partitions may implement their own, independent control and processing mechanisms. This allows for customizable traffic treatment that is not available in either Integrated Services or Differentiated Services. This paper describes a networking strategy incorporating intelligent routers that implement the gateway between the end-user and the core (virtual) network. Intelligent routers classify end-user traffic and assign it to virtual networks provided by the core network, according to a programmable policy. Furthermore, an intelligent router may process transmitted data, according to the QoS needs of the application and the core network's resource allocation, as well as in support of higher-level application functionality. Open, CORBA-based interfaces allow for control of the intelligent router, including dynamic download of code that is used to extend the router's functionality on the fly as new applications or user requirements need to be supported.
1 Introduction
A virtual network (VN) is a service concept. It is characterized by a set of communication capabilities for an ensemble of communication calls, flows, or sessions that have something in common with one another. The Internet as a whole may be characterized as a virtual network which provides a best effort data forwarding service, augmented with higher-level services, such as name services (DNS), management (ICMP, SNMP), and others, with a protocol stack grounded on IP. Other virtual networks may use the same physical infrastructure, but provide additional services or QoS features. Access to these VNs is usually restricted, as the VN features are not universally available or are reserved for use by authorized users. Efforts such as COPS [ref] and DIAMETER [ref] are beginning to define client/server protocols to exchange policy information needed for allocation of resources to virtual networks.
Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 232-239, 1999. Springer-Verlag Berlin Heidelberg 1999
Virtual networks are needed to avoid, on the one hand, the cost and complexity of providing a dedicated physical network for each traffic type with its associated preferred treatment and, on the other hand, building a very complex "all services" network. Figure 1 illustrates the three alternatives of overlaid physical networks, overlaid virtual networks, and an all-services network (as Broadband ISDN was envisioned to be [1]). Creating VNs is really a modularization strategy for an all-services network that is scalable (in services) and not overwhelmingly complex. Some potential benefits, in addition to the architectural benefit described above, are:
• Creation and deletion of special treatments (routes, processing, etc.) for aggregations of traffic, as needed.
• Use of different, customized control algorithms in the different VNs.
• Built-in separations (enhancing privacy) between traffic aggregations with different sensitivities, purposes, or owners.
• Provision of virtual private network (VPN) services in the Internet with the same or greater control capability that communications services customers enjoy with VPN services in the switched telephone network.
There are, of course, some difficulties and potential disadvantages of VN architectures. First, there is the inefficient use of bandwidth, if pre-allocated in bulk for each VN rather than maintained as one fully shareable resource [2]. Second, it is a severe security challenge to let multiple entities control and manage pieces of the same physical resource [3]. The first concern is being addressed with techniques for quickly reallocating resources among VNs, and the second with operating system-like memory protection and CPU scheduling mechanisms, in conjunction with encryption, authentication, authorization and logging services.
Fig. 1. Three ways to realize an all-services communications capability, illustrated for voice and data traffic aggregations. (a) Physical overlay networks. (b) Overlaid virtual networks. (c) A single all-services network.
2 Traffic Customization Precedents: Traffic Grooming, Virtual Private Networks, and Virtual Paths
The concept of virtual networks is already well established in public telecom networks, albeit with limited flexibility. Three forms of virtual networks are common:
1. Traffic grooming.
2. Virtual private networks.
3. Virtual paths.
Traffic grooming is a very old technique used in the telephone network to group traffic with similar characteristics, often related to destination, in order to utilize transmission facilities more efficiently. More recently, SONET add/drop multiplexers have been designed to route local traffic directly from one low-speed port to another, rather than multiplexing and de-multiplexing it in the high-speed pass-through stream. Voice and data traffic may be separated in digital cross connects (DXCs) and sent through separate trunks to voice and data switches, and Digital Subscriber Line (xDSL) services separate data from voice traffic at the central office. Finally, traffic grooming is appearing in SONET/WDM ring networks as a way of reducing multiplexing costs by grouping similar traffic on particular wavelengths [5].
Virtual private networks (VPNs) are provided to large customers, such as companies linking different locations, who "...do not perceive that they are sharing a network with each other ... you think you have it, but you don't" [1]. A telephone DXC or switch can support the virtual links of several large customers, each of whom may be given some degree of control, particularly in reconfiguring cross connects. A virtual private network is ordinarily changed infrequently and slowly, and is usually restricted to grouping traffic on an owner (source and destination address) basis. Virtual private networks will continue to be an important virtual network category, with more recent work focusing on finer resource-dividing strategies and more customized control of routing and other VPN-specific treatments [6].
Virtual paths (VPs) [1] are groupings of ATM virtual circuits (VCs) that are traveling between the same end switches, sometimes groomed so that a VP carries virtual circuits of a particular service class or a particular user or user group. Use of virtual paths allows reuse of virtual circuit identifiers (VCIs) for circuits using the same switches but having different virtual path identifiers (VPIs), and reduces the processing load on intermediate switches, which need only process VPIs. Virtual paths are useful in realizing both traffic grooming and VPNs. However, VPs are still just address-controlled routing mechanisms and do not allow the full range of aggregated traffic definitions and treatments that could be possible in virtual networks.
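The VPI-only processing at intermediate switches can be illustrated with a minimal sketch (ours, not from the paper): the switch keeps one forwarding entry per virtual path rather than per circuit, and VCIs pass through untouched, so two circuits inside the same VP share a single table entry.

```python
# Illustrative sketch (not from the paper): why VPI-only switching reduces
# per-switch state. An intermediate switch keeps one table entry per
# virtual path (VPI), not per virtual circuit (VPI, VCI).

def make_vp_switch(vpi_table):
    """vpi_table maps incoming VPI -> (outgoing port, outgoing VPI)."""
    def forward(cell):
        port, out_vpi = vpi_table[cell["vpi"]]   # lookup on VPI only
        # The VCI is carried through untouched; only end switches examine it.
        return {"port": port, "vpi": out_vpi, "vci": cell["vci"]}
    return forward

switch = make_vp_switch({7: ("east", 12)})
# Two circuits inside the same VP share the single table entry:
assert switch({"vpi": 7, "vci": 101}) == {"port": "east", "vpi": 12, "vci": 101}
assert switch({"vpi": 7, "vci": 202}) == {"port": "east", "vpi": 12, "vci": 202}
```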
3 Intelligent Edge Router
Today's Internet applications are not prepared to use virtual networks explicitly, i.e. to select a virtual network according to the application's QoS requirements and to utilize its built-in processing and control features. New Internet signaling protocols, such as RSVP, are a first attempt to let applications explicitly request and configure resources from core network elements. Still, most applications are not RSVP-aware, and it will take some time until applications can assume ubiquitous support for RSVP in the network core. In addition, an application that signals directly to network elements lacks the global picture. It does not know about other applications or about other users. It does not know which services are most crucial to the end-user, as opposed to those that run just as background entertainment. Moreover, the application does not know about the importance of a user within a community such as a corporation. How should a Web surfer know that he should reduce his resource consumption because the marketing department has an IP telephony session with a big potential customer? To overcome these problems, we propose an architecture that decouples applications from the resource allocation mechanisms of the core network. In this architecture, the resource allocation function is provided by the router that connects an end-user LAN with the Internet core network. Because this router provides programmable functions in addition to its routing and forwarding function, we refer to it as an intelligent router (IR). The concept of allocating resources to traffic on the basis of attributes of the traffic, rather than specific signaling requests, is not new. There are very limited realizations in, for example, firewalls that block traffic from certain addresses or applications. What is new here is the flexibility of classification and resource allocation.
We can quickly program new criteria for traffic classification and set up a wide range of treatments for the VNs associated with these traffic classes. This is primarily supported by the intelligent router's built-in capability to dynamically download compiled program code (shared libraries) that is linked to the router at runtime. Furthermore, we propose an open programming interface for setting up, controlling, and managing VNs from possibly remote locations.
Fig. 2. An Intelligent Router (IR) is used to decouple applications on the LAN from the resource allocation mechanisms used inside the core network. A Virtual Network may include processing capabilities in addition to data forwarding service.
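The dynamic code download that extends the router on the fly can be approximated in a few lines. This is a hypothetical Python analogue of linking a downloaded shared library at runtime; the module layout and the `classify` entry point are our assumptions, not the paper's actual mechanism.

```python
# Hypothetical sketch of on-the-fly extension of a router's classification
# logic, in the spirit of dynamically linked shared libraries.
# The plugin contract (a module exporting classify(flow)) is illustrative.
import importlib.util

def load_classifier(path, module_name="custom_classifier"):
    """Load a classifier plugin from a downloaded source file at runtime."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)       # "link" the new code into the router
    return module.classify                # plugin must export classify(flow)
```

A running router could call `load_classifier` whenever a new treatment is installed, replacing the classification function without restarting the forwarding path.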
Figure 2 shows our configuration. We assume that, for all communication between any two hosts on the LAN, bandwidth and maximum delay meet all application requirements, or, if not, that there are mechanisms that can resolve those LAN-resource allocation conflicts. These mechanisms are outside the scope of this paper. On the other side, the intelligent router is connected to an ISP, which may offer several virtual networks to its customers. Each of these virtual networks may have very specific QoS characteristics, pricing structures, and control interfaces. However, as explained above, these virtual networks may share the same physical hardware (e.g. a wire coming out of the wall) or they may actually use a composition of physical access facilities, perhaps belonging to different ISPs. In addition to the virtual networks that are provided by the ISP, the intelligent router may implement its own set of virtual networks, each one with unique, value-added functions and interfaces. The intelligent router is responsible for assigning the resources provided by the ISP to the various flows of IP packets that are emitted from the applications running on the LAN. This resource allocation process is governed by a policy, which is usually defined by the LAN administrator. For a small company's network, this policy may, for instance, require that:
- Traffic originated from the CEO has preference over traffic from staff members.
- FTP traffic during working hours has lowest priority (except for those people that are assigned to a high-priority software development project).
- HTTP traffic from summer interns is blocked completely (except traffic to allowed Web sites, such as the company's headquarters web server).
- Email has preference over FTP.
In order to make its decisions, the intelligent router must analyze the traffic it has to forward. The source IP address can be used to determine the user who is associated with this traffic (assuming that an additional component maintains a mapping between user identifications and the IP addresses of their machines). If UDP or TCP is used (which is likely), the port numbers can be used to identify the service/application that is associated with this traffic. For big servers, e.g. those of an Internet bank, the destination IP address can be used to determine the associated service. In addition, the application may use signaling, such as RSVP, to indicate its requirements. However, in this case, RSVP is terminated at the intelligent router and is used only to provide additional information about the associated flow of IP packets. An Internet telephony example helps illustrate how implicit or explicit application requirements are mapped to ISP resources, i.e. to the virtual networks provided either directly by the ISP or by the intelligent router. We assume that the Internet telephony application uses RSVP to signal its requirements for bandwidth and maximum delay to the network. This RSVP signaling is terminated at the intelligent router. Depending on the policies that the LAN administrator defined for the associated user and for the Internet telephony service, the company's CEO may get an ATM switched circuit for his traffic, with bandwidth and delay requirements derived from the RSVP information. Staff members might get the "gold service" from the ISP's differentiated services virtual network, which in most cases shows good enough behavior for this type of traffic (but which comes without any guarantees).
Summer interns may run similar applications, but since their traffic is assigned to the "best effort" virtual network, they may not be quite satisfied with the QoS they get most of the time. An alternative to using an ATM switched circuit for the CEO's traffic is setting up a path through the Internet core that guarantees a high service quality for the IP telephony session. This assumes, however, that the ISP supports RSVP in its core network. If the IP telephony application sends its traffic without any additional (RSVP) signaling, the intelligent router applies its default policies without specific knowledge of bandwidth and delay requirements. For instance, for the CEO's telephone call, the intelligent router can use RSVP to reserve nominal resources in the ISP's core network (if RSVP is implemented there), even if the CEO's IP telephony application itself is unable to use RSVP. We see here a compromise between intelligent end systems and an intelligent network in order to provide higher quality service even if the end system has not evolved to the latest stage. Last but not least, the intelligent router could temporarily change its policy in favor of a certain user, if this user either has the privilege to make such changes (service upgrades) or is willing to pay the ISP for the higher service level. The summer intern from the above example could purchase premium network support for his IP telephony application, even if he would usually have to use the "best effort" network. Assuming the availability of an infrastructure that allows for efficient and secure micro-payments, i.e. transactions below one dollar, the payment for the higher service could be provided either by the user himself or by his remote partner. For example, as a courtesy to its customers, the summer intern's bank could provide him with high service quality when he accesses his bank account through the bank's Internet site.
The use of the intelligent router for decoupling an application's request for a certain QoS from the network's resource management has the additional advantage that different mechanisms, i.e. protocols, abstractions, etc., can be used in the LAN and in the Internet core network. As alluded to in the above examples, new protocols for letting an application express its QoS requirements can be introduced long before such support is available in the Internet core network. Moreover, the intelligent router may use the very efficient native resource management subsystems and signaling protocols of the underlying network in order to meet the application's requirements. Hence, an application may use native ATM without knowing about ATM at all.
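The classification and policy behavior described in this section can be sketched as a first-match rule table. The sketch below is illustrative only; the users, virtual network names, and port numbers are our assumptions, not taken from the prototype.

```python
# Illustrative sketch (not the paper's implementation) of first-match flow
# classification: an ordered policy maps flow attributes to a virtual network.
# Users, VN names, and port numbers are assumptions for the example.

POLICY = [  # (predicate, virtual network), checked in order
    (lambda f: f["user"] == "ceo",                         "VN1-IntServ"),
    (lambda f: f["user"] == "intern" and f["dport"] == 80, "blocked"),
    (lambda f: f["dport"] == 25,                           "VN2-Gold"),    # email
    (lambda f: f["dport"] == 21,                           "VN3-Bronze"),  # FTP
]

def classify(flow, default="VN3-Bronze"):
    """Return the virtual network for a flow, per the ordered policy."""
    for predicate, vn in POLICY:
        if predicate(flow):
            return vn
    return default  # unmatched traffic gets best-effort treatment

assert classify({"user": "ceo", "dport": 5060}) == "VN1-IntServ"
assert classify({"user": "staff", "dport": 25}) == "VN2-Gold"
```

Because the rules are ordinary data, a new rule (or a temporary upgrade purchased by a user) amounts to inserting an entry, which matches the paper's emphasis on quickly programmable classification criteria.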
4 Implementation Considerations and Conclusions
The implementation architecture is shown in Figure 3. The programming interface, defined in CORBA IDL, includes the following functionality:
- A programmable pattern factory that produces patterns, which are used for classifying traffic on the basis of bit patterns in IP headers.
- Modification of parameters of schedulers for packet forwarding.
- Specification of parameters of token buckets for traffic delimiting.
- Specification of schedulers and traffic shapers to define VNs.
- Specification of rules for mapping incoming packets into VNs.
- Monitoring of packet counts in each operating VN.
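As a rough illustration of the control surface such an IDL might define, the following Python abstract class mirrors the six functions listed above. The method names and signatures are our invention, not the actual CORBA IDL.

```python
# Hedged sketch of the router control surface; method names and signatures
# are illustrative analogues of the IDL functionality list, not the real IDL.
from abc import ABC, abstractmethod

class IntelligentRouterControl(ABC):
    @abstractmethod
    def create_pattern(self, header_bits): ...        # pattern factory for classification

    @abstractmethod
    def set_scheduler_params(self, vn, params): ...   # packet-forwarding schedulers

    @abstractmethod
    def set_token_bucket(self, vn, rate, burst): ...  # traffic delimiting

    @abstractmethod
    def define_vn(self, scheduler, shaper): ...       # schedulers + shapers define a VN

    @abstractmethod
    def map_pattern_to_vn(self, pattern, vn): ...     # rules for incoming packets

    @abstractmethod
    def packet_count(self, vn): ...                   # per-VN monitoring
```

In the prototype these operations are exposed as remotely controllable CORBA objects; the abstract class above only names the operations a client (such as the Domain Resources Manager) would invoke.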
The capacity assigned to each contributing source of traffic for a particular VN is assigned as a fraction of the total throughput measured for that VN. The assignment is implemented with token buckets. The allocations among different VNs can also be modified. Another component, also shown in Figure 3, is a Domain Resources Manager that executes an algorithm for allocating capacity among different VNs. It does this on the basis of traffic congestion information from neighboring routers, obtained through their programming interfaces. A prototyping testbed is being constructed, using a software router running on a PC with Linux as its operating system. The router components are implemented as CORBA objects that can be remotely controlled. With these objects, virtual networks for different classes of traffic have been set up in an Ethernet environment. Initial results show that the added burden of using CORBA results in an increased forwarding delay of less than 1 ms. This system is intended to be compatible with future CORBA interfaces for network elements following the IEEE P1520 standard [14].
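The token-bucket delimiting used for capacity assignment follows the classic algorithm: tokens accumulate at the contracted rate up to a burst depth, and a packet conforms only if enough tokens are available. A minimal sketch, with illustrative parameter values:

```python
# Classic token-bucket traffic delimiter, in the spirit of the paper's
# per-source capacity assignment within a VN. Parameter values are
# illustrative, not taken from the prototype.

class TokenBucket:
    def __init__(self, rate, burst):
        self.rate = rate          # tokens (bytes) added per second
        self.burst = burst        # bucket depth: max accumulated tokens
        self.tokens = burst       # start with a full bucket
        self.last = 0.0

    def conforms(self, packet_bytes, now):
        """True if the packet may be forwarded at time `now` (seconds)."""
        # Refill according to elapsed time, capped at the bucket depth.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False              # non-conforming: drop or demote the packet

tb = TokenBucket(rate=1000, burst=1500)   # 1 kB/s sustained, 1500 B burst
assert tb.conforms(1500, now=0.0)          # initial burst allowed
assert not tb.conforms(1500, now=0.5)      # only ~500 tokens refilled so far
assert tb.conforms(1400, now=2.0)          # bucket refilled after 1.5 s more
```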
Fig. 3. Implementation architecture of a LAN-based intelligent router.
A small group of everyday networking users in our organization is routed through this infrastructure. We can demonstrate fast creation of VNs and changes in the assignments of different users' traffic to these VNs. One application, for example, is on-demand upgrading of a user's VN assignment on the basis of an incremental service charge. We are developing additional customized traffic treatments in this prototyping environment. We believe that implementations similar to ours will become a common platform for services flexibility in the Internet of the 21st century.
References
1. J. Boyle, "The COPS (Common Open Policy Service) Protocol", IETF draft draft-ietf-rap-cops-05.txt, January 18, 1999.
2. P. R. Calhoun, A. Rubens, "DIAMETER Base Protocol", IETF draft draft-calhoun-diameter-07.txt, Work in Progress, November 1998.
3. U. Black, ATM: Foundation for Broadband Networks, Prentice Hall, 1995, ISBN 0-13-297178-X.
4. J.-F. Huard, A. Lazar, "A programmable transport architecture with QoS guarantees", IEEE Communications Magazine, October 1998.
5. S. Alexander, W. Arbaugh, A. Keromytis, J. Smith, "Safety and security of programmable network infrastructures", IEEE Communications Magazine, October 1998.
6. P. Ferguson, G. Huston, Quality of Service, Wiley, 1998, ISBN 0-471-24358-2.
7. E. Modiano, A. Chiu, "Traffic grooming algorithm for minimizing electronic multiplexing costs in unidirectional SONET/WDM ring networks", 1998 Conference on Information Sciences and Systems, Princeton.
8. J. Rooney, J. E. van der Merwe, S. A. Crosby, and I. M. Leslie, "The Tempest: A framework for safe, resource-assured, programmable networks", IEEE Communications Magazine, October 1998.
9. R. Braden et al., "Integrated Services in the Internet Architecture: An Overview", IETF RFC 1633, June 1994.
10. S. Blake, "An Architecture for Differentiated Services", IETF Internet Draft, August 1998, available at http://search.ietf.org/internet-drafts/draft-ietf-diffserv-arch-01.txt.
11. K. Nichols et al., "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", IETF Internet Draft, August 1998, available at http://search.ietf.org/internet-drafts/draft-ietf-diffserv-header-02.txt.
12. M. Borden et al., "Management of PHBs", IETF Internet Draft, August 1998, available at http://search.ietf.org/internet-drafts/draft-ietf-diffserv-phb-mgmt-00.txt.
13. R. Braden et al., "Resource ReSerVation Protocol (RSVP), Version 1 Functional Specification", IETF RFC 2205, September 1997.
14. R. Atkinson, "Security Architecture for the Internet Protocol", IETF RFC 1825, August 1995.
15. R. Dighe, M. Suzuki, and S. Weinstein, "The Global Internet: A New Perspective on Broadband Access to the Internet", Proc. IEEE Globecom'98, Sydney, November 1998.
16. J. Biswas, "IEEE P1520 Standards Initiative for Programmable Network Interfaces", IEEE Communications Magazine, October 1998.
Flexible Network Management Using Active Network Framework

Kiminori Sugauchi1, Satoshi Miyazaki2, Kenichi Yoshida1, Keiichi Nakane1, Stefan Covaci3, and Tianning Zhang3

1 Systems Development Laboratory, Hitachi, Ltd., 292 Yoshida-cho, Totsuka-ku, Yokohama 244-0817, Japan
{sugauchi,yoshida,nakane}@sdl.hitachi.co.jp
2 Corporate Information Systems Office, Hitachi, Ltd., New Marunouchi Bldg. 5-1, Marunouchi 1-chome, Chiyoda-ku, Tokyo 100-8220, Japan
[email protected]
3 GMD FOKUS, Kaiserin-Augusta-Allee 31, 10589 Berlin, Germany
{covaci,zhang}@fokus.gmd.de

Abstract. The growing intelligence of communication equipment and the advances in communication services have created a demand for more complex and flexible network management functions. It is becoming difficult to handle such a demand with the traditional, simple manager-"management agent" paradigm for network management systems. Mobile agent technology is regarded as one of the promising solutions for handling this demand. We evaluate the efficiency of a mobile-agent-based network management system, quantitatively as well as qualitatively, using our prototype SDH (Synchronous Digital Hierarchy) test management functions. The results show the effective use of mobile agent technology for network management.
1 Introduction
As the network spreads over many companies and homes, many users want to use various types of network services. To satisfy their customers, it is important for network providers to offer various up-to-date network services. The network elements have to have the flexibility to adapt themselves to such new services. In other words, network providers have to manage network resources to satisfy user-specific requests. In this context, network management systems manage each path of an individual user as well as the whole network resource, and are required to have the flexibility so that the network provider can configure the network to satisfy the various requests of each customer.
In this research, we propose the use of an active network framework [9] to realize a flexible network management system. The mobile agent technology developed in the ACTS/MIAMI (Advanced Communications Technologies and Services/Mobile Intelligent Agents for Managing the Information Infrastructure [4]) project plays a central role in our approach. In the ACTS/MIAMI project, a network management system and a service management system using mobile agent technology were developed. The network management in this project focuses on fault management, configuration management, and performance management. In fault management, mobile agent technology is used for alarm collection and correlation analysis. Our research is partially based on the results of the ACTS/MIAMI project. Here we try to implement the programmable network elements which are the key components of an active network.
Some other researchers have proposed the use of mobile agent technology to realize flexible network management [1, 2, 3]. In this paper, we advance these ideas by implementing an SDH (Synchronous Digital Hierarchy) test management system prototype. The evaluation results of the prototype reveal the following advantages of our approach: (1) The use of outbound control helps to solve the security problem [14], which is the most important research issue in realizing a secure active network. (2) The performance problem [13], which is also an important criticism of active networks, can be partially avoided in our implementation. (3) The computing load for network management, which can be an important factor in managing large networks, is decreased by the mobile agent technology.
The rest of the paper is structured as follows: Section 2 presents the idea of using an active network framework in network management. Section 3 describes an SDH test system prototype. Section 4 summarizes the findings.
2 Network Management Using Active Network Framework
In this section, we describe our approach toward a flexible network management system. Programmable network elements, which are the core elements of an active network, are used to realize the flexibility of the network management. The mobile agent technology developed in the ACTS/MIAMI project is used to implement the programmable network elements. The first part of this section describes how to implement programmable network elements using mobile agent technology. Then, we describe the problems in the Telecommunication Management Network (TMN) in detail. The last part of this section explains how to use programmable network elements to solve these problems.
2.1 The Programmable Network Element Using Mobile Agent Technology
The framework of the active network has been proposed to realize flexible network services [5, 6, 7]. The key idea of the active network is the use of programmable network elements. In an active network, each network element is controlled by software. The use of software makes the modification of network functions and the development of new services easy. The hardware provides the basic functions to support the software. The basic functions of the hardware involve data transport functions and hardware resource control functions. The typical hardware considered is switching devices such as routers and ATM switches.
The software is used to implement high-level services to satisfy various requirements. We use mobile agent technology [8] to implement the software. Each mobile agent is a delegated, autonomous piece of software. It is downloaded to a network element and executes there locally to accomplish the user request. In other words, a mobile agent is a program that moves through the network elements and executes its own procedures autonomously.

Fig. 1. The architecture of a mobile agent

The typical services considered in our active network framework are dynamic routing and QoS. Figure 1 shows how the mobile agents move through the network and perform their tasks. Here, a mobile agent consists of a transporting part and a procedural part. The transporting part controls the migration of the mobile agent. The procedural part performs the actual tasks for accomplishing the user request. In some cases, the transporting part uses the processing results of its procedural part. Some researchers call a similar mobile agent a "capsule" and use it to realize dynamic routing [13]. Note that the use of mobile agents, or of a similar concept such as the "capsule", releases the active network from the limitation of predefined protocols. The necessity of protocol predefinition in conventional network technology restricts the flexibility of the network. For example, introducing a new standard protocol tends to take months or years of standardization. Agent technology can make the network flexible by speeding up this process.
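The two-part agent structure described above, a transporting part that controls migration and a procedural part that executes locally at each element, can be sketched as follows. The toy network, element names, and task are our illustrative assumptions.

```python
# Hedged sketch of the paper's two-part mobile agent: a transporting part
# (the itinerary and migration loop) and a procedural part (the task run
# locally on each network element). Names are illustrative.

class MobileAgent:
    def __init__(self, itinerary, task):
        self.itinerary = list(itinerary)  # transporting part: where to go
        self.task = task                  # procedural part: what to do there
        self.results = []

    def run(self, network):
        for element_name in self.itinerary:
            element = network[element_name]          # "migrate" to the element
            self.results.append(self.task(element))  # execute locally
        return self.results

# A toy network of elements and a local task:
network = {"NE1": {"paths": 4}, "NE2": {"paths": 7}}
agent = MobileAgent(["NE1", "NE2"], task=lambda ne: ne["paths"])
assert agent.run(network) == [4, 7]
```

The point of the structure is that only the agent (code plus accumulated results) travels; the data it inspects stays on the element, which is what lets the procedural part offload work from the manager.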
2.2 Problems in Telecommunication Management
The conventional telecommunication management system is based on the TMN model [11]. A typical configuration of the TMN model consists of many operating systems, workstations, and network elements. Besides these, the TMN model has two major networks, i.e. the data communication network (DCN) and the telecommunication network. The telecommunication network transports user data. The DCN, on the other hand, is a special network designed to control the network system. To exchange control information, each network element communicates through the DCN. Management functions in the operating system also use the DCN to operate network elements. The DCN is separated from the telecommunication network physically or logically. SS7 is a network typically used for this purpose. An important characteristic of this architecture is outbound control. The key idea of outbound control is the separation of user data and control data. In this
architecture, the user data cannot access the DCN. This alleviates the security problem, but we will discuss this later. On the other hand, two important problems of the conventional approach to network management are:
• Lack of flexibility: The growing intelligence of communication equipment and the advances in communication services have created a demand for more complex and flexible network management functions. It is difficult to handle such a demand with the conventional telecommunication management system. In the conventional framework, the management function on the network elements is fixed. Thus, if a new management function is required, the network provider has to suspend network services to reconfigure the management function. But the current keen competition between providers makes such service suspension difficult.
• Load balancing of management tasks: In an SDH network, a path between two network elements is a logical unit of data transfer. An STM frame carries data for many paths. Today, a high-speed network such as a 10 Gbps SDH network can support at least 384 individual paths. The basic management task is executed on each path. In some cases, complex configurations have to support thousands of paths. In the initial configuration phase, the network provider has to check the function of each path. However, since the number of paths is not small, even a set of simple connectivity checks can place a large processing load on the network management. It also results in heavy traffic on the DCN. To support a variety of services, a load balancing mechanism which alleviates this problem is necessary.
2.3 The Merits of Network Management Using the Active Network Framework
Figure 2 shows the proposed configuration of the network management system. This configuration is based on the TMN model. Agent-based programmable network
Fig. 2. Network Configuration of Proposed Architecture
elements play a central role in this configuration. In this configuration, each operating system, workstation, and network element uses software agents to execute management tasks. Operating systems, workstations, and network elements have a mechanism to support the agent system working on them. Separated from the user traffic, which is carried by the telecommunication network, mobile agents are carried by the DCN. This configuration has the following merits:
• Flexibility: Since new management functions are easily introduced into the network as new agents, the network management system becomes flexible. When the network provider starts a new network service, the provider has to create a new management agent which supports the new service. Without suspending network services, the provider can activate the new management agent so that it supports the new service.
• Load balancing for network management: Management functions based on mobile agents can move around network elements and operating systems. Thus it is not necessary to pre-install all the management functions on the network elements and the operating system. Although the management functions required for each network service are different, the mobility of the agents can install the required management functions where they are necessary. While the mobile agents execute, the operating system does not have to manage that function. This reduces the consumption of DCN bandwidth, network element storage, and the workload of the operating system and the network elements.
Note that our approach is free from two important criticisms of the active network, i.e. security and performance.
• Security: As the network becomes an important information infrastructure of society, how to keep the network secure becomes one of the critical issues. Scott et al. [14] propose the use of program verification techniques to keep an active network secure.
However, no existing implementation of active networks uses their techniques, and security remains an open research issue in general. We use the active network framework only on the DCN, while conventional techniques are still used for the telecommunication network that carries the user traffic. In other words, our architecture separates the DCN from the telecommunication network. Because users cannot access the DCN, the security problem is alleviated: even if a user tries to inject an agent that might disturb network services, our approach does not permit such an agent to influence the network service.
• Performance: A typical implementation of an active network handles about 2,000 packets per minute [13, 14]. Since current hardware switches can handle more than 2,000,000 packets per minute, there is a strong performance criticism of active networks. In our approach, the performance bottleneck lies not in the transport phase but in the procedural phase of the management tasks. For example, various tests are performed in the configuration phase of a network installation; such tests take minutes, so the transport time is negligible. While the various management tasks are performed by the agent system, the user data still enjoys fast hardware switching performance in our approach.
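The performance argument above can be made concrete with a back-of-envelope calculation using the figures quoted in the text (2,000 packets per minute on the active path, tests that run for minutes). The helper name and packet counts below are our own illustrative assumptions, not measurements from the paper:

```python
# Illustrative sketch: even on the slow active-network path of the DCN,
# moving the management agents takes far less time than running the
# tests themselves. All figures except the 2,000 pkt/min rate from the
# text are hypothetical.

def transport_time_minutes(num_packets: int, pkts_per_minute: int) -> float:
    """Time to carry the management agents over the DCN."""
    return num_packets / pkts_per_minute

# Suppose a test campaign ships 200 agent packets over the active path
# and then spends 5 minutes executing the tests locally.
transport = transport_time_minutes(200, 2_000)   # 0.1 min
testing = 5.0                                    # minutes; dominates
total = transport + testing

print(f"transport {transport:.2f} min, testing {testing:.1f} min")
assert transport < 0.1 * testing  # transport is negligible
```

Under these assumptions the procedural (testing) phase dominates, consistent with the claim that the active-network transport is not the bottleneck.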
Kiminori Sugauchi et al.

3 Prototype and Evaluation Results
We have implemented a prototype network management system focused on the network test function. In this section, we first describe the test function, then the implementation of the prototype and its evaluation results.
3.1 The Network Management Function Using Mobile Agents

The network management task consists of various sub-tasks. The fault management task is one example; it consists of “Detection”, “Isolation”, “Restoration” and “Notification”. Here we focus on the test function, which is used in the Isolation and Restoration sub-tasks. The test function is used not only for fault management but also for the installation of a new network. Whereas for fault management the test function can concentrate on the faulty parts of the network, a newly installed network requires testing many paths before it is deployed in service. If a new network is developed and installed, all paths in the installed network must be tested. These tests need not be executed instantaneously, but it is desirable to execute them within a short period so that network services can be provided quickly. If a network management system executes these tests centrally, it needs high-performance computing power and high reliability/availability of the DCN. By using mobile agents, test sequences are processed locally on the network equipment, and the agents inform the network manager only when a test finds a problem. The network manager can then focus on the paths that fail the tests.
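The agent behavior described above can be sketched as follows. This is our own minimal illustration, not the authors' code: a test agent hops along the sections of a path, runs a local section test at each network element, and reports back only the failures. The class and function names are hypothetical:

```python
# Hedged sketch of a connectivity-test agent that traverses the sections
# of a path and notifies the manager only of failing sections.

from dataclasses import dataclass, field

@dataclass
class NetworkElement:
    name: str
    # section id -> healthy? (simulated local test outcome)
    sections: dict = field(default_factory=dict)

    def test_section(self, section: str) -> bool:
        return self.sections.get(section, False)

def run_path_test(path):
    """path: list of (element, section) pairs; returns failed sections."""
    failures = []
    for element, section in path:          # the agent "moves" node to node
        if not element.test_section(section):
            failures.append((element.name, section))
    return failures                        # manager sees only these

ne1 = NetworkElement("NE1", {"s1": True})
ne2 = NetworkElement("NE2", {"s2": False})   # simulated fault
ne3 = NetworkElement("NE3", {"s3": True})

failed = run_path_test([(ne1, "s1"), (ne2, "s2"), (ne3, "s3")])
print(failed)   # [('NE2', 's2')]
```

The key point is that the loop runs on the network equipment, not at the manager, so a healthy path generates no management traffic at all.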
3.2 Software Structure of the Test Function Prototype

We developed a mobile-agent-based test management function for SDH network management. Figure 3 shows the configuration of the prototype. Each network element is simulated by dedicated software on a separate computer, the telecommunication network is simulated, and the DCN is implemented as a TCP/IP network. The simulator used in this prototype emulates the specification of actual SDH products. The prototype of the test functions was developed based on the specifications of the OSI systems management model, whose standards define seven test categories (connectivity test, connection test, loopback test, and so on) [12]. We also developed two advanced test functions by combining the simple OSI test functions. One is the multiple connectivity test, which tests the path connectivity between two network elements: by moving along the path, the agents check the section connectivity of the specified paths. The other is the multiple loopback test, which tests many paths related to a specified network element: the agents execute loopback tests for plural paths.
[Figure omitted: the agent-based network manager (GUI, network configuration, agent database, agent server and agent platform running on a Java Virtual Machine) connected over the DCN to several SDH simulators, each hosting an agent platform on a JVM.]

Fig. 3. Prototype Configuration
3.3 Evaluation Results

Using the prototype, we evaluated the effectiveness of a network management system based on mobile agent technology. The conclusions from the evaluation are summarized as follows:
1. Number of network operations: In the connectivity test, the connectivity of a single path is the basic item to be checked. In the SDH network, a single path is divided into many sections, and a basic connectivity test is executed on each section; the result of the path test is then obtained by combining the results of the section tests. In the traditional test scenario, the manager application invokes 4 or 5 operations per section test, so the number of management operations grows to 4 or 5 times the number of section tests. By using mobile agents, the number of test operations is only two, because the test agents move along the path autonomously: the manager simply sends a single agent for the path test and receives its result. In a traditional system, over a thousand operations would be needed; the autonomy of the agent eliminates most of them. Similarly, the multiple loopback test also reduces the number of remote operations, even though a 10 Gbps network has at least 384 paths.
2. Performance: Although performance is a typical criticism of active networks, we confirmed that the performance bottleneck in our prototype is not in the transport phase; the testing phase consumes more time to check the connectivity of the network.
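The operation count in item 1 above can be made concrete. The section count below is a hypothetical example, not a figure from the prototype:

```python
# Worked version of the operation-count comparison: 4-5 manager
# operations per section test in the traditional scheme, versus 2
# operations total (dispatch the agent, receive the result) with a
# mobile agent, regardless of path length.

def traditional_ops(num_sections: int, ops_per_section: int = 5) -> int:
    return num_sections * ops_per_section

def agent_ops(num_sections: int) -> int:
    return 2   # send one agent, receive one result

# A hypothetical path of 250 sections:
print(traditional_ops(250))  # 1250 -> "over a thousand operations"
print(agent_ops(250))        # 2
```

The agent cost stays constant as the number of sections grows, which is why the saving is largest on long paths and dense networks.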
4 Summary
We proposed the use of the active network architecture in implementing a network management system. We use mobile agent technology to realize programmable network elements, which give flexibility and load-balancing capability to the proposed network management system. We also evaluated the efficiency of the prototype SDH test management system. The findings from the evaluation of the prototype are summarized as follows:
(1) The use of out-of-band control solves the security problem, which is the most important research issue in realizing a secure active network. (2) The performance problem, which is also an important criticism of active networks, can be partially solved by our implementation. (3) The computing load for network management, an important issue in managing large networks, is decreased by the mobile agent technology. The flexibility of the management system enables the network carrier to introduce new network services without suspending their network.
Acknowledgment We wish to thank Mr. Masanori Kataoka, General Manager of the Systems Development Laboratory, Hitachi, Ltd., and Prof. Radu Popescu-Zeletin, Director of GMD FOKUS, for giving us the opportunity to undertake this GMD–Hitachi collaborative research. We also thank Mr. Yusuke Yamamoto of the Telecommunication Division, Hitachi, Ltd., for discussions about SDH specifications.
References
[1] T. Magedanz, T. Eckardt: “Mobile Software Agents: A New Paradigm for Telecommunication Management”, NOMS '96 (1996)
[2] Y. Kim et al.: “Design Considerations of a Mobile Agent System for the Network Management Purposes”, APNOMS '98 (1998)
[3] M. Baldi et al.: “Exploiting Code Mobility in Decentralized and Flexible Network Management”, 1st International Workshop MA '97 (1997)
[4] http://www.fokus.gmd.de/cc/ima/miami/
[5] S. Rooney et al.: “The Tempest: A Framework for Safe, Resource-Assured, Programmable Networks”, IEEE Communications Magazine, Vol. 36, No. 10, pp. 42-53 (1998)
[6] J. Huard et al.: “A Programmable Transport Architecture with QoS Guarantees”, IEEE Communications Magazine, Vol. 36, No. 10, pp. 54-62 (1998)
[7] D. Wetherall et al.: “Introducing New Internet Services: Why and How”, IEEE Network Magazine, pp. 12-19 (1998)
[8] V. A. Pham et al.: “Mobile Software Agents: An Overview”, IEEE Communications Magazine, Vol. 36, No. 7, pp. 26-37 (1998)
[9] M. Gervais et al.: “Enhancing Telecommunications Service Engineering with Mobile Agent Technology and Formal Methods”, IEEE Communications Magazine, Vol. 36, No. 7, pp. 38-43 (1998)
[10] M. Breugst et al.: “Mobile Agents – Enabling Technology for Active Network Implementation”, IEEE Network Magazine, Vol. 12, No. 3, pp. 53-60 (1998)
[11] ITU-T Recommendation M.3010 (1992)
[12] ITU-T Recommendation X.737 (1995)
[13] Maria et al.: “Active Network Support for Multicast Applications”, IEEE Network Magazine, Vol. 12, No. 3, pp. 46-52 (1998)
[14] D. Scott et al.: “A Secure Active Network Environment Architecture: Realization in SwitchWare”, IEEE Network Magazine, Vol. 12, No. 3, pp. 37-45 (1998)
Managing Spawned Virtual Networks

Andrew T. Campbell1, John Vicente2, and Daniel A. Villela1

1 Center for Telecommunications Research, Columbia University
2 Intel Corporation
Abstract. The creation, deployment and management of network architecture is manual, ad hoc and slow to evolve to meet new service requirements, resulting in costly and inflexible deployment cycles. In the Genesis Project ([email protected]), Columbia University, we envision a different paradigm where new network architectures are dynamically created and deployed in an automated fashion based on the notion of "spawning networks", a new class of open programmable networks. Spawning networks support a virtual network operating system, called the Genesis Kernel, that is capable of profiling, spawning, architecting and managing distinct virtual network architectures on-the-fly. In this paper, we describe a kernel plug-in module called "virtuosity" for the management of multiple spawned virtual networks. Virtuosity exerts control over multiple spawned virtual network architectures by dynamically influencing the behavior of a set of resource controllers operating over management-level timescales.
1 Introduction
The rapidly evolving nature of the application base, service demands and underlying network technology presents a significant challenge to the deployment of new network architectures. This challenge calls for new approaches to the way we design, develop, deploy and analyze next-generation network architectures in response to future needs and requirements. Currently, the creation and deployment of network architecture is a manual, time-consuming and costly process. To the network architect the creation process is typically ad hoc in nature, based on hand-crafting small-scale prototypes that evolve toward wide-scale deployment. We envision [8] a different paradigm where a communication middleware platform is capable of profiling, spawning, architecting and managing distinct virtual network architectures on-the-fly. We call our vision Genesis and summarize here the Genesis Kernel, a virtual network operating system. We believe that the design, creation and deployment of new network architectures should be automated and built on a foundation of spawning networks, a new class of open programmable networks. Spawning networks represent a new approach to the field of programmable networking where the network environment is capable of

1 Daniel Villela is a CNPq-Brazil Scholar.
2 John Vicente is a Visiting Researcher at Columbia University.
Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 249-261, 1999. Springer-Verlag Berlin Heidelberg 1999
dynamically creating new network architectures on-the-fly. The Genesis virtual network kernel represents a next-generation approach to the development of programmable networks, building on our earlier work on open programmable broadband [15,16,7] and mobile networks [25]. The Genesis Kernel can spawn child network architectures that differ from their parent network architectures. We call a virtual network installed on top of a set of network resources a parent network. The parent virtual network kernel has the capability of creating “child networks”. A child network operates in isolation on a subset of its underlying parent network's resources and topology, supporting controlled access for a set of users with specific connectivity, security, QOS and isolation requirements. At the lowest level of the Genesis Kernel architecture [8], a transport environment delivers packets from source to destination end-systems through a set of open programmable virtual router nodes called routelets. A virtual network is characterized by a set of routelets interconnected by a set of virtual links; together, the routelets and virtual links form a virtual network topology. Each virtual network kernel can create a distinct programming environment that supports routelet programming and enables the interaction between the distributed objects that characterize the spawned network architecture. The programming environment comprises a metabus that partitions the distributed object space, supporting communications between objects associated with the same spawned virtual network. Each virtual network has its own metabus. A binding interface base [1] supports a set of open programmable interfaces on top of the metabus, which provide open access to the set of distributed routelets and virtual links that constitute a virtual network architecture.
The metabus and binding interface base also support a set of life-cycle services, enabling the profiling, spawning and management of child virtual networks. For full details on the Genesis Kernel see [8]. Within Genesis, resource management of spawned virtual networks is handled by virtuosity [9], a Genesis Kernel plug-in. The virtuosity architectural model (see [9] for complete architectural details) comprises a number of distributed elements. These elements are instantiated as part of the child virtual network kernel during the spawning phase [8] and are deployed as a set of distributed plug-in objects. Virtuosity leverages the kernel's hierarchical model of inheritance and nesting, delivering scalable virtual network resource management. The Genesis virtual network resource management system is governed by four basic design goals: slow-timescale dynamic provisioning; capacity classes, which provide general-purpose 'resource pipes'; inheritance; and autonomous virtual network control. In this paper, we present the elements of the virtuosity system, a next-generation architecture for virtual network resource management. In Section 2, we present the maestro, a central controller responsible for managing the global resource policy within the virtual network. In Section 3, we introduce the auctioneer, which implements an economic auctioning model for resource allocation. The arbitrator, presented in Section 4, represents an abstract virtual network capacity 'scheduler'. We summarize and conclude in Section 5.
Metabus is a per-virtual network software bus for object interaction.
2 Maestro - Distributed Virtual Network Control
At the core of the virtuosity resource management architecture is the maestro, a key controller that oversees the resources (i.e., the virtual links that interconnect routelets) of the managed virtual network domain. Virtuosity, through the maestro, manages and controls virtual network resources on a slow performance-management [13] timescale of minutes to tens of minutes. We argue that this is a suitable timescale for virtuosity to operate on, while still allowing virtual networks to perform dynamic provisioning as needed. The maestro coordinates virtual network control through distributed virtuosity components performing virtual network monitoring, economic-based resource allocation and capacity-based scheduling, all of which operate or exert control on management-level timescales. Using fundamentals of distributed management design, the maestro manages global resource policy within a virtual network and its (parent-)allocated virtual network resources. In a fully distributed manner, the maestro maintains the global state of its virtual network. The maestro uses dynamic provisioning of virtual network resource capacity to meet the changing needs of its child networks (captured in per-child-virtual-network policy) and to react to changes in its global state. That is, a maestro may need to respond to dynamic changes in its own virtual network (e.g. changes in the resource needs of its local clients/users) and in child network resource needs, as well as adjusting to changes imposed on it by the underlying parent network's resource availability. In addition, the maestro coordinates and influences child network behavior through the integration of monitoring-based feedback and economic factors driven by subscriber service demands and cost potential. The maestro establishes resource policies, coordinates policy distribution and enforces policy through capacity scheduling and policing.
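The slow-timescale provisioning decision described above can be sketched in simplified form. The function name, threshold and interval below are our own illustrative assumptions, not part of the Genesis design:

```python
# Hedged sketch of a maestro-style provisioning check: at each slow
# provisioning interval, compare each child network's monitored usage
# against its allocation and flag under-utilized capacity that could
# be re-provisioned. All names and numbers are hypothetical.

PROVISIONING_INTERVAL_MIN = 10      # "minutes / tens of minutes" timescale
UNDERUSE_THRESHOLD = 0.5            # below 50% usage -> reclaim candidate

def provisioning_decision(allocations, usage):
    """allocations/usage: child -> Mb/s. Returns reclaimable capacity."""
    reclaimable = {}
    for child, alloc in allocations.items():
        used = usage.get(child, 0.0)
        if used < UNDERUSE_THRESHOLD * alloc:
            reclaimable[child] = alloc - used
    return reclaimable

print(provisioning_decision({"childA": 100, "childB": 40},
                            {"childA": 20, "childB": 35}))
# childA uses only 20 of 100 Mb/s, so 80 Mb/s could be re-provisioned
```

Running such a check only every provisioning interval, rather than per packet or per flow, is what keeps the control overhead on a management timescale.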
A delegate, acting as a proxy agent, serves the maestro by promoting decentralized coordination and localized communications. Delegates handle all local resource interactions and control mechanisms on the virtual network domain's routelets by interfacing with the other virtuosity elements supporting resource allocation and virtual network scheduling. The maestro interacts with its child networks to promote the efficient use of its global resources while ensuring that the resource needs of its child networks and of its own virtual network users are met. The maestro can influence the way in which resources are allocated to its child networks by setting optimal market pricing [20] and resource allocation strategies, e.g., under-provisioning its own virtual link resources but overbooking resources to child networks to maximize revenue for controlled-capacity traffic.
4 Although we restrict the virtual network resource to link bandwidths, we feel that the virtuosity model can be easily extended to support other router resources, or to partition router resources proportionally based on virtual network link aggregate demands.
2.1 Maestro Design

During the spawning phase of a child network, the maestro conducts a virtual network admission control test based on the resources requested by the child network topology. If the test is positive, the parent provider network admits the child network and allows it to become a participant in the auctioning process controlled by the auctioneer and governed by the child virtual network's policy. Admission is coordinated by the parent maestro using its virtual network hierarchy tree. The parent maestro receives a ReqSpec for admission from a child virtual network and determines whether sufficient resources are available within the context of its own available network resources to meet the new demands. If so, the parent has sufficient residual capacity in its own right to accommodate the child's needs. Virtuosity implements a measurement-based virtual network admission control test. Admission is based on evaluating the ReqSpec target capacity class resource provisions (viz. rate_quantity) against aggregate capacity class policies and aggregate resource usage. By monitoring the available capacity along all of its virtual links, the maestro determines whether resources allocated along its virtual links are underutilized. Based on this measurement state and capacity threshold violations, it can allocate underutilized resources based on the capacity class, bandwidth and policy requested by the new virtual network. If capacity is available, the child network is immediately
[Figure omitted: the maestro object model, showing the maestro control point (CP) embedded between the child and parent maestro CPs, together with the delegate, policy cache, analyzer, measure cache, admission controller, policy controller, optimizer, resource allocation controller and auctioneer objects, and the numbered method invocations (1)-(23) described in the text.]

Fig. 1. Maestro Object Model
admitted, and the child network is allowed to participate in the auction process. In the case that the parent has insufficient resources to accommodate the new child network, it needs to renegotiate its provisioning needs with its own parent (and hence its provider) at the next level down its virtual network inheritance tree [8], [9]. The provisioning request enters the parent auctioning process following a successful admission control sequence, traversing the hierarchy tree through several levels until a provider is found that can accommodate the requested demands. Through slow-timescale resource allocation, the maestro invokes the auctioning process on a periodic or static deadline basis. This period is driven (again) by slow-timescale considerations, allowing the auctioning process to reach equilibrium and maintain constant services over longer timescales, e.g., tens of minutes. Resource pricing and quantity announcements to child networks are set such that the parent can achieve more effective utilization and revenue gain. The maestro uses two variables for resource auctioning: a price quote, Qij, and a rate quote, Rij (where i = virtual resource and j = capacity class), which the delegate element relays to the seller object of the resource allocation system for appropriate auctioning. The auctioning process (discussed in the next section) requires a recursive, distributed algorithm and global consensus in order to reach steady state [20]; this constrains the static or dynamic invocation process to a lower periodic bound for recurring resource allocations. Upon reaching auctioning equilibrium [20], the maestro receives the results of the auctioning process, calculates local resource policies and stores the resource allocation policy results for child networks in the policy cache.
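The measurement-based admission test described in Section 2.1 can be sketched as follows. The ReqSpec is modeled here simply as a mapping from capacity class to requested rate_quantity; this structure and the function name are our own assumptions, not the Genesis interface:

```python
# Hedged sketch of the measurement-based admission control test:
# admit a child's ReqSpec if, per capacity class, the requested rate
# fits within the parent's provisioned capacity minus measured
# aggregate usage. Data shapes are illustrative assumptions.

def admit(req_spec, class_capacity, measured_usage):
    """req_spec: capacity class -> requested rate_quantity (Mb/s).
    Returns (admitted?, per-class failure list)."""
    failures = []
    for cclass, rate in req_spec.items():
        available = (class_capacity.get(cclass, 0.0)
                     - measured_usage.get(cclass, 0.0))
        if rate > available:
            # mirror the per-resource failure list described in the text
            failures.append((cclass, rate, available))
    return (len(failures) == 0, failures)

ok, why = admit({"constant": 30, "controlled": 50},
                class_capacity={"constant": 100, "controlled": 100},
                measured_usage={"constant": 60, "controlled": 40})
print(ok, why)   # both classes fit in the residual capacity
```

On failure, the returned per-class list plays the role of the failure specification structure that the maestro forwards up the hierarchy tree when renegotiating with its own parent.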
As depicted in Figure 1, we now illustrate the behavior of the admission control and resource allocation process from the perspective of the maestro system and its objects (viz., maestro control point (CP), resource allocation controller, admission controller, optimizer, analyzer, policy controller, policy cache, measurement cache, and delegate), embedded between the maestro CPs of the child and parent virtuosity systems. The process begins with the child maestro CP submitting an eventSpec (1) notifying the maestro CP that the child is requesting admission or an extended resource capacity request. This is immediately followed by a requestAdmission() (2) with the associated provisioning specification. The maestro CP then invokes multiple resourceAdmission() (3) methods on the admission controller for the requested capacities on parent resources. The admission controller responds, notify() (6), after testing admission per resource against existing aggregate (4a) provisioning policies (i.e., child networks' and local provisioning policy) and aggregate (4b) resource measurements (i.e., resource availability) to determine whether the child's requested increase or admission specification exceeds the composite (5) provisioned policies and resource availability. The admission test is based on a rule-based policy, applied per capacity class and per-class resource usage, in the determination of available capacity. The result (in this case) is admission failure, along with a failure code and a failure specification structure, specified per resource in list form. In turn, the maestro CP must then send an eventSpec (7) and requestAdmission() (8) to the parent maestro CP to request additional capacity resources to extend its currently allocated provision on behalf of the child's request.
In this case, admission succeeds upon completion of the parent's admission control, and the resource allocation (auctioning) procedure follows, with notification through the parent maestro CP (9) carrying the admissible and allocated provisioning specification.
Prior to commitment (11) to the parent allocation, the allocation is checked (10) by the local resource allocation controller and, if acceptable, stored (12) in the policy cache. The analyzer object is then invoked to assess the balance of global resource consumption (13a,b,c) across the virtual network resources (14a,b) and the capacity classes (14c). The analyzer results are used by the optimizer object to establish the optimal price (15a) and quantity (15b) values, per capacity class per managed resource, for appropriate auctioning. The auctioneer is event-notified (16a,b) for next-provisioning-interval synchronization and relayed the optimal per-class, per-resource (17a,b) auctioning variables. At this point, the child maestro CP is notified (17c) of successful admission and prepares itself for auctioning with the local auctioneer. It is anticipated that reaching auctioning equilibrium will take on the order of several minutes or longer as the number of competing child subscribers, parent resources and capacity classes increases. Nevertheless, we argue that the extended auctioning period is well in line with the necessity to maintain network stability through management-timescale control; furthermore, we believe that this trade-off is offset by the resource efficiency gains achievable with slow-timescale dynamic provisioning. Upon reaching auctioning equilibrium, the auctioning results (18) are submitted (through the maestro CP) to the resource allocation controller, which then proceeds to distribute (19) resource allocations to the child network auctioning 'buyers'. It coordinates with each child's resource allocation controller to gain final commitment (20) on the allocation. If unsuccessful, the resource allocation process may cycle through these same steps until firm commitments are reached with all child network subscribers for their requested resource capacities.
If successful, the policy controller object establishes local policies (21), and the maestro CP updates (22) the policy cache with the child network provisions and local resource policies. Finally, the delegate object passes (23) the required capacity scheduling and policing policies to manage and enforce the child allocations.
3 An Auction-Based Resource Allocation System
We propose a virtual network resource allocation process based on the supply and demand of virtual network services, where competing child virtual networks, working on behalf of a community of users and through appropriate specification, request resources from a provider of virtual network services and pay for them. Inherent behaviors and objectives dictate the economics and, more importantly, the effective allocation, partitioning and utilization of such services. We argue that the provider (parent virtual network) and subscriber (child virtual network) behaviors, and correspondingly their objectives, serve as fundamentals that can be leveraged for resource maximization through the influence of economic variables. Network providers seek resource efficiency through the effective, price-based utilization of link resources, load balancing and the addition of multiple virtual network subscribers. The competing nature that both the provider (parent) and subscriber (child) exhibit should, we argue, create the dynamics that lead to a more aggressive environment for achieving resource efficiency.
[Figure omitted: the auctioneer object model, showing the auctioneerCP, seller, buyers, auctioning agent and bid list, together with the numbered method invocations (1)-(10) described in the text.]

Fig. 2. Auctioneer Object Model
In this paper we consider a strategy known as Progressive Second Price (PSP) [20], which aims to provide high resource efficiency (e.g., cost, utilization) via a competitive, market-driven auctioning process. Auctioning occurs between a set of buyers (child virtual networks) and a seller (parent virtual network). Within a competitive bidding process, a successful allocation for a particular buyer may not be attainable if the buyer is unwilling to pay the market value for the resource capacity or does not offer provisioning alternatives. The auctioning process is designed to follow a bidding procedure whereby, for example, the auctioning of best-effort classes would generally follow that of the more stringent capacity classes. In our extended auctioning model we introduce two key contract variables: contract_duration and contract_maxcost. These variables represent important provisioning options which allow child subscribers to make long-standing contracts with the parent provider; in this sense, they give a child a way to avoid the normal open-market competition of the auctioneer.
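To make the buyer/seller mechanics concrete, the following is a greatly simplified, single-round illustration of auction-based capacity allocation. It is not the full PSP mechanism of [20], which is iterative and charges each bidder the declared social opportunity cost; here we merely rank bids by unit price, allocate greedily, and take the price of the first (partially) excluded bid as a uniform clearing price. All names and figures are our own:

```python
# Simplified, single-round capacity auction sketch (NOT full PSP [20]):
# allocate link capacity to the highest unit-price bids; the price of
# the first bid that cannot be fully satisfied sets the clearing price.

def allocate(capacity, bids):
    """bids: list of (buyer, quantity, unit_price).
    Returns (allocations, clearing_price)."""
    ranked = sorted(bids, key=lambda b: b[2], reverse=True)
    allocations, remaining = {}, capacity
    clearing_price = 0.0
    for buyer, qty, price in ranked:
        granted = min(qty, remaining)
        if granted > 0:
            allocations[buyer] = granted
            remaining -= granted
        if granted < qty:             # first (partially) excluded bid
            clearing_price = price
            break
    return allocations, clearing_price

alloc, price = allocate(100, [("childA", 60, 5.0),
                              ("childB", 60, 4.0),
                              ("childC", 30, 3.0)])
print(alloc, price)   # childA gets 60, childB gets 40; price 4.0
```

In the real system this step would be repeated by the auctioning agent across bidding rounds until no buyer can improve its allocation, i.e. until the equilibrium condition described in Section 3.1 is reached.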
3.1 Auctioneer Design

The auctioneer object architecture is illustrated in Figure 2. It comprises several objects: the auctioning agent, seller, buyer, bid list, and auctioneerCP (auctioneer control point). An auctioneer is present at each routelet, auctioning its virtual link resources to a number of child network subscribers. The auctioneerCP object acts as a proxy that exchanges the necessary information with the maestro (through the delegate) to receive updated parameters for interval-based auctioning. The seller object represents the provider in the auctioning system. Its task is to specify the quantity of the resource that is available for auctioning and its price. The seller announces the current market price and availability of individual resources and capacity classes via the ask() method. These variables are
optimally determined based on what drives the provider market towards the desired revenue objective as well as resource efficiency gains, e.g., preferring controlled capacity to constant capacity. The buyer object, on the other hand, plays the subscriber or child network role. Several buyers may be present within an auctioneer, bidding on behalf of the virtual networks they represent. Buyer instantiation is a result of admission control; a buyer is created and enabled when it participates in the resource allocation process. Once the buyer enters the auctioning process, it requests from the auctioning agent a position in the bid list, which contains updated information (bids, allocations, quantities) about each buyer participating in the ongoing process. The buyer can bid for a resource by specifying the required quantity and the price it expects to pay (an upper bound) for individual capacity classes. The buyer object seeks to minimize cost and maximize rate quantity for each capacity class. The auctioning agent maintains the bid list object and stores updated information (the current state) about the auctioning process. By accessing the bid list, a buyer can retrieve its current allocation and find out whether it has been granted better conditions during the auctioning process. When equilibrium has been reached, the auctioning agent reports the results of the process to the auctioneerCP for forwarding to the maestro. By definition, equilibrium is achieved when no buyer can improve its allocation; this can be implemented using a timeout condition to signal that no more changes will occur. Once a buyer reaches a position that it cannot improve, it will cease bidding, or notify the child virtual network maestro to seek a provisioning alternative and renegotiate for the resource. The process is considered to be in equilibrium when all buyers are satisfied. The dynamics of the auctioning process are illustrated step-wise in Figure 2.
A delegate, operating on behalf of the maestro, notifies the auctioneerCP that an event is taking place (1). It then updates the optimal price (Pij) and rate quantity (Rij) provided by the maestro for the auction (2), refreshing the seller state (3). The seller object then announces (4a) the available resource quantity and associated pricing for buyer bidding. In parallel, buyers receive (4b) from the child virtual network maestros the desired strategy (allocation and cost) for their bidding (5). The auctioning agent mediates between buyers and sellers, seeking a successful auctioning equilibrium and optimal resource allocations. It maintains the bid list via the update() method (6) for allocations resulting from the buyers' bidding, according to their resource valuations and recent bids. The allocations can be retrieved at any time (7) by the auctioning agent to keep buyers informed. The auctioning agent then updates the results (new allocations and costs) at the auctioneerCP (8). Information about allocations is also available to buyers on request via the getAllocation() method (9). Delegates then receive the agreed price and rate allocations (10) and pass the information to the maestro, which sets child network policies according to their provisioned share of the parent resource. The maestro seeks closure by communicating the allocations (or denials) to the child virtual network maestros through its delegates for final consideration. If the agreed allocations are not satisfactory for a subscriber child network, the child maestro may invoke the auction process again with alternate specifications, and the previous child
allocations and policies are voided by the parent maestro. Further details about the PSP auctioning model can be found in [20].
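One allocation round of this process can be pictured with a toy sketch. This is an illustration only, not the actual PSP mechanism of [20]: in particular, the payment rule below simply charges winners at the best losing bid, a crude stand-in for PSP's exclusion-compensation principle, and all names are our own.

```python
def psp_round(capacity, bids):
    """One simplified allocation round.  `bids` maps a buyer id to a
    (quantity, unit_price) pair.  Higher unit prices are served first;
    each winner pays its allocation at the highest losing unit price
    (a rough approximation of the PSP payment rule)."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1][1], reverse=True)
    allocation, remaining = {}, capacity
    for buyer, (qty, _price) in ranked:
        granted = min(qty, remaining)
        if granted > 0:
            allocation[buyer] = granted
            remaining -= granted
    # the best (partially) losing unit price approximates the marginal
    # opportunity cost each winner imposes on the others
    losing = [price for buyer, (qty, price) in ranked
              if allocation.get(buyer, 0) < qty]
    clearing = losing[0] if losing else 0.0
    cost = {buyer: granted * clearing for buyer, granted in allocation.items()}
    return allocation, cost
```

In the framework above, each buyer would inspect the returned allocation and cost (via the bid list) and rebid until no buyer can improve its position, at which point the round's results are forwarded to the auctioneerCP.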
4 Virtual Network Capacity Scheduling
With the arbitrator, we introduce the notion of virtual link capacity-based scheduling driven by a parent-child hierarchy of virtual network provisioning policies. The arbitrator receives virtual network policy from the maestro over a slow-timescale provisioning interval upon completion of the resource allocation process. The virtual link arbitrator manages access to and control of the parent link packet scheduler based on policy-driven virtual network capacity. Leveraging ideas from flow quality-of-service semantics [5] (e.g., deterministic, statistical and best effort), we abstract the QOS class differentiation concepts and apply them to capacity-based 'provisioning' classes. The intent is to give the child virtual network more provisioning flexibility and control, maximizing resource efficiency and QOS control, while reducing the burden on the parent of managing the child domain and maintaining low-level QOS service level agreements (SLAs); instead, the provider (parent) and subscriber (child) service models are based more on provisioning SLAs with flexible capacity classes. Therefore, virtual network SLA maintenance is kept strictly on a virtual link capacity basis, while parent policing and regulation can be removed from the delay-sensitive models (to be managed autonomously by the child network services) and focused on virtual link bandwidth sharing.
4.1 Arbitrator Design

The capacity arbitrator is based on a set of virtual network capacity classes and class weight policies that are distributed to the arbitrator component by the delegate on behalf of the maestro. Virtual network classes represent differentiated policy for provisioning capacity. Class weights are calculated (by the maestro) from the rate_allocation, percentage and price variables specified in the AllocSpec. The capacity classes and weights are translations of the resource allocations negotiated for individual child virtual resources during the auctioning process and are used as the virtual link scheduling policies. The arbitrator uses them to differentiate child virtual network allocations and the ordering of packet delivery to the parent link resource. During the spawning process, each virtual network is assigned a unique virtual network identifier to distinguish its traffic from other child network traffic. A capacity class identifier function is introduced into the arbitrator architecture, prior to the child's link scheduler function, to recognize the QOS behavior treatment (e.g., best-effort, controlled load, expedited forwarding, etc.) associated with the child-specific QOS architecture. This function interworks with a switch vector to stamp each packet with a particular capacity class prior to its arrival at the routelet port packet link scheduler, as illustrated in Figure 3. We refer to this procedure as capacity class mapping. Policy mappings are formed during the provisioning process and distributed during the resource allocation process to the child virtual networks. If, for example, a customer or child network supports only
best-effort IP traffic classes within a spawned child network and provisions for constant capacity, the switch vector would stamp all traffic with a constant capacity classification. On the other hand, if the customer supports Integrated Services classes (viz. guaranteed delay, controlled load and best-effort) within a spawned child network and provisions for all three capacity classes, then the class identifier would stamp the traffic with the corresponding capacity classifications by default. QOS class and capacity class mappings are considered part of the provisioning policy for child networks and are stored within the policy cache of the supporting maestro.
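Capacity class mapping can be pictured as a small lookup driven by a per-child switch vector. The class names and vectors below are illustrative assumptions, not the Genesis implementation:

```python
# hypothetical capacity classes
CONSTANT, CONTROLLED, BEST_EFFORT = "constant", "controlled", "best-effort"

# Each child virtual network's switch vector maps its own QoS classes
# (names are illustrative) onto the parent's capacity classes.
switch_vectors = {
    "vnet-A": {"best-effort": CONSTANT},        # IP-only child, constant capacity
    "vnet-B": {"guaranteed": CONSTANT,          # IntServ child, all three classes
               "controlled-load": CONTROLLED,
               "best-effort": BEST_EFFORT},
}

def stamp(vnet_id, qos_class):
    """Capacity class mapping: stamp a packet with the capacity class
    that its virtual network's provisioning policy assigns to its
    QoS class, defaulting to best-effort capacity."""
    return switch_vectors[vnet_id].get(qos_class, BEST_EFFORT)
```

A best-effort-only child provisioned for constant capacity (vnet-A above) thus has all of its traffic stamped constant, matching the example in the text.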
Fig. 3. Arbitrator
A capacity classifier is used to identify virtual networks and their capacity classes. The classifier queues incoming stamped packets (from the output of the routelet port link schedulers) to the appropriate capacity queue structures (viz. constant, controlled, and best-effort). Individual capacity queues are created for each child virtual network within an allocated capacity queue structure. Each virtual network queue is then assigned an appropriate weight, based on the policy previously negotiated and distributed by the maestro. Within the provisioned interval, the arbitrator manages scheduling of virtual network control based on the capacity class priority and weights, allowing child networks (and local user traffic) to queue available packets to the parent's output port. The capacity arbitrator leverages space (i.e., available resource bandwidth), time (i.e., provisioning interval length) and capacity-class abstractions to
manage scheduling of its own user traffic and packets from child virtual networks onto the parent virtual link. The capacity scheduler services the capacity queues in priority order, and in weighted round-robin fashion for queues of the same capacity class. As illustrated in Figure 3, child network traffic is scheduled by the child's packet link scheduler associated with the routelet output port, and similarly for the parent network routelet port. The introduction of the virtuosity arbitrator into the output port architecture merges both child and parent QOS-scheduled traffic and provides coarse capacity scheduling of the composite traffic based on the allocated provisioning policies. It is important to note that the illustration represents the default virtual network resource management implementation, but this is not the only option available to the parent for managing child network traffic. Alternatively, the parent may override the arbitrator function (see footnote 5) and integrate child network traffic through its local routelet port link scheduler. As also illustrated in Figure 3, the monitor is central to the arbitrator, performing monitoring and policing on individual parent resources. Policing assures that child virtual networks are not consuming parent virtual network resources above and beyond their allocation of the virtual link capacity. Policing actions (e.g., dropping, tagging or degrading to the best-effort capacity class) are driven by virtual network policy.
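The priority-plus-weighted-round-robin service discipline described above might be sketched as follows. This is a simplified illustration under our own naming assumptions (a real scheduler would run per-packet in the data path):

```python
from collections import deque

class CapacityScheduler:
    """Sketch of the arbitrator's capacity scheduler: capacity classes
    are served in strict priority order; virtual network queues of the
    same class are served weighted round-robin."""
    PRIORITY = ("constant", "controlled", "best-effort")

    def __init__(self, weights):
        # weights: {capacity_class: {vnet_id: weight}}, as negotiated
        # during the auctioning process and distributed by the maestro
        self.weights = weights
        self.queues = {c: {v: deque() for v in vs} for c, vs in weights.items()}

    def enqueue(self, cls, vnet, pkt):
        self.queues[cls][vnet].append(pkt)

    def next_round(self):
        """Serve the highest-priority class with waiting packets; within
        that class each virtual network gets `weight` transmission
        opportunities per round."""
        for cls in self.PRIORITY:
            vnets = self.queues.get(cls, {})
            if any(vnets.values()):
                out = []
                for vnet, weight in self.weights[cls].items():
                    q = vnets[vnet]
                    out.extend(q.popleft() for _ in range(min(weight, len(q))))
                return out
        return []
```

With this discipline, best-effort capacity queues are only served once the constant and controlled queues are empty, while networks sharing a class divide the link in proportion to their policy weights.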
5 Conclusion
We are implementing "spawning networks", a new class of open programmable networks. The Genesis Kernel lies at the heart of spawning networks, capable of profiling, spawning, architecting and managing distinct virtual network architectures on demand. In this paper, we have described a kernel-level plug-in module called "virtuosity" for the management of multiple spawned virtual networks. The virtuosity framework comprises a maestro, which performs distributed virtual network control; an auctioneer, which leverages economic models based on auctioning to perform resource allocation; and an arbitrator, which performs policy-based virtual network capacity scheduling.
Acknowledgement

This work is supported in part by the National Science Foundation (NSF) under CAREER Award ANI-9876299 and with support from COMET Group industrial sponsors. In particular, we would like to thank the Intel Corporation, Hitachi Limited and Nortel Networks for supporting the Genesis Project. John B. Vicente (Intel Corp.) would like to thank the Intel Research Council for its support during his visit to the Center for Telecommunications Research, Columbia University. Daniel A. Villela would like to thank the National Council for Scientific and Technological Development (CNPq-Brazil) for sponsoring his scholarship at Columbia University (ref. 200168/98-3).

5. This also suggests that virtuosity, or at least key components of virtuosity, are not required and may be substituted. The architectural selection and realization is based on the parent resource management policy set during the profiling phase and programmatically composed during the spawning phase of the life cycle process.
References
[1] Adam, C.M., et al., "The Binding Interface Base Specification Revision 2.0", OPENSIG Workshop on Open Signalling for ATM, Internet and Mobile Networks, Cambridge, UK, April 1997.
[2] Biswas, J., et al., "Application Programming Interfaces for Networks", IEEE P1520 Working Group Draft White Paper, www.ieee-pin.org
[3] Blake, S., et al., "A Framework for Differentiated Services", draft-ietf-diffserv-framework-01.txt. Work in progress.
[4] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., and Weiss, W., "An Architecture for Differentiated Services", draft-ietf-diffserv-arch-02.txt, October 1998.
[5] Campbell, A.T., Coulson, G., and Hutchison, D., "A Quality of Service Architecture", ACM SIGCOMM Computer Communication Review, Vol. 24, No. 2, pp. 6-27, April 1994.
[6] Session on "Enabling Virtual Networking", Organizer and Chair: Andrew T. Campbell, OPENSIG'98 Workshop on Open Signaling for ATM, Internet and Mobile Networks, Toronto, October 5-6, 1998.
[7] Campbell, A.T., De Meer, H., Kounavis, M.E., Miki, K., Vicente, J., and Villela, D.A., "A Review of Programmable Networks", ACM Computer Communications Review, April 1999.
[8] Campbell, A.T., De Meer, H., Kounavis, M.E., Miki, K., Vicente, J., and Villela, D.A., "The Genesis Kernel: A Virtual Network Operating System for Spawning Network Architectures", IEEE 2nd International Conference on Open Architectures and Network Programmability (OPENARCH'99), New York, March 1999, pp. 115-127.
[9] Campbell, A.T., Vicente, J., and Villela, D.A., "Virtuosity: Performing Virtual Network Management", International Workshop on Quality of Service (IWQoS), London, June 1999.
[10] DARPA Active Network Program, http://www.darpa.mil/ito/research/anets/projects.html, 1996.
[11] Duffield, N., et al., "A Performance Oriented Service Interface for Virtual Private Networks", draft-duffield-vpn-QOS-framework-00.txt. Work in progress.
[12] The Genesis Project: Programmable Virtual Networking, http://comet.columbia.edu/genesis, 1998.
[13] Keshav, S., and Sharma, R., "Achieving Quality of Service through Network Performance Management", Proc. of NOSSDAV'98, Cambridge, July 1998.
[14] "The Integration of Real-Time Control with Management in Broadband Networks", Proceedings of the Workshop on Broadband Communications, Estoril, Portugal, January 20-22, 1992, pp. 193-204.
[15] Lazar, A.A., "Programming Telecommunication Networks", IEEE Network, Vol. 11, No. 5, September/October 1997.
[16] Lazar, A.A. and Campbell, A.T., "Spawning Network Architecture", White Paper, Center for Telecommunications Research, Columbia University, http://comet.columbia.edu/genesis, January 1998.
[17] Van der Merwe, J.E. and Leslie, I.M., "Switchlets and Dynamic Virtual ATM Networks", Proc. Integrated Network Management V, May 1997.
[18] Van der Merwe, J.E., Rooney, S., Leslie, I.M. and Crosby, S.A., "The Tempest - A Practical Framework for Network Programmability", IEEE Network, November 1997.
[19] Multiservice Switching Forum (MSF), http://www.msforum.org/
[20] Semret, N., and Lazar, A.A., "Design, Analysis and Simulation of the Progressive Second Price Auction for Network Bandwidth Sharing", Technical Report CU/CTR/TR 487-98-21.
[21] OPENSIG Working Group, http://comet.columbia.edu/opensig/
[22] Rajan, R., Martin, J.C., Kamat, S., See, M., Chaudhury, R., Verma, D., Powers, G., Yavatkar, R., "Schema for Differentiated Services and Integrated Services in Networks", draft-rajan-policy-QOSschema-00.txt, October 1998. Work in progress.
[23] Rooney, S., Van der Merwe, J.E., Crosby, S.A., Leslie, I.M., "The Tempest: A Framework for Safe, Resource-Assured, Programmable Networks", IEEE Communications Magazine, October 1998, pp. 42-53.
[24] Touch, J. and Hotz, S., "The X-Bone", Third Global Internet Mini-Conference, in conjunction with Globecom '98, Sydney, Australia, November 1998.
[25] Valko, A.G., Campbell, A.T., Gomez, J., "Cellular IP", Internet Draft, draft-valko-cellularip-00.txt.
Active Organisations for Routing

Steven Willmott and Boi Faltings

Laboratoire d'Intelligence Artificielle, Département d'Informatique
Swiss Federal Institute of Technology, IN (Ecublens)
CH-1015 Lausanne, Switzerland
{willmott,faltings}@lia.di.epfl.ch
Abstract. Communications networks require increasingly complex resource management to stay up and running. This is particularly true in networks which aim to provide some guaranteed quality of service (either explicitly as in ATM and other connection-oriented architectures or implicitly as in a smoothly running IP network). The resulting increased complexity of routing procedures needs to be handled in a coordinated and flexible manner. This control could well be provided by customisable control programs in the network which rely on the computational capabilities provided by active nodes. Not only will control programs need to act independently and autonomously but they will also need to coordinate their actions with each other to ensure that decisions in the network are taken in a coordinated and consistent manner. This paper presents a framework for organising groups of control programs for routing tasks in a network. The organisation is able to adapt its own structure over time as the state of the network changes. Keywords: Routing, organisation, coordination, distributed artificial intelligence.
1 Introduction
Resource management and routing are network management problems which require careful control in today's communications networks. Despite numerous predictions of a bandwidth glut ([11] among others), bandwidth use still needs to be carefully managed. The increased volumes of data flowing across modern networks mean that mismanagement can very quickly result in bottlenecks and potentially catastrophic cell loss. The problems are particularly acute for networks which aim to provide any kind of quality guarantees:
1. Connection-oriented network architectures such as TDM, SDH, SONET and ATM aim to guarantee Quality of Service (QoS) on a connection by connection basis. Making route calculations involves taking into account large amounts of link state information.

Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 262–273, 1999. © Springer-Verlag Berlin Heidelberg 1999
2. In packet-based networks such as IP, routing is based on shortest path algorithms and there is typically a single main route available for each source-destination pair (the estimated shortest). One of the principal aims of packet network operators is to keep the call rejection rate low whilst ensuring that accepted customers experience high levels of service. The advent of flow identification, as proposed for IPv6 [RFC2460], and per-flow or per-application routing (which may be possible with active network technology) would enable far more flexible resource allocation. Good allocation strategies can in both cases dramatically reduce the amount of over-capacity required to ensure smooth running. This paper discusses the use of customisable control programs running in active network nodes to control network routing. To accomplish this, control programs need not only local control and information but also to coordinate with each other throughout the network. We present notions of organisation drawn from work in Distributed Artificial Intelligence (DAI) and Management Science as an approach to this problem. The main focus of this paper is on bandwidth adaptive organisations, which change structure to match network resource availability.
2 Organised Routing?
Active networks provide the means to insert (possibly arbitrary) control programs and decision logic into individual network nodes. This supports the key aim behind much of today's active networks research: enabling custom user control programs to be added into the network on a user by user or application by application basis. Arguably, for many applications there will also be an additional need for broader control to ensure that these control programs deliver a coherent final result across the network. Additionally, the computations carried out at individual nodes will require (possibly non-local) network state and policy information for execution. Routing and resource allocation are perhaps chief amongst the applications requiring wider coordination and state information. Routing algorithms often need to display non-local characteristics, such as prevention of routing loops and avoidance of bottlenecks. It directly follows that resource allocation processes running in network nodes need to be coordinated in their actions and have a clearly defined way of accessing distributed network state information - they need to be organised.
3 Active Nodes
Active networks are an enabling technology for the deployment of more intelligent and flexible network management schemes [12]. In the context of network routing problems active nodes in a network need to provide the following facilities:
– A computational environment which executes control programs operating on the routing process (we assume this environment can execute arbitrary control code, but there should be scope for restricting this).
– Access for control programs to a restricted set of primitives controlling node and link resources (a virtual instruction set).
– A mechanism for updating the control programs present in the computation environment. Ideally this mechanism would be of the programmable switch type [12], acting as a "back door" for uploading new control programs into nodes.
Together these provide for logical (or actual) mobility of control programs, information, routing policies and inter-controller relationships between nodes in the network. The work presented here is being applied in two domains: ATM networks and IP packet networks. The following two subsections outline the types of active nodes required for each.

3.1 Active ATM Nodes
In the case of an ATM network, controllers (the routing processes) do not directly manipulate the packet flow, since routing decisions are made on a connection by connection basis. Once routes have been chosen, they are set up in the switching tables of intermediate ATM nodes. Controllers instead control the application of route decision policy and how this is coordinated with other switches, making decisions on a per-demand basis. "Active" ATM nodes for our purposes therefore need to provide access to the primitives which control route selection for connections and to the signalling processes used for connection set-up. In ATM networks there seems to be less scope for the active network "capsule" approach of passing code to nodes in packet headers. In general, parameters and settings for a whole flow are declared at connection set-up time, leaving much less flexibility for actions to be applied to individual packets.¹

3.2 Active Packet Nodes
In packet networks, routing algorithms have direct influence at the packet forwarding level. In fact, a strong branch of active networks research (advocated in [13] for example) is based upon the idea that the packet is the fundamental unit for control and to control. It is, however, difficult to see how effective routing and resource control could be achieved at the individual packet level (although for other network management functions this may well be the best level for control). Abstraction away from individual packets by aggregation of traffic into flows, groups of flows or other groups is often seen as essential for useful resource planning (see, for example, the efforts to provide flow identification in IPv6 [RFC2460]). Routing tables in IP networks route packets in real time but stay relatively static; these tables are then updated by routing protocols such as RIP [RFC1058] and OSPF [RFC1131], which run in the background. IP routing tables therefore correspond to packet aggregation based on destination address. In active networks there may be several useful criteria for aggregation in the real-time routing mechanism (rather than simply by destination), such as by source and destination, application or packet priority. An active packet node in the packet network, for our purposes in the routing context, would need to provide:
– Access to the on-line routing mechanism, principally so that it can be updated by a routing mechanism operating in the background (e.g. not on a packet by packet basis).
– A mechanism for flow identification.
– Per-flow routing capabilities.
The last two points allow much finer control of routing and are now often seen as essential for good resource management and QoS support in packet networks. Active networks are an important enabling technology for these two properties (and, of course, the first).

¹ Active packet headers might, however, be used profitably for making VBR and ABR services more controllable.
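The aggregation criteria discussed above amount to choosing a flow key for the routing mechanism. A minimal sketch (field names and values are illustrative, not from any particular node implementation):

```python
def flow_key(pkt, criteria):
    """Aggregate packets into flows by a configurable tuple of header
    fields, rather than by destination alone."""
    return tuple(pkt[field] for field in criteria)

# a per-flow routing table mapping flow keys to next hops (toy values)
routes = {}
pkt = {"src": "10.0.0.1", "dst": "10.0.1.9", "app": "video", "prio": 3}

# classic IP aggregation: destination address only
routes[flow_key(pkt, ("dst",))] = "port-1"
# finer, active-network style aggregation: source, destination, application
routes[flow_key(pkt, ("src", "dst", "app"))] = "port-2"
```

The background routing mechanism would install and update such entries, while the real-time forwarding path only performs the key lookup.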
4 Building Organisations
There is a large body of work in both Management Science and DAI on how to apply organisational theory to distributed computational systems: [5] gives an AI perspective, [3] and [9] give interesting Management Science viewpoints, and a collection of papers covering market-based systems can be found in [2]. This work is complemented by the general trend in the network research community towards decentralisation and the use of hierarchies [10] and delegation [6]. The PNNI framework under development by the ATM Forum [1] is perhaps the most advanced and best-known use of organisations in network architecture to date. A useful organisation needs to provide the following:
– Information Organisation: dividing up and providing access to information. For routing, this corresponds to representing the information required for making routing decisions (the link states, topology, etc.).
– Control Organisation: ensuring distributed control decisions lead to coherent actions being carried out throughout the network. In routing, this corresponds to: 1) delimiting where routing decisions are taken (hop by hop? at the source?) and 2) avoiding bottlenecks, congestion, oscillations, etc., and making sure the correct reservations are made.
Organisations are made up of two types of component: units and relations. Units represent, for example, company departments, employees, sites or (here) areas of a network. Relations describe the relationships between units (such as superiority, parent, child, peer, etc.) and define the organisational structure. The
examples given in this paper are all spatially distributed hierarchies. However, much of the discussion is relevant to other types of distribution (functional, by authority), and organisation ([3] for example discusses heterarchies, hierarchies and markets).
5 Static Organisations
In a static organisation the composition of units and existing relations between them remain fixed over time. The following two sections describe a static routing organisation in terms of controllers (units) and structure (formed by relations).

5.1 Control Programs
Control programs represent organisational units and perform management tasks in local areas of the network. For the routing task, control programs require the following types of information:
– Routing policies and algorithms.
– Continuously updated local information about the network state. This forms the basic input for solving routing tasks.
– Responsibilities to other controllers elsewhere in the network (for example their superiors). This corresponds to control programs knowing their place in the organisation.
Both the local information and the algorithms/policies may change over time to reflect the changing network state and management control. Controllers have knowledge of a limited area of the network, and information about what lies outside this area is obtained via the organisational links. Control programs in a static organisation perform functions at two levels:
– Local: executing local routing tasks given demands for routes and their local state information.
– Organisational: cooperating with other controllers in the organisation to execute non-local routing tasks. These may be tasks arising in the controller's own area for which non-local information or control is needed, or tasks arising elsewhere (which need information or action in the control program's own sphere of influence).

5.2 Control Structure
Relations are applied to compose many local control programs into an organisational structure which spans the whole network. Figure 1 shows a hierarchical structure with three levels of organisational units. The lowest level controllers (one level above the individual network nodes) each have a local viewpoint and hold information about the network state in that area. Actions are coordinated in accordance with the relations between the units,
e.g. peer to peer: B2 communicates directly with B4 to find out about connectivity to node L; or hierarchically: controller C1 mediates between the controllers in level B (all the Bx) to perform routing tasks. This control structure then stays fixed over time (although it may be updated for the physical addition/removal of nodes, for example).

Fig. 1. The network nodes on the left are clustered at two levels.
6 Adaptive Organisations
There are many different organisational structures (even if restricted to hierarchies) and no single organisation is appropriate for all tasks. There are clear arguments for allowing organisations to adapt over time; this is particularly the case when the environment they operate in is dynamic or there may need to be ad-hoc re-organisation (due to failures, for example). There has been some preliminary work on adaptive organisations in the Distributed Artificial Intelligence (DAI) literature, with [8] and [7] the most useful examples². The key requirement behind adaptive organisations is that the controllers in the organisation have some representation of their place in the organisation. The controllers can then apply a set of adaption rules to decide when to change this representation (and inform other controllers of the changes). The following sections present an organisation which adapts to bandwidth availability over time, to illustrate the idea and utility of adaptive organisations.

6.1 Resource Summarisation at Different Levels of Abstraction
[4] introduces a clustering scheme for structuring network graphs. The network is divided up into equivalence classes according to connectivity at a specified available bandwidth. The regions created are called Blocking Islands³. Figure 2 gives an idea of the structures created by this clustering approach. A single network graph, comprising nodes A to M connected via links of varying residual capacities, is clustered into regions at 6Mbits/sec and 9Mbits/sec.

² We refer the reader to [15] for further references.
³ Please note that the clustering techniques and their applications are subject to patent protection.
Fig. 2. Two layers of blocking islands cluster a network at different levels of abstraction. The light grey regions (BI-1(9) - BI-6(9)) cluster equivalence classes of nodes reachable at 9Mbits/sec. The larger regions (BI-7(6) - BI-9(6)) cluster groups of nodes which can be reached at 6Mbits/sec. Dashed lines represent communication links with less than 6Mbits/sec free capacity, solid lines represent links with 6Mbits/sec or more free.
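At a single bandwidth level, this clustering can be sketched as a union-find pass that merges nodes joined by links with sufficient residual capacity. This is a simplified illustration of the scheme in [4] under our own naming assumptions:

```python
def blocking_islands(nodes, links, beta):
    """Cluster nodes into blocking islands at bandwidth level `beta`:
    two nodes share an island iff some path between them uses only
    links with at least `beta` residual capacity.  `links` is a list
    of (u, v, residual_bw) triples."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path compression
            n = parent[n]
        return n

    for u, v, bw in links:
        if bw >= beta:                      # only "wide enough" links merge
            parent[find(u)] = find(v)

    islands = {}
    for n in nodes:
        islands.setdefault(find(n), set()).add(n)
    return list(islands.values())
```

Running this at several thresholds (e.g. 9Mbits/sec then 6Mbits/sec, as in Figure 2) yields the nested regions of the abstraction hierarchy; links that cross island boundaries are exactly the bottlenecks.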
The sets of blocking islands generated for one bandwidth requirement are unique, identify bottlenecks (the inter-regional links) and highlight the existence and location of routes at a given bandwidth level. If two nodes are clustered in the same blocking island at a given bandwidth level, there must exist at least one route between them; furthermore, all links which form part of the path lie inside this blocking island. Applying this clustering technique several times for different bandwidth requirements represents bandwidth-bounded connectivity in the network at different levels of abstraction. Changes in the available bandwidth on the links can cause splitting or merging of regions.

6.2 A Bandwidth Adaptive Routing Organisation
The hierarchy generated by the resource summarisation in Section 6.1 can be used to build an organisation. Control programs in the network nodes gather and hold information for each of the regions (the blocking islands above). The simplest mapping is to designate one piece of control code as responsible for each region, both in the abstract and the ground (node-level) space. Control programs retain links to their neighbouring (peer) and parent/child regions (or at least to the controllers of these regions). Through the clustering scheme in the section above,
this structure is then related directly to the bandwidth available and changes over time as resources are allocated and de-allocated. This organisation is applied to performing routing tasks in the network. More specifically, we are currently applying this to allocating CBR demands in ATM networks based on a source routing model.⁴ Controllers at the lowest level of abstraction perform routing tasks on real network nodes, whilst controllers at higher levels coordinate the efforts of their subordinates (at lower levels of abstraction). The useful properties of the clustering scheme identified in [4] apply to any convex metric. However, bandwidth appears to be the most useful of these, primarily because many other QoS parameters (such as delay, jitter etc.) depend heavily upon available bandwidth [14]. Having the organisation adapt to the available bandwidth means that bandwidth information for routing decision making is already implicit in the information structure (before any routing algorithm has even been executed).

6.3 Control Programs
Control programs in an adaptive organisation are generalised versions of those used in static organisations (see Section 5). Controllers now require an additional meta-level of information:

– Metrics for evaluating the need to change the organisation structure. These metrics form the update rules of the organisation. In a bandwidth adaptive organisation these metrics and update rules are based upon bandwidth availability in the network (in this work arising out of the clustering techniques described in Section 6.1).

As a result of this additional meta-level, control programs may now act at three levels:

– Local: executing local routing tasks given demands for routes and their local state information.
– Organisational: cooperating with other controllers in the organisation to execute non-local routing tasks.
– Meta-organisational: updating the organisation structure by applying the update rules.

It is important to note that the update rules for this new meta-level must be embedded within the individual control programs themselves. Updates of the organisational structure cannot realistically be controlled by external processes, since network failures could cause all local adaptation to stop, and in large networks external adaptation is a complex problem in itself if solved centrally.
Since this paper's main aim is to discuss organisation issues, the routing schemes related to the organisational structures are not discussed here; they are described more precisely in [15].
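The three levels at which a control program acts can be pictured with a minimal, hypothetical controller sketch (all class and method names here are ours, not from the paper): the local level handles demands the controller covers, the organisational level escalates to the parent controller, and the meta-organisational level applies an update rule from within the controller itself.

```python
class Controller:
    """Illustrative control program covering one region of the hierarchy."""

    def __init__(self, name, nodes, parent=None):
        self.name = name
        self.nodes = set(nodes)   # ground-level nodes this controller covers
        self.parent = parent      # controller one abstraction level up

    def route(self, src, dst):
        # Local level: both endpoints lie in this controller's region.
        if src in self.nodes and dst in self.nodes:
            return f"{self.name} routes {src}->{dst} locally"
        # Organisational level: escalate to the parent, which coordinates
        # its subordinate controllers at a lower level of abstraction.
        if self.parent is not None:
            return self.parent.route(src, dst)
        raise ValueError("no controller covers both endpoints")

    def maybe_split(self, islands):
        # Meta-organisational level: an update rule applied by the
        # controller itself (not by an external process), e.g. when its
        # region's blocking island splits into several islands.
        if len(islands) <= 1:
            return [self]
        return [Controller(f"{self.name}.{i}", isl, self.parent)
                for i, isl in enumerate(islands)]

top = Controller("top", {"A", "B", "C", "D"})
c1 = Controller("c1", {"A", "B"}, parent=top)
print(c1.route("A", "B"))   # handled locally by c1
print(c1.route("A", "C"))   # escalated to (and handled by) top
```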
Steven Willmott and Boi Faltings

6.4 Control Structure
Figure 3 shows two clusterings of controllers for a grid network. The first (left-hand side) is the starting state, with the controllers covering initially defined regions. For a static organisation this configuration persists and remains fixed over time. The right-hand side shows an example adapted control structure for the same grid network. Nodes clustered at the top level (sharing the darkest regions, such as B and C) have high bandwidth connectivity available, and regions only connected at low levels (such as A and B) are only reachable at low bandwidth.
[Figure 3 shows two panels, "1. Starting state" and "2. Adapted state", over a grid network with regions A, B and C, shaded against a vertical "Bandwidth Available" axis running from Low to High.]
Fig. 3. A grid network with a starting organisation as shown on the left. Over time the structure changes for a bandwidth adaptive organisation (as shown on the right). For a static organisation the initial state would be preserved.

Finding a route between neighbouring nodes B and C is quick in the adaptive organisation, since both nodes lie in the same local area of control. The same task takes longer in the static organisation because the two nodes happen to be clustered only at the highest level of abstraction. This difference reflects the representation of the ready availability of resources between nodes B and C in the adaptive organisation, which is something the static organisation does not capture. The situation is reversed for communications between nodes A and B: the adaptive organisation clusters these at the top level of the hierarchy, whereas the static organisation can make a decision at the most local level. The extra effort required in the adaptive organisation reflects the fact that A and B are connected only by paths which are resource critical, which may mean that they should be dealt with by an entity with a broader view of the network. Routing traffic on a critical link may have wider consequences for the rest of the network (for instance it may unnecessarily disconnect two regions of the network).
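The difference in lookup cost can be made concrete: each node carries a chain of enclosing regions, most local first, and the work needed for a demand grows with the level at which its two endpoints first share a region. The data below encode the adapted organisation of Fig. 3 (the region names are ours).

```python
def lowest_shared_level(hierarchy, a, b):
    """Return the index of the most local region containing both nodes,
    i.e. the level of abstraction at which the demand can be decided.
    `hierarchy` maps each node to its chain of enclosing regions."""
    regions_b = set(hierarchy[b])
    for level, region in enumerate(hierarchy[a]):
        if region in regions_b:
            return level
    raise ValueError("nodes not clustered together at any level")

# Adapted organisation of Fig. 3: B and C share a local region,
# while A joins them only at the top of the hierarchy.
hierarchy = {"A": ["rA", "top"], "B": ["rBC", "top"], "C": ["rBC", "top"]}
print(lowest_shared_level(hierarchy, "B", "C"))  # 0: decided locally
print(lowest_shared_level(hierarchy, "A", "B"))  # 1: needs the top level
```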
7 Active Organisations for Routing
An organisation in an active network forms part of the environment in which control programs on a given node execute; it defines:
– what information is available at execution time (information organisation),
– what the wider context for the execution of the program is (control organisation).

The relationships with other entities in the network may restrict the possible outcomes of computation, and may determine, counteract or cause non-local effects. In our work on the routing problem, network state information is managed within the organisation. Each controller holds local information and is able to query other controllers to obtain non-local information. The organisation also defines how a routing problem is solved: resource allocation decisions are finally taken by those controllers responsible for the lowest (link) level, but where reservations are made is partly controlled from higher up in the hierarchy. The coordination structure provided by the hierarchy ensures that local resource reservations hang together to give complete routes and that load is evenly distributed to avoid congestion. The adaptive organisations described in the previous sections are also "active" in the sense that their control programs are able to represent the state of the organisation explicitly and update the organisation itself. Updates of code and information, as well as of relationships etc., come not only from network operators or users of individual applications but also from within the network. Controllers on each node have some autonomy and influence over their own status in the organisation. Controllers may also have influence over controllers on other nodes (at lower levels of abstraction).
7.1 Importance of Active Network Developments
The requirements laid out in Section 3 clearly show the need for active network technology to support the work presented in this paper. Essentially, control programs need to be logically or actually mobile. As the organisational structure, network state and management policy (specific routing algorithms for example) change, the control programs in the network also need to adapt (or be replaced) dynamically. The adaptivity of the organisation requires considerable flexibility in the network nodes.
8 Status of Work
The work on adaptive organisations outlined in this paper is still in its preliminary stages. The node execution environments, control programs, communication mechanisms and adaptation algorithms have recently been completed. What is still lacking is a generator for traffic scenarios and extensive testing. Current work is based on an ATM network model but, under certain assumptions, should also be applicable to packet-based networks (see Section 3).
9 Conclusions
Increasingly intelligent network management schemes, particularly for resource management, are vitally important for the smooth running of future networks. This is not only true in connection-oriented networks, such as ATM, but also in packet-based networks, where careful resource management is required to improve the ratio between potential load and available capacity (e.g. minimising the amount of over-capacity required). Coordinating the actions of on-line control programs throughout the network goes hand in hand with this need for better resource management, leading to interesting questions for active networks research:

– How to facilitate this coordination?
– How to prevent the potentially wide diversity of injected programs from interacting catastrophically in the network (even if none of them violates security restrictions)?

We introduce notions from the field of Distributed Artificial Intelligence on organisations and discuss the use of organisations for routing tasks. The paper contains two main threads of argument:

1. Organisations are important in ensuring that active networks behave coherently when executing many different user/system injected control programs. Programs executing at nodes require both information organisations (to be able to perform useful tasks) and control organisations (to ensure coherent behaviour).
2. Organisational structures in networks are heavily dependent upon active network techniques to provide flexible computation at network nodes and mechanisms for the dynamic update of control programs. This is particularly true of adaptive organisations.

To help illustrate these points the paper also presents a bandwidth adaptive organisation scheme based on control programs which update themselves, their network information state and their organisation structure dynamically. The control programs coordinate with each other to ensure coherent execution of the resource management tasks.
Acknowledgements

The authors would like to extend their thanks to the other partners in the SPP-ICC IMMuNe project (of which this work is part). Funding for IMMuNe from the Swiss National Science Foundation (Project Number SPP-ICC 5003-45311) is also gratefully acknowledged. Thanks also go to Monique Calisti and Christian Frei for helpful comments on earlier drafts.
References

1. ATM Forum. P-NNI V1.0 – ATM Forum approved specification, af-pnni-0055.000. ATM Forum, 1996.
2. S. H. Clearwater. Market Based Control: A Paradigm for Distributed Resource Allocation. World Scientific, Singapore, 1996.
3. M. S. Fox. An Organisational View of Distributed Systems. IEEE Transactions on Systems, Man and Cybernetics, SMC-11(1):70–80, 1981.
4. C. Frei and B. Faltings. A dynamic hierarchy of intelligent agents for network management. Workshop on Artificial Intelligence in Distributed Information Networks (held at IJCAI'97), 1997.
5. L. Gasser. DAI Approaches to Coordination. In N. M. Avouris and L. Gasser, editors, Distributed Artificial Intelligence: Theory and Praxis, pages 31–51. Kluwer, 1992.
6. G. Goldszmidt and Y. Yemini. Distributed management by delegation. In Proceedings of the 15th International Conference on Distributed Computing Systems (ICDCS'95), pages 333–341, Los Alamitos, CA, USA, May 30–June 2, 1995. IEEE Computer Society Press.
7. F. Guichard and J. Ayel. Logical Reorganisation of DAI Systems. In Proceedings of the ECAI-94 Workshop on Agent Theories, Architectures and Languages (ATAL'94), pages 118–128. Springer-Verlag (Lecture Notes in Artificial Intelligence 890), August 1994.
8. T. Ishida, L. Gasser, and M. Yokoo. Organization Self-Design of Distributed Production Systems. IEEE Transactions on Knowledge and Data Engineering, 4(2):123–134, April 1992.
9. T. W. Malone. Modeling Coordination in Organisations and Markets. In A. H. Bond and L. Gasser, editors, Readings in Distributed Artificial Intelligence, pages 151–158. Morgan Kaufmann, 1988.
10. M. R. Siegl and G. Trausmauth. Hierarchical Network Management: a Concept and its Prototype in SNMPv2. Computer Networks and ISDN Systems, 28(4):441–452, February 1996.
11. J. M. Smith. Programmable Networks: Selected Challenges in Computer Networking. IEEE Computer Magazine, 32(1):40–42, January 1999.
12. David L. Tennenhouse, Jonathan M. Smith, W. David Sincoskie, David J. Wetherall, and Gary J. Minden. A survey of active network research. IEEE Communications, 35(1):80–86, January 1997.
13. David L. Tennenhouse and David J. Wetherall. Towards an active network architecture. Computer Communication Review, 26(2), April 1996.
14. Z. Wang and J. Crowcroft. Quality-of-Service Routing for Supporting Multimedia Applications. IEEE Journal on Selected Areas in Communications, 14(7), 1996.
15. S. N. Willmott, C. Frei, B. Faltings, and M. Calisti. Organisation and Coordination for On-line Routing in Communications Networks. In A. L. G. Hayzelden and J. Bingham, editors, Software Agents for Future Communication Systems. Springer-Verlag, 1999.
A Dynamic Interdomain Communication Path Setup in Active Network

Jyh-haw Yeh, Randy Chow, and Richard Newman
Dept. of Computer and Information Science and Engineering
University of Florida, Gainesville, FL 32611, USA
{jhyeh,chow,nemo}@cise.ufl.edu
Abstract. An internetwork is composed of many administrative domains (ADs) with different administrative and security policies for protecting their own valuable resources. A network traffic flow between end-to-end stub ADs through intermediate transit ADs must not violate any stub or transit domain policies. Packets may be dropped by routers that detect a policy violation. Therefore, it is necessary for a communication session to set up a communication path in which all constituent routers are willing to serve the session, so that data packets can be delivered safely without being discarded. Moreover, such a communication path cannot always guarantee successful packet delivery if the intermediate routers or links are prone to failure or congestion. This paper proposes a dynamic interdomain communication path setup protocol to address these issues. The protocol is dynamic in the sense that the path determination strategy is distributed and a path can be reconfigured to bypass a failed or congested router or link. These two dynamic features require the intermediate network nodes to perform some computation and to make some decisions. The implementation of the protocol relies on the computational capability of an active network, in which active nodes can provide computational capabilities in addition to traditional communication. Thus, the design of the protocol is based on the assumption of an active network architecture. The protocol will be a useful tool for all connection-oriented applications in active networks.
1 Introduction
An internetwork consists of many heterogeneous domains managed under different administrative authorities. For secure interdomain resource sharing, an administrative policy must be defined for each individual authority to specify eligible traffic flows between end-to-end domains and among transit domains. Each domain must have a mechanism to enforce its policy by either serving or dropping packets flowing through it, so that valuable resources are not abused
The research is partially supported by NSA under contract number: MDA-904-98C-A892.
Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 274–285, 1999.
© Springer-Verlag Berlin Heidelberg 1999
by unauthorized accesses. The enforcement mechanism requires authentication and authorization of each packet according to the local domain policy. In a firewall system [1], network traffic filtering is performed on a per-packet basis independently by firewalls in each domain. The authorization process must be performed for every packet in every firewall along the path from source to destination. This is time-consuming, since each router needs to consult its local domain policy for authorizing each packet, especially when there are complicated domain policies and various types of requested service. However, a domain policy will most likely allow the same access privilege for all packets in a communication flow, and every packet in a communication flow normally has the same requested type of service. Because of these two properties, many proposed interdomain access control protocols [2,3,4] and policy routing protocols [5,6,7,8] change the authorization process from a per-packet basis to a per-flow basis by building a secure and authorized communication path before transmitting application data.

The path setup in these protocols establishes a sequence of routers en route to the destination in which each router agrees to provide the services. For the purposes of authentication and data integrity, a secret session key is generated and distributed to each router in the path. After the path setup, all data packets flow through the same path to the destination. Each data packet carries a MAC (Message Authentication Code) signed by the session key. Instead of consulting the local domain policy for authorization, each router verifies the MAC using the session key. If the verification succeeds, the service is granted to the packet. In this way, each router only performs the (expensive) authorization process once, in the path setup for the entire communication session, and only performs an efficient MAC verification on each packet.
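The per-flow scheme can be sketched in a few lines. The protocols discussed here do not fix a MAC algorithm, so HMAC-SHA256 from Python's standard library stands in for whatever construction the session key would actually be used with:

```python
import hashlib
import hmac

def sign_packet(session_key: bytes, payload: bytes) -> bytes:
    """The source attaches a MAC computed with the per-session key."""
    return hmac.new(session_key, payload, hashlib.sha256).digest()

def router_accepts(session_key: bytes, payload: bytes, mac: bytes) -> bool:
    """A transit router verifies the MAC instead of re-running the
    (expensive) policy-based authorization for every packet."""
    return hmac.compare_digest(sign_packet(session_key, payload), mac)

key = b"distributed-to-each-router-at-path-setup"   # illustrative key
packet = b"application data"
assert router_accepts(key, packet, sign_packet(key, packet))
assert not router_accepts(key, b"tampered data", sign_packet(key, packet))
```

The per-packet cost thus drops from a policy-database lookup to a single keyed hash computation and constant-time comparison.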
IDPR (InterDomain Policy Routing) [5] is a typical policy routing protocol that entails establishing a communication path using a static path determination strategy. Static path determination means that only the source router determines the path. Each router must maintain a consistent routing information database (RID), containing the connectivity of the internetwork and the associated domain policy of every node, for the computation of feasible paths. The large storage requirement and the difficulty of maintaining consistency of RIDs among routers make this approach inefficient. This paper proposes a dynamic interdomain communication path setup protocol using a dynamic path determination strategy to eliminate the necessity of RIDs. In contrast to the static approach, the dynamic approach shifts the responsibility of path determination from one node to a set of nodes. Another dynamic feature of this protocol is that a path can be changed to bypass a failed or congested router/link while the connection remains active. Details of this protocol are described in Section 4. The dynamic nature of the proposed protocol requires additional computation and functionalities at the intermediate network nodes. This requirement can be met by using the emerging active network architecture [9,10,11,12], in which a network can be treated as a computing
engine, as well as a communication network. Therefore, the proposed protocol is designed under the assumption of an active network architecture. Section 3 briefly describes the architecture of an active network.
2 Static Path Setup - Policy Routing
IDPR is a routing protocol designed for connection-oriented interdomain applications. It is composed of three primary protocols:

1. Policy Update Protocol: handles reliable flooding of link state updates throughout the internetwork.
2. Path Setup Protocol: installs and maintains routing information at intervening routers.
3. Packet Forwarding Protocol: forwards data packets along a previously established path.

In IDPR, each router has a RID for storing the network topology of ADs and their associated transit policies. To provide a consistent view of the internetwork among all routers, the Policy Update Protocol maintains consistency among RIDs by reliable flooding of link state updates throughout the internetwork. Another important component of IDPR, Path Computation, computes the routing path in accordance with source and transit domain policies. It is not a protocol, in that each domain can implement its own version of Path Computation. Note that, because of the consistency among all RIDs, a routing path computed by the source router is feasible, since it will most likely be agreed on by all transit routers. Thus, an interdomain communication protocol can be implemented using the underlying IDPR policy routing facilities to determine a feasible routing path. Having the ability to compute a feasible routing path in a source router, the path setup can confidently commence without fear of rejection. In the Path Setup protocol, the source router sends out a Setup packet containing the computed path. Each transit router receiving this Setup packet checks its local transit policy to determine whether to accept or reject the path. In case of acceptance, the router creates an entry for the session in its Forwarding Information Database (FID). The purpose of the FID is packet forwarding: an entry consists of a path ID, the previous and next routers in the path, and possibly a session key.
After an entry in the FID is built, the router forwards the Setup packet to the next router in the path, and the process is repeated for the subsequent routers. Once a path has been set up by the path setup protocol, the packet forwarding protocol forwards the user data packets along the path. Each router in the path uses the path ID and session key to check the authenticity and integrity of received data packets. For data packets with correct verification, the router forwards them to the next router as recorded in the FID. In this way, all data packets for a session can flow through the established path to the destination. IDPR path setup is static because the path is determined ahead of time, statically and solely by the source. Each transit router can only accept or reject
the proposed path; it plays no role in the path computation. This violates a general philosophy: the one providing the service should make the decision. Therefore, a dynamic path setup protocol is proposed in Section 4 in which, in contrast to IDPR, each router determines the next segment (router) of the path.
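The FID and forwarding step just described map naturally onto a small per-router table. This is a sketch under our own naming, not the IDPR wire format:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FIDEntry:
    """One Forwarding Information Database entry: a path ID, the adjacent
    routers in the established path, and possibly a session key."""
    path_id: str
    prev_router: str
    next_router: str
    session_key: Optional[bytes] = None

class Router:
    def __init__(self, name: str):
        self.name = name
        self.fid = {}                       # path_id -> FIDEntry

    def accept_setup(self, entry: FIDEntry):
        # On accepting a Setup packet, install per-session state.
        self.fid[entry.path_id] = entry

    def forward(self, path_id: str, packet: bytes):
        # Data packets follow the previously established path.
        entry = self.fid.get(path_id)
        if entry is None:
            raise KeyError("no established path: packet dropped")
        return entry.next_router, packet

r = Router("R1")
r.accept_setup(FIDEntry("p7", prev_router="R0", next_router="R2"))
print(r.forward("p7", b"data"))   # ('R2', b'data')
```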
3 Active Network Architecture
Active networks are a novel approach to network architecture in which each switch in the network provides a computational environment, such that customized computation can be performed on the fly based on the messages flowing through it. In essence, the network becomes programmable for any specific application. Currently, a new network protocol generally requires a lengthy standardization process before it can be deployed; under an active network architecture, however, the deployment of a new network protocol can be immediate.

Traditional data networks passively transport bits from one end system to another. The network does not care much about the contents of the data payloads it carries, and they are transferred between end systems without modification. Computation is limited within such a network, e.g., to header processing in packet-switched networks. The exponential growth of the Internet has brought diverse applications that may require intermediate network nodes to perform some computation on application data. For example, Web browsing can be enhanced if the intermediate nodes support Web page caching, and a path setup protocol needs to encrypt, decrypt, and validate packets at the intermediate nodes for safe key distribution. These two unique features of active network technologies, that routers are programmable for customized computation and that new protocols are easily deployed, are ideal for the development and implementation of the dynamic path setup protocol proposed in this paper. Interdomain path setup is not only an application of active networks, it is also an essential tool for all connection-oriented applications in active networks; there is a strong synergy between the two. The active network research group at MIT has identified two approaches to an active network architecture, discrete and integrated, depending on whether programs and user data are transported separately or in an integrated fashion [9,10].
The proposed dynamic path setup protocol follows the integrated approach. To program a network, the integrated approach changes the passive packets of traditional network architectures into active capsules, which are programs with user data embedded. There are many details and issues in the design of an active network that are beyond the scope of this paper; for communication path setup, we concentrate only on programming with capsules for protocol implementation. The encoding of capsules has yet to be standardized by the active network research community, so a functional description of the capsules in our protocol is given rather than detailed program code.
To implement a path setup protocol in an active network, each router should be equipped as an active node. Since the underlying active network is a distributed computing engine, the determination of a feasible routing path can be decentralized. Thus, each router no longer needs to maintain a large routing information database to keep track of the internetwork topology as in the Policy Routing approach.
4 Dynamic Path Setup
The proposed dynamic path setup protocol in active networks uses active capsules that contain control information for iterative negotiation of the next router, from source to destination, through qualified intermediate nodes. Once a path has been set up, the protocol is also responsible for the liveness of the connection, by providing a path repair mechanism for the reconfiguration of the path upon failure of a link or a router. Before the detailed description of the protocol given in Sections 4.2 and 4.3, some data structures and packet types used in the protocol are described in the following subsection.

4.1 Soft State and Packet Types
In active network terminology, a "soft state" for a communication session specifies the current status of the session, and is maintained in each participating node. For dynamic path setup, the soft state consists of four data structures in each node of the path. These data structures, listed below, either specify some useful information or record the status of the session. Note that the four data structures in each node can be uniquely identified by the session ID.

– Security Association (SA): contains the session ID, session key, encryption algorithms and all other information concerning security.
– Qualified Neighbor List (QNL): contains the available neighbors of a node that are willing to provide services for the session.
– Node Traversed (NT): a sequence of nodes that have already been traversed by the Setup Capsule (described later). During path setup, the Setup Capsule carries an NT that is updated by each node; the NT field can be used to avoid routing loops. After the path has been built, the NT contains the path and each node should have a copy of it.
– Path Status (PS): a two-bit register that keeps track of the up/down status of the previous and next routers in the path.

Packets in this protocol are divided into two categories, Capsule and Message. A Capsule carries a program for which a receiving router forks a dedicated process to execute. Control information carried within a Message is usually expected by a waiting process. There are four different capsules and eight different messages used in this protocol. All capsules and messages carry a session ID so that routers can identify the session.
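The four structures can be sketched as one per-session record, indexed by session ID at each node (the field layouts here are illustrative, not from the paper):

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SoftState:
    """Per-session soft state held by each node on the path."""
    session_id: str
    sa: dict = field(default_factory=dict)        # Security Association
    qnl: List[str] = field(default_factory=list)  # Qualified Neighbor List
    nt: List[str] = field(default_factory=list)   # Nodes Traversed so far
    ps: int = 0b00    # Path Status: bit 0 = prev router up, bit 1 = next up

sessions: Dict[str, SoftState] = {}   # each node indexes state by session ID
state = SoftState(session_id="s42")
state.nt.append("R1")       # the Setup Capsule records each node traversed
state.ps |= 0b01            # an Alive message arrived from the previous router
sessions[state.session_id] = state
```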
Setup Capsule: generated by the source host; it tries to build a routing path to the destination. A Setup Capsule contains the SRS (source requested service) and an NT structure.

Repair Capsule: generated by a node that detects a failed or congested router/link. This capsule is used to find a detour route to bypass the failure or congestion. The Repair Capsule contains the SRS and the two segments of the original NT separated by the failed or congested router/link.

Auth-Req Capsule: generated by a Setup Capsule or a Repair Capsule in each router and broadcast to all neighbor nodes to collect authorization information from them. An Auth-Req Capsule contains the SRS.

Data Capsule: carries the application data and a program for processing the data.

Yes/No Message: upon receiving an Auth-Req Capsule, each neighbor router checks its local policy. A "Yes/No" message is then returned, indicating whether the policy allows the SRS or not. Based on these "Yes/No" messages from its neighbors, a node builds a QNL, allowing it to choose the next router to which to forward the Setup Capsule or Repair Capsule.

Grant Message: generated by the destination router if a path is found. It is sent all the way back to the source through the path just found. The Grant message carries the final NT (path) and the security association. Upon receiving the Grant message, each router stores the NT and the security association in local storage for future use.

Negative Message: generated and sent back to the previous router in the NT if a router cannot find any feasible neighbor to which to forward the Setup or Repair Capsule. This occurs either when there is no qualified neighbor or when all qualified neighbors send back Negative messages.
Alive Message: after a path has been built, each router periodically sends an Alive message to its two adjacent neighbors in the path to confirm continued path viability.

Error Message: generated if a Data Capsule violates the authenticity and integrity checks based on the security association.

Repair Done Message: generated if a detour path is found. It is sent back to the failure-detecting router through the detour path found. This message carries the new NT (path) and SA; each router keeps the new NT and SA in local storage for future use.

Tear Down Message: requests routers iteratively to release the memory allocated for a session. If the path cannot be repaired, all routers should receive the Tear Down message; if the path is repaired, only the routers not in the new path should receive it.

NT Update Message: updates the NT stored in the routers after a detour path has been built.

4.2 Path Setup
The network is treated as a computing engine in the active network architecture. For any network application, this computing engine needs some input programs
from the source host and generates outputs for the application. In order to build a communication path, the program should instruct the engine to find a routing path to the destination in which all participating routers are willing to provide the requested service. In the proposed path setup protocol, the input program is a Setup Capsule. The source host prepares and sends the Setup Capsule to the network engine. The Setup Capsule contains an empty NT at the beginning, and one router is added each time it traverses a router. When a router receives a Setup Capsule, it executes the following procedures.

1. Routing Loop Prevention: the router checks the NT to see whether there is a routing loop. A routing loop exists if its own ID is already in the NT. In that case, a Negative message is sent back to the previous router and the process terminates.
2. Destination Router Process: the router checks whether the destination host resides in its subnet. If it does, the router generates the security association (SA) and sends it, along with the NT, to the source host via the previous router in the NT in a Grant message.
3. Neighbor Information Collection: if the node is not the destination router, it broadcasts an Auth-Req Capsule containing the source requested service (SRS) to all neighbor routers. A QNL is built from all neighbor routers responding with a "Yes." Each neighbor router executes the Auth-Req Capsule by comparing its local policy with the SRS; a "Yes" message is sent back if the policy allows the SRS, otherwise a "No" message is returned.
4. Next Router Selection Process: if the QNL is not empty, this procedure adds the current router to the NT and selects neighbor routers from the QNL, one at a time, until the QNL is exhausted. For each selected neighbor router, two steps are performed: (1) forward the Setup Capsule to the selected neighbor router; (2) put the Setup Capsule process to sleep and wait for Negative/Grant messages.
If a Negative message is received, select another neighbor router and go to step (1). If a Grant message is received, save the SA and NT in local storage and forward the Grant message to the previous router, or to the source host if the current router is the source router. If all neighbor routers in the QNL have been selected and no Grant message has been received, a Negative message is sent back to the previous router, the current router is deleted from the NT, and the process terminates.

The path determination strategy in this protocol is dynamic because each router decides the next segment (router) of the path. The scenario for this strategy is depicted in Figure 1.
4.3 Path Repair
As described earlier, another dynamic feature of the protocol is the ability to bypass a failed or congested router or link during data transmission. To achieve this capability, another input program, instructing routers to repair the path, must be installed in each router before transmitting the user data. This path
A Dynamic Interdomain Communication Path Setup in Active Network
281
INTERNET RT
p
ca
req
uth
a
setup cap
source host
RT Y
auth-req cap
RT
au
N req
RT
RT
dest host
th-
ca
p
RT
Y
se
tu
pc
ap
RT
Fig. 1. The scenario for dynamic path determination repair program can be carried in the Setup Capsule or another dedicated capsule. It is activated in each router when the router receives a Grant message, i.e., the path is set. The path repair program basically has three procedures. 1. User Data Forwarding : After a path has been built, the path repair program expects Data capsules from the source host. It checks each capsule’s authenticity and integrity based on the security association. If the checking is successful, the data processing program in the Data Capsule is called and executed. The path repair program resumes the control and forwards the Data capsule after the called program is completed. 2. Failed Router/Link Detection : The path repair program periodically sends an Alive message to the previous and next routers in the path. A bit in the two bit register PS is set if the corresponding router’s Alive message is received within a default time threshold T . If both bits are set, PS is reset at the end of T . By examining the PS, a router can keep track of the Up/Down status of its adjacent routers in the path. Another potential mechanism to detect failure is by the receipt of an Error message from the next router in the path after forwarding a Data Capsule to it. Consider the situation when the next router went down and came up again within the threshold T and the Up/Down protocol did not detect the failure. The soft state of this session would have been lost in next router and it could not recognize the forwarded Data capsule. An Error message would be returned by the next router. 3. Path Repair : If a failed router or link is detected by the Up/Down protocol, the failure detecting router will select another neighbor in its QNL and send a Repair Capsule to it. The scenario for issuing a Repair Capsule in this case is shown in Figure 2. 
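The two-bit PS register logic in the Up/Down detection procedure can be sketched as follows. All names (`AliveMonitor` and so on) are hypothetical, and real-time intervals are modeled as discrete periods rather than an actual clock.

```python
# Illustrative sketch of the two-bit PS register used by the path repair
# program for Up/Down detection of the adjacent routers in the path.

PREV, NEXT = 0, 1            # bit positions for the previous/next router

class AliveMonitor:
    def __init__(self, threshold):
        self.T = threshold   # default time threshold for Alive messages
        self.ps = [False, False]

    def alive_received(self, which):
        # Set the bit for the adjacent router whose Alive message arrived
        # within the current period T.
        self.ps[which] = True

    def end_of_period(self):
        # At the end of each period T: any bit still clear means the
        # corresponding adjacent router is suspected Down. If both bits
        # are set, PS is simply reset for the next period.
        down = [i for i in (PREV, NEXT) if not self.ps[i]]
        self.ps = [False, False]
        return down          # list of suspected-failed adjacencies

mon = AliveMonitor(threshold=1)
mon.alive_received(PREV)     # only the previous router answered in time
suspects = mon.end_of_period()
```

Here the next router missed its Alive message, so it is reported as suspect and, under procedure 3, would trigger a Repair Capsule.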
The Repair Capsule is the same as the Setup Capsule except that it finds a detour path from the failure detecting router to a reconnecting router. A reconnecting router can be any router closer to the endpoint than the failed router in the original path. If the failed router or link is detected by receiving an Error message, the Repair Capsule is instead sent to the next router to reconnect the path. In both cases, the Repair Capsule carries two segments of the original NT.

Jyh-haw Yeh et al.

Fig. 2. The scenario for a successful path repair

The first segment of NT runs from the source router to the failure detecting router. This segment is treated the same way as the NT in the Setup Capsule: the router ID is inserted into the list as the capsule travels through a router. The second segment contains the remaining routers of the original NT, with the failed router marked; it is used to determine whether a new path has been found. A new NT can be computed from these two segments when the path repair is completed.

When a router receives a Repair Capsule, it compares its ID to the two segments of NT. There are four possible results of the comparison. (1) If its ID is that of the marked router in the original NT, the router simply rebuilds its soft state to reconnect the path for the session. (2) If its ID is in the first segment of NT, a routing loop exists and a Negative message is sent back to the previous router in the first segment. (3) If its ID is in the second segment of NT, the router is a reconnecting router and a detour path has been found. The sequence of routers following the reconnecting router in the second segment is appended to the first segment to form a new NT for the new path. The reconnecting router then sends a Repair Done message containing the new NT to the failure detecting router through the new path. (4) If its ID does not appear in either segment, the same procedure as for the Setup Capsule is applied to the Repair Capsule: the router broadcasts the Auth-Req Capsule, builds the QNL, selects a router from the QNL, and forwards the Repair Capsule to the selected router. If the QNL is exhausted without finding a detour path, a Negative message is sent to the neighbor that sent the Repair Capsule.

For a successful path repair, the consistency of the soft-state NTs among all routers in the new path is maintained as follows.
(1) Upon receiving the Repair Done message, the failure detecting router issues an NT Update message containing the new NT to all prior routers in the new path.
(2) After sending a Repair Done message, the reconnecting router should issue an NT Update message containing the new NT to all posterior routers in the new path, and a Tear Down message to all prior routers in the second segment of NT.

In the Up/Down protocol, two routers may detect a failed router or link simultaneously. Only the one nearer the source in the path issues the Repair Capsule. The one nearer the destination should expect a Repair Capsule or a Tear Down message within a default time threshold T. If nothing is received within T, the path repair has not succeeded and a Tear Down message is issued to the routers posterior to it in the path.

If the failure detecting router receives a Negative message, the path repair attempt was not successful. The router should try to send another Repair Capsule to another neighbor router in its QNL until the QNL is empty. If all neighbor routers in the failure detecting router's QNL return Negative messages, the correct action is either to send a Tear Down message all the way back to the source or to send a Negative message back to the previous router in the NT. The first choice stops the search for another path and informs the source that there is no path at this time; the source must perform path setup to recreate a path. The second choice continues the search for a detour path starting from the previous router in the NT. Which choice to make should depend on the upper layer application.

In this protocol, three major programs run in each participating router: the Setup Capsule, the path repair program, and the Repair Capsule. Table 1 briefly summarizes the differences among them.
                 Setup Capsule           path repair program       Repair Capsule

objective        set up a path           detect failures and       set up a detour route
                                         maintain the path         to bypass the failure

relationship     issued by the source    activated after the       issued by the path repair
                 at the beginning        Setup Capsule finishes    program when a failure
                                                                   is detected

running routers  all routers that        all routers in the path   all routers that receive
                 receive this capsule                              this capsule

Table 1. Comparison among the Setup Capsule, path repair program, and Repair Capsule
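The four-way comparison a router performs on receiving a Repair Capsule can be sketched as follows. All names are hypothetical; `seg1` is the first segment of NT (source router up to the failure detecting router, extended as the capsule travels) and `seg2` the remaining routers, with the failed one marked. Messages are modeled as return values.

```python
# Illustrative sketch of the four-way NT comparison on receipt
# of a Repair Capsule.

def classify_repair(rid, seg1, seg2, failed):
    if rid == failed:
        # (1) The marked (failed) router itself: rebuild soft state
        #     to reconnect the path for the session.
        return "rebuild-soft-state"
    if rid in seg1:
        # (2) Already in the first segment: routing loop, send a
        #     Negative message back.
        return "negative"
    if rid in seg2:
        # (3) Reconnecting router found: splice the tail of seg2 onto
        #     seg1 to form the new NT and report Repair Done.
        tail = seg2[seg2.index(rid):]
        return ("repair-done", seg1 + tail)
    # (4) Unknown router: proceed as for a Setup Capsule
    #     (broadcast Auth-Req, build the QNL, forward the capsule).
    return "forward-like-setup"

seg1 = ["src", "r1"]          # r1 is the failure detecting router
seg2 = ["r2", "r3", "dst"]    # r2 is the failed (marked) router
new_nt = classify_repair("r3", seg1, seg2, failed="r2")
```

In this example r3 turns out to be the reconnecting router, so the new NT bypasses the failed r2.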
5
Conclusion
A dynamic interdomain path setup protocol is presented in this paper. We assume that the underlying network architecture is an active network, a novel architecture in which each intermediate network node is able to perform customized computation. The protocol differs from others in
that it utilizes a distributed path determination strategy and has an automatic path repair mechanism for handling failures. We believe that the philosophy behind this strategy is better suited to the context of interdomain communication: the one providing the service should make the decision. Furthermore, the automatic path repair mechanism can detect failures much more quickly, since they are always discovered first by the nearest router, which can initiate the path repair process immediately upon detecting a failure. An active network environment facilitates both the computational and communication aspects of the protocol. Many active network applications will rely on such a connection setup protocol to establish active nodes for their respective computation.

The reconfiguration of a communication path upon failure is an important protocol design issue. To repair a failed communication session, a fast path repair process is crucial; it should be efficient in both failure detection and recovery. The proposed protocol achieves fast failure detection. Failure recovery, however, requires a fast detour path setup and relies on good QNL selection criteria. Each router in the path setup protocol has no knowledge of the QNL in the selected next router. If that QNL is empty, a Negative message may be returned, causing a rewinding of the search. Such path setup rewinding should be limited by good selection criteria. One way to decrease the possibility of rewinding is to increase the information available to each router, for example the history of previous path setups or the QNLs of the neighbors. Making this information available to each router may, however, slow down the path setup in other respects. Future work on this protocol should therefore include finding good QNL selection criteria.

Multiple failures are another issue not addressed by the protocol.
There may be multiple routers detecting different failures more or less simultaneously. It is not a good idea to have multiple path repair processes running at the same time. Simultaneous repairs may result in redundant work and even incorrect path repair due to interference. This is also an open area to be addressed.
Active Network Challenges to TMN

Bharat Bhushan and Jane Hall
GMD FOKUS, Berlin, Germany
{bhushan,hall}@fokus.gmd.de
Abstract. The data and telecommunications communities have witnessed two new developments in recent years: emerging active networking concepts and the revision of TMN. In the light of these developments, this paper investigates the extent to which TMN is suitable for managing future networking technologies and includes recommendations on where the TMN standards could evolve to better accommodate active networking technologies. The paper is timely because public telecommunications operators want to learn more about the usefulness of active networks but are sceptical about what they offer. Solutions to the challenges posed by active networks will shape the course that active networks take, and the management of active networks is one of the most difficult of these challenges. TMN wields authority in the field of telecommunications management and can be a key instrument for the management of active networks.

Keywords: Active Network, Network Element Management, TMN, Configuration Management, Telecommunications Networks.
1
Introduction
The TMN standards were developed when what can be termed "conventional" networking technologies were dominant. Since then considerable changes have taken place in the telecommunications market, in network usage, and in the network and value-added services provided as well as in the applications deployed, together with forecasts of even greater and more diverse usage to come. Such changes suggest that current telecommunications network architectures and their management will be confronted with many new demands. An increasing number of services, all with differing QoS requirements, together with large numbers of users, very large numbers of physical and logical entities, and more demanding customer requirements imply a much greater complexity to be managed and an increasing dependence on telecommunications management systems to provide the support that providers need.

Research into active networking technologies has proceeded in an attempt to improve the flexibility of networks by supporting more dynamic types of networking [1,2,3]. Two basic approaches exist: the discrete approach and the integrated approach.

Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 285-299, 1999. © Springer-Verlag Berlin Heidelberg 1999

In the
discrete approach, code is loaded on demand and cached for later execution (out-of-band service deployment). In the integrated approach, each active packet carries its own code, i.e., contains a program fragment (in-band service deployment and subsequent processing). Active networking technologies are being made possible by advances in software and hardware technologies, in particular in distributed object-oriented engineering. Combined with modern distributed systems tools, the aim is to provide a greater degree of flexibility, reconfigurability, programmability and manageability to aid dynamic and rapid service creation meeting the demands of a competitive open service market. However, such technologies cannot be deployed successfully without the related management systems being available to provide the support required by telecommunications operators to meet the increasingly sophisticated demands of a customer-oriented market. This paper therefore investigates the extent to which the TMN standards¹ are appropriate for active networking technologies, in order to assess the kinds of changes that need to be introduced into the standards to meet the challenges of the future.

This paper is structured as follows. Section two introduces related work on managing active networking technologies. Section three investigates the suitability of the current TMN standards for managing future networking technologies such as active networks. Section four presents potential active network implementations and shows how they could be managed with TMN. The conclusions summarise the findings of the paper.
2
Related Work on Managing Active Networking Technologies
The concept of active networks has not been limited to networking technology; it has found application in network management too. This section surveys the efforts being made to apply active network concepts and technologies to network management.

BBN Technologies has been developing a programming language and a system architecture that allow packets to carry diagnostic programs. Small packets are encapsulated within an Active Network Encapsulation Protocol [9]. Routing protocols and table updates could be implemented in capsules, as could network management functions such as those provided by SNMP or CMIP. Smart Packets aim to enhance network management by bringing management closer to the node being managed [10]. They extract the management functionality from nodes and construct it with (special-purpose) programming languages, thus making network control more agile. The heterogeneity of the platforms used for running the system is a major problem in fault diagnosis. The Smart Packet technology will allow diagnostic programs to customise themselves according to the platforms on which they run. With the help of Smart Packet technology, the diagnosis of new protocols and services will be possible before special tools are developed.

¹ The TMN standards considered by this paper include the TMN functional, information and physical architectures [4] and the principal M.3000 series documents most relevant to the investigation [5] [6] [7] [8].
The Xbind platform [11] has been developed with the aim of creating, deploying and managing advanced multimedia services. The Xbind platform also enables the development of mechanisms for distributed network resource allocation, real-time multi-vendor switch control, broadband signalling, and multimedia transport. An implementation of qGSMP [12] on Xbind has shown the platform to be practicable.

NetScript [13] is a programming language and an agent-based middleware environment for building and operating active networked systems. With NetScript agents, the management functions of intermediate nodes can be programmed and configured as application or user requirements change, and these agents can be dispatched to remote networks. NetScript agents can monitor remote and strategically important network nodes; in this application they can function as high-level filtering programs that watch the network traffic in real time. The NetScript environment provides its users with a universal abstraction of programmable network devices, and the NetScript language itself is a dynamic language. These two features of NetScript can be used to create powerful and programmable SNMP agents.

The Darwin project [14] addresses the problem of runtime resource management for advanced network services and applies the concept of active networking to resource management. This approach proposes customised resource management mechanisms to support value-added service applications. These mechanisms allow applications and service providers to tailor resource management, and in turn service quality, to suit their needs. The Darwin system architecture is being used to implement a management technique in which QoS is provided for a specific service or application [15]; this is in contrast to the fixed QoS frameworks used conventionally.

The DIRM (Dynamic Integrated Resource Management) project [16] is investigating the area of dynamic QoS management.
The project aims to integrate many existing QoS management systems into a set of high-level APIs that will allow applications to control the QoS for their communications over RSVP.

The IEEE P1520 project [17] is developing a reference model for a future network architecture in which the developers and administrators of value-added services will be able to access the network (for controlling and deploying services) through a standardised programming interface. The reference model will allow the developer to access and control three different networking technologies, namely ATM networks, the Internet, and SS7 (Signalling System 7) networks, through a single and unified interface. The objective of the reference model is to open up the management and signalling interfaces used to access network nodes and combine them into a single, standardised, high-level programmable interface.

Active nodes built on the above-mentioned active network-based management applications can allocate resources to the various virtual networks and undertake configuration and reconfiguration functions, as well as fault, routing and flow control functions. They can take decisions on their own and can report back to the management system.

The work mentioned above is oriented to the Internet and the SNMP area. Active networking technology has not yet been examined from the TMN perspective; that is what this paper sets out to do. Research into the impact of each of the two approaches (discrete and integrated) on TMN is a sizeable piece of work. This paper therefore considers the active network concept as a whole (i.e., including both the discrete and integrated approaches) and investigates the impact of
the concept on TMN. Those aspects of active networks that make the most impact on TMN and are relevant to public telecommunications operators are considered for investigation. From the research into active networks it is evident that the use of software as the basic structural foundation of networking is continually spreading. In order to assess the impact of this on TMN, this paper gives special attention to the software-related aspects of active networks.
3
The Adequacy of TMN in a Changing Environment
Future networking technologies that are likely to be deployed in telecommunications networks will themselves need to be managed. This section investigates the TMN standards to determine where they do not easily apply to managing emerging and future networking technologies such as active networks.
3.1 Architecture

The TMN architecture is based on standardised interfaces, protocols and messages for the exchange of management information [4]. Generic information models and standard interfaces are regarded as the means for performing general management. The functional architecture is based on function blocks which exchange information over reference points. Everything is therefore related to a function block, and all the functional components are located within one of the function blocks.

Although the OS (Operations System) physical architecture "must provide the alternatives of either centralizing or distributing the OS functions and data" (section 6.3 of M.3010), it is pointed out that "more study is required on how communications between distributed OS functions may be accommodated under the TMN architecture." Distributed OS functions were not fully supported, as it was not clear enough at the time how this could best be achieved. It was also less necessary then, as the cost of memory and processing encouraged centralising OS functions in a scarce resource. Continuous decreases in costs have rendered this argument invalid. In addition, advances in software engineering have made feasible a greater distribution of OS functions than was originally envisaged in the TMN architecture. Such distribution would make management systems more flexible and more appropriate for a variety of application areas, including the management of active networking technologies.

The TMN architecture is a hierarchical architecture based on logical layers. In M.3010 (section 5.1.2) it is stated that the "element management layer manages each network element on an individual or group basis and supports an abstraction of the functions provided by the network element layer." The hierarchical managing/managed approach is reflected in the functional architecture.
According to M.3010 (section 2.1.2), the OSF (Operations System Function) processes information "for the purpose of monitoring/coordinating and/or controlling telecommunication functions" and the NEF (Network Element Function) communicates with the TMN "for the purpose of being monitored and/or controlled." The idea at the time, realised in commercial implementations of the standards, was of a hierarchical approach
with the NEF at the bottom and no NE (Network Element) interacting for management purposes on a peer-to-peer basis with another NE. To manage active nodes in the same way is to ignore the rich and diverse functionality that active nodes can support. In particular, the peer-to-peer interaction typical of a distributed software system such as an active network cannot easily be accommodated in such a hierarchical approach.

Recommendations: First, examine the logical layered architecture from the perspective of future networking technologies and investigate alternative approaches that take into account the possibilities of technological and software advances and that would provide a framework for more flexible and efficient management functionality. Second, distributed OSs should be investigated and incorporated into the standards and no longer just considered "for further study." It should be possible to distribute the TMN OS functions and data to a greater extent than is currently allowed for.
3.2 Manager/Agent Paradigm

TMN is based upon the OSI manager/agent model of the CMIP protocol. In M.3010 (section 3.2) it is stated that the manager role "issues management operation directives and receives notifications" and that the agent role is "to respond to directives issued by a Manager." The interactions are simple, and the agent can only respond to the manager's commands. Apart from issuing notifications, the agent cannot initiate its own interactions, and it cannot interact with other agents. The roles are clearly demarcated, and although the possibility of management processes taking on both manager and agent roles during a single association is acknowledged in M.3010 (section 3.2), it is pointed out that such a case "requires further study." Synchronisation issues and concurrent bidirectional requests were also left for further study, which resulted in an approach where roles are assigned to management processes within a given context and remain fixed for that association, with no possibility of concurrent interactions.

This is a shortcoming for more complex management functionality, where it could be advantageous for management processes to be able to take on both roles during an association and where negotiation, not just command-and-response interaction, is required. This feature could be useful in complex networks of the conventional type, for example over an X interface for peer-to-peer inter-domain management, as well as in the management of active networks, where active nodes could take on the roles of both manager and agent in a management interaction. The roles of manager and agent need to be more dynamic and may have no intrinsic significance in future networking technologies, where manager and agent roles fixed for an entire association could restrict management processes in carrying out their tasks.
Recommendations: First, investigate the manager/agent interaction of CMIP in the light of the active network technology paradigm and propose alternatives based on different interaction models and cooperative management solutions. The idea of centralising intelligence in a manager that initiates all request/response interactions with agents is no longer always appropriate, and maintaining such centralised control could hamper the future effectiveness of conventional management systems. Second,
in connection with this, the possibility of a balanced CMIP enabling both manager and agent roles to be adopted during a single association should be examined. This would remove the restrictions currently experienced in limiting a management process to only one of the two roles in an association.
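As a minimal illustration of the balanced interaction argued for here — this is not CMIP itself, and all names are hypothetical — the following sketch shows two management processes that each hold both the manager role (issuing directives) and the agent role (responding to them) within a single association.

```python
# Illustrative sketch of a "balanced" manager/agent interaction:
# one process can act in either role during the same association.

class ManagementProcess:
    def __init__(self, name, mib):
        self.name = name
        self.mib = mib                   # attributes this process exposes

    # Agent role: respond to a directive received from a peer.
    def handle_get(self, attr):
        return self.mib.get(attr)

    # Manager role: issue a directive to a peer over the same association.
    def get(self, peer, attr):
        return peer.handle_get(attr)

node = ManagementProcess("active-node", {"load": 0.4})
os_ = ManagementProcess("ops-system", {"policy": "strict"})

# Within one association, each side can take either role:
load = os_.get(node, "load")         # ops system acting as manager
policy = node.get(os_, "policy")     # active node acting as manager
```

Under fixed CMIP role assignment only the first of these two exchanges would be possible within a single association; the sketch shows the symmetry that a balanced protocol would permit.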
3.3 Management Information Model

TMN uses the OSI management information modelling concepts, including the structure of management information and GDMO (Guidelines for the Definition of Managed Objects). MOs (Managed Objects) are specified according to this model using static definitions that are fixed at compile time. M.3020 (section 3.3.13) states that a "management information schema specifies the information model of a managed system as seen over a particular interface by a particular managing application or system. The information model contains all the object classes that can and will be provided by that managed system to the managing application or system. In particular, it defines the naming structure for those object classes within the managed system. The management information schema defines all possible communication of information between the managing application or system and the managed system."

This represents an approach to management information modelling based on MIBs that are expected to exist for some time without requiring modification. Managed object definitions for managing networks and network elements have therefore been standardised with the intention of being valid for years, whereas an active node can itself make changes to the MIB. The operations that an MO supports are defined in the specification for the managed resource, and no extensions or modifications are possible. When changes occur they are provided in a new version of software that supersedes the previous version in a regulated upgrade. All the management information must be in a schema; the bounds of the information are fixed. There is a given set of attributes and actions in an MO definition, which cannot take advantage of developments requiring dynamic changes to MO specifications while the system is running. Conditional packages are available, which are run-time features, but they must already exist at compile time; further conditional packages have to wait for a new compilation.
Work on on-line extensions to MIBs, which would also be more appropriate for managing active networking technologies, is currently being undertaken but has not yet been included in the TMN standards. The static approach to information modelling is a rather inflexible paradigm for future networking technologies: the attributes and actions of an active node can be extended dynamically, for example, as active nodes incorporate new actions, behaviour and attributes. This can be enabled with a distributed and extensible software environment that provides the user with a higher-level abstraction of the proprietary management interface to resources. Active nodes can change their functionality; they can load and execute programs that change their behaviour, and they can recognise different protocols dynamically. Current network management is not based on concepts supporting such characteristics, because the state of the network is reflected by the information obtainable from the proprietary management interface and,
in effect, this idea bypasses the software environment that provides a higher-level abstraction of the proprietary management interface.

When the TMN standards were developed, the emphasis in networking technologies was more on the hardware and equipment comprising the network and less on the networking software. Over the course of time the software supporting networking technologies has become more significant, with more networking functionality being executed in software. Such software-based equipment, which can also support more extensive management functionality, is not really accommodated. When managing software, different assumptions can apply, as software can be changed dynamically during the lifetime of the resource. A different approach to the functionality of the network and its extensibility can be adopted in a way not possible with a hardware-oriented approach. The appropriateness of management information models will clearly be challenged in such an environment. If active network elements and active networks are to be comprehensively monitored and controlled, a different understanding of a network element and a network may be needed. Active networks represent a distributed system with a significant software component and so need to be managed like a distributed system, i.e., there is additional software to manage. This represents a new challenge to conventional network and network element management, which has tended to concentrate on managing hardware. TMN needs to encompass the management of networking software as an integral part of network and network element management.

Recommendations: First, investigate how dynamic specifications can be incorporated into management information modelling and how shared management knowledge can take account of dynamic updates. Second, review the definition of a network and network element in the light of the greater significance of software at the network and network element layers.
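The contrast with MO definitions fixed at compile time can be illustrated with a minimal sketch of a managed object that is extensible at run time. This is not GDMO or any standard notation; all names are hypothetical.

```python
# Illustrative sketch of a run-time-extensible managed object, in
# contrast to an MO whose attributes and actions are fixed by a
# compile-time schema.

class DynamicMO:
    def __init__(self, mo_class):
        self.mo_class = mo_class
        self.attributes = {}
        self.actions = {}

    def set_attribute(self, name, value):
        # New attributes may appear while the system is running.
        self.attributes[name] = value

    def add_action(self, name, fn):
        # An active node can load new behaviour into the MO dynamically,
        # without waiting for a recompiled schema.
        self.actions[name] = fn

    def invoke(self, name, *args):
        return self.actions[name](self, *args)

mo = DynamicMO("activeNode")
mo.set_attribute("operationalState", "enabled")
# Later, a dynamically loaded program extends the MO with a new action:
mo.add_action("reset",
              lambda self: self.set_attribute("operationalState",
                                              "disabled"))
mo.invoke("reset")
```

A manager discovering this MO at run time would need shared management knowledge that can itself be updated dynamically, which is precisely the gap the recommendation above identifies.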
4 Active Networking Technology Implications for TMN
This section undertakes a closer examination of the management of active networks, investigating in more detail TMN features compared with corresponding active networking features and looking at the implications of active networking technologies for TMN information modelling.
4.1 On-the-Fly Network Resource and Networking Service Creation
Active networking offers extensibility for the provisioning and management of virtual networks. Packets carrying new management services can enhance management systems operating within NEs and create network resources on the fly by partitioning existing resources. Examples of network resources are routing tables, buffer space and bandwidth. Examples of network services are connection admission control, congestion notification and control, resource management, and traffic shaping.
To highlight the need for and applicability of the rapid creation of network resources and services, an example of the dynamic provisioning of virtual networks over a single ATM transport network and their efficient management is given here. Virtual networks as described in [18] are based on the programmability of networks. Multiple virtual networks within a single physical network may also be needed to support the needs of different users and applications. To meet these needs, active control of network resources and services is used to provide virtual networks and to guarantee an optimum level of QoS. Each virtual network is a set of resources allocated to a type of network traffic and can be controlled by a control system tailored to the specific needs of applications, allowing application-specific customisation of network control. Besides allocating resources, restrictions may also be imposed on the network traffic operated by a virtual network. When the provisioning of a virtual network takes place, efficient management will be required. Active networking allows the virtual network provider to modify parameters of the network resources and services according to dynamically changing user and transport protocol requirements, to associate names with resources and to perform accounting management.
Another application of active networks, to multi-transport-protocol stacks, is given here. Current trends in transport networks suggest that multimedia applications impose stricter QoS requirements than data-oriented applications and require different types of transport protocol compared with single-medium applications. The programmable transport architecture described in [19] addresses this need. It allows applications to choose from and bind to many different protocol stacks according to the applications' transport requirements. The architecture includes a control and management front-end that carries out dynamic resource provisioning, accounting and QoS control.
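The kind of resource partitioning discussed here can be sketched in a few lines. The Python below is purely illustrative: the class, the capacities and the policy names are invented for this sketch and are not taken from the cited architectures [18, 19].

```python
# Illustrative sketch: partitioning one physical network's capacity into
# virtual networks, each with its own control policy. All names and
# numbers are invented for this example.

class PhysicalNetwork:
    def __init__(self, bandwidth_mbps):
        self.bandwidth_mbps = bandwidth_mbps
        self.allocated = 0
        self.virtual_networks = {}

    def create_virtual_network(self, name, bandwidth_mbps, policy):
        """Partition existing capacity on the fly for a new virtual network."""
        if self.allocated + bandwidth_mbps > self.bandwidth_mbps:
            raise ValueError("insufficient capacity for " + name)
        self.allocated += bandwidth_mbps
        vn = {"bandwidth_mbps": bandwidth_mbps, "policy": policy}
        self.virtual_networks[name] = vn
        return vn

# One ATM transport network hosting two traffic-specific virtual networks.
atm = PhysicalNetwork(bandwidth_mbps=155)
atm.create_virtual_network("video", 100, policy="rate-guaranteed")
atm.create_virtual_network("data", 40, policy="best-effort")
print(atm.bandwidth_mbps - atm.allocated)   # remaining unpartitioned capacity
```

A provider could later adjust the per-virtual-network parameters, mirroring the paper's point that active networking lets the provider modify resources as requirements change.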
In the examples of virtual networks and multi-transport-protocol stacks, the key idea of network programmability is implemented by middleware (e.g., the front-end in a multi-transport-protocol stack). Middleware is generic and portable enough to support different types of user requirement, but at the same time it cannot be managed as a self-contained software package. It should be managed together with the other networking components, and its management should therefore be incorporated into the network management system. One of the prerequisites for the operation of active networks is on-the-fly creation of resources and services, and middleware plays a key role in meeting this prerequisite.
On-the-fly creation of resources and services imposes two requirements on TMN. The first requirement concerns network service creation. Using active network technologies, new network services can be dynamically added to the managed network. Viewing this situation from within a TMN environment, if an agent is to manage continually added services, the Q adaptor should also be updated in order to interface with newly created services. In fact, the TMN physical architecture can be implemented in a variety of physical configurations (section 4.1 of M.3010). M.3010 (section 2.1.5) states that "the Q adapter is used to connect as part of TMN those non-TMN entities that are NEF-like and OSF-like". If non-TMN entities that are NEF-like are connected to a TMN system, Q adaptors are difficult to modify rapidly because they are part of the NEF, and updating them can require substantial work in the management information model. In this use of Q adaptors, it should be investigated how Q adaptors can allow the NEF to interface dynamically with newly created services. If non-TMN entities that are OSF-like are connected to a TMN system, Q adaptors may be replaced by new Q adaptors in their entirety, which may not require substantial work. In this use of Q adaptors, it should be investigated how Q adaptors can allow the OSF and MF (sections 2.1.1 and 2.1.4 of M.3010) to interface dynamically with newly created services. It should also be investigated how this replacement may affect other TMN activities.
The second requirement concerns resource partitioning. New types of resource can be dynamically created (or partitioned) from existing ones. This will require the information model to be changed (new GDMO classes to be created, agents to be recompiled, and so on) to represent the newly created types of resource (sections 2.2.2, 2.2.1.3, and 3 of M.3010). But the current TMN information model does not allow new GDMO classes to be created on the fly.
Recommendation: Substantial operations take place in active network elements, resulting in the rapid creation of new types of network resource and network service. TMN functional components should facilitate the development of management systems that allow operators and administrators to customise a TMN system in order to manage newly created services and resources. Support for customisation should be provided in both the managing system and the managed system. An application should be able to exercise control over dynamically changing network resources and entities providing network services. A managed system (e.g., an agent interacting with a virtual network) should be able to customise event reports according to the unpredictable changes that occur in the managed network elements.
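The limitation just described, that GDMO classes cannot be created on the fly, can be contrasted with a runtime that does permit it. The hypothetical Python sketch below synthesizes a new managed-object class at run time; all class and attribute names are invented for illustration and are not GDMO definitions.

```python
# Hypothetical sketch: synthesizing a managed-object class at run time,
# the capability the paper argues GDMO-based agents lack.

class ManagedObject:
    def __init__(self, name):
        self.name = name

def define_mo_class(class_name, attributes):
    """Create a new managed-object class on the fly from a name and
    a dictionary of default attributes."""
    return type(class_name, (ManagedObject,), dict(attributes))

# A new resource type appears in the active NE; the agent defines a
# class for it dynamically and instantiates an object.
GroupTerminationPoint = define_mo_class(
    "GroupTerminationPoint", {"direction": "bidirectional"})
gtp = GroupTerminationPoint("gtp-1")
print(gtp.name, gtp.direction)
```

In a TMN agent the equivalent step would mean regenerating GDMO specifications and recompiling the agent, which is exactly the mismatch the recommendation above targets.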
4.2 TMN Information Modelling for Active Networking
This section looks in more detail at TMN information modelling. It investigates information modelling aspects of active networking, discussing how network information can be modelled for active networking and showing how this modelling can influence M.3100. Since TMN information modelling is relevant to the managed system side of a TMN system, only the static nature of a managed system, in particular the MIB, is discussed in this section.
Dynamic Changes in Transmission Schemes versus the Static Nature of Network Resource Representation. Active networking architectures, with the help of virtual machines, view networks as a large "seamless" distributed system. These virtual machines are software modules that provide high-level programmable interfaces to the applications and are able to deploy new protocols dynamically (examples are the PLANet active networking architecture [20] and the supporting technologies ANTS (Active Network Transport System) [21] and PLAN (Programming Language for Active Networks) [22]). In this large distributed system, the management of individual intermediate network elements is loosely coupled with the end systems, where the management system may execute. The intermediate managed elements can make "minor" management decisions (e.g., which packet to route, which packet to block, finding the best path) independently. An active network-based architecture can allow users to dynamically invoke network services (e.g., address resolution and routing) and dynamically change the configuration of routers and switches, thus modifying a given path.
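The dynamic protocol deployment described here can be sketched as a node-resident virtual machine with a handler registry. The interface below is an invented illustration, not the actual API of PLANet, ANTS or PLAN.

```python
# Hypothetical sketch of an active node's virtual machine: packets can
# install new protocol handlers at run time. Handler names are invented.

class ActiveNode:
    def __init__(self):
        self.handlers = {}

    def deploy(self, protocol_name, handler):
        """Install a protocol dynamically, as an active-network VM would
        when a packet carries new protocol code."""
        self.handlers[protocol_name] = handler

    def receive(self, protocol_name, payload):
        """Dispatch an incoming payload to the installed handler."""
        return self.handlers[protocol_name](payload)

node = ActiveNode()
node.deploy("reverse-echo", lambda payload: payload[::-1])
print(node.receive("reverse-echo", "ping"))
```

The point of the sketch is the management problem it creates: the set of handlers, and hence the node's behaviour, changes at run time, while a conventional MIB describing the node does not.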
Figure 1 illustrates three different aspects concerning dynamic changes occurring in the transmission scheme within a network element and the static nature of the representation of network element resources. Parts A and B of Figure 1 show the telecommunication functions and resources of a network element and their representations in the NEF. Part C illustrates dynamic changes occurring in an active network element. M.3100 (section 3.5.1) defines the cross-connection MO class as follows: "A point to point cross-connection can be established between: one of CTP (Connection Termination Point) Sink, CTP bi-directional, TTP (Trail Termination Point) Source, TTP bi-directional, or ....." Once a point-to-point type of cross-connection has been established between two end points of a given type, the type of cross-connection does not change and the cross-connection MO remains in the MIB until explicitly deleted. In an active networking environment, the type of one of the end points used in an already established cross-connection may change in the NE as a result of the allocation of new resources, a change in the transmission scheme or a change in the routing path (see below for an example of multicast). For example, a cross-connection between a CTP sink and a CTP source may change to a cross-connection between the same CTP sink and a new type of GTP (Group Termination Point) (see part B of Figure 1), or a simple cross-connection may change to a multipoint cross-connection. That is, a new type of GTP or a new type of cross-connection was created in the network element, but the MIB did not have a new type of GTP or a new type of cross-connection to represent them. This implies that a change occurred in the telecommunication functions of the network equipment but its representation within the NEF remained unchanged (see part A of Figure 1).
[Figure 1: Dynamic Nature of Switches and its Effect on Information Modelling Fragments at the TMN NE. Part A shows the NEF representation of telecom functions and resources, e.g. cross-connection {CTP sink – CTP source}. Part B shows the actual telecom functions of the active network element (switch, router): termination points (vpCTP, vpTTP, vcCTP, vcTTP; CTP/TTP sink, source and bi-directional; GTP) and a cross-connection (e.g. a virtual path), with connect/disconnect operations. Part C shows dynamic changes occurring out of TMN, e.g. cross-connection {CTP sink – new type of GTP}, a multipoint cross-connection, and new resources created or deleted in the network element.]
The above change should be reflected dynamically in the NEM (see part A of Figure 1). Two parts of the NEM that should be updated by the changes occurring in the network element (see Figures 5, 8 and 18 of M.3100) are the containment tree of the NE MIB and the behaviour of the cross-connection MO that changed. With conventional TMN, the status and configuration of the NEM OSF can be changed, but the "old" cross-connection MO will have to be deleted and a new one instantiated with a new pair of termination points. However, objects of a new class cannot be dynamically instantiated because of the absence of an object class definition representing the new type of resource in the GDMO specification. An example of a multicast server illustrating the above-mentioned dynamic change in connection schemes is given here. In multicast, cell replication is done within the network by the network nodes at which a connection splits into two or more branches. In multicast server operation, all end systems wishing to transmit onto a multicast group set up a point-to-point connection with a device called a multicast server. The multicast server receives the cells from the end systems across these point-to-point connections. It then serialises and replicates the cells and retransmits them to multiple end systems across a point-to-multipoint connection. The multicast server can also connect to all end systems of a multicast group across point-to-point bi-directional connections and can replicate the cells before transmission. In another multicast scheme, all end systems of a multicast group connect with each other across a point-to-multipoint connection; hence all nodes operate as transmitter and receiver for one another. This scheme needs no multicast server at all. Therefore, there are at least three multicast schemes and, depending upon the requirements of applications and users, one scheme can be replaced by another.
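The delete-and-reinstantiate workflow described above can be illustrated with a toy MIB. Everything below (the dictionary layout, identifiers and helper name) is invented for the sketch; it only mirrors the conventional TMN behaviour the paper criticises.

```python
# Illustrative sketch: in conventional TMN, when an endpoint of an
# established cross-connection changes type, the old MO is deleted and
# a replacement is instantiated. Identifiers are invented.

mib = {
    "xc-1": {"type": "point-to-point",
             "a_end": "ctp-sink-1", "z_end": "ctp-source-1"},
}

def retype_cross_connection(mib, xc_id, new_type, new_z_end):
    """Conventional path: delete the old cross-connection MO and
    instantiate a new one with the changed endpoint."""
    old = mib.pop(xc_id)                      # explicit deletion
    new_id = xc_id + "-r1"
    mib[new_id] = {"type": new_type,
                   "a_end": old["a_end"], "z_end": new_z_end}
    return new_id

# A CTP-sink-to-CTP-source connection becomes a connection to a new GTP.
new_id = retype_cross_connection(mib, "xc-1",
                                 "point-to-multipoint", "gtp-1")
print(new_id, mib[new_id]["type"])
```

Note that this only works because the replacement type was known in advance; if `point-to-multipoint` were a genuinely new class, the GDMO limitation above would block the second step.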
In theory, multicast schemes may seem straightforward, but in practice they are complex and inflexible, and they make internetworking of existing protocols with ATM difficult. All existing end systems must register information about a newly joined end system. The complexity of multicast operations can be greatly reduced, and the advantages of internetworking over ATM fully gained, by combining autoconfiguration with active network technology. Autoconfiguration features can be implemented as a middleware module, utilising the node-resident processing facilities. This will enable more agile control of multicast operation and will greatly facilitate the administration and operation of network nodes. (Also refer to [23] and [24] for an error control scheme in active networks.) If this scenario is viewed from within a TMN system environment, it appears that in order to complete the task of administration and operation, the managing and managed TMN systems should update themselves according to the changes that occurred in the network element and middleware functions. The administrative, operational and availability states of network nodes should be updated to ascertain the normal functioning and readiness of cross-connections.
In summary, management information related to networks in M.3100 is depicted in a tightly tiered structure. All the management information is organised in a tree-shaped structure, with the network layer information at the top and the network element layer information at the bottom. The MIB should be able to change its status and configuration as changes occur in an individual network element. The changes should also be reflected in the upper layers of the tree structure. In the current form of the TMN functional and information architectures, there is a disparity between the management activities that take place in the managed element and the status and configuration of the MIB. This disparity may not pose a problem from the network layer viewpoint, but problems of naming may surface at the network element layer.
Recommendation: The dynamic nature of active networks raises two questions about the information modelling used by the TMN information architecture. First, how will the naming scheme change as new types of resource are dynamically created? Second, how will consistency and completeness be maintained as new services are dynamically created at a resource? Research is needed to construct the names of new types of resource dynamically, to keep the status of the MIB and the managed network consistent, and to maintain completeness (i.e., to check whether the services offered by a particular resource are really available). Addressing these issues will make OA&M (Operation, Administration and Maintenance) more flexible and easily scalable for rapidly changing networks.
Dynamically Enhancing Functions of NEs and their Effect on the NLM Viewpoint. Active networks allow the operator to dynamically (module by module) build up the functions of remote network elements and to enhance their functionality. For example, switchlets can be used to download a specialised software module implementing an algorithm (e.g., a tree-search algorithm) onto a node, enhancing the function of an ordinary (non-active) repeater to that of a self-learning (active) bridge [25]. In an ELAN (Extended LAN), a self-learning bridge can change the logical interconnection between two LANs (or partition the ELAN), thus changing the logical topology of the ELAN.
As another example, the functionality of a router can be enhanced to look for a suitable path to route PDUs (Protocol Data Units) over the best available bandwidth when networks become congested [26]. This will change the configuration of the end-to-end connection that passes through the router. These and similar dynamic changes in a network element will affect the following aspect of the network layer information model.
Physical and Logical Information of the Network Object Class. The Network object class (section 3.1 of M.3100) represents the interconnected (logical and physical) telecommunications and management objects capable of exchanging information. These objects may be owned by a specific provider or associated with a specific network service. The network element is dynamically enhanced to provide a type of service that is different from what it provided during the configuration of the network (or the initialisation of the network layer MIB). In this situation, the Network, ConnectionR1 and TrailR1 object classes should be able to update themselves (e.g., alteration of the containment relationship in the MIB, modification of the transmission function) under the changed configuration of the network. However, TMN does not allow such dynamic changes because it has "traditionally been based on a static model, with a fixed location of function, a high degree of central intelligence, and a single protocol" [27].
Recommendation: A network architecture built on active network technologies will be able to change its composition and configuration according to user demand. In order to manage this situation, research is needed to find means of reconstructing the relationships among network elements at the network layer. Relationships among network elements should change automatically as the functionality of a network element that is interconnected to a network is upgraded (e.g., a repeater is enhanced to function like a bridge). Information on the topological interconnection and configuration of network elements should be able to reflect the dynamically changing network status. Object classes may also need to emit notifications as a result of a change in configuration; such new notifications would also have to be definable in the TMN information model.
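The behaviour the recommendation asks for, relationships and notifications updating automatically when an element's function is upgraded, might look like the following sketch. The object model and notification format are invented for illustration and are not taken from M.3100.

```python
# Hypothetical sketch: a network-layer model that updates an element's
# role on upgrade and emits a configuration-change notification.
# All names and the notification schema are illustrative.

notifications = []

class NetworkElement:
    def __init__(self, name, role):
        self.name, self.role = name, role

class NetworkModel:
    def __init__(self):
        self.elements = {}

    def add(self, ne):
        self.elements[ne.name] = ne

    def upgrade(self, name, new_role):
        """Record a dynamic functional upgrade and notify managers."""
        old_role = self.elements[name].role
        self.elements[name].role = new_role
        notifications.append(
            {"event": "configurationChange", "ne": name,
             "from": old_role, "to": new_role})

# The paper's example: a repeater enhanced into a self-learning bridge.
net = NetworkModel()
net.add(NetworkElement("ne-7", "repeater"))
net.upgrade("ne-7", "bridge")
print(notifications[-1]["to"])
```

A network-layer MIB built this way keeps its topology information consistent with the element's actual function, which is precisely what the static model is argued to lack.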
5 Conclusions
This paper has investigated the impact of managing active networking technologies on TMN. Combined with advances in distributed software technologies, which they can leverage, active network technologies will continue to evolve along with the other network technologies that will emerge, no doubt becoming cheaper, easier to use and more powerful. The paper has attempted to pose questions that may need to be answered in the face of such developments. It can be regarded as providing some suggestions about possible future technologies and their impact on TMN in order to stimulate thought about TMN and its evolution at a time of constant and far-reaching change in the telecommunications industry. The main conclusion of the paper is that certain management and communication aspects of TMN (its functional, physical and informational architectures) are "static" when used for the management of active networks, which provide more agile and dynamic functionality. We have considered only these "static TMN" and "dynamic active networking" aspects and made recommendations on how TMN could be evolved to overcome the problems associated with the paradigms of today. A new paradigm for management that can effectively manage active networking technologies and their evolution is required. In particular, the use of distributed object-oriented technologies, middleware, object-oriented distributed processing environments and also agent technologies needs to be considered for future TMN evolution. In other words, the consideration of how to manage future networking technologies, such as active networks, can act as a contributing trigger in the trend towards adopting open distributed environments for telecommunications management.
Acknowledgements This work has been carried out within EURESCOM Project P812 and the authors wish to thank their colleagues in this project for their constructive discussion of the ideas presented here.
References
1. J. Biswas et al., "The IEEE P1520 Standards Initiative for Programmable Network Interfaces", IEEE Communications, 36 (10), October 1998, pp. 64-70.
2. K.L. Calvert et al., "Directions in Active Networks", IEEE Communications, 36 (10), October 1998, pp. 72-78.
3. D.L. Tennenhouse, "A Survey of Active Network Research", IEEE Communications, 35 (1), January 1997, pp. 80-86.
4. Principles for a Telecommunications Management Network, ITU-T Recommendation M.3010, 1996.
5. TMN Interface Specification Methodology, ITU-T Recommendation M.3020, 1995.
6. Generic Network Information Model, ITU-T Recommendation M.3100, 1995.
7. TMN Management Services and Telecommunications Managed Areas: Overview, ITU-T Recommendation M.3200, 1997.
8. TMN Management Functions, ITU-T Recommendation M.3400, 1997.
9. D.S. Alexander, B. Braden, C.A. Gunter, A.W. Jackson, A.D. Keromytis, G.J. Minden, D. Wetherall, Active Network Encapsulation Protocol (ANEP), Active Networks Group Request for Comments, status: draft, July 1997.
10. B. Schwartz et al., "Smart Packets for Active Networks", January 1998. http://www.bbn.com
11. A.A. Lazar, K.S. Lim, and F. Marconcini, "Realizing a Foundation for Programmability of ATM Networks with the Binding Architecture", IEEE Journal on Selected Areas in Communications, 14 (7), September 1996, pp. 1214-1247.
12. C.M. Adam, A.A. Lazar, and M. Nandikesan, "QoS Extensions to GSMP", COMET Group, Department of Electrical Engineering and Center for Telecommunications Research, Columbia University, New York, USA, Technical Report 471-97-05, April 1997. http://comet.ctr.columbia.edu/xbind/qGSMP
13. Y. Yemini and S. da Silva, "Towards Programmable Networks", IFIP/IEEE International Workshop on Distributed Systems: Operations and Management (DSOM '96), L'Aquila, Italy, October 1996.
14. P. Chandra et al., "Darwin: Customizable Resource Management for Value-Added Network Services", Proceedings of the Sixth IEEE International Conference on Network Protocols (ICNP '98), Austin, October 1998.
15. E. Takahashi et al., "A Programming Interface for Network Resource Management", Proceedings of OPENARCH '99, New York, March 1999.
16. Dynamic Integrated Resource Management (DIRM), BBN Distributed Systems project funded by DARPA/ITO. http://www.dist-systems.bbn.com/projects/DIRM/
17. J. Biswas et al., Application Programming Interfaces for Networks, 1998. http://www.iss.nus.sg/IEEEPIN
18. S. Rooney et al., "The Tempest: A Framework for Safe, Resource-Assured, Programmable Networks", IEEE Communications, 36 (10), October 1998, pp. 42-53.
19. J.-F. Huard and A.A. Lazar, "A Programmable Transport Architecture with QoS Guarantees", IEEE Communications, 36 (10), October 1998, pp. 54-62.
20. M. Hicks et al., "PLANet: An Active Internetwork", Proceedings of IEEE INFOCOM '99, New York, 1999. http://www.cis.upenn.edu/~switchware/
21. D.J. Wetherall, J.V. Guttag, and D.L. Tennenhouse, "ANTS: A Toolkit for Building and Dynamically Deploying Network Protocols", Proceedings of IEEE OPENARCH '98, San Francisco, April 1998.
22. M. Hicks et al., "PLAN: A Packet Language for Active Networks", Proceedings of the International Conference on Functional Programming (ICFP '98). http://www.cis.upenn.edu/~switchware/
23. G. Parulkar et al., "An Error Control Scheme for Large-Scale Multicast Applications", Proceedings of IEEE INFOCOM '98, San Francisco, April 1998.
24. U. Legedza, D. Wetherall, and J. Guttag, "Improving the Performance of Distributed Applications Using Active Networks", Proceedings of IEEE INFOCOM '98, San Francisco, April 1998, pp. 590-599.
25. D.S. Alexander et al., "Active Bridging", SIGCOMM '97, Cannes, September 1997; Computer Communication Review, 27 (4), October 1997, pp. 101-111.
26. S. Bhattacharjee, K. Calvert, and E.W. Zegura, "An Architecture for Active Networking", High Performance Networking (HPN '97), White Plains, NY, April 1997. http://www.cc.gatech.edu/projects/canes/pubs.html
27. A. Manley and C. Thomas, "Evolution of TMN Network Object Models for Broadband Management", IEEE Communications, 35 (10), October 1997, pp. 60-65.
Survivability of Active Networking Services Amit Kulkarni, Gary Minden, Victor Frost, and Joseph Evans1 Department of Electrical Engineering and Computer Science University of Kansas, Lawrence, KS 66045 {kulkarn,gminden,frost,evans}@ittc.ukans.edu
Abstract. Active networking enables the rapid creation and deployment of innovative services in the network. This paper describes an architecture to ensure survivability of services in an active network through the dynamic reconfiguration of service components. We enhance the primary-backup protocol used in traditional distributed systems with active networking features that enable programmable selection of the service location and dynamic reconfiguration of the system if the primary service provider fails.
1 Introduction
One of the goals of active networking is to enable the development and deployment of new, secure and robust services and protocols in the network. The nodes of an active network are programmable, enabling creative, application-specific protocols to be installed in the network. SmartPackets carry code for the protocols to the active nodes, which provide a platform for their execution. Examples of application-specific protocols are audio bridging [1], sensor fusion applications [2], booster protocols [3] and services for improving application performance in wired/wireless networks, such as active filtering and active merging [4]. In traditional networks, where the primary network services, e.g. routing and addressing, are well known and fixed, a number of custom protocols and approaches [5,6] have been proposed and implemented to ensure survivability of these services during network outages. But an active network enables new services to be deployed in the network at very short notice. For example, in the MAGIC-II project [7], active networking is used to implement an active merging service that provides application-specific merging of client requests to reduce the bandwidth demand of terrain visualization applications like TerraVision [8] over wireless links. The active merging service is deployed into the network at application startup and deinstalled when the application terminates. Making services survivable in a dynamic environment like an active network using traditional techniques would require considerable effort on the part of the network administrator.
1 This research is partially funded by the Defense Advanced Research Projects Agency (DARPA) under contract F19628-95-C-0215.
Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 299-306, 1999. Springer-Verlag Berlin Heidelberg 1999
Survivability issues in active networks are being tackled by the Survivable Active Networks project [9] and the NESTOR project [10]. The Survivable Active Networks project focuses on techniques to prevent, detect, isolate and recover from various threats and malicious attacks on active network elements. NESTOR attempts to develop technologies for self-configuring and self-managing network systems that can withstand failure. In contrast, we focus on providing a survivable infrastructure for services deployed in the active network. Survivability is achieved in any system through the replication of components using either the state machine approach or the primary-backup approach [11]. The state machine approach replicates the service state at all servers and presents client requests to non-faulty servers. In the more popular primary-backup approach, one server is designated as the primary and all other servers are backups. Clients make requests to the primary server only. If the primary fails, one of the backups takes over, i.e., a failover occurs. These approaches require extensive manual coordination and intervention, e.g. determining the locations of the primary and backup servers, configuring setup files, and resetting the configuration during recovery from failure. In this paper, we modify the primary-backup protocol to demonstrate how a conventional protocol can be extended in an active networking environment with novel features such as automatic setup, programmable selection of the service location and dynamic reconfiguration after system failure. Automatic setup is achieved by allowing the primary to install its own backup server. The primary sends ferret packets that execute a service-specific algorithm at the active nodes they visit to determine whether each node can host the backup server. The primary chooses a backup from the set of hosts identified by the ferret packets.
This enables dynamic selection of the best available location for the backup at the current time instead of statically chosen locations. If the primary fails, the system dynamically reconfigures itself with the backup taking over as the new primary server and selecting its own backup server. If the backup fails, the primary automatically selects a new backup. This process can continue ad infinitum whenever there is failure of the primary or backup server.
2 Protocol Operation
In the description of the fault-tolerant protocol below, the server is an in-network proxy implementing a specific service. Clients are assumed to be unaware of the location of the service, and hence the protocol also implements a discovery phase. The assumption of location transparency is particularly relevant to active networking because services can be deployed dynamically in an active network. The protocol also makes the following assumptions:
1. The system is 1-fault tolerant, i.e. the probability of both the primary and the backup failing in some interval of interest is very small.
2. A crash implies that either the server fails or the node hosting the server fails.
3. Link failures in the network do not partition the network.
4. SmartPackets transmitted over a link are not lost, duplicated or corrupted.
5. SmartPackets sent over a link are received in the proper sequence in a finite time.
6. SmartPackets take a maximum round-trip time δ to traverse a distance of interest (n, measured in hops from the sending node) and back.
7. The underlying routing protocol guarantees that SmartPackets always use the shortest route to their destination.
Maintaining adequate link redundancy can satisfy assumption 3. Assumptions 4 and 5 are guaranteed by an underlying reliability protocol. In the following sections, we describe the different phases of the protocol.
2.1 Selection of Backup
In the setup phase, the network administrator injects the server as a SmartPacket into the network. The SmartPacket routes itself to a destination chosen by the administrator, where it starts executing as the primary server. The primary server searches for the location of its backup by flooding its immediate neighborhood with ferret packets that contain a service-specific algorithm to determine the suitability of a neighboring node to host the backup server. Ferret packets apply the service-specific algorithm at every node they visit to evaluate whether the node can host the backup server. Examples of the selection criteria are:
1. Distance from the primary, e.g. the node closest to the primary is chosen as backup.
2. Buffer space (memory) or processing power.
3. Special node functionality, such as its function as a gateway.
4. Number of adjoining active nodes (i.e. degree of connectivity).
[Fig. 1. Primary sends ferret packets to solicit bids: ferret packets flood the active nodes in the primary's n-hop neighborhood, and candidate nodes return bids to the primary.]
Each ferret packet contains a replica of the server code so that a fully functional server can be created at the backup location, once it is identified. Flooding is controlled by restricting ferret packets to visit at most n nodes, marking nodes as VISITED and by carrying lists of visited nodes. Candidate backup servers instantiated at neighboring nodes send in bids to the primary (see Fig. 1). The primary collects bids using two “small state” locations BidStatus and BidDrop, which are named caches at the active nodes accessible to SmartPackets belonging to the same service
302
Amit Kulkarni et al.
application. Bid packets deposit bids in BidDrop if BidStatus has the value OPEN, and return to the originating node to report a BIDCONFIRM status to the candidate backup servers. Bid packets arriving after the bidding closes return with a BIDCLOSED status. If the BidStatus location is not available, the Bid packet returns with a NOBIDLOCATION status, indicating primary failure. After a timeout interval δ, the primary makes its selection from the available bids and sends a Confirm packet to the selected backup location. Reject packets are sent to all other candidate backup servers, which terminate upon receipt of the packet.

If the primary fails during the selection process, candidate backup servers receive a NOBIDLOCATION status. Candidate backup servers then elect a primary from amongst themselves by implementing a bid-flooding protocol in which each backup server floods its bid message in its n-hop neighborhood. Each bid message deposits its bid at the location of a candidate backup server. After a time interval δ, each backup server checks the bids it has received. If its own bid is not the best bid, it terminates itself. Thus, after time δ, the backup server with the best bid becomes the primary. There can be multiple primaries in the system at this point, but they are separated by at least n hops and therefore serve different (though possibly overlapping) domains.
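The small-state bidding exchange can be modeled in a few lines. In the sketch below, the names BidStatus, BidDrop, BIDCONFIRM, and BIDCLOSED come from the text, but the class structure and bid representation are invented for illustration:

```python
# Toy model of bid collection through the two "small state" locations.

OPEN, CLOSED = "OPEN", "CLOSED"

class PrimaryNode:
    def __init__(self):
        self.bid_status = OPEN      # the BidStatus small-state location
        self.bid_drop = []          # the BidDrop small-state location

    def deposit_bid(self, bid):
        """Called by an arriving Bid packet; returns the status it carries home."""
        if self.bid_status == OPEN:
            self.bid_drop.append(bid)
            return "BIDCONFIRM"
        return "BIDCLOSED"          # late bid: candidate will self-terminate

    def close_and_select(self):
        """After the timeout δ, the primary closes bidding and picks the best bid."""
        self.bid_status = CLOSED
        return max(self.bid_drop) if self.bid_drop else None

primary = PrimaryNode()
early = primary.deposit_bid((12.5, "nodeA"))
primary.deposit_bid((21.3, "nodeB"))
backup = primary.close_and_select()
late = primary.deposit_bid((99.0, "nodeC"))   # arrives after bidding closed
```

The closed BidStatus ensures that late bidders learn they lost and terminate, so a strong late bid cannot disturb a selection already in progress.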
2.2 Normal Operation

Since clients do not know a priori the location of the primary and backup servers, the primary server advertises its presence using beacon packets broadcast periodically with an interval τ. Beacon packets store information about the location of the primary server, the distance in hops to the server, the location of the backup server, and the time when the information will expire, in the cache of the active nodes they visit. Beacon packets are restricted to visit at most n nodes.

A client request is a SmartPacket that routes itself through the active network seeking information about the primary server. When it reaches an active node that has valid information in its cache, it obtains the location of the primary server and routes itself to that location. If the beacon information has expired, the request packet assumes that the primary has failed and routes itself to the location of the backup.

The primary server also sends state update packets to the backup when new requests arrive and when pending requests are satisfied. The state update packets return to the primary with a SUCCESS status if they are able to update the backup's state. Beacon packets are programmed to check the status of the backup (which lies within n hops of the primary) and return with its status if they reach the backup's location. The backup uses the beacon packets as an indication that the primary is functional. The backup sends a PrimaryTest packet to the primary if it does not receive a packet from the primary for time τ. The PrimaryTest packet returns to the backup with a FAILED status if the primary has failed. The backup then takes over as the new primary. The time to recover from failure is thus not more than τ + δ. Similarly, if the primary receives a FAILED status from a packet it sent, it begins selection of a new backup in time not exceeding τ + δ. When failover occurs, the backup sends out ferret packets as in the setup phase to select its own backup.
It then starts sending beacon packets to the neighboring nodes in its n-hop neighborhood. The beacon packets overwrite cache information at the nodes, enabling future client requests to be routed to the new primary.
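The request packet's use of the beacon cache reduces to a small decision rule. Here is a hedged Python sketch; the cache layout, field names, and clock are illustrative assumptions:

```python
# How a client request might consult a node's beacon cache (illustrative).

def route_request(cache_entry, now):
    """Return the address the request should route to next, or None to keep seeking."""
    if cache_entry is None:
        return None                          # no beacon info here: keep searching
    if now <= cache_entry["expires"]:
        return cache_entry["primary"]        # valid info: route to the primary
    return cache_entry["backup"]             # expired: assume primary failed

entry = {"primary": "10.0.0.1", "backup": "10.0.0.2", "expires": 100}
```

Because failover overwrites these cache entries with the new primary's location, the same rule keeps routing requests correctly after a takeover.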
Survivability of Active Networking Services
303
2.3 Response Times

The failover time for this protocol is τ + δ, and the time taken by a primary to detect backup failure is τ + δ, as described in Section 2.2. The reconfiguration time for the system is the time taken by a primary to select and appoint a backup server. This is the sum of the time taken by the ferret packets to search for and locate suitable backup locations, and the time taken by the Confirm message to reach the selected backup. Since the maximum round-trip time over n hops is δ, the maximum reconfiguration time is δ + δ = 2δ. If the primary fails during the backup selection process, the existing candidate backups detect the failure after an interval δ when they send PrimaryTest packets, which take a maximum time δ to return with a FAILED status. The bid-flooding protocol takes a maximum time δ to determine a new primary, which gives a maximum total response time of δ + δ + δ = 3δ. The response times for the various scenarios are summarized in Table 1.

Table 1. Response times
Maximum time taken by backup to become primary after primary failure: τ + δ
Maximum time taken by primary to detect backup failure: τ + δ
Maximum time taken by primary to select and appoint backup: 2δ
Maximum time taken to elect new primary if primary fails during backup selection process: 3δ

3 Proofs of Service Properties
Since active networks permit deployment of user-supplied protocols and services, it is necessary to make some formal statements about their properties. In this section, we identify a few properties possessed by this protocol and attempt to prove their validity. The next step is to transform these properties into a formal description to enable mechanized checking, which is beyond the scope of this paper.

Property 1: The primary chooses exactly one backup after the selection process.
Proof: Bid packets from candidate backups arriving after the bidding closes return with a BIDCLOSED status, causing those backup candidates to self-terminate. The primary makes its selection from the available bids and sends a Confirm packet to only one candidate backup. All candidates receiving the Reject packet terminate.

Property 2: At most one server acts as the primary in its n-hop neighborhood.
Proof: This implies that two servers within n hops of each other cannot both be primary servers. During normal operation, the backup does not change its status until a PrimaryTest packet sent to the primary returns with a FAILED status. If the primary dies during the backup selection process, it is possible for two primaries to exist. However, the bid-flooding protocol ensures that only one primary exists in its n-hop neighborhood.

Property 3: The protocol is deadlock-free.
Proof: A deadlock occurs if the primary and the backup are both waiting for messages from the other. This cannot happen because packets sent by one to the other always
return with a valid status. The backup detects failure and takes over if there is a lack of beacon messages after time τ and a PrimaryTest message returns, after a time not more than δ, with a FAILED status. The primary detects failure and selects a new backup if the latest state update or beacon message returns with a FAILED status.

Property 4: A client request always attempts to locate an active primary server.
Proof: When a client request reaches an active node containing beacon information, it retrieves information about the location of the primary server and tests its validity. We consider the proof on a case-by-case basis:

Case I: The information is valid but the primary has failed since the last update. The request traversing towards the primary realizes the primary has failed either because there is no route to the primary, or because the cache set up to deposit the request at the primary location does not exist. The request then retrieves the location of the backup server from the beacon cache and redirects itself towards the backup.

Case II: The information is invalid and the primary has failed. If the information is invalid, the request retrieves the location of the backup server from the beacon cache and travels towards the backup.

Case III: The request reaches an active node that lies in the overlap of the neighborhoods of two primary servers.
Fig. 3. Overlapping Domains
This can occur if the primary fails before selecting a backup and there are two new primaries that are more than n hops apart but share some active nodes (see Fig. 3). Assuming that primary A deposits its beacon information at node 1 and keeps it valid, node 1 will be part of primary A's neighborhood. Similarly, node 2 can be part of primary B's neighborhood. The request flip-flops between the two servers as it travels from node 1 to node 2, until the routing takes it out of the region and it eventually makes progress towards only one of the two servers.

Another scenario is when the routing is such that the next hop from node 1 towards primary A is node 2, and vice versa. The request would then bounce between the two nodes forever. This is impossible because if one of the nodes, say node 1, is closer to primary A than to primary B, then it will have been captured by primary A. If node 2 is on the route from node 1 to primary A, then node 2 is closer to primary A than to primary B, and yet primary B captures node 2. This contradicts assumption 7, which guarantees that only shortest routes are followed.
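In the spirit of the mechanized checking proposed above, a property like Property 1 can be sanity-checked by enumerating event interleavings in a toy model. The model below is entirely illustrative (the scheduler, bid values, and `run` function are assumptions, not the paper's formal specification):

```python
# Exhaustively check, over all interleavings of bid arrivals and the bidding
# close event, that the primary confirms at most one backup (Property 1).

from itertools import permutations

def run(schedule, bids):
    status, drop, confirms = "OPEN", [], []
    for event in schedule:
        if event == "close":
            status = "CLOSED"
            if drop:
                confirms.append(max(drop))    # Confirm sent to the best bid only
        else:                                 # a bid arrival
            if status == "OPEN":
                drop.append(bids[event])
            # else: bid returns BIDCLOSED and the candidate self-terminates
    return confirms

bids = {0: 3.0, 1: 7.0, 2: 5.0}
violations = sum(1 for s in permutations([0, 1, 2, "close"])
                 if len(run(s, bids)) > 1)
```

A real verification would of course work over a formal model of the packet semantics rather than this toy scheduler, but the structure is the same: enumerate behaviors, assert the invariant.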
4 Summary
In this paper, we presented an architecture for ensuring survivability of active networking services that reside inside the network. Our goal was to demonstrate that traditional survivability protocols such as the primary-backup protocol can be extended in an active network with features such as automatic setup of services, programmable selection of the backup location, and support for dynamic reconfiguration in the event of system failure. We modified the protocol to support these features through the use of code replication and distribution, and through the use of ferret packets. Additionally, we proved some interesting properties of the service to serve as a basis for verifying the algorithm in a formal specification model. Correctness of the specification can then be checked mechanically to ensure the correctness of the services deployed in active networks.
5 Appendix: Program Pseudo-Code

proc ProxyServer
  (S: State, IsPrmy: boolean);
  location: Address;
  backupServ: Address;
  BidStatus: SmallState;
  BidDrop: SmallState;
  routeTo(location);

CodeForBackup:
  if (IsPrmy == false) {
    prepare Bid;
    send(Bid, SourceAddress);
    waitfor(Bid);
    if (status == BIDCONFIRM) {
      try {
        waitfor(Confirm or Reject);
        if (Reject) exit;
      } catch (TimeoutException) {
        // send PrimaryTest packet to
        // check if primary is alive
        send PrimaryTest packet;
        waitfor(PrimaryTest);
        if (status == FAILED) {
          for all interfaces {
            send(BidPkt, *); }
          wait(TimeOutInterval);
          Winner = analyzeBids();
          if (Winner != self) exit;
          else IsPrmy = true; } }
    } elseif (status == BIDCLOSED) {
      exit;
    } elseif (status == NOBIDLOCATION) {
      // elect new primary
      for all interfaces {
        send(BidPkt, *); }
      wait(TimeOutInterval);
      Winner = analyzeBids();
      if (Winner != self) exit;
      else IsPrmy = true; }
  }
  while (IsPrmy == false) {
    try {
      waitfor(stateUpdateMsg or beacon);
    } catch (TimeoutException) {
      send PrimaryTest packet;
      if (status == FAILED) {
        IsPrmy = true; break; }
    } }

CodeForPrimary:
  send(ferretPkt, *);
  while (BidStatus == OPEN) {
    // backup has not responded;
    for all interfaces {
      send(ferretPkt, *); }
    waitfor(TimeOutInterval);
    if (BidDrop != null) {
      // there are available bids
      BidStatus = FALSE;
      // select best bid
      backupServ = analyzeBids();
      send Reject packets;
      send(confirmPkt, backupServ); }
  }
  while (true) {
    send(beaconPkt, *);
    if request arrives {
      S' = S + reqState;
      send stateUpdateMsg;
      process request; }
    waitfor(Msg);
    if (status == FAILED) {
      identify new backup; }
  }
endproc ProxyServer
A Secure Plan Michael Hicks and Angelos D. Keromytis Distributed Systems Lab CIS Department, University of Pennsylvania 200 S. 33rd Str., Philadelphia, PA 19104, USA {mwh,angelos}@dsl.cis.upenn.edu
Abstract. Active Networks promise greater flexibility than current networks, but threaten safety and security by virtue of their programmability. In this paper, we describe the design and implementation of a security architecture for the active network PLANet [HMA+99]. Security is obtained with a two-level architecture that combines a functionally restricted packet language, PLAN [HKM+98], with an environment of general-purpose service routines governed by trust management [BFL96]. In particular, we employ a technique which expands or contracts a packet’s service environment based on its level of privilege, termed namespace-based security. As an application of our security architecture, we outline the design and implementation of an active-network firewall. We find that the addition of the firewall imposes an approximately 34% latency overhead and as little as a 6.7% space overhead to incoming packets.
1 Introduction
Active Networks offer the ability to program the network on a per-router, per-user, or even per-packet basis. Unfortunately, this added programmability compromises the security of the system by allowing a wider range of potential attacks. Any feasible Active Network architecture therefore requires strong security guarantees. We would like these guarantees to come at the lowest possible price in the flexibility, performance, and usability of the system. This paper presents the design and implementation of a security architecture for PLANet [HMA+99], an active internetwork based on PLAN, the Packet Language for Active Networks [HKM+98]. Our approach is to partition the problem into two levels: language-based security for PLAN programs, complemented by namespace-based security for more general router services, governed by trust management. We briefly discuss PLAN and its role in this architecture, but focus more attention on service security. We present both architecture and implementation, and conclude with some applications of our approach, including a simple firewall that 'filters' active packets. [HK99], an extended version of this paper, contains more detailed motivation and performance analysis.
This work was supported by DARPA under Contract #N66001-96-C-852, with additional support from the Intel Corporation.
Stefan Covaci (Ed.): IWAN’99, LNCS 1653, pp. 307–314, 1999. c Springer-Verlag Berlin Heidelberg 1999
Fig. 1. PLANet’s security architecture.
2 Architecture
Our security architecture is illustrated in Figure 1. The solid boxes define the two levels of the architecture: the contents of the central box define the PLAN level, which is usable without need of credentials, while the remaining area forms the service level. This architecture falls along functional boundaries: all PLAN programs, by their nature, are safe (as defined below) and so may run unauthenticated, while, in general, service routines are unsafe and must be partitioned by level of trust, visualized by the dotted boxes. We augment the PLAN level with a fixed set of 'core services' which are known to be functionally safe. This architecture is designed to guard against the standard threats to computational resources and their contents [AAKS99]. In particular, we defend against attacks that would deny service, seek to obtain unauthorized content, and misrepresent (spoof) identity. We explain PLAN's role in defending against these attacks below.

2.1 PLAN
PLAN [HKM+98] is a small functional language with syntax similar to ML [Ler,MTH90]. To express remote computation, it includes a primitive OnRemote (among others) that evaluates an expression at a remote node. Invoking OnRemote results in a newly spawned packet. By design, the language has properties that prevent some attacks. PLAN is resource- and expression-limited, thus preventing CPU and memory denial-of-service attacks. For example, all PLAN programs are guaranteed to terminate (provided the services they call also terminate), since PLAN does not provide a means to express non-fixed-length iteration or recursion. Additionally, PLAN programs are isolated from one another, since there is no means of direct communication among them, and because the language's strong typing and garbage collection prevent indirect means, such as pointer swizzling or buffer overflows. Finally, a network resource bound counter, similar to IP's "Time to Live" (TTL) field, is used to bound network resources.
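The effect of the resource bound counter can be illustrated with a small model. This is not PLAN syntax; the function name and budget accounting below are assumptions made for the sketch:

```python
# Illustrative model of PLAN's network resource bound (analogous to IP TTL):
# each remote evaluation spends part of a finite budget, so a packet's total
# network usage is bounded no matter what its program does.

class PacketDies(Exception):
    """Raised when a packet's resource bound is exhausted."""

def on_remote(resource_bound, hops):
    """Spend `hops` units of the bound; kill the packet if the budget runs out."""
    if resource_bound < hops:
        raise PacketDies("resource bound exhausted")
    return resource_bound - hops

rb = 10
rb = on_remote(rb, 3)    # first remote evaluation
rb = on_remote(rb, 4)    # second
try:
    on_remote(rb, 5)     # would exceed the remaining budget
    exceeded = False
except PacketDies:
    exceeded = True
```

Because any child packet spawned by OnRemote inherits only part of its parent's remaining budget, the total traffic generated by one injected packet is finite.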
3 Service Security via Trust Management
Because of their general-purpose nature, service routines may perform actions which, if exploited, could be used to mount an attack. A radical approach to this problem would be to prevent any service routine from being installed that could potentially harm the node. However, this would preclude the addition of service routines—for example, network management operations—that should be available to trusted users. We thus employ security mechanisms which allow authorized programs to access potentially unsafe service routines.

3.1 Trust Management
In determining the form of these security mechanisms, we arrived at some basic requirements. First, the mechanisms should be simple to understand and employ. Second, security policies should be modifiable as needed, while the system is operating. Furthermore, policy mechanisms should be flexible enough to anticipate future application needs. Finally, security mechanisms must scale to support increasing numbers of principals and their trust relations. To meet these requirements, our service security relies on trust management [BFL96,BFIK99]. Trust management assigns some level of privilege (or trust) to a user, or principal, of the system. In particular, if a running PLAN program wishes to invoke a privileged service routine or alter a service parameter, the principal associated with the packet must be authenticated, and then the operation must be authorized. If either step fails, the operation is denied. We consider the question of policy and mechanism for authorization below; details about our particular implementation of authentication and authorization are presented in the next section.

3.2 Policy and Mechanism
Before applying trust management, we must consider what sorts of policies we would like to express, and what particular mechanisms we shall use to enforce these policies. For our system, we want our policies to express what services, above the core services, are available to certain users. We also find it convenient to indicate which services should be unavailable to a particular user; this will be motivated in Section 5. For purposes of simplicity and scalability, we choose to map sets of principals to sets of services. We also need to manage delegation policies with regard to these mappings. For example, we might specify that the services in set s may be accessed not only by principal p, but also by those principals authorized by p. In keeping with our requirements, this policy should scale to include many nodes, principals, and services, and be alterable on the fly.
Furthermore, we want to specify not only whether a service routine may be invoked, but how it may be used. For example, a resident state service which allows packets to leave state on the routers might apportion different amounts of space to different users. We should also be able to specify general resource usage parameters, such as CPU and memory use.

To enforce security policy we require strong principal authentication, and use a policy manager on every node; more details are given in the next section. In our system, packets must authenticate themselves at some point before accessing privileged services; at this time, the appropriate services are added to (or subtracted from) the packet's current service symbol table. We call this approach namespace-based security. Since PLAN is strongly typed and looks up services on an as-needed basis, programs are incapable of invoking code outside of this updated table. Additionally, we allow those services which may require policy-based parameterization to query the policy manager as necessary during their execution. For example, the resident state service mentioned above would query the local policy to determine how much memory the current principal was allowed to occupy.

We feel there are some compelling advantages to this approach. First, namespace-based policies are simple to formulate and easy to change. Second, because namespace-based security is centrally administered, individual service routines may be written without concern for security, and policies may change dynamically without worry of inconsistency. Furthermore, unauthenticated programs may access the core services without additional performance penalty. Finally, because namespace-based security is not by itself sufficient, we allow services to formulate their own usage policies.

There is still some work to be done in our current system. Namespace-based security only applies to PLAN service routine calls, not calls between service routines.
This is slightly more difficult, but entirely possible, since Caml, our service implementation language, provides a mechanism which may be used to implement namespace-based security: module thinning. The use of module thinning has been explored for active networks in [Ale98] and for mobile agent systems in [LOW98]. Also, while we have experimented with mechanisms for enforcing resource usage, we have yet to arrive at ones that are sufficiently lightweight. Relevant details may be found in [Hic98].
4 Implementation

4.1 Authentication
Before a PLAN program may invoke a trusted service, its associated principal must be determined; this is the process of authentication. Authentication is typically done in a public-key setting by verifying a digital signature in the context of some communication (e.g., a packet). In PLAN, one obvious link between communication and authentication is the chunk. A chunk (or code hunk) may be thought of as a function that is waiting to be applied. In PLAN, chunks are first-class—they may be manipulated as
data—and consist internally of some PLAN code, a function name, and a list of values to be used as arguments during the application. A chunk is typically used as an argument to OnRemote to specify some code to evaluate remotely. A chunk may also be evaluated locally by passing it to the eval service, which resolves the function name with the current environment, performs the application, and returns the result.

We have added a service called authEval which takes as arguments a chunk, a digital signature, and a public key. authEval verifies the signature against the binary representation of the chunk and, if the verification succeeds, the chunk is evaluated. There are two key advantages to this approach. First, a principal signs exactly the piece of code he wants to execute, and may only have extra privilege while executing that piece of code. Second, only those programs which require authorization incur the extra time and space overheads. However, there is no protection against replay attacks, and public-key operations are notoriously slow. Furthermore, authentication is only unidirectional (principal to node), thus providing less confidence to the caller. We mitigate these problems by using a variant of the mutual authentication protocol described in [AAKS98].

4.2 Authorization
As our policy manager, we have chosen to use the Query Certificate Manager (QCM) [GJ98], which provides comprehensive security credential location and retrieval services, employing a distributed ACL. While in this paper we make use of QCM, our architecture is designed so that other policy managers can be used instead. In particular, we are also experimenting with the KeyNote [BFIK99] trust-management system.

QCM is used to specify the services to be added to or subtracted from the default service environment by associating certain thicken and thin sets of services with a principal or set of principals. Once a principal has been authenticated, these sets are used to modify the default environment. The resulting service environment is then used during subsequent chunk evaluation. As an optimization, we can cache this environment for future reference, thus avoiding repeated invocations of QCM and reconstructions of the environment.

A key advantage of using QCM is that it can be used for more than just specifying sets of principals on a per-node basis. In particular, sets described in a distributed manner impose no additional query complexity. For example, a node A may define a set which partially resides at another node B:

    l = { p1, p2, ..., pn } union B$m

If the authorization service on A makes a membership test on set l, QCM will automatically query B if necessary. QCM may also make use of certificates, which are signed assertions about set relationships, to short-circuit remote queries. These may be passed as additional arguments to authEval, or may be obtained during node-node authentication. This allows QCM to implement both push- and pull-based information retrieval.
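The thicken/thin mechanism amounts to simple set arithmetic on the service symbol table. The sketch below is a minimal Python model under stated assumptions: the service names and the per-principal policy entries are invented for illustration, while the idea of a default environment modified by thicken and thin sets follows the text.

```python
# Namespace-based security as set arithmetic: after authentication, a packet's
# service environment is (DEFAULT | thicken) - thin for its principal.

CORE = {"eval", "getSrc", "thisHost"}          # hypothetical core service names
DEFAULT = CORE | {"resident_state"}

POLICY = {                                      # hypothetical per-principal policy
    "admin": {"thicken": {"reboot_node"}, "thin": set()},
    "guest": {"thicken": set(), "thin": {"resident_state"}},
}

def service_env(principal):
    """Build the service symbol table used for this principal's chunk evaluation."""
    p = POLICY.get(principal, {"thicken": set(), "thin": set()})
    return (DEFAULT | p["thicken"]) - p["thin"]

def invoke(principal, service):
    """Service lookup fails outside the updated table, so the call cannot happen."""
    if service not in service_env(principal):
        raise PermissionError(service)
    return f"ran {service}"
```

An unknown principal simply gets the default environment, which is why unauthenticated packets pay no extra cost.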
5 A Simple Active Firewall
As an application of our architecture, we implemented a simple active firewall. Typically, firewalls filter certain types of packets, such as all TCP connection requests on certain port numbers. Usually such packets are easily identified by their protocol headers. However, in PLANet, and indeed in any active-packet system, there is no quick way to assess a packet's functionality. Our approach is that rather than filter packets at the firewall, we associate with them a thinned service environment in which any potentially harmful services are removed. The packets may then be evaluated inside the trusted network using only those services.

While this may seem to contradict our premise stated in Section 2 that the default environment should consist only of 'safe' services, in the context of a trusted Intranet we would expect the default privilege allowed to local packets to exceed that of foreign packets. Furthermore, we would not want to impose the overhead of authentication and authorization on local packets in the general case.

To thin the environment of foreign packets, our firewall associates them with a guest identity that has the appropriate policy. To do this, the firewall encapsulates each packet with a small wrapper which calls authEval with the original chunk, using the guest identity. In general, this would require the firewall to sign all incoming packets. However, because the guest environment provides less privilege than the default environment, we should conceivably be able to avoid the cryptographic cost: any authenticating principal whose environment is thinned and not thickened can be 'taken at his word.'

In the base PLANet implementation, a two-hop ping takes 2.13 ms for a minimally sized packet (80 bytes) and 3.06 ms for a maximally sized one (1500 bytes). Changing the middle node to the 'signing firewall' adds 37% and 32% to the round-trip times, respectively, raising them to 2.91 and 4.03 ms.
Between 1/3 and 1/2 of this overhead is attributable to signing and verification, depending on the packet size. For the firewall, the remaining overhead is due to encapsulation costs (which require extra marshalling and copying), while for the end-host it is due to decapsulation and additional interpretation costs. Parallelism and special-purpose hardware can further reduce cryptographic costs and improve latency and throughput. If we eliminate the cryptographic operations, we reduce the end-to-end ping times to 2.55 and 3.41 ms for minimal and maximal payload, respectively. This reduces the firewall-induced overhead to 20% and 11%. A smarter PLAN interpreter would also considerably improve overall performance.

The firewall also imposes a fixed 101-byte space overhead due to the extra code and signature that is attached to incoming packets. This translates to 126% and 6.8% space overhead for the minimal and maximal payload packets, respectively. One way of mitigating this overhead is for PLAN to support code caching and language-level remote references. Since all PLAN values are immutable, the contents of a remote reference may be safely cached without the need for a coherence protocol.
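The quoted percentages can be cross-checked with simple arithmetic on the measurements given in the text (the space overhead on a full 1500-byte packet comes out near the 6.7% figure quoted in the abstract):

```python
# Replaying the reported firewall overhead figures (all inputs from the text).

base_min, base_max = 2.13, 3.06        # ms, two-hop ping, base PLANet
fw_min, fw_max = 2.91, 4.03            # ms, with the signing firewall

latency_min = (fw_min - base_min) / base_min * 100   # ~37% for minimal packets
latency_max = (fw_max - base_max) / base_max * 100   # ~32% for maximal packets

space = 101                             # bytes of extra code and signature
space_min = space / 80 * 100            # overhead on a minimal (80-byte) packet
space_max = space / 1500 * 100          # overhead on a maximal (1500-byte) packet
```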
6 Related Work
Research in the area of security for active networks is in its early stages. The SANE [AAKS98] architecture is part of the SwitchWare Project [AAH+98] at the University of Pennsylvania. SANE is currently used in conjunction with the ALIEN architecture [Ale98]. Security is achieved in ALIEN through a combination of module thinning and type safety. Similar approaches have been taken in [LR99,BSP+95,vE99]. Other language-based protection schemes can be found in [BSP+95,CLFL94,HCC98,LOW98,Moo98]. The main difference between this work and SANE is that we can depend on a provably safe language (PLAN) for those packets that do not require special privileges. Furthermore, programming constructs available in PLAN (e.g., chunks) considerably ease the task of implementing security abstractions. A working group within the Active Networks project has been defining a common security meta-architecture [Mur98]. However, this architecture has not yet become concrete enough for implementation. Secure PLAN is currently being extended to support validation and verification [NL96,Nec97] for active extensions.

We have demonstrated that our architecture addresses possible threats while still preserving the flexibility and usability of the system. This architecture is based on language safety, authentication, and trust management. We discussed the practicality and acceptable performance of our approach experimentally, in the context of an active firewall.
References AAH+98. D. S. Alexander, W. A. Arbaugh, M. Hicks, P. Kakkar, A. D. Keromytis, J. T. Moore, C. A. Gunter, S. M. Nettles, and J. M. Smith. The SwitchWare Active Network Architecture. IEEE Network Magazine, special issue on Active and Programmable Networks, 12(3):29–36, 1998. 313 AAKS98. D. S. Alexander, W. A. Arbaugh, A. D. Keromytis, and J. M. Smith. A Secure Active Network Environment Architecture: Realization in SwitchWare. IEEE Network Magazine, special issue on Active and Programmable Networks, 12(3):37– 45, 1998. 311, 313 AAKS99. D. S. Alexander, W. A. Arbaugh, A. D. Keromytis, and J. M. Smith. Security in Active Networks. In Secure Internet Programming [VJ99]. 308 Ale98. D. S. Alexander. ALIEN: A Generalized Computing Model of Active Networks. PhD thesis, University of Pennsylvania, September 1998. 310, 313 BFIK99. M. Blaze, J. Feigenbaum, J. Ioannidis, and A. Keromytis. The Role of Trust Management in Distributed Systems Security. In Secure Internet Programming [VJ99]. 309, 311 BFL96. M. Blaze, J. Feigenbaum, and J. Lacy. Decentralized Trust Management. In Proceedings of the 17th Symposium on Security and Privacy, pages 164–173. IEEE Computer Society Press, Los Alamitos, 1996. 307, 309 BSP+95. B. Bershad, S. Savage, P. Pardyak, E. G. Sirer, M. Fiuczynski, D. Becker, S. Eggers, and C. Chambers. Extensibility, Safety and Performance in the SPIN Operating System. In Proceedings of 15th Symposium on Operating Systems Principles, pages 267–284, December 1995. 313
Michael Hicks and Angelos D. Keromytis
CLFL94. J. S. Chase, H. M. Levy, M. J. Feeley, and E. D. Lazowska. Sharing and Protection in a Single-Address-Space Operating System. ACM Transactions on Computer Systems, November 1994.
GJ98. Carl A. Gunter and Trevor Jim. Policy-Directed Certificate Retrieval. http://www.cis.upenn.edu/~{}qcm, 1998.
HCC98. C. Hawblitzel, C. Chang, and G. Czajkowski. Implementing Multiple Protection Domains in Java. In Proceedings of the 1998 USENIX Annual Technical Conference, pages 259–270, June 1998.
Hic98. Michael Hicks. PLAN System Security. Technical Report MS-CIS-98-25, Department of Computer and Information Science, University of Pennsylvania, April 1998.
HK99. Michael Hicks and Angelos D. Keromytis. A Secure PLAN. Technical Report MS-CIS-99-14, Department of Computer and Information Science, University of Pennsylvania, May 1999.
HKM+98. Michael Hicks, Pankaj Kakkar, Jonathan T. Moore, Carl A. Gunter, and Scott Nettles. PLAN: A Packet Language for Active Networks. In Proceedings of the Third ACM SIGPLAN International Conference on Functional Programming Languages, pages 86–93. ACM, 1998.
HMA+99. Michael Hicks, Jonathan T. Moore, D. Scott Alexander, Carl A. Gunter, and Scott Nettles. PLANet: An Active Internetwork. In Proceedings of the Eighteenth IEEE Computer and Communication Society INFOCOM Conference, pages 1124–1133. IEEE, 1999.
Ler. Xavier Leroy. The Caml Special Light System (Release 1.10). http://pauillac.inria.fr/ocaml.
LOW98. J. Y. Levy, J. K. Ousterhout, and B. B. Welch. The Safe-Tcl Security Model. In Proceedings of the 1998 USENIX Annual Technical Conference, pages 271–282, June 1998.
LR99. X. Leroy and F. Rouaix. Security properties of typed applets. In Secure Internet Programming [VJ99].
Moo98. J. Moore. Mobile Code Security Techniques. Technical Report MS-CIS-98-28, University of Pennsylvania, May 1998.
MTH90. Robin Milner, Mads Tofte, and Robert Harper. The Definition of Standard ML. The MIT Press, 1990.
Mur98. Security Architecture for Active Nets, June 1998. Draft available at http://www.ittc.ukans.edu/~{}ansecure/0079.html.
Nec97. George C. Necula. Proof-Carrying Code. In Proceedings of the 24th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 106–119. ACM Press, New York, January 1997.
NL96. George C. Necula and Peter Lee. Safe Kernel Extensions Without Run-Time Checking. In Second Symposium on Operating System Design and Implementation, pages 229–243. Usenix, Seattle, 1996.
vE99. T. von Eicken. J-Kernel: a capability based operating system for Java. In Secure Internet Programming [VJ99].
VJ99. Jan Vitek and Christian Jensen. Secure Internet Programming: Security Issues for Mobile and Distributed Objects. Lecture Notes in Computer Science. Springer-Verlag Inc., New York, NY, USA, 1999.
Control on Demand

Gísli Hjálmtýsson¹ and Samrat Bhattacharjee²

¹ AT&T Labs – Research, 180 Park Avenue, Florham Park, NJ 07932
² College of Computing, Georgia Institute of Technology, Atlanta, Georgia 30332
Abstract. Control on demand is a paradigm for network programmability at the network transport level. Previous work on active and programmable networking at this level either achieves flexibility by inserting significant software in the critical forwarding path, or achieves efficiency by sacrificing functionality, relegating programmability to control plane connection management. In contrast, control-on-demand takes the middle ground, acting both in the control plane and in the data plane, yet without adding software in the critical forwarding path. Rather than applying essential programs to every datagram, our approach is to apply the installed programs asynchronously from data forwarding. This way we avoid essential processing in the critical forwarding path, applying the (user) installed service logic for service enhancement only. By retaining the current forwarding model, control-on-demand is consistent with current trends in router architectures with increasingly optimized and hardware-enhanced forwarding engines. Applying the service logic asynchronously barely impacts router performance and robustness, making control-on-demand viable in practice in the near future. The main contributions of this paper are the control-on-demand paradigm and the interface between application (service/user) programs and the forwarding engine. User programs execute in an execution environment, and use this interface to program the facilities of the forwarding engine and to access the data-path. We describe our prototype control-on-demand IPv6 router, and discuss abstractions and mechanisms we have developed to support control-on-demand, most notably featherweight flows. We discuss two applications we have experimented with to demonstrate the potential of asynchronous enhancement controls.
1 Introduction
The explosive growth in networking and computing has increased the need to introduce increasingly complex network services at an accelerated rate. Whereas traditional networks were designed and customized for a single network service model, the rapid
Work done while at AT&T Labs – Research, Florham Park, NJ.
Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 315-331, 1999. Springer-Verlag Berlin Heidelberg 1999
change in infrastructure technologies calls for a more adaptable service model optimized for flexibility. However, the high quality of the telephony network is partially due to customization at all levels of the network. Active and programmable networking enables service specific network customization when (and where) needed, while retaining flexibility. The essential characteristic of programmable networks is that the network interface is widened to allow service (application) semantics to be provided to the network. Current networks offer only preinstalled service semantics for a single service model, accepting only data across the network interface. The interface of an active and programmable network accepts both data and the logic to interpret that data, thereby enabling service specific treatment of the data inside the network, effectively allowing the service model to be customized for each application. The time scale and granularity appropriate for introducing new service logic remain a research topic. In particular, practicality may dictate conservative change policies to enhance network stability and manageability. Similarly, the usefulness of very specialized service logic, applicable only to a small set of flows, may not sufficiently justify the added network complexity and operational risk. Moreover, the value of protocol updates on network elements is questionable, as the lifetime of network hardware has become a fraction of the lifetime of protocols. Even more important is the time scale that the installed programs operate on. Current research efforts on active and programmable networking range from coarse time-scale service management [1] and software updates of network elements, to connectivity management on the time-scale of private network provisioning or connection duration [2,3,4], to active involvement in the correct forwarding of every datagram [5,6,7].
Whereas the in-data-path approaches that insert essential software in the critical forwarding path threaten robustness and performance, pure control-plane programs operating only at provisioning and call setup time-scales offer in comparison very restricted functionality because of their inability to act in the data-path. Control-on-demand operates on a time-scale finer than call control, but coarser than per packet time scales. The installed service specific programs act on flows, rather than individual packets, and control the router facilities to adaptively optimize their use to maximize flow utility. To this end the control programs exploit local information, forwarding statistics and service semantics. In addition the control programs can opportunistically peek into the data-path, either by subscribing to parts of each data packet, or by peeking at parts of the packets in the flow's queue at any given time. This way the programmable model of control-on-demand is significantly richer than that of strict control plane programming, without reducing router performance or robustness. In this paper we describe control-on-demand and how its service model provides sufficient richness to solve interesting problems previously solved only by acting fully in the data path. Yet it is sufficiently restricted and efficient to be viable for practical use in the near term. We give two examples of applications showing the potential of asynchronous enhancement control: the first, selective discard as congestion adaptation of a video stream; the second, adaptive smoothing of media streams. The first example illustrates how asynchronous application of the control programs achieves
the benefits of in-data-path processing. The second shows how judicious separation of the smoothing work into two time-scales allows the performance critical work to be delegated to the forwarding engine, while executing the (service specific) smoothing policy asynchronously from, and at a larger time scale than, data forwarding. The rest of the paper is organized as follows. In Section 2 we motivate the service model of control-on-demand and contrast it to other active and programmable networking service models. Section 3 discusses related work. We present the nodal architecture in Section 4, and discuss our IPv6 prototype router in Section 5. In Section 6 we then discuss the architecture, and particularly the programmable interface, in more detail. Section 7 addresses security. In Section 8 we discuss three supporting mechanisms that we have developed as part of this work. In Section 9 we discuss applications of control-on-demand. We then conclude.
2 The Programmable Model
The programmable model of control on demand is motivated by the goal of enriching the network service model while exploiting fast-path (hardware) optimizations. In particular our goal is to avoid perturbing the critical forwarding path, to ensure that those (current) services for which current network models are satisfactory remain unaffected. In contrast to approaches where each node is either programmable or not [6], or where services are either active or passive [5], control-on-demand allows services to exhibit degrees of activity, ranging from needing only basic forwarding to fully acting in the data path. This way a service can balance the "activity" cost against the potential improvement in service utility. In particular some potentially important applications of programmable networking, including advanced group management in VPNs and floor control in teleconferencing, require only control plane programmability. Clearly existing store-execute-and-forward models can support these types of applications. In so doing, however, they send all datagrams of such flows through slow-path processing. In contrast control-on-demand supports such control plane programmability without any impact on forwarding performance. Similarly, the enhancement semantics of control-on-demand remain valid across the network hierarchy. In larger networks the ratio of processing to bandwidth differs across the topology. Currently backbone routers have ample bandwidth but limited per packet processing resources, whereas closer to the network edge this ratio is reversed. Using control-on-demand a service specific control policy may act aggressively in the data-path and the control plane close to the edge, but act only in the control plane on backbone routers. Our approach is consistent with current trends in forwarding technologies, where switches and increasingly high performance routers implement forwarding in hardware.
Adding additional software in the critical forwarding path goes against this trend. In particular, essential software in the forwarding path has a significant negative impact on forwarding performance, and threatens robustness and interoperability. Beyond
forwarding facilities, important hardware facilities including multiple queues and advanced scheduling support are increasingly available in network nodes. To ever achieve viable performance, the model must support programmability while allowing services to exploit these hardware facilities. The programmable model of control-on-demand is inherently efficient. Most other work on active networking changes the existing store-and-forward model to store, execute and then forward. In contrast control-on-demand leaves the store-and-forward model unchanged. The dynamically installed service specific controllers execute asynchronously from data forwarding. In particular, a data packet may be forwarded before the controller gets a chance to run and peek at the packet. This separation is similar to the separation of control plane and data plane in connection oriented networks, and in line with other work on open signaling and programmable ATM control [2,3,4,8]. Control-on-demand can be viewed as an evolution and generalization of the control plane. Control-on-demand however goes further than prior work on control plane technologies by allowing the service specific controllers to act (albeit asynchronously) on the data being forwarded. The separation into two threads of execution is ideally suited to exploit hardware facilities. For example in stream smoothing, there are two natural time-scales: a fine scale for scheduling individual packets to a target rate, and a coarser one for setting the target. Control-on-demand can exploit these different time-scales by having the installed program logic set the target, delegating the fine scale scheduling to the underlying (hardware) scheduler. Whereas there is potential value in having the installed program act on each and every datagram, many applications proposed for active networking lend themselves nicely to such separation.
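To make the two time-scale separation concrete, here is a minimal Python sketch (all names are hypothetical, not the prototype's interface): a coarse-grained controller only resets a target rate, while a fine-grained scheduler (standing in for the hardware pacing engine) releases packets against that rate.

```python
import collections

class Scheduler:
    """Stand-in for the fine-grained (hardware) pacing engine:
    releases at most `rate` bytes of queued packets per tick."""
    def __init__(self):
        self.rate = 0                   # bytes per tick, set by the controller
        self.queue = collections.deque()

    def enqueue(self, packet_bytes):
        self.queue.append(packet_bytes)

    def tick(self):
        """Fine time-scale: release packets until the byte budget is spent."""
        budget, released = self.rate, []
        while self.queue and self.queue[0] <= budget:
            pkt = self.queue.popleft()
            budget -= pkt
            released.append(pkt)
        return released

class SmoothingController:
    """Coarse-grained, asynchronously invoked policy: it only resets
    the target rate and never touches individual packets."""
    def __init__(self, scheduler, target_backlog):
        self.scheduler = scheduler
        self.target_backlog = target_backlog

    def run(self):
        # Illustrative policy: drain faster when the backlog exceeds the target.
        backlog = sum(self.scheduler.queue)
        self.scheduler.rate = max(100, backlog - self.target_backlog)

sched = Scheduler()
ctrl = SmoothingController(sched, target_backlog=1000)
for size in [500, 500, 500, 500]:
    sched.enqueue(size)
ctrl.run()               # coarse time-scale: pick a new target rate
released = sched.tick()  # fine time-scale: pace packets against it
```

Even if the controller is not invoked for a while, the scheduler keeps pacing at the last target, which mirrors the best-effort enhancement semantics described above.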
A most essential characteristic of IP, and key to its success, is the softness of state inside the network, and the nature of this soft state. Apart from the routing database, (cached) state, most notably the forwarding cache, is used purely for performance enhancement but is not essential for correctly delivering packets to their destination. In particular this state can be lost, or removed at the router's discretion, without affecting the validity of state elsewhere in the network. Control-on-demand retains this enhancement characteristic, as the installed program logic is not essential for correct forwarding. Instead, the installed program logic is executed asynchronously to forwarding in an effort to enhance service quality. Consequently the installed program can be applied to the data stream on a best effort basis. In particular the router may refrain from executing the installed program during overload, or remove it altogether at its discretion. Of course, the more predictable the CPU scheduling the better the enhancement.
3 Related Work
While the number of projects on active and programmable networking is rapidly growing, this work mostly draws on the active networking work in IP described in [5,6,7], and the work on control plane programmability in ATM networks in [2,3,4].
In [5] Tennenhouse and Wetherall propose an active networking model carrying typed packets, capsules, containing the data and the necessary code for correct data processing and forwarding at every node. An appeal of this model is that every capsule carries the code essential for its own correct processing and delivery. Capsules thus retain the property of IP datagrams that each capsule is independent of other capsules. For the most part, however, active networking is about establishing, maintaining and sharing service specific state inside the network over multiple datagrams. Control-on-demand in contrast retains the enhancement nature of Internet state, generalizing the soft enhancement state to include a program. In ANTS [9] the capsule is optimized to contain only a request for a program to be downloaded and installed, with the first packet simply awaiting the completion of this installation before it is processed and forwarded. This procedure amounts to a flow setup, with substantial service provisioning and thus significant flow setup time (the datagram triggering the installation implicitly signaling a flow setup). Control-on-demand similarly uses implicit “signaling” to trigger the program installation, but uses more efficient signaling mechanisms. In [4] van der Merwe and Leslie describe an architecture for programmable ATM networks. The X-bind effort at Columbia [2,3] has explored similar issues, developing a programmable control architecture for connection management in ATM. In contrast to control-on-demand both of these are strictly in the control plane and thus not applicable to problems requiring interaction with the data path. The interfaces and flow level management capabilities in control-on-demand draw on both of these projects.
Both the X-bind and Cambridge switchlets efforts, however, assume a significant amount of supporting infrastructure, whereas the only extra-nodal infrastructure needed for control-on-demand is one for retrieving control programs when needed. We simply use web technology, URLs and a simple HTTP daemon, for this. This work builds on and extends [1], which describes an architecture for an active approach to network and service management. It also builds on prior work at AT&T Labs on introducing code into a running C++ program [10]. Other related work includes the PLAN work on language and security aspects of programmable networking [6], the Darwin project at CMU on application aware networking [11], and the CANE project at Georgia Institute of Technology, adapting telephony's advanced intelligent networking (AIN) [12] to packet networks by offering a menu of software functions, one of which is for forwarding [6]. The authors of [13] share many of our concerns with prior work on active networking. Their architecture for accessing and installing dynamic code could be used with control-on-demand. Currently we simply use HTTP, with the router running an access policy for scrutinizing code servers.
4 The Nodal Architecture
The nodal architecture for control-on-demand is depicted in Figure 1. The essential characteristic of this architecture is a strong separation between the service generic forwarding engine and the user installed control programs executing in an execution environment. In particular the data-path does not go through the execution environment. The architecture supports multiple execution environments, each managed by a meta-controller (Figure 1). The meta-controller is responsible for installing and maintaining the dynamically installed control programs. Assuming that the code is larger than will fit in one protocol unit (packet/frame), installation means locating, authenticating, downloading, verifying, (possibly compiling) and running the code implementing the requested control policy. Control programs are (autonomous) typed objects. The manager maintains a cache of available programs, to avoid the downloading and installation of frequently requested ones. Once installed and associated with the requesting flow, the forwarding engine and the flow specific control interact directly. A control policy may be assigned and reassigned at any time.

Figure 1: The Nodal Architecture for Control-on-demand. An opaque interface separates the control part from the forwarding engine. Note that the data-path does not go through the controller.

Ensuring separation in processing of service specific programs requires mechanisms not found in common operating systems. Such mechanisms are however provided, for example, in the Nemesis operating system [14]. Although the architecture supports arbitrary execution environments, in our prototyping the execution environment is simply the native code of the nodal hardware.
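The install path described above (cache lookup, retrieval by code reference, verification, assignment to the flow) can be sketched as follows; the class and method names are hypothetical, not the prototype's API.

```python
class MetaController:
    """Sketch of the meta-controller install path: consult the local cache
    of controllers; otherwise retrieve the code by reference (e.g., an
    HTTP GET), verify it, cache it, and assign it to the flow."""
    def __init__(self, fetch, verify):
        self.fetch = fetch        # code reference -> controller (or None)
        self.verify = verify      # controller -> bool (authenticity/safety)
        self.cache = {}           # code reference -> installed controller
        self.flows = {}           # flow identifier -> assigned controller
        self.fetch_count = 0      # retrievals actually performed

    def install(self, flow_id, code_ref):
        """True: the flow becomes active; False: it is set to ignore."""
        if code_ref not in self.cache:
            self.fetch_count += 1
            ctrl = self.fetch(code_ref)
            if ctrl is None or not self.verify(ctrl):
                return False
            self.cache[code_ref] = ctrl
        self.flows[flow_id] = self.cache[code_ref]
        return True

# A dict stands in for the code server reachable over HTTP.
code_server = {"http://example.org/drop.o": "drop-controller"}
mc = MetaController(fetch=code_server.get, verify=lambda c: True)
ok1 = mc.install(1, "http://example.org/drop.o")
ok2 = mc.install(2, "http://example.org/drop.o")   # served from the cache
bad = mc.install(3, "http://example.org/missing.o")
```

The second install is satisfied from the cache without a fetch, illustrating how the cache amortizes the installation cost across flows requesting the same policy.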
Although the type of execution environment and its properties may play a significant role in various aspects of programmable networking in general, and in safety and security in particular, we see these as complementary to the architectural issues that are the focus of our work. Candidate execution environments could be based on the Java virtual machine. The forwarding engine performs basic and multicast forwarding. In addition it may have some quality of service enhancing facilities increasingly found in modern routers, such as multiple queues and scheduling capabilities. The capabilities of individual network nodes will vary. One of the difficult parts of this work is to provide an interface to these facilities abstract enough to hide differences in implementation, yet rich enough to efficiently exploit them. In particular, the interface abstractions defined herein apply to both IP routers and ATM switches (assuming, though, frame-based forwarding facilities for an ATM switch), to the level that a single flow may be forwarded across both ATM and IP platforms.
4.1 The Control Semantics

Control-on-demand is flow oriented. In a connection oriented network like ATM this simply means that the control programs are assigned to (multicast) connections. In IPv6 we exploit the flow label. For IPv4 networks we use a filter definition and a flow classifier to group multiple packets into flows [RSVP ref]. There are several motivations for taking a flow oriented approach. One is to amortize the “investment” of installing the on-demand controller. Another is that many important applications of programmable networking apply only at the flow or connectivity level, for example, group management or floor control in a multicast. To further reduce the on-line (real-time) demands, the control programs are applied asynchronously to the data path. Since the control is for enhancement only it is not essential that it be applied to every packet. Most applications designed for the Internet gracefully adapt to changing network conditions, and rely only on the end-systems for correctness. Thus enhancement services inside the network are never essential, but they increase in utility when applied more consistently. With these semantics the on-demand control is applied asynchronously to data forwarding, even when fine grained job (CPU) scheduling is available. Since the only essential work performed at the network nodes is forwarding, control-on-demand is inherently efficient; during overload the node may invoke only those programs that reduce the congestion. Therefore, a control-on-demand node has at least the same throughput as a bare forwarding engine.
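As a rough illustration of the classification step, the following sketch groups packets into flows by the IPv6 flow label when present, falling back to a filter over header fields for unlabeled (IPv4-style) traffic. The field names are assumed for illustration only.

```python
def flow_key(pkt):
    """Group packets into flows: use the IPv6 flow label when set,
    otherwise classify by a filter over header fields (as RSVP-style
    filter specs would)."""
    if pkt.get("flow_label"):  # IPv6: explicit flow identifier
        return ("label", pkt["src"], pkt["flow_label"])
    return ("filter", pkt["src"], pkt["dst"], pkt.get("proto"))

flows = {}
packets = [
    {"src": "a", "dst": "b", "flow_label": 7},
    {"src": "a", "dst": "b", "flow_label": 7},
    {"src": "a", "dst": "c", "proto": "udp"},  # unlabeled: classified by filter
]
for pkt in packets:
    flows.setdefault(flow_key(pkt), []).append(pkt)
```

The two labeled packets collapse into a single flow, to which one controller can then be assigned.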
5 Prototype Implementation in IPv6
We have prototyped the control-on-demand architecture, with the objective of furthering our understanding of the architectural issues and verifying the viability of the paradigm in general and the interfaces in particular. Our implementation consists of three parts: mechanisms to enable communication between applications and controllers, an implementation of a control-on-demand node, and service specific program prototypes for a number of applications. The emphasis of this prototype has been on polishing the abstractions and interfaces and on verifying their usefulness (functional verification) by applying them to a number of important problems. Our prototype is implemented using a Linux IPv6 router, running kernel version 2.1.43.
5.1 The Control-on-demand Router

We use the flow label of IPv6 to explicitly identify a flow, and interpret it to signal a request for special treatment. For every labeled flow we maintain state and a separate queue. The state at least caches the outgoing port(s), but in general consists of the attributes assigned to the flow. For our purposes this state includes a controller reference, queue reference, time of flow initialization, and some flow statistics. Figure 2 shows how a flow transitions through four states during its life-span. A datagram arriving with a currently unknown flow identifier is identified as the beginning of a new flow at the current node. A new flow state is created, and its state set to initialize. If the datagram contains the cc-extension, a copy of the datagram is forwarded to the meta-controller after the original datagram is routed and forwarded according to the IPv6 routing tables. The outgoing interface is recorded and cached as part of the flow's state. Subsequent datagrams of the flow are routed based on its flow identifier, effectively pinning the routes for labeled flows. If the datagram does not contain a cc-extension, no other flow processing is requested and the state transits to ignore. In that case subsequent datagrams are simply forwarded.

Figure 2: State transitions in controller “life”. A new flow moves from NULL to Initialize on request; on success the state becomes Active, on failure Ignore; end-of-flow returns the state to NULL.

For flows requesting control-on-demand the environment manager, running in user space, investigates the extension header, determining the controller requested. In particular the meta-controller consults its cache of controllers to see if the requested controller is already locally available. If not, using the code reference the meta-controller retrieves and installs it. On successful installation of the on-demand control, the flow state becomes active, indicating to the router that the flow controller is ready to act on arriving datagrams. In addition the controller reference in the state is updated. If the installation fails, the flow state is set to ignore (in principle controlled by the default policy). The meta-controller may change the state from active to ignore at any time, for example if a controller fails for some reason. While the flow controller is being installed, flow datagrams are simply forwarded. In addition, some general (per flow) statistics are compiled. The flow specific controller gains access to this information.
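The state transitions of Figure 2 can be sketched as a small state machine; this is a hypothetical rendering in Python, not the kernel implementation.

```python
class FlowState:
    """States from Figure 2: NULL -> Initialize -> Active | Ignore,
    with end-of-flow returning to NULL."""
    NULL, INITIALIZE, ACTIVE, IGNORE = "null", "initialize", "active", "ignore"

    def __init__(self):
        self.state = self.NULL

    def packet_arrives(self, has_cc_extension):
        if self.state == self.NULL:
            # First datagram of an unknown flow starts initialization;
            # without a cc-extension no flow processing is requested.
            self.state = self.INITIALIZE if has_cc_extension else self.IGNORE
        # Datagrams are forwarded regardless of controller state.
        return "forward"

    def install_result(self, success):
        if self.state == self.INITIALIZE:
            self.state = self.ACTIVE if success else self.IGNORE

    def end_of_flow(self):
        self.state = self.NULL

f = FlowState()
f.packet_arrives(has_cc_extension=True)   # new flow requesting control
f.install_result(success=True)            # controller installed
g = FlowState()
g.packet_arrives(has_cc_extension=False)  # plain flow, no control requested
```

Note that `packet_arrives` always returns "forward": forwarding never waits on controller installation, which is the key difference from store-execute-forward models.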
Upon flow termination, determined either by inactivity or via an explicit notification from the flow controller, the meta-controller removes the flow state and reclaims the resources allocated to the flow. Our prototype implements the essentials of the interfaces described below. In particular the subscribe/publish interface supporting frame peeking is fully prototyped. The perturbation to the fast path for packets whose flow label is not set is two single-word equality tests, but an additional test is needed in the IPv6 input processing to decide the state of a known flow. We use hop-by-hop extension headers to implement the message interface.
6 Details of the Architecture
A key issue in this work is defining the interfaces and primitives necessary and sufficient to support the control-on-demand paradigm. There are four major parts to the interface: a meta-control interface, a message exchange interface, a facility access interface, and a subscribe/publish interface. The interface between individual controllers and the forwarding engine consists of all but the meta-control interface.
6.1 The Facility Access Interface

The facility access interface provides access to the resources of the forwarding engine. The interface is used by the meta-controller to assign controllers to flows, and by the flow controllers to manipulate their data flow. The only primitive used by the meta-controller assigns a controller to a flow. Controller assignment may change dynamically. In particular, the meta-controller may assign a null controller to a flow at its discretion. The facility access interface may reflect specific capabilities of the forwarding engine, such as scheduling capabilities, but hides the particular forwarding technology (e.g., ATM vs. MPLS or IP forwarding). The flow controllers however do learn the local flow topology. This is given as a set of ports participating in the flow. (We use a bit-vector implementation for the set of ports. Another implementation might instead use a list of port references.) We assume that the forwarding engine does basic group management (e.g., adding a leaf). In addition the flow controller is notified of topology changes.

Meta-controller actions:
  Assign(flow identifier, controller reference)
Notifications:
  topology-change<set of inputs, set of outputs>
Actions on flows (implicit argument: flow identifier):
  i) Reservations
     reserve-buffer(packets, bytes)
     reserve-bandwidth(bandwidth, set of ports)
     set-schedule(ordered list of {byte number, rate} pairs)
     set-attribute(list of {attribute, value} pairs)
  ii) Forwarding control
     iblock(subset of input ports): blocks input on the subset of ports specified.
     oblock(subset of output ports): blocks output on the subset of ports specified.
     delay(D-time, subset of output ports): schedules arriving packets at least D-time units after arrival.
Actions on packets (implicit argument: packet reference):
  release-at(time, subset of output ports): schedules the packet for departure on a set of output ports.
  block(subset of output ports): blocks the packet on the set of output ports specified.
  discard(): discards the packet.

Figure 3: The Facility Access Primitives
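A per-flow slice of the facility access interface might look as follows; this is a sketch using primitive names from Figure 3 with a bit-vector port set, not the actual interface definition.

```python
class FlowFacility:
    """Per-flow view of the facility access interface. Following the
    prototype's choice, an int is used as a bit-vector set of ports."""
    def __init__(self, output_ports):
        self.output_ports = output_ports  # bit-vector of ports in the flow
        self.blocked_out = 0
        self.attributes = {}

    def oblock(self, ports):
        """Block the flow on a set of output ports, removing them
        from the forwarding topology."""
        self.blocked_out |= ports

    def forwarding_ports(self):
        """Ports the flow is currently forwarded on."""
        return self.output_ports & ~self.blocked_out

    def set_attribute(self, pairs):
        """Assignment to named attributes (queue priority, etc.)."""
        self.attributes.update(pairs)

flow = FlowFacility(output_ports=0b1011)  # ports 0, 1 and 3
flow.oblock(0b0010)                       # floor control: silence port 1
flow.set_attribute({"queue-priority": 2})
```

Blocking a port this way exercises flow level connectivity management purely from the control plane, without touching individual packets.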
This enables the controller to exert flow level forwarding control (connectivity management) as if it were only in the control plane, by specifying properties on a subset of the ports. The controller may block the flow for input on a subset of ports, causing all packets on those ports to be discarded. Similarly the controller can block the flow on a set of output ports, removing those ports from the forwarding topology. These mechanisms allow the controller to do group management and to exercise floor control. In contrast to an in-data-path solution these primitives support flow level “connectivity” management without being in the data-path. Since the controller is activated asynchronously to the forwarding there is a window of opportunity, namely from the time a packet arrives until it is forwarded, within which the control must be run in order to see the packet. The controller may impose a fixed packet delay to increase the size of this window and thus relax real-time constraints on controller scheduling. This reduces the overhead of context switching, by making the controller work on multiple packets each time. Other primitives acting at the flow level are primitives for resource reservations and lower level policy selection. The controller may reserve buffers, specifying both a maximum number of bytes and a maximum number of packets. The latter allows the controller to efficiently limit the total number of packets buffered at the node, for example for active retransmission. The interface has a primitive to reserve bandwidth on a set of ports. The controller may interact with the underlying scheduler by providing it with a list of start byte, rate pairs, {bi, ri}. The semantics of this schedule is that after bi bytes, packets are forwarded at rate ri. If the controller is not activated for a long time the rate simply remains unchanged. This supports flow level smoothing while minimizing the coupling between the controller and the forwarding, and allows us to do smoothing on a best effort basis. The last primitive, set-attribute, supports assignment to named attributes. It is used for scheduling property selection, queue priority and more. Until there is strong convergence in reservation models the reservation interface remains under revision. The controllers may decide the fate of individual packets (see the subscribe/publish interface below). The release-at primitive supports scheduling of a packet for release on a subset of output ports. The primary use of this primitive is to schedule a retransmission of a packet (currently in the buffer) to react to downstream losses, but it also allows the controller to explicitly schedule a particular packet. A packet may be blocked on a set of output ports, enabling the controller to filter a stream. Lastly, a packet may be discarded.

send-to-meta-control(flow, type, data)
  type (one of): activate, inform, control
  data (type dependent):
    activate: policy type, code (reference), data
    inform: list of attributes.
    control: list of {attribute, value} pairs.
send-flood(flow, set of output ports, data)
send-next(flow, set of output ports, data)

Figure 4: The Message Exchange Primitives
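The semantics of the set-schedule list of {bi, ri} pairs can be illustrated with a small lookup function; this is a sketch under the stated semantics, and the prototype's representation may differ.

```python
def rate_at(schedule, bytes_forwarded):
    """schedule: ordered list of (start_byte, rate) pairs {b_i, r_i}.
    After b_i bytes have been forwarded, packets leave at rate r_i;
    the last reached rate persists until the controller runs again."""
    rate = 0
    for start_byte, r in schedule:
        if bytes_forwarded >= start_byte:
            rate = r
        else:
            break
    return rate

# Example schedule: start at 100, slow to 50 after 5000 bytes,
# speed up to 80 after 20000 bytes.
schedule = [(0, 100), (5000, 50), (20000, 80)]
```

Because the scheduler only walks this precomputed list, the fine-grained pacing stays in the forwarding engine while the controller, asynchronously, replaces the list.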
6.2 The Message Exchange Interface
The message exchange interface provides a mechanism for asynchronous communication between controllers. All application (service) specific signaling is performed using this message exchange. Since these messages are arbitrary in size and content, and may in particular include program code, it is important that these controller-to-controller messages are not sent on a shared signaling channel. The message exchange interface supports exchanges between application(s) and meta-controllers, and exchanges between flow-specific controllers (service-specific signaling). The interface has three primitives: a) send-to-meta-control, b) send-flood, and c) send-next. Whereas the first is the primitive used by applications to interact with meta-controllers, the second and third are used by the application and the flow-specific controllers to perform application-level signaling.
All of the primitives take a flow identifier as an argument. In addition, a meta-control message sent using the first primitive has two parameters: a request type, which is one of activate, inform, or control, and request data, which is request-type specific. A meta-control message is sent to all meta-controllers of the flow unchanged (i.e., intermediate controllers cannot change it). The primary meta-control message is an activate message, used by applications to install and activate a control policy. The request data for an activate message contains three mandatory parameters: a flow identifier (which may be provided implicitly), a policy type name, and a policy implementation (reference), optionally followed by arbitrary policy-specific parameters.
Control on Demand
The policy implementation parameter either contains the code, or is a globally valid network reference, a URL for example, from where the policy implementation may be retrieved. The policy-specific parameters are provided to the flow controller on execution. If an activate message specifies a flow identifier already associated with the same control policy, the meta-controller performs a "refresh," reinstalling a new version of the policy implementation. For an inform message, the data contains a flow identifier, followed by a list of attributes whose values are returned. If the attribute list is empty, a list of all attributes defined for the particular flow is returned. Similarly, the control message contains a flow identifier, followed by a list of attribute-value pairs.
The send-flood and send-next primitives are used for service-specific (signaling) messages between installed controllers, and for (signaling) exchanges between meta-controllers. The meta-control messages are distinguished from the others by setting the flow identifier to zero. Both primitives take two additional parameters: a set of output ports, and service-specific data. The message is output on the ports specified, and is either "flooded" in the case of send-flood or sent to the "next" flow-specific controller(s) only. The layout of the service-specific data is at the service/application's discretion and is not specified. The data layout for the meta-control exchanges is analogous to that of send-to-meta-control, containing a flow identifier, request type and request-specific data. The only request type currently defined is migrate, taking as data a wrapped object (see the meta-control interface, Section 6.4).
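To make the three request types concrete, the following Python sketch shows a meta-controller dispatching on them. The message encoding, the `handle` function and the `flows` table are our own illustration; the paper does not specify a wire format or an implementation language.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical encoding of the three meta-control request types described
# in the text; only the semantics (activate installs/refreshes a policy,
# inform reads attributes, control writes them) follow the paper.
@dataclass
class MetaControlMessage:
    flow_id: int
    req_type: str          # one of: "activate", "inform", "control"
    data: Any

def handle(msg: MetaControlMessage, flows: dict) -> Any:
    """Sketch of a meta-controller dispatching on the request type."""
    if msg.req_type == "activate":
        policy_type, code_ref, *params = msg.data
        # A repeated activate for the same flow/policy acts as a "refresh",
        # reinstalling a new version of the policy implementation.
        flows[msg.flow_id] = {"policy": policy_type, "code": code_ref,
                              "params": params, "attrs": {}}
        return "installed"
    if msg.req_type == "inform":
        attrs = flows[msg.flow_id]["attrs"]
        # An empty attribute list means: return all attributes of the flow.
        return {k: attrs[k] for k in (msg.data or attrs)}
    if msg.req_type == "control":
        flows[msg.flow_id]["attrs"].update(msg.data)
        return "ok"
    raise ValueError("unknown request type")
```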
6.3 The Subscribe/Publish Interface

subscribe-stats(flow identifier)
subscribe-peek(flow identifier, offset, length)
  offset - offset at which peeking begins
  length - number of bytes to peek at (0 indicates all)
subscribe-ignore(flow identifier)

Figure 5: The Subscribe/Publish Primitives

The subscribe/publish interface allows flow-specific controllers to subscribe to (request) events and information published (on request) by the forwarding engine. The three primitives of this interface are shown in Figure 5. The controller may subscribe to simple flow statistics, such as the number of packets and bytes transmitted since the last invocation, or the number of bytes (packets) currently in the queue. If the flow identifier is set to 0, the controller receives nodal statistics about queue length and packet loss rate. The second primitive, subscribe-peek, implements frame peeking, allowing the controller to subscribe to receive (peek at) a portion of the packet payload. Subscribe-peek does not cancel a subscription to statistics. The last primitive cancels all subscriptions. The controller gains access to its subscriptions when it is activated. A published peek event contains a packet reference, which in turn may be used by the controller to manipulate the packet through the interface.
One of the benefits of this interface is that the flow controller may dynamically change the volume of data that goes through it. For example, a flow controller that performs a selective discard during congestion can subscribe only to the flow statistics until a queue builds up, at which time it starts peeking at the application-level framing to selectively discard the less important packets. Even during congestion the volume going through the controller is minimal.
This way the controller can manipulate the data-flow without being in the data-path.
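The adaptive-subscription pattern just described can be sketched as follows. The engine class, method names and the congestion threshold are illustrative stand-ins for the real forwarding-engine interface, modeled on the primitives of Figure 5.

```python
# Stand-in for the forwarding engine, recording subscription calls so the
# controller's behavior can be observed.
class StubEngine:
    def __init__(self):
        self.calls = []
    def subscribe_stats(self, flow):
        self.calls.append(("stats", flow))
    def subscribe_peek(self, flow, offset, length):
        self.calls.append(("peek", flow, offset, length))
    def subscribe_ignore(self, flow):
        self.calls.append(("ignore", flow))

class AdaptiveController:
    """Subscribes only to cheap statistics while the queue is short and
    upgrades to one-byte frame peeking once congestion builds up."""
    HIGH_WATER = 64 * 1024        # assumed congestion threshold (bytes)

    def __init__(self, engine, flow):
        self.engine, self.flow = engine, flow
        self.peeking = False
        engine.subscribe_stats(flow)

    def on_activation(self, stats):
        if not self.peeking and stats["queued_bytes"] > self.HIGH_WATER:
            # Congestion: start peeking at one header byte per packet.
            self.engine.subscribe_peek(self.flow, offset=0, length=1)
            self.peeking = True
        elif self.peeking and stats["queued_bytes"] == 0:
            # Queue drained: cancel everything, keep statistics only.
            self.engine.subscribe_ignore(self.flow)
            self.engine.subscribe_stats(self.flow)
            self.peeking = False
```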
6.4 The Meta-Control Interface
The controller has a meta-control interface that is used by the meta-controller for management of the dynamically installed control policies. Every controller must implement this interface.

create(data) : create a controller
clone() : clone a controller; the clone restarts at run()
delete() : destroy a controller
run() : execute the flow controller; the only entry point
wrap() : wrap the controller (and its state); returns a wrapped controller (byte string)
unwrap(a wrap) : reinstantiate and run a controller
go-next(set of output ports) : migrate one hop
go-flood(set of output ports) : migrate to all

Figure 6: The Meta-control Interface Primitives

The interface primitives include operations to create a flow controller, clone an existing controller, and destroy a flow controller. The create primitive has one (untyped) argument for initialization data. This is the controller-specific data provided in the activate message to the meta-controller. A clone is an exact replica of the cloned controller (as in fork), including its state, but is executed using the run method. The delete primitive destroys a controller and reclaims the resources allocated to it. A controller may be wrapped (for shipping or storage), and later (re)created (unwrapped). The execution of a new flow controller always starts by executing the method run. A flow controller may migrate, either jumping one hop (go-next) or being "flooded" (go-flood) to all nodes in the flow downstream of the specified set of output ports.
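The wrap/unwrap pair can be illustrated with ordinary serialization. This is a minimal sketch, assuming wrapping produces a byte string suitable for shipping in a migrate message and unwrapping reinstantiates the controller and calls run(); the FlowController class and the use of pickle are our own illustration, not the paper's implementation.

```python
import pickle

class FlowController:
    """Hypothetical flow controller implementing the meta-control interface."""
    def __init__(self, data):
        self.data = data            # create(data): initialization data
        self.hops = 0

    def run(self):                  # the only entry point
        self.hops += 1
        return self.hops

def wrap(controller) -> bytes:
    """wrap(): return the controller and its state as a byte string."""
    return pickle.dumps(controller)

def unwrap(wrapped: bytes):
    """unwrap(): reinstantiate the controller and run it."""
    controller = pickle.loads(wrapped)
    controller.run()
    return controller
```

A migrated controller thus resumes with the state it carried: the hop counter survives the wrap/unwrap round trip.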
7 Security
Although security is not part of our research, the architecture does have some helpful security implications. As the installed programs are executed for enhancement only, a router may choose not to execute a particular program. Running programs may be interrupted. Thus, just as in shared computing environments, local program termination is not essential for system correctness. As the forwarding model is not changed, global termination (correct delivery) is unaffected. Before installing code, the meta-controller consults a nodal security policy manager. In our prototype the policy simply limits the set of servers from which it will fetch programs. Frame peeking gives the service-specific programs read access through copying a portion of the datagram. The per-packet frame peeking does not lock a packet from being forwarded or discarded by the forwarding engine. Write access is limited to the packet payload only. Non-interference (in name/address space) among the control programs is assured in our case through standard user-level protection, and by limiting each user process to access and manipulate only the state of the flow it is assigned to. Resource consumption is managed by resource limits and scheduling.
8 Supporting Mechanisms
8.1 Exploiting the IPv6 Flow Label
Control on demand is flow oriented. Whereas in a connection-oriented network this would simply mean that controllers are associated with connections, in an IP network it means that a sequence of packets is identified (classified) as belonging to the same flow. In our prototype we exploit the IPv6 flow label to explicitly identify flows. To define a flow, an end-system simply sets a (locally) unique flow label on a packet intended for that flow. Assignment of a non-zero flow label declares the intent to use this flow for "something special," and enables the end-system to refer to the flow for later attribute assignment. A router processes such a packet by creating an entry in a flow cache, which in our current prototype is simply a copy of the corresponding entry (in the regular cache) for the destination address. Subsequent packets with the same flow label are routed using this flow cache, effectively pinning the route of the flow. To assign a controller to the flow, we use hop-by-hop extension headers (see below). A new controller may be assigned to the flow at any time.
Our use of the IPv6 flow label is consistent with its intended use, and does not change the softness of the state within the network. Since control-on-demand is enhancement control, and is never assumed, a router can discard the cached flow state at its discretion. In particular, policies used for invalidating state in the regular cache may be applied to the flow cache. Thus, explicit flow release on termination is not necessary.
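The flow-pinning behavior above can be sketched in a few lines. The dictionaries and field names are illustrative; the point is that the first labeled packet copies the current routing entry into the flow cache, and later packets with the same (source, label) pair keep using that copy even if the regular cache changes.

```python
def forward(pkt, route_cache, flow_cache):
    """Return the output interface for a packet, pinning routes for
    non-zero IPv6 flow labels as described in the text (a sketch)."""
    src, dst, label = pkt["src"], pkt["dst"], pkt["flow_label"]
    if label == 0:
        return route_cache[dst]               # ordinary datagram
    key = (src, label)                        # labels are unique per source
    if key not in flow_cache:
        flow_cache[key] = route_cache[dst]    # create (pin) the flow entry
    return flow_cache[key]                    # soft state: may be evicted
```

Because the flow cache is soft state, a router could evict any entry at its discretion; the next labeled packet would simply re-pin along the then-current route.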
8.2 Frame Peeking
Although the semantics of control-on-demand are designed to reduce the real-time (on-line) processing requirements on the router control processor, acting in the data path ultimately requires viewing the data being transported. To enhance the efficiency of this we use a new primitive for frame peeking: a mechanism that enables a controller to peek at a portion of a datagram rather than the full frame. The primitive takes two parameters: a) the offset within the payload from where peeking starts, and b) a length, indicating the number of bytes to peek at. This primitive is motivated by the observation that most often the installed program only needs to peek at a very few bytes in the application-level header (see MPEG stream thinning below).
Enabling the controller to peek at only parts of each packet/frame, as opposed to having to be fully in the data path, enhances efficiency in two ways: first, by reducing the bandwidth across the boundary between the forwarding engine and the installed programs, and second, by allowing the datagrams to remain in the buffers of the forwarding engine. In a software router, like our prototype, this benefit is primarily the reduction in data copying from kernel space to user space where the controllers execute. In a more optimized high-performance router, leaving the datagrams in the forwarding buffers provides an additional performance benefit. In particular, if the underlying forwarding hardware is an ATM switch, frame peeking reduces the packet reassembly needed and avoids segmentation completely, further enhancing performance.
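Functionally, the primitive amounts to copying a small slice of the payload to the controller while the datagram itself stays in the forwarding buffers; a minimal sketch, with length 0 meaning "peek at everything" as in Figure 5:

```python
def peek(payload: bytes, offset: int, length: int) -> bytes:
    """Copy `length` bytes of the payload starting at `offset`;
    length == 0 means peek at everything from `offset` on (a sketch)."""
    if length == 0:
        return payload[offset:]
    return payload[offset:offset + length]
```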
Frame peeking and its benefits are not specific to control-on-demand and could benefit store-execute-and-forward approaches similarly by reducing the data copying across the abstraction boundary. However, frame peeking is performed at the point of buffering. In many routers this happens at the output queue, thus preventing frame peeking from being used as a basis for routing decisions.
9 Example Applications of Control-on-Demand
To further verify the viability of our approach we have implemented numerous applications of control-on-demand. We discuss two of them below, and show how asynchronous application of the control program is sufficient to achieve the same results. 1. Selective_Discard() 2. While (1) { 3. Sleep(some time); 4. (Qinfo, matches, signature) := query (Q, offset, peekLen); 5. if (Qinfo.Bytes < highWaterMark) continue; 6. response r = new response(signature, ||matches|| ); 7. for ( i=0; i < || matches ||; i++ ){ 8. byte frame_type := matches[i]; 9. r[i] := NOOP; 10. if (frame_type != I-frame and Qinfo.Bytes > lowWaterMark){ 11. r[i] := DISCARD; 12. Qinfo.Bytes := packet->length; 13. } // endif; 14. } // endfor 15. respond( r ); 16. } // endwhile
Figure 7: Code segment implementing selective discard for MPEG
9.1 Stream Thinning as Congestion Adaptation
We have implemented selective discard for MPEG video streams. MPEG is hierarchically coded using three types of frames: I-frames, P-frames and B-frames. Whereas the loss of an I-frame can affect all frames until the next I-frame (a group of pictures), losing a small number of P- and B-frames only degrades quality marginally. The MPEG selective discard policy discards the less important P- and B-frames in an attempt to protect the I-frames.
Our approach to congestion adaptation using control-on-demand is based on the observation that during congestion queues are long and thus queuing time is significant. When run, the video stream controller executes a query (Figure 7, line 3) and blocks. When invoked, the query returns flow and queue statistics, and a list of entries, one for each packet currently in the queue, each entry containing the result of the frame-peeking query. Most of the time, the queue is small (or empty) and the controller goes to sleep. If, however, the queue exceeds a high water mark, indicating congestion, the controller prepares a vector of operations, one for each packet, discarding all B- and P-frames until a low water mark is crossed. On completion the controller returns this vector using the respond primitive (Figure 7, line 15).
This example also nicely illustrates the effectiveness of frame peeking for stream thinning. We used this scheme on a short MPEG movie, encoded into 26546 packets, a total of 13.86 MB. Most datagrams are of size 560 bytes, with a few of size 170 bytes, and an average of 522 bytes per datagram. Peeking at the one byte needed to identify the MPEG frame type results in 27 KB (0.2%) of the data being copied to user space. Blocking a packet also takes only one byte of response, keeping the data copying at 0.2% of what an in-data-path solution would require. Enlarging the packets (the MTU in our testbed is 1500 bytes) would lower this ratio further. We conclude that frame peeking significantly reduces the data copying across the interface.
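The quoted ratio follows directly from the stream parameters; a quick check of the arithmetic:

```python
# Reproducing the copy-volume arithmetic from the paragraph above.
packets = 26546
total_bytes = 13.86e6              # 13.86 MB MPEG stream
avg_size = total_bytes / packets   # ~522 bytes per datagram, as stated
peeked = packets * 1               # one byte peeked per datagram
ratio = peeked / total_bytes       # fraction copied to user space, ~0.2%
```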
9.2 Application-Specific Traffic Shaping
As another example to demonstrate the effectiveness of control-on-demand, we have implemented a work-ahead smoothing service. The service performs application-specific traffic shaping to reduce the burstiness of a variable-bit-rate video stream, while avoiding underflow and overflow of the client playback buffer. The online smoothing service has two natural time scales: a medium time scale for computing a transmission schedule consisting of target transmission rates, and a fine time scale for coordinating the transmission of individual packets based on the schedule.
The smoother subscribes to flow statistics, including packet sizes and timestamps, but does not need to copy the packet contents. When activated by the CPU scheduler, the smoothing controller executes code to generate a list of {byte, rate} pairs, which are given to the packet scheduler (using the set_schedule primitive). At the data-path level, the packet scheduler switches to rate_i after transmitting byte_i. Hence, while the controller applies service-specific criteria to set the schedule, the performance-critical work of scheduling packets for transmission is delegated to the forwarding engine. For the smoothing to be effective, the CPU scheduler must run the smoothing controller before the schedule is depleted (it is acceptable to run it too early). Assuming that each schedule is used for transmitting several packets, this requirement is not hard to meet. Hence, using control-on-demand, application-specific smoothing can be realized in an asynchronous manner with negligible impact on forwarding performance.
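The data-path side of this interface can be sketched as a simple lookup: given the number of bytes already transmitted, the packet scheduler selects the rate of the last crossed threshold. The function and its signature are our illustration of the {byte, rate} semantics, not the paper's code.

```python
def current_rate(schedule, bytes_sent, default_rate):
    """Return the transmission rate in effect after `bytes_sent` bytes.

    schedule: list of (byte_threshold, rate) pairs, sorted by threshold;
    the scheduler switches to rate_i once byte_i bytes have been sent.
    If the controller never refreshes the schedule, the last rate
    simply remains in effect."""
    rate = default_rate
    for threshold, r in schedule:
        if bytes_sent >= threshold:
            rate = r
        else:
            break
    return rate
```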
10 Conclusion
Control-on-demand is a new paradigm for active and programmable networks. Its service model provides sufficient richness to act in the data path, yet is efficient enough to make it practical. Control-on-demand does not adopt the store-execute-and-forward model, but retains the store-and-forward model unchanged. In particular, control-on-demand does not add any software in the critical forwarding path. The programs installed on demand are executed on a best-effort basis, at the discretion of the router, asynchronously from data forwarding. Control-on-demand state is enhancement state (including the programs) and is not needed for correct forwarding, thus maintaining one of the critical features of the Internet.
The service model naturally allows the installed control programs to exploit lower-level facilities, in particular hardware facilities. In addition, through frame peeking, which allows the control programs to peek at a fraction of the datagram payload, the controllers may adjust the number of bytes "peeked at" and thus can control the degree to which they act in the data path. Our results show that the savings in bandwidth between controller and forwarding engine are significant. For stream thinning, we showed an example where, by peeking only at the one byte needed to determine the payload type, data copying was reduced to a mere 0.2% of the data flow.
We have implemented a control-on-demand prototype on an IPv6 router. Through experimentation with the application of control-on-demand to a number of problems we have verified the functionality of the architecture and interfaces. We exploit the IPv6 flow label to facilitate flow-level state sharing and implement flow pinning, while retaining the softness of the flow state. We conclude that control-on-demand is sufficiently rich for a range of applications, but is at the same time efficient enough to be of practical value.
References:
1. Gísli Hjálmtýsson and Ajay Jain, "Agent-based Approach to Service Management - Towards Service Independent Network Architecture," Integrated Network Management V - Integrated Management in a Virtual World, Proceedings of the Fifth IFIP/IEEE International Symposium on Integrated Network Management, pp. 715-729, San Diego, California. Aurel Lazar, Roberto Saracco and Rolf Stadler, editors, Chapman & Hall, May 1997.
2. A.A. Lazar and R. Stadler, "On Reducing the Complexity of Management and Control of Future Broadband Networks," Proceedings of the Workshop on Distributed Systems: Operations and Management, Long Branch, NJ, 1993.
3. N.G. Aneroussis, A.A. Lazar, and D.E. Pendarakis, "Taming XUNET III," ACM Computer Communications Review, Vol. 25, No. 3, July 1995, pp. 44-65.
4. Sean Rooney, Jacobus E. van der Merwe, Simon Crosby and Ian Leslie, "The Tempest, a Framework for Safe, Resource Assured, Programmable Networks," IEEE Communications Magazine, Vol. 36, No. 10, October 1998, pp. 42-53.
5. D. L. Tennenhouse and D. J. Wetherall, "Towards an Active Network Architecture," Computer Communication Review, 1996.
6. D. Scott Alexander, Marianne Shaw, Scott M. Nettles and Jonathan M. Smith, "Active Bridging," SIGCOMM 1997, Cannes, France, September 1997.
7. Samrat Bhattacharjee, Ken Calvert and Ellen W. Zegura, "An Architecture for Active Networking," High Performance Networking (HPN'97), White Plains, NY, April 1997.
8. Gísli Hjálmtýsson, "Lightweight Call Setup - Supporting Connection and Connectionless Services," in Proceedings of the 15th International Teletraffic Congress ITC-15, pp. 35-45, Washington, DC, USA. V. Ramaswami, P. E. Wirth, editors, Elsevier, June 1997.
9. D. Wetherall, J. Guttag, and D. L. Tennenhouse, "ANTS: A Toolkit for Building and Dynamically Deploying Network Protocols," IEEE OPENARCH, San Francisco, CA, 1998.
10. Robert Gray and Gísli Hjálmtýsson, "Dynamic C++ Classes - A Lightweight Mechanism to Update Code in a Running Program," in Proceedings of the USENIX Annual Technical Conference, pp. 65-76, June 1998.
11. Prashant Chandra, Allan Fisher, Corey Kosak, T. S. Eugene Ng, Peter Steenkiste, Eduardo Takahashi, Hui Zhang, "Darwin: Resource Management for Value-Added Customizable Network Services," Sixth IEEE ICNP, Austin, October 1998.
12. Bell Communications Research, Inc., "AIN Release 1: Service Logic Program Framework Generic Requirements," FA-NWT-001132.
13. Dan Decasper and Bernhard Plattner, "DAN: Distributed Code Caching for Active Networks," in Proceedings of INFOCOM'98, San Francisco, California, March 1998.
14. I. M. Leslie, D. McAuley, R. Black, T. Roscoe, P. Barham, D. Evers, R. Fairbairns, and E. Hyden, "The Design and Implementation of an Operating System to Support Distributed Multimedia Applications," IEEE JSAC, Vol. 14, No. 7, pp. 1280-1297, September 1996.
Agent Based Security for the Active Network Infrastructure

Stamatis Karnouskos, Ingo Busse, and Stefan Covaci

German National Research Center for Information Technology, Research Institute for Open Communication Systems (GMD-FOKUS), Kaiserin-Augusta-Allee 31, D-10589 Berlin, Germany
http://www.fokus.gmd.de/ima/
Abstract. Security in Active Networks is still in its infancy! This paper presents a new Agent-Based Security architecture for the Active Network Infrastructure (ABSANI). It explains why agents in combination with Java are considered the appropriate basis for a security architecture and how this can be applied to Active Networks. An agent-based Active Node architecture is introduced and ABSANI is placed within that approach. Subsequently, the basic components of ABSANI are analyzed, arguing for the benefits they offer. Finally, an application scenario of Place-oriented Virtual Private Networks is demonstrated.
1 Introduction
This approach integrates multi-domain, parallel-evolving technologies (Agents, Java, CORBA). We try to mix the benefits of Agent Technology and, where needed, of CORBA in order to apply them successfully to the Active Networks domain. We briefly present these areas, show how each one can benefit the others, and explain where and why our approach stands today in relation to the ongoing research.
1.1 Active Network Technology
In recent years a variety of approaches have been pursued to provide a flexible, programmable network infrastructure that could "change its behavior at the drop of a dime". Active Network (AN) technology aims to move dynamic computation into the network, thereby making it more intelligent not just at its end-points but also in the intermediate nodes. An Active Network is a group of network nodes (switches, routers - called Active Nodes hereafter) that support the deployment and execution of user applications (embedded in the user communications) without interrupting the network operation. In this way, an Active Network is in a position to offer dynamically customized/programmed network services (e.g. connection) to the customers/users, or even enables users to inject their own applications to support their communication needs. Programmable networks open many new possibilities for innovative applications that are unimaginable with traditional data networks.

Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 330-344, 1999. Springer-Verlag Berlin Heidelberg 1999

This dynamic network programmability can be conceived by two different approaches:
I. In-band programming of the network nodes (also widely known as the capsule approach). The program is integrated into every packet of data sent to the network (the program is injected on the same path as the data). When these capsules arrive at the Active Node, the node evaluates the programs and adapts its functionality. The programs within the capsules are typically very small due to the size limitation of the packets and the transport overhead imposed by the capsule programs. Active Network programmability based on capsules is therefore limited. That is definitely negative, especially in the context of connection-oriented communication environments, where active node re-configuration/programming (activated by the reconfiguration of network connections) is needed much less frequently than the processing of packet payload. It is not necessary and not very efficient to equip each data packet with a computation capability, as this adds too much overhead to the processing of packets. Thus capsules have very low utilization in such a context.
II. Out-band programming of the network nodes. Here the programs are injected into the node in a different session from the actual data packets that they affect. The user would send the program to the network node (switch/router), where it is stored; later, when data arrives, the program is executed, processing that data. The data can carry some information (e.g. special tags) that lets the node decide how to handle it or what program to execute. Within this approach, which makes a clear separation of program and data/communication packets, nodes can be programmed via injection of new program code into the active nodes, where injection can typically be done by specific packets (e.g. mobile agents) that are evaluated at the network nodes. Our architecture supports exactly this approach. Finally, into this category also falls the notion of remote manipulation (binding) of the node's resources through a set of well-defined interfaces [1]. This is not considered a pure AN approach, as it offers high-level configurability/remote manipulation rather than programmability of the node. The difference between remote manipulation and active code injection is similar to the difference between an RPC-based and a Mobile Agent (MA)-based software design paradigm, where MAs can help to increase flexibility and robustness. In addition, it allows for load balancing of the active network services.
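The out-band model described above can be sketched as a toy active node: programs are injected in a separate session and stored under a tag; data packets carrying that tag are later processed by the stored program. All names here are illustrative, not part of any cited system.

```python
class ActiveNode:
    """Toy sketch of out-band programming: code is injected separately
    from the data it processes (e.g. carried by a mobile agent)."""
    def __init__(self):
        self.programs = {}                    # tag -> injected program

    def inject(self, tag, program):
        """Out-band injection session: store the program under a tag."""
        self.programs[tag] = program

    def on_packet(self, tag, payload):
        """Data path: the packet's tag selects the stored program.
        Untagged or unknown traffic is forwarded unmodified."""
        handler = self.programs.get(tag)
        return handler(payload) if handler else payload
```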
1.2 Agent Technology
Software agents are a rapidly developing area of research. The research community has still not found a clear answer to the most popular question, "What is an agent?", and the debate still goes on. A general answer could be: agents are software components that act alone or in communities on behalf of an entity, and are delegated to perform tasks under some constraints or action plans. However, agents come in a myriad of different types depending on their nature and their environment.
Stamatis Karnouskos et al.
Examples are: collaborative agents, autonomous/proactive agents, interface agents, mobile agents, reactive agents, hybrid agents, intelligent/smart agents, mental/emotional agents, etc. The above categorization is not unique and depends on which attributes the agents exhibit to the greatest degree. Of course there can be mixed agents, i.e. an Intelligent Agent can also be Mobile. In our Active Network infrastructure a variety of agents can be used, e.g.:
• Intelligent agents that reside on the node and "intelligently" configure the node's resources for optimal performance.
• Mobile agents that can be "dumb" but execute trivial tasks in all nodes of the Active Network Infrastructure.
• Collaborative agents that work in teams and take care of the security within an Active Network domain, e.g., automatic certified security updates on the AN nodes, elimination of denial-of-service attempts by blocking the source of the attack at the nearest AN node, etc.
Mobile agent systems provide the AN infrastructure with many advantages. MAs shatter the Client/Server model and eliminate its limitations. They provide robust networks, as the hold time for connections is reduced to only the time required to move an agent; the agent carries credentials, and therefore the connection is not tied to constant user authentication; load balancing can be achieved, as there is no request flow across the connection to "guide" the agent and respond to results; and there have already been standardization efforts defining interoperable interaction between agent systems [2].
2 Motivation
Security in Active Networks is still in its infancy! Active node programming is typically a security-critical activity. Of course, in such a programmable network the security implications are far more complex than in current environments. Although there has been some research concerning the security of ANs, little or no effort has been made to make it dynamic, extensible, configurable and interoperable. ANs demand that this security architecture be as programmable and evolvable as possible. Extensive and expensive authentication measures are necessary to protect the active node resources from malicious intrusions. Such security measures cannot be applied on the basis of individual packets due to their time and space requirements.
Our solution is an Agent-Based Security Architecture for Active Networks. With this approach we do not seek a one-sided technological approach to the AN security problem, but the integration of parallel-evolving technologies. ABSANI aims at integrating cutting-edge technologies in order to produce a high-security architecture and deal with the advanced security threats that Active Network technology introduces. There is no need to re-invent the wheel in the security approach we take. By building upon existing security schemes we make sure that our architecture is open and interoperable. We understand that these are parallel developing domains into which much research effort has been invested over recent years and which will keep on evolving fast. By integrating state-of-the-art components we make sure that our architecture stays up to date and advances/adapts to current needs as its components evolve. That not only is in favor of its internal/external security but also of its lifetime. Within the ABSANI architecture we try to encompass the flexibility and special characteristics of agent technology.

Fig. 1. Security Threats to Agent-Based Applications (threats shown include resource abuse, masquerade, repudiation and denial-of-service attacks against the agency; eavesdropping, alteration and record/replay between agencies; the agency's complete control over a hosted agent; and eavesdropping, alteration and masquerade on the user's connection)
We use the agent-based approach to program an Active Node. In such an environment, the author of the MA code, the user, the owner of the hardware, and the owner of the execution platform can be different entities governed by different security policies in a heterogeneous environment. As we also see in Fig. 1, security in such an environment is an extremely sensitive issue. The hosts have to be protected from malicious agents, and the agents themselves have to be protected from malicious hosts or other malicious agents that could attack them. Moreover, the communication path between the AN nodes has to be protected with state-of-the-art security techniques. The Agent Community as well as the AN Community work on these topics. Our open security architecture assures that future solutions in the agent security domain can be applied to our approach, thereby strengthening the node's protection system.
3 The Active Network Architecture

Fig. 2. Active Network Infrastructure (AN Nodes #1-#3, each a legacy router with an agent AN add-on, interconnected with plain legacy routers and the user's AN or legacy router)
The Active Network Infrastructure is seen as a network of co-existing AN nodes and legacy nodes. The user initiates agents that traverse the network and configure the Active Nodes. In Fig. 2 the user has initiated an agent to change the behavior of AN Node #2 and AN Node #3. The agent visits the target node and executes. Then, having fulfilled its tasks, it moves to the next AN Node via the legacy router, where it executes again. Our notion of an Active Node architecture has agent technology embedded (illustrated in Fig. 3). As we can see, agents can empower current routers and transform them into Active Nodes. The resources of the node can be accessed/controlled by visiting agents according to the node's policy schemes.
Fig. 3. Active Node Architecture (execution environments hosted on an agent platform with CORBA/SNMP interfaces, over an abstraction layer of the legacy router's resources and routing hardware)
4 The Security Architecture
Security can't be an afterthought! It has to be integrated with the node's core function, not implemented at the end as an extra, optional, or explicitly called service. The new security architecture for AN proposed hereafter is based on mobile agent technology. Wherever we detect significant benefits, we make use (Fig. 4) of the Common Object Request Broker Architecture (CORBA) [3], which is today an established standard that enhances the original RPC-based architectures by allowing relatively free and transparent distribution of service functionality. Currently no standard exists that handles the interoperability between different agent platforms and the usability of CORBA services by agent-based components. By further developing this architecture we hope to provide feedback to future standardization efforts.

Fig. 4. Technology view of ABSANI (active node security services and the agent system built over CORBA and the node resources)
Agent Based Security for the Active Network Infrastructure
335
The architecture consists of Places that interact with the core of the architecture (Fig. 5). The communication takes place mainly between the enforcement engines and the resource managers. In detail, the components of this architecture are:
Fig. 5. Overall architecture view (each Place holds an Enforcement Engine, a Resource Manager, Credential/Policy/Component DBs, a Cache, and an Audit; node-level counterparts and a dedicated Management Place complete the node)
4.1 Place A Place is a context within an agent system¹ in which an agent is executed. This context can provide services and functions such as access to local resources. A Place is associated with a location, which consists of a place name and the address of the agent system within which the place resides. A Place can be used in different ways. Places are: • Dynamically assigned to agents as they enter the node. The criteria can vary, e.g. all agents coming from a specific user, or agents belonging to a specific policy scheme. A policy manager and a resource manager are assigned to the Place and are given the general security guidelines, which can never be bypassed. If an agent has sufficient credentials, it can fully interact with the components
¹ An agent system is a platform that can create, interpret, execute, transfer, and terminate agents. An agent system is uniquely identified by its name and address. One or more Places reside within an agent system.
e.g. change the Place's policy, ask for more resources, insert elements into the component database, etc.
• Statically assigned per entity (e.g. user, enterprise). Again, static resources are given to the Place and the local Resource Manager manages them. In this way it is possible for an enterprise to set up a network of Places in various nodes, creating a Place-Oriented Virtual Private Network (PO-VPN). This offers several advantages, e.g. secure communication between company-trusted places.
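The Place abstraction described above can be sketched as follows. All class, method, and field names here are illustrative sketches of the concept, not the actual ABSANI interfaces:

```java
import java.util.HashMap;
import java.util.Map;

public class Place {
    // A location consists of a place name plus the address of the hosting agent system.
    private final String placeName;
    private final String agentSystemAddress;
    // Resources statically or dynamically granted to this Place, e.g. "memoryMB" -> 64.
    private final Map<String, Integer> resources = new HashMap<>();

    public Place(String placeName, String agentSystemAddress) {
        this.placeName = placeName;
        this.agentSystemAddress = agentSystemAddress;
    }

    // The location identifies the place within the infrastructure.
    public String location() {
        return placeName + "@" + agentSystemAddress;
    }

    // Called by the Resource Manager when resources are assigned to the Place.
    public void grantResource(String name, int amount) {
        resources.merge(name, amount, Integer::sum);
    }

    public int available(String name) {
        return resources.getOrDefault(name, 0);
    }
}
```

A PO-VPN would then be a set of such Places, one per leased node, all managed under one enterprise policy.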
4.2 Policy DB The Policy Database is responsible for maintaining all policy schemes. By separating the Policy DB from the Enforcement Engine we introduce a dynamic way of modifying policy within the node. We use an already existing language to define the policies stored in the database. The security policy defines the access each piece of code has to resources. Signed code can run with different privileges based on the key it used. Thus users can tune their trade-off between security and functionality (of course within the allowed limits). We make use of the principle of least privilege. This principle states that only the minimally powerful authority should be used to authorize a request for access. Thus any mistake by a "powered" user will lead to the least possible damage. Following this thought, a principal with the authority to do many different things should be able to indicate which of those authorities should be used in a specific request. For example, an administrator wants to back up the node's databases. He holds two keys: the Supervisor_Key (allowed to do anything within the DB) and the Read_Key (allowed only to read the DB). He should use the second key to back up the DB. Thus, even if something goes wrong, no modification or damage can occur to the DB. Any attempt to describe the security policy in terms of each individual principal's authority to access each individual object is neither scalable nor understandable for those instituting the policy. It has therefore been proposed to group principals and objects into sets with common attributes, where the attributes, rather than the individual identities, are used in making security decisions. So we have role-based policy, group policy, clearance labels, domains, etc. We are also experimenting with the KeyNote Trust Management System [4] in order to realize flexible policies. In any case, policy files are human-readable and understandable.
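The least-privilege rule and the backup example above can be illustrated with a small sketch. The key names match the example; the action sets and the selection logic are assumptions for illustration:

```java
import java.util.Map;
import java.util.Set;

public class LeastPrivilege {
    // Each key authorizes a set of actions; a smaller set means less privilege.
    static final Map<String, Set<String>> KEYS = Map.of(
        "Supervisor_Key", Set.of("read", "write", "delete"),
        "Read_Key", Set.of("read"));

    // Pick the least powerful key that still covers the requested action,
    // so a mistake under that key causes the least possible damage.
    static String keyFor(String action) {
        return KEYS.entrySet().stream()
            .filter(e -> e.getValue().contains(action))
            .min((a, b) -> Integer.compare(a.getValue().size(), b.getValue().size()))
            .map(Map.Entry::getKey)
            .orElseThrow(() -> new SecurityException("no key authorizes " + action));
    }
}
```

Backing up (a read-only action) thus selects the Read_Key even though the administrator also holds the Supervisor_Key.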
4.3 Credential DB Credentials of principals, code, and components are stored in this database. A principal is an entity that can make a request for access that is subject to authorization. Security relies not only on the authentication of the entity but also on the
activities it wants to perform. The credentials combine a description of the identity of the principal with attributes associated with the principal and the actions it wants to perform, in order to decide whether it is granted what it asks for. Scenario: the principal may want to execute code that is not trusted (while the principal itself is trusted). At a hard node security level this should be denied. Therefore the Enforcement Engine checks a) whether the principal is trusted and allowed to perform the desired action, and b) whether the code it wants to execute is trusted. X.509v3 and SPKI certificates [5] are used as credentials in a heterogeneous environment, with a key used as the primary identification of a principal. The credentials include a hash of the content, the list of signers and their signatures, certificates, and other information associated with the specific action or agent. Credentials can be associated with various components such as agents, code, policies, etc. Credentials are used to: • Verify that the component was created/distributed/authenticated by the claiming principals. • Verify that the component hasn't been altered after it was signed. • Partially fulfil the non-repudiation need, so that the originator of the code can't deny it.
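A minimal sketch of the first two credential checks listed above (claimed signer and integrity after signing), using the standard `java.security.Signature` API. The class name and helper methods are illustrative; a real credential would also carry the certificates and associated attributes described above:

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

public class CredentialCheck {
    // Verify that 'component' was signed by the holder of 'signerKey'
    // and has not been altered since signing.
    static boolean verify(byte[] component, byte[] signature, PublicKey signerKey) {
        try {
            Signature sig = Signature.getInstance("SHA256withRSA");
            sig.initVerify(signerKey);
            sig.update(component); // the content hash is computed internally
            return sig.verify(signature);
        } catch (GeneralSecurityException e) {
            return false; // malformed credential: treat as unverified
        }
    }

    // Produce the signature a principal attaches to a component.
    static byte[] sign(byte[] component, PrivateKey key) {
        try {
            Signature sig = Signature.getInstance("SHA256withRSA");
            sig.initSign(key);
            sig.update(component);
            return sig.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static KeyPair newKeyPair() {
        try {
            return KeyPairGenerator.getInstance("RSA").generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Any post-signing modification of the component bytes makes `verify` return false, which is exactly the alteration check the Enforcement Engine needs.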
4.4 Component DB The Component Database can be considered a general database of active code, protocols, etc. It can also be used for caching agents' code, but its use extends far beyond simple caching. As we will demonstrate, it is an integral part of this architecture that strengthens the overall security. Security is by nature an overhead on communication and execution in order to protect the system. We accept that. Yet there are novel ways and techniques to minimize this overhead (under certain conditions) and fortify the security of the node.
The multiple re-visit by the same agent scenario: An agent performs multiple visits to the node. Each time, we verify the agent's credentials, put it within a specific policy framework, check it while it executes, and authorize every call it makes on other objects or resources it wants to use. Obviously, if this agent is a frequent visitor, it is wasteful to re-apply the same actions again and again. A caching scheme must be used. This caching can be done at different levels: we can cache the agent's code, the agent's credentials, or components that the agent needs, and we can monitor the agent's use of resources and associate it with a specific agent code. Then the next time the agent comes to the node, we have to verify neither its user nor its code. Also, since it has executed before, we know approximately what its behavior and needs are. Furthermore, we have its verified, checked, and authorized code stored in the Component DB. Thus we take from our Component DB the code of the agent (which we trust) and only the data of the newly arrived agent. In that way we avoid the repetition of authorizations, which are time-consuming. Of
course, this is a policy matter and can be changed, but the node should have the means to provide this flexibility, and for that we need the Component DB.
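The re-visit optimization can be sketched as a content-addressed store of already-verified code: a returning agent whose code hashes to a known entry skips the expensive verification path. The class and method names are illustrative:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

public class ComponentDB {
    // Verified agent code, indexed by the hash of its content.
    private final Map<String, byte[]> verifiedCode = new HashMap<>();

    private static String hash(byte[] code) {
        try {
            return Base64.getEncoder().encodeToString(
                MessageDigest.getInstance("SHA-256").digest(code));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    // Store code only after it has passed the full verification/authorization path.
    public void storeVerified(byte[] code) {
        verifiedCode.put(hash(code), code);
    }

    // On a re-visit: if the arriving code's hash matches a stored entry,
    // the node can execute the trusted stored copy with the newly arrived data.
    public boolean alreadyVerified(byte[] arrivingCode) {
        return verifiedCode.containsKey(hash(arrivingCode));
    }
}
```

Hashing the arriving bytes also means that any modified version of the code misses the cache and goes through full verification again.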
The common component usage scenario: As before, we have agents that visit our node. In this case the distinguishing characteristic is not that the agent's code is the same (as in the previous scenario), but that the agents make use of similar components. E.g. in order to execute an action, the agents need some special protocol or a special cryptographic module. We (the Node Manager or even the Place Manager) could provide such components in the Component DB, signed and tested in the specific environment. An agent can then call those components to perform its actions. As all components are signed, the agent can decide whether it is safe to use them. Such a DB serves multiple purposes: the agent can be lighter, as there is no need to carry everything it needs; the node security is reinforced, as the agent executes components that have been thoroughly tested by the Node Provider; and all actions are faster, as the overhead due to security actions is minimized.
4.5 Resource Manager A Resource Manager is available in order to handle the resources.
• Place Resource Manager: Handles the resources that are dedicated to a specific place. It can also be contacted directly by the agents that reside in the associated place, e.g. in case more resources are needed.
• Node Resource Manager: Handles the local node resources. It is contacted via the NodeSecurityManager or via the PlaceResourceManager (Fig. 6). It is also the gateway to the resources of other nodes. An interface is provided that specifies how this security architecture interacts with the Resource Manager.
Note that the resources available to a certain Place are transparent to the agent. That means that local resources could be extended via CORBA in order to access resources in other AN nodes. This supports the Place-Oriented Virtual Private Network (PO-VPN), as we will explain later.
4.6 Cache The Cache is another essential part of the architecture, included to improve performance. Security checks are time- and computation-consuming processes. In order not to duplicate security checks all the time, we use a cache. Caches exist in all Places and are accessible only via the Security Enforcer. Security checks performed by the Enforcement Engine are stored in the cache with a time limit. If the time limit expires, the security checks are performed again; otherwise the cached security check is considered valid and is used by the system.
Agent Based Security for the Active Network Infrastructure
339
The Policy DB can be dynamically updated via the Enforcement Engine at any time. Thus we face the problem that the cache may contain outdated information. We solve this problem by deleting, each time the policy for an entity changes, the cached security checks associated with this key/person, partially or completely. The next time a security check is requested, it will not be found in the cache and will be performed from the beginning. This method speeds up the performance of our architecture while keeping the cache consistent.
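The time-limited cache and its policy-change invalidation can be sketched as follows. The TTL handling, key format, and method names are assumptions for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class SecurityCheckCache {
    private final long ttlMillis;
    // Maps "principal:action" to the time the check was last performed.
    private final Map<String, Long> checkedAt = new HashMap<>();

    public SecurityCheckCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Record a security check performed by the Enforcement Engine.
    public void record(String principal, String action, long now) {
        checkedAt.put(principal + ":" + action, now);
    }

    // A cached check is valid only while its time limit has not expired.
    public boolean isValid(String principal, String action, long now) {
        Long t = checkedAt.get(principal + ":" + action);
        return t != null && now - t < ttlMillis;
    }

    // Called when the policy for an entity changes: drop all cached
    // checks associated with that principal, forcing re-evaluation.
    public void invalidate(String principal) {
        checkedAt.keySet().removeIf(k -> k.startsWith(principal + ":"));
    }
}
```

Explicit invalidation on policy change is what keeps the TTL shortcut from ever serving a decision made under an outdated policy.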
Fig. 6. Component communication view (a Place's Enforcement Engine and Resource Manager, with their Credential/Policy/Component DBs, cache, and audit, communicating with their node-level counterparts)
4.7 The Node Management Place A special dedicated Place, the Management Place, is responsible for changing the node's general behavior (policy, DBs, etc.). Agents that execute in this environment are "privileged" agents and are placed under the highest security controls. Since they are able to modify the node databases and its security scheme, extra care has to be taken. Generally this environment should be restricted to node administrators only. Normal users can change the behavior of Places assigned to them, but they are not able to contact or execute within the isolated and highly protected Management Place. Provisioning and configuration are done only via the Management Place.
4.8 Auditing Experience has shown that 100% security is difficult, if not impossible, to realize, due to the multiple factors that interfere. Collecting data generated by network activity provides a useful tool for analyzing the existing security and also for tracing back (if possible) the originators of a security breach. Audit data include any attempt to reach a different security level or to change entries in the system's databases. Intrusion attempts can also be detected via the audit: e.g. when we see repetitive failures in attempts to use a component or service, we can adapt our policy and behavior so as to prevent possible intrusions. The more detailed the audit process is, the better various activities can be debugged and protected from repeated errors or false configurations.
4.9 Enforcement Engine The Enforcement Engine is used to enforce the policy on the node and on the Places. An Enforcement Engine must satisfy three important rules:
• It is always invoked. The Enforcement Engine should not need to be called explicitly. Each action should be evaluated and allowed only if it complies with the policy.
• It is tamperproof. The information that the Enforcement Engine relies on should not be alterable in any way by unauthorized third entities. This calls for signed objects that no one can alter.
• It is verifiable. The Enforcement Engine relies on trusted, unchanged basic code in order to boot up. Then its abilities can be expanded.
The node administrator is able to use a GUI to edit the node Policy & Credential Databases prior to system run. Place administrators are able to alter their Policy & Credential DBs via an agent interface.
5 The Language Decision
One approach is to design a new language tailored to the needs of active networking and of our system. The difficulties would be: i) designing from scratch a new language with all the desired features (e.g. safety, performance); ii) if we fail to address all the features a user requires, it becomes impossible for that user to implement the mobile code he wants; iii) it would require a huge amount of work to keep the language up to date with all needs; iv) it would be used by a limited number of people (AN people only), and therefore bugs and errors would seldom, if ever, be reported. The other approach is to use an existing language. Java is a very popular language, designed especially for mobile code and with security in mind. Multiple research (and other) domains use this language; therefore bugs and errors are found and reported fast. The language is a commercial product and advances as, day by day,
new features and libraries are added. The basic security concepts in Java are based on the following components: the language design, the bytecode verifier, the class loader, and the security manager. In the following, each part is presented and investigated with respect to how the concept can support a security model for the agent platform. First of all, Java is a safe language. That means there are several mechanisms inherent to Java providing protection against incorrect programs, notably: a strictly typed language, careful control of casts, lack of pointer arithmetic, automatic memory management including garbage collection to avoid memory leaks and dangling pointers, and checks of array references to ensure that they are within the bounds of the array. Even though a compiler performs thorough type checking, there is still the possibility of an attack via the use of a "hostile" compiler. Since the agency does not load the source code of an agent but already compiled code in the form of class files, there is no way of determining whether the bytecodes were produced by a trustworthy compiler or by an adversary attempting to exploit the agency. Therefore a class verifier is called for. The class verifier [6][7] of Java is used to check every class that is loaded into the Java virtual machine over the network. Before any loaded code is executed, the class is scanned and verified to ensure that it conforms to the specification of the Java virtual machine. The class verifier operates in four passes. The first pass checks that the class file conforms to the class file format. The second pass performs all verification that can be done without looking at the bytecode; this includes, for example, a check of whether final classes or methods are subclassed or overridden, respectively.
The third pass is a data-flow analysis on each method, ensuring that there will be no stack over- or underflow, that registers always have a value when being accessed, that methods are called with appropriate arguments, that types are used correctly, and that the opcodes have appropriately typed arguments on the stack and in the registers. This is also referred to as the bytecode verifier. The fourth pass is done at run-time. It ensures, for example, that a method exists when being called, i.e. it guarantees that the symbolic references work. The class loader [8] first checks the local codebase of an agency. If a class is available locally, it is not loaded over the network but from the local codebase. This prevents the system classes with access control checks from being replaced. In addition, the class loader sets the protection domain. The security manager [9] is contacted whenever sensitive system resources, such as the file system or the network, are accessed. A check method is called in order to determine whether the calling entity has the required access permissions. To distinguish between the access of foreign classes and the access of system classes, the call stack is analyzed. The class loader of each call on the stack is determined, and the effective permissions are the intersection of the permissions of each protection domain contained in the class loaders.
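The stack-inspection rule just described, that the effective permission set is the intersection of the permission sets of every protection domain on the call stack, can be modeled abstractly. This is not the real JVM SecurityManager API, only an illustrative sketch of the intersection rule:

```java
import java.util.List;
import java.util.Set;

public class StackCheck {
    // Each frame carries the permission set of its class's protection domain.
    // An access is allowed only if every domain on the stack grants it,
    // i.e. the permission lies in the intersection over the stack.
    static boolean checkPermission(List<Set<String>> callStackDomains, String permission) {
        return callStackDomains.stream().allMatch(d -> d.contains(permission));
    }
}
```

The effect is that untrusted agent code on the stack restricts what any system class it calls may do on its behalf.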
6 Design Goals Fulfilled
This security architecture has been designed with the following guidelines in mind:
• Simplicity. The model is as simple as possible to understand and administer. The simpler the whole architecture is, the better it functions and evolves.
• Scalability. Our security architecture can be applied from small systems with few agents and nodes up to large intra- and inter-enterprise ones. To ensure this, we: i) have flexible and advanced policy and access controls (role-based security, etc.); ii) support various domains that enforce different policies; iii) manage the distribution of data and cryptographic keys across the network without human intervention.
• Flexibility. This is probably the most significant driving force in the design of this security architecture. Flexibility is maximized for end-users and administrators, but not at the cost of safety or security. The choice of access control policy, the choice of audit policy, and security functionality profiles are some examples.
• Interoperability. The architecture uses CORBA for interoperability reasons. CORBA guarantees consistent security schemes among heterogeneous systems where different ORBs are deployed by various vendors. We raise this security to a higher level so that the agent world is able to use these advantages. We also use Grasshopper [10], a MASIF [11] compliant agent platform.
• Performance. The trade-off between performance and security is always a controversial issue within the research community. Security is by its nature overhead. However, different users have different needs; we can't simply provide a homogeneous security facility. Security should be user- or even task-specific. The super-user who enforces a specific security scheme (in a Place or a node) should also be able to decide on the trade-off between security and performance (of course within some limits).
For better performance we have included caches within each execution place, as well as Component DBs where code can reside. We hope that the performance of Java will also improve in the future.
• Object orientation. Interfaces are purely object oriented. We think that in this way the system's integrity is promoted and the complexity of the security mechanisms is hidden under simple interfaces. Those interfaces could be changed or enhanced at any time without an impact on the architecture or on the way its users use it. This approach also offers survivability, as well as the ability to advance and adapt to future needs.
• Access control. Access control aims at preventing agents from accessing unauthorized resources. In our security architecture, calls to resources are intercepted. Then the Security Manager is called in order to decide whether the call complies with the
Policy. If so, the action is allowed; otherwise it is denied and an error code is returned. Primarily, the following goals must be satisfied: Safety, meaning that a safe system limits the possibility that an agent writes into another agent's namespace, thereby bringing it into an unstable, false, or unintended state; and Privacy, meaning that agents should not be able to access the address space of another agent and read its data.
• Safety. The use of a safe language such as Java provides some guarantees concerning safety.
• Conditional access. Most traditional operating systems simply deny or allow access. Via our security architecture we are able to allow conditional resource access. E.g. an agent can request more memory in order to execute additional tasks. The PlaceResourceManager contacts the NodeResourceManager and requests, for example, more memory. The Enforcement Engine checks whether this request complies with the current policy; if so, more memory is dynamically assigned to the Place for a certain time.
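The conditional-access example above can be sketched as follows. The policy limit, the return convention, and all names are assumptions for illustration; the time-limited nature of the grant is noted but not modeled:

```java
public class ConditionalAccess {
    // Assumed policy: a Place may hold at most this much extra memory.
    static final int MAX_GRANT_MB = 64;

    // The PlaceResourceManager forwards an agent's request to the node;
    // the Enforcement Engine checks it against policy.
    // Returns the granted amount, or 0 if the request violates policy.
    static int requestMemory(int requestedMB, int alreadyGrantedMB) {
        if (alreadyGrantedMB + requestedMB > MAX_GRANT_MB) {
            return 0; // denied: would exceed the policy limit for this Place
        }
        return requestedMB; // granted (for a limited time, not modeled here)
    }
}
```

Unlike a plain allow/deny decision, the outcome here depends on how much the Place has already been granted, which is the essence of conditional access.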
7 Place-Oriented Virtual Private Networks – An Application Scenario
An application scenario is introduced here in order to show the flexibility and advantages ABSANI offers. We introduce the concept of Place-Oriented Virtual Private Networks (PO-VPNs). VPNs offer enterprises the opportunity to construct their own network and administer it in the way that suits their needs. The ABSANI architecture has been designed with this goal in mind as well: the offering of Places which can be leased to third-party entities and managed by them. This of course assumes partitioning or multiplexing of the available resources. An enterprise can obtain Places in strategically located active nodes for its needs, thus constructing a PO-VPN. As it can manage all policies and resources of the assigned Places, it has complete control (up to the limits set by the Node Operator) over the PO-VPN. In effect this looks like a distributed agency spread over various nodes. A possible scenario is that an enterprise wants to create a PO-VPN. As various providers would offer services at various prices, it could be a benefit to choose among the best offers not as a whole (as a packaged offer) but partially (specific service selection). What exactly do we mean by that? Suppose one provider offers high-speed processing power (fast CPUs) but limited storage capacity on its node, while a second one offers a better price on a huge amount of storage but low processing power. The user should be able to combine both: e.g. execute the code on the fast node and have the results stored on the slow node with the extended storage. That is of course difficult to realize and implement at the moment, but we would like to leave this possibility open, as it could be a future evolutionary step. In any case, such scenarios are supported by the security architecture we have presented here.
8 Summary and Conclusions
An agent-based security architecture has been presented. ABSANI uses mobile agent technology and the benefits that derive from it in order to apply them to the Active Network domain. With an agent-based active node architecture, it has been demonstrated how agents can empower current passive routers and transform them into active nodes. We have placed the security architecture within these AN nodes. We have shown that goals such as simplicity, scalability, flexibility, interoperability, performance, and safety have been addressed successfully. The agent community has invested a lot of effort in trying to make mobile code secure and flexible, and the Active Networks' objectives can be achieved via our approach. This approach provides a dynamic, extensible, configurable, and interoperable way to secure Active Networks. Also, with the use of Java we can guarantee a high level of safety. Furthermore, by combining approaches (in a Lego-like way) we enhance not only the interoperability of our architecture but also its lifetime. ABSANI offers a security scheme that deals successfully with the current needs of secure active networking, and it will continue its fast evolution as long as agent technology keeps advancing.
References
1. C. M. Adam, J.-F. Huard, A. A. Lazar, K.-S. Lim, M. Nandikesan, E. Shim, "Proposal for Standardization of ATM Binding Interface Base 2.1", submitted to P1520, January 1999.
2. Mobile Agent System Interoperability Facility. URL: http://www.fokus.gmd.de/research/cc/ima/masif/
3. OMG Web Site. URL: http://www.omg.org/
4. The KeyNote Trust-Management System. URL: http://www.cis.upenn.edu/~angelos/keynote.html
5. Simple Public Key Infrastructure. URL: http://www.ietf.org/html.charters/spki-charter.html
6. F. Yellin, "Low Level Security in Java", 1997 (formerly at www.javasoft.com, since deleted).
7. B. Venners, "Security and the Class Verifier", JavaWorld, October 1997. URL: http://www.javaworld.com/javaworld/jw-10-1997/
8. B. Venners, "Security and the Class Loader Architecture", JavaWorld, September 1997. URL: http://www.javaworld.com/javaworld/jw-09-1997/
9. B. Venners, "Java Security: How to Install the Security Manager and Customize your Security Policy", JavaWorld, November 1997. URL: http://www.javaworld.com/javaworld/jw-11-1997/
10. IKV++ GmbH - Grasshopper. URL: http://www.ikv.de/products/grasshopper