Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board

David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
6155
Burkhard Stiller Filip De Turck (Eds.)
Mechanisms for Autonomous Management of Networks and Services 4th International Conference on Autonomous Infrastructure, Management and Security, AIMS 2010 Zurich, Switzerland, June 23-25, 2010 Proceedings
Volume Editors

Burkhard Stiller
University of Zurich, UZH, Communication Systems Group, CSG
Department of Informatics, IFI
Binzmühlestrasse 14, 8050 Zurich, Switzerland
E-mail: [email protected]

Filip De Turck
Ghent University, IBBT, Department of Information Technology
Gaston Crommenlaan 8, 9050 Gent, Belgium
E-mail: [email protected]
Library of Congress Control Number: Applied for
CR Subject Classification (1998): C.2, D.2, H.4, C.2.4, D.4, D.1.3
LNCS Sublibrary: SL 5 – Computer Communication Networks and Telecommunications
ISSN 0302-9743
ISBN-10 3-642-13985-X Springer Berlin Heidelberg New York
ISBN-13 978-3-642-13985-7 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © IFIP International Federation for Information Processing 2010 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper 06/3180
Preface
The International Conference on Autonomous Infrastructure, Management and Security (AIMS 2010) was a single-track event integrating regular conference paper sessions, tutorials, keynotes, and a PhD student workshop into a highly interactive event. The main goal of AIMS is to look beyond borders and to stimulate the exchange of ideas across different communities and among PhD students. AIMS 2010 was collocated with the International Summer School in Network and Service Management (ISSNSM 2010). This unique summer school offers hands-on learning experiences in network and service management topics, requiring attendees to work in practical on-site courses combined with preceding short tutorial-like teaching sessions.

AIMS 2010, which took place during June 23–25, 2010, in Zürich, Switzerland, and was hosted by the Communication Systems Group CSG, Department of Informatics IFI, of the University of Zürich UZH, followed, as the fourth conference, the already established tradition of an unusually vivid and interactive conference series, after successful instantiations in Oslo, Norway (2007), Bremen, Germany (2008), and Enschede, The Netherlands (2009). AIMS 2010 focused especially on autonomous management aspects of modern networks and their services. Mechanisms, peer-to-peer-based schemes, scalability aspects, and autonomous approaches are of major interest. In particular, the design, monitoring, management, and protection of networked systems in an efficient, secure, and autonomic manner are key to commercially viable and successful networks and services.

Like all of its predecessors, AIMS 2010 and ISSNSM 2010 provided a single-track, five-day program, which proved especially suitable for stimulating the interaction with and the active participation of the conference's audience. Summarized briefly, the summer school started with a practical introduction to 6LoWPAN, covering the programming of IPv6-based Wireless Sensor Networks with Contiki. Secondly, embedded automation systems and device manageability instrumentation were experimented with. And thirdly, traffic mining for packets in an IP network was investigated. The AIMS conference continued with a keynote presentation delivered by Metin Feridun, internationally recognized as a distinguished research fellow in the area.

The three technical sessions of AIMS 2010, covering peer-to-peer, autonomous management, and management mechanisms, included a total of nine full papers, which were selected through a thorough reviewing process out of a total of 27 submissions. Furthermore, five short papers complemented the program on interesting management aspects.

Additionally, the AIMS PhD workshop was a venue for doctoral students to present and discuss their research ideas, as well as, and most importantly, to obtain feedback from the AIMS audience about the investigations carried out so far. This year, the PhD workshop was organized in two PhD paper sessions, where selected investigations were presented and discussed. All PhD papers included in this volume describe the current state of these investigations, including their research problem statements,
investigation approaches, and the results achieved so far. This year, 22 PhD papers were submitted to the PhD workshop. Each of them was assigned for review to members of the PhD Workshop's Technical Program Committee, composed of experienced researchers in the field of network and services operations and management, resulting in 11 high-quality papers included in these proceedings and presented during the PhD workshop.

The present volume of the Lecture Notes in Computer Science series includes those papers presented at AIMS 2010, and the overall resulting final program of AIMS 2010 and ISSNSM 2010 demonstrates again the European scope of this conference series, while including papers mainly from Europe.

Many thanks go to the two PhD Workshop Co-chairs, Lisandro Granville, Federal University of Rio Grande do Sul, Brazil, and Aiko Pras, University of Twente, The Netherlands, for organizing the PhD workshop sessions, and to the two ISSNSM 2010 Co-chairs, Cristian Morariu, University of Zürich, Switzerland, and Martin Waldburger, University of Zürich, Switzerland, for setting up and organizing the summer school topics and test-bed hardware. Special thanks go to the local organizers for enabling the logistics and hosting of the AIMS 2010 and ISSNSM 2010 events, especially to the local Co-chairs Hasan and Evelyne Berger, the Web master Andrei Vancea, and the additional local support team of Peter Racz, Fabio Hecht, and Guilherme Machado, all from CSG@IFI, University of Zürich, Switzerland.

Finally, the editors would like to address their thanks to Springer, and in particular Anna Kramer, for a smooth cooperation on finalizing these proceedings. Additionally, special thanks go to the European FP6 NoE "EMANICS" (No. 26854) and the FP7 project "SmoothIT" (No. 216259) for their support.
April 2010
Burkhard Stiller Filip De Turck
Organization
General Chair

Burkhard Stiller, University of Zürich, Switzerland
Program TPC Co-chairs AIMS 2010

Filip De Turck, Ghent University IBBT, Belgium
Burkhard Stiller, University of Zürich, Switzerland
PhD Student Workshop Co-chairs

Lisandro Granville, Federal University of Rio Grande do Sul, Brazil
Aiko Pras, University of Twente, The Netherlands
Summer School Co-chairs

Cristian Morariu, University of Zürich, Switzerland
Martin Waldburger, University of Zürich, Switzerland
Steering Committee

Mark Burgess, HIO, Norway
Jürgen Schönwälder, Jacobs University Bremen, Germany
Aiko Pras, University of Twente, The Netherlands
Olivier Festor, INRIA Nancy-Grand Est, France
David Hausheer, University of Berkeley, USA
Rolf Stadler, KTH Royal Institute of Technology, Sweden
Technical Program Committee AIMS 2010

Claudio Bartolini, HP Labs, USA
Torsten Braun, University of Bern, Switzerland
Isabelle Chrisment, LORIA University of Nancy, France
Alva Couch, Tufts University, USA
Hermann De Meer, University of Passau, Germany
Gabi Dreo Rodosek, University of Federal Armed Forces Munich, Germany
Thomas Dübendorfer, Google, Switzerland
Metin Feridun, IBM Research, Switzerland
Olivier Festor, INRIA Nancy-Grand Est, France
Lisandro Granville, Federal University of Rio Grande do Sul, Brazil
David Hausheer, University of Berkeley, USA
Georgios Karagiannis, University of Twente, The Netherlands
Alexander Keller, IBM Global Technology Services
Antonio Liotta, Eindhoven University of Technology, The Netherlands
Emil Lupu, Imperial College, UK
Hanan Lutfiyya, University of Western Ontario, Canada
Jean-Philippe Martin-Flatin, NetExpert, Switzerland
Jiri Novotny, Masaryk University, Czech Republic
Aiko Pras, University of Twente, The Netherlands
Bruno Quoitin, University of Mons, Belgium
Ramin Sadre, University of Twente, The Netherlands
Jürgen Schönwälder, Jacobs University Bremen, Germany
Joan Serrat, Universitat Politecnica de Catalunya, Spain
Rolf Stadler, KTH Royal Institute of Technology, Sweden
Radu State, University of Luxemburg, Luxemburg
John Strassner, Postech, Korea
Robert Szabo, Budapest University of Technology and Economics, Hungary
Kurt Tutschku, University of Vienna, Austria
Tanja Zseby, Fraunhofer FOKUS, Germany
PhD Student Workshop Committee

Olivier Festor, INRIA Nancy-Grand Est, France
Lisandro Granville, Federal University of Rio Grande do Sul, Brazil
David Hausheer, University of Berkeley, USA
Emil Lupu, Imperial College, UK
Hanan Lutfiyya, University of Western Ontario, Canada
George Pavlou, University College London, UK
Aiko Pras, University of Twente, The Netherlands
Gabi Dreo Rodosek, University of Federal Armed Forces Munich, Germany
Ramin Sadre, University of Twente, The Netherlands
Joan Serrat, Universitat Politecnica de Catalunya, Spain
Burkhard Stiller, University of Zürich, Switzerland
Reviewers

A set of very detailed and constructive reviews for papers submitted to AIMS 2010 was provided by all of our reviewers, who correspond to the Technical Program Committee members as well as the PhD Workshop Committee as stated above, and additionally Stefan Beauport, Archi Delphinanto, George Exarchakos, Jeroen Famaey, Tiago Fioreze, Alberto Gonzalez, Fabio Victora Hecht, Steven Latré, Nikolay Melnikov, Vlado Menkovski, Cristian Morariu, Anuj Sehgal, Anna Sperotto, Iyad Tumar, and Martin Waldburger. Therefore, it is a great pleasure for the Technical Program Co-chairs and the PhD Student Workshop Co-chairs to thank all those reviewers for their important and valuable work.
Table of Contents

Keynote

Facing Complexity in Systems Management
Metin Feridun ... 1

P2P-Based Systems

Modeling User Behavior in P2P Live Video Streaming Systems through a Bayesian Network
Ihsan Ullah, Grégory Bonnet, Guillaume Doyen, and Dominique Gaïti ... 2

OMAN – A Management Architecture for P2P Service Overlay Networks
Adriano Fiorese, Paulo Simões, and Fernando Boavida ... 14

Towards a P2P-Based Deployment of Network Management Information
Rafik Makhloufi, Grégory Bonnet, Guillaume Doyen, and Dominique Gaïti ... 26

Autonomous Management

On the Combined Behavior of Autonomous Resource Management Agents
Siri Fagernes and Alva L. Couch ... 38

Autonomous Resource-Aware Scheduling of Large-Scale Media Workflows
Stein Desmet, Bruno Volckaert, and Filip De Turck ... 50

An Autonomic Testing Framework for IPv6 Configuration Protocols
Sheila Becker, Humberto Abdelnur, Radu State, and Thomas Engel ... 65

PhD Workshop: Overlays and Non-conventional Network Infrastructures

Researching Multipath TCP Adoption
Henna Warma and Heikki Hämmäinen ... 77

Report- and Reciprocity-Based Incentive Mechanisms for Live and On-Demand P2P Video Streaming
Fabio Victora Hecht and Burkhard Stiller ... 81

Model-Driven Service Level Management
Anacleto Correia and Fernando Brito e Abreu ... 85

Managing Risks at Runtime in VoIP Networks and Services
Oussema Dabbebi, Remi Badonnel, and Olivier Festor ... 89

Towards Dynamic and Adaptive Resource Management for Emerging Networks
Daphné Tuncer, Marinos Charalambides, and George Pavlou ... 93

Adaptive Underwater Acoustic Communications
Anuj Sehgal and Jürgen Schönwälder ... 98

Short Papers

Probabilistic Fault Diagnosis in the MAGNETO Autonomic Control Loop
Pablo Arozarena, Raquel Toribio, Jesse Kielthy, Kevin Quinn, and Martin Zach ... 102

Modelling Cloud Computing Infrastructure
Marianne Hickey and Maher Rahmouni ... 106

Towards an Autonomic Network Architecture for Self-healing in Telecommunications Networks
Jingxian Lu, Christophe Dousson, Benoit Radier, and Francine Krief ... 110

LearnIT: Enhanced Search and Visualization of IT Projects
Maher Rahmouni, Marianne Hickey, and Claudio Bartolini ... 114

Strategies for Network Resilience: Capitalising on Policies
Paul Smith, Alberto Schaeffer-Filho, Azman Ali, Marcus Schöller, Nizar Kheir, Andreas Mauthe, and David Hutchison ... 118

Management Mechanisms

Automatic Link Numbering and Source Routed Multicast
Visa Holopainen, Raimo Kantola, Taneli Taira, and Olli-Pekka Lamminen ... 123

Mining NetFlow Records for Critical Network Activities
Shaonan Wang, Radu State, Mohamed Ourdane, and Thomas Engel ... 135

Implementation of a Stream-Based IP Flow Record Query Language
Kaloyan Kanev, Nikolay Melnikov, and Jürgen Schönwälder ... 147

PhD Workshop: Security, Network Monitoring, and Analysis

Towards Flexible and Secure Distributed Aggregation
Kristján Valur Jónsson and Mads F. Dam ... 159

Intrusion Detection in SCADA Networks
Rafael Ramos Regis Barbosa and Aiko Pras ... 163

Cybermetrics: User Identification through Network Flow Analysis
Nikolay Melnikov and Jürgen Schönwälder ... 167

Distributed Architecture for Real-time Traffic Analysis
Cristian Morariu and Burkhard Stiller ... 171

Scalable Service Performance Monitoring
Idilio Drago and Aiko Pras ... 175

Author Index ... 179
Facing Complexity in Systems Management

Metin Feridun
IBM Research – Zurich
Säumerstrasse 4, CH-8803 Rüschlikon, Switzerland
[email protected]
Abstract. Emerging technologies such as cloud computing, the increased use, proliferation, and mobility of powerful end-user devices, and the migration of applications to the Internet are creating new and exciting challenges for the management of IT (Information Technology) infrastructures. Massive scale and distribution of IT resources, the expectation of high availability of Internet-based services, and heterogeneity of resources are prominent examples of the complexity system administrators encounter today. New approaches are needed for these new (and in many cases old) problems. This keynote will survey some of these systems management issues and the emerging approaches to solving them.
Modeling User Behavior in P2P Live Video Streaming Systems through a Bayesian Network

Ihsan Ullah, Grégory Bonnet, Guillaume Doyen, and Dominique Gaïti

ERA/Institut Charles Delaunay – UMR 6279
Université de Technologie de Troyes
12 rue Marie Curie – 10000 TROYES – France
{ihsan.ullah,bonnet,doyen,gaiti}@utt.fr
Abstract. Live video streaming over a Peer-to-Peer (P2P) architecture is promising due to its scalability and ease of deployment. Nevertheless, P2P-based video streaming systems still face some challenges regarding their performance. These systems are in fact overlays of users who control peers. As peers depend upon each other for receiving the video stream, the user behavior has an impact on the performance of the system. We collect the user behavior studies on live video streaming systems and identify the impact of different user activities on the performance. Based on this information, we propose a Bayesian network that initially models a generic user behavior and then adapts itself to individuals through learning from observations. We validate our model through simulations.

Keywords: P2P, IPTV, user behavior, Bayesian networks.
1 Introduction
P2P-based live video streaming has become popular in recent years. Unlike IP multicast, it does not require any major change in the current network infrastructure. Moreover, it reduces the need for deploying streaming servers as compared to the Client/Server (C/S) architecture, which requires an increase in the number of servers as the number of users grows. The P2P approach allows cooperating end-hosts (called peers or nodes) to self-organize into an overlay network. Peers in these networks share their computing and upload resources by caching and relaying the video content to each other. Currently, several P2P video streaming systems have been deployed on the Internet and they have attracted a large number of users. Nevertheless, these systems still suffer from some performance problems such as startup and playback delays [1,2]. P2P systems are in fact networks of users who control peers. Since peers depend upon each other, activities of users have a direct impact on the performance of these systems. For example, the departure of a peer disrupts the stream availability to dependent peers. Similarly, the low-quality stream received by one peer is forwarded to its descendants. Therefore, due to the importance of the user behavior in P2P streaming systems, several intensive measurement studies
have been performed over them. Based on the measurements, some approaches have been proposed for performance improvements. But on one hand, these approaches consider some aspects of user behavior while ignoring others. On the other hand, measurement studies provide insights into an average user behavior that can be used for deducing a generalized model. However, a global model is not suitable for users having different preferences and interests. Therefore, an individually adaptable model is required. In this paper, we collect the information provided in measurement studies. We identify metrics of a user behavior and their impact on one another. With the help of this information, we model the user behavior through a Bayesian network. Our network encodes the available knowledge about the user behavior and can learn an individual user behavior as new observations become available to it. This network can potentially be used by video streaming service providers, network operators, and peers in P2P IPTV systems. It will help in carrying out decisions that improve performance. The remainder of the paper is organized as follows. Section 2 summarizes the related work. In Section 3, we analyze the measurement studies on video streaming systems to identify the user behavior metrics, their impacting factors, and the relationships among them. In Section 4, we propose a Bayesian network for modeling a user behavior and describe the results of our preliminary experiments. Finally, Section 5 draws conclusions and gives directions for future work.
2 Related Work
Work on the user behavior in P2P live video streaming systems is mostly in the form of measurement studies. Moreover, some measurements have been extended with propositions for performance improvement. Tang et al. [3,4] first study a C/S and a P2P video streaming system and observe that a user who has spent more time watching a channel is statistically willing to stay longer. Based on this observation, they propose an approach that enables a joining peer to choose a provider peer that has been present for a longer time. Wang et al. [5], after justifying the importance of stable peers, propose a mechanism for their identification, based on their elapsed time in the current session. The identified stable peers are put in the backbone in order to minimize the impact of churn. Nevertheless, both of the above-mentioned approaches consider the stability of a peer, which is only one metric of user behavior. Moreover, for the estimation of stability they take into account only one impacting factor (the session's elapsed time) while ignoring others such as streaming quality and channel popularity. Liu et al. [6] first measure a P2P IPTV system and identify the impacting factors of users' session durations and bandwidth contribution ratios. They propose a mechanism for predicting the longevity of peers and their bandwidth contribution ratios. Since their measurement analyzes all users together, their model expects all users to behave in a similar way, which is not the case. Horovitz et al. [7,8] propose a machine learning approach based on Support Vector Machines (SVM) for actively detecting the load in the uplink of provider
peers. This approach considers only one metric of the user behavior (i.e., upload bandwidth). A shortcoming of the above approaches is that they either do not consider all metrics of the user behavior and/or use static models which expect all users to have a similar behavior. Concerning our adopted approach, Bayesian networks have also been used in other domains for anticipating user behavior. For example, Wenzhi et al. [9] use Bayesian networks for predicting the future behavior of a mobile user in an intelligent Location-Based Services (LBS) publish/subscribe system. Laskey et al. [10] present an approach based on Bayesian networks to detect threatening user behavior in an information system.
3 Analysis of User Activities in Live Video Streaming
Since a user is not interested in the underlying mechanism of video streaming, we collect the information provided in studies of P2P, C/S, and telco-managed IPTV systems to understand a user behavior. These studies measure different metrics of user behavior and provide valuable insights. In this section, we present the metrics of a user behavior and their impact on one another according to the observations given in the measurement studies.

3.1 User Behavior Metrics and Their Impact on Each Other
After a synthesis of the measurement studies, we collect the components and impacting factors of the user behavior that are observed in these studies. They are time-of-day, user arrival/departure rates, session durations, user population, surfing (activity of channel browsing) probability, bandwidth contribution ratios, channel type, channel popularity, elapsed time, streaming quality (in terms of the buffer level), failure rate (the departure of a peer before starting the playback), and playback delay (the lag between the time of a packet's generation at the source and its playback time at peers [11]). Based on the observations given in the measurements, we discuss them altogether and identify causal relationships among them.

Time-of-day. High arrival/departure rates are critical in determining the scalability of the system. In case of high arrival rates, the newly joined peers must find provider peers that can immediately supply the video stream to them. Similarly, higher departure rates can disrupt the stream availability to dependent peers. Jia et al. [12] observe that peers join and leave in groups, indicating the start and end of a TV program. Hei et al. [1] observe high joining and leaving rates at peak times. Moreover, they find higher departure rates at the end of a program. Similarly, a study of a telco-managed IPTV system [13] finds that STB (Set-Top-Box) turn-on and turn-off events occur in larger numbers during certain times of the day. All these observations reveal an impact of time-of-day on user arrival/departure rates.
The session duration of a peer determines its stability, which is important for the continuity of the stream to consumer peers. Liu et al. [14] explore the time-of-day effect on session durations and observe that peers joining in peak hours stay longer watching the same channel. Similarly, peers have longer session durations within evening leisure times [6]. On the other side, Veloso et al. [15] argue that time-of-day has no impact on the session duration while day-of-week has an impact, which is in contrast to the results given by Liu et al. [6]. One reason may be the difference in the types of applications they study, since Liu et al. [6] analyze a P2P live video streaming system while Veloso et al. [15] study a C/S system which streams both live audio and video content.

Concerning user population, measurement studies [1,13,6,15,16,17,18,19] consistently show two peaks during a day: one around noon and another during the early night. This clearly indicates an impact of time-of-day on the user population. Finally, according to Cha et al. [19] and Qiu et al. [13], the surfing probability is also impacted by the time-of-day. They observe a sharp increase in surfing probability after specific time periods because of commercials or the end of certain programs.

Channel type. Channel type impacts surfing probability and departure rate. Cha et al. [19] observe that surfing probability changes with the type of the channel. For sports and news channels it is higher than for other types. Moreover, Hei et al. [1] observe that users watching a movie channel exhibit batch-departure behavior. However, they do not observe the same behavior for users watching other TV programs.

Channel popularity. Channel popularity impacts session duration, user population, availability of neighbor peers, and surfing probability. We discuss them one by one. Analysis studies [1,14,6] report longer session durations for popular channels and shorter ones for unpopular channels. Concerning user population, Hei et al. [1] find that popular channels get more users than unpopular ones. Moreover, while monitoring a peer, they observe that it faced difficulty in finding partner peers while watching a less popular TV channel. Since unpopular channels get fewer users, finding partner peers becomes difficult. Regarding surfing probability, Cha et al. [19] observe that users tend to remain connected when they join a popular channel, hence reducing the surfing probability.

Streaming quality. Streaming quality impacts session duration and bandwidth contribution ratios. Liu et al. [6,14] find that a peer receiving a good buffer level initially tends to stay longer in that session. Similarly, a peer initially receiving a good quality (buffer level) produces better bandwidth ratios [6] through contributing more upload bandwidth.

Elapsed time. Elapsed time impacts the session duration. Studies [3,19] show that, statistically, the time spent watching a channel is positively correlated to the remaining online time on that channel.

Arrival/departure rates. Arrival/departure rates impact streaming quality and failure rate (the departure of a peer before it starts playback). According to
Liu et al. [6], the streaming quality is impacted by flash crowds and degrades at peak hours of the day. Since streaming quality is measured in terms of the buffer level, its degradation means that peers will face a longer startup delay, leading to a high failure rate, as observed by Li et al. [17]. The reasons stated by the authors are the random partnership-making algorithms and a high percentage of peers behind NATs and/or firewalls. Since random partnerships do not prefer one potential parent over another, during high arrival rates a peer can choose a parent peer that has itself joined the system recently and is unable to relay the content to its child peers. Similarly, high departure rates impact the stream continuity to child peers.

User population. User population impacts playback delay and bandwidth contribution ratios. Jia et al. [12] observe that the average delay is correlated with the number of online peers. It remains lower with a smaller number of online peers and vice versa. Li et al. [20] observe a strong correlation between the average bandwidth contribution ratios and the size of the system. It may be due to the fact that in a larger community, peers get more chances to contribute than in a smaller one.
[Fig. 1. An abstract causal graph of user activities in live video streaming systems. The graph groups the metrics into Time (time-of-day, day-of-week), Content (channel popularity, channel type), Population (arrival rate, departure rate, user population, neighbor peers), QoE (delay, streaming quality, bandwidth contribution ratio), and Stability (session duration, surfing probability, failure rate, elapsed time); arrows between groups denote impact.]
To provide an abstract view of the findings given in the measurement studies, we combine the related metrics into groups and depict the resulting abstract graph in Figure 1. An arrow from one group to another shows the impact of the first group on the second.
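For reference, the causal edges enumerated in Section 3.1 can also be written down compactly as an adjacency structure. The following Python sketch is ours and purely illustrative (the paper contains no code); it encodes only the relationships explicitly stated in the text, with metric names as our own shorthand.

```python
# Causal edges synthesized in Section 3.1: each key impacts the listed metrics.
# Illustrative encoding only; names are shorthand for the metrics in the text.
CAUSAL_EDGES = {
    "time_of_day": ["arrival_rate", "departure_rate", "session_duration",
                    "user_population", "surfing_probability"],
    "channel_type": ["surfing_probability", "departure_rate"],
    "channel_popularity": ["session_duration", "user_population",
                           "neighbor_peers", "surfing_probability"],
    "streaming_quality": ["session_duration", "bandwidth_contribution_ratio"],
    "elapsed_time": ["session_duration"],
    "arrival_rate": ["streaming_quality", "failure_rate"],
    "departure_rate": ["streaming_quality", "failure_rate"],
    "user_population": ["playback_delay", "bandwidth_contribution_ratio"],
}

def parents(metric):
    """Return the direct causes of a metric, i.e., its parents in the graph."""
    return [cause for cause, effects in CAUSAL_EDGES.items()
            if metric in effects]

# Parents of session duration, per the edges encoded above:
print(parents("session_duration"))
# ['time_of_day', 'channel_popularity', 'streaming_quality', 'elapsed_time']
```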
4 Modeling User Activities with a Bayesian Network
To model the user behavior in a P2P live video streaming system, we use Bayesian networks, also called belief networks.
A Bayesian network [21] is a pair (G, P), where G = (V, E) is a directed acyclic graph of vertices V and edges E. Vertices/nodes represent random variables and directed edges show direct dependencies between variables. Let U = {X_1, ..., X_n} be a set of random variables; then P is a set of conditional probability functions p(X_i | parents(X_i)). A directed link in G from X_i to X_j means that X_i is the parent of X_j and X_j is the child of X_i. Child nodes are directly dependent on parent nodes. Variables having no parent are independent. In a Bayesian network, each variable is conditionally independent of its non-descendants given its parents. A variable can have several states, each of them with a probability value. Each node has a Conditional Probability Table (CPT) that relates it to its parent nodes. A CPT contains probability values for all combinations of the states of a variable and the states of its parents. Initial probabilities are called prior probabilities, which are deduced from data or provided by an expert. With the availability of new observations, this prior knowledge can be updated to posterior knowledge. The set P defines a joint probability distribution over U as shown in (1):

P(U) = P(X_1, ..., X_n) = \prod_{i=1}^{n} p(X_i | parents(X_i))    (1)
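To illustrate equation (1), the following minimal Python sketch (ours; the paper's experiments use the MATLAB Bayes Net Toolbox) computes the joint probability of a tiny two-node fragment, time-of-day (ToD) -> arrival rate (AR), from its CPTs. The CPT numbers are placeholders, not the paper's priors.

```python
# Minimal sketch of the chain-rule factorization in (1) for a tiny fragment:
# ToD -> AR. CPT values are illustrative placeholders, not the paper's priors.
p_tod = {"T1": 0.5, "T2": 0.5}                      # ToD has no parents
p_ar_given_tod = {                                   # CPT of AR given ToD
    ("High", "T1"): 0.7, ("Low", "T1"): 0.3,
    ("High", "T2"): 0.2, ("Low", "T2"): 0.8,
}

def joint(tod, ar):
    """P(ToD=tod, AR=ar) = p(ToD) * p(AR | ToD), following equation (1)."""
    return p_tod[tod] * p_ar_given_tod[(ar, tod)]

# The joint distribution sums to 1 over all state combinations.
total = sum(joint(t, a) for t in p_tod for a in ("High", "Low"))
assert abs(total - 1.0) < 1e-9
print(joint("T1", "High"))  # 0.35
```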
A Bayesian network can compute the probability of a variable X_i being in state S_k given that X_j is in state S_l, where S_k is a state of X_i, S_l is a state of X_j, and X_j ∈ U \ {X_i}.

Since measurement studies generally analyze user behavior over all users, an individual can behave differently. Using this information in a global model would expect all users to behave in a similar way. Therefore, we make use of Bayesian networks, which enable combining expert knowledge (knowledge deduced from the measurements) with new observations (observations received from the system). Indeed, in the absence of any observation, our network is general. It updates itself for each user with new observations through learning.

Based on the synthesis of measurement studies, the Bayesian network we propose is depicted in Figure 2. It involves 12 nodes, each of them representing a user behavior metric or an impacting factor. To be able to carry out a preliminary evaluation through a simplified model, we discretize each variable into two states. In Figure 2, the name of each node is given on top and its states are described below it with prior percent probability values. In the case of nodes having parents, an average of all the given probabilities for each state is shown. Since, for the moment, we do not have access to any real trace of a live video streaming system, we derived the prior probabilities from figures given in the measurement studies.

[Fig. 2. Our proposed Bayesian network for user behavior. The screenshots are taken from Netica (http://www.norsys.com/).]

We explain the discretization and approximation of prior probabilities in the following. For instance, channel popularity has two states, namely popular and unpopular. To assign them prior probabilities, we consider the 80–20 rule given in [19], which states that 80% of user requests come for the 20% most popular videos. Hence, a user connecting to a channel has an 80% chance of connecting to a popular channel. Similarly, user activities are higher from 12 noon to 00 midnight; therefore we divide the whole day into two parts, namely T1 = 12–00 and T2 = 1–11 hours. We divide channel types into reality and fiction, where the former represents channels like news, music, and sports while the latter represents movies and serials. User behavior for the reality type differs from that for the fiction type, where the surfing probability decreases [19]. Since no clear information is available about their distributions in the measurement studies, we assign them a uniform distribution. Arrival rate has two discrete states (high and low) and depends on time-of-day. Since the probability of T1 is slightly higher than that of T2, the probability of a high arrival rate is as well. Session durations are discretized into two simple states, namely stable and unstable. To decide the stability of a peer, we use the concept given by Wang et al. [5], where a peer is termed stable if it stays up to 40% of the observed period. In our Bayesian network, session duration has five parent nodes and we assign them prior probabilities approximated from the user behavior studies. The overall average is biased towards unstable session durations, which is consistent with the measurement studies. We set the states of departure rate and user population in the same way. For elapsed time, we consider the study [3], which states that about 50% of user sessions last for less than 200 seconds. So we put a threshold Th = 200 seconds to decide the cut-off between newly arrived peers and peers who have spent more time in the system. Since we cannot get any information about the probabilities for the states of streaming quality, delay, neighbor peers, and bandwidth contribution ratios, we choose approximated prior
probabilities for them. However, we do not consider the impact of NAT/firewalls on the bandwidth contribution ratio.

The process of estimating probabilities for a variable is called inference. If the parents of a variable are given, the probabilities of its values can be estimated. As an example, if the probability of user population (UP = Large) is to be estimated given its parents time-of-day (ToD = T1) and channel popularity (CP = Popular), the following computation is carried out:

P(UP = Large | ToD = T1, CP = Popular)
  = P(UP = Large, ToD = T1, CP = Popular) / P(ToD = T1, CP = Popular)
  = (number of samples with UP = Large, ToD = T1, CP = Popular) / (number of samples with ToD = T1, CP = Popular)
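A minimal Python sketch of this relative-frequency computation (ours, with made-up sample records; the paper gives no code):

```python
# Estimate P(UP=Large | ToD=T1, CP=Popular) by counting, as in the text.
# The sample records below are invented for illustration.
samples = [
    {"ToD": "T1", "CP": "Popular",   "UP": "Large"},
    {"ToD": "T1", "CP": "Popular",   "UP": "Large"},
    {"ToD": "T1", "CP": "Popular",   "UP": "Small"},
    {"ToD": "T2", "CP": "Unpopular", "UP": "Small"},
]

def conditional(samples, target, evidence):
    """Relative frequency of `target` among samples matching `evidence`."""
    matching = [s for s in samples
                if all(s[k] == v for k, v in evidence.items())]
    if not matching:
        return None  # no matching samples; the estimate is undefined
    hits = [s for s in matching
            if all(s[k] == v for k, v in target.items())]
    return len(hits) / len(matching)

print(conditional(samples, {"UP": "Large"},
                  {"ToD": "T1", "CP": "Popular"}))  # 2/3 ≈ 0.667
```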
4.1 Experimental Evaluation
To evaluate our proposed network, we use the Bayes Net Toolbox (BNT) [22]. We carry out the evaluation process for two cases. In the first case, we test the global model that we construct with prior conditional probabilities deduced from the studies and perform queries to observe certain states of some dependent variables. In the second case, we let the network learn its parameters from a data set representing an individual behavior and perform queries similar to those of the first case. In both cases we provide evidence to independent nodes and observe the results returned by the network for dependent ones.

The evidence provided for time-of-day consists of a representative day divided into two parts: high usage time and low usage time. For channel popularity and type, we select a sample TV channel (http://www.mtv.com/) for which the program schedule is given, along with a list of popular programs. We assign popularities to each hour according to this information. Moreover, for channel type, we find only two hours of the reality type while the rest is of the fiction type. Concerning the elapsed time, we provide uniform inputs for its two states. The evidence provided to each of these four nodes is shown in Figure 3 (a).

We query our network to compute the states of the dependent variables each time evidence is provided to the independent ones as discussed above. Some of the obtained results are shown in Figure 3 (b). Starting from the top-left plot, we describe them one by one. P(AR = High) gives a two-valued result, showing a low probability of a high arrival rate in low usage hours, which clearly indicates its dependency on time-of-day. Similarly, P(DR = High) changes according to the values of its parent variables (time-of-day and type of program). The dependence of user population on time-of-day and channel popularity is clear from P(UP = Large), which changes accordingly. Results regarding streaming quality are interesting. Since it is negatively correlated with the arrival and departure rates, which are high during peak usage hours, the results suggest that users will get low quality during these hours. This shows the importance of handling churn to minimize its effect on performance. Concerning the stability of peers, we consider a peer with P(SD = Stable) equal to or greater than 0.5 as stable. The results computed by our proposed network are consistent with the study of Wang et al. [5], where they found 5.5% to 15.5% stable peers during different analyses. Bandwidth contribution ratios are relatively smoother because they depend upon two contrasting variables (i.e., user population and streaming quality).

[Fig. 3. Evaluation of our network: (a) evidence given to the network; (b) results returned by the global network; (c) results returned by the network after learning. AR: arrival rate; DR: departure rate; UP: user population; SQ: streaming quality; SD: session duration; BCR: bandwidth contribution ratio.]

To evaluate the network for learning an individual user behavior, we create a training data set from the information given in the measurement studies. It contains 10 samples, while there is a possibility of 4096 samples. We let the network learn its parameters from the given data set. After learning, we perform the same process as previously, i.e., providing the evidence variables and observing the investigated ones. We provide the same set of evidence variables as in the previous example. The results we get are depicted in Figure 3 (c). A comparison with Figure 3 (b) indicates that the departure rates are less impacted by time-of-day. Similarly, user population is less impacted by channel popularity. We can notice similar differences in streaming quality, peer stability, and bandwidth contribution ratios. This shows that the network can learn a user behavior from limited observations and can adapt to individual behaviors.
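The per-user adaptation step can be pictured as follows. This Python sketch is ours (BNT's actual learning routines differ): it re-estimates a CPT from a user's observations using pseudo-counts, so that a small sample, such as the 10-sample training set above, shifts the prior toward the individual's behavior. Variable names and data are invented.

```python
# Sketch of per-user CPT adaptation: smooth observed counts with pseudo-counts
# and renormalize. States never observed under a context are omitted here; a
# full implementation would enumerate the whole state space of each variable.
from collections import defaultdict

def learn_cpt(observations, child, parents, prior_counts=1.0):
    """Return P(child | parents) as smoothed relative frequencies."""
    counts = defaultdict(lambda: defaultdict(lambda: prior_counts))
    for obs in observations:
        ctx = tuple(obs[p] for p in parents)
        counts[ctx][obs[child]] += 1
    cpt = {}
    for ctx, c in counts.items():
        total = sum(c.values())
        cpt[ctx] = {state: n / total for state, n in c.items()}
    return cpt

# Ten invented observations for one user, echoing the 10-sample training set.
obs = [{"ToD": "T1", "DR": "High"}, {"ToD": "T1", "DR": "Low"},
       {"ToD": "T2", "DR": "Low"}] * 3 + [{"ToD": "T2", "DR": "Low"}]
print(learn_cpt(obs, child="DR", parents=["ToD"]))
# {('T1',): {'High': 0.5, 'Low': 0.5}, ('T2',): {'Low': 1.0}}
```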
4.2 Applications
Our Bayesian network integrates all identified user behavior metrics in such a way that, through inference, an unobserved metric can be estimated from other fully or partially observed metrics. For example, user population can be estimated from time-of-day and/or channel popularity. Conversely, channel popularity can be inferred from user population and/or session duration. This enables our Bayesian network to be used by video streaming providers, Content Delivery Networks (CDNs), telco-managed IPTV, and P2P IPTV systems. A video streaming provider can use this network for anticipating high arrival rates and user population, which can help to manage resources efficiently. Similarly, a CDN providing live video streams can anticipate an increase in the number of users in a particular region and can proactively plan the assignment of servers accordingly. Some propositions exist which state that peering Set-Top-Boxes (STBs) to provide a rewinding facility will help to reduce the load on the network [23]. In that scenario, our approach can help in predicting the on-time of an STB. P2P IPTV systems can use it both on the source side and on the user side. On one hand, the source can decide to accept a child peer which is stable and contributes good bandwidth ratios. On the other hand, its integration into peers can help them react proactively and carry out decisions that improve the overall performance. For example, a peer, while choosing a stream provider, can consider the
estimated bandwidth contribution ratios and streaming quality of the potential providers. Moreover, churn can be managed in an improved way.
5 Conclusion and Future Work
User behavior in P2P live video streaming systems plays a critical role in the performance of these systems. That is why it has been extensively studied in recent years. Nevertheless, the proposed approaches are static and model all users in a similar way. In fact, individuals behave differently according to their interests and habits. Therefore, in this paper, we analyzed the measurement studies on video streaming systems and modeled the relationships among different metrics of the user behavior. After getting a generalized view of the causal relationships of various user activities, we depicted them in an abstract causal graph. Based on the information given in the measurement studies, we proposed a Bayesian network that models the user behavior generally and then learns from observations to adapt itself to individual users. Our Bayesian network allows inferring an unobserved metric from other metrics, even if some of the other metrics are unobserved. This model can be used by video streaming service providers (centralized, CDN-based, or telco-managed IPTV) for better managing their resources and can also be integrated into P2P IPTV systems. It will enable peers to carry out useful decisions for performance improvement. We also validated our model through simulations. Currently, we are working on extending our model to multinomial variables for better estimations and predictions of user behavior metrics. We will also integrate it into a P2P network for enabling peers to make decisions based on the estimations of this model. Possibilities we are looking into are whether to have one Bayesian network that is managed in a centralized way or to integrate one into each peer. Similarly, we are investigating the types of decisions that will be carried out by the stream provider peers and the stream consumer ones.
References

1. Hei, X., Liang, C., Liang, J., Liu, Y., Ross, K.W.: A measurement study of a large-scale P2P IPTV system. IEEE Transactions on Multimedia 9(8), 1672–1687 (2007)
2. Hei, X., Liu, Y., Ross, K.W.: Inferring network-wide quality in P2P live streaming systems. IEEE Journal on Selected Areas in Communications (JSAC) 25(9), 1640–1654 (2007)
3. Tang, Y., Sun, L., Luo, J.G., Zhong, Y.: Characterizing user behavior to improve quality of streaming service over P2P networks. In: Zhuang, Y.-t., Yang, S.-Q., Rui, Y., He, Q. (eds.) PCM 2006. LNCS, vol. 4261, pp. 175–184. Springer, Heidelberg (2006)
4. Tang, Y., Sun, L., Luo, J.G., Yang, S.Q., Zhong, Y.: Improving quality of live streaming service over P2P networks with user behavior model. In: Cham, T.-J., Cai, J., Dorai, C., Rajan, D., Chua, T.-S., Chia, L.-T. (eds.) MMM 2007. LNCS, vol. 4352, pp. 333–342. Springer, Heidelberg (2006)
5. Wang, F., Liu, J., Xiong, Y.: Stable peers: Existence, importance, and application in Peer-to-Peer live video streaming. In: IEEE INFOCOM, pp. 1364–1372 (2008)
6. Liu, Z., Wu, C., Li, B., Zhao, S.: Distilling superior peers in large-scale P2P streaming systems. In: IEEE INFOCOM, pp. 82–90 (2009)
7. Horovitz, S., Dolev, D.: Maxtream: Stabilizing P2P streaming by active prediction of behavior patterns (2009)
8. Horovitz, S., Dolev, D.: Collabrium: Active traffic pattern prediction for boosting P2P collaboration. In: IEEE International Workshops on Enabling Technologies, pp. 116–121 (2009)
9. Wenzhi, C., Liubai, Zhenzhu, F.: Bayesian network based behavior prediction model for intelligent location based services. In: 2nd IEEE/ASME International Conference on Mechatronic and Embedded Systems and Applications, pp. 1–6 (2006)
10. Laskey, K., Alghamdi, G., Wang, X., Barbara, D., Shackelford, T., Wright, E., Fitzgerald, J.: Detecting threatening behavior using Bayesian networks. In: Conference on Behavioral Representation in Modeling and Simulation (2004)
11. Dán, G., Fodor, V.: An analytical study of low delay multi-tree-based overlay multicast. In: Workshop on Peer-to-Peer Streaming and IP-TV, pp. 352–357 (2007)
12. Jia, J., Li, C., Chen, C.: Characterizing PPStream across internet. In: IFIP International Conference on Network and Parallel Computing Workshops, pp. 413–418 (2007)
13. Qiu, T., Ge, Z., Lee, S., Wang, J., Xu, J., Zhao, Q.: Modeling user activities in a large IPTV system. In: 9th ACM SIGCOMM Conference on Internet Measurement, pp. 430–441 (2009)
14. Liu, Z., Wu, C., Li, B., Zhao, S.: Why are peers less stable in unpopular P2P streaming channels? In: Fratta, L., Schulzrinne, H., Takahashi, Y., Spaniol, O. (eds.) IFIP-TC 6. LNCS, vol. 5550, pp. 274–286. Springer, Heidelberg (2009)
15. Veloso, E., Almeida, V., Meira, W.J., Bestavros, A., Jin, S.: A hierarchical characterization of a live streaming media workload. IEEE/ACM Transactions on Networking 14(1), 133–146 (2006)
16. Vu, L., Gupta, I., Liang, J., Nahrstedt, K.: Measurement and modeling of a large-scale overlay for multimedia streaming. In: International Conference on Heterogeneous Networking for Quality, Reliability, Security and Robustness, pp. 1–7 (2007)
17. Li, B., Xie, S., Keung, G.Y., Liu, J., Stoica, I., Zhang, H., Zhang, X.: An empirical study of the Coolstreaming+ system. IEEE Journal on Selected Areas in Communications 25(9), 1627–1639 (2007)
18. Xie, S., Keung, G.Y., Li, B.: A measurement of a large-scale Peer-to-Peer live video streaming system. In: International Conference on Parallel Processing Workshops, pp. 57–62 (2007)
19. Cha, M., Rodriguez, P., Crowcroft, J., Moon, S., Amatriain, X.: Watching television over an IP network. In: 8th ACM SIGCOMM Conference on Internet Measurement, pp. 71–84 (2008)
20. Li, B., Xie, S., Qu, Y., Keung, G.Y., Lin, C., Liu, J., Zhang, X.: Inside the new Coolstreaming: Principles, measurements and performance implications. In: IEEE INFOCOM, pp. 1031–1039 (2008)
21. Jensen, F.V.: Bayesian Networks and Decision Graphs (2001)
22. Murphy, K.P.: The Bayes Net Toolbox for MATLAB. In: Computing Science and Statistics, vol. 33 (2001)
23. Cha, M., Rodriguez, P., Moon, S., Crowcroft, J.: On next-generation telco-managed P2P TV architectures. In: IPTPS (2008)
OMAN – A Management Architecture for P2P Service Overlay Networks

Adriano Fiorese (1,2), Paulo Simões (1), and Fernando Boavida (1)

(1) Centre for Informatics and Systems of the University of Coimbra – CISUC, Department of Informatics Engineering – DEI, University of Coimbra – UC
{fiorese,psimoes,boavida}@dei.uc.pt
(2) Department of Computer Science – DCC, Santa Catarina State University – UDESC, 890233-100 Joinville, SC, Brazil
[email protected]
Abstract. In a world where networking services are increasingly being provided by service overlay networks, management of these services at the overlay level is becoming crucially important. This paper presents an architecture for service management in P2P Service Overlay Networks (SON). The architecture, named OMAN, takes into account the formation of a P2P SON comprising several different service providers belonging to several different network domains. It uses P2P mechanisms to provide service search and service self-improvement through monitoring of the P2P SON overlay and self-configuration. Preliminary results concerning service aggregation and service searching are presented in this paper, giving an insight into the expected benefits, providing proof of concept, and pointing out the overall potential of the OMAN architecture.

Keywords: Services Management, P2P, Service Overlays, P2P SON.
1 Introduction
With the advent of service overlay networks, new service provisioning business players are appearing. As in any other business investment, service providers weigh costs and profits, and resort to any competitive advantage in order to maximize revenue. A competitive advantage can be achieved through effective service management, exploiting the features of the overlay network used for service composition. Adequate management of the overlay network abstraction, which can include autonomous systems (AS), cloud infrastructures, communication links, and so on, allows service providers to offer and run services in a way that improves the end users' Quality of Experience (QoE).
This work was partially funded by FCT (scholarship contract SFRH/BD/45683/2008).
In this context, we propose OMAN, a service management architecture for P2P service overlay networks in a multi-provider environment. Current approaches to the management of Service Overlay Networks (SON) [1,2,3] lack strategies for adaptation to the flexible needs of new applications and services intended for the Future Internet. OMAN explores adaptation opportunities by using service monitoring information and service aggregation at the SON level. According to OMAN, nodes that implement some particular software module and that belong to a P2P overlay composing a SON will inform the application or service that executes over it of key SON performance values, which can be used for SON routing and SON resource management, for instance, allowing for the maximization of application or service quality. Having in mind the stated goal, this paper is organized as follows. Section 2 discusses related work. Subsequently, in Section 3, the proposed architecture is presented, first in general and then on a layer-by-layer basis. Section 4 presents some preliminary performance results obtained by simulation and further discusses the key aspects of the Service Aggregation functionality. Section 5 presents future work and Section 6 summarizes the conclusions.
2 Related Work
The concept of services as applications that can be accessed over the Internet through a software interface is used in this paper as in [4]. The concept of Service Overlay Networks (SON) is not new either. A SON is an overlay network designed to provide services [1,3]. The main SON issues are the establishment of the overlay itself and its management. Bandwidth provisioning over a SON, as well as the interactions with the domains involved and the necessary agreements to do that, are studied in [3]. The main contribution of that paper is the study of the bandwidth provisioning problem in a SON using the pipe bandwidth model for static and dynamic provisioning. Taking advantage of P2P technology, some of the initial issues related to the establishment of a regular SON can be solved. The scalability across multiple domains as well as the resilience of P2P overlay networks allow connecting several different service providers over a P2P overlay network to deliver and provision their services. This is the subject of the work presented in [5]. However, this work does not address the issue of how to improve or handle enhancements of the services using the properties and information present in the P2P overlay that composes the SON. Service Overlay Networks composed of Web Services are discussed in [4]. The federation of service providers is the core idea behind that paper. The web services are used to compose a SON using SLAs as a management tool, and to monitor the intermediaries in order to pursue the accomplishment of the SLA. The SON composed of peers in a P2P mode is created on an on-demand basis. As in [3], the utilization of SLAs is handled by a module of our architecture. However, this handling is considered in our architecture as a pre-production step,
since the creation of the P2P SON is intended to be made on a cooperation basis among the competing service providers. Some work on QoS-aware SONs is presented in [6,7]. However, these address only service-level path establishment and management as well as service component discovery. In [6] the latter issue is pursued through an enhanced CAN P2P overlay network [8,9]. Our architecture goes beyond this, since it proposes to improve the service behaviour while the service is executing. In [10] a Resilient Overlay Network (RON) is proposed. RONs are designed as overlay networks whose purpose is to recover from path outages in a few milliseconds. To accomplish this, RON nodes monitor the quality of the Internet paths and use this information to decide whether to route the packets directly over the Internet or by way of other RON nodes. This idea has some advantages regarding routing path resilience; however, services should be optimized according to several other application criteria beyond path routing, for instance, the load on the intermediary peers.
3 OMAN – Overlay Service Management Architecture
In order to face the service management problem in a P2P SON, it is necessary to handle aspects ranging from the composition of the SON to the interaction between the services and the SON, including how to take advantage of the information at the P2P overlay level to leverage the services and applications. OMAN targets the latter aspect. Figure 1 shows the proposed OMAN architecture. A circle-ended line in Figure 1 denotes an interface for accessing the functionality offered by the module where the line starts, whereas an arrow-ended line means that the module where the line starts offers some particular information.

At layer 1, P2P communication supports the whole architecture. This module should be common to every node participating in the P2P SON. It is responsible for managing all aspects related to the maintenance of the P2P overlay, including the joining and leaving of nodes. This layer is also responsible for handling the aspects of P2P SON provisioning. It includes the Service Level Agreements (SLA) and the administrative management of the service providers in the SON's group.

At layer 2, some basic framework services are provided. This is the case of the external or legacy management systems. Also, service providers can utilize the search functionalities at this level to find overlay management services, particular specific services or infrastructural services, and component services to be used in assembling new services. A particular service designed to cope with the latter kind of searching is the Aggregation Service (AgS) [11]. The results from the AgS can be used by external composition service platforms.

At layer 3, specialized services in terms of overlay management and service improvement take place. The Overlay Monitoring (OM) module collects information about the state of the overlay regarding its resources and execution conditions. The Best Peers Selection Service (BPSS) is a service that informs a
set of best peers on which to position a service according to a particular application metric. In this sense, this service helps applications and services make use of the monitored overlay. The OM and BPSS support the Configuration Manager (CM) service. The CM has three sub-modules, Resilience, QoE and Autonomic, that cope with the dynamic aspects of overlay management. The Resilience sub-module is responsible for instantiating and looking for alternative nodes to execute the operations in the case of failure or disconnection of a particular node in the P2P overlay that supports the SON. The QoE sub-module offers an interface for obtaining experience information from the users concerning the services provided by the whole system. The Autonomic sub-module is intended to provide some self-* capabilities [12] to the OMAN architecture, in particular self-configuration, based on an initial management policy received from the application or service controller (provider) and on the information collected by the OM module. In the next subsections, each module of the architecture is explained in detail.
Fig. 1. The OMAN Architecture (1st layer: Support for P2P Communication, SON Provisioning; 2nd layer: P2P Search Service, Aggregation Service, Support for Legacy Management Systems; 3rd layer: Overlay Monitoring, Best Peer Selection Service, and the Configuration Manager with its QoE, Autonomic and Resilience sub-modules; applications and services sit on top)
3.1 First Layer
This layer comprises the P2P overlay network itself. Every peer involved in the framework that implements this architecture must have the Support for P2P Communication module and the SON Provisioning module. Support for P2P Communication. This module comprises the algorithms and strategies necessary to implement a P2P overlay network. The choice of which particular P2P overlay to use depends on the communication provider of the SON; preliminary validation studies will nevertheless be conducted using CHORD [13]. If a particular application is not aware of the CHORD mechanisms, or wants to implement specific strategies for routing its information, then this application or service should implement its own routing scheme, including how to choose and manage neighbour peers. SON Provisioning. This module negotiates and enforces the SLAs among the service providers that compose the P2P SON. Among other responsibilities, this module should guarantee the financial sustainability of the P2P SON through billing policies based on the content shared, the bandwidth used, or other parameters the parties involved choose.
3.2 Second Layer
P2P Search Service (P2PSS). This module is in fact the first interface for accessing the generated P2P overlay network. For instance, pure CHORD applications use this module as a service to find the information they need. For the CHORD overlay, not only the peers' IDs but also the application information (e.g., shared files) must be keyed into the flat key-space identifier of 128 bits (a keying sketch is given at the end of this subsection). The ID or key generated is used to determine on which peer in the overlay that information will be stored, and by the same principle the information is recovered: one simply supplies its key. This module offers an interface through which external applications can use it. Aggregation Service (AgS). The AgS is a higher-level search service. It aggregates particular services' information in order to improve search. It works as a second tier on top of the P2P Search Service, reducing the response time for a specific search. In our validation framework, a particular P2P SON uses the AgS to search for service components to be assembled into new services. The adopted strategy by itself reduces search time, and with the additional replication of search results the performance improves further, as shown in Section 4.2. Support for Legacy Management Systems. This module offers an interface to standard legacy management systems. Third-party software operators can use this module to offer, for instance, a network management service. This module can be adapted and extended depending on the needs of the P2P SON operators. Basically, this module is responsible for translating requests from the service into the necessary commands for the legacy management systems running on the nodes (peers) that compose the P2P SON, and for exposing the answers received from the legacy systems.
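To illustrate the flat-key scheme used by the P2PSS, the sketch below hashes an application-level name into a 128-bit identifier and stores and retrieves a value through a minimal in-memory stand-in for the overlay. The map, the naming convention, and the use of MD5 (chosen only because it yields 128 bits) are illustrative assumptions, not part of OMAN.

    import java.math.BigInteger;
    import java.security.MessageDigest;
    import java.util.HashMap;
    import java.util.Map;

    /** Minimal sketch of the P2PSS flat-key scheme (illustrative only). */
    public class P2PSearchSketch {

        // Stand-in for the overlay: a real P2PSS would route put/get to the
        // peer whose identifier is numerically closest to the key.
        static Map<BigInteger, String> overlay = new HashMap<>();

        // Hash any application-level name (peer ID, shared file, service)
        // into the flat 128-bit key space; MD5 is assumed here purely
        // because it yields 128 bits.
        static BigInteger key(String name) throws Exception {
            byte[] digest = MessageDigest.getInstance("MD5")
                                         .digest(name.getBytes("UTF-8"));
            return new BigInteger(1, digest); // non-negative 128-bit key
        }

        public static void main(String[] args) throws Exception {
            BigInteger k = key("service:video-transcoding@providerA");
            overlay.put(k, "peer-42");          // publish: key -> hosting peer
            System.out.println(overlay.get(k)); // recover by the same key
        }
    }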
3.3 Third Layer
Overlay Monitoring (OM). The Overlay Monitoring module is responsible for offering data related to the P2P SON to the applications and services. It collects this information by monitoring peers (using polling or the publish/subscribe paradigm, depending on the information being monitored). The applications can use these data to improve their routing, storing, and processing capabilities. The Overlay Monitoring module is closely bound to the Configuration Manager module, since the application or service triggers the latter to adapt the application according to its needs. The approach used in the Overlay Monitoring module is as follows. Each node belonging to the P2P SON reports the monitored information at periodic intervals (or, in the publish/subscribe case, whenever the information is ready to be reported) to an aggregation node (super-node). These super-nodes, which are elected according to how long they have been alive in the SON, exchange the information, giving the application or Configuration Manager easy access to it. The following information should be monitored. Processing Load: this information depicts how idle or busy the node's processor is. It is closely related to the node's load, although the latter also takes into account memory and disk usage per process. Hence, the processing load information is appropriate for applications and services that share processing capabilities. This information is kept as an average value between measurement intervals; this strategy avoids using a node that is processing data intensively or performing intensive calculations, even if its memory consumption is low. Storage Capacity: this metric depicts how much main or secondary memory space is available to the application or service on the running node. It is required by content-sharing applications, especially file-sharing ones. It is also important for instantiating or cloning an application or service on another node of the P2P SON, which is essential for load balancing, fault tolerance, and P2P applications that face high churn rates. Bandwidth: each node involved in the P2P SON has communication interfaces that may or may not be active. Each interface provides access to a communication link with a certain available bandwidth, depending on the link utilization, which in turn depends on the number of sessions handled by the node. The available bandwidth determines how fast a node can communicate with others, so this information is especially valuable for routing, content delivery, and almost every networked operation an application intends to perform.
Node Load: the node load depicts how busy the node is as a whole in the P2P SON. This metric takes into account not only the processor capacity used and how many time slices it can attend to, but also how much memory and disk storage the node uses to execute its demands in a time interval. This metric is important for applications that need processing information about other peers in a session. For example, a service that must encrypt, or run a codec to decode video traffic in the middle of a transmission, could use the node load information to choose which nodes may participate in the forwarding route. Best Peer Selection Service (BPSS). The Best Peer Selection Service (BPSS) is the module in our architecture intended to address the peer location problem. The nearest peer may or may not be the best peer with which to exchange monitoring or metadata information; hence, the choice of the best peer depends on the particular objective an application has. In cooperation with the Overlay Monitoring module, our BPSS offers a topological metric to the applications: information about the distances among the peers belonging to the SON. These distances are measured according to the link latency among peers or using other techniques (landmark servers, for instance), and they depend on the bandwidth and load between nodes. To use this module, the application or service specifies the metrics and the number of best nodes it needs and requests them. Depending on the specified metrics, the BPSS will use the Overlay Monitoring (OM) module (so both modules should be implemented together) to answer with a set of peers in the P2P SON that match those metrics. When the only desired metric is the distance between the node where the service is running and other particular nodes, only the BPSS is used.
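The following sketch shows one way a BPSS request could be answered from OM reports: peers report metric records, and the service ranks them and returns the requested number of best peers. The report format, the metric names, and the ranking rule (lower is better) are assumptions made for illustration; they are not prescribed by OMAN.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    /** Sketch of a BPSS request answered from OM reports (illustrative). */
    public class BpssSketch {

        // One OM report per peer; metric names are assumed for the example.
        static class Report {
            final String peerId;
            final Map<String, Double> metrics = new HashMap<>();
            Report(String id, double nodeLoad, double latencyMs) {
                peerId = id;
                metrics.put("node-load", nodeLoad);
                metrics.put("latency", latencyMs);
            }
        }

        // Return the n best peers for a metric, assuming lower is better.
        static List<String> bestPeers(List<Report> reports, String metric, int n) {
            reports.sort(Comparator.comparingDouble(r -> r.metrics.get(metric)));
            List<String> best = new ArrayList<>();
            for (Report r : reports.subList(0, Math.min(n, reports.size())))
                best.add(r.peerId);
            return best;
        }

        public static void main(String[] args) {
            List<Report> reports = new ArrayList<>(Arrays.asList(
                    new Report("p1", 0.9, 12.0),
                    new Report("p2", 0.2, 45.0),
                    new Report("p3", 0.5, 8.0)));
            System.out.println(bestPeers(reports, "latency", 2)); // [p3, p1]
        }
    }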
Configuration Manager (CM). The Configuration Manager (CM) module is responsible for adapting the P2P SON to the conditions required by the applications or services, as well as for controlling the underlay in order to obtain the desired behaviour for the application. The CM is also responsible for receiving the configurations to be applied to the overlay, creating groups of peers to execute particular management operations, controlling the authorizations for applying configurations and, with the help of the BPSS module, applying the configuration on the appropriate service, device or peer. It can also offer information about the result of a configuration, or delegate the configuration operation to another peer. The CM autonomously uses the information collected by the OM module to adjust storing, sharing, and other capabilities. To control the underlay, the CM uses the Support for Legacy Management Systems. The decisions the CM makes to control the underlay depend on the overall policy of the P2P SON (based on SLAs agreed between the operators or providers at the application or service level) and, autonomously, on the metrics monitored by the Overlay Monitoring (OM) module; the Autonomic sub-module takes care of this aspect. The information monitored by the OM module also influences the Resilience sub-module, which is responsible for keeping the P2P SON working when disruptive, abnormal conditions occur. An example of its operation arises when a node in the CHORD overlay does not respond to requests and its application and service must be cloned and restarted on another node. The task of cloning and starting the "new" application is part of the responsibilities of this module. The choice of the best peer on which to start the "new" application can be made using the BPSS module, so the Resilience sub-module depends on the implementation of the BPSS module. With the recurring need to enhance applications and services, the users' general opinion about them is ever more in demand. Quality of Experience (QoE) is the concept tailored to capture this notion about systems from the end users, and the CM module implements this feedback mechanism through the QoE sub-module. The capacity to analyse the initial management policy and automatically adapt the P2P SON to meet its statements is provided by the sub-module we call Autonomic. Self-configuration is one of a series of self-* characteristics [12] intended to comprise the self-management concept. The P2P SON adaptation based on an initial policy, integrated with the information collected by the Overlay Monitoring module, can also support the Resilience sub-module, which executes some self-healing functions. To accomplish its responsibilities, the Configuration Manager needs the cooperation of the BPSS and OM modules. Hence, the nodes that belong to the P2P SON and implement the Configuration Manager also need to implement the other modules of the third layer of our architecture.
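As a concrete illustration of the Resilience sub-module's failover step described above, the sketch below checks whether a hosting peer is still responding and, if not, clones the service onto the replacement peer suggested by the BPSS. The Overlay, Bpss and Service interfaces are assumed abstractions for this sketch, not OMAN APIs.

    /** Sketch of the Resilience sub-module's failover step (illustrative).
        Overlay, Bpss and Service are assumed abstractions, not OMAN APIs. */
    public class ResilienceSketch {

        interface Overlay { boolean isAlive(String peerId); }
        interface Bpss { String bestPeerFor(String serviceId); }
        interface Service {
            String id();
            void cloneTo(String peerId); // re-instantiate the service elsewhere
        }

        private final Overlay overlay;
        private final Bpss bpss;

        ResilienceSketch(Overlay overlay, Bpss bpss) {
            this.overlay = overlay;
            this.bpss = bpss;
        }

        // If the hosting peer stopped responding, clone the service onto
        // the best replacement peer reported by the BPSS.
        void check(Service service, String hostingPeer) {
            if (!overlay.isAlive(hostingPeer)) {
                service.cloneTo(bpss.bestPeerFor(service.id()));
            }
        }
    }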
4 Preliminary Results
This section presents some preliminary results, obtained by simulation, concerning the AgS, which resides in layer 2 of OMAN. The AgS was chosen as a proof of concept to illustrate part of the OMAN contribution. First, the AgS is explained in more detail; then the assessment results are shown and discussed.
4.1 Aggregation Service (AgS)
The proposed AgS is a P2P tier over the Support for P2P Communication. As such, it consists of super-peers elected among the peers that belong to the service providers constituting the P2P SON. These super-peers, also called aggregation peers, aggregate the offerings of services and service components in another overlay tier in order to improve search. Unlike the P2P Search Service (see Fig. 1, 2nd layer), whose functionality is provided by the P2P overlay network itself to execute generic searches, the AgS is designed to improve the search process for aggregated services and component services that will be offered to end users or to third-party service providers for service composition. For this to work, the peers that execute these services and service components, called management peers, should publish the interfaces of these services and components at the aggregation peers. Once the interfaces are published, they can be searched. Figure 2 illustrates the AgS concept.

Fig. 2. The Aggregation Service (AgS) (aggregation peers connected by aggregation links above management peers in Domains A, B and C; the figure also distinguishes physical links, overlay links, and the provider/domain manager)

Thus, for instance, the AgS in the P2P SON can be used in a scenario where the provider of a composed service wants to search for service components to be assembled into a new service. This provider uses the AgS to find the management peers where the required service components are available. Taking Fig. 2 as a reference, consider that a service provider from Domain A subcontracts services located in Domains B and C (e.g., a connectivity service with QoS guarantees and access to multimedia content). In this scenario, the AgS ensures that the management peer reference for each such service can be recovered. Further access to the service interfaces (e.g., contracting, monitoring, and life-cycle management) is provided by the management peer that represents the service component; interaction between the service contractor and the management peer is conducted outside the AgS.
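The publish/search interplay just described can be sketched as follows: management peers publish interface references for their service components at aggregation peers, and a composition provider later looks them up by service name. The registry-style data structure and all names are illustrative assumptions; in the real AgS the index is distributed over the aggregation-peer tier.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    /** Sketch of the AgS publish/search tier (illustrative). */
    public class AgsSketch {

        // serviceName -> management peers that published that component
        static Map<String, List<String>> aggregationIndex = new HashMap<>();

        // A management peer publishes the interface reference of one of
        // its service components at an aggregation peer.
        static void publish(String serviceName, String managementPeer) {
            aggregationIndex.computeIfAbsent(serviceName, k -> new ArrayList<>())
                            .add(managementPeer);
        }

        // A composition provider searches for the management peers offering
        // a required component; contracting then happens outside the AgS.
        static List<String> search(String serviceName) {
            return aggregationIndex.getOrDefault(serviceName,
                                                 Collections.emptyList());
        }

        public static void main(String[] args) {
            publish("qos-connectivity", "mgmt-peer-B1");   // from Domain B
            publish("multimedia-content", "mgmt-peer-C7"); // from Domain C
            System.out.println(search("qos-connectivity")); // [mgmt-peer-B1]
        }
    }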
4.2 Aggregation Service Evaluation
The AgS was evaluated through simulation. The simulated scenario was a virtual ring arrangement with 1,000 aggregation peers and 10,000 management peers. Each management peer could randomly publish, using a uniform distribution, up to 7 different service references at a randomly chosen aggregation peer. The simulations used the hop count of the Search Message (sent between aggregation peers) as the metric for calculating the average path length of the searches. The number of search operations ranged from 100 to 1,000, in order to observe the scalability and the average path length behaviour in each of the scenarios belonging to the same simulated environment.
Fig. 3. AgS average path length (APL) with and without replication of search results, plotting the log10 total of query reply messages received against the log10 total of query operations, for successful and unsuccessful query operations in both scenarios
Two scenarios were assessed: 1) a scenario where the search result was cached (replicated) on the aggregation peer that started the search; and 2) a scenario without replication of the search results. Preliminary results show high scalability, as well as the resilience that the proposed AgS gains through the replication of search results. Figure 3 shows the most significant results. In Figure 3 it is possible to see that there is a difference between the lines representing the two scenarios, and that the average path length in the scenario whose search results are replicated is lower. Confidence intervals for the means are at the 95% level. For the scenario without replication, the average number of hops (peers) needed to find information is 14.1805, with a confidence interval of 1.004258; for the scenario with replication, the mean is 13.634, with a confidence interval of 0.950478. The results show that replication of search results improves search performance and also enhances resilience, since the aggregation peer that originally received the service reference publication message can go down and the service reference can still be reached through a previously replicated search result. It is worth noting that, over time, this strategy can lead to a situation where a good number of search operations are successfully served from the peer's local cache, avoiding any communication overhead on the AgS P2P overlay. As expected, the average path length of unsuccessful query operations is greater than that of successful operations and is not altered by replication, as can be seen in the (hops unsucc. query op.) lines in Figure 3. Another finding is that the average path length depends on the number of active aggregation peers. Although the number of simulated aggregation peers was 1,000, the AgS P2P overlay consisted, on average, of 177 active aggregation
peers, with a standard deviation of 16. This is also the number of hops taken by unsuccessful query messages, which travelled through the entire overlay without finding the desired information. This low number of active aggregation peers can explain the low average path length shown in the results.
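The replication strategy of scenario 1 can be sketched as a local result cache at the querying aggregation peer: the first search goes out over the overlay, while later identical searches are answered locally with zero overlay hops. The cache structure and the stand-in overlaySearch method are assumptions for illustration.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    /** Sketch of scenario 1: search results replicated (cached) at the
        aggregation peer that initiated the query (illustrative). */
    public class ResultCacheSketch {

        static Map<String, List<String>> localCache = new HashMap<>();

        static List<String> searchWithReplication(String serviceName) {
            List<String> cached = localCache.get(serviceName);
            if (cached != null) return cached;           // zero overlay hops
            List<String> result = overlaySearch(serviceName);
            localCache.put(serviceName, result);         // replicate locally
            return result;
        }

        // Stand-in for the multi-hop search over the AgS overlay.
        static List<String> overlaySearch(String serviceName) {
            return Arrays.asList("mgmt-peer-B1");
        }

        public static void main(String[] args) {
            searchWithReplication("qos-connectivity"); // goes over the overlay
            searchWithReplication("qos-connectivity"); // served from the cache
        }
    }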
5 Future Work
In order to answer questions regarding the efficiency of the OMAN architecture, further work is needed. First of all, integrating the AgS into a full implementation of the architecture, in a simulated scenario or test-bed, can provide valuable information about overall usability, about the capacity to handle a large number of services, and about the interaction with, and dependencies on, the supporting P2P overlay. Simulations of the Best Peer Selection Service (BPSS), a key component for service enhancement, should also provide information about how to organize a real P2P SON in terms of topology. Last but not least, this kind of work can also help assess what we claim to be the necessary cooperation among service providers to keep the communication link costs of the P2P SON under control. The implementation and evaluation of Overlay Monitoring (OM) is also of great importance: together with the BPSS module, the OM module can play a key role in improving service performance, leveraging end-user quality of experience and the efficiency of service provision. The integration of the BPSS, OM and Configuration Manager modules in a testbed whose purpose is to assess the behaviour of services running on a P2P SON is another important topic for future work.
6 Conclusion
This paper presented OMAN, an architecture for service management in P2P overlay networks. OMAN addresses the problem of managing and providing service composition in large-scale environments involving multiple administrative domains and several service providers organized in a P2P SON. OMAN is based on three layers. The first layer comprises the support for P2P communication, where a P2P network involving service providers takes place, as well as an administrative module whose responsibility is to compose the service overlay network and to establish the necessary inter-provider agreements. The second layer comprises a search service, whose main purpose is to offer basic and advanced search features to applications, services and service providers; additionally, this layer provides support for legacy management systems. The third layer provides value-added monitoring and adaptation services, based on information collected at the P2P SON level. This modular construction allows specific delivery strategies over the P2P SON for the service providers, allowing different peers to play different roles according to the service providers' policies.
This paper also presented and discussed some preliminary results about the AgS module of the OMAN architecture. In this context, two scenarios were simulated: first, a scenario using the caching (replication) of the search results at the aggregation peer where the query was initiated; second, a scenario without replication of the search results. The obtained results confirm the good scalability, resilience and performance of the AgS paradigm, and point to the potential of the overall OMAN architecture.
References
1. Fan, J., Ammar, M.H.: Dynamic topology configuration in service overlay networks: A study of reconfiguration policies. In: Proceedings of the 25th IEEE International Conference on Computer Communications, INFOCOM 2006, pp. 1–12 (2006)
2. Liang, J., Gu, X., Nahrstedt, K.: Self-configuring information management for large-scale service overlays. In: 26th IEEE International Conference on Computer Communications, INFOCOM 2007, pp. 472–480. IEEE, Los Alamitos (2007)
3. Duan, Z., Zhang, Z., Hou, Y.T.: Service overlay networks: SLAs, QoS, and bandwidth provisioning. IEEE/ACM Trans. Netw. 11(6), 870–883 (2003)
4. Machiraju, V., Sahai, A., van Moorsel, A.: Web services management network: an overlay network for federated service management. In: IFIP/IEEE Eighth International Symposium on Integrated Network Management, pp. 351–364 (2003)
5. Zhou, S., Hogan, M., Ardon, S., Portman, M., Hu, T., Wongrujira, K., Seneviratne, A.: ALASA: when service overlay networks meet peer-to-peer networks. In: 2005 Asia-Pacific Conference on Communications, pp. 1053–1057 (2005)
6. Lavinal, E., Simoni, N., Song, M., Mathieu, B.: A next-generation service overlay architecture. Annals of Telecommunications 64(3), 175–185 (2009)
7. Adam, C., Stadler, R., Tang, C., Steinder, M., Spreitzer, M.: A service middleware that scales in system size and applications. In: 10th IFIP/IEEE International Symposium on Integrated Network Management, IM 2007, pp. 70–79 (2007)
8. Ratnasamy, S., Francis, P., Shenker, S., Karp, R., Handley, M.: A scalable content-addressable network. In: Proceedings of ACM SIGCOMM, pp. 161–172 (2001)
9. Crowcroft, J., Pias, M., Sharma, R., Lim, S., Lua, K.: A survey and comparison of peer-to-peer overlay network schemes. IEEE Communications Surveys & Tutorials, 72–93 (2005)
10. Andersen, D., Balakrishnan, H., Kaashoek, F., Morris, R.: Resilient overlay networks. In: Proceedings of the Eighteenth ACM Symposium on Operating Systems Principles, Banff, Alberta, Canada, pp. 131–145. ACM, New York (2001)
11. Fiorese, A., Simões, P., Boavida, F.: Service searching based on P2P aggregation. In: Proceedings of ICOIN 2010, Busan, South Korea (2010)
12. Hariri, S., Khargharia, B., Chen, H., Yang, J., Zhang, Y., Parashar, M., Liu, H.: The autonomic computing paradigm. Cluster Computing 9(1), 5–17 (2006)
13. Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H.: Chord: A scalable peer-to-peer lookup service for internet applications. In: Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, San Diego, California, United States, pp. 149–160. ACM, New York (2001)
Towards a P2P-Based Deployment of Network Management Information Rafik Makhloufi, Grégory Bonnet, Guillaume Doyen, and Dominique Gaïti ICD/ERA, UMR 6279, Université de Technologie de Troyes, 12 rue Marie Curie, 10010 Troyes Cedex, France
[email protected]
Abstract. Standard, static and centralized network management approaches are unsuitable for managing large, dynamic and distributed networks. Some decentralized approaches based on the P2P model have emerged to overcome limitations of these centralized approaches, such as lack of scalability and fault-tolerance. However, they do not address issues related to the deployment of management information, which is crucial in a decentralized management context because of the heterogeneous nature of this kind of information. Furthermore, current decentralized approaches still face difficulties in ensuring the security, persistence and consistency of management information. In this paper, we investigate the use of a DHT as a framework for the deployment of network management information. First, we feature management information from a deployment perspective. Then, we propose a basic deployment strategy. We evaluate this approach in the context of the monitoring service, considering network scalability, propagation delay and information loss under churn. Keywords: P2P-based network management, decentralized monitoring, CIM, DHT.
1 Introduction
Current networks are evolving in terms of complexity, distribution and dynamics, making standard, static and centralized network management approaches unsuitable for managing them. In order to overcome these problems, new decentralized management architectures, such as those based on P2P technologies, have emerged. Indeed, the P2P model is seen as a promising way to improve centralized network management solutions and to avoid their limitations, such as lack of scalability and fault-tolerance. Due to the decentralization of management approaches and the distribution of management information, current management approaches have difficulty ensuring the security, persistence and consistency of this kind of information. Furthermore, management information is heterogeneous in nature, so an appropriate deployment strategy for this particular kind of information is necessary. However, the P2P-based management approaches proposed in the literature do not address issues related to the deployment of management information.
In this context, we focus our work on the design of a P2P-based approach for the deployment of management information on a P2P network. For this purpose, we rely on CIM (Common Information Model, http://www.dmtf.org/standards/cim/) as the target information model. In order to validate our deployment approach, we evaluate it in the context of monitoring, a constraining service in which information is distributed and dynamic and should be retrieved in real time. The performance of the monitoring service is evaluated through both experimental and simulation tests, considering network scalability, delivery delay and loss of information under churn. This paper is organized as follows. First, we give an overview of the P2P-based management paradigm in Section 2. Subsequently, we present our approach for the deployment of network management information on a P2P system in Section 3, and we describe our experiments and analyze the results in Section 4. Finally, we present our conclusions and perspectives in Section 5.
2 P2P-Based Network and Service Management
The evolution of current networks has given rise to various decentralized network management approaches, which aim to overcome the limitations of centralized approaches, such as the lack of scalability and reliability caused by the use of a central network manager as a single point of failure. Owing to the characteristics of the P2P model, new and promising network management architectures based on this paradigm have emerged [8,17]. The general idea behind P2P-based management is to incorporate the advantages of the P2P model into network and service management architectures. P2P systems are known for characteristics such as decentralization, self-organization, scalability and fault-tolerance. They are used to implement applications like file sharing, distributed computing, collaborative work and, recently, network management. In the latter, a node (peer) in the overlay network can be considered, on the management plane, as an agent and a manager at the same time. Thereby, one resource can be managed by several management entities, and a manager can be interested in the management of several resources [10]. In addition to [17], which presents the general principles of a P2P-based management architecture, several works address particular uses of the P2P model in a management framework. Among them, we cite: (1) Astrolabe [13], an information management system implemented using a P2P protocol and a restricted form of mobile code based on the SQL query language for aggregation; (2) MANP2P [8], a P2P-based management solution based on key management entities such as top-level managers (TLMs) and mid-level managers (MLMs). This model serves as the basis for several studies on P2P-based management, such as [17], and for the development of P2P-based management and monitoring systems such as [12,4]. Most of the explored P2P-based management approaches concern the monitoring service. Among these approaches, [4] propose a monitoring and notification service used in MANP2P that relies on the publish/subscribe paradigm to
diffuse monitoring information. This service is based on two main frameworks: JXTA, a generic framework for building P2P solutions, and JXTA-SOAP, an implementation of SOAP used to send notification messages. [10] propose a monitoring approach based on unstructured overlay networks, where components can self-organize with a given predefined neighbour degree, such that each component becomes responsible for monitoring its immediate neighbours. This architecture is based on their membership protocol, HyParView, which builds and maintains an unstructured overlay network for P2P network and service management. [2] propose a management framework that performs monitoring for fault and performance management, with local and distributed tests. This architecture employs a structured P2P overlay consisting of several distributed network agents (DNAs); the Kademlia DHT (Distributed Hash Table) is used to allow each DNA to search for other DNAs. [1] propose P2PM (P2P Monitor), a P2P monitoring system which uses ActiveXML documents and alerters deployed on each peer to detect local events. There is also some work on management information models, such as [5], an extension of CIM for P2P network and service management, instantiated on the Chord DHT [15]. Although all these approaches discuss the operational and architectural aspects of P2P-based network management, they do not address the problem of deploying network management information. The latter is stored locally on the agent node itself, creating a single point of failure. Indeed, while the question of information deployment is irrelevant in the context of centralized management, where information is collected and processed by a single centralized management entity, it becomes crucial in a decentralized management context, because the management information is heterogeneous, dynamic and distributed. Furthermore, due to their decentralization, current management approaches have difficulty ensuring the security, persistence and consistency of the distributed management information. For this purpose, we propose a P2P-based approach for the deployment of network management information on a P2P network.
3 Deployment of Management Information on a P2P Network
In this section, we describe the investigated DHT-based approach for deploying network management information on a P2P network. We also present a case study on the use of this deployment approach in the context of monitoring.
3.1 Featuring Network Management Information
In order to propose a suitable P2P-based deployment strategy for network management information, we first need to feature the management information. Indeed, network management information has some important deployment features, which we propose to classify as follows:
– Aggregation: a value of a managed object can be non-aggregated, a partial aggregate for a subset of managed elements, or a global aggregate for the entire network;
– Persistence: determines how long a piece of information exists and the importance of preserving it when the hosting nodes leave the network [16];
– Accuracy: defines the level of accuracy of a piece of management information;
– Security: concerns access control, confidentiality and integrity of data;
– Dynamics: a value of the managed object can be static or dynamic.

Thereby, we propose to apply an appropriate deployment strategy according to the characteristics of the management information in use. This can mainly be done using standard methods such as DHTs for deploying and retrieving information, and tree- or gossip-based algorithms for aggregating data [11]. For our case study, we focus on non-aggregated, persistent, accurate and dynamic management information; we do not address the security aspect in this paper.
3.2 Deployment Basis
The work we undertake is to propose a model for the decentralized deployment of network management information on a P2P system. We choose CIM as the information model because it is a well-known standard, employed in most available management technologies such as DEN-ng, WBEM and knowledge-plane-based approaches. For example, when using the WBEM technology (http://www.dmtf.org/standards/wbem/), there is a central management entity called the CIMOM (CIM Object Manager), which employs a CIM Repository storing CIM class definitions and dynamic data. In the context of a decentralized management infrastructure, a decentralized entity like a CIMOM is needed for managing information. The deployment of management information can be performed through an unstructured or a structured overlay network. In the first category, deployment can be done through a diffusion algorithm such as gossip or flooding [9]. As proposed in [10], an unstructured overlay can be reorganized with a fixed neighbour degree so that each component becomes responsible for managing its immediate neighbours in the overlay. In the second category, deployment can be done through a DHT. The latter can either be used to index the management information, by sending a reference to a managed object to another node, or to deploy a copy of this information on a remote node. A simple tree-based algorithm can be used to propagate the management information over the tree nodes.
3.3 DHT-Based Deployment Approach
Our approach is based on a P2P structured overlay network. It employs a DHT [15] for delivering and looking up network management information, and it uses an application-level multicast system for the diffusion of information, building and maintaining a publish-subscribe multicast tree. The information deployment process is divided into three tasks: instantiation, deployment and retrieval. Instantiation. When instantiating a CIM object class, the CIM naming mechanism offers a way to uniquely identify instances of this class. This object name is composed of the namespace path, which locates a particular namespace of a CIM implementation, plus the model path, which is the concatenation of the class name and the properties of the class that are qualified as keys. Figure 1 shows an example of a CIM class, CIM_EthernetPortStatistics, issued from the CIM device model, together with one possible instance of this class, which describes the statistics for the management of an Ethernet port. To better illustrate the deployment process, we use the same example in the rest of the paper.
diffusion of information and by building and maintaining a publish-subscribe multicast tree. The information deployment process is divided into three tasks: instantiation, deployment and retrieval. Instantiation. When instantiating a CIM object class, The CIM naming mechanism offers a way to uniquely identify instances of this class. This object name is composed of the namespace path, which locates a particular namespace of a CIM implementation, plus the model path, which is the concatenation of the class name and the properties of the class that are qualified with keys. Figure 1 shows an example of a CIM class CIM EthernetPortStatistics, issued from the CIM device model with one possible instance of this class, which describes the statistics for the management of an Ethernet port. To better understand the deployment process, we employ the same example in the rest of the paper. class CIM EthernetPortStatistics { [Key] string InstanceID; .. . uint64 PacketsTransmitted; uint64 PacketsReceived; uint32 SymbolErrors; uint32 LateCollisions; uint32 FrameTooLongs; };
instance of CIM EthernetPortStatistics { InstanceID=’0x3:23H87A8’; .. . PacketsTransmitted=15; PacketsReceived=27; SymbolErrors=2; LateCollisions=0; FrameTooLongs=2; };
Fig. 1. An example of a CIM object class with an instance of this class
In this CIM class, written in MOF (Managed Object Format), there is one key, InstanceID, and five other attributes. An instance of this class is a set of property values, declared using the keyword sequence instance of followed by the class name CIM_EthernetPortStatistics. The model path for this class is CIM_EthernetPortStatistics.InstanceID=0x3:23H87A8. Therefore, following the CIM naming scheme, the object name of this class in the case of WBEM would be http://CIMOMNameSpacePath:CIM_EthernetPortStatistics.InstanceID=0x3:23H87A8. In the context of P2P-based management, information is distributed over the peers and the same object can be owned by many peers. Thus, when using the CIM naming mechanism, a manager has no indication of the source node from the object name. A good way to resolve this problem is to add to the object name a reference to the source node, so that a manager can retrieve the management information of a particular node. Deployment. This task consists of deploying the management object on a remote peer by using a DHT. We choose a DHT tree-based solution because of its benefits: the structured topology gives us control over the distribution of information across the peers.
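The naming extension and the derivation of the DHT identifier used in the deployment task can be sketched as follows. The exact name layout, the '@' separator for the source-node reference, and the example node name are assumptions for illustration; SHA-1 is used only as a concrete hash producing ring-sized identifiers.

    import java.math.BigInteger;
    import java.security.MessageDigest;

    /** Sketch of deriving a DHT identifier from a CIM object name extended
        with a source-node reference (illustrative). */
    public class CimNamingSketch {

        static String objectName(String namespacePath, String className,
                                 String keyProperty, String keyValue,
                                 String sourceNode) {
            // Appending the source node lets a manager address the instance
            // held by one particular node.
            return namespacePath + ":" + className + "." + keyProperty + "="
                    + keyValue + "@" + sourceNode;
        }

        static BigInteger dhtId(String objectName) throws Exception {
            byte[] d = MessageDigest.getInstance("SHA-1")
                                    .digest(objectName.getBytes("UTF-8"));
            return new BigInteger(1, d); // non-negative 160-bit identifier
        }

        public static void main(String[] args) throws Exception {
            String name = objectName("http://CIMOMNameSpacePath",
                    "CIM_EthernetPortStatistics", "InstanceID", "0x3:23H87A8",
                    "node-17");
            System.out.println(dhtId(name).toString(16));
        }
    }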
Besides this advantage, DHT-based overlay networks provide a content lookup service with load balancing, fault-tolerance, scalability and well-characterized routing performance: they are able to retrieve information from the DHT in O(log(N)) hops in a network of size N [14]. Contrary to the classical case, where the management information is stored only locally, with a single point of failure, we use a DHT to store the information on remote nodes. Thereby, when the source node leaves the network, the management information is still available on a remote peer, which ensures information persistence. To deploy the management information on a DHT, we need to build an appropriate identifier, which is the hash of the object name generated by the naming scheme. The class instance of the managed object is routed by the agent node over the overlay to the node nearest to the hash of the managed object name (the root). Following this push-based algorithm, the agent periodically sends a new class instance of the managed object in order to update the old value. Retrieval. There are two possibilities for retrieving management information. In the case where only one manager is interested in the information, the manager sends a request message for it, using a lookup service like the one offered by a DHT; the root then sends the received object to the requesting manager. In the case where several managers exist, we use a publish/subscribe ALM (Application-Level Multicast) system to diffuse the management information to all the subscribed managers.
3.4 Application to the Monitoring Service
In our study, we applied our deployment approach to the monitoring service, which is very constraining: monitored information is dynamic and distributed, and must be retrieved periodically and in real time. Furthermore, one resource can be monitored by several managers. Following the deployment approach described above, the architecture we use to deploy the monitoring information and to send notifications from the agent to the managers relies on a push-push publish/subscribe communication mechanism, as illustrated in Figure 2. In this architecture, three main actors interact in the system. First, an agent node a acts as a server and periodically publishes the new value of a managed object. Second, the root node r, which is the node nearest to the hash of the managed object, receives the managed object sent by the agent node. Finally, the M managers mi, where 0 < M < N and N is the number of nodes in the network, which are interested in the monitored variable, subscribe to the communication group created by the ALM in order to periodically receive the new value of the managed object stored on the root.
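The push-push flow can be sketched as follows: managers subscribe to the topic whose identifier is the hash of the object name, and each periodic publication by the agent is forwarded to all subscribers. The topic map and delivery interface below are simplified stand-ins for the ALM, not the actual Scribe API.

    import java.math.BigInteger;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    /** Sketch of the push-push monitoring flow (illustrative). */
    public class MonitoringPubSubSketch {

        interface Manager { void deliver(long counterValue); }

        // topicId (hash of the object name) -> subscribed managers
        static Map<BigInteger, List<Manager>> topics = new HashMap<>();

        static void subscribe(BigInteger topicId, Manager m) {
            topics.computeIfAbsent(topicId, k -> new ArrayList<>()).add(m);
        }

        // Called by the agent with each new counter value (every 5 s in the
        // experiments); in the real system the message is routed to the
        // root node first, then multicast down the tree.
        static void publish(BigInteger topicId, long packetsTransmitted) {
            for (Manager m : topics.getOrDefault(topicId,
                                                 Collections.emptyList()))
                m.deliver(packetsTransmitted);
        }

        public static void main(String[] args) {
            BigInteger topic = BigInteger.valueOf(42); // stands for the hash
            subscribe(topic, v -> System.out.println("manager received " + v));
            publish(topic, 15); // e.g. PacketsTransmitted from Fig. 1
        }
    }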
4 Experimental Framework
We perform semi-real experiments to evaluate the performance of the proposed DHT-based deployment approach for management information, in the context of the constraining monitoring service of a P2P network.
Fig. 2. System architecture
4.1 Test Environment
In order to implement the deployment approach described above, we rely on Pastry [14] to create an overlay network and on Scribe [3] to build and maintain a publish-subscribe multicast tree. We conducted our experiments on FreePastry (http://freepastry.org), a Java implementation of the Pastry DHT. To provide results under semi-real experimental conditions, we carried out our experiments in a LAN environment consisting of 12 computers, running multiple Pastry nodes within each JVM (Java Virtual Machine). In order to coordinate the 12 computers, we use DELTA [6], a generic environment dedicated to the testing and measurement of Java-based distributed applications.
4.2 Experimental Setup
We create N Pastry nodes on the overlay and select, uniformly at random, one source node that acts as the agent a, with M randomly chosen nodes acting as managers, where M < N. The agent node is in charge of periodically publishing the new values of the managed object. We create a Scribe topic whose groupId is the hash of the managed object name described in Section 3.3. Scribe sets the Pastry node with the nodeId numerically closest to the groupId as the root of the multicast tree. All the managers send a subscribe request to join the created communication group. When the agent node sends a new value of the managed object in a publish message, this message is routed to the root before being broadcast over the multicast tree to all the subscribed managers. The management information we handle in our experiments is dynamic, distributed and non-aggregated: the agent node publishes a new value of the managed object every 5 seconds. This object contains a counter, for example the attribute PacketsTransmitted depicted in Figure 1. The use of a counter
as the monitored variable allows us to easily compare the values emitted by the source node with those received by the subscribed managers.
4.3 Results
In order to validate our DHT-based deployment approach on a P2P network, we used criteria close to those proposed in [7]: we measured the information loss in the presence of churn, the propagation delay, and the scalability of the approach in the presence of a large number of managers. Information loss under churn. We evaluate the capacity of the system to handle churn (arrivals and departures of network nodes) through the information loss rate observed on the network in the presence of this phenomenon. Information loss is the number of times the agent publishes a value of the managed object without it being received by a manager. To implement a realistic user behaviour, we use as a reference model the analysis of Video on Demand (VoD) over IP given in [18]. Nowadays, P2P VoD systems account for a large share of Internet traffic, ensuring quality of service in such systems is important, and they are well known for the dynamics of their users, whose behaviour has been modelled in several works. In our experiments, we generate random session durations (i.e., online time) in lognormal form with parameters (μ = 2.2, σ = 1.5). Similarly, node arrivals into the system follow the revised Poisson process of [18], where the number of arrivals usually ranges from 0 to 5 users per second. Nodes are created as long as the maximum number of nodes on the hosting machine is not reached. Figure 3.a shows the cumulative distribution of the information loss rate: for example, we obtained a loss rate of less than 5% in 40% of the cases. This high loss is caused by the multiple departures of root nodes. To better understand this distribution, Figure 3.b shows a typical example of the variation of the average information loss during an experiment with 120 managers. When the system is not under churn, there is no loss of information. Under churn, we observe a reasonable average loss of 0.75%, except for two peaks where the loss reaches up to 100%. These correspond to the case where the root node leaves the network: none of the nodes receives the correct new value during a certain time interval, corresponding to the tree maintenance time, which is generally in the range of 20–40 seconds. A good way to avoid information loss when the root node leaves the network is to replicate the information on another node. Propagation delay. This is the time between the publication of a value of the managed object by the agent node and the retrieval of the same value by a subscribed manager node. Experimental results obtained when measuring the propagation delay with 120 managers, without churn and under churn, are given in Figure 4.a and Figure 4.b, respectively; we calculate the propagation delay for each received monitored value.
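The churn workload described above can be sketched as follows: lognormal session durations with (μ = 2.2, σ = 1.5) and Poisson-distributed arrivals per second. The arrival rate λ = 2.5/s and the time units are assumptions for illustration; the paper's model only bounds arrivals to 0–5 users per second.

    import java.util.Random;

    /** Sketch of the churn workload used in the experiments, following [18]. */
    public class ChurnSketch {

        static final Random rnd = new Random();

        // Lognormal session duration: exp(mu + sigma * N(0,1)).
        static double sessionDuration(double mu, double sigma) {
            return Math.exp(mu + sigma * rnd.nextGaussian());
        }

        // Poisson-distributed number of arrivals per second (Knuth's method).
        static int arrivalsPerSecond(double lambda) {
            double limit = Math.exp(-lambda), p = 1.0;
            int k = 0;
            do { k++; p *= rnd.nextDouble(); } while (p > limit);
            return k - 1;
        }

        public static void main(String[] args) {
            for (int s = 0; s < 5; s++)
                System.out.printf("t=%ds: %d joins, sample session %.1f%n",
                        s, arrivalsPerSecond(2.5), sessionDuration(2.2, 1.5));
        }
    }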
Fig. 3. Information loss: (a) CDF; (b) variation as a function of time
Fig. 4. Propagation delay: (a) Without churn; (b) Under churn
We notice two levels of delay in both figures. The top level, where the propagation delay is higher, corresponds to the bottom of the Scribe multicast tree, and the second one is the level just below the root node. We observe a propagation delay in the range of 5–30 ms. We do not observe an influence of churn on the propagation delay, because we only measure this delay at a manager when it holds the same monitoring value as the one sent by the agent. Scalability. Scalability is the capacity of a network to operate correctly and to ensure accuracy despite a large number of nodes in the network. A good
way to evaluate the scalability of the management service is therefore to look at the end-to-end communication delay under a large number of nodes. This delay is measured between the agent node and the managers. Because of the small number of machines available on our test bed, we rely on the FreePastry simulator to consolidate the evaluation of the scalability of our approach. We use the provided Euclidean network topology model to reproduce the latency of a real P2P network topology. Furthermore, we produced 12 simulation runs for each experiment, and each value represented in the curves is the average of 3.6 × 10^5 values. In order to evaluate the scalability of our approach, we measure its capacity to handle a large number of managers for each managed object. The curves of Figure 5.a and Figure 5.b show the simulation and the experimental results, respectively.
Fig. 5. Impact of the number of managers on the propagation delay: (a) simulation results; (b) experimental results
We notice in this figure that both propagation delay curves for our approach follow a logarithmic distribution O(log_16(M)), where M is the number of managers. We observe a maximum average propagation delay on the order of 950 ms, obtained when using 2 × 10^4 managers. Our objective is not to evaluate the performance of the Pastry DHT, but that of our deployment approach; this is why we measure the delay between the agent and the managers, and not only the search time for a node, as measured in [2]. Nonetheless, our simulation and experimental results show that, by exploiting the properties of the DHT, the propagation delay in our approach does indeed scale. Furthermore, the propagation delay follows the same distribution in both simulation and experimentation.
5 Discussion and Conclusion
In this paper we have introduced a DHT-based infrastructure for the deployment of network and service management information on a P2P network. We first
propose a classification of management information features according to aggregation, persistence, accuracy, security and dynamics, before applying our deployment approach to one category of information (non-aggregated, persistent, accurate and dynamic). In order to validate this P2P-based approach, we applied it to the monitoring service, whose performance we evaluated with both experimental and simulation tests, considering network scalability, delivery delay and loss of information under churn. The experimental and simulation results show that the DHT-based approach is a satisfying solution. First, under churn, information loss is low, except in the case where the root node leaves the network. Moreover, the propagation delay is reasonable, and we do not observe an influence of churn on this delay. Furthermore, as managers are added, the propagation delay in our approach follows a logarithmic distribution; the results thus show that our approach is scalable with regard to propagation delay. For future work, we plan to enhance the classification of network management information and to consolidate our evaluation by reproducing the actual behaviour of some managed objects. Finally, we consider that the deployment problem is the same as the one addressed in database management, with common aspects such as replication, access control and complex queries. In the long term, we will therefore explore approaches used in the data management community and study the possibility of employing them in the context of decentralized network management information deployment.
References
1. Abiteboul, S., Marinoiu, B., Bourhis, P.: Distributed monitoring of peer-to-peer systems. In: Proceedings of the 24th IEEE International Conference on Data Engineering, pp. 1572–1575 (2008)
2. Binzenhöfer, A., Tutschku, K., Graben, B., Fiedler, M., Arlos, P.: A P2P-based framework for distributed network management. In: Proceedings of the 2nd International Workshop of the EURO-NGI Network of Excellence, pp. 198–210 (2006)
3. Castro, M., Druschel, P., Kermarrec, A.-M., Rowstron, A.: Scribe: a large-scale and decentralized application-level multicast infrastructure. IEEE Journal on Selected Areas in Communications 20(8), 1489–1499 (2002)
4. dos Santos, C.R.P., Santa, L.F.D., Marquezan, C.C., Cechin, S.L., Salvador, E.M., Granville, L.Z., Almeida, M.J.B., Tarouco, L.M.R.: On the design and performance evaluation of notification support for P2P-based network management. In: Proceedings of the 2008 ACM Symposium on Applied Computing, pp. 2057–2062. ACM, New York (2008)
5. Doyen, G., Festor, O., Nataf, E.: A CIM extension for peer-to-peer network and service management. In: de Souza, J.N., Dini, P., Lorenz, P. (eds.) ICT 2004. LNCS, vol. 3124, pp. 801–810. Springer, Heidelberg (2004)
6. Doyen, G., Ploix, A., Lemercier, M., Khatoun, R.: Towards a generic environment for the large-scale evaluation of peer-to-peer protocols. In: Proceedings of the 2008 Networking and Electronic Commerce Research Conference (2008)
7. Ehrig, M., Schmitz, C., Staab, S., Tane, J., Tempich, C.: Towards evaluation of peer-to-peer-based distributed information management systems. In: Proceedings of the AAAI Spring Symposium on Agent-Mediated Knowledge Management, pp. 73–88 (2003)
8. Granville, L., da Rosa, D., Panisson, A., Melchiors, C., Almeida, M., Tarouco, L.: Managing computer networks using peer-to-peer technologies. IEEE Communications Magazine 43(10), 62–68 (2005)
9. Jelasity, M., Montresor, A., Babaoglu, O.: Gossip-based aggregation in large dynamic networks. ACM Trans. Comput. Syst. 23(3), 219–252 (2005)
10. Leitao, J., Pereira, J., Rodrigues, L.: Large-scale peer-to-peer autonomic monitoring. In: Proceedings of the 3rd IEEE Workshop on Distributed Autonomous Network Management Systems, in conjunction with GLOBECOM, pp. 1–5 (2008)
11. Makhloufi, R., Bonnet, G., Doyen, G., Gaïti, D.: Decentralized aggregation protocols in peer-to-peer networks: a survey. In: Proceedings of the 4th IEEE International Workshop on Modelling Autonomic Communications, pp. 111–116 (2009)
12. Panisson, A., da Rosa, D.M., Melchiors, C., Granville, L.Z., Almeida, M.J.B., Tarouco, L.M.R.: Designing the architecture of P2P-based network management systems. In: Proceedings of the 11th IEEE Symposium on Computers and Communications, pp. 69–75. IEEE Computer Society, Los Alamitos (2006)
13. Van Renesse, R., Birman, K.P., Vogels, W.: Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining. ACM Trans. Comput. Syst. 21(2), 164–206 (2003)
14. Rowstron, A., Druschel, P.: Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In: Guerraoui, R. (ed.) Middleware 2001. LNCS, vol. 2218, pp. 329–350. Springer, Heidelberg (2001)
15. Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H.: Chord: A scalable peer-to-peer lookup service for internet applications. In: Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 149–160. ACM, New York (2001)
16. Tout, R.N., Ghodous, P., Ouksel, A., Tanasoiu, M.: Data persistence in P2P backup systems. In: Proceedings of the 16th ISPE International Conference on Concurrent Engineering, vol. 1, pp. 149–156 (2009)
17. Xu, H., Xiao, D.: Towards P2P-based computer network management. International Journal of Future Generation Communication and Networking 2(1), 25–32 (2009)
18. Yu, H., Zheng, D., Zhao, B.Y., Zheng, W.: Understanding user behavior in large-scale video-on-demand systems. SIGOPS Oper. Syst. Rev. 40(4), 333–344 (2006)
On the Combined Behavior of Autonomous Resource Management Agents
Siri Fagernes¹ and Alva L. Couch²
¹ Faculty of Engineering, Oslo University College, Oslo, Norway
[email protected]
² Computer Science Department, Tufts University, Medford, MA, USA
[email protected]
Abstract. A central issue in autonomic management is how to coordinate several autonomic management processes, which is assumed to require significant knowledge exchange. This study investigates whether two autonomic control units, which control the same system, can achieve some level of self-coordination with minimal knowledge exchange between them. We present the results from simulations on a model of two autonomous resource controllers influencing the same system. Each of the controllers tries to balance system utility with the cost of resource usage, without knowledge of the second controller. Simulations indicate that coordination of autonomic management processes is possible, as the system performs close to optimally with minimal knowledge exchange between the two resource controllers. Keywords: self-organization, resource management, autonomic computing, agent management, distributed agents.

1 Introduction
The vision of autonomic computing is to build systems that are capable of self-management, adapting to changes by making their own decisions based on status information sensed by the systems themselves [1,2]. Current solutions to autonomic computing rely upon "control loops" in which the managed system is monitored, changes are planned and implemented, and the effectiveness of changes is evaluated [3]. For many authors, the definition of control loops as provided in IBM's autonomic computing manifesto [1] is equated with the definition of autonomic computing. However, there are some drawbacks to IBM's control architecture. One drawback is that distinct control loops are difficult to combine into a coherent management strategy. Another drawback is that the effectiveness of a single control loop depends upon the accuracy of its model of system behavior during the
planning phase [4]. Vendors deal with these two drawbacks by constructing autonomic controllers that gather and act upon global knowledge of system behavior. This knowledge is exchanged with other controllers via what might be called a "knowledge plane" [5,6]. As a result, vendors pursue large monolithic approaches to autonomic control, in which an overarching controller gathers and acts upon data concerning all aspects of performance. The net result of this approach is that it is not practical for two non-communicating control approaches to collaborate or even co-exist. In this paper, we study an alternative approach to autonomic control that is designed to foster collaboration between controllers and minimize the knowledge exchange necessary between them. In an initial AIMS paper last year [7], Couch demonstrated that an autonomic controller could achieve near-optimal behavior with an incomplete model of system behavior. Couch claimed that this model of control would allow controllers to be composed with minimal knowledge exchange. In this paper, we test that claim by simulating the results of composing two controllers for front-end and back-end services under a variety of conditions. We demonstrate that composition is possible, and that reasonable results can be achieved from two controllers, provided that each controller is aware of when the other is operating. The remainder of this paper is organized as follows. Section 2 gives a brief overview of related work. In Section 3, we introduce the extended theoretical model which forms the basis of the simulations in the experiments, which are described in Section 4. Section 5 gives an overview of our observations and results, which we discuss in Section 6.
2 Related Work
There is a wealth of literature on the control-theoretic approach to autonomic management. Traditional approaches to resource control rely upon control theory to predict system response in the presence of changes [8]. These authors use the system model to predict overall response. Therefore, the usefulness of the technique varies with the accuracy of available models. Another technique for gaining optimal control of application resources involves using optimized sampling strategies to compensate for lag times from unrestrained sampling [9]. Rather than controlling a short-term sampling window, these authors sample based upon experimental design principles and a concept of which samples are more "important" than others. Other alternatives that do not rely on control theory are still dependent on constructing accurate models [10]. In these studies, an optimizer uses a model to predict the relationship between resources and behavior, and plans without control-theoretic assistance.
3 Model

As a sample case for our composition problem we have used a system consisting of a back-end server and a front-end server which, when combined, provide a
service. The performance of the system is determined by the total response time P incurred in providing the service. The response time consists of two main parts:

– P1: the response time at the front-end server.
– P2: the response time at the back-end.

The response time will be influenced by several factors, but in this study we will isolate two such variables in the resource domain, so that R1 is an adjustable resource variable on the front-end server, and R2 on the back-end. For the purposes of this paper, in like manner to Couch's simulations [7], we assume ideal scalability, so that P1 = L1/R1 and P2 = L2/R2. The total response time or performance P is then expressed by

P = L1/R1 + L2/R2,    (1)

where L1 and L2 are the current loads at each of the servers. An illustration of the scenario is shown in Figure 1.
Fig. 1. The total system response time. Transmission time is ignored.
The associated cost S of providing the service is the sum of the costs of the current resource usage, which is defined as

C(R1, R2) = C(R1) + C(R2) = R1 + R2.    (2)

The associated value of receiving the service, which is given the notation V(P), is defined (in like manner to Couch's initial paper) according to the following expression:

V(P) = 200 − P = 200 − (L1/R1 + L2/R2),    (3)
illustrating how the perceived value decreases with increasing response time, and increases with increases in R1 and/or R2. This cost function is chosen so that the overall reward function V(P) − C(R1, R2) has a theoretical optimum value.
3.1 Feedback and Control Loops
The core of this study is to analyze how the closure operators control/adjust their individual resource parameters. Both operators receive feedback on the perceived value V of the delivered service S, and based on local knowledge of the cost of the current resource usage, each operator can make its local estimate of the derivative dV/dR. The dynamics of the model is simple; if dV/dR is positive it will be beneficial to increase R, and if it is negative the resource usage should be decreased. An obstacle lies in the fact that the change in value dV is a result of recent changes in both R1 and R2. It is challenging to estimate dV/dR1 and dV/dR2 correctly, since the operators controlling R1 and R2 do not know about each other's presence.
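To make this concrete, here is a minimal Python sketch of one such control step; the class name, the fixed step size, and the finite-difference gradient estimate are our own illustrative assumptions, not part of the original simulator.

class Operator:
    """One closure operator adjusting its resource variable R."""
    def __init__(self, R=50.0, step=1.0):
        self.R = R
        self.step = step
        self.prev_V = None
        self.prev_R = None

    def update(self, V):
        # Estimate dV/dR by finite differences; the estimate is noisy
        # because V also reflects the other operator's recent changes.
        if self.prev_V is None or self.prev_R == self.R:
            dV_dR = 2.0  # no usable history yet: optimistic probe upward
        else:
            dV_dR = (V - self.prev_V) / (self.R - self.prev_R)
        self.prev_V, self.prev_R = V, self.R
        # Balance value against cost: C(R) = R, so the marginal cost is 1.
        self.R += self.step if dV_dR > 1.0 else -self.step
        self.R = max(self.R, 1.0)  # keep the resource level positive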
3.2 Theoretical Optimum and False Optimum
The choice of this model makes calculating the theoretical optimum of the operators' choices both simple and straightforward. The goal of the system management is to balance cost and value, i.e. to maximize the net value (value minus cost). This is equivalent to maximizing

V(P) − C(R1) − C(R2) = 200 − L/R1 − L/R2 − R1 − R2,    (4)

which trivially gives the following values for the optimal choice of R1 and R2:

R1opt = R2opt = √L,    (5)

where L represents the common load L = L1 = L2. Based on the fact that these two operators do not know about each other, it is reasonable to expect them to make a wrong estimate of the optimal choice of resource usage. Each operator receives feedback of the overall value in the system, but does not take into consideration the cost of the other resource variable. In a scenario where the resource operators work concurrently with equal resource values R, the perceived net value function would look like this:

V(R) − C(R) = 200 − 2L/R − R,    (6)

which gives the optimal value Rfalse = √(2L), which we refer to as the false optimum in the remainder of this paper, because some management designs achieve this optimum instead of the true optimum.
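To make the numbers concrete, the following lines evaluate the net value and both optima for the constant load L = 1500 used in the experiments below; this is a direct transcription of Equations (4)-(6), with nothing assumed beyond the formulas themselves.

import math

L = 1500.0

def net_value(R1, R2, L1=L, L2=L):
    """Net value V(P) - C(R1) - C(R2), Equation (4)."""
    return 200.0 - L1 / R1 - L2 / R2 - R1 - R2

R_true = math.sqrt(L)       # true optimum (5): about 38.7
R_false = math.sqrt(2 * L)  # false optimum:    about 54.8
print(net_value(R_true, R_true))    # about 45.1, the theoretical maximum
print(net_value(R_false, R_false))  # about 35.7 at the false optimum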
4 Experiment
We implemented a simulator based on the two-operator model, with the aim of simulating how the system behaves, and of determining whether the independent resource controllers are able to adjust the resource level to a level that is close to the optimal values. The system was tested for two different types of load L:

– Constant load, where L = 1500.
– Sinusoidal load, where L(t) = 1000 sin((t/p) · 2π) + 2000, which makes the load vary periodically between 1000 and 3000.

We developed and tested two methods for updating the resource variable. Under our first method, both controllers receive system feedback and simultaneously update their resource variable. We designate this method as the concurrent algorithm in the remainder of the paper. Under the second approach, each controller adjusts its variable while the other remains constant through a specific number of iterations (cycles). During each cycle, one update of the system performance is fed back to the resource controllers. In this approach, which we designate as the alternating algorithm, each controller tries to converge to its optimal resource usage level, without interference of the other controller's updates.
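The two update schedules can be sketched as follows, reusing the Operator class from the sketch in Section 3.1; the loop structure and bookkeeping are our own scaffolding rather than the authors' simulator.

def simulate(op1, op2, steps=300, mode="alternating", cycles=10, L=1500.0):
    """Run two operators against the shared system under either schedule."""
    net_values = []
    for t in range(steps):
        V = 200.0 - L / op1.R - L / op2.R  # system feedback, Equation (3)
        if mode == "concurrent":
            op1.update(V)  # both controllers act on the same feedback
            op2.update(V)
        else:
            active = op1 if (t // cycles) % 2 == 0 else op2
            active.update(V)  # only one controller adjusts per turn
        net_values.append(V - op1.R - op2.R)  # achieved net value
    return net_values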
5 Results
This section presents and explains the main findings from the simulations. The metrics for evaluating model performance are

1. The actual net value produced by the system throughout each simulation, compared with the theoretically highest value.
2. The actual resource usage (actual values of R1 and R2) compared with the theoretical optimal values for R1 and R2.

The simulator was run for a range of values of R1 and R2, but due to space limitations only a small selection is presented here. However, the presented findings occurred for all tested values.

True and False Optima

As computed earlier in this paper, the theoretically optimal behavior of both resource controllers is a resource level of R1 = R2 = √L. In other words, a desirable result would be to have both variables R1 and R2 converge to √L. The plots we computed of the actual behavior of R1 and R2 all indicate some kind of convergence, but this convergence does not necessarily come to the true optimum behavior.
Concurrency Leads to False Optima

Throughout the simulation results, concurrent updates of the resource variables R1 and R2 by both controllers acting independently resulted in convergence to false optima for at least one of the variables. The value of the false optima depended on the initial resource variables. If the initial values of R1 and R2 were identical, both variables appeared to converge to the same false optimum √(2L), as seen in Figure 2.
Fig. 2. Actual resource usage (left) and net value (right), under constant load. Initial resource values: R1 = R2 = 50. In the left figure, the dashed straight line represents the false optimum √(2L), while the lower solid line represents the actual optimum level √L. Net value (right) is compared to the theoretical maximum net value (represented by the upper straight line).
However, if R1 and R2 have different initial values, the lowest one ends up "dominating" by converging close to the optimal value (√L). The other gets "stuck" at a higher level, and sometimes even increases from its initial level. For initial values far from the optimal, this trend has a devastating impact on the efficiency of the system (Figure 3). Concurrent, uncoordinated resource adjustments result in poor performance based on achieved net value, particularly when there is a difference in the initial values of R1 and R2. If the difference between these is above a certain threshold, the system produces negative net value (Figure 4).

Alternating between Processes Leads to True Optima and Thrashing

Our next approach was to alternate the two processes. We constrained one of them from acting while the other adjusted its resource parameter. As illustrated in Figure 5, the apparent effect is that the resource values chosen by both processes now oscillate around the theoretical optimum Ropt. In our view, this algorithm has managed to "escape" the false optimum. The downside is that the
Fig. 3. Actual resource usage (left) and net value (right), under constant load. Initial resource values: R1 = 1, R2 = 50. In the left figure, the lower oscillating curve represents the resource variable R1, while the upper represents R2. The straight line in the left figure represents the actual optimum level √L. In the right figure, the straight line represents the theoretical maximum net value.
Fig. 4. Achieved net system value, constant load. Initial resource values: R1 = 1, R2 = 100.
oscillations are "larger", which leads to worse performance with respect to total net value, as illustrated in Figure 6. Another feature of this experiment is that the system seems to be less vulnerable to the choice of initial values of R1 and R2, compared to when the processes are run concurrently (Figure 4).
Fig. 5. Actual resource usage, constant load. Initial resource values: R1 = R2 = 50.
Fig. 6. Achieved net system value, constant load. Initial resource values: R1 = R2 = 50.
The Best-Case Situation

The next approach was to increase the time that each process was able to adjust while the other was kept constant. The idea was to let each process converge to its optimal value without interference from the other variable. We have run the algorithms for a varying measurement window and number of cycles. As seen in Figure 7, by keeping each resource variable constant for 10 cycles, the resulting net value has improved significantly (compared to the "1 cycle" case). However, increasing the number of cycles beyond 10 does not seem to improve things, as the net value is actually lower for cycles = 25 and 50.
Fig. 7. Achieved net system value, constant load. Initial resource values: R1 = R2 = 50.
Fig. 8. Achieved net system value, constant load. Initial resource values: R1 = R2 = 50. Alternating for 10 cycles.
Table 1. Theoretical efficiencies (constant load)

Win=3          Conc     Alt(1)   Alt(10)  Alt(25)  Alt(50)  Alt(75)  Alt(200)
R1=R2=1        0.815    0.934    0.977    0.993    0.988    0.993    0.911
R1=R2=50       0.794    0.911    0.995    0.987    0.986    0.986    0.990
R1=1, R2=50    0.335    0.936    0.995    0.993    0.988    0.990    0.909
R1=1, R2=100  -0.643    0.934    0.988    0.987    0.991    0.989    0.914
R1=R2=100      0.815    0.934    0.987    0.987    0.993    0.993    0.979
Fig. 9. Achieved net system value, sinusoidal load. Initial resource values: R1 = R2 = 50.
Keeping the number of cycles at 10 (which seemed to give the best results in some of the experiments), and varying the measurement window size for the input data, showed that increasing the window generates more chaotic behavior and worse performance with respect to achieved net value, as seen in Figure 8. However, the pattern of less efficient management when the number of cycles increased proved not to be consistent throughout the simulations, as seen in Table 1. Here we have systematically computed the theoretical efficiency as introduced in [7]. Theoretical efficiency was defined as the ratio of sums of achieved net value to the theoretically best:

E = Σi (Vi − Ci) / Σi (Vi^opt − Ci^opt),    (7)

where Vi^opt − Ci^opt is the best theoretical net value the system can produce at time i. As seen in Table 1, keeping each variable constant for 10–75 cycles gives significantly higher efficiency than just keeping them constant for one cycle (e.g. the values in column Alt(1) are much lower). However, the optimal number of cycles varies according to the choice of initial values of R1 and R2.
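In code, Equation (7) amounts to the following; representing the achieved and optimal net values as parallel per-timestep lists is our framing, not something prescribed by the paper.

def theoretical_efficiency(achieved, optimal):
    """Equation (7): sum of achieved net values over the run divided by
    the sum of the best theoretically possible net values."""
    return sum(achieved) / sum(optimal)

# E.g., under constant load the per-step optimum is the constant value
# net_value(R_true, R_true) from the earlier sketch:
# E = theoretical_efficiency(net_values, [45.08] * len(net_values))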
Increasing the measurement window size from 3 seems to have undesirable effects, as seen in Figure 8, which is consistent with Couch's claim that more data sometimes hurts performance [7,11,12]. The results discussed here were obtained while running the simulator under constant load. We also ran the simulator under sinusoidal load, to confirm that the algorithms still retain the convergence effects reported by Couch [7]. One example is shown in Figure 9.
6 Conclusion
In this paper, our goal was to understand the mechanisms that are required in order to achieve self-organization of independent autonomic components or processes. The findings indicate that controllers can collaborate effectively when they are constrained under a simple process of taking turns, and without requiring any other information exchange. This finding challenges the popular belief that a knowledge plane and a concept of global knowledge are necessary to allow multiple processes to co-exist. For this kind of self-organization to be possible, our experiments indicate that it is sufficient that the autonomic entities adjust their behavior at different times, i.e. completely concurrent execution of the controllers' behavior is not sufficient for the system as a whole to achieve its goals. An algorithm that lets each entity affect the common environment or system (and receive system feedback) for several iterations without the interference of other components seems to help the components achieve close-to-optimal behavior. We found that while the time window in which each resource controller operates alone must be of some size, the measurement window is constrained to be quite small. This corresponds well with earlier results [7,11] which concluded that high reactivity can be more efficient than requiring large numbers of measurements. At the outset, we challenged the idea that management agents must be tightly coupled to coordinate their actions. Our initial hypothesis was that this coupling could be much less than is present in other proposed and actual agents. This study showed that the minimal level of coupling needed was temporal coordination, which is much less than the total knowledge exchange proposed in competing solutions.
References

1. Kephart, J.O., Chess, D.M.: The vision of autonomic computing. IEEE Computer (2003)
2. Huebscher, M.C., McCann, J.A.: A survey of autonomic computing - degrees, models, and applications. ACM Comput. Surv. (2008)
3. Hellerstein, J.L., Diao, Y., Parekh, S., Tilbury, D.M.: Feedback Control of Computing Systems. John Wiley & Sons, Chichester (2004)
4. Kephart, J.O.: Research challenges of autonomic computing. In: Proceedings of the 27th International Conference on Software Engineering (2005)
5. Macedo, D.F., dos Santos, A.L., Nogueira, J.M.S., Pujolle, G.: A knowledge plane for autonomic context-aware wireless mobile ad hoc networks. In: Pavlou, G., Ahmed, T., Dagiuklas, T. (eds.) MMNS 2008. LNCS, vol. 5274, pp. 1–13. Springer, Heidelberg (2008)
6. Mbaye, M., Krief, F.: A collaborative knowledge plane for autonomic networks. Autonomic Communication (2009)
7. Couch, A., Chiarini, M.: Dynamics of resource closure operators. In: Proceedings of Autonomous Infrastructure Management and Security 2009, Twente, The Netherlands, June 30-July 2 (2009)
8. Padala, P., Shin, K.G., Zhu, X., Uysal, M., Singhal, S., Wang, Z., Merchant, A., Salem, K.: Adaptive control of virtualized resources in utility computing environments. In: EuroSys (2007)
9. Xi, B., Liu, Z., Raghavachari, M., Xia, C.H., Zhang, L.: A smart hill-climbing algorithm for application server configuration. In: WWW, pp. 287–296 (2004)
10. Padala, P., Hou, K.-Y., Shin, K.G., Uysal, M., Singhal, S., Wang, Z., Merchant, A.: Automated control of multiple virtualized resources. In: EuroSys, pp. 13–26 (2009)
11. Couch, A., Chiarini, M.: Combining learned and highly-reactive management. In: Proceedings of Managing Autonomic Communications Environments (MACE) 2009, Venice, Italy, October 27 (2009)
12. Couch, A., Burgess, M., Chiarini, M.: Management without (detailed) models. In: González Nieto, J., Reif, W., Wang, G., Indulska, J. (eds.) ATC 2009. LNCS, vol. 5586, pp. 75–89. Springer, Heidelberg (2009)
Autonomous Resource-Aware Scheduling of Large-Scale Media Workflows

Stein Desmet, Bruno Volckaert, and Filip De Turck

Department of Information Technology (INTEC) - IBCN, Ghent University - IBBT, Gaston Crommenlaan 8 bus 201, B-9050 Gent, Belgium
[email protected]
Abstract. The media processing and distribution industry generally requires considerable resources to be able to execute the various tasks and workflows that constitute its business processes. The latter processes are often tied to critical constraints such as strict deadlines. A key issue herein is how to efficiently use the available computational, storage and network resources to be able to cope with the high work load. Optimizing resource usage is not only vital to scalability, but also to the level of QoS (e.g. responsiveness or prioritization) that can be provided. We designed an autonomous platform for scheduling and workflow-to-resource assignment, taking into account the different requirements and constraints. This paper presents the workflow scheduling algorithms, which consider the state and characteristics of the resources (computational, network and storage). The performance of these algorithms is presented in detail in the context of a European media processing and distribution use-case.
1 Introduction
In the audiovisual industry, the resources required to perform various tasks such as editing and processing of video and audio are generally quite considerable, not to mention expensive. This is not only true for computational resources, but for storage and communication resources as well. Converting video material from one format to another can easily take several hours, even on today's multi-core servers. Raw Standard Definition material used during the editing process has bitrates of 60Mbps or more, while HD material easily quadruples these bitrates. This obviously creates high demands on storage and network. Especially for independent film makers, these resource requirements and their cost might pose an obstacle. The Independent Films In Progress (IFIP) [1] project designs a platform aimed at the media industry, especially independent film. Its goal is to provide a framework supporting the development of independent productions. Amongst others, it provides a platform for users to automate and submit production processes. The platform offers the possibility for users to submit a series of tasks - e.g. video transcoding or CGI rendering - or workflows they want to be executed within a certain budget and timeframe. The platform autonomously locates appropriate resources for the execution of these workflows, and determines
Fig. 1. Overview of the IFIP platform. The scheduler locates resources for the workflows submitted at the portal, and launches them at an appropriate time.
the most appropriate start time of the workflows, taking all their constraints into account. This is illustrated in Figure 1. A user submits a workflow at the IFIP portal, which forwards the flow to the scheduler. The scheduler examines the workflow and the user's constraints defined in the metadata. For each task in the workflow, the scheduler locates resources hosting the required services or applications - taking into account both the user's demands and the current state of the infrastructure - and adds the task-to-resource mapping to the workflow description. Additionally, an optimal starttime for the workflow is determined as well. At that starttime, the scheduler submits the workflow description to the workflow engine, which subsequently executes the workflow. Considering such a multi-client distributed media processing platform, an efficient allocation of all resources - computational, storage and communication - is of the utmost importance for being able to meet the users' requirements and offer high Quality-of-Service such as responsiveness or budget considerations. Efficient resource usage also results in the ability to handle higher loads. This optimal resource selection is not only location-based (which resource to use), but also time-based (when to use the resource). This scheduling problem is known to be NP-hard [2]. There is already a substantial amount of research on the topic of scheduling tasks in grids and other heterogeneous distributed environments [3,4,5]. However, media-oriented cases, including the one presented here, have specific, highly data-intensive and time-constrained demands, to which the workflow-to-resource mapping must be specifically tuned. The network in a media environment, for example, requires special consideration. Due to typical network packet patterns inherent to media transfers, media traffic turns out to behave differently from regular IT traffic. This makes unreliable throughput, packet loss and lost connections due to oversubscription far more likely than in normal situations [6,7]. The MediaGrid framework (and accompanying MediaNSG simulator) [8] is aimed at employing grid technology in a media production and distribution environment. The IFIP project builds further on this framework, and on the research work performed for the GEISHA project [7], the predecessor of the IFIP project. The GEISHA project focused on the actual implementation of a service oriented architecture in media grids, and the associated challenges.
The research project GridCast [9] is undertaken by the British Broadcasting Corporation (BBC) and the Belfast e-Science Center, and is also aimed at developing a media grid. However, it is targeted at the specific BBC topology, whereas we aim to provide a general framework. Furthermore, scheduling in GridCast is mainly focused on scheduling data transfers only. Another project aimed at media oriented grids is MediaGrid.org [10], an open standards group for the development of a grid platform specifically designed for digital media. Other projects include mmGrid [11], a middleware for supporting multimedia applications in a grid environment, and Parallel-Horus [12], a cluster programming library for applications in multimedia grids. However, these projects mainly focus on the design and implementation of media grids, rather than the problem of making efficient use of available grid resources. This paper presents a number of distinct algorithms to schedule workflows to resources in a media environment. The heuristics presented are an adaptation of the list scheduling technique [13] and use an offline - sometimes also referred to as static - scheduling approach. This refers to the time at which scheduling decisions are taken. In the offline approach, information regarding the state of all resources and every workflow is assumed to be known. This is a valid approach, as users submit their workflows at the portal and expect results by a certain time, allowing the scheduler to periodically perform scheduling of the pending workflows. For example, each workflow that needs to be executed the next day is known, so scheduling decisions can be made overnight. Online scheduling on the other hand schedules workflows 'on-the-fly' when they are submitted, based on the current system load. This increases responsiveness, but leads to less efficient resource usage, as the global overview of the system is reduced. This article is organized as follows. Section 2 describes the problem in greater detail and defines the model used. Section 3 gives an overview of the designed algorithms, while Section 4 thoroughly evaluates these algorithms. Finally, Section 5 presents directions for future research.
2 Problem Model Description
We assume k workflows {w1, w2, ..., wk} competing for resources, whose submit time and deadline are known, as well as their maximum allowed budget. Each workflow consists of a set of inter-dependent atomic tasks {t1, t2, ..., tn} which are represented in a Directed Acyclic Graph (DAG). The edges between nodes represent task dependencies, and a node either represents a task or a control structure such as a decision or a parallel construct. The particular execution path that will be chosen by a decision construct is not known until runtime, i.e. workflows are non-deterministic. By nature, a loop construct is not supported by a DAG. Loops can be handled by for example unfolding or peeling them, as described in [14]. Each task requires a specific service to execute, such as for example a transcoder or a CGI renderer. It retrieves its input data from a data repository, and stores its output data on a data repository. There is no direct data exchange between tasks, unless through an intermediary data repository. Tasks can retrieve
and store their data either streaming or non-streaming. Each task requires a predefined processing time on a standard processor. The network consists of a set of nodes, interconnected by a number of edges. These edges represent network connections and are considered to be unidirectional. In other words, to create a full duplex link, two edges are needed. Links offer a certain bandwidth at a certain use cost. A node represents a network node, to which a Computational Resource (CR) or Data Resource (DR) can be attached. The routing between two nodes is known in advance and is assumed to be fixed. Computational Resources offer one or more service types - e.g. transcoding or CGI rendering - at a certain execution speed, expressed in relation to a standard processor. Data Resources are the data repositories and can be used both for data retrieval and for storing output data. Similar to Computational Resources, Data Resources have a particular read speed and write speed, and an associated use cost. Every task must make advance reservations on each resource (computational, data, network) it uses. In other words, the task reserves exclusive use of the resource during a certain period. Note that this reservation system also helps to avoid the earlier mentioned network oversubscription danger. Rather than reserving each resource for the entire duration of a task, we have chosen to reserve them only when they are actually needed (i.e., a Data Resource used for data retrieval is only reserved for the time it takes to actually read the input, not for the whole input - processing - output cycle). This improves resource usage, but complicates the reservation procedure, as dependencies between various reservations need to be taken into account. Furthermore, for Data Resources a distinction is made between reservations for reading and writing. It is assumed that existing files are never overwritten, so reading from and writing to the same Data Resource can occur simultaneously. Furthermore, it is required that every resource can throttle its speed. Suppose a Computational Resource is able to perform a task at a certain speed, but the Data Resource is not able to supply the input data at that speed. It is then necessary for the Computational Resource to slow down until both speeds match.
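To fix ideas, the entities of this model can be captured as follows; all class and field names are our own choices for illustration and do not reflect the IFIP platform's actual data model.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Task:
    service: str                  # required service, e.g. "transcode"
    ref_time: float               # processing time on a standard processor
    streaming_in: bool
    streaming_out: bool
    parents: List["Task"] = field(default_factory=list)

@dataclass
class Resource:
    speed: float                  # relative to the standard processor
    cost: float                   # use cost
    # exclusive advance reservations, stored as (start, end) intervals
    reservations: List[Tuple[float, float]] = field(default_factory=list)

@dataclass
class Workflow:
    tasks: List[Task]
    submit_time: float
    deadline: float
    budget: float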
3 Scheduling Algorithm Details
The scheduling algorithm’s aim is to find suitable resource triplets (i.e., input Data Resource, Computational Resource and output Data Resource) with sufficient network interconnections for each workflow activity. Resource interconnection characteristics are equally responsible for workflow execution times (and cost) and must be taken into account, in order to avoid creating network bottlenecks. Determining a suitable set of reservations for a resource combination is no trivial task. Reservation length depends on resource speed and task requirements, but also on whether or not the task is streaming its data. Streaming input requires equal duration of input retrieval and task execution, as we assume input data will be streamed during the full time spent on processing.
Fig. 2. Resource reservation order according to streaming or non-streaming nature
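One way to turn these ordering rules into concrete intervals is sketched below; the overlap logic is our reading of Figure 2 (streaming phases run concurrently with processing, non-streaming phases precede or follow it), and the function is illustrative only.

def reservation_intervals(start, t_in, t_proc, t_out, stream_in, stream_out):
    """Return the (input, processing, output) reservation intervals of a
    task, given its streaming flags. Streaming phases are assumed to be
    throttled so that their duration matches the processing time."""
    if stream_in:
        t_in = t_proc                    # consumed while processing
        proc_start = start
    else:
        proc_start = start + t_in        # fully read before processing
    proc_end = proc_start + t_proc
    if stream_out:
        out = (proc_start, proc_end)     # produced while processing
    else:
        out = (proc_end, proc_end + t_out)
    return (start, start + t_in), (proc_start, proc_end), out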
As mentioned earlier, this may require throttling to match both resources' speeds, thus ultimately influencing reservation lengths. This obviously creates dependencies between the reservations of a single task. Furthermore, reservations need a specific order, depending on the nature of the input and output, as illustrated in Figure 2. The figure shows the different reservation orders that must be respected when dealing with streaming/non-streaming input/output data of individual workflow activities. Apart from these task-level reservation restrictions, there are also workflow-level restrictions on reservations. Tasks can only execute when all parent tasks have finished. Overlap of reservations on the same resource is not allowed, but provisions are made to support workflows with alternate execution paths. As the actual execution path is not known until runtime, reservations of tasks in alternate paths are allowed to overlap. Only one set of reservations will effectively be used at runtime. The algorithms presented here are based on the List Scheduling technique [13]. The general outline is shown in Algorithm 1. The heuristic is initialized by adding the root nodes of every workflow to the list Q. Subsequently, a priority is assigned to each task in Q according to a certain strategy, discussed in Section 3.1. The task with the highest priority is selected for assignment and removed from Q. The list Q is somewhat similar to a priority queue, but the priorities of its content need to be re-evaluated in each iteration of the algorithm, as the priority of an unassigned task is usually based on that task's properties, the tasks that have already been assigned, and the associated system loads. The latter two change after every iteration of the algorithm. The selected task is then assigned to a triplet of resources and network interconnections according to certain criteria. Suitable reservations need to be made on the selected resources, taking into account existing reservations and inter-reservation dependencies. Different approaches can be considered to select the resources and reservations, as presented in Section 3.2. If no acceptable resource combination can be found for the selected task because each possible combination would violate a constraint, action is undertaken to handle this situation. As with deploying tasks and assigning scores, multiple approaches can be considered (Section 3.3).
Algorithm 1. Solve(Workflows, Resources)

  Q, Processed ← [ ]
  for each wi ∈ Workflows do
      Q ← Q + root(wi)
  while Q ≠ [ ] do
      HighestPriority ← 0
      CurrentTask ← null
      for each tj ∈ Q do
          Priority ← getPriority(tj)
          if Priority > HighestPriority then
              HighestPriority ← Priority
              CurrentTask ← tj
      Q ← Q − CurrentTask
      Result ← assign(CurrentTask, Resources)
      if Result = ConstraintViolation then
          HandleConstraintViolation(CurrentTask, Q, Resources)
      else
          Processed ← Processed + CurrentTask
          for each tj ∈ children(CurrentTask) do
              if ∀tk ∈ parents(tj) : tk ∈ Processed then
                  Q ← Q + tj
Finally, the children of the selected task are considered for insertion in Q, but only if all of a child's parent tasks have already been assigned to resources. This is necessary to preserve the execution order of the workflow. This entire process is repeated until there are no more tasks left in Q. In this discussion, we focus on two optimization objectives: the makespan of a workflow - its entire execution length - and the execution cost - the total cost for using the resources. Task deadlines and budget requirements are considered to be strict, i.e. violating them means the workflow cannot be executed.
3.1 Determining Task Importance
The following strategies to assign priorities to tasks in Q have been designed.

Random (RC). Assigns random priorities to tasks.

ClosestConstraintViolation (CCV). This algorithm assigns higher priority to tasks closer to violating a workflow constraint, i.e. closer to their deadline or budget. This naturally depends on the tasks of the workflow that have already been assigned. Giving priority to workflows closer to violating a constraint may prevent these workflows from violating that constraint, as they gain precedence for using the available resources.

ClosestRelativeConstraintViolation (CRCV). This algorithm is similar to the previous one, but the time or budget remaining values are first normalized,
Fig. 3. Two different tasks and the current reservations on their available resources
and priority is given based on these normalized values. In other words, a task whose workflow already uses a high percentage of its allowed budget or time interval receives priority.

TaskLeastAvailableResourceTime (TLART). This approach considers the set of available resources (computational, data and network) for a task and the occupation on these resources. This occupation is evaluated during the time interval in which the task is allowed to execute. Tasks where the average occupation across their available resources during this interval is higher receive higher priority. In other words, tasks whose available resources are already heavily in use receive priority. Figure 3 shows the available resources and their reservations for two different tasks. The resources of the task in Figure 3(a) are obviously more occupied than the resources of the task in 3(b). TLART would consequently give higher priority to the former task.

TaskMostAvailableResourceTime (TMART). This heuristic is exactly the opposite of TLART, as it gives precedence to tasks with lower average resource occupations. In Figure 3, TMART gives higher priority to the task in Figure 3(b). This is a somewhat greedy strategy, as tasks that are already known to have a high probability of finding a suitable assignment are given priority.
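Two of these priority functions might be realized as follows, reusing the dataclasses sketched in Section 2; the exact occupation measure and normalization are our own assumptions.

def occupation(resource, interval):
    """Fraction of the interval covered by existing reservations."""
    lo, hi = interval
    busy = sum(max(0.0, min(e, hi) - max(s, lo))
               for s, e in resource.reservations)
    return busy / (hi - lo)

def priority_ccv(workflow, now):
    # CCV: the less absolute time left before the deadline, the higher
    # the priority.
    return -(workflow.deadline - now)

def priority_tlart(resources, interval):
    # TLART: tasks whose candidate resources are already busy during the
    # task's allowed interval come first (TMART would negate this score).
    return sum(occupation(r, interval) for r in resources) / len(resources)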
3.2 Assigning Tasks to Resources
The following strategies are available to assign tasks to resources.

RandomInserter (RI). Randomly chooses a Computational Resource able to process the task and an input and output Data Resource containing the required data sets. The task is scheduled to start as early as possible, taking into consideration the current reservations on the Computational, Data and network resources. This process of finding reservations for a task is illustrated in Figure 4 for a task with non-streaming input and output. Figure 4(a) shows the initial situation. Figure 4(b) shows a first attempt at finding reservations for the current
Fig. 4. Finding the earliest possible reservations and starttime for a task
task. The first free interval on the input Data Resource is valid, and an overlapping free interval exists for the network link from the Data Resource to the Computational Resource. An adjacent interval is available on the Computational Resource, and another adjacent interval is present on the network link going from the Computational Resource to the output Data Resource. However, there is no valid interval available on the output Data Resource. Figure 4(c) shows the second attempt at finding suitable reservations. The second free interval on the input Data Resource is again valid, but if reading from the resource is started as early as possible, then there is insufficient overlap with the free interval on the network link. The correct solution is shown in Figure 4(d). Figure 4 only shows the case of a task with non-streaming input and output. The three other cases are handled differently, as input, output and processing times depend on each other.

SimpleInserterWithDelay (SIWD). The algorithm starts by choosing a Computational Resource from the available resources, and subsequently proceeds by choosing an input and output Data Resource for that particular Computational Resource. The resource is selected in terms of completion time, lowest use cost or a weighted average. Determining the best resource in terms of completion time considers both a resource's speed and occupation, by looking for the earliest usable time slot on that resource, as shown in Figure 5. In other words, a slower resource might be picked over a faster resource because it has an earlier time slot available. Reservations are made as described earlier for the random algorithm. Note that this strategy considers each resource separately. A Computational Resource might be selected because it has a very early available time slot, but
Fig. 5. Resource selection by the SimpleInserterWithDelay strategy
Fig. 6. Resource selection by the BestCombinationInserter strategy
this slot is not usable due to reservation dependencies on the other resources, and the Computational Resource may have been a poor choice. Furthermore, the practice of first choosing a Computational Resource and then Data Resources can often lead to suboptimal solutions [15].

BestCombinationInserter (BCI). This strategy improves SIWD by evaluating every possible resource combination of input Data Resource, output Data Resource and Computational Resource - including the network links involved - according to the criteria earliest completion time, lowest use cost or a weighted average of both. To determine the earliest completion time of a task on a particular resource combination, the earliest possible valid set of resource reservations must be found on that combination. The procedure for finding this reservation set is similar to the reservation procedure described for the RandomInserter algorithm (Figure 4). The BCI strategy is obviously computationally considerably more expensive than SIWD. BestCombinationInserter is illustrated in Figure 6, which lists two possible resource combinations for a task with streaming input and non-streaming output. While combination 6(a) has slower resources (note that the reservation interval lengths are longer), it is still preferable to 6(b) because it has an earlier completion time.

TieBestCombinationInserter (TBCI). This approach improves BestCombinationInserter by handling tie situations. If two combinations have the same completion time, then the selection is made based on the total cost of each combination, and vice versa.
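The BCI selection loop under the earliest-completion-time criterion can be sketched as follows; earliest_completion_time stands in for the reservation-finding procedure of Figure 4 and is an assumed helper, not shown here.

import itertools

def best_combination(task, input_drs, output_drs, crs):
    """BCI: exhaustively score every (input DR, CR, output DR) triplet and
    keep the combination whose valid reservation set completes earliest."""
    best, best_finish = None, float("inf")
    for din, cr, dout in itertools.product(input_drs, crs, output_drs):
        finish = earliest_completion_time(task, din, cr, dout)  # assumed helper
        if finish is not None and finish < best_finish:
            best, best_finish = (din, cr, dout), finish
    return best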
3.3 Handling Constraint Violations
Only one strategy to handle constraint violation situations is presented here, due to space limitations, although others are available. The DeleteViolatingWorkflow (DVW) strategy simply blacklists the offending task’s workflow and removes all reservations that have already been made for that workflow. This enforces strict constraints, and violating workflows are removed to free up resources.
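A sketch of DVW, assuming each reservation object records its owning workflow (a richer structure than the minimal interval tuples sketched in Section 2):

def delete_violating_workflow(workflow, resources, blacklist):
    """DVW: blacklist the offending workflow and release its reservations."""
    blacklist.add(workflow)
    for r in resources:
        r.reservations = [res for res in r.reservations
                          if res.workflow is not workflow]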
4 Results and Discussion

4.1 Developed Simulation Environment
Testlab evaluations and field trial implementations require a lot of effort, and are impractical for large-scale testing. Only rigorous testing on large-scale environments with different setups can provide a true comparison of the performance of different algorithms. Since building large testbeds is costly, time consuming and often impractical, simulation is an important aid in the evaluation of algorithms. Therefore, a simulation framework was designed and implemented in Java to facilitate the evaluation of the developed algorithms. It is able to simulate any combination of algorithm, network topology and workflow load. The configuration of network infrastructure, workflow load and the scheduling algorithm to use are supplied to the simulator in an XML format. Note that the workflow load supplied to the simulator is a fixed description consisting of a list of workflows including individual submit times and properties such as deadline and budget. This allows us to evaluate different scheduling strategies on the same resource topology and workflow load. For performance reasons, the simulator performs high-level simulation of network transfers (i.e. not down to packet level, but with transfer times determined based on data size, network route and individual network link bandwidth). The simulator logs detailed information on all tasks and resources.
4.2 Simulation Setup
The following evaluation setup has been used to obtain the results presented here. A simulation is run for each algorithm using the same load and network. These simulations are repeated a number of times with different loads and networks to obtain a view on the average performance of the algorithms. The basic network topology is randomly generated starting from the European network shown in Figure 7, but with varying parameters such as the available network bandwidth between nodes, the number, speed and cost of Computational Resources, and the number of Data Resources and the data repositories they contain. Bandwidth between cities is randomly chosen from 1Gbps, 100Mbps or 10Mbps. Cities have a number of nodes connected to them - always through 1Gbps links - on which the Computational and Data Resources reside. Data and Computational Resources never share a node, and each city has at least one Data Resource
Fig. 7. The European network as a result of the COST-292 project
and one Computational Resource. Three classes of Computational Resources exist, executing tasks at 0.5x, 1.0x or 2x reference speed, and for the simulations presented here, tasks can execute on every Computational Resource. All workflows are line workflows, i.e. there are no parallel or alternative paths. Tasks' input and output data sizes range from 768Mb to 46Gb, according to a random uniform distribution. Input and output can be streaming or non-streaming. Processing time depends on the size of the input data and the Computational Resource's speed. Processing 768Mb of data at reference speed takes 120 seconds, while processing 46Gb takes 2 hours at reference speed. A task has a probability of 0.3 to find its requested input data on a particular Data Resource. Likewise, it has a 0.3 probability to be able to store its output data on a particular Data Resource. Three different scenarios are evaluated, with workflow loads consisting of respectively 250, 500 and 750 workflows over a 24h interval, with uniformly distributed arrival times. These loads are chosen to stress the available resources and the algorithms.
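Such a synthetic load could be generated along the following lines; the numeric ranges come from the setup above, while the 50/50 streaming split and the workflow lengths are our own assumptions.

import random

def generate_workload(n_workflows, horizon=24 * 3600):
    """Draw line workflows roughly matching the simulation setup."""
    workflows = []
    for _ in range(n_workflows):
        submit = random.uniform(0, horizon)  # uniform arrivals over 24h
        tasks = [{
            "data_mb": random.uniform(768, 46 * 1024),  # 768Mb to 46Gb
            "stream_in": random.random() < 0.5,   # assumed 50/50 split
            "stream_out": random.random() < 0.5,  # assumed 50/50 split
        } for _ in range(random.randint(2, 6))]  # assumed workflow length
        workflows.append({"submit": submit, "tasks": tasks})
    return workflows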
4.3 Detailed Evaluation Results
The results on the following combinations of priority, assignment and constraint handling strategies are presented: RC-RI-DVW, CCV-SIWD-DVW, CCV-BCI-DVW, CCV-TBCI-DVW, CRCV-TBCI-DVW, TLART-TBCI-DVW and TMART-TBCI-DVW. The abbreviations for the different strategies are defined in Section 3. Figure 8(a) shows the average workflow acceptance rates for 20 different runs, as a function of an increasing workflow load. While these loads may not seem very high, recall that the execution of a single task can take several hours. Half of the runs prioritized makespan, the other half budget. The workflow acceptance rate is the percentage of workflows that can be successfully assigned to resources and executed. Recall that strict constraints are enforced, i.e. workflows that would violate a constraint are simply not started. Figure 8(b) - which shows the average Computational Resource occupation as a function of an increasing load - serves to illustrate that the decrease in
acceptance rates as loads go up is attributed to a saturation of Computational Resources, rather than to the algorithms. Note that it is very unlikely for resources to reach full utilisation, due to free time fragmentation on the one hand, and the dependencies between reservations on different resources on the other hand. From Figure 8(a), it is evident that randomly assigning workflows to resources results in unacceptable workflow acceptance rates. Likewise, using the naive SIWD resource assignment strategy produces barely higher acceptance rates. SIWD is inefficient because it considers Computational and Data Resources independently, which has several implications. Firstly, choosing a fast Computational Resource independently makes no guarantees about the available Data Resources and network links for that resource. Furthermore, always choosing the best Computational Resource ensures that these Computational Resources will quickly become unavailable. Further observations show that the tie-considering strategy of TBCI indeed improves the BCI strategy slightly, as it produces a higher acceptance rate. The results show that the performance of the CRCV priority heuristic compared to the CCV approach drops as the loads go up. The rationale behind the CRCV priority strategy is to give higher priority to workflows that have already used up relatively more of their budget or available time. However, the absolute time or budget remaining is important, as it allows CCV to assign lower priorities to tasks and workflows with loose constraints and high priority to workflows with strict constraints. This is however not the case for CRCV. This can result in situations where workflows or tasks with loose constraints are given priority and thus needlessly reserve fast or cheap resources, while they could have completed with slower or more expensive resources. The TMART priority algorithm performs considerably worse than TLART. Giving more priority to tasks that are unlikely to be deployed definitely increases the possibility of these tasks being deployed, without adversely affecting tasks that already have a good possibility of getting deployed. The opposite 'greedy' strategy TMART definitely has a negative impact on the number of workflows that can be deployed. Figure 9 displays the running times of the algorithms to determine a schedule, as a function of an increasing workflow load. The algorithms are executed in Sun Java 6 on an AMD Opteron 2350 machine with 8Gb of memory installed. The less complex RI and SIWD assignment strategies are obviously considerably faster than the BCI and TBCI strategies, which need to consider every resource combination for each task. Furthermore, the more complex priority assignment algorithms TLART and TMART also cause higher execution times, as they need to evaluate the occupation of every potential resource for each task. The higher execution time of TMART versus TLART can be attributed to the fact that TMART can deploy considerably fewer workflows than TLART. Consequently, the constraint violation handler is called more often for TMART, resulting in an increased execution time.
Fig. 8. Average results over 20 runs, as a function of an increasing workflow load. Loads are over a 24h interval. The standard deviations are shown as well.
Fig. 9. Average execution time of each algorithm as a function of the load, 24h interval. The standard deviations are shown as well.
5 Future Work
Areas for future work include a more extensive evaluation of the performance of the various algorithms, not limited to line workflows, but also including completely arbitrary workflows containing parallel paths and alternative paths of decision branches. Additionally, only one constraint handling strategy has been presented in this paper. Additional constraint violation handling strategies are available, and their evaluation must be presented as well. Alternate (meta-)heuristics may also be evaluated. Furthermore, online scheduling algorithms have been designed, and their detailed performance evaluation is ongoing at the time of writing.
6 Conclusions
The Independent Films In Progress (IFIP) project is a framework aimed at the media industry, providing a platform for users to automate and submit
production processes such as editing and distribution. Users submit a series of tasks or workflows they want to be executed within certain constraints. The platform autonomously assigns the appropriate resources to the users' workflows and executes them. A key issue in such a platform is how to efficiently allocate tasks to the available resources, thereby improving throughput, scalability and QoS. To this end, this paper presents a number of offline scheduling algorithms which are based on the List Scheduling technique, but use different strategies to handle the various stages of the heuristic, such as assigning priorities to tasks, task-to-resource assignment and constraint violation handling. Extensive evaluation of the developed algorithms has shown that they outperform naive implementations, and demonstrates the importance of selecting adequate priority, deployment and constraint handling strategies.
Acknowledgment. Part of the research described in this paper is funded through the IBBT-project IFIP.
References

1. IBBT: Independent Films In Progress (IFIP) (2008-2010), http://www.ibbt.be/en/project/ifip
2. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness (Series of Books in the Mathematical Sciences). W. H. Freeman, New York (January 1979)
3. Yu, J., Buyya, R., Ramamohanarao, K.: Workflow scheduling algorithms for grid computing. In: Metaheuristics for Scheduling in Distributed Computing Environments. Springer, Heidelberg (2008)
4. Dong, F., Akl, S.G.: Scheduling algorithms for grid computing: State of the art and open problems. Technical report, School of Computing, Queen's University (2006)
5. Kwok, Y.K., Ahmad, I.: Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Comput. Surv. 31(4), 406–471 (1999)
6. Andries, L.: Facing media traffic challenges. Broadcast Engineering (February 2010)
7. IBBT: Grid enabled infrastructure for service oriented high definition media applications (GEISHA), http://www.ibbt.be/en/project/geisha
8. Volckaert, B., Wauters, T., De Leenheer, M., Thysebaert, P., De Turck, F., Dhoedt, B., Demeester, P.: Gridification of collaborative audiovisual organizations through the MediaGrid framework. Future Gener. Comput. Syst. 24(5), 371–389 (2008)
9. Harmer, T.: GridCast - a next generation broadcast infrastructure? Cluster Computing 10(3), 277–285 (2007)
10. Walsh, A.E.: The media grid: A public utility for digital media. In: 8th International Symposium on Spatial Media (ISSM), jointly held with IEEE International Conference on Computer and Information Technology, IEEE CIT (2007)
11. Basu, S., Adhikari, S., Kumar, R., Yan, Y., Hochmuth, R., Blaho, B.E.: mmGrid: Distributed resource management infrastructure for multimedia applications. In: IPDPS 2003: Proceedings of the 17th International Symposium on Parallel and Distributed Processing, Washington, DC, USA, pp. 88–1. IEEE Computer Society, Los Alamitos (2003)
12. Seinstra, F.J., Geusebroek, J.M., Koelma, D., Snoek, C.G.M., Worring, M., Smeulders, A.W.M.: High-Performance Distributed Video Content Analysis with Parallel-Horus. IEEE MultiMedia 14(4), 64–75 (2007)
13. Chrétienne, P., Coffman, E.G., Lenstra, J.K., Liu, Z. (eds.): Scheduling Theory and its Applications. John Wiley and Sons, Chichester (1995)
14. Ardagna, D., Pernici, B.: Adaptive Service Composition in Flexible Processes. IEEE Transactions on Software Engineering 33(6), 369–384 (June 2007)
15. Alhusaini, A.H., Prasanna, V.K., Raghavendra, C.S.: A Unified Resource Scheduling Framework for Heterogeneous Computing Environments. In: Proc. Eighth Heterogeneous Computing Workshop (HCW 1999), April 12, pp. 156–165 (1999)
An Autonomic Testing Framework for IPv6 Configuration Protocols

Sheila Becker1, Humberto Abdelnur2, Radu State1, and Thomas Engel1

1 University of Luxembourg, Luxembourg
{sheila.becker,radu.state,thomas.engel}@uni.lu
2 MADYNES - INRIA Nancy-Grand Est, France
[email protected]

Abstract. The current underutilization of IPv6-enabled services makes accessing them very attractive because of higher availability and better response time; for example, the IPv6-specific services from Google and Youtube have recently received a lot of requests. In this paper, we describe a fuzzing framework for IPv6 protocols. Fuzzing is a process by which faults are injected in order to find vulnerabilities in implementations. Our paper describes a machine learning approach that leverages a reinforcement-based fuzzing method. We describe a reinforcement learning algorithm that allows the framework to autonomically learn the best fuzzing mechanisms and to automatically test the stability and reliability of IPv6.

Keywords: Fuzzing, IPv6, Reinforcement Learning.
1 Introduction
The prediction that IPv6 deployment will increase due to the limited address space of IPv4 has persisted for a long time. A second reason for IPv6 to grow lies in the current underutilization of IPv6-enabled services: since services provided by Google and Youtube over IPv6 offer better response time and higher availability, accessing them is very attractive, and these services receive a lot of requests. Although the usage of IPv6 grows, appropriate testing and fuzzing frameworks are lacking. These frameworks are important to assure the reliability of network protocols. For this reason, we assess a fuzzing framework for IPv6 protocols in order to detect vulnerabilities, with the objective of appraising reliability. In this paper, we start our analysis by defining a behavioral model of IPv6 protocols with a Finite State Machine of the given protocol. Afterwards, we inspect the different message types exchanged within the protocol by defining, for every field a message is composed of, the type of the field, the default value, and the length of the field. Subsequent to this inspection, we investigate possible fuzzing strategies to guarantee efficiency while fuzzing. We employ these fuzzing strategies on an IPv6 implementation, and we propose the usage of two different reward functions, one based on what is sent to the network, and the other based on kernel tracing. Finally, we use these reward functions to guide a reinforcement learning algorithm for learning the fuzzing strategy with the highest reward.
The paper is structured as follows: In Section 2 we explain fuzzing and the different existing fuzzing mechanisms. We introduce reinforcement learning and explain which model we adopt within our framework in Section 3. Section 4 introduces the analyzed IPv6 protocol, shows how this analysis is used for implementing the different fuzzing strategies, and explains how reinforcement learning is integrated into our fuzzing framework to make it intelligent. In Section 5 we discuss related work. Lastly, we conclude our work and outline future work in Section 6.
2 Fuzzing
Fuzzing is known as a special case of software testing. It is a method to discover faults by providing unexpected input while exceptions are monitored. Basically, there are two different fuzzing styles. On one side, there is the generation-based fuzzer, where the input is created from scratch. On the other side, the mutation-based fuzzer mutates already existing input, but in this case the input needs to be captured beforehand, e.g., with the help of Wireshark (http://www.wireshark.org/). The intention of fuzzing is to expose vulnerabilities of an application or, as in our case, a network protocol. It thus allows uncovering format string vulnerabilities, buffer overflows, and integer overflows. One example of such a vulnerability is shown in the following code of the Solaris 8 IP stack [5]:

    uint8_t
    ipoptp_first(ipoptp_t *optp, ipha_t *ipha)
    {
        uint32_t totallen;

        totallen = ipha->ipha_version_and_hdr_length -
            (uint8_t)((IP_VERSION << 4) + IP_SIMPLE_HDR_LENGTH_IN_WORDS);
        totallen <<= 2;
        ...
    }

The first two fields of the IP header are treated as one field with two components, and the code assumes that the size of the IP options comes out of subtracting a static value from the first byte. This can lead to a kernel crash when the IP version is less than 4. The first approach to fuzzing [14] was to see the whole input as a sequence of bits and to randomly flip bits; this is called random fuzzing. Random fuzzers do not have any knowledge of the protocol, and the input space can be tremendous. Therefore, it is more effective to use block-based fuzzers [2]. This way, a message is divided into different blocks: the fixed strings and the variable values. Only the variable input will be fuzzed, as it makes no sense to fuzz the fixed strings; usually the fixed strings are not considered when processing a message. The only thing that may have an impact is to change the size of
the fixed fields. In comparison to random fuzzing, block-based fuzzing has the advantage that we have knowledge of the type of the fields; this way, we reduce the input space not only by ignoring the fixed strings, but also by restricting the fuzzing of some particular fields. Some of these particular fields, e.g., the checksum field, are inspected before the messages are processed, and in case such a field is obviously erroneous, the message will not be processed but dismissed. In conclusion, it is useless to fuzz the checksum field, as we do not want the message to be discarded even before penetrating the implementation.
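To make the block-based idea concrete, the following sketch (with a hypothetical message layout, not an actual protocol encoding) mutates only the variable blocks of a message and leaves fixed strings and the checksum field untouched:

    import random

    # Illustrative block-based fuzzer: a message is a list of (name, kind, value)
    # blocks. Fixed strings are kept, and checksum-like fields are skipped so
    # that the target does not dismiss the message before processing it.
    SKIP = {"checksum"}

    def fuzz_message(blocks):
        fuzzed = []
        for name, kind, value in blocks:
            if kind == "fixed" or name in SKIP:
                fuzzed.append((name, kind, value))
            else:
                n = len(value)
                fuzzed.append((name, kind,
                               random.getrandbits(8 * n).to_bytes(n, "big")))
        return fuzzed

    # Hypothetical message: type and code are variable, magic is fixed.
    msg = [("type", "variable", b"\x88"), ("code", "variable", b"\x00"),
           ("checksum", "variable", b"\x00\x00"), ("magic", "fixed", b"ND")]
    print(fuzz_message(msg))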
3 Reinforcement Learning
Reinforcement learning [10], [18] is known as the problem of an agent that needs to learn by trial-and-error interactions with a dynamic environment. The agent gets a reward or punishment based on the interaction taken. In a reinforcement learning model we have a set of states an agent can be in, and a set of actions for each state. The agent is in one state and has to choose an action based on input that provides indications on the current state. Thereafter, the agent swaps to a different state and has to choose another action. After each swap the agent receives a reward, or in some models even a punishment. The objective in reinforcement learning is to optimize the long-term reward. Different algorithms and models have been introduced in order to solve reinforcement learning problems; for temporal-difference learning there are Q-Learning [10], [18] and the SARSA algorithm [18]. Temporal-difference (TD) learning combines Monte Carlo ideas and dynamic programming ideas and incorporates the advantages of both: it learns immediately from raw experience without a model of the dynamics of the environment, and estimates are updated based on previous experience. Q-learning is an off-policy TD control: the action-value function Q directly approximates the optimal action-value function Q* without following a behavior policy. Nevertheless, the policy is followed for determining visited and updated state-action pairs. SARSA stands for State-Action-Reward-State-Action and is an on-policy TD control. In other reinforcement learning models, only transitions between states are appraised; in the SARSA algorithm, transitions from state-action pair to state-action pair are considered. This algorithm is an on-policy TD control as it follows a behavior policy Π. Action values are evaluated and estimated for this behavior policy, which must be adjusted towards an optimal policy so that the algorithm performs optimally. We focus on the SARSA algorithm for our work, as it is an online temporal-difference learning method for transitions between state-action pairs. The following equations show how the values are updated after each episode and how actions are selected:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right] \tag{1}$$

$$\Pi_t(s, a) = \Pr\{a_t = a \mid s_t = s\} = \frac{e^{Q(s,a)}}{\sum_b e^{Q(s,b)}} \tag{2}$$
Equation 1 defines how the estimated values for action a in state s are updated. Q(s_t, a_t) represents the value of performing action a_t while being in state s_t at moment t. We assume that we have a finite number of states S and a finite set of actions A. The reward is represented by r, α is a step-size parameter, and γ is a discount rate that determines the importance of future events: if γ → 0, the agent is considered shortsighted; if γ → 1, it is considered farsighted. We also assume an action selection probability Π_t(s, a) that assigns, for every state, probabilities for performing each action, as defined in Equation 2. The objective of reinforcement learning is to learn appropriate values for the function Q and to estimate the right probabilities Π_t. Basically, Q measures the efficiency of each action in each state, while Π_t is useful for driving the exploration path. We see from Equation 1 that actions that have resulted in positive feedback will increase the corresponding Q-values and will thus become more probable in the future.
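A minimal tabular sketch of this scheme, directly following Equations 1 and 2, could look as follows; the state and action sets as well as the values of α and γ are arbitrary placeholders:

    import math, random

    ALPHA, GAMMA = 0.1, 0.9        # step size and discount rate (placeholders)
    Q = {}                         # (state, action) -> estimated value

    def softmax_policy(state, actions):
        # Action selection probability of Equation 2 (softmax over Q-values).
        weights = [math.exp(Q.get((state, a), 0.0)) for a in actions]
        total = sum(weights)
        return random.choices(actions, weights=[w / total for w in weights])[0]

    def sarsa_update(s, a, reward, s_next, a_next):
        # On-policy update of Equation 1 for the state-action pair (s, a).
        q, q_next = Q.get((s, a), 0.0), Q.get((s_next, a_next), 0.0)
        Q[(s, a)] = q + ALPHA * (reward + GAMMA * q_next - q)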
4 IPv6 Protocol Analysis and Fuzzing Framework

This section describes a reinforcement-driven fuzzing framework for IPv6. We consider the Neighbor Discovery Protocol, a simple but heavily used protocol in IPv6, to illustrate our case.

4.1 Neighbor Discovery Protocol Analysis
The Neighbor Discovery (ND) Protocol replaces the Address Resolution Protocol (ARP) and the ICMP Router Discovery and Redirect from IPv4. It is used, amongst others, for address auto-configuration, to get knowledge about network prefixes, routes, configuration information, and link-layer addresses, and to detect duplicate IP addresses. This protocol is composed of five different message types:

– Neighbor Solicitation
– Neighbor Advertisement
– Router Solicitation
– Router Advertisement
– Redirect Message
Behavioral model. Finite State Machines (FSM) can be used to model the behavior of protocols. An FSM consists of states and transitions between states, and it helps to see which messages are sent and received in which state.

Fig. 1. Finite State Machine of the Neighbor Discovery Protocol

In Figure 1 we can see the defined FSM, where NS stands for Neighbor Solicitation, NA for Neighbor Advertisement, RS for Router Solicitation, and RA for Router Advertisement. The states S1 to S7 describe the address auto-configuration that is launched when a node is new in a network, as well as Duplicate Address Detection. For address auto-configuration in ND, the new node sends out a Neighbor Solicitation with its own address; if it receives no Neighbor Advertisement, it knows that the
chosen address is not used, so it can use that address. Thereafter, it sends a Router Solicitation message in order to configure itself correctly according to the information it receives from the Router Advertisement. In case it does not receive a Router Advertisement, it will apply a default configuration. Once the configuration is finished, it is part of the network, and it will periodically use the previous method in order to detect potential duplicate addresses in the network. The states from S7 to S11 represent the normal behavior of a configured node, which sends Router Solicitations and Neighbor Solicitations. The redirect message is not present in this FSM, as this type of message is sent by the router to inform a node that a better route exists. In the defined FSM we focused only on the behavior of a node; the behavior of the router is not taken into account in this work, as it would have only little impact on our results. To have a complete view of a protocol, modeling an FSM is not sufficient: we also need to analyze message formats.
Message Inspection. In order to analyze message formats, we decompose the messages into the fields they are composed of. Once we have all the fields of one message, we define what type each field has and what the default value of that field is, as shown in Table 1.

Table 1. Decomposition of the Neighbor Advertisement message

Neighbor Advertisement
(bin, 128bits, prefix + interface address)
(bin, 128bits, multicast address)
(HopLimit, 8bits, 255)
(type, 8bits, 136)
(code, 8bits, 0)
(checksum, 16bits, ICMP checksum)
(R, router flag, 1bit)
(S, solicitation flag, 1bit)
(O, override flag, 1bit)
(reserved, 29bits, unused)
(target address, bin, 128bits)
Table 2. Fuzzing of the Neighbor Advertisement message

Neighbor Advertisement
(bin, 128bits, prefix + interface address)
(bin, 128bits, multicast address)
(HopLimit, 8bits, 255)
(type, 8bits, 137)                      (1)
(code, 8bits, 0)
(checksum, 16bits, ICMP checksum)
(checksum, 16bits, ICMP checksum)       (2)
(R, router flag, 1bit)
(S, solicitation flag, 1bit)            (3)
(reserved, 29bits, unused)
(target address, bin, 128bits)

4.2 Fuzzing Strategies
After analyzing the protocol and the messages, we need to define which strategies we want to apply in order to detect vulnerabilities efficiently. Many different methods and strategies exist for fuzzing [9]:

1. Deleting fields
2. Inserting fields
3. Modifying the value of a field
4. Inserting a message based on the behavioral model
5. Repeating a message based on the behavioral model
6. Dropping a message based on the behavioral model
One possibility is to modify one or more data fields. As a consequence, an endpoint in the network has to handle wrong input. The data fields can be modified by deleting fields, by inserting fields, or by modifying the value of a data field. Another possibility is to change the message type, by declaring that a message is of another type. Table 2 shows examples of fuzzing the Neighbor Advertisement message: (1) shows the changed type field, originally the ICMP message type is 136, here we changed it to 137; (2) shows the duplicated checksum field; in (3) we deleted the O (override) flag field. These described methods mutate the content of a message, but we can also mutate the message order, i.e., send a different order of messages by inserting, repeating, or dropping messages inside one session. Practically, this means that we need to create/mutate packets with changed data fields, based on the protocol and message analysis we have done previously.

Strategical behavior. If we were to model the fuzzing strategies and the associated protocol state machine as one single state machine, we would need one state for each combination of fuzzing strategy and state. This can lead to a state explosion, because for each state we have six fuzzing strategies applied to k fields, resulting in k^6 possibilities. For our example, considering the Neighbor Advertisement message with its 11 fields, this means that we have 11^6 possibilities for only one state and, since we have 11 states in our FSM, 11 * 11^6 possibilities overall. This leads to a state space explosion. To counteract this problem, we define another FSM, based not on the behavioral model of the protocol but on the strategical model, meaning that we use the different fuzzing strategies as the state model, as shown in Figure 2. The states are mapped to the fuzzing strategies as follows:

– A - Deleting fields
– B - Inserting fields
– C - Modifying value of a field
– D - Inserting message based on the behavioral model
– E - Repeating message based on the behavioral model
– F - Dropping message based on the behavioral model

Fig. 2. Fuzzing strategies as behavior model
This model has two advantages compared to using the FSM of the behavioral model of the protocol: on the one hand, we counteract the state space explosion; on the other hand, we provide a general fuzzing model that can be adapted to other network protocols.

4.3 Framework for Adopting a Reinforcement Learning Model
For adopting a reinforcement learning model, we need a reward function, so that we can deduce a reward for every action in a specified state. For a reward
function, one needs to quantify the impact of IPv6 messages sent to one machine or to a whole network. One way to see and quantify the impact on the host lies in pursuing a message and observing the number of function or system calls executed due to that message. Therefore, we need to trace the IPv6 messages. For this purpose we configure the kernel so that we can trace and follow the mutated IPv6 messages. We start by downloading the kernel sources via git-core. We configure the kernel with the command make menuconfig; under "kernel hacking" we find the section "tracers", where we can enable the kernel function tracer. This way we can trace function calls such as memcpy, kmalloc, etc. Furthermore, we can debug the IPv6 modules of the kernel by integrating debugging code. After recompiling the kernel, we obtain output of the protocol implementation and see which code has been executed in which module. We also obtain information on whether errors occurred while processing a message. Finally, we can take this information and attribute the number of methods or functions called to the fuzzed messages. Based on this, we have a quantifying mechanism for our reward function. Another way to quantify the impact of our fuzzing framework is to monitor the messages sent by the host being fuzzed. This way we can see whether the host responds correctly or not, and whether the responses are delayed. Hence, we derive three different reward functions for our framework: one for tracing, one for debugging, and one for monitoring the network. The first reward function is based on the entropy and the power of the function calls produced by the fuzzed input. The entropy represents the heterogeneity of the fuzzing framework as the distribution, over all available functions, of the functions called due to one message, where q_{t,i} denotes the number of calls of function i caused by message q_t. The entropy of a message q_t is defined as:
$$H(q_t) = -\sum_{i=1}^{m} p_{t,i}\,\log(p_{t,i}), \quad \text{where} \quad p_{t,i} = \frac{q_{t,i}}{\sum_{i=1}^{m} q_{t,i}} \tag{3}$$

The power represents the amount of functions called due to one input message. The power is defined as follows:

$$Power(q_t) = \sum_{i=1}^{m} q_{t,i}^2 \tag{4}$$

We normalize these two metrics and assemble both functions, entropy (Equation 3) and power (Equation 4), to obtain the resulting reward function for tracing:

$$r(q_t)_{trace} = H(q_t)_{norm} + Power(q_t)_{norm} \tag{5}$$

For debugging we define the following reward function:

$$r(q_t)_{deb} = \begin{cases} 1, & \text{if an error is monitored} \\ 0, & \text{if no error is monitored} \end{cases} \tag{6}$$

For monitoring messages on the network we specify the following reward function:

$$r(q_t)_{mon} = \begin{cases} 1, & \text{if a corrupt or delayed message is monitored} \\ 0, & \text{if a correct response and no delay are monitored} \end{cases} \tag{7}$$

Finally, we can calculate an overall payoff function out of Equation 5, Equation 6, and Equation 7 as follows:

$$r(q_t) = r(q_t)_{trace} + r(q_t)_{deb} + r(q_t)_{mon} \tag{8}$$
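A small sketch of how these reward functions could be computed from per-function call counts is shown below; the normalization constants h_max and p_max are assumptions, since the normalization scheme is not fixed above:

    import math

    def entropy(q):                      # Equation 3; q: call count per function
        total = sum(q)
        return -sum((c / total) * math.log(c / total) for c in q if c > 0)

    def power(q):                        # Equation 4
        return sum(c * c for c in q)

    def reward(q, h_max, p_max, error_seen, bad_response):
        # Equations 5-8: normalized tracing reward plus the binary debugging
        # and monitoring rewards.
        r_trace = entropy(q) / h_max + power(q) / p_max
        return r_trace + int(error_seen) + int(bad_response)

    print(reward([3, 1, 0, 5], h_max=math.log(4), p_max=100.0,
                 error_seen=False, bad_response=True))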
A potential reinforcement learning process chain could look as shown in Figure 3. The objective of this framework is to create an autonomic fuzzer that can send wrong/fuzzed messages to a host and that, based on monitoring for exceptions and considering the responses of the host, which might be corrupt or delayed, learns the impact of the fuzzed messages.

Fig. 3. Potential reinforcement learning process chain of our framework

By integrating reinforcement learning,
we allow the fuzzer to become autonomic: it will choose, for given messages, the fuzzing strategy that returns the optimal reward, and based on the messages returned it can detect in which state it resides. This way, the framework learns the best strategies.
5 Related Work
One can approach the fuzzing problem with different techniques, such as symbolic execution [3], [13], [8], where input variables are made symbolic and constraints are attached to these variables along the execution path. Tracing tainted data is another approach, where system functions and system calls triggered by input data are followed to trace the behavior of an application, as described in [19], [16], [6]. Taint tracing has an important benefit over symbolic execution: in taint tracing the source code of the application is not needed, whereas it is needed for symbolic execution. Modeling techniques, such as event-driven models based on extended finite state machines [12] or Markov models [17], are used in some papers for detecting flaws and for testing. In [9], a model-based approach for flaw detection based on a finite state machine model, as well as fuzzing strategies, is proposed. Reverse engineering is used in [4] for fuzzing purposes. The authors of [11] provide a quantification of the extent of IPv6 deployment; they collect and analyze a variety of data to characterize IPv6 penetration. TTCN is a testing framework which is used for IPv6 testing in [21] and for testing the Neighbor Discovery Protocol in [20]. The problem with testing is that a system, or in this case a network protocol, is tested with correct input to see if the protocol works as specified in the RFC. Testing does not provide any framework for unexpected input, whereas this is the objective of fuzzing. Reinforcement learning has been addressed in many papers and books; Sutton and Barto [18] give a good overview of the existing approaches for reinforcement learning. In [10], the major algorithms are explained. The authors of [7] propose to use graph kernels and Gaussian processes for relational reinforcement learning. To the best of our knowledge, no previous work has investigated fuzzing using reinforcement learning methods.
6 Conclusion and Future Work
In this work, we proposed and assessed an autonomic fuzzing framework for IPv6 protocols. An overview was given of fuzzing methods and reinforcement learning algorithms. We analyzed the Neighbor Discovery Protocol by specifying a finite state machine and decomposing the different message types. We showed how fuzzing can be used for vulnerability detection in IPv6 protocols by applying different fuzzing strategies. We adopted a reinforcement learning algorithm by defining a reward function that is composed of three different functions based on different monitoring techniques. We used reinforcement learning with the
objective to make our fuzzing framework intelligent, so that it learns which fuzzing strategies are most effective. For future work, we plan to give a proof of concept of this work, and we want to apply different reinforcement learning models to see which performs best within our framework. The framework can also be adopted for other IPv6 protocols, such as ICMPv6, and for other network protocols, for instance SIP (Session Initiation Protocol). We plan to integrate this framework into the fuzzing framework KiF [1].
References
1. Abdelnur, H.J., State, R., Festor, O.: KiF: A Stateful SIP Fuzzer. In: IPTComm 2007: Proceedings of the 1st International Conference on Principles, Systems and Applications of IP Telecommunications, pp. 47–56. ACM, New York (2007)
2. Aitel, D.: The Advantages of Block-Based Protocol Analysis for Security Testing. Immunity Inc. (February 2002), http://www.immunitysec.com/resources-papers.shtml
3. Cadar, C., Twohey, P., Ganesh, V., Engler, D.: EXE: Automatically Generating Inputs of Death Using Symbolic Execution. In: Proceedings of the 13th ACM Conference on Computer and Communications Security (CCS), Virginia, USA (November 2006)
4. Comparetti, P.M., Wondracek, G., Kruegel, C., Kirda, E.: Prospex: Protocol Specification Extraction. In: IEEE Symposium on Security and Privacy, pp. 110–125 (2009)
5. Dowd, M., McDonald, J., Schuh, J.: The Art of Software Security Assessment: Identifying and Preventing Software Vulnerabilities. Addison-Wesley Professional, Reading (2006)
6. Drewry, W., Ormandy, T.: Flayer: Exposing Application Internals. In: WOOT 2007: Proceedings of the First USENIX Workshop on Offensive Technologies, Berkeley, USA, pp. 1–9. USENIX Association (2007)
7. Driessens, K., Ramon, J., Gärtner, T.: Graph Kernels and Gaussian Processes for Relational Reinforcement Learning. Mach. Learn. 64(1-3), 91–119 (2006)
8. Godefroid, P., Kiezun, A., Levin, M.Y.: Grammar-based Whitebox Fuzzing. In: PLDI 2008: ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, Tucson, USA (2008)
9. Hsu, Y., Shu, G., Lee, D.: A Model-based Approach to Security Flaw Detection of Network Protocol Implementations. In: IEEE International Conference on Network Protocols (ICNP 2008), pp. 114–123 (October 2008)
10. Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research 4, 237–285 (1996)
11. Karpilovsky, E., Gerber, A., Pei, D., Rexford, J., Shaikh, A.: Quantifying the Extent of IPv6 Deployment. In: Moon, S.B., Teixeira, R., Uhlig, S. (eds.) PAM 2009. LNCS, vol. 5448, pp. 13–22. Springer, Heidelberg (2009)
12. Lee, D., Chen, D., Hao, R., Miller, R.E., Wu, J., Yin, X.: Network Protocol System Monitoring: A Formal Approach with Passive Testing. IEEE/ACM Trans. Netw. 14(2), 424–437 (2006)
13. Majumdar, R., Xu, R.-G.: Directed Test Generation Using Symbolic Grammars. In: ESEC-FSE 2007: The 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp. 553–556. ACM, New York (2007)
14. Miller, B.P., Fredriksen, L., So, B.: An Empirical Study of the Reliability of UNIX Utilities. Commun. ACM 33(12), 32–44 (1990)
15. Narten, T., Nordmark, E., Simpson, W., Soliman, H.: Neighbor Discovery for IP version 6 (IPv6). RFC 4861, Draft Standard (September 2007)
16. Newsome, J., Brumley, D., Song, D.X.: Vulnerability-Specific Execution Filtering for Exploit Prevention on Commodity Software. In: NDSS (2006)
17. Sparks, S., Embleton, S., Cunningham, R., Zou, C.: Automated Vulnerability Analysis: Leveraging Control Flow for Evolutionary Input Crafting. In: Twenty-Third Annual Computer Security Applications Conference (ACSAC 2007), pp. 477–486 (December 2007)
18. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
19. Vuagnoux, M.: Autodafé: An Act of Software Torture. In: Proceedings of the 22nd Chaos Communication Congress, Berlin, pp. 47–58. Chaos Computer Club (2005)
20. Wang, Z., Yin, X., Wang, H., Wu, J.: Automatic Testing of Neighbor Discovery Protocol Based on FSM and TTCN. In: The 2004 Joint Conference of the 10th Asia-Pacific Conference on Communications and the 5th International Symposium on Multi-Dimensional Mobile Communications, Beijing, China, vol. 2, pp. 805–809. IEEE Computer Society Press, Los Alamitos (2004)
21. Zhang, Y., Li, Z.: IPv6 Conformance Testing: Theory and Practice. In: ITC 2004: Proceedings of the International Test Conference, Washington, DC, USA, pp. 719–727. IEEE Computer Society Press, Los Alamitos (2004)
Researching Multipath TCP Adoption

Henna Warma and Heikki Hämmäinen

Aalto University, Department of Communications and Networking,
Otakaari 5, 02150 Espoo, Finland
{henna.warma,heikki.hammainen}@tkk.fi
Abstract. The adoption process of a new Internet protocol, or even only a change to an existing one, is anything but trivial. The classical diffusion theory does not apply as such for studying protocol adoption, because the deployment of a protocol usually requires the involvement of multiple stakeholders with varying interests. Multipath TCP (MPTCP) is an interesting new change to the TCP/IP protocol suite and an extension to regular TCP. MPTCP exploits the resource pooling principle by splitting the data of a single TCP connection across multiple paths in the Internet. The research introduced in this paper aims to identify and evaluate the incentives of different stakeholders to adopt MPTCP. This paper summarizes the progress of MPTCP research from the socio-economic point of view and the plans for how MPTCP adoption could be studied further. The main problematic aspects of the research are also discussed.

Keywords: Future Internet, multipath TCP, adoption incentives.
1 Introduction

During the last few decades, the importance of the Internet for society has increased significantly. The Internet has evolved from its early days to be a backbone of many economic and social activities in society. The Internet was originally designed for a very different purpose than it is used for nowadays. The requirements on Internet performance as well as scalability have changed a lot, and the current Internet architecture as such is poorly suited to the demands and requirements of the modern society [1]. It is inevitable that changes are needed both to the architecture and to protocols. However, new technical solutions, e.g., protocols for the future Internet, should not be developed without decent socio-economic control. Because the Internet is a very complex system, the adoption of different technical solutions is not trivial at all. Assessing the viability and durability of new technical solutions already in the early phase of development may significantly increase the probability that they will get adopted. The social and commercial control of Internet protocols has, for example, an advantage from the Internet Engineering Task Force (IETF) point of view. Since standardization is a very time-consuming activity, the efforts should be focused on those solutions which will interest different stakeholders widely, and not just a marginal sector of the market players. The type of research that we intend to carry out may save the IETF from doing groundless work or otherwise help in the standardization work.
1.1 Research Topic

Multipath Transmission Control Protocol (MPTCP) is one of the protocols supporting the future Internet design principles [2]. MPTCP is a manifestation of the resource pooling principle [3]: it enables the data packets of a single TCP connection to be split across multiple paths in the network. Thus, MPTCP does not only increase throughput but also improves the resilience of network connections. From a technical point of view, the deployment of MPTCP requires a relatively small change to the TCP/IP stack, and it lies with end-users only. Although MPTCP is simple from the implementation point of view, it may have a significant impact on the value networks in the Internet connectivity market. Although MPTCP has verified benefits from the technical point of view, it is not self-evident that different stakeholders would like to adopt this protocol. To evaluate the viability of MPTCP, it is essential to know what the incentives of different stakeholders are to adopt MPTCP. This is what we want to study in the first place. Studying the MPTCP adoption incentives can also give more insight into the dynamics of Internet protocol adoption in general.
2 State of the Research

MPTCP has gained outstanding attention within the Internet engineering community recently. However, the attention has been mostly on the technical side, and MPTCP has not been studied in detail from the socio-economic point of view. This section summarizes the results achieved so far and suggests three distinct cases which are planned to be studied in the near future.

2.1 Achievements

The initial evaluation of MPTCP has been completed from the social and commercial point of view. The possible deployment scenarios have been outlined and their business opportunities have been discussed in [4]. The paper is a high-level analysis of MPTCP deployment, and the deployment scenarios should be studied in more detail to get a concrete idea of the viability of each scenario. The authors of [4] have identified which stakeholders acting in the Internet connectivity market are affected by MPTCP deployment: end-users, connectivity providers (ISPs), software authors/equipment vendors, and router/infrastructure vendors. Costs and benefits for the different players are outlined without any deeper analysis, but this is a good baseline for further research. One conclusion of the paper is that, because the implementation of MPTCP lies with end-users only, ISPs cannot prevent the deployment of MPTCP, but they can still support and enhance it. However, the adoption incentives of each stakeholder are not tenably justified. In [4] the authors also suggest that deployment of MPTCP would be especially attractive when multihoming capability already exists at the end-user device, because no hardware updates are needed. This is the case with many existing mobile devices, and therefore the adoption incentives of this scenario for the different stakeholders should especially be studied further.
2.2 Planned Research

The following cases explain how we are going to tackle the research question.

Case 1. In the first case we aim to identify all the different factors which affect MPTCP adoption and to increase the understanding of the dependencies between these factors. For example, do network effects have any effect on MPTCP adoption, and if they do, how do they affect demand? The intention is to build a model which illustrates the system generally. We also try to evaluate the significance of the different factors, which could later help us to concentrate on the most important issues.

Case 2. Identifying the adoption incentives of software or equipment vendors is highly interesting, because they have the possibility to actually implement MPTCP. If they do not see MPTCP as a good investment, it is unlikely that they will deploy the protocol. In the second case, we are going to study a specific business case where one provider controls the software in both end-user devices and content servers. There are multiple examples of this case: Nokia with mobile phones and Ovi, and Apple with iPhone and iTunes Store, to mention but a few. The main objective is to find out whether implementing MPTCP in its devices is profitable or not for that kind of provider. The reason why we want to investigate specifically the incentives of this type of provider is that, if the business case is lucrative for them, they could significantly enhance the deployment of MPTCP.

Case 3. In Case 3 we would like to research the incentives of ISPs to adopt MPTCP into their business. Although in most scenarios presented in [4] ISPs do not have the ability to implement MPTCP on their own, they can enhance the deployment of MPTCP by offering lucrative service contracts to their customers. The approach would be top-down, and we could assume that MPTCP is deployed. By modeling, for example, different kinds of pricing schemes for MPTCP usage, we could find the incentives of an ISP to offer MPTCP-related service contracts to its customers.

2.2.1 Research Methods

System dynamics is a method for modeling complex systems over time [6]. The different stakeholders affecting or being affected by MPTCP adoption form a complex system which is continuously changing. That is why system dynamics could also be a feasible method for modeling the factors affecting MPTCP adoption in Case 1; a minimal sketch of such a model is given below. It would also clarify the relationships between different factors and help to understand the complexities of the research area. In Case 2, the intention is to use at least functional cost modeling, e.g., [5], to analyze the data. To be able to apply this method, we have to identify the costs of implementing and running MPTCP exactly. Thus, the implementation and operational costs should be separated in the model.
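As a purely illustrative example (not an actual model of MPTCP demand), the following sketch shows a Bass-diffusion-style stock-and-flow model in which the adoption inflow grows with the installed base, i.e., a simple network effect; all parameter values are arbitrary assumptions.

    # Illustrative stock-and-flow adoption model with a network effect.
    # population, p (innovation), and q (imitation) are arbitrary assumptions.
    def simulate_adoption(population=1e6, p=0.0005, q=0.4, steps=120):
        adopters = 1000.0                      # initial stock of adopters
        history = []
        for _ in range(steps):
            share = adopters / population
            # Inflow: spontaneous adoption plus imitation driven by the
            # current adopter share (the network effect).
            inflow = (p + q * share) * (population - adopters)
            adopters += inflow
            history.append(adopters)
        return history

    print(round(simulate_adoption()[-1]))      # adopters after 120 periods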
3 Problematics of the Research

Since the planned research focuses on the future and MPTCP has not even been implemented yet, it is very difficult to obtain data to be analyzed. One good way is to interview experts from different operators or network equipment vendors, but
still the amount of reliable data may be limited. Therefore, many assumptions have to be made, and if the assumptions are not accurate enough, the results may end up too far from reality, which may decrease the credibility and usefulness of the research. Another problematic issue is how we actually define adoption. This may not be a big problem in the first case, but more so in the second case. The end-user host, for example, may be MPTCP-capable even though the end-user is not using MPTCP. So the end-user might have the MPTCP capability without making any conscious decision to invest in MPTCP. Is MPTCP adopted in this case? This is what we really need to consider before diving deeper into the analysis. Finally, if we think about the problem in a larger scope, another issue occurs. Since the research plans presented in this paper are supposed to be the first steps towards a doctoral thesis, it would be useful to know how the research cases could be mapped into a bigger scope. It would be much easier to carry out the research if we had in mind what we ultimately want to find out. Because the research is at a very early stage, long-term plans still remain unclear.
References
1. Handley, M.: Why the Internet Only Just Works. BT Technology Journal 24 (2006)
2. Ford, A., Raiciu, C.: TCP Extensions for Multipath Operation with Multiple Addresses. IETF Internet-Draft (work in progress), draft-ford-mptcp-multiaddressed (July 2009)
3. Wischik, D., Handley, M., Braun, M.B.: The Resource Pooling Principle. ACM SIGCOMM CCR 38(5), 47–52 (2008)
4. Levä, T., Warma, H., Ford, A., Kostopoulos, A., Heinrich, B., Widera, R., Eardley, P.: Business Aspects of Multipath TCP Adoption. In: Future Internet Assembly (FIA) Book, Valencia, Spain (2010)
5. Dilip, J., Chuang, J., Stoica, I.: Modeling the Adoption of New Network Architectures. CoNEXT (2007)
6. Sterman, J.: Business Dynamics: Systems Thinking and Modeling for a Complex World. McGraw-Hill, New York (2000)
Report- and Reciprocity-Based Incentive Mechanisms for Live and On-Demand P2P Video Streaming

Fabio Victora Hecht and Burkhard Stiller

Department of Informatics IFI, University of Zürich
Binzmühlestrasse 14, CH-8050 Zürich, Switzerland
{hecht,stiller}@ifi.uzh.ch
Abstract. The popularity of video streaming in the Internet has continuously increased, and traffic generated by such applications has become a large portion of the Internet traffic. The Peer-to-Peer (P2P) approach is becoming popular due to better scalability and reduced distribution cost. Live streaming and Video-on-Demand (VoD) are two fields of video streaming which have been treated as completely different; they are, however, from a user's point of view, very similar. Users of a P2P system already successfully collaborate on distributing video streams, thus allowing for an integrated collaboration by having peers store video streams watched in the past to enable their distribution in the future. Since key P2P properties such as scalability and cost savings depend on the effectiveness of the underlying incentive mechanism, this paper proposes report- and reciprocity-based incentive mechanisms as candidates to address the proposed scenario and outlines major requirements to meet.
1 Introduction
In the context of video streaming, live streaming and Video-on-Demand (VoD) are differentiated: while the former provides the delivery of video streams of live events, such as sport events, music concerts, and breaking news, the latter provides the distribution of recorded content, such as movies. While different in technical nature and in terms of implementation, from a user's point of view they are very similar. Since users of a P2P system already successfully collaborate on distributing video streams, the new idea of LiveShift [3] is to allow for an integrated collaboration by storing those video streams that users have watched in the past, in order to enable these users to distribute them in the future. With the combination of live and on-demand video streaming, users will be able to, without having previously prepared any local recording, watch the match or the Olympics from the start and jump over uninteresting parts until they seamlessly catch up with the live stream. Similarly, the live transmission may be used for the premiere of a movie or TV show, when it is expected that several people watch at the same time. Instantly and automatically, it will be available, from the starting time of the premiere, to every user who joins at a later time. Major advantages for broadcasters are less administration effort in recording and distributing video streams, since it is done automatically, and bandwidth and storage savings.
To address scalability problems of client-server solutions and to reduce financial costs, making video streaming distribution feasible for small content providers and even private end users, P2P-based video streaming solutions have appeared. However, the presence of free riders (peers that do not contribute resources to the network) undermines the potential scalability and cost savings of the P2P approaches. Incentive mechanisms have been proposed to address this fairness problem by motivating peers to contribute resources and act fairly towards other peers. Up to now, no fully integrated incentive mechanism exists that especially addresses this live streaming combination. A peer that has recorded video streams needs an incentive to keep the copy and provide it to other peers, without necessarily being simultaneously interested in streams it can return to the P2P community. Thus, report-based and reciprocity-based incentive mechanisms are excellent candidates to address such a scenario. Current approaches for this integrated video streaming, however, fail to produce acceptable results, since they either require symmetry of interest [2] or reciprocation [1], [2], or are prone to false reports [4].
2 Related Work
The key aspect in an open P2P system is the presence of an effective and secure incentive mechanism to incentivize peers to share their resources, such as upstream bandwidth. Closed P2P systems, e.g., SopCast [5], do not need to provide incentives for peers to share, since protocol and source code are closed and protected by license agreements, which force users to share their processor and bandwidth.

In reciprocity-based incentive mechanisms, peers maintain histories of past transactions with other peers and use this information to decide with which peers to share their resources. These schemes can be based on direct reciprocity or indirect reciprocity. In direct-reciprocity incentive mechanisms, such as Tit-for-Tat (TFT) [2], a peer bases its decision solely on its own past experiences with other peers, rewarding peers which have contributed the most. TFT does not perform well in live video streaming, since the information is likely to flow in one direction only when a peer providing the stream is ahead of the peer receiving it, nor in VoD, since it requires that each peer be interested in something the other has. In indirect-reciprocity incentive mechanisms, the decision of a peer A about which peer B to provide with resources also depends on the service that B has provided to other peers in the system. A transitive TFT mechanism can be used, typically using a shared history. CompactPSH [1] combines both approaches, TFT and transitive TFT, to exploit indirect reciprocity using both private and shared history, using MaxFlow to mitigate false reports. Although shown to be more effective than TFT in a P2P file-sharing scenario, CompactPSH has not yet been applied to P2P live or on-demand video streaming.

Report-based incentive mechanisms, such as Give-to-Get (GTG) [4], offer a transitive reputation score, letting peers favor uploading to other peers which have uploaded more to third peers. They do not expect reciprocity, which is a desirable property in both P2P live video streaming and VoD cases, where the information flows mostly in one direction. However, GTG relies on truthful reports from peers, a fact that
makes GTG not useful in real life. The missing property is providing the guarantees offered by reciprocity-based mechanisms.
3 Proposed Approach
The approach investigated will comprise three incentive mechanisms: (a) an indirect-reciprocity incentive mechanism, which will be refined and applied to both live and on-demand P2P video streaming; (b) a report-based incentive mechanism, which extends the work in the area of being Sybil-resistant, based on random sampling and P2P auditing methods; and (c) a combination of the two, which will determine a balanced compromise between reciprocity- and report-based incentive mechanisms.

Indirect-reciprocity incentive mechanisms have properties which address asymmetry of interest (the case when a peer provides stored content and is not interested in any content the requesting peer has to offer) and the case when a user switches channels. Such a mechanism has been applied to P2P file-sharing systems [1], showing improvement in download times while offering protection against false reports, which is inherent to reciprocity-based incentive mechanisms. The first step is to adapt CompactPSH [1] to a P2P video streaming scenario, creating a new incentive scheme, VideoPSH. In file sharing, peers exchange chunks of files in any order, and a chunk will remain interesting to other peers for a long time. In P2P live streaming, the download rate has an upper bound dictated by the bitrate of the video stream and a lower bound dictated by the loss tolerated by the video codec used, and chunks become obsolete as they lose their liveness. In VoD, chunks remain interesting for a longer time, and the download rate may be faster than the bitrate of the video stream. The new approach will examine its effects on QoE (Quality of Experience) metrics, understanding and improving its scalability as well as its Sybil- and collusion-proofness properties in the given scenario.

A fundamental problem of reciprocity-based incentive mechanisms is that it is impossible for the peercaster to use them, since they do not download. Also, it is difficult to transfer reputations for more than one hop, due to information becoming outdated. Hence, a new report-based incentive mechanism (NRBIM) will be developed and evaluated. The mechanism is inspired by multi-level marketing schemes such as AmWay. It will function on the basis that each peer is given a score which takes into consideration the recursive contribution of each peer to the entire P2P network; therefore, in order to increase their own score, peers need to upload to peers with a high score. A higher score improves the chance of getting close to the peercaster, being more reliable, and experiencing shorter delay when watching at or near live streaming. Scores will be calculated using a Sybil-resistant system, which penalizes peers that are suspected of being Sybils. Finally, a distributed auditing system will verify peers' claims for their score. Such an incentive mechanism is expected to produce good results with regard to scalability and fairness, while offering a proved resistance to malicious peers that lie about their or other peers' contribution, collude, or create Sybils. The approach will investigate trade-offs between the chance or frequency of the auditing and the return achieved by malicious peers. The development of such an incentive mechanism represents a new contribution to the area of report-based incentive mechanisms.
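As one purely hypothetical reading of the recursive contribution idea (not the NRBIM algorithm itself, which remains to be defined), the sketch below scores peers by their upload amounts, weighted by the scores of the receiving peers; the damping factor and the normalization are assumptions.

    DAMPING = 0.5   # assumed weight of the receivers' scores

    def contribution_scores(uploads, iterations=10):
        """uploads: dict peer -> dict receiver -> amount uploaded."""
        score = {p: 0.0 for p in uploads}
        for _ in range(iterations):
            new = {p: sum(amount * (1 + DAMPING * score.get(r, 0.0))
                          for r, amount in to.items())
                   for p, to in uploads.items()}
            total = sum(new.values()) or 1.0   # normalize to keep scores bounded
            score = {p: v / total for p, v in new.items()}
        return score

    # Peer a uploads to b, b uploads to c; c only consumes.
    print(contribution_scores({"a": {"b": 10}, "b": {"c": 5}, "c": {}}))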
Finally, this approach proposes the investigation of the combination of both approaches, to complement the advantages and limitations of both. The main advantage of reciprocity-based incentive mechanisms like VideoPSH is that they use reciprocity to prevent overspending of resources and to increase fairness. In the proposed NRBIM, peers may achieve partial success when cheating, but they will have a clear chance of getting caught as well, which constitutes a trade-off against the frequency of the sampling scheme used for auditing. If contributions can be verified partially by reciprocation, the success of cheating peers is expected to be lower.

Likewise, the possible disadvantage of VideoPSH concerning its limited view (since only a reputation transfer of one hop is currently feasible in CompactPSH) can be overcome by using verifiable reports as in NRBIM. Another difference between NRBIM and CompactPSH refers to the durability of the information: CompactPSH allows reputation information to be retained for later use, if it is assumed that peer identities are durable, while NRBIM uses only instant information to evaluate the contribution of peers. The use case studied will profit from reputation information that is durable, since peers must be incentivized to store and provide chunks to other peers and be reciprocated eventually. The resulting incentive mechanism will be resistant to Sybil attacks up to a proven point.
4 Summary
The development of incentive mechanisms suitable for an integrated live streaming and VoD scenario is challenging. Thus, this paper proposes three possible mechanisms tackling the fairness problem using different paradigms. While the first two incentive mechanisms will represent a contribution on their own, the third one is planned to complement their characteristics in an integrated approach.

Acknowledgment. This work has been performed partially in the framework of the EU ICT STREP SmoothIT (FP7 ICT). The authors would like to thank also Richard Clegg and Raul Landa for their discussions and support.
References
1. Bocek, T., El-khatib, Y., Hecht, F.V., Hausheer, D., Stiller, B.: CompactPSH: An Efficient Transitive TFT Incentive Scheme for Peer-to-Peer Networks. In: 34th IEEE Conference on Local Computer Networks (LCN 2009), Zürich, Switzerland (October 2009)
2. Cohen, B.: Incentives Build Robustness in BitTorrent. In: 1st Workshop on the Economics of Peer-to-Peer Systems, Berkeley, California, U.S.A. (June 2003)
3. Hecht, F.V., Bocek, T., Morariu, C., Hausheer, D., Stiller, B.: LiveShift: Peer-to-peer Live Streaming with Distributed Time-Shifting. In: 8th International Conference on Peer-to-Peer Computing (P2P 2008), Demo Session, Aachen, Germany (September 2008)
4. Mol, J.J.D., Pouwelse, J.A., Meulpolder, M., Epema, D.H.J., Sips, H.J.: Give-to-Get: Free-riding Resilient Video-on-demand in P2P Systems. In: Multimedia Computing and Networking Conference (MMCN 2008), San Jose, California, U.S.A. (January 2008)
5. SopCast, http://www.sopcast.com/ (Last visited: February 2010)
Model-Driven Service Level Management*

Anacleto Correia1,2 and Fernando Brito e Abreu1

1 QUASAR/CITI, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Caparica
2 Escola Superior de Tecnologia, Instituto Politécnico de Setúbal, Setúbal
[email protected], [email protected]

* The work presented herein was partly supported by the VALSE project of the CITI research center within the Department of Informatics at FCT/UNL in Portugal.
Abstract. Service-level agreement (SLA) definition and monitoring are open issues within the IT Service Management (ITSM) domain. Our main goals are to propose a model-based approach to IT service SLA specification and compliance verification. The specification will be accomplished by proposing an SLA language, a domain-specific language for defining quality attributes as non-functional requirements (NFRs) in the context of ITSM. This will allow SLA monitoring and compliance validation at a level of abstraction that is understood by the stakeholders involved in the service specification.

Keywords: IT Service Management, ITIL, SLA, MDA, DSL, NFR, BPMN.
1 Introduction

Most organizations rely on Information Technology (IT) services to support their business processes. IT services are built upon the technical infrastructure, systems, and application software. The set of processes that allow planning, organizing, directing, and controlling the provisioning of IT services is called IT Service Management (ITSM). Several ITSM frameworks have been proposed, such as ITIL [1]. In ITIL, one of the most relevant processes is service level management. In order to support IT services, IT providers use two kinds of applications: service management applications, which allow tracking IT service incidents and problems, and systems management tools, for monitoring and controlling networks, systems, or applications. Among the concerns within the service level management process are, for instance, the requirements for service availability, performance, and accuracy, which are specified in terms of service-level agreements (SLA). In an SLA, such requirements, known as quality attributes, are quantitatively bounded (e.g., maximum time to recover). However, although those requirements are non-functional in nature, and many non-functional requirements techniques have been proposed in the literature [2], those techniques are not being used in SLA specification. SLA definition and monitoring are identified as open issues within the ITSM domain [3]. In this research plan we are going to focus our attention on three specific problems in this context. The first problem arises from the current practice in SLA specification for IT services, mostly based on templates filled with natural language
descriptions, whose results are not amenable to the automation of SLA compliance verification. Since SLA definitions are subjective in nature, this raises the second problem: it is not clear to all involved stakeholders which are the activities upon which SLA compliance should be verified, and how their non-compliance will affect the evolution of the process describing the IT service delivery. The third problem concerns a semantic gap which occurs in SLA compliance verification. This gap comes from the fact that compliance checking has been relying mostly on Quality of Service (QoS) information from systems management applications, instead of consolidated data presented at the process model representation of the IT service. In the rest of this paper we present a survey on the current state of the art of SLAs (Section 2) and our proposal to deal with SLAs in the ITSM domain (Section 3).
2 Related Work

When discussing related work, it is useful to have a common taxonomy that fosters a more systematic study of different approaches and also helps identifying the issues which are not overcome by current approaches and which our proposal intends to tackle. Our taxonomy for describing the entries of the survey of the current state of the art in SLA specification and compliance validation (see Table 1) is based on the following dimensions:

Domain: the technological context in which the SLA is defined and evaluated. (1) ITSM – regarding processes of an ITSM framework; (2) SOA – concerning web services interacting in an SOA environment; (3) System Management – related to services of the IT infrastructure assessed by QoS parameters.

Formalization: the level of formalism used for SLA specification. (1) Formal – mathematically-based techniques amenable to rigorous proof; (2) Semi-formal – although the representation might not be a complete description of reality, the representation itself and the system description are equivalent; (3) Informal – descriptions in natural language, usually using a structured template form.

Abstraction Level: the kind of representation used in the SLA. (1) Model-based – diagrammatic models for SLA elicitation and specification; (2) Hybrid – a blended approach using a diagrammatic notation supplemented by a textual language; (3) Textual – specification exclusively based on a textual language.

Verification Compliance: the level at which SLA validation occurs. (1) Model-based – SLA compliance is checked at the level of models; (2) Message-based – compliance verification is based on the exchanged messages; (3) Data-based – compliance verification is based on information stored in databases or similar repositories, by applications that deal with and support special types of IT infrastructure or specific IT processes; (4) Absent – there is no SLA compliance checking.

The most well-known approaches to SLAs are related to SOA: the Web Service Level Agreement (WSLA) [4], the Web Services Agreement Specification (WS-Agreement) [5], the Web Services Offering Language (WSOL) [6], the Web Service Management Layer (WSML) [7], the Web Service Modeling Ontology (WSMO) [8], Rule-Based Service Level Agreements (RBSLA) [9], and SLAng [10]. These approaches only address computational processes and are not able to deal with processes where humans are involved, such as the ones covered by ITSM. Moreover,
the verification of compliance is made at the implementation level instead of at the model level, which would be more suitable for common stakeholders. There are also available a number of service management applications that support the ITSM service level management process [11], as well as systems management tools [12] that monitor networks, IT devices and applications at the QoS parameter level. Although the former applications are related to the ITSM domain, they have issues, as highlighted in Section 1, such as the informal specification of SLAs and monitoring and compliance verification based mostly on scattered, non-integrated repositories of data. The latter applications also have these kinds of issues, and moreover their domain is mainly the technical infrastructure.

Table 1. Summary of current SLA approaches to specification and validation
(rows: approaches [4]–[12]; columns: Domain, Formalization, Abstraction Level, Verification Compliance)
3 SLA Specification and Validation Environment

Our proposal for the specification and validation of SLA compliance of IT services is supported by an environment, the SLA specification and validation Environment (SLAEnv). The SLAEnv architecture addresses two main phases in the SLA life cycle: the SLA specification phase, when a negotiation process takes place among the stakeholders; and the validation phase, which includes the monitoring, reporting, evaluation and improvement of the SLA. The negotiation step aims to define the conditions of the SLA contract. An SLA Editor supports this activity. The SLA Editor must be built based on the Process Metamodel and the SLA Metamodel, using a DSL tool. The Process Metamodel definition is founded on the BPMN 2.0 specification, while the SLA Metamodel specification is based on techniques for non-functional requirements (NFRs) elicitation. The output of the SLA Editor is an SLA Contract expressed in an SLA language compliant with the SLA Metamodel, together with additional integrity constraint expressions written in OCL. The conditions of an SLA contract are the input for an SLA Evaluator. System states (snapshots of a running system) can be created and manipulated during an evaluation. Information about a system state will be attained by querying the Process Modeler Animator for IT service execution. For each snapshot, OCL constraints can be checked. By evaluating OCL expressions we can gather detailed information about the system's SLA conformance. Graphical views (e.g., a dashboard or balanced scorecard) of the system's state can be provided by the SLA Visualizer. The Process Modeler Animator is a BPMN console where IT services are depicted and animated as BPMN process diagrams. An IT service specification can be persisted in a standard format (e.g., XMI) in the Process Model repository.
This definition is required for the SLA contract definition, since it allows hooks concerning SLA quality attributes to be associated with BPMN diagram elements (activities, gateways, etc.). Those hooks allow the SLA Evaluator to trace an IT service instance execution to the corresponding SLA contract quality attribute thresholds. Therefore, the SLA Evaluator is the cornerstone of the environment, since it joins the specification and validation phases of an SLA. This is done by matching the SLA contract assertions with the IT service execution data, and making the resulting snapshot available to the visualization component. This approach, unlike those mentioned in Section 2, is tailored for SLA monitoring and compliance verification in the ITSM domain. Moreover, it allows a semi-formal, model-based specification and conformance checking of SLAs. The validation process of this proposal will take place in an IT service management project in the domain of financial self-service systems.
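To make this matching step concrete, the following Python sketch shows how an evaluator of this kind could compare contract thresholds against execution snapshots; the attribute names, operators and threshold values are hypothetical, and the real SLAEnv evaluates OCL constraints over the BPMN process model rather than flat key-value snapshots:

```python
# Hypothetical sketch of SLA compliance checking over execution snapshots.
# In SLAEnv the assertions would be OCL constraints hooked to BPMN elements;
# here they are simplified to (attribute, operator, threshold) triples.

OPERATORS = {
    "<=": lambda value, limit: value <= limit,
    ">=": lambda value, limit: value >= limit,
}

# SLA contract assertions hooked to process activities (invented values).
sla_assertions = [
    ("incident_resolution.duration_hours", "<=", 4.0),
    ("service_desk.availability_pct", ">=", 99.5),
]

def evaluate_snapshot(snapshot, assertions):
    """Return the list of violated assertions for one system state."""
    violations = []
    for attribute, op, limit in assertions:
        value = snapshot.get(attribute)
        if value is None:          # attribute not observable in this state
            continue
        if not OPERATORS[op](value, limit):
            violations.append((attribute, value, op, limit))
    return violations

# One snapshot as attained from the Process Modeler Animator (invented).
snapshot = {
    "incident_resolution.duration_hours": 5.2,
    "service_desk.availability_pct": 99.7,
}

for attribute, value, op, limit in evaluate_snapshot(snapshot, sla_assertions):
    print(f"SLA violation: {attribute} = {value} (expected {op} {limit})")
```

The resulting violation list corresponds to the snapshot information that the SLA Evaluator would make available to the visualization component.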
4 Conclusions

Since SLA definition and monitoring are open issues within the ITSM domain, this research work intends to address them by proposing a model-based approach to the specification and compliance verification of IT service SLAs. The specification part will be accomplished by proposing an SLA language. This language will be derived as a domain-specific language intended to be able to specify quality attributes, also known as non-functional requirements, in the context of ITSM. The main contribution of this proposal is to allow SLA compliance verification to be made at the same abstraction level as the specification, i.e., at the model level, filling the semantic gap between stakeholders.
References

1. ITIL3, OGC: Summary, ITIL Version 3. ITSMF – ITSM Forum (2007)
2. Matoussi, A., Laleau, R.: A Survey of NFR in Soft. Dev. Process. DI UP 12 (2008)
3. Coyle, D.M., Brittain, K.: Magic Quadrant for IT Service Desk. Gartner (2009)
4. IBM: WSLA Language Specification, version 1.0 (2001)
5. Andrieux, A., Czajkowski, K., et al.: WS-Agreement. Open Grid Forum (2007)
6. Tosic, V., Lutfiyya, H., Tang, Y.: WSOL. In: AICT-ICIW 2006, p. 156 (2006)
7. Verheecke, B.: Web Services Management Layer, WSML (2003)
8. W3C: Web Service Modeling Ontology, WSMO (2005)
9. Paschke, A.: RBSLA. In: Proceedings of ICCIMCA 2005, vol. 02, pp. 308–314 (2005)
10. Skene, J., Lamanna, D., Emmerich, W.: Precise SLA. In: ICSE 2004, Edinburgh (2004)
11. PinkElephant: PinkVERIFY 3.0 Toolsets (2010)
12. Gläßer, L.: Development and Operation of Application Solutions. Siemens (2005)
Managing Risks at Runtime in VoIP Networks and Services

Oussema Dabbebi, Remi Badonnel, and Olivier Festor

INRIA Nancy Grand Est - LORIA, Campus Scientifique - BP 239, Technopôle de Nancy Brabois, 54506 Vandœuvre-lès-Nancy, France
Abstract. IP telephony is less confined than traditional PSTN telephony. As a consequence, it is more exposed to security attacks. These attacks may be specific to VoIP protocols, such as SPIT, or inherited from the IP layer, such as ARP poisoning. Protection mechanisms are often available, but they may seriously impact the quality of service of such critical environments. We propose to exploit and automate risk management methods and techniques for VoIP infrastructures. Our objective is to dynamically adapt the exposure of a VoIP network with regard to the attack potentiality while minimizing the impact for the service. This paper describes the challenges of risk management for VoIP, our runtime strategy for assessing and treating risks, preliminary results based on Monte-Carlo simulations, and future work.
1 Introduction and Challenges
Voice over IP (VoIP) defines a new paradigm for telephony services. It permits transmitting telephony communications directly over IP networks at a low cost and with a higher flexibility than traditional PSTN1 telephony. It relies on an infrastructure composed of VoIP phones, IPBX servers for establishing and managing call sessions, and VoIP gateways for interconnecting with the PSTN. It exploits dedicated protocols, including signaling protocols such as SIP2 or H.323, and transport protocols such as RTP or RTCP. However, VoIP communications are less confined and are therefore more exposed to security attacks. They are affected by security attacks inherited from the IP layer, such as poisoning and flooding attacks, and also by VoIP-specific attacks such as SIP spoofing and SPIT3 [1]. Moreover, security mechanisms may significantly deteriorate the quality and usability of these services. Typically, the application of encryption, filtering and authentication techniques may seriously increase the delays and loads of VoIP communications. Risk management provides new opportunities for dealing with these security issues in such critical environments. It is defined as a process which consists in
1 Public Switched Telephone Network.
2 Session Initiation Protocol.
3 Spam over Internet Telephony.
assessing risks and treating them, i.e. taking steps to minimize them to an acceptable level [2]. Existing work related to risk assessment in VoIP infrastructures includes approaches for assessing threats (defender viewpoint), such as honeypot architectures and intrusion detection systems based on signatures or on anomalies [3,4]. It also includes approaches for assessing vulnerabilities (attacker side), such as fuzzing-based discovery and auditing/benchmarking tools [5]. Risk models supporting this assessment may be qualitative (based on linguistic scales), quantitative (based on probabilities) or mixed (based on aggregations of qualitative parameters) [6]. Existing work on risk treatment permits eliminating risks (risk avoidance) by applying best practices, reducing and mitigating them (risk optimization) by deploying protection and prevention systems [7], insuring against them (risk transfer) by subscribing to an insurance contract, or accepting them (risk retention) [8]. When we look further at these approaches proposed for VoIP networks and services, we can clearly observe that most of them do not really address risk management. They usually do not integrate any risk model, or at least not explicitly, and they only partially cover the risk management process. There is therefore a serious need for applying risk management in these environments in order to protect them efficiently while maintaining their quality of service.
2 Runtime Risk Management for VoIP
We propose to investigate risk management methods and techniques for securing VoIP infrastructures. In particular, we are interested in automating risk management at runtime (reactively): the objective is to dynamically adapt the exposure of the VoIP network and its components with respect to the potentiality of attacks. This automation aims at reinforcing the coupling between the risk assessment phase and the risk treatment phase. The exposure is continuously controlled based on the activation and deactivation of countermeasures. A countermeasure reduces the performance of a security attack, but it may also deteriorate the service by introducing additional delays or reducing access to some specific features. In the context of security, we have extended the Rheostat runtime risk model [9] to VoIP environments in [10]. Let us consider a security attack denoted a ∈ A, with A defining the set of potential VoIP attacks. Risk is typically defined as the combination of the potentiality P(a) of the related threat, the exposure E(a) of the VoIP infrastructure, and the consequence C(a) on that infrastructure if the attack succeeds (see Equation 1).

R = Σ_{a ∈ A} P(a) × E(a) × C(a)    (1)
Rheostat exploits a risk reduction algorithm and a risk relaxation algorithm. The risk reduction algorithm reduces the risk level when the potentiality P(a) of the threat is high, by activating security safeguards (such as passwords and Turing tests). This activation reduces the exposure E(a) of the infrastructure, and thus permits decreasing the risk level to an acceptable value. The risk relaxation algorithm minimizes the impact on the infrastructure by deactivating security safeguards when the risk level is low. We have evaluated this risk model, in particular the impact of the risk threshold and of the number of safeguards, and specified several safeguards for specific VoIP attacks. We are generalizing this work to multiple VoIP security attacks. We have also specified a functional architecture for supporting runtime risk management in these environments, as depicted in Figure 1. This architecture is composed of an intrusion detection system responsible for detecting threats in the VoIP platform, a risk manager for managing risks and selecting security safeguards with respect to the potentiality of security threats, and a configuration server for dynamically activating or deactivating the security safeguards.

Fig. 1. Runtime risk management for VoIP environments
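As a rough illustration of this reduction/relaxation behaviour, the Python sketch below implements a simplified control loop; the attack set, safeguard costs and threshold are invented, and the actual Rheostat algorithms [9] select safeguards using more elaborate cost/benefit criteria:

```python
# Simplified runtime risk control loop (illustrative; parameters invented).
# risk = sum over attacks of P(a) * E(a) * C(a); safeguards lower E(a).

RISK_THRESHOLD = 5.0

attacks = {                 # attack -> [potentiality P, consequence C]
    "SPIT": [0.8, 10.0],
    "SIP spoofing": [0.2, 20.0],
}

# safeguard -> (exposure reduction factor, service cost); initially inactive
safeguards = {"turing_test": (0.5, 1.0), "password": (0.7, 0.5)}
active = set()

def exposure():
    """Exposure E(a), here shared by all attacks and reduced by safeguards."""
    e = 1.0
    for name in active:
        e *= safeguards[name][0]
    return e

def risk():
    return sum(p * exposure() * c for p, c in attacks.values())

def control_step():
    # Risk reduction: activate the cheapest inactive safeguard while risk is high.
    while risk() > RISK_THRESHOLD:
        inactive = sorted(set(safeguards) - active, key=lambda s: safeguards[s][1])
        if not inactive:
            break
        active.add(inactive[0])
    # Risk relaxation: drop costly safeguards again once the risk level allows it.
    for name in sorted(active, key=lambda s: -safeguards[s][1]):
        active.discard(name)
        if risk() > RISK_THRESHOLD:
            active.add(name)   # dropping it would exceed the threshold: keep it

attacks["SPIT"][0] = 0.9       # IDS reports an increased SPIT potentiality
control_step()
print(f"active safeguards: {sorted(active)}, residual risk: {risk():.2f}")
```

The loop provides the graduated answer described above: safeguards are stacked only while the threat potentiality keeps the risk above the threshold, and are released as soon as the service impact is no longer justified.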
3 Monte-Carlo Preliminary Results
We have developed a first implementation of the risk manager component and are performing additional experiments based on Monte-Carlo simulations. Based on a functional evaluation, we have already shown the capability of the risk manager to reduce risks due to VoIP attacks and to minimize the costs induced by safeguards, and we have evaluated the benefits and limits of our approach in comparison with traditional strategies. Our runtime solution permits mitigating the risk level (benefits of up to 41%) and maintaining the continuity of the VoIP service in a dynamic manner. The benefits are limited in the case of instantaneous VoIP attacks. We are performing additional and complementary experiments on this scheme based on Monte-Carlo simulations. In particular, we are evaluating the parameters that directly impact the risk model, such as the number of safeguards, the threshold value and the observability properties of the considered threat.
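Such a Monte-Carlo evaluation can be sketched as follows; the sampling distribution and all parameter values are placeholders chosen for illustration, not those used in our experiments:

```python
# Monte-Carlo sketch: average cost of a runtime-adaptive strategy versus a
# static always-protected strategy, under randomly sampled attack potentiality.
import random

RUNS, STEPS, THRESHOLD = 1000, 100, 5.0
E_FULL, E_SAFE = 1.0, 0.35          # exposure without / with safeguards
CONSEQUENCE, SAFEGUARD_COST = 15.0, 1.5

def simulate(adaptive):
    """Total risk plus safeguard cost accumulated over one simulated run."""
    total = 0.0
    for _ in range(STEPS):
        p = random.betavariate(2, 5)           # sampled attack potentiality
        if adaptive:
            # activate safeguards only when the unprotected risk is too high
            protected = p * E_FULL * CONSEQUENCE > THRESHOLD
        else:
            protected = True                   # static: always protected
        e = E_SAFE if protected else E_FULL
        total += p * e * CONSEQUENCE + (SAFEGUARD_COST if protected else 0.0)
    return total

adaptive = sum(simulate(True) for _ in range(RUNS)) / RUNS
static = sum(simulate(False) for _ in range(RUNS)) / RUNS
print(f"adaptive: {adaptive:.1f}  static: {static:.1f}")
```

Sweeping the threshold and the set of available safeguards in such a simulation is what allows quantifying their impact on the achievable benefit.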
4 Conclusions and Perspectives
Telephony over IP has seen large-scale deployment. VoIP communications are less confined than in traditional telephony, and are exposed to multiple attacks. Security mechanisms are required for protecting these communications. Their application may however seriously deteriorate the performance of such a critical service: a VoIP communication becomes incomprehensible as soon as the delay exceeds 150 ms or the packet loss exceeds 5%. In that context, we propose to exploit and automate risk management methods and techniques at runtime in VoIP networks and services. Our aim is to dynamically adapt the exposure of the VoIP infrastructure based on a set of security safeguards in order to provide a graduated and progressive answer to risks. We have exploited the Rheostat risk management model, specified a functional architecture, and evaluated some specific case scenarios. We are extending this approach to multiple VoIP attacks and are quantifying the impact of parameters based on Monte-Carlo simulations. For future work, we plan to evaluate several configurations for deploying our functional architecture and will investigate return-on-experience mechanisms for automatically configuring and refining the risk model parameters.
References

1. Thermos, P., Takanen, A.: Securing VoIP Networks: Threats, Vulnerabilities, and Countermeasures. Addison-Wesley Professional, Reading (2007)
2. Kuhn, D.R., Walsh, T.J., Fries, S.: Security Considerations for Voice Over IP Systems. National Institute of Standards and Technology (2005), http://csrc.nist.gov/publications/
3. Dantu, R., Kolan, P., Cangussu, J.W.: Network Risk Management Using Attacker Profiling. Security and Communication Networks 2(1), 83–96 (2009)
4. Shin, D., Shim, C.: Progressive Multi Gray-Leveling: A Voice Spam Protection Algorithm. IEEE Network Magazine 20 (September 2006)
5. Bunini, M., Sicari, S.: Assessing the Risk of Intercepting VoIP Calls. Elsevier Journal on Computer Networks (May 2008)
6. Bedford, T., Cooke, R.: Probabilistic Risk Analysis: Foundations and Methods. Cambridge University Press, Cambridge (April 2001)
7. D'Heureuse, N., Seedorf, J., Niccolini, S., Ewald, T.: Protecting SIP-based Networks and Services from Unwanted Communications. In: Proc. of IEEE Global Telecommunications Conference (GLOBECOM 2008) (December 2008)
8. ISO/IEC 27005: Information Security Risk Management. International Organization for Standardization (June 2008), http://www.iso.org
9. Gehani, A., Kedem, G.: RheoStat: Real Time Risk Management. In: Jonsson, E., Valdes, A., Almgren, M. (eds.) RAID 2004. LNCS, vol. 3224, pp. 296–314. Springer, Heidelberg (2004)
10. Dabbebi, O., Badonnel, R., Festor, O.: Automated Runtime Risk Management for Voice over IP Networks and Services. In: Proc. of the 12th IEEE/IFIP Network Operations and Management Symposium (NOMS 2010) (April 2010)
Towards Dynamic and Adaptive Resource Management for Emerging Networks Daphné Tuncer, Marinos Charalambides, and George Pavlou Department of Electronic & Electrical Engineering, University College London, UK {d.tuncer,m.charalambides,g.pavlou}@ee.ucl.ac.uk
Abstract. In order to meet the requirements of emerging services, the future Internet will need to be flexible, reactive and adaptive. Network management functionality is essential in providing dynamic reactiveness and adaptability but current network management approaches have limitations and are inadequate to meet the relevant demands. In search for a paradigm shift, recent research efforts have been focusing on self-management principles. The PhD work presented in this paper proposes to investigate how autonomic principles can be extended and applied to fixed networks for quality of service (QoS) and performance management. The paper describes the two main research issues that will be addressed, namely (a) coordinated decision making in distributed environments, and (b) lightweight learning capabilities, and highlights their importance on realistic application scenarios for the emerging Internet.
Keywords: self-management, autonomic decision making, learning.
1 Introduction

Today, networks are planned and configured in a predictive manner through off-line traffic engineering [1], where the expected demand is calculated from previous usage and a specific routing configuration is produced, aiming to balance the traffic and optimise resource usage for the next provisioning period. Reactive approaches are not possible given the current nature of management systems, which are external to the network, and the resulting latency in learning about arising conditions and effecting changes. As such, this external off-line configuration approach can be highly suboptimal in the face of changing or unpredicted traffic demand. In fact, the only current dynamic intervention relates to control functionality within network devices in order to deal with faults. For example, fast reroute control functions [2] direct traffic away from a failed component upon detecting a failure. But this is done in a predefined manner that does not take into account any other aspects related to the current state of the network and can have an unforeseen negative impact elsewhere.
This lack of adaptability to arising conditions makes the current Internet unable to cope with the requirements of emerging services and time-critical applications characterised by highly demanding traffic, various traffic patterns and quality of service (QoS) requirements. To meet these requirements, the future Internet is expected to become more flexible, adaptive and reactive to traffic and network dynamics. Because of the static nature of current management approaches, a paradigm shift is required in the face of these emerging demands, and new management approaches should enable the Internet to continuously optimise the use of its resources and to recover from problems, faults and attacks without impacting the supported services and applications.
2 Research Challenges

To overcome the drawbacks and limitations of current approaches, recent research efforts have been applying the autonomic computing principles developed by IBM [3] to network management systems. According to these principles, management systems are enhanced with self-aware, self-adaptive and self-optimising functionality, which is embedded within the network devices themselves. This allows the system to analyse and understand its environment, to decide upon the appropriate configuration actions, and to learn from previous decisions based on closed-loop feedback control solutions. More precisely, in order to perform closed-loop control, sophisticated monitoring mechanisms should be in charge of gathering network information, which can be used by decision-making processes distributed across network nodes. The work in this PhD will investigate how autonomic principles can be extended and applied to fixed networks such that continuous resource optimisation and resilience can be supported through dynamic and adaptive traffic engineering functions. Important aspects that this work will investigate are learning capabilities coupled with coordinated decision making. Several research challenges are raised by this approach, as described below.
2.1 Cooperative Decision Making

Unlike current approaches, a self-managed network should support decentralised decision making and control functions. The decision-making process is distributed among sets of nodes in the network, each node being responsible for deciding on the actions to take based on local feedback regarding the state of the network. The main benefits of this approach are robustness to failures and immediate adaptation to changing conditions. However, as each node has partial knowledge of the global information, inconsistencies between several independent decisions can occur, which may jeopardise the stability and the convergence of the global behaviour of the system. As such, management decisions need to be made in a collaborative manner and be harmonised toward a common objective.
coordinated dynamic traffic engineering at network ingress nodes. This functionality is described below. To achieve specific objectives, such as minimising end-to-end delay and maximising link utilisation, ingress nodes may dynamically control the incoming customer traffic based on real-time feedback from egress routers. This dynamic distributed control could be supported by coordinated operations for global performance optimisation and by learning capabilities for optimised reconfiguration actions. To avoid, for example, potential path congestion due to traffic upsurges or network failures in a domain, each ingress router can dynamically adjust the splitting ratio of incoming traffic across multiple paths according to traffic dynamics, as depicted in Fig. 1. Here, ingress routers continuously receive feedback information from egress routers in the form of different parameters, such as delay or packet loss. Based on this feedback, ingress routers can learn (through reinforcement learning techniques, for instance) from the consequences of previous decisions to adapt their behaviour and dynamically readjust the current traffic splitting ratio. However, if the ingress routers do not coordinate their actions, conflicting local decisions and inconsistent operations may arise, which can cause traffic concentration on the same paths and thus congestion. For this reason, cooperation between ingress nodes is essential.
Fig. 1. Coordinated Dynamic Traffic Engineering at Ingress Nodes
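As a toy illustration of such a feedback loop (not the actual mechanism under investigation in this PhD), the sketch below adjusts per-path splitting ratios from delay feedback with a simple multiplicative-weights update; the path names, feedback values and learning rate are invented:

```python
# Toy feedback-driven traffic splitting at one ingress node (illustrative).
# Paths with lower reported delay gradually receive a larger traffic share.
import math

ETA = 0.5                         # learning rate (invented value)
weights = {"path_A": 1.0, "path_B": 1.0, "path_C": 1.0}

def splitting_ratios():
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}

def update(delay_feedback_ms):
    """Multiplicative-weights update from egress delay feedback."""
    best = min(delay_feedback_ms.values())
    for path, delay in delay_feedback_ms.items():
        # penalise paths in proportion to their excess delay over the best path
        weights[path] *= math.exp(-ETA * (delay - best) / best)

for feedback in [{"path_A": 20, "path_B": 35, "path_C": 90}] * 5:
    update(feedback)

print({p: round(r, 2) for p, r in splitting_ratios().items()})
```

Several ingress nodes running this loop independently could all shift traffic onto the same lightly loaded path and congest it, which is precisely the kind of instability that the coordination mechanisms investigated here must prevent.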
Another scenario that will be considered by this work concerns network resilience. In this scenario, self-optimisation functionality relates to the continuous re-provisioning of back-up paths according to changing traffic conditions for fast failure recovery. Back-up paths today are statically established and computed through external management systems, based only on the network topology. Due to this pre-engineered approach, current traffic conditions are not taken into account and therefore the traffic diverted elsewhere upon a failure may suffer from congestion, or may cause congestion in the region it is redirected to. To overcome these drawbacks, back-up paths should be dynamically re-computed by the network nodes themselves according to monitored traffic. However, as traffic conditions change dynamically, coordinated decision making is required across repairing nodes to ensure that affected traffic is directed to parts of the network that are not highly utilised to avoid degrading post-failure
performance, and to ensure that redirected traffic along back-up paths is optimally spread towards underutilised regions.
4 Summary

To meet the requirements of emerging advanced services, the future Internet needs to be more flexible, adaptive and reactive. While current network management approaches mainly rely on pre-defined and static capabilities, new management systems will need to support adaptation capabilities through self-management functions and closed-loop feedback control mechanisms. The work in this PhD will investigate how autonomic principles can be extended and applied to fixed networks such that continuous resource optimisation and resilience can be supported through dynamic and adaptive traffic engineering functions. In particular, it will address the two main research challenges presented in this paper, namely coordinated decision making in distributed environments, and lightweight learning capabilities. The work will rely on a scenario-based approach, focusing on realistic case studies for the emerging self-managed Internet.
References

1. Awduche, D., Chiu, A., Elwalid, A., Widjaja, I., Xiao, X.: Overview and Principles of Internet Traffic Engineering. IETF RFC 3272 (May 2002)
2. Atlas, A., Zinin, A.: Basic Specification for IP Fast Re-route: Loop-free Alternates. IETF RFC 5286 (September 2008)
3. Kephart, J.O., Chess, D.M.: The Vision of Autonomic Computing. IEEE Computer 36(1), 41–50 (2003)
4. Lavinal, E., Desprats, T., Raynaud, Y.: A Generic Multi-agent Conceptual Framework towards Self-management. In: IEEE/IFIP Network Operations and Management Symposium, pp. 394–403 (April 2006)
5. Jiang, T., Baras, J.S.: Coalition Formation Through Learning in Autonomic Networks. In: Proceedings of the International Conference on Game Theory for Networks (GameNets 2009), Istanbul, Turkey, May 13-15, pp. 10–16 (2009)
6. Tcholtchev, N., Chaparadza, R., Prakash, A.: Addressing Stability of Control-Loops in the Context of the GANA Architecture: Synchronization of Actions and Policies. In: Proceedings of the 4th IFIP TC 6 International Workshop on Self-Organizing Systems. LNCS, vol. 5918, pp. 262–268. Springer, Heidelberg (2009)
7. Charalambides, M., Pavlou, G., Flegkas, P., Loyola, J., Bandara, A., Lupu, E., Russo, A., Dulay, N., Sloman, M.: Policy Conflict Analysis for DiffServ Quality of Service Management. IEEE Transactions on Network and Service Management (TNSM) 6(1) (March 2009)
8. Dietterich, T.G., Langley, P.: Machine Learning for Cognitive Networks: Technology Assessment and Research Challenges. In: Mahmoud, Q.H. (ed.) Cognitive Networks, ch. 5, pp. 97–120 (2007)
9. Tesauro, G.: Reinforcement Learning in Autonomic Computing – A Manifesto and Case Studies. IEEE Internet Computing 11(1), 22–30 (2007)
10. Gelenbe, E.: Steps towards Self-aware Networks. Communications of the ACM 52(7), 66–75 (2009)
Adaptive Underwater Acoustic Communications Anuj Sehgal and Jürgen Schönwälder Computer Science, Jacobs University Bremen Campus Ring 1, 28759 Bremen, Germany {s.anuj,j.schoenwaelder}@jacobs-university.de
Abstract. Underwater wireless networks consist of mobile and static nodes, which usually communicate using the acoustic channel, since radio transmissions attenuate rapidly and optical communication is only suitable for short distances. The underwater acoustic channel is plagued with issues of high transmission power requirements, rapidly changing channel characteristics, multi-path echoes, possibly high ambient noise, and high and varying propagation delays. To achieve optimal performance, an underwater network must be able to sense its ambient environment and react by adapting its communication parameters. This paper proposes a two-tier approach to developing adaptive underwater communications: an adaptive routing protocol that reacts to environmental parameters, and a software acoustic modem that enables lower-layer adaptations as well.

Keywords: Underwater Acoustic Networks, Underwater Acoustics, Software Modems, Environmental Adaptive Routing.
1 Introduction
Communication using the radio channel is not efficient underwater, since radio waves attenuate quickly. The only radio systems which are usable underwater are those which use very low transmission frequencies (30-300 Hz), very long antennae and high transmission power. Optical communication is only usable for very short-range communication, owing to the high level of attenuation the visual spectrum experiences in the aquatic medium [1]. As such, the acoustic channel is commonly used to achieve wireless communications in the underwater environment. Underwater acoustic transmission suffers from a number of issues due to the volatility of channel conditions. High and varying ambient noise and multi-path echoes create challenges for dependable communication in the underwater acoustic channel. While it is widely accepted that the underwater acoustic channel suffers from low transmission speeds, narrow bandwidth, high transmission power requirements and high bit-error-rates, high localized fluctuations in propagation delay, due to the dependence of sound velocity on ambient temperature, salinity and acidity, also add to the complexities of the channel. It is necessary to build cross-layer adaptive systems for optimal underwater acoustic communications. The work proposed in this paper is aimed at the routing and physical layers.
                        Capacity (bps)   Average SNR (dB)
Route 1 (Single Hop)        0.1797            11.80
Route 2 (Multi Hop)         0.4429             7.68
Route 3 (Multi Hop)         1.0002             3.70
Fig. 1. A sample underwater topology with node A transmitting to E, 2 km apart at 500m depth. Node X is at 1 km distance from A and at 500 m depth. Nodes B, C, D are located at 1 km depth with 1 km distance between them. The table shows some performance metrics of this network. Signal transmission properties of 30 kHz at 80 dB strength and 2 kHz bandwidth are used.
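A back-of-the-envelope computation shows why the deeper multi-hop routes in Figure 1 fare better: per-hop transmission loss grows with both spreading and frequency-dependent absorption. The Python sketch below uses Thorp's widely used empirical absorption formula; the spreading factor k = 1.5 is a common practical assumption, not a value taken from our study.

```python
# Per-hop transmission loss: spreading term plus Thorp absorption (sketch).
import math

def thorp_db_per_km(f_khz):
    """Thorp's empirical absorption coefficient in dB/km (f in kHz)."""
    f2 = f_khz ** 2
    return 0.11 * f2 / (1 + f2) + 44 * f2 / (4100 + f2) + 2.75e-4 * f2 + 0.003

def path_loss_db(d_km, f_khz, k=1.5):
    """Transmission loss over d_km: k*10*log10(d in m) + absorption * d."""
    return k * 10 * math.log10(d_km * 1000) + thorp_db_per_km(f_khz) * d_km

f = 30.0  # kHz, as used for the topology in Fig. 1
print(f"2 km single hop: {path_loss_db(2.0, f):.1f} dB loss")
print(f"1 km hop:        {path_loss_db(1.0, f):.1f} dB loss")
```

Halving the hop distance here saves roughly 13 dB per hop, which is consistent with the direction of the SNR differences reported in Fig. 1 between the single-hop and multi-hop routes.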
Though software acoustic modems were initially proposed as an approach to tackle the high cost [2] of hardware for this environment, this approach may also be used in order to build adaptive acoustic modems which could change the coding and modulation methods in agreement with all nodes. An adaptive routing protocol that can sense environmental parameters is also necessary in order to fully optimize the network performance. This paper presents the research questions of relevance to this study, followed by some related research work and conclusions.
2 Research Questions and Proposed Approach
Communication in the underwater acoustic channel imposes a dependence between the achievable data rate and the transmission distance, depth, frequency and transmission power [3,4]. From Figure 1 it becomes clear that using a multi-hop approach and preferring deeper routes provides greater reliability and transmission capacity. Increasing transmission distance and shallow-water communication introduce a high bit-error-rate (BER) and a low signal-to-noise ratio (SNR). It can also be shown that the temperature, salinity and acidity of the ambient environment have a pronounced effect on channel bandwidth, BER and SNR [5]. As such, the research questions of interest to this study are:
1. How much performance increase can be obtained by mitigating the effects of ambient temperature, salinity and acidity, and by preferring shorter and deeper hops?
2. What is the effect of packet sizes on the reliability, energy consumption and throughput of such networks? Does an optimal packet size exist, and should it be adapted according to ambient conditions as well?
3. How can performance be improved by adapting channel coding and modulation methods at runtime to suit the channel characteristics? If so, what parameters must be considered to optimize performance?
4. What effect does node mobility have upon the performance of such a network? How might any negative effects be mitigated?
5. What are efficient ways of having the entire node swarm agree upon communication parameters in a low-power lossy network plagued with frequent partitions?

The proposed approach combines an adaptive routing protocol which reacts to the ambient environment and a software acoustic modem capable of changing its characteristics at runtime. The proposed protocol (and modem) will need access to external data, such as node depth, temperature, salinity and acidity. This can easily be derived from sensors on each node. An ultra-short baseline (USBL) tracking system for global localization, or a two-way round-trip-time measurement system for inter-node localization, can be used for distance measurements. The routing protocol will be analyzed in the underwater communications simulator designed for this study [5]. The software modem, being designed using GNURadio, will be configured with a transmitter and receiver pair; a software layer will model parameters such as ambient noise in order to study the effects of conditions which cannot be modeled mathematically. This test-bed could also be used to study the effects of changing modulation and coding mechanisms on the performance of the communication system.

The metrics considered by this study will include delivery ratio, propagation delay, SNR, BER, bandwidth, transmission power, energy consumption and end-to-end throughput. The 2D and 3D network topologies discussed in [1] will be used as the basis for the testing scenarios.
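To illustrate the kind of environmental dependence the proposed protocol must sense, the following sketch computes the underwater sound speed with Medwin's empirical approximation; the formula is a standard oceanographic one, but the sample ambient values are invented and its use here is purely illustrative.

```python
# Sound speed in seawater via Medwin's empirical formula (sketch).
# Valid roughly for 0-35 degC, 0-45 ppt salinity and 0-1000 m depth.
def sound_speed(T, S, z):
    """T: temperature (degC), S: salinity (ppt), z: depth (m) -> c (m/s)."""
    return (1449.2 + 4.6 * T - 0.055 * T ** 2 + 0.00029 * T ** 3
            + (1.34 - 0.010 * T) * (S - 35.0) + 0.016 * z)

# Propagation delay over a 1 km hop under two invented ambient conditions:
for T, S, z in [(20.0, 35.0, 50.0), (4.0, 34.0, 900.0)]:
    c = sound_speed(T, S, z)
    print(f"T={T} degC, S={S} ppt, z={z} m: c={c:.1f} m/s, "
          f"1 km delay={1000.0 / c * 1000.0:.1f} ms")
```

Recomputing such values from on-node sensors at runtime yields up-to-date propagation delay estimates, which is exactly the type of input the environment-adaptive routing decisions would rely on.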
3 Related Work
A few routing protocols have been proposed for underwater networks. The Vector Based Forwarding (VBF) protocol [6] aims to solve the problem of high error probability in dense underwater sensor networks by defining a routing pipe from the source to the sink and causing floods inside the pipe. In [7] a two-phase routing solution for long-term monitoring missions based on centralized planning of network topology and data-paths is introduced. A Depth Based Routing (DBR) protocol was proposed in [8] which always attempts to forward data to a node that is geographically located above itself. These protocols are not applicable in ad hoc underwater networks as they are designed for specific node-sink communication scenarios.
Some effort has also been put into designing software acoustic modems for underwater communication [9,10]. Most of these developments have concentrated on using waterproofed TinyOS motes with microphones and speakers to set up an underwater network. However, even though this lowers the cost of underwater networks, it does not provide enough resources to develop reliable systems which could be used in real-world communications.
4 Summary
We proposed developing an adaptive routing protocol and a software modem in order to improve the efficiency of underwater acoustic communications. Important research questions were formulated and a brief outline of the development, testing and evaluation approach, along with related work, has also been provided.
Acknowledgement The work reported in this paper is supported by the EC IST-EMANICS Network of Excellence (#26854).
References

1. Akyildiz, I.F., Pompili, D., Melodia, T.: Underwater Acoustic Sensor Networks: Research Challenges. Ad Hoc Networks (Elsevier) 3, 257–279 (2005)
2. Partan, J., Kurose, J., Levine, B.N.: A Survey of Practical Issues in Underwater Networks. In: WUWNet 2006: Proceedings of the 1st ACM International Workshop on Underwater Networks, pp. 17–24. ACM, New York (2006)
3. Sehgal, A., Tumar, I., Schönwälder, J.: Variability of Available Capacity due to the Effects of Depth and Temperature in the Underwater Acoustic Communication Channel. In: Proc. of IEEE OCEANS 2009, Bremen (May 2009)
4. Stojanovic, M.: On the Relationship between Capacity and Distance in an Underwater Acoustic Communication Channel. In: WUWNet 2006: Proceedings of the 1st ACM International Workshop on Underwater Networks, pp. 41–47. ACM, New York (2006)
5. Sehgal, A.: Analysis and Simulation of the Deep Sea Acoustic Channel for Sensor Networks. Master's thesis, Jacobs University Bremen, Germany (August 2009)
6. Xie, P., Cui, J., Lao, L.: VBF: Vector-Based Forwarding Protocol for Underwater Sensor Networks. In: Proceedings of IFIP Networking 2006, Coimbra, Portugal (May 2006)
7. Pompili, D., Melodia, T., Akyildiz, I.F.: A Resilient Routing Algorithm for Long-Term Applications in Underwater Sensor Networks. In: Proceedings of the Mediterranean Ad Hoc Networking Workshop (Med-Hoc-Net), Lipari, Italy (June 2006)
8. Yan, H., Shi, Z.J., Cui, J.H.: DBR: Depth-Based Routing for Underwater Sensor Networks. In: Das, A., Pung, H.K., Lee, F.B.S., Wong, L.W.C. (eds.) NETWORKING 2008. LNCS, vol. 4982, pp. 72–86. Springer, Heidelberg (2008)
9. Jurdak, R., Ruzzelli, A.G., O'Hare, G.M.P., Lopes, C.V.: Mote-Based Underwater Sensor Networks: Opportunities, Challenges, and Guidelines. Telecommunication Systems Journal 37, 37–47 (2008)
10. Jurdak, R., Baldi, P., Lopes, C.V.: Software-Driven Sensor Networks for Short-Range Shallow Water Applications. Ad Hoc Networks 7, 837–848 (2009)
Probabilistic Fault Diagnosis in the MAGNETO Autonomic Control Loop

Pablo Arozarena1, Raquel Toribio1, Jesse Kielthy2, Kevin Quinn2, and Martin Zach3

1 Telefónica Investigación y Desarrollo, Madrid, Spain {pabloa,raquelt}@tid.es
2 Telecommunications Software and Systems Group (TSSG), Waterford, Ireland {jkielthy,kquinn}@tssg.org
3 Siemens AG Austria, Vienna, Austria {martin.zach}@siemens.com

Abstract. Management of outer edge domains is a big challenge for service providers due to the diversity, heterogeneity and large number of such networks, together with limited visibility on their status. This paper focuses on the probabilistic fault diagnosis functionality developed in the MAGNETO project, which enables finding the most probable cause of service problems and thus triggering appropriate repair actions. Moreover, its self-learning capabilities allow continuously enhancing the accuracy of the diagnostic process.

Keywords: Autonomic, Home Area Networks (HAN) Management, Probabilistic Bayesian Network, Self-learning.
1 Introduction

As networks grow in size, heterogeneity and complexity, network management needs to deal with incomplete management information, uncertain situations, lack of full visibility of network status, and dynamic changes. All this is particularly true for outer edge domains, i.e. the point of attachment of home area networks (HANs). To address these problems, several new management approaches have been developed. Current autonomic management architectures have adapted the initial IBM concept [1] to enable self-management of networks. For example, FOCALE [2] was designed with the concept of autonomic network management in mind and considers the additional constraint of heterogeneity of devices. Moreover, the Generic Autonomic Networking Architecture (GANA) [3] provides a framework to model how protocols, functions, nodes and networks can provide autonomic network management. In order to deal with uncertainty, other approaches have explored the application of probabilistic techniques to network management. CAPRI [4] uses Bayesian Network (BN) inference [5], combined with a highly distributed architecture, to infer the cause of observed network problems. The 4WARD project [6] has proved the feasibility of probabilistic techniques for addressing the composition of management functions in distributed architectures. MAGNETO [7], a project funded under the Eureka Celtic Initiative, is addressing the challenges described above in the particular case of Home Area Networks (HANs). The approach in MAGNETO is to deploy management functionality both
inside HANs and in the ISP domain, to enable distributed autonomic behavior and thus allow management tasks to span different domains. Sharing autonomic management in this way produces additional benefits – for example, if a problem in the access network is affecting several HANs, MAGNETO agents at the ISP are in a better position to properly diagnose the fault. MAGNETO addresses two main use cases: Service Degradation, which focuses on how a network can react to degradation in the quality of a service, and Service Breakdown, which focuses on how a network can react to problems disrupting the delivery of a given service.
2 Fault Diagnosis in MAGNETO

Fault diagnosis in MAGNETO follows a distributed approach that, based on probabilistic techniques, allows the system to face uncertainty. In MAGNETO, the possible causes of service failures (hypotheses) detected in the network and the set of observable network variables (symptoms) are modeled as a BN. In this BN, possible causes are defined as hypothesis nodes and symptoms as evidence nodes. Thanks to Bayesian inference, the probability of each of the possible causes of a service failure can be estimated. Thus, the result of the diagnosis procedure is a set of the most probable causes of the failure, together with their probability values. In MAGNETO, the initial structure and Conditional Probability Tables (CPTs) of the BN are modeled with the help of network/service experts. Since MAGNETO deals with the management of services at the outer edge, the causes of problems may lie in different domains (such as the HAN, the ISP, etc.). For this reason, MAGNETO relies on a Multi-Agent System (MAS) [8] platform that facilitates distributing fault diagnosis functionality across domains. Besides, MAGNETO allows breaking down a BN into a set of smaller BNs that can be used by agents spread across the different network domains. For instance, some agents may be specialised in diagnosing HAN errors while others in diagnosing access network errors, and they may exchange their results to cooperatively reach a global diagnosis. The following types of agents have been defined:

• Diagnosis Agents orchestrate the diagnostic process among a set of cooperating agents. This is an iterative process where new evidence is added to the BN as it becomes available and Bayesian inference is repeatedly performed until enough confidence for a given hypothesis is reached or no extra information can be obtained. In the latter case, the diagnosis can be delegated to a different diagnosis agent for further inference.
• Observation Agents provide evidence obtained on request by performing specific tests, such as alarm queries. Each observation agent is specialised in a certain type of test and publishes its capabilities in a service directory.
• Belief Agents provide the set of probability values of a given node being in each of its possible states. These agents have Bayesian knowledge embedded and perform Bayesian inference to obtain the beliefs of a certain node based on the evidence they may get from observation agents. By exchanging beliefs, these agents enable partitioning a BN to perform distributed diagnosis.
• A Knowledge Agent is in charge of conducting a self-learning process and distributing diagnosis knowledge to other agents.
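As a minimal illustration of how Bayesian inference ranks hypotheses from evidence (the actual MAGNETO BNs are larger, expert-built and partitioned across agents), the following Python sketch enumerates a two-cause, one-symptom network with invented prior and CPT values:

```python
# Tiny Bayesian diagnosis by exact enumeration (illustrative values only).
from itertools import product

priors = {"han_wifi_fault": 0.05, "isp_link_fault": 0.02}

def p_symptom(symptom_observed, wifi, isp):
    """CPT: P(video_stalls | causes) modelled as a noisy-OR with a small leak."""
    p_true = 1.0 - (1 - 0.9 * wifi) * (1 - 0.8 * isp) * (1 - 0.01)
    return p_true if symptom_observed else 1.0 - p_true

def posteriors(symptom_observed=True):
    joint = {}
    for wifi, isp in product([0, 1], repeat=2):
        p = priors["han_wifi_fault"] if wifi else 1 - priors["han_wifi_fault"]
        p *= priors["isp_link_fault"] if isp else 1 - priors["isp_link_fault"]
        p *= p_symptom(symptom_observed, wifi, isp)
        joint[(wifi, isp)] = p
    z = sum(joint.values())
    return {
        "han_wifi_fault": sum(p for (w, _), p in joint.items() if w) / z,
        "isp_link_fault": sum(p for (_, i), p in joint.items() if i) / z,
    }

# Posterior probability of each hypothesis given the observed symptom:
print(posteriors(True))
```

In the real system, evidence nodes are filled incrementally by observation agents, and inference is re-run after each new observation until one hypothesis reaches the required confidence.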
An important MAGNETO goal is to be able to automatically improve the quality of its diagnosis results. In order to do so, past diagnosis reports should be validated, either automatically or manually. At the time of writing, only manual validation is supported, since automatic validation requires feeding back to the fault diagnosis functionality the results of autonomic self-healing actions performed as a consequence of a given diagnosis. Once MAGNETO has enough new validated reports, it triggers a parametric Bayesian learning algorithm, called Expectation Maximization (EM) [9], to obtain more accurate CPTs. In order to perform self-learning, MAGNETO periodically executes the following steps:

1. Diagnosis agents send diagnosis operation reports to the knowledge agent. These reports contain the results of diagnostic operations and are stored in a knowledge repository.
2. Diagnosis reports are validated to indicate whether the result was right or wrong. This validation can be either manual or automatic.
3. A self-learning process is executed by the knowledge agent. This process accesses the knowledge repository and updates CPT values using the validated diagnosis results and the BN knowledge available at that point in time. It is important to note that a confidence value is used by the self-learning algorithm to determine the weight of the existing BN parameters over new incoming data.
4. Once new CPT values are learnt, they are propagated to all diagnosis agents so they can update their inference knowledge accordingly.
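The confidence-weighted update in step 3 can be sketched as follows. For a fully observed node this reduces EM's parameter update to weighted frequency counting; the counts and confidence value are invented, and this is a simplification of, not a substitute for, the EM algorithm used in MAGNETO:

```python
# Confidence-weighted CPT update from validated diagnosis reports (sketch).

def update_cpt(old_cpt, counts, confidence):
    """Blend existing parameters with newly validated data.

    old_cpt    : dict state -> probability (current BN knowledge)
    counts     : dict state -> number of validated reports observing it
    confidence : equivalent sample size given to the existing parameters
    """
    total = sum(counts.values())
    return {
        state: (confidence * old_cpt[state] + counts.get(state, 0))
               / (confidence + total)
        for state in old_cpt
    }

old = {"guilty": 0.3, "innocent": 0.7}
validated_counts = {"guilty": 8, "innocent": 2}   # invented report counts
print(update_cpt(old, validated_counts, confidence=10))
# A higher confidence value makes the learnt CPT drift more slowly towards
# the new data, which is the weighting role described in step 3.
```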
3 Conclusions and Future Work

The MAGNETO prototype is being validated in a real testbed representing two interconnected HANs and an ISP. Feedback from this validation, together with further research outputs, will be used to define the evolution of the prototype. Future work related to fault diagnosis will mainly focus on the following research topics:

• Studying ways to automatically validate diagnosis results so that human intervention can be avoided, thus making the self-learning loop more efficient.
• Using a different confidence value for each parameter in a BN to make the self-learning process more accurate.
• Comparing different ways of combining self-learning with BN partitioning, in terms of ability to cope with different domains, accuracy of results, simplicity of the validation process, etc.
Acknowledgments This paper describes work undertaken in the context of the CELTIC MAGNETO project, which is partially funded by the Spanish “Ministerio de Industria, Turismo y Comercio” under the Avanza program, by “Enterprise Ireland” as part of the International Collaboration Support Programme and by the Austrian FFG (Basisprogramme, Projekt Nr. 820360).
References

1. Kephart, J.O., Chess, D.M.: The Vision of Autonomic Computing. Computer 36(1), 41–50 (2003)
2. Strassner, J., Agoulmine, N., Lehtihet, E.: FOCALE: A Novel Autonomic Networking Architecture. In: Latin American Autonomic Computing Symposium (LAACS), Campo Grande, MS, Brazil (2006)
3. Chaparadza, R., et al.: Creating a Viable Evolution Path towards Self-Managing Future Internet via a Standardizable Reference Model for Autonomic Network Engineering. In: Towards the Future Internet – A European Research Perspective, pp. 136–147. IOS Press, Amsterdam (2009)
4. Lee, G.: CAPRI: A Common Architecture for Distributed Probabilistic Internet Fault Diagnosis. Ph.D. thesis, Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science (2007)
5. Barco Moreno, R.: Bayesian Modelling of Fault Diagnosis in Mobile Communication Networks. Universidad de Málaga, Tech. Rep. (2007)
6. Hasslinger, G., et al.: 4WARD Deliverable D-4.2: In-Network Management Concept (2009), http://www.4ward-project.eu
7. MAGNETO: Management of the Outer Edge, http://projects.celtic-initiative.org/MAGNETO
8. Wooldridge, M.: An Introduction to Multi Agent Systems, 2nd edn. John Wiley & Sons, Chichester (2009)
9. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning, pp. 236–243. Springer, Heidelberg (2001)
Modelling Cloud Computing Infrastructure Marianne Hickey and Maher Rahmouni HP Labs, Long Down Avenue, Bristol, BS34 8QZ, UK {Marianne.Hickey,Maher.Rahmouni}@hp.com
Abstract. We present a modelling approach for an adaptive cloud infrastructure consisting of secure pools of virtualized resources, in order to facilitate automated management tasks and interaction with the system by a human administrator, or programmatically by a higher-level service. The topology of such a system is rapidly changing as, for example, it has the ability to create, modify or destroy pools of virtual resources according to customer demand, as well as to dynamically modify the mapping of virtual to physical resources. It is also highly distributed, and management data needs to be compiled from disparate sources. Our modelling approach, based on the semantic web, allows us to represent complex topologies, model incomplete or erroneous systems and perform operations such as querying, while still allowing validation of the models against system invariants and policies. It also supports distributed modelling, allowing sub-models to be combined, data merging, and shared vocabularies.

Keywords: Modelling, Cloud Computing, RDF, Ontology, Rules, Validation.
1 Introduction

There is currently a shift towards cloud computing, which changes the model of provision and consumption of information technology (IT) services and separates the IT consumer or customer organisation from much of the direct cost and management of IT provision. Rather than an organisation managing IT services on its own computing infrastructure, cloud computing takes the approach of meeting an organisation's IT needs, partly or wholly, with IT services available on the internet. The infrastructure to support cloud computing needs to be highly adaptive and distributed. The topology is rapidly changing, as the platform on which applications run will have the abilities to create and destroy pools of virtualized resources, to manage dynamic resource allocation across the population of resources, and to detect and recover from failures. Information about the infrastructure needs to be assimilated from, and integrated into, a variety of management sources. Throughout all this change, topological constraints and management policies will need to be applied. Our aim is to model a platform for cloud computing, to enable interaction programmatically by other management systems and higher-level services, as well as by human administrators. This paper discusses the requirements of modelling a cloud computing infrastructure (Section 2) and presents a solution based on semantic web [1] technologies (Section 3). Sections 4 and 5 discuss related work and conclusions.
2 Issues in Modelling a Cloud Computing Platform

Fragmentation of service data is one issue that increases the difficulty of IT management. With virtualization, the fragmentation of data accelerates. A cloud computing infrastructure is highly distributed; for example, virtual machines in a particular customer's resource pool may be spread across a heterogeneous physical infrastructure, spanning inside and outside the enterprise. The system model itself may be distributed amongst hosts but may need to be merged, for example to obtain a complete view of the system for validation. The infrastructure model needs to interoperate with a variety of data and management agents. Information about the infrastructure needs to be assimilated from a range of sources, including discovery and a range of other management sources, from within and beyond an organisation's boundaries. Likewise, information from the system model may need to be merged with other data in a larger management system.

The topology of an infrastructure to support the demands of cloud computing obviously needs to be highly dynamic. For example: physical hosts may fail or need to be taken down for servicing; virtual machines may be migrated between physical hosts because of this, or as a result of dynamic resource allocation that aims to reduce cost by limiting the number of physical hosts used and/or power consumption; customers may request a new resource pool at any time or modify existing ones to meet their demand. While all this change is going on, we need to be able to validate that the system adheres to topological constraints and policy requirements, and if it does not, query the model to investigate why.

With an adaptive infrastructure, we need to model complex topologies and dependencies. For example, many systems have topologies where partonomies, based on part-whole relationships, are as important as subsumption hierarchies that are based on sub-class relationships. The platform will need to support a variety of different customers and keep pools of resources owned by different customers secure. We need to represent system invariants and policies, and these need to be validated; for example, a policy might require that a virtual machine cannot belong to two different pools of resources.
3 Our Solution

Our approach to a programmatic way to model, validate, query and compare a cloud computing platform is based on semantic web technologies (see Fig. 1), as they provide several useful features for modelling a distributed, rapidly changing, heterogeneous environment, such as: a standard, open representation and set of tools; the ability to model complex topologies; data merging; shared vocabularies; the ability to model incomplete or erroneous data; inference; and the ability to represent policy and topology constraints. The approach uses the Resource Description Framework (RDF) [2] to represent instances of an adaptive infrastructure, and an ontology and system topology rules to model the domain. For validation, it uses layers of generic and domain-specific validation rules.

The topologies we need to model require relationships to be expressed between classes, such as cardinality restrictions, and we therefore need to use OWL [3], rather
than RDF-Schema (RDFS) [4]. Further, we need to model partonomies as well as subsumption hierarchies, and other relationships that are not expressible in OWL [5]. We use a rules language to model such relationships. Rules are also used to represent system invariants and policy, and these rules can be run in a specific validation phase. Our approach separates the description of the domain from the validation criteria, which enables the same underlying model to be used with different criteria. We can model a system that does not conform to system invariants, and still be able to process and query the model.
Fig. 1. Modelling with the Semantic Web Technology Stack
Using the Jena toolkit [6], we have developed a proof-of-concept demonstrator, based on an OWL ontology, Jena rules [7] to support topology that cannot be expressed in OWL, and Jena validation rules to detect and report minimum and maximum cardinality violations and other violations of domain invariants.
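The flavour of such validation can be shown in a few lines; the sketch below uses Python's rdflib rather than Jena, and the vocabulary, instance names and the "one pool per virtual machine" policy are invented for the example, echoing the policy mentioned in Section 2:

```python
# Sketch: detect virtual machines assigned to more than one resource pool.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/cloud#")   # hypothetical vocabulary
g = Graph()

# Instance data as would be assimilated from discovery/management sources.
g.add((EX.vm1, RDF.type, EX.VirtualMachine))
g.add((EX.vm1, EX.memberOfPool, EX.poolA))
g.add((EX.vm1, EX.memberOfPool, EX.poolB))    # violates the pool policy

violations = g.query("""
    PREFIX ex: <http://example.org/cloud#>
    SELECT ?vm ?p1 ?p2 WHERE {
        ?vm a ex:VirtualMachine ;
            ex:memberOfPool ?p1, ?p2 .
        FILTER (STR(?p1) < STR(?p2))
    }""")

for vm, p1, p2 in violations:
    print(f"policy violation: {vm} belongs to both {p1} and {p2}")
```

Note that the erroneous model above can still be loaded, queried and reported on, which is the separation of domain description from validation criteria argued for in the previous paragraph.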
4 Related Work

Management systems often have proprietary representation, querying and data manipulation methods. Using RDF/OWL for modelling, with SPARQL [6] for querying the models, is a promising approach to eventually facilitate interoperability between management systems and data integration between management data repositories. In the context of cloud computing platforms, this is more relevant than ever before. There has been work on standardizing systems modelling languages. The most prominent of these is the Service Modeling Language (SML) [8], which is based on profiles of XML Schema (for defining structural aspects of SML models) and the ISO Schematron language (for defining constraints), with some additional support for inter-document references. A key difference between an RDF/OWL approach and an XML-Schema approach is that the former deals directly with a model of the domain, whereas XML Schemas make statements about XML element structures for describing the domain. With the latter approach, syntactic considerations can get in the way of modelling
and of operations such as query and transformation [9]. RDF/OWL also has good support for semi-structured data. The issue of transferring characteristics across a partonomy is addressed in [10], which uses a different rules language, SWRL, and primarily considers domains such as medical terminologies and web services.
5 Conclusions

With the shift towards cloud computing, IT system models are needed more than ever, to provide an administrator or automated management tools with good visibility and control in order to better manage IT services. However, cloud computing itself presents significant challenges to modelling, as management data is potentially more and more fragmented and topologies are rapidly changing to meet customer demands and resource allocation decisions. Models from disparate management sources need to interoperate, and data needs to merge transparently. Throughout all this change, topological constraints and management policies need to be applied. This paper has presented a modelling approach that attempts to meet these challenges. The approach is based on the semantic web technology stack, and thus uses a standard, open representation and tools, as well as providing features such as: support for distributed models, including data merging and shared vocabularies; inference; and the ability to represent incomplete systems, complex topologies, topology constraints and policy rules.
References
1. W3C Semantic Web Activity, http://www.w3.org/2001/sw/
2. Manola, F., Miller, E., McBride, B. (eds.): RDF Primer, W3C Recommendation (2004), http://www.w3.org/TR/2004/REC-rdf-primer-20040210/
3. Bechhofer, S., et al.: OWL Web Ontology Language Reference, W3C Recommendation (2004), http://www.w3.org/TR/2004/REC-owl-ref-20040210/
4. Brickley, D., Guha, R.V., McBride, B. (eds.): RDF Vocabulary Description Language 1.0: RDF Schema, W3C Recommendation (2004), http://www.w3.org/TR/2004/REC-rdf-schema-20040210/
5. Horrocks, I.: OWL Rules, OK? W3C Workshop on Rule Languages for Interoperability, Washington, D.C., April 27-28 (2005)
6. Jena – A Semantic Web Framework for Java, http://jena.sourceforge.net/
7. Jena 2 Inference support, http://jena.sourceforge.net/inference/index.html
8. Service Modeling Language, Version 1.1, W3C Recommendation (2009), http://www.w3.org/TR/2009/REC-sml-20090512/
9. Reynolds, D., Thompson, C., Mukerji, J., Coleman, D.: An Assessment of RDF/OWL Modelling, HP Labs Technical Report HPL-2005-189 (2005)
10. Horrocks, I., Patel-Schneider, P.F., Bechhofer, S., Tsarkov, D.: OWL Rules: A Proposal and Prototype Implementation. Journal of Web Semantics 3(1), 23–40 (2005)
Towards an Autonomic Network Architecture for Self-healing in Telecommunications Networks
Jingxian Lu, Christophe Dousson, Benoit Radier, and Francine Krief
Orange Labs, 2 Avenue Pierre Marzin, 22300 Lannion Cedex, France
{jingxian.lu,christophe.dousson,benoit.radier}@orange-ftgroup.com, [email protected]
Abstract. We propose a solution to achieve a global self-healing process based on local knowledge of each network element using a model-based diagnosis approach. Then we illustrate this solution in an IMS platform and detail the corresponding causal graph and diagnosis process. Keywords: Self-healing, Model-based diagnosis, Causal graph.
1 Introduction
With the increasing complexity of system installation, maintenance, configuration and recovery, more intelligence is required in the system itself, and this is where the concept of an autonomic system comes in. An autonomic system manages itself, without human intervention. It has four distinct objectives: self-configuring, self-healing, self-optimizing and self-protecting. The closed control loop proposed by IBM is implemented by autonomic managers, which control managed resources [1]. With the development of autonomic systems, identifying, tracing and determining the cause of failures, and choosing the repairing action, have become more and more important to operators in order to satisfy the needs of customers. Thus, the purpose of our studies is to introduce self-healing in operator networks.
2 Model-Based Diagnosis for Self-healing
The causal graph is a general, intuitive and effective representation for modelling the causalities governing the functioning of the system to be monitored [2]. The graph is assumed to be acyclic, which limits the expressiveness of the representation; however, this restriction is important to keep the complexity of reasoning manageable [3]. The causal graph structure in this paper is based on five kinds of nodes and two kinds of arcs, which are shown in Figure 1.

Fig. 1. Nodes and arcs of the causal graph (node kinds: primary cause, intermediate cause, observation, test, repairing action; arc kinds: "should cause" and "could cause")

The diagnosis process consists of collecting all the current observations and then determining all the primary causes that could explain them. To achieve this, each node has a unique state, as follows: Guilty (a test result, an active alarm or a proved cause), Innocent (a test result, an absent alarm or an exonerated cause), Suspect (a test to run or an unknown cause to investigate), or Unknown (the default state). Then, applying the following causal propagation rules until the node states are stable over the whole graph gives the diagnosis and the repairing action:

R1: if a node is guilty, then all its "should cause" successors are guilty.
R1': if a node is guilty, at least one of its predecessors is guilty.
R2: if a node is innocent, then all its "should cause" predecessors are innocent.
R2': if all the predecessors of a node are innocent, the node is innocent.
R3: if a node is guilty, then all its predecessors become suspect.
R4: if a node is suspect, then all its "should cause" predecessors are suspect.
R4': if a node is suspect, then all its "should cause" successors are suspect.
R5: if a test node is suspect, then the test is performed; depending on the result, the state becomes guilty or innocent.
R6: if a repairing action is guilty, then it should be processed.
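The propagation can be made concrete with a short sketch. The following is our own simplified reading of rules R1-R4' (arcs are directed from cause to effect; the test rule R5 and the repair rule R6 are omitted, and the example node names merely echo Section 3), not the authors' implementation:

```python
# A simplified fixed-point implementation of rules R1-R4' (our reading;
# arcs run cause -> effect, and R1'/R2'/R5/R6 are omitted for brevity).
GUILTY, INNOCENT, SUSPECT, UNKNOWN = "guilty", "innocent", "suspect", "unknown"

def propagate(nodes, arcs):
    """nodes: {name: state}; arcs: [(cause, effect, kind)] with kind in
    {'should', 'could'}. Iterates until the node states stabilise."""
    def set_state(n, s):
        # firm verdicts (guilty/innocent) are never revised in this sketch
        if nodes[n] in (GUILTY, INNOCENT) or nodes[n] == s:
            return False
        nodes[n] = s
        return True
    changed = True
    while changed:
        changed = False
        for cause, effect, kind in arcs:
            if nodes[cause] == GUILTY and kind == "should":    # R1
                changed |= set_state(effect, GUILTY)
            if nodes[effect] == INNOCENT and kind == "should": # R2
                changed |= set_state(cause, INNOCENT)
            if nodes[effect] == GUILTY:                        # R3
                changed |= set_state(cause, SUSPECT)
            if nodes[effect] == SUSPECT and kind == "should":  # R4
                changed |= set_state(cause, SUSPECT)
            if nodes[cause] == SUSPECT and kind == "should":   # R4'
                changed |= set_state(effect, SUSPECT)
    return nodes

# Tiny example: an active alarm (guilty observation) makes its possible
# causes suspect via R3 and R4.
nodes = {"SLF no sync": UNKNOWN, "IMPU unknown in SLF": UNKNOWN,
         "Alarm unknown IMPU in SLF": GUILTY}
arcs = [("SLF no sync", "IMPU unknown in SLF", "should"),
        ("IMPU unknown in SLF", "Alarm unknown IMPU in SLF", "could")]
print(propagate(nodes, arcs))
```

Because states only move from Unknown to Suspect and from there to a firm verdict, the iteration is monotone and is guaranteed to terminate.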
3 Application to IMS Service
The autonomic network architecture uses a distributed Knowledge Plane (KP) to interconnect different network elements [4,5]. We define the ARM (Autonomic Resource Manager) to manage network element types, and the ANM (Autonomic Network Manager) to manage the ARM plane and to control the network [6,7]. We illustrate the previous diagnosis process by applying our approach to the case of a SIP-based VoIP service delivered over an autonomic management architecture (Fig. 2).

Fig. 2. The case of a SIP-based VoIP service

Let us suppose that the following problem occurs: the I-CSCF sends the user's IP Multimedia Public Identity (IMPU) to the SLF, which cannot find the corresponding user information. For this problem, the global diagnosis relies on the causal graph shown in Figure 3. We have divided it into four regions; each region is a subgraph corresponding to an ARM or to the ANM. The operators validate each subgraph in each ARM, to decide whether it can be involved in a global diagnosis, when the equipment is integrated into the network. The ARM allows the administrator to understand smoothly how the equipment can interact with contextual information. If the ARM is compatible with the operators' expectations, the subgraph is integrated in the KP of the ARM and the ANM by connecting the different subgraphs together. As a consequence, a primary cause in a subgraph can become an intermediate cause in the implicit global graph. This is why we need to standardize the interface between the different domains (shown in Fig. 3 as arrows between the black nodes, which cross the domains) to permit the connection between the different subgraphs. After that, the global graph can be used by the diagnosis process in a closed-control-loop manner. Today, there is no autonomous solution for this validation step; a human expert is still required.

The diagnoses in the SLF and AS domains are then processed locally. When a diagnosis cannot be completed at the local level, it needs to rise to the global environment (ANM) and collaborate with the other local diagnoses to find, discard or confirm the primary causes. This ensures an end-to-end diagnosis service. The causal graph model can be merged or split according to the needs of the distribution of the diagnosis service in a network environment. Running the diagnosis process (Section 2) on the example of the IMPU alarm proceeds in the following steps, applying the specified rules:
Fig. 3. The global diagnosis causal graph, divided into four regions: the ANM domain and the ARMs of the SLF, HSS and AS domains
1. (R3): As "Alarm unknown IMPU in SLF" is guilty (alarm present), "IMPU unknown in SLF" and "IMPU barring in SLF" are set to suspect.
2. (R4): "IMPU unknown in SLF" is suspect, so its predecessor "Unknown client in SLF (No account)" is set to suspect.
3. (R4): "Unknown client in SLF (No account)" is suspect, so "Unknown client in HSS (No account)" is set to suspect.
4. (R4'): "Unknown client in HSS (No account)" is suspect, so "Unknown client in AS (No account)" and "IMPU unknown in AS" are set to suspect.
5. (R2): "Alarm unknown IMPU in AS" is innocent (no alarm present), so "IMPU unknown in AS", "Unknown client in AS (No account)" and "Unknown client in HSS (No account)" are set to innocent.
6. (R2'): Then "Unknown client in SLF (No account)" is set to innocent.
7. (R2, R2', R4): "IMPU barring in SLF" is suspect, and the same process as in steps 2 to 6 is launched. Then "Client barring in SLF" is set to innocent.
8. (R1'): As "Unknown client in SLF (No account)" and "Client barring in SLF" are innocent, "SLF no synchronization" is guilty.
9. (R6): The repairing action "Resynchronization SLF" is performed.
4 Conclusion
We have proposed a solution to achieve a global network diagnosis from the local knowledge of each network element, using causal graphs validated by the operators. It makes it possible to connect the different necessary processes (subgraphs) so as to achieve an end-to-end diagnosis and to perform the repairing actions. Further research should aim to specify the interface between elements or domains, which corresponds to the connection between the different local diagnosers, and to consider how the performance of the global diagnosis process can be adapted to our autonomous architecture by defining when and how to send information. We will then implement our self-healing approach on an operational IMS platform.
References
1. IBM: An architectural blueprint for autonomic computing. Technical report, IBM White Paper (June 2005)
2. Poole, D.: Normality and faults in logic-based diagnosis. Artificial Intelligence, 1304–1310 (1989)
3. Console, L., Dupré, D.T., Torasso, P.: A theory of diagnosis for incomplete causal models. In: Proc. of IJCAI, Detroit, USA, pp. 1311–1317 (1989)
4. ANA: Autonomic Network Architecture, http://www.ana-project.org
5. Mbaye, M., Krief, F.: Autonomic Networking: The Knowledge Plane. System and Information Sciences Notes, vol. 1 (July 2007)
6. Strassner, J.C., Agoulmine, N., Lehtihet, E.: FOCALE – A novel autonomic networking architecture, vol. 3, pp. 64–79 (May 2007)
7. Strassner, J., Kim, S.S., Hong, J.W.K.: The design of an autonomic communication element to manage future internet services. In: Hong, C.S., Tonouchi, T., Ma, Y., Chao, C.-S. (eds.) APNOMS 2009. LNCS, vol. 5787, pp. 122–132. Springer, Heidelberg (2009)
LearnIT: Enhanced Search and Visualization of IT Projects
Maher Rahmouni1, Marianne Hickey1, and Claudio Bartolini2
1 HP Labs, Long Down Avenue, Bristol, BS34 8QZ, UK
2 HP Labs, 1501 Page Mill Rd., Palo Alto, California, 94304-1100, USA
{Maher.Rahmouni,Marianne.Hickey,Claudio.Bartolini}@hp.com
Abstract. Year over year, the majority of IT projects fail to deliver the required features on time and on budget, costing billions of dollars. The task of portfolio managers is to make a selection of prospective projects, given only rough cost-benefit estimates for them. In this paper, we present LearnIT, a tool to aid portfolio managers in their portfolio selection job. LearnIT identifies past or current IT projects within a large database on the basis of their similarity to a given IT project or proposal. Furthermore, LearnIT can be trained by example with expert user feedback. LearnIT also provides a means for visualizing and making evident the relationships between IT projects, in order to help IT managers design templates for future projects, thus increasing their success rate. Keywords: IT Project and Portfolio Management, Similarity, Search, Visualization.
1 Introduction

Every year, thousands of IT projects are undertaken to accommodate a business or process change. Alarmingly, a majority of those projects fail to deliver on time and on budget with the required functions and features. According to the 2009 Standish Chaos Report [1], 31% of IT projects in the US will be abandoned before completion, 52% will cost 189% of their original estimates, and the average time overrun is an unbelievable 222% of the original time estimate. One of the reasons behind those catastrophic results is the lack of mechanisms and tools for identifying related projects, understanding relationships between projects and learning from past experiences. Better estimates of IT project costs and time could be achieved by aggregating data from past similar IT projects. Increased efficiency could be achieved by re-using common components already implemented, tested and delivered successfully. In this paper we describe work that addresses these challenges in two ways:
• Firstly, by providing a set of similar past IT projects for a given IT project or proposal, so as to enable improved estimates of cost, time and staffing profiles, and furthermore to learn from past experiences.
• Secondly, by visualizing and making evident the relationships between the thousands of projects in the IT organization of a large enterprise. This can help IT managers design templates for future IT projects, thus minimizing the risk and increasing their chances of success.
The rest of the paper is organized as follows. Section 2 presents our solution – LearnIT – a tool that IT managers can use to quickly find similar projects and to visualize large sets of projects and their relationships. Section 3 discusses related work and, finally, we present conclusions and future work in Section 4.
2 Our Solution – LearnIT

Our solution, LearnIT, provides predictive analytics for IT projects by analyzing the similarity between them, and between proposals and ongoing or completed projects, with the long-term aim of facilitating additional functionality such as resource prediction, staffing profiles, and potential risks and values for projects, based on past experiences in similar projects. LearnIT has two facets:
• One is a project-centric view over a database of completed or ongoing projects, allowing a project manager to find past projects that presented similar conditions to the one at hand, or to identify patterns that lead to successful outcomes. For instance, by looking at similar projects, a project manager may have a better idea of any likely issues with their project.
• The second is a big-picture view that looks at all projects and the relationships between them, so as to identify clusters and themes and to recognize sets of projects as a first-class entity. This will, for example, allow a portfolio manager to rapidly navigate the set of projects, facilitating the portfolio selection task.
Fig. 1. (a) LearnIT System Architecture; (b) IT Project Similarity Module
The architecture of LearnIT is shown in Fig. 1(a). In addition to a database of past and active projects, LearnIT is composed of three main components, which are summarized below:
• A similarity calculator module, used by both the search and the visualization components, which computes the similarity between two IT projects.
• A learning module that uses direct feedback from the user to modify and update the weights on which the project similarity module depends.
• A visualization module in which the relationships between projects are shown and projects are clustered, making common themes evident to the user.
IT projects have a list of features such as name, description, staffing profiles, financial information, tasks, etc. As shown in Fig. 1(b), the similarity calculator module takes a pair of projects as input and outputs a number between 0 and 1 representing the similarity between the two projects. The algorithm takes into account different types of project features, including but not limited to: text (e.g. description), documents (e.g. business justification, presentation, and architecture diagrams), numeric data (e.g. cost data), and workflows (e.g. task schedule). For each feature of the project data, a sub-module calculates the similarity of the features. A sub-module may be associated with several features, such that there is a many-to-one mapping between features and sub-modules; for example, a sub-module may be associated with a data type. The similarity of two projects is the weighted sum of the individual, feature-specific similarities. Each sub-module may require some tuning; for instance, a text sub-module may be trained with a corpus of data specific to a domain or an enterprise. The weights may also need to be tuned, and this is achieved through the learning module. Different solution methods exist for tuning the weights by using a statistical classifier algorithm [2][3]. The weights can be manually set to sensible defaults to bootstrap the system.

The visualization component makes use of a force-directed algorithm for drawing graphs [4]. The main idea behind the algorithm is to draw the graph by simulating a virtual physical model, referred to as the spring model. In this model, a node in the graph is modelled as a ring of steel and an edge is modelled as a spring, to form a physical system. The model uses two forces: an attractive force that works along edges and is proportional to the length and weight of the edge, and a repulsive force that draws nodes apart and is inversely proportional to the square of the distance between them. Given an initial random layout, the springs and the repulsive forces move the system to a locally minimal energy state, that is, an equilibrium configuration. It has been observed [5] that in such a configuration the resulting graph satisfies some of the most important aesthetic criteria, such as distributing vertices evenly, making edge lengths uniform, minimizing edge crossings and reflecting symmetric properties. In our solution, the nodes of the graph represent IT projects, and an edge exists between two nodes if the similarity between the corresponding IT projects is greater than or equal to a user-defined threshold. The threshold can be updated interactively. The weights on the edges represent the similarity between the corresponding nodes.
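As a rough illustration of the weighted-sum design described above, the following sketch is our own: the two sub-modules, the feature-to-sub-module mapping and the weights are illustrative placeholders, not LearnIT's actual components.

```python
# Our own sketch of the similarity calculator: feature-specific
# sub-modules combined as a weighted sum; all names here are
# illustrative placeholders, not LearnIT's actual components.
def numeric_similarity(a, b):
    # map the relative difference of two numbers (e.g. costs) into [0, 1]
    return 1.0 - abs(a - b) / max(abs(a), abs(b), 1e-9)

def text_similarity(a, b):
    # crude bag-of-words Jaccard overlap; a real text sub-module would be
    # trained with a corpus specific to the domain or enterprise
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# many-to-one mapping between features and sub-modules
SUB_MODULES = {"description": text_similarity, "cost": numeric_similarity}
WEIGHTS = {"description": 0.7, "cost": 0.3}  # tuned by the learning module

def project_similarity(p1, p2):
    total = sum(WEIGHTS.values())
    return sum(w * SUB_MODULES[f](p1[f], p2[f])
               for f, w in WEIGHTS.items()) / total

p1 = {"description": "migrate CRM to new data centre", "cost": 120}
p2 = {"description": "migrate ERP to new data centre", "cost": 100}
print(project_similarity(p1, p2))  # a similarity score in [0, 1]
```

Normalizing by the weight total keeps the output in [0, 1] even as the learning module adjusts the weights.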
3 Related Work

Keyword search (e.g. Google search) could be used to search the project database. If the project information is structured, then SQL queries over project attributes could be used. These approaches are less than ideal in that the user needs to know what they are looking for (what search term to use, what attributes to search). A project typically consists of many attributes and pieces of text, and constructing a query based on all of
these data is cumbersome. Besides, much of the information retrieved may not be relevant, and it is hard to rank results in terms of the importance of the different attributes. The state of the art in IT portfolio and project management (PPM) tools, such as HP Portfolio and Project Management [6], does not give a graphical view of the entire set of projects and their relationships. Typically, the available representations are non-graphical, such as lists of projects, or risk-value bubble charts, which do not scale to thousands of projects. With this kind of approach, many projects can be clustered together even if they are not similar. Also, these representations are limited to numeric attributes.
4 Conclusions and Future Work

Many IT projects fail to deliver on time and on budget with the required functions and features. We propose a set of mechanisms and tools, LearnIT, for identifying related projects and understanding the relationships between them, in order to learn from past experiences, thereby improving future project planning and execution and aiding portfolio selection. A prototype of LearnIT has been implemented, and initial evaluation with a database of thousands of IT projects indicates positive results. Future work will extend this prototype to include the other functionality described in Section 2, particularly learning from domain expert feedback. We also plan to add further similarity sub-modules to include more project attributes in the similarity calculation; in particular, we are exploring graph-based similarity distances for project attributes such as work breakdown structure.
References
1. CHAOS Summary 2009, Standish Group International (2009), http://www.standishgroup.com/newsroom/chaos_2009.php
2. James, M.: Classification Algorithms. Wiley Interscience, Hoboken (1995)
3. Rahmouni, M., Bartolini, C.: Learning from Past Experiences to Enhance Decision Support in IT Change Management. In: Proc. 2010 IEEE/IFIP Network Operations and Management Symposium (NOMS 2010), Osaka, Japan (2010)
4. Di Battista, G., et al.: Graph Drawing: Algorithms for the Visualization of Graphs. Prentice Hall, Englewood Cliffs (1999)
5. Quigley, A., Eades, P.: FADE: Graph Drawing, Clustering, and Visual Abstraction. In: Marks, J. (ed.) GD 2000. LNCS, vol. 1984, pp. 197–210. Springer, Heidelberg (2001)
6. HP Project and Portfolio Management (PPM) Portfolio Management module, https://h10078.www1.hp.com/cda/hpms/display/main/hpms_content.jsp?zn=bto&cp=1-11-16-18%5E1299_4000_100__
Strategies for Network Resilience: Capitalising on Policies
Paul Smith1, Alberto Schaeffer-Filho1, Azman Ali1, Marcus Schöller2, Nizar Kheir3, Andreas Mauthe1, and David Hutchison1
1 Computing Department, Lancaster University, UK
{p.smith,asf,a.ali,andreas,dh}@comp.lancs.ac.uk
2 NEC Laboratories Europe, Heidelberg, Germany
[email protected]
3 France Télécom R&D Caen, 14066 Caen, France
[email protected]

Abstract. Networked systems are subject to a wide range of challenges whose nature changes over time, including malicious attacks and operational overload. Numerous mechanisms can be used to ensure the resilience of networked systems, but it can be difficult to define how these mechanisms should be configured in networks that support many services that have differing and shifting requirements. In this paper, we explore the potential benefits of using policies for defining the configuration of mechanisms for resilience. We discuss some of the difficulties of defining configurations, such as identifying conflicts, and highlight how existing policy frameworks could be used or extended to manage this complexity.
1 Introduction
The cost of failure of communication systems can be extremely high, as we depend on them to support many aspects of our daily life. Developing strategies to ensure the resilience of networked systems is therefore of primary importance, but the challenges these systems are subject to are wide-ranging and change over time. They include component faults and mis-configurations, as well as operational overload and malicious behaviour from intelligent adversaries.

In this paper, we explore the use of policies to define configurations of mechanisms that can ensure the resilience of networked systems. The configuration of resilience mechanisms via policies is considered in the context of a general high-level strategy for resilience, called D2R2 + DR – Defend, Detect, Remediate, Recover, and Diagnose and Refine [1]. We believe that using policies is beneficial for a number of reasons: we de-couple the implementation of the mechanisms from the strategy used to enable resilience, which is a desirable property considering the changing nature of challenges. Furthermore, policy frameworks may assist in tackling a number of challenging problems in defining resilience strategies for multi-service networks: we are specifically interested in deriving concrete configurations from high-level requirements, identifying conflicting configurations, and evolving configurations over time in response to the changing nature of challenges and requirements.
Fig. 1. Strategy for network traffic volume resilience: interactions between the link, local manager, intrusion detection, flow exporter, rate limiter, classifier and visualisation managed objects (MOs)
2 Policy-Based Resilience Strategy
One of the key problems related to resilience in networks is to discriminate operational overload due to legitimate service requests from malicious attacks, such as a Distributed Denial of Service (DDoS) attack, and then to apply adequate countermeasures [2]. To confront these challenges, we defined a strategy that relies on a number of mechanisms that must co-operatively enforce the resilience of the network, including a flow exporter, a rate limiter, and an anomaly classifier (Fig. 1). We use policies to configure and coordinate the interactions between these mechanisms. For example, specific root causes will require distinct remediation strategies, and, when a flow is classified as a possible DDoS attack, a preventive rate limiting action may be applied (Step 7). By having a resilience strategy implemented with the aid of policies, as opposed to having it hard-coded, one can easily change it by adding or removing policies, thereby permitting the modification of the strategy at run-time. This is of particular importance to us, as strategies for resilience are subject to frequent modifications due to changes in requirements, context changes or new types of challenge.
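The flavour of such run-time modifiable policies can be sketched as follows. This is our own illustration, not the project's implementation: RateLimiterMO, the event names and the thresholds are stand-ins, and a real policy framework would replace the toy registry.

```python
# Our own toy event-condition-action registry, illustrating how a
# resilience strategy can be changed at run-time by adding or removing
# policies. RateLimiterMO and the event/field names are stand-ins.
class RateLimiterMO:
    def limit(self, src, dst, pct):
        print(f"rate-limiting {src} -> {dst} to {pct}% of capacity")

rate_limiter = RateLimiterMO()
policies = []  # the current strategy is simply the set of loaded policies

def policy(event_type):
    """Decorator registering a handler; removing an entry from
    `policies` modifies the strategy without touching the mechanisms."""
    def register(handler):
        policies.append((event_type, handler))
        return handler
    return register

def publish(event_type, **event):
    for etype, handler in policies:
        if etype == event_type:
            handler(**event)

@policy("classification")
def remediate_ddos(flow, value, confidence):
    # preventive rate limiting when a flow is classified as a likely DDoS
    if value == "DDoS" and confidence >= 0.8:
        rate_limiter.limit(flow["src"], flow["dst"], 80)

publish("classification",
        flow={"src": "10.0.0.1", "dst": "10.0.0.2"},
        value="DDoS", confidence=0.9)
```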
3 Complexities of Defining Configurations

We highlight here where support can be found in policy-based management frameworks to address the complexities of defining configurations for resilience.

3.1 Deriving Configurations from High-Level Requirements
We assume policies will realise a high-level requirement to ensure resilience, e.g., expressed in terms of the availability of a server farm and the services it provides. However, it is not clear whether a resilience strategy such as the one in Fig. 1 is sufficient to ensure that a given high-level goal, e.g., defined in an SLA, is met. Moreover, complex
scenarios would make deriving concrete policies by hand intractable. We seek to derive implementable policy configurations from high-level specifications and intend to build on techniques, such as [3], which apply goal elaboration and refinement of QoS requirements into the policy configuration of routers.

Fig. 2. Defining configurations for resilience is a multi-level problem, with vertical configuration conflicts across levels and horizontal conflicts along the D2R2 strategy. Example policies: vertically, a network-level policy "on highUtilisation(link) do RateLimiterMO.limit(link, 90%)" can conflict with a service-level policy "on highServiceUtilisation(service) do VMReplicatorMO.replicateService(service)"; horizontally, "on classification(f1, value, conf): if value == 'DDoS' and conf <= 0.8 then RateLimiterMO.limit(f1.src, f1.dest, 80%)" can conflict with "on classification(f1, value, conf): if value == 'normal' and conf > 0.8 then RateLimiterMO.limit(f1.src, f1.dest, 0%)"

3.2 Identifying and Resolving Conflicting Configurations
In complex multi-service networks, conflicts can lead to the resilience requirements of one set of services being met unnecessarily at the expense of another set, or to no requirements being met for any service. Conflicts can manifest themselves in a number of ways. Vertically, across protocol levels, in the presence of concurrent challenges – e.g., a flash crowd and a DDoS attack: because of a DDoS attack, rate limiting may be started on routers, a network-level mechanism; during a flash crowd, a service could be replicated to another server farm, a service-level mechanism. However, due to the naïve rate limiting, replicating a service could make the resource starvation situation worse. Another type of conflict may occur horizontally, along the D2R2 strategy. Consider an attack targeted at both a server farm and a corporate customer. Attack traffic could saturate the links that provide access to a core network, making push-back of malicious traffic to the Internet gateways desirable. Detection mechanisms at the server farm may determine that a node has ceased to behave maliciously, and initiate a recovery configuration for that node by stopping a rate limiter. However, the node may still be behaving maliciously in relation to the corporate customer, and recovery could inadvertently disengage the remediation configuration for that network. Policies that demonstrate these conflicts are shown in Fig. 2.

Policy analysis can help to ensure the correct specification of resilience strategies, in particular in terms of dominance and coverage checks [4]: the former could be applied in multi-level analysis, to ensure that mechanisms at one level
do not render mechanisms at another level redundant; the latter could be used for the analysis of configurations at the same level, e.g., for conditions or ranges of values where mechanisms are not co-ordinated properly.

3.3 Learning Resilience Behaviour
Resilience configurations will need to evolve over time, because the nature of attacks may change and new customer agreements may cause high-level priorities to shift. Furthermore, a strategy may prove to be sub-optimal or incorrect. To assist with this, we can benefit from existing research on policies. Typically, policy-based learning relies on the use of logical rules for knowledge representation and reasoning, as policies can be easily translated into a logical program [5]. Rules can be iteratively amended to better reflect resilience practices, based on how successful previous attempts to mitigate a challenge were. Similarly, the system must be able to learn entirely new rules: for example, that during the football league final, high link utilisation is better remediated by replicating the server streaming the live match, rather than simply rate limiting link capacity.
4 Conclusions
Network resilience is difficult to ensure because the configuration of systems is complex, spans several levels, and is subject to a wide range of challenges. Policies provide flexibility in the configuration of the components that implement a resilience strategy, as the forms of detection and remediation are subject to frequent modification. We examined the applicability of policies to mitigating high-traffic-volume challenges and highlighted how policy-based approaches can assist in making the problem more tractable. Future work will investigate how these policy techniques can be extended.
Acknowledgment The research presented in this paper is partially funded by the European Commission in the context of the Research Framework Program Seven (FP7) project ResumeNet (Grant Agreement No. 224619). This work has also been supported by the EPSRC funded India-UK Advance Technology Centre in Next Generation Networking. The authors are grateful to Angelos Marnerides for insights relating to detection approaches.
References
1. Sterbenz, J.P., Hutchison, D., Çetinkaya, E.K., Jabbar, A., Rohrer, J.P., Schöller, M., Smith, P.: Resilience and Survivability in Communication Networks: Strategies, Principles, and Survey of Disciplines. Computer Networks: Special Issue on Resilient and Survivable Networks, COMNET (to appear, 2010)
2. Peng, T., Leckie, C., Ramamohanarao, K.: Survey of network-based defense mechanisms countering the DoS and DDoS problems. ACM Comput. Surv. 39(1) (2007)
3. Bandara, A.K., Lupu, E., Russo, A., Dulay, N., Sloman, M., Flegkas, P., Charalambides, M., Pavlou, G.: Policy Refinement for IP Differentiated Services Quality of Service Management. IEEE Trans. on Network and Service Management 3(2) (2006)
4. Agrawal, D., Giles, J., Lee, K.W., Lobo, J.: Policy ratification. In: POLICY 2005, Washington, DC, USA, pp. 223–232. IEEE Computer Society Press, Los Alamitos (2005)
5. Corapi, D., Ray, O., Russo, A., Bandara, A., Lupu, E.: Learning rules from user behaviour. In: 2nd Int. Workshop on the Induction of Process Models (2008)
Automatic Link Numbering and Source Routed Multicast Visa Holopainen, Raimo Kantola, Taneli Taira, and Olli-Pekka Lamminen Aalto University, Department of Communications and Networking, P.O. Box 3000, 02015 TKK, Finland {visa.holopainen,raimo.kantola,taneli.taira,olli-pekka.lamminen}@tkk.fi
Abstract. We present a new paradigm for multicasting. Our paradigm combines two ideas: a simple distributed algorithm for automatically numbering network links with a small amount of numbers, and a novel forwarding method. We describe our paradigm in detail and discuss its benefits and drawbacks qualitatively and quantitatively. Keywords: multicast, link numbering, Steiner trees.
1 Introduction
Internet Protocol (IP) based multicast does not scale, because it requires core routers to keep a forwarding table entry corresponding to every multicast group (in the worst case). As the number of multicast groups grows with the upheaval of IP television and publish/subscribe applications, this is no longer feasible. Another approach to multicasting, called LIPSIN (Line Speed Publish/Subscribe Inter-Networking), was presented in [1]. The headers of LIPSIN carry a Bloom filter that describes every link of the multicast tree. While it is an interesting concept, LIPSIN has serious drawbacks, e.g.:
– DoS: LIPSIN prevents loops by dropping packets that came in from a wrong interface with respect to the packet's Bloom filter. However, since two different trees can have the same Bloom filter representation, this causes Denial of Service (without virtual links, which dramatically increase the protocol complexity and sometimes the amount of state).
– Forwarding to wrong links: by design, LIPSIN forwards traffic to some links that are not part of the intended multicast tree. While it is possible to reduce the amount of such "false positives" by certain optimizations, they are an intrinsic property of Bloom filters. If false positives are unacceptable, some other multicast mechanism is needed.
– No local recovery: in IP multicast all link/node failures are repaired locally. LIPSIN does not specify this, and it also cannot recover from single node/link failures using bypass tunnels (without virtual links). The reason is that normally LIPSIN disallows forwarding packets to the interface from which they came in, which is needed in many recovery scenarios.
– Large overhead: by design, Bloom filters are relatively long bit strings; in LIPSIN, 256 bits is suggested. While this may be acceptable in large networks providing an IP television service, small networks may find this wasteful.
Traditional (IP/MPLS) source routing suffers from high header overhead, as well as from the lack of a protocol mechanism for multicasting. Also, source routes in packet headers reveal the network topology to eavesdroppers [1]. Hence source routing is not used in current networks, even for unicasting. We propose a Source Routed Multicast (SRM) paradigm that aims to rectify the drawbacks traditionally associated with source routing.

1.1 Overview of SRM and Qualitative Comparison to LIPSIN, IP Multicast, and Traditional Source Routing
The main differences of SRM compared to traditional source routing are the following. First, to make the header overhead in SRM as small as possible, we present a simple distributed algorithm that automatically numbers core network links with a small amount of numbers, leading to a small number of bits needed in packet headers (Section 2); we find empirically that this mechanism keeps header overhead very low. Second, we introduce a protocol mechanism for multicasting with source routes. More specifically, we propose to include additional bits in the packet header's "link stack" that indicate whether a given core router needs to forward the packet to its customer interface(s) (Section 3); we find empirically that this mechanism does not introduce prohibitively high cost in terms of link usage (Section 4). Third, to give some protection against topology leaking, we use a random arbitrary-length padding in the "address stack" of packet headers (Section 3). We circulate the padding in the header so that there is (at least in principle) no way for an eavesdropper to know the location of the padding in the header, and hence no way to separate real link numbers from the padding.1

The high-level overview of SRM is illustrated in Fig. 1. SRM is intended to be used within a single Autonomous System (AS), and it is (implicitly) a Layer 2 protocol. It might be possible to extend it to a multi-AS setting by allowing ASes to advertise source routes (in some obfuscated format, to hide topology) across AS boundaries. However, since there is arguably very little incentive (due to bandwidth-use-based billing) for ASes to provide a transit service for multicast, we focus only on the intra-AS scenario.

The main philosophical drawback of SRM compared to LIPSIN is that in SRM the access routers need to keep state (except in broadcast access networks, e.g. wireless). However, we claim that maintaining state in access routers is acceptable for two reasons: first, the amount of state needed in access routers is typically small compared to the amount of state potentially needed in core routers, and second, since access network topology is typically a tree or a ring, the access routers do not need complicated routing protocols. We claim that SRM requires less state in core routers than LIPSIN, because LIPSIN needs to cache certain packets (for details see [1]).

1 The third aspect of our design is presented solely to point out that it may be possible to hide topology with source routing. Hence (and due to lack of space) we only mention a few (extremely heuristic) mechanisms in Section 2 and in Section 3 that should be associated with the use of padding in order to reduce the risk of topology leaking. This is a potential topic for future work.
Fig. 1. SRM architecture overview: IP packets from a content server are carried inside SRM packets across a stateless core network between customer interfaces, with path computation at a rendezvous point; access networks may be stateful or stateless. Note that SRM introduces more packets on the interface between the rendezvous point and its adjacent core router (also known as "stress" [15]); hence one should implement this interface virtually instead of physically.
SRM facilitates local recovery from single link/node failures, just like MPLS [9].2 Also, SRM forwarding does not involve any cache lookups, and hence (at least in principle) it should always be fast. In LIPSIN a core router needs to AND the header of each incoming packet with each of its interface addresses, so it is slow if routers are not equipped with interface processors; in SRM, on the other hand, the number of per-packet logic operations is constant. In IP networks a random path is selected among the equal-cost paths between two nodes, so with IP we cannot always measure the performance of a given path. LIPSIN does not facilitate this either (without node IDs in packets): in LIPSIN there may be several end-nodes even if we try to pin a certain path, because of forwarding to wrong links. SRM, by contrast, facilitates much easier active measurements. This point should become clear once we describe our paradigm in detail in Section 3.

An example illustration of the differences in the cost-optimal multicast trees facilitated by the three approaches is presented in Fig. 2. There the gray router is the core router next to the multicast source, and the white routers are next to multicast destinations. In the figure LIPSIN provides the lowest cost while IP multicast yields the highest cost. In Section 4 we find that this is the case in general.

2 In the case of local recovery the core routers need to be capable of adding the "label stack" corresponding to the bypass tunnel into the packet header.

Fig. 2. Example of the lowest-cost multicast trees provided by different approaches. Numbers represent link costs: (a) IP multicast – naïve [10] trees are used (cost = 6); (b) LIPSIN – cost-optimal Steiner arborescences may be used (cost = 4); (c) SRM – a cost-optimal set of paths may be used (cost = 5).

1.2 Related Work

SRM utilizes source routing of multicast trees. Previous papers [4,5] have studied source routing of general multicast trees, in which branching is allowed outside the source node. However, source routing of general trees is very complicated in terms of forwarding. We avoid this complexity by (intuitively) disallowing branching outside the source. Many papers, e.g. [6,16], have addressed the problem of computing multicast trees with degree constraints. Our approach is different in the sense that we do not impose a degree constraint on the source node, while for other nodes the (out)degree constraint is 1 (intuitively). Many papers, e.g. [15], have studied MPLS-based multicasting, in which branching is allowed only in certain routers. Our approach differs from this line of work, since we utilize explicit source routing instead of (IP/MPLS) routing tables. Many papers, e.g. [2], have addressed vehicle routing problems, which capture the essence of our problem. These papers typically focus on computation, while we focus more on architectural and protocol aspects.
2 Automatic Link Numbering Algorithm of SRM
The first question one might ask is: why should we number links when they already have MAC addresses? Could we not use those for source routing? The problem is simply overhead. For example, if we source-routed the path in Fig. 3 with 48-bit addresses, the resulting address stack would be 392 bits long at the first node, while with SRM it is only 33 bits long (see Fig. 4).

When thinking about automatic numbering, the first idea that might come to mind is to number the nodes (core routers) of the network with network-wide unique numbers. This would facilitate hop-by-hop forwarding with the help of routing and forwarding tables, as well as tunneling multicast packets by listing only the destination node numbers in the packet header. However, this approach runs into trouble if we connect two large networks that have already numbered their nodes, and it would not facilitate removing routing tables from core routers. Hence our automatic link numbering algorithm assigns a number to each link of the network instead. These numbers are not unique within the network. Instead, the aim is to use the smallest amount of numbers (and hence the smallest number of bits in packet headers) possible, such that any path (sequence of link numbers) starting from a given node is still unique.
Fig. 3. ITALIAN-NET – an example link numbering and a multicast path
Fig. 4. SRM header corresponding to the path in Fig. 3 at the first core router (fields: link representation length, 4 bits; header length, 8 bits; payload type/length, 16 bits; link/customer interface stack, 0–8132 bits; padding; total header length N · 32 bits, here N = 2)
Our approach has no problems if we connect two already-numbered networks, and furthermore it does not need routing or forwarding tables. We assign only one number to each link (instead of one per interface). This approach enables the destination core router of a path to send a response to
the source via the same path that the packet traveled to it. For example, active measurement applications should benefit from this approach.

The distributed link numbering problem that our algorithm implicitly solves differs from the distributed graph edge coloring problem (see e.g. [11]) in one key aspect: we typically need to color (number) only one link at a time, while edge coloring algorithms are optimized for coloring an existing network that has no colored links in the beginning. Hence we can use a much simpler algorithm than the most sophisticated coloring algorithms.

The most common situation is the following: a core router notices a new link. In this case the adjacent routers first synchronize their Link State Databases (LSDBs). Then both routers determine from their LSDBs whether each router having a smaller ID has already numbered its adjacent links. If yes, then one of the adjacent routers determines from its LSDB the smallest number not yet assigned to any of its own core links or to the core links of the router adjacent over the newly detected link,3 and assigns this number to the new link. This means that if several links are detected simultaneously, our algorithm proceeds step by step from the core router having the smallest ID to the core router having the largest ID. After each step, the router that numbered its adjacent links floods the resulting numbers (as well as the adjacent node IDs) to the other routers inside a routing protocol (e.g. OSPF or IS-IS). Then the router having the next smallest ID can number its adjacent links. If after the numbering there are number conflicts (i.e. one or more routers have the same number on two or more links), they are resolved in the order given by node IDs. An example of the result of automatic link numbering is given in Fig. 3.

3 This default number selection policy aims to minimize overhead. To avoid topology leaking, the number selection should be configured to be "sparser".

A classical theorem by Vizing says that the edges of a graph of maximum degree Δ can always be colored with at most Δ + 1 colors. However, our algorithm sometimes needs more colors (numbers). Hence we simulated the performance of our algorithm on the real core networks of Fig. 5, on ITALIAN-NET, on a 10000-node grid graph, as well as on a 1000-node Waxman graph generated with parameters α = 0.15, β = 0.2 (see [14]), yielding diameter 3. We assigned the numbers in random order to the links. Table 1 gives the results. From the table we can conclude that our algorithm rarely requires additional bits for link representation compared to optimal (Δ + 1) numbering. Hence we believe that the simplicity of our numbering algorithm outweighs the small overhead increase it produces (compared to optimal numbering).

Fig. 5. Test topologies: (a) UNINETT, (b) ATT, (c) NSFNET, (d) ARPANET, (e) COST 239

Table 1. Mean amount of numbers used by our algorithm compared to the minimum
ITALIAN-NET: 1    UNINETT: 1    ATT: 1.17    NSFNET: 1
ARPANET: 1    COST 239: 1.14    Grid (10000 nodes): 1.6    Waxman (1000 nodes): 1.10
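The number-selection step described above can be sketched as follows. This is our own simplified illustration: the LSDB synchronization, the ID-based ordering of routers and the flooding of results are omitted, and the LSDB is reduced to a dictionary.

```python
# Our own simplified sketch of the number-selection step: when a new
# link between routers a and b is detected, it is given the smallest
# number not yet used on any core link adjacent to a or b.
def number_new_link(link_numbers, a, b):
    """link_numbers: {frozenset({u, v}): number} for links already
    numbered. Returns the number assigned to the new link {a, b}."""
    used = {n for link, n in link_numbers.items() if a in link or b in link}
    n = 0
    while n in used:  # smallest free number, to keep headers short
        n += 1
    link_numbers[frozenset({a, b})] = n
    return n

links = {}
print(number_new_link(links, "r1", "r2"))  # 0
print(number_new_link(links, "r1", "r3"))  # 1 (0 is already used at r1)
print(number_new_link(links, "r2", "r3"))  # 2 (0 used at r2, 1 at r3)
```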
3 Routing and Forwarding in SRM

3.1 Overview
The main idea of our multicast paradigm can be illustrated by means of an analogy. Consider the Open Vehicle Routing Problem [2], in which one has a depot that hosts one or more delivery cars. The depot is located at a certain network node, while the network links represent roads. One needs to send some physical objects from the depot to customers (a subset of network nodes) using the cars. The problem is called open if the cars do not need to return to the depot after delivering the objects to the customers. Now, if we do not limit the number of cars or their capacities, and we allow loops in the car routes, we have the essence of our multicast paradigm. Let us now give two more formal definitions, as well as two observations that intuitively motivate our approach.

Definition 1. Minimum cost source-rooted multicast paths (SRMP) problem: Find a set of directed paths (possibly containing loops) starting from the source node, so that each destination (core router next to a destination) is part of at least one of the paths. Find the set in such a way that the total cost of the links used by the paths is minimal.

Definition 2. Delay-constrained minimum cost source-rooted multicast paths (DC-SRMP) problem: SRMP with the restriction that all paths must have at most a given delay.

Observation 1. With symmetric link costs, the optimal solution to SRMP has a cost that is at most twice the cost of the optimal Steiner arborescence covering the same set of destination nodes (cost increase factor 2). With asymmetric link costs, the cost increase factor is at most D (the number of core routers next to destinations).

Proof. Symmetric link costs: take the cost-optimal Steiner tree (the cost-optimal arborescence with backward links) and traverse it with Depth First Search (DFS), adding each traversed link (including backward links) to a path. Any link of the optimal Steiner tree is then traversed at most twice. Asymmetric link costs: select the lowest-cost path to each destination.
Observation 2. A feasible solution to DC-SRMP exists whenever a feasible solution to the corresponding delay-constrained Steiner tree (arborescence) problem exists. Also, in this case the cost increase factor is at most D.

Proof. Any branch of a Steiner arborescence can be enforced by means of a source routed path. Hence, given the cost-optimal Steiner arborescence that satisfies the delay constraints, SRM can send packets to any destination via the same path that is used in the arborescence.

The main point of these proofs is that cost-optimal solutions to SRMP and DC-SRMP never have a cost higher than D · (cost of the corresponding cost-optimal Steiner arborescence4), and in the case of symmetric link costs and no delay constraints, cost-optimal solutions to SRMP never have a cost higher than 2 · (cost of the corresponding cost-optimal Steiner arborescence). The latter result shows that theoretically SRM produces lower cost than IP multicast; this will be verified empirically in Section 4. It is well known that both the Steiner arborescence problem and SRMP are NP-hard. However, many practical (optimal as well as heuristic) solution methods are known for both.5

4 Provided by LIPSIN.
5 See e.g. the web site neo.lcc.uma.es/radi-aeb/WebVRP/

3.2 SRM Header and Forwarding of SRM Packets

A key concept of our paradigm is the header structure of a multicast packet (the SRM header). An example of this header is given in Fig. 4. An SRM header contains the following fields, in the given order:

1. Link representation length in bits; 4 bits (maximum core router degree 2^(2^4 − 1) = 32768)
2. Header length in 32-bit words; 8 bits (maximum path length 8132/16 − 1 = 507 hops)
3. Payload type/length; 16 bits (similar to the EtherType field in an Ethernet header)
4. Link/customer interface stack + padding; variable length, altered by routers on the path

The link representation length tells each router of the path how many bits are used in this header for expressing link numbers. By default the number of bits is determined by the number of bits required to represent the largest link number of the used path. However, in order to hide the maximum node degree and to gain faster memory operations in core routers, the router vendor/network operator might decide to use a larger number of bits than required (most likely 7 bits) to represent links. In this case the automatic link numbering protocol could select the numbers more sparsely, allowing more randomness in the padding. The link/customer interface stack is a sequence of bits that indicates, for every router of the path, (1) whether the router should forward the packet to its customer interface(s), and (2) the next link (if any) to which the packet should be forwarded.
Fig. 6. Address sizes in headers: mean header address field size with optimal numbering, as a function of the mean maximum core router degree on a path, for SRM (mean core path lengths 5, 15, 25 and 35 hops), IPv4, Ethernet, and LIPSIN/IPv6. Data points are shown only if the false positive rate of LIPSIN (see [1]) on the path was on average below 20%. Mean core router degree was assumed to be half of the maximum degree.
There is one important aspect of the padding: it must begin with a bit sequence that does not correspond to a link number at the last core router of the path. This way the last router can determine the end of the path. To avoid topology leaking, the padding should be selected in an intelligent way (future work), together with sparse link numbering.

Whenever an SRM packet arrives at a core router, the default action is to look at the first bit in the link/customer interface stack. If it is 1, the router forwards the payload to its customer interface(s). Let the link representation length be 7 bits. The router then looks at the next 7 bits in the stack. If they correspond to a link number, the router moves these 7 bits, prepended with a 0, to the end of the header, and forwards the packet to that link.
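The per-hop forwarding step just described can be sketched as follows. This is our own illustration, operating on the stack as a string of '0'/'1' characters rather than on raw packet bits:

```python
# Our own sketch of SRM per-hop forwarding on a bit-string stack.
def forward(stack, link_bits, my_links):
    """my_links: the link numbers of this router's core links. Returns
    (deliver_to_customer, outgoing_link_or_None, rotated_stack)."""
    deliver = stack[0] == "1"          # first bit: customer interface(s)?
    field = stack[1:1 + link_bits]     # next bits: candidate link number
    if len(field) < link_bits or int(field, 2) not in my_links:
        return deliver, None, stack    # padding reached: end of the path
    # rotate the consumed link bits, prefixed with a 0, to the end of the
    # header, so an eavesdropper cannot tell link numbers from padding
    return deliver, int(field, 2), stack[1 + link_bits:] + "0" + field

# 3-bit link numbers; this router terminates links 2 and 5
print(forward("1" + "010" + "0101111", 3, {2, 5}))
# -> (True, 2, '01011110010'); the stack length stays constant
```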
3.3 Scalability of SRM
Now let us evaluate the scalability of SRM. In most real networks the maximum node degree is less than 30 (see e.g. [1] and Rocketfuel). Hence we typically need at most five bits (maximum degree 2^5 − 1 = 31) for link representation. Let the average Autonomous System (AS) level path length be 3 [13], and the average core path length inside a single AS be 15 hops. In this case the link/customer interface stack length would be 5 · 45 + 46 = 271 bits. Hence (since the Internet diameter is decreasing [12]) SRM could be used without a notable overhead increase (compared to e.g. IPv6 or LIPSIN) even if the whole Internet were a single domain. Fig. 6 presents the mean link/customer interface stack size as a function of mean path length and maximum node degree.
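The 271-bit figure can be reproduced with a one-line helper, under our reading of the arithmetic (one link number per hop, plus one customer-interface bit for each of the hops + 1 routers on the path):

```python
# Reproducing the stack-size estimate above: each of the `hops` link
# numbers takes `link_bits` bits, plus one customer-interface bit for
# each of the hops + 1 routers on the path.
def stack_bits(hops, link_bits):
    return hops * link_bits + (hops + 1)

print(stack_bits(3 * 15, 5))  # 5*45 + 46 = 271 bits
```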
4 Delay and Cost of SRM, IP Multicast, and LIPSIN – Empirical Evaluation
In this section we compare the delay and cost provided by SRM, IP multicast, and LIPSIN. To understand the following discussion, the reader should have some knowledge of linear optimization [17], and more specifically, how it is usually applied to Steiner problems [7,8]. Since explicit delay constraints are arguably rare, we focus explicitly on cost and implicitly on delay. In other words, we optimize for cost and present the resulting costs and maximum delays for each of the paradigms. We used Integer Linear Programming (for an introductory treatment, see e.g. [17]) and the Gurobi solver (www.gurobi.com) to find an optimal solution to the minimum-cost Steiner arborescence problem, as well as an upper bound for SRMP by using a Steiner arborescence formulation that disallows branching outside the source. This formulation was chosen because it can be easily implemented and solved relatively quickly using the well-known cut strategies from Steiner arborescence formulations (see [7,8]). To use this method in a real network, one could install the ILP solver (Gurobi in our case) on each rendezvous point, formulate the ILP program in another program (for example, written in C) whenever a JOIN or a PART message arrives at the rendezvous point, call Gurobi from that program, and install/modify the returned multicast group → source routed multicast path mappings in the kernel of the rendezvous point (see the sketch at the end of this section). In case of a JOIN, it might be useful to let the rendezvous point first install the shortest path to the destination (to facilitate fast response times) and solve the ILP off-line. Once the ILP program returns, the rendezvous point could re-configure its multicast group → source routed multicast path mappings. Our test networks (besides ITALIAN-NET) are presented in Fig. 5; they are commonly used in the literature. We also tested grid graphs of 16 and 64 nodes. The cost increase factors resulting from our upper-bound ILP formulation of SRMP, compared to the optimal Steiner arborescence solutions with asymmetric link costs, are presented in Table 2 (averages over 100 test runs). The number of destinations was 0.25 · (number of nodes), rounded to the nearest integer. More detailed test results for ITALIAN-NET are presented in Fig. 7. From subfigures 7(a) and 7(b) it can be seen that with unit link costs and delays the SRMP solutions (SRM ILP) have on average a lower cost than the optimal ones provided by IP multicast (SPF), and are at worst 10% higher than the optimal ones facilitated by LIPSIN (Steiner opt). From subfigures 7(c) and 7(d) it can be seen that the cost gap with asymmetric random link costs between SRMP and Steiner tree solutions is about 22%-35%. With many subscribers the SRMP solutions have a lower cost than the Steiner tree solutions obtained with the approximation algorithm of [3] (Steiner heur). However, the delays of SRM tend to be quite long. In other words, it seems that solutions to SRMP would rarely be feasible solutions to DC-SRMP.
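The paper's upper-bound formulation (branching disallowed outside the source) and its cut strategies are not reproduced here; as a rough illustration of how the rendezvous-point ILP described above could be driven via Gurobi's Python API, the following sketch solves the plain minimum-cost Steiner arborescence with a compact flow-based model. All names and the formulation itself are illustrative assumptions, not the authors' implementation:

    import gurobipy as gp
    from gurobipy import GRB

    def steiner_arborescence(nodes, arcs, cost, source, terminals):
        # nodes: node ids; arcs: list of (u, v) pairs; cost: {(u, v): weight};
        # terminals: destination set. One unit of flow is routed from the
        # source to each terminal; an arc carries flow only if selected.
        m = gp.Model("steiner-arborescence")
        x = {a: m.addVar(vtype=GRB.BINARY) for a in arcs}       # arc selected
        f = {(a, t): m.addVar(lb=0.0) for a in arcs for t in terminals}
        for t in terminals:
            for v in nodes:
                inflow = gp.quicksum(f[(u, w), t] for (u, w) in arcs if w == v)
                outflow = gp.quicksum(f[(u, w), t] for (u, w) in arcs if u == v)
                balance = 1 if v == t else (-1 if v == source else 0)
                m.addConstr(inflow - outflow == balance)
        for a in arcs:
            for t in terminals:
                m.addConstr(f[a, t] <= x[a])                    # coupling
        m.setObjective(gp.quicksum(cost[a] * x[a] for a in arcs), GRB.MINIMIZE)
        m.optimize()
        return [a for a in arcs if x[a].X > 0.5]                # chosen arcs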
Table 2. Cost increase factor of SRM (upper-bound ILP formulation) compared to optimal Steiner arborescences. Asymmetric costs selected randomly from interval (1,100).

    ITALIAN-NET: 1.21    UNINETT: 1.47     ATT: 1.18               NSFNET: 1.24
    ARPANET: 1.09        COST 239: 1.09    Grid (16 nodes): 1.15   Grid (64 nodes): 1.30
Fig. 7. Test results for ITALIAN-NET; no explicit delay constraints are used. (a) Cost, unit delays and costs; (b) Delay, unit delays and costs; (c) Cost, asymmetric delays and costs selected randomly from interval (1,100); (d) Delay, asymmetric delays and costs selected randomly from interval (1,100). Each panel plots the scaled tree cost or scaled maximum delay against the number of destinations for Steiner_opt, Steiner_heur, SRM_ILP, and SPF.
However, it should be noted that in all tests the set of destinations was chosen randomly, which is the worst case for SRM.
5 Conclusions
We presented an automatic link numbering algorithm and a related multicast paradigm called Source Routed Multicast (SRM). We compared SRM to IP multicast, as well as to a new proposal, LIPSIN. Overall, our analysis revealed that SRM is very competitive with these approaches.
Acknowledgements. The authors would like to thank Marko Luoma and Aleksi Penttinen for their comments.
References
1. Jokela, P., Zahemszky, A., Rothenberg, C.E., Arianfar, S., Nikander, P.: LIPSIN: line speed publish/subscribe inter-networking. In: Proc. of SIGCOMM (2009)
2. Letchford, A.N., Lysgaard, J., Eglese, R.W.: A Branch-and-Cut Algorithm for the Capacitated Open Vehicle Routing Problem. J. of the Operational Research Society 58(12), 1642–1651 (2007)
3. Kou, L., Markowsky, G., Berman, L.: A fast algorithm for Steiner trees. Acta Informatica 15(2), 141–145 (1981)
4. Chen, W.-T., Sheu, P.-R., Chang, Y.-R.: Efficient multicast source routing scheme. Computer Communications 16(10), 662–666 (1993)
5. Yum, T.-S.P., Chen, M.S.: Multicast source routing in packet-switched networks. IEEE Transactions on Communications 42(234), 1212–1215 (1994)
6. Zhang, X., Wei, J.Y., Qiao, C.: Constrained Multicast Routing in WDM Networks with Sparse Light Splitting. J. of Lightwave Technology 18(12), 1917–1927 (2000)
7. Koch, T., Martin, A.: Solving Steiner tree problems in graphs to optimality. Networks 32(3), 207–232 (1998)
8. Ljubic, I., Weiskircher, R., Pferschy, U., Klau, G.W., Mutzel, P., Fischetti, M.: Solving the prize-collecting Steiner tree problem to optimality. In: Proc. of ALENEX workshop. SIAM, Philadelphia (2005)
9. Sharma, V., Hellstrand, F.: Framework for Multi-Protocol Label Switching (MPLS)-based Recovery. RFC 3469 (2003)
10. Doar, M., Leslie, I.: How Bad is Naive Multicast Routing? In: Proc. of INFOCOM (1993)
11. Barenboim, L., Elkin, M.: Distributed (Δ + 1)-coloring in linear (in Δ) time. In: Proc. of the 41st Symposium on Theory of Computing (2009)
12. Leskovec, J., Kleinberg, J., Faloutsos, C.: Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD) 1(1) (2007)
13. Zhao, J., Zhu, P., Lu, X., Xuan, L.: Does the Average Path Length Grow in the Internet? In: Proc. of ICOIN (2007)
14. Zegura, E.W., Calvert, K.L., Bhattacharjee, S.: How to model an internetwork. In: Proc. of INFOCOM (1996)
15. Yang, B., Mohapatra, P.: Edge router multicasting with MPLS traffic engineering. In: Proc. of ICON (2002)
16. Boudani, A., Cousin, B.: A New Approach to Construct Multicast Trees in MPLS Networks. In: Proc. of ISCC (2002)
17. Bertsimas, D., Tsitsiklis, J.N.: Introduction to Linear Optimization. Athena Scientific, Belmont (1997)
Mining NetFlow Records for Critical Network Activities

Shaonan Wang¹, Radu State¹, Mohamed Ourdane², and Thomas Engel¹

¹ Faculty of Science, Technology and Communication, University of Luxembourg
{shaonan.wang,radu.state,thomas.engel}@uni.lu
² P&T Luxembourg
{mohamed.ourdane}@dt.ept.lu
Abstract. Current monitoring of IP flow records is challenged by the required analysis of large volumes of flow records. Finding essential information is equivalent to searching for a needle in a haystack. This analysis can range from simple counting of basic flow level statistics to complex data mining techniques. Some key target objectives are, for instance, the identification of malicious traffic as well as tracking the cause of observed flow related events. This paper investigates the usage of link analysis based methods for ranking IP flow records. We leverage the well-known HITS algorithm in the context of flow level dependency graphs. We assume a simple dependency model that can be built in the context of large-scale IP flow record data. We apply our approach to several datasets, ranging from ISP captured flow records up to forensic packet captures from a real-world intrusion.
1 Introduction
The monitoring of large network traffic volumes is limited by the existing technological solutions. Monitoring high speed 40 Gbps links is challenged by the already existing workload on the routing data plane. One of the few activities that can be done is limited to recording and analyzing flow records. IP flow records are simple information records capturing the source, the destination, the associated ports, the traffic volume, and additional time stamps and flow related status. The natural question is how these pieces of information should be processed. On one side, the number of flow records is huge even for small sized edge routers; on the other side, it is not obvious what information should be analyzed. We consider this research question in this paper. The main contribution of our paper is twofold: we propose a simple dependency model for IP flow records and show how link based analysis can reveal interesting flow events. We use the terms IP flow records and NetFlow records interchangeably in this paper. We have validated our approach using the proprietary NetFlow data format, but our method is general and can be applied to any flow record format. We aimed in this paper at identifying relevant flow records, where by
relevant we understand the records that have generated follow-up network activity. We do not consider a flow matching a specific signature (application level or based on the involved IP addresses) to be relevant per se, but we do consider flows that have triggered important follow-up network activity to be relevant. The notion of triggering is linked to a potential dependency relationship among flow records. The best illustration for this is the case of an attacker breaking in over an SSH account. While SSH related flow traffic is in general not relevant, it becomes relevant if follow-up activities of the compromised host are observed: large-scale network scanning, rootkit downloading, massive SMTP traffic, or botnet membership. For scoring such relevant IP flow records and understanding the most intense activities on the network, our approach consists of two major steps. Firstly, with a simple yet efficient dependency model, we discover the causal dependencies between NetFlow records. Then, to facilitate analyzing the overwhelming scale of the NetFlow dependency graph, we automatically select the most relevant NetFlow records using the link analysis algorithm HITS [12]. To the best of our knowledge, this is the first attempt to apply the HITS algorithm, known from the web search and bibliometrics domains, in the field of network monitoring. The experimental results show that the HITS algorithm suits well the task of ranking the most relevant flows from the perspective of network anomaly detection, bandwidth usage, etc. The remainder of the paper is organized as follows. Section 2 provides an overview of our NetFlow ranking architecture. Section 3 presents the background of NetFlow and NetFlow collecting approaches. We explain and discuss our NetFlow dependency discovery engine in Section 4, and in Section 5 we rank flow records using the HITS algorithm and interpret the results. We validate our flow ranking technique with various datasets in Section 6. Section 7 summarizes the related work, and Section 8 concludes the paper and discusses future work.
2 High-Level Architecture of FlowRank
Before showing the design and implementation details of each component, we provide a high-level view of our NetFlow ranking system. As illustrated in Figure 1, our ranking architecture consists of three components: the NetFlow collector, the dependency discovery engine, and the rank engine. The NetFlow collector collects NetFlow records either through border routers or dedicated probes. The dependency discovery engine discovers dependencies among NetFlow records. The intuition behind the dependency discovery is that if NetFlow A triggers NetFlow B, then the destination address of NetFlow A matches the source address of NetFlow B. NetFlow dependencies provide a global view of the causalities among network traffic and are thus a valuable tool for detecting the root cause of abnormal network traffic. To select the most depended-upon NetFlow records among the huge number of dependencies discovered previously, the rank engine ranks the relative importance of each NetFlow record using the link analysis algorithm HITS [12]. Experimental results in Section 6 show that HITS is indeed appropriate for this task.
Fig. 1. Flow Rank Architecture
3 NetFlow Collection

3.1 What Is NetFlow
NetFlow is a proprietary protocol for collecting IP flow information on networks. Though some variant versions exist, a NetFlow record summarizes a network traffic flow by its source and destination IP addresses, source and destination ports, transport protocol, as well as the traffic volume transmitted during this flow session. NetFlow operates by creating a new flow cache entry when a packet is received that belongs to a new flow. Each flow cache entry keeps track of the number of bytes and packets of similar traffic during a certain period of time until the cache entry expires; then this information is exported to a collector. NetFlow provides a powerful tool to keep track of what kind of traffic is present on the network and is widely used for network monitoring. Most vendors support different flavors of similar flow monitoring approaches, and a common standardization is done within the IETF IPFIX working group [6].
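For illustration, the per-flow information described above can be captured in a small Python structure (a sketch; the field names are illustrative, not the NetFlow wire format):

    from dataclasses import dataclass

    @dataclass
    class FlowRecord:
        rcd_id: int     # record identifier (observation order)
        src_ip: str
        dst_ip: str
        src_port: int
        dst_port: int
        protocol: int   # transport protocol number, e.g., 6 for TCP
        packets: int
        bytes: int
        stime: float    # flow start timestamp
        etime: float    # flow end timestamp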
3.2 NetFlow Collection
One can collect NetFlow records either directly through border routers or using stand-alone probes such as taps. Router based approaches require no extra hardware installation, but suffer from inaccurate reports and degraded routing performance in case of traffic peaks. On the other hand, stand-alone probe approaches provide reliable NetFlow record reports even in case of large traffic volumes, but require additional hardware installation on every link that needs to be observed and cause extra maintenance costs.
4 Dependency Discovery
Causalities among NetFlow records contain valuable information for detecting the root cause of attacks, as well as the most critical services that other services require to function properly. We assume that a NetFlow record depends on another if the former is triggered by the latter. In our current model, we consider that NetFlow record A causes NetFlow record B if, after observing flow A arrive at a host H, we also observe, within a predefined time window, flow B going out of the same host H. Figure 2 illustrates an example of the dependency between two flows: Figure 2a shows a message sequence chart with three hosts A, B, and C, in which, after observing FlowAB going from HostA to HostB, we observe FlowBC going from HostB to HostC.
Fig. 2. Flow dependency example: (a) NetFlow sequence chart (FlowAB from HostA to HostB, followed by FlowBC from HostB to HostC); (b) the resulting NetFlow dependency graph.
The two nodes in the dependency graph shown in Figure 2b represent two NetFlow records. Each node is labeled with the NetFlow record ID (labeled as rcdID), the source IP and source port (labeled as src), and the destination IP and destination port (labeled as dst). Record IDs reflect the time order in which the NetFlow records are observed; a smaller ID is observed earlier. That is, NetFlow 1 is observed before NetFlow 2. In addition, the destination IP address of NetFlow 1 matches the source IP address of NetFlow 2; according to our dependency model, we therefore assume that flow 1 triggers flow 2, in other words, flow 2 depends on flow 1. We use a directed edge pointing from flow 2 to flow 1 to denote the dependency of flow 2 on flow 1. After aggregating all the NetFlow causalities within a time window, our dependency model is able to discover the causalities of a flow which depends on the arrival of many other incoming flows. The same holds for discovering all the successor NetFlow records of a NetFlow record which triggers many outgoing NetFlow records. This model fits our objective in that it reveals the in-depth causal connections among NetFlows on a network level. While the simplicity and efficiency of this model make it well suited for analyzing network traffic online, the accuracy of the discovered dependencies can be improved using more advanced techniques. Previous studies such as Barham [2] and Reynolds
[15] have applied probabilistic approaches to improve the accuracy of dependency discovery, but the computational overhead makes an online approach infeasible. Our experiments show that our model achieves acceptable results in identifying the most critical NetFlow records.
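A minimal sketch of this dependency discovery model in Python could look as follows (it approximates a flow's arrival at its destination host by the flow's start time; all names are illustrative, not the authors' implementation):

    from collections import defaultdict

    def build_dependencies(flows, window):
        # flows: (rcd_id, stime, src_ip, dst_ip) tuples, sorted by start time.
        # Returns directed edges (b, a), meaning flow b depends on flow a.
        arrivals = defaultdict(list)   # host -> [(stime, rcd_id)] of incoming flows
        edges = []
        for rcd_id, stime, src, dst in flows:
            # A flow leaving src may have been triggered by a recent arrival at src.
            for a_time, a_id in arrivals[src]:
                if 0 <= stime - a_time <= window:
                    edges.append((rcd_id, a_id))
            arrivals[dst].append((stime, rcd_id))
        return edges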
5 Relative Importance Rank Engine
The rank engine ranks the relative importance of NetFlow records, based on the dependencies discovered by the dependency discovery engine, using the link analysis algorithm HITS [12].

5.1 HITS Algorithm
Given a dependency graph, HITS ranks the relative importance of each node with two values: an authority value and a hub value. The authority value indicates the importance of a node in terms of how many other nodes depend on it. The hub value indicates how many important nodes a node points at, that is, how many other nodes it depends on. Authority and hub values are computed in a recursive manner. More precisely, given an n × n adjacency matrix A generated from a dependency graph of n nodes, where entry (i, j) is 1 if node i depends on node j and 0 otherwise, and with an all-one vector as the initial value, HITS iteratively computes the authority value a_i and the hub value h_i of node i using

    a_i^{(t+1)} = \sum_{j:\, j \to i} h_j^{(t)}          (1)

    h_i^{(t+1)} = \sum_{j:\, i \to j} a_j^{(t+1)}        (2)

where j \to i indicates that node j depends on node i. In other words, in round t+1, the authority value of node i is the sum of the round-t hub values of the nodes pointing to node i, and the hub value of node i is the sum of the round-(t+1) authority values of the nodes that node i points at. Rewriting the above equations in terms of the authority vector av and the hub vector hv, we obtain

    av^{(t+1)} = A^T hv^{(t)} = (A^T A)\, av^{(t)}       (3)

    hv^{(t+1)} = A\, av^{(t+1)} = (A A^T)\, hv^{(t)}     (4)
With normalization to unit length at the end of each iteration, the authority and hub vectors converge to the principal eigenvectors of the matrices A^T A and AA^T, respectively [13].
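A direct transcription of equations (1)-(4) into NumPy, shown here only as an illustrative sketch (not the authors' implementation), makes the iteration explicit:

    import numpy as np

    def hits(adj, iterations=50):
        # adj[i, j] = 1 if node i depends on node j, 0 otherwise.
        n = adj.shape[0]
        authority = np.ones(n)
        hub = np.ones(n)
        for _ in range(iterations):
            authority = adj.T @ hub      # eq. (1): a = A^T h
            hub = adj @ authority        # eq. (2): h = A a
            authority /= np.linalg.norm(authority)   # normalize to unit length
            hub /= np.linalg.norm(hub)
        return authority, hub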
5.2 Rank NetFlow Records with HITS
Fed with the adjacency matrix of the NetFlow dependency graph, the HITS algorithm assigns each NetFlow record an authority value and a hub value. According to
the ranks, we can classify NetFlow records into four categories: records with low authority and low hub values, high authority and low hub values, low authority and high hub values, and high authority and high hub values. A low authority and a low hub value indicate that the NetFlow record is not well connected with other flows in the dependency graph and is thus of little significance in terms of causal evidence. This does not necessarily mean that such a record is benign; it might be highly nefarious, but at least it does not trigger successive network activities. A high authority value and a low hub value characterize a flow that many other flows depend on, but which in turn has few dependencies on other flows. Such flows are prime candidates for representing the root cause and primordial events in case of malicious or suspicious usage. A low authority value and a high hub value indicate a flow that did not trigger important follow-up network activity, but has many dependencies on other flows. Such flow records might indicate either a flow that ended a set of activities or a potential false positive. Finally, we might have flow records with both a high authority value and a high hub value. The associated flows correspond to network traffic that both triggered important follow-up network activities and depended on many previous flows. These flows should in principle correspond to important traffic.
6 Experiment and Evaluation
The key experiments we performed address the following questions:
– Can we use our method as a forensic tool? That is, given a packet capture for which we know the real evidence (using a manual annotation), can we reveal the important flows using our method?
– How many flows are ranked in the four categories (low and high authority value, low and high hub value) for real network traffic?
We have validated our approach on several scenarios, among which we highlight one whose network traffic is publicly available [14]. These traces were captured on a compromised honeypot deployed by the Honeynet Project (www.honeynet.org).

6.1 Attack Description
The network schema of this attack scenario is shown in Figure 3. The compromised machine is located on a local area network, with IP addresses ranging from 172.16.1.1 to 172.16.1.250. An attacker (IP address 211.180.209.190) performs a simple operating system fingerprinting in order to learn the operating system of 172.16.1.103. This is done using telnet: no user name or password is provided; the attacker aims just at getting the welcome banner of the remote host. Next, the attacker scans several machines for the portmap service.
Fig. 3. Attack scenario
Several hosts run this service, among which the attacker finds the honeypot (172.16.1.108). These hosts reply to the attacker. The attacker requests the port number of the RPC service stat. Her main objective is to launch an exploit against a vulnerable version of this service [7]. The honeypot is vulnerable to this exploit (http://packetstormsecurity.nl/0008-exploits/statdx.c) and the attacker is able to compromise it. The result is a remote shell opened on TCP port 39168. Using this shell, the attacker can interact with the compromised machine. She downloads a rootkit using the file transfer protocol; the server is 193.231.236.41 and is located in Romania. Additionally, the attacker sends two emails; these are the connections going to the SMTP servers (216.136.129.14 and 209.61.188.3). To summarize this attack: an attacker uses at least two different stepping stones to scan and attack the network. Once the attack is successful, the attacker downloads additional malware (from another repository) in order to maintain her privileges and access rights.

6.2 Experimental Result
We extracted 48 NetFlow records from the packet capture. Table 2 summarizes the associated NetFlow records. Figure 4 illustrates the dependencies among the NetFlows. Each node stands for a NetFlow record labeled with the associated NetFlow ID. Due to space limitations, we labeled each node with a NetFlow record ID number instead of the detailed NetFlow record tuples. Details of selected high-ranking NetFlow records can be found in Table 1, which also lists the ranking results after running HITS on the dependency graph. The cumulative histogram shown in Figure 5a displays the ranking score distributions of the NetFlow records collected from the honeypot attack scenario.
Fig. 4. NetFlow dependencies
6.3 Rank Distributions on Backbone Traffic
We have looked at the statistical distribution of both authority and hub values over several datasets obtained from a large European Internet Service Provider. Table 2 summarizes the properties of the underlying flows. We did not consider larger datasets because we aimed at comparing the distributions with the honeypot scenario. Figures 5 and 6 address this comparison. Note that, in order to expose the most important flows, we scale up the authority and hub values of each dataset so that the top-ranked authority and hub values equal one. We do not have full packet captures for the ISP-originated datasets, so we do not really know to what extent the traffic was malicious, but the ranks are distributed similarly. This is not the case for the honeypot scenario, where a larger quantity of flows have significant hub values when compared to the authority values. In order to investigate the distribution of HITS ranking scores, we collected 6 datasets, each of which spans a time window of 1 minute. Figure 7 illustrates the proportion of zero-ranked flow records, where a zero-ranked flow is a flow with both a zero authority value and a zero hub value. The experiment shows that as

Table 1. Attack activities

    step  flowID  attack name       src IP:port             dst IP:port           packet#  bytes#  authority  hub
    1     42      Telnet            211.180.229.190:3329    172.16.1.103:23       40       2856    0          0
    2     12      Portmap           211.185.125.124:790     172.16.1.108:111      4        336     1          0
    3     13      Stat call         211.185.125.124:791     172.16.1.108:931      6        6708    1          0
    4     15      Buffer overflow   211.185.125.124:4450    172.16.1.108:39168    168      15642   1          0
    5     16      Rootkit download  172.16.1.108:1026       193.231.236.41:21     74       6318    0          0.48
    6     21      Send email        172.16.1.108:1028       216.136.129.14:25     48       5562    0          0.89
    7     22      Send email        172.16.1.108:1029       209.61.188.33:25      40       3680    0          0.89
Fig. 5. NetFlow ranks cumulative histogram: (a) honeypot attack scenario; (b) ISP dataset.
Fig. 6. Rank score distributions: proportion of NetFlows versus authority and hub rank values for (a)-(f) ISP datasets 1-6.
the number of NetFlow records increases, the proportion of zero-ranked flow records also increases; thus, the number of non-zero-ranked flow records stays within a manageable size for human interpretation. This sheds light on using our technique as a forensic tool for mining essential activities out of large amounts of raw data.
Table 2. Flow Statistics

                               attack scenario  ISP ds 0  ISP ds 1  ISP ds 2  ISP ds 3  ISP ds 4  ISP ds 5  ISP ds 6
    Total Flows                48               658.1k    26        91        141       229       410       11.3k
    Total bytes                1.2M             898.8M    141M      378.6M    369.6M    145.3M    3924M     124.3M
    Total packets              2.3k             2.0M      162k      468k      477.3k    216.3k    5.29M     214.8k
    avg bytes/sec              19.0k            3.0M      2.35M     6.3M      6.15M     2.4M      65.4M     2.1M
    avg packets/sec            0.036            6.8k      2.7k      7.8k      8.0k      3.6k      88.1k     3.6k
    Destination ports #        14               63879     21        72        106       140       313       8.9k
    Distinct source IPs #      14               69310     16        58        91        154       243       1.6k
    Distinct destination IPs # 14               70946     15        57        94        136       232       3.1k
    Average packets/flow       48.3             3.04      5.8k      5.0k      3.3k      948.7     12.9k     19.0
    Average bytes/flow         25.6k            1.37k     5.0M      4.0m      2.6m      637.4k    9.60M     11.0m
Fig. 7. Proportion of zero-ranked NetFlow record distributions
7 Related Work
Discovering dependencies among network traffic has been extensively studied in the network management community [10,15,11,1,4,8,9,2,5]. Our approach differs from previous studies mainly in that they focus on discovering host-level and application-level dependencies in order to facilitate fault localization, reconfiguration planning, and anomaly detection, while our objective is to reveal causal dependencies among NetFlows in order to detect abnormally intense activities online, as well as the critical hosts that most other hosts require to function properly. Previous works [9] [2] [5] [15] on dependency discovery applied various statistical and probabilistic techniques to reduce false dependencies. These works are complementary to our model. In our current model, we have chosen a simple yet
efficient model due to its better online performance when dealing with large amounts of NetFlow records. Sawilla [16] ranked attack graphs using PageRank [3]; Wang [17] ranked NetFlow records with PageRank. While both are link analysis algorithms, PageRank ranks nodes on a dependency graph based on their in-degree, whereas HITS also reflects out-degree through hub values. Considering the out-degree of NetFlow records helps to reveal flows whose destination addresses are the targets of large amounts of network traffic (the ultimate goal of a series of attack activities, for example).
8 Conclusion and Future Work
We have addressed in this paper a new method to detect relevant IP flow records. Our approach leverages the HITS algorithm to search for relevant nodes in dependency graphs. The dependency graphs are built by taking into account the potential causality among several flow records. Albeit simple, such a model can capture causal dependencies, where for instance one flow is the trigger for a large set of follow-up network activity. We have applied our method on several datasets. The first dataset concerned a publicly available network capture from a forensic challenge, and our method correctly identified the relevant malicious IP flows. We have also validated our method on a large IP flow capture from an ISP border router in order to assess its limits in terms of data volume. We plan to extend our quantitative analysis to a larger set of different traffic scenarios, for instance botnet detection. The current problem that we face is related to the baseline of an analysis method: since our approach works on captured NetFlow records, we have no real evidence (complete packet capture) to compare with, and the usage of full packet capturing is impossible for legal reasons. We are also investigating the potential application to a larger class of security events that include firewall logs, IPS data, and syslog events within a larger context of event correlation and root cause analysis.
References
1. Aguilera, M.K., Mogul, J.C., Wiener, J.L., Reynolds, P., Muthitacharoen, A.: Performance debugging for distributed systems of black boxes. In: Proceedings of the nineteenth ACM Symposium on Operating Systems Principles, pp. 74–89 (2003)
2. Barham, P., Black, R., Goldszmidt, M., Isaacs, R., MacCormick, J., Mortier, R., Simma, A.: Constellation: automated discovery of service and host dependencies in networked systems. Technical Report MSR-TR-2008-67 (2008)
3. Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems 30(1-7), 107–117 (1998)
4. Chen, M., Accardi, A., Kiciman, E., Lloyd, J.: Path-based failure and evolution management. In: NSDI 2004 (January 2004)
5. Chen, X., Zhang, M., Mao, Z.M., Bahl, P.: Automating network application dependency discovery: Experiences, limitations, and new solutions. In: Proceedings of OSDI (2008)
6. Internet Engineering Task Force (IETF): IP flow information export (IPFIX) (March 2010), http://www.ietf.org/dyn/wg/charter/ipfix-charter.html
7. Network Working Group: RPC: Remote procedure call protocol specification version 2 (March 2010), http://tools.ietf.org/html/rfc5531
8. Iliofotou, M., Pappu, P., Faloutsos, M., Mitzenmacher, M., Singh, S., Varghese, G.: Network monitoring using traffic dispersion graphs (TDGs). In: Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, pp. 315–320 (2007)
9. Jian-Guang, L., Qiang, F., Yi Wang, J.: Mining dependency in distributed systems through unstructured logs analysis, http://research.microsoft.com
10. Kandula, S., Chandra, R., Katabi, D.: What's going on?: learning communication rules in edge networks. In: Proceedings of the ACM SIGCOMM 2008 Conference on Data Communication, pp. 87–98 (2008)
11. Kannan, J., Jung, J., Paxson, V., Koksal, C.E.: Semi-automated discovery of application session structure. In: Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement, pp. 119–132 (2006)
12. Kleinberg, J.: Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM) 46(5) (September 1999)
13. Ng, A.Y., Zheng, A.X., Jordan, M.I.: Link analysis, eigenvectors and stability. In: International Joint Conference on Artificial Intelligence, vol. 17(1), pp. 903–910 (2001)
14. The Honeynet Project: Scan18 (March 2010), http://old.honeynet.org/scans/scan18/
15. Reynolds, P., Wiener, J.L., Mogul, J.C., Aguilera, M.K., Vahdat, A.: WAP5: black-box performance debugging for wide-area systems. In: Proceedings of the 15th International Conference on World Wide Web, pp. 347–356 (2006)
16. Sawilla, R., Ou, X.: Identifying critical attack assets in dependency attack graphs. In: Jajodia, S., Lopez, J. (eds.) ESORICS 2008. LNCS, vol. 5283, pp. 18–34. Springer, Heidelberg (2008)
17. Wang, S., State, R., Ourdane, M., Engel, T.: Mining netflow records for critical network activities. In: Proceedings of the 6th International Wireless Communications & Mobile Computing Conference (2010)
Implementation of a Stream-Based IP Flow Record Query Language

Kaloyan Kanev, Nikolay Melnikov, and Jürgen Schönwälder

Computer Science, Jacobs University Bremen, Germany
{k.kanev,n.melnikov,j.schoenwaelder}@jacobs-university.de

Abstract. Internet traffic analysis via flow records is an important task for network operators. There is a variety of applications targeted at identifying, filtering, or aggregating flows based on certain criteria. Most of these applications exhibit certain limitations when it comes to the identification of complex network activities. To overcome some of these limitations, a new flow query language has been proposed recently, which allows expressing complex time relationships between flows. In this paper, we describe a prototype implementation of this query language and evaluate its performance.

Keywords: Flow Query Language, NetFlow, Network Monitoring.
1 Introduction
Internet traffic analysis via flow records is an important task for network operators. There is a variety of applications targeted at identifying, filtering, or aggregating flows based on certain criteria. Most of these applications exhibit certain limitations. Query definitions often have a non-uniform structure and are difficult to write and to maintain. Furthermore, query languages are often restricted to very basic flow matches and usually cannot be used to detect complex flow patterns, mainly due to missing time and concurrency matching mechanisms. To overcome some of these limitations, a new stream-based flow query language has been proposed recently [1]. It allows expressing complex time relationships between flows by utilizing Allen's time interval algebra [2] for describing time relationships between flows and flow groups. In this paper, we focus on implementation aspects of the stream-based flow query language. Our research aim was to address the following questions:
– How can the stream-based flow query language be implemented in an extensible manner?
– What is the performance impact of using a high-level programming language?
– How does the complexity of the merger impact the overall execution time?
While our primary goals for the prototype were completeness and correctness, we analyze our prototype to identify performance-critical sections. The rest of
the paper is organized as follows. In Section 2 we provide an overview of the Flowy prototype implementation. We evaluate the performance and discuss the complexity of the Flowy implementation in Section 3. Section 4 reviews related work before Section 5 concludes our paper.
2 Flowy Prototype Implementation
In this section we provide an overview of the Flowy architecture and its implementation. Using a sample query, we explain the stages of query execution that correspond to the components of the query language. Each stage is implemented as a separate Python class and consists of two modules. Validator modules are used to initiate and interconnect all stages (passing one validator as an argument into the following validator). They also perform all necessary checks of the defined query syntax (e.g., double filter definitions). Execution modules define the methods used at execution time. For the purpose of illustration we employ the following query example, which makes use of all stages (the line breaks within the aggregate statements are reconstructed; the rule line numbers 38-41 match the references in the text below):

     1  splitter S {}
     2
     3  filter www_req {
     4    dstport = 80
     5  }
     6
     7  filter www_res {
     8    srcport = 80
     9  }
    10
    11  grouper g_www_req {
    12    module g1 {
    13      srcip = srcip
    14      dstip = dstip
    15      etime < stime delta 1s
    16    }
    17    aggregate srcip, dstip, sum(bytes) as bytes, count(rec_id) as n,
    18      bitOR(tcp_flags) as flags, union(srcport) as srcports
    19  }
    20
    21  grouper g_www_res {
    22    module g1 {
    23      srcip = srcip
    24      dstip = dstip
    25      etime < stime delta 1s
    26    }
    27    aggregate srcip, dstip, sum(bytes) as bytes, count(rec_id) as n,
    28      bitOR(tcp_flags) as flags, union(dstport) as dstports
    29  }
    30
    31  groupfilter ggf {
    32    bitAND(flags, 0x13) = 0x13
    33  }
    34
    35  merger M {
    36    module m1 {
    37      branches B, A
    38      A.srcip = B.dstip
    39      A.srcports = B.dstports
    40      A.bytes < B.bytes
    41      B oi A OR B d A
    42    }
    43    export m1
    44  }
    45
    46  ungrouper U {}
    47
    48  "./netflow-trace.h5" -> S
    49  S branch A -> www_req -> g_www_req -> ggf -> M
    50  S branch B -> www_res -> g_www_res -> ggf -> M
    51  M->U->"./ungrouped.h5"
The splitter is always defined the way it is shown in the example above. It is used for copying records to the respective branches. There are two filters defined in the query: www_req and www_res. The www_req filter is part of branch A, and it selects flow records with a destination port value of 80. The www_res filter is part of branch B and performs the same task as www_req, but for source ports. Each of the two branches has a grouper (g_www_req and g_www_res) following the filter. Both groupers define similar rules, but differ in their aggregation mechanism. In this case, there are three rules that say: "form a group based on the same source and destination IPs of the records, and make sure that the start time of the next record does not exceed the end time of the previous record by more than one second". The aggregation mechanism attaches aggregated information to each group as meta data. In this case it sums up all the bytes of the records present in a group, counts the number of records, bitwise-ORs all the TCP flags, and creates a list of all source ports appearing in the group's records (branch A) or of all destination ports (branch B). The group filter performs absolute filtering on the group records. The rule of our sample group filter ggf states that the aggregated flags of each group should contain the SYN, ACK, and FIN flags. Another possibility could have been a rule like bytes > 1024, which indicates that only groups with an aggregated record size of more than 1024 bytes pass the rule. The merger M contains a module with four rules and is specified to operate on the two branches (A, B). It defines a rule (line 38) to match the source IP addresses of branch A groups to the destination IP addresses of branch B groups. Having matching source and destination IP addresses is not sufficient information to identify an HTTP download session, so the aggregated source/destination ports of each branch's groups should match as well (line 39). Normally, the request for a data download is much smaller in size than the actual data. The next rule (line 40) indicates that the download requests from branch A should be smaller than the
Fig. 1. Stream query for identifying HTTP download sessions
responses (with the requested data) from branch B. The last rule (line 41) represents an instance of Allen's time algebra. The operator oi (overlap inverse) requires the record groups from branch B (responses) to occur after the record groups from branch A (requests), and d (during) indicates that responses from branch B should occur during requests from branch A. The ungrouper U expands groups into records; these represent the records that satisfy the query. The last part of the query definition indicates the way modules are interconnected. The same conceptual view of this two-branch query for HTTP download sessions is presented in Fig. 1.

2.1 Python, PyTables, and PLY
Flowy was implemented in Python, using PyTables [3] (an HDF version 5 access library) for storing the flow records and Python Lex and Yacc (PLY) [4] for generating the parser. The Python programming language was chosen because it is a very high-level, dynamically typed language, which allows rapid development and provides various convenient features relevant to this project. PyTables is the storage solution used by Flowy. It provides an interface to store large amounts of data and is based on the Hierarchical Data Format (HDF) [5]. HDF was chosen due to its fast and memory-efficient performance at the initial testing stages of the implementation effort. The PLY parser generator was chosen due to familiarity with the Lex and Yacc conventions. Since the query files have a relatively simple structure and are not expected to become very large, parser performance is not of great importance for the Flowy implementation.

2.2 Records
The main unit of data exchanged through Flowy's processing pipeline is the network flow record. Records can be grouped (e.g., at the grouper stage) to form group records. A Record class that deals with reading the records and storing them in PyTables is dynamically created at runtime. This was done in order to add flexibility to the tool and to allow it to process different versions of NetFlow [6] data (NetFlow V.5 flow records are different from NetFlow V.9 or IPFIX flow records).
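A minimal sketch of such runtime class creation could look as follows (the field list is an illustrative NetFlow V.5-style subset, not Flowy's actual code):

    def make_record_class(field_names):
        # Build a Record class whose attributes match the flow template at hand.
        def __init__(self, *values):
            for name, value in zip(field_names, values):
                setattr(self, name, value)
        return type("Record", (object,), {"__init__": __init__,
                                          "fields": tuple(field_names)})

    Record = make_record_class(["srcip", "dstip", "srcport", "dstport",
                                "prot", "bytes", "stime", "etime"])
    r = Record("10.0.0.1", "10.0.0.2", 34567, 80, 6, 1234, 100.0, 101.5)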
2.3 Filters and Rules
The filter Python module implements the filtering stage of the pipeline. It is important to note that there is only one Filter class instance for all branches. Instead of using a splitter which copies each record to the filter for each branch of the pipeline, the Filter instance reads every record and matches it against all filter rules of all branches. By doing so, the filter stage performs absolute filtering, as none of the records are compared to each other. In order to identify which branch's rules were matched, a record mask is added to the record. The record mask is a tuple of True/False flags corresponding to the branches the record should be passed to. Thus, the filter stage produces a stream of (record, record mask) pairs. Each filtering statement is converted to a Rule class instance, against which records are matched. Rule instances are constructed from a branch mask, an operation (=, <, etc.), and arguments. Arguments may be either constants or references to fields of the record being matched.

2.4 Branches and Branch Masks
The record mask shows which branches the record being filtered should be passed to. This is a simplified example of how two filters, a and b, from branches A and B respectively, are turned into a set of rules with their corresponding branch masks:

    filter a {
      prot = protocol("TCP")
      dstport = 80
    }

    filter b {
      prot = protocol("TCP")
      bytes > 1024
    }

    # protocol("TCP") is evaluated during parsing to its numeric value 6
    Rule(((True), (True)), EQ, [Field("prot"), 6])
    Rule(((True), (False)), EQ, [Field("dstport"), 80])
    Rule(((False), (True)), GT, [Field("bytes"), 1024])
The first argument of the Rule constructor is a simplified branch mask, used here instead of a real one, for clarity. The first rule's branch mask indicates that it is applicable to both filters, and it should match flow records representing TCP connections. The second Rule is applicable only to branch A, and it checks that the destination port is equal to 80. The last Rule is valid only for branch B, and it states that the size of the record should be greater than 1024 bytes. After matching against all filter rules, the record is passed to the splitter, which copies it only to the branches for which the branch mask is True. For more complicated queries that may involve a logical OR, a subbranch mechanism is implemented, but its discussion is left out of this paper. It is also possible to construct composite filters, i.e., new filters from existing filters.
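For illustration, a minimal sketch of how such rules could be evaluated and combined into a record mask (the names mirror the example above, but the implementation is an assumption, not Flowy's code):

    import operator

    EQ, GT, LT = operator.eq, operator.gt, operator.lt

    class Field:
        def __init__(self, name):
            self.name = name

    class Rule:
        def __init__(self, branch_mask, op, args):
            self.branch_mask, self.op, self.args = branch_mask, op, args

        def match(self, record):
            # Resolve Field references against the record, then apply the operator.
            vals = [getattr(record, a.name) if isinstance(a, Field) else a
                    for a in self.args]
            return self.op(*vals)

    def record_mask(record, rules, n_branches):
        # The flag for branch b is True iff every rule applicable to b matches.
        return tuple(all(r.match(record) for r in rules if r.branch_mask[b])
                     for b in range(n_branches))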
2.5 Splitter
The splitter module is used for copying records to the respective branches based on their record masks. Branches do not necessarily go through all records following the filtering stage; therefore, separate threads are used for each branch, in order to be able to do work even if other branches are record-starved. The Splitter class takes a mapping from branch names to branch objects. It has a method split(), which dispatches a record to the branches marked by its mask. The method go() iterates through all the records available from the filter and splits them to the corresponding branches.
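A queue-based sketch of this design (illustrative, not Flowy's actual classes; it relies on Python dicts preserving insertion order so that mask positions line up with branches):

    import threading
    from queue import Queue

    class Splitter:
        def __init__(self, branches):           # branches: {name: consume_callable}
            self.queues = {name: Queue() for name in branches}
            for name, consume in branches.items():
                threading.Thread(target=self._drain,
                                 args=(self.queues[name], consume),
                                 daemon=True).start()

        @staticmethod
        def _drain(q, consume):
            while True:
                record = q.get()
                if record is None:              # sentinel: branch input exhausted
                    break
                consume(record)

        def split(self, record, mask):
            # Dispatch the record to every branch whose mask flag is True.
            for flag, q in zip(mask, self.queues.values()):
                if flag:
                    q.put(record)

        def go(self, filtered):                 # filtered: iterable of (rec, mask)
            for record, mask in filtered:
                self.split(record, mask)
            for q in self.queues.values():
                q.put(None)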
2.6 Grouper
The grouper module forms groups of flow records based on the defined rules. The structure of the rules is similar to that of the filter module. Group objects contain the group records' information for absolute rule matching, as well as the first and last records of the group, which are needed for matching relative rules (e.g., comparing group occurrence times). Aggregation operations are performed by Python callable AggrOp objects, which aggregate over the given records. Each group has its own set of AggrOp objects. They are initialized by passing the record field they should read, the field of the group record they should store the end result in, and the data type of the record field. The data type is needed because some operations, like average, should return the result in the same type as the record field. The implementation of the grouping algorithm uses a different approach from the theoretical description presented in [1]: the nesting order of the loops which iterate over groups and records is reversed. Rather than tagging the records by passing over the whole set of untagged records for each new group, the implementation keeps the list of groups in memory and, for each record, checks whether it belongs to one of the groups. This removes the need for random reads from slow permanent storage and achieves the grouping in a single pass over the records. User-defined aggregation operations may be imported using the --aggr-import command line argument.
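The single-pass loop described above can be sketched as follows (belongs_to stands in for the grouper rule evaluation and new_group seeds a group with one record; both are placeholders, not Flowy's actual rule machinery):

    def group_records(records, belongs_to, new_group):
        groups = []                        # open groups kept in memory
        for rec in records:
            for g in groups:
                if belongs_to(g, rec):     # grouper rules match: extend the group
                    g.append(rec)
                    break
            else:
                groups.append(new_group(rec))   # record starts its own group
        return groups

    # e.g., with groups as plain record lists: new_group = lambda rec: [rec]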
2.7 Group Filter
Group filters work like a simplified version of the normal record filter. The branch mask mechanism is not used with group filters, since a group filter reads and exports records from a single branch. Besides plain filtering, each group filter adds the records to the group record time index of its branch and stores output records in a PyTables file, so that the groups can be read using random access and ungrouped in the ungrouper stage. The time index is an index that maps time intervals to the records which occur during those intervals.
2.8 Merger
The merger is organized as nested branch loops. An output of the merger is an N-tuple of groups, where N is the number of branches. Every merger branch
represents a for-loop over its records. Each branch loop reads a record from the corresponding record group and executes the matching rules which have their arguments in the current record group tuple. The branches are organized into a nested structure (in alphabetical order). After matching the tuple with its rule, a branch passes it to the lower level, which adds a record from its own branch to the tuple and executes any further rules. One of the merger requirements is that at least one of the Allen operators must be present. In order to improve the efficiency of the merger operation, a time index is used to find only those records which have the possibility of satisfying the Allen operators used in the merger. If there is an Allen relation A < B, the branch B loop does not need to iterate over all of its records for each record in A; it can iterate only over records that occur after the current record. In general, if the left argument of an Allen relation is known, it imposes a restriction on the possible right arguments.
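For reference, the Allen relations used in the sample query and the ordering relation exploited by the time index can be written down directly over (stime, etime) pairs; this is a sketch of the interval tests, not Flowy's code:

    def before(a, b):            # a < b : a ends before b starts
        return a[1] < b[0]

    def during(a, b):            # a d b : a lies strictly inside b
        return b[0] < a[0] and a[1] < b[1]

    def overlaps_inverse(a, b):  # a oi b : a starts inside b and ends after it
        return b[0] < a[0] < b[1] < a[1]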
2.9 Ungrouper
Ungrouper objects are used to ungroup the merger output: record group tuples are expanded into records. The order of the flow records is determined by the order of the groups that are expanded, while the records within a group are ordered by their record IDs.
3 Performance Evaluation
In our performance analysis we employed a Python profiler function written by Maciej Obarski [7] to achieve multi-threaded profiling of the Python program execution. The output of the profiler consists of three metrics: the total number of calls to each function, and the total wall clock and system times spent between function call and function return. A sample output of the profiler for a flush method looks as follows:
This output displays the function name and the location in the source code it was initiated at, as well as the three other metrics we mentioned above. Establishing a common comparison metric out of these three is rather evident. Obviously, the number of function calls alone is not a good metric, since the cumulative time of these calls may be much less than the time spent on some other functions with less calls. Execution times experience a similar problem, due to calling other functions internally, which would mean that a certain function may be called only a few times, but result in a large execution time. The metric used for finding worst performing functions was time spent per function call or the average time per function call. This value has been calculated by dividing the wall clock time spent between function call and function return, divided by the total number of calls. We executed a simple two-branch Flowy query on a very short flow trace, as a baseline for comparison with other tools. The time it has taken to initiate, validate and process the query was ≈ 3 seconds (Intel(R) Pentium(R) 4 CPU 3.00GHz, 512MiB System Memory).
154
K. Kanev, N. Melnikov, and J. Sch¨ onw¨ alder
At the evaluation stage we employed several queries (simple source/destination ports or addresses extraction, queries with more than two branches, and so on) in order to gain more understanding of the processes that are happening at different stages. All the queries were executed on differently sized flow traces. The traces have been collected by regular users, who had the necessary exporting and capturing tools installed on their machines. The queries have been evaluated on the traces of ≈ 26K, ≈ 57K, ≈ 100K and ≈ 300K records. Here we present the profiling results of a single query, which was defined and discussed in Section 2. The profiler has shown that the heaviest processing load was experienced in the filter, grouper and merger stages. The profiler results show that for all four traces the worst performing functions/methods were similar. In a run with ≈ 57K records we see that the functions coming from the filter, grouper and merger stages are among worst performers, as shown below: ((’reset’, ’/[...]/flowy/filter.py’, 45), (56992, 255.76, 262.48)) ((’match’, ’/[...]/flowy/merger.py’, 23), (527995, 64.41, 62.12)) ((’match’, ’/[...]/flowy/grouper.py’, 126), (1740570, 498.78, 495.35))
Almost the same set of functions is performing the worst for other flow traces. We can see that two out of those are match() functions defined in the corresponding classes. Internally, the match() functions perform different tasks, for instance the match() of the grouper stage performs rule comparisons on the records, and the number of these comparisons for a trace of ≈ 57K records is 1740570. The match() function of the merger module also performs record comparisons based on the specified rules and with the use of Allen’s relations. We can see that the number of match() comparisons is significantly smaller at the merger stage, since less records arrive at this stage than at the grouper stage. The reset() function of the filter stage internally performs deep copying for each record and is specified by the standard deepcopy() Python method. The deepcopy() operation is heavy in itself, since it needs to consider various data structures, but its use in Flowy grows linearly with the number of records. The prevalence of the grouping operation time requirement increases with the number of records at each branch, and follows the performance trend of the two-branch merger up to a certain amount of branch records that passed the filter, approximately 1000 records. However, the number of records is not the only factor for decreasing performance. The number of rules at each of the groupers also influences the overall running time, since each record needs to be compared to more rules. A simple test has shown that an increase in the number of rules at each of the groupers by one, increases the running time of the grouper stage roughly by a factor of two (extra iteration over filtered records). The above statement may vary significantly based on many factors and conditions, of course. Very high grouper performance deterioration comes with small delta values. The grouper potentially needs to traverse through many more records in order to find a matching record that is ”close enough” in time. The merger consumed a large share of the execution time compared to other stages. This became more evident with an increase of the number of filtered records at each
Implementation of a Stream-Based IP Flow Record Query Language
155
12000
8000
Spliter stage Splitter stage fit Grouper stage Grouper stage fit Merger stage Merger stage fit
Merger stage, branch A, fit Grouper stage, branch A, fit Splitter stage, branch A, fit Merger stage, branch B, fit Grouper stage, branch B, fit Splitter stage, branch B, fit
10000
Time (seconds)
Time (seconds)
10000
6000
4000
8000
6000
4000
2000
2000
0
0 50
100
150
200
250
300
Number of records (thousands)
Fig. 2. Total records vs. run time
0
2
4
6
8
10
12
14
16
Number of records in branches (thousands)
Fig. 3. Branch records vs. run time
branch. In general, the running time requirements of the HTTP download query are summarized in Fig. 2. It can easily be noticed that the time requirements of the two-branch merger module dominate those of the other modules, which becomes especially evident with an increase in record numbers. Nevertheless, Fig. 2 is somewhat misleading: if we executed another simple query (one that makes use only of the filter stage) on the same flow traces, most of the time would be consumed in the filtering section and not in the merger. For that reason we produced another plot in Fig. 3, which is "more objective" towards the stages that follow the filtering process, and shows the number of filtered records at each branch versus the running time. Similar to the grouper stage, the merger stage depends on the number of defined rules: the larger the number of rules that needs to be specified, the higher the running time. The results presented in Figures 2 and 3 confirm the worst-case running times of the stages. From the previous sections we know that both the filter and the group filter perform absolute filtering of the records and, therefore, directly depend on the number of records. The worst-case time of these modules is O(n), where n is the number of input records or record groups. Similarly, the ungrouper needs to iterate over the resulting record groups only once, with the caveat that it needs to retrieve the original records, which could influence its performance depending on the storage and retrieval methods used. The grouper stage needs to find a group match for each of the considered records: it selects one record and iterates through the rest of the records trying to find a match. In the worst case, each of the records will form a self-contained group, and in that scenario the program needs to iterate at most O(n^2) times, where n is the number of input records. The merger is considered last, and it is actually the most time-demanding stage. It directly depends on the number of branches and the number of groups in each of the branches. It will iterate through all the groups of each of the branches trying to find a match, which results in a complexity of O(m^n), where n is the number of branches and m is the number of groups in each branch. As a simple confirmation of the significant performance degradation
of the merger stage, we evaluated a query with three branches. This caused the evaluation time to increase by a large factor. The performance problem is connected both to the structure of the program, i.e., an increase in the number of branches causes high overall complexity of the tool, and to the language the tool was implemented in. A Python implementation might provide a good start as a prototype tool, but not when it comes to real usage, with the potential processing of a large number of records. Many standard Python methods used in the program handle many different cases (e.g., deepcopy()) and are thus not optimal. Rewriting those functions, optimized for the performed tasks, is a potential run-time improvement. As a first step, we redefined the deepcopy() method for a simple dictionary data set inside the reset() method of the filter module. This improved the runtime of that particular code piece by a factor of eight:

original - (('reset', '/[...]/filter.py', 45), (56992, 255.76, 262.48))
modified - (('reset', '/[...]/filter.py', 64), (56992, 31.72, 31.21))
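The effect is easy to reproduce with a micro-benchmark: for a flat dictionary of scalars, a specialized shallow copy is equivalent to deepcopy() and much cheaper (a sketch; the exact figures will vary by machine):

    import copy, timeit

    record = {"srcip": "10.0.0.1", "dstip": "10.0.0.2",
              "srcport": 34567, "dstport": 80, "bytes": 1234}

    slow = timeit.timeit(lambda: copy.deepcopy(record), number=10000)
    fast = timeit.timeit(lambda: dict(record), number=10000)
    print("deepcopy: %.3fs  dict(): %.3fs" % (slow, fast))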
Abandoning Python altogether, and switching to a language like C instead, could potentially improve the run time significantly.
4 Related Work
Existing flow query languages can be divided into three categories: filtering languages, procedural languages, and SQL-based languages [8]. The Berkeley Packet Filter (BPF) [9] allows filtering network traces by fields such as source/destination IP address and source/destination port. The filtering mechanism consists of simple filter expressions converted into executable programs. Popular tools, like tcpdump and nfdump, are based on BPF expressions. The CoralReef network analysis tool [10] also uses BPF expressions for generating reports from collected trace files. Another well-known filtering language is the flow-tools suite [11]. It consists of several applications that allow collecting and analyzing NetFlow data. Flow tools capture and filter flows, create reports based on different flow record fields, and display filtered or original flow records. The Simple Rules Language (SRL) [12] is a procedural language for defining traffic flows. It is used to specify filtering rulesets that instruct flow meters about which traffic flows are of interest and which flow attributes are to be collected and stored on the meters. FlowScan [13] is another instance of a procedural language. It consists of a number of Perl scripts that form a flow collection tool. FlowScan can generate rather general and high-level traffic reports that can be helpful in detecting particular traffic patterns. The script-based language SiLK [14] is a collection of commands for querying NetFlow data with its own filter expression primitives. SiLK allows labeling a set of flows aggregated by a common attribute. The rwgroup application iterates over flow records and groups those that have common attributes; a possible use-case scenario would be the aggregation of flows belonging to an FTP session. The rwmatch application, on the other hand, creates matched groups that consist of an initial record followed
by many more. An example of rwmatch's application could have been the HTTP download session from Section 2. The last group of applications considered here is based on SQL. A system that uses standard MySQL and an Oracle DBMS for storing attributes of NetFlow records is described in [15]. Using SQL queries, the tool can provide strong support for basic intrusion detection and usage statistics. The queries can be as complex as establishing a list of IP addresses from external autonomous systems that have contacted a large number of internal IP addresses. The Data Stream Management System (DSMS) [16] was an improvement over the DBMS approach. DSMS supports some of the features DBMSs lack, e.g., modeling flows as transient data streams, as opposed to the persistent relational data model. Another example of SQL-based flow query language systems is Gigascope [17], a stream database for network monitoring applications that uses GSQL for querying and filtering. GSQL is a modified version of SQL, which allows defining time windows inside a query. GSQL supports selection, aggregation, join, and merge operations. A typical query is subdivided into low-level (preliminary filtering and simple aggregation) and high-level (possible BPF invocation and complex aggregation operations using data cube computation algorithms) processing parts.
5 Conclusion
A prototype implementation of a stream-based flow query language called Flowy has been described and evaluated by a number of profiling tests. The test results indicated certain bottlenecks when the number of input flow records increases substantially. We identified two primary reasons for the performance problems:
1. The implementation in a high-level language causes performance problems due to slow execution of critical code sections.
2. The complexity grows with the number of branches and group records in the merger.
The implementation exploits concurrency by processing branches in different independent threads. Our current work explores a more coarse-grained level of concurrency by applying the MapReduce framework [18] and distributing flow processing over multiple machines.
Acknowledgement. The work reported in this paper is supported by the EC IST-EMANICS Network of Excellence (#26854).
References
1. Marinov, V., Schönwälder, J.: Design of a Stream-Based IP Flow Record Query Language. In: DSOM 2009, pp. 15–28. Springer, Heidelberg (2009)
2. Allen, J.F.: Maintaining Knowledge About Temporal Intervals. Communications of the ACM 26(11), 832–843 (1983)
3. Alted, F., Vilata, I., et al.: PyTables: Hierarchical datasets in Python (2002), http://www.pytables.org/
4. Beazley, D.M.: PLY, Python Lex-Yacc (2001), http://www.dabeaz.com/ply/
5. Folk, M., McGrath, R.E., Yang, K.: Mapping HDF4 Objects to HDF5 Objects. Technical report, National Center for Supercomputing Applications, University of Illinois (2002)
6. Claise, B.: Cisco Systems NetFlow Services Export Version 9. RFC 3954, Cisco Systems (October 2004)
7. Obarski, M.: Profiling Python threads (01-02-2010), http://code.activestate.com/recipes/465831/
8. Marinov, V., Schönwälder, J.: Design of an IP Flow Record Query Language. In: Hausheer, D., Schönwälder, J. (eds.) AIMS 2008. LNCS, vol. 5127, pp. 205–210. Springer, Heidelberg (2008)
9. McCanne, S., Jacobson, V.: The BSD Packet Filter: A New Architecture for User-level Packet Capture. In: USENIX 1993, Berkeley, CA, USA, p. 2. USENIX (1993)
10. Moore, D., Keys, K., Koga, R., Lagache, E., Claffy, K.C.: The CoralReef Software Suite as a Tool for System and Network Administrators. In: LISA 2001, Berkeley, CA, USA, pp. 133–144. USENIX (2001)
11. Romig, S.: The OSU Flow-tools Package and CISCO NetFlow Logs. In: LISA 2000, Berkeley, CA, USA, pp. 291–304. USENIX (2000)
12. Brownlee, N.: SRL: A Language for Describing Traffic Flows and Specifying Actions for Flow Groups. RFC 2723, University of Auckland (October 1999)
13. Plonka, D.: FlowScan: A Network Traffic Flow Reporting and Visualization Tool. In: LISA 2000, Berkeley, CA, USA, pp. 305–318. USENIX (2000)
14. CERT/NetSA at Carnegie Mellon University: SiLK (System for Internet-Level Knowledge), http://tools.netsa.cert.org/silk [Accessed: July 13, 2009]
15. Nickless, B.: Combining Cisco NetFlow Exports with Relational Database Technology for Usage Statistics, Intrusion Detection, and Network Forensics. In: LISA 2000, Berkeley, CA, USA, pp. 285–290. USENIX (2000)
16. Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and Issues in Data Stream Systems. In: PODS 2002, pp. 1–16. ACM, New York (2002)
17. Cranor, C., Johnson, T., Spataschek, O., Shkapenyuk, V.: Gigascope: A Stream Database for Network Applications. In: SIGMOD 2003, pp. 647–651. ACM, New York (2003)
18. Dean, J., Ghemawat, S.: MapReduce: Simplified Data Processing on Large Clusters. In: OSDI 2004, Berkeley, CA, USA, p. 10. USENIX (2004)
Towards Flexible and Secure Distributed Aggregation
Kristján Valur Jónsson¹,² and Mads F. Dam¹
¹ Royal Institute of Technology (KTH), Stockholm, Sweden [email protected], [email protected]
² Reykjavik University, Iceland [email protected]
Abstract. Distributed aggregation algorithms are important in many present and future computing applications. However, after a decade of research, there are still numerous open questions regarding the security of this important class of algorithms. We intend to address some of these questions, mainly those regarding resilience against active attackers, whose aim is to compromise the integrity of the aggregate computation. Our work is currently in its initial stages, but we have identified promising research leads, which we present in this paper.
1 Introduction
The past decade has shown distributed algorithms to be a practical and scalable approach to a wide range of applications, as demonstrated by various peer-to-peer systems, e.g., BitTorrent and Skype. We are interested in distributed aggregation algorithms, which aggregate local measurements in an efficient and scalable manner by means of in-network processing. These protocols can be roughly categorized into families, the most prominent ones based on gossiping and spanning-tree overlays. Our research is driven by network management applications, where distributed aggregation algorithms have been shown to increase scalability and efficiency in monitoring and management systems [1]. Reliance on distributed algorithms for monitoring of critical networked systems motivates a thorough review of their security properties. We hope to contribute to this field of research, as will be outlined in this paper. We are currently working on countermeasures against active insider adversaries, whose objective is to compromise the integrity of the in-network aggregate computation. This particular subject has received considerable attention in the past few years [2, 3, 4], most prominently in sensor networks research, but important issues still remain open. In general, the prior research has focused on patching specific aggregation protocols to increase resilience, whereas we hope
This work is supported in part by grant #080520008 from Rannís, the Icelandic research fund, and funding by Reykjavik University. Work was partially supported by the EU FP7 project 4WARD and a personal grant from the Swedish Research Council.
Fig. 1. (a) Aggregation nodes; (b) aggregation and supervisory layers
to develop more broadly applicable methods, focusing on secure management of dynamic networked systems.
2 A Motivating Example
Let us first present a small motivating example of a managed system of untrusted (compromisable) workstations, in which we require an aggregate view of some local input, perhaps for detection of anomalous events. We assume here a spanning-tree-based aggregation network, utilizing in-network aggregation by the managed nodes themselves for scalability. A fraction of such a network is shown in Figure 1(a). Nodes a and b submit partial aggregate updates to their parent c, which is expected to correctly compute an update over its own state and received inputs, and forward a single aggregate message upwards. Now, consider the case in which c is compromised. The adversary may attempt to influence the aggregate computation by misrepresenting contributions, e.g., inflating or deflating the aggregate, manufacturing fictitious inputs or ignoring legitimate ones. It is important to realize that the compromised node c does not risk detection of such actions in a typical unsecured network of the type described. It is obvious that a small number of compromised nodes, unrestrained against this class of attacks, may render an aggregation network ineffective, demonstrating the importance of countermeasures, such as the ones proposed in this paper.
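To make the scenario concrete, here is a toy rendering in Python (our own illustration, not a protocol from the literature) of SUM aggregation over the tree fragment of Figure 1(a); all class and field names are invented for this sketch:

class Node:
    def __init__(self, value, children=(), cheat=0):
        self.value = value          # local measurement, e.g. v_a, v_b, v_c
        self.children = children
        self.cheat = cheat          # amount a compromised node adds

    def aggregate(self):
        # combine own value with the partial aggregates of the children
        partial = self.value + sum(ch.aggregate() for ch in self.children)
        return partial + self.cheat  # an honest node has cheat == 0

a, b = Node(5), Node(7)
c_honest = Node(3, children=(a, b))
c_rogue = Node(3, children=(a, b), cheat=100)
print(c_honest.aggregate())  # 15
print(c_rogue.aggregate())   # 115 -- indistinguishable to c's parent

The point of the sketch is that c's parent only ever sees a single partial aggregate, so the inflated value carries no evidence of tampering.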
3 Problem Statement
We consider means of increasing the resilience of dynamic distributed aggregation networks of untrusted nodes against insider attackers, whose objective is to stealthily compromise the in-network aggregate computation. Our goal is to ensure that, under suitable constraints on the network and distribution of adversaries, the network either performs the computation correctly, or else an adversary breaking the protocol is identified with some non-zero probability. The problem has been considered in the distributed systems literature for the past decade, e.g., [2, 3, 4]. Significant results have been achieved, but the previous work generally involves considerable messaging overhead, and makes
specific protocol and graph topology assumptions. The most voluminous body of work is in the popular research field of sensor networks, and addresses the particular challenges of these resource-constrained networks. In contrast to most previous work in the field, our objective is to develop protocols applicable to highly dynamic networks and a wide range of aggregation and transport protocols in a network management context.
4 Distributed Security Layer
We approach the development of our solution in a methodical top-down fashion, employing sound systems design methodologies, with theoretical backing as warranted. The starting point is an idealized specification¹. A well-known result from multi-party computation is that any distributed function can be computed securely via a trusted authority: every party simply submits its input to the trusted authority, which computes the function and hands the output back. This formulation can obviously be applied to our problem, which gives us the best-case guarantees, as further defined by adversarial modeling and other systems assumptions. A trusted authority can indeed be implemented in the form of a trusted server, accepting inputs from a population of managed nodes. This solution is the well-known centralized management model and suffers from obvious scalability problems. We intend to explore means of approaching the ideal functionality for a given set of assumptions, but to do so in a scalable and efficient manner. We outline one possible approach to such an approximation in this paper: a distributed security layer S which accepts cryptographically secure commitments from the managed nodes and may interact with them to enforce a security policy. An example is shown in Figure 1(b). The objective is to remove or reduce the opportunities of compromised nodes to manipulate the aggregate computation undetected. Previous work on accountability systems [5] considers similar objectives. However, we plan to develop lower-impact protocols, from the perspective of the managed nodes themselves. A practical approach to constructing S is to build on a secured distributed hash table (DHT). A naive utilization of S is to require aggregation nodes to commit all messages sent and received. This method can be shown to be equivalent to the ideal specification, but the associated overhead is prohibitive. A more promising approach is to require smaller commitments and to employ a spot-checking protocol in which S randomly selects nodes to audit, interrogating them in some detail to ascertain that the node acted correctly in some past round. We believe this approach will result in an efficient protocol, tunable to give an acceptable detection probability of compromised nodes. We further believe that, in conjunction with robust reputation mechanisms, this method will prove to be a useful tool for increasing the robustness of in-network aggregation.
¹ We are inspired by Canetti's work on universal composability and ideal functionality modeling. However, we do not plan to apply this methodology literally at this time.
The approach described requires minimal changes to the basic aggregation system: we assume that each node has a unique and verifiable identity, as well as a set of cryptographic keys for signing or authenticating produced messages as well as commitments sent to S. Managed nodes must implement a small protocol to interact with S, but little or no modification to the aggregation protocol itself is required. A clear conceptual boundary can be drawn between the duties of the aggregation protocol and the security mechanisms. Spot-checking behavior in past rounds also effectively decouples the security mechanisms from the graph structure, meaning that they are tolerant to churn and other dynamic effects. Implementing efficient protocols for spot-checking requires distributing the secure storage duties into the population of untrusted nodes to some degree, e.g., using primitives for the construction of an unmodifiable log [6]. This unavoidably blurs the boundaries between the trusted and untrusted overlays. Even further distribution of the security service into the untrusted node population may be considered, even to the extent of distributing S altogether into the general aggregation node population, in a similar manner to the proposals in [2, 4]. However, such approaches require stricter assumptions on the network type and topology, as well as considerable communications overhead on the aggregation nodes themselves.
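The commit-and-audit interaction can be illustrated with a minimal toy model under our own simplifying assumptions (SUM aggregation, and a plain dictionary standing in for S); this is an illustration of the idea, not a design of the actual protocol:

import hashlib

def digest(round_no, inputs, output):
    # commitment over a round's disclosed inputs and claimed output
    data = "%d|%s|%d" % (round_no, sorted(inputs), output)
    return hashlib.sha256(data.encode()).hexdigest()

commitments = {}   # (node_id, round_no) -> digest, held by S

def commit(node_id, round_no, inputs, output):
    commitments[(node_id, round_no)] = digest(round_no, inputs, output)

def audit(node_id, round_no, claimed_inputs, claimed_output):
    # S recomputes the aggregate from the inputs the node now discloses
    # and checks both the recomputation and the earlier commitment.
    ok_sum = (claimed_output == sum(claimed_inputs))
    ok_commit = commitments.get((node_id, round_no)) == \
        digest(round_no, claimed_inputs, claimed_output)
    return ok_sum and ok_commit

commit("c", 1, [5, 7, 3], 15)
print(audit("c", 1, [5, 7, 3], 15))    # True: honest behaviour verified
print(audit("c", 1, [5, 7, 3], 115))   # False: inflation is caught

Because S only audits randomly selected (node, round) pairs, the per-round cost stays small while a persistent cheater is still caught with non-zero probability per round.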
5 Concluding Remarks
Secure distributed aggregation has been our focus for the last few months and the work is still in its initial stages. Our focus is on the integrity of in-network aggregate computations and resilience against active insider attackers. Privacy is a closely related, but complementary, topic, which we may consider in parallel. We have identified a promising research direction in the formulation of our distributed security service S. Numerous variations and optimizations of this concept can be envisaged, and our proposed approach is sufficiently general to be applicable to a variety of distributed aggregation networks and security issues. We plan to approach the design of such a system from a practical, systems-oriented angle, backed up by theoretical work as warranted. Next steps include a full design of S and the formulation of commitment and checking protocols.
References
[1] Dam, M., Stadler, R.: A generic protocol for network state aggregation. In: RVK 2005, Linköping, Sweden (2005)
[2] Chan, H., Perrig, A., Song, D.: Secure hierarchical in-network aggregation in sensor networks. In: CCS, pp. 278–287. ACM, New York (2006)
[3] Chan, H., Perrig, A., Przydatek, B., Song, D.: SIA: Secure information aggregation in sensor networks. Journal of Computer Security 15(1), 69–102 (2007)
[4] Yang, Y., Wang, X., Zhu, S., Cao, G.: SDAP: A secure hop-by-hop data aggregation protocol for sensor networks. In: MobiHoc 2006, New York, NY, pp. 356–367 (2006)
[5] Haeberlen, A., Kouznetsov, P., Druschel, P.: PeerReview: Practical accountability for distributed systems. SIGOPS Oper. Syst. Rev. 41(6), 175–188 (2007)
[6] Chun, B.G., Maniatis, P., Shenker, S., Kubiatowicz, J.: Attested append-only memory: making adversaries stick to their word. SIGOPS Oper. Syst. Rev. 41(6), 189–204 (2007)
Intrusion Detection in SCADA Networks Rafael Ramos Regis Barbosa and Aiko Pras University of Twente Design and Analysis of Communication Systems (DACS) Enschede, The Netherlands {r.barbosa,a.pras}@utwente.nl
Abstract. Supervisory Control and Data Acquisition (SCADA) systems are a critical part of large industrial facilities, such as water distribution infrastructures. With the goal of reducing costs and increasing efficiency, these systems are becoming increasingly interconnected. However, this has also exposed them to a wide range of network security problems. Our research focuses on the development of a novel flow-based intrusion detection system. Based on the assumption that SCADA networks are well-behaved, we believe that it is possible to model the normal traffic by establishing relations between network flows. To improve accuracy and provide more information on the anomalous traffic, we will also research methods to derive a flow-based model for anomalous flows.
1 Introduction
Large industrial facilities such as water distribution infrastructures, electricity generation plants, and oil refineries need to be continuously monitored and controlled to assure proper functioning. SCADA (Supervisory Control and Data Acquisition) systems are commonly deployed to aid these actions, by automating telemetry and data acquisition. Historically, SCADA systems were believed to be secure because they were isolated networks: an operator station, or human-machine interface (HMI), connected to remote terminal units (RTUs) and programmable logic controllers (PLCs) through a proprietary purpose-specific protocol. Yielding to market pressure, which demands that industries operate with low costs and high efficiency, these systems are becoming increasingly more interconnected. Many modern SCADA networks are connected to both the company's corporate network and the Internet [1]. Furthermore, it is common that the HMI is a commodity PC, which is connected to RTUs and PLCs using standard technologies, such as Ethernet and WLAN (see Figure 1). This has exposed these networks to a wide range of security problems. Probably the most well-known attack on a SCADA system happened at Maroochy Water Services in Australia [2]. An attacker was able to successfully interfere with the communications, causing pumps not to work properly and preventing alarms from being sent. Areas were flooded and rivers polluted with sewage. Another example happened in 2003, when the Davis-Besse nuclear power plant in Ohio was infected with
Fig. 1. Typical modern SCADA topology.
the Slammer worm [3]. The attack made the network highly congested, causing safety and plant process systems to fail for several hours. In the face of these problems, SCADA security has become a main concern of both industry and government, leading to several efforts to increase security in these industrial networks. The American National Institute of Standards and Technology (NIST) published a guideline document that identifies several threats and vulnerabilities of such networks and discusses recommended countermeasures [5]. A report by the Netherlands Organization for Applied Scientific Research (TNO) describes thirty-nine SCADA Security Good Practices for the drinking water sector [4]. This paper describes our research proposal to address the problem of intrusion detection in SCADA networks. Based on the assumption that the traffic in these networks is well-behaved, we plan to build models for the network traffic based on relations between network flows, and to detect attacks as violations of these models. In addition, we will research methods to create similar models for anomalous traffic, creating attack signatures at the flow level.
2 Related Work
There is extensive work in the area of intrusion and anomaly detection in computer networks. In this section we focus our literature review on SCADA networks. In [6], an application-specific intrusion detection system (IDS) for embedded systems, such as RTUs and PLCs, is proposed. Security policies are generated by a middleware that constantly monitors an application to define the accepted behaviour. When detecting a policy violation, the middleware can take actions that range from logging events to terminating connections. Valdes et al. [7] propose protocol-level models for intrusion detection in process control networks. These models describe expected values for packet fields and relations between dependent fields in one or multiple packets. The challenges involved in securing control systems are discussed in [8]. The main idea behind the approach of [8] for intrusion detection is to understand the interactions between the network and the physical system it controls. Packets are considered normal or abnormal based on the effect they have on the control system. Another approach, described in [9], is based on the observation that the
contents of the random access memory (RAM) of PLCs follow specific flows that persist over time. Packets are classified as normal or abnormal by considering the effects they have on the contents of a PLC's RAM. In contrast to the individual packet inspection described in these works, our proposal aims to detect intrusions at the network level, by analysing relations between flows. We argue that the former solutions are not suitable to detect attacks such as Denial of Service (DoS) and port scans.
3 Approach
Our goal is to develop an anomaly-based IDS using flows to model the network traffic. We want to describe the network traffic by finding relations between flows. Consider a typical activity that generates data in SCADA networks: a server polling field devices for data. As a consequence of this activity, several connections will be generated, one for each field device. As polling is commonly periodic, this creates a clear flow pattern in the network. In case an engineer manually starts a polling instance, a different set of connections would be observed. For example, a connection to the authentication server might be added. A flow model for this activity would include all possible variations. We envision an IDS capable of automatically generating flow models and detecting anomalies as violations of such models. While it might be too hard to use this approach to describe all traffic in an enterprise IT network, we believe it is possible to use it to model SCADA traffic. Our assumption is that SCADA traffic is well-behaved when compared with traditional IT systems. This is due to a number of reasons:
– Fixed number of network devices. As a critical infrastructure, availability is a main concern in SCADA. The number of servers and clients rarely changes over time. This does not hold for traditional IT networks, where new clients can be easily added and, normally, the consequences of a server being offline for a short period are not so severe.
– Limited number of protocols. Traditional IT networks might provide a multitude of services, such as web browsing, email, instant messaging, voice over IP, and file sharing. This is not expected in a SCADA network.
– Regular communication patterns. Most of the SCADA traffic is generated in a polling fashion. Masters query a number of slaves for data, and only occasionally does a slave start the communication to notify a significant event. In contrast, given the large number of protocols and the quantity of human-generated connections, the traffic in traditional IT networks is too unpredictable.
As a complement to this approach, we plan to investigate how to create similar models for anomalous traffic. The objective is to increase accuracy, by creating flow-level signatures of attacks, and also to provide more information about the anomalous traffic. We believe that modelling the anomalous traffic is a more challenging task, as attackers are motivated to conceal their activities.
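As an illustration of the kind of flow relation we have in mind, the following toy sketch learns the polling period of a (master, slave) pair from flow start times and flags off-schedule flows; the names and the tolerance threshold are illustrative only and are not part of the proposal:

from collections import defaultdict

starts = defaultdict(list)   # (src, dst) -> observed flow start times

def learn(flow_key, start_time):
    starts[flow_key].append(start_time)

def is_anomalous(flow_key, start_time, tolerance=0.2):
    times = starts[flow_key]
    if len(times) < 3:
        return False                     # not enough history yet
    gaps = [b - a for a, b in zip(times, times[1:])]
    period = sum(gaps) / len(gaps)       # learned polling period
    return abs((start_time - times[-1]) - period) > tolerance * period

for t in (0, 60, 120, 180):              # master polls a PLC every 60 s
    learn(("hmi", "plc1"), t)
print(is_anomalous(("hmi", "plc1"), 240))   # False: on schedule
print(is_anomalous(("hmi", "plc1"), 195))   # True: off-schedule flow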
In summary, the main research question we propose to answer is: how to build network traffic models based on correlations between flows? We will evaluate which techniques are best fit to cluster flows to build such network models. We consider techniques such as deterministic finite automata, used in [10] to create flow models that identify application sessions, and Markov models. In addition, to deal with questions like which attacks we should consider and where in the network the IDS should be deployed, we plan to carry out a vulnerability assessment to refine our notion of the biggest threats to SCADA networks.
4 Validation
In order to validate our findings we intend to use real-world traffic captured in a Dutch water distribution infrastructure. However, due to the critical nature of the network, we might not be able to perform all the necessary measurements. If necessary, we will also consider studying other protocols with characteristics similar to SCADA traffic (i.e., regular communication patterns), such as SNMP, and making use of simulation models.
References
1. Igure, V., Laughter, S., Williams, R.: Security issues in SCADA networks. Computers & Security 25(7), 498–506 (2006)
2. Slay, J., Miller, M.: Lessons learned from the Maroochy water breach. International Federation for Information Processing 253, 73 (2008)
3. Beckner, W.D.: NRC Information Notice 2003-14: Potential Vulnerability of Plant Computer Network to Worm Infection (2003)
4. Luiijf, E.: SCADA Security Good Practices for the Drinking Water Sector. Technical Report, TNO (2008)
5. Stouffer, K., Falco, J., Kent, K.: Guide to Supervisory Control and Data Acquisition (SCADA) and Industrial Control Systems Security (2006)
6. Naess, E., Frincke, D., McKinnon, A., Bakken, D.: Configurable Middleware-Level Intrusion Detection for Embedded Systems. In: 25th IEEE International Conference on Distributed Computing Systems Workshops, pp. 144–151 (2005)
7. Valdes, A., Cheung, S.: Intrusion Monitoring in Process Control Systems. In: Proceedings of the Forty-Second Hawaii International Conference on System Sciences, p. 17 (2009)
8. Cárdenas, A., Amin, S., Sastry, S.: Research challenges for the security of control systems. In: Proceedings of the 3rd USENIX Workshop on Hot Topics in Security (HotSec), San Jose, CA, USA (2008)
9. Rrushi, J., Kang, K.d.: Detecting Anomalies in Process Control Networks. In: Critical Infrastructure Protection III: Third IFIP WG 11.10 International Conference, Hanover, New Hampshire, USA, pp. 151–165. Springer, Heidelberg (2009)
10. Kannan, J., Jung, J., Paxson, V., Koksal, C.: Semi-automated discovery of application session structure. In: Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement, p. 132. ACM, New York (2006)
Cybermetrics: User Identification through Network Flow Analysis
Nikolay Melnikov and Jürgen Schönwälder
Computer Science, Jacobs University Bremen, Germany {n.melnikov,j.schoenwaelder}@jacobs-university.de
Abstract. Recent studies on user identification focused on behavioral aspects of biometric patterns, such as keystroke dynamics or activity cycles in on-line games. The aim of our work is to identify users through the detection and analysis of characteristic network flow patterns. The transformation of concepts from the biometric domain into the network domain leads to the concept of a cybermetric pattern: a pattern that identifies a user based on her characteristic Internet activity.
Keywords: Cybermetrics, User Identification, Network Flow Analysis.
1 Introduction
The increasing usage of the Internet in our daily lives has led us to believe that Internet citizens have developed distinguishable individual browsing patterns and styles. Network flow traces recording personal browsing sessions should contain patterns representing the users' characteristic cybermetrics. The cybermetric is assumed to reflect a user's priorities during a browsing activity, the sequence of steps performed at each new Internet browsing session, the pool of destinations visited on the Internet, and several other features pertaining to that user's characteristic network usage. Having a mechanism for identifying users based on their cybermetrics provides a set of advantages for the purposes of network management, system administration, and security. For example, cybermetrics might be used to grant access to specific services or to verify the identity of a user when she is calling the helpdesk. In the following section, we state the research questions. We then report some initial experimental results analyzing the impact of the length of network flow traces on the calculation of cybermetrics. We briefly review related work before we conclude the paper.
2 Research Questions
The goal of our research is to identify and distinguish users based on the Internet flows generated by them. The following questions require further investigation:
1. What is a suitable set of features of a flow trace for user identification?
2. What are suitable mathematical methods that can be employed?
Fig. 1. Cross-correlation of traces for the feature “duration of https connections”
3. Which thresholds can reliably detect feature similarity (or dissimilarity)?
4. What is a scalable approach to automate the user identification process?
It is evident that a plain comparison of two users' flow traces would result in much noise and would not be a good comparison technique overall. It is therefore necessary to identify a set of features which have a high potential to differentiate users. The analysis and comparison steps require the usage of proper mathematical methods. While analyzing feature sets, it is important to be able to establish evidence of the similarity of feature sets, or evidence of the dissimilarity of feature sets, or to conclude that no evidence can be derived. Once a suitable user identification technique has been found, we must consider how it can be implemented in a scalable manner.
3 Study of the Impact of the Length of Flow Traces
At the beginning of our research, we wanted to know how cybermetrics may be impacted by the length of flow traces. For our experimental study, we asked several people to collect their personal flow traces by recording flow records originating from their personal computers. Considering a flow trace spanning a large number of days, it is expected that the number of longer flows increases compared to shorter traces. Fig. 1 shows the cross-correlation of the feature "duration of https connections" for traces of different lengths and of different users. The plot indicates a strong correlation of this feature for traces coming from the same user. It also indicates a time shift of high correlation values when the lengths of the traces increase. The overall cross-correlation shows a stronger similarity for the traces obtained from the
Fig. 2. Scatter-plot of smoothed flow volume (data carried in a flow) vs. flow duration for traces of different lengths for https (top) and ssh (bottom)
same user, nik. The cross-correlation of traces from different users (nik and js) does not indicate strong similarity. We also wanted to know how the dynamics of the flow volumes depend on the length of the flow traces. The dynamics of two large data sets (coming from the same user) for the two features “volume of https connections” and “volume of ssh connections” is displayed in Fig. 2 (we used the Loess quadratic fit for smoothing the data shown in Fig. 2). In each plot, one curve shows flow data collected over a two-week period (March 10-24, 2009), while the other one shows flow data for five weeks (March 1 - April 4, 2009). The dynamics of the curves in each plot are quite similar. Essentially for smaller durations (< 5000 seconds), the relationship between the duration and the amount of octets carried is similar for ssh connections. However, a decrease of similarity can be observed for some of the flows that lasted between 500 and 1000 seconds for ssh connections of a five-week long flow trace. The https connections were in general shorter than ssh connections. Furthermore, the five-week flow traces had more occurrences of longer lasting flows. The amount of data carried in flows stayed almost constant independently of the length of the traces. The strong match of the flow volume for most of the flow durations is a good indicator of similarity.
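To give a flavour of such feature comparisons, the sketch below bins https flow durations from synthetic traces and compares the histograms with a zero-lag normalized correlation (a simplification of the cross-correlation used for Fig. 1); all data and parameters are made up for illustration:

import numpy as np

np.random.seed(0)

def duration_histogram(durations, bins):
    hist, _ = np.histogram(durations, bins=bins)
    return hist.astype(float)

def similarity(h1, h2):
    # zero-lag normalized correlation (cosine similarity) of two histograms
    return float(np.dot(h1, h2) /
                 (np.linalg.norm(h1) * np.linalg.norm(h2) + 1e-12))

bins = np.logspace(0, 4, 30)                          # 1 s .. 10^4 s, log-spaced
week_same_user = np.random.lognormal(3.0, 1.0, 500)  # one-week trace
day_same_user = np.random.lognormal(3.0, 1.0, 80)    # one-day trace, same habits
other_user = np.random.lognormal(5.0, 0.5, 500)      # different browsing habits

h_week = duration_histogram(week_same_user, bins)
h_day = duration_histogram(day_same_user, bins)
h_other = duration_histogram(other_user, bins)

print(similarity(h_week, h_day) > similarity(h_week, h_other))  # True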
4 Related Work
There are two areas that are closely related to our research. The first area of research deals with user identification methods based on the behavioral and activity features of a user. The idea of user recognition and identification by exploitation of biometric patterns has long been known [1], [2]. More recent studies look at dynamics of certain actions performed by the user — be it an on-line game-play activity [3], which shows that the idle and active times in the
game are representative of the user; a keystroke analysis [4], which provides an impressive 96% correctness rate at user differentiation; or user-mouse interaction dynamics [5], establishing a behavioral characteristic that can be used as an additional security feature. The second area of research uses passive network traffic monitoring techniques for performance analysis, application type/protocol identification, and anomaly and intrusion detection. In [6] the authors propose a novel identification method for revealing peer-to-peer traffic. The authors of [7] detect, classify, and understand anomaly structures using entropy as a metric of unusual changes in the distribution of traffic features. A more recent study [8] proposes an on-line anomaly detection algorithm that has no prior knowledge of what is normal and abnormal traffic.
5 Conclusion
This paper discusses the possibility of user identification using flow trace analysis. We state our research questions and provide some preliminary results, indicating that the length of the traces can have a significant impact on certain flow features.
Acknowledgement. The work reported in this paper is supported by the EC IST-EMANICS Network of Excellence (#26854).
References
1. Holmes, J.P., Wright, L.J., Maxwell, R.L.: A performance evaluation of biometric identification devices. Technical report, Sandia National Laboratories, Albuquerque, NM (1991)
2. Ashbourn, J.: Biometrics: Advanced Identity Verification. Springer, London (2000)
3. Chen, K.-T., Hong, L.-W.: User identification based on game-play activity patterns. In: Proc. of the 6th ACM SIGCOMM Workshop on Network and System Support for Games (NetGames 2007), pp. 7–12. ACM, New York (2007)
4. Bergadano, F., Gunetti, D., Picardi, C.: User authentication through keystroke dynamics. ACM Transactions on Information and System Security 5(4), 367–397 (2002)
5. Ahmed, A.A.E., Traore, I.: A new biometric technology based on mouse dynamics. IEEE Transactions on Dependable and Secure Computing 4(3), 165–179 (2007)
6. Perényi, M., Dang, T.D., Gefferth, A., Molnár, S.: Identification and analysis of peer-to-peer traffic. JCM 1(7), 36–46 (2006)
7. Lakhina, A., Crovella, M., Diot, C.: Mining anomalies using traffic feature distributions. SIGCOMM Computer Communication Review 35(4), 217–228 (2005)
8. Stoecklin, M.P., Boudec, J.-Y.L., Kind, A.: A two-layered anomaly detection technique based on multi-modal flow behavior models. In: Claypool, M., Uhlig, S. (eds.) PAM 2008. LNCS, vol. 4979, pp. 212–221. Springer, Heidelberg (2008)
Distributed Architecture for Real-Time Traffic Analysis
Cristian Morariu and Burkhard Stiller
Department of Informatics, University of Zürich, CH-8050 Zürich, Switzerland {morariu,stiller}@ifi.unizh.ch
Abstract. Traditional real-time IP traffic analysis applied on today's high-speed network links suffers from a lack of scalability. Although sampling proves to be a promising approach, there are application scenarios foreseen in which decisions cannot be based on sampled data, e.g., for usage-based charging or intrusion detection systems. Moreover, traditional traffic analysis mechanisms do not map the traffic observed in the network to a particular user, but rather to a particular end-node, which may have been shared by several users. Thus, DARTA (Distributed Architecture for Real-Time Traffic Analysis) develops a model for distributed IP traffic analysis and introduces new mechanisms for three different aspects of IP traffic monitoring: (a) a framework enabling the development of distributed traffic analysis applications, (b) a distributed packet capture mechanism, and (c) a user-based IP traffic accounting mechanism for mapping IP traffic to individual users.
1 Introduction
Since the first days of the Internet, the traffic carried by network operators has increased year by year. Studies have shown that the yearly increase of traffic observed in several large network operators during the last decade ranged between 50% and 100% [5], [6], [8], with steep increases in recent years for the mobile Internet traffic segment. Moreover, [1] sees this trend continuing at least until 2012, when Internet traffic will have grown to approximately 75 times the Internet traffic of 2002. The traffic increase not only impacts the routing and switching infrastructure of an operator, but also its metering, monitoring, and accounting infrastructure, which support vital operations for a modern network. While network traffic increased by about 50%–100% every year in the last decade, memory access speeds only improved by about 7–9% per year [7] during the same period. As a result, today, network operators either (a) reduce the traffic they inspect by using sampling or aggregation, which reduces the accuracy of analysis applications, or (b) use hardware specialized for some specific traffic monitoring or analysis tasks, which is usually very expensive and less flexible than a software traffic analysis application.
2 Motivation
Solving traffic monitoring problems by distributing data to several nodes has been proposed several times for solving specific problems. Although existing proposals show that distribution may improve the performance of traffic monitoring and analysis applications running in high-speed traffic environments, each of those solutions was designed and tuned for a specific problem. The first aspect which this thesis investigates is how to distribute generic traffic analysis tasks over several nodes. This work resulted in SCRIPT, which is both a framework for building distributed traffic analysis applications and an implemented prototype platform.
The second aspect is how to capture packets on high packet-rate links in a scalable manner. Many traffic monitoring applications were built for the Linux operating system and make use of the libpcap library to access the packets on the network link. It was observed that at high packet rates these libraries cause the operating system to spend most of its resources on capturing packets, while leaving fewer resources for the monitoring application, thus causing an overload of the system which eventually leads to dropped packets. DiCAP, developed in this thesis, allows several PCs running the same monitoring application to share the monitoring workload by splitting the observed packets between themselves. Finally, the third problem is how to map the observed IP traffic to individual users. The traditional way to address this problem is to assume that an IP address is used by a single user at a time and to maintain an IP-address-to-user mapping at all times. A problem arises when the end systems are multi-user capable and several users run network applications at the same time (e.g., background BitTorrent applications). In this case an IP-to-user mapping is not possible anymore, as two consecutive packets from the same IP address may be produced by applications of two different users; the solution developed in this thesis is LINUBIA.
3 DARTA Approach
The distributed traffic monitoring and analysis model proposed is not intended to address a single particular problem, but to cover a larger area of problems in high-speed network monitoring. Figure 1 shows the proposed architecture for distributed IP traffic monitoring and analysis. It shows a layered architecture including a metering layer, a monitoring and analysis layer, and a presentation layer. The distributed metering layer includes one or more metering systems which are responsible for extracting the relevant data from the observed traffic. In order to also cover the user-based IP accounting problem, this layer includes a model for general packet capture and processing, and another model for user-based IP traffic accounting. The second layer shown in Figure 1 represents the traffic analysis task, which is also distributed. IP traffic data is exchanged between the first and the second layer as IPFIX records, and the second layer uses internal mechanisms to forward these records to analysis application instances that may use them. Finally, the third layer represents a presentation system which includes interfaces that allow a human administrator to visualize the results of the analysis process. As the presentation system is dependent on the analysis application, it was not covered by the thesis, but an API is described which allows a presentation system to be integrated into the distributed analysis system. Figure 2 shows a general model for distributed traffic analysis. In a network there are multiple network components (such as routers, switches, links, services, etc.) that need to be monitored. The operation of these components is observed and measured by one or more meters. One meter can measure more than a single component; for example, it could measure traffic aggregated from several routers. At the same time, a network component can be metered by multiple meters, for example one meter doing packet-level measurements and a second doing flow-level measurements. The metered data, once it is produced, needs to be sent (or exported) to one or more data collectors.
Fig. 1. Distributed IP Traffic Metering and Analysis Architecture (layers: distributed metering, comprising packet capture and processing and user-based IP accounting; distributed monitoring and analysis; presentation system)
These data collectors may perform limited pre-processing tasks, such as aggregation, anonymization, filtering, or encapsulation, which prepare the data to be used by traffic analysis applications, before feeding the received data to a traffic processing platform. The encapsulation process is of particular importance, as its task is to switch the format of the received metered data (e.g., SNMP, NetFlow v5, Diameter, IPDR, or proprietary protocols) to IPFIX, which is used by the traffic processing platform. The traffic processing platform consists of one or more processing units. Each processing unit runs one or more traffic analysis application instances. It is the task of the traffic processing platform to feed each piece of metering data to the right analysis application instance. The results of the traffic analysis applications are fed to a presentation component which presents them to a user or administrator. The presentation component also maintains a relation with the underlying traffic processing platform which allows it to access different traffic analysis application instances.
Fig. 2. Generic Model for Distributed Traffic Analysis
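The roles of Figure 2 can be illustrated with a small non-normative sketch; the class names follow the figure, while all record fields are invented for the example:

class Meter:
    def __init__(self, component):
        self.component = component
    def export(self):
        # pretend each call observes one flow record on the component
        return {"src": self.component, "fmt": "netflow", "octets": 1500}

class Collector:
    def encapsulate(self, record):
        record = dict(record)
        record["fmt"] = "ipfix"     # switch to the platform's common format
        return record

class ProcessingUnit:
    def __init__(self):
        self.apps = {}              # traffic analysis application instances
    def register(self, name, fn):
        self.apps[name] = fn
    def feed(self, record):
        for name, fn in self.apps.items():
            fn(record)

unit = ProcessingUnit()
unit.register("volume", lambda r: print("volume app saw", r["octets"]))
collector = Collector()
for meter in (Meter("router1"), Meter("router2")):
    unit.feed(collector.encapsulate(meter.export()))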
4 Prototype and Evaluation
Three different distributed mechanisms have been developed in order to validate the distributed traffic metering and analysis model. The first mechanism, named DiCAP [2], allows distributed packet capture using a libpcap-based application on a high packet-rate link. As its evaluation shows, DiCAP significantly increases (up to 10 times) the number of packets that can be processed with four machines in parallel. The second metering mechanism, named Linubia [3], allows per-user traffic accounting on Linux end-hosts. It works for both IPv4 and IPv6 and, as the evaluation in [3] shows, it only introduces a very small overhead in processing a packet.
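The load-splitting idea behind DiCAP can be conveyed with a toy sketch (this is not DiCAP's actual algorithm): each of k capture nodes keeps only every k-th packet observed on the link, so together the nodes cover the full packet stream while each carries only a fraction of the capturing cost:

def my_share(packets, node_index, num_nodes):
    # round-robin assignment by packet position on the link
    return [p for i, p in enumerate(packets) if i % num_nodes == node_index]

packets = ["pkt%d" % i for i in range(10)]
for node in range(4):
    print(node, my_share(packets, node, 4))
# Together the four nodes cover every packet exactly once.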
174
C. Morariu and B. Stiller
The third mechanism developed in this thesis is SCRIPT [4], a framework which can be used to build and deploy distributed traffic analysis applications. SCRIPT is generic enough to support any type of traffic analysis application, as long as it uses IPFIX to transport traffic data. Several prototypes for different traffic analysis applications have been implemented, and the evaluation shows that SCRIPT fairly distributes workload among different traffic analysis nodes, and that increases in traffic can be addressed by adding new traffic analysis nodes.
5 Concluding Remarks
DARTA solves major challenges of IP traffic metering and analysis in high-speed networks; therefore, the newly designed set of distributed mechanisms for handling traffic data can be used by future network management infrastructures. A generic distributed traffic analysis framework (SCRIPT) has been designed and prototypically implemented. Besides, two different distributed metering mechanisms have been developed: DiCAP, for capturing traffic at high packet rates, and Linubia, for mapping IP traffic to individual users on Linux end-hosts. As the evaluation of these mechanisms shows, they increase the amount of traffic that can be handled by analysis applications by combining computational and storage resources from multiple devices. As the evaluation of Linubia shows, retrieving granular metering information (such as the user or process which generated a packet) from Linux end-devices is feasible, as it only introduces limited overhead.
Acknowledgements. This work was supported in part by the Cisco University Research Program Fund, the SNF DaSAHIT project, and the IST NoE EMANICS.
References
[1] Cisco Systems: Hyperconnectivity and the Approaching Zettabyte Era (June 2009)
[2] Morariu, C., Stiller, B.: DiCAP: Distributed Packet Capturing Architecture for High-Speed Network Links. In: 33rd Annual IEEE Conference on Local Computer Networks (LCN), Montreal, Canada (October 2008)
[3] Morariu, C., Feier, M., Stiller, B.: LINUBIA: A Linux-supported User-Based IP Accounting. In: Clemm, A., Granville, L.Z., Stadler, R. (eds.) DSOM 2007. LNCS, vol. 4785, pp. 229–241. Springer, Heidelberg (2007)
[4] Morariu, C., Racz, P., Stiller, B.: SCRIPT: A Framework for Scalable Real-time IP Flow Record Analysis. In: 12th IEEE/IFIP Network Operations and Management Symposium (NOMS 2010). IEEE, Osaka (April 2010)
[5] Minnesota Internet Traffic Studies (MINTS), http://www.dtc.umn.edu/mints/home.php (Last accessed: February 2010)
[6] Odlyzko, A.M.: Internet Traffic Growth: Sources and Implications. In: Proceedings of SPIE, vol. 5247, pp. 1–15 (August 2003)
[7] Patterson, D.A., Hennessy, J.L.: Computer Organization and Design, 4th edn. Morgan Kaufmann, San Francisco (2008)
[8] Roberts, L.G.: Beyond Moore's Law: Internet Growth Trends. IEEE Computer Magazine (January 2000)
Scalable Service Performance Monitoring Idilio Drago and Aiko Pras University of Twente, The Netherlands {i.drago,pras}@utwente.nl
Abstract. Dependable performance measurement is a common requirement for all on-line services. The ongoing tendency to outsource not only infrastructure, but also software parts to several suppliers, creates a new challenge to providers. Traditional solutions cannot easily monitor service performance as perceived by end users, since providers do not have control over all hardware/software involved. Our research focuses on evaluating how data passively collected from network devices can be used to verify end-to-end service performance. Our goal is to compare this approach with current methods used to monitor distributed services. Given the constant mutation of applications on the Internet, as well as the fast increase of the number of users, we are looking for flexible and scalable solutions. Keywords: Measurement, service performance, traffic analysis, IPFIX.
1 Introduction
Service providers always depend on some measurement system to monitor the performance of their services. The common goal of all providers is to detect and to solve problems before their users are affected. From the perspective of end-users, the performance of an application is the combination of the performance of all software and infrastructure involved, including servers, network, and client machines [1]. Providers usually monitor the health of their services both by collecting as much information as possible from existing activity on their infrastructure (passive measurement) and by regularly probing their services to inspect a set of performance metrics (active measurement) [2]. Protocols like SNMP [3], RMON-2 [4], Syslog [5], and NetFlow [6] are some of the most employed solutions in those situations. The ongoing tendency to outsource not only infrastructure, but also software parts to several suppliers, creates a new challenge for providers. Although services like Amazon Elastic Compute Cloud (Amazon EC2) [7] (which offers a virtual computing environment) and Google Apps [8] (which offers on-line messaging and collaboration applications) create new possibilities for providers, they also make it more difficult to measure and to control the performance of services, especially when only parts are deployed in such environments. In those situations, traditional monitoring solutions are no longer easy to implement, for several reasons, for example:
1. Providers have neither control over all hardware/software delivering their service nor access to client machines: Providers will not have the same monitoring tools that they have in their own data centre. Although a contract may stipulate quality limits, measurements are normally done by the supplier. Likewise, even in strict environments, like an enterprise network, client devices may be uncontrollable or not flexible enough to receive parts of the monitoring solution.
2. Server environments are decentralised, applications have multiple instances running on clusters, and capacity can be adjusted via virtualisation: If a service is running in a virtual environment with dynamic capacity, agents collecting parameters in the virtual server may be useless. In the same way, a few active probes will not give a correct overview of the performance if the service has several replicas running on clusters.
3. Active approaches become excessively intrusive or application-specific in those scenarios: The main weaknesses of active approaches are amplified in highly distributed environments. For example, the amount of probes required to check performance metrics of services running in those environments can easily become prohibitive. Besides, some other solutions, like embedding performance metrics into application data, are application-specific.
4. Scalability of the measurement solution is even more critical: Since services are distributed across several suppliers, measurement data must travel through the network. As the number of users increases, the impact of the measurement system on the normal operation becomes more critical.
Our research focuses on evaluating solutions for performance monitoring of services in those highly distributed environments. Given the weaknesses of traditional approaches in such environments, our research is based on the following three assumptions:
1. Measurements must come from few sources, and a single collector device is desirable: We are assuming that there are a few concentration points from where providers can watch the activity of their users, for example, edge routers in an enterprise/campus network. In that case, we avoid the complexity of having several instances of the monitoring solution distributed throughout the network, as well as the uncertainty of measurements in virtual environments.
2. The measurement structure must be robust, flexible, and scalable: We are assuming that communication patterns on the network, represented by network flows, can provide enough information to extract end-to-end performance metrics. Network flows are already widely used to monitor high-speed networks because of scalability concerns. In addition, we are assuming that the definition of a flow is adjustable, giving us flexibility to monitor heterogeneous applications.
3. Standard solutions are preferable: Instead of proposing a completely new architecture, we are interested in evaluating how standard solutions can be applied in a special situation. Since we are dealing with network flows, IPFIX [9] is the natural choice as a starting point. It is important to note
that quality of service is a target application of IPFIX. However, end-to-end service monitoring, as defined for example in the RMON2 APP-MIB [1], is not addressed in the IPFIX standard [10].
2 Hypothesis and Research Questions
Our hypothesis is that most of the metrics used today to indicate end-to-end performance of services could be calculated or satisfactorily estimated from network flows. In order to verify that hypothesis, we formulated our main research question as follows:
– How to monitor service performance based on network flow information?
Given the limitations of current approaches described in the previous section, we split our main research question into the following four embedded questions:
– How to identify services by their network flow patterns? The traffic generated by Internet applications is constantly mutating. Invariant properties that summarise all communication in a network are desirable, but difficult to define [11]. In order to extract performance metrics from network flow data, we have to connect user actions with a set of flows [10]. Additionally, our solution should not be limited to a few applications or specific versions. As an example, in [12] network flows are used to monitor applications, but this solution is only valid for one type of application (the SIP protocol). We are interested in methods that automatically identify Internet traffic, as for example some of the techniques presented in [2].
– How to extract service performance metrics from flow data? Once we have identified the flows generated by a user action, we have to extract performance metrics from them. In order to answer this question, we have to define the performance metrics of interest. The RMON2 APP-MIB [1] will be our starting point for that. The ongoing work presented in [13], which shows methods to extract basic performance metrics from flow data, will also be considered in our research. A toy example of this step is sketched after this list.
– How would the proposed solution perform in real environments? Several aspects of real networks can unfavourably affect our solution. For example, background traffic, data loss, tunnelling, and data encryption are major hurdles for our proposal. In order to overcome those hurdles, we intend to use data from real networks to develop and to evaluate our solution.
– How would the proposed solution perform when compared to other approaches? In Section 1, we identified some of the main weaknesses of current solutions in the studied scenario. We want to compare our solution to current approaches and verify whether those weaknesses are satisfactorily overcome by our proposal.
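As a toy illustration of the metric-extraction step, the sketch below derives two crude service-level indicators (mean flow duration and throughput) from flow records; the record fields are illustrative only and do not correspond to actual IPFIX information elements:

from collections import defaultdict

flows = [
    {"server": "mail:443", "start": 0.0, "end": 0.8, "octets": 90000},
    {"server": "mail:443", "start": 5.0, "end": 5.4, "octets": 40000},
    {"server": "docs:443", "start": 1.0, "end": 9.0, "octets": 80000},
]

def per_service_stats(flows):
    stats = defaultdict(lambda: {"n": 0, "dur": 0.0, "octets": 0})
    for f in flows:
        s = stats[f["server"]]
        s["n"] += 1
        s["dur"] += f["end"] - f["start"]
        s["octets"] += f["octets"]
    # mean flow duration and throughput as crude service-level indicators
    return {k: {"mean_duration": v["dur"] / v["n"],
                "throughput": v["octets"] / v["dur"]}
            for k, v in stats.items()}

print(per_service_stats(flows))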
3 Summary
We are researching the use of network flows for scalable service performance monitoring in highly distributed environments. Our goal is to extract end-to-end performance metrics from network flow data, and to compare this solution with current approaches. We are planning to check our results through experimental validation both in controlled environments, like small laboratory experiments, and in large-scale environments, like our campus network. Acknowledgments. This work has been carried out in the context of the IOP GenCom project Service Optimisation and Quality (SeQual), which is supported by the Dutch Ministry of Economic Affairs via its agency SenterNovem.
References
1. Waldbusser, S.: Application Performance Measurement MIB (2004), http://www.ietf.org/rfc/rfc3729.txt
2. Callado, A., Kamienski, C., Szabó, G., Gero, B.P., Kelner, J., Fernandes, S., Sadok, D.: A Survey on Internet Traffic Identification. IEEE Communications Surveys & Tutorials 11(3), 37–52 (2009)
3. Case, J., Fedor, M., Schoffstall, M., Davin, J.: A Simple Network Management Protocol, SNMP (1990), http://www.ietf.org/rfc/rfc1157.txt
4. Waldbusser, S.: Remote Network Monitoring Management Information Base Version 2 (1997), http://www.ietf.org/rfc/rfc2021.txt
5. Gerhards, R.: The Syslog Protocol (2009), http://www.ietf.org/rfc/rfc5424.txt
6. Claise, B.: Cisco Systems NetFlow Services Export Version 9 (2004)
7. Amazon.com Inc.: Amazon Elastic Compute Cloud, Amazon EC2 (2010), http://aws.amazon.com/ec2/
8. Google Inc.: Google Apps (2010), http://www.google.com/apps/
9. Quittek, J., Zseby, T., Claise, B., Zander, S.: Requirements for IP Flow Information Export, IPFIX (2004), http://www.ietf.org/rfc/rfc3917.txt
10. Zseby, T., Boschi, E., Brownlee, N., Claise, B.: IP Flow Information Export (IPFIX) Applicability (2009), http://www.ietf.org/rfc/rfc5472.txt
11. Williamson, C.: Internet Traffic Measurement. IEEE Internet Computing 5, 70–74 (2001)
12. Anderson, S., Niccolini, S., Hogrefe, D.: SIPFIX: A Scheme For Distributed SIP Monitoring. In: IM 2009: Proceedings of the 11th IFIP/IEEE International Symposium on Integrated Network Management, Piscataway, NJ, USA, pp. 382–389. IEEE Press, Los Alamitos (2009)
13. Kögel, J.: Including the Network View in Application Response Time Diagnostics using NetFlow. In: 2009 USENIX Annual Technical Conference (2009)
Author Index
Abdelnur, Humberto 65
Abreu, Fernando Brito e 85
Ali, Azman 118
Arozarena, Pablo 102
Badonnel, Remi 89
Barbosa, Rafael Ramos Regis 163
Charalambides, Marinos 93
Dabbebi, Oussema 89
Dam, Mads F. 159
Desmet, Stein 50
De Turck, Filip 50
Dousson, Christophe 110
Doyen, Guillaume 2, 26
Drago, Idilio 175
Engel, Thomas 65, 135
Fagernes, Siri 38
Feridun, Metin 1
Festor, Olivier 89
Fiorese, Adriano 14
Gaïti, Dominique 2, 26
Hämmäinen, Heikki 77
Hecht, Fabio Victora 81
Hickey, Marianne 106, 114
Holopainen, Visa 123
Hutchison, David 118
Jónsson, Kristján Valur 159
Kanev, Kaloyan 147
Kantola, Raimo 123
Kheir, Nizar 118
Kielthy, Jesse 102
Krief, Francine 110
Lamminen, Olli-Pekka 123
Lu, Jingxian 110
Makhloufi, Rafik 26
Mauthe, Andreas 118
Melnikov, Nikolay 147, 167
Morariu, Cristian 171
Ourdane, Mohamed 135
Pavlou, George 93
Pras, Aiko 163, 175
Quinn, Kevin 102
Radier, Benoit 110
Rahmouni, Maher 106, 114
Schaeffer-Filho, Alberto 118
Schöller, Marcus 118
Schönwälder, Jürgen 98, 147, 167
Sehgal, Anuj 98
Simões, Paulo 14
Smith, Paul 118
State, Radu 65, 135
Stiller, Burkhard 81, 171
Taira, Taneli 123
Toribio, Raquel 102
Tuncer, Daphné 93
Ullah, Ihsan 2
Volckaert, Bruno 50
Wang, Shaonan 135
Warma, Henna 77
Zach, Martin 102