Legend for Fig. 4.5: marked places; the transition t is fired; no transition can be fired.
Fig. 4.5. A Reachability Tree
ii. Pre- and Post-Conditions Calculus: this algorithm [3] is called with the root of the reachability tree (RT) and with the parameters of the current execution context. Property: for every plan Π modeled as an RPN, if the FFCC algorithm applied to the reachability tree (RT) of Π returns true (i.e. Π is consistent) and the environment is stable, then no failure situation can occur. Discussion: in order to avoid a combinatorial explosion, there exist algorithms that construct a reduced RT. In our context, the RT can be optimized as follows: let n_i and n_j be two nodes of RT. The RT can be optimized through analysis of the method calls as follows:
- Independent Nodes: there is no interference between the Pre- and Post-Conditions of n_i on the one hand and the Pre- and Post-Conditions of n_j on the other hand, i.e. the associated transitions can be fired simultaneously whatever their ordering (the global execution is unaffected). Here, an arbitrary ordering is decided (e.g. Self.Put and Self.Go_To can be exchanged), which allows many sub-trees to be cut (the right sub-tree in Fig. 4.5).
- Semi-Independent Nodes: there is no interference between the Post-Conditions of n_i and n_j, i.e. the associated transitions do not affect the same attributes. If exchanged sequences ({n_i, n_j} or {n_j, n_i}) of firing transitions that lead to the same marking can be detected, then the sub-tree starting from this marking can be cut. The obtained graph is then acyclic and merges the redundant sub-trees.
Handling Negative Interactions: now another agent (e.g. Cl2) sends its plan to Cv, who processes Π2 (Cl2's plan). GCOA as a Generalization of the COA Algorithm [3]: the Conveyor starts a new coordination. The first and second phases are the same as in the case of positive interactions. He chooses the plan Π2. Negative interactions arise when the two refinements (Π1 and Π2) have shared attributes (e.g. the boat volume constraint). Now, Cv has to solve internal negative interactions before proposing a merged plan to Cl2. Again, the GCOA is divided into two steps.
i. Internal Structural Merging by Sequencing: Cv connects Π1 and Π2 by creating a place p_{e,i} for each pair of transitions (t_e, t_i) in End(Π1) × Init(Π2) and two arcs, in order to generate a merged plan Π_m:

Function Sequencing(in Π1, Π2: Plan): Plan;
{this function merges Π1 and Π2, produces a merged plan Π_m and the synchronization places}
begin
  Let TE = {t_e ∈ Π1 : t_e is an end transition} and TI = {t_i ∈ Π2 : t_i is an initial transition} (i.e. t_i has no predecessor in Π2)
  for all (t_e, t_i) ∈ TE × TI do
    Create a place p_{e,i}
    Create an input arc IA_{e,i} from t_e to p_{e,i}
    Create an output arc OA_{e,i} from p_{e,i} to t_i
    (i.e. Post(p_{e,i}, t_e) = 1 and Pre(p_{e,i}, t_i) = 1)
  endfor
  Π_m := Merged_Plan(Π1, Π2, {p_{e,i}}, {IA_{e,i}}, {OA_{e,i}})
  return (Π_m, {p_{e,i}})
end {Sequencing}

ii. Parallelization by Moving Up Arcs: Cv applies the FFCC algorithm to the merged net Π_m obtained by sequencing. If the calculus returns true, then the planner proceeds to the parallelization phase by moving up the arcs recursively in order to introduce a maximum of parallelism in the plan. This algorithm [3] tries to move (or eliminate) the synchronization places. The predecessor transition of each synchronization place will be replaced by its own predecessor transition in two cases: the transition which precedes the predecessor transition is not fired or is not in firing. If both the Pre- and Post-Conditions remain valid, then a new arc replaces the old one. The result of this parallelization is to satisfy both Cl1 and Cl2 by executing the merged net Π_m.
Fig. 4.6. Sequencing and Parallelization
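The sequencing step described above can be sketched in code. The following is only an illustration, not the authors' implementation: plans are reduced to dictionaries of places, transitions and arcs, and every end transition of the first plan is linked to every initial transition of the second plan through a fresh synchronization place (the data layout and all names are assumptions made for the example).

```python
# Minimal sketch of the sequencing step on a simplified plan representation.
def sequencing(plan1, plan2):
    """Merge plan1 and plan2 by creating one synchronization place for every
    pair (end transition of plan1, initial transition of plan2)."""
    merged = {
        "places": list(plan1["places"]) + list(plan2["places"]),
        "transitions": list(plan1["transitions"]) + list(plan2["transitions"]),
        "arcs": list(plan1["arcs"]) + list(plan2["arcs"]),  # arcs as (source, target) pairs
    }
    sync_places = []
    for te in plan1["end_transitions"]:          # transitions of plan1 with no successor
        for ti in plan2["initial_transitions"]:  # transitions of plan2 with no predecessor
            p = f"p_{te}_{ti}"                   # fresh synchronization place
            sync_places.append(p)
            merged["places"].append(p)
            merged["arcs"].append((te, p))       # input arc: te -> p
            merged["arcs"].append((p, ti))       # output arc: p -> ti
    return merged, sync_places
```

The parallelization phase would then try to remove these synchronization places again, re-checking consistency with FFCC after each move, as stated in the Remark below.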
Remark: after each arc is moved up, the FFCC algorithm is applied to the new net.
The exchanged plans are the old ones augmented by synchronization places upstream and downstream. This algorithm can be optimized at the consistency control level. In fact, the coherence checking can be applied in an incremental way to the previously checked plans.
4.6 Hybrid Automata Formalism for Multi-Agent Planning The second model of multi-agent planning we developed is based on Hybrid Automata [7], which represent an alternative formalism to deal with multi-agent planning when temporal constraints play an important role. In this modelling, the agents' behaviour (through individual plans and multi-agent plans) is state-driven. The interest of those automata is that they can model different clocks evolving with different speeds. These clocks may be the resources of each agent and the time. A Hybrid Automaton is composed of two elements, a finite automaton and a set of clocks:
• A finite automaton A, which is a tuple A = <Q, E, Tr, q0, l>, where Q is the finite set of states, E the set of labels, Tr the set of edges, q0 the initial location, and l a mapping associating with each state of Q the elementary properties verified in this state.
• A set of clocks H, used to specify quantitative constraints associated with the edges. In hybrid automata, the clocks may evolve with different speeds.
Tr is a set of edges t such that, for t ∈ Tr, t = <s, ({g}, e, {r}), s'>, where:
• s and s' are elements of Q; they respectively model the source and the target of the edge t = <s, ({g}, e, {r}), s'>, such that:
- {g} is the set of guards; it represents the conditions on the clocks;
- e is the transition label, an element of E;
- {r} is the set of actions on the clocks.
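As a concrete reading of the tuple <Q, E, Tr, q0, l> with guarded, resetting edges, the sketch below encodes a hybrid automaton as plain data; the guard and reset callables stand in for the constraint sets {g} and {r}, and all class, field and function names are assumptions made for this illustration, not part of the formalism itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Clocks = Dict[str, float]  # clock name -> current value

@dataclass
class Edge:
    source: str                        # s  : source state
    guard: Callable[[Clocks], bool]    # {g}: condition on the clocks
    label: str                         # e  : element of the alphabet E
    reset: Callable[[Clocks], Clocks]  # {r}: actions on the clocks
    target: str                        # s' : target state

@dataclass
class HybridAutomaton:
    states: List[str]                  # Q
    labels: List[str]                  # E
    edges: List[Edge]                  # Tr
    initial: str                       # q0
    properties: Dict[str, List[str]]   # l: state -> elementary properties

    def step(self, state: str, clocks: Clocks, label: str):
        """Fire one discrete transition with the given label, if some guard allows it."""
        for e in self.edges:
            if e.source == state and e.label == label and e.guard(clocks):
                return e.target, e.reset(dict(clocks))
        return None
```

Continuous evolution of the clocks (possibly at different rates) is left out here; only the discrete part of the edge relation Tr is modelled.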
Multi-agent plans can be modeled by a network of synchronized hybrid automata (a more detailed presentation can be found in [4]). They are of particular interest since they take into account the agents' features and the time as parameters of the simulation (those variables may be modeled by different clocks evolving with different speeds inside the automata). All the parameters of the planning problem may be represented in the hybrid automata: the tasks to be accomplished are represented by the reachable states of the automata; the relations between tasks by the edges; the pre-, post- and interruption conditions by the guards of the edges; and finally the different variables by the clocks of the automata. Let us define the synchronized product. Considering n hybrid automata A_i = <Q_i, E_i, Tr_i, q_{0,i}, l_i, H_i>, for i = 1, ..., n:
• Q = Q_1 × ... × Q_n;
• T = {((q_1, ..., q_n), (e_1, ..., e_n), (q'_1, ..., q'_n)) | for each i, either e_i = '-' and q'_i = q_i, or e_i ≠ '-' and (q_i, e_i, q'_i) ∈ Tr_i};
• q_0 = (q_{0,1}, q_{0,2}, ..., q_{0,n});
• H = H_1 × ... × H_n.
So, in this product, each automaton may do a local transition, or do nothing (empty action modeled by '-') during a transition. It is not necessary to synchronize all the transitions of all the automata. The synchronization is given by a set of synchronization labels that mark the transitions to be synchronized. Consequently, an execution of the synchronized product is an execution of the Cartesian product restricted to the labels of the transitions. In our case, we only synchronize the edges concerning the temporal connectors S_start and S_end. Indeed, the synchronization of individual agents' plans is done with respect to functional constraints and classical synchronization techniques of the automata formalism, like "send/receive" messages. The Hybrid Automata formalism and the associated coordination mechanisms are detailed in [5].
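A one-step reading of the synchronized product defined above can be sketched as follows. This is only an illustration of the discrete part of the definition (clock constraints and the restriction to the synchronization labels are ignored), and the dictionary-based encoding of each component is an assumption made for the example.

```python
from itertools import product

def product_successors(components, state_vector):
    """Global successors in the synchronized (Cartesian) product.
    Each component is a dict mapping a local state to its outgoing
    (label, target) pairs; '-' is the empty (idle) action of a component."""
    local_moves = []
    for edges, q in zip(components, state_vector):
        moves = [("-", q)] + list(edges.get(q, []))  # stay idle, or fire a local edge
        local_moves.append(moves)
    successors = []
    for combo in product(*local_moves):
        labels = tuple(lbl for lbl, _ in combo)
        targets = tuple(tgt for _, tgt in combo)
        if any(lbl != "-" for lbl in labels):        # discard the all-idle combination
            successors.append((labels, targets))
    return successors
```

In the full formalism, only the combinations whose labels agree with the synchronization set (here, the edges for S_start and S_end) would be kept.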
4.7 Conclusion The two models presented in this paper are suitable for multi-agent planning. Recursive Petri nets allow the modelling of plans (both at the agent and multi-agent levels) and their management when abstraction and dynamic refinement are required. RPN easily allows the synchronization of individual agents' plans. They are, in particular, interesting for multi-agent validation thanks to the reachability tree construction, if combined with reduction techniques (in order to avoid the combinatorial explosion of the number of states). The main shortcoming of this model is the absence of explicit handling of temporal constraints. This is why we developed a model based on Hybrid Automata, which can model different clocks evolving with different speeds. These clocks may be the resources of each agent and the time.
References
1. R. Alur and D. Dill. A Theory of Timed Automata. Theoretical Computer Science, Vol. 126, n. 3, pages 183-225 (1994).
2. A. Barrett and D.S. Weld. Characterizing Subgoal Interactions for Planning. In Proceedings of IJCAI-93, pp. 1388-1393 (1993).
3. A. El Fallah Seghrouchni and S. Haddad. A Recursive Model for Distributed Planning. In the proceedings of ICMAS'96. IEEE publisher. Japan (1996).
4. A. El Fallah-Seghrouchni, I. Degirmenciyan-Cartault and F. Marc. Framework for Multi-Agent Planning based on Hybrid Automata. In the proceedings of CEEMAS 03 (Central and Eastern European Conference on Multi-Agent Systems). LNAI 2691. Springer Verlag. Prague (2003).
5. A. El Fallah-Seghrouchni, F. Marc and I. Degirmenciyan-Cartault. Modelling, Control and Validation of Multi-Agent Plans in Highly Dynamic Context. To appear in the proceedings of AAMAS 04. ACM Publisher. New York (2004).
6. M.P. Georgeff. Planning. In Readings in Planning. Morgan Kaufmann Publishers, Inc. San Mateo, California (1990).
7. T.A. Henzinger. The theory of Hybrid Automata. In the proceedings of the 11th IEEE Symposium on Logic in Computer Science, pages 278-292 (1996).
8. T.A. Henzinger, Pei-Hsin Ho and H. Wong-Toi. HyTech: a model checker for hybrid systems. Journal of Software Tools for Technology Transfer, Vol. 1, n. 1/2, pages 110-122 (2001).
9. K. Jensen. High-level Petri Nets, Theory and Application. Springer-Verlag (1991).
10. V. Martial. Coordination of Plans in a Multi-Agent World by Taking Advantage of the Favor Relation. In Proceedings of the Tenth International Workshop on Distributed Artificial Intelligence (1990).
Creating Common Beliefs in Rescue Situations
Barbara Dunin-Kęplicz and Rineke Verbrugge
Institute of Informatics, Warsaw University, Banacha 2, 02-097 Warsaw, Poland [email protected]
Institute of Computer Science, Polish Academy of Sciences, Ordona 21, 01-237 Warsaw, Poland
Institute of Artificial Intelligence, University of Groningen, Grote Kruisstraat 2/1, 9712 TS Groningen, The Netherlands [email protected]
Summary. In some rescue or emergency situations, agents may act individually or on the basis of minimal coordination, while in others, full-fledged teamwork provides the only means for the rescue action to succeed. In such dynamic and often unpredictable situations agents' awareness about their involvement becomes crucial, but one can expect that only beliefs can be obtained by means of communication and reasoning. A suitable level of communication should be naturally tuned to the circumstances. Thus in some situations individual belief may suffice, while in others everybody in a group should believe a fact, or even the strongest notion of common belief is the relevant one. Even though common knowledge cannot in general be established by communication, in this paper we present a procedure for establishing common beliefs in rescue situations by minimal communication. Because the low-level part of the procedure involves file transmission (e.g. by TCP or the alternating-bit protocol), next to a general assumption on trust some additional assumptions on communication channels are needed. If in the considered situation communication is hampered to such an extent that establishing a common belief is not possible, creating a special kind of mutual intention (defined by us in other papers) within a rescue team may be of help.
5.1 Introduction Looking at emergency situations in their complexity, a rather powerful knowledge-based system is needed to cope with them in a dynamic and often unpredictable environment. In emergencies, coordination and cooperation are on the one hand vital, and on the other hand more difficult to achieve than in normal circumstances. To make the situation even more complex, time is critical for rescues to succeed, and communication is often hampered. Also, usually expertise from different fields is needed. Multiagent systems exactly fit the bill: they deliver means for organizing complex, sometimes spectacular interactions among different, physically and/or logically distributed knowledge based entities [1]:
A MAS can be defined as a loosely coupled network of problem solvers that work together to solve problems that are beyond the individual capabilities or knowledge of each problem solver. This paper is concerned with a specific kind of MAS, namely a team. A team is a group in which the agents are restricted to having a common goal of some sort in which team-members typically cooperate and assist each other in achieving their common goal. Rescuing people from a crisis or emergency situation is a complex example of such a common goal. Emergency situations may be classified along different lines. It is not our purpose to provide a detailed classification here, but an important dimension of classification is along the need for teamwork. A central joint mental attitude addressed in teamwork is collective intention. We agree with [2] that: Joint intention by a team does not consist merely of simultaneous and coordinated individual actions; to act together, a team must be aware of and care about the status of the group effort as a whole. In some rescue situations, agents may act individually or on the basis of minimal coordination, while in others, full-fledged teamwork, based on a collective intention, provides the only means for the rescue action to succeed. MAS can be organized using different paradigms or metaphors. For teamwork, BDI (Beliefs, Desires, Intentions) systems form a proper paradigm. Thus, some multiagent systems may be viewed as intentional systems implementing practical reasoning — the everyday process of deciding, step by step, which action to perform next. This model of agency originates from Michael Bratman's theory of human rational choice and action [3]. His theory is based on a complex interplay of informational and motivational aspects, constituting together a belief-desire-intention model of rational agency. Intuitively, an agent's beliefs correspond to information the agent has about the environment, including other agents. An agent's desires or goals represent states of affairs (options) that the agent would choose. Finally, an agent's intentions represent a special subset of its desires, namely the options that it has indeed chosen to achieve. The decision process of a BDI agent leads to the construction of agent's commitment, leading directly to action execution. The BDI model of agency comprises beliefs referring to agent's informational attitudes, intentions and then commitments referring to its motivational attitudes. The theory of informational attitudes has been formalized in terms of epistemic logic as in [4, 5]. As regards motivational attitudes, the situation is much more complex. In Cooperative Problem Solving (henceforth CPS), a group as a whole needs to act in a coherent pre-planned way, presenting a unified collective motivational attitude. This attitude, while staying in accordance with individual attitudes of group members, should have a higher priority than individual ones. Thus, from the perspective of CPS these attitudes are considered on three levels: individual, social (bilateral), and collective.
When analyzing rescue situations from the viewpoint of BDI systems, one of the first purposes is to define the scope and strength of the motivational and informational attitudes needed for successful team action. These determine the strength and scope of the necessary communication. In [6], [7], we give a generic method for the system developer to tune the type of collective commitment to the application in question, the organizational structure of the group or institution, and to the environment, especially to its communicative possibilities. In this paper, however, the essential question is in what terms to define the communication necessary for teamwork in rescue situations. Knowledge, which always corresponds to the facts and can be justified by a formal proof or less rigorous argumentation, is the strongest and therefore preferred informational attitude. The strongest notion of knowledge in a group is common knowledge, which is the basis of all conventions and the preferred basis of coordination. Halpern and Moses proved that common knowledge of certain facts is on the one hand necessary for coordination in well-known standard examples, while on the other hand it cannot be established by communication if there is any uncertainty about the communication channel [4]. In practice in MAS, agents make do with belief instead of knowledge for at least the following reasons. First, in MAS perception provides the main background for beliefs. In a dynamic, unpredictable environment the natural limits of perception may give rise to false beliefs or to beliefs that, while true, still cannot be fully justified by the agent. Second, communication channels may be of uncertain quality, so that even if a trustworthy sender knows a certain fact, the receiver may only believe it. Common belief is the notion of group belief which is constructed in a similar way as common knowledge. Thus, even though it puts fewer constraints on the communication environment than common knowledge does, it is still logically highly complex. For efficiency reasons it is often important to minimize the level of communication among agents. This level should be tuned to the circumstances under consideration. Thus in some situations individual belief may suffice, while in others everybody in a group should believe a fact, and in yet others the strongest notion of common belief is needed. In this paper we aim to present a method for establishing common beliefs in rescue situations by minimal communication. If in the considered situation communication is hampered to such an extent that establishing a common belief is not possible, we attempt some alternative solutions. The paper is structured in the following manner. In Section 5.2, a short reminder is given about individual and group notions of knowledge and belief, and the difficulty of achieving common belief in certain circumstances. Then, a procedure for creating common beliefs is introduced in Section 5.3, which also discusses the assumptions on the environment and the agents that are needed for the procedure to be effective. Section 5.4 presents three case studies of rescue situations where various collective attitudes enabling appropriate teamwork are established, tuned to the communicative possibilities of the environment. Finally, Section 5.5 discusses related work and provides some ideas about future research.
5.2 Knowledge and belief in groups
In multiagent systems, agents' awareness of the situation they are involved in is a necessary ingredient. Awareness in MAS is understood as a reduction of the general meaning of this notion to the state of an agent's beliefs (or knowledge when possible) about itself, about other agents, as well as about the state of the environment, including the situation they are involved in. Assuming such a scope of this notion, different epistemic logics can be used when modelling agents' awareness. This awareness may be expressed in terms of any informational (individual or collective) attitude fitting the given circumstances. In rescue situations, when the situation is usually rather complex and hard to predict, one can expect that only beliefs can be obtained.
5.2.1 Individual and common beliefs
To represent beliefs, we adopt a standard KD45_n system for n agents as explained in [4], where we take BEL(i, φ) to have as intended meaning "agent i believes proposition φ". A stronger notion than the one for belief is knowledge, often called "justified true belief". The usual axiom system for individual knowledge within a group is S5_n, i.e. a version of KD45_n where the consistency axiom is replaced by the (stronger) truth axiom KNOW(i, φ) → φ. We do not define knowledge in terms of belief, even though definitions of this kind occur in the MAS literature. General belief in a group G is defined by E-BEL_G(φ) ↔ ⋀_{i∈G} BEL(i, φ); common belief satisfies the axiom C2: C-BEL_G(φ) ↔ E-BEL_G(φ ∧ C-BEL_G(φ)), together with the induction rule RC1: from φ → E-BEL_G(ψ ∧ φ) infer φ → C-BEL_G(ψ); individual beliefs obey the generalization rule R2: from φ infer BEL(i, φ).
(ii) since δ_0 is the only clause that defines new0 in P_K ∪ {δ_0}, we have that sat(s_0, φ) ∈ M(P_K) iff new0 ∈ M(P_K ∪ {δ_0}), (iii) since the transformation rules, when applied according to the transformation strategy UFV, preserve the perfect model, new0 ∈ M(P_K ∪ {δ_0}) iff new0 ∈ M(T), and finally, (iv) by the definition of perfect model, (iv.1) if new0 ← occurs in T then new0 ∈ M(T) and (iv.2) if no clause with head new0 occurs in T then new0 ∉ M(T). Let us return to our example of Figure 7.1 and let us suppose that we want to verify that K_0, s_0 |= ¬ef ¬ef a holds, that is, for every state reachable from the initial state s_0 it is possible to get to a state where a holds. Let P_0 be the program constructed at Step 1 as indicated in Section 7.3. Step 2 of our verification method consists in transforming the program P_0 ∪ {δ_0}, where δ_0 is the clause new0 ← sat(s_0, ¬ef ¬ef a), into a program where new0 ← occurs. We will present this transformation in Section 7.3.3.
7.3.1 The Transformation Rules
The process of transforming a given program P_K, thereby deriving a program T, can be formalized as a sequence P_0, ..., P_n of programs, called a transformation sequence, where: (i) P_0 = P_K, (ii) P_n = T, and (iii) for i = 0, ..., n-1, program P_{i+1} is obtained from program P_i by applying one of the transformation rules listed below. These rules are variants of the rules considered in the literature for transforming logic programs (see, in particular, [20, 21]). The atomic definition rule allows us to introduce a new predicate definition by adding to program P_i a new clause whose body consists of one atom only. We can use this rule to add clause δ_0 to program P_K at the beginning of Step 2 of our verification method.
R1. Atomic Definition. We introduce a new clause, called a definition, of the form: δ: newp(X_1, ..., X_m) ← A, where: (i) newp is a predicate symbol not occurring in P_0, ..., P_i, (ii) X_1, ..., X_m are the variables occurring in A, and (iii) the predicate of A occurs in P_0. By atomic
definition (or definition, for short), we derive the new program P_{i+1} = P_i ∪ {δ}. For i ≥ 0, Defs_i denotes the set of definitions introduced during the transformation sequence P_0, ..., P_i. In particular, Defs_0 = ∅. The unfolding rule corresponds to a symbolic computation step. It replaces a clause γ in P_i by the set of all clauses that can be derived by applying a resolution step w.r.t. a literal L occurring in the body of γ. We have a positive and a negative unfolding rule, according to the case where L is a positive or a negative literal. Notice that in the negative unfolding rule (see case R2n below) the literal L should be either valid or failed. We say that an atom A is valid in a program P iff there exists a unit clause H ← in P such that A is an instance of H. We say that an atom A is failed in P iff there exists no clause H ← G in P such that A is unifiable with H. The negated atom ¬A is valid iff A is failed, and ¬A is failed iff A is valid.
R2. Unfolding. Let γ: H ← G_1 ∧ L ∧ G_2 be a clause in P_i and let P_i' be a variant of P_i without common variables with γ. We consider the following two cases.
(R2p: Positive Unfolding) Let L be a positive literal. By unfolding γ w.r.t. L we derive the set of clauses Γ = {(H ← G_1 ∧ G ∧ G_2)θ | (i) K ← G is a clause in P_i' and (ii) L and K are unifiable with mgu θ}. We derive the new program P_{i+1} = (P_i − {γ}) ∪ Γ.
(R2n: Negative Unfolding) Let L be a negative literal. (i) If L is valid in P_i', then by unfolding γ w.r.t. L we derive the clause
η: H ← G_1 ∧ G_2
and we derive the new program P_{i+1} = (P_i − {γ}) ∪ {η}. (ii) If L is failed in P_i', then by unfolding γ w.r.t. L we derive the new program P_{i+1} = P_i − {γ}. The atomic folding rule allows us to replace an atom A which is an instance of the body of a definition by the corresponding instance of the head of the definition.
R3. Atomic Folding. Let γ: H ← G_1 ∧ L ∧ G_2 be a clause in P_i and let δ: N ← A be a clause in Defs_i without common variables with γ. We consider the following two cases.
(R3p: Positive Folding) Let L be the atom Aθ for some substitution θ. By folding γ w.r.t. A using δ we derive the clause
η: H ← G_1 ∧ Nθ ∧ G_2
and we derive the new program P_{i+1} = (P_i − {γ}) ∪ {η}.
(R3n: Negative Folding) Let L be the negated atom ¬Aθ. By folding γ w.r.t. ¬A using δ, we derive the clause
η: H ← G_1 ∧ ¬Nθ ∧ G_2
and we derive the new program P_{i+1} = (P_i − {γ}) ∪ {η}. The following clause removal rule may be used for removing from P_i a redundant clause γ, that is, a clause γ such that M(P_i) = M(P_i − {γ}). Let us first introduce the following definitions. The set of useless predicates in a program P is the maximal set U of predicate symbols occurring in P such that a predicate p is in U iff for every clause p(...) ← G in P, the body G is of the form G_1 ∧ q(...) ∧ G_2 for some predicate q in U. A clause is useless iff the predicate in its head is useless.
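To give an operational feel for the positive unfolding rule (R2p) just defined, here is a small sketch on a deliberately simplified, variable-free clause representation, so that the mgu test degenerates into an equality check; it is an illustration only, not the rule in its full generality.

```python
def unfold_positive(program, clause, pos):
    """Unfold `clause` w.r.t. the positive literal at index `pos` of its body.
    A clause is a pair (head, body) where body is a list of ground atoms,
    so 'unifiable with mgu' reduces to plain equality (a simplification)."""
    head, body = clause
    literal = body[pos]
    derived = []
    for k_head, k_body in program:
        if k_head == literal:                      # the ground analogue of unification
            new_body = body[:pos] + list(k_body) + body[pos + 1:]
            derived.append((head, new_body))
    # The new program is (program - {clause}) together with the derived clauses.
    return [c for c in program if c != clause] + derived
```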
R4. Clause Removal. Let γ be a clause in P_i. By clause removal we derive the new program P_{i+1} = P_i − {γ} if one of the following cases occurs: (R4s: Removal of Subsumed Clause) γ is a clause H
Phase B. Remove&Unfold(P_A, T)
The UFV strategy is divided into two phases: Phase A and Phase B. Phase A takes program P_K and clause δ_0 as input and returns program P_A as output. Initially program P_A is the empty set of clauses. During Phase A we use the following two sets of clauses: (1) Defs, which is the set of definitions introduced during the transformation process, and (2) Δ, which is the set of definitions that have been introduced but not yet unfolded. At the beginning of Phase A we apply the definition rule and we add clause δ_0 to Defs and Δ. Then the strategy performs a WHILE-DO loop whose body consists of applications of the unfolding rule according to the Unfold procedure (see below), followed by applications of the definition and folding rules according to the Define&Fold procedure (see below). The Unfold procedure takes a definition clause δ ∈ Δ and derives a set Γ of clauses by applying one or more times the positive unfolding rule starting from clause δ. Every clause in Γ is of the form H ← L_1 ∧ ... ∧ L_n, where n ≥ 0 and each literal L_i is either an atom of the form sat(s, ψ) or a negated atom of the form ¬sat(s, ψ). The Define&Fold procedure takes as input: (i) the set Γ of clauses and (ii) the set Defs of definitions, and returns as output: (i) a set NewDefs of new definitions and (ii) a set Φ of transformed clauses. NewDefs is the set of definitions of the form newp
the definitions of NewDefs are added to Defs and, in the set Δ, clause δ is replaced by the clauses of NewDefs. Phase B is realized by the Remove&Unfold procedure, which transforms P_A by repeatedly removing useless clauses and subsumed clauses, and applying the positive and the negative unfolding rules w.r.t. valid or failed literals. Upon termination this procedure returns a program T where new0 is either valid or failed, that is, either (i) the unit clause new0 ← occurs in T, or (ii) no clause with head new0 occurs in T. In case (i) we have that K, s_0 |= φ and in case (ii) we have that K, s_0 does not satisfy φ. For a Kripke structure K with a finite set of states the UFV strategy terminates. In particular, only a finite number of definitions will be introduced during the execution of the WHILE-DO loop, because only a finite number of distinct atoms of the form sat(s, ψ) can be generated. Indeed s is an element of a finite set of states and ψ is a (proper or not) subformula of the given formula φ. Thus, the UFV strategy is a decision procedure for checking whether or not K, s_0 |= φ holds for any given finite state Kripke structure K, initial state s_0, and temporal formula φ.
7.3.3 An Example of Application of the Verification Method
In this section we will see our transformation strategy in action for the verification of a property of the finite state system of Figure 7.1. We want to show that in that system, for every state which is reachable from the initial state s_0, it is possible to get to a state where a holds, that is, K_0, s_0 |= ag ef a. The initial program P_0, which encodes the Kripke structure K_0, is the following:
1. sat(s2, a) ←
2. sat(S, ¬F) ← ¬sat(S, F)
3. sat(S, ef F) ← sat(S, F)
4. sat(s0, ef F) ← sat(s1, ef F)
5. sat(s1, ef F) ← sat(s0, ef F)
6. sat(s1, ef F) ← sat(s2, ef F)
7. sat(s2, ef F) ← sat(s2, ef F)
Program P_0 has been obtained by unfolding the program P_K, which encodes a generic Kripke structure K, by using the clauses that define the relations elem and t relative to K_0 (see Section 7.2). We have not considered the clauses for temporal formulas of the form F1 ∧ F2 and af F because they are not needed during the application of the transformation strategy. The clause δ_0 is new0 ← sat(s_0, ¬ef ¬ef a). (Recall that ag ef φ is equivalent to ¬ef ¬ef φ.) By applying the Unfold procedure, we unfold clause δ_0 using clause 2 and we get:
8. new0 ← ¬sat(s_0, ef ¬ef a).
Then we apply the Define&Fold procedure. We introduce the definition:
9. new1 ← sat(s_0, ef ¬ef a)
and we fold clause 8 using clause 9. We get the following clause:
10. new0 ← ¬new1
11. new1
7.4 Verification of Infinite State Systems Our transformational approach for the verification of properties of protocols can be extended from finite state systems to infinite state systems. In order to specify infinite state systems we extend the method described in Section 7.2 by using constraint logic programs [15]. These programs generalize logic programs by allowing the bodies of the clauses to contain constraints. Constraints are formulas that define relations over some given domains, such as real numbers, integers, and trees. For our purposes here, constraints are simply first order formulas whose predicate symbols are taken from a distinct set. These constraints can be evaluated by using constraint solvers that are realized by ad hoc algorithms. The semantics of a constraint c is defined by the usual first order satisfaction relation D |= c, where D is a fixed interpretation. The notion of perfect model can be extended from logic programs to constraint logic programs in a straightforward way [9].
A transition relation t on an infinite set of states can be specified by using constraints in the body of the clauses that define t. For instance, the clause t(X, Y) ← X > 0 ∧ Y = X + 1 specifies a transition relation on the set of the integer numbers. We will see a more elaborate example in Section 7.4.1. In order to encode the satisfaction relation K, s |= φ for a Kripke structure K with an infinite set of states, we can construct a constraint logic program P_K similarly to what we have described in Section 7.2. However, in order to encode the satisfiability of a temporal formula of the form af φ, the method of Section 7.2 introduces a clause for each state of K, and this is impossible for infinite state systems. Fortunately, this problem can be overcome for a large class of infinite state systems by using constraints as indicated in [9] (see also Section 7.4.1 for an example). In order to extend our verification method to the case of infinite state systems, we need to extend the transformation rules and the transformation strategy presented in Section 7.3 to the case of constraint logic programs. The extensions of the definition, unfolding, folding, and clause replacement rules can be found in [8, 9]. Moreover, we will use the following two transformation rules which specifically refer to constraints: (i) rule R4f, for deleting a clause whose body contains an unsatisfiable constraint, and (ii) rule R5, for replacing a constraint by an equivalent one.
R4f. Removal of Clauses with Unsatisfiable Body. Let γ be a clause of the form H ← c ∧ G in P_i. Suppose that c is unsatisfiable, that is, D |= ¬∃(c), where ∃(c) is the existential closure of c. Then we derive the new program P_{i+1} = P_i − {γ}.
R5. Constraint Replacement. Let γ_1: H ← c_1 ∧ G be a clause in P_i. Suppose that for some constraint c_2, we have that:
D |= ∀(∃Y_1 ... ∃Y_n c_1 ↔ ∃Z_1 ... ∃Z_m c_2)
where: (i) Y_1, ..., Y_n are the variables occurring free in c_1 and not in {H, G}, (ii) Z_1, ..., Z_m are the variables occurring free in c_2 and not in {H, G}, and ∀(φ) denotes the universal closure of formula φ. Then by constraint replacement we derive the clause γ_2: H ← c_2 ∧ G and we derive the new program P_{i+1} = (P_i − {γ_1}) ∪ {γ_2}.
The transformation strategy UFV presented in Section 7.3.2 can be extended to constraint logic programs that encode infinite state systems. During the execution of this strategy, we apply the modified transformation rules for constraint logic programs. In particular, during the execution of the Define&Fold procedure, when applying the definition rule, we introduce new definitions of the form:
NewH ← c(X) ∧ sat(X, ψ)
where: (i) X is a variable ranging over states, (ii) c(X) is a constraint representing a possibly infinite set of states, and (iii) ψ is a temporal formula. The main issue that arises when dealing with infinite state systems is that the termination of the UFV strategy is no longer guaranteed. Indeed, for any given temporal formula ψ, an infinite number of constrained atoms of the form c(X) ∧ sat(X, ψ) with non-equivalent constraints may be generated and, thus, an infinite number of non-equivalent definitions may be introduced.
For instance, during the verification of the mutual exclusion property of the Bakery protocol (see Section 7.4.1 below), starting from the initial definition δ_0: new0
(B1, B2), respectively, is represented by the 4-tuple (A1, A2, B1, B2). The transition relation t of the two-agent system, from an old state OldState to a new state NewState, is defined as follows:
TA. t(OldState, NewState) ← tA(OldState, NewState)
TB. t(OldState, NewState) ← tB(OldState, NewState)
where the transition relation tA for the agent A is given by the following clauses, whose bodies are conjunctions of constraints (see also Figure 7.2):
A1. tA((think, A2, B1, B2), (wait, A21, B1, B2)) ← A21 = B2 + 1
A2. tA((wait, A2, B1, B2), (use, A2, B1, B2)) ← A2 < B2
A3. tA((wait, A2, B1, B2), (use, A2, B1, B2)) ← B2 = 0
A4. tA((use, A2, B1, B2), (think, A21, B1, B2)) ← A21 = 0
The following analogous clauses define the transition relation tB for the agent B:
B1. tB((A1, A2, think, B2), (A1, A2, wait, B21)) ← B21 = A2 + 1
B2. tB((A1, A2, wait, B2), (A1, A2, use, B2)) ← B2 < A2
B3. tB((A1, A2, wait, B2), (A1, A2, use, B2)) ← A2 = 0
B4. tB((A1, A2, use, B2), (A1, A2, think, B21)) ← B21 = 0
Notice that the two-agent system has an infinite number of states, because the counters may increase in an unbounded way, as indicated by the underlined states of the following computation path starting from the initial state (think, 0, think, 0): (think, 0, think, 0), (wait, 1, think, 0), (wait, 1, wait, 2), (use, 1, wait, 2), (think, 0, wait, 2), (think, 0, use, 2), (wait, 3, use, 2), (wait, 3, think, 0), ... We may apply our verification method for checking the mutual exclusion property of the Bakery protocol. This property is expressed by the CTL formula ¬ef unsafe, where for all states s, elem(s, unsafe) holds iff s is of the form (use, A2, use, B2), that is, unsafe holds iff both agents A and B are in the control state use. The initial program P_K, which encodes the Kripke structure of the Bakery protocol with two agents A and B, is the following one:
1. sat((use, A2, use, B2), unsafe) ←
2. sat(S, ¬F) ← ¬sat(S, F)
3. sat(S, ef F) ← sat(S, F)
4. sat(S, ef F) ← t(S, T) ∧ sat(T, ef F)
together with the clauses TA, TB, A1-A4, and B1-B4, which define the transition relation t. The initial clause δ_0 for the mutual exclusion property is:
new0_me ← sat((think, 0, think, 0), ¬ef unsafe)
For the Bakery protocol we may also want to prove the starvation freedom property, which ensures that an agent, say A, which requests the shared resource will eventually get it. This property is expressed by the CTL formula ag (waitA → af useA), which is equivalent to ¬ef (waitA ∧ ¬af useA). For the elementary properties waitA and useA, the satisfaction relation is defined by the clauses:
sat((wait, A2, B1, B2), waitA) ←
sat((use, A2, B1, B2), useA) ←
For the starvation freedom property the initial clause δ_0 is:
new0_sf ← sat((think, 0, think, 0), ¬ef (waitA ∧ ¬af useA))
The clauses for sat(X, ¬F), sat(X, F1 ∧ F2), and sat(X, ef F) are clauses S2-S5 (see Section 7.2). We do not have the space here to list all the clauses for sat(X, af F). These clauses include clause S6 (see Section 7.2) together with one or more clauses of the form S7 (see Section 7.2) for each state s of the form (A1, A2, B1, B2), where A1 and B1 belong to {think, wait, use}. For instance, the clause for the state (think, A2, think, B2) is:
sat((think, A2, think, B2), af F) ← A21 = B2 + 1 ∧ B21 = A2 + 1 ∧ sat((wait, A21, think, B2), af F) ∧ sat((think, A2, wait, B21), af F)
The remaining clauses for sat(X, af F) can be found in [9]. By using our experimental constraint logic program transformation system MAP [14] we have been able to automatically verify the mutual exclusion and the starvation freedom properties of the Bakery protocol. We have verified some more properties of various other protocols by using our system MAP running on a Linux machine with a 900 MHz clock, and the results of these experiments are reported in Table 7.1. The verification times we have obtained demonstrate that our system performs well w.r.t. the DMC system [6] and the other systems cited in [6, 7].
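The transition clauses above can also be read operationally. The sketch below is not the paper's CLP transformation method; it is a brute-force, bounded breadth-first exploration of the Bakery transition relation (so it can only inspect finitely many states), written to illustrate what the clauses A1-A4, B1-B4 and the unsafe predicate describe. All function names are assumptions made for the example.

```python
from collections import deque

def successors(state):
    """Successor states according to clauses A1-A4 and B1-B4 above."""
    a1, a2, b1, b2 = state
    succ = []
    if a1 == "think":                          # A1: A takes a ticket larger than B's
        succ.append(("wait", b2 + 1, b1, b2))
    if a1 == "wait" and (a2 < b2 or b2 == 0):  # A2, A3: A enters the critical section
        succ.append(("use", a2, b1, b2))
    if a1 == "use":                            # A4: A releases and resets its ticket
        succ.append(("think", 0, b1, b2))
    if b1 == "think":                          # B1
        succ.append((a1, a2, "wait", a2 + 1))
    if b1 == "wait" and (b2 < a2 or a2 == 0):  # B2, B3
        succ.append((a1, a2, "use", b2))
    if b1 == "use":                            # B4
        succ.append((a1, a2, "think", 0))
    return succ

def mutex_holds_up_to(depth):
    """Check that no (use, _, use, _) state is reachable within `depth` transitions."""
    init = ("think", 0, "think", 0)
    seen, queue = {init}, deque([(init, 0)])
    while queue:
        state, d = queue.popleft()
        if state[0] == "use" and state[2] == "use":
            return False
        if d < depth:
            for nxt in successors(state):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, d + 1))
    return True

print(mutex_holds_up_to(12))   # expected: True
```

Such a bounded search can of course never prove the property for the infinite state space; that is exactly what the constraint-based transformation of this section achieves.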
7.5 Conclusions and Related Work We have presented a method for verifying CTL properties of protocols for multiagent systems specified as constraint logic programs. For systems which have a finite number of states, the method is complete and can be used as a decision procedure. For systems which have an infinite number of states, the method may not terminate. However, for a large class of infinite state systems, the method terminates if we use suitable generalization operators. We have applied our method for proving safety and liveness properties of several infinite state protocols.
Table 7.1. Experimental results of the verification of some properties of various protocols. The protocols and the properties are taken from [6]. The verification time is expressed in seconds.

Protocol                               Property                                  Verification Time
Bakery (mutual exclusion)              ¬ef unsafe                                0.2
Bakery (mutual exclusion)              ag (waitA → af useA)                      2.3
Ticket (mutual exclusion)              ¬ef unsafe                                0.6
Ticket (mutual exclusion)              ag (waitA → af useA)                      3.0
Berkeley RISC (cache coherence)        ¬ef (dirty ≥ 2)                           2.0
Berkeley RISC (cache coherence)        ¬ef (dirty ≥ 1 ∧ shared ≥ 1)              1.3
Xerox Dragon (cache coherence)         ¬ef (dirty ≥ 2)                           1.3
Xerox Dragon (cache coherence)         ¬ef (dirty ≥ 1 ∧ shared_clean ≥ 1)        0.9
Xerox Dragon (cache coherence)         ¬ef (dirty ≥ 1 ∧ shared_dirty ≥ 1)        1.0
DEC Firefly (cache coherence)          ¬ef (dirty ≥ 2)                           0.4
DEC Firefly (cache coherence)          ¬ef (dirty ≥ 1 ∧ shared ≥ 1)              0.4
Illinois University (cache coherence)  ¬ef (dirty ≥ 2)                           0.3
Illinois University (cache coherence)  ¬ef (dirty ≥ 1 ∧ shared ≥ 1)              0.3
MESI (cache coherence)                 ¬ef (dirty ≥ 2)                           0.3
MESI (cache coherence)                 ¬ef (dirty ≥ 1 ∧ shared ≥ 1)              0.2
Our verification method is related to others presented in the literature for the proofs of properties of concurrent systems which use the logic programming paradigm. Among them we mention the following ones. In [18] the authors present XMC, a model checking system implemented in the tabulation-based logic programming language XSB. XMC can verify temporal properties expressed in the alternation-free fragment of the μ-calculus of finite state systems specified in a CCS-like language. In [17] a model checker is presented for verifying CTL properties of finite state systems, by using logic programs with finite constraint domains that are closed under conjunction, disjunction, variable projection and negation. The verification process is performed by executing a constraint logic program encoding the semantics of CTL in an extended execution model that uses constructive negation and tabled resolution. In [10] an automatic method for verifying safety properties of infinite state Petri nets with parametric initial markings is presented. The method constructs the reachability set of the Petri net being verified by computing the least fixpoint of a logic program with Presburger arithmetic constraints. A method for the verification of some CTL properties of infinite state systems using constraint logic programming is described in [7]. Suitable constraint logic programs which encode the system and the property to be verified are introduced, and then the CTL properties are verified by computing exact and approximated least and greatest fixed points of those programs. Finally, the use of program transformation for verifying properties of infinite state systems has been investigated in [12, 19]. In particular, (i) specialization of logic programs and abstract interpretation are used in [12] for the verification of safety properties of infinite state systems, and (ii) unfold/fold transformation rules
are applied in [19] for proving safety and liveness properties of parameterized finite state systems with various network topologies.
Acknowledgements Many thanks to Dr. A. Jankowski, Prof. A. Skowron, and Dr. M. Szczuka for their kind invitation to the International Workshop MSRAS 2004. We would also like to thank Prof. G. Delzanno, Prof. S. Etalle, and Prof. M. Leuschel for helpful discussions and comments.
References
1. K.R. Apt and R.N. Bol. Logic programming and negation: A survey. J. Logic Programming, 19,20:9-71, 1994.
2. R.M. Burstall and J. Darlington. A transformation system for developing recursive programs. JACM, 24(1):44-67, January 1977.
3. L. Cardelli and A.D. Gordon. Mobile ambients. Theoretical Computer Science, 240(1):177-213, 2000.
4. W. Chen and D.S. Warren. Tabled evaluation with delaying for general logic programs. JACM, 43(1), 1996.
5. E.M. Clarke, O. Grumberg, and D. Peled. Model Checking. MIT Press, 2000.
6. G. Delzanno. Automatic verification of parameterized cache coherence protocols. In Proc. CAV 2000, LNCS 1855, 55-68. Springer, 2000.
7. G. Delzanno and A. Podelski. Model checking in CLP. In R. Cleaveland (ed.) Proc. TACAS '99, LNCS 1579, 223-239. Springer, 1999.
8. S. Etalle and M. Gabbrielli. Transformations of CLP modules. Theoretical Computer Science, 166:101-146, 1996.
9. F. Fioravanti, A. Pettorossi, and M. Proietti. Verifying CTL properties of infinite state systems by specializing constraint logic programs. R. 544, IASI-CNR, Roma, Italy, 2001.
10. L. Fribourg and H. Olsen. Proving safety properties of infinite state systems by compilation into Presburger arithmetic. In Proc. CONCUR '97, LNCS 1243, 96-107. Springer-Verlag, 1997.
11. L. Lamport. A new solution of Dijkstra's concurrent programming problem. CACM, 17(8):453-455, 1974.
12. M. Leuschel and T. Massart. Infinite state model checking by abstract interpretation and program specialization. In Proc. LOPSTR '99, LNCS 1817, 63-82. Springer, 1999.
13. J.W. Lloyd. Foundations of Logic Programming. Springer-Verlag, Berlin, 1987.
14. MAP group. The MAP transformation system. Available from: http://www.iasi.rm.cnr.it/~proietti/system.html, 1995-2004.
15. K. Marriott and P. Stuckey. Programming with Constraints: An Introduction. The MIT Press, 1998.
16. R. Milner, J. Parrow, and D. Walker. A calculus of mobile processes. Part I and II. Information and Computation, 100(1):1-77, 1992.
17. U. Nilsson and J. Lübcke. Constraint logic programming for local and symbolic model checking. In Proc. CL 2000, LNAI 1861, 384-398. Springer, 2000.
18. Y.S. Ramakrishna, C.R. Ramakrishnan, I.V. Ramakrishnan, S.A. Smolka, T. Swift, and D.S. Warren. Efficient model checking using tabled resolution. In Proc. CAV '97, LNCS 1254, 143-154. Springer, 1997.
19. A. Roychoudhury, K. Narayan Kumar, C.R. Ramakrishnan, I.V. Ramakrishnan, and S.A. Smolka. Verification of parameterized systems using logic program transformations. In Proc. TACAS 2000, LNCS 1785, 172-187. Springer, 2000.
20. H. Seki. Unfold/fold transformation of stratified programs. Theoretical Computer Science, 86:101-139, 1991.
21. H. Tamaki and T. Sato. Unfold/fold transformation of logic programs. In S.-Å. Tärnlund (ed.) Proc. 2nd Int. Conf. Logic Programming, 127-138, Uppsala, Sweden, 1984.
8 Mereological Foundations to Approximate Reasoning Lech Polkowski* Polish-Japanese Institute of Information Technology, Koszykowa 86, 02008 Warsaw, Poland Department of Mathematics and Computer Science, University of Warmia and Mazury, Zolnierska 14a, 10561 Olsztyn, Poland email: [email protected]
Summary. In this article, we intend to present a synthetic account of mereological foundations for approximate reasoning along with an outline of applications of this approach to modern paradigms like Granular Computing and Spatial Reasoning. Key words: rough set theory, rough mereology, rough inclusions, granulation, spatial reasoning, granular rough set theory
8.1 Introduction: Rough Sets and Mereology in Approximate Reasoning We begin with an example that will demonstrate basic ideas of rough sets and mereology and will introduce at the same time a problem of approximate reasoning. The well-known heap paradox due to Eubulides of Miletus consists in two assumptions: 1. 1 grain does not make a heap; 2. if n grains do not make a heap then n + 1 grains do not make a heap, along with a tacit assumption that there is a number making the heap. The paradox results when one applies mathematical induction to infer from 1 and 2 that there is no number of grains that could make a heap. Let us suppose that we denote with heap_number the set of natural numbers n such that n grains make a heap, and introduce two sets non_heap_number and B, disjoint from each other and from heap_number, such that the union of the three sets is the set N of natural numbers. We modify the rules 1, 2 by adopting the new set of rules: 3. 1 ∈ non_heap_number; 4. if n ∈ non_heap_number then n + 1 ∈ non_heap_number ∨ n + 1 ∈ B; 5. if n ∈ B then n + 1 ∈ B ∨ n + 1 ∈ heap_number. Using 3-5 and the tacit assumption, one infers that there is a number of grains at which one passes from the non_heap state to B, and then a number of grains at which one goes from B to the heap state. The set B is a witness to the vagueness of the notion of a heap [18]. *This article is an extended version of the plenary talk given by the author at MSRAS 2004 in Plock, Poland on June 7, 2004
The example illustrates aptly the idea of approximate reasoning: one attempts at a description of an imprecise concept, e.g., of a heap, by means of a precise set of notions. As a result, such a description may be only approximate. This example also illustrates well the idea of a rough set: in order to describe the imprecise concept of a heap, one introduces a region B that witnesses the imprecision: we actually do not know at what number n the state non_heap changes into B and at what number m the state B changes into the heap state, yet we know it must be so: B is the region of uncertainty. Finally, one may observe that the notion of a heap is of mass (collective) nature and as such, it may be described better in terms of parts than in terms of elements, i.e., a heap should rather be discussed in mereological terms. Actually, the heap paradox catches in an ingenious way the fact that in describing a heap one passes from a set-theoretical description in terms of elements to a mereological description in terms of parts; this passing happens somewhere in the set B.
8.1.1 A formal idea of a rough set
The idea of a rough set was proposed by Zdzislaw Pawlak [11] in the context of knowledge represented as an equivalence relation R on a set U of entities. Equivalence classes of R contain entities that are indiscernible with respect to R, and a concept (a subset of the set U) X is said to be exact in case it is a union of a family of equivalence classes of R; otherwise, X is said to be rough. An information system (Pawlak, see [10]) is a well-known way of presenting data and representing the relation R; it is symbolically represented as a pair A = (U, A). The symbol U denotes a set of objects, and the symbol A denotes the set of attributes. Each pair (attribute, object) is uniquely assigned a value: given a ∈ A, u ∈ U, the value a(u) is an element of the value set V. Each entity (object) u ∈ U is represented in the information system A by its information set Inf_A(u) = {(a, a(u)) : a ∈ A}, which corresponds to the u-th row of the data table A. Two objects u, w may have the same information set: Inf_A(u) = Inf_A(w), in which case they are said to be A-indiscernible (see [11], [10]); the relation IND(A) = {(u, w) : Inf_A(u) = Inf_A(w)} is said to be the A-indiscernibility relation. It is an equivalence relation that renders a form of the general relation R. The symbol [u]_A denotes the equivalence class of the relation IND(A) containing u. It follows that a concept X is A-exact if and only if X is a union of equivalence classes: X = ∪{[u]_A : u ∈ X}. It may be observed that the notion of indiscernibility may be defined with respect to any set B ⊆ A of attributes, i.e., a B-information set Inf_B(u) = {(a, a(u)) : a ∈ B} is defined and then the relation of B-indiscernibility, IND(B) = {(u, w) : Inf_B(u) = Inf_B(w)}, is introduced, classes of which form B-exact sets. Given a set B of attributes and a concept X ⊆ U that is not B-exact, there exists u ∈ U with neither [u]_B ⊆ X nor [u]_B ⊆ U \ X. Thus, the B-exact sets
B_LOW X = {u ∈ U : [u]_B ⊆ X}, and B^UPP X = {u ∈ U : [u]_B ∩ X ≠ ∅}
are distinct, and B_LOW X ⊆ X ⊆ B^UPP X. The set B_LOW X is the lower B-approximation to X, whereas B^UPP X is the upper B-approximation to X. The concept X is said to be B-rough. B(X) = B^UPP X \ B_LOW X is the B-boundary region, the counterpart to B in the heap example.
8.1.2 Mereology
From among mereological theories of sets, we choose the chronologically first, and conceptually most elegant, viz., the mereology proposed by Leśniewski, see [4], and based on the notion of a part.
Parts
The relation of being a part, denoted π, satisfies the requirements,
(P1) xπy ∧ yπz ⇒ xπz.
(P2) xπy ⇒ ¬(yπx).
It follows that ¬(xπx).     (8.1)
The relation of proper containment ⊂ in a family of sets satisfies (P1), (P2). The notion of a π-element (mereological element), el_π, is defined as follows,
(El) x el_π y ⇔ xπy ∨ x = y.
By (El) and (P1-2), el_π is a partial ordering on the mereological universe. It follows by (El) that el_⊂ = ⊆ is the mereo-element relation in any family of sets.
Class operator
The relation of a part is defined for individual objects, not for collections of objects, and the class operator allows us to make collections of objects into objects. The definition of the class operator is based on the notion of element el_π; we denote this operator with the symbol Cls_π. Given a non-empty collection M of objects, the class of M, denoted Cls_π M, is the object that satisfies the following requirements,
(Cls1) if x ∈ M then x el_π Cls_π M.
(Cls2) if x el_π Cls_π M then there exist objects y, z with the properties that y el_π x, y el_π z, and z ∈ M.
(Cls3) for each non-empty collection M, the class Cls_π M exists and it is unique.
The reader has certainly observed that the object Cls_⊂ M, in case of a collection M of sets in a field of sets, is the union set ∪M. We may conclude that mereology, a set theory based on the notion of a part, is a feasible vehicle for a non-standard set theory that renders intuitions fundamental to rough set theory, and commonly expressed in the language of naive set theory, at the cost of operating with collections of objects, not objects themselves.
8.2 Reasoning in Rough Set Framework: Set Theory and Logic for Rough Sets From a formal point of view, given a knowledge R on a set of entities U, each exact set X satisfies the dichotomy,
∀u ∈ U [u ∈ X ∨ u ∈ U \ X].     (8.2)
To reveal the nature of (8.2), let us observe that in case X is exact, \lueU[ueX^
[U]R C X].
(8.3)
Formula (8.3) suggests a new notion of element, viz., one we denote with the symbol G*, defined for any set X C t/ and u £U, as, ^ G* X <^
[U]R C
X.
(8.4)
8.2.1 Rough set theory Collecting together the facts on rough sets and exact sets and a new notion of an element, we may define Rough Set Theory, RZF (Rough Zermelo-Fraenkel) as a set theory whose instances are tuples of the form,
(c/,u,nA,G,G^<s,7^),
(8.5)
where C/ is a set (in standard ZF), U, fi, \ are standard set operations of the union, the intersection and the complement (set difference as well), G denotes the standard element predicate, G* denotes the rough element predicate, and, moreover the following requirements are satisfied, where the notion of containment C is based on the element predicate G: 1. £^ is the family of subsets of U with the property: uG*XViiG* t / \ X for each u eU, 2. (5, U, n, \) is a C-complete Boolean algebra. 3. X en^X ^e, for each X CU,
8 Mereological Foundations to Approximate Reasoning
121
4. £ is C-co-initial as well as C-co-final in the power set 2^; hence, for each X en, there exist Y,Z eS such that 7 C X C Z. 5. t/ G* X if and only if 3 7 eS.ueY CX. A model ofRZF is induced naturally in any knowledge base (t/, {Ri : i G /}) by considering classes of the relation i? = P|^ Ri, the relation G* defined as x e* X if and only if[x]R C X, and S defined as the collection of exact sets with respect to R. By properties 2 and 4, for each X e IZ, there exist sets i{X),c{X) G S that satisfy, i{X) C X C c{X), (8.6) moreover,
• i{X) = sup{Y •
c{X) = inf{Z
eS-.YcXY eS:X
CZ},
meaning that i(X) is the largest in £ subset of X, and c{X) is the smallest in £ superset of X; these sets are counterparts of lower, respectively, upper approximations, BLOWX, B^P^X, B{X) = c{X) \ i{X) is the boundary region. 8.2.2 A logic for rough sets Intuitively, a rough set X induces a 3-valued logical structure: the case u G i{X) is interpreted as u with certainty in X (truth value 1), the case u ^U\X\^ interpreted as u with certainty not in X (truth value 0) whereas u G B{X) is interpreted as uncertain state (state of truth | ) . In the context of RZF, we define an intensional unary predicate logic RL. We consider a set Fred of unary predicates of the form 7(x), where the variable x runs over the set C/, that have denotations (either exact or rough) in the set C/, along with logical connectives N of negation and C of implication. Instead of the symbol C, in particular formulas, we will use the more familiar symbol =>. Intensions We define the intension / as a mapping from the family £ into the set {0,1, ^ } , where 0,1 denote logical values of falsity and truth, respectively, and \ denotes the uncertain state (in the Lukasiewicz 3-valued logic (see [2]) it is denoted 2, or ^) of neither being false nor being true. Formulas of rough logic RL will be evaluated as extensions at particular sets A e £. The construction of / , will be done in few steps. 1. First, given a predicate 7(0:), we consider its meaning, or denotation, [[7]] = {x e U : 7(x) holds true}. 2. We assume the following denotation rules: [[A^7]]=\[[7]]; [[Cl6]]=\Mu[[S]]. Given A e £, the extension / ^ of / at ^ is defined as follows,
(8.7)
I^A(γ) = 1 in case A ⊆ [[γ]], I^A(γ) = 0 in case A ∩ [[γ]] = ∅, and I^A(γ) = 1/2 otherwise.     (8.8)
Remark. We may notice that the condition A ⊆ [[γ]] is equivalent to the condition A ⊆ i([[γ]]), and, similarly, the conditions A ∩ [[γ]] = ∅ and A ∩ c([[γ]]) = ∅ are equivalent. On the basis of (8.7), (8.8), we now establish truth tables for the operators N, C. In Table 8.1, the first row represents the truth states of γ whereas the second row gives the corresponding truth states of Nγ. In Table 8.2, the values of I^A(Cγδ) are given in the I^A(γ)-row and the I^A(δ)-column.
Table 8.1. Truth table for N
γ :  0   1/2   1
Nγ:  1   1/2   0
Table 8.2. Truth table for C CO 1±_ 0 111 1 0 1I 1 1 in case An [[7]] C [[S]]; ^, otherwise
Relations to Lukasiewicz logic
Table 8.3. Truth table for N in L3 lOi Np 0 1
Table 8.4. Truth table for C in L3 CO 0 1 11 1 0 1 1 11 2 2
H
8 Mereological Foundations to Approximate Reasoning
123
It is convenient to recall here the truth table (Tables 3 and 4) of 3-valued Lukasiewicz logic that we denote L3. A comparison of Tables 1, 3 and Tables 2, 4 shows that the only difference between L3 and rough logic RL is in treatment of implication between uncertain states of truth: whereas L3 assigns to this case value 1, rough logic discerns in this case between 1 and ^, assigning the former to the extension at yl € £ in case ^ n [[7]] C [[S]] only. Relations between the two logics may be expressed by means of the notion of an acceptable formula. We assume the ordering of states of truth: 0 < ^ < 1. Following a practice established in many-valued logic, we set a threshold, in our case ^, and we say that a formula in rough logic is acceptable if and only if its state of truth is at least ^ at every yl-extension. A formula 7 of rough logic will be called a theorem whenever its value is 1 at every yl-extension. Similarly, a formula 7 of rough logic will be called a theorem (respectively, acceptable) with respect to a family M of exact sets whenever its value is 1 at every yl-extension with A e M: I^{j) = 1 for every A e M (respectively, /^(7) > ^ for every A E A^). A theorem of I/3 is any formula whose value is 1 regardless of values of variables in it. Collapse is the operation that transforms a formula of rough logic RL of unary predicates into a formula in L3 by forgetting about variable x and disregarding parentheses related to usage of the variable. As a result, we have the same form of a formula in both logics. Then we have (see [13]): a formula 7 is acceptable in rough logic RL if the collapsed formula is a theorem of the logic L3. Similarly, one may verify (see op.cit.),that, for each theorem 7 of rough logic RL, the collapsed formula is a theorem of the 3-valued Lukasiewicz logic. For theorems in both logics, derivation rules, (Modus P o n e n s ) ^ ^ , (Modus Tollens) ^^^^^ are valid. Let us observe that Modus Ponens is not valid in case of acceptable formulas of rough logic whereas Modus Tollens is valid in that case (see op.cit.).
8,3 Rough Mereology We have seen in sect. 8.1.2 that mereology-based theory of sets is a proper theory of sets for rough set theory. In order to accomodate variants of rough set theory like Variable Precision Rough Sets (Ziarko, see [24]), and to account for modem paradigms like granular computing (Zadeh, see [5], [21]), or computing with words (Zadeh, [23], see [9]), it is suitable to extend the mereology based rough set theory by considering a more general relation of a part to degree.
124
Lech Polkowski
8.3.1 Set partial inclusions As a starting point, we can consider the theoretical proposition of Lukasiewicz [7] of endowing formulas of unary predicate calculus interpreted in a finite set U with partial degrees (states) of truth. Given, e.g., an implication p{x) => q{x), its degree of truth is given by the value of the fraction, \{u:p{u)Aq{u)}\ \{u:p{u)}\
^^^^
The formula (8.9) has been exploited in many contexts, e.g., in rough membership functions (Pawlak and Skowron, op.cit.), accuracy and coverage coefficients for decision rules (Tsumoto, [20]), association rules (Agrawal et al., [1]), variable precision rough sets (Ziarko, [24]), approximation spaces (Skowron and Stepaniuk, [19]). The following properties of a measure of partial containment, /i(rr, y) between two objects, X, y, defined according to (8.9) may be derived, (SIl)/x(x,x) = 1. (SI2) fi{x, y) = 1 if and only ifXCY, (SB) if /i(a:, y) = 1 then fi{z, x) < fi{z^ y) for each non-empty set z. We will call set partial inclusions functions defined on pairs of non-empty sets and satisfying (SIl-3). In general, measures fi will be called rough inclusions, see [17]. 8.3.2 Rough inclusions We consider a universe U of non-empty objects along with a mereological relation TT of a part, inducing the mereological element relation el-j^. A rough inclusion, is a relation fx^^ CU x U x [0,1] that satisfies the following requirements, (RIl) jU7r(x, X, 1) for each x eU. (RI2) /XTT(x, 2/, 1) if and only if xel^^y for each pair x, y of elements of U. (RI3) if /i7r(^5 y? 1) then for each z e U, and each r G [0,1], the implication holds: if f^^{z, X, r) then fi^{z, y, r). (RI4) if/XTTC^J 2/5 '^) and s
8 Mereological Foundations to Approximate Reasoning
125
8.3.3 Rough inclusions in information systems We would like to begin with single elements of the universe U of an information system (U^A), on which to define rough inclusions. First, we want to address the problem of transitive rough inclusions. We recall that a t-norm T^ is archimedean if in addition it is continuous and T{x,x) < X for each x € (0,1). It is well known (Ling, see [6], cf. [15]) that any archimedean t-norm T, can be represented in the form, T{x,y)=9[f{x) + f{y)),
(8.10)
where / : [0,1] -^ [0,1] is continuous decreasing and g is the pseudo-inverse to /^ We will consider the quotient set UIMD = U/IND{A), and we define attributes on UjMD by means of the formula, a(M/iVD(.4)) = «(^)For each pair x, y of elements of UIND, we define the discemibility set DIS{x, y) {a£ A: a(x) j^ a{y)} C A. For an archimedean t-norm, T, we define a relation ^T by letting,
M^,J/,r)^5(l^^^M)>,.
(8.11)
Then, /XT is a rough inclusion that satisfies the transitivity rule, see [14],
f^T{x,y,r),/j.T{y,z,s)
(8.12)
fZT{x,z,T{r,s))
Particular examples of rough inclusions are the Menger rough inclusion, (MRI, in short) and the Lukasiewicz rough inclusion (LRI, in short), corresponding, respectively, to the product t-norm TM{x,y) = x - y, and the Lukasiewicz product TL{X, y) = max{0, x-\-y -l). The Menger Rough inclusion For the t-norm T/v/, the generating function f{x) = —Inx whereas g{y) = e~^ is the pseudo-inverse to / . The rough inclusion ^TM ^^ given by the formula, \DIS{x,y)\
fj.TM{^,y.r)^e
1^1
>r.
(8.13)
^i.e., a map T : [0,1]^ -^ [0,1] that is symmetric, associative, increasing and satisfies r(a:,0) = 0. ^This means that g(x) = 1 for rr G [0, / ( I ) ] , g{x) = 0 for x € [/(O), 1], and g{x) = /-^(x)forx€[/(l),/(0)].
126
Lech Polkowski
The Lukasiewicz rough inclusion For t-norm TL, the generating function f{x) inverse to / . Therefore,
= I — x and g = f is the pseudo-
,,^i:,,y,r)^l-\£l^>r.
(8.14)
Let us observe that rough inclusions based on sets DIS are necessarily symmetric. Table 8.5. The information system A U ai a2 Cis a4 a:i 1 1 1 2 X2 X3 X4 X5 X6 X7 X8
1 2 3 3 3 1 2
0 0 2 1 2 2 0
1 0 1 1 1 0 1 0 1 2 01 02
For the information system A in Table 5, we calculate values of LRI, shown in Table 6; as //TL is symmetric, we show only the upper triangle of values. Table 8.6. /x^ for Table 5 U
Xl X2
Xs
X4 Xs
XQ XJ XS
xi 1 X2 X3X4 -
0.5 0.25 0.25 0.5 0.5 0.25 0.25 1 0.5 0.5 0.5 0.25 0.25 0.25 - 1 0.25 0.25 0.25 0.25 0.5 - - 1 0.75 0.75 0.25 0
X5-
-
-
-
1
0.5
0 0
X6- - X7 - - -
- - 1 0.25 0.25 - - - 1 0.25
X8 -
-
-
-
-
-
-
1
Rough inclusions over relational information systems In some applications, a need may arise, to stratify objects more subtly than it is secured by sets DIS. A particular answer to this need can be provided by a relational information system by which we mean a system {U, A , R), where R — {Ra ' CL G A} with Ra ^Va xVa^ relation in the value set Va.
8 Mereological Foundations to Approximate Reasoning
127
A modified set DIS^{x,y) is defined as follows; DIS^{x,y) = {a e A : Ra{ci{^),o,{y))}' Then, for any archimedean t-norm T, and non-reflexive, nonsymmetric, transitive, and linear, relation R, we define the rough inclusion /x^ by the modified formula,
^,^i:c,y,r)^gi\^l^^^>r,
(8.15)
where g is the pseudo-inverse to / in the representation r ( r , s) = g{f{r) -f f{s)); clearly, the notion of a part is here: xn^y if and only \i x ^ y and Ra{a{y), a{x)) for each a e A. Let us look at values of /x^ in Table 7 for the information system in Table 5 with value sets ordered linearly as a subset of real numbers. Table 8.7. I^^TL for Table 5 U
X\
XI X2 X3 X4 X5 X6 X7 X8
1 1 0.75 0.5 0.5 0.5 0.75 0.75 0.5 1 0.5 0.5 0.5 0.25 0.5 0.5 0.5 1 1 0.5 0.5 0.25 0.75 0.75 0.75 1 0.75 1 1 0.75 0.75 0.75 0.75 1 0.75 0.75 1 0.5 0.5 0.75 1 1 1 1 1 1 1 1 0.5 0.75 0.5 0.5 0.5 0.25 1 0.5 0.5 0.75 0.75 0.25 0.25 0.25 0.75 1
X2
X^ X4 X5
X6
Xj
Xs
As expected, the modified rough inclusion is non-symmetric. We now discuss problems of granulation of knowledge, showing an approach to them by means of rough inclusions.
8.4 Rough Mereological Granule Calculus Granular computing paradigm proposed by Lotfi Zadeh, is based on the idea of making entities into granules of entities and performing calculations on granules in order to reduce computational cost of approximate problem solving. Here we propose a general scheme for granular computing based on rough mereology. We assume a rough inclusion /i^ on a mereological universe {U, el^r) with a part relation TT. For given r < 1 and x E C/, we let, Qrix) = Cls{%),
(8.16)
%iy)^f^^{y,x,r).
(8.17)
where The class gr{x) collects all atomic objects satisfying the class definition with the concept iZv.
128
Lech Polkowski
We will call the class gr{x) the r-granule about x; it may be interpreted as a neighborhood of x of radius r. We may also regard the formula yyirX as stating similarity oiyiox (to degree r). We do not discuss here the problem of representation of granules; in general, one may apply sets or lists as the underlying representation structure. The following are general properties of the granule operator gr induced by a rough inclusion /i^r, see [14]. 1. 2. 3. 4. 5.
\ifi^{y,x,r)i\iQnyel^gr{x). if /^TT(^, y-t ^) A yelT^z then xel^^gr (z). \/z.[zelT^y => 3w, q.{welT^z A WCIT^Q A finiQ^ ^j ^)] => yel-Kgri^)if yelT^grix) A zel^ry then zel^^grix). if 5 < r then gr{'^)elT^gs{x).
8.4.1 Granulation via archimedean t~norm based rough inclusions For an archimedean t-norm T — induced rough inclusion /i^, we have a more detailed result, viz., the equivalence, 6. for each x, y G UjNDy ^^lirgriy) if and only if fj^rix, y, ^)We consider the information system of Table 5 along with values of rough inclusions fiTL^I^TL Siv^^» respectively, in Tables 6 and 7. Admitting r = .5, we list below granules of radii .5 about objects xi — xg in both cases. We denote with the symbol gi^gf, respectively, the granule go.bixi) defined by ^J'TL^^^TL' respectively, presenting them as sets. We have, 1- gi =
2. g2 =
{XI,X2,X5,XG},
{xi,X2,Xs,X4,Xs},
3. gs = { X 2 , X 3 , X 8 } ,
4. g4 = {X2,X4,X5,X6}, 5. g5 = {xi,X2,X4,X5,X6}, 6. g6 =
{XI,X4,X^,XG},
7. gj = {xj}, 8. ^8 = {xs.xs}, what provides an intricate relationship among granules: i^g^^g^ Q gs, gs Q g2, ^2, g5 incomparable by inclusion, gr isolated. We may contrast this picture with that for fXj,^. 2. g^ = 9^ =97=U\ 3- 9s =
{xe},
{xi,X2,X3,X7,Xs},
providing a nested sequence of three distinct granules.
8 Mereological Foundations to Approximate Reasoning
129
8.4.2 Extending rough inclusions to granules We now extend /XTT over pairs of the form x, g, where x G Ujjsfo, 9 a granule. We define /x^r in this case as follows, fi^{x,g,r)
<^3y e UiND^yel^g and iJ.^{x,y,r).
(8.18)
The notion of element el^ on pairs x, g follows by treating p as a class of elements under element relation el^, i.e., we admit that, xelng if and only if for each element z of x, there exist elements w,t such that WEIT^Z, welT^q, and g{q). By g{q) true, we mean that q has the property defining g. Clearly, the extended relation e/^ is transitive. The relation //TT on pairs x,y or x^g, where x^y £ UjjsfD, and g a concept, is a rough inclusion. Finally, we define ^^r on pairs of the form g,h, of granules by means of the formula, f^nig, h, r) ^ \/xel^g3y.yel^h and /i^(x, y, r). (8.19) The corresponding notion of element extended to pairs of concepts g, h will be defined as follows, gelT^h if and only if for each element xel-j^g there exist elements w, y with wel-j^x, welT^y and yelT^h. The extended most general form of /u is a rough inclusion. Extended archimedean rough inclusions For a rough inclusion fir based on an archimedean t-norm T, the extended rough inclusion satisfies the generalized transitivity rule, liT{k,g,r),^iT{g,h,s) ^T{Kh,T{r,s))
'
^^-^^^
We also notice a property of granules based on /^T, xelT,gr{y) =^ 9s{x)el^gT(r,s){y)-
(8.21)
8.4.3 Approximations by granules Given a granule g, and r, 5 e (0,1), we can define, by means of the class operator, approximations to g by granules as follows, subject to the restriction that classes in question are non-empty. The /x, r-lower approximation to ^ by a collection H = {h} of granules is the class.
130
Lech Polkowski
ClsLOwifJ', H, r) = Cls^{ii, g, r, iJ), where ^{ji, g, r, H){h) holds if and only if fi{h, g^ r) and h e H hold. Similarly, the /x, s-upper approximation to p by a collection H of granules is the class, Cls^PP{lJL, H, r) = Cls^ifi, g, r, H), where ^()Li, p, r, if )(/i) holds if and only if fi{h, g,t) -^ t < s and h e H hold. Taking set inclusion /i in the above definitions, we obtain approximations in the Variable Precision Rough Set Model (Ziarko, op.cit.).
8.5 Spatial Reasoning Based on Rough Mereology The scheme presented above may be exploited as a basis for spatial reasoning. We present a sketch of this approach. This approach is based on the functor C of being connected that satisfies the following, (CI) xCx\ meaning reflexivity of C. (C2) xCy ==> yCx\ meaning symmetry. (C3) \iz.{zCx ^^=> zCy)] ==> {x = y)\ meaning extensionality."* In terms of connections, schemes for spatial reasoning are constructed, see [3]. 8.5.1 Connections from rough inclusions In this section we investigate some methods for inducing connections from rough inclusions /x = /XTT, see [16]. Limit connection We define a functor CT as follows, xCry ^=^ - ( 3 r , 5 < l.ext{gr{x),gs{y))),
(8.22)
where ext{x^ y) is satisfied in case x, y have no common parts. Clearly, (C1-C2) hold with CT irrespective of a rough inclusion fi applied . The status of (C3) depends on /i. In case x ^ y^v/e have, e.g., zelx and ext{z, y) for some z. Clearly, CT{Z, X)\ to prove -I(CT(>2^, t/)), we add a new property of /x: (RM5) ext{x,y) ==:^ 3s < l.Vt > 5.-i[/i(x,y,t)]. Under (RM5), CT induced via // does satisfy (C3), i.e. it is a connection. 8.5.2 From Graded Connections to Connections We begin with a definition of an individual BdrX. BdrX ~ CIST^{II'^{X)), where/i;^(x)(2) 4=^ ^{z,x,r) A -i(3s > We introduce a graded (r, s)-connection C{r, s) (r, s
r.fi{z,x,s)).
8 Mereological Foundations to Approximate Reasoning xC{r, s)y <^==^ 3w.welTrBdrX A welT^{Bdsy).
131 (8.23)
We have then (i) xC(l, l)x\ (ii) xC(r, s)y = > yC{s, r)x. Concerning the property (C3), we adopt here a new approach. It is valid from theoretical point of view to assume that we may have "infinitesimal" parts i.e. objects as "small" as desired.^ Infinitesimal parts model We adopt a new axiom of infinitesimal parts (IP) -^{xelj^y) ==^ yr > O.Bz.zel^^x^ s <
r.zfif{y).
Our rendering of the property (C3) under (IP) is as follows: (C3)/p -^{xel^y) = > Vr > 0.3^, s > r.zC{l, l)x A zC{l, s)y. Connections from Graded Connections Our notion of a connection will depend on a threshold, a, set according to the needs of the context of reasoning. Given 0 < a < 1, we define a functor Ca as follows, xCay -^^ 3r, s > a.xC{r, s)y.
(8.24)
Then the functor C^ has all the properties (C1)-(C3) of a connection, see [16].
References 1. Agrawal R, Mannila H, Srikant R, Toivonen H, Verkamo AI (1996) Fast discovery of association rules. In: Fayyad UM, Piatetsky-Shapiro G, Smyth P, and Uthurusamy R (eds) Advances in Knowledge Discovery and Data Mining. AAAI Press. 2. Borkowski L (ed)(1970) Jan Lukasiewicz. Selected Works. North Holland - Polish Sci. Publ., Amsterdam - Warsaw 3. Cohn AG, Varzi A (1998) Connections relations in mereotopology. In: Prade H (ed) Proceedings ECAI 98, 13th European Conference on Artificial Intelligence. Wiley, Chichester 4. Lesniewski S (1982) On the foundations of mathematics. Topoi 2: 7-52 5. Lin TY, Yao YY, Zadeh LA (eds) (2001) Rough Sets, Granular Computing and Data Mining. Physica, Heidelberg 6. Ling CH (1965) Representation of associative functions. Publ. Math. Debrecen 12 : 189212 7. Lukasiewicz J (1913) Die Logischen Grundlagen der Wahrscheinlichtkeitsrechnung. Krakow (see [2]) ^Cf. an analogous assumption in mereology based on connection [8]).
132
Lech Polkowski
8. Masolo C, Vieu L (1993) Atomicity vs. infinite divisibility. In: Freksa C, Mark DM (eds) Spatial Information Theory, LNCS vol. 692. Springer, Berlin 9. Pal SK, Polkowski L, Skowron A (eds) (2004) Rough-neural Computing. Techniques for Computing with Words. Springer, Berlin 10. Pawlak Z (1992) Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer, Dordrecht 11. PawlakZ (1982)Rough sets.Intem. J. Comp. Inf. Sci.ll : 341-356 12. Pawlak Z, Skowron A (1994) Rough membership functions. In: Yager RR, Fedrizzi M, Kacprzyk J (eds) Advances in the Dempster-Schafer Theory of Evidence. Wiley, New York 13. Polkowski L (2004) A note on 3-valued rough logic accepting decision rules.Fundamenta Informaticae, 61(1): 3 7 ^ 5 14. Polkowski L (2003)Rough mereology: A rough set paradigm for unifying rough set theory and fuzzy set theory. Fundamenta Informaticae 54(1): 67-88 15. Polkowski L (2002) Rough Sets. Mathematical Foundations. Physica, Heidelberg 16. Polkowski L (2001) On connection synthesis via rough mereology. Fundamenta Informaticae 46 (1/2): 83-96 17. Polkowski L, Skowron A (1996) Rough mereology: a new paradigm for approximate reasoning. International Journal of Approximate Reasoning 15(4): 333-365 18. Read S (1995)Thinking about Logic: An Introduction to the Philosophy of Logic. Oxford U.R 19. Skowron A, Stepaniuk J (2001) Information granules: Towards foundations of granular computing. International Journal for Intelligent Systems 16: 57-85 20. Tsumoto S (1998) Automated induction of medical system expert rules from clinical databases based on rough set theory. Information Sciences 112: 67-84 21. Yao YY (2004)Information granulation and approximation. In: [9]. 22. Zadeh LA (1965) Fuzzy sets. Information and Control 8: 338-353 23. Zadeh LA, Kacprzyk J (eds) (1999) Computing with Words in Information/Intelligent Systems 1. Physica, Heidelberg 24. Ziarko W (1993) Variable precision rough set model. J. Computer and System Sciences 46:39-59
Data Security and Null Value Imputation in Distributed Information Systems Zbigniew W. Ras^'^ and Agnieszka Dardziiiska^ ^ UNC-Charlotte, Department of Computer Science, Charlotte, N.C. 28223, USA ^ Polish Academy of Sciences, Institute of Computer Science, Ordona 21, 01-237 Warsaw, Poland ^ Bialystok Technical Univ., Dept. of Mathematics, ul. Wiejska45A, 15-351 Bialystok, Poland Summary. Distributed Information System (DIS) is seen as a collection of autonomous information systems which can collaborate with each other. This collaboration can be driven by requests for knowledge needed to predict what values should replace null values in missing or incomplete attributes. Any incompleteness in data can be seen either as the result of a partial knowledge about properties of objects stored in DIS or some attributes might be just hidden from users because of the security reason. Clearly, in the second case, we have to be certain that the missing values can not be predicted from the available data by chase, distributed chase or any other null value imputation method. Let us assume that an attributes d is hidden at one of the sites of DIS, denoted by S and called a client. With a goal to reconstruct this hidden attribute, a request for a definition of this attribute can be sent by S to some of its remote sites (see [15]). These definitions stored in a knowledge-base KB can be used by Chase algorithm (see [4], [6]) to impute missing attribute values describing objects in S. In this paper we show how to identify these objects and what additional values in S have to be hidden from users to guarantee that initially hidden attribute values in S can not be properly predicted by Distributed Chase.
9.1 Introduction Distributed information system is a system that connects a number of information systems using network communication technology. In this paper, we assume that these systems are autonomous and incomplete. Incompleteness is understood by allowing to have a set of weighted attribute values as a value of an attribute. Additionally, we assume that the sum of these weights has to be equal 1. The definition of an information system of type A and Distributed Information System {DIS) proposed in this paper is a modification of definitions given by Ras in [14] and used later by Ras and Dardzinska in [15] to talk about semantic inconsistencies among sites of DIS from the query answering point of view. The type A is introduced mainly to monitor if weights assigned to values of attributes by Chase algorithm are greater than or equal to A. If the weight assigned by Chase to one of the attribute values
134
Zbigniew W. Ras and Agnieszka Dardzinska
is less than the allowed threshold value, then this attribute value has to be ruled out. Semantic inconsistencies are due to different interpretations of attributes and their values among sites (for instance one site can interpret the concept young differently than other sites). Different interpretations are also due to the way each site is handling null values. Null value replacement by a value suggested either by statistical or some rule-based methods is quite common before a query is answered by QAS. Ontology ([10], [11], [18], [19], [20], [2], [3], [21], [7]) is a set of terms of a particular information domain and the relationships among them. Currently, there is a great deal of interest in the development of ontology to facilitate knowledge sharing in general and information systems integration in particular. Ontologies and interontology relationships among them are created by experts in corresponding domain, but they can also represent a particular point of view of the global information system by describing customized domains. Ontologies can be expressed, for instance, using statements in description logics. These descriptions are organized as a lattice and may be considered as semantically rich metadata that capture the information content of underlying data repositories. As ontologies are abstractions, they can describe almost any kind of data format. Also, to allow an intelligent query processing, we assume that any information system in DIS is described by one or more ontologies. Inter-ontology relationships can be seen as a semantical bridge between autonomous information systems so they can collaborate and understand each other. In [15], the notion of the optimal rough semantics and a method of its construction was proposed. The rough semantics can be used to model and nicely handle semantic inconsistencies among sites due to different interpretations of incomplete values. Distributed chase is a chase algorithm [1] linked with a site S of DIS, called a client, which is similar to Chasel [6] and Chase2 [4] with additional assumption concerning the creation of knowledge bases at all sites of DIS involved in the process of solving a query submitted to 5. The knowledge base at the client site contains rules extracted from S and also rules extracted from information systems at its remote sites. The structure of the knowledge base and its properties are the same as properties of the knowledge bases used in Chasel or Chase2 algorithms. The difference only lies in the process required to collect these rules. In a distributed framework, these rules are extracted from the local and remote sites, usually under different semantics. Although the names of the attributes are often the same among sites, their granularity levels may differ from site to site. As the result of these differences, the knowledge base has to satisfy certain properties in order to be used by Chase, The same properties are required by the query answering system based on Chase. In this paper, we mainly concentrate on the problem of reconstructing values of attributes which are hidden from users for either some or all objects stored at one of the sites of S called a client. The knowledge related to hidden attributes in S can be extracted at remote sites for S which means that hidden values for some objects in S might be reconstructed even with a high degree of certainty. To avoid this, system S has to be made more incomplete. 
The goal of this paper is to propose a strategy for identifying the minimal number of values in S which additionally have to be hidden from users to guarantee that hidden attribute values in S can not be reconstructed by chase or distributed chase.
9 Data Security and Null Value Imputation in DIS
135
9.2 Query Processing with Incomplete Data In real life, data are often collected and stored in information systems residing at many different locations, built independently, instead of collecting them and storing at only one single location. In such cases we talk about distributed (autonomous) information systems. It is very possible that an attribute is missing or hidden in one of them while it occurs in many others. Also, in one information system, an attribute might be partially hidden, while in other systems the same attribute is either complete or close to being complete. Assume that user submits a query to one of the information systems (called a client) which involves some hidden or non-local attributes. In such a case, network communication technology is used to get definitions of these unknown or hidden attributes from other information systems (called servers). All these new definitions form a knowledge base which can be used to chase both missing and hidden attributes at the client site. But, before any chase-based algorithm can be applied, semantic inconsistencies among sites have to be resolved first. For instance, it can be done by taking rough semantics [Ras & Dardzinska], mentioned earlier. Otherwise, an inter-ontology relationship between local ontologies associated with two involved information systems has to be provided. Definition 1: We say that S = (X, A, V) is a partially incomplete information system of type A, if S is an incomplete information system and the following three conditions hold: •
as{x) is defined for any a: G X, a G A,
•
(Vx G X)(Va G A)[{as[x) = {(a,,p,) : 1 < i < m})-^
•
(Vx G X){Wa G A)[{as{x) = {(a,,p,) : 1 < i < m}) ^ {\/i){pi > A)].
YZiPi
= ^l
Now, let us assume that 5i, 52 are partially incomplete information systems, both of type A. The same set X of objects is stored in both systems and the same set A of attributes is used to describe them. The meaning and granularity of values of attributes from A in both systems Si, S2 are also the same. Additionally, we assume thata5,(x) = {{aii,pii) : 1 < m i } Sindas^ix) = {(a2i,P2i) : 1 < ^ 2 } . We say that <5-containment relation ^ holds between ^i and S2, if the following three conditions hold: •
(Vx G X)(Va G A)[card{as^{x)) > card{as^{x))],
•
(Vx G X){\/a G A)[[card{as^{x)) = card{as^{x))] -^ [£i^j \P2i-P2j\ > iZi^j \Pli-V\3\]]' [Ei#j IP2i - P231 - Yli^j \Pii - Pij II] > ^•
•
Instead of saying that 5-containment relation holds between Si and ^2, we can equivalentiy say that Si was transformed into ^2 by (5-containment mapping ^. This fact can be presented as a statement ^ ( 5 i ) = ^2 or (Vx G X)(Va G
136
Zbigniew W. Ras and Agnieszka Dardzinska
A)[^{asi{x)) = ^{as2{^))]' Similarly, we can either say that as^{x) was transformed into as2 (x) by ^ or that (5-containment relation ^ holds between as^ {x) and as^ix). So, if 5-containment mapping ^ converts an information system 5 to 5', then 5 ' is more complete than S. Saying another words, for a minimum one pair (a, x) e A X X, either ^ has to decrease the number of attribute values in as{x) or the average difference between confidences assigned to attribute values in as{x) has to be increased minimum by S. To give an example of a (^-containment mapping ^, let us take two information systems Si, S2 both of the type A, represented as Table 9.1 and Table 9.2. Also, we assume that 5 = ^. Table 9.1. Information System Si
X
a
b
c
d
Xi
{(ai. l),(«2,i)}
{(^1:>l)db2.. >!)}
Cl
di
{(ei,.i),(e2,i)}
X2
{(^2, i),(«3,|)}
{(^1:.5)'(^2,.1)} 62
X3
c
d2
ei
{(ci,i),(C3,i)}
d2
63
C2
di
X4
as
X5
{(«i, i),(a2,i)}
61
C2
xe
^2
62
C3
d2
{(62,
X7
^2
{(^1,.i),(^2,>!)}
{(ci,i),(C2,i)}
d2
62
62
Cl
di
63
X8
{(ei, i).(e2,|)} ei
|),(e3,|)}
Table 9.2. Information System ^2 X
a
b
c
d
e
XI
{(ai,i),(a2,i)}
{(6i,i),(62,i)}
ci
di
{(ei, | ) , (62, f ) }
X2
{(a2a).(«3,f)} ^1
{(ci,|),(c2,i)}
ci2 ei
X3
ai
{(ci,i),(c3,|)}
c?2
X4
as
C2
X5
{(«!,!), («2,i)}
61
C2
Xe
a2
^2
C3
C?2 { ( e 2 , | ) , ( e 3 , | ) }
X7
a2
{(^1,1), (^2, I ) }
Cl
d2 62
X8
{(^1, | ) , ( a 2 , | ) }
&2
Cl
(ii
^2
ea 62 ei
63
9 Data Security and Null Value Imputation in DIS
137
It can be easily checked that the values assigned to e{xi), b{x2), c{x2), a{xs), e(x4), a{xs), c{x7), and a{xs) in 5i are different than the corresponding values in 52. In each of these eight cases, an attribute value assigned to an object in 52 is less general than the value assigned to the same object in Si. Also, it can be easily checked that ^ satisfies S restriction. It means that ^{Si) = 52.
9.3 Query Processing with Distributed Data and Chase Assume now that L{D) = {{t -^ Vc) e D : c e In{A)} (called a knowledge-base) is a set of all rules extracted from 5 = (X, A, V) by ERID{S, Ai, A2), where In{A) is the set of incomplete attributes in 5 and Ai, A2 are thresholds for minimum support and minimum confidence, correspondingly. ERID is the algorithm for discovering rules from incomplete information systems, presented by Dardzinska and Ras in [5] and used as a part of Chase algorithm in [16]. The type of incompleteness in [16] is the same as in this paper. Assume now that a query q{B) is submitted to system 5 = (X, A, V), where B is the set of all attributes used in g(B) and that AnB ^^. Attributes in JB — [^ n B] are called either foreign or hidden in 5. If 5 is a part of a distributed information system, definitions of such attributes can be extracted at remote sites for 5 (see [15]). Clearly, all semantic inconsistencies and differences in granularity of attribute values among sites have to be resolved first. To simplify the problem, we adopt the same assumption as in [15]. It means that different granularity of attribute values and different interpretation of incomplete attribute values are only allowed among sites. It was shown in [15] that to process a query of type q{B) at site 5, we can discover definitions of values of attributes from B — [AoB] at the remote sites for 5 and next use them to answer q{B). Hidden attributes for 5, can be seen as attributes entirely incomplete in 5, which means values (either exact or partially incomplete) of such attributes have to be ascribed to all objects in 5. Stronger the consensus among sites on a value to be ascribed to X, better the result of the ascription process for x can be expected in most of the cases. The question remains, whether the values predicted by the imputation process are correct. Possible approach, to this type of problems, is to start with a complete information system and remove randomly from it, let's say, 10 percent of its values and next run the imputation algorithm on the resulting system. The next step is to compare the descriptions of objects in the system which is the outcome of the imputation algorithm with descriptions of the same objects in the original system. Clearly, hidden attribute values are known to some of the users. So, we can run the imputation algorithm for all hidden attributes and compare the results with values known to be correct. Descriptions of objects for which hidden attribute values are predicted reasonably well should be made more incomplete. Before we continue this discussion, we have to decide first on the interpretation of functors or and and, denoted in this paper by + and *, correspondingly. We will adopt the semantics of tenns proposed by Ras & Joshi in [17] as their semantics has all the properties required for the query transformation process to be sound and complete [see
138
Zbigniew W. Ras and Agnieszka Dardziriska
[17]]. It was shown that their semantics satisfies the following distributive property: tl * (^2 + ^3) = (^1 * ^2) -f (^1 * ^3).
Let us assume that S = (X, A^ V) is an information system of type A and t is a term in predicate calculus constructed, in a standard way, from values of attributes in V seen as constants and from two functors + and *. By Ns{t), we mean the standard interpretation of a term t in 5 defined as (see [17]):, • • •
Ns{v) = {{oc,p) : {v,p) e a{x)}, for any v e K , Ns{ti+t2)=Ns{ti)®Ns{t2), Ns{ti*t2) = Ns{ti)^Ns{t2),
where, for any Nsih) • •
= {ixi,pi)}i^i,
iV^fe) = {{xj^Qj)}jeJ^ we have:
Ns{ti) e Ns{t2) = {{xi,Pi)}ie{i-J) ^ {{^3^Pj)}je{J-i) Nsih) 0 Ns{t2) = {{xi,Pi ' qi)}ieiinJ)'
U
{ixi^rnax{pi,qi))}ieinj,
The incomplete value imputation algorithm Chase (see [16]), based on the above semantics, converts information system S of type A to a new more complete information system Chase{S) of the same type. This algorithm assumes partial incompleteness of data (sets of weighted attribute values can be assigned to an object as its value) in system S. Rules discovery system ERID (see [4]) was used to extract rules from this type of incomplete data set and next applied in Chase algorithm. Now, let us assume that a partially incomplete information system S of type A is used to store descriptions of objects. When a query asking for objects in S, satisfying some hidden property, is submitted to 5, its query answering system QAS will replace S by Chase{S) and next will solve the query using, for instance, the strategy proposed Ras & Joshi in [17]. Clearly, we have to make sure that our system is secure and objects in S which satisfy this hidden property can not be retrieved by QAS.
9.4 Distribution, Inconsistency, and Distributed Chase As we already pointed out, the knowledge base L{D), contains rules extracted locally at the client site (information system queried by user) as well as rules extracted from information systems at its remote sites. Since rules are extracted from different information systems, inconsistencies in semantics, if any, have to be resolved before any query can be processed. There are two options: •
a knowledge base L{D) at the client site is kept consistent (in this scenario all inconsistencies have to be resolved before rules are stored in the knowledge base),
•
a knowledge base at the client site is inconsistent (values of the same attribute used in two rules extracted at different sites may be of different granularity levels and may have different semantics associated with them).
9 Data Security and Null Value Imputation in DIS
139
In general, we assume that the information stored in ontologies and, if needed, in inter-ontologies (if they are provided) is sufficient to resolve inconsistencies in semantics of all sites involved in Chase. Inconsistencies related to the confidence of conflicting rules stored in L{D) do not have to be resolved at all (algorithm Chase does not have such a requirement).
•^3 1 g [q b\ cl
qs2 •4
Si\b\a
Tpl
^ ZlJ
KB
! 1
1 1 11 1 KB
fsf rt ~ extracted from S\
Fig. 9.1. Global extraction and exchange of knowledge
The fact, that rules stored in L{D) can be extracted at different sites and under different interpretations of incomplete values, is not pleasant assuming that we need to use them in Chase. In all such cases, following the same approach as in [15], rough semantics can be used for interpreting rules in L{D). One of the problems related to an incomplete information system S = (X, A, V) is the freedom how new values are constructed to replace incomplete values in 5, be-
140
Zbigniew W. Ras and Agnieszka Dardzinska
fore any rule extraction process begins. This replacement of incomplete attribute values can be done either by Chase or/and by a number of available statistical methods (see [9]). This implies that semantics of queries submitted to S and queries processed by the query answering system QAS based on Chase, may often differ. In such cases, following again the approach in [15], rough semantics can be used by QAS to handle this problem. In this paper we only concentrate on granularity-based semantic inconsistencies. Assume first that Si — {Xi,Ai,Vi) is an information system for any i G / and that all of them form a Distributed Information System (DIS). Additionally, we assume that, if a e AiH Aj, then only the granularity levels of a in Si and Sj may differ but conceptually its meaning, both in Si and Sj is the same. Assume now that D = Ui^j L{Di) is a set of rules which can be used by Chase algorithm, associated with any of the sites of DIS, and L{Di) contains rules extracted from S^. Now, let us say that system Sk,kelis queried be a user. Chase algorithm, to be applicable to Sk, has to be based on rules from D which satisfy the following conditions: •
•
•
attribute value used in the decision part of a rule from D has the granularity level either equal to or finer than the granularity level of the corresponding attribute in Sk. the granularity level of any attribute used in the classification part of a rule from D is either equal or softer than the granularity level of the corresponding attribute inSk, attribute used in the decision part of a rule from D either does not belong to A^ or is incomplete in 5^.
These three conditions are called distributed Chase 5A;-applicability conditions. Let Lk{D) denotes the subset of rules in D satisfying these three Chase Skapplicability conditions. Assuming now that a match between the attribute value used in the description of the tuple t and the attribute value used in a description of a rule s -^ d £ Lk{D) is found, the following two cases should be considered: •
•
an attribute involved in matching is the decision attribute m s -^ d. If two attribute values, involved in that match, have different granularity, then the decision value d has to be replaced by a softer value which granularity will match the granularity of the corresponding attribute in Sk. an attribute a involved in matching is the classification attribute ins -^ d. If two attribute values, involved in that match, have different granularity, then the value of attribute a has to be replaced by a finer value which granularity will match the granularity of a in S^.
The new set of rules constructed from Lk(D), following the above two steps, is called granularity-repaired set of rules. So, the assumption that Lk{D) satisfies distributed Chase 5^-applicability conditions is sufficient to run Chase successfully on Sk using this new granularity-repaired set of rules. In Figure 9.1, we present two consecutive states of a distributed information system consisting of Si, S2, S3.
9 Data Security and Null Value Imputation in DIS
141
In the first state, all values of all hidden attributes in all three information systems have to be identified. System ^i sends request qs^ to the other two information systems asking them for definitions of its hidden attributes. Similarly, system ^2 sends request qs2 to the other two information systems asking them for definitions of its hidden attributes. Now, system S^ sends request ^53 to the other two information systems also asking them for definitions of its hidden attributes. Next, rules describing the requested definitions are extracted from each of these three information systems and sent to the systems which requested them. It means, the set L{Di) is sent to 52 and 53, the set L{D2) is sent to Si and 53, and the set L{Ds) is sent to Si and 52. The second state of a distributed information system, presented in Figure 9.1, shows all three information systems with the corresponding L{Di) sets, i G {1,2,3}, all abbreviated as KB. Now, the Chase algorithm is run independently at each of our three sites. Resulting information systems are: Chase{Si), Chase{S2), and Chase{Sz). Now, the whole process is recursively repeated. It means, both hidden and incomplete attributes in all three new information systems are identified again. Next, each of these three systems is sending requests to the other two systems asking for definitions of its either hidden or incomplete attributes and when these definitions are received, they are stored in the corresponding KB sets. Now, Chase algorithm is run again at each of these three sites. The whole process is repeated till some fixed point is reached (no changes in attribute values assigned to objects are observed in all 3 systems). When this step is accomplished, a query containing some hidden attribute values can be submitted to any Si,i G {1,2,3} and processed in a standard way.
9.5 Distributed Chase and Security Problem of Hidden Attributes Assume now that an information system 5 = (X, A, V) is a part of DIS and attribute h e A has to be hidden. For that purpose, we construct 5^ = (X, A, V) to replace 5, where: • • •
as{x) = as^{x), for any a e A — {6}, x e X, bsf, (x) is undefined, for any x e X, bs{x) e Vb.
Users can submit queries to Sb and not to 5. What about the information system Chase{Sb)l How it differs from 5? Clearly, bs{x) can be equal to bchase{Sb)i^) for a number of objects in X. If this is the case, additional values of attributes for all these objects should be hidden. In this section, we show how to identify the minimal number of values which should be additionally hidden in 5^ to guarantee that values of attribute b can not be reconstructed by Chase for any x e X. We present our strategy using system 5 = (X, A, V) of type A = | from Table 9.1 as an example. We also assume that attribute d is hidden in 5. The corresponding system 5^ of type A = | is given as Table 9.3. Also, assume that the following rules have been extracted at the remote sites for Sd:
142
Zbigniew W. Ras and Agnieszka Dardzinska Table 9.3. Information System Sd
X
a
6
c
Xl
{(«i. l)-(«2,i)}
{(61.. 3 ) ' ( ^ ' 3 ) /
Cl
X2
{(«2,
ei
{(ci,,
|),(C3,
1)}
e3
C2
{(ei, f),(e2,|)}
i)-(a2,|)} h
C2
ei
0'2
h
C3
a2
{{bl: .i),(&2,f)}
{(ci,
b2
Cl
X4
as
X5
{(ai,
X6 X7
e {(^1,^5)'(e2,§)}
i),(a3,|)} {(bl: , | ) , ( b 2 , | ) } 62
X3
d
X8
ri = ^2 = ^3 = ^4 = ^5 = ^6 =
{(^2, I),(e3,f)} |),(e2,
i)}
62
ea
[a2 • 62 -^
Let us consider the first tuple in Sd. It supports rule r i , r2, r4, rs and, TQ. Rule ri supports value ^2 with weight [[L32| -. 1|3J]] - 3 - l = | . Rule r2 supports value di with weight [| • 1] • 2 • 1 = | . Rule r4ipportj supports value ^2 with weight [| • 1] • 3 • 1 = 1. Rule rs supports value di with weight [ | - | ] - 2 - l = | . Rule re supports value 0^2 with weight [| • 1] • 4 • 1 = | . So, ^ is the total support for value d2 whereas § is the total support for the value di. Because § • ^ < ^, then the value di is rule out and the same can not be predicted by Chase. Now, let us take tuple XQ. This tuple supports only two rules: ri and r^. Rule ri supports value d2 with weight [1 • 1] • 3 • 1 = 3. Rule rs supports value d2 with weight [1 • 1] • 3 • 1 = 3. It can be easily checked that by removing value 62 from the description of XQ we decrease the total support of value d2 to 3 but still keep the support of di equal to zero. Also, additional removal of C3 or a2 will not help, {cs, 02,62} is the smallest set which has to be removed from the description of XQ to guarantee that the value ^2 will be not assigned by Chase as the value of d for XQ. NOW, let us take tuple X7. This tuple supports three rules: r i , r2, and re. Rule ri supports value c?2 with weight [1 • | ] • 3 • 1 = | . Rule r2 supports value di with weight [1 • | ] • 2 • 1 = | . Rule re supports value ^2 with weight [1 • 1] • 4 • 1 = | . So, ^ is the total support for value ^2, whereas | is the total support for value di, which means that the value di is rule out. By removing the value 02 from the description of object xj, both values di, ^2 will be assigned as possible values of the attribute d for X7. Following similar strategy for the remaining objects, we get a new information system Sd represented by Table 9.4: Clearly, the hidden attribute d can not be reconstructed by distributed Chase from the available data in Sd, for any object x.
9 Data Security and Null Value Imputation in DIS
143
Table 9.4. Information System Sd X
a
b
c
^1
{(«i,|),(«2,|)}
{(6i,i),(62,|)}
ci
X2
{(a2,i),(a3,|)}
2^3 as
X5
{(ai,|),(a2,|)}
{(C1,|),(C3,|)} C2
&i
C2
X6
a:8
e {(ei,|),(e2,^)}
{(6i,|),(62,i)} 62
a:^4
d
63 {(ei,|),(e2,|)} ei {(e2,|),(e3,|)}
^2
ci
63
In general, for any tuple x, we identify all rules supported by that tuple. Next, on the basis of these rules, we calculate the total support for each value of the hidden attribute. These total supports are used to calculate the confidence in each of this values. If the confidence in any of them is below the threshold A, then such a value is ruled out. We need minimum two weighted values remaining if the correct value is one of them. This can be achieved by replacing some values in S^ by Null Values. The strategy outlined in this paper shows how to search for such minimal sets.
9.6 Security of Hidden Attributes and Testing In this section, we give more precise description of the algorithm for identifying the minimal number of cells in Sd which additionally have to be hidden from users in order to guarantee that attribute d cannot be reconstructed by them through Distributed Chase. Finally, we test that algorithm on data obtained from one of the insurance companies. Assume that KB contains rules extracted in DIS at server sites for Sd with a goal to reconstruct hidden attribute d in Sd. In this section, by d{x) we mean the value of d for x which is hidden in Sd. For each object x in Sd, we look first for all rules in KB supported by x. Several cases have to be considered: •
There is only one rule r = [t —> di] in KB supported by object x in 5^. If d{x) = di, then value di is predicted correctly by r. It means that minimum one of the attributes listed in t has to be additionally hidden for x in Sd- This attribute can be chosen randomly (its corresponding slot is denoted by hidl).
•
There is a set of rules {ri = [ti —> c?i],r2 = [^2 —> G?i],...,rfc = [tk —^ di]} in KB supported by x. If d{x) = G?I, then value di is predicted correctly by rules from {ri, r2,..., rfc}. It means that a set containing at least a minimal set of attributes covering all terms {^1,^2,...,^/.} has to be additionally hidden for x in
144
Zbigniew W. Ras and Agnieszka Dardzinska S (corresponding slots are denoted by hid2).
•
There is a set of rules {ri = [ti —> di],r2 = [^2 —> ^2], •••, ^fc = [h —^ dk]} in KB supported by x. Let [s^, Q] denotes support and confidence of rule r^, for i < k. Let Confsd {d'^ x, KB) denotes the confidence in attribute value d' € Vd for X in Sd driven by iiTB. It is defined as ^{si - Ci : [1 < i < k] A [d^ = di]}/ J2{^i ' Ci : 1 < i < k}.lf d{x) = dj and A is the threshold for minimal confidence in attribute values describing objects in Sd, then > A], we do - if ConfsAdj^x,KB) > A and {3d ^^ dj)[ConfsAd,x,KB) not have to hide any additional slots for x. -
if ConfsMji^^KB) > A and {Wd 7^ dj)[Confs^{d,x,KB) have to hide additional slots (denoted by hidS) for x.
< A], we
-
If Confs^ {dj, X, KB) < A and (3d 7^ dj) [Confs^ {d, x, KB) > A], we do not have to hide additional slots for x.
So, each slot asd{x) which has to be hidden is assigned to one of the 3 groups: hidl, hid2, hidS. To check what is the percentage of slots which have to be additionally hidden in Sd in order to guarantee that a randomly hidden attribute d can not be reconstructed by distributed Chase, we use sampling data table containing 10,000 objects described by 100 attributes. These objects are extracted randomly from a complete database describing customers of an insurance company. To build DIS environment as simple as possible (without problems related to handling different granularity and different semantics of attributes at different sites and without either using a global ontology or building inter-ontology bridges between local ontologies), this data table was randomly partitioned into 4 equal tables containing 2,500 tuples each. Next, from each of these tables 40 attributes (columns) have been randomly removed leaving 4 data tables of the size 2,500 x 60 each. One of these tables is called a client and the remaining 3 are called servers. All of them represent sites in DIS. Now, for all objects at the client site, we have hidden values of one of the attributes which was chosen randomly. This attribute is denoted by d. At each server site, if d is listed in its domain schema, we learn descriptions of d using See5 software (data are complete so for that purpose we do not have to use ERID). All these descriptions, in the form of rules, have been stored in KB of the client. Distributed Chase was applied to predict what is the real value of a hidden attribute for each object X at the client site. The threshold value A = 0.125 was used in our example. The number of additional slots required to be hidden: •
3176 slots of /izdl-type (2.117% of slots at client table)
•
811 slots of hid2-typQ (0.54% of slots at client table)
•
24 slots of hidS-type (0.016% of slots at client table)
9 Data Security and Null Value Imputation in DIS
145
It should be observed that the majority of slots which additionally are hidden at the client site are uniquely predicted {hidl and hide2 types) by rules in KB.
9.7 Conclusion Proposed strategy shows the steps one has to follow if he needs to identify additional slots in a data table which have to be jointly hidden with a chosen hidden attribute. Presented example gives also the percentage of additional slots which have to hidden in a data table at the client site to guarantee the security of a hidden attribute from the point of view of a distributed Chase. To improve our strategy, we can look for additional hidden slots taking into consideration their influence on predicting incorrect values for a chosen hidden attribute d. To be more precise, if there is a rule r = [t —> di] in KB supported by object X in Sd which identifies d{x) correctly, one of the attributes listed in t has to be additionally hidden for x in 5^. We can chose this attribute randomly but also we can identify which attribute used in t has the highest influence on predicting incorrect values for our hidden attribute. Similar strategy can be followed for slots of the type hid2 and hid^ in the data table at the client site.
References 1. Atzeni, P., DeAntonellis, V. (1992) Relational database theory, The Benjamin Cummings Publishing Company 2. Benjamins, V. R., Fensel, D., Perez, A. G. (1998) Knowledge management through ontologies, in Proceedings of the 2nd International Conference on Practical Aspects of Knowledge Management (PAKM-98), Basel, Switzerland. 3. Chandrasekaran, B., Josephson, J. R., Benjamins, V. R. (1998) The ontology of tasks and methods, in Proceedings of the 11th Workshop on Knowledge Acquisition, Modeling and Management, Banff, Alberta, Canada 4. Dardzinska, A., Ras, Z.W. (2003) Rule-Based Chase Algorithm for Partially Incomplete Information Systems, in Proceedings of the Second International Workshop on Active Mining (AM'2003), Maebashi City, Japan, October, 2003, 42-51 5. Dardzinska, A., Ras, Z.W. (2003) On Rules Discovery from Incomplete Information Systems, in Proceedings of ICDM'03 Workshop on Foundations and New Directions of Data Mining, (Eds: T.Y. Lin, X. Hu, S. Ohsuga, C. Liau), Melbourne, Florida, IEEE Computer Society, 2003, 31-35 6. Dardzinska, A., Ras, Z.W. (2003) Chasing Unknown Values in Incomplete Information Systems, in Proceedings of ICDM'03 Workshop on Foundations and New Directions of Data Mining, (Eds: T.Y. Lin, X. Hu, S. Ohsuga, C. Liau), Melbourne, Florida, IEEE Computer Society, 2003, 24-30 7. Fensel, D., (1998), Ontologies: a silver bullet for knowledge management and electronic commerce. Springer-Verlag, 1998 8. Grzymala-Busse, J. (1997) A new version of the rule induction system LERS, in Fundamenta Informaticae, Vol. 31, No. 1, 27-39
146
Zbigniew W. Ras and Agnieszka Dardzinska
9. Giudici, P. (2003) Applied Data Mining, Statistical Methods for Business and Industry, Wiley, West Sussex, England 10. Guarino, N., ed. (1998) Formal Ontology in Information Systems, lOS Press, Amsterdam 11. Guarino, N., Giaretta, P. (1995) Ontologies and knowledge bases, towards a terminological clarification, in Towards Very Large Knowledge Bases: Knowledge Building and Knowledge Sharing, lOS Press 12. Pawlak, Z. (1991) Rough sets-theoretical aspects of reasoning about data, Kluwer, Dordrecht 13. Pawlak, Z. (1991) Information systems - theoretical foundations, in Information Systems Journal, Vol. 6, 1981, 205-218 14. Ras, Z.W. (1994) Dictionaries in a distributed knowledge-based system, in Concurrent Engineering: Research and Applications, Conference Proceedings, Pittsburgh, Penn., Concurrent Technologies Corporation, pp. 383-390 15. Ras, Z.W., Dardzinska, A. (2004) Ontology Based Distributed Autonomous Knowledge Systems, in Information Systems International Journal, Elsevier, Vol. 29, No. 1, 2004, 47-58 16. Ras, Z.W., Dardzinska, A. (2004) Query Answering based on Collaboration and Chase, in the Proceedings of FQAS'04 Conference, Lyon, France, LNCS/LNAI, Springer-Verlag, 2004, will appear 17. Ras, Z.W., Joshi, S. Query approximate answering system for an incomplete DKBS, in Fundamenta Informaticae Journal, lOS Press, Vol. 30, No. 3/4, 1997, 313-324 18. Sowa, J.F. (2000a) Ontology, metadata, and semiotics, in B. Ganter Sz G. W. Mineau, eds.. Conceptual Structures: Logical, Linguistic, and Computational Issues, LNAI, No. 1867, Springer-Verlag, Berlin, 2000, pp. 55-81 19. Sowa, J.F. (2000b) Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks/Cole Publishing Co., Pacific Grove, CA. 20. Sowa, J.F. (1999a) Ontological categories, in L. Albertazzi, ed.. Shapes of Forms: From Gestalt Psychology and Phenomenology to Ontology and Mathematics, Kluwer Academic Publishers, Dordrecht, 1999, pp. 307-340. 21. Van Heijst, G., Schreiber, A., Wielinga, B. (1997) Using explicit ontologies in KBS development, in International Journal of Human and Computer Studies, Vol. 46, No. 2/3, 183-292.
10 Basic Principles and Foundations of Information Monitoring Systems Alexander Ryjov Chair of Mathematical Foundations of Intelligent Systems Department of Mechanics and Mathematics Lomonosov' Moscow State University, Moscow, Russia
ryj [email protected]
10.1 Introduction This article describes main ideas of Information Monitoring Systems (IMS) and applications of IMS in real-world problems. Information monitoring systems relate to a class of hierarchical fuzzy discrete dynamic systems. The theoretical base of such class of systems is made by the fuzzy sets theory, discrete mathematics, methods of the analysis of hierarchies which was developed independently in works of Zadeh [9, 10], Messarovich [2], Saaty [8] and others. IMS address to process uniformly diverse, multi-level, fragmentary, unreliable, and varying in time information about some problem/process. Based on this type of information IMS allow perform monitoring of the problem/process evolution and work out strategic plans of problem/process development. These capabilities open a broad area of applications in business (marketing, management, strategic planning), socio-political problems (elections, control of bilateral and multilateral agreements, terrorism), etc. One of such applications is a system for monitoring and evaluation of state's nuclear activities (department of safeguards, IAEA) [3] - have been shortly described in the report.
10.2 Basic elements of IMS and their characteristic We shall name a task of evaluation of a current state of the problem/process and elaboration of the forecasts of its development as an information monitoring problem and human-computer systems ensuring support of a similar sort of information problems - information monitoring systems. Basic elements of monitoring system at the top level are the information space, in which information about the state of the problem/process circulates, and expert (experts), working with this information and making conclusions about the state of the problem/process and forecasts of its development. The information space represents a set of various information elements, which can be characterized as follows: •
diversity of the information carriers, i.e. fixing of the information in the articles, newspapers, computer kind, audio- and video- information etc.;
• fragmentariness: a piece of information most often concerns only some fragment of the problem, and different fragments may be "covered" by information to different extents;
• multiple levels: the information can concern the problem as a whole, some of its parts, or a particular element of the problem;
• varying degrees of reliability: the information can contain particular data of varying reliability, indirect data, and conclusions drawn from reliable information or from indirect reasoning;
• possible discrepancy: information from various sources can coincide, differ slightly, or contradict one another outright;
• variation in time: the problem develops in time, so information about the same element of the problem obtained at different moments may, and should, differ;
• possible bias: the information reflects certain interests of its source and can therefore be tendentious; in particular cases it may be deliberate misinformation (for example, in political problems or in problems connected with competition).

The experts are the active element of the monitoring system: observing and studying the elements of the information space, they draw conclusions about the state of the problem and the prospects of its development, taking the properties of the information space listed above into account.
10.3 Basic principles of information monitoring technology

Information monitoring systems allow:

• uniform processing of diverse, multi-level, fragmentary, unreliable information varying in time;
• evaluation of the status of the whole problem/process and/or of its particular aspects;
• simulation of various situations in the subject area;
• revealing the "critical ways" of the development of the problem/process, i.e. those elements of the problem a small change in whose status may qualitatively change the status of the problem/process as a whole.

Taking into account the features of the information listed above and the specific methods of its processing, the main features of the information monitoring technology can be stated as follows:
• The system provides a facility for taking into account data conveyed by different information vehicles (journals, video clips, newspapers, documents in electronic form, etc.). This is achieved by storing in the system database a reference to an evaluated piece of information when it is not a document in electronic form; when the information is a document in electronic form, both the evaluated information (or part of it) and a reference to it are stored. The system thus makes it possible to take into account, and use in an analysis, all pieces of information which relate to the subject area, irrespective of the vehicle concerned.
• The system makes it possible to process fragmentary information. For this purpose a considerable part of the model is represented in the form of a tree. For complex problems/processes such a representation is clearly a simplification; in return, it gives good presentation and simplicity of operation with the model.
• Information with different degrees of reliability, some of it possibly biased or tendentious, can be processed in the system. This is achieved by describing the influence of a particular piece of information on the status of the elements of the model with the aid of fuzzy linguistic values. It must be borne in mind that the evaluation of an element of the model may either change under the influence of the received information or remain unchanged (i.e. be confirmed).
• Time is one of the parameters of the system. This makes it possible to have a complete picture of the variation of the status of the model over time.
Thus, systems constructed on the basis of this technology maintain a model of the problem that develops in time. The model is supported by references to all the information materials chosen by the analysts, with overall and per-element evaluations of the status of the problem/process. Using time as one of the parameters of the system makes it possible to conduct retrospective analysis and to build forecasts of the development of the problem/process. There is also the possibility of identifying "critical points", i.e. elements of the model a small change in which can cause significant changes in the status of the whole problem/process. Knowledge of such elements has large practical significance: it allows measures to be worked out for blocking undesirable situations or achieving desirable ones, i.e. to steer, to some extent, the development of the problem/process in the desired direction.
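To make this concrete, here is a minimal sketch, in Python, of a hierarchical model of the kind described: each node carries time-stamped linguistic evaluations with references to their source materials, a parent's status is recomputed from its children, and queries at different moments support retrospective analysis. The node names, the three-value scale, and the "worst child wins" aggregation rule are invented for the example and are not taken from any of the systems discussed here.

    from dataclasses import dataclass, field

    SCALE = ["low", "medium", "high"]          # hypothetical linguistic values

    @dataclass
    class Node:
        name: str
        children: list = field(default_factory=list)
        history: list = field(default_factory=list)    # (time, value, reference to source)

        def evaluate(self, time, value, source):
            """Record an expert evaluation together with a reference to the source material."""
            self.history.append((time, value, source))

        def state(self, time):
            """Latest recorded evaluation not later than `time`; otherwise aggregate the children."""
            past = [v for (t, v, _) in self.history if t <= time]
            if past:
                return past[-1]
            if not self.children:
                return SCALE[0]
            # toy aggregation operator: a parent is as "hot" as its hottest child
            return max((c.state(time) for c in self.children), key=SCALE.index)

    leaf_a, leaf_b = Node("aspect A"), Node("aspect B")
    root = Node("problem", children=[leaf_a, leaf_b])
    leaf_a.evaluate(1, "medium", "journal article, ref #17")
    leaf_b.evaluate(2, "high", "video report, ref #4")
    print(root.state(1), root.state(2))        # medium high  (retrospective vs. current view)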
10.4 Theoretical Basis

For effective practical application of the proposed technological solutions it is necessary to address a series of theoretical problems; the corresponding results are given below.
10.4.1 Problem 1: Modeling of human perception

It is assumed that the expert describes the degree of inconsistency of the obtained information (for example, the readiness or potential readiness of certain processes in a country [3]) in the form of linguistic values. The subjective degree of convenience of such a description depends on the selection and composition of these linguistic values. Let us explain this with a model example.
Example 1. Suppose it is required to evaluate a quantity of plutonium. Let us consider two extreme situations ([3]).

Situation 1. Only two values are permitted: "small quantity" and "considerable quantity".

Situation 2. Many values are permitted: "very small quantity", "not very considerable quantity", ..., "neither small nor considerable quantity", ..., "considerable quantity".

Situation 1 is inconvenient: for many cases both permitted values may be unsuitable, and in describing them we have to choose between two "bad" values. Situation 2 is also inconvenient: in describing a specific quantity of nuclear material, several of the permitted values may be suitable, and we again face a problem, this time because we are forced to choose between two or more "good" values. Could some set of linguistic values be optimal in this respect?

It is assumed that the system tracks the development of the problem, i.e. its variation with time, and that it integrates the evaluations of different experts, so that one object may be described by different experts. It is therefore desirable to have assurance that different experts describe one and the same object in the most "uniform" way possible. On this basis we may formulate the first problem as follows.

Problem 1. Is it possible, taking into account certain features of human perception of objects of the real world and of their description, to formulate a rule for selecting the optimal set of values of characteristics on the basis of which these objects may be described?

Two optimality criteria are possible:

Criterion 1. We regard as optimal those sets of values whose use causes a person the minimum uncertainty in describing objects.

Criterion 2. If an object is described by a certain number of experts, we regard as optimal those sets of values which provide the minimum degree of divergence between the descriptions.

This problem may be reformulated as the problem of constructing an information granulation procedure that is optimal from the point of view of Criteria 1 and 2. Let us consider $t$ fuzzy variables with names $a_1, a_2, \dots, a_t$, specified on one universal set $U$. We shall call such a collection a semantic space $s_t$. Let us introduce a system of restrictions on the membership functions $\mu_j(u)$ of the fuzzy variables comprising $s_t$. We shall require that:

(1) for every $j$ ($1 \le j \le t$) there exists a non-empty set $U_j^1 = \{u \in U : \mu_j(u) = 1\}$, and $U_j^1$ is an interval or a point;

(2) for every $j$ ($1 \le j \le t$), $\mu_j(u)$ does not decrease to the left of $U_j^1$ and does not increase to the right of $U_j^1$ (since, by (1), $U_j^1$ is an interval or a point, the notions "to the left" and "to the right" are determined unambiguously).

Requirements 1 and 2 are quite natural for membership functions of concepts forming a semantic space. The first signifies that, for any concept used on the universal set, there exists at least one object which is a standard for the given concept.
If there are many such standards, they are positioned in a row and are not "scattered" around the universe. The second requirement signifies that if objects are "similar" in the metric sense on the universal set, they are also "similar" in the sense of membership in a given concept. Henceforth we shall need to use characteristic functions as well as membership functions, so we also impose the following technical condition:

(3) for every $j$ ($1 \le j \le t$), $\mu_j(u)$ has at most two points of discontinuity of the first kind.

For brevity, let us denote the requirements (1)-(3) by $L$. Let us also introduce a system of restrictions on the sets of membership functions of the fuzzy variables comprising $s_t$:

(4) completeness: $\forall u \in U \;\exists j \,(1 \le j \le t)$ such that $\mu_j(u) > 0$;

(5) orthogonality: $\forall u \in U \;\; \sum_{j=1}^{t} \mu_j(u) = 1$.

Requirements 4 and 5 also have a quite natural interpretation. Requirement 4, the completeness requirement, signifies that for any object from the universal set there exists at least one concept to which it may belong; in our semantic space there are no "holes". Requirement 5, the orthogonality requirement, signifies that we do not permit the use of semantically similar concepts or synonyms, and require sufficient distinction between the concepts used. Note that this requirement is fulfilled or not fulfilled depending on the method used for constructing the membership functions of the concepts forming the semantic space. Note also that all the results given below remain valid under a certain weakening of the orthogonality requirement [7], but describing it would require a series of additional concepts, so we keep the requirement in the form given. For brevity we denote requirements 4 and 5 by $G$. A semantic space consisting of fuzzy variables whose membership functions satisfy requirements (1)-(3), and whose collection satisfies requirements (4) and (5), is called a complete orthogonal semantic space; the set of such spaces is denoted $G(L)$.

As can be seen from Example 1, different semantic spaces have different degrees of internal uncertainty. Is it possible to measure this degree of uncertainty? For complete orthogonal semantic spaces the answer is yes. To prove this fact and derive the corresponding formula, we need a few additional concepts. Let there be a collection of $t$ membership functions $s_t \in G(L)$, $s_t = \{\mu_1(u), \dots, \mu_t(u)\}$. We call the collection of $t$ characteristic functions $\hat{s}_t = \{h_1(u), \dots, h_t(u)\}$ the most similar collection of characteristic functions if

$$h_j(u) = \begin{cases} 1, & \text{if } \mu_j(u) = \max_{1 \le k \le t} \mu_k(u), \\ 0, & \text{otherwise} \end{cases} \qquad (1 \le j \le t).$$

It is not difficult to see that if a complete orthogonal semantic space consists not of membership functions but of characteristic functions, then no uncertainty arises when describing objects in it: the expert unambiguously chooses the term $a_j$ if the object lies in the corresponding region of the universal set, and different experts describe one and the same object with one and the same term.
This situation may be illustrated as follows. Suppose we have scales of a certain accuracy and the opportunity to weigh a certain material, and we have agreed that if the weight of the material falls within a certain range it belongs to a certain category. Then the situation is described precisely. The problem is that for our task no such scales exist, nor do we have the opportunity to weigh the objects of interest on them. We can assume, however, that of two semantic spaces, the one with the lesser uncertainty is the one most "similar" to a space consisting of characteristic functions. In mathematics, similarity can be measured by a distance. Is it possible to introduce a distance between semantic spaces? For complete orthogonal semantic spaces it is.

Lemma 1. Let $s_t \in G(L)$, $s'_t \in G(L)$, $s_t = \{\mu_1(u), \dots, \mu_t(u)\}$, $s'_t = \{\mu'_1(u), \dots, \mu'_t(u)\}$, and let $\rho(f, g)$ be a metric in $L$. Then

$$\rho(s_t, s'_t) = \sum_{j=1}^{t} \rho(\mu_j, \mu'_j)$$

is a metric in $G_t(L)$.

The semantic considerations formulated above in the analysis of $s_t$ may be formalized as follows. Let $s_t \in G(L)$. As the measure of uncertainty of $s_t$ we take the value of a functional $\xi(s_t)$, defined on the elements of $G(L)$ and taking values in $[0, 1]$ (i.e. $\xi : G(L) \to [0, 1]$), satisfying the following conditions (axioms):

A1. $\xi(s_t) = 0$ if $s_t$ is a set of characteristic functions;

A2. Let $s_t, s'_{t'} \in G(L)$, where $t$ and $t'$ may or may not be equal. Then $\xi(s_t) \le \xi(s'_{t'})$ if $\rho(s_t, \hat{s}_t) \le \rho(s'_{t'}, \hat{s}'_{t'})$, where $\rho(\cdot, \cdot)$ is some metric in $G(L)$.

Do such functionals exist? The answer is given by the following theorem.

Theorem 1 (existence). Let $s_t \in G(L)$. Then the functional

$$\xi(s_t) = \frac{1}{|U|} \int_U f\bigl(\mu_{i^*}(u) - \mu_{i_*}(u)\bigr)\, du, \qquad (10.1)$$

where

$$\mu_{i^*}(u) = \max_{1 \le i \le t} \mu_i(u), \qquad \mu_{i_*}(u) = \max_{1 \le i \le t,\; i \ne i^*} \mu_i(u), \qquad (10.2)$$

and $f$ satisfies the requirements F1: $f(0) = 1$, $f(1) = 0$; F2: $f$ does not increase, is a measure of uncertainty of $s_t$, i.e. it satisfies A1 and A2.
There are many functionals satisfying the conditions of Theorem 1; they are described in detail in [13]. The simplest of them is the functional in which the function $f$ is linear: it is not difficult to see that conditions F1 and F2 are satisfied by the sole linear function $f(x) = 1 - x$. Substituting it in (10.1), we obtain the simplest measure of uncertainty of a complete orthogonal semantic space:

$$\xi(s_t) = \frac{1}{|U|} \int_U \Bigl(1 - \bigl(\mu_{i^*}(u) - \mu_{i_*}(u)\bigr)\Bigr)\, du, \qquad (10.3)$$

where $\mu_{i^*}(u)$ and $\mu_{i_*}(u)$ are determined by the relations (10.2). Let us denote the integrand in (10.3) by $\eta(s_t, u)$:

$$\eta(s_t, u) = 1 - \bigl(\mu_{i^*}(u) - \mu_{i_*}(u)\bigr).$$

We may now give the following interpretation of the measure of uncertainty (10.1).

Interpretation. Consider the process of describing objects in the framework of a semantic space $s_3 \in G(L)$ (Fig. 10.1).

Fig. 10.1. Interpretation of the measure of uncertainty

For the objects $u_1$ and $u_5$, a person will without hesitation select one of the terms ($a_1$ and $a_3$ respectively). For the object $u_2$ the user starts to hesitate between the terms $a_1$ and $a_2$; this hesitation increases and attains its peak for the object $u_4$, at which point the terms $a_1$ and $a_2$ are indistinguishable. All experts will be unanimous in describing the objects $u_1$ and $u_5$, while in describing $u_2$ a certain divergence arises which attains its peak for the object $u_4$. Turning to the formula for $\eta(s_t, u)$, it is not difficult to see that

$$0 = \eta(s_t, u_5) = \eta(s_t, u_1) < \eta(s_t, u_2) < \eta(s_t, u_3) < \eta(s_t, u_4) = 1.$$

Thus $\eta(s_t, u)$ indeed reflects the degree of uncertainty which a person experiences in describing objects in the framework of the corresponding semantic space, or the degree of divergence of expert opinion in such a description, and the degree of fuzziness $\xi(s_t)$ (10.1) is the average of this uncertainty over all objects of the universal set. The following theorems hold [7].
Let us define the following subsets of the function set $L$:

$\hat{L}$ is the set of functions from $L$ which are piecewise linear on $U$ and linear on $\hat{U} = \{u \in U : \forall j\,(1 \le j \le t)\ \mu_j(u) < 1\}$;

$\bar{L}$ is the set of functions from $L$ which are piecewise linear on $U$ (including $\hat{U}$).

Theorem 2. Let $s_t \in G(\hat{L})$. Then $\xi(s_t) = \frac{\Delta}{2|U|}$, where $\Delta = |\hat{U}|$.

Theorem 3. Let $s_t \in G(\bar{L})$. Then $\xi(s_t) = \frac{c\,\Delta}{2|U|}$, where $\Delta = |\hat{U}|$, $c \le 1$, $c = \mathrm{Const}$.
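As a small numerical illustration of (10.3) and Theorem 2 (the numbers are our own, not taken from the source): let $U = [0, 1]$ and take two terms with $\mu_1(u) = 1$ on $[0, 0.4]$, decreasing linearly to $0$ on $[0.4, 0.6]$ and equal to $0$ afterwards, and $\mu_2(u) = 1 - \mu_1(u)$. Then $\mu_{i^*}(u) - \mu_{i_*}(u) = |2\mu_1(u) - 1|$, the integrand of (10.3) vanishes outside $[0.4, 0.6]$, and

$$\xi(s_2) = \frac{1}{|U|}\int_U \bigl(1 - |2\mu_1(u) - 1|\bigr)\,du = \int_{0.4}^{0.6}\Bigl(1 - \frac{|0.5 - u|}{0.1}\Bigr)\,du = 0.1,$$

which agrees with Theorem 2, since here $\Delta = |\hat{U}| = 0.2$ and $\Delta/(2|U|) = 0.1$.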
Let $g$ be a one-to-one function defined on $U$. Such a function induces a transformation of any $s_t \in G_t(L)$ on the universe $U$ into $g(s_t)$ on the universe $U' = g(U) = \{u' : u' = g(u),\ u \in U\}$, defined as follows: $g(s_t)$ is the set of membership functions $\{\mu'_1(u'), \dots, \mu'_t(u')\}$, where $\mu'_j(u') = \mu'_j(g(u)) = \mu_j(g^{-1}(u')) = \mu_j(u)$, $\mu_j(u) \in s_t$, $1 \le j \le t$. The following example illustrates this definition.

Example 2. Let $s_t \in G(L)$, let $U$ be the universe of $s_t$, and let $g$ be an expansion (compression) of the universe $U$. Then $g(s_t)$ is the set of functions produced from $s_t$ by the same expansion (compression).

Theorem 4. Let $s_t \in G(L)$, let $U$ be the universe of $s_t$, let $g$ be a linear one-to-one function defined on $U$, and let $\xi(s_t) \ne 0$. Then $\xi(s_t) = \xi(g(s_t))$.

An important aspect of the practical use of any model is its stability. It is quite natural that in identifying the parameters of the model (in our case, when constructing the membership functions) small measurement errors can occur. If the model is sensitive to such errors, its practical use becomes problematic. Suppose, then, that the membership functions are not given exactly but only to within a certain "accuracy" $\delta$ (Fig. 10.2). We call this situation the $\delta$-model and denote it by $G^{\delta}(L)$.

Fig. 10.2. The $\delta$-model

In this situation we can calculate the upper ($\overline{\xi}(s_t)$) and lower ($\underline{\xi}(s_t)$) estimates of the degree of fuzziness $\xi(s_t)$.
Theorem 5. Let $s_t \in G^{\delta}(L)$. Then

$$\underline{\xi}(s_t) = \frac{\Delta}{2|U|}\,(1 - \delta_1), \qquad (10.4)$$

$$\overline{\xi}(s_t) = \frac{\Delta}{2|U|}\,(1 + 2\delta_1). \qquad (10.5)$$
Comparing the results of Theorem 2 and Theorem 5, we see that for small values of $\delta$ the main laws of the model are preserved. We can therefore use this technique for estimating the degree of fuzziness in practical tasks, since it has been shown to be stable. Based on the results described above, we can propose the following rule for selecting the optimal set of values of the characteristics by which objects are described (a code sketch of this procedure follows the list):

• All the "reasonable" sets of linguistic values are formulated.
• Each such set is represented in the form of a complete orthogonal semantic space.
• For each set the measure of uncertainty (10.1) is calculated.
• As the optimal set, minimizing both the uncertainty in the description of objects and the degree of divergence of expert opinions, we select the one whose uncertainty is minimal.
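The sketch below implements this rule under stated assumptions: the two candidate scales, the trapezoidal membership functions, and the numerical integration grid are invented for illustration; only the functional (10.3) comes from the text.

    import numpy as np

    U = np.linspace(0.0, 1.0, 2001)

    def trap(a, b, c, d):
        """Trapezoidal membership function with core [b, c] and support [a, d]."""
        def mu(u):
            up = np.clip((u - a) / (b - a), 0, 1) if b > a else (u >= a).astype(float)
            down = np.clip((d - u) / (d - c), 0, 1) if d > c else (u <= d).astype(float)
            return np.minimum(up, down)
        return mu

    def xi(scale):
        """Degree of fuzziness (10.3): mean of 1 - (largest - second-largest membership)."""
        M = np.array([mu(U) for mu in scale])
        top = np.sort(M, axis=0)[-2:, :]          # rows: second largest, largest
        return float(np.mean(1.0 - (top[1] - top[0])))

    # two hypothetical candidate term sets ("small/considerable" vs. a finer 3-term scale);
    # both are complete and orthogonal on [0, 1]
    scale_2 = [trap(0.0, 0.0, 0.4, 0.6), trap(0.4, 0.6, 1.0, 1.0)]
    scale_3 = [trap(0.0, 0.0, 0.2, 0.4), trap(0.2, 0.4, 0.6, 0.8), trap(0.6, 0.8, 1.0, 1.0)]

    candidates = {"2 terms": scale_2, "3 terms": scale_3}
    print({name: round(xi(s), 3) for name, s in candidates.items()})
    best = min(candidates, key=lambda name: xi(candidates[name]))
    print("optimal set by Criteria 1 and 2:", best)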
To summarize this section: it has been shown that a method can be formulated for selecting the optimal set of values of qualitative characteristics (a collection of granules [5]). Moreover, this method is stable: the natural small errors that may occur in constructing the membership functions do not significantly influence the selection of the optimal set of values, and the sets which are optimal according to Criteria 1 and 2 coincide. Following this method, we may describe objects with the minimum possible uncertainty, i.e. guarantee, from this point of view, optimal operation of the information monitoring system.

10.4.2 Problem 2: Information retrieval in fuzzy databases

Information monitoring technology assumes the storage of information materials (or references to them) and their linguistic evaluations in the system database. In this connection the following problem arises.

Problem 2. Is it possible to define indices of the quality of information retrieval in fuzzy (linguistic) databases and to formulate a rule for selecting a set of linguistic values whose use provides the maximum indices of quality of information retrieval?

This problem may be reformulated as the problem of constructing an information granulation procedure that is optimal from the point of view of information retrieval in fuzzy (linguistic) databases. Information monitoring systems are human-machine information systems: the user's evaluations of the accessible information materials are stored in the system database
and are used for evaluating the current status of a problem and for forecasting its development (see Section 10.3). In this sense the system database is the basis of the information model of the subject area, and the quality of this basis (and, accordingly, of the model of the problem) is expressed, in particular, through the parameters of information retrieval. If the database containing the linguistic descriptions of the objects of the subject area allows qualitative and effective search for relevant information, then the information monitoring system will also work qualitatively and effectively. As in Section 10.4.1, we assume that the set of linguistic values describing the degree of discrepancy of the received information, or the possibility of realization of some processes in a state, can be represented as an element of $G(L)$.

In our study of the process of information search in databases whose objects have a linguistic description, we introduced the concepts of information losses $\pi_X(U)$ and information noise $\nu_X(U)$. These concepts apply to information searches in databases whose attributes have a set of values $X$ modelled by the fuzzy sets in $s_t$. Informally their meaning is as follows. While interacting with the system, a user formulates a query for objects satisfying certain linguistic characteristics and gets an answer according to this search request. If he knew the real (not linguistic) values of the characteristics, he would probably delete some of the objects returned by the system (information noise), and he would probably add some other objects from the database which the system did not return (information losses). Information noise and information losses have their origin in the fuzziness of the linguistic descriptions of the characteristics. Let $N(u)$ denote the number of objects whose descriptions are stored in the database and which possess the real (physical, not linguistic) value $u$, and let $p_j$ ($j = 1, \dots, t$) be the probability that a query refers to the $j$-th value of the characteristic. The following theorems hold [8].

Theorem 6. Let $s_t \in G(\hat{L})$, $N(u) = N = \mathrm{Const}$ and $p_j = \frac{1}{t}$ ($j = 1, \dots, t$). Then

$$\pi_X(U) = \nu_X(U) = \frac{2N}{t}\,\xi(s_t).$$
Theorem 7. Let $s_t \in G(\bar{L})$, $N(u) = N = \mathrm{Const}$ and $p_j = \frac{1}{t}$ ($j = 1, \dots, t$). Then

$$\pi_X(U) = \nu_X(U) = c\,\xi(s_t),$$

where $c$ is a constant which depends only on $N$.

Theorem 8. Let $s_t \in G^{\delta}(L)$, $N(u) = N = \mathrm{Const}$ and $p_j = \frac{1}{t}$ ($j = 1, \dots, t$). Then

$$\underline{\pi}_X(U) = \underline{\nu}_X(U) = \frac{2N}{t}\,\underline{\xi}(s_t), \qquad \overline{\pi}_X(U) = \overline{\nu}_X(U) = \frac{2N}{t}\,\overline{\xi}(s_t).$$
Based on the results described above, we can propose the following rule for selecting a set of linguistic values whose use provides the maximum indices of quality of information retrieval:

• All the "reasonable" sets of linguistic values are formulated.
• Each such set is represented as a complete orthogonal semantic space.
• For each set the measure of uncertainty (10.1) is calculated.
• As the optimal set of linguistic values, providing the maximum indices of quality of information retrieval, we select the one for which the ratio $\xi(s_t)/t$ is minimal.
To summarize this section: it has been shown that indices of the quality of information retrieval in fuzzy (linguistic) databases can be introduced and formalized, and that a method can be formulated for selecting the optimal set of values of qualitative characteristics which provides the maximum quality of information retrieval. Moreover, this method is stable: natural small errors in the construction of the membership functions do not significantly affect the selection of the optimal set of values. This allows us to claim that the proposed methods can be used in practical tasks and guarantee, from this point of view, optimal operation of information monitoring systems.

10.4.3 Problem 3: Aggregation of information in fuzzy hierarchical systems

Because the model of the problem/process has a hierarchical structure (see Section 10.2), the choice and tuning of aggregation operators for the nodes of the model is another important issue in the development of IMS. We may formulate this problem as follows.

Problem 3. Is it possible to propose procedures of information aggregation in fuzzy hierarchical dynamic systems which allow us to minimize contradictions in the model of the problem/process in an IMS?

Let the model of the object or process be a tree $D$ with nodes $d_j$ ($j = 0, \dots, N_D$), each of which is associated with a set of linguistic values $X_j$ describing the state of the node. Every non-leaf node is associated with an operator of aggregation of information, which allows its state to be calculated (i.e. one element of the corresponding set of values to be chosen) from the states of its subordinate nodes. Frequently the choice of such an operator is determined by the properties of the model; for technical problems/processes, for example, min is often a good operator. Often, however, the choice is not so obvious (for example, for problems from political science, sociology or medicine). For these cases methods are needed for choosing adequate aggregation operators on the basis of the information available from experts and from analysis of the functioning of the model. Consider a non-leaf node $d_{j_0}$ with subordinate nodes (in the sense of the tree $D$) $d_{j_1}, d_{j_2}, \dots, d_{j_{N_{j_0}}}$. The operator of aggregation of information (OAI) is then a function defined on the set of all possible values of the subordinate nodes and taking values in the value set of $d_{j_0}$:

$$O_{j_0} : X_{j_1} \times X_{j_2} \times \dots \times X_{j_{N_{j_0}}} \to X_{j_0}. \qquad (10.6)$$
Each set $X_j$ ($j = 0, \dots, N_D$) is a set of linguistic values $a_j^1, a_j^2, \dots, a_j^{m_j}$. Let $M[O_{j_0}]$ denote the set of all possible OAI for the node $d_{j_0}$. It is obvious that for a concrete element of the model the number of possible OAI is large: from (10.6) it follows directly that

$$\bigl|M[O_{j_0}]\bigr| = |X_{j_0}|^{\,|X_{j_1}| \cdot |X_{j_2}| \cdots |X_{j_{N_{j_0}}}|}.$$

Our task is to choose a concrete operator $O_j \in M[O_j]$ for every non-leaf node $d_j$ of the model $D$. This choice is based on some information $I_j$ about the "ideal" OAI $O_j \in M[O_j]$, represented by two sets:

$$I_j = I_j^{(1)} \cup I_j^{(2)}, \qquad (10.7)$$

where $I_j^{(1)}$ is a set of statements of experts about the "correct behaviour" of $O_j$, and $I_j^{(2)}$ is a set of records of the "work" of $O_j$. The following statements illustrate $I_j^{(1)}$: "If $d_{j_1} = a_{j_1}^{k_1}$ and $d_{j_2} = a_{j_2}^{k_2}$, ..., and $d_{j_{N_{j_0}}} = a_{j_{N_{j_0}}}^{k_N}$, then $d_{j_0} = a_{j_0}^{k_0}$"; "If $d_{j_1}$ is strongly increasing, then $d_{j_0}$ is decreasing"; "$d_{j_0}$ is monotonically decreasing in all arguments"; etc. $I_j^{(2)}$ is a table of the following kind:
$(d_{j_1}, d_{j_2}, \dots, d_{j_{N_{j_0}}})$  |  $d_{j_0}$
$(a_{j_1}^{k_1}, a_{j_2}^{k_2}, \dots, a_{j_{N_{j_0}}}^{k_N})$  |  an element of $\{a_{j_0}^1, a_{j_0}^2, \dots, a_{j_0}^{m_{j_0}}\}$, or an empty record
...  |  ...

The left column of the table is the collection of all pairwise different combinations of values of $d_{j_1}, d_{j_2}, \dots, d_{j_{N_{j_0}}}$; the right column contains values of $d_{j_0}$ based on $I_j^{(1)}$, or empty records. At the start of the procedure the table contains only the $d_{j_0}$ values of the first type (those based on $I_j^{(1)}$). As information is received and evaluated by the user, the table is filled on the basis of calculations with the chosen operator $O_j$, up to the moment when the user disagrees with a "theoretical" value of $d_{j_0}$. If no such contradiction arises, the operator has been chosen successfully and is an adequate OAI for the given node of the model. If a contradiction arises, the procedure of choosing an adequate OAI must be repeated, but on the basis of the information $I_j^{(1)}$ and $I_j^{(2)}$ augmented and, perhaps, refined together with the expert. This process is repeated until the table is filled completely; the completed table is then an adequate OAI for the given node. Unfortunately, we cannot present all our algorithms for choosing an adequate OAI here owing to space limitations.
We can only state the following summary here: it is possible to propose approaches to this problem based on different interpretations of aggregation operators, namely geometrical, logical, and learning-based; the last includes learning based on genetic algorithms and learning based on neural networks. These approaches are described in detail in [6].
10.5 Application features

Several applied information monitoring systems based on the technology described above have been developed. Based on this experience, we can identify the following necessary stages of the development process:

• conceptual design;
• development of a demonstration prototype;
• development of a prototype of the system and its operational testing;
• development of the final system.

The size of this article does not allow these stages to be described in detail, so we focus only on matters of principle. The most difficult point in the development process is the elaboration of the structure of the problem/process model. In some well-developed areas (marketing, medicine) we used descriptions of the process from professional books and references (such as [1]) as a draft of the model, coordinated this draft with professional experts (at the conceptual design and demonstration prototype stages), and "tuned" the improved draft during testing of the system (at the prototype stage). Sometimes the problem/process to be monitored is already formalized well enough for the application of information monitoring technology. An example of this situation is the procedure for evaluating a state's nuclear programme in the IAEA [3]: the so-called physical model of the nuclear fuel cycle, developed earlier, was a good basis for the model of the problem/process in an information monitoring system. Based on this model, a prototype of an information monitoring system has been developed. This prototype allows [3]:

• Provide a tool for continuous monitoring of the status of the subject area.
• Provide the IAEA expert with a tool for entering into the system documents concerning the States' nuclear activities in textual form, or references to documents in the form of hard copies, video topics, audio reports, etc.
• Produce an evaluation of the influence of the obtained information on the status of elements of the model and change (or confirm) their status accordingly.
• Provide a tool for examining the status of the subject area at several levels of detail.
• Detect inconsistencies between the declared capabilities of States for processing nuclear material and those capabilities as established by the Agency through analysis of information from other available sources.
• Assess the importance of any detected inconsistencies from the point of view of a change in the States' capabilities to produce HEU and Pu.
• Detect "critical points" important from the point of view of production of HEU and Pu, information about which is crucial for resolving an inconsistency between a country's declaration and its capabilities for processing nuclear material as established by the Agency.
• Provide storage in its database of all the documents evaluated by the expert, or references to them, with linkage to specific elements of the model of the nuclear activity of a country.
• Provide the IAEA expert with a tool for retrospective analysis of changes in the evaluations of each element of the model, with the possibility of viewing the corresponding document or obtaining a reference to it.
• Record changes occurring in the system and provide the user with a tool for analyzing them.
10.6 Conclusion

IMS work with diverse, multi-level, fragmentary, unreliable information about some problem/process that varies in time, and allow the evolution of the problem/process to be monitored and strategic plans for its development to be worked out. The most difficult point in the development process is the elaboration of the structure of the problem/process model. A promising way to automate this step (the development of the model of the problem/process) is the application of advanced technologies such as data mining; our first experiments have shown that data mining can be a good tool for this task, especially when enough data on the problem/process is available. The methods developed for Problems 1-3 (Section 10.4) allow us to guarantee optimal operation of an IMS.
References

1. Kotler, P. Marketing Management (10th Edition). Prentice Hall, 1999, 784 p.
2. Messarovich, M.D., Macko, D., Takahara, Y. Theory of Hierarchical Multilevel Systems. Academic Press, New York - London, 1970, 344 p.
3. Ryjov, A., Belenki, A., Hooper, R., Pouchkarev, V., Fattah, A., Zadeh, L.A. Development of an Intelligent System for Monitoring and Evaluation of Peaceful Nuclear Activities (DISNA), IAEA, STR-310, Vienna, 1998, 122 p.
4. Ryjov, A. Estimation of fuzziness degree and its application in intelligent systems development. Intelligent Systems, Vol. 1, 1996, pp. 205-216 (in Russian).
5. Ryjov, A. Fuzzy information granulation as a model of human perception: some mathematical aspects. Proceedings of the Eighth International Fuzzy Systems Association World Congress, 1999, pp. 82-86.
6. Ryjov, A. On information aggregation in fuzzy hierarchical systems. Intelligent Systems, Vol. 6, 2001, pp. 341-364 (in Russian).
7. Ryjov, A. The Principles of Fuzzy Set Theory and Measurement of Fuzziness. Moscow, Dialog-MSU Publishing, 1988, 116 p. (in Russian).
8. Ryjov, A. Models of Information Retrieval in Fuzzy Databases. Moscow, MSU Publishing, 2004, 96 p. (in Russian).
9. Saaty, T.L. The Analysis of the Hierarchy Process. Moscow, Radio i Svyaz, 1993, 315 p. (in Russian).
10. Zadeh, L.A. Fuzzy sets. Information and Control, 1965, Vol. 8, pp. 338-353.
11. Zadeh, L.A. The concept of a linguistic variable and its application to approximate reasoning. Parts 1, 2, 3. Information Sciences, Vol. 8, pp. 199-249; Vol. 8, pp. 301-357; Vol. 9, pp. 43-80 (1975).
11 Modelling Unreliable and Untrustworthy Agent Behaviour

Marek Sergot
Department of Computing, Imperial College London, 180 Queen's Gate, London SW7 2BZ, UK
mj [email protected]

Summary. It cannot always be assumed that agents will behave as they are supposed to behave. Agents may fail to comply with system norms deliberately, in open agent systems or other competitive settings, or unintentionally, in unreliable environments, because of factors beyond their control. In addition to analysing the system properties that hold if specifications/norms are followed correctly, it is also necessary to predict, test, and verify the properties that hold if system norms are violated, and to test the effectiveness of introducing proposed control, enforcement, and recovery mechanisms. C+++ is an extended form of the action language C+ of Giunchiglia, Lee, Lifschitz, McCain, and Turner, designed for representing norms of behaviour and institutional aspects of (human or computer) societies. We present the permission component of C+++ and then illustrate on a simple example how it can be used in conjunction with standard model checkers for the temporal logic CTL to verify system properties in the case where agents may fail to comply with system norms.
11.1 Introduction

It is a common assumption in many multi-agent systems that agents will behave as they are intended to behave. Even in systems such as IMPACT [1], where the language of 'obligation' and 'permission' is employed in the specification of agent behaviour, there is an explicit, built-in assumption that agents always fulfil their obligations and never perform actions that are prohibited. For systems constructed by a single designer and operating on a stable and reliable platform, this is a perfectly reasonable assumption. There are at least two main circumstances in which the assumption must be abandoned. In open agent societies, where agents are programmed by different parties, where there is no direct access to an agent's internal state, and where agents do not necessarily share a common goal, it cannot be assumed that all agents will behave according to the system norms that govern their behaviour. Agents must be assumed to be untrustworthy because they act on behalf of parties with competing interests, and so may fail, or even choose not, to conform to the society's norms in order to achieve their individual goals. It is then usual to impose sanctions to discourage norm-violating behaviour and to provide some form of reparation when it does occur.
The second circumstance is where agents may fail to behave as intended because of factors beyond their control. This is likely to become commonplace as multi-agent systems are increasingly deployed in dynamic distributed environments. Agents in such circumstances are unreliable, but not because they deliberately seek to gain unfair advantage over others. Imposing sanctions to discourage norm-violating behaviour is pointless, though there is a point to specifying reparation and recovery norms. There is a third, less common, circumstance, where deliberate violations may be allowed in order to deal with exceptional or unanticipated situations. An example of discretionary violation of access control policies in computer security is discussed in [2]. In all these cases it is meaningful to speak of obligations and permissions, and to describe agent behaviour as governed by norms, which may be violated, accidentally or on purpose. In addition to analysing the system properties that hold if specifications/norms are followed correctly, it is also necessary to predict, test, and verify the properties that hold if these norms are violated, and to test the effectiveness of introducing proposed control, enforcement, and recovery mechanisms. In previous work [3, 4, 5] we presented a framework for specifying open agent societies in terms of permissions, obligations, and other more complex normative relations. Norms are represented in various action formalisms in order to provide an executable specification of the agent society. This work, however, did not address verification of system properties, except in a limited sense. In another strand of work [6, 7], we have addressed verification of system properties, but in the specific context of reasoning about knowledge in distributed and multi-agent systems. We showed that by adding a simple deontic component to the formalism of 'interpreted systems' [8] it is possible to determine formally which of a system's critical properties are compromised when agents fail to behave according to prescribed communication protocols, and then to determine formally the effectiveness of introducing additional controller agents whose role is to enforce compliance. In this paper we conduct a similar exercise, but focusing now on agent behaviours generally rather than on communication and epistemic properties specifically. We present the main elements of the formalism C+++ which we have been developing for representing norms of behaviour and institutional aspects of (human or computer) societies [9]; we then present a simple example to sketch how it can be used in modelling unreliable/untrustworthy agent behaviour. C+++ is an extended form of the action language C+ of Giunchiglia, Lee, Lifschitz, McCain, and Turner [10], a formalism for specifying and reasoning about the effects of actions and the persistence ('inertia') of facts over time. An 'action description' in C+ is a set of C+ rules which define a transition system of a certain kind. Implementations supporting a range of querying and planning tasks are available, notably in the form of the 'Causal Calculator' CCALC. Our extended version C+++ provides two main extensions. The first is a means of expressing 'counts as' relations between actions, also referred to as 'conventional generation' of actions. This feature will not be discussed in this paper.
The second extension is a way of specifying the permitted (acceptable, legal) states of a transition system and its permitted (acceptable, legal) transitions. This will be the focus of attention in this paper.
A main attraction of the C+ formalism compared to other action languages in the AI literature is that it has an explicit semantics in terms of transition systems and also a semantics in terms of a nonmonotonic formalism ('causal theories', summarised below) which provides a route to implementation via translations to executable logic programs. The emphasis in this paper is on the transition system semantics. Transition systems provide a bridge between AI formalisms and standard methods in other areas of computer science. We exploit this link by applying standard temporal logic model checkers to verify system properties of transition systems defined using the language C+++. Two points of clarification: (1) we do not distinguish in this paper between deliberate and unintentional norm violation, and (2) we are modelling agent behaviour from an external "bird's eye" perspective; we do not discuss an agent's internal state or its internal reasoning mechanisms.
11.2 The language C+

The language C was introduced by Giunchiglia and Lifschitz [11]. It applies the ideas of 'causal theories' to reasoning about the effects of actions and the persistence ('inertia') of facts ('fluents'), building on earlier suggestions by McCain and Turner. C+ extends C by allowing multi-valued fluents as well as Boolean fluents and generalises the form of rules in various ways. The definitive presentation of C+, and its relationship to 'causal theories', is [10]. An implementation supporting a range of querying and planning tasks is available in the form of the Causal Calculator (CCALC)¹. We present here a concise, and necessarily rather dense, summary of the language. Some features (notably 'statically determined fluents') are omitted for simplicity. There are also some minor syntactic and terminological differences from the version presented in [10], and we give particular emphasis to the transition system semantics.

Syntax and semantics

We begin with σ, a multi-valued propositional signature, which is partitioned into a (non-empty) set σ^f of fluent constants and a (non-empty) set σ^a of action constants. For each constant c ∈ σ there is a finite, non-empty set dom(c) of values. For simplicity, in this paper we will assume that each dom(c) has at least two elements. An atom of the signature is an expression c=v, where c ∈ σ and v ∈ dom(c); c=v is a fluent atom when c ∈ σ^f and an action atom when c ∈ σ^a. A Boolean constant is one whose domain is the set of truth values {t, f}. When c is a Boolean constant, we often write c for c=t and ¬c as a shorthand for c=f. Formulas are constructed from the atoms using the usual propositional connectives. The expressions ⊤ and ⊥ are 0-ary connectives, with the usual interpretation. A fluent formula is a formula whose constants all belong to σ^f; an action formula is a formula whose constants all belong to σ^a, except that ⊤ and ⊥ are fluent formulas but not action formulas.

¹ http://www.cs.utexas.edu/users/tag/cc
An interpretation of a multi-valued signature σ is a function mapping every constant c to some v ∈ dom(c); an interpretation X is said to satisfy an atom c=v if X(c) = v, and in this case we write X ⊨ c=v. The satisfaction relation ⊨ is extended from atoms to formulas in accordance with the standard truth tables for the propositional connectives. We let the expression I(σ) stand for the set of interpretations of σ. For convenience, we adopt the convention that an interpretation X of σ is identified with the set of atoms that are satisfied by X, i.e., X ⊨ c=v iff c=v ∈ X for any atom c=v of σ. Every action description D of C+ defines a labelled transition system (S, A, R) where:

• S is a (non-empty) set of states, each of which is an interpretation of the fluent constants σ^f of D; S ⊆ I(σ^f);
• A is a set of transition labels, sometimes referred to as action labels or events; A is the set of interpretations of the action constants σ^a, A = I(σ^a);
• R is a set of transitions, R ⊆ S × A × S.
For example: suppose there are three agents, a, b, and c, which can move in direction E, W, N, or S, or remain idle. Suppose (for the sake of an example) that they can also whistle as they move. Let the action signature consist of action constants move(a), move(b), move(c) with domains {E, W, N, S, idle}, and Boolean action constants whistle(a), whistle(b), whistle(c). Then one possible interpretation of the action signature, and therefore one possible transition label, is {move(a)=E, move(b)=N, move(c)=idle, whistle(a), ¬whistle(b), whistle(c)}. Because every transition label ε is an interpretation of the action signature σ^a, action formulas α can be evaluated on the transition labels. We sometimes say that a transition (s, ε, s′) is a transition of type α when ε ⊨ α.

An action description D in C+ is a set of causal laws, which are expressions of the following three forms. A static law is an expression:

F if G    (11.1)

where F and G are fluent formulas. Static laws express constraints on states. A fluent dynamic law is an expression:

F if G after ψ    (11.2)

where F and G are fluent formulas and ψ is any formula of signature σ^f ∪ σ^a. Informally, (11.2) states that fluent formula F is satisfied by the resulting state s′ of any transition (s, ε, s′) with s ∪ ε ⊨ ψ, as long as fluent formula G is also satisfied by s′. Some examples follow. An action dynamic law is an expression:

α if ψ    (11.3)

where α is an action formula and ψ is any formula of signature σ^f ∪ σ^a. Action dynamic laws are used to express, among other things, that any transition of type α
must also be of type α′ (α′ if α), or that any transition from a state satisfying fluent formula G must be of type β (β if G). Examples will be provided in later sections. The C+ language provides various abbreviations for common forms of causal laws. We will employ the following in this paper.

α causes F if G expresses that fluent formula F is satisfied by any state following the occurrence of a transition of type α from a state satisfying fluent formula G. It is shorthand for the fluent dynamic law F if ⊤ after G ∧ α. α causes F is shorthand for F if ⊤ after α.

nonexecutable α if G expresses that there is no transition of type α from a state satisfying fluent formula G. It is shorthand for the fluent dynamic law ⊥ if ⊤ after G ∧ α, or α causes ⊥ if G.

inertial f states that the values of the fluent constant f persist by default ('inertia') from one state to the next. It is shorthand for the collection of fluent dynamic laws f=v if f=v after f=v, for every v ∈ dom(f).

Of most interest are definite action descriptions, which are action descriptions in which the head of every law (static, fluent dynamic, or action dynamic) is either an atom or the symbol ⊥, and in which no atom is the head of infinitely many laws of D. We will restrict attention to definite action descriptions in this paper.

Now for the semantics. (See [9] for further details.) Let T_static(s) stand for the heads of all static laws in D whose bodies are satisfied by s; let E(s, ε, s′) stand for the heads of all fluent dynamic laws in D whose bodies are satisfied by the transition (s, ε, s′); and let A(ε, s) stand for the heads of all action dynamic laws whose bodies are satisfied by the transition (s, ε, s′):

T_static(s) =def {F | F if G is in D, s ⊨ G}
E(s, ε, s′) =def {F | F if G after ψ is in D, s′ ⊨ G, s ∪ ε ⊨ ψ}
A(ε, s) =def {α | α if ψ is in D, s ∪ ε ⊨ ψ}

Let D be a definite action description and σ^f its fluent signature. A set s of fluent atoms is a state of D iff it satisfies the static laws of D, that is, iff

s ⊨ T_static(s)   (i.e., T_static(s) ⊆ s).

(s, ε, s′) is a transition of D iff s and s′ are interpretations of the fluent signature σ^f and ε is an interpretation of the action signature σ^a such that:

• s ⊨ T_static(s)   (T_static(s) ⊆ s; s is a state of D);
• s′ = T_static(s′) ∪ E(s, ε, s′);
• ε ⊨ A(ε, s)   (A(ε, s) ⊆ ε).
One can see from the definition that s′ is a state of D when (s, ε, s′) is a transition of D.

Paths

Finally, when (S, A, R) is a labelled transition system, a path of length m is a sequence s₀ ε₀ s₁ ... s_{m−1} ε_{m−1} s_m (m ≥ 0) such that (s_{i−1}, ε_{i−1}, s_i) ∈ R for i ∈ 1..m. We will also be interested in infinite (ω-length) paths.
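The following sketch expresses these definitions in code. It is illustrative only: formulas are restricted to conjunctions of atoms so that satisfaction and the sets T_static, E, and A are trivial to compute, and the two example laws (one effect law and one inertia instance for a single fluent loc_a) are our own cut-down fragment, not a full action description.

    def satisfies(interp, atoms):
        """interp: dict constant -> value; atoms: list of (constant, value) pairs (a conjunction)."""
        return all(interp.get(c) == v for (c, v) in atoms)

    # laws with atomic heads and conjunctive bodies:
    #   "static": F if G      "fluent": F if G after psi      "action": alpha if psi
    laws = [
        {"kind": "fluent", "head": ("loc_a", "t"), "if": [], "after": [("a", "go"), ("loc_a", "W")]},
        {"kind": "fluent", "head": ("loc_a", "W"), "if": [("loc_a", "W")], "after": [("loc_a", "W")]},  # one instance of: inertial loc(a)
    ]

    def T_static(s):
        return {l["head"] for l in laws if l["kind"] == "static" and satisfies(s, l["if"])}

    def E(s, e, s1):
        se = {**s, **e}
        return {l["head"] for l in laws
                if l["kind"] == "fluent" and satisfies(s1, l["if"]) and satisfies(se, l["after"])}

    def A(e, s):
        se = {**s, **e}
        return {l["head"] for l in laws if l["kind"] == "action" and satisfies(se, l["if"])}

    def is_state(s):
        return T_static(s) <= set(s.items())

    def is_transition(s, e, s1):
        return (is_state(s)
                and set(s1.items()) == T_static(s1) | E(s, e, s1)
                and A(e, s) <= set(e.items()))

    print(is_transition({"loc_a": "W"}, {"a": "go"}, {"loc_a": "t"}))    # True: the effect law fires
    print(is_transition({"loc_a": "W"}, {"a": "stay"}, {"loc_a": "t"}))  # False: nothing causes loc_a=t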
Causal theories

The language C+ can be regarded as a higher-level notation for defining particular classes of theories in the non-monotonic formalism of 'causal theories', and indeed this is how it is presented in [10]. For present purposes the important points are these: for every (definite) action description D and non-negative integer m there is a natural translation from D to a causal theory Γ_m^D which encodes the paths of length m in the transition system defined by D; moreover, for every definite causal theory Γ_m^D there is a formula comp(Γ_m^D) of (classical) propositional logic whose (classical) models are in 1-1 correspondence with the paths of length m in the transition system defined by D. Thus, one method of computation for C+ action descriptions is to construct the formula comp(Γ_m^D) from the action description D and then employ a (standard, classical) satisfaction solver to determine the models of comp(Γ_m^D). This is the method employed in the 'Causal Calculator' CCALC. We summarise the main steps for completeness; the reader may wish to skip the details on first reading. A full account is given in [10].

A causal theory of signature σ is a set of expressions ('causal rules') of the form F ⇐ G, where F and G are formulas of signature σ. F is the head of the rule and G is the body. A rule F ⇐ G is to be read as saying that F is 'caused' if G is true, or (perhaps better) that there is an explanation for the truth of F if G is true. Let Γ be a causal theory and let X be an interpretation of its signature. The reduct Γ^X is the set of heads of all rules of Γ whose bodies are satisfied by the interpretation X: Γ^X =def {F | F ⇐ G is a rule in Γ and X ⊨ G}. X is a model of Γ iff X is the unique model (in the sense of multi-valued signatures) of Γ^X. A causal theory Γ is definite if the head of every rule of Γ is an atom or ⊥, and no atom is the head of infinitely many rules of Γ. Every definite causal theory Γ can be translated into a formula comp(Γ) of (classical) propositional logic via the process of 'literal completion': for each atom c=v construct the formula c=v ↔ G₁ ∨ ... ∨ G_n, where G₁, ..., G_n (n ≥ 0) are the bodies of the rules of Γ with head c=v; comp(Γ) is the conjunction of all such formulas together with the formulas ¬F for each rule of the form ⊥ ⇐ F in Γ. The models of a definite causal theory Γ are precisely the (classical) models of its literal completion comp(Γ).

Given an action description D in C+ and any non-negative integer m, translation to the corresponding causal theory Γ_m^D proceeds as follows. The signature of Γ_m^D is obtained by time-stamping every fluent and action constant of D with non-negative integers between 0 and m: the (new) atom f[i]=v represents that fluent f=v holds at integer time i, or more precisely, that f=v is satisfied by the state s_i of a path s₀ ε₀ ... ε_{m−1} s_m of the transition system defined by D; the atom a[i]=v represents that action atom a=v is satisfied by the transition ε_i of such a path. In what follows, ψ[i] is shorthand for the formula obtained by replacing every atom c=v in ψ by the time-stamped atom c[i]=v. Now, for every static law F if G in D and every i ∈ 0..m, include in Γ_m^D a causal rule of the form
For every fluent dynamic law F \f G after ipin D and every i G 0.. m—1, include a causal rule of the form F[i-\-l]<=G[i+l]AiP\i] And for every action dynamic law o; if -0 in D and every i e 0.. m—1, include a causal rule of the form a[i] 4= ip[i]
We also require the following 'exogeneity laws'. For every fluent constant f and every v ∈ dom(f), include a causal rule

f[0]=v ⇐ f[0]=v,

and for every action constant a, every v ∈ dom(a), and every i ∈ 0..m−1, include a causal rule

a[i]=v ⇐ a[i]=v.

(There are some further complications in the full C+ language concerning 'statically determined' fluents and non-exogenous actions, which we are ignoring here for simplicity.) It is straightforward to check [10] that the models of the causal theory Γ_m^D, and hence the (classical) models of the propositional logic formula comp(Γ_m^D), correspond 1-1 to the paths of length m of the transition system defined by the C+ action description D. In particular, models of comp(Γ_1^D) encode the transitions defined by D and models of comp(Γ_0^D) the states defined by D. Given an action description D and a non-negative integer m, the 'Causal Calculator' CCALC performs the translation to the causal theory Γ_m^D, constructs comp(Γ_m^D), and then invokes a standard propositional satisfaction solver to find the (classical) models of comp(Γ_m^D). So, for example, plans of length m from an initial state satisfying fluent formula F to a goal state satisfying fluent formula G can be found by determining the models of the (classical) propositional formula comp(Γ_m^D) ∧ F[0] ∧ G[m]. It must be emphasised, however, that C+ is a language for defining labelled transition systems (of a certain kind), and is not restricted to use with CCALC. A variety of other languages can be interpreted on the transition system defined by a C+ action description. In particular, in later sections we will look at the use of the branching-time temporal logic CTL for expressing system properties to be checked on transition systems defined by C+.

Example (trains)

The following example is used in [12, 13] to illustrate the use of alternating-time logic (ATL) for determining the effectiveness of 'social laws' designed to co-ordinate the actions of agents in a multi-agent system. We will use the example for a different purpose: in this section, to illustrate use of the language C+, and in later sections, to show how the extended form C+++ can be used to analyse variants of the example in which agents may fail to obey social laws.
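To give a feel for literal completion, the sketch below completes a tiny definite causal theory. The theory itself is invented, formulas are kept as plain strings, and the output is simply a list of propositional equivalences; it is meant only to show the shape of comp(Γ), not to replace CCALC.

    from collections import defaultdict

    # a definite causal theory: rules  head_atom <= body_formula  (bodies as strings)
    rules = [
        ("p=t", "p=t"), ("p=f", "p=f"),        # exogeneity rules for p
        ("q=t", "p=t"), ("q=f", "~(p=t)"),     # q is "caused" to copy the value of p
    ]
    constraints = []                            # rules of the form  _|_ <= F

    def literal_completion(rules, constraints, domains):
        bodies = defaultdict(list)
        for head, body in rules:
            bodies[head].append(body)
        comp = []
        for const, values in domains.items():
            for v in values:
                atom = f"{const}={v}"
                rhs = " | ".join(f"({b})" for b in bodies[atom]) or "false"
                comp.append(f"({atom}) <-> ({rhs})")
        comp += [f"~({f})" for f in constraints]
        return comp

    for clause in literal_completion(rules, constraints, {"p": ["t", "f"], "q": ["t", "f"]}):
        print(clause)
    # the models of the conjunction of these clauses make q copy the value of p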
There are two trains, a and b, with a running clockwise round a double track and b running anti-clockwise. There is a tunnel in which the double track becomes a single track. If the trains are both inside the tunnel at the same time they will collide. The tunnel can thus be seen as a kind of critical section, or as a resource for which the trains must compete.
There are obviously many ways in which the example can be formulated. The following will suffice for present purposes. Although it may seem unnecessarily complicated, this formulation is convenient for the more elaborate versions of the example to come later. Let fluent constants loc(a) and loc(b) represent the locations of trains a and b respectively; both have possible values {W, t, E}. For action constants, we take a and b with possible values {go, stay}. (Action constants act(a) and act(b) may be easier to read, but we choose a and b for brevity.) The C+ action description representing the possible movements of the trains is as follows. We will call this action description D_trains.

inertial loc(a), loc(b)

train a moves clockwise:
    a=go causes loc(a)=t if loc(a)=W
    a=go causes loc(a)=E if loc(a)=t
    a=go causes loc(a)=W if loc(a)=E

train b moves anti-clockwise:
    b=go causes loc(b)=t if loc(b)=E
    b=go causes loc(b)=W if loc(b)=t
    b=go causes loc(b)=E if loc(b)=W

collisions:
    collision iff loc(a)=t ∧ loc(b)=t        % for convenience
    nonexecutable a=go if collision
    nonexecutable b=go if collision

The Boolean fluent constant collision is introduced for convenience²; the example can be formulated perfectly well without it. The transition system defined by D_trains is shown in Fig. 11.1.

² For readers familiar with C+, a law of the form F iff G is used here as shorthand for the pair of laws F if G and default ¬F. default ¬F is a C+ abbreviation for the law ¬F if ¬F.
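Before turning to the figure, here is a small sketch that enumerates the states and transitions of D_trains directly, hand-coding the effect of the causal laws (the inertia of loc(a) and loc(b), the movement effects, and the nonexecutable laws) rather than going through CCALC; the encoding of a state as a pair (loc(a), loc(b)) is ours.

    from itertools import product

    LOC = ["W", "t", "E"]
    ACT = ["go", "stay"]
    next_cw  = {"W": "t", "t": "E", "E": "W"}   # train a runs clockwise
    next_acw = {"E": "t", "t": "W", "W": "E"}   # train b runs anti-clockwise

    states = list(product(LOC, LOC))            # (loc(a), loc(b)); collision is the state ("t", "t")

    def transitions():
        for (la, lb), (ea, eb) in product(states, product(ACT, ACT)):
            if (la, lb) == ("t", "t") and "go" in (ea, eb):
                continue                        # nonexecutable a=go / b=go if collision
            la2 = next_cw[la] if ea == "go" else la     # effect laws plus inertia
            lb2 = next_acw[lb] if eb == "go" else lb
            yield (la, lb), (ea, eb), (la2, lb2)

    R = list(transitions())
    print(len(states), "states,", len(R), "transitions")
    print(any(s2 == ("t", "t") for (_, _, s2) in R))    # True: a collision is reachable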
Fig. 11.1. The transition system for the trains example. A state label such as EW is short for {loc(a)=E, loc(b)=W}. Horizontal edges, labelled a-, are transitions in which train a moves and b does not. Vertical edges, labelled -b, are transitions in which train b moves and a does not. Diagonal edges, unlabelled in the diagram, are transitions in which both trains move. Reflexive edges, corresponding to transitions in which neither train moves, are omitted from the diagram to reduce clutter.
11.3 The language C+++ 0+"^"^ is an extended form of the language C+ designed for representing norms of behaviour and institutional aspects of (human or computer) societies [9]. It provides two main extensions to C-f-. The first is a means of expressing ^counts as' relations between actions, also referred to as 'conventional generation' of actions. That will not be discussed further in this paper. The second extension is a way of specifying the permitted (acceptable, legal) states of a transition system and its permitted (acceptable, legal) transitions. Syntax and semantics An action description of C-\-'^'^ defines a coloured transition system, which is a structure of the form: (5, A, R^ 5g, R^) where {S^ A, R) is a labelled transition system of the kind defined by C+ action descriptions, and where the two new components are • •
• Sg ⊆ S, the set of 'permitted' ('acceptable', 'ideal', 'legal') states—we call Sg the 'green' states of the system;
• Rg ⊆ R, the set of 'permitted' ('acceptable', 'ideal', 'legal') transitions—we call Rg the 'green' transitions of the system.
We refer to the complements S − Sg and R − Rg as the 'red states' and 'red transitions', respectively. Semantical devices which partition states (and here, transitions) into two categories are familiar in the field of deontic logic, where they are known to yield rather simplistic logics; full discussion of their adequacy is outside the scope of this paper. It is also possible to consider a more elaborate structure, of partially coloured transition systems in which states and transitions can be green, red, or uncoloured, but we shall not present that version here.
A coloured transition system (S, A, R, Sg, Rg) must further satisfy the following constraint:
• if (s, ε, s′) ∈ Rg and s ∈ Sg then s′ ∈ Sg.
We refer to this as the green-green-green constraint. The idea is that occurrence of a permitted (green) transition in a permitted (green) state must always lead to a permitted (green) state. All other possible combinations of green/red states and green/red transitions are allowed. In particular, and contra the assumptions underpinning John-Jules Meyer's construction of 'dynamic deontic logic' [14], a non-permitted (red) transition can result in a permitted (green) state. Similarly, it is easy to devise examples in which a permitted (green) transition can lead to a non-permitted (red) state. Some illustrations will arise in the examples to be considered later. The only combination that cannot occur is the one eliminated by the 'green-green-green' constraint: a permitted (green) transition from a permitted (green) state cannot lead to a non-permitted (red) state. The language C+++ extends the language C+ with two new forms of rules. A state permission law is an expression of the form

not-permitted F
(11.4)
where F is a fluent formula. An action permission law is an expression of the form

not-permitted α if ψ
(11.5)
where α is an action formula and ψ is any formula of signature σf ∪ σa. not-permitted α is an abbreviation for not-permitted α if ⊤. It is also convenient to allow two variants of rule forms (11.4) and (11.5), allowing oblig F as an abbreviation for not-permitted ¬F and oblig α as an abbreviation for not-permitted ¬α. Informally, in the transition system defined by an action description D, a state s is red whenever s ⊨ F for some state permission law not-permitted F. All other states are green by default. A transition (s, ε, s′) is red whenever s ∪ ε ⊨ ψ and ε ⊨ α for some action permission law not-permitted α if ψ. All other transitions are green, subject to the 'green-green-green' constraint which may impose further conditions on the possible colouring of a given transition. Let D be an action description of C+++. Dbasic refers to the subset of laws of D that are also laws of C+. The transition system defined by D has the states S and transitions R that are defined by its C+ component Dbasic, and green states Sg and green transitions Rg given by Sg =def S − Sred, Rg =def R − Rred, where

Sred =def {s | s ⊨ F for some law not-permitted F in D}
Rred =def {(s, ε, s′) | s ∪ ε ⊨ ψ, ε ⊨ α for some law not-permitted α if ψ in D}
          ∪ {(s, ε, s′) | s ∈ Sg and s′ ∉ Sg}

The second component of the Rred definition ensures that the 'green-green-green' constraint is satisfied.
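As a rough illustration only (not part of C+++ itself, and with invented names), the colouring just defined can be computed directly on an explicitly given transition system; the last step below enforces the green-green-green constraint.

    def colour(states, transitions, red_state_laws, red_transition_laws):
        # returns (green states, green transitions) of the coloured transition system
        red_states = {s for s in states if any(law(s) for law in red_state_laws)}
        green_states = states - red_states
        red_transitions = {(s, e, t) for (s, e, t) in transitions
                           if any(law(s, e) for law in red_transition_laws)}
        # green-green-green: a transition leaving a green state and entering a red
        # state cannot itself be green
        red_transitions |= {(s, e, t) for (s, e, t) in transitions
                            if s in green_states and t in red_states}
        return green_states, transitions - red_transitions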
Example Consider the trains example of section 11.2. A collision is undesirable, unacceptable, not permitted ('red'). Construct an action description D1 of C+++ by adding to the C+ action description D_trains of section 11.2 the state permission law

not-permitted collision
(11.6)
The coloured transition system defined by D1 is the transition system of Fig. 11.1 with the collision state tt coloured red and all other states coloured green. The three transitions leading to the collision state are coloured red because of the green-green-green constraint; all other transitions, including the transition from the collision state to itself, are green.

Causal theories

Any (definite) action description of C+++ can be translated to the language of (definite) causal theories, as follows. Let D be an action description and m a non-negative integer. The translation of the C+ component Dbasic of D proceeds as usual. For the permission laws, introduce two new fluent and action constants, status and trans respectively, both with possible values green and red. They will be used to represent the colour of a state and the colour of a transition, respectively. For every state permission law not-permitted F and time index i ∈ 0..m, include in Γ^m_D a causal rule of the form status[i]=red ⇐ F[i], and for every i ∈ 0..m, a causal rule of the form status[i]=green ⇐ status[i]=green to specify the default colour of a state. A state permission law of the form oblig F produces causal rules of the form status[i]=red ⇐ ¬F[i]. For every action permission law not-permitted α if ψ and time index i ∈ 0..m−1, include in Γ^m_D a causal rule of the form trans[i]=red ⇐ α[i] ∧ ψ[i], and for every i ∈ 0..m−1, a causal rule of the form trans[i]=green ⇐ trans[i]=green to specify the default colour of a transition. An action permission law of the form oblig α if ψ produces causal rules of the form trans[i]=red ⇐ ¬α[i] ∧ ψ[i]. Finally, to capture the 'green-green-green' constraint, include for every i ∈ 0..m−1 a causal rule of the form

trans[i]=red ⇐ status[i]=green ∧ status[i+1]=red
(11.7)
It is straightforward to show [9] that models of the causal theory Γ^m_D correspond to all paths of length m through the coloured transition system defined by D, where the fluent constant status and the action constant trans encode the colours of the states and transitions, respectively. Notice that, although action descriptions in C+++ can be translated to causal theories, they cannot be translated to action descriptions of C+: there is no form of causal law in C+ which translates to the green-green-green constraint (11.7). In addition to permission laws of the form (11.4) and (11.5), which are convenient but rather restrictive, the C+++ language allows the distinguished fluent and action constants status and trans to be used explicitly in formulas and causal laws. The atoms status=red and trans=red can then be regarded as what are sometimes called 'violation constants' in deontic logic. It is also easy to allow more 'shades' of red and green to allow different notions of permitted/legal/acceptable to be mixed. We will not employ that device in the examples discussed in this paper.
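As an illustration of this translation (a sketch only, not CCALC's implementation), the extra causal rules contributed by one state permission law not-permitted F and one action permission law not-permitted α if ψ, for a given bound m, can be generated mechanically; rules are printed as (head, body) pairs.

    def permission_rules(m, F="F", alpha="alpha", psi="psi"):
        rules = []
        for i in range(m + 1):
            rules.append((f"status[{i}]=red", f"{F}[{i}]"))
            rules.append((f"status[{i}]=green", f"status[{i}]=green"))   # default colour
        for i in range(m):
            rules.append((f"trans[{i}]=red", f"{alpha}[{i}] & {psi}[{i}]"))
            rules.append((f"trans[{i}]=green", f"trans[{i}]=green"))      # default colour
            rules.append((f"trans[{i}]=red",                              # green-green-green
                          f"status[{i}]=green & status[{i + 1}]=red"))
        return rules

    for head, body in permission_rules(1):
        print(head, "<=", body)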
Example (trains, continued) The action description D1 of the previous section states that collisions are not permitted but says nothing about how the trains should ensure that collisions are avoided. Suppose, for the sake of an example, that we impose additional norms (social laws), as follows: no train is permitted to enter the tunnel unless the other train has just emerged. (We assume that this will be observed by the train that is preparing to enter.) Will such a law be effective in avoiding collisions? To construct a C+++ action description D2 for this version, ignore (11.6) and instead add to the C+ action description D_trains the following laws. First, it is convenient to define the following auxiliary action constants (all Boolean):

enter(a) iff a=go ∧ loc(a)=W        exit(a) iff a=go ∧ loc(a)=t
enter(b) iff b=go ∧ loc(b)=E        exit(b) iff b=go ∧ loc(b)=t
(11.8)
Again, these are introduced merely for convenience; the example can be constructed easily enough without them. Now we formulate the social laws:

not-permitted enter(a) if loc(b)≠W
not-permitted enter(b) if loc(a)≠E

(11.9)

The coloured transition system for this version of the example is shown in Fig. 11.2. Notice that since we are now using green/red to represent what is permitted/not permitted from the point of view of train behaviour, we have discarded the state permission law (11.6). Consequently the collision state tt is coloured green not red. We could combine the two notions of permission expressed by laws (11.6) and (11.9), for instance by introducing two different shades of green and red and relating them to each other, but we do not have space to discuss that option here. How do we test the effectiveness of the social laws (11.9)? Since the causal theory Γ^1_D2 encodes the transitions defined by D2, the following captures the property that if both trains comply with the social laws, no collisions will occur.

comp(Γ^1_D2) ⊨ ¬collision[0] ∧ trans[0]=green → ¬collision[1]
(11.10)
This can be checked, as in CCALC, by using a standard SAT solver to determine that the formula comp(Γ^1_D2) ∧ ¬collision[0] ∧ trans[0]=green ∧ collision[1] is not satisfiable. The property (11.10) is equivalently expressed as:

comp(Γ^1_D2) ⊨ ¬collision[0] ∧ collision[1] → trans[0]=red
(11.11)
which says that a collision occurs only following a transition in which either one train or both violate the norms. Notice that comp(Γ^1_D2) ⊭ trans[0]=green → ¬collision[1]: as formulated by D2, the transition from a collision state to itself is green.
Fig. 11.2. Coloured transition system defined by action description D2. Dotted lines indicate red transitions. All states and all other transitions are green. Reflexive edges (all green) are omitted for clarity.

One major advantage of taking C+ as the basic action formalism, as we see it, is its explicit transition system semantics, which enables a wide range of other analytical techniques to be applied. In particular, system properties can be expressed in the branching time temporal logic CTL and verified on the transition system defined by a C+ or C+++ action description using standard model checking systems. We will say that a formula φ of CTL is valid on a (coloured) transition system (S, A, R, Sg, Rg) defined by a C+++ action description D when s ∪ ε ⊨ φ for every s ∪ ε such that (s, ε, s′) ∈ R for some state s′. The definition is quite standard, except for a small adjustment to allow action constants in φ to continue to be evaluated on transition labels ε. (And we do not distinguish any particular set of initial states; all states in S are initial states.) We will also say in that case that formula φ is valid on the action description D. In CTL, the formula AX φ expresses that φ is satisfied in the next state in all future branching paths from now.^ EX is the dual of AX: EX φ = ¬AX ¬φ. EX φ expresses that φ is satisfied in the next state of some future branching path from now. The properties (11.10) and (11.11) can thus be expressed in CTL as follows:

¬collision ∧ trans=green → AX ¬collision
(11.12)
or equivalently ¬collision ∧ EX collision → trans=red. It is easily verified by reference to Fig. 11.2 that these formulas are valid on the action description D2. Also valid is the CTL formula EX trans=green which expresses that there is always a permitted action for both trains. This is true even in collision states, since the only available transition is then the one where both trains remain idle, and that transition is green. The CTL formula EF collision is also valid on D2, signifying that in every state there is at least one path from then on with collision true somewhere in the future.^
^s0 ∪ ε0 ⊨ AX φ if for every infinite path s0 ε0 s1 ε1 … we have that s1 ∪ ε1 ⊨ φ.
^s0 ∪ ε0 ⊨ EF φ if there is an (infinite) path s0 ε0 … sm εm … with sm ∪ εm ⊨ φ for some m ≥ 0.
11.4 Example: a simple co-ordination mechanism

We now consider a slightly more elaborate version of the trains example. In general, we want to be able to verify formally whether the introduction of additional control mechanisms—additional controller agents, communication devices, restrictions on agents' possible actions—are effective in ensuring that agents comply with the norms ('social laws') that govern their behaviour. For the trains, we might consider a controller of some kind, or traffic lights, or some mechanism by which the trains communicate their locations to one another. For the sake of an example, we will suppose that there is a physical token (a metal ring, say) which has to be collected before a train can enter the tunnel. A train must pick up the token before entering the tunnel, and it must deposit it outside the tunnel as it exits. No train may enter the tunnel without possession of the token. To construct the C+++ action description D3 for this version of the example, we begin as usual with the C+ action description D_trains of section 11.2. We add a fluent constant tok to represent the position of the token. It has values {W, E, a, b}. tok=W represents that the token is lying at the West end of the tunnel, tok=a that the token is currently held by train a, and so on. We add Boolean action constants pick(a), pick(b) to represent that a (resp., b) picks up the token, and drop(a), drop(b) to represent that a (resp., b) drops the token at its current location. For convenience, we will keep the action constants enter(a), enter(b), exit(a), exit(b) defined as in D2 of the previous section. The following causal laws describe the effects of picking up and dropping the token. To avoid duplication, x and l are variables ranging over a and b and over locations W, E, t respectively.

inertial tok
drop(x) causes tok=l if tok=x ∧ loc(x)=l
nonexecutable drop(x) if tok≠x
pick(x) causes tok=x
nonexecutable pick(x) if loc(x)≠tok
The above specifies that the token can be dropped by train x only if train x has the token (tok=x), and it can be picked up by train x only if train x and the token are currently at the same location (loc(x)=tok). Notice that, as defined, an action drop(x) ∧ x=stay drops the token at the current location of train x, and drop(x) ∧ x=go drops it at the new location of train x after it has moved. Since tok=t is not a well-formed atom, it is not possible that (there is no transition in which) the token is dropped inside the tunnel. pick(x) ∧ x=go represents an action in which train x picks up the token and moves with it. More refined representations could of course be constructed but this simple version will suffice for present purposes. The action description D3 is completed by adding the following permission laws:

not-permitted enter(x) if tok≠x ∧ ¬pick(x)
oblig drop(x) if exit(x) ∧ tok=x
It may be helpful to note that in C+++, the first of these laws is equivalent to

oblig pick(x) if enter(x) ∧ tok≠x

The coloured transition system defined by action description D3 is larger and more complicated than that for D2 of the previous section, and cannot be drawn easily in its entirety. A fragment is shown in Fig. 11.3.
Fig. 11.3. Fragment of the coloured transition system defined by D3. The figure shows all states but not all transitions. The dash in state labels indicates the position of the token: it is at W/E when the dash is on the left/right, and with train a/b when the dash appears above the location of a/b. Dotted lines depict red transitions. All other depicted transitions, and all states, are green.

One property we might wish to verify on D3 is that collisions are guaranteed to be avoided if both trains comply with the norms ('social laws'). Using the 'Causal Calculator' CCALC, we can try to determine whether

comp(Γ^m_D3) ⊨ ¬collision[0] ∧ trans[0]=green ∧ … ∧ trans[m−1]=green → ¬collision[m]

that is, whether the formula comp(Γ^m_D3) ∧ ¬collision[0] ∧ trans[0]=green ∧ … ∧ trans[m−1]=green ∧ collision[m] is satisfiable (the entailment holds just in case it is not). But what should we take as the
length m of the longest path to be considered? In some circumstances it is possible to find a suitable value of m for the longest path to be considered, but it is far from obvious in this example what that value is, or even if one exists. The problem can be formulated conveniently as a model checking problem in CTL. The CTL formula E[trans=green U collision] expresses that there is at least one path with collision true at some future state and trans=green true on all intervening transitions.^ So the property we want to verify can be expressed in CTL as follows:

¬collision → ¬E[trans=green U collision]
(11.14)
It can be seen from Fig. 11.3 that property (11.14) is not valid on the action description D3: there are green transitions leading to collision states, from states where there is already a train inside the tunnel without the token. However, as long as we consider states in which both trains are initially outside the tunnel, the safety property we seek can be verified. The following formula is valid on D3:

loc(a)≠t ∧ loc(b)≠t → ¬E[trans=green U collision]
(11.15)
We are often interested in the verification of formulas such as (11.14) and (11.15) which express system properties conditional on norm compliance (conditional on all transitions being green). Verification of such properties is particularly easy: translate the coloured transition system M = (S, A, R, Sg, Rg) to the transition system M′ = (Sg, A, Rg) obtained by deleting all red states and red transitions from M. Now, since in CTL E[⊤ U φ] = EF φ, instead of checking, for example, formula (11.15) on M we can check whether

loc(a)≠t ∧ loc(b)≠t → ¬EF collision
(11.16)
is valid on M′. This is a standard model checking problem.
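A minimal sketch of that check (assuming the coloured transition system is given explicitly; all names below are illustrative): delete the red states and red transitions, then decide EF collision by ordinary forward reachability.

    from collections import deque

    def ef(transitions, start, goal):
        # CTL 'EF goal' from 'start': is some state satisfying goal reachable?
        seen, queue = {start}, deque([start])
        while queue:
            s = queue.popleft()
            if goal(s):
                return True
            for (s1, _e, s2) in transitions:
                if s1 == s and s2 not in seen:
                    seen.add(s2)
                    queue.append(s2)
        return False

    def check_safety(green_states, green_transitions, outside_tunnel, collision):
        # property (11.16): from every green state with both trains outside the
        # tunnel, no collision state is reachable via green transitions only
        return all(not ef(green_transitions, s, collision)
                   for s in green_states if outside_tunnel(s))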
11.5 Conclusion

We have presented the permission component of the action language C+++ and sketched how it can be applied to modelling systems in which agents do not necessarily comply with the norms ('social laws') that govern their behaviour. Space limitations prevented us from discussing more elaborate examples where non-compliance with one set of norms imposes further norms for reparation and/or recovery. We are currently working on the use of C+++ as the input language for various temporal logic model checkers, CTL in particular. Scalability is of course an issue; however, here the limits are set by the model checking techniques employed and not by the use of C+++. At the MSRAS workshop, our attention was drawn to the model checking technique of [15] which uses program transformations on constraint logic programs representing transition systems to verify formulas of CTL. Since action descriptions in C+, and in C+++, can be related via causal theories to logic programs [10], this presents an interesting avenue to explore.
^s0 ∪ ε0 ⊨ E[φ1 U φ2] if there is an (infinite) path s0 ε0 … sm εm … with sm ∪ εm ⊨ φ2 for some m ≥ 0 and with si ∪ εi ⊨ φ1 for all 0 ≤ i < m.
References

1. Subrahmanian, V.S., Bonatti, P., Dix, J., Eiter, T., Kraus, S., Ozcan, F., Ross, R.: Heterogeneous Agent Systems. MIT Press, Cambridge (2000)
2. Rissanen, E., Sadighi Firozabadi, B., Sergot, M.J.: Towards a mechanism for discretionary overriding of access control (position paper). In: Proc. 12th International Workshop on Security Protocols, Cambridge, April 2004 (2004)
3. Artikis, A., Pitt, J., Sergot, M.J.: Animated specification of computational societies. In Castelfranchi, C., Johnson, W.L., eds.: Proc. 1st International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS'02), Bologna, ACM Press (2002) 1053-1062
4. Artikis, A., Sergot, M.J., Pitt, J.: Specifying electronic societies with the Causal Calculator. In Giunchiglia, F., Odell, J., Weiss, G., eds.: Agent-Oriented Software Engineering III. Proc. 3rd International Workshop (AOSE 2002), Bologna, July 2002. LNCS 2585, Springer (2003) 1-15
5. Artikis, A., Sergot, M.J., Pitt, J.: An executable specification of an argumentation protocol. In: Proc. 9th International Conference on Artificial Intelligence and Law (ICAIL'03), Edinburgh, ACM Press (2003) 1-11
6. Lomuscio, A., Sergot, M.J.: Deontic interpreted systems. Studia Logica 75 (2003) 63-92
7. Lomuscio, A., Sergot, M.J.: A formalisation of violation, error recovery, and enforcement in the bit transmission problem. Journal of Applied Logic 2 (2004) 93-116
8. Fagin, R., Halpern, J.Y., Moses, Y., Vardi, M.Y.: Reasoning about Knowledge. MIT Press, Cambridge (1995)
9. Sergot, M.: The language C+++. In Pitt, J., ed.: The Open Agent Society. Wiley (2004) (In press). Extended version: Technical Report 2004/8, Department of Computing, Imperial College, London
10. Giunchiglia, E., Lee, J., Lifschitz, V., McCain, N., Turner, H.: Nonmonotonic causal theories. Artificial Intelligence 153 (2004) 49-104
11. Giunchiglia, E., Lifschitz, V.: An action language based on causal explanation: Preliminary report. In: Proc. AAAI-98, AAAI Press (1998) 623-630
12. van der Hoek, W., Roberts, M., Wooldridge, M.: Social laws in alternating time: Effectiveness, feasibility, and synthesis. Technical report, Dept. of Computer Science, University of Liverpool (2004). Submitted
13. Jamroga, W., van der Hoek, W., Wooldridge, M.: On obligations and abilities. In Lomuscio, A., Nute, D., eds.: Proc. 7th International Workshop on Deontic Logic in Computer Science (DEON'04), Madeira, May 2004. LNAI 3065, Springer (2004) 165-181
14. Meyer, J-J.: A different approach to deontic logic: Deontic logic viewed as a variant of dynamic logic. Notre Dame Journal of Formal Logic 29 (1988) 109-136
15. Fioravanti, F., Pettorossi, A., Proietti, M.: Verifying CTL properties of infinite state systems by specializing constraint logic programs. In: Proc. Second ACM-Sigplan International Workshop on Verification and Computational Logic (VCL'01), Florence, September 2001 (2001) 85-96. Expanded version: Technical Report R.544, IASI-CNR, Rome
12 Nearest Neighbours without k

Hui Wang¹, Ivo Düntsch², Günther Gediga², and Gongde Guo¹

¹ School of Computing and Mathematics, University of Ulster at Jordanstown, Northern Ireland, BT37 0QB, United Kingdom, h.wang|[email protected]
² Department of Computer Science, Brock University, St. Catharines, Ontario, L2S 3A1, Canada, duentsch|[email protected]

Summary. In data mining, the k-Nearest-Neighbours (kNN) method for classification is simple and effective [3, 1]. The success of kNN in classification is dependent on the selection of a "good" value for k, so in a sense kNN is biased by k. However, it is unclear what a universally good value for k is. We propose to solve this choice-of-k issue by an alternative formalism which uses a sequence of values for k. Each value for k defines a neighbourhood for a data record - a set of k nearest neighbours, which contains some degree of support for each class with respect to the data record. It is our aim to select a set of neighbourhoods and aggregate their supports to create a classifier less biased by k. To this end we use a probability function G, which is defined in terms of a mass function for events weighted by a measurement of events. A mass function is an assignment of basic probability to events. In the case of classification, events can be interpreted as neighbourhoods, and the mass function can be interpreted in terms of class proportions in neighbourhoods. Therefore, a mass function represents degrees of support for a class in various neighbourhoods. We show that under this specification G is a linear function of the conditional probability of classes given a data record, which can be used directly for classification. Based on these findings we propose a new classification procedure. Experiment shows that this classification procedure is indeed less biased by k, and that it displays a saturating property as the number of neighbourhoods increases. Experiment further shows that the performance of our classification procedure at saturation is comparable to the best performance of kNN. Consequently, when we use kNN for classification we do not need to be concerned with k; instead, we need to select a set of neighbourhoods and apply the procedure presented here.

Key words: k-nearest neighbour, data mining and knowledge discovery, Dempster-Shafer theory, contextual probability, pignistic probability
12.1 Introduction

k-Nearest-Neighbours (kNN) is a popular method for classification. It is simple but effective in many cases [3, 2, 1]. For a data record t to be classified, its k nearest neighbours are retrieved, which form a neighbourhood of t. Majority voting with or without weighting among the data records in the neighbourhood is used to decide the classification for t. To apply kNN we need to choose a value for k and a metric for selecting nearest neighbours, and the success of classification is very much dependent on the choices of k and the metric. In a sense kNN is biased by k and the metric. In this paper we look at the choice-of-k issue only. There are many ways of choosing the k value, but a simple one is to run the algorithm many times with different k values and choose the one with best performance. This approach is effective, but it lacks theoretical justification. Each neighbourhood contains some degree of support for all classes. If we can aggregate all these supports, we could end up with a less biased classification in the sense that it is not too dependent on a single value for k. We call this the aggregation problem. This is the motivation for the work reported in this paper. The Bayes rule can be applied to this problem as follows. Let D be a dataset, t be a record to be classified, and A1, A2, …, An be a series of neighbourhoods of t corresponding to different k values k1, k2, …, kn. For example, Ai is the set of the nearest ki neighbours of t. Then we have t ∈ Ai for i = 1, …, n. The problem can then be formulated as the calculation of P(c | t, A1, A2, …, An), where c is a class label. According to the Bayes rule we have

P(c | t, A1, A2, …, An) = P(t, A1, …, An | c) P(c) / P(t, A1, …, An)
                        = P(t | c) P(c) / P(t),  since t ∈ Ai for all i
                        = P(c | t)
Consequently, the neighbourhoods do not play any role at all! To calculate P(c|t) we can take either the regression approach or the class-conditional approach [3], which are two well-studied research areas. In order to aggregate the various supports from neighbourhoods for classification we can try to simply add some measure of the supports. Since the neighbourhoods are not mutually exclusive (i.e., Ai ∩ Aj may be non-empty), probability theory does not apply directly. Dempster-Shafer (D-S) theory does not require mutual exclusiveness, but it gives two numbers as lower (belief) and upper (plausibility) probabilities. This may give rise to difficulty in using the two numbers for classification. This further motivated us to seek a probability function that is additive, does not require mutual exclusiveness, and produces a single number. In the nearest neighbours methodology, the closer the neighbours are to a data record the more relevant they are. In the same spirit different neighbourhoods should play different
roles in classification, and the smaller a neighbourhood the more relevant it is in classification. This effort resulted in a probability function, which generalises the classical probability function. This function takes into account any set of neighbourhoods, and the smaller a neighbourhood the more significant its contribution.
12.2 Contextual probability

Let Ω be a finite set called the frame of discernment. A mass function is m : 2^Ω → [0, 1] such that

Σ_{X⊆Ω} m(X) = 1
(12.1)
The mass function is interpreted as a representation (or measure) of knowledge about Ω, and m(A) is interpreted as a degree of support for A. Our objective is to extend our knowledge to those events that we do not know explicitly in m. Therefore we consider a function G : 2^Ω → [0, 1] such that for any A ⊆ Ω

G(A) = Σ_{X⊆Ω} m(X) · |A ∩ X| / |X|
(12.2)
The interpretation of the above definition is as follows. Event A may not be known explicitly in the representation of our knowledge, but we know explicitly some events X that are related to it (i.e., A overlaps with X, or A ∩ X ≠ ∅). Part of the knowledge about X, m(X), should then be shared by A, and a measure of this part is |A ∩ X|/|X|.

Theorem 1. G is a probability function on Ω. That is to say,
1. For any A ⊆ Ω, G(A) ≥ 0;
2. G(Ω) = 1;
3. For A1, A2 ⊆ Ω, G(A1 ∪ A2) = G(A1) + G(A2) if A1 ∩ A2 = ∅.

Proof. The first claim follows from the fact that m(X) ≥ 0 for any X ⊆ Ω; equality holds when A = ∅. The second claim is true since G(Ω) = Σ_{X⊆Ω} m(X) = 1. Let us now consider the third claim. X ∩ (A1 ∪ A2) = (X ∩ A1) ∪ (X ∩ A2). If A1 ∩ A2 = ∅ then |X ∩ (A1 ∪ A2)| = |X ∩ A1| + |X ∩ A2|. As a result we have
G(A1 ∪ A2) = Σ_{X⊆Ω} m(X) · |X ∩ (A1 ∪ A2)| / |X|
           = Σ_{X⊆Ω} m(X) · (|X ∩ A1| + |X ∩ A2|) / |X|
           = G(A1) + G(A2)
We therefore call G a contextual probability function to emphasize the fact that the probability values are derived from various "contexts".^ For simplicity, if A is a singleton set, e.g., A = {a}, we write G(a) for G({a}). Now we look at an example.

Example 1. Let Ω = {a, b, c, d, e, f}, and the mass function m be as follows:

m({a, b}) = 0.3
m({a, b, c}) = 0.4
m({a, b, c, d}) = 0.1
m({a, b, c, d, e, f}) = 0.2
Suppose that we are interested in the probabilities of the events {a}, {b}, {c}, {d}, {e}, {f}, {b, c}, {a, b, d}. According to the definition of the G function, we have

G(a) = m({a, b}) × |{a} ∩ {a, b}|/|{a, b}| + m({a, b, c}) × |{a} ∩ {a, b, c}|/|{a, b, c}|
       + m({a, b, c, d}) × |{a} ∩ {a, b, c, d}|/|{a, b, c, d}| + m({a, b, c, d, e, f}) × |{a} ∩ {a, b, c, d, e, f}|/|{a, b, c, d, e, f}|
     = 0.3 × 1/2 + 0.4 × 1/3 + 0.1 × 1/4 + 0.2 × 1/6 = 41/120

Similarly we have G(b) = G(a), G(c) = 23/120, G(d) = 7/120, G(e) = 4/120, and G(f) = G(e). Clearly G(a) + G(b) + G(c) + G(d) + G(e) + G(f) = 1. Further on, we have

G({a, b, d}) = m({a, b}) × |{a, b, d} ∩ {a, b}|/|{a, b}| + m({a, b, c}) × |{a, b, d} ∩ {a, b, c}|/|{a, b, c}|
               + m({a, b, c, d}) × |{a, b, d} ∩ {a, b, c, d}|/|{a, b, c, d}| + m({a, b, c, d, e, f}) × |{a, b, d} ∩ {a, b, c, d, e, f}|/|{a, b, c, d, e, f}|
             = 0.3 + 0.4 × 2/3 + 0.1 × 3/4 + 0.2 × 3/6 = 89/120 = G(a) + G(b) + G(d)

Similarly we have G({b, c}) = 64/120 = G(b) + G(c).
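The calculation is easy to reproduce mechanically. The following short sketch (exact arithmetic via fractions; the mass function is the one of Example 1) computes G directly from Eq. (12.2).

    from fractions import Fraction

    mass = {
        frozenset("ab"): Fraction(3, 10),
        frozenset("abc"): Fraction(4, 10),
        frozenset("abcd"): Fraction(1, 10),
        frozenset("abcdef"): Fraction(2, 10),
    }

    def G(A):
        A = frozenset(A)
        return sum(m * Fraction(len(A & X), len(X)) for X, m in mass.items())

    print(G("a"), G("abd"))   # 41/120 89/120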
12.3 An interpretation of the mass function The mass function can be interpreted in different ways. In this section we present one interpretation in order to solve the aggregation problem. ^The contextual probability is also known as pignistic probability, which was invented by Smets (see, for example, [4]). The probability (contextual or pignistic) function serves as a theoretical basis for the novel methods developed in this paper.
Let S be a finite set of class labels, and Ω be a (finite) dataset each element of which has a class label in S. The labelling is denoted by a function f : Ω → S so that for x ∈ Ω, f(x) is the class label of x. Consider a class c ∈ S. Let N =def |Ω|, N_c =def |{x ∈ Ω : f(x) = c}|, and M_c =def Σ_{X⊆Ω} P(c|X). The mass function for c is defined as m_c : 2^Ω → [0, 1] such that, for A ⊆ Ω,

m_c(A) =def P(c|A) / Σ_{X⊆Ω} P(c|X) = P(c|A) / M_c
(12.3)
Clearly Σ_{X⊆Ω} m_c(X) = 1. Let C(N, n) be the combinatorial number representing the number of ways of picking n unordered outcomes from N possibilities. From combinatorics we know that

C(N, n) = N! / ((N − n)! n!)
Lemma 1. If the distribution over Ω is uniform, then M_c = (N_c/N)(2^N − 1).

Proof.
M_c = Σ_{X⊆Ω} P(c|X)                                           (definition)
    = Σ_{X⊆Ω} P(X|c) P(c) / P(X)                                (Bayes rule)
    = Σ_{X⊆Ω} P(X|c) (N_c/N) / (|X|/N)                          (uniform distribution)
    = N_c Σ_{i=1}^{N} (1/i) Σ_{X⊆Ω, |X|=i} P(X|c)
    = N_c Σ_{i=1}^{N} (1/i) Σ_{X⊆Ω, |X|=i} Σ_{x∈X} P(x|c)       (probability property)
    = N_c Σ_{i=1}^{N} (1/i) C(N−1, i−1) Σ_{x∈Ω} P(x|c)
    = N_c Σ_{i=1}^{N} (1/i) C(N−1, i−1)                         (probability distribution property)
    = (N_c/N) Σ_{i=1}^{N} C(N, i)                               (combination)
    = (N_c/N)(2^N − 1)                                          (combinatorics)
Based on the mass function we define G_c : 2^Ω → [0, 1] such that, for A ⊆ Ω,

G_c(A) = Σ_{X⊆Ω} m_c(X) · |A ∩ X| / |X|
(12.4)
From the result in Section 12.2 we know that G_c(A) is a probability function, so it has all the properties of a probability function. Let

α_c =def (1/M_c) Σ_{i=1}^{N} (1/i²) (C(N−1, i−1) − C(N−2, i−2))  and  β_c =def (N_c/M_c) Σ_{i=1}^{N} (1/i²) C(N−2, i−2).

Since M_c = (N_c/N)(2^N − 1) we have β_c = (N/(2^N − 1)) Σ_{i=1}^{N} (1/i²) C(N−2, i−2). Clearly β_c is independent of c, so we drop the subscript and write β.

Theorem 2. If the distribution over Ω is uniform then, for a ∈ Ω and c ∈ S,

G_c(a) = P(c|a) α_c + β

Proof. By definition we have

G_c(a) = Σ_{X⊆Ω, a∈X} m_c(X) / |X| = (1/M_c) Σ_{X⊆Ω, a∈X} P(c|X) / |X|

According to the Bayes rule, P(c|X) = P(c) P(X|c) / P(X). If the distribution over Ω is uniform, then P(c) = N_c/N and P(X) = |X|/N. Therefore we have P(c|X) = (N_c/|X|) P(X|c). As a result we have

G_c(a) = (N_c/M_c) Σ_{X⊆Ω, a∈X} P(X|c) / |X|²

Note also that P(a|c) = P(c|a) P(a) / P(c). If the distribution is uniform, P(a) = 1/N and P(c) = N_c/N. Therefore P(a|c) = P(c|a)/N_c. Following similar workings as in the proof of Lemma 1 we then have

G_c(a) = (N_c/M_c) Σ_{i=1}^{N} (1/i²) {C(N−1, i−1) P(a|c) + C(N−2, i−2) [1 − P(a|c)]}
       = N_c P(a|c) α_c + β = P(c|a) α_c + β
Now we consider an example.

Example 2. Consider a set Ω = {a, b, c} with uniform probability distribution. There are two classes (+ and −) on the elements: {a: +, b: −, c: −}. We want to show the relationship between P(c|A) and G_c(A) for A ⊆ Ω. Here N = 3, N_+ = 1, and N_− = 2. Then we have M_+ = 7/3, M_− = 14/3, α_+ = 15/28, α_− = 15/56, and β = 13/84. The P(A) values are obtained by the uniform distribution assumption and the additive property of probability functions. The P(c|A) are calculated from P(A, c) and P(A) by P(c|A) = P(A, c)/P(A). The m_c(A) are calculated according to Eq. 12.3. The G_c(A) are calculated according to Eq. 12.2. The results of calculation are shown in Table 12.1. Now we illustrate the calculation of G_c(a).
(n\
rr. /f^iN • ^4.({a, 6}) + m^.({fl, c}) , m.^,({Q,6,c}) _ 3
_3^
i_ _
^
"7'^14'^21~84
G_−(a) = (m_−({a, b}) + m_−({a, c}))/2 + m_−({a, b, c})/3 = 3/28 + 2/(3 × 14) = 13/84

Note that due to the additive property of G, if |A| > 1 we can obtain G_c(A) according to G_c(A) = Σ_{x∈A} G_c(x). From Table 12.1 we can verify that the relationship between G_c(A) and P(c|A) holds for singleton A. We can also verify that the same relationship does not hold for non-singleton A. For example P(+|{a, b}) α_+ + β = 1/2 × 15/28 + 13/84 = 71/168, whereas G_+({a, b}) = 71/84. The above example is meant to show the relationship between P(c|A) and G_c(A). In practice our knowledge is only up to a certain higher granularity level, and it is our aim to apply the knowledge to infer on more detailed cases. For some tasks (e.g., classification), we do not know P(c|x) for some x ∈ Ω, but we may know P(c|A) for some A ⊆ Ω. The task can be tackled by approximating P(c|x). Now we present an example to show how this can be done.
12.4 kNN for classification using multiple neighbourhoods Based on the contextual probability, we designed and implemented a kNN classification procedure based on multiple neighbourhoods - nokNN classifier or just nokNN for short. To classify a new data record we consider its h neighbourhoods, each of which is determined as in the standard kNN method. Supports for the class membership of the data record from these neighbourhoods are aggregated to give a combined class
Table 12.1. Ω = {a, b, c} with uniform probability distribution. The elements in the set fall into two classes (+ and −): {a: +, b: −, c: −}. The table is meant to show the relationship between P and G.

A        | {a}    | {b}    | {c}    | {a,b}  | {a,c}  | {b,c}   | {a,b,c}
P(A)     | 1/3    | 1/3    | 1/3    | 2/3    | 2/3    | 2/3     | 1
P(A, +)  | 1/3    | 0      | 0      | 1/3    | 1/3    | 0       | 1/3
P(A, −)  | 0      | 1/3    | 1/3    | 1/3    | 1/3    | 2/3     | 2/3
P(+|A)   | 1      | 0      | 0      | 1/2    | 1/2    | 0       | 1/3
P(−|A)   | 0      | 1      | 1      | 1/2    | 1/2    | 1       | 2/3
m_+(A)   | 3/7    | 0      | 0      | 3/14   | 3/14   | 0       | 1/7
m_−(A)   | 0      | 3/14   | 3/14   | 3/28   | 3/28   | 3/14    | 2/14
G_+(A)   | 58/84  | 13/84  | 13/84  | 71/84  | 71/84  | 26/84   | 1
G_−(A)   | 13/84  | 71/168 | 71/168 | 97/168 | 97/168 | 142/168 | 1
distribution, and the classification is done by simply choosing the class which has the highest conditional probability. This algorithm was evaluated with real world datasets in order to see if and how aggregating different neighbourhoods improves classification accuracy. In the experiment we used 7 public datasets available from the UC Irvine Machine Learning Repository. General information about these datasets is shown in Table 12.2.

Table 12.2. General information about the datasets used in the experiment including number of attributes, number of examples, and number of classes.

Data       | #Attr. | #Exa. | #Cls
Australian | 14     | 690   | 2
Colic      | 22     | 368   | 2
Diabetes   | 8      | 768   | 2
Hepatitis  | 19     | 155   | 2
Iris       | 4      | 150   | 3
Sonar      | 60     | 208   | 2
Wine       | 13     | 178   | 3
In order to compare with the standard kNN we also implemented the standard voting kNN. kNN and nokNN were both implemented in C++.^ In the experiment, 10 neighbourhoods were used and, for every dataset, kNN was run with varying neighbourhoods (e.g., 1st neighbourhood, 2nd neighbourhood) and nokNN was run with a varying number of neighbourhoods (e.g., 1 neighbourhood, 2 neighbourhoods). Due to the page limit we cannot show full details of the results. Figures 12.1 and 12.2 show the full details for one dataset - Diabetes - and the averages for all datasets. Table 12.3 shows the worst and best performance of kNN along with the corresponding "k" values, and the performance of nokNN when all 10 neighbourhoods were used. Note that 5-fold cross validation was used.
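For readers who want the flavour of the aggregation step described at the start of this section, here is a much simplified sketch (it is not the authors' C++ implementation, and the 1/k weighting is only meant to mimic the spirit of the G function, in which smaller neighbourhoods contribute more): each neighbourhood contributes its class proportions, and the class with the largest aggregated support is returned.

    from collections import Counter

    def nok_nn(query, data, labels, dist, ks=range(1, 11)):
        order = sorted(range(len(data)), key=lambda i: dist(query, data[i]))
        support = Counter()
        for k in ks:
            counts = Counter(labels[i] for i in order[:k])   # k nearest neighbours
            for c, n in counts.items():
                support[c] += (n / k) / k    # class proportion, weighted down for large k
        return support.most_common(1)[0][0]

    # toy usage with a 2-D dataset and squared Euclidean distance
    print(nok_nn([1.0, 1.5], [[0, 0], [1, 2], [3, 3]], ["A", "B", "B"],
                 dist=lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q)),
                 ks=range(1, 4)))   # -> B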
Fig. 12.1. kNN results: classification accuracy (% correct) for the Diabetes dataset and the average over all datasets, plotted against the neighbourhood used (1 to 10).
From the experiment result it is clear that the standard voting kNN performance varies when different neighbourhoods are used, while nokNN performance improves with an increasing number of neighbourhoods but stabilises after a certain stage.
^The implementation can be found at http://www.infj.ulst.ac.uk/~cbcj23/papers/nokNN.html.
Fig. 12.2. nokNN results: classification accuracy (% correct) for the Diabetes dataset and the average over all datasets, plotted against the count of neighbourhoods used (1 to 10).
Table 12.3. The worst and best performance of kNN along with the corresponding values for k. Also the performance of nokNN when 10 neighbourhoods are used.

Dataset    | kNN worst (k, %correct) | kNN best (k, %correct) | nokNN, all 10 (%correct)
Australian | 2, 83.04                | 10, 85.48              | 85.15
Colic      | 7, 79.64                | 2, 82.89               | 82.63
Diabetes   | 1, 71.73                | 2, 74.22               | 74.86
Hepatitis  | 1, 78.71                | 2, 79.35               | 79.35
Iris       | 1, 93.33                | 3, 96.00               | 96.00
Sonar      | 10, 65.89               | 1, 72.08               | 76.43
Wine       | 3, 89.29                | 1, 92.65               | 93.21
Average    | 80.23                   | 83.24                  | 83.95
Furthermore, the stabilised performance is comparable (in fact slightly better in our experiment on the datasets) to the best performance of kNN within 10 neighbourhoods.
12.5 Summary and conclusion

In this paper we have discussed the "choice-of-k" issue related to the kNN method for classification. In order for kNN to be less dependent on the choice of value for k, we proposed to look at multiple sets of nearest neighbours rather than just one set of k nearest neighbours. A set of neighbours is here called a neighbourhood. For a data record t each neighbourhood bears certain support for the different possible classes. The key question is: how can we aggregate these supports to give a more reliable support value which better reveals the true class of t? In order to answer this question we presented a probability function, G. It is defined in terms of a mass function on events and it takes into account the cardinality of events. A mass function is a basic probability assignment for events. For the classification problem, an event is specified as a neighbourhood and a mass function is taken to represent the degree of support for a particular class from different neighbourhoods. Under this specification we have shown that G is a linear function of conditional probability, which can be used to determine the class of a new data record. In other words we calculate G from a set of neighbourhoods, then we calculate the conditional probability from G according to the linear equation, and finally we classify based on the conditional probability. We designed and implemented a classification algorithm based on the contextual probability - nokNN. Experiments on some public datasets show that using nokNN the classification performance (accuracy) increases as the number of neighbourhoods increases but stabilises soon after a few neighbourhoods; using the standard voting kNN, however, the classification performance varies when different neighbourhoods are used. The experiments further show that the stabilised performance of nokNN is comparable to (in fact, slightly better than) the best performance of kNN. This fulfils our objective.
References 1. Atkeson, C. G., Moore, A. W., and Schaal, S. (1997). Locally weighted learning. Artificial Intelligence Review, 11(1-5): 11-73. 2. Han, J. and Kamber, M. (2000). Data Mining : Concepts and Techniques. Morgan Kaufmann. 3. Hand, D., Mannila, H., and Smyth, P. (2001). Principles of Data Mining. The MIT Press. 4. Smets, P. and Kennes, R. (1994). The transferable belief model. Artificial Intelligence, 66(2):191-234.
13 Classifiers Based on Approximate Reasoning Schemes Jan Bazan^ and Andrzej Skowron^ ^ Institute of Mathematics, University of Rzeszow Rejtana 16A, 35-959 Rzeszow, Poland [email protected] ^ Institute of Mathematics, Warsaw University Banacha 2, 02-097 Warsaw, Poland [email protected]
Summary. We discuss classifiers [3] for complex concepts constructed from data sets and domain knowledge using approximate reasoning schemes (AR schemes). The approach is based on granular computing methods developed using rough set and rough mereological approaches [9, 13, 7]. In experiments we use a road simulator (see [15]) making it possible to collect data, e.g., on vehicle-agents movement on the road, at the crossroads, and data from different sensor-agents. We compare the quality of two classifiers: the standard rough set classifier based on the set of minimal decision rules and the classifier based on AR schemes.
13.1 Introduction

A classification algorithm (classifier) permits making a forecast in new situations on the basis of accumulated knowledge. We consider here classifiers predicting decisions for objects previously unseen; each new object will be assigned to a class belonging to a predefined set of classes on the basis of observed values of suitably chosen attributes (features). Many approaches have been proposed for constructing classifiers. Among them we would like to mention classical and modern statistical techniques, neural networks, decision trees, decision rules and inductive logic programming (see e.g. [5] for more details). One of the most popular methods for constructing classification algorithms is based on learning rules from examples. The standard rough set methods based on the calculation of so-called local reducts make it possible to compute, for given data, the descriptions of concepts by means of minimal consistent decision rules (see, e.g., [6], [2]). Searching for relevant patterns for complex concepts can be performed using AR schemes. AR schemes (see, e.g., [13]) can be treated as approximations of reasoning performed on concepts from domain knowledge and they represent relevant patterns for complex classifier construction. The proposed approach is based on granular
computing methods developed using rough set and rough mereological approaches [9, 13, 7]. In our experiments we use a road simulator (see [15]) making it possible to collect data, e.g., on vehicle-agents' movement on the road and at the crossroads, and data from different sensor-agents. The simulator also registers a few more features, whose values are defined by an expert. Any AR scheme is constructed from labelled approximate rules, called productions, which can be extracted from data using domain knowledge [13]. In the paper we present a method for extracting productions from data collected by the road simulator and an algorithm for classifying objects by productions, which can be treated as an algorithm for on-line synthesis of an AR scheme for any tested object. We report experiments supporting our hypothesis that classifiers induced using the AR schemes are of higher quality than the traditional rough set classifiers (see Section 13.5). For comparison we use data sets generated by the road simulator.
13.2 Approximate reasoning scheme

One of the main tasks of data exploration [4] is the discovery, from available data and expert knowledge, of concept approximations expressing properties of the investigated objects and of rules expressing dependencies between concepts. An approximation of a given concept can be constructed using relevant patterns. Any such pattern describes a set of objects belonging to the concept to a degree p where 0 < p < 1. Relevant patterns for complex concepts can be represented by AR schemes. AR schemes can be treated as approximations of reasoning performed on concepts from domain knowledge. Any AR scheme is constructed from labelled approximate rules, called productions. Productions can be extracted from data using domain knowledge. We define productions as parameterized implications with premises and conclusions built from patterns sufficiently included in the approximated concept.
Fig. 13.1. An example of a production as a collection of three production rules

In Figure 13.1 we present an example of a production for some concepts C1, C2 and C3 approximated by three linearly ordered layers small, medium, and large. This
production is a collection of three simpler rules, called production rules, with the following interpretation: (1) if the inclusion degree to a concept C1 is at least medium and to concept C2 at least large then the inclusion degree to concept C3 is at least large; (2) if the inclusion degree to concept C1 is at least small and to concept C2 at least medium then the inclusion degree to concept C3 is at least medium; (3) if the inclusion degree to concept C1 is at least small and to concept C2 at least small then the inclusion degree to concept C3 is at least small. The concept from the upper level of the production is called the target concept of the production, whilst the concepts from the lower level of the production are called the source concepts of the production. For example, in the case of the production from Figure 13.1, C3 is the target concept and C1, C2 are the source concepts.
Fig. 13.2. Synthesis of approximate reasoning scheme
One can construct an AR scheme by composing single production rules chosen from different productions from a family of productions for various target concepts. In Figure 13.2 we have two productions. The target concept of the first production is C5 and the target concept of the second production is the concept C3. We select one production rule from the first production and one production rule from the second production. These production rules are composed and then a simple AR scheme is obtained that can be treated as a new two-level production rule. Notice that the target pattern of the lower production rule in this AR scheme is the same as one of the source patterns from the higher production rule. In this case, the common pattern is
described as follows: inclusion degree (of some pattern) to a concept C3 is at least medium. In this way, we can compose AR-schemes into hierarchical and multilevel structures using productions constructed for various concepts.
13.3 Road simulator

The road simulator (see [15]) is a tool for generating data sets recording vehicle movement on the road and at the crossroads. Such data are crucial for testing complex decision systems that monitor the situation on the road on the basis of information coming from different devices.
Fig. 13.3. The board of simulation

Driving simulation takes place on a board (see Figure 13.3) which presents a crossroads together with access roads. During the simulation the vehicles may enter the board from all four directions, that is, East, West, North and South. The vehicles coming to the crossroads from South and North have the right of way in relation to the vehicles coming from West and East. Each of the vehicles entering the board has only one aim - to drive through the crossroads safely and leave the board. The simulation takes place step by step and during each of its steps the vehicles may perform the following maneuvers during
the simulation: passing, overtaking, changing direction (at the crossroads), changing lane, entering the traffic from the minor road into the main road, stopping and pulling out. Planning each vehicle's further steps takes place independently in each step of the simulation. Each vehicle, "observing" the surrounding situation on the road and keeping in mind its destination and its own parameters (driver's profile), makes an independent decision about its further steps: whether it should accelerate or decelerate and what (if any) maneuver should be commenced, continued, ended or stopped. When making decisions concerning further driving, a given vehicle takes into consideration its parameters and the driving parameters of the five vehicles next to it, which are marked FR1, FR2, FL, BR and BL (see Figure 13.4).
Fig. 13.4. A given vehicle and five vehicles next to it
During the simulation the system registers a series of parameters of the local simulations, that is, simulations connected with each vehicle separately, as well as two global parameters of the simulation, that is, parameters connected with driving conditions during the simulation. The value of each simulation parameter may vary and, in what follows, it is treated as an attribute taking values from a specified value set. We associate the simulation parameters with the readouts of different measuring devices or technical equipment placed inside the vehicle or in the outside environment (e.g., by the road, in a helicopter observing the situation on the road, in a police car). These are devices and equipment playing the role of detecting devices or converters, meaning sensors (e.g., a thermometer, range finder, video camera, radar, image and sound converter). The attributes taking the simulation parameter values, by analogy to the devices providing their values, will be called sensors. Exemplary sensors are the following: initial and current road (four roads), distance from the crossroads (in screen units), current lane (two lanes), position of the vehicle on the road (values from 0.0 to 1.0), vehicle speed (values from 0.0 to 10.0), acceleration and deceleration, distance of a given vehicle from the FR1, FL, BR and BL
vehicles and between FR1 and FR2 (in screen units), appearance of the vehicle at the crossroads (binary values), visibility (expressed in screen units, values from 50 to 500), humidity (slipperiness) of the road (three values: lack of humidity - dry road, low humidity, high humidity). If, for some reason, the value of one of the sensors cannot be determined, the value of the parameter is set to NULL (missing value). Apart from sensors the simulator registers a few more attributes, whose values are determined using the sensors' values in a way determined by an expert. These parameters in the present simulator version take binary values and are therefore called concepts. The results returned by testing concepts are very often in the form YES, NO or DOES NOT CONCERN (NULL value). Here are exemplary concepts:

1. Is the vehicle forcing the right of way at the crossroads?
2. Is there free space on the right lane in order to end the overtaking maneuver?
3. Will the vehicle be able to easily overtake before the oncoming car?
4. Will the vehicle be able to brake before the crossroads?
5. Is the distance from the FR1 vehicle too short or do we predict it may happen shortly?
6. Is the vehicle overtaking safely?
7. Is the vehicle driving safely?
Safe driving Safe overtaking
Safe distance from FL during overtaking
Forcing the right of way
Possibility of going back to the right lane
Possibility of safe stopping before the crossroads
SENSORS Fig. 13.5. The relationship diagram for presented concepts
BL and BR vehicles (associated with a given vehicle). Within each simulation step descriptions of situations of all the vehicles on the road are saved to file.
13.4 Algorithm for classifying objects by production In this section we present an algorithm for classifying objects by a given production but first of all we have to describe the method for the production inducing. To outline a method for production inducing let us assume that a given concept C registered by road simulator depends on two concepts CI and C2 (registered be road simulator too). Each of these concepts can be approximated by six linearly ordered layers: certainly YES, rather YES, possibly YES, possibly NO, rather NO and certainly NO. We induce classifiers for concepts CI and C2. These classifiers generate the corresponding weight (the name of one of six approximation layers) for any tested object. We construct for the target concept C a table T over the Cartesian product of sets defined by weight patterns for CI, C2, assuming that some additional constraints hold. Next, we add to the table T the last column, that is an expert decision. From the table T, we extract production rules describing dependencies between these three concepts. In Figure 13.6 we illustrate the process of extracting production rule for concept C and for the approximation layer rather YES of concept C The production rule can be extracted in the following four steps: 1. Select all rows from the table T in which values of column C is not less than rather YES.
198
Jan Bazan and Andrzej Skowron The tatget pattern of production rule
C 1 C2 1 ^^ 1 certainly YES certainly YES certainly YES \ certainly NO
1 certainly NO
certBinly NO \
certainly YES rather YES 1 possibly YES possibly NO possibly YES \ 1 rather YES
1 possibly YES possibly NO 1 possibly YES ratiierYES
rather NO \ rather YES \
1 possibly YES certainly NO possibly NO \ C1> possibly YES 1 certainly YES rather YES certainly YES | possibly NO
1 certainly NO
C2> rather YES
The source patterns of production rule
certainly NO \
certainly YES > rather YES > possibly YES > possibly NO > rather NO > certainly NO Fig. 13.6. The illustration of production rule extracting 2. Find minimal values of attributes CI and C2 from table T for selected rows in the previous step (in our example it easy to see that for the attribute CI minimal value is possibly YES and for the attribute C2 minimal value is rather YES). 3. Set sources patterns of new production rule on the basis of minimal values of attributes that were found in the previous step. 4. Set the target pattern of new production, i.e., concept C with the value rather YES. Finally, we obtain the production rule: (*) If (CI > possibly YES) and {C2 > rather YES) then (C > rather YES). A given tested object can be classified by the production rule (*), when weights generated for the object by classifiers induced for concepts from the rule premise are at least equal to degrees from source (premise) patterns of the rule. Then the production rule classifies tested object to the target (conclusion) pattern. 1 1
Fig. 13.7. Classifying tested objects by a single production rule (the object u1 satisfies both source patterns C1 >= possibly YES and C2 >= rather YES, while u2 fails the pattern for C2)
For example, the object u1 from Figure 13.7 is classified by the production rule (*) because it is matched by both patterns from the left-hand side of the rule, whereas the object u2 from Figure 13.7 is not classified by the rule (*) because it is not matched by the second source pattern (the value of the attribute C2 is less than rather YES). The method of extracting a production rule presented above can be applied for various values of the attribute C. In this way we obtain a collection of production rules, which we call a production. Using production rules selected from a production we can compose AR schemes (see Section 13.2). In this way relevant patterns for more complex concepts are constructed. A tested object is classified by an AR scheme if it is matched by all sensory patterns from this AR scheme. The method of object classification based on a production can be described as follows: 1. Preclassify the object to the production domain. 2. Classify the object by the production. We assume that for any production a production guard (a Boolean expression) is given. Such a guard describes the production domain and is used in the preclassification of tested objects. The production guard is constructed using domain knowledge. An object can be classified by a given production if it satisfies the production guard. For example, let us assume that the production P is generated for the concept "Is the vehicle overtaking safely?". Then an object-vehicle u is classified by the production P iff u is overtaking; otherwise, the message "HAS NOTHING TO DO WITH (OVERTAKING)" is returned. Now we can present the algorithm for classifying objects by a production.
Algorithm 1. The algorithm for classifying objects by production
Step 1: Select a complex concept C from the relationship diagram.
Step 2: If the tested object should not be classified by the production P extracted for the selected concept C, i.e., it does not satisfy the production guard, return HAS NOTHING TO DO WITH.
Step 3: Find a rule from the production P that classifies the object with the maximal degree to the target concept of the rule; if such a rule of P does not exist, return I DO NOT KNOW.
Step 4: Generate a decision value for the object from the degree extracted in the previous step: if the extracted degree is greater than possibly YES, then the object is classified to the concept C (return YES), else the object is not classified to the concept C (return NO).
The algorithm for classifying objects by a production presented above can be treated as an algorithm of dynamic synthesis of an AR scheme for any tested object. It is easy to see that during classification any tested object is classified by a single production rule selected from the production.
Table 13.1. Results of experiments for the concept: "Is the vehicle overtaking safely?"

Decision class           Method   Accuracy   Coverage   Real accuracy
YES                      RS       0.949      0.826      0.784
                         ARS      0.974      0.973      0.948
NO                       RS       0.889      0.979      0.870
                         ARS      0.926      1.0        0.926
All classes (YES + NO)   RS       0.999      0.996      0.995
                         ARS      0.999      0.999      0.998
This means that the production rule is dynamically assigned to the tested object. In other words, an approximate reasoning scheme is dynamically synthesized for any tested object. We claim that the quality of the classifier presented above is higher than that of a classifier constructed using an algorithm based on a set of minimal decision rules. In the next section we present the results of experiments with data sets generated by the road simulator that support this claim.
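A minimal sketch of Algorithm 1 follows, under the same illustrative conventions as the extraction sketch above: a production is modelled as a guard predicate plus a list of extracted rules. All names and the data layout are assumptions, not the authors' code.

```python
# A sketch of Algorithm 1 for classifying an object by a production.
LAYERS = ["certainly NO", "rather NO", "possibly NO",
          "possibly YES", "rather YES", "certainly YES"]
RANK = {name: i for i, name in enumerate(LAYERS)}

def matches(obj, rule):
    """An object matches a rule iff every weight generated for it is at least
    the corresponding source pattern of the rule."""
    sources, _ = rule
    return all(RANK[obj[c]] >= RANK[v] for c, v in sources.items())

def classify_by_production(obj, guard, rules):
    # Step 2: preclassification by the production guard.
    if not guard(obj):
        return "HAS NOTHING TO DO WITH"
    # Step 3: among matching rules, take the one with the highest target layer.
    applicable = [r for r in rules if matches(obj, r)]
    if not applicable:
        return "I DO NOT KNOW"
    _, (_, best_layer) = max(applicable, key=lambda r: RANK[r[1][1]])
    # Step 4: final decision from the extracted degree.
    return "YES" if RANK[best_layer] > RANK["possibly YES"] else "NO"

overtaking_rule = ({"C1": "possibly YES", "C2": "rather YES"}, ("C", "rather YES"))
obj = {"overtaking": True, "C1": "certainly YES", "C2": "rather YES"}
print(classify_by_production(obj, lambda o: o.get("overtaking", False),
                             [overtaking_rule]))   # prints: YES
```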
13.5 Experiments with Data To verify the effectiveness of classifiers based on AR schemes, we have implemented our algorithms in the AS-lib programming library. This is an extension of the RSES-lib 2.1 programming library, which forms the computational kernel of the RSES system [16]. The experiments have been performed on data sets obtained from the road simulator. We have applied the train-and-test method for estimating accuracy (see e.g. [5]). The data set consists of 18101 objects generated by the road simulator. This set was randomly divided into a training table (9050 objects) and a test table (9051 objects). In our experiments, we compared the quality of two classifiers: RS and ARS. For inducing RS we use the RSES system, which generates the set of minimal decision rules that are then used for classifying situations from the test data. ARS is based on AR schemes. We compared the RS and ARS classifiers with respect to accuracy of classification, learning time and the size of the rule set. We also checked the robustness of the classifiers. Table 13.1 and Table 13.2 show the results of the considered classification algorithms for the concepts "Is the vehicle overtaking safely?" and "Is the vehicle driving safely?", respectively. One can see that the accuracy of the ARS algorithm is higher than the accuracy of the RS algorithm on the analyzed data set. Table 13.3 shows the learning time and the number of decision rules induced for the considered classifiers. In the case of the number of decision rules, we report the number of rules averaged over all concepts from the relationship diagram.
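For concreteness, the three measures reported in the tables can be computed as sketched below. Taking real accuracy to be accuracy multiplied by coverage is an assumption here; it is consistent with the reported figures but is not stated explicitly in the text.

```python
# A minimal sketch of the evaluation measures used in Tables 13.1-13.2.
def evaluate(predictions, actual):
    """predictions: decision labels, or None when the classifier abstains."""
    answered = [(p, a) for p, a in zip(predictions, actual) if p is not None]
    coverage = len(answered) / len(actual)
    accuracy = (sum(p == a for p, a in answered) / len(answered)) if answered else 0.0
    return accuracy, coverage, accuracy * coverage   # real accuracy (assumed)

acc, cov, real = evaluate(["YES", None, "NO", "YES"], ["YES", "NO", "NO", "NO"])
print(acc, cov, real)   # 0.666..., 0.75, 0.5
```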
Table 13.2. Results of experiments for the concept: "Is the vehicle driving safely?"

Decision class           Method   Accuracy   Coverage   Real accuracy
YES                      RS       0.978      0.946      0.925
                         ARS      0.962      0.992      0.954
NO                       RS       0.633      0.740      0.468
                         ARS      0.862      0.890      0.767
All classes (YES + NO)   RS       0.964      0.935      0.901
                         ARS      0.958      0.987      0.945
Table 13.3. Learning time and the rule set size for the concept: "Is the vehicle driving safely?"

Method   Learning time   Rule set size
RS       801 seconds     835
ARS      247 seconds     189
One can see that the learning time for ARS is much shorter than for RS and the number of decision rules induced for ARS is much lower than the number of decision rules induced for RS.
13.6 Summary We have discussed a method for the construction (from data and domain knowledge) of classifiers for complex concepts using AR schemes (ARS classifiers). The experiments showed that:
• the accuracy of classification by ARS is better than the accuracy of the RS classifier,
• the learning time for ARS is much shorter than for RS,
• the number of decision rules induced for ARS is much lower than the number of decision rules induced for RS.
Finally, the ARS classifier is much more robust than the RS classifier. The results are consistent with the rough-mereological approach. Acknowledgement. The research has been supported by the grant 3 T11C 002 26 from the Ministry of Scientific Research and Information Technology of the Republic of Poland.
References
1. Bazan J. (1998) A comparison of dynamic and non-dynamic rough set methods for extracting laws from decision tables. In: [8]: 321-365
2. Bazan J., Nguyen H. S., Skowron A., Szczuka M. (2003) A view on rough set concept approximation. LNAI 2639, Springer, Heidelberg: 181-188
3. Friedman J. H., Hastie T., Tibshirani R. (2001) The elements of statistical learning: Data mining, inference, and prediction. Springer, Heidelberg
4. Kloesgen W., Zytkow J. (eds) (2002) Handbook of KDD. Oxford University Press
5. Michie D., Spiegelhalter D. J., Taylor C. C. (1994) Machine learning, neural and statistical classification. Ellis Horwood, New York
6. Pawlak Z. (1991) Rough sets: Theoretical aspects of reasoning about data. Kluwer, Dordrecht
7. Pal S. K., Polkowski L., Skowron A. (eds) (2004) Rough-Neuro Computing: Techniques for Computing with Words. Springer-Verlag, Berlin
8. Polkowski L., Skowron A. (eds) (1998) Rough Sets in Knowledge Discovery 1-2. Physica-Verlag, Heidelberg
9. Polkowski L., Skowron A. (1999) Towards adaptive calculus of granules. In: [17]: 201-227
10. Polkowski L., Skowron A. (2000) Rough mereology in information systems. A case study: Qualitative spatial reasoning. In: Polkowski L., Lin T. Y., Tsumoto S. (eds), Rough Sets: New Developments in Knowledge Discovery in Information Systems, Studies in Fuzziness and Soft Computing 56, Physica-Verlag, Heidelberg: 89-135
11. Skowron A. (2001) Toward intelligent systems: Calculi of information granules. Bulletin of the International Rough Set Society 5 (1-2): 9-30
12. Skowron A., Stepaniuk J. (2001) Information granules: Towards foundations of granular computing. International Journal of Intelligent Systems 16 (1): 57-86
13. Skowron A., Stepaniuk J. (2002) Information granules and rough-neuro computing. In: [7]: 43-84
14. Stone P. (2000) Layered Learning in Multi-Agent Systems: A Winning Approach to Robotic Soccer. The MIT Press, Cambridge, MA
15. The Road Simulator Homepage - logic.mimuw.edu.pl/~bazan/simulator
16. The RSES Homepage - logic.mimuw.edu.pl/~rses
17. Zadeh L. A., Kacprzyk J. (eds) (1999) Computing with Words in Information/Intelligent Systems 1-2. Physica-Verlag, Heidelberg
14 Towards Rough Applicability of Rules Anna Gomolińska* University of Bialystok, Department of Mathematics, Akademicka 2, 15-267 Bialystok, Poland, [email protected]
Summary. In this article, we further study the problem of soft applicability of rules within the framework of approximation spaces. Such forms of applicability are generally called rough. The starting point is the notion of graded applicability of a rule to an object, introduced in our previous work and referred to as fundamental. The abstract concept of rough applicability of rules comprises a vast number of particular cases. In the present paper, we generalize the fundamental form of applicability in two ways. Firstly, we more intensively exploit the idea of rough approximation of sets of objects. Secondly, a graded applicability of a rule to a set of objects is defined. A better understanding of rough applicability of rules is important for building the ontology of an approximate reason and, in the sequel, for modeling of complex systems, e.g., systems of social agents. Key words: approximation space, ontology of approximate reason, information granule, graded meaning of formulas, applicability of rules To Emilia
14.1 Introduction It is hardly an exaggeration to say that soft application of rules is the prevailing form of rule following in real life situations. Though some rules (e.g., instructions, regulations, laws, etc.) are supposed to be strictly followed, it usually means "as strictly as possible" in practice. Typically, people tend to apply rules "softly" whenever the expected advantages (gain) surpass the possible loss (failure, harm). Soft application of rules is usually more efficient and effective than the strict one, however, at the cost of the results obtained. In many cases, adaptation to changing situations requires a
*Many thanks to James Peters, Alberto Pettorossi, Andrzej Skowron, Dominik Ślęzak, and last but not least to the anonymous referee for useful and insightful remarks. The research has been partially supported by the grant 3T11C00226 from the Ministry of Scientific Research and Information Technology of the Republic of Poland.
change in the mode of application of rules only, retaining the rules unchanged. Allowing rules to be applied softly simplifies multi-attribute decision making under missing or uncertain information as well. As a research problem, applicability of rules concerns strategies (meta-rules) which specify the permissive conditions for passing from premises to conclusions of rules. In this paper, we analyze soft applicability of rules within the framework of approximation spaces (ASs) or, in other words, rough applicability of rules. The first step has already been made by introducing the concept of graded applicability of a rule to an object of an AS [3]. This fundamental form of applicability is based on the graded satisfiability and meaning of formulas and their sets, studied in [2]. The intuitive idea is that a rule r is applicable to an object u in degree t iff a sufficiently large part of the set of premises of r is satisfied for u in a sufficient degree, where sufficiency is determined by t. We aim at extending and refining this notion step by step. For the time being, we propose two generalizations. In the first one, the idea of approximation of sets of objects is exploited more intensively. The second approach consists in extending the graded applicability of a rule to an object to the case of graded applicability of a rule to a set of objects. Studying various rough forms of applicability of rules is important for building the ontology of an approximate reason. In [9], Peters et al. consider structural aspects of such an ontology. A basic assumption made is that an approximate reason is a capability of an agent. Agents classify information granules, derived from sensors or received from other agents, in the context of ASs. One of the fundamental forms of reasoning is a reflective judgment that a particular object (granule of information) matches a particular pattern. In the case of rules, agents judge whether or not, and how far, an object (set of objects) matches the conditions for applicability of a rule. As explained in [9]: Judgment in agents is a faculty of thinking about (classifying) the particular relative to decision rules derived from data. Judgment in agents is reflective but not in the classical philosophical sense [...]. In an agent, a reflective judgment itself is an assertion that a particular decision rule derived from data is applicable to an object (input). [...] Again, unlike Kant's notion of judgment, a reflective judgment is not the result of searching for a universal that pertains to a particular set of values of descriptors. Rather, a reflective judgment by an agent is a form of recognition that a particular vector of sensor values pertains to a particular rule in some degree. The ontology of an approximate reason may serve as a basis for modeling of complex systems like systems of social, highly adaptive agents, where rules are allowed to be followed flexibly and approximately. Since one and the same rule may be applied in many ways depending, among others, on the agent and the situation of (inter)action, we can to a higher extent capture the complexity of the modelled system by means of relatively fewer rules. Moreover, agents are given more autonomy in applying rules. From the technical point of view, degrees of applicability may serve as lists of tuning parameters to control the application of rules. Another area of possible use of rough applicability is multi-attribute classification (and, in particular, decision making). In
the case of an object to which no classification rule is applicable in the strict sense, we may try to apply an available rule roughly. This happens in real life, e.g., in the process of selecting the best candidate(s), where no candidate fully satisfies the requirements. If a decision is to be made anyway, some conditions should be omitted or their satisfiability should be treated less strictly. Rough applicability may also help in the classification of objects where some values of attributes are missing. In Sect. 14.2, approximation spaces are overviewed. Section 14.3 is devoted to the notions of graded satisfiability and meaning of formulas. In Sect. 14.4, we generalize the fundamental notion of applicability in the two directions mentioned earlier. Section 14.5 contains a concise summary.
14.2 Approximation Spaces The general notion of an approximation space (AS) was proposed by Skowron and Stepaniuk [13, 14, 16]. Any such space is a triple M = (U, Γ, κ), where U is a non-empty set, Γ: U → ℘U is an uncertainty mapping, and κ: (℘U)² → [0,1] is a rough inclusion function (RIF). ℘U and (℘U)² denote the power set of U and the Cartesian product ℘U × ℘U, respectively. Originally, Γ and κ were equipped with tuning parameters, and the term "parameterized" was therefore used in connection with ASs. Exemplary ASs are the rough ASs, induced by the Pawlak information systems [6, 8]. Elements of U, called objects and denoted by u with subscripts whenever needed, are known by their properties only. Therefore, some objects may be viewed as similar. Objects similar to an object u constitute a granule of information in the sense of Zadeh [17]. Indiscernibility may be seen as a special case of similarity. Since every object is obviously similar to itself, the universe U of M is covered by a family of granules of information. The uncertainty mapping Γ is a basic mathematical tool to describe formally the granulation of information on U. For every object u, Γu is a set of objects similar to u, called an elementary granule of information drawn to u. By assumption, u ∈ Γu. Elementary granules are merely building blocks to construct more complex information granules which form, possibly hierarchical, systems of granules. Simple examples of complex granules are the results of set-theoretical operations on granules obtained at some earlier stages, rough approximations of concepts, or meanings of formulas and sets of formulas in ASs. An adaptive calculus of granules, measure(s) of closeness and inclusion of granules, and the construction of complex granules from simpler ones which satisfy a given specification are a few examples of related problems (see, e.g., [11, 12, 15, 16]). In our approach, a RIF κ: (℘U)² → [0,1] is a function which assigns to every pair (x, y) of subsets of U a number in [0,1] expressing the degree of inclusion of x in y, and which satisfies postulates (A1)-(A3) for any x, y, z ⊆ U: (A1) κ(x, y) = 1 iff x ⊆ y; (A2) If x ≠ ∅, then κ(x, y) = 0 iff x ∩ y = ∅; (A3) If y ⊆ z, then κ(x, y) ≤ κ(x, z). Thus, our RIFs are somewhat stronger than the ones characterized by the axioms of rough mereology, proposed by Polkowski and Skowron [10, 12].
Rough mereology extends Leśniewski's mereology [4] to a theory of the relationship of being-a-part-in-degree. Among various RIFs, the standard ones deserve special attention. Let the cardinality of a set x be denoted by #x. Given a non-empty finite set U and x, y ⊆ U, the standard RIF, κ_£, is defined by κ_£(x, y) = #(x ∩ y)/#x if x ≠ ∅, and κ_£(x, y) = 1 otherwise. The notion of a standard RIF, based on the frequency count, goes back to Łukasiewicz [5]. In our framework, where infinite sets of objects are allowed, by a quasi-standard RIF we understand any RIF which for finite first arguments is like the standard one. In M, sets of objects (concepts) may be approximated in various ways (see, e.g., [1] for a discussion and references). In [14, 16], a concept x ⊆ U is approximated by means of the lower and upper rough approximation mappings low, upp: ℘U → ℘U, respectively, defined by low x = {u ∈ U | κ(Γu, x) = 1} and upp x = {u ∈ U | κ(Γu, x) > 0}. (14.1) By (A1)-(A3), the lower and upper rough approximations of x, low x and upp x, are equal to {u ∈ U | Γu ⊆ x} and {u ∈ U | Γu ∩ x ≠ ∅}, respectively. Ziarko [18, 19] generalized the Pawlak rough set model [7, 8] to a variable-precision rough set model by introducing variable-precision positive and negative regions of sets of objects. Let t ∈ [0,1]. Within the AS framework, in line with (14.1), the mappings of t-positive and t-negative regions of sets of objects, pos_t, neg_t: ℘U → ℘U, respectively, may be defined as follows, for any set of objects x:¹
K{FU,X)
> t} and neg^a: = {u E U \ K{FU,X)
< t}. (14.2)
Notice that lowx = pos^x and uppx = U — neggX.
14.3 The Graded IVfeaning of Formulas Suppose a formal language L expressing properties of M is given. The set of all formulas of L is denoted by FOR. We briefly recall basic ideas concerning the graded satisfiability and meaning of formulas and their sets, studied in [2]. Given a relation of (crisp) satisfiability of formulas for objects of [/, \=c, the c-meaning (or, simply, meaning) of a formula a is understood as the extension of a, i.e., as the set | |a| |c = {u £ U \ u \=c a}. For simplicity, "c" will be omitted in formulas whenever possible. By introducing degrees t G [0,1], we take into account the fact that objects are perceived through the granules of information attached to them. In the formulas below, li 1=^ a reads as "a is t-satisfied for u" and \\ct\\t denotes the t-meaning of a: u\=ta
iff K{FU, \\a\\) > t and ||a||t = {u £ U \ u ^t (^}-
The original definitions, proposed by Ziarko, are somewhat different.
(14.3)
14 Towards Rough Applicability of Rules
207
In other words, ||a||t = posJ|a||. Next, for t € T == [0,1] U {c}, the set of all formulas which are t-satisfied for an object u is denoted by \u\t, i.e., \u\t = {a E FOR I ^i |=t a } . Notice that it may be t = c here. The graded satisfiability of a formula for an object is generalized on the left-hand side to a graded satisfiability of a formula for a set of objects, and on the right-hand side to a graded satisfiability of a set of formulas for an object, where degrees are elements of Ti = T x [0,1]. For any n-tuple t and i = 1 , . . . , n, let Tr^t denote the z-th element of t. For simplicity, we use |=t, | • |t» and 11 • | |t both for the (object, formula)case as well as for its generalizations. Thus, for any object u, a set of objects x, a formula a, a set of formulas X, a RIF K* : (pFOR)^ h-> [0,1], and teTi, x\=tOL iff K{X, llallTTit) > 7r2t and \x\t = {a e FOR | x |=t a } ; u^tX
iff K^XM^it)
> 7T2t and ||X||t = {ueU\u\=t
X}.(14.4)
u [=t X reads as "X is t-satisfied for u'\ and | |X| |t is the t-meaning of X. Observe that \=t extends the classical, crisp notions of satisfiability of the sorts (set-of-objects, formula) and (object, set-of-formulas). Along the standard lines, x \= a iff \fu e x.u t= a, and u\= X iffWa e X.u \= a. Hence, x ^ a iff x f=(c,i) Q^» and u [= X iff u |=(c,i) X. Properties of the graded satisfiability and meaning of formulas and sets of formulas may be found in [2]. Let us only mention that a non-empty finite set of formulas X cannot be replaced by a conjunction f\X of all its elements as it happens in the classical, crisp case. In the graded case, one can only prove that II /\ X||t C ||X||(t 1), where t eT, but the converse may not hold.
14.4 The Graded Applicability of Rules Generalized All rules over L, denoted by r with subscripts whenever needed, constitute a set RUL. Any rule r is a pair of finite sets of formulas of L, where the first element, Pr, is the set of premises of r and the second element of the pair is a non-empty set of conclusions of r. Along the standard lines, a rule which is not applicable in a considered sense is called inapplicable. A rule r is applicable to an object u in the classical sense iff the whole set of premises Pr is satisfied for u. The graded applicability of a rule to an object, viewed as a fundamental form of rough applicability here, is obtained by replacing the crisp satisfiability by its graded counterpart and by weakening the condition that all premises be satisfied [3]. Thus, for any t eT\, r e apl^u iff «:*(Pr, l^kit) > 7r2t, i.e., iff u e \\Pr\\t'
(14.5)
r e aip\^u reads as "r is t-applicable to u''? Properties of apl^ are presented in [3]. Let us only note that the classical applicability and the (c, 1)-applicability coincide. Example 1. In the textile industry, a norm determining whether or not the quality of water to be used in the process of dyeing of textiles is satisfactory, may be written ^Equivalently, "r is applicable to u in degree f\
208
Anna Gomolinska
as a decision rule r with 16 premises and one conclusion (o?, yes). In this case, the objects of the AS considered are samples of water. The c-meaning of the conclusion of r is the set of all samples of water u eU such that the water may be used for dyeing of textiles, i.e., ||(G?,yes)|| = {u £ U \ d{u) = yes}. Let a i , . . . ,07 denote the attributes: colour (mgPt/1), turbidity (mgSi02/l), suspensions (mg/1), oxygen consumption (mg02/l), hardness (mval/1), Fe content (mg/1), and Mn content (mg/1), respectively Then, (ai, [0,20]), (as, [0,15]), (as, [0,20]), {a^, [0,20]), (as, [0,1.8]), (ae, [0,0.1]), and (07, [0,0.05]) are exemplary premises of r. For instance, the cmeaning of (a2, [0,15]) is the set of all samples of water such that their turbidity does not exceed 15mgSi02/l, i.e., ||(a2, [0,15])|| = {u eU \ a2{u) < 15}. Suppose that the values of a2, aa slightly exceed 15,20 for some sample u, respectively, i.e., the second and the third premises are not satisfied for u, whereas all remaining premises hold for u. That is, r is inapplicable to the sample u in the classical sense, yet it is (c, 0.875)-applicable to u. Under special conditions as, e.g., serious time constraints, applicability ofriou in degree (c, 0.875) may be viewed as sufficient or, in other words, the quality of u may be viewed as satisfactory if the gain expected surpass the possible loss. Observe that r € apl^tz iff ix G /^c/||^r||t, where Ipu is the identity mapping on pU. A natural generalization of (14.5) is obtained by taking a mapping /$ : pU H-^ pU instead of /pt/, where $ is a possibly empty list of parameters. For instance, /$ may be an approximation mapping. In this way, we obtain a family of mappings aplf^ : U H-^ pRUL, parameterized hyteTi and $, and such that for any r and u,
re^vi'u'HuehWPrWf The family is partially ordered by C, where for any ti,t2
(14.6) ETI,
aplff E apl/^ ^^ Wu e C/.aplf> C aplf>.
(14.7)
The general notion of rough applicability, introduced above, comprises a number of particular cases, including the fundamental one. In fact, apl^ = apl^*^^. Next, e.g., r e aplj^^i^ iff ix € low| |Pr | |t iff ^ is ^-applicable to every object similar to u. In the same vein, r € apl^^^u iff u € upp||Pr||t iff r is ^-applicable to some object similar to u. We can also say that r is certainly ^-applicable and possibly t-applicable to u, respectively. In the variable-precision case, for / = pos^ and s e [0,1], r e aplj tx iff w € pos^llPrll* iff r is i-applicable to a sufficiently large part of Fu, where sufficiency is determined by 5. In a more sophisticated case, where / = pos^ o low (o denotes the concatenation of mappings), r e apl/zz iff u G pos^lowUP^Hf iff A^(Pu, low||Pr||t) > 5 iff r is certainly ^-applicable to a sufficiently large part of Fu, where sufficiency is determined by s. Etc. For t = (^1,^2) ^ [0,1]^, the various forms of rough t-applicability are determined up to granularity of information. An object u is merely viewed as a representative of the granule of information Fu drawn to it. More precisely, a rule r may practically be treated as applicable to u even if no premise is, in fact, satisfied for u.
14 Towards Rough Applicability of Rules
209
It is enough that premises are satisfiable for a sufficiently large part of the set of objects similar to u. If used reasonably, this feature may be advantageous in the case of missing data. The very idea is intensified in the case of pos^. Then, r is t-applicable to u in the sense of pos^ iff it is t-applicable to a sufficiently large part of the set of objects similar to u, where sufficiency is determined by s. This form of applicability may be helpful in classification of u if we cannot check whether or not r is applicable to u and, on the other hand, it is known that r is applicable to a sufficiently large part of the set of objects similar to u. Next, rough applicability in the sense of low is useful in modeling of such situations, where the stress is laid on the equal treatment of all objects forming a granule of information. A form of stability of rules may be defined, where r is called stable in a sense considered if for every u, r is applicable to It iff r is applicable to all objects similar to u in the very sense. Example 2. Consider a situation of decision making whether or not to support a student financially. In this case, objects of the AS are students applying for a bursary. Suppose that some data concerning a person u is missing which makes decision rules inapplicable to u in the classical, crisp sense. For simplicity, assume that r would be the only decision rule applicable to u unless the data were missing. Let a be the premise of r of which we cannot be sure if it is satisfied for u or not. Suppose that for 80% of students whose cases are similar to the case of u, all premises of r are satisfied. Then, to the advantage of u, we may view r as practically applicable to u. Formally, r is (0.8, l)-applicable to u. Additionally, let r be (0.8,0.9)-applicable to 65% of objects similar to u. In sum, r is (0.8,0.9)-applicable to u in the sense of The second (and last) generalization of the fundamental notion of rough aplicability, proposed here, consists in extension of applicability of a rule to an object to the case of applicability of a rule to a set of objects. In the classical case, a rule is applicable to a set of objects x iff it is applicable to each element of x. For any a, let {a)'^ denote the tuple consisting of n copies of a, and (a)^ be abbreviated by (a). For arbitrary tuples s,t, st denotes their concatenation. Next, if t is at least a pair of items (i.e., an n-tuple for n > 2), then
(14.8)
Thus, a family of mappings Apl^ : pU \-^ pRUL is obtained, parameterized by t eT2 and partially ordered by a relation C, where for any ^1, ^2 G T2, Apl,^ C Apl,^ ^ ' Vx C UApk^x
C Apl,^x.
(14.9)
The graded applicability, introduced above, is an exemplary notion of rough applicability of a rule to a complex object which is a set of objects of the underlying
210
Anna Gomoliiiska
approximation space M in our case. This notion may be useful in modeling of a number of situations. Three such cases are sketched below. Example 3. Suppose that objects of an AS are questions which may be subject to negotiation. Then, sets of objects are packets of such questions and represent possible negotiation problems. Let the rules considered be decision rules on how to solve particular problems. We can rank decision rules depending, among others, on their graded applicability to given negotiation problems. The more questions solved positively by a rule, the better is the rule. Example 4. Let objects of an AS be school students in a town. A conmiittee constructs rules to rank classes of students in order to award a prize to the best class. They search for the most universal rule(s) satisfying some additional conditions. A rule r is viewed as more universal than a rule r' iff r applies in a considered sense to larger parts of given classes of students than r' does. Example 5. In a factory, every lot of products is tested whether or not the articles comply with a norm r or, in other words, how far the norm r is applicable in some considered sense to every lot of products. In this case, products are objects of an AS and lots of products are the complex objects considered. A lot x passes the test if a sufficiently large part of x complies with r or, in other words, if r applies to x in a sufficient degree. Below, we present a number of properties of the forms of applicability of rules defined earlier. For natural numbers n > l , z = l , . . . , n , non-empty partially ordered sets {xi, pU, s e [0,1], and t, t' 6 Ti, we have: (a) Where / = pos^, aplfix = Apl^^^^Pw. (6) ap4^- = a p i r ^ and a p l ^ ^ = {}{^viu
\ f = posj.
s>0
(c) If Fu ~ Fu' and g G {upp o /$,pos5 o /$}? then apl^^z = apl^-u' and aplf Ti = aplf u'. (d) If /$ is monotone and t ^t\
then aplj^f C. aplf^.
(e) apll^- C apl, C apl^^^.
(/) Apli(i)^ = n^^P^*^ \uex}. Proof. We prove (d), (f) only. For (d) consider a rule r and assume (dl) /$ is monotone and (d2) t :< t'. First, we show (d3) \\Pr\\t' ^ WPrWt- Consider the non-trivial case only, where nit,nit' ^ c. Assume that u e WPrWr- Then
14 Towards Rough Applicability of Rules
211
(d4) K*{Pr, I^^UitO > ^^2^' by the definition of graded meaning. Observe that for any formula a, if K{ru, \\a\\) > 7rit\ then K{ru, \\a\\) > nit by (d2). Hence, Hint' Q lulTTit' As a consequence, K*{Pr,\u\^^t') < K,*{Pr,\u\^^t) by (A3). Hence, A^*(P^, \u\^,t) > ^2t' > TTS^ by (d2), (d4). Thus, u G \\Pr\\t by the definition of graded meaning. In the sequel, /$||Pr||t' Q /$||Pr||t by (dl), (d3). Hence, r £ apl/^^ implies r G aplf'^ by the definition of graded applicability in the sense of /$. Incase (f), for any rule r,r e Apl^^^^x iff x C ||Pr||t iff Vii G x.u G \\Pr\\t iff Wu G x.r G apl^w iff r G p|{apl^w \u G x}. D Let us briefly comment the results. By (a), rough applicability of a rule to u in the sense of pos^ and the graded applicability of a rule to Fu coincide, (b) is a direct consequence of the properties of approximation mappings, (c) states that the fundamental notion of rough applicability as well as the graded forms of applicability in the sense of uppo/$ and pos^o/$ are determined up to granulation of information. By (d), ift:
then Apl^, C Apl^.
(/) Apl(i)5/ E Apl(^„),, E Apl(o)3(g) f l f l Apl,a: = Apl(i)3[/ = Apl(,,i,i)C/ = { r G R U L | | | P , | | = C / } . xCUteT2
{h) If Pr C Pr' and 7r2t = 1, then r' G Apl^x implies r G Apl^x. (i) If 3a G Pr.||a||7rit = 0, 7r2t = 1, and TTS* > 0, then r G Apl^x iff a: = 0. (j) If x' n llPrlUt = 0, then r G Apl^(x U x') implies r G Apl^x and r G Apl^x impUes r G Apl^(a: — x'). (fc) If x' C ||Pr||
212
Anna Gomolinska
then Fu C ||a||. Since u G Fu, it holds u |= a as required. As a consequence, (g3) \u\i C \u\. In the next step, we prove (g4) ||Pr||(i,i) = t/ iff ||Pr|| = U (recall that \\Pr\\ = ||-Pr||(c,i))- "=^" Assume ||Pr||(i,i) = U, Hence, for every object u, Pr C \u\i by the definition of (1, l)-meaning. In virtue of (g3), Pr C \u\. Hence ||Pr|| = U by the definition of meaning. "<=" Assume \\Pr\\ — U. Hence, for every object u, Pr C \u\ by the definition of meaning. In other words, Vu € f/.Va e Pr.u e \\a\l i.e., Va G Pr.||<^|| = U, Hence, \/u G ^.Va G Pr.Fu C ||Q;||. Thus, Vt/ G C^.Va G Pr-^ t=i oi by the definition of ^=i, i.e., Wu G C/.Va G Pr.a G |^x|i, i.e., Vu G f/.Pr S l^li- Hence, ||Pr||(i,i) = ^ by the definition of (1, l)-meaning. By (gl), (g2), and (g4), it holds that (g5) Apl^ipC/ = Apl(c,i,i)C^- Observe that (g6) for any x C U, f]{Ap\^x \ t G T2} = Apl^^^sa: by (e),'(f). Next, we show that (gl) fKApl^i^sx \ x C U} = Ap^^^^sU. "C" is obvious. To prove " 2 ' \ consider a rule r G Apl(i)3{7. By the definition of (1,1,1)applicability, U C ||Pr||(i,i). Hence, for any x C U, x C ||Pr||(i,i). Again by the definition of (1,1,1)-applicability, r G Apl^i^ao: for every set of objects x. Hence, r G n{Apl(i)3a: \ x C U}. Thus, CiiApl^x \ x CU At GT2} = Ap\^yU by (g6), (g7). Hence, (g) finally follows by (g2), (g5). D Some comments can be handy. First, as directly follows from the definitions of applicability, the (c, 1,1)-applicability is the same as the classical applicability. Next, if 7r2t = TTst = 1, then a rule r is ^-applicable to a set of objects x iff every premise of r is
14 Towards Rough Applicability of Rules
213
14.5 Summary The aim of this paper was to further analyze rough applicability of rules. We generaUzed the fundamental concept of graded applicability in two ways, where, nevertheless, all premises of a rule were treated on equal terms. In the future, rules with premises partitioned into classes will be of interest. Applicability is only one aspect of application of rules. An analysis of the results of rough application and the question of rough quality of rules are of importance as well. The latter problem is closely related to propagation of uncertainty. Obviously, not all concepts of rough applicability can prove useful from the practical point of view. Nevertheless, some of them deserve our attention as they seem to describe formally certain forms of soft applicability of rules, observed in real life situations.
References 1. Gomolinska A (2002) A comparative study of some generalized rough approximations. Fundamenta Informaticae 51(1-2): 103-119 2. Gomolinska A (2004) A graded meaning of formulas in approximation spaces. Fundamenta Informaticae 60:159-172 3. Gomolinska A (2004) A graded applicability of rules. In: Tsumoto S, Slowiriski R, Komorowski J, Grzymala-Busse J W (eds) Proc 4th Int Conf Rough Sets and Current Trends in Computing (RSCTC'2004), Uppsala, Sweden, 2004, June 1-5, LNAI 3066. Springer, Berlin Heidelberg, pp 213-218 4. Le^niewski S (1916) Foundations of the general set theory 1 (in Polish). Works of the Polish Scientific Circle 2 Moscow Also in: Surma S J et al (eds) (1992) Stanislaw Lesniewski collected works. Kluwer Dordrecht, pp 128-173 5. Lukasiewicz J (1913) Die logischen Grundlagen der Wahrscheinlichkeitsrechnung. Krakow Also in: Borkowski L (ed) (1970) Jan Lukasiewicz - Selected works. North Holland Amsterdam London, Polish Sci Publ Warsaw, pp 16-63 6. Pawlak Z (1981) Information systems - theoretical foundations. Information Systems 6(3):205-218 7. Pawlak Z (1982) Rough sets. Int J Computer and Information Sciences 11:341-356 8. Pawlak Z (1991) Rough sets - Theoretical aspects of reasoning about data. Kluwer Dordrecht 9. Peters J F, Skowron A, Stepaniuk J, Ramanna S (2002) Towards an ontology of approximate reason. Fundamenta Informaticae 51(1-2): 157-173 10. Polkowski L, Skowron A (1996) Rough mereology: A new paradigm for approximate reasoning. Int J Approximated Reasoning 15(4):333-365 11. Polkowski L, Skowron A (1999) Towards adaptive calculus of granules. In: Zadeh L A, Kacprzyk J (eds) Computing with words in information/intelligent systems 1. Physica Heidelberg, pp 201-228 12. Polkowski L, Skowron A (2001) Rough mereological calculi of granules: A rough set approach to computation. J Comput Intelligence 17(3):472-492 13. Skowron A, Stepaniuk J (1994) Generalized approximation spaces. In: Proc 3rd Int Workshop on Rough Sets and Soft Computing, San Jose, USA, 1994, November 10-12, pp 156-163
214
Anna Gomolinska
14. Skowron A, Stepaniuk J (1996) Tolerance approximation spaces. Fundamenta Informaticae 27:245-253 15. Skowron A, Stepaniuk J, Peters J F (2003) Towards discovery of relevant patterns from parameterized schemes of information granule construction. In: Inuiguchi M, Hirano S, Tsumoto S (eds) Rough set theory and granular computing. Springer Berlin Heidelberg, pp 97-108 16. Stepaniuk J (2001) Knowledge discovery by application of rough set models. In: Polkowski L, Tsumoto S, Lin T Y (eds) Rough set methods and applications: New developments in knowledge discovery in information systems. Physica Heidelberg New York, pp 137-233 17. Zadeh L A (1973) Outline of a new approach to the analysis of complex system and decision processes. IEEE Trans on Systems, Man, and Cybernetics 3:28^4 18. Ziarko W (1993) Variable precision rough set model. J Computer and System Sciences 46(l):39-59 19. Ziarko W (2001) Probabilistic decision tables in the variable precision rough set model. J Comput Intelligence 17(3):593-603
15 On the Computer-Assisted Reasoning about Rough Sets Adam Grabowski * Institute of Mathematics, University of Bialystok, Akademicka 2, 15-267 Bialystok, Poland, adain@math. uwb. edu. p i Summary. The paper presents some of the issues concerning a formal description of rough sets. We require the indiscemibility relation to be a tolerance of the carrier, not an equivalence relation, as in the Pawlak's classical approach. As a tool for formalization we use the Mizar system, which is equipped with the largest formalized library of mathematical facts. This uniform and computer-checked for correctness framework seems to present a satisfactory level of generality and may be used by other systems as well as it is easily readable for humans. Key words: tolerance approximation spaces, formalized mathematics, automated reasoning, knowledge representation
15.1 Introduction Do we need another formal approach to the rough approximations? The formalizations are different and the generalizations go in different directions. Our approach differs from all those previously known (e.g. [2]) because we require the encoding to be machine-checkable. The idea of using a computer as a math-assistant is not new, also in the rough set conmiunity. Well-known existing systems (Rosetta, RSES, etc.) are a good example of how automatic tools may be used with the developed theory as a background. We would like to discuss the non-KDD approach, i.e. how the (rough set) theory can be machine-formalized. We are also going to propose our formal approach to some basic notions. By a formalization of mathematics we mean the encoding of mathematics in a formal language sufficiently detailed for a computer program to verify the correctness. As two main applications of formalized mathematics we may point out representation, and from this presentation of mathematics, and verification of the correctness of the formalized knowledge. Since it is hard to develop a uniform system which performs well in all stages of automated reasoning, there is a need for a flexible cooperation between various *Many thanks are due to Anna Gomoliiiska for the motivation for this work.
216
Adam Grabowski
computer systems specialized in specific problem domains. Recently, especially under the auspices of the European Union (Mathematical Knowledge Management MKM-Net and Calculemus - Systems for Integrated of Computation and Deduction, to note the most important networks), a number of such experiments with an agent-oriented mechanized reasoning approach were conducted, e.g. MathWeb or the Logic Broker Architecture, just to name a few. More detailed discussion on the topic can be found e.g. in [1]. The feasibility and usefulness of a distributed multiagent system for problem solving in formalized mathematics will depend on flexibility and value of its components. There are several efforts to join proof-checkers, e.g. Mizar with his classical first order logic and ZFC set theory as a base (it has a language relatively close to that used by mathematicians and computer scientists), provers (EQP/Otter is one of the most famous for the solution of the many equational problems with MACE as a tool for finite model generator), and a module translating results into an XML (Extensible Markup Language) style with its growing popularity as (one of the) information interchange standards for industry and academia. Mathematica, Maple or other computer algebra systems can be used for solving ordinary mathematical problems. Highly configured user interface based on the popular GNU editor Emacs may be a good choice not only for programmers, finally e.g. Omega group can provide a broker architecture. Mizar Mathematical Library itself can also be a subject for research - e.g. for knowledge discovery as a large database of mathematical facts. Some works on the using Mizar as a tool for checking computer program for correctness are also known. For some reasons, in this paper we focused mainly on the proof-checking agent. Among systems which are designed for this purpose we may enumerate here: Coq, NuPRL/MetaPRL, Mizar, Isabelle/Isar, HOL, PVS, two of them are declarative (that is enable to write scripts in a style close to the mathematical jargon): Mizar and Isar. We concentrated on building a sufficiently general and flexible framework which should give the opportunity for reusing it by other automated math-assistants and enable further generalizations. Obviously, we also wanted to choose a system which does not force a researcher to write something in a way far from his/her usual mathematical style. In our opinion, the language and typing mechanisms available in the Mizar system serve good for this purpose.^ We would like to describe some of the issues concerned with the first (as we know of) computer-checked formalization of rough set theory basics. The paper is organized as follows. The second section deals with a brief description of the system we have chosen as a basic tool. In the sections 3-6 the fundamental concepts for tolerance approximation spaces were described, rough approximations, rough inclusion predicate, and rough membership functions, respectively. The next section contains the full formalization of a chosen lemma as an example while in the last one we draw some concluding remarks. ^The Mizar project started in the early seventies of the previous century in Flock, Poland, where MSRAS 2004 workshop took place, under the auspices of Plock Scientific Society.
15 On the Computer-Assisted Reasoning about Rough Sets
217
All Mizar examples will be cited in the typewriter font, for lack of space we will not provide full proofs (with the exception in Section 7). They are available both in every Mizar distribution and online from the website of the Mizar project.
15.2 The Mizar System We are going to shed some light on a system we have chosen for the formalization. Its description as well as many useful links are available on the web page of the project [8], so we will focus here on the generalities to make this presentation possibly selfcontained. The Mizar system is based on three programs (agents in some sense): the accommodator (which imports all the necessary notions and theorems from the Mizar Mathematical Library - MML), the verifier (the core of the system, which parses texts and verifies the logical correctness of the reasoning), and the exporter/transferer (which separates reusable knowledge for inclusion in the database and exports it). There are precompiled binaries for Intel platforms: Linux, Solaris and Win32 freely available from the homepage of the project. The most important part of the system however is the large computer-managed database of mathematical facts. As of the time of writing, MML consisted of 834 Mizar articles - over 60 megabytes of texts written by more than 150 authors. This large repository contains 7093 definitions and 36847 theorems covering different disciplines in mathematics and computer science. Since the Mizar type system is based on ZFC set theory with the classical first-order logic, the developed repository is close in some sense to the Bourbaki school. Comparing to the Bourbaki project, MML is developed by many more authors which causes problems with uniformization of the library. The core of it is organized into the encyclopedia-like style, although the original division into articles assigned to authors is kept. The Library Committee of the Association of Mizar Users (which is a non-profit organization), reserves the rights to revise authors' work. Since the system evolves, articles must be kept compatible with it. There is a service, MML Query, which enables advanced article browsing and searching functions. Every Mizar distribution contains full texts, their abstracts (with proofs removed), and the database as well as a collection of proof-enhancing software. All articles are also automatically translated into the ETEX source and are available on the Internet at [8] as the Journal of Formalized Mathematics. Although the Mizar checking software is developed by a small group of progranmiers, MML is open for external developments. All new articles are reviewed by the Library Committee and the accepted ones are included into MML. Fundamental Theorem of Algebra, Birkhoff Variety Theorem, the equivalence of Robbins and Boolean algebras ([5]), Wedderbum Theorem, or Stone representation theorem for Boolean algebras are examples of what is already proved in MML. The projects
218
Adam Grabowski
of formalization of Compendium of continuous lattices^ or the first machine-checked proof of the Jordan Curve Theorem are on their way, to note the most important ones.
15.3 Tolerance Approximation Spaces Following Jarvinen works (e.g. [6]), it can be argued that neither reflexivity, symmetry, nor transitivity are indispensable properties of indiscemibility relations. In our view, we have chosen an approach introduced originally in [11], i.e. we require this relation to be a tolerance. Obviously, we could develop in parallel two views for the notion of an approximation space: the classical and the generalized one, however it could be easily explored by MML software that some theorems are corollaries from the others, and they would be a subject for deleting from the library. This may be treated as a lack of focus of some kind, because some theorems remain true only in a classical version. In our opinion, this framework is sufficiently uniform as forced by the criterion of a generalization level. One of the most important constructors for types used in MML is the notion of a structure. The antecedent of all other structures, which has only one field the carrier, is called 1 - s o r t e d . Inheritance mechanisms and type polymorphism implemented in Mizar allow notions defined for a certain structure to be used also for all its descendants. Out of nearly 100 structures declared in MML we have chosen R e l S t r , that is a relational structure - a carrier together with a relation defined on it, technically named I n t e r n a l R e l . It corresponds to (C/, IND), but the properties of the fields of a structure are usually added to it by the adjectives (attributes), as in the example below. d e f i n i t i o n l e t P be R e l S t r ; a t t r P i s with.equivalence means :: ROUGHS.lidef 2 the I n t e r n a l R e l of P i s Equivalence_Relation of the c a r r i e r of P; a t t r P i s with_tolerance means :: ROUGHS_l:def 3 the I n t e r n a l R e l of P i s Tolerance of the c a r r i e r of P; end;
As we already noticed, Mizar articles can be revised. Actually, the changes are very often. At the beginning of MML development, tolerances and equivalence relations were introduced independently. While the Mizar language evolved to enable more inheritance mechanisms working, equivalence relations became tolerances in MML thanks to the distribution of their types into adjectives. After this revision tolerances are defined in MML as reflexive and symmetric relations. If transitivity is added, they turn into equivalence relations, as usual. ^Mizar formalization is acknowledged in the revised edition of the Compendium, issued as Continuous lattices and domains by G. Gierz et al., Cambridge, 2003.
15 On the Computer-Assisted Reasoning about Rough Sets
219
d e f i n i t i o n l e t X be s e t ; mode Tolerance of X i s t o t a l r e f l e x i v e symmetric Relation of X; end;
The notion of a mode as a basic constructor for types will be explained later in detail. After we prepared the necessary formal apparatus, we may construct an example of a tolerance approximation space and introduce appropriate mode. definition mode Tolerance_Space i s with_tolerance non empty RelStr; end;
In this manner, if we consider a subset of a given tolerance (or approximation) space, the associated tolerance (or equivalence, respectively) relation is also fixed as a hidden argument. Thanks to the attributes we can force structure properties, but also classify objects of other types. This is the way crisp and rough subsets of a given tolerance space are defined. d e f i n i t i o n l e t A be Tolerance_Space; l e t X be Subset of A; a t t r X i s rough means :: ROUGHS.l:def 7 BndAp X <> { } ; end;
The functor BndAp is just a set-theoretical difference between the upper and the lower approximation (i.e. boundary) of X described in the next section thoroughly. notation l e t A be Tolerance.Space; l e t X be Subset of A; antonym X i s exact for X i s rough; end;
Antonyms and synonyms may be introduced if the author would prefer to use his own notation or lexicon for yet defined notion. The use of this mechanism however is controlled by the people responsible for the library in order to avoid the lexical overloading of MML.
15.4 Rough Approximations The basic ideas of RST deal with situations in which the objects of a certain universe can be identified only within the limits determined by the knowledge represented by a given indiscemibility relation (that is, by a internal relation of a relational structure). Regardless which approach we are claiming, the key notion of RST is the notion of an approximation. A lower approximation of a set X consists of objects which are surely (w.r.t. indiscemibility relation) in X. Similarly, the upper one extends the lower for the objects which are possibly in X. Formally, we have XR = {X&U: [X\R C X}. Its translation to Mizar is quite similar:
220
Adam Grabowski
definition let A be Tolerance_Space, X be Subset of A; func LAp X -> Subset of A equals :: ROUGHS.l:def 4 { X where x is Element of A : Class (the InternalRel of A, x) c= X }; end;
On the right hand side of an arrow " - > " the so-called mother type of a functor is given. Because in a variant of ZF set theory types of all objects expand to the type s e t (except Mizar structures which are treated in a different way), the user may drop this part of a definition not to restrict its type. We wanted Mizar to understand automatically that approximations yield subsets of an approximation space. For uniformity purposes, we used notation C l a s s (R, x) instead of originally introduced in MML n e i g h b o u r h o o d (x, R) - even if we dealt with tolerances, not equivalence relations. Because of implemented inheritance mechanisms and adjectives it worked surprisingly well. The Mizar articles are plain ASCII files, so some usual (often close to its ETgX equivalents) abbreviations are claimed: "c=" stands for the set-theoretical inclusion, " i n " for G," {}" for 0, " \ / " and "/ \ " for the union and the intersection of sets, respectively. The double colon starts a comment, while semicolon is a delimiter for a sentence. Another important construction in the Mizar language which we extensively used, was cluster, that is a collection of attributes. There are three kinds of cluster registrations: •
existential, because in Mizar all types are required to be non-empty, so the existence of the object which satisfies all these properties has to be proved. We needed to construct an example of an approximation space; r e g i s t r a t i o n l e t A be non diagonal Approximation.Space; cluster rough Subset of A; existence; end;
Considered approximation space A which appear in the locus (argument of a definition) have to be non d i a g o n a l . If A will be diagonal, i.e. if its indiscemibility relation will be included in the identity relation, therefore all subsets of A will become crisp with no possibility for the construction of a rough subset. • functorial, i.e. the involved functor has certain properties, used e.g. to ensure that lower and upper approximations are exact (see the example below); r e g i s t r a t i o n l e t A be Approximation_Space, X be Subset of A; cluster LAp X -> exact; coherence; end;
•
Functorial clusters are most frequent due to a big number of introduced functors (5484 in MML). The possibility of adding of an adjective to the type of an object is also useful (e.g. often we force that an object is non-empty in this way). conditional stating e.g. that all approximation spaces are tolerance spaces.
15 On the Computer-Assisted Reasoning about Rough Sets
221
registration cluster with.equivalence -> with^tolerance RelStr; coherence; end;
This kind of a cluster is relatively rare (see Table 15.1) because of a strong type expansion mechanism. Table 15.1 contains number of clusters of all kinds comparing to those introduced in [4]. Table 15.1. Number of clusters in MML vs. RST development type
in MML in [4]
existential functorial conditional
1501 3181 1131
7 9 7
total
5813
23
As it sometimes happens among other theories (compare e.g. the construction of fuzzy sets), paradoxically the notion of a rough set is not the central point of RST as a whole. Rough sets are in fact classes of abstraction w.r.t. rough equality of sets and their formal treatment varies. Majority of authors (w^ith Pawlak in [9] for instance) define a rough set as an underlying class of abstraction (as noted above), but some of them (cf. [2]) claim for simplicity that a rough set is an ordered pair containing the lower and the upper limit of fluctuation of the argument X. These two approaches are not equivalent, and we decided to define a rough set also in the latter sense. d e f i n i t i o n l e t A be Approximation.Space; l e t X be Subset of A; mode RoughSet of X means :: ROUGHS_l:def 8 i t = [LAp X, UAp X]; end;
What should be recalled here, there are so-called modes in the Mizar language which correspond with the notion of a type. To properly define a mode, one should only prove its existence. As it can be easily observed, because the above definiens determines a unique object for every subset X of a fixed approximation space A, this can be reformulated as a functor definition in the Mizar language. If both approximations coincide, the notion collapses and the resulting set is exact, i.e. a set in the classical sense. Unfortunately, in the above mentioned approach, this is not the case. In [4] we did not use this notion in fact, but we have chosen some other solution which describes rough sets more effectively, i.e. by attributes.
15.5 Rough Inclusions and Equality

Now we are going to briefly present the fundamental predicate of rough set theory: the rough equality predicate (the lower version is cited below, while the dual upper equality assumes the equality of the upper approximations of the sets).

  definition
    let A be Tolerance_Space, X, Y be Subset of A;
    pred X _= Y means :: ROUGHS_1:def 14
      LAp X = LAp Y;
    reflexivity;
    symmetry;
  end;
Two additional properties (reflexivity and symmetry) were added, with their trivial proofs: e.g. the first one forces the checker to accept X _= X without any justification. In Mizar it is also possible to introduce so-called redefinitions, that is, to give another definiens, provided its equivalence with the original one can be proved (in the case above, rough equality can be defined e.g. as a conjunction of two rough inclusions). This mechanism may also be applied to our Mizar definition of a rough set generated by a subset of an approximation space, viewed as an ordered pair of its lower and upper approximation rather than as a class of abstraction w.r.t. the rough equality relation.
15.6 Membership Functions

Employing the notion of indiscernibility, the concept of a membership function for rough sets was defined in [10] as
^^^""^ -
\I{x)\
'
where |A| denotes the cardinality of A. Because the original approach deals with equivalence relations, I(x) is equal to [x]_I, i.e. the equivalence class of the relation I containing the element x. Using tolerances we should rather write x/I instead. Also in Mizar we can choose between Class and neighbourhood, as we already noted in the fourth section. As can be expected, for a finite tolerance space A and a subset X of it, the function μ_X is defined as follows.

  definition
    let A be finite Tolerance_Space;
    let X be Subset of A;
    func MemberFunc (X, A) -> Function of the carrier of A, REAL means
      for x being Element of A holds
        it.x = card (X /\ Class (the InternalRel of A, x)) /
               (card Class (the InternalRel of A, x));
  end;
Actually, the dot "." stands in MML for function application; it in the definiens denotes the defined object. Extensive usage of attributes makes the formulation of some theorems even simpler (at least in our opinion) than in natural language, because it enables us e.g. to state that μ_X is equal to the characteristic function χ_X (theorem 44 from [4]) for a discrete finite approximation space A (that is, one with the identity as its indiscernibility relation) in the following way:

  theorem :: ROUGHS_1:44
    for A being discrete finite Approximation_Space,
        X being Subset of A holds
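The defining equation of the rough membership function translates directly into executable form. The following short Python illustration (again not Mizar code; the finite universe and the tolerance relation are made-up examples) computes |X ∩ I(x)| / |I(x)| for each element of a toy tolerance space.

  from fractions import Fraction

  def membership(x, X, tolerance, universe):
      """Rough membership of x in X w.r.t. the tolerance neighbourhood I(x)."""
      I_x = {y for y in universe if tolerance(x, y)}      # neighbourhood of x
      return Fraction(len(I_x & set(X)), len(I_x))        # |X ∩ I(x)| / |I(x)|

  U = [1, 2, 3, 4]
  tol = lambda x, y: abs(x - y) <= 1
  print([membership(x, {1, 2}, tol, U) for x in U])       # 1, 2/3, 1/3, 0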
15.7 Example of the Formalization

We formalized 19 definitions, 61 theorems with proofs, and 23 cluster registrations in [4]. This representation of the basics of rough set theory in Mizar is 1771 lines long (the length of a line is restricted to 80 characters), which takes 54855 bytes of text. In this section we show one chosen lemma, together with its proof as given in [6], and its full formalization in the Mizar language.^4

Lemma 9. Let R ∈ Tol(U) and X, Y ⊆ U. If X is R-definable, then X_R ∪ Y_R = (X ∪ Y)_R.
Proof. It is obvious that X_R ∪ Y_R ⊆ (X ∪ Y)_R. Let x ∈ (X ∪ Y)_R, i.e., x/R ⊆ X ∪ Y. If x/R ∩ X ≠ ∅, then x ∈ X^R and x ∈ X_R because X is R-definable. If x/R ∩ X = ∅, then necessarily x/R ⊆ Y and x ∈ Y_R. Hence, in both cases x ∈ X_R ∪ Y_R. □

What is worth noticing, the attribute exact (such sets are sometimes called R-definable in the literature) was defined earlier to describe sets whose lower and upper approximations are equal (that is, crisp sets). Defining new synonyms and redefinitions is, however, also possible here. One of the features of the Mizar language which closely reflects the mathematical vernacular is reasoning per cases (subsequent cases are marked by the keyword suppose). The references (after by) to XBOOLE_1 (the identifier of the file containing theorems about Boolean properties of sets) take external theorems from MML as premises; all other labels are local. Obviously, some parts of proofs in the literature may be hard for machine translation (compare "It is obvious that..." above), others may depend on the checker architecture (especially if an author would like to carry out the remaining part of his or her proof analogously to an earlier one). However, the choice of the above example is rather accidental.
^4 In fact, to keep this presentation compact, we dropped the dual conjunct of this lemma.
  theorem Lemma_9:
    for A being Tolerance_Space,
        X being exact Subset of A,
        Y being Subset of A holds
      LAp X \/ LAp Y = LAp (X \/ Y)
  proof
    let A be Tolerance_Space,
        X be exact Subset of A,
        Y be Subset of A;
    thus LAp X \/ LAp Y c= LAp (X \/ Y) by Th26;
    let x be set;
    assume A1: x in LAp (X \/ Y);
    then A2: Class (the InternalRel of A, x) c= X \/ Y by Th8;
    A3: LAp X c= LAp X \/ LAp Y & LAp Y c= LAp X \/ LAp Y by XBOOLE_1:7;
    per cases;
    suppose Class (the InternalRel of A, x) meets X;
      then x in UAp X by A1, Th11;
      then x in LAp X by Th15;
      hence x in LAp X \/ LAp Y by A3;
    suppose Class (the InternalRel of A, x) misses X;
      then Class (the InternalRel of A, x) c= Y by A2, XBOOLE_1:73;
      then x in LAp Y by A1, Th9;
      hence x in LAp X \/ LAp Y by A3;
  end;
Even though the Mizar source itself is not especially hard to read for a mathematician, some translation services are available. The final version, converted automatically back into natural language, reads as follows:

For every tolerance space A, every exact subset X of A, and every subset Y of A holds LAp(X) ∪ LAp(Y) = LAp(X ∪ Y).

The name de Bruijn factor is used by automated reasoning researchers to describe the "loss factor" between the size of an ordinary mathematical exposition and its full formal translation inside a computer. In Wiedijk's considerations and the Mizar examples contained in [12] it is equal to four (although in the sixties of the previous century de Bruijn assumed it to be about ten times bigger); in our case two is a good upper approximation.
15.8 Conclusions

The purpose of our work was to develop a uniform formalization of the basic notions of rough set theory. For lack of space, we concentrated in this outline mainly on the notions of rough approximations and the membership function. Following [6] and [10], we formalized in [4] the properties of rough approximations and membership functions
based on tolerances, rough inclusion and equality, the notion of a rough set, and the associated basic properties. The adjective and type-modifier mechanisms available in the Mizar type theory made our work quite feasible. Even though transitivity was dropped from the classical indiscernibility relation treated as an equivalence relation, further generalizations (e.g. the variable precision model originating from [14]) are still possible. It is important that, by including the formalization of rough sets in MML, we made it usable for a number of automated deduction tools and other digital repositories. The Mizar system closely cooperates with the OMDoc system to share its mathematical library via a format close to XML. Works concerning the exchange of results between automatic theorem provers (e.g. Otter) and Mizar (which already resulted in a successful solution of the Robbins problem) are under way. Formal concept analysis, as well as fuzzy set theory, is also well developed in MML. Successful experiments with the theory merging mechanisms implemented in Mizar (e.g. to describe topological groups or continuous lattices) are quite promising for going further with rough concept analysis as defined in [7], or for machine encoding of the connections between fuzzy set theory and rough sets. We also started the formalization of the paper [3], which focuses on a comparison of some generalized rough approximations of sets. We hope that many more interconnections can be discovered automatically. Rough set researchers could also be assisted when searching a distributed library of facts for analogies between rough sets and other domains. Eventually, it could be helpful within the rough set domain itself, thanks e.g. to the proof restructuring utilities available in the Mizar system, as well as other specialized tools; one of the most useful at this stage is the detection of irrelevant assumptions of theorems and lemmas. The comparatively low de Bruijn factor allows us to say that the Mizar system seems to be effective and the library is well enough developed to go further with the encoding of rough set theory. Moreover, tools which automatically translate Mizar articles back into LaTeX source close to the mathematical vernacular are available. This makes our development not only machine- but also human-readable.
References

1. Ch. Benzmüller, M. Jamnik, M. Kerber, V. Sorge, Agent-based mathematical reasoning, Electronic Notes in Theoretical Computer Science, 23(3), 1999.
2. E. Bryniarski, Formal conception of rough sets, Fundamenta Informaticae, 27(2-3), 1996, pp. 109-136.
3. A. Gomolinska, A comparative study of some generalized rough approximations, Fundamenta Informaticae, 51(1-2), 2002, pp. 103-119.
4. A. Grabowski, Basic properties of rough sets and rough membership function, to appear in Formalized Mathematics, 12(1), 2004, available at http://mizar.org/JFM/Vol15/roughs_1.html.
5. A. Grabowski, Robbins algebras vs. Boolean algebras, in Proceedings of the Mathematical Knowledge Management Conference, Linz, Austria, 2001, available at http://www.emis.de/proceedings/MKM2001/.
6. J. Järvinen, Approximations and rough sets based on tolerances, in: W. Ziarko, Y. Yao (eds.), Proceedings of RSCTC 2000, LNAI 2005, Springer, 2001, pp. 182-189.
7. R. E. Kent, Rough concept analysis: a synthesis of rough sets and formal concept analysis, Fundamenta Informaticae, 27(2-3), 1996, pp. 169-181.
8. The Mizar Home Page, http://mizar.org.
9. Z. Pawlak, Rough sets, International Journal of Information and Computer Science, 11(5), 1982, pp. 341-356.
10. Z. Pawlak, A. Skowron, Rough membership functions, in: R. R. Yager, M. Fedrizzi, and J. Kacprzyk (eds.), Advances in the Dempster-Shafer Theory of Evidence, Wiley, New York, 1994, pp. 251-271.
11. A. Skowron, J. Stepaniuk, Tolerance approximation spaces, Fundamenta Informaticae, 27(2-3), 1996, pp. 245-253.
12. F. Wiedijk, The de Bruijn factor, http://www.cs.kun.nl/~freek/factor/.
13. L. Zadeh, Fuzzy sets, Information and Control, 8, 1965, pp. 338-353.
14. W. Ziarko, Variable precision rough sets model, Journal of Computer and System Sciences, 46(1), 1993, pp. 39-59.
16 Similarity-Based Data Reduction and Classification

Gongde Guo^1,3, Hui Wang^1, David Bell^2, and Zhining Liao^1

^1 School of Computing and Mathematics, University of Ulster, BT37 0QB, UK
   {G.Guo, H.Wang, Z.Liao}@ulster.ac.uk
^2 School of Computer Science, Queen's University Belfast, BT7 1NN, UK
   [email protected]
^3 School of Computer and Information Science, Fujian University of Technology, Fuzhou, 350014, China

Summary. The k-Nearest-Neighbors (kNN) method is a simple but effective method for classification. The major drawbacks of kNN are (1) low efficiency and (2) dependence on the parameter k. In this paper, we propose a novel similarity-based data reduction method and several variations aimed at overcoming these shortcomings. Our method constructs a similarity-based model for the data, which replaces the data to serve as the basis of classification. The value of k is automatically determined, is varied in terms of the local data distribution, and is optimal in terms of classification accuracy. The construction of the model significantly reduces the number of data used for learning, thus making classification faster. Experiments conducted on some public data sets show that the proposed methods compare well with other data reduction methods in both efficiency and effectiveness.

Key words: data reduction, classification, k-Nearest-Neighbors
16.1 Introduction

The k-Nearest-Neighbors (kNN) method is a non-parametric classification method which is simple but effective in many cases [6]. For an instance d_t to be classified, its k nearest neighbors are retrieved, and these form a neighborhood of d_t. Majority voting among the instances in the neighborhood is generally used to decide the classification of d_t, with or without consideration of distance-based weighting. In contrast to its conceptual simplicity, the kNN method performs as well as any other possible classifier when applied to non-trivial problems. Over the last 50 years, this simple classification method has been intensively used in a broad range of applications such as medical diagnosis, text categorization [9], pattern recognition, data mining, and e-commerce. However, to apply kNN we need to choose an appropriate value for k, and the success of classification is very much dependent on this value. In a sense, the kNN method is biased by k. There are many ways of choosing the k value; a simple one is to run the algorithm many times with different k values and choose the one with the best performance, but this is not a pragmatic method in real applications.
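For readers who prefer code to prose, the following is a minimal, self-contained Python sketch of the plain kNN rule just described (Euclidean distance and an unweighted majority vote); the tiny training set and all names are illustrative and are not taken from the paper.

  from collections import Counter

  def knn_classify(query, data, k):
      """data: list of (feature_vector, label); returns the majority label of the k nearest."""
      dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
      neighbours = sorted(data, key=lambda item: dist(item[0], query))[:k]
      votes = Counter(label for _, label in neighbours)
      return votes.most_common(1)[0][0]

  train = [((1.4, 0.2), 'setosa'), ((4.7, 1.4), 'versicolor'), ((5.1, 1.9), 'virginica'),
           ((1.3, 0.2), 'setosa'), ((4.5, 1.5), 'versicolor')]
  print(knn_classify((1.5, 0.3), train, k=3))   # 'setosa'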
In order for kNN to be less dependent on the choice of k, we proposed to look at multiple sets of nearest neighbors rather than just one set of k nearest neighbors [12]. The proposed formalism is based on contextual probability, and the idea is to aggregate the support of multiple sets of nearest neighbors for the various classes to give a more reliable support value, which better reveals the true class of d_t. As it is aimed at improving classification accuracy and alleviating the dependence on k, the efficiency of the method in its basic form is worse than that of kNN, though it is indeed less dependent on k and is able to achieve classification performance close to that of the best k. From the point of view of its implementation, the kNN method consists of a search of pre-labelled instances, given a particular distance definition, to find the k nearest neighbors of each new instance. If the number of instances available is very large, the computational burden of the kNN method is unbearable. This drawback prohibits its use in many applications such as dynamic web mining for a large repository. These drawbacks of the kNN method motivate us to find a way of instance reduction which chooses only a few representatives to be stored and used for classification, in order to improve efficiency whilst both preserving classification accuracy and alleviating the dependence on k.
16.2 Related work

Many researchers have addressed the problem of training set size reduction. Hart [7] made one of the first attempts to reduce the size of the training set with his Condensed Nearest Neighbor Rule (CNN). His algorithm finds a subset S of the training set T such that every instance of T is closer to an instance of S of the same class than to an instance of S of a different class. In this way, the subset S can be used to classify all the instances in T correctly. Ritter et al. [8] extended the condensed NN method in their Selective Nearest Neighbor Rule (SNN) such that every instance of T must be closer to an instance of S of the same class than to any instance of T (instead of S) of a different class. Further, the method ensures a minimal subset satisfying these conditions. Gates [5] introduced the Reduced Nearest Neighbor Rule (RNN). The RNN algorithm starts with S = T and removes each instance from S if such a removal does not cause any other instance in T to be misclassified by the instances remaining in S. Wilson [13] developed the Edited Nearest Neighbor (ENN) algorithm, in which S starts out the same as T, and then each instance in S is removed if it does not agree with the majority of its k nearest neighbors. The Repeated ENN (RENN) applies the ENN algorithm repeatedly until all remaining instances have a majority of their neighbors with the same class, which continues to widen the gap between classes and smooth the decision boundary. Tomek [11] extends the ENN with his All k-NN method of editing. This algorithm works as follows: for i = 1 to k, flag as bad any instance not classified correctly by its i nearest neighbors. After completing the loop all k times, remove any instances from S flagged as bad. Other methods include ELGrow (Encoding Length Grow) and Explore by Cameron-Jones [3], IB1-IB5 by Aha et al. [1][2], Drop1-Drop5 and DEL by Wilson et al. [15], etc. From the experimental results conducted by Wilson et al., the average classification
accuracy of those methods on the reduced data sets is lower than that on the original data sets, due to the fact that pure instance selection suffers information loss to some extent. In the next section, we introduce a novel similarity-based data reduction method (SBModel). It is a type of inductive learning method. The method constructs a similarity-based model for the data by selecting a subset S, with some extra information, from the training set T; this model replaces the data and serves as the basis of classification. The model consists of a set of representatives of the training data, seen as regions in the data space. Based on SBModel, two variations of SBModel, called ε-SBModel and p-SBModel, are also presented, which aim at improving both the efficiency and the effectiveness of SBModel. The experimental results and a comparison with other methods are reported in section 16.4.
16.3 Similarity-Based Data Reduction

16.3.1 The Basic Idea of Similarity-Based Data Reduction

Looking at Figure 16.1, the Iris data with two features, petal length and petal width, is used for demonstration. It contains 150 instances from three classes, represented as diamonds, squares, and triangles respectively, and is plotted in a 2-dimensional data space. In Figure 16.1, the horizontal axis is the feature petal length and the vertical axis is the feature petal width.
Fig. 16.1. Data distribution in a 2-dimensional data space

Given a similarity measure, many instances with the same class label are close to each other in many local areas. In each local region, the central instance d_i (see Figure 16.1 for example), together with some extra information such as Cls(d_i), the class label of instance d_i; Num(d_i), the number of instances inside the local region; Sim(d_i), the similarity of the furthest instance inside the local region to d_i; and Rep(d_i), a representation of d_i, might be a good representative of this local region. If we take these representatives as a model to represent the whole training set, it will significantly reduce the number of instances used for classification, thereby improving efficiency.
For a new instance to be classified in the classification stage, if it is covered by a representative it will be classified with the class label of this representative. If not, we calculate the distance of the new instance to each representative's nearest boundary, take each representative's nearest boundary as an instance, and then classify the new instance in the spirit of kNN.

16.3.2 Terminology and Definitions

Before we give more details about the design of the proposed algorithms, some important terms (or concepts) need to be explicitly defined.

Definition 1.
1. Neighborhood: a neighborhood refers to a given instance in the data space. A neighborhood of a given instance is defined to be a set of nearest neighbors of this instance.
2. Local Neighborhood: a local neighborhood is a neighborhood which covers the maximal number of instances with the same class label.
3. Local ε-Neighborhood: a local ε-neighborhood is a neighborhood which covers the maximal number of instances with the same class label, with ε exceptions allowed.
4. Global Neighborhood: the global neighborhood is defined to be the largest local neighborhood among the set of local neighborhoods in each cycle of the model construction stage.
5. Global ε-Neighborhood: the global ε-neighborhood is defined to be the largest local ε-neighborhood among the set of local ε-neighborhoods in each cycle of the model construction stage.

With the above definitions, given a training set, each instance has a local neighborhood. Based on these local neighborhoods, the global neighborhood can be obtained. This global neighborhood can be seen as a representative of all the instances covered by it. For instances not covered by any representative, we repeat the above operation until all the instances have been covered by chosen representatives. All representatives obtained in the model construction process are used to replace the data and serve as the basis of classification. There are two obvious advantages: (1) we needn't choose a specific k in the sense of kNN for our method in the model construction process; the number of instances covered by a representative can be seen as an optimal k which is generated automatically in the model construction process, and is different for different representatives; (2) using a list of chosen representatives as a model for classification not only reduces the number of instances used for classification, but also significantly improves efficiency. From this point of view, the proposed method overcomes the two shortcomings of kNN.
16.3.3 Modelling and Classification Algorithm

Let D be a collection of n class-known instances {d_1, d_2, ..., d_n}, d_i ∈ D. For handling heterogeneous applications, those with both numeric and nominal features, we use the HVDM distance function (to be presented later) as the default similarity measure to describe the following algorithms. The detailed model construction algorithm of SBModel is described as follows:

Step 1: Select a similarity measure and create a similarity matrix for the given training set D.
Step 2: Set the tag of all instances to 'ungrouped'.
Step 3: For each 'ungrouped' instance, find its local neighborhood.
Step 4: Among all the local neighborhoods obtained in step 3, find the global neighborhood N_i. Create a representative <Cls(d_i), Sim(d_i), Num(d_i), Rep(d_i)> in M to represent all the instances covered by N_i, and then set the tag of all the instances covered by N_i to 'grouped'.
Step 5: Repeat step 3 and step 4 until all the instances in the training set have been set to 'grouped'.
Step 6: Model M consists of all the representatives collected in the above learning process.

In the above algorithm, D represents the given training set and M represents the created model. The elements of a representative <Cls(d_i), Sim(d_i), Num(d_i), Rep(d_i)> respectively represent the class label of d_i; the HVDM distance from d_i to the furthest instance among the instances covered by N_i; the number of instances covered by N_i; and a representation of d_i itself. In step 4, if more than one local neighborhood has the same maximal number of neighbors, we choose the one with the minimal value of Sim(d_i), i.e. the one with the highest density, as the representative. The classification algorithm of SBModel is described as follows:

Step 1: For a new instance d_t to be classified, calculate its similarity to all representatives in the model M.
Step 2: If d_t is covered by a representative <Cls(d_j), Sim(d_j), Num(d_j), Rep(d_j)>, i.e. the HVDM distance from d_t to d_j is smaller than Sim(d_j), d_t is classified with the class label of d_j.
Step 3: If no representative in the model M covers d_t, classify d_t with the class label of the representative whose boundary is closest to d_t. The HVDM distance from d_t to a representative d_i's nearest boundary is equal to the HVDM distance from d_i to d_t minus Sim(d_i).

In an attempt to improve the classification accuracy of SBModel, we implemented two different pruning methods in SBModel. One method is to remove from the model M created by SBModel those representatives that cover only a few instances, together with the instances covered by these representatives from the training set, and then to construct the model again from the revised training set using SBModel. The SBModel algorithm based on this pruning method is called p-SBModel. The model construction algorithm of p-SBModel is described as follows:
Step 1: For each representative in the model M created by SBModel that covers only a few instances (a threshold pre-defined by the user), remove all the instances covered by this representative from the training set D. Set the model M = ∅, then go to step 2.
Step 2: Construct the model M from the revised training set D again using SBModel.
Step 3: The final model M consists of all the representatives collected in the above pruning process.

The second pruning method modifies step 3 in the model construction algorithm of SBModel to allow each local neighborhood to cover ε (called the error tolerance rate) instances with a class label different from the majority class label in this neighborhood. This modification integrates the pruning work into the process of model construction. The SBModel algorithm based on this pruning method is called ε-SBModel. The detailed model construction algorithm of ε-SBModel is described as follows:

Step 1: Select a similarity measure and create a similarity matrix for the given training set D.
Step 2: Set the tag of all instances to 'ungrouped'.
Step 3: For each 'ungrouped' instance, find its local ε-neighborhood.
Step 4: Among all the local ε-neighborhoods obtained in step 3, find the global ε-neighborhood N_i. Create a representative <Cls(d_i), Sim(d_i), Num(d_i), Rep(d_i)> in M to represent all the instances covered by N_i, and then set the tag of all the instances covered by N_i to 'grouped'.
Step 5: Repeat step 3 and step 4 until all the instances in the training set have been set to 'grouped'.
Step 6: Model M consists of all the representatives collected in the above learning process.

SBModel itself is the basic algorithm, with ε = 0 (error tolerance rate) and without pruning. A compact sketch of the model construction and classification procedures is given below.
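The following Python sketch is one possible reading of the construction and classification steps above. It uses Euclidean distance in place of HVDM, treats a local neighborhood as the run of nearest neighbors sharing the centre's class label (with at most eps exceptions, as in ε-SBModel), and is an illustration under these assumptions rather than the authors' implementation.

  def euclid(a, b):
      return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

  def local_neighborhood(i, data, ungrouped, eps=0):
      """Nearest 'ungrouped' instances around data[i] sharing its class label,
      allowing at most eps differently-labelled exceptions; returns the covered
      indices and the covering radius Sim(d_i)."""
      xi, ci = data[i]
      covered, radius, errors = [], 0.0, 0
      for j in sorted(ungrouped, key=lambda j: euclid(data[j][0], xi)):
          if data[j][1] != ci:
              errors += 1
              if errors > eps:
                  break
          covered.append(j)
          radius = euclid(data[j][0], xi)
      return covered, radius

  def build_model(data, eps=0):
      """Greedily pick the largest (densest on ties) neighborhood as a
      representative <Cls, Sim, Num, Rep> until every instance is grouped."""
      ungrouped, model = set(range(len(data))), []
      while ungrouped:
          covered, radius, centre = max(
              (local_neighborhood(i, data, ungrouped, eps) + (i,) for i in ungrouped),
              key=lambda t: (len(t[0]), -t[1]))
          model.append((data[centre][1], radius, len(covered), data[centre][0]))
          ungrouped -= set(covered)
      return model

  def classify(x, model):
      """Class of a covering representative, or of the nearest boundary."""
      for cls, radius, _, centre in model:
          if euclid(x, centre) <= radius:
              return cls
      return min(model, key=lambda r: euclid(x, r[3]) - r[1])[0]

  data = [((1.0, 1.0), 'A'), ((1.1, 0.9), 'A'), ((0.9, 1.2), 'A'),
          ((3.0, 3.0), 'B'), ((3.2, 2.9), 'B')]
  model = build_model(data)
  print(len(model), classify((1.05, 1.0), model), classify((3.1, 3.0), model))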
16.4 Experiment and Evaluation

16.4.1 Data Sets

To evaluate the SBModel method and its variations, fifteen public data sets were collected from the UCI machine learning repository for training and testing. Some information about these data sets is listed in Table 16.1. The meanings of the column titles are as follows: NF: Number of Features, NN: Number of Nominal features, NO: Number of Ordinal features, NB: Number of Binary features, NI: Number of Instances, CD: Class Distribution.

16.4.2 Experimental Environment

Experiments use the 10-fold cross validation method to evaluate the performance of SBModel and its variations and to compare them with C5.0, kNN (voting kNN), and wkNN (distance-weighted kNN). We implemented SBModel and its variations, kNN,
Table 16.1. Some information about the data sets

  Dataset      NF  NN  NO  NB   NI  CD
  Australian   14   4   6   4  690  383:307
  Colic        23  16   7   0  368  232:136
  Diabetes      8   0   8   0  768  268:500
  Glass         9   0   9   0  214  70:17:76:0:13:9:29
  HCleveland   13   7   3   3  303  164:139
  Heart        13   7   3   3  270  120:150
  Hepatitis    19   1   6  12  155  32:123
  Ionosphere   34   0  34   0  351  126:225
  Iris          4   0   4   0  150  50:50:50
  LiverBupa     6   0   6   0  345  145:200
  Sonar        60   0  60   0  208  97:111
  Vehicle      18   0  18   0  846  212:217:218:199
  Vote         16   0   0  16  435  267:168
  Wine         13   0  13   0  178  59:71:48
  Zoo          16  16   0   0   90  37:18:3:12:4:7:9
and wkNN in our own prototype. The C5.0 algorithm used in our experiments was implemented in the Clementine software package. The experimental results of the other editing and condensing algorithms compared here are taken from Wilson's experiments [15]. In voting kNN, the k neighbors are implicitly assumed to have equal weight in the decision, regardless of their distances to the instance x to be classified. It is intuitively appealing to give different weights to the k neighbors based on their distances to x, with closer neighbors having greater weights. In wkNN, the k neighbors are assigned different weights. Let d be a distance measure, and let x_1, x_2, ..., x_k be the k nearest neighbors of x arranged in increasing order of d(x_i, x), so that x_1 is the first nearest neighbor of x. The distance weight w_i for the i-th neighbor x_i is defined as follows:
$$w_i = \begin{cases} \dfrac{d(x_k,x)-d(x_i,x)}{d(x_k,x)-d(x_1,x)} & \text{if } d(x_k,x) \neq d(x_1,x) \\[2mm] 1 & \text{if } d(x_k,x) = d(x_1,x) \end{cases}$$
The instance x is assigned to the class for which the weights of the representatives among the k nearest neighbors sum to the greatest value. In order to handle heterogeneous applications, those with both ordinal and nominal features, we use the heterogeneous distance function HVDM [14] as the distance function in the experiments. It is defined as:
$$HVDM(x,y) = \sqrt{\sum_{a=1}^{m} d_a(x_a,y_a)^2},$$

where the function d_a(x_a, y_a) is the distance for feature a and is defined as:
$$d_a(x_a,y_a) = \begin{cases} 1 & \text{if } x_a \text{ or } y_a \text{ is unknown;} \\ vdm_a(x_a,y_a) & \text{if } a \text{ is nominal,} \\ |x_a - y_a| / (4\sigma_a) & \text{if } a \text{ is numeric.} \end{cases}$$

In the above distance function, σ_a is the standard deviation of the values occurring for feature a in the instances of the training set D, and vdm_a(x_a, y_a) is the distance function for nominal features, called the Value Difference Metric [10]. Using the VDM, the distance between two values x_a and y_a of a single feature a is given as:
$$vdm_a(x_a,y_a) = \sum_{c=1}^{C} \left| \frac{N_{x_a,c}}{N_{x_a}} - \frac{N_{y_a,c}}{N_{y_a}} \right|^2,$$
where N_{x_a} is the number of times feature a had value x_a; N_{x_a,c} is the number of times feature a had value x_a and the output class was c; and C is the number of output classes.

16.4.3 Experimental Results

[Experiment 1] In this experiment, our goal is to evaluate the basic SBModel algorithm and to compare its experimental results with C5.0, kNN, and wkNN. The error tolerance rate is set to 0, k for kNN is set to 1, 3, and 5 respectively, k for wkNN is set to 5, and the allowed minimal number of instances covered by a representative (N) in the final model of SBModel is set to 2, 3, 4, and 5 respectively. Under these settings, a comparison of C5.0, SBModel, kNN, and wkNN in classification accuracy using 10-fold cross validation is presented in Table 16.2. The reduction rate of SBModel is listed in Table 16.3. Note that in Table 16.2 and Table 16.3, N = l means that each representative in the final model of SBModel covers at least l instances of the training set. From the experimental results, the average classification accuracy of the proposed SBModel method in its basic form on the fifteen training sets is better than that of C5.0, and is comparable to kNN and wkNN. But SBModel significantly improves the efficiency of kNN by keeping only 9.19 percent (N=4) of the original instances for classification, with only a slight decrease in accuracy (81.29%) in comparison with kNN (82.58%) and wkNN (82.34%).

[Experiment 2] In this experiment, our goal is to evaluate ε-SBModel. We tune the error tolerance rate ε in a small range from 0 to 4 for each training set, and choose the ε yielding relatively higher classification accuracy. The experimental results are presented in Table 16.4. Note that in Table 16.4 the heading RR stands for 'Reduction Rate'. From the experimental results in Table 16.4, ε-SBModel obtains better performance than C5.0, SBModel, kNN, and wkNN. Even when N=5, ε-SBModel still obtains 82.93% classification accuracy, which is higher than the 79.96% of C5.0, 82.58% of kNN, and 82.34% of wkNN (refer to Table 16.2 for more details). In this situation, ε-SBModel keeps only 7.67 percent of the instances of the original training set for classification, thus significantly improving the efficiency of kNN whilst improving its classification accuracy.
Table 16.2. A comparison of C5.0, SBModel, kNN, and wkNN in classification accuracy

  Dataset      C5.0   N=2    N=3    N=4    N=5    kNN(1) kNN(3) kNN(5) wkNN(5)
  Australian   85.5   84.20  84.64  84.78  84.63  79.42  82.75  85.22  82.46
  Colic        80.9   81.67  82.50  83.06  82.50  78.89  83.89  83.06  81.94
  Diabetes     76.6   75.00  74.08  74.21  74.74  70.92  73.03  74.21  72.37
  Glass        66.3   65.24  65.24  61.43  55.71  68.10  66.67  67.62  67.62
  HCleveland   74.9   82.67  80.33  80.33  78.00  78.33  82.33  81.00  81.33
  Heart        75.6   80.37  80.74  80.37  77.78  76.30  80.37  80.37  77.41
  Hepatitis    80.7   83.33  85.33  87.33  87.33  80.67  80.67  83.33  83.33
  Ionosphere   84.5   94.29  93.71  92.57  91.43  87.14  85.14  84.00  87.14
  Iris         92.0   96.00  96.00  96.00  96.00  95.33  94.67  96.67  95.33
  LiverBupa    65.8   63.53  64.41  63.82  61.76  60.00  66.47  66.47  66.47
  Sonar        69.4   84.00  82.50  80.00  79.50  88.00  83.50  85.00  86.50
  Vehicle      67.9   65.83  65.36  63.69  62.26  68.57  71.79  69.29  71.43
  Vote         96.1   88.70  88.70  88.70  88.70  91.30  92.17  92.17  90.87
  Wine         92.1   95.29  94.71  94.12  94.12  95.88  94.71  94.71  95.29
  Zoo          91.1   92.22  92.22  88.89  88.89  96.67  95.56  95.56  95.56
  Average      79.96  82.16  82.03  81.29  80.22  81.03  82.25  82.58  82.34
Table 16.3. The reduction rate of SBModel in the final model

  Dataset      N=2    N=3    N=4    N=5
  Australian   86.81  90.43  92.17  92.46
  Colic        78.26  84.24  87.50  88.86
  Diabetes     80.47  86.98  89.58  91.67
  Glass        79.44  88.32  90.19  93.93
  HCleveland   84.16  87.79  91.42  92.74
  Heart        84.81  88.52  90.00  91.48
  Hepatitis    85.81  88.39  90.32  91.61
  Ionosphere   81.48  85.19  88.60  89.74
  Iris         95.33  96.00  96.00  96.00
  LiverBupa    73.62  83.48  88.70  92.75
  Sonar        81.73  86.06  87.50  90.87
  Vehicle      80.38  87.83  91.96  93.50
  Vote         91.38  93.53  93.97  94.40
  Wine         90.45  90.45  92.13  92.70
  Zoo          91.11  92.22  92.22  93.33
  Average      84.35  88.63  90.81  92.40
Table 16.4. The classification accuracy and reduction rate of ε-SBModel

  Dataset      ε   N=2    RR     N=3    RR     N=4    RR     N=5    RR
  Australian   2   84.93  90.43  84.93  90.43  85.22  92.17  85.51  92.46
  Colic        1   83.06  78.26  83.06  84.24  82.78  87.50  83.61  88.86
  Diabetes     1   74.34  80.47  74.47  86.98  75.13  89.58  75.53  91.67
  Glass        3   69.52  90.19  69.52  90.19  69.52  90.19  69.05  93.93
  HCleveland   4   81.67  92.08  81.67  92.08  81.67  92.08  81.67  92.08
  Heart        1   80.74  84.81  81.11  88.52  81.85  90.00  81.11  91.48
  Hepatitis    1   88.00  85.81  89.33  88.39  88.67  90.32  88.67  91.61
  Ionosphere   1   93.71  81.48  93.71  85.19  92.86  88.60  92.57  89.74
  Iris         0   96.00  95.33  96.00  96.00  96.00  96.00  96.00  96.00
  LiverBupa    2   68.53  83.48  68.53  83.48  68.24  88.70  67.94  92.75
  Sonar        2   82.50  86.54  82.50  86.54  82.50  88.94  81.50  90.38
  Vehicle      2   66.43  87.83  66.43  87.83  66.55  91.96  66.67  93.50
  Vote         4   91.74  94.40  91.74  94.40  91.74  94.40  91.74  94.40
  Wine         0   95.29  90.45  94.71  90.45  94.12  92.13  94.12  92.70
  Zoo          0   92.22  91.11  92.22  92.22  88.89  92.22  88.89  93.33
  Average          83.25  87.51  83.33  89.13  83.05  90.99  82.93  92.33
[Experiment 3] In this experiment, our goal is to evaluate p-SBModel. It is a non-parametric classification method which performs pruning by removing from the model M those representatives that cover only one instance (which means no induction is done for such a representative), together with the instances covered by these representatives from the training set, and then constructing the model again from the revised training set. The experimental results are presented in Table 16.5. From the experimental results shown in Table 16.5, it is clear that, with the same classification accuracy, p-SBModel has a slightly higher reduction rate than SBModel on average. The main merit of the p-SBModel algorithm is that it does not need any parameter to be set in either the modelling or the classification stage. Its average classification accuracy is comparable to kNN and wkNN, while it keeps only 10.13 percent of the instances of the original training set for classification.

[Experiment 4] In this experiment, we compare our SBModel method and its variations with other algorithms from the literature in average classification accuracy and reduction rate. The algorithms compared in the experiment include CNN, SNN, IB3, DEL, ENN, RENN, All k-NN, ELGrow, Explore, and Drop3, each of which has been described in section 16.2 of this paper. The experimental results are presented in Figure 16.2. Note that the values on the horizontal axis in Figure 16.2 represent different algorithms, i.e. 1-CNN, 2-SNN, 3-IB3, 4-DEL, 5-Drop3, 6-ENN, 7-RENN, 8-All k-NN, 9-ELGrow, 10-Explore, 11-SBModel, 12-(ε-SBModel), 13-(p-SBModel). From the experimental results, it is clear that the average classification accuracy and reduction rate of our proposed SBModel method and its variations on the fifteen data sets are better than those of the other data reduction methods in 10-fold cross validation, with the exceptions of
Table 16.5. A comparison of kNN, SBModel, and p-SBModel

  Dataset      kNN(5) wkNN(5) RR  SBModel(3) RR     p-SBModel RR
  Australian   85.22  82.46   0   84.64      90.43  86.23     95.22
  Colic        83.06  81.94   0   82.50      84.24  82.78     88.59
  Diabetes     74.21  72.37   0   74.08      86.98  73.16     87.11
  Glass        67.62  67.62   0   65.24      88.32  65.24     84.58
  HCleveland   81.00  81.33   0   80.33      87.79  80.67     89.11
  Heart        80.37  77.41   0   80.74      88.52  81.85     91.11
  Hepatitis    83.33  83.33   0   85.33      88.39  84.67     96.77
  Ionosphere   84.00  87.14   0   93.71      85.19  92.00     87.18
  Iris         96.67  95.33   0   96.00      96.00  95.33     95.33
  LiverBupa    66.47  66.47   0   64.41      83.48  62.94     82.03
  Sonar        85.00  86.50   0   82.50      86.06  82.50     86.54
  Vehicle      69.29  71.43   0   65.36      87.83  67.26     83.69
  Vote         92.17  90.87   0   88.70      93.53  90.00     96.98
  Wine         94.71  95.29   0   94.71      90.45  94.71     90.45
  Zoo          95.56  95.56   0   92.22      92.22  91.11     93.33
  Average      82.58  82.34   0   82.03      88.63  82.03     89.87
Fig. 16.2. Average classification accuracy and reduction rate

ELGrow and Explore in reduction rate. Though ELGrow obtains the highest reduction rate among all the algorithms compared, its rather low classification accuracy counteracts its advantage in reduction rate. Explore seems to be a competitive algorithm, with a higher reduction rate and slightly lower classification accuracy in comparison with our proposed SBModel and its variations. Otherwise, Drop3 is the one closest to our algorithms in both classification accuracy and reduction rate.
16.5 Conclusions

In this paper we have presented a novel solution for dealing with the shortcomings of kNN. To overcome the problems of low efficiency and dependency on k, we select a few representatives from the training set, with some extra information, to represent the whole training set. In the selection of each representative we use an optimal but different k, decided automatically according to the local data distribution, to eliminate
the dependency on k without the user's intervention. Experimental results carried out on fifteen public data sets have shown that SBModel and its variations ε-SBModel and p-SBModel are quite competitive for classification. Their average classification accuracies on the fifteen public data sets are better than C5.0 and are comparable with kNN and wkNN. But our proposed SBModel and its variations significantly reduce the number of instances in the final model used for classification, with a reduction rate ranging from 88.63% to 92.33%. Moreover, compared to other reduction techniques, ε-SBModel obtains the best performance. It keeps only 7.67 percent of the instances of the original training set on average for classification whilst improving the classification accuracy of kNN and wkNN. It is a good alternative to kNN in many application areas, such as text categorization and financial stock market analysis and prediction.
References

1. Aha DW, Kibler D, Albert MK (1991) Instance-Based Learning Algorithms, Machine Learning, 6, pp. 37-66.
2. Aha DW (1992) Tolerating Noisy, Irrelevant and Novel Attributes in Instance-Based Learning Algorithms, International Journal of Man-Machine Studies, 36, pp. 267-287.
3. Cameron-Jones RM (1995) Instance Selection by Encoding Length Heuristic with Random Mutation Hill Climbing, Proc. of the 8th Australian Joint Conference on Artificial Intelligence, pp. 99-106.
4. Devijver P, Kittler J (1972) Pattern Recognition: A Statistical Approach, Prentice-Hall, Englewood Cliffs, NJ.
5. Gates G (1972) The Reduced Nearest Neighbor Rule, IEEE Transactions on Information Theory, 18, pp. 431-433.
6. Hand D, Mannila H, Smyth P (2001) Principles of Data Mining, The MIT Press.
7. Hart P (1968) The Condensed Nearest Neighbor Rule, IEEE Transactions on Information Theory, 14, pp. 515-516.
8. Ritter GL, Woodruff HB, Lowry SR et al (1975) An Algorithm for a Selective Nearest Neighbor Decision Rule, IEEE Transactions on Information Theory, 21-6, November, pp. 665-669.
9. Sebastiani F (2002) Machine Learning in Automated Text Categorization, ACM Computing Surveys, Vol. 34, No. 1, pp. 1-47.
10. Stanfill C, Waltz D (1986) Toward Memory-Based Reasoning, Communications of the ACM, 29, pp. 1213-1228.
11. Tomek A (1976) An Experiment with the Edited Nearest-Neighbor Rule, IEEE Transactions on Systems, Man, and Cybernetics, 6-6, pp. 448-452.
12. Wang H (2003) Contextual Probability, Journal of Telecommunications and Information Technology, 4(3):92-97.
13. Wilson DL (1972) Asymptotic Properties of Nearest Neighbor Rules Using Edited Data, IEEE Transactions on Systems, Man, and Cybernetics, 2-3, pp. 408-421.
14. Wilson DR, Martinez TR (1997) Improved Heterogeneous Distance Functions, Journal of Artificial Intelligence Research (JAIR), 6-1, pp. 1-34.
15. Wilson DR, Martinez TR (2000) Reduction Techniques for Instance-Based Learning Algorithms, Machine Learning, 38-3, pp. 257-286.
17 Decision Trees and Reducts for Distributed Decision Tables

Mikhail Ju. Moshkov

Institute of Computer Science, University of Silesia, 39 Będzińska St., Sosnowiec, 41-200, Poland
[email protected]
Summary. In the paper, greedy algorithms for the construction of decision trees and relative reducts for a joint decision table generated by distributed decision tables are studied. Two ways of defining the joint decision table are considered: based on the assumption that the universe of the joint table is the intersection of the universes of the distributed tables, and based on the assumption that the universe of the joint table is the union of the universes of the distributed tables. Furthermore, the case is considered when the information about the distributed decision tables is given in the form of decision rule systems.

Key words: distributed decision tables, decision trees, relative reducts, greedy algorithms
17.1 Introduction

In the paper, distributed decision tables are investigated, which can be useful for the study of multi-agent systems. Let T_1, ..., T_m be decision tables, and let {a_1, ..., a_n} be the set of attributes of these tables. In the paper two questions are considered: how we can define a joint decision table T with attributes a_1, ..., a_n generated by the tables T_1, ..., T_m, and how we can construct decision trees and relative reducts for the table T. We study two extreme cases:
• The universe of the table T is the intersection of the universes of the tables T_1, ..., T_m.
• The universe of the table T is the union of the universes of the tables T_1, ..., T_m.
In reality, we consider a more complicated situation in which we do not know exactly the universes of the tables T_1, ..., T_m. In this case we must use upper approximations of the table T, which are tables containing at least all rows from T. We study two such approximations which are minimal in some sense: the table T^∩ for the case of the intersection of universes, and the table T^∪ for the case of the union of universes. We show that in the first case (intersection of universes) even simple problems (for given tables T_1, ..., T_m and a decision tree it is required to recognize whether this tree is a decision tree for the table T^∩; for given tables T_1, ..., T_m and a subset of the set
{a_1, ..., a_n} it is required to recognize whether this subset is a relative reduct for the table T^∩) are NP-hard. We consider approaches to the minimization of decision tree depth and relative reduct cardinality on some subsets of decision trees and relative reducts. In the second case (union of universes) the situation is similar to the situation for a single decision table: there exist greedy algorithms for decision tree depth minimization and for relative reduct cardinality minimization which have relatively good bounds on precision. Furthermore, we consider the following problem. Let us have a complicated system consisting of parts Q_1, ..., Q_m. For each part Q_j the set of normal states is described by a decision rule system S_j. It is required to recognize, for each part Q_j, whether the state of this part is normal or abnormal. We consider an algorithm for the minimization of the depth of decision trees solving this problem, and bounds on the precision of this algorithm. The results of the paper are obtained in the framework of rough set theory [8,9]. However, for simplicity we consider only "crisp" decision tables in which there are no equal rows labelled by different decisions. The paper consists of six sections. In the second section we consider known results on algorithms for the minimization of decision tree depth and relative reduct cardinality for a single decision table. In the third and fourth sections we consider algorithms for the construction of decision trees and relative reducts for the joint tables T^∩ and T^∪, respectively. In the fifth section we consider an algorithm which, for given rule systems S_1, ..., S_m, constructs a decision tree that recognizes the presence of realized rules in each of these systems. The sixth section contains a short conclusion.
17.2 Single Decision Table

Consider a decision table T (see Fig. 17.1) which has t columns labelled by attributes a_1, ..., a_t. These attributes take values from the set {0,1}. For simplicity, we assume that the rows are pairwise different (rows in our case correspond not to objects from the universe, but to equivalence classes of the indiscernibility relation). Each row is labelled by a decision d.
Fig. 17.1. Decision table T
We associate a classification problem with the decision table T: for a given row it is required to find the decision attached to this row. To this end we can use the values of the attributes a_1, ..., a_t.
A test for the table T is a subset of attributes which allows us to separate each pair of rows with different decisions. A relative reduct (reduct) for the table T is a test for which each proper subset is not a test. A decision tree for the table T is a tree with a root in which each terminal node is labelled by a decision, each non-terminal node is labelled by an attribute, two edges start in each non-terminal node, and these edges are labelled by the numbers 0 and 1. For each row, the work of the decision tree finishes in a terminal node labelled by the decision corresponding to the considered row. The depth of the tree is the maximal length of a path from the root to a terminal node. It is well known that the problem of reduct cardinality minimization and the problem of decision tree depth minimization are NP-hard problems. So we will consider only approximate algorithms for the optimization of reducts and trees.

17.2.1 Greedy Algorithm for Decision Tree Construction

Denote by P(T) the number of unordered pairs of rows with different decisions. This number will be interpreted as the uncertainty of the table T. A sub-table of the table T is any table obtained from T by removal of some rows. For any a_i ∈ {a_1, ..., a_t} and b ∈ {0,1}, denote by T(a_i, b) the sub-table of the table T consisting of the rows which on the intersection with the column labelled by a_i contain the number b. If we compute the value of the attribute a_i, then the uncertainty in the worst case will be equal to

U(T, a_i) = max{P(T(a_i, 0)), P(T(a_i, 1))}.

Let P(T) ≠ 0. Then we compute the value of an attribute a_i for which U(T, a_i) has minimal value. Depending on the value of a_i, the given row will be localized either in the sub-table T(a_i, 0) or in the sub-table T(a_i, 1), etc. The algorithm finishes its work when, for any terminal node of the constructed tree, the sub-table T' corresponding to this node satisfies the equality P(T') = 0. It is clear that the considered greedy algorithm has polynomial time complexity. Denote by h(T) the minimal depth of a decision tree for the table T. Denote by h_greedy(T) the depth of the decision tree for the table T constructed by the greedy algorithm. It is known [3,4] that

h_greedy(T) ≤ h(T) ln P(T) + 1.
Using results of Feige [1], one can show (see [5]) that if NP ⊄ DTIME(n^{O(log log n)}) then for any ε > 0 there is no polynomial algorithm that, for a given decision table T, constructs a decision tree for this table whose depth is at most

(1 − ε) h(T) ln P(T).
Thus, the considered algorithm is close to best polynomial approximate algorithms for decision tree depth minimization.
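As an illustration of the greedy procedure of Sect. 17.2.1, the following Python sketch (the table representation and all names are ours, not the author's implementation) computes P, the worst-case uncertainty U, and builds a nested-dictionary decision tree for a small consistent 0/1 decision table.

  from collections import Counter

  def P(rows):
      """Number of unordered pairs of rows with different decisions."""
      counts = Counter(d for _, d in rows)
      n = len(rows)
      return (n * (n - 1) - sum(c * (c - 1) for c in counts.values())) // 2

  def subtable(rows, attr, value):
      return [(x, d) for x, d in rows if x[attr] == value]

  def greedy_tree(rows, attrs):
      """Returns a nested-dict decision tree for a consistent 0/1 decision table."""
      if P(rows) == 0:
          return rows[0][1]                                  # a decision (leaf)
      best = min(attrs, key=lambda a: max(P(subtable(rows, a, 0)),
                                          P(subtable(rows, a, 1))))
      return {best: {v: greedy_tree(subtable(rows, best, v), attrs - {best})
                     for v in (0, 1) if subtable(rows, best, v)}}

  T = [((0, 0, 1), 'A'), ((0, 1, 0), 'A'), ((1, 0, 0), 'B'), ((1, 1, 1), 'C')]
  print(greedy_tree(T, {0, 1, 2}))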
It is possible to use another uncertainty measures in the considered greedy algorithm. Let F be an uncertainty measure, T a decision table, V a sub-table of T, a^ an attribute of T and h G {0,1}. Consider conditions which allow to obtain bounds on precision of greedy algorithm: (a) F{T) - F{T{au h)) > F{r) - F{r{au b)). (b) F{T) = 0 iff r has no rows with different decisions. (c) If F{T) — 0 then T has no rows with different decisions. One can show that if F satisfies conditions (a) and (b) then Veedy(T)
\nP{T)
+ 1 .
Using results of Feige [1] one can show that if NP ⊄ DTIME(n^{O(log log n)}) then for any ε > 0 there is no polynomial algorithm that, for a given decision table T, constructs a reduct for this table whose cardinality is at most

(1 − ε) R(T) ln P(T).
Thus, the considered algorithm is close to best polynomial approximate algorithms for reduct cardinality minimization. To obtain bounds on precision of this algorithm it is important that we can represent the considered problem as a set cover problem. We can weaken this condition and consider a set cover problem such that each cover corresponds to a reduct, but not each reduct corresponds to a cover. In this case we will solve the problem of reduct cardinality minimization not on the set of all reducts, but we will be able to obtain some bounds on precision of greedy algorithm on the considered subset of reducts.
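The set cover view mentioned above can be made concrete with the standard greedy set cover heuristic. The sketch below is illustrative only: B is assumed to be the set of row pairs to be separated and C[a] the set of pairs separated by attribute a (both of which a caller must supply); repeatedly choosing the attribute covering the most uncovered pairs yields a test, to which the usual greedy set cover bound applies.

  def greedy_cover(B, C):
      """B: pairs to separate; C: dict attribute -> pairs it separates."""
      uncovered, chosen = set(B), []
      while uncovered:
          a = max(C, key=lambda attr: len(C[attr] & uncovered))
          if not C[a] & uncovered:
              raise ValueError("attributes do not separate all pairs")
          chosen.append(a)
          uncovered -= C[a]
      return chosen

  # pairs of rows with different decisions, and which attribute separates which pair
  B = {(0, 2), (0, 3), (1, 2), (1, 3), (2, 3)}
  C = {'a1': {(0, 2), (0, 3), (1, 2), (1, 3)}, 'a2': {(1, 2), (2, 3)}, 'a3': {(0, 3), (2, 3)}}
  print(greedy_cover(B, C))       # e.g. ['a1', 'a2']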
17.3 Distributed Decision Tables. Intersection of Universes

In this section we consider the case when the universe of the joint decision table is the intersection of the universes of the distributed decision tables.

17.3.1 Joint Decision Table T^∩

Let T_1, ..., T_m be decision tables and {a_1, ..., a_n} the set of attributes of these tables. Let b = (b_1, ..., b_n) ∈ {0,1}^n, j ∈ {1, ..., m}, and let {a_{i_1}, ..., a_{i_t}} be the set of attributes of the table T_j. We will say that the row b corresponds to the table T_j if (b_{i_1}, ..., b_{i_t}) is a row of the table T_j. In the latter case we will say that (b_{i_1}, ..., b_{i_t}) is the row from T_j corresponding to b. Let us define the table T^∩ (see Fig. 17.2). This table has n columns labelled by the attributes a_1, ..., a_n. The row b = (b_1, ..., b_n) ∈ {0,1}^n is a row of the table T^∩ iff b corresponds to each table T_j, j ∈ {1, ..., m}. This row is labelled by the tuple (d_1, ..., d_m) where d_j is the decision attached to the row from T_j corresponding to b, j ∈ {1, ..., m}. Sometimes we will denote the table T^∩ by T_1 × ... × T_m.
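A brute-force construction of T^∩ follows the definition directly. The following Python sketch (the encoding of the tables and all names are assumptions made for illustration) enumerates {0,1}^n and keeps exactly those rows whose projections occur in every T_j, labelling them with the tuple of corresponding decisions.

  from itertools import product

  def joint_intersection(tables, n):
      """tables: list of (attribute_index_tuple, {projected_row: decision})."""
      joint = {}
      for b in product((0, 1), repeat=n):
          decisions = []
          for attrs, rows in tables:
              proj = tuple(b[i] for i in attrs)
              if proj not in rows:
                  break
              decisions.append(rows[proj])
          else:                                   # b corresponds to every T_j
              joint[b] = tuple(decisions)
      return joint

  T1 = ((0, 1), {(0, 0): 'x', (1, 1): 'y'})       # attributes a1, a2
  T2 = ((1, 2), {(0, 0): 'u', (0, 1): 'v'})       # attributes a2, a3
  print(joint_intersection([T1, T2], n=3))        # {(0, 0, 0): ('x', 'u'), (0, 0, 1): ('x', 'v')}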
{di,...
,dm)
Fig. 17.2. Joint decision table T^ One can interpret the table T^ in the following way. Let t/i,..., Um be universes corresponding to tables Ti,..., T^ and f/n = C^i Pi... n f/^. If we know the set C/p we can consider the table T(C/n) with attributes ai,..., On, rows corresponding to objects from C/n» and decisions of the kind ( d i , . . . , d^). Assume now that we do not know the set C/n- In this case we must consider an upper approximation of the table T{U^) which is a table containing all rows from T{Up). The table T^ is the minimal upper approximation in the case when we have no any additional information about the set
17.3.2 On Construction of Decision Trees and Reducts for T ^ Our aim is to construct decision trees and reducts for the table T^. Unfortunately, it is very difficult to work with this table. One can show that the following problems are NP-hard: •
For given tables Ti,..., T^ it is required to recognize is the table T^ = Ti x ... x Tm empty.
244
• • •
Mikhail Ju. Moshkov
For given tables Ti,..., T^ it is required to recognize has the table T^ rows with different decisions. For given tables Ti,..., T^ and decision tree F it is required to recognize is F a. decision tree for the table T^. For given tables Ti,..., T^ and a subset D of the set { a i , . . . , a^} it is required to recognize is P a reduct for the table T^.
So really we can use only sufficient conditions for decision tree to be a decision tree for the table T^ and for a subset of the set { a i , . . . , an} to be a reduct for the table T^. If P 7^ NP then there are no simple (polynomial) uncertainty measures satisfying the condition (b). Now we consider two examples of polynomial uncertainty measures satisfying the conditions (a) and (c). L e t a i i , . . . , a i , e { a i , . . . ,an}, 6 1 , . . . ,6^ G {0,1}, and Oi = ( a i , , 6 i ) . . . ( a ^ , , 6 t ) .
Denote by T^a the sub-table of T^ which consists of rows that on the intersection with columns labelling by a^^,..., a^^ have numbers 6 1 , . . . , 6^. Let j € { 1 , . . . , m} and Aj be the set of attributes of the table Tj. Denote by Tja the sub-table of Tj consisting of rows which on the intersection with column labelling by a^^ have number bk for each a^^ G Aj d {a^,,..., a^ J . Consider an uncertainty measure Fi such that F i ( r ^ a ) = P(Tia) H- ... + P ( T ^ a ) . One can show that this measure satisfies the conditions (a) and (c). Unfortunately, the considered measure does not allow to use relationships among tables Ti,,.. ,Tm- Describe another essentially more complicated but polynomial measure which takes into account some of such relationships. Consider an uncertainty measure F2. For simplicity, we define the value of this measure only for the table T^ (the value F2 (T^a) can be defined in the similar way). Set F2(T^) = GI + . . . - K G ^
.
Let j e { 1 , . . . , m}, and the table Tj have p rows r i , . . . , rp. Then
Gj-E^i^: where 1 < ^ < A; < p, and rows r^ and r^ have different decisions. Let q G { l , . . . , p } . Then ^^q
^ ql
'''
^qm
where V^^,i = 1 , . . . , m, is the number of rows r in the table Tj x Ti such that r^ is the row from Tj corresponding to r. It is not difficult to prove that this measure satisfies the conditions (a) and (c). One can show that if P 7^ NP then it is impossible to reduce effectively the problem of reduct cardinality minimization to a set cover problem. However, we can
17 Decision Trees and Redacts for Distributed Decision Tables
245
consider set cover problems such that each cover corresponds to a reduct, but not each reduct corresponds to a cover. Consider an example. Denote B{Tj), j — 1 , . . . , m, the set of unordered pairs of rows from Tj with different decisions, B = J5(ri) U . . . U B(Tm), and Q the set of pairs from B separating by a^, i = 1 , . . . , n. It is not difficult to show that the set cover problem for the set B and family { C i , . . . , C^} of subsets of B has the following properties: each cover corresponds to a reduct for the table T^, but (in general case) not each reduct corresponds to a cover.
17.4 Distributed Decision Tables. Union of Universes In this section we consider the case when the universe of joint decision table is the union of universes of distributed decision tables. 17.4.1 Joint Decision Table T ^ Let Ti,..., Tm be decision tables and { a i , . . . , «„} be the set of attributes of these tables. Let us define the table T^ (see Fig. 17.3). This table has n columns labelling by attributes a i , . . . , a^. The row b = (fei,..., 6^) G {0,1}"^ is a row of the table T^ iff there exists j G {1, • •., m} such that b corresponds to the table Tj. This row is labelled by the tuple (d*,..., cij^) where dj is the decision dj attached to the row from Tj corresponding to b, if b corresponds to the table Tj, and gap otherwise, j e {l,...,m}.
(di, — , . . . ,dm)
Fig. 17.3. Joint decision table T"^ Two tuples of decisions and gaps will be considered as different iff there exists digit in which these tuples have different decisions (in the other words, we will interpret gap as an arbitrary decision). We must localize a given row in a sub-table of the table T^ which does not contain rows labelling by different tuples of decisions and gaps. Most part of results considered in Sect. 17.2 is valid for joint tables T^ too. One can interpret the table T^ in the following way. Let C/i,..., Um be universes corresponding to tables Ti, ...,r^, and Uu = (7i U ... U 1/^. If we know the set U\j we can consider the table T{Uu) with attributes ai,..., a^, rows corresponding to objects from Uu, and decisions of the kind (di, —,..., dm)- Assume now that we do not know the set C/y In this case we must consider an upper approximation of the table T{U[j) which is a table containing all rows from T{Uu). The table T^ is the minimal upper approximation in the case when we have no any additional information about the set f/y-
17.4.2 On Construction of Decision Trees and Reducts for T^∪

Let {a_{i_1}, ..., a_{i_t}} ⊆ {a_1, ..., a_n}, b_1, ..., b_t ∈ {0,1}, and α = {(a_{i_1}, b_1), ..., (a_{i_t}, b_t)}. Consider the uncertainty measure F_1 such that

F_1(T^∪_α) = P(T_{1α}) + ... + P(T_{mα}).

One can show that this measure satisfies the conditions (a) and (b). So we can use the greedy algorithm for decision tree depth minimization based on the measure F_1, and we can obtain relatively good bounds on the precision of this algorithm. The number of nodes in the constructed tree can grow exponentially in m. However, we can effectively simulate the work of this tree by constructing a path from the root to a terminal node. Denote by B(T_j), j = 1, ..., m, the set of unordered pairs of rows from T_j with different decisions, B = B(T_1) ∪ ... ∪ B(T_m), and by C_i the set of pairs from B separated by a_i, i = 1, ..., n. It is not difficult to prove that the problem of reduct cardinality minimization for T^∪ is equivalent to the set cover problem for the set B and the family {C_1, ..., C_n} of subsets of B. So we can use the greedy algorithm for minimization of reduct cardinality, and we can obtain relatively good bounds on the precision of this algorithm.
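To make the reduction concrete, the following Python sketch (the table encoding and attribute names are invented for illustration, not taken from the chapter) builds the set B of conflicting row pairs, the sets C_i of pairs separated by each attribute, and applies the standard greedy set-cover heuristic to obtain a small reduct candidate.

from itertools import combinations

def greedy_reduct(tables):
    """tables: list of decision tables, each a list of (row, decision) pairs,
    where a row is a dict attribute -> value. Returns a small set of attributes
    separating all conflicting pairs (a reduct candidate)."""
    # B: all unordered pairs of rows (within one table) with different decisions
    conflicts = []
    for rows in tables:
        for (r1, d1), (r2, d2) in combinations(rows, 2):
            if d1 != d2:
                conflicts.append((r1, r2))
    attrs = set().union(*(r.keys() for rows in tables for r, _ in rows))
    # C_i: indices of conflicting pairs separated by attribute a
    covers = {a: {k for k, (r1, r2) in enumerate(conflicts)
                  if a in r1 and a in r2 and r1[a] != r2[a]}
              for a in attrs}
    uncovered, reduct = set(range(len(conflicts))), set()
    while uncovered:
        candidates = attrs - reduct
        if not candidates:
            break
        best = max(candidates, key=lambda a: len(covers[a] & uncovered))
        gained = covers[best] & uncovered
        if not gained:           # remaining pairs cannot be separated at all
            break
        reduct.add(best)
        uncovered -= gained
    return reduct

# toy usage with two tables sharing attribute 'a1'
T1 = [({'a1': 0, 'a2': 1}, 'yes'), ({'a1': 1, 'a2': 1}, 'no')]
T2 = [({'a1': 0, 'a3': 0}, 'no'),  ({'a1': 0, 'a3': 1}, 'yes')]
print(greedy_reduct([T1, T2]))   # e.g. {'a1', 'a3'}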
17.5 From Decision Rule Systems to Decision Trees

Instead of distributed decision tables we can have information on such tables represented in the form of decision rule systems. Assume we have a complicated object Q whose state is described by the values of attributes a_1, ..., a_n. Let Q_1, ..., Q_m be parts of Q. For j = 1, ..., m the state of Q_j is described by the values of attributes from a subset A_j of the set {a_1, ..., a_n}. For any Q_j we have a system S_j of decision rules of the kind

a_{i_1} = b_1 ∧ ... ∧ a_{i_t} = b_t → normal,

where a_{i_1}, ..., a_{i_t} are pairwise different attributes from A_j, and b_1, ..., b_t are values of these attributes (not necessarily numbers from {0,1}). These rules describe the set of all normal states of Q_j. We will assume that for any j ∈ {1, ..., m} and for any two rules from S_j the set of conditions of the first rule is not a subset of the set of conditions of the second rule. We will also assume that all combinations of values of attributes are possible, and that for each attribute there exists an "abnormal" value which does not occur in the rules (for example, a missing value of the attribute). For each part Q_j we must either find a rule from S_j which is realized (in this case Q_j is in a normal state) or show that all rules from S_j are not realized (in this case Q_j is in an abnormal state).
Consider a simple algorithm for the construction of a decision tree solving this problem. In fact we will construct a path from the root to a terminal node of the tree. We describe the main step of this algorithm, which consists of 6 sub-steps.

Main step:
1. Find a minimal set of attributes which covers all rules (an attribute covers a rule if it occurs in this rule).
2. Compute the values of all attributes from this cover.
3. Remove all rules which contradict the obtained values of the attributes. If after this sub-step a system of rules S_j, j ∈ {1, ..., m}, becomes empty then the corresponding part Q_j is in an abnormal state.
4. Remove from the left-hand side of each rule all conditions (equalities) containing attributes from the cover.
5. Remove all rules with an empty left-hand side (such rules are realized). Remove all rules from each system S_j which has realized rules. For each such system the corresponding part Q_j is in a normal state.
6. If the obtained rule system is not empty then repeat the main step.

Denote by h the minimal depth of a decision tree solving the considered problem, by h_alg the depth of the decision tree constructed by the considered algorithm, and by L the maximal length of a rule from the system S = S_1 ∪ ... ∪ S_m. One can prove that

max{L, h} ≤ h_alg ≤ L · h.

It is possible to modify the considered algorithm so that the cover of rules by attributes is constructed using the greedy algorithm for the set cover problem. Denote by h_alg^greedy the depth of the constructed decision tree. By N we denote the number of rules in the system S. One can prove that

h_alg^greedy ≤ L · h · (ln N + 1).
Some other algorithms for transformation of decision rule systems into decision trees can be found in [6].
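As an illustration of the main step described above, here is a rough Python sketch; the rule encoding (a rule as a dictionary of attribute-value conditions, grouped by part) and the value oracle passed as ask are assumptions made for this example, and the cover in sub-step 1 is built greedily, as in the modified variant of the algorithm.

def diagnose(parts, ask):
    """parts: dict part_name -> list of rules; a rule is a dict {attr: value}.
    ask(attr) returns the observed value of an attribute.
    Returns dict part_name -> 'normal' / 'abnormal'."""
    state = {}
    rules = {p: [dict(r) for r in rs] for p, rs in parts.items()}
    while rules:
        # 1. build a (greedy) cover of the remaining rules by attributes
        remaining = [r for rs in rules.values() for r in rs if r]
        cover = []
        while remaining:
            a = max({a for r in remaining for a in r},
                    key=lambda a: sum(a in r for r in remaining))
            cover.append(a)
            remaining = [r for r in remaining if a not in r]
        # 2. compute the values of all attributes from the cover
        values = {a: ask(a) for a in cover}
        for p in list(rules):
            kept = []
            for r in rules[p]:
                # 3. drop rules contradicting the observed values
                if any(a in values and r[a] != values[a] for a in r):
                    continue
                # 4. remove satisfied conditions on covered attributes
                kept.append({a: v for a, v in r.items() if a not in values})
            if not kept:                        # 3. no rule left: abnormal
                state[p] = 'abnormal'; del rules[p]
            elif any(not r for r in kept):      # 5. a rule became empty: normal
                state[p] = 'normal'; del rules[p]
            else:
                rules[p] = kept                 # 6. otherwise repeat main step
    return state

# toy usage: two parts, attribute values served from a dict
obs = {'a1': 1, 'a2': 0}
print(diagnose({'Q1': [{'a1': 1, 'a2': 0}], 'Q2': [{'a1': 5}]}, obs.get))
# e.g. {'Q2': 'abnormal', 'Q1': 'normal'}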
17.6 Conclusion

In this paper, algorithms for the construction of decision trees and relative reducts for the joint decision table generated by distributed decision tables have been considered. Some of these algorithms can be useful in applications.
Acknowledgments The author is greatly indebted to Andrzej Skowron for stimulating discussions.
References

1. Feige U (1996) A threshold of ln n for approximating set cover (Preliminary version). In: Proceedings of the 28th Annual ACM Symposium on the Theory of Computing
2. Johnson D S (1974) J Comput System Sci 9:256-278
3. Moshkov M Ju (1982) Academy of Sciences Doklady 265:550-552 (in Russian); English translation: Sov Phys Dokl 27:528-530
4. Moshkov M Ju (1983) Conditional tests. In: Yablonskii S V (ed) Problems of Cybernetics 40. Nauka Publishers, Moscow (in Russian)
5. Moshkov M Ju (1997) Algorithms for constructing of decision trees. In: Proceedings of the First European Symposium on Principles of Data Mining and Knowledge Discovery, LNCS 1263, Springer-Verlag
6. Moshkov M Ju (2001) On transformation of decision rule systems into decision trees. In: Proceedings of the Seventh International Workshop Discrete Mathematics and its Applications 1 (in Russian)
7. Nigmatullin R G (1969) Method of steepest descent in problems on cover. In: Memoirs of the Symposium Problems of Precision and Efficiency of Computing Algorithms 5 (in Russian)
8. Pawlak Z (1991) Rough Sets - Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht Boston London
9. Skowron A, Rauszer C (1992) The discernibility matrices and functions in information systems. In: Slowinski R (ed) Intelligent Decision Support. Handbook of Applications and Advances of the Rough Set Theory. Kluwer Academic Publishers, Dordrecht Boston London
18 Learning Concept Approximation from Uncertain Decision Tables

Nguyen Sinh Hoa^1 and Nguyen Hung Son^2

^1 Polish-Japanese Institute of Information Technology, Koszykowa 86, 02-008 Warsaw, Poland
^2 Institute of Mathematics, Warsaw University, Banacha 2, 02-097 Warsaw, Poland
e-mails: {hoa, son}@mimuw.edu.pl
Summary. We present a hierarchical learning approach to the approximation of a complex concept from experimental data, using an inference diagram as domain knowledge. The solution, based on rough set and rough mereology theory, requires the design of a learning method working on uncertain decision tables. We examine the effectiveness of the proposed approach by comparing it with standard learning approaches with respect to different criteria on artificial data sets generated by a traffic road simulator.
18.1 Introduction

Concept approximation is an important problem in data mining [4]. In a typical process of concept approximation we assume that we are given information consisting of the values of conditional and decision attributes on objects from a finite subset (training set, sample) of the universe, and using this information one should induce approximations of the concept over the whole universe. In many learning tasks, e.g., identification of dangerous situations on the road by an unmanned aerial vehicle (UAV), the target concept is too complex and cannot be approximated directly from feature value vectors. In some cases, when the target concept is a composition of simpler ones, layered learning [14] is an alternative approach to concept approximation. Given a hierarchical concept decomposition, the main idea is to gradually synthesize a target concept from simpler ones. A learning process can be imagined as a treelike structure with the target concept located at the highest layer. At the lowest layer, basic concepts are approximated using feature values available from a data set. At the next layer more complex concepts are synthesized from basic concepts. This process is repeated for successive layers. The importance of hierarchical concept synthesis is now well recognized by researchers (see, e.g., [9, 6]). An idea of hierarchical concept synthesis in the rough mereological and granular computing frameworks has been developed (see, e.g.,
[9, 6, 10]), and problems connected with compound concept approximation are discussed, e.g., in [6, 11, 1, 13]. In this paper we concentrate on concepts that are specified by decision classes in decision systems [7]. Crucial for inducing concept approximations is to create the description of concepts in such a way that it becomes possible to maintain an acceptable level of imprecision all the way from basic attributes to the final decision. In this paper we discuss some strategies for concept composition founded on the rough set approach. We also examine the effectiveness of the layered learning approach by comparison with a standard rule-based learning approach. The quality of the new approach is verified with respect to the generality of concept approximation, the preciseness of concept approximation, the computation time required for concept induction, and the concept description length. Experiments are carried out on an artificial data set generated by a traffic road simulator.
18.2 Basic notions

Many problems in machine learning, pattern recognition or data mining can be formulated as concept approximation problems. Formally, given a universe 𝒰 of objects (cases, states, patients, observations, etc.) and a concept X which can be interpreted as a subset of 𝒰, the problem is to find a description of X that can be expressed in a predefined descriptive language ℒ. We assume that ℒ consists of formulas that are interpretable as subsets of 𝒰. The concept approximation problem can be formulated as the problem of searching for an (approximate) description of a concept X based on a finite set of examples U ⊆ 𝒰 called the training set. The approximation is required to be close to the original concept. The closeness of the approximation to the original concept can be measured by different criteria like accuracy, description length, etc., which can also be estimated on so-called testing examples. Usually, we assume that the input data for the concept approximation problem are given by a decision table, which is a tuple S = (U, A, dec), where U is a non-empty, finite set of training objects, A is a non-empty, finite set of attributes and dec ∉ A is a distinguished attribute called the decision. Each attribute a ∈ A corresponds to a function a : U → V_a called an evaluation function, where V_a is called the domain of a. For any non-empty set of attributes B ⊆ A and any object x ∈ U, we define the B-information vector of x by inf_B(x) = {(a, a(x)) : a ∈ B}. The set INF_B(S) = {inf_B(x) : x ∈ U} is called the B-information set of S. Without loss of generality, we assume that the domain of the decision dec is equal to V_dec = {1, ..., d}. For any k ∈ V_dec, the set CLASS_k = {x ∈ U : dec(x) = k} is called the k-th decision class of S. The decision dec determines a partition of U into decision classes, i.e., U = CLASS_1 ∪ ... ∪ CLASS_d. In the case of the concept approximation problem, we can assume that V_dec = {yes, no} and U = CLASS_yes ∪ CLASS_no.
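A decision table in the above sense can be kept as a plain mapping; the minimal Python sketch below (all attribute and object names are made up) computes a B-information vector and the decision classes.

# a toy decision table S = (U, A, dec); objects are keyed by name
A = ['outlook', 'windy']
table = {
    'x1': {'outlook': 'sunny', 'windy': 0, 'dec': 'no'},
    'x2': {'outlook': 'rainy', 'windy': 1, 'dec': 'no'},
    'x3': {'outlook': 'sunny', 'windy': 1, 'dec': 'yes'},
}

def inf(x, B):
    """B-information vector of object x."""
    return {(a, table[x][a]) for a in B}

def decision_classes():
    classes = {}
    for x, row in table.items():
        classes.setdefault(row['dec'], set()).add(x)
    return classes

print(inf('x1', A))        # e.g. {('outlook', 'sunny'), ('windy', 0)}
print(decision_classes())  # e.g. {'no': {'x1', 'x2'}, 'yes': {'x3'}}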
The approximate description of a concept can be induced by any learning algorithm from the inductive learning area, like rule extraction, decision trees, etc. In the next section we concentrate on methods based on layered learning and rough set theory. In some concept approximation problems (see the next section) we have to approximate the given concept from an uncertain decision table. In such uncertain decision tables, attribute values are not determined exactly. Formally, every attribute a ∈ A is associated with an evaluation function ν_a : U × V_a → [0,1]. Assume that V_a = {v_1, ..., v_{k_a}} is the domain of the (discrete) attribute a; then the vector ν_a(x) = [ν_a(x, v_1), ..., ν_a(x, v_{k_a})] is called the value distribution of attribute a for the object x. We also assume that the condition

Σ_{v_i ∈ V_a} ν_a(x, v_i) ≤ 1

holds for any object x ∈ U. In the layered learning approach, standard decision tables are used to approximate the basic concepts, which are located in the lowest layer. For concepts that are located at higher levels (called compound concepts), we can use uncertain decision tables to induce their approximations.
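A possible encoding of such value distributions, together with the consistency check stated above, is sketched below; the attribute domain and the numbers are invented for illustration.

# value distributions nu_a(x, v) for one flexible attribute 'a'
# with domain {'low', 'med', 'high'}
uncertain_column = {
    'x1': {'low': 0.7, 'med': 0.2, 'high': 0.1},
    'x2': {'low': 0.0, 'med': 0.5, 'high': 0.3},   # mass may sum to less than 1
}

def is_valid_distribution(dist, tol=1e-9):
    """Check that 0 <= nu_a(x, v) and sum_v nu_a(x, v) <= 1."""
    return all(p >= 0 for p in dist.values()) and sum(dist.values()) <= 1 + tol

assert all(is_valid_distribution(d) for d in uncertain_column.values())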
18.3 Rough Sets and Concept Approximation Problem

18.3.1 Basic idea

The rough set methodology for concept approximation can be described as follows. Let X ⊆ 𝒰 be a concept and let U ⊆ 𝒰 be a finite sample of 𝒰. Assume that for any x ∈ U it is known whether x ∈ X ∩ U or x ∈ U − X. Any pair P = (L, U) is called a rough approximation of X if it satisfies the following conditions:

1. L ⊆ U ⊆ 𝒰;
2. L, U are subsets of 𝒰 expressible in the language ℒ;
3. L ∩ U ⊆ X ∩ U ⊆ U ∩ U;
4. L is maximal (and U is minimal) in the family of sets definable in ℒ satisfying 3.

The sets L and U are called the lower approximation and the upper approximation of the concept X ⊆ 𝒰, respectively. The set BN = U \ L is called the boundary region of the approximation of X. The set X is called rough with respect to its approximations (L, U) if L ≠ U; otherwise X is called crisp in 𝒰. The pair (L, U) is also called the rough set (for the concept X). The condition (4) in the above list can be substituted by inclusion to a degree, to make it possible to induce approximations of higher quality of the concept on the whole universe 𝒰. In practical applications the last condition in the above definition can be hard to satisfy. Hence, using some heuristics, we construct sub-optimal instead of maximal or minimal sets. The rough approximation of a concept can also be defined by means of a rough membership function.
Definition 1. Let X ⊆ 𝒰 be a concept and let the decision table S = (U, A, dec) describe the training objects U ⊆ 𝒰. A function μ_X : 𝒰 → [0,1] is called a rough membership function of the concept X ⊆ 𝒰 if and only if (L_{μ_X}, U_{μ_X}) is a rough approximation of X (induced from the sample U), where L_{μ_X} = {x ∈ 𝒰 : μ_X(x) = 1} and U_{μ_X} = {x ∈ 𝒰 : μ_X(x) > 0}.

Many methods of discovering rough approximations of concepts from data have been proposed, e.g., methods based on reducts [7][8], on k-NN classifiers [1], or on decision rules [1]. Let us recall the construction of the rough membership function in the concept approximation approach based on decision rules. Given a decision table S = (U, A, dec), let us assume that RULES(S) is a set of decision rules induced by some rule extraction method. For any object x ∈ 𝒰, let MatchRules(S, x) be the set of rules from RULES(S) supported by x. One can define the rough membership function μ_{CLASS_k} : 𝒰 → [0,1] for the concept determined by CLASS_k as follows:

1. Let R_yes be the set of all decision rules from MatchRules(S, x) for the k-th class, and let R_no be the set of decision rules from MatchRules(S, x) for the other classes.
2. We define two real values w_yes, w_no by

   w_yes = Σ_{r ∈ R_yes} strength(r)   and   w_no = Σ_{r ∈ R_no} strength(r),

   where strength(r) is a normalized function depending on the length, support and confidence of r and on some global information about the decision table S, like the table size and the class distribution (see [2]).
3. One can define the value of μ_{CLASS_k}(x) by

   μ_{CLASS_k}(x) =
     undetermined                  if max(w_yes, w_no) < ω,
     0                             if w_no − w_yes ≥ θ and w_no > ω,
     1                             if w_yes − w_no ≥ θ and w_yes > ω,
     (θ + (w_yes − w_no)) / (2θ)   in other cases,

where ω, θ are parameters set by the user. These parameters make it possible to control, in a flexible way, the size of the boundary region for the approximations established according to Definition 1.
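This construction can be written down directly. The sketch below assumes hypothetical rule objects carrying a precomputed strength; the formula used for the boundary case follows the reconstruction given above and is only one possible variant.

def rough_membership(matched_rules, k, omega, theta):
    """matched_rules: list of (decision, strength) pairs for rules matched by x.
    Returns mu_CLASSk(x) in [0, 1], or None when the case is undetermined."""
    w_yes = sum(s for d, s in matched_rules if d == k)
    w_no = sum(s for d, s in matched_rules if d != k)
    if max(w_yes, w_no) < omega:
        return None                       # undetermined
    if w_no - w_yes >= theta and w_no > omega:
        return 0.0
    if w_yes - w_no >= theta and w_yes > omega:
        return 1.0
    return (theta + (w_yes - w_no)) / (2 * theta)   # boundary region

# e.g. three matched rules voting with different strengths
print(rough_membership([('yes', 0.6), ('yes', 0.3), ('no', 0.5)],
                       'yes', omega=0.2, theta=0.5))   # -> 0.9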
18.3.2 Rough set based layered learning

In this section we discuss a strategy for composing concepts that are described in terms of concepts established on top of already existing ones. A method for concept composition is a crucial point in concept synthesis. We will discuss a method that gives us the ability to control the level of approximation quality all the way from basic concepts to the target concept. Let us assume that a concept hierarchy H is given. The concept hierarchy should contain either an inference diagram or a dependence diagram connecting the target concept with the input attributes through intermediate concepts. A training set is represented
by a decision table S = (U, A, D), where D is a set of decision attributes corresponding to all intermediate concepts and to the target concept. The decision values indicate whether an object belongs to the basic concepts and to the target concept, respectively. Using information available from the concept hierarchy, for each basic concept C_b one can create a training decision system S_{C_b} = (U, A_{C_b}, dec_{C_b}), where A_{C_b} ⊆ A and dec_{C_b} ∈ D. To approximate the concept C_b one can apply any classical method (e.g., k-NN, supervised clustering, or a rule-based approach [5]) to the table S_{C_b}. In the further discussion we assume that the basic concepts are approximated by rule-based classifiers (see Section 2) derived from the relevant decision tables.
Fig. 18.1. The construction of a compound concept approximation using rough descriptions of simpler concepts

To avoid overly complicated notation let us limit ourselves to the case of constructing a compound concept approximation on the basis of two simpler concept approximations. Assume we have two concepts C_1 and C_2 that are given to us in the form of rule-based approximations derived from the decision systems S_{C_1} = (U, A_{C_1}, dec_{C_1}) and S_{C_2} = (U, A_{C_2}, dec_{C_2}). Hence we are given two rough membership functions μ_{C_1}(x), μ_{C_2}(x). These functions are determined with the use of the parameter sets {w_yes^{C_1}, w_no^{C_1}, ω^{C_1}, θ^{C_1}} and {w_yes^{C_2}, w_no^{C_2}, ω^{C_2}, θ^{C_2}}, respectively. We want to establish a similar set of parameters {w_yes^C, w_no^C, ω^C, θ^C} for the target concept C, which we want to describe with the use of a rough membership function μ_C. As previously, the parameters ω, θ controlling the boundary region are user-configurable, but we need to derive {w_yes^C, w_no^C} from the data. The issue is to define a decision system from which the rules used to define the approximations can be derived. This problem can be described by an uncertain decision table as follows. The uncertain decision system S_C = (U, A_C, dec_C), which is necessary for learning an approximation of the concept C, contains conditional attributes A_C = {a_{C_1}, a_{C_2}}
related to the simpler concepts C_1 and C_2. There are two possibilities for defining the evaluation functions ν_{a_{C_1}} and ν_{a_{C_2}}:

1. by rough membership functions;
2. by voting weights.

We propose the following methods for learning the approximation of a compound concept from uncertain decision tables.

Naive method: One can treat the uncertain decision table S_C as a normal decision table S' with more attributes. By extracting rules from S' (using discretization as preprocessing), rule-based approximations of the concept C are created. It is important to observe that such rules describing C use attributes that are in fact classifiers themselves. Therefore, in order to have a more readable and intuitively understandable description, as well as more control over the quality of approximation (especially for new cases), it pays to stratify and interpret the attribute domains for attributes in A_C.

Stratification method: Instead of using just a value of a membership function or a weight, we would prefer to use linguistic statements such as "the likeliness of the occurrence of C_1 is low". In order to do that we have to map the attribute value sets onto some limited family of subsets. Such subsets are then identified with notions such as "certain", "low", "high", etc. It is quite natural, especially in the case of attributes being membership functions, to introduce linearly ordered subsets of attribute ranges, e.g., {negative, low, medium, high, positive}. That yields a fuzzy-like layout, or linguistic variables, of attribute values. One may (and in some cases should) also consider the case when these subsets overlap.

Stratification of attribute values and the introduction of linguistic variables attached to the inference hierarchy serve multiple purposes. First, it provides a way of representing knowledge in a more human-readable format, since if we have a new situation (a new object x* ∈ 𝒰 \ U) to be classified (checked against compliance with the concept C), we may use rules like:

If compliance of x* with C_1 is high or medium and compliance of x* with C_2 is high then x* ∈ C.

Another advantage of imposing the division of attribute value sets lies in extended control over the flexibility and validity of the system constructed in this way. As we may define the linguistic variables and the corresponding intervals, we gain the ability to make the system more stable and inductively correct. In this way we control the general layout of the boundary regions for the simpler concepts that contribute to the construction
of the target concept. The process of setting the intervals for attribute values may be performed by hand, especially when additional background information about the nature of the described problem is available. One may also rely on some automated methods for such interval construction, such as, e.g., clustering, template analysis and discretization. Some extended discussion on the foundations of this approach, which is related to rough-neural computing [6] and computing with words, can be found in [13, 12].
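One simple way to realize the stratification method is to map membership values onto a fixed, linearly ordered family of labels; in the sketch below the cut points and the example compound rule are illustrative assumptions, not values used by the authors.

# illustrative cut points for linguistic strata over [0, 1]
STRATA = [(0.2, 'negative'), (0.4, 'low'), (0.6, 'medium'),
          (0.8, 'high'), (1.01, 'positive')]

def stratify(mu):
    """Map a membership value (or None) to a linguistic label."""
    if mu is None:
        return 'unknown'
    for upper, label in STRATA:
        if mu < upper:
            return label
    return 'positive'

# a compound-concept rule can then be stated over labels, e.g.:
def compound_rule(mu_c1, mu_c2, mu_c3):
    return (stratify(mu_c1) in ('high', 'positive')
            and stratify(mu_c2) != 'negative'
            and stratify(mu_c3) in ('medium', 'high', 'positive'))

print(stratify(0.75), compound_rule(0.9, 0.5, 0.65))   # high True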
18.4 Experimental Results

To verify the quality of hierarchical classifiers we performed some experiments with the road simulator system.

18.4.1 Road simulator

Learning to recognize and predict traffic situations on the road is the main issue in many unmanned aerial vehicle (UAV) projects. It is a good example of a hierarchical concept approximation problem. We demonstrate the proposed layered learning approach on our own simulation system. ROAD SIMULATOR is a computer tool generating data sets consisting of recordings of vehicle movements on the roads and at the crossroads. Such data sets are next used to learn and test complex concept classifiers working on information coming from different devices (sensors) monitoring the situation on the road. Let us present some of the most important features of this system. During the simulation the system registers a series of parameters of the local simulations, that is, simulations connected with each vehicle separately, as well as two global parameters of the simulation, that is, parameters connected with driving conditions during the simulation. The local parameters are related to the driver's profile, which is randomly determined when a new vehicle appears on the board and may not be changed until it disappears from the board. The global parameters, like visibility and weather conditions, are set randomly according to some scenario. We associate the simulation parameters with the readouts of different measuring devices or technical equipment placed inside the vehicle or in the outside environment (e.g., by the road, in a police car, etc.). Apart from those sensors, the simulator registers a few more attributes, whose values are determined based on the sensors' values in a way determined by an expert. These parameters in the present simulator version take binary values and are therefore called concepts. Concept definitions are very often in the form of a question which one can answer YES, NO or NULL (does not apply). In Figure 18.3 we present some exemplary concepts and the relationship diagram between those concepts. During the simulation, data may be generated and stored in a text file. The generated data are in the form of a rectangular board (information system).
Fig. 18.2. The board of simulation.

Fig. 18.3. The relationship diagram for the presented concepts (Safe driving; Safe overtaking; Safe distance from FL during overtaking; Possibility of going back to the right lane; Possibility of safe stopping before the crossroads; SENSORS).
Each line of the board depicts the situation of a single vehicle; the sensors' and concepts' values are registered for the given vehicle and its neighboring vehicles. Within each simulation step, descriptions of the situations of all the vehicles are saved to the file.
18.4.2 Experiment setup

We have generated 6 training data sets: c10_s100, c10_s200, c10_s300, c10_s400, c10_s500, c20_s500, and 6 corresponding testing data sets named c10_s100N, c10_s200N, c10_s300N, c10_s400N, c10_s500N, c20_s500N. All data sets consist of 100 attributes. The smallest data set consists of about 700 situations (100 simulation units) and the largest data set consists of about 8000 situations (500 simulation units). We compare the accuracy of two classifiers, i.e., RS: the standard classifier induced by the rule set method, and RS-L: the hierarchical classifier induced by the RS-layered learning method. In the first approach, we employed the RSES system [3] to generate the set of minimal decision rules. We use the simple voting strategy for conflict resolution when classifying new situations. In the RS-layered learning approach, from the training table we create five sub-tables to learn five basic concepts (see Figure 18.3):

C_1: "safe distance from FL during overtaking",
C_2: "possibility of safe stopping before crossroads",
C_3: "possibility of going back to the right lane",
C_4: "safe distance from FR1",
C_5: "forcing the right of way".

These tables are created using information available from the concept relationship diagram presented in Figure 18.3. A concept at the next level is C_6: "safe overtaking". To approximate the concept C_6, we create a table with three conditional attributes. These attributes describe the fitting degrees of an object to the concepts C_1, C_2, C_3, respectively. The decision attribute has three values YES, NO, or NULL, corresponding to the cases of safe overtaking, dangerous overtaking, and not applicable (the overtaking has not been made by the car). The target concept C_7: "safe driving" is located in the third level of the concept decomposition hierarchy. To approximate C_7 we also create a decision table with three attributes, representing the fitting degrees of objects to the concepts C_4, C_5, C_6, respectively. The decision attribute has two possible values, YES or NO, depending on whether a car satisfies the global safety condition or not.

The comparison is performed with respect to the following criteria: (1) accuracy of classification, (2) covering rate of new cases (generality), and (3) computing time necessary for classifier synthesis.

Classification accuracy: Similarly to real life situations, the decision class "safe driving = YES" is dominating. The decision class "safe driving = NO" takes only 4% - 9% of the training sets. Searching for an approximation of the "safe driving = NO" class with high precision and generality is a challenge for learning algorithms. In the experiments we concentrate on the quality of the "NO" class approximation. In Table 18.1 we present the classification accuracy of the RS and RS-L classifiers. One can observe that the accuracy on the "YES" class of both the standard and the hierarchical classifiers is high, whereas the accuracy on the "NO" class is very poor, particularly in the case of the standard classifier. The hierarchical classifier showed to be much better than the
standard classifier for this class. The accuracy on the "NO" class of the hierarchical classifier is quite high when the training sets reach a sufficient size.

Table 18.1. Classification accuracy of the standard and hierarchical classifiers.

Accuracy     Total          Class YES      Class NO
             RS    RS-L     RS    RS-L     RS    RS-L
c10_s100N    0.94  0.97     1     1        0     0
c10_s200N    0.99  0.96     1     0.98     0.60  0.75
c10_s300N    0.99  0.98     1     0.98     0     0.78
c10_s400N    0.96  0.77     0.96  0.77     0.64  0.57
c10_s500N    0.96  0.89     0.99  0.90     0.80  0.30
c20_s500N    0.99  0.89     0.99  0.88     0.44  0.93
Average      0.97  0.91     0.99  0.92     0.34  0.63
Covering rate: The generality of classifiers is usually evaluated by the recognition ability for unseen objects. In this section we analyze the covering rate of the classifiers for new objects. One can observe a scenario similar to that for the accuracy degree. The recognition rate of situations belonging to the "NO" class is very poor in the case of the standard classifier. One can see in Table 18.2 the improvement in the coverage degree of the "YES" class and the "NO" class for the hierarchical classifier.

Table 18.2. Covering rate for the standard and hierarchical classifiers.

Covering rate  Total          Class YES      Class NO
               RS    RS-L     RS    RS-L     RS    RS-L
c10_s100N      0.44  0.72     0.44  0.74     0.50  0.38
c10_s200N      0.72  0.73     0.73  0.74     0.50  0.63
c10_s300N      0.47  0.68     0.49  0.69     0.10  0.44
c10_s400N      0.74  0.90     0.76  0.93     0.23  0.35
c10_s500N      0.72  0.86     0.74  0.88     0.40  0.69
c20_s500N      0.62  0.89     0.65  0.89     0.17  0.86
Average        0.62  0.79     0.64  0.81     0.32  0.55
Computing speed: With respect to time, the layered learning approach shows a tremendous advantage in comparison with the standard learning approach. In the case of the standard classifier, the computational time is measured as the time required for computing the rule set used for decision class approximation. In the case of the hierarchical classifier, the computational
time is the total time required for the approximation of all sub-concepts and the target concept. One can see in Table 18.3 that the speed-up ratio of the layered learning approach to the standard one reaches from 40 to 130 times (all experiments were performed on a computer with an AMD Athlon 1.4 GHz processor and 256 MB RAM).

Table 18.3. Time for standard and hierarchical classifier generation.

Tables     RS        RS-L     Speed up ratio
c10_s100   94 s      2.3 s    40
c10_s200   714 s     6.7 s    106
c10_s300   1450 s    10.6 s   136
c10_s400   2103 s    34.4 s   60
c10_s500   3586 s    38.9 s   92
c20_s500   10209 s   98 s     104
Average                       90
18.5 Conclusion

We presented a new method for concept synthesis. It is based on the layered learning approach. Unlike the traditional approach, in the layered learning approach the concept approximations are induced not only from the available data sets but also from the expert's domain knowledge. In the paper, we assume that this knowledge is represented by a concept dependency hierarchy. The layered learning approach showed to be promising for complex concept synthesis. The experimental results with road traffic simulation show the advantages of this new approach in comparison to the standard approach.

Acknowledgements: The research has been partially supported by the grant 3T11C00226 from the Ministry of Scientific Research and Information Technology of the Republic of Poland. The authors are deeply grateful to Dr. J. Bazan for his road simulator system and to Prof. A. Skowron for valuable discussions on the layered learning approach.
References

1. J. Bazan, H. S. Nguyen, A. Skowron, and M. Szczuka. A view on rough set concept approximation. In G. Wang, Q. Liu, Y. Yao, and A. Skowron, editors, RSFDGrC'2003, Chongqing, China, volume 2639 of LNAI, pages 181-188, Heidelberg, Germany, 2003. Springer-Verlag.
2. J. G. Bazan. A comparison of dynamic and non-dynamic rough set methods for extracting laws from decision tables. In L. Polkowski and A. Skowron, editors, Rough Sets in Knowledge Discovery 1: Methodology and Applications, pages 321-365. Physica-Verlag, Heidelberg, Germany, 1998.
3. J. G. Bazan and M. Szczuka. RSES and RSESlib - a collection of tools for rough set computations. In W. Ziarko and Y. Yao, editors, RSCTC'02, volume 2005 of LNAI, pages 106-113, Banff, Canada, October 16-19 2000. Springer-Verlag.
4. W. Kloesgen and J. Zytkow, editors. Handbook of Knowledge Discovery and Data Mining. Oxford University Press, Oxford, 2002.
5. T. Mitchell. Machine Learning. Mc Graw Hill, 1998.
6. S. K. Pal, L. Polkowski, and A. Skowron, editors. Rough-Neural Computing: Techniques for Computing with Words. Cognitive Technologies. Springer-Verlag, Heidelberg, Germany, 2003.
7. Z. Pawlak. Rough Sets: Theoretical Aspects of Reasoning about Data, volume 9 of System Theory, Knowledge Engineering and Problem Solving. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1991.
8. Z. Pawlak and A. Skowron. A rough set approach for decision rules generation. In Proc. of IJCAI'93, pages 114-119, Chambery, France, 1993. Morgan Kaufmann.
9. L. Polkowski and A. Skowron. Rough mereology: A new paradigm for approximate reasoning. International Journal of Approximate Reasoning, 15(4):333-365, 1996.
10. A. Skowron. Approximate reasoning by agents in distributed environments. In N. Zhong, J. Liu, S. Ohsuga, and J. Bradshaw, editors, IAT'01, Japan, pages 28-39. World Scientific, Singapore, 2001.
11. A. Skowron. Approximation spaces in rough neurocomputing. In M. Inuiguchi, S. Tsumoto, and S. Hirano, editors, Rough Set Theory and Granular Computing, pages 13-22. Springer-Verlag, Heidelberg, Germany, 2003.
12. A. Skowron and J. Stepaniuk. Information granule decomposition. Fundamenta Informaticae, 47(3-4):337-350, 2001.
13. A. Skowron and M. Szczuka. Approximate reasoning schemes: Classifiers for computing with words. In Proceedings of SMPS 2002, Advances in Soft Computing, pages 338-345, Heidelberg, Germany, 2002. Springer-Verlag.
14. P. Stone. Layered Learning in Multi-Agent Systems: A Winning Approach to Robotic Soccer. The MIT Press, Cambridge, MA, 2000.
19 In Search for Action Rules of the Lowest Cost

Zbigniew W. Ras^{1,2} and Angelina A. Tzacheva^1

^1 UNC-Charlotte, Computer Science Dept., Charlotte, NC 28223, USA
^2 Polish Academy of Sciences, Institute of Computer Science, Ordona 21, 01-237 Warsaw, Poland
19.1 Introduction There are two aspects of interestingness of rules that have been studied in data mining literature, objective and subjective measures ([2], [1], [3], [11],[12]). Objective measures are data-driven and domain-independent. Generally, they evaluate the rules based on their quality and similarity between them. Subjective measures, including unexpectedness, novelty [11], and actionability, are user-driven and domaindependent. A rule is actionable if user can do an action to his/her advantage based on this rule ([2], [1], [3]). Action rules introduced in [7] and investigated further in [8] are constructed from actionable rules. They suggest ways to re-classify consumers to a desired state. However, quite often, such a change cannot be done directly to a chosen attribute (for instance to the attribute profit). In such situations, definitions of such an attribute in terms of other attributes have to be learned. These definitions are used to construct action rules showing what changes in values of attributes, for a given consumer, are needed in order to re-classify this consumer the way busi-
business user wants. This re-classification may mean that a consumer who was not interested in a certain product may now buy it, and therefore may shift into a group of more profitable customers. These groups of customers are described by values of classification attributes in a decision system schema. Ras and Gupta, in [10], assume that the information system is distributed and its sites are autonomous. They claim that it is wise to search for action rules at remote sites when the action rules extracted at the client site cannot be implemented in practice (they are too expensive, too risky, or the business user is unable to make such changes). Also, they show under what assumptions two action rules extracted at two different sites can be composed. One of these assumptions requires that the semantics of attributes, including the interpretation of null values, have to be the same at both sites. In the present paper, this assumption is relaxed. Additionally, we introduce the notions of cost and feasibility of an action rule. Usually, a number of action rules or chains of action rules can be applied to re-classify a given set of objects. The cost associated with changes of values within one attribute is usually different from the cost associated with changes of values within another attribute. We present a strategy for constructing chains of action rules, driven by the changes of attribute values suggested by another action rule, which are needed to reclassify some objects. Such a chain of action rules uniquely defines a new action rule and is built with the goal of lowering the cost of reclassifying these objects. Silberschatz and Tuzhilin [11], [12] quantify actionability in terms of unexpectedness and define unexpectedness as a subjective measure of interestingness. They have shown that the most actionable knowledge is unexpected and most of the unexpected knowledge is actionable. So, by discovering action rules of possibly the lowest cost, we obtain the most actionable knowledge and at the same time the most unexpected knowledge related to a desired reclassification of objects.
19.2 Information System and Action Rules

An information system is used for representing knowledge. Its definition, presented here, is due to Pawlak [4]. By an information system we mean a triple S = (U, A, V), where:

• U is a nonempty, finite set called the universe,
• A is a nonempty, finite set of attributes, i.e., a : U → V_a is a function for every a ∈ A,
• V = ∪{V_a : a ∈ A}, where V_a is the set of values of the attribute a ∈ A.

Elements of U are called objects. In this paper, they are often seen as customers. Attributes are interpreted as features, offers made by a bank, characteristic conditions, etc. By a decision table we mean any information system where the set of attributes is partitioned into conditions and decisions. Additionally, we assume that the set of conditions is partitioned into stable conditions and flexible conditions. For simplicity, we also assume that there is only one decision attribute. Date of birth is an example of a stable attribute. The interest rate on a customer account is an example
of a flexible attribute (it depends on the bank). We adopt the following definition of a decision table. By a decision table we mean an information system of the form S = (U, A_St ∪ A_Fl ∪ {d}), where d ∉ A_St ∪ A_Fl is a distinguished attribute called the decision. The elements of A_St are called stable conditions, whereas the elements of A_Fl are called flexible conditions. As an example of a decision table we take S = ({x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8}, {a, c} ∪ {b} ∪ {d}), represented by Table 19.1. The set {a, c} lists the stable attributes, b is a flexible attribute and d is a decision attribute. Also, we assume that H denotes a high profit and L denotes a low one.
  x     a   b   c   d
  x_1   0   S   0   L
  x_2   0   R   1   L
  x_3   0   S   0   L
  x_4   0   R   1   L
  x_5   2   P   2   L
  x_6   2   P   2   L
  x_7   2   S   2   H
  x_8   2   S   2   H

Table 19.1. Decision System S
In order to induce rules in which the THEN part consists of the decision attribute d and the IF part consists of attributes belonging to A_St ∪ A_Fl, sub-tables (U, B ∪ {d}) of S, where B is a d-reduct (see [4]) in S, should be used for rule extraction. By L(r) we mean all attributes listed in the IF part of a rule r. For example, if r = [(a,2)*(b,S) → (d,H)] is a rule, then L(r) = {a, b}. By d(r) we denote the decision value of a rule. In our example d(r) = H. If r_1, r_2 are rules and B ⊆ A_St ∪ A_Fl is a set of attributes, then r_1/B = r_2/B means that the conditional parts of the rules r_1, r_2 restricted to the attributes B are the same. For example, if r_1 = [(b,S)*(c,2) → (d,H)], then r_1/{b} = r/{b}. In our example, we get the following optimal rules:

(a,0) → (d,L), (c,0) → (d,L),
(b,R) → (d,L), (c,1) → (d,L),
(b,P) → (d,L), (a,2)*(b,S) → (d,H),
(b,S)*(c,2) → (d,H).

Now, let us assume that (a, v → w) denotes the fact that the value of attribute a has been changed from v to w. Similarly, the term (a, v → w)(x) means that a(x) = v has been changed to a(x) = w. In other words, the property (a, v) of object x has been changed to the property (a, w).
Let S = (U, A_St ∪ A_Fl ∪ {d}) be a decision table and let rules r_1, r_2 be extracted from S. Assume that B_1 is a maximal subset of A_St such that r_1/B_1 = r_2/B_1, d(r_1) = k_1, d(r_2) = k_2, and the user is interested in reclassifying objects from class k_1 to class k_2. Also, assume that (b_1, b_2, ..., b_p) is a list of all attributes in L(r_1) ∩ L(r_2) ∩ A_Fl on which r_1 and r_2 differ, and that r_1(b_1) = v_1, r_1(b_2) = v_2, ..., r_1(b_p) = v_p, r_2(b_1) = w_1, r_2(b_2) = w_2, ..., r_2(b_p) = w_p. By an (r_1, r_2)-action rule on x ∈ U we mean an expression (see [7]):

[(b_1, v_1 → w_1) ∧ (b_2, v_2 → w_2) ∧ ... ∧ (b_p, v_p → w_p)](x) ⇒ [(d, k_1 → k_2)](x).

The rule is valid if its value on x is true in S, i.e., there is an object x_2 ∈ S which does not contradict x on the stable attributes in S and (∀i ≤ p)[b_i(x_2) = w_i] ∧ d(x_2) = k_2. Otherwise it is false.
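The following sketch shows, under a simplified reading of the definition above, how an (r_1, r_2)-action rule can be assembled from two classification rules that agree on the shared stable attributes; the two example rules are taken from the list extracted for Table 19.1, while the function itself is only an illustration.

def action_rule(r1, r2, stable, flexible):
    """r1, r2: (conditions, decision) with conditions given as {attr: value}.
    Returns the suggested changes (attr, v, w) and the decision shift,
    or None when the rules disagree on a shared stable attribute."""
    c1, d1 = r1
    c2, d2 = r2
    shared_stable = set(c1) & set(c2) & set(stable)
    if any(c1[a] != c2[a] for a in shared_stable):
        return None
    changes = [(b, c1[b], c2[b])
               for b in set(c1) & set(c2) & set(flexible)
               if c1[b] != c2[b]]
    return changes, (d1, d2)

# two of the rules listed for Table 19.1
r1 = ({'b': 'P'}, 'L')            # (b,P) -> (d,L)
r2 = ({'a': 2, 'b': 'S'}, 'H')    # (a,2)*(b,S) -> (d,H)
print(action_rule(r1, r2, stable={'a', 'c'}, flexible={'b'}))
# ([('b', 'P', 'S')], ('L', 'H'))  i.e. (b, P -> S) => (d, L -> H)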
19.3 Distributed Information System

By a distributed information system we mean a pair DS = ({S_i}_{i∈I}, L) where:

• I is a set of sites,
• S_i = (X_i, A_i, V_i) is an information system for any i ∈ I,
• L is a symmetric, binary relation on the set I showing which systems can directly communicate with each other.

A distributed information system DS = ({S_i}_{i∈I}, L) is consistent if the following condition holds:

(∀i)(∀j)(∀x ∈ X_i ∩ X_j)(∀a ∈ A_i ∩ A_j) [(a_{[S_i]}(x) ⊆ a_{[S_j]}(x)) or (a_{[S_j]}(x) ⊆ a_{[S_i]}(x))].

Consistency basically means that information about any object x in one system can be either more general or more specific than in the other. In other words, two systems cannot have conflicting information stored about any object x. Another problem which has to be taken into consideration is the semantics of attributes which are common for a client and some of its remote sites. This semantics may easily differ from site to site. Sometimes, such a difference in semantics can be repaired quite easily. For instance, if Temperature in Celsius is used at one site and Temperature in Fahrenheit at the other, a simple mapping will fix the problem. If the information systems are complete and two attributes have the same name and differ only in their granularity level, a new hierarchical attribute can be formed to fix the problem. If the databases are incomplete, the problem is more complex because of the number of options available to interpret incomplete values (including null values). The problem is especially difficult in a distributed framework when chase techniques, based on rules extracted at the client and at remote sites (see [6]), are used by the client to impute current values by values which are less incomplete. In this paper we concentrate on granularity-based semantic inconsistencies. Assume first that S_i = (X_i, A_i, V_i) is an information system for any i ∈ I and that
all S_i's form a Distributed Information System (DIS). Additionally, we assume that if a ∈ A_i ∩ A_j, then only the granularity levels of a in S_i and S_j may differ, but conceptually its meaning, both in S_i and S_j, is the same. Assume now that L(D_i) is the set of action rules extracted from S_i, which means that D = ∪_{i∈I} L(D_i) is the set of action rules which can be used in the process of distributed action rule discovery. Now, let us say that the system S_k, k ∈ I, is queried by a user for an action rule reclassifying objects with respect to the decision attribute d. Any strategy for discovering action rules from S_k based on action rules D' ⊆ D is called sound if the following three conditions are satisfied:

• for any action rule in D', the value of its decision attribute d is of a granularity level either equal to or finer than the granularity level of the attribute d in S_k;
• for any action rule in D', the granularity level of any attribute a used in the classification part of that rule is either equal to or softer than the granularity level of a in S_k;
• the attribute used in the decision part of a rule has to be classified as flexible in S_k.
In the next section, we assume that if any attribute is used at two different sites of DIS, then at both of them its semantics is the same and its attribute values are of the same granularity level.
19.4 Cost and Feasibility of Action Rules

Assume now that DS = ({S_i : i ∈ I}, L) is a distributed information system (DIS), where S_i = (X_i, A_i, V_i), i ∈ I. Let b ∈ A_i be a flexible attribute in S_i and let b_1, b_2 ∈ V_i be two of its values. By ρ_{S_i}(b_1, b_2) we mean a number from (0, +∞] which describes the average cost of changing the attribute value from b_1 to b_2 for any of the qualifying objects in S_i. An object x ∈ X_i qualifies for the change from b_1 to b_2 if b(x) = b_1. If the implementation of the above change is not feasible for one of the qualifying objects in S_i, then we write ρ_{S_i}(b_1, b_2) = +∞. A value of ρ_{S_i}(b_1, b_2) close to zero is interpreted as meaning that the change of values from b_1 to b_2 is quite easy to accomplish for the qualifying objects in S_i, whereas a large value of ρ_{S_i}(b_1, b_2) means that this change of values is practically very difficult to obtain for some of the qualifying objects in S_i. If ρ_{S_i}(b_1, b_2) < ρ_{S_i}(b_3, b_4), then we say that the change of values from b_1 to b_2 is more feasible than the change from b_3 to b_4. We assume here that the values ρ_{S_i}(b_{j1}, b_{j2}) are provided by experts for each of the information systems S_i. They are seen as atomic expressions and will be used to introduce the formal notions of feasibility and cost of action rules in S_i. So, let us assume that

r = [(b_1, v_1 → w_1) ∧ (b_2, v_2 → w_2) ∧ ... ∧ (b_p, v_p → w_p)](x) ⇒ (d, k_1 → k_2)(x)

is an (r_1, r_2)-action rule. By the cost of r, denoted by cost(r), we mean the value Σ{ρ_{S_i}(v_k, w_k) : 1 ≤ k ≤ p}. We say that r is feasible if cost(r) < ρ_{S_i}(k_1, k_2). It means that for any feasible rule r, the cost of the conditional part of r is lower than the cost of its decision part and, clearly, cost(r) < +∞.
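Given expert-provided values ρ_{S_i}(v, w) for single attribute-value changes, the cost and feasibility of an action rule follow directly from the definitions above; in the sketch below the cost dictionary and its numbers are invented for illustration.

import math

# rho[(attribute, v, w)] = average cost of changing v to w in S_i
rho = {('b', 'P', 'S'): 2.0, ('b', 'R', 'S'): math.inf, ('d', 'L', 'H'): 10.0}

def cost(changes):
    """changes: list of (attribute, v, w) terms in the conditional part."""
    return sum(rho.get(ch, math.inf) for ch in changes)

def feasible(changes, decision_change):
    return cost(changes) < rho.get(decision_change, math.inf)

r = [('b', 'P', 'S')]                          # (b, P -> S) => (d, L -> H)
print(cost(r), feasible(r, ('d', 'L', 'H')))   # 2.0 True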
Assume now that d is a decision attribute in S_i, k_1, k_2 ∈ V_i, and the user would like to re-classify some customers in S_i from the group k_1 to the group k_2. To achieve this goal he may look for an appropriate action rule, possibly of the lowest cost value, to get a hint about which attribute values have to be changed. To be more precise, let us assume that R_{S_i}[(d, k_1 → k_2)] denotes the set of action rules extracted from S_i whose decision part is (d, k_1 → k_2). In this set he may identify a rule which has the lowest cost value. But the rule he gets may still have a cost value much too high to be of any help to him. Let us notice that the cost of the action rule

r = [(b_1, v_1 → w_1) ∧ (b_2, v_2 → w_2) ∧ ... ∧ (b_p, v_p → w_p)](x) ⇒ (d, k_1 → k_2)(x)

might be high only because of the high cost value of one of the sub-terms in the conditional part of the rule. Let us assume that (b_j, v_j → w_j) is that term. In such a case, we may look for an action rule in R_{S_i}[(b_j, v_j → w_j)] which has the smallest cost value. Assume that

r_1 = [(b_{j1}, v_{j1} → w_{j1}) ∧ (b_{j2}, v_{j2} → w_{j2}) ∧ ... ∧ (b_{jq}, v_{jq} → w_{jq})](y) ⇒ (b_j, v_j → w_j)(y)

is such a rule which is also feasible in S_i. Since x, y ∈ X_i, we can compose r with r_1, getting a new feasible rule which is given below:

[(b_1, v_1 → w_1) ∧ ... ∧ [(b_{j1}, v_{j1} → w_{j1}) ∧ (b_{j2}, v_{j2} → w_{j2}) ∧ ... ∧ (b_{jq}, v_{jq} → w_{jq})] ∧ ... ∧ (b_p, v_p → w_p)](x) ⇒ (d, k_1 → k_2)(x).

Clearly, the cost of this new rule is lower than the cost of r. However, if its support in S_i gets too low, then such a rule has no value to the user. Otherwise, we may recursively follow this strategy, trying to lower the cost of re-classifying objects from the group k_1 into the group k_2. Each successful step will produce a new action rule whose cost is lower than the cost of the current rule. This heuristic strategy always ends because there is a finite number of action rules and any action rule can be applied only once on each path of this recursive strategy. One can argue that if the set R_{S_i}[(d, k_1 → k_2)] contains all action rules reclassifying objects from the group k_1 into the group k_2, then any new action rule obtained as the result of the above recursive strategy should already be in that set. We do not agree with this statement, since in practice R_{S_i}[(d, k_1 → k_2)] is only a subset of all action rules. Firstly, it takes too much time (the complexity is exponential) to generate all possible rules from an information system, and secondly, even if we extract such rules, it still takes too much time to generate all possible action rules from them. So the applicability of the proposed recursive strategy to search for rules of lowest cost is highly justified. Again, let us assume that the user would like to reclassify some objects in S_i from the class k_1 to the class k_2 and that ρ_{S_i}(k_1, k_2) is the current cost to do that. Each action rule in R_{S_i}[(d, k_1 → k_2)] gives us an alternative way to achieve the same result but under a different cost. If we limit ourselves to the system S_i, then clearly we cannot go beyond the set R_{S_i}[(d, k_1 → k_2)]. But, if we allow action rules to be extracted at other information systems and use them jointly with the local action rules, then
the number of attributes which can be involved in reclassifying objects in S_i will increase, and thereby we may further lower the cost of the desired reclassification. So, let us assume the following scenario. The action rule

r = [(b_1, v_1 → w_1) ∧ (b_2, v_2 → w_2) ∧ ... ∧ (b_p, v_p → w_p)](x) ⇒ (d, k_1 → k_2)(x),

extracted from the information system S_i, is not feasible because at least one of its terms, let us say (b_j, v_j → w_j) where 1 ≤ j ≤ p, has too high a cost ρ_{S_i}(v_j, w_j) assigned to it. In this case we look for a new feasible action rule

r_1 = [(b_{j1}, v_{j1} → w_{j1}) ∧ (b_{j2}, v_{j2} → w_{j2}) ∧ ... ∧ (b_{jq}, v_{jq} → w_{jq})](y) ⇒ (b_j, v_j → w_j)(y)

which, concatenated with r, will decrease the cost value of the desired reclassification. So, the current setting looks the same as the one we already had, except that this time we additionally assume that r_1 is extracted from another information system in DS. For simplicity, we also assume that the semantics and the granularity levels of all attributes listed in both information systems are the same. By the concatenation of action rule r_1 with action rule r we mean a new feasible action rule r_1 ∘ r of the form:

[(b_1, v_1 → w_1) ∧ ... ∧ [(b_{j1}, v_{j1} → w_{j1}) ∧ (b_{j2}, v_{j2} → w_{j2}) ∧ ... ∧ (b_{jq}, v_{jq} → w_{jq})] ∧ ... ∧ (b_p, v_p → w_p)](x) ⇒ (d, k_1 → k_2)(x),

where x is an object in S_i = (X_i, A_i, V_i). Some of the attributes in {b_{j1}, b_{j2}, ..., b_{jq}} may not belong to A_i. Also, the support of r_1 is calculated in the information system from which r_1 was extracted. Let us denote that system by S_m = (X_m, A_m, V_m) and the set of objects in X_m supporting r_1 by Sup_{S_m}(r_1). Assume that Sup_{S_i}(r) is the set of objects in S_i supporting the rule r. The domain of r_1 ∘ r is the same as the domain of r, which is equal to Sup_{S_i}(r). Before we define the notion of similarity between two objects belonging to two different information systems, we assume that A_i = {b_1, b_2, b_3, b_4}, A_m = {b_1, b_2, b_3, b_5, b_6}, and the objects x ∈ X_i, y ∈ X_m are defined by the table below:
X Vi y vi
62
63
^4
^5
V2 V3 V4 W2 W3
ws
We
The similarity p(x, y) between x and y is defined as: [1 -f 0 -f- 0 + 1/2 + 1/2 + 1/2] = [2 -h 1/2]/6 = 5/12. To give more formal definition of similarity, we assume that: p{x, y) = [S{p{bi{x), bi{y)) : bi 6 {Ai U Am)}]/card{Ai U Am), where: • • •
p{bi{x),bi{y)) = 0, if bi{x) ^ bi{y), p{bi{x),bi{y)) = 1, if bi{x) = bi{y), p{bi{x)^ bi{y)) = 1/2, if either bi{x) or bi{y) is undefined.
268
Zbigniew W. Ras and Angelina A. Tzacheva
Let us assume that p(a:,5''ixp5^(ri)) = max{p{x,y) : y e Sups^{ri)}, for each x G SupSi{r). By the confidence of ri o r we mean Conf{ri o r) = lUipi^^S'^PSmin)) ' X € Sups,{r)}/card{SupSi{r))] • Conf{ri) • Conf{r), where Conf{r) is the confidence of the rule r in Si and Conf(ri) is the confidence of the rule ri in Sfn. If we allow to concatenate action rules extracted from 5^ with action rules extracted at other sites of DIS, we are increasing the total number of generated action rules and the same our chance to lower the cost of reclassifying objects in Si is also increasing but possibly at a price of their decreased confidence.
19.5 Heuristic Strategy for the Lowest Cost Reclassification of Objects Let us assume that we wish to reclassify as many objects as possible in the system Si, which is a part of DIS, from the class described by value ki of the attribute d to the class k2. The reclassification ki -^ k2 jointly with its cost psi {ki,k2) is seen as the information stored in the initial node no of the search graph built from nodes generated recursively by feasible action rules taken initially from i?^. [(d, ki -> ^2)]. For instance, the rule r = [{bi,vi -> wi) A (62,^2 -^ ^2) A ... A {bp.Vp -^ Wp)]{x) =^ {d,ki -^k2){x) applied to the node UQ = {[ki -^ k2^ pSi (^15 ^2)]} generates the node ni = {[vi -^wi,ps,{vi,wi)],[v2 -^W2,pSiiv2,W2)],..., [Vp -^Wp,ps,{Vp,Wp)]}, and from rii we can generate the node ^2 = {[Vl -^Wi,pSi{vi,Wi)],[v2 -^W2,pSi{v2,yJ2)],'", [Vji -^ Wji,ps,(Vjl,Wji)],[Vj2 -^Wj2,pSi{Vj2,Wj2)l..., [Vjq -> VJjq^ps.iVjq.Wjq)], ..., [Vp -> Wp, ps,iVp,Wp)]} assuming that the action rule n = [{bjl^Vjl -> Wji) A {bj2,Vj2 -^ Wj2) A ... A {bjq.Vjq -^ Wjq)]{y) => {bj.Vj -^Wj){y) from Rs^ [{bjiVj -^ '^j)] is applied to ni. /see Section 4/ This information can be written equivalently as: r(no) = n i , ri(ni) = n2, [ri o r](no) = n2. Also, we should notice here that ri is extracted from S^ and Supsm (^1) ^ ^rn whereas r is extracted from 5^ and Sups^ (r) C Xi. By Sup Si (r) we mean the domain of action rule r (set of objects in 5^ supporting r). The search graph can be seen as a directed graph G which is dynamically built by applying action rules to its nodes. The initial node no of the graph G contains information coming from the user, associated with the system Si, about what objects in Xi he would like to reclassify and how and what is his current cost of this reclassification. Any other node n in G shows an alternative way to achieve the same reclassification with a cost that is lower than the cost assigned to all nodes which are preceding n in G. Clearly, the confidence of action rules labelling the path from the
19 In Search for Action Rules of the Lowest Cost
269
initial node to the node n is as much important as the information about reclassification and its cost stored in node n. Information from what sites in DIS these action rules have been extracted and how similar the objects at these sites are to the objects in Si is important as well. Information stored at the node {[^;i -^ wi.ps,(^^1,^i)], [v2 -^ ^2,pSi{v2,W2)\,..., [vp -^ Wp,ps,{vp,Wp)]} says that by reclassifying any object x supported by rule r from the class vi to the class Wi, for any i < p, we also reclassify that object from the class ki to k2. The confidence in the reclassification of x supported by node {[vi -^ 'Wi,pSi{vi,wi)],[v2 -^ W2,pSi{y2i'i^^2)],-'A'^P ^ Wp,ps,{vp,Wp)]} IS tht Same as the confidence of the rule r. Before we give a heuristic strategy for identifying a node in G, built for a desired reclassification of objects in Si, with a cost possibly the lowest among all the nodes reachable from the node no, we have to introduce additional notations. So, assume that N is the set of nodes in our dynamically built directed graph G and no is its initial node. For any node n e N,by f{n) = {Yn,{[vn,j -^ Wnj,pSi{yn,j^'^n,j)]}jein) ^^ mcau its domain, the reclassification steps related to objects in Xi, and their cost all assigned by reclassification function f to the node n, where Yn C Xi /Graph G is built for the client site Si/. Let us assume that/(n) = (Yndb^n.k -^ ifn,fc,p5i(^n,fc,^t^n,fc)]}fc€/.)-We say that action rule r, extracted from Si, is applicable to the node n if: • •
• Y_n ∩ Sup_Si(r) ≠ ∅,
• (∃k ∈ I_n)[r ∈ R_Si[v_{n,k} → w_{n,k}]] /see Section 4 for the definition of R_Si[...]/.
Similarly, we say that action rule r, extracted from S_m, is applicable to the node n if:
• (∃x ∈ Y_n)(∃y ∈ Sup_Sm(r))[ρ(x, y) < λ], where ρ(x, y) is the similarity relation between x and y (see Section 4 for its definition) and λ is a given similarity threshold,
• (∃k ∈ I_n)[r ∈ R_Sm[v_{n,k} → w_{n,k}]] /see Section 4 for the definition of R_Sm[...]/.
It has to be noticed that the reclassification of objects assigned to a node of G may refer to attributes which are not necessarily attributes listed in S_i. In this case, the user associated with S_i has to decide what is the cost of such a reclassification at his site, since such a cost may differ from site to site. Now, let RA(n) be the set of all action rules applicable to the node n. We say that the node n is completely covered by action rules from RA(n) if Y_n = ∪{Sup_Si(r) : r ∈ RA(n)}. Otherwise, we say that n is partially covered by action rules.

What about calculating the domain Y_n of a node n in the graph G constructed for the system S_i? The reclassification (d, k1 → k2) jointly with its cost ρ_Si(k1, k2) is stored in the initial node n0 of the search graph G. Its domain Y_0 is defined as the set-theoretical union of the domains of feasible action rules in R_Si[(d, k1 → k2)] applied to X_i. This domain can still be extended by any object x ∈ X_i if the following condition holds: (∃m)(∃r ∈ R_Sm[k1 → k2])(∃y ∈ Sup_Sm(r))[ρ(x, y) < λ].
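To make the two applicability tests above concrete, the following small sketch checks them for a candidate rule and a node. Nodes and rules are represented here as plain Python dictionaries; the key names ("domain", "terms", "support", "head") and the similarity helper are illustrative assumptions, not the chapter's formal notation.

```python
def applicable_local(rule, node):
    # Rule extracted from S_i: its support overlaps Y_n and its head matches
    # one of the node's reclassification steps (v_{n,k} -> w_{n,k}).
    return bool(node["domain"] & rule["support"]) and rule["head"] in node["terms"]

def applicable_remote(rule, node, similarity, lam):
    # Rule extracted from a remote site S_m: some object of Y_n is similar
    # enough (within threshold lam) to some object supporting the rule,
    # and the rule's head matches one of the node's steps.
    close = any(similarity(x, y) < lam
                for x in node["domain"] for y in rule["support"])
    return close and rule["head"] in node["terms"]
```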
Each rule applied to the node n0 generates a new node in G whose domain is calculated in a similar way as for n0. To be more precise, assume that n is such a node and f(n) = (Y_n, {[v_{n,k} → w_{n,k}, ρ_Si(v_{n,k}, w_{n,k})]}_{k ∈ I_n}). Its domain Y_n is defined as the set-theoretical union of the domains of feasible action rules in ∪{R_Si[v_{n,k} → w_{n,k}] : k ∈ I_n} applied to X_i. Similarly to n0, this domain can still be extended by any object x ∈ X_i if the following condition holds: (∃m)(∃k ∈ I_n)(∃r ∈ R_Sm[v_{n,k} → w_{n,k}])(∃y ∈ Sup_Sm(r))[ρ(x, y) < λ]. Clearly, for all other nodes dynamically generated in G, the definition of their domains is the same as the one above.

Property 1. An object x can be reclassified according to the data stored in node n only if x belongs to the domain of each node along the path from the node n0 to n.

Property 2. Assume that x can be reclassified according to the data stored in node n and f(n) = (Y_n, {[v_{n,k} → w_{n,k}, ρ_Si(v_{n,k}, w_{n,k})]}_{k ∈ I_n}). The cost Cost_{k1→k2}(n, x) assigned to the node n in reclassifying x from k1 to k2 is equal to Σ{ρ_Si(v_{n,k}, w_{n,k}) : k ∈ I_n}.

Property 3. Assume that x can be reclassified according to the data stored in node n and the action rules r, r1, r2, ..., rj are labelling the edges along the path from the node n0 to n. The confidence Conf_{k1→k2}(n, x) assigned to the node n in reclassifying x from k1 to k2 is equal to Conf[rj ∘ ... ∘ r2 ∘ r1 ∘ r] /see Section 4/.

Property 4. If a node n_j2 is a successor of the node n_j1, then Conf_{k1→k2}(n_j2, x) ≤ Conf_{k1→k2}(n_j1, x).

Property 5. If a node n_j2 is a successor of the node n_j1, then Cost_{k1→k2}(n_j2, x) < Cost_{k1→k2}(n_j1, x).

Let us assume that we wish to reclassify as many objects as possible in the system S_i, which is a part of DIS, from the class described by value k1 of the attribute d to the class k2. We also assume that R is the set of all action rules extracted either from the system S_i or from any of its remote sites in DIS. The reclassification (d, k1 → k2) jointly with its cost ρ_Si(k1, k2) represents the information stored in the initial node n0 of the search graph G. By λ_Conf we mean the minimal confidence in the reclassification acceptable by the user, and by λ_Cost the maximal cost the user is willing to pay for the reclassification. The algorithm Build-and-Search generates, for each object x in S_i, the reclassification rules satisfying the thresholds for minimal confidence and maximal cost.

Algorithm Build-and-Search(R, x, λ_Conf, λ_Cost, n, m);
Input: Set of action rules R,
  Object x which the user would like to reclassify,
  Threshold value λ_Conf for minimal confidence,
  Threshold value λ_Cost for maximal cost,
  Node n of a graph G.
Output: Node m representing an acceptable reclassification of objects from S_i.
begin
  if Cost_{k1→k2}(n, x) > λ_Cost then
    generate all successors of n using rules from R;
    while ni is a successor of n do
      if Conf_{k1→k2}(ni, x) < λ_Conf then stop
      else if Cost_{k1→k2}(ni, x) ≤ λ_Cost then Output[ni]
      else Build-and-Search(R, x, λ_Conf, λ_Cost, ni, m)
end

Now, calling the procedure Build-and-Search(R, x, λ_Conf, λ_Cost, n0, m), we get the reclassification rules for x satisfying the thresholds for minimal confidence and maximal cost. The procedure stops on the first node n which satisfies both thresholds: λ_Conf for minimal confidence and λ_Cost for maximal cost. Clearly, this strategy can be enhanced by allowing recursive calls on any node n when both thresholds are satisfied by n, and forcing recursive calls to stop on the first node ni succeeding n for which Cost_{k1→k2}(ni, x) < λ_Cost and Conf_{k1→k2}(ni, x) < λ_Conf. Then, the recursive procedure should terminate not on ni but on the node which is its direct predecessor.
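The following is a minimal, runnable sketch of the Build-and-Search idea. A node is modelled here as a pair (terms, confidence), where terms is a frozenset of reclassification steps (attribute, v, w) and confidence accumulates the rule confidences along the path from n0 (Property 3); costs come from a user-supplied price table, and the applicability and domain conditions discussed above are deliberately left out. This representation and the helper names are assumptions made only for illustration.

```python
def node_cost(terms, price):
    # Property 2: the node's cost is the sum of the costs of its steps.
    return sum(price[t] for t in terms)

def successors(node, rules):
    # Applying an action rule replaces the matching step by the rule's premise
    # (cf. the construction of n1 from n0 and of n2 from n1 above).
    terms, conf = node
    for premise, head, rconf in rules:
        if head in terms:
            yield (frozenset(terms - {head}) | frozenset(premise), conf * rconf)

def build_and_search(rules, price, node, l_conf, l_cost, found):
    terms, conf = node
    if node_cost(terms, price) > l_cost:
        for ni in successors(node, rules):
            ni_terms, ni_conf = ni
            if ni_conf < l_conf:
                continue                                # branch abandoned ("stop")
            if node_cost(ni_terms, price) <= l_cost:
                found.append(ni)                        # acceptable reclassification
            else:
                build_and_search(rules, price, ni, l_conf, l_cost, found)
    return found

# Example call, starting from n0 = {[k1 -> k2, price[(d, k1, k2)]]}:
# found = build_and_search(rules, price, (frozenset({("d", "k1", "k2")}), 1.0),
#                          l_conf=0.8, l_cost=100.0, found=[])
```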
19.6 Conclusion

The root of the directed search graph G is used to store information about objects assigned to a certain class jointly with the cost of reclassifying them to a new, desired class. Each node in the graph G shows an alternative way to achieve the same goal. The reclassification strategy assigned to a node n has a cost lower than the cost of the reclassification strategy assigned to its parent. Any node n in G can be reached from the root by following one or more paths. It means that the confidence of the reclassification strategy assigned to n should be calculated as the maximum confidence among the confidences assigned to all paths from the root of G to n.

The search strategy based on the dynamic construction of the graph G (described in the previous section) is exponential from the point of view of the number of active dimensions in all information systems involved in the search for possibly the cheapest reclassification strategy. This strategy is also exponential from the point of view of the number of values of flexible attributes in all information systems involved in that search. We believe that the most promising strategy should be based on a global ontology [14] showing the semantical relationships between concepts (attributes and their values) used to define objects in DAIS. These relationships can be used by a search algorithm to decide which path in the search graph G should be explored first. If sufficient information from the global ontology is not available, probabilistic strategies (Monte Carlo methods) can be used to decide which path in G to follow.
References

1. Adomavicius, G., Tuzhilin, A., (1997), Discovery of actionable patterns in databases: the action hierarchy approach, in Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD97), Newport Beach, CA, AAAI Press, 1997
2. Liu, B., Hsu, W., Chen, S., (1997), Using general impressions to analyze discovered classification rules, in Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD97), Newport Beach, CA, AAAI Press, 1997
3. Liu, B., Hsu, W., Mun, L.-F., (1996), Finding interesting patterns using user expectations, DISCS Technical Report No. 7, 1996
4. Pawlak, Z., (1985), Rough sets and decision tables, in Lecture Notes in Computer Science 208, Springer-Verlag, 1985, 186-196
5. Pawlak, Z., (1991), Rough Sets: Theoretical aspects of reasoning about data, Kluwer Academic Publishers, 1991
6. Ras, Z., Dardzinska, A., Handling semantic inconsistencies in query answering based on distributed knowledge mining, in Foundations of Intelligent Systems, Proceedings of ISMIS'02 Symposium, LNCS/LNAI, No. 2366, Springer-Verlag, 2002, 66-74
7. Ras, Z., Wieczorkowska, A., (2000), Action Rules: how to increase profit of a company, in Principles of Data Mining and Knowledge Discovery (Eds. D.A. Zighed, J. Komorowski, J. Zytkow), Proceedings of PKDD'00, Lyon, France, LNCS/LNAI, No. 1910, Springer-Verlag, 2000, 587-592
8. Ras, Z.W., Tsay, L.-S., (2003), Discovering Extended Action-Rules (System DEAR), in Intelligent Information Systems 2003, Proceedings of the IIS'2003 Symposium, Zakopane, Poland, Advances in Soft Computing, Springer-Verlag, 2003, 293-300
9. Ras, Z.W., Tzacheva, A., (2003), Discovering semantic inconsistencies to improve action rules mining, in Intelligent Information Systems 2003, Advances in Soft Computing, Proceedings of the IIS'2003 Symposium, Zakopane, Poland, Springer-Verlag, 2003, 301-310
10. Ras, Z., Gupta, S., (2002), Global action rules in distributed knowledge systems, in Fundamenta Informaticae Journal, IOS Press, Vol. 51, No. 1-2, 2002, 175-184
11. Silberschatz, A., Tuzhilin, A., (1995), On subjective measures of interestingness in knowledge discovery, in Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD95), AAAI Press, 1995
12. Silberschatz, A., Tuzhilin, A., (1996), What makes patterns interesting in knowledge discovery systems, in IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, 1996
13. Skowron, A., Grzymala-Busse, J., (1991), From the Rough Set Theory to the Evidence Theory, in ICS Research Reports, 8/91, Warsaw University of Technology, October, 1991
14. Sowa, J.F., (2000), Ontology, Metadata and Semiotics, in Conceptual Structures: Logical, Linguistic, and Computational Issues, B. Ganter, G.W. Mineau (Eds), LNAI 1867, Springer-Verlag, 2000, 55-81
15. Suzuki, E., Kodratoff, Y., (1998), Discovery of surprising exception rules based on intensity of implication, in Proc. of the Second Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 1998
20
Circularity in Rule Knowledge Bases Detection using Decision Unit Approach

Roman Siminski and Alicja Wakulicz-Deja

University of Silesia, Institute of Computer Science, Będzińska 39, 41-200 Sosnowiec, Poland
20.1 Introduction

Expert systems are programs that have extended the range of application of software systems to non-structural, ill-defined problems. Perhaps the crucial characteristic of expert systems that distinguishes them from classical software systems is the impossibility of obtaining a correct and complete formal specification. This comes from the nature of the knowledge engineering process, which is essentially a modeling discipline, where the result of the modeling activity is the modeling process itself. Expert systems are programs that solve problems using knowledge acquired, usually, from human experts in the problem domain, as opposed to conventional software that solves problems by following algorithms. But expert systems are programs, and programs must be validated. Regarding the classical definition of validation, as stated in [1]: Determination of the correctness of a program with respect to the user needs and requirements, we claim that it is adequate for KBS. But we encounter some differences if we try to use classical verification methods from software engineering. The tasks performed by knowledge-based systems usually cannot be correctly and completely specified; these tasks are usually ill-structured and no efficient algorithmic approach is known for them. KBS are constructed using declarative languages (usually rule-based) that are interpreted by inference engines. This kind of programming is concerned with truth values, rule dependencies and heuristic associations, in contrast to conventional programming, which deals with variables, conditionals, loops and procedures. The knowledge base of an expert system contains a program, usually constructed using rule-based languages; the knowledge engineer uses declarative languages or specialized expert system shells. In this work we concentrate our attention on the verification of rule knowledge bases of expert systems. We assume that the inference engine and other parts of the expert system do not need any verification, for example because they derive their properties from a commercial expert system shell. Although the basic validation concepts are common for knowledge and software engineering, we encounter difficulties if we try to apply classical definitions of
verification and validation (from software engineering) to knowledge engineering. Verification methods for conventional software are not directly applicable to expert systems, and new, specific methods of verification are required. In our previous works [2, 3, 4, 5] we presented some theoretical and practical information about verification and validation of knowledge bases, as well as some of the best known methods and tools described in the references. Perhaps the best reference materials can be found on Alun Preece's home page: http://www.csd.abdn.ac.uk/~apreece, especially in [8, 9, 10, 11, 12]. We can identify several kinds of anomalies in rule knowledge bases. A. Preece divides them into four groups: redundancy, ambivalence, circularity and deficiency. In the present work we will discuss only one kind of anomaly - circularity. Circular rule sequences are undesirable because they may cause endless loops, as long as the inference system does not recognize them at execution time. We present a circularity detection algorithm based on the decision units conception described in detail in [6, 7].
20.2 Circularity - the problem in backward chaining systems

Circularity presents an urgent problem in backward chaining systems. A knowledge base has circularity iff it contains some set of rules such that a loop could occur when the rules are fired. In other words, a knowledge base is circular if it contains a circular sequence of rules, that is, a sequence of rules such that the right-hand side of each rule but the last is contained in the left-hand side of the next rule in the sequence, and the right-hand side of the last rule is contained in the left-hand side of the first rule of the sequence. More formally [10], a knowledge base R contains circular dependencies if there is a hypothesis H that unifies with the consequent of a rule R in the rule base R, where R is firable only when H is supplied as an input to R (see eq. 20.1):

(∃R ∈ R)(∃E ∈ E)(∃H ∈ H) [ H = conseq(R) ∧ ¬firable(R, R, E) ∧ firable(R, R, E ∪ {H}) ]    (20.1)

where the function conseq(R) supplies the literal from the consequent of rule R: for R = L1 ∧ L2 ∧ ... ∧ Lm → M, conseq(R) = M. E, called an environment, is a subset of legal input literals (that does not imply a semantic constraint). H, the set of inferable hypotheses, is defined to be the set of literals in the consequents and their instances: H ∈ H if (∃R ∈ R)(conseq(R) = H). The predicate firable states that a rule R ∈ R is firable if there is some environment E such that the antecedent of R is a logical consequence of supplying E as input to R: firable(R, R, E) iff (∃σ)(R ∪ E ⊢ σ(antec(R))). We can distinguish between a direct cycle, where a rule calls itself:

P(x) ∧ R(x) → R(x)    (20.2)

and an indirect cycle, where a sequence of rules forms a loop:

R1: P(x) ∧ Q(x) → R(x)
R2: R(x) ∧ S(x) → P(x)    (20.3)
Fig. 20.1. An example circular rule sequence
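The cycle conditions (20.2)-(20.3) can be checked on the rule dependency graph: there is an arc from a literal to the consequent of every rule whose antecedent contains it, and a circular rule sequence corresponds to a cycle in this graph. The sketch below uses this simplified, literal-level view rather than the full firability-based definition (20.1); the rule encoding and function name are illustrative assumptions.

```python
def has_circularity(rules):
    """True if some sequence of rules forms a loop (direct or indirect cycle)."""
    # Edge p -> q whenever p occurs in the antecedent of a rule concluding q.
    graph = {}
    for antecedent, consequent in rules:
        for literal in antecedent:
            graph.setdefault(literal, set()).add(consequent)

    visited, on_stack = set(), set()

    def dfs(v):
        visited.add(v)
        on_stack.add(v)
        for u in graph.get(v, ()):
            if u in on_stack or (u not in visited and dfs(u)):
                return True
        on_stack.discard(v)
        return False

    return any(v not in visited and dfs(v) for v in list(graph))

# Direct cycle (20.2):   P(x) & R(x) -> R(x)
print(has_circularity([({"P", "R"}, "R")]))                      # True
# Indirect cycle (20.3): R1: P & Q -> R,  R2: R & S -> P
print(has_circularity([({"P", "Q"}, "R"), ({"R", "S"}, "P")]))   # True
```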
20.3 Decision units

In real-world rule knowledge bases literals are often coded using attribute-value pairs. In this chapter we shall briefly introduce the conception of decision units determined on a rule base containing Horn clause rules, where literals are coded using attribute-value pairs. We assume backward inference. A decision unit U is defined as a triple U = (I, O, R), where I denotes a set of input entries, O denotes a set of output entries and R denotes a set of rules fulfilling a given grouping criterion. These sets are defined as follows:

I = {(attr_i, val_ij) : ∃ r ∈ R, (attr_i, val_ij) ∈ antec(r)}
O = {(attr_i, val_ij) : ∀ r ∈ R, attr_i = conclAttr(r)}
R = {r : ∀ i ≠ j, r_i, r_j ∈ R, conclAttr(r_i) = conclAttr(r_j)}    (20.4)
Two functions are defined on a rule r: conclAttr(r) returns the attribute from the conclusion of rule r, and antec(r) is the set of conditions of rule r. As can be seen, a decision unit U contains the set of rules R such that each rule r ∈ R contains the same attribute in the literal appearing in the conclusion part. All rules grouped within a decision unit take part in an inference process confirming the aim described by the attribute which appears in the conclusion part of each rule. The process given above is often considered to be a part of a decision system, thus it is called a decision unit. All pairs (attribute, value) appearing in the conditional part of each rule are called decision unit input entries, whilst all pairs (attribute, value) appearing in the conclusion part of each rule of the set R are called decision unit output entries. Summarising, the idea of decision units allows arranging rule-base knowledge according to a clear and simple criterion. Rules within a given unit work out or confirm the aim determined by a single attribute. When there is a closure of the rules within a given unit and a set of input and output entries is introduced, it is possible to review a base on a higher abstraction level. This simultaneously reveals the global connections, which are difficult to detect immediately on the basis of a rule list inspection. The decision unit idea can be well used in the knowledge base verification and validation process and in the pragmatic issue of modelling, which is the subject to be presented later in this paper.
Fig. 20.2. The structure of the decision unit U
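A possible encoding of the grouping criterion (20.4) is sketched below: rules sharing the conclusion attribute are collected into one unit, and the unit's input and output entries are gathered from their conditional and conclusion parts. The rule representation is an assumption made for illustration.

```python
from collections import defaultdict

def decision_units(rules):
    """Return {conclusion attribute: (input entries I, output entries O, rules R)}.

    A rule is a pair (antecedent, conclusion), where the antecedent is a set of
    (attribute, value) pairs and the conclusion is a single (attribute, value) pair.
    """
    grouped = defaultdict(list)
    for antecedent, conclusion in rules:
        grouped[conclusion[0]].append((antecedent, conclusion))   # same conclAttr(r)

    units = {}
    for attr, unit_rules in grouped.items():
        inputs = {pair for antecedent, _ in unit_rules for pair in antecedent}
        outputs = {conclusion for _, conclusion in unit_rules}
        units[attr] = (inputs, outputs, unit_rules)
    return units
```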
20.4 Decision units in knowledge base verification

The introduction of decision units allows implementing a division of anomalies into local and global:

• local anomalies appear within a decision unit considered individually, and their detection is local;
• global anomalies disclose themselves at the decision unit net level; their detection is based on the analysis of connections between units and is global.

A single decision unit can be considered as a model of an elementary, partial decision that has been worked out by the system. The reason for this is that all rules constituting a decision unit have the same conclusion attribute. All conclusions create a set of unit output entries specifying the inference aims that are possible to confirm. The decision unit net allows us to formulate a global verification method. On the strength of the analysis of connections between decision units it is possible to detect local anomalies in rules, such as deficiency, redundancy, incoherence or circularity, creating chains during an inference process. We can apply considerations at the unit level using black box and glass box techniques. Abstracting from the internal structure of the units which create the net allows us to detect characteristic symptoms of global anomalies. This can prompt a detailed analysis which takes into account the internal structure of each unit. This analysis is nevertheless limited to a given fragment of the net, pointed out previously by the black box verification method.
20.5 Circular relationship detection technique using decision units

There is one particular case of circularity - circularity inside a decision unit. This is an example of a local anomaly. We can detect this kind of circularity on the local level by building a local causal graph - this case is presented in Fig. 20.3. The global circular rule relationship detection technique shall be presented on an example. Figure 20.4a presents such an example. A net can be described as a directed graph. After dropping the discrimination between input and output entries, and after rejecting the vertexes which stand for disjointed input and output entries, the graph assumes a shape like the one presented in Figure 20.4b. Such a graph shall be called a global relationship decision unit graph. As can be seen, there are two cycles: 1-2-3-1 and 1-3-1.
Fig. 20.3. An example of circularity in a decision unit - local causal graph
The presence of cycles can indicate the appearance of a cyclical relationship in the considered rule base. Figure 20.4c presents an example where there is no cyclical relationship - the arcs define proper rules. To make the graph clearer, the text description has been omitted. On the contrary, Figure 20.4d presents a case where both cycles previously present in Figure 20.4b now stand for real cycles in the rule base. Thus, the presence of a cyclical relationship in the decision unit relationship graph is an indication to carry out an inspection for the presence of cyclical relationships on the global level. This can be achieved by creating a suitable reason-result (causal) graph, representing the relations between the input and output entries of the units causing the cyclical relations described by the decision unit relationship diagram. The scanned graph shall consist of only the nodes and arcs necessary to determine a circularity, which limits the scanned area.
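As one possible realisation of this global check, the sketch below builds the decision unit relationship graph (an arc from unit U to unit V whenever the conclusion attribute of U occurs among the condition attributes of V) and reports the units lying on a cycle; such units indicate where the detailed causal-graph inspection should be carried out. The unit encoding follows the decision_units() sketch above and is an assumption.

```python
def unit_graph(units):
    """Directed graph over conclusion attributes of the decision units."""
    graph = {a: set() for a in units}
    for a in units:
        for b, (inputs_b, _, _) in units.items():
            if any(attr == a for (attr, _) in inputs_b):
                graph[a].add(b)
    return graph

def units_on_cycles(graph):
    """Attributes of units that can reach themselves -- candidates for circularity."""
    def reachable(start):
        seen, stack = set(), [start]
        while stack:
            v = stack.pop()
            for u in graph.get(v, ()):
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        return seen
    return {a for a in graph if a in reachable(a)}
```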
20.6 Summary

This paper presents the usage of decision units in the circularity detection task. Decision units allow a modular organisation of the rule knowledge base, which facilitates programming and base verification, simultaneously increasing the clarity of the achieved results and the ergonomics of the knowledge engineer's labour.
Fig. 20.4. Circularity in the decision units net

The decision units are simple and intuitive models describing relations in a rule knowledge base, being a direct superstructure of the rule-base model. Decision units allow us to reduce the search space in the circularity detection task. Thus, decision units can be considered as a source of simple heuristics which make verification simpler and more efficient. In the same way we can reduce search spaces in other verification tasks, e.g. in redundancy detection in rule chains: we can take into account only selected paths in rule chains and perform verification more quickly.
The knowledge base shown as a decision unit net allows a convenient presentation in graphic form - both on a computer screen and in print. Thus, we can present the verification results in a clear and user-friendly form. The difficulties with rule base verification have been described in a natural and intuitive way. Detected anomalies and suggested methods of their elimination can be presented to the knowledge engineer in a suggestive way, without the necessity of referring to complex conceptual issues.
References

1. Adrion W.R., Branstad M.A., Cherniavsky J.C. (1982), Validation, verification and testing of computer software, ACM Computing Surveys, June, 14(2), pp. 159-192.
2. Siminski R., Wakulicz-Deja A. (1998), Principles and Practice in Knowledge Bases Verification, Proceedings of the IIS VII, Intelligent Information Systems, Poland, Malbork, 15-19.06.1998, pp. 203-211.
3. Siminski R. (1998), Methods and Tools for Knowledge Bases Verification and Validation, Proceedings of CAI'98 - Colloquia in Artificial Intelligence, 28-30.9.1998, Lodz, Poland.
4. Siminski R., Wakulicz-Deja A. (1998), Principles and Practice in Knowledge Bases Verification, Proceedings of IIS'98 - Intelligent Information Systems VII, 15-19.6.1998, Malbork, Poland.
5. Siminski R., Wakulicz-Deja A. (1999), Dynamic Verification of Knowledge Bases, Proceedings of IIS'99, Intelligent Information Systems VIII, 14-18.6.1999, Ustron, Poland.
6. Siminski R., Wakulicz-Deja A. (2000), Verification of Rule Knowledge Bases Using Decision Units, Advances in Soft Computing, Intelligent Information Systems, Physica-Verlag, Springer Verlag Company, 2000.
7. Siminski R., Wakulicz-Deja A. (2003), Decision units as a tool for rule base modeling and verification, Proceedings of Intelligent Information Systems: Intelligent Information Processing and Web Mining, 2-5.6.2003, Zakopane, Poland, Advances in Soft Computing, Physica-Verlag, Springer Verlag Company, 2003, pp. 553-556.
8. Preece A.D. (1991), Methods for Verifying Expert System Knowledge Base.
9. Preece A.D. (1991a), Verifying expert system knowledge bases: An example.
10. Preece A.D. (1994), Foundation and Application of Knowledge Base Verification, International Journal of Intelligent Systems, 9, pp. 683-701.
11. Preece A.D., Batarekh A., Shinghal R. (1990), Verifying Rule-Based Systems.
12. Preece A.D., Shinghal R., Batarekh A. (1992), Principles and Practices in Verifying Rule-Based Systems, Knowledge Engineering Review, vol. 7, no. 2, pp. 115-141.
21

Feedforward Concept Networks*

Dominik Ślęzak^{1,2}, Marcin Szczuka^3, and Jakub Wróblewski^2

^1 Department of Computer Science, University of Regina, Regina, SK, S4S 0A2, Canada
^2 Polish-Japanese Institute of Information Technology, Koszykowa 86, 02-008 Warsaw, Poland
^3 Institute of Mathematics, Warsaw University, Banacha 2, 02-097 Warsaw, Poland
jakubw@pjwstk.edu.pl

* Supported by grant 3T11C00226 from the Polish Ministry of Scientific Research and Information Technology. The first author was also supported by a grant from the Faculty of Science, the University of Regina.
Summary. The paper presents an approach to construction of hierarchical structures of data based concepts (granules), extending the idea of feedforward neural networks. The operations of processing the concept information and changing the concept specification through the network layers are discussed. Examples of the concepts and their connections are provided with respect to the case study of learning hierarchical rule based classifiers from data. The proposed methods are referred to the foundations of granular and rough-neural computing.
21.1 Introduction

If we take a look at the standard approach to classification and decision support with use of learning systems, we quickly realize that it does not always fit the purpose. Equipped with the hypothesis formation (learning) algorithm, we attempt to find a possibly direct mapping from the input values to decisions. Such an approach does not always result in success, for various reasons. We address the situation when the desired solution should be more fine-grained, namely, it should have an internal structure. Although possibly hard to find and learn, such an architecture repays us by providing significant extensions in terms of flexibility, generality and expressiveness of the yielded model. We attempt to show our view on the process of construction and tuning of hierarchical structures of concepts (which can also be referred to as granules of knowledge [11, 13, 14]). We address these structures as feedforward concept networks, which can be regarded as a special case of hierarchical structures developed within the rough-neural computing (RNC) methodology [6, 8, 9]. In particular, we consider
classifier networks, where the input concepts correspond to the classified objects' behavior with respect to the standard classifiers and the output (target) concept reflects the final classification. The basic idea is that the relationship between such input and output concepts (granules) is not direct but based on internal layers of intermediate elements, which help in a more reliable transition from the basic information to a possibly compound classification goal. We strive for a formalization of our approach with use of analogies rooted in other areas, such as artificial neural networks [3, 4], ensembles of classifiers [2, 12, 21], and layered learning [20]. We show how the presented ideas can be exploited within the wider frameworks of rough-neural and granular computing. We also make an effort to provide examples of actual models outlined in our earlier, application-oriented papers [18, 19]. The paper starts with a general overview sketching the main points of the proposed approach. We provide some intuitions and familiarize the reader with mechanisms present in our proposed model. Then, we go step-by-step through a formalization describing the kinds of dependencies that drive the whole approach. Where possible, we provide examples to better ground the ideas.
21.2 Hierarchical learning and classification

Let us start by explaining how we intend to treat the notion of a concept. In general, a concept is an element drawn from a parameterized concept space. By a proper setting of these parameters we choose the right concept. Note that we do not initially demand that all concepts come from the same space. Such an informal definition of a concept space can be referred to the notion of an information granule system S = (G, R, Sem), where G is a set of parameterized formulas called information granules, R is a (parameterized) relation structure, and Sem is the semantics of G in R (cf. [14]). In our approach, we focus especially on the concept parametrization and the ability of parameterized construction of new concepts from the others. In this sense, our understanding of a concept space can be regarded as equivalent to an information granule system and the terms concept and granule can be used interchangeably.

Let a concept represent an element acting on the basis of information originating from other concepts or directly from the data source. To better depict the whole structure, it is convenient to exploit the analogy with artificial neural networks. In this case, a concept corresponds to a signal transmitted through a neuron - the basic computing unit. Dependencies between concepts, their precedence and importance, are represented by weighted connections between nodes. Similarly to the feedforward neural network, operations can be performed from bottom to top. They can correspond to the following goals:

Construction of compound concepts from the elementary ones. It can be observed in case-based reasoning (cf. [5]), layered learning (cf. [20]), as well as rough mereology [10] and rough-neural computing [6, 8, 9], where we want to approximate target concepts step by step, using simpler concepts that are easier to learn directly from data.
Construction of simple concepts from the advanced ones. It can be considered for the synthesis of classifiers, where we start from compound concepts (granules) reflecting the behavior of a given object with respect to particular, often compound classification systems, and we tend to obtain a very simple concept of a decision class where that object should belong to [9, 11]. The first goal corresponds to generalization of simple concepts while the second - to instantiation of general concept in a simpler, more specialized concept (cf. [16]). Obviously, we do not assume that the above are the only possible types of constructions. For instance, in a classification problem, decision classes can have a compound semantics requiring gradual specification corresponding to the first type of construction. Then, once we reach an appropriate level of expressiveness, we follow the second scenario to synthesize those compound specifications towards obtaining the final response of the classifier network.
21.3 General network architecture

When considering hierarchical structures for compound concept formation, several issues pop up. At the very general level of hierarchy construction/learning, one has to make choices with respect to homogeneity and synchronization. We mention below how these factors determine the complexity of the construction task.

Homogeneous vs. heterogeneous. At each level of the hierarchy we make a choice of the type of concepts (granules) to be used. In the simplest case each node implements the same type of mapping. We have studied such a fully homogeneous system in [18, 19] to express probabilistic classifiers based on the rough set reducts [12] and the Naive Bayes approach. A first step towards heterogeneity is made by permitting different types of concepts to be used at various levels in the hierarchy, but retaining uniformity across a single layer. This creates a typical layered learning model [20]. Finally, we may remove all restrictions on the uniformity of models in the neighboring nodes. In this way we produce a more general but harder to control structure.

Synchronous vs. asynchronous. This issue is concerned with the layout of connections between nodes. If it has an easily recognizable layered structure, we regard it to be synchronized. In other words, we can analyze the hierarchical structure in a level-by-level manner and, consequently, have the ability to clearly indicate the level of abstraction for composite concepts. If we permit the connections to be established on a less restrictive basis, the synchronization is lost. Then, nodes from non-consecutive levels may interact and the whole idea of simple-to-compound precedence of concepts becomes less usable.

The layouts of classifier networks for various levels of homogeneity and synchronization are illustrated in Figure 21.1. The simplest case of a homogeneous and synchronized network corresponds to Figure 21.1a. The partly homogeneous, synchronized architecture that we are attempting to formalize in this paper is shown in Figure 21.1b. Figures 21.1c and 21.1d represent the harder cases.
One can see that there are also other possible cases. For instance, we can consider the asynchronous but homogeneous network described in [1], where the nodes correspond semantically to the complex concepts we want to approximate, although syntactically the operations within the nodes remain of the same type, regardless of whether those nodes represent the advanced or the very initial concepts.
Fig. 21.1. Examples of network layout: a. both synchronized and homogeneous; b. synchronized and partly heterogeneous; c. synchronized and heterogeneous; d. neither synchronized nor homogeneous.
21.4 Hierarchical concept schemes

In this section we present a general notation for feedforward networks transmitting the concepts. Since we restrict ourselves to the two easier architecture cases illustrated by Figures 21.1a and 21.1b, we can consider the following notion:

Definition 1. By a hierarchical concept scheme we mean a tuple (C, MAP). C = {C_1, ..., C_n, C} is a collection of the concept spaces (information granule systems), where C is called the target concept space. The concept mappings

MAP = {map_i : C_i → C_{i+1} : i = 1, ..., n}, C_{n+1} = C    (21.1)

are the functions linking consecutive concept spaces. We assume that any feedforward concept network corresponds to (C, MAP), i.e. each i-th layer provides us with the elements of C_i. In case of total homogeneity, we have the equalities C_1 = ... = C_n = C and map_1 = ... = map_n = identity. For a partly homogeneous architecture, some of the mappings can remain identities, but we should also expect non-trivial mappings between concepts of entirely different nature, where C_i ≠ C_{i+1}.
Following the structure of a feedforward neural network, we calculate the inputs to each next layer as combinations of the concepts from the previous one. In general, we cannot expect the traditional definition of a linear combination to be applied. Still, the intuition says that the labels of connections should somehow express the level of concepts' importance in the formation of the new ones. We refer to this intuition in terms of so called generalized linear combinations:

Definition 2. A feedforward concept scheme is a triple (C, MAP, LIN), where

LIN = {lin_i : 2^(C_i × W_i) → C_i : i = 1, ..., n}    (21.2)

defines generalized linear combinations over the concept spaces C_i. For any i = 1, ..., n, W_i denotes the space of the combination parameters. If W_i is a partial or total ordering, then we interpret its elements as weights reflecting the relative importance of particular concepts in the construction of the resulting concept.

Let us denote by m(i) ∈ N the number of nodes in the i-th network layer. For any i = 1, ..., n, the nodes from the i-th and (i+1)-th layers are connected by links labeled with parameters w_{j(i+1)}^{j(i)} ∈ W_i, for j(i) = 1, ..., m(i) and j(i+1) = 1, ..., m(i+1). For any collection of the concepts c_i^1, ..., c_i^{m(i)} ∈ C_i occurring as the outputs of the i-th network layer in a given situation, the input to the j(i+1)-th node in the (i+1)-th layer takes the following form:

c_{i+1}^{j(i+1)} = map_i ( lin_i ( { (c_i^{j(i)}, w_{j(i+1)}^{j(i)}) : j(i) = 1, ..., m(i) } ) )    (21.3)
The way of composing functions within the formula (21.3) requires, obviously, further discussion. In this paper, we restrict ourselves to the case of Figure 21.2a, where map_i and lin_i are stated separately. However, parameters w_{j(i+1)}^{j(i)} could also be used directly in a generalized concept mapping

genmap_i : 2^(C_i × W_i) → C_{i+1}    (21.4)

as shown in Figure 21.2b. These two possibilities reflect the construction tendencies described in Section 21.2. Function (21.4) can be applied to the construction of more compound concepts parameterized by the elements of W_i, while the usage of Definitions 1 and 2 results rather in a potential syntactical simplification of the new concepts (which can, however, still become more compound semantically). One can see that the function genmap and the corresponding illustration 21.2b refer directly to the ideas of synthesizing concepts (granules, standards, approximations) known from rough-neural computing, rough mereology, and the theory of approximation spaces (cf. [6, 11, 14]). On the other hand, splitting genmap's functionality, as proposed by formula (21.3) and illustrated in 21.2a, provides us with a framework more comparable to the original artificial neural networks and their supervised learning capabilities (cf. [19, 18]).
21.5 Weighted compound concepts

Beginning with the input layer of the network, we expect it to provide the concepts-signals c_1^1, ..., c_1^{m(1)} ∈ C_1, which will then be transmitted towards the target layer using (21.3). If we learn a network related directly to a real-valued training sample, then we get C_i = ℝ, lin_i can be defined as the classical linear combination (with W_i = ℝ), and map_i as identity. An example of a more compound concept space originates from our previous studies [18, 19]:
Fig. 21.2. Production of new concepts in consecutive layers: a. the concepts are first weighted and combined within the original space C_i using function lin_i and then mapped to a new concept in C_{i+1}; b. the concepts are transformed directly to the new space C_{i+1} by using the generalized concept mapping (21.4).

Example 1. Let us assume that the input layer nodes correspond to various classifiers and the task is to combine them within a general system which synthesizes the input classifications in an optimal way. For any object, each input classifier induces a possibly incomplete vector of beliefs in the object's membership to particular decision classes. Let DEC denote the set of decision classes specified for a given classification problem. By the weighted decision space WDEC we mean the family of subsets of DEC with elements labeled by their beliefs, i.e.:
WDEC = ∪_{X ⊆ DEC} { {(k, μ_k) : k ∈ X, μ_k ∈ ℝ} }    (21.5)
Any weighted decision μ̄ = {(k, μ_k) : k ∈ X_μ̄, μ_k ∈ ℝ} corresponds to a subset X_μ̄ ⊆ DEC of decision classes for which the beliefs μ_k ∈ ℝ are known. Another example corresponds to specific classifiers - the sets of decision rules obtained using the methodology of rough sets [12, 21]. The way of parametrization is comparable to the proceedings with classification granules in [11, 14].

Example 2. Let DESC denote the family of logical descriptions which can be used to define decision rules for a given classification problem. Every rule is labeled with its description α_rule ∈ DESC and decision information, which takes - in the most
general framework - the form of μ̄_rule ∈ WDEC. For a new object, we measure its degree of satisfaction of the rule's description (usually zero-one), combine it with the number of training objects satisfying α_rule, and come out with a number app_rule ∈ ℝ expressing the level of the rule's applicability to this object. As a result, by the decision rule set space RULS we mean the family of all sets of elements of DESC labeled by weighted decision sets and degrees of applicability, i.e.:

RULS = ∪_{X ⊆ DESC} { {(α, μ̄, app) : α ∈ X, μ̄ ∈ WDEC, app ∈ ℝ} }    (21.6)
Definition 3. By a weighted compound concept space C we mean a space of collections of sub-concepts from some sub-concept space S (possibly from several spaces), labeled with the concept parameters from a given space V, i.e.:

C = ∪_{X ⊆ S} { {(s, v_s) : s ∈ X, v_s ∈ V} }    (21.7)
For a given c = {(s, v_s) : s ∈ X_c, v_s ∈ V}, where X_c ⊆ S is the range of c, parameters v_s ∈ V reflect the relative importance of sub-concepts s ∈ X_c within c. Just like in the case of the combination parameters W_i in Definition 2, we can assume a partial or total ordering over the concept parameters. A perfect situation would then be to be able to combine these two kinds of parameters while calculating the generalized linear combinations and observe how the sub-concepts from various outputs of the previous layer fight for their importance in the next one. For the sake of simplicity, we further restrict ourselves to the case of real numbers, as stated by Definition 4. However, in general W_i does not need to be ℝ. Let us consider a classifier network, similar to Example 2, where decision rules are described by parameters of accuracy and importance (initially equal to their support). A concept transmitted by the network refers to the rules matched by an input object. The generalized linear combination of such concepts may be parameterized by vectors (w, θ) ∈ W_i and defined as a union of rules, where importance is expressed by w and θ states a threshold for the rules' accuracy.

Definition 4. Let the i-th network layer correspond to the weighted compound concept space C_i based on the sub-concept space S_i and parameters V_i = ℝ. Consider the j(i+1)-th node in the next layer. We define its input as follows:

lin_i({(c_i^{j(i)}, w_{j(i+1)}^{j(i)}) : j(i) = 1, ..., m(i)}) = { (s, Σ_{j(i): s ∈ X_{j(i)}} w_{j(i+1)}^{j(i)} · v_s) : s ∈ ∪_{j(i)} X_{j(i)} }    (21.8)

where X_{j(i)} ⊆ S_i is a simplified notation for the range of the weighted compound concept c_i^{j(i)} and v_s ∈ ℝ denotes the importance of a sub-concept s ∈ S_i in c_i^{j(i)}. Formula (21.8) can be applied both to WDEC and RULS. In the case of WDEC, the sub-concept space equals DEC. The sum Σ_{j(i): s ∈ X_{j(i)}} w_{j(i+1)}^{j(i)} · v_s gathers the weighted beliefs of the previous layer's nodes in the given decision class s ∈ DEC.
288
Dominik Sl^zak, Marcin Szczuka, and Jakub Wroblewski
In the case of RULS we do the same with the weighted applicability degrees for the elements-rules belonging to the sub-concept space DESC × WDEC. It is interesting to compare our method of parameterized concept transformation with the way of proceeding with classification granules and decision rules in the other rough set based approaches [11, 12, 14, 21]. Actually, at this level, we do not provide anything novel but rewrite well known examples within a more unified framework. A more visible difference can be observed in the next section, where we complete our methodology.
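For illustration, formula (21.8) can be coded directly when weighted compound concepts are represented as Python dictionaries mapping sub-concepts to parameters (decision classes to beliefs for WDEC, rule identifiers to applicability degrees for RULS). The dictionary encoding and the function name are assumptions made for this sketch.

```python
def generalized_linear_combination(weighted_concepts):
    """Combine pairs (concept, connection weight) coming from the previous layer."""
    combined = {}
    for concept, weight in weighted_concepts:
        for s, v_s in concept.items():
            combined[s] = combined.get(s, 0.0) + weight * v_s
    return combined

# Two WDEC outputs of the previous layer, combined with weights 0.7 and 0.3:
mu1 = {"class_A": 0.9, "class_B": 0.1}
mu2 = {"class_A": 0.4, "class_C": 0.6}
print(generalized_linear_combination([(mu1, 0.7), (mu2, 0.3)]))
# -> class_A ~ 0.75, class_B ~ 0.07, class_C ~ 0.18
```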
Fig. 21.3. The network-based object classification: the previously trained decision rule sets are activated by an object by means of their applicability to its classification; then the rule set concepts are processed and mapped to the weighted decisions using function (21.9); finally the most appropriate decision for the given object is produced.
21.6 Activation functions

The possible layout combining the concept spaces DEC, WDEC, and RULS with the partly homogeneous classifier network is illustrated by Figure 21.3. Given a new object, we initiate the input layer with the degrees of applicability of the rules in particular rule-sets to this object. After processing this type of concept along (possibly) several layers, we use the concept mapping function

map(ruls) = { (k, Σ_{(α, μ̄, app) ∈ ruls : k ∈ X_μ̄} app · μ_k) : k ∈ ∪_{(α, μ̄, app) ∈ ruls} X_μ̄ }    (21.9)

that is, we simply summarize the beliefs (weighted by the rules' applicability) in particular decision classes. Similarly, we finally map the weighted decision to the decision class which is assigned the highest resulting belief. The intermediate layers in Figure 21.3 are designed to help in voting among the classification results obtained from particular rule sets. The traditional rough set approach (cf. [12]) assumes specification of a fixed voting function, which, in our terminology, would correspond to a direct concept mapping from the first RULS layer into DEC, with no hidden layers and without the possibility of tuning the weights of connections.
An improved adaptive approach (cf. [21]) enables us to adjust the rule sets, although the voting scheme still remains fixed. At the same time, the proposed method provides us with a framework for tuning the weights and, in this way, learning the voting formula adaptively (cf. [6, 11, 14]). Still, a scheme based only on generalized linear combinations and concept mappings is not adjustable enough. The reader may check that the composition of functions (21.8) for elements of RULS and WDEC with (21.9) results in a collapsed single-layer structure corresponding to the most basic weighted voting among decision rules. This is exactly what happens with classical feedforward neural network models with no non-linear activation functions translating the signals within particular neurons. Therefore, we should consider such functions as well.

Definition 5. A neural concept scheme is a quadruple (C, MAP, LIN, ACT), where the first three entities are provided by Definitions 1, 2, and

ACT = {act_i : C_i → C_i : i = 2, ..., n+1}    (21.10)

is the set of activation functions, which can be used to relate the inputs to the outputs within each i-th layer of a network.

It is reasonable to assume some properties of ACT, which would work for the proposed generalized scheme analogously to the classical case. Given a compound concept consisting of some interacting parts, we would like, for instance, to guarantee that the relative importance of those parts remains roughly unchanged. Such a requirement, corresponding to monotonicity and continuity of real functions, is well expressible for the weighted compound concepts introduced in Definition 3. Given a concept c_i ∈ C_i represented as a weighted collection of sub-concepts, we claim that its more important (better weighted) sub-concepts should keep more influence on the concept act_i(c_i) ∈ C_i than the others. In [18, 19] we introduced a sigmoidal activation function working on probability vectors comparable to the structure of WDEC in Example 1. That function, which originated from the studies on monotonic decision measures in [15], can actually be generalized onto any space of compound concepts weighted with real values:

Definition 6. By an α-sigmoidal activation function for a weighted compound concept space C with real concept parameters we mean a function act_C^α : C → C, parameterized by α > 0, which modifies these parameters in the following way:

act_C^α(c) = { (s, v'_s) : (s, v_s) ∈ c },    (21.11)

where each modified parameter v'_s results from a sigmoidal rescaling of v_s computed against all the remaining pairs (t, v_t) ∈ c.
By the composition of lin_i and map_i, which specify the concepts c_{i+1}^{j(i+1)} ∈ C_{i+1} as inputs to the nodes in the (i+1)-th layer, with the functions act_{i+1} modifying the concepts within the entire nodes, we obtain a classification model with a satisfactory expressive and adaptive power. If we apply this kind of function to the rule sets, we modify the rules' applicability degrees by their internal comparison.
Such performance cannot be obtained using classical neural networks with the nodes assigned to every single rule. Appropriate tuning of α > 0 results in activation/deactivation of the rules with a relatively higher/lower applicability. Similar characteristics can be observed within WDEC, where the decision beliefs compete with each other in the voting process (cf. [15]). The presented framework also allows for modeling other interesting behaviors. For instance, decision rules which inhibit the influence of other rules (so called exceptions) can be easily achieved by negative weights and proper activation functions, which would be hard to emulate by plain, negation-free conjunctive decision rules. Further research is needed to compare the capabilities of the proposed construction with other hierarchical approaches [6, 10, 9, 20].
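The following sketch combines the mapping (21.9) with an α-parameterized activation acting on a weighted compound concept. The rule-set encoding is an assumption, and so is the exponential normalization used in the activation: the chapter only requires a monotonic, sigmoidal-like rescaling of the parameters, so this is one possible instance rather than the authors' exact formula (21.11).

```python
import math

def rules_to_weighted_decision(ruls):
    """ruls: list of (description, weighted_decision_dict, applicability)."""
    mu = {}
    for _, weighted_decision, app in ruls:
        for k, belief in weighted_decision.items():
            mu[k] = mu.get(k, 0.0) + app * belief     # beliefs weighted by applicability
    return mu

def alpha_activation(concept, alpha):
    """Monotonically rescale the parameters of a weighted compound concept.

    Large alpha sharpens the competition between sub-concepts (activation/
    deactivation), small alpha flattens it -- an assumed softmax-style stand-in.
    """
    z = sum(math.exp(alpha * v) for v in concept.values())
    return {s: math.exp(alpha * v) / z for s, v in concept.items()}

def classify(ruls, alpha=1.0):
    mu = alpha_activation(rules_to_weighted_decision(ruls), alpha)
    return max(mu, key=mu.get)                        # class with the highest belief
```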
21.7 Learning in classifier networks

A cautious reader has probably already noticed the arising question about the proper choice of connection weights in the network. The weights are ultimately the component that decides about the performance of the entire scheme. As we will try to advocate, it is - at least to some extent - possible to learn them in a manner similar to the case of standard neural networks. Backpropagation, the way we want to use it here, is a method for reducing the global error of a network by performing local changes in the weights' values. The key issue is to have a method for dispatching the value of the network's global error functional among the nodes (cf. [4]). This method, when shaped in the form of an algorithm, should provide the direction of the weight update vector, which is then applied according to the learning coefficient. For the standard neural network model (cf. [3]) this algorithm selects the direction of the weight update using the gradient of the error functional and the current input. Obviously, numerous versions and modifications of the gradient-based algorithm exist. In the more complicated models which we are dealing with, the idea of backpropagation transfers into the demand for a general method of establishing weight updates. This method should comply with the general principles postulated for the rough-neural models (cf. [8, 21]). Namely, the algorithm for the weight updates should provide a certain form of mutual monotonicity, i.e. small and local changes in weights should not rapidly divert the behavior of the whole scheme and, at the same time, a small overall network error should result in merely cosmetic changes in the weight vectors. The need of introducing automatic backpropagation-like algorithms to rough-neural computing was addressed recently in [6]. It can be referred to some already specified solutions like, e.g., the one proposed for rough-fuzzy neural networks in [7]. Still, a general framework for RNC is missing, where special attention must be paid to the issue of interpreting and calculating partial error derivatives with respect to the complex structures' parameters. We do not claim to have discovered the general principle for constructing backpropagation-like algorithms for the concept (granule) networks. Still, in [18, 19] we have been able to construct a generalization of the gradient-based method for the homogeneous neural concept schemes based on the space WDEC. The step to partly homogeneous schemes is natural for the class of weighted compound concepts,
which can be processed using the same type of activation function. For instance, in the case of the scheme illustrated by Figure 21.3, the conservative choice of mappings, which turn out to be differentiable and regular, permits a direct translation from the previous case. Hence, by a small adjustment of the algorithm developed previously, we get a recipe for learning the weight vectors. An example of two-dimensional weights (w, θ) ∈ W_i proposed in Section 21.4 is much harder to translate into the backpropagation language. One of the most important features of the classical backpropagation algorithm is that we can achieve a local minimum of an error function (on a set of examples) by local, easy to compute changes of the weight values. This does not remain easy for two real-valued parameters instead of one. Moreover, the parameter θ is a rule threshold (fuzzified by a kind of sigmoidal characteristics to achieve a differentiable model) and, therefore, by adjusting its value we are switching on and off (almost, up to the proposed sigmoidal function) entire rules, causing dramatic error changes. This is an illustration of the problems arising when we are dealing with more complicated parameter spaces - in many cases we have to use dedicated, time-consuming local optimization algorithms. Yet another issue is concerned with the second "tooth" of backpropagation: transmitting the error value backward through the network. The question is how to modify the error value due to the connection weight, assuming that the weight is generalized (e.g. a vector as above). The error value should be translated into a value compatible with the previous layer of classifiers, and should be useful for an algorithm of parameter modification. It means that the information about the error transmitted to the previous layer can be not only a real-valued signal, but e.g. a complete description of each rule's positive or negative contribution to the classifier performance in the next layer.
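As a simple illustration of the backpropagation-like idea discussed above, the sketch below adjusts a single connection weight by a finite-difference estimate of the network error gradient. It is only an assumed stand-in for the gradient-based method of [18, 19], which is not reproduced here.

```python
def update_weight(weights, index, error_of, learning_rate=0.1, eps=1e-4):
    """error_of(weights) returns the global network error on a set of examples."""
    base = error_of(weights)
    probe = list(weights)
    probe[index] += eps
    gradient = (error_of(probe) - base) / eps          # numerical partial derivative
    weights[index] -= learning_rate * gradient         # small, local change
    return weights
```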
21.8 Conclusions

We have discussed the construction of hierarchical concept schemes aiming at layered learning of mappings between the inputs and desired outputs of classifiers. We proposed a generalized structure of a feedforward neural-like network approximating the intermediate concepts in a way similar to traditional neurocomputing approaches. We provided examples of compound concepts corresponding to decision rule based classifiers and showed some intuition concerning their processing through the network. Although we have some experience with neural networks transmitting non-trivial concepts [18, 19], this is definitely the very beginning of more general theoretical studies. The most emerging issue is the extension of the proposed framework onto more advanced structures than the introduced weighted compound concepts, without losing a general interpretation of monotonic activation functions, as well as the relaxation of quite limiting mathematical requirements corresponding to the general idea of learning based on error backpropagation. We are going to challenge these problems by developing theoretical and practical foundations, as well as by referring to other approaches, especially those related to rough-neural computing [6, 8, 9].
References

1. Bazan, J., Nguyen, S.H., Nguyen, H.S., Skowron, A.: Rough Set Methods in Approximation of Hierarchical Concepts. In: Proc. of RSCTC'2004. LNAI 3066, Springer Verlag (2004) pp. 346-355
2. Dietterich, T.: Machine learning research: four current directions. AI Magazine 18/4 (1997) pp. 97-136
3. Hecht-Nielsen, R.: Neurocomputing. Addison-Wesley (1990)
4. le Cun, Y.: A theoretical framework for backpropagation. In: Neural Networks - concepts and theory. IEEE Computer Society Press (1992)
5. Lenz, M., Bartsch-Spoerl, B., Burkhard, H.-D., Wess, S. (eds.): Case-Based Reasoning Technology: From Foundations to Applications. LNAI 1400, Springer (1998)
6. Pal, S.K., Peters, J.F., Polkowski, L., Skowron, A.: Rough-Neural Computing: An Introduction. In: S.K. Pal, L. Polkowski, A. Skowron (eds.), Rough-Neural Computing. Cognitive Technologies Series, Springer (2004) pp. 15-41
7. Pedrycz, W., Peters, J.F.: Learning in fuzzy Petri nets. In: J. Cardoso, H. Scarpelli (eds.), Fuzziness in Petri Nets. Physica (1998) pp. 858-886
8. Peters, J.F., Szczuka, M.: Rough neurocomputing: a survey of basic models of neurocomputation. In: Proc. of RSCTC'2002. LNAI 2475, Springer (2002) pp. 309-315
9. Polkowski, L., Skowron, A.: Rough-neuro computing. In: W. Ziarko, Y.Y. Yao (eds.), Proc. of RSCTC'2000. LNAI 2005, Springer (2001) pp. 57-64
10. Polkowski, L., Skowron, A.: Rough mereological calculi of granules: A rough set approach to computation. Computational Intelligence 17/3 (2001) pp. 472-492
11. Skowron, A.: Approximate Reasoning by Agents in Distributed Environments. Invited speech at IAT'2001. Maebashi, Japan (2001)
12. Skowron, A., Pawlak, Z., Komorowski, J., Polkowski, L.: A rough set perspective on data and knowledge. In: W. Kloesgen, J. Zytkow (eds.), Handbook of KDD. Oxford University Press (2002) pp. 134-149
13. Skowron, A., Stepaniuk, J.: Information granules: Towards foundations of granular computing. International Journal of Intelligent Systems 16/1 (2001) pp. 57-86
14. Skowron, A., Stepaniuk, J.: Information Granules and Rough-Neural Computing. In: S.K. Pal, L. Polkowski, A. Skowron (eds.), Rough-Neural Computing. Cognitive Technologies Series, Springer (2004) pp. 43-84
15. Ślęzak, D.: Normalized decision functions and measures for inconsistent decision tables analysis. Fundamenta Informaticae 44/3 (2000) pp. 291-319
16. Ślęzak, D., Szczuka, M., Wróblewski, J.: Harnessing classifier networks - towards hierarchical concept construction. In: Proc. of RSCTC'2004, Springer (2004)
17. Ślęzak, D., Wróblewski, J.: Application of Normalized Decision Measures to the New Case Classification. In: W. Ziarko, Y. Yao (eds.), Proc. of RSCTC'2000. LNAI 2005, Springer (2001) pp. 553-560
18. Ślęzak, D., Wróblewski, J., Szczuka, M.: Neural Network Architecture for Synthesis of the Probabilistic Rule Based Classifiers. ENTCS 82/4, Elsevier (2003)
19. Ślęzak, D., Wróblewski, J., Szczuka, M.: Constructing Extensions of Bayesian Classifiers with use of Normalizing Neural Networks. In: N. Zhong, Z. Ras, S. Tsumoto, E. Suzuki (eds.), Proc. of ISMIS'2003. LNAI 2871, Springer (2002) pp. 408-416
20. Stone, P.: Layered Learning in Multiagent Systems: A Winning Approach to Robotic Soccer. MIT Press, Cambridge MA (2000)
21. Wróblewski, J.: Adaptive aspects of combining approximation spaces. In: S.K. Pal, L. Polkowski, A. Skowron (eds.), Rough-Neural Computing. Cognitive Technologies Series, Springer (2004) pp. 139-156
22
Extensions of Partial Structures and Their Application to Modelling of Multiagent Systems

Bozena Staruch

Faculty of Mathematics and Computer Science, University of Warmia and Mazury, Zolnierska 14a, 10-561 Olsztyn, Poland
bs [email protected]

Summary. Various formal approaches to the modelling of multiagent systems have been used, e.g., logics of knowledge and various kinds of modal logics [4]. We discuss an approach to multiagent systems based on the assumption that the agents possess only partial information about global states, see [6]. We make the general assumption that agents perceive the world through fragmentary observations only [8, 4]. We propose to use partial structures for agent modelling and we present some consequences of such an algebraic approach. Partial structures are incrementally enriched by new information; the enriched structures are represented by extensions of the given partial model. The extension of a partial structure is the basic notion of this paper. It makes it possible for a given agent to model hypotheses about extensions of the observable world. An agent can express the properties of the states by properties of the partial structure he has at his disposal. We assume that every agent knows the signature of the language that we use for modelling agents.
22.1 Introduction

A partial structure is a partial algebra [2, 1] enriched with predicates. For simplicity, we use a language with a satisfactory number of constants and, in consequence, we describe theories of partial structures in terms of atomic formulas with constants and, additionally, inequalities between some constants. Such formulas can be treated as constraints defining the discernibility conditions that should be preserved, e.g., during data reduction [8]. Our theoretical considerations split into two approaches: a partial-model-theoretic one and a logical one. We investigate two kinds of sets of first order sentences. An infallible set of sentences (a partial theory) contains all sentences that should be satisfied in every extension of the given family of partial structures. A possible set of sentences is a set of sentences that is satisfied in a certain extension of the given family of partial structures. Any partial algebraic structure is closely related to its partial theory. The theory of a partial structure, which is the intersection of the theories of all its extensions, corresponds to the common part of the extensions considered in non-monotonic logics [5].
Temporal, modal, multimodal and epistemic logics are used to express properties of extensions of partial structures (see, e.g., [10], [12] or [13]). We investigate the inconsistency problem that may appear in multiagent systems during extending and synthesizing (fusion) of partial results. From the logical point of view, inconsistency can appear if the theory of a partial structure representing the knowledge of a given agent is logically inconsistent under the information available to this single agent or to other agents. From the algebraic point of view, inconsistency can appear when the identification of different constants by agents becomes necessary. The main tool we use for the fusion of partial results is the coproduct operation. For any family of partial structures there exists a unique (up to isomorphism) coproduct, constructed as the disjoint sum of the partial structures factored by a congruence identifying the constants that should be identified. Inconsistency can then be recognized during the construction of this congruence. Notice that Pawlak's information systems [8] can be naturally represented by partial structures. For example, any such system can be considered as a relational structure with some partial operations. Extensions of partial structures can also be applied to problems concerning data analysis in information systems, such as the decomposition problem or the synthesis (fusion) problem of partial results [7]. We also consider multiagent systems where some further logical constraints (in the form of atomic formulas), controlling the extension process, are added. The paper is organized as follows. We introduce basic facts on partial structures in Section 2. We also define there extensions of a partial structure and of a family of partial structures. In Subsection 2.1 we give the construction of the coproduct of a given family of partial structures. Section 3 includes the logical part of our theory: we give there the definitions of possible and infallible sets of sentences. In the next section we discuss how our algebraic approach can be used in multiagent systems.
22.2 Partial structures

We use partial algebra theory [2, 1] throughout the paper. Almost all facts concerning partial algebras extend easily to partial structures [10-13]. We consider a signature (F, C, Π, n), with at most countable (finite in practice) and pairwise disjoint sets of function, constant and predicate symbols and with an arity function n : F ∪ Π → N, where N is the set of nonnegative integers. Any constant is a 0-ary function, so we generally omit the set of constants in a signature and write it explicitly only when necessary.

Definition 1. A partial structure of signature (F, Π, n) is a triple A = (A, (f^A)_{f∈F}, (r^A)_{r∈Π}) such that for every f ∈ F, f^A is a partial n(f)-ary operation on A (the domain of the operation, dom f^A ⊆ A^{n(f)} × A), and for every r ∈ Π, r^A ⊆ A^{n(r)}.

We say that A is a total structure of signature (F, Π, n) if all its operations are defined everywhere. An operation or relation is discrete if its domain is empty. A partial structure A is discrete if all its operations and relations are discrete.
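To make Definition 1 concrete, the following sketch (our own illustration, not from the paper; the class name and the toy signature are assumptions) shows one possible in-memory representation of a finite partial structure, with partial operations stored only on the defined part of their domain.

# A minimal sketch of a finite partial structure (assumed representation, not from the paper).
# Partial operations are dicts from argument tuples to values (undefined tuples are absent);
# relations are sets of argument tuples; constants are 0-ary operations keyed by the empty tuple.

class PartialStructure:
    def __init__(self, universe, operations, relations):
        self.universe = set(universe)          # the carrier set A
        self.operations = operations           # {op_name: {args_tuple: value}}
        self.relations = relations             # {rel_name: set of args_tuples}

    def apply(self, op, *args):
        """Return f(args) if defined, otherwise None (the operation is partial)."""
        return self.operations.get(op, {}).get(tuple(args))

    def is_total(self, op, arity):
        """Check whether an operation is defined on every tuple of the given arity."""
        from itertools import product
        return all(tuple(t) in self.operations.get(op, {})
                   for t in product(self.universe, repeat=arity))

# Example: a partial lattice on {a, b, c} where only a ∧ b is defined, with constant c.
A = PartialStructure(
    universe={"a", "b", "c"},
    operations={"meet": {("a", "b"): "c"}, "c": {(): "c"}},
    relations={"leq": {("c", "a"), ("c", "b")}},
)
print(A.apply("meet", "a", "b"))   # 'c'
print(A.apply("meet", "a", "c"))   # None -> undefined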
Notice that for any constant symbol c, the corresponding operation c^A is either a distinguished element of A or undefined. Every structure (even a total one) of a given signature is a partial structure of any wider signature; the additional operations and relations are then discrete.

Remark 1. We will use Pawlak's information systems [8] for presenting examples, so let us recall some definitions here. An information system is a pair S = (U, A), where each attribute a ∈ A is identified with a function a : U → V_a from the universe U of objects into the set V_a of all possible values of a. A formula a = v_a is called a descriptor, and a template is defined as a conjunction of descriptors ⋀(a_i, v_{a_i}), where a_i ∈ A and a_i ≠ a_j for i ≠ j. A decision table is an information system of the form A = (U, A ∪ {d}), where d ∉ A is a distinguished attribute called the decision. For every set of attributes B ⊆ A, an equivalence relation, denoted by IND_A(B) and called the B-indiscernibility relation, is defined by

IND_A(B) = {(u, u') ∈ U × U : for every a ∈ B, a(u) = a(u')}.    (22.1)

Objects u, u' satisfying the relation IND_A(B) are indiscernible by the attributes from B. If A = (U, A) is an information system, B ⊆ A is a set of attributes and X ⊆ U is a set of objects, then the sets BX = {u ∈ U : [u]_B ⊆ X} and B̄X = {u ∈ U : [u]_B ∩ X ≠ ∅} are called the B-lower and the B-upper approximation of X in A, respectively. The set BN_B(X) = B̄X − BX is called the B-boundary of X. In rough set theory one also considers approximations determined by a tolerance relation instead of an equivalence relation; our approach can be used there, too.

Example 1. We interpret an information system as a partial structure A = (U, R), where R = {r_{a,v} : a ∈ A, v ∈ V_a} and r_{a,v} is a unary relation such that for every x ∈ U, x ∈ r_{a,v} iff a(x) = v. Partial operations can also be considered there.

Example 2. Every partially ordered set is a partial lattice of a signature with two binary operations ∨ and ∧, the least upper bound and the greatest lower bound, respectively.

Definition 2. A homomorphism of partial structures h : A → B of signature (F, Π, n) is a function h : A → B such that, for any f ∈ F, if ā ∈ dom f^A then h ∘ ā ∈ dom f^B and h(f^A(ā)) = f^B(h ∘ ā), and for any r ∈ Π and a_1, ..., a_{n(r)} ∈ A, if r^A(a_1, ..., a_{n(r)}) then r^B(h(a_1), ..., h(a_{n(r)})).

Definition 3. A partial structure B is an extension of a partial structure A iff there exists an injective homomorphism e_A : A → B. If B is total, then we say that B is a completion of A.
• E(A) denotes the class of all extensions of A, and
• T(A) denotes the class of all completions of A.
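As a small illustration of Definitions 2 and 3 (our own sketch, reusing the representation assumed earlier; not part of the paper), one can check mechanically whether a map between finite partial structures is a homomorphism and whether it witnesses an extension.

# Sketch (our own): checking Definition 2 for finite partial structures represented as above.
# h is a dict mapping elements of A's universe to elements of B's universe.

def is_homomorphism(A, B, h):
    # whenever f^A(args) is defined, f^B on the image must be defined and equal h(f^A(args))
    for op, table in A.operations.items():
        for args, value in table.items():
            image = B.operations.get(op, {}).get(tuple(h[x] for x in args))
            if image is None or image != h[value]:
                return False
    # r^A(a1,...,an) implies r^B(h(a1),...,h(an))
    for rel, tuples in A.relations.items():
        for args in tuples:
            if tuple(h[x] for x in args) not in B.relations.get(rel, set()):
                return False
    return True

def is_extension_witness(A, B, h):
    """B extends A iff h is an injective homomorphism from A into B (Definition 3)."""
    injective = len(set(h.values())) == len(h)
    return injective and is_homomorphism(A, B, h)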
Remark 2. For applications in further sections we use a generalization of the above notion of extension. By a generalized extension of a given partial structure A we understand any partial structure B (even of an extended signature) such that there exists a homomorphism h : A → B preserving some a priori chosen constraints. Properties of extensions defined by monomorphisms are important from the theoretical point of view and can easily be imported to more general cases. We also consider extensions under some further constraints which follow from the assumption that the extensions belong to special classes of partial structures [13].

Definition 4. A is a weak substructure of B iff the identity embedding id_A : A → B is a homomorphism of partial structures.

Hence, every partial structure is an extension of its weak substructures. We do not recall here the notions of a relative substructure and a closed substructure.

Example 3. If B is a subtable of a given information system A, then the corresponding partial structure A is an extension of B. By a subtable we mean any subset of the given universe with some attributes and some values of these attributes; we allow null attribute values in subtables. B = (U_B, R_B) is a weak substructure of the given information system A = (U, R) if U_B ⊆ U and R_B ⊆ R (then also B ⊆ A). It means that if x ∈ r^B_{a,v} then x ∈ r^A_{a,v}. Hence, it may be that a(x) = v in A and x ∈ U_B and a ∈ B, but a(x) is not determined in B.

Example 4. For generalized extensions we discern some constants. For example, let A = (U, R) be the relational system corresponding to an information system A = (U, A), and let X ⊆ U. Take the language L_U in which every object of U is a constant. Assume that every constant of the lower approximation A(X) should be discerned from every constant of the complement, while no assumption is made for objects in the boundary region of the concept. One can describe the above discernibility using decision tables: let d be an additional decision attribute such that d(x) = 1 for every x ∈ A(X) and d(x) = 0 for every x ∈ U \ A(X).

Example 5. For the partial lattices A, B, C shown in Figure 22.1, A is an extension of B, and obviously B is a weak substructure of A. A is a generalized extension of C under the assumption that a ≠ b ≠ c ≠ a; the appropriate homomorphism glues c with d.

Fig. 22.1. Partial lattices

22.2.1 Extensions of a family of partial structures

We assume that a family of partial structures (agents) is given. Every possible extension should include (in some way) every member of the given family, as well as the entire family. Let us take the following definition:
Definition 5. Let ℜ = (A_i)_{i∈I} be a family of partial structures of a given fixed signature. A partial structure B is an extension of ℜ iff B is an extension of every A_i ∈ ℜ, and B is a generalized extension of ℜ iff B is a generalized extension of every A_i ∈ ℜ. E(ℜ) and T(ℜ) denote the classes of extensions and completions of the family ℜ, respectively.

Definition 6. Let ℜ = (A_i)_{i∈I} be a family of partial structures of signature (F, C, Π, n). A partial structure B of the same signature is called a coproduct of ℜ iff there exists a family of homomorphisms h_i : A_i → B for every A_i ∈ ℜ, and if for a certain partial structure C there exists a family of homomorphisms g_i : A_i → C, then there exists a unique homomorphism h : B → C such that h ∘ h_i = g_i.

Proposition 1. For any family of partial structures the coproduct of this family exists and is unique up to isomorphism.

Construction of coproducts of partial structures

Let ℜ = (A_i)_{i∈I} be a family of partial structures of signature (F, C, Π, n). We assume that there are no 0-ary functional symbols in F, i.e., all constants are included in C. For any A_i ∈ ℜ, let A_i° denote its reduct to the signature (F, Π, n). We first take the disjoint sum ⋃ℜ° of the family ℜ° = {A_i° : A_i ∈ ℜ}. We then take care of the appropriate identification of the existing constants and set

Θ_0 = {((c^{A_i}, i), (c^{A_j}, j)) : A_i, A_j ∈ ℜ, c ∈ C, c^{A_i} and c^{A_j} exist}.

Moreover, let Θ be the congruence relation on ⋃ℜ° generated by Θ_0. Finally, we set B = ⋃ℜ°/Θ and define the family of homomorphisms h_i : A_i → B by h_i(a) = [(a, i)]Θ.

Proposition 2. The partial structure B constructed above is a coproduct of the family ℜ.
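The following sketch (our own, reusing the representation assumed earlier) illustrates the first step of this construction: tagging elements to form the disjoint sum, generating the pairs Θ_0 that glue equally named constants, and detecting conflicts against a priori chosen inequalities. It only generates Θ_0 with a union-find; the closure of the congruence under the partial operations (needed in general, as the lattice example below shows) is omitted.

# Sketch (our own): disjoint sum of finite partial structures plus identification of
# equally named constants, with a consistency check against required inequalities.

def coproduct_carrier(structures, constants, unequal=()):
    """structures: list of PartialStructure; constants: names of constant symbols;
    unequal: pairs of tagged elements ((i, a), (j, b)) required to stay distinct."""
    elements = {(i, a) for i, S in enumerate(structures) for a in S.universe}
    parent = {e: e for e in elements}                  # union-find for the congruence

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]
            e = parent[e]
        return e

    def union(e1, e2):
        parent[find(e1)] = find(e2)

    # Theta_0: glue c^{A_i} with c^{A_j} whenever both are defined
    for c in constants:
        defined = [(i, S.operations.get(c, {}).get(())) for i, S in enumerate(structures)]
        defined = [(i, v) for i, v in defined if v is not None]
        for tagged in defined[1:]:
            union(defined[0], tagged)

    conflicts = [(e1, e2) for e1, e2 in unequal if find(e1) == find(e2)]
    classes = {find(e) for e in elements}
    return classes, conflicts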
Example 6.

Fig. 22.2. Coproduct of partial lattices A, B, C

If each of the above homomorphisms is injective, then we call the coproduct the free sum of ℜ, and the free sum is an extension of ℜ. If there are no constants in the signature, then the disjoint sum of the family ℜ is a free sum of ℜ. The coproduct is a generalized extension of ℜ if it preserves all the a priori chosen inequalities. We also consider coproducts under further constraints in the form of a set of atomic formulas.

Look at Figure 22.2. Assume that a, b, c, d, e are constants in the signature of the language. In the coproduct of A, B, C the meets a ∧ b, a ∧ c, b ∧ c must be determined, because they have been determined in A. It follows from the construction of the coproduct that all of a ∨ b, a ∨ c, b ∨ c must be determined and must be equal. Notice that this coproduct is not an extension of the given family of partial lattices, but it is a generalized extension of this family when it is assumed that a ≠ b ≠ c ≠ a. And it is not a generalized extension of {A, B, C} if we assume that all the constants a, b, c, d, e are pairwise distinct. We see from this example that, depending on the initial conditions, the coproduct of the given family of partial structures may or may not be a generalized extension. If it is not a generalized extension, this means that, while generating the congruence in the construction of the coproduct, we have to identify constants which are assumed to be different. In this situation we say that the given family of partial structures is inconsistent with the undertaken assumptions. This inconsistency is closely related to logical inconsistency: the given family ℜ of partial structures is inconsistent with the undertaken assumptions Λ iff the infallible set of sentences for ℜ (defined in the next section) is inconsistent with Λ. It is important to know what to do when inconsistency appears. We consider the simplest approach: detect which condition causes problems and take a crisp decision to reject it. In the above example, by rejecting d ≠ e we obtain a consistent generalized extension. For applications it is worth assuming that the partial structures under consideration are finite, with finite sets of functional and relational symbols; in this situation we can check inconsistency in a finite, predictable time. Various methods of conflict resolution, dependent on the application problem, are possible. For example, facts that cause conflicts can be rejected, or one can use voting to eliminate facts causing conflicts. In
general, the process of eliminating some constraint can require some more advanced negotiations between agents.
22.3 Possible and infallible sets of sentences

In this section we present the logical part of our approach. Let L be a first order language of signature (F, Π, n). The set of all sentences of the language L is denoted by Sent(L). Assume that A is a given partial structure in the signature of L (a partial structure of L, for short).

Definition 7. A set of sentences E ⊆ Sent(L) is possible for A iff there is a total structure B ∈ T(A) such that B ⊨ E. The set of sentences P_A = ⋂{Th(B) : B ∈ T(A)}, where Th(B) denotes the set of sentences true in B, is called the infallible set of sentences for A. We also say that P_A is the theory of the partial model A.

Notice that a set of sentences is possible for a partial structure A if it is possible for a certain extension of A. The infallible set of sentences for a partial structure A is also the intersection of the infallible sets for all its extensions. Notice here that an extension as used in non-monotonic logics corresponds to a theory of a total structure, whereas the infallible set for a partial structure corresponds to the intersection of non-monotonic extensions. The properties of possible and infallible sets of sentences are described and proved in [10 - 13]. If ℜ is a family of partial structures, then we define possibility and infallibility for ℜ analogously, and if P_ℜ denotes the set of sentences infallible for ℜ, then we have the following:

1. P_ℜ = Cn(⋃{P_{A_i} : A_i ∈ ℜ}), where Cn denotes the classical operator of first order consequence.
2. P_ℜ is logically consistent iff T(ℜ) is nonempty.

Let A be a partial structure of a language L of signature (F, Π, n). We extend L to L_A by adding a set of constants C_A = {c_a : a ∈ A}. Now, we describe all the information about A in L_A. Let Σ_A be the union of the following sets:

Σ_F = {f(c_{a_1}, ..., c_{a_{n(f)}}) = c_a : f ∈ F, (a_1, ..., a_{n(f)}) ∈ dom f^A, f^A(a_1, ..., a_{n(f)}) = a},
Σ_Π = {r(c_{a_1}, ..., c_{a_{n(r)}}) : r ∈ Π, r^A(a_1, ..., a_{n(r)})},
Σ_{C_A} = {c_a ≠ c_b : a, b ∈ A, a ≠ b}.

Remark 3. When dealing with generalized extensions as in Remark 2, we do not assume that the homomorphism for an extension is injective. Then in place of Σ_A we may take a set Σ_F ∪ Σ_Π ∪ Σ_C, where Σ_C is any subset of Σ_{C_A}. All the results for extensions then easily transfer to the generalized ones.

Definition 8. Let A be a partial structure of a language L and let L_A be the language as above. We say that a partial structure A' is an expansion of A to L_A iff A' = A, f^{A'} = f^A and r^{A'} = r^A for every f ∈ F and r ∈ Π, and c_a^{A'} = a.
Proposition 3. For any partial structure A of L, P_{A'} = Cn(Σ_A) and P_A = P_{A'} ∩ Sent(L).

For a family ℜ = (A_i)_{i∈I} of partial structures of a given language L we can take a language L_ℜ extending L by the set of constants C_ℜ = {c_{a_i} : a_i ∈ A_i, i ∈ I}. Here the set Σ_ℜ = ⋃ Σ_{A_i} has properties analogous to those of Σ_A.
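As a small illustration (our own sketch, reusing the finite representation assumed earlier; not part of the paper), the diagram Σ_A = Σ_F ∪ Σ_Π ∪ Σ_{C_A} of a finite partial structure can be generated mechanically; the string encodings of the atomic sentences below are ours.

# Sketch (our own): generating Sigma_F, Sigma_Pi and Sigma_{C_A} for a finite partial structure.

def diagram(A):
    sigma_f = {f"{op}({', '.join(f'c_{x}' for x in args)}) = c_{val}"
               for op, table in A.operations.items()
               for args, val in table.items()}
    sigma_pi = {f"{rel}({', '.join(f'c_{x}' for x in args)})"
                for rel, tuples in A.relations.items()
                for args in tuples}
    sigma_c = {f"c_{x} != c_{y}" for x in A.universe for y in A.universe if x < y}
    return sigma_f, sigma_pi, sigma_c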
22.4 Partial structures and their extensions in multiagent systems

Let us consider a team of agents with a supervisor agent who collects information, deals with conflicts and distributes knowledge. Generally, the system that we discuss will be like a firm hierarchy. There is the chief director and n departments with their chiefs; the departments 1, ..., n are divided into n_i, i = 1, ..., n, sections with their chiefs, and so on. There are information channels from sections to departments and from departments to the director. Agents that are immediately higher are able to receive information from their subordinate (child) agents, fuse the information, resolve conflicts, and change the knowledge of their children. There are also information channels between departments, between sections, and so on; these channels can be used only for exchanging information. It may be assumed that such a frame for agent systems is obligatory. For simplicity we do not assume any frame except the supervisor agent. However, agents can create subsystems of child agents and supervise them; they can also exchange information if they need to. The relationships between agents follow from the existence of a homomorphism. We represent collected information as a partial structure of a language L with signature (F, Π, n). We assume that the environment of the problem one works on is described in a language of arbitrarily large signature. We can reach this world by observing finite fragments only. Hence the signature we use should be finite, but sufficient for the observed fragments, and whenever a new interrelation is discovered, the signature can be extended. At the beginning we perform the following steps:

• We give names to the observable objects, discover interrelations between objects, and decide which of them should be written as relations, partial operations or constants, and additionally which names describe different objects.
• Depending on the application problem, we decide which interrelations should be preserved while extending.
• We represent our knowledge by means of a finite partial structure A of a language L with signature (F, Π, n), together with information about the discernibility of some names. The names of observed objects are elements of the structure (either all of them or those that are important for some reason).
Having a partial structure A of a language L in signature (F, Π, n), we extend the language to L_A. Let A also denote the expansion of A to L_A. The discernibility of some names is written up as a subset Σ ⊆ Σ_A. We distribute the knowledge to n agents Ag_1, ..., Ag_n. The method of distribution depends on the problem we try to solve but, most generally, we select n weak substructures A_1, ..., A_n from A. Every agent Ag_i, i = 1, ..., n, is provided with knowledge represented by A_i, and additionally we provide him with a set of inequalities Σ_i ⊆ Σ. There are a lot of possibilities here: we can take relative or closed substructures, or substructures covering A (or not), or even n copies of A; and the inequalities from Σ can be distributed to the agents by various procedures. We describe the situation when the agents get Σ_i ⊆ Σ such that there exists a set of constants C_i ⊆ C_A with Σ_i = {c_a ≠ c_b : c_a, c_b ∈ C_i, a ≠ b}. It means that all the constants from C_i are pairwise unequal. As we will show below, this situation is easy to manage from the theoretical point of view. Thus, the knowledge of an agent Ag_i is represented by a partial structure A_i such that A_i = (C_i, (f^{A_i})_{f∈F}, (r^{A_i})_{r∈Π}) is a weak substructure of A, and additionally, for every c_a, c_b ∈ C_i with a ≠ b, it holds that c_a ≠ c_b. Notice that from the logical point of view Σ_i ⊆ Σ_{A_i} ⊆ Σ_A; hence ⋃ Σ_i ⊆ ⋃ Σ_{A_i} ⊆ Σ_A is a consistent set of sentences in L_A. The knowledge distribution process is based on the properties of a finite number of constants. In such an algebraic approach we can take advantage of homomorphisms, congruences, quotient structures, coproducts, and so on. We also have at our disposal a closely related logical approach, where we can use infallible sets of sentences, consistency, and so on. We are able to propose logical methods of knowledge distribution via the standard family of partial structures (see [13]), whereas the logic, not being effective, would be less useful in applications. We also consider multiagent systems where some further logical constraints (in the form of atomic formulas), controlling the extension process, are added. Hence, let every agent Ag_i possess a set Λ_i of atomic formulas.

Example 7. Let our information be written as a relational system A = (U, R) corresponding to an information system A = (U, A). Let X ⊆ U be a set of objects. Take the language L_U; thus every object is now a constant. We discern every constant of the lower approximation A(X) from every constant of the complement, while no assumption is made for objects in the boundary region of the concept. Now we distribute A to n agents Ag_1, ..., Ag_n, giving to every Ag_i, i = 1, ..., n, a weak substructure (subtable) A_i = (U_i, R_{A_i}). This distribution depends on an expert decision; it may be a covering of A or only a covering of a chosen (training) set of objects. Moreover, every agent Ag_i has at his disposal the sets of objects X_i = X ∩ U_i and U − X_i, makes his own approximations of these sets and discerns constants. Additionally, every agent Ag_i gets a set of descriptors (or templates) which is either derived from his own information or obtained from the system. Hence, every agent approximates a part of the concept described by X; he can get new objects, new attributes, new attribute values and a new set of descriptors. Now one can consider the fusion (the coproduct) of the information obtained from the agents.
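A toy sketch of the distribution step described above follows (our own; the splitting policy, names and the data layout are assumptions, not the paper's method): each agent receives a subtable as its weak substructure together with the pairwise inequalities of the object names it sees.

# Toy sketch (our own) of distributing an information system to n agents.
# table: {(object, attribute): value}. Each agent gets a subtable plus Sigma_i.

def distribute(objects, table, n_agents):
    agents = []
    chunk = max(1, len(objects) // n_agents)
    for i in range(n_agents):
        universe = objects[i * chunk:(i + 1) * chunk] or objects[-chunk:]
        subtable = {(x, a): v for (x, a), v in table.items() if x in universe}
        # Sigma_i: the agent keeps every object name in its chunk pairwise distinct
        inequalities = {(x, y) for x in universe for y in universe if x < y}
        agents.append({"universe": universe, "subtable": subtable,
                       "inequalities": inequalities})
    return agents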
22.4.1 Agents acting

Now we are at the starting point, i.e., at a moment t = 0. We have the fixed language L_A. The local state of every agent Ag_i is represented by a partial structure A_i equipped with a set of inequalities of constants from C_i and also with a set of atomic formulas Λ_i. We make the following assumptions.

• Every agent knows the signature of the language. One can also consider the situation where every agent knows all the sets C_i.
• Every agent is able to exchange information with others. He then fuses the information and resolves his internal conflicts using the construction of the coproduct.
• Every agent can build his own system of child agents in the same way as the whole system.
• For every moment of time there is a supervising agent, with his own constraints (in the form of atomic formulas), who collects all the information and deals with inconsistency using the coproduct.
Now knowledge is distributed and we are going to show how agents act. Every agent possesses his knowledge and is able to collect new information independently. New information can be obtained by the agent through his own exploration of the world, through the actions of his child agents, or by exchanging information with other agents. The new information takes the form of new objects, new operation and relation symbols, new determinations of constants, extensions of the domains of some relations and operations, and new constraints that can be derived or added. Assume that at some time our system is in a state s. It means that the agents have collected information and written it as partial structures A_1^s, ..., A_n^s, which are consistent generalized extensions of A_1, ..., A_n, respectively. The main tool for fusion is the notion of coproduct, which plays the supervising role in the system. Additionally, a set Λ_s of atomic formulas is given. We construct a coproduct S of the family of partial structures A, A_1^s, ..., A_n^s. If S is a generalized extension of A_1, ..., A_n and Λ_s holds in S, then the system is consistent and knowledge can be redistributed; otherwise we should resolve conflicts. Now S plays the role of A, i.e., we take a new language L_S and the expansion of S to this language. The most general way of knowledge redistribution is repetition of the process. It is often necessary to redistribute the knowledge in accordance, for example, either with the initial information (i.e., preserving A_i for every agent Ag_i) or with the actually sent information. Notice that it is not necessary to stop the system for synthesis; agents can work during this process. In this situation every agent should synthesize his actual results with the redistributed knowledge (i.e., construct a coproduct of the two) and resolve the eventual inconsistency on his own.

22.4.2 Dealing with inconsistency

From the logical point of view, inconsistency may appear when the set of sentences ⋃ P_{A_i^s} is either not possible for the family A_1, ..., A_n, or is possible for this family
but is inconsistent with Λ_s, that is, it is logically inconsistent. Algebraic inconsistency occurs when the coproduct is inconsistent with the given constraints. There are two kinds of inconsistency: (i) the first one is "internal", when an agent, say Ag_i, needs to identify constants that were different in A_i as a consequence of extending his knowledge; (ii) the second is "external", when the knowledge of every agent is internally consistent, but there are conflicts in the whole system. Notice that a decision to remove some determinations of constants and operations is not irreversible, since the agent can resend the same information. If this happens "often", then we have a signal that something is wrong and we have to correct our initial knowledge. We remove inconsistency while exchanging information. If agent Ag_j sends some information to Ag_i, then Ag_i should resolve conflicts using a coproduct as above. The need to exchange information can be recognized by Ag_i when he receives a constant and knows that Ag_j has some information about this constant. One can use the following schema for dealing with inconsistency. The first identification of constants in the process of congruence generation in the construction of the coproduct is a signal that inconsistency may appear. From the course of this process the supervisor detects the cause of the inconsistency and sends to the detected agents orders to remove the given determinations. We do not assume that the process should stop for this control. If agents work during this time, then they fuse their actual knowledge with the redistributed one and resolve conflicts under the assumption that the knowledge from the supervisor is more important. The proposed system may work permanently, stopping after a human decision or when some conditions are satisfied, e.g., time conditions or restrictions on the system size.
22.5 Conclusions

We have presented a way of modelling multiagent systems via partial algebraic methods. We propose the coproduct operator to fuse knowledge and resolve conflicts under some logical-algebraic constraints. Further constraints may also be considered, for example on the number of agents or on the system size.
References

1. Bartol, W., 'Introduction to the Theory of Partial Algebras', Lectures on Algebras, Equations and Partiality, Univ. of Balearic Islands, Technical Report B-006 (1992), pp. 36-71.
2. Burmeister, P., 'A Model Theoretic Oriented Approach to Partial Algebras', Mathematical Research 32, Akademie-Verlag, Berlin (1986).
3. Burris, S., Sankappanavar, H.P., 'A Course in Universal Algebra', Springer-Verlag, Berlin (1981).
4. Fagin, R., Halpern, J., Moses, Y., Vardi, M.Y., 'Reasoning About Knowledge', MIT Press, Cambridge MA (1995).
5. Gabbay, D.M., Hogger, C.J., Robinson, J.A., 'Handbook of Logic in Artificial Intelligence and Logic Programming 3: Nonmonotonic Reasoning and Uncertain Reasoning', Oxford University Press, Oxford (1994).
6. d'Inverno, M., Luck, M., 'Understanding Agent Systems', Springer-Verlag, Heidelberg (2004).
7. Pal, S.K., Polkowski, L., Skowron, A. (Eds.), 'Rough-Neural Computing: Techniques for Computing with Words', Springer-Verlag, Berlin (2004).
8. Pawlak, Z., 'Rough Sets. Theoretical Aspects of Reasoning about Data', Kluwer Academic Publishers, Dordrecht (1991).
9. Shoenfield, J.R., 'Mathematical Logic', Addison-Wesley Publishing Company, New York (1967).
10. Staruch, B., 'Derivation from Partial Knowledge in Partial Models', Bulletin of the Section of Logic 32 (2002), pp. 75-84.
11. Staruch, B., Staruch, B., 'Possible sets of equations', Bulletin of the Section of Logic 32 (2002), pp. 85-95.
12. Staruch, B., Staruch, B., 'Partial Algebras in Logic', submitted to Logika, Acta Universitatis Wratislaviensis (2002).
13. Staruch, B., Staruch, B., 'First order theories for partial models', accepted for publication in Studia Logica (2003).
23 Tolerance Information Granules

Jaroslaw Stepaniuk

Department of Computer Science, Bialystok University of Technology, Wiejska 45a, 15-351 Bialystok, Poland
[email protected]

Summary. In this paper we discuss tolerance information granule systems. We present examples of information granules and we consider two kinds of basic relations between them, namely inclusion and closeness. The relations between more complex information granules can be defined by extending the relations defined on the parts of those granules. In many application areas related to knowledge discovery in databases there is a need for algorithmic methods that make it possible to discover relevant information granules. Examples of SQL implementations of the discussed algorithms are included.
23.1 Introduction

Recent years have shown a rapid growth of interest in granular computing. Information granules are collections of entities that are arranged together due to their similarity, functional adjacency or indiscernibility [13], [14]. The process of forming information granules is referred to as information granulation. Granular computing, as opposed to numeric computing, is knowledge-oriented. Knowledge-based processing is a cornerstone of knowledge discovery and data mining [3]. A way of constructing information granules and describing them is a common problem no matter which path (fuzzy sets, rough sets, ...) we follow. In this paper we follow the rough set approach [7] to constructing information granules. Different kinds of information granules will be discussed in the following sections of this paper. The paper is organized as follows. In Section 23.2 we recall selected notions of the tolerance rough set model. In Section 23.3 we discuss information granule systems. In Section 23.4 we present examples of information granules. In Section 23.5 we discuss searching for optimal tolerance granules.
23.2 Selected Notions of Tolerance Rough Sets

In this section we recall selected notions of the tolerance rough set model [8], [9], [11], [12].
We recall the general definition of an approximation space [9], [11], [12], which can be used, for example, for introducing the tolerance-based rough set model and the variable precision rough set model. For every non-empty set U, let P(U) denote the set of all subsets of U.

Definition 1. A parameterized approximation space is a system AS_{#,$} = (U, I_#, ν_$), where

• U is a non-empty set of objects,
• I_# : U → P(U) is a granulation function,
• ν_$ : P(U) × P(U) → [0, 1] is a rough inclusion function.

The granulation function defines for every object x a set of similarly described objects. A constructive definition of the granulation function can be based on the assumption that some metrics (distances) are given on attribute values. For example, if for some attribute a ∈ A a metric δ_a : V_a × V_a → [0, ∞) is given, where V_a is the set of all values of attribute a, then one can define the following granulation function:

y ∈ I_#(x) if and only if δ_a(a(x), a(y)) ≤ f_a(a(x), a(y)),

where f_a : V_a × V_a → [0, ∞) is a given threshold function. A set X ⊆ U is definable in AS_{#,$} if and only if it is a union of some values of the granulation function. The rough inclusion function defines the degree of inclusion between two subsets of U [9]. This measure is widely used by the data mining and rough set communities; however, Jan Lukasiewicz [5] was the first who used this idea to estimate the probability of implications. The lower and the upper approximations of subsets of U are defined as follows.

Definition 2. For an approximation space AS_{#,$} = (U, I_#, ν_$) and any subset X ⊆ U, the lower and the upper approximations are defined by

LOW(AS_{#,$}, X) = {x ∈ U : ν_$(I_#(x), X) = 1},
UPP(AS_{#,$}, X) = {x ∈ U : ν_$(I_#(x), X) > 0},

respectively. Approximations of concepts (sets) are constructed on the basis of background knowledge. Obviously, concepts are also related to objects unseen so far. Hence it is very useful to define parameterized approximations, with parameters tuned in the process of searching for approximations of concepts. This idea is crucial for the construction of concept approximations using rough set methods. In our notation, # and $ denote vectors of parameters which can be tuned in the process of concept approximation. Approximation spaces are illustrated in Figure 23.1.
Fig. 23.1. Approximation Spaces with Two Vectors #_1 and #_2 of Parameters
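A small sketch of Definitions 1 and 2 for a finite universe follows (our own; the rough inclusion used here is the standard one, ν(X, Y) = |X ∩ Y| / |X| with ν(∅, Y) = 1, which is only one possible instantiation of ν_$, and only numeric attributes with constant thresholds are assumed).

# Sketch (our own) of a tolerance-based granulation function and the LOW/UPP approximations.

def granule(x, objects, values, epsilon):
    """I_#(x): objects whose value differs from x's by at most epsilon[a] on every attribute a."""
    return {y for y in objects
            if all(abs(values[(x, a)] - values[(y, a)]) <= eps for a, eps in epsilon.items())}

def inclusion(X, Y):
    return 1.0 if not X else len(X & Y) / len(X)

def low_upp(objects, values, epsilon, X):
    low = {x for x in objects if inclusion(granule(x, objects, values, epsilon), X) == 1.0}
    upp = {x for x in objects if inclusion(granule(x, objects, values, epsilon), X) > 0.0}
    return low, upp

# Tiny example with one numeric attribute 'a' and threshold 1.0.
objects = [1, 2, 3, 4]
values = {(1, "a"): 0.0, (2, "a"): 0.5, (3, "a"): 3.0, (4, "a"): 3.2}
print(low_upp(objects, values, {"a": 1.0}, X={1, 2}))   # ({1, 2}, {1, 2})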
Rough sets can approximately describe sets of patients, events, outcomes, keywords, etc. that may otherwise be difficult to circumscribe. We recall the notion of the positive region of a classification in the case of generalized approximation spaces [12].

Definition 3. Let AS_{#,$} = (U, I_#, ν_$) be an approximation space and let, for a natural number r > 1, a set {X_1, ..., X_r} be a classification of objects (i.e. X_1, ..., X_r ⊆ U, ⋃_{i=1}^r X_i = U and X_i ∩ X_j = ∅ for i ≠ j, where i, j = 1, ..., r). The positive region of the classification {X_1, ..., X_r} with respect to the approximation space AS_{#,$} is defined by POS(AS_{#,$}, {X_1, ..., X_r}) = ⋃_{i=1}^r LOW(AS_{#,$}, X_i).

Let DT = (U, A ∪ {d}) be a decision table [7], where U is a set of objects, A is a set of condition attributes and d is a decision. For every condition attribute a ∈ A a distance function δ_a : V_a × V_a → [0, ∞) is known, defined as follows: for numeric attributes

δ_a(a(x_i), a(x_j)) = |a(x_i) − a(x_j)|,

and for symbolic attributes

δ_a(a(x_i), a(x_j)) = 0 if a(x_i) = a(x_j), and 1 otherwise.
For every attribute a ∈ A we determine a tolerance threshold ε_a. If we know an attribute a ∈ A and ε_a, the tolerance relation T_a(ε_a) (a reflexive and symmetric relation) is defined as follows:

for all x_i, x_j ∈ U: (x_i, x_j) ∈ T_a(ε_a) if and only if δ_a(a(x_i), a(x_j)) ≤ ε_a.

If we have a set of attributes B ⊆ A and thresholds ε_{a_i}, where a_i ∈ B, the tolerance relation T_B(ε_{a_1}, ε_{a_2}, ..., ε_{a_n}) is defined as follows:

(x_i, x_j) ∈ T_B(ε_{a_1}, ..., ε_{a_n}) if and only if δ_a(a(x_i), a(x_j)) ≤ ε_a for every a ∈ B.

Let DT = (U, A ∪ {d}) be a decision table. The discernibility matrix of the table DT is a square n × n matrix, where n is the number of objects in the set U. The discernibility matrix M(x_i, x_j) is defined as follows:

M(x_i, x_j) = {a ∈ A : δ_a(a(x_i), a(x_j)) > ε_a} if d(x_i) ≠ d(x_j), and M(x_i, x_j) = ∅ if d(x_i) = d(x_j).

Let Q be a quality function which is defined, for example, by

Q(ε_{a_1}, ε_{a_2}, ..., ε_{a_n}) = (NRdRIA/NRd) · w + (NRdRIA/NRIA) · (1 − w),

where:

• NRd is the number of pairs of objects which have the same decision attribute value;
• NRIA is the number of pairs of objects which are in the relation T_A(ε_{a_1}, ε_{a_2}, ..., ε_{a_n});
• NRdRIA is the number of pairs of objects which have the same decision attribute value and are in the relation T_A(ε_{a_1}, ε_{a_2}, ..., ε_{a_n});
• w is a weight, taking values in the interval [0, 1].

If we want to know whether our vector of tolerance thresholds or our decision rules are optimal, we check the values returned by the quality function. The optimal vector or rule is the one with the highest value of the quality function.
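The following sketch (our own; it assumes numeric condition attributes, so the 0/1 distance for symbolic attributes is not shown) computes Q for one candidate threshold vector over the unordered pairs of distinct objects, mirroring the counts NRd, NRIA and NRdRIA defined above.

# Sketch (our own) of the quality function Q for a candidate threshold vector.
from itertools import combinations

def quality(objects, cond_values, decision, epsilon, w=0.5):
    """cond_values: {(object, attribute): value}; epsilon: {attribute: threshold}."""
    nrd = nria = nrdria = 0
    for x, y in combinations(objects, 2):
        same_d = decision[x] == decision[y]
        in_ta = all(abs(cond_values[(x, a)] - cond_values[(y, a)]) <= eps
                    for a, eps in epsilon.items())
        nrd += same_d
        nria += in_ta
        nrdria += same_d and in_ta
    if nrd == 0 or nria == 0:          # guard against degenerate data
        return 0.0
    return (nrdria / nrd) * w + (nrdria / nria) * (1 - w)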
23.3 Information Granule Systems

In this section, we present a basic notion of our approach, i.e., the information granule system. Any such system S consists of a set of granules G. Moreover, a family of relations with the intended meaning "to be a part to a degree" between information granules is distinguished. The degree structure is described by a relation "to be an exact part". More formally, an information granule system is any tuple

S = (G, H, <, {ν_p}_{p∈H}),    (23.1)

where
1. G is a finite set of granules;
2. H is a finite set of granule inclusion degrees with a binary relation < which defines on H a structure used to compare the degrees;
3. ν_p ⊆ G × G is a binary relation "to be a part to a degree at least p" between information granules from G, called a rough inclusion.

One can consider the following examples of elementary granules:

1. a set of descriptors of the form (a, v), where a ∈ A and v ∈ V_a for some finite attribute set A and value sets V_a;
2. a set of descriptor conjunctions.

In the standard rough set model, information granules correspond to indiscernibility classes of an equivalence relation. Let, for example, U be a set of cars (see Figure 23.2) and let us consider two attributes, colour and type of the car's body. Let V_colour = {white, yellow, black, green} and V_type = {van, sedan, station wagon}. In this case we obtain twelve information granules corresponding to conjunctions of descriptors, e.g. (colour, white) ∧ (type, van), (colour, yellow) ∧ (type, van), ...
Fig. 23.2. Granules in the Standard Rough Set Model
For a set X of cars, the lower and the upper approximations are also depicted in Figure 23.2. Examples of complex granules are tolerance granules created by means of a similarity (tolerance) relation between elementary granules, decision rules, or sets of decision rules.

23.3.1 Syntax and Semantics of Information Granules

Usually, together with an approximation space, there is also specified a set of formulas Φ expressing properties of objects. Hence, we assume that together with the approximation space AS_{#,$} there are given

• a set of formulas Φ over some language,
• semantics Sem of formulas from Φ, i.e., a function from Φ into the power set P(U).

Let us consider an example [7]. We define a language L_IS used for elementary granule description, where IS = (U, A) is an information system. The syntax of L_IS is defined recursively by

1. (a in V) ∈ L_IS, for any a ∈ A and V ⊆ V_a.
2. If α ∈ L_IS then ¬α ∈ L_IS.
3. If α, β ∈ L_IS then α ∧ β ∈ L_IS.
4. If α, β ∈ L_IS then α ∨ β ∈ L_IS.

The semantics of formulas from L_IS with respect to an information system IS is defined recursively by

1. Sem_IS(a in V) = {x ∈ U : a(x) ∈ V}.
2. Sem_IS(¬α) = U − Sem_IS(α).
3. Sem_IS(α ∧ β) = Sem_IS(α) ∩ Sem_IS(β).
4. Sem_IS(α ∨ β) = Sem_IS(α) ∪ Sem_IS(β).
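A direct transcription of this recursive semantics is easy to implement; the sketch below is our own (the tuple encoding of formulas is an assumption, not the paper's notation).

# Sketch (our own) of the recursive semantics Sem_IS for the language L_IS above.
# Formulas are nested tuples: ("in", a, V), ("not", f), ("and", f, g), ("or", f, g).

def sem(formula, objects, values):
    """objects: iterable of object ids; values: {(object, attribute): value}."""
    kind = formula[0]
    if kind == "in":                      # (a in V)
        _, a, V = formula
        return {x for x in objects if values[(x, a)] in V}
    if kind == "not":
        return set(objects) - sem(formula[1], objects, values)
    if kind == "and":
        return sem(formula[1], objects, values) & sem(formula[2], objects, values)
    if kind == "or":
        return sem(formula[1], objects, values) | sem(formula[2], objects, values)
    raise ValueError(f"unknown connective: {kind}")

# e.g. Sem_IS((colour in {white}) AND (type in {van}))
cars = [1, 2, 3]
vals = {(1, "colour"): "white", (1, "type"): "van",
        (2, "colour"): "white", (2, "type"): "sedan",
        (3, "colour"): "black", (3, "type"): "van"}
print(sem(("and", ("in", "colour", {"white"}), ("in", "type", {"van"})), cars, vals))  # {1}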
A typical method used by the rough set approach [7] for a constructive definition of the uncertainty function is the following: for any object x ∈ U, there is given information Inf_A(x) (the information signature of x in A), which can be interpreted as a conjunction EF_B(x) of selectors a = a(x) for a ∈ A, and the set I_#(x) is equal to

Sem_IS(EF_B(x)) = Sem_IS(⋀_{a∈A} a = a(x)).

One can consider a more general case, taking as possible values of I_#(x) any set ||α||_IS containing x; next, from the family of such sets the resulting neighborhood I_#(x) can be selected. One can also use another approach by considering more general approximation spaces in which I_#(x) is a family of subsets of U. We now present the syntax and the semantics of examples of information granules. These granules are constructed by taking collections of already specified granules. They are parameterized by parameters which can be tuned in applications. In
the following sections we discuss some other kinds of operations on granules as well as the inclusion and closeness relations for such granules. Let us note that any information granule g formally can be defined by a pair {Syn{g), Sem{g)) consisting of the granules syntax Syn{g) and semantics Sem{g). However, for simplicity of notation we often use only one component of the information granules to denote it.
23.4 Examples of Information Granules

Elementary granules. In an information system IS = (U, A), elementary granules are defined by EF_B(x), where EF_B is a conjunction of selectors (descriptors) of the form a = a(x), B ⊆ A and x ∈ U. For example, the meaning of an elementary granule a = 1 ∧ b = 1 is defined by Sem_IS(a = 1 ∧ b = 1) = {x ∈ U : a(x) = 1 and b(x) = 1}. Thus, in the system

S_B = (G_B, H, <, {ν_p}_{p∈H})    (23.2)

of elementary granules, G_B is a set of conjunctions of selectors, H = [0, 1], and ν_p(EF_B, EF'_B) if and only if

card(Sem_IS(EF_B) ∩ Sem_IS(EF'_B)) / card(Sem_IS(EF_B)) ≥ p.

The number of conjuncts in the granule can be taken as one of the parameters to be tuned, which is well known as the dropping condition technique in machine learning. One can extend the set of elementary granules by assuming that if α is any Boolean combination of descriptors over A, then (Bα) and (B̄α) define the syntax of elementary granules too, for any B ⊆ A.

Sequences of granules. Let us assume that S is a sequence of granules and the semantics Sem_IS(·) in IS of its elements has been defined. We extend Sem_IS(·) to S by
(•)
312
Jaroslaw Stepaniuk
in IS for granules from G have been defined. We extend Semis (•) on the family of sets H C Gby Semis {H) = {Semis id) ' 9 ^ H}, One can consider as a parameter of any such granule its cardinality or its size (e.g., the length of such granule representation). In the first case, a typical problem is to search in a given family of granules for a granule of the smallest cardinality sufficiently close to a given one. Example 2. One can consider granules defined by sets of rules. Assume that there is a set of rules RuleSet = {(a^, /3i) : i = 1,... ,k} . The semantics of Rule^Set is defined by Sem^is (Rule-Set) = {Sem,is ((a^, /3i)) : i = 1 , . . . ,fc}. The above mentioned searching problem for a set of granules corresponds in the case of rule sets to searching for the simplest representation of a given rule collection by another set of rules (or a single rule) sufficiently close to the collection. Example 3. Let us consider a set G of elementary information granules - describing possible situations together - with decision table DT^ representing decision tables for any situation a e G. Assume Rule.Set{DTa) to be a set of decision rules generated from decision table DTa (e.g., in the minimal form). Now let us consider a new granule {{a,Rule.Set{DTa)) : a e G} with semantics defined by {SemoT {{a,Rule.Set{DTa))) : a e G} = {{Semis {(^), SemoT {Rule.Set {DTJ))) \aeG}. An example of a parameter to be tuned is the number of situations represented in such granule. A typical task is to search for a granule with the minimal number of situations creating together with the sets of rules, corresponding to them, a granule sufficiently close to the original one. Extension of granules defined by tolerance relation. Now we present examples of granules obtained by application of a tolerance relation (i.e., reflexive and symmetric relation; for more information see, e.g., [9]). Example 4. One can consider extension of elementary granules defined by tolerance relation. Let IS = (f/, A) be an information system and let r be a tolerance relation on elementary granules of IS. Any pair (r : a) is called a r-elementary granule. The semantics Sem,is ((r : a)) of (r : a) is the family {Semis {P) : (/?, a) G r } . Parameters to be tuned in searching for relevant tolerance granule can be its support (represented by the number of supporting it objects) and its degree of its inclusion (or closeness) in some other granules as well as parameters specifying the tolerance relation. Example 5. Let us consider granules defined by rules of tolerance information systems [9]. Let IS = {U^ A) be an information system and let r be a tolerance relation on elementary granules of IS. If if a then /3 is a rule in IS then the semantics of a new information granule (r : a,/?) is defined by Semis ((r : a,/?)) =
23 Tolerance Information Granules
313
Semis {{OL^T)) X Semis ( ( / 3 , T ) ) . Parameters to be tuned are the same as in the case of granules being sets of more elementary granules as well as parameters of the tolerance relation. Example 6. We consider granules defined by sets of decision rules corresponding to a given evidence in tolerance decision tables. Let DT = {U,A,d)bea. decision table and let r be a tolerance on elementary granules of IS = (U^A). Now, any granule (a, Rule.Set {DTa)) can be considered as a representative for the information granule cluster (r : {a,Rule.Set{DTa))) with the semantics SemoT i(r : (a, Rule.Set (DTa)))) = {SemDT ((/^, Rule.Set {DT13))) : (^, a)eT}, One can see that the considered case is a special case of information granules from Example 3 with G defined by tolerance relation.
23.5 Searching for Optimal Tolerance Granules In this section we discuss searching for tolerance granules. One of the problems while generating decision rules is determination of optimal tolerance thresholds. Quality of generated rules depends on right choice of tolerance threshold vector. To find optimal tolerance thresholds we have to count difference between objects in decision table. For DT = {U,A\J {d}) and 5a for every attribute a G ^ we can build new decision table DT' = ([/', A' U {D}) where: U' = A' = {a' : U' ^R^
{{xuXj)eUxU:{i<j)} : a\{xi,Xj))
=
5a{a{xi),a{xj))}
Next we can search tolerance threshold vector for appropriate attributes, which describe objects. The easiest way is to consider all possible combinations of vectors what "check all" method presents. For all combinations estimation of quality function is calculated. The best thresholds vector is one with the higher value of estimated quality function. This problem is computationally complex. Experiments showed that pairs of objects with different decision attribute value do not improve quality of tolerance thresholds. It is enough to consider only pairs of objects with the same decision attribute. If we want to use this method for many objects we have to divide table DT' to parts. Second method is heuristic method step forward. Threshold values are counted for every attribute separately. Let A; > 0 be a given natural number. This method consists of steps:
314
Jaroslaw Stepaniuk
1. choose k best thresholds values for first attribute; 2. choose k best thresholds values for next attribute, from thresholds already chosen and actually counted; 3. repeat second step for all condition attributes; 4. choose k best thresholds values for first attribute considering all other threshold values. In our further presentations we will use the following data table: DT {ID, num, sym, d) where the first attribute is a unique identifier for an object and there is numeric attribute num and symbolic attribute sym and decision attribute d. Query which creates DT' table is as follows: CREATE TABLE DT' (num' INTEGER, sym' INTEGER, D INTEGER); Query which inserts data into DT' table: INSERT INTO DT' SELECT ABS(DTO.num-DTl.num) AS num', IIF(DTO.sym=DTl.sym,0,l) AS sym', IIF(DTO.d=DTl.d,0,l) AS D FROM DT AS DTO, DT AS DTI WHERE DTO.ID < DTI.ID; An idea of query which calculates NRd is as follows: SELECT COUNT (*) AS NRd FROM DT' WHERE (D=0); A sketch of SQL query for step forward method is presented below: SELECT DISTINCT
(((NRdRIA/NRd)*w)+((NRdRIA/NRIA)*(l-w))) AS q, (SELECT COUNT (*) FROM DT' WHERE (D=0 AND num' < A.num' AND sym'=B.sym')) AS NRdRIA, (SELECT COUNT (*) FROM DT' WHERE (num'
23 Tolerance Information Granules
315
FROM DT' WHERE (num' < A.num' AND sym'=B.sym')) <> 0 AND A.D=0 AND B.D=0 AND ((A.niiin'= THRES.O.l) OR ... OR (A.num'= THRES_0_k)); For more detailed presentation of SQL queries in searching for tolerance thresholds and generation of tolerance decision rules see [1].
Conclusions Syntax and semantics of information granules is discussed. An approach to tolerance information granules is presented. The examples are illustrated using SQL language. The approach seems to be promising and will be further explored.
Acknowledgements The research has been supported by the grant 4T11C014 25 from Ministry of Scientific Research and Information Technology of the Republic of Poland.
References 1. Dakowicz M., Stepaniuk J.: Tolerance Rough Sets and Data Base Management Systems, Proceedings of Concurrency, Specification and Programming Workshop, Czama, Poland, September 25-27,2003, 108-119. 2. Garcia-Molina H., Ullman J., Widom J.: Database Systems: The Complete Book, Prentice Hall, 2002. 3. Kloesgen W., Zytkow J. (Eds.): Handbook of Knowledge Discovery and Data Mining, Oxford University Press, Oxford, 2002. 4. Krawiec K., Slowinski R., Vanderpooten D.: Learning of Decision Rules from Similarity Based Rough Approximations. In: Skowron A., Polkowski L.(Eds.) Rough Sets in Knowledge Discovery. Physica Verlag, Heidelberg, 1998, 37-54. 5. Lukasiewicz J.: Die logischen grundlagen der wahrscheinilchkeitsrechnung, Krakow 1913. In Borkowski L., ed.: Jan Lukasiewicz - Selected Works. North Holland Publishing Company, Amstardam, London, Polish Scientific Publishers, Warsaw, 1970. 6. Pal S.K., Polkowski L., Skowron A. (Eds.): Rough-Neural Computing: Techniques for Computing with Words. Springer-Verlag, Berlin, 2004. 7. Pawlak Z.: Rough Sets. Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, Dordrecht, 1991.
316
Jaroslaw Stepaniuk
8. Skowron A., Stepaniuk J.: Generalized Approximation Spaces, Proceedings of the Third International Workshop on Rough Sets and Soft Computing, November 10-12,1994, San Jose, California, USA, 18-21. 9. Skowron A., Stepaniuk J.: Tolerance Approximation Spaces, Fundamenta Informaticae, vol. 27 (2,3), 1996,245-253. 10. SQL standards: http://www.jcc.com/SQLPages/jccs_sql.htm . 11. Stepaniuk J.: Optimizations of Rough Set Model, Fundamenta Informaticae vol. 36(2-3), 1998, 265-283. 12. Stepaniuk J.: Knowledge Discovery by Application of Rough Set Models, L. Polkowski, S. Tsumoto, T.Y. Lin (Eds.), Rough Set Methods and Applications. New Developments in Knowledge Discovery in Information Systems, Physica-Verlag, Heidelberg, 2000, 137233. 13. Zadeh L.A.: Toward a theory of fuzzy information granulation and its certainty in human reasoning and fuzzy logic. Fuzzy Sets and Systems 90, 1997, 111-127. 14. Zadeh L.A.: A new direction in AI: Toward a computational theory of perceptions. AI Magazine 22(1), 2001, 73-84.
24
Attribute Reduction Based on Equivalence Relation Defined on Attribute Set and Its Power Set

Ling Wei (1, 2) and Wenxiu Zhang (1)

(1) Institute for Information and System Science, Faculty of Science, Xi'an Jiaotong University, Xi'an, People's Republic of China ([email protected])
(2) Department of Mathematics, Northwest University, Xi'an, People's Republic of China (qjjwv@nwu.edu.cn)

Summary. Knowledge discovery in information systems, essentially, is to classify the objects according to attributes and to study the relations among those classes. Attribute reduction, which is to find a minimum attribute set that keeps the classification ability, is one of the most important problems in knowledge discovery in information systems. The general method for studying attribute reduction in information systems is rough set theory, whose theoretical basis is the equivalence relation created on the universe. Novotny, M. (1998) [17] proposed a new idea: to study attribute reduction by creating an equivalence relation on the attribute set. In this paper, we develop this idea to study attribute reduction through creating equivalence relations on the attribute set and its power set. The paper begins with the basic theory of information systems, including definitions of information systems and the equivalence relation R_B on the universe. Furthermore, two equivalence relations r and R are defined on the attribute set and its power set, respectively. In the next section, two closed operators, C(R) and C(r), are created. Using these two operators, we get two corresponding closed set families, C_R and C_r, defined as C_R = {B : C(R)(B) = B} and C_r = {B : C(r)(B) = B}. Further, we study properties of these two closed set families and prove that C_R is a subset of C_r. One of the most important results is the necessary and sufficient condition for C_r = C_R. This equivalence condition is described by elements of the partition of the attribute set. Finally, based on this equivalence proposition, we find an easy method to acquire an attribute reduction when C_r = C_R. This method is easy to understand and use.

Key words: information system, equivalence relation, closed operator, closed set
24.1 Introduction

(*This work was supported by the 973 Program of China, No. 2002CB312200.)

Knowledge discovery in databases is an intelligent method of discovering unknown or unexplored relationships within a large database. It is defined as the nontrivial process
of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [1]. Many different methods and tools are used in knowledge discovery in databases, such as Neural Network (NN) algorithms based on network structure, Genetic Algorithms (GA), Rough Sets (RS), Fuzzy Sets (FS) [2, 3, 4, 5, 6] and so on. Today, knowledge discovery is applied in a wide spectrum of fields: finance, banking, retail sales, manufacturing, monitoring and diagnosis, health care, marketing, and scientific data acquisition, among others [7, 8, 9, 10]. Knowledge discovery in information systems, essentially, is to classify the objects according to attributes and to study the relations among those classes. So knowledge discovery in information systems is the discovery of concepts and rules. Research in this topic focuses on attribute reduction. We all know that not all of the attributes in an information system have the same importance. Some attributes are absolutely unnecessary, so deleting them would hardly affect the classification ability; some are absolutely necessary, and deleting them would surely lead to a decrease in classification ability; and some attributes are relatively necessary, and keeping such an attribute together with others can improve the classification ability. Attribute reduction is to find the minimum attribute set that keeps the classification ability [11, 12]. The theory of rough sets is an important method in the knowledge discovery field. It has been introduced by Zdzislaw Pawlak to deal with imprecise or vague concepts [2]. In recent years, we witnessed a rapid growth of interest in rough set theory and its applications [3, 13, 14, 15, 16]. Rough set theory studies information systems through the equivalence relation created on the universe (i.e. the object set). This idea inspired our initiative of studying information systems by utilizing an equivalence relation created on the attribute set. Miroslav Novotny (1998) proposed this idea [17] and studied information systems using this method. In this paper, we study not only the information systems, but also the relations based on the attribute set. The paper begins with basic concepts about information systems, and then we define two equivalence relations, on the attribute set and on its power set, separately. Then we create two closed set families C_r, C_R, and examine their relationship and properties. Finally, we find the necessary and sufficient condition for C_r = C_R.
24.2 Equivalence Relation Based on Attribute Set and Its Power Set

Definition 1. An information system is defined as a triple IS = (U, A, F), where U = {x_1, ..., x_n} is a finite set of objects, x_i (i ≤ n) being an object; A = {a_1, ..., a_m} is a finite set of attributes, a_j (j ≤ m) being an attribute; and F = {f_j : j ≤ m} is a set of relationships between U and A, with f_j : U → V_j (j ≤ m), where V_j is the value set of attribute a_j.

An information system is called a decision system when the attributes in A are composed of a condition attribute set C and a decision attribute set D, i.e. A = C ∪ D, C ∩ D = ∅; it is denoted by DS = (U, C, D, F). From Definition 1 we see that an information system corresponds to a relational database table, and vice versa. That is to say, an information system is the abstract
description of a relational database table. In general, we say an information system (U, A, F) is a pure attribute system if each of its attributes can classify the object set into at least two classes. Such an information system does not have a one-valued attribute; namely, for any a ∈ A, |V_a| ≥ 2 holds. In fact, a one-valued attribute is ineffective for knowledge discovery in information systems, since it has no classification ability, so we always delete such attributes before analyzing the information system. The information systems we study in this paper are pure information systems obtained after deletion of one-valued attributes. Suppose IS = (U, A, F) is an information system. For an arbitrary B ⊆ A, define

R_B = {(x_i, x_j) : f_l(x_i) = f_l(x_j) for all a_l ∈ B}    (24.1)
Then R_B is an equivalence relation on U, which can be proved directly. In particular, for an attribute b ∈ A, we have R_{b} = {(x_i, x_j) : f_b(x_i) = f_b(x_j)}. Let

R = {(B, C) : R_B = R_C},
r = {({b}, {c}) : R_{b} = R_{c}}    (24.2)
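To make these constructions concrete, the following short Python sketch (our own illustration, not part of the original paper) computes the partition U/R_B induced by an attribute subset B and groups the attributes into classes of r by comparing their single-attribute partitions; the data used is the small three-object table that reappears as Table 24.1 below.

# Sketch: computing U/R_B and A/r for an information system given as
# {object: {attribute: value}}.
table = {
    'x1': {'a1': 1, 'a2': 2, 'a3': 2},
    'x2': {'a1': 2, 'a2': 1, 'a3': 2},
    'x3': {'a1': 2, 'a2': 2, 'a3': 1},
}

def partition(B):
    """Return U/R_B as a frozenset of equivalence classes."""
    classes = {}
    for x, row in table.items():
        key = tuple(row[a] for a in sorted(B))
        classes.setdefault(key, set()).add(x)
    return frozenset(frozenset(c) for c in classes.values())

# r groups attributes whose single-attribute partitions coincide.
attributes = ['a1', 'a2', 'a3']
r_classes = {}
for a in attributes:
    r_classes.setdefault(partition([a]), []).append(a)

print(partition(['a1', 'a2']))    # U/R_{a1,a2}
print(list(r_classes.values()))   # A/r, here [['a1'], ['a2'], ['a3']]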
We then have the following conclusions from [17].

Theorem 1. R is a congruence relation on P(A) and r is an equivalence relation on A, where P(A) denotes the power set of A. A congruence relation is an equivalence relation that additionally satisfies the following condition: for any B_i, C_i ∈ P(A), i = 1, 2, if (B_1, C_1) ∈ R and (B_2, C_2) ∈ R, then

(B_1 ∪ B_2, C_1 ∪ C_2) ∈ R.
24.3 Relation Between Equivalence Classifications Based on the Attribute Set and Its Power Set

In this section we create two closed operators and the corresponding closed set families using the above equivalence relations R and r. Furthermore, we study the relation between these two closed set families, obtain the equivalence condition for C_R = C_r, and derive a method of attribute reduction when C_R = C_r.

24.3.1 The Closed Operators C(R) and C(r)

Definition 2. Suppose (P(A), ⊆) is an ordered set and C is a mapping of P(A) into itself. Then C : P(A) → P(A) is called a closed operator if it satisfies the following three conditions: (1) B ⊆ C(B) for any B ∈ P(A) (law of extensivity); (2) if B_1, B_2 ∈ P(A) and B_1 ⊆ B_2, then C(B_1) ⊆ C(B_2) (law of monotony); (3) C(C(B)) = C(B) for any B ∈ P(A) (law of idempotence).
Because R is an equivalence relation on P(A), there exists an equivalence classification of P(A):

P(A)/R = {[B]_R : B ⊆ A}    (24.3)

where [B]_R = {C : R_B = R_C}. Similarly, there is another equivalence classification of A:

A/r = {[b]_r : b ∈ A}    (24.4)

where [b]_r = {a ∈ A : (a, b) ∈ r}.

Theorem 2. Suppose [B]_R (B ⊆ A) is the equivalence class of B with respect to R and [b]_r (b ∈ A) is the equivalence class of b with respect to r. Define

C(R)(B) = ∪[B]_R    (24.5)

and

C(r)(B) = ∪{E ∈ A/r : E ∩ B ≠ ∅};    (24.6)
then C(R) and C(r) are closed operators.

Proof. The definition of C(R)(B) implies B ⊆ C(R)(B), so the mapping C(R) is extensive. Suppose that B_1, B_2 ∈ P(A) and B_1 ⊆ B_2. We have (B_1, C(R)(B_1)) ∈ R and (B_2, C(R)(B_2)) ∈ R. Then (B_1 ∪ B_2, C(R)(B_1) ∪ C(R)(B_2)) ∈ R, that is, (B_2, C(R)(B_1) ∪ C(R)(B_2)) ∈ R. So C(R)(B_2) ⊇ C(R)(B_1) ∪ C(R)(B_2) ⊇ C(R)(B_1), and the mapping C(R) is monotone. From (C(R)(C(R)(B)), C(R)(B)) ∈ R and (B, C(R)(B)) ∈ R we have (C(R)(C(R)(B)), B) ∈ R, so C(R)(C(R)(B)) ⊆ C(R)(B); on the other hand, the extensivity and monotony of C(R) imply that C(R)(B) ⊆ C(R)(C(R)(B)). So C(R)(C(R)(B)) = C(R)(B) and the law of idempotence is satisfied. Therefore C(R) is a closed operator on P(A). The proof that C(r) is a closed operator on P(A) can be found in [17].

24.3.2 Relation Between C_R and C_r

Because C(R) and C(r) are closed operators on P(A), we denote

C_R = {B : C(R)(B) = B},   C_r = {B : C(r)(B) = B}    (24.7)

They are the closed set families corresponding to the closed operators C(R) and C(r). We will study the properties of C_R and C_r.
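For a small attribute set, both operators and both closed families can be enumerated directly over the power set. The sketch below is our own illustration (not code from the paper); it reuses the same hypothetical three-object table as the previous sketch and simply applies definitions (24.5)–(24.7).

from itertools import combinations

table = {
    'x1': {'a1': 1, 'a2': 2, 'a3': 2},
    'x2': {'a1': 2, 'a2': 1, 'a3': 2},
    'x3': {'a1': 2, 'a2': 2, 'a3': 1},
}
A = ['a1', 'a2', 'a3']

def partition(B):
    classes = {}
    for x, row in table.items():
        classes.setdefault(tuple(row[a] for a in sorted(B)), set()).add(x)
    return frozenset(frozenset(c) for c in classes.values())

def subsets(s):
    for k in range(len(s) + 1):
        for c in combinations(s, k):
            yield frozenset(c)

def C_R(B):
    # union of [B]_R = {C : R_C = R_B}, as in (24.5)
    return frozenset().union(*[C for C in subsets(A) if partition(C) == partition(B)])

def C_r(B):
    # union of the r-classes E with E ∩ B ≠ ∅, as in (24.6)
    r_classes = {}
    for a in A:
        r_classes.setdefault(partition([a]), set()).add(a)
    return frozenset().union(*[E for E in r_classes.values() if E & set(B)])

closed_R = [set(B) for B in subsets(A) if C_R(B) == B]
closed_r = [set(B) for B in subsets(A) if C_r(B) == B]
print(closed_R)   # here: empty set, {a1}, {a2}, {a3}, and A
print(closed_r)   # here: all 8 subsets, i.e. C_r = P(A)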
Theorem 3. C_R and C_r have the following properties.

(1) ∅ ∈ C_r, A ∈ C_r.
(2) ∅ ∈ C_R, A ∈ C_R.
(3) If E ∈ A/r then {b} ∈ [E]_R for any b ∈ E.
(4) If E ∈ A/r then E' ∈ [E]_R for any nonempty E' ⊆ E.
(5) If E ∈ A/r then E ∈ C_r; that is, A/r ⊆ C_r.
(6) C_r can be described as C_r = {∪_{E∈H} E : H ⊆ A/r}.
(7) If E ∈ A/r and E ∩ B ≠ ∅, then B ∪ E ∈ [B]_R.
(8) Suppose A = {a_1, ..., a_m}. If A/r = {E_1, ..., E_m}, i.e. E_i = {a_i}, then C_r = P(A).

Proof. (1) The definition of C_r implies these two results.
(2) The definition of C_R implies A ∈ C_R, and the fact that the information system we study is a pure information system implies ∅ ∈ C_R.
(3) If E ∈ A/r, then R_{a} = R_{b} holds for any a, b ∈ E. So

R_E = ∩_{a∈E} R_{a} = R_{b},

that is, {b} ∈ [E]_R.
(4) For any b ∈ E' ⊆ E we have R_{b} ⊇ R_{E'} ⊇ R_E; moreover, R_{b} = R_E from property (3). Hence R_{E'} = R_E, so E' ∈ [E]_R.
(5) Suppose E ∈ A/r. For any F (≠ E) ∈ A/r we must have F ∩ E = ∅; that is to say, among the elements of A/r only E itself has a nonempty intersection with E. Furthermore, from the definition of C(r)(B) in formula (24.6) we see that C(r)(E) = E. It follows that E ∈ C_r.
(6) Every element B of C_r satisfies

B = ∪{E ∈ A/r : E ∩ B ≠ ∅} = ∪_{E∩B≠∅} E.

So C_r = {∪_{E∈H} E : H ⊆ A/r}. This is another description of C_r.
(7) If b ∈ E ∩ B, then R_{B∪E} = R_B ∩ R_E = R_B ∩ R_{b} = R_B. So B ∪ E ∈ [B]_R.
(8) Since E_i = {a_i}, for any C ∈ P(A) we have

C = ∪{E ∈ A/r : E ⊆ C} = ∪{E ∈ A/r : E ∩ C ≠ ∅},

so C ∈ C_r. Hence C_r = P(A). This property shows: if each attribute's classification result is different, then C_r = P(A).

Theorem 4. There exists an inclusion relation between C_R and C_r: C_R ⊆ C_r.

Proof. Suppose B ∈ C_R, which implies C(R)(B) = B. For any E ∈ A/r with E ∩ B ≠ ∅, we know B ∪ E ∈ [B]_R from property (7) of Theorem 3. So C(R)(B) ⊇ B ∪ E ⊇ B. Since C(R)(B) = B, B ∪ E = B holds; that is, E ⊆ B. So
C(r)(B) = ∪{E ∈ A/r : E ∩ B ≠ ∅} = ∪{E ∈ A/r : E ⊆ B} ⊆ B.

On the other hand, for any b ∈ B there must exist E ∈ A/r which satisfies b ∈ E, and then E ∩ B ≠ ∅, so b ∈ E ⊆ C(r)(B). This proves B ⊆ C(r)(B). The above shows B = C(r)(B), so B ∈ C_r. From this proof we see that C_R ⊆ C_r.

In general, the relation C_R ≠ C_r holds, and Theorem 4 gives the precise relation between them: C_R is a subset of C_r. The next subsection gives the necessary and sufficient condition for C_R = C_r.

24.3.3 Equivalence Condition for C_R = C_r and an Attribute Reduction Method

Generally speaking, C_R and C_r are different. For example, in the information system described in Table 24.1,

Table 24.1. An information system with C_R ≠ C_r
      a1   a2   a3
x1     1    2    2
x2     2    1    2
x3     2    2    1
we have

C_R = {∅, {a_1}, {a_2}, {a_3}, A},   C_r = P(A).
This section gives the equivalence condition for C_R = C_r.

Theorem 5. The necessary and sufficient condition for C_R = C_r is: if B ≠ C then R_B ≠ R_C, where

B = ∪_{E_i∈α} E_i,   C = ∪_{E_i∈β} E_i,

and α, β ⊆ A/r satisfy α ≠ β.

Proof. We only need to prove the following equivalent proposition: C_R ≠ C_r if and only if there exist α, β ⊆ A/r with α ≠ β such that, letting B = ∪_{E_i∈α} E_i and C = ∪_{E_i∈β} E_i, we have B ≠ C and R_B = R_C.

Sufficiency. If there exist such α, β ⊆ A/r and B, C are defined as in the proposition, then B, C ∈ C_r. At the same time, when B ≠ C, R_B = R_C holds. Then R_{B∪C} = R_B ∩ R_C = R_B = R_C. So we get C(R)(B) ⊇ B ∪ C ⊋ B, or C(R)(C) ⊇ B ∪ C ⊋ C. That is to say, C(R)(B) ≠ B or C(R)(C) ≠ C. That
means B ∉ C_R or C ∉ C_R. Therefore C_R ≠ C_r.

Necessity. If C_R ≠ C_r, then there exists a B which satisfies C(r)(B) = B and C(R)(B) ≠ B. As B ∈ C_r, from property (6) of Theorem 3, B can be described as

B = ∪_{E_i∈α} E_i,

where α ⊆ A/r. Put C = C(R)(B). Since C(R) is a closed operator, C = C(R)(C). So C ∈ C_R ⊆ C_r, and C can be described as

C = ∪_{E_i∈β} E_i,

where β ⊆ A/r. It is obvious that α ≠ β and B ≠ C, but since C = C(R)(B), we have R_B = R_C.

Analyzing Theorem 5 in detail, we can naturally find and prove an attribute reduction method using the equivalence condition when C_R = C_r. The method is as follows: selecting an element (an attribute) from each E_i arbitrarily, we obtain a set containing such attributes, which is exactly an attribute reduction of the information system (U, A, F). If some E_i is a one-element set, then its element must be a kernel attribute. We state this method as Theorem 6.

Theorem 6. Suppose (U, A, F) is an information system with C_R = C_r, and let A/r = {E_1, ..., E_n}. Then B is a reduction of A if and only if B = {e_1, ..., e_n}, where e_i ∈ E_i (i = 1, 2, ..., n). If there is an E_k which satisfies |E_k| = 1, then its element is a kernel attribute.

We give two examples of attribute reduction using Theorem 6. In these examples, the equivalence relations are defined as R = {(B, C) : R_B = R_C} and r = {({b}, {c}) : R_{b} = R_{c}}.

Example 1. The information system is shown in Table 24.2. The attribute set is A = {a_1, a_2, a_3} and the object set is U = {x_1, x_2, x_3, x_4}.

Table 24.2. The information system of Example 1
      a1   a2   a3
x1     1    2    1
x2     2    1    2
x3     3    1    2
x4     1    1    2
We can easily get the following results: C_R = C_r = {∅, {a_1}, {a_2, a_3}, A} and
A/r = {{a_1}, {a_2, a_3}}. Therefore, the attribute reduction of this information system is {a_1, a_2} or {a_1, a_3}, and a_1 is a kernel attribute.

Example 2. The information system is shown in Table 24.3.

Table 24.3. The information system of Example 2
      a1   a2   a3
x1     1    1    1
x2     1    1    2
x3     1    2    2
x4     2    2    2
We can easily get the following results: A/r = {{a_1}, {a_2}, {a_3}}. Therefore, the attribute reduction of this information system is A.
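The reduction procedure of Theorem 6 is easy to mechanise. The sketch below is our own illustration (not code from the paper); it uses the data of Example 1, computes A/r, and, assuming C_R = C_r has already been verified, returns one reduct by picking a single attribute from every class E_i. One-element classes yield the kernel attributes.

table = {                      # data of Example 1 (Table 24.2)
    'x1': {'a1': 1, 'a2': 2, 'a3': 1},
    'x2': {'a1': 2, 'a2': 1, 'a3': 2},
    'x3': {'a1': 3, 'a2': 1, 'a3': 2},
    'x4': {'a1': 1, 'a2': 1, 'a3': 2},
}
A = ['a1', 'a2', 'a3']

def partition(B):
    classes = {}
    for x, row in table.items():
        classes.setdefault(tuple(row[a] for a in sorted(B)), set()).add(x)
    return frozenset(frozenset(c) for c in classes.values())

# A/r: group attributes with identical single-attribute partitions.
A_over_r = {}
for a in A:
    A_over_r.setdefault(partition([a]), []).append(a)
classes = list(A_over_r.values())          # here [['a1'], ['a2', 'a3']]

reduct = [E[0] for E in classes]           # one attribute from each E_i
kernel = [E[0] for E in classes if len(E) == 1]
print(classes, reduct, kernel)             # reduct {a1, a2}; kernel attribute a1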
24.4 Conclusion

In this paper we study the following problems in an information system (U, A, F): the connection between the equivalence relations r and R defined on the attribute set A and its power set P(A), respectively; the sufficient and necessary condition for C_R = C_r; and a method of attribute reduction when C_R = C_r. Using these results, we can simplify knowledge discovery in information systems by transferring research on the attribute set's power set to the attribute set itself. In addition, our study places no restriction on the attribute set B: if B is a condition attribute set, we obtain knowledge about the raw information system; if B is a decision attribute set, we obtain knowledge about the decision system. All the results in this paper can serve as a theoretical basis for knowledge discovery in information systems. Of course, we believe that there may exist more useful methods and theories which can improve knowledge discovery in information systems, and we hope this paper inspires further research in this area.
References

1. U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth (1996) From Data Mining to Knowledge Discovery: An Overview. In: U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy (eds) Advances in Knowledge Discovery and Data Mining, MIT Press, Cambridge, Mass.
2. Pawlak Z (1991) Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht.
3. Polkowski L, Lin T Y, Tsumoto S (eds) Rough Set Methods and Applications: New Developments in Knowledge Discovery in Information Systems. Physica-Verlag, Heidelberg.
4. Siddhartha B, Olivier V. Pictet, Gilles Z (2002) Knowledge-intensive genetic discovery in foreign exchange markets. IEEE Transactions on Evolutionary Computation. 6(2):169-181.
5. Engelbert Mephu Nguifo, Vincent Duquenne, Michel Liquiere (2003) Introduction - Concept Lattice-Based Theory, Methods and Tools for Knowledge Discovery in Databases: Applications. Applied Artificial Intelligence. 17(3):177-180.
6. K. J. Adams, D. A. Bell, Liam P. Maguire, J. McGregor (2003) Knowledge Discovery from Decision Tables by the Use of Multiple-Valued Logic. Artificial Intelligence Review. 19(2):153-176.
7. U. Fayyad (1996) Data Mining and Knowledge Discovery: Making Sense Out of Data. IEEE Expert. 20-25.
8. T. Oyama, K. Kitano, Kenji Satou, T (2002) Extraction of knowledge on protein-protein interaction by association rule discovery. Bioinformatics. 18(5):705-714.
9. Kay Chen Tan, Q. Yu, C. M. Heng, T. H. Lee (2003) Evolutionary computing for knowledge discovery in medical diagnosis. Artificial Intelligence in Medicine. 27(2):129-154.
10. Carolina Silva, Cirano Iochpe, Paulo Engel (2003) Using Knowledge Discovery to Identify Analysis Patterns for Geographic Database. ICEIS (2):359-364.
11. Kryszkiewicz M, Rybinski H (1993) Finding reducts in composed information systems. In: Ziarko W (ed) Rough Sets, Fuzzy Sets and Knowledge Discovery.
12. Wroblewski J (1995) Finding minimal reducts using genetic algorithms. In: Wang P P (ed) Proceedings of the International Workshop on Rough Sets Soft Computing at Second Annual Joint Conference on Information Science (JCIS'95), Wrightsville Beach, NC.
13. James F. Peters, Andrzej Skowron (2002) A rough set approach to knowledge discovery. International Journal of Intelligent Systems. 17(2):109-112.
14. Yasdi R (1995) Combining Rough Sets learning and neural learning method to deal with uncertain and imprecise information. Neurocomputing. 7(1):61-84.
15. Slowinski R (1995) Rough set approach to decision analysis. AI Expert. 5:19-25.
16. Aijun A et al (1996) Discovering rules for water demand prediction - an enhanced rough set approach. Engineering Applications of Artificial Intelligence. 9(6):645-653.
17. Miroslav Novotny (1998) Dependence Spaces of Information Systems. In: E. Orłowska (ed) Incomplete Information: Rough Set Analysis. Physica-Verlag.
25
Query Cost Model Constructed and Analyzed in a Dynamic Environment Zhining Liao, Hui Wang, David Glass, and Gongde Guo School of Computing and Mathematics, Faculty of Engineering, University of Ulster, BT37 0QB, UK {Z.Liao, H.Wang, Dh.Glass, G.Guo}@ulster.ac.uk
Summary. Query processing over the Internet involving multiple data sources has proven to be one of the most difficult and important problems in the modern data-sharing society. In this new data processing environment, three major factors affect the cost of a query: the network congestion situation, the server states (server workload), and the data/query complexity. In this paper, we construct cost models for estimating the cost of a query and split the query cost into a data-search cost and a data-transmission cost. We also study how to capture the changes of the query system in order to update the cost models whenever needed, and use a real discrete Fourier transform method to filter the noise from the main trend of the network and the query system, yielding more accurate cost models. We can then choose the best query plan according to the updated cost model. Key words: query optimization, cost model, data processing
25.1 Introduction

To meet the growing needs for sharing pre-existing data sources over the Internet, data integration from a multitude of autonomous data sources has recently been a research focus. Query optimization is a major stage in data integration over the Internet. It requires the estimated costs of possible query plans in order to select the best query plan in terms of cost. The key challenges arise due to the dynamics and unpredictability of the workloads of both the network and the autonomous remote data sources. These sources may not provide availability metrics for accurate cost estimation. Therefore, methods for deriving cost models for autonomous data sources at a global level are important for accurate query processing. Several proposed methods [1, 3, 7-13] assume that the system environment (both the network itself and the remote servers) does not change significantly over time; the impact of these two factors is therefore not explicitly involved in query cost estimation. The significance of recognizing the impact of the overall system contention states has been studied in two separate research projects recently. In [11-13], the effects of the workload of a server on the cost of a query are investigated and a method to decide the contention states of a server is developed. Cost models derived through sampling queries for
each contention state are also constructed for estimating the costs of further queries. This research concentrated on the workload of the server only, the experiments were carried out in a peer-to-peer environment, and it was assumed that the network is steady and does not consume much of the query cost. In [5, 9], on the other hand, the importance of coping with the dynamic network environment is addressed. The effects of the network factor are investigated and a cost estimation model is proposed that measures the costs of the same query in different network situations, e.g., at different times of the day. The main drawbacks of this method are twofold. First of all, the time dimension has a minimum scale of one hour; if a remote source is highly dynamic, hourly intervals may be too large to reflect the changes of the server. Secondly, this approach considers only the quantity of data to be transferred and does not consider the variety of queries using different operators. The complexity of a query can affect its response time significantly even under the same network workload. In [4], we combined the two factors (network congestion situation and server contention states) into system contention states and discussed the effect of system contention states on the cost of a query. In this paper, we construct the estimated cost model and split the query cost into the cost of the network and the cost of the servers. We also propose a way to update the cost model to adapt to the changing environment.
25.2 System contention states and cost models

25.2.1 Grouping costs of sample queries using clustering techniques

To establish the relationship between the contention states of a system and the cost of a query, a sample query is carefully designed. This query is of reasonable complexity and can be evaluated by the remote server quickly. It is tested on the remote server at a fixed time interval over a 24-hour period, and the costs of the query (T_i) at these time points (t_i) are collected. To determine an appropriate set of contention states for a dynamic environment, an algorithm often used for multi-dimensional data clustering [6] is modified to cluster one-dimensional data (i.e., the cost of the sample query in terms of time spent by the server). The key idea underlying the algorithm is to place each data object (the cost of a sample query) in its own cluster initially and then gradually merge clusters into larger ones until the desired number of clusters has been found. To determine appropriate cost models for the system contention states, we carry out multiple regression analysis to build cost formulae [2, 11]. The multiple regression process allows us to establish a statistical relationship between the costs of queries and the relevant contributing (explanatory) variables, as listed below. More details can be found in [4].

25.2.2 Multiple linear regression cost models

There are mainly five factors affecting the cost of a query in the wide area environment, as follows:
1. The number of tuples in an operand table.
2. The number of tuples in the result table.
3. The cardinality of an intermediate table.
4. The tuple length of the result table.
5. Contention in the system, including system factors such as CPU, I/O, or data items, and network factors such as network speed and data volume.

Other factors (e.g. the tuple length of an operand) are less important in the wide area environment, and some further factors (e.g. the physical size or indexes of the database) are not available in most query processing systems over the Internet, since every data source is autonomous. Knowing the factors affecting the cost of a query, we can construct the cost model as follows. Let X_1, X_2, ..., X_p be p explanatory variables, which correspond to the factors discussed above. They do not have to represent different independent variables; it is allowed, for example, that X_3 = X_1 * X_2. The response (dependent) variable Y (the query cost in this paper) tends to vary in a systematic way with the explanatory variables X_i. For a unary query, let R_u be the operand table, N_u the cardinality of the operand table, N_r the cardinality of the result table, and L_r the tuple length of the result table. Then LN_r = N_r * L_r is the data volume that is to be transferred to the user. The cost estimation formula for unary queries is:

Y = B_0 + B_1*N_u + B_2*N_r + B_3*LN_r    (25.1)

For a join query, let R_u1 and R_u2 be the two operand tables, N_u1 and N_u2 their cardinalities, N_r the cardinality of the result table, and L_r the tuple length of the result table. Then LN_r is the data volume of the result table that is to be transferred to the user.

Y = B_0 + B_1*N_u1 + B_2*N_u2 + B_3*N_r + B_4*LN_r    (25.2)
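The coefficients B_0, ..., B_4 in formulae (25.1) and (25.2) can be estimated by ordinary least squares from observed sample-query costs. The following sketch is only our own illustration of that step (the observation values are invented); the paper itself relies on a standard multiple-regression procedure.

import numpy as np

# Each row: (N_u, N_r, LN_r) for a unary sample query; y: observed cost in seconds.
X = np.array([
    [1000,  120,  9600],
    [5000,  700, 56000],
    [2500,   40,  3200],
    [8000, 1500, 90000],
    [12000, 300, 24000],
], dtype=float)
y = np.array([0.8, 3.1, 0.9, 5.2, 2.4])

# Y = B0 + B1*Nu + B2*Nr + B3*LNr  (25.1): add a column of ones for the intercept.
X1 = np.hstack([np.ones((len(X), 1)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

new_query = np.array([1.0, 4000, 500, 40000])   # leading 1.0 for the intercept term
estimated_cost = new_query @ coef
print(coef, estimated_cost)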
25.3 The cost of a query in the wide area environment

As we know, the network delay, the server states and the data volume affect the cost of a query. The estimated cost (time) of a query plan is therefore divided into three parts:

TotalTime = Σ_{i=1,...,k} (Time_1i + Time_2i + Time_3i)    (25.3)

TotalTime is the time from when a query is submitted until the user gets all of the data for the query. Time_1 is the time to transmit the query from the query agent to the server over the Internet. Time_2 is the time taken by a remote data source to perform a sub-query. Time_3 is the time to transfer all the resulting data from a remote data source to the query agent. Since a global query is decomposed into many sub-queries and some of these sub-queries can be performed in parallel, the variable k is the (estimated) number of sub-queries that will be carried out in sequence; the total time taken to answer the global query is the sum of the times taken by the sequential sub-queries. Here we focus on Time_1 and Time_3. In [4] we proposed a sample query method to establish the relationship between a system contention state and the cost of a query. The process of the sample query method is to send a query to a server from a query agent and record the relevant time points. Here we extend the method: we record several time points (T_s, T_rf, T_rl) and the numbers of data packages that the query agent sent and received (Num_s, Num_r). T_s: the time point at which the query is sent out from the query agent; T_rf: the time point when the query agent receives the first data package; T_rl: the time point when the last data package is received by the query agent; Num_s: the number of data packages to be sent to the server when the agent
submits a query; Num_r: the number of data packages to be received by the query agent for the data of the query. The query agent records T_s, T_rf and T_rl. Then we can use the following formulae and calculate the cost of a single data package (C_s) transmitted over the Internet (for simplicity we use a unary query as an example):

TotalTime = T_rl - T_s    (25.4)
Time_3 = (T_rl - T_rf) + (T_rl - T_rf)/(Num_r - 1)    (25.5)
C_s = Time_3/(Num_r - 1)    (25.6)
Time_1 = Num_s * C_s    (25.7)
Time_2 = TotalTime - Time_1 - Time_3    (25.8)

From formulae 25.4 to 25.8 we can obtain the cost of a query from the server cost and the network cost. So when we use the sample query method, we can get the network cost, calculate the network speed during the time when the sample query was submitted, and get the server's contention state by the method used in [4]. Then we can make a more accurate estimate of the cost of a query by estimating the cost of the server and of the network respectively.
25.4 Updating query cost models in the wide area environment

After we construct the cost models for the query system, we can estimate the query cost accurately. But in the wide area environment there are many factors that make the cost models change. Although such an environment may not change dramatically during the execution of one query, the costs of the same query executed at different times can be significantly different. The query optimizer may then not choose the best query plan for a query, which would cause a serious performance problem. In this paper we tackle this problem by updating the cost model, analyzing two aspects - the changes of the network and the changes of the server contention states - in order to capture the changing environment so that the cost model can be kept up to date all the time.

25.4.1 Updating the estimated formula for the cost of the network

In most situations the speed of the network changes irregularly and is affected by many kinds of factors. It is difficult to estimate the network state using a single observation made when the query is submitted to the server. To cope with this problem, the Real Discrete Fourier Transform is used to filter noise and keep the main trend of the network speed; we can then estimate the network state based on this main trend. It is assumed that the recorded raw data about the speed of the network, N_speed(t), is composed additively of a long-term signal Nl_speed(t) and a noise n(t), that is, N_speed(t) = Nl_speed(t) + n(t). If we are able to remove n(t) from N_speed(t), then we can obtain the main trend of the network speed. In our study, the Fourier transform is applied to the raw data to identify the long-term signal Nl_speed(t), as it is constructed mainly from waves with low frequency (slow changes over time), while the noise signal is constructed from waves with high frequency (fast changes over time).
The formulae are as follows. The n-point (n a power of 2) Real Discrete Fourier Transform of a signal x = [x_t], t = 0, 1, ..., n-1, is defined to be a sequence of n/2+1 complex numbers X_f, f = 0, 1, ..., n/2, given by X_f = R_f + i*I_f, in which

R_f = Σ_{t=0}^{n-1} x_t cos(2πft/n),   I_f = Σ_{t=0}^{n-1} x_t sin(2πft/n),   f = 0, ..., n/2    (25.9)

Here i is the imaginary unit. The signal x can be recovered by the inverse transform, with the reduction of noise:

x_t = (1/n) [ (R_0 + R_{n/2} cos(πt))/2 + Σ_{f=1}^{n/2-1} R_f cos(2πft/n) + Σ_{f=1}^{n/2-1} I_f sin(2πft/n) ],   t = 0, ..., n-1    (25.10)

As we observe, although the speed of the network changes irregularly over short intervals, its main trend changes regularly over the period of a week: the speed of the network from Monday to Friday is much lower than at the weekend, and peak time (8:00am to 6:00pm) is much busier than off-peak time. If the estimated cost of the network is significantly different from the observed cost, we can insert the observed cost data into the data set of network costs at the proper time points and rebuild the formula for estimating the cost of the network by formulae 25.9 and 25.10. In this way we can capture the changes of the network and obtain a new cost formula for the network.

25.4.2 Updating the cost model for the server cost

As we know, many factors can affect server contention states, including the data volume, the number of visitors, the physical data distribution/organization on disk, the local database conceptual/physical schemas, etc. These factors usually change little by little, and significant change may accumulate after a certain period of time (e.g., a couple of days, weeks, or months). One direct way to deal with this is to rebuild the cost model by the query sampling method [4] after obtaining new observed query costs. Two drawbacks exist in this method. First, a relatively high overhead is caused by frequently rebuilding the cost model. The other problem is caused by unpredicted factors such as temporary interruption of servers, network congestion, failures of some routings along the data transmission path, or virus attacks in the wide area environment. These factors may appear randomly and lead to very high query costs; when we record such costs, this kind of data is noise in the data set of query costs. If a data set with noise is used to rebuild the cost model, the query optimizer will choose an inefficient execution plan based on the rebuilt cost model, and the effect of the noise will last a long time. To deal with the first problem, we record all sample data points since the last rebuilding of the cost model and use them to update the cost model all at once, reducing the updating overhead. We also use the current cost model to estimate the sample queries and compare the observed data of the sample queries to the estimated data. If the error rate is high (e.g., the number of queries with a large cost estimation error is beyond a threshold), we use the observed data to update the cost model and capture the changes of the environment. For the second problem, we pre-process the data set of query costs before it is used to rebuild the new cost model. There is a clear advantage if we pre-process the raw data and work
on the pre-processed information. In this paper we use a real Fourier transform to filter the noise and keep the main trend of the changes of the environment. It is assumed that the data set of the cost of the sample query, Cs(t), is composed of a long-term signal Csl(t) and a noise n(t), that is, Cs(t) = Csl(t) + n(t). If we are able to remove n(t) from Cs(t), the main trend of the changes of the environment can be obtained. A cost model rebuilt from the pre-processed data set reflects the query system more accurately. So the proposed techniques are quite promising for maintaining accurate cost models efficiently in the changing environment.
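The noise-filtering step can be reproduced with any real-FFT routine: transform the recorded cost (or network-speed) series, keep only the lowest-frequency coefficients, and invert. The sketch below uses NumPy's rfft/irfft as a stand-in for formulae (25.9)-(25.10); it is our own illustration, and the series and the cut-off frequency are invented.

import numpy as np

# Hypothetical series of sample-query costs, one value per 15 minutes (n = 2^7).
t = np.arange(128)
costs = 2.0 + 0.8 * np.sin(2 * np.pi * t / 128) + 0.3 * np.random.randn(128)

spectrum = np.fft.rfft(costs)          # real DFT: n/2 + 1 complex coefficients
cutoff = 8                             # keep only slow changes (low frequencies)
spectrum[cutoff:] = 0
main_trend = np.fft.irfft(spectrum, n=len(costs))   # denoised long-term signal

noise = costs - main_trend
print(main_trend[:5], noise.std())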
25.5 Conclusions

In this paper we construct cost models for estimating the cost of a query and study the behavior of the wide area network. We can thus estimate the cost of a query by separating the estimated query cost into a data-search cost and a data-transmission cost in the wide area environment. We also study how to capture the changes of the network and the query system, and use a real discrete Fourier transform method to filter the noise from the main trend of the network and the query system. We then use the pre-processed data sets to rebuild the cost models of the network and the query system, so the cost models can be kept up to date whenever needed and the best query plan can be chosen more accurately according to the new situation.
References

1. Adali S., Candan K.S., Papakonstantinou Y., Subrahmanian V.S., Query caching and optimization in distributed mediator systems, Proc. of ACM SIGMOD'96, 1996, pp. 137-148
2. Chatterjee S., Price B., Regression Analysis by Example, John Wiley & Sons, 1991
3. Du W., Query optimization in heterogeneous DBMS, Proc. of VLDB'92, 1992, pp. 103-119
4. Liu W., Liao Z., Jun H., Query cost estimation through remote server analysis over the Internet, Proc. of WI'03, 2003, pp. 345-355
5. Gruser J.R., Raschid L., Zadorozhny V., Zhan T., Learning response time for web-sources using query feedback and application in query optimization, VLDB Journal, 9(1), 2000, pp. 18-37
6. Muralikrishna M., Dewitt D.J., Equi-depth histograms for estimating selectivity factors for multi-dimensional queries, Proc. of SIGMOD'88, 1988, pp. 28-36
7. Roth M.T., Ozcan F., Haas L.M., Cost models DO matter: providing cost information for diverse data sources in a federated system, Proc. of VLDB'99, 1999, pp. 599-610
8. Ling Y., Sun W., A supplement to sampling-based methods for query size estimation in a database system, SIGMOD Record, 21(4), 1992, pp. 12-15
9. Zadorozhny V., Raschid L., Zhan T., Bright L., Validating an Access Cost Model for Wide Area Applications, Cooperative Information Systems, Vol. 9, 2001, pp. 371-385
10. Zhu Q., Larson P.A., A Query Sampling Method of Estimating Local Cost Parameters in a Multidatabase System, Proc. of ICDE'94, 1994, pp. 144-153
11. Zhu Q., Larson P., Building Regression Cost Models for Multidatabase Systems, Proc. of PDIS'96, 1996, pp. 220-231
12. Zhu Q., Motheramgari S., Sun Y., Cost estimation for large queries via fractional analysis and probabilistic approach in dynamic multidatabase environments, Proc. of DEXA'00, 2000, pp. 509-525
13. Zhu Q., Motheramgari S., Sun Y., Developing cost models with qualitative variables for dynamic multidatabase environments, Proc. of ICDE'00, 2000.
26 The Efficiency of the Rules' Classification Based on the Cluster Analysis Method and Salton's Method Agnieszka Nowak and Alicja Wakulicz-Deja University of Silesia, Institute of Computer Science, Będzińska 39, 41-200 Sosnowiec, Poland [email protected] [email protected]
26.1 Introduction

The aim of this paper is the comparison of a document classification method based on distance analysis (cluster analysis) and one based on coefficients of similarity (Salton's method). We are interested in whether the obtained groups of documents are identical (similar) and whether new cases are classified similarly. Positive results will give us a basis for using the method based on coefficients of similarity to classify sets of rules. Classification based on coefficients of similarity is used to build document structures and for retrieval in search engines, whereas cluster analysis is used for the classification of facts and rules in knowledge bases. It seems that a mechanism based on coefficients of similarity could be very convenient for the inference process. This paper addresses the first issue, that is, the comparison of the rule classification process based on the cluster analysis method and on Salton's method.
26.2 Classification as a recognition problem

Classification is the task of recognizing the membership of various types of objects in some classes. Every object is described by a set of attribute values, which makes it possible to calculate whether the object belongs to a specific class. Assume D is the set of objects to be recognized. Then, on this set, one can define an equivalence relation K ⊆ D × D, the so-called classification. It defines a division of the set D into a collection of equivalence classes {D_i} [1].

26.2.1 The structure of the attribute space

The measurement of the attributes of objects, both model objects and new ones, is the initial element of any classification algorithm. It leads to transforming each object d into
a point of the space X. This space is treated as a Euclidean space, in which each document is an n-dimensional vector x = (x_1, x_2, ..., x_n) ∈ X. Among all classification methods we pay attention to minimal-distance methods, which require choosing a distance measure. It can be any mapping q : X × X → R which, for all vectors x^(1), x^(2), x^(3) ∈ X, satisfies the usual metric conditions (non-negativity with q(x^(1), x^(2)) = 0 iff x^(1) = x^(2), symmetry, and the triangle inequality) [3].    (26.1)

26.2.2 Methods of estimating distance and similarity

The cluster analysis method studies the distance between objects, whereas Salton's method studies their similarity. There is a very close relationship between them: the smaller the distance between two objects, the higher the degree of their similarity (and vice versa). The literature gives many different similarity and distance measures; most often the following are used [1]. Euclidean measure:

d(x, y) = sqrt( Σ_i (x_i - y_i)^2 )    (26.2)

Cosine measure:

p(x, y) = ( Σ_i x_i y_i ) / ( sqrt(Σ_i x_i^2) * sqrt(Σ_i y_i^2) )    (26.3)

where the vectors x and y are compared in the n-dimensional space.

26.2.3 Types of classification

There are many methods of object classification, among which the most important and most general are graphical, hierarchical and k-optimizing methods. The most popular of all is the k-means method, and that is why this method was chosen for the classification of rules in this paper.
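A direct rendering of measures (26.2) and (26.3) in code — our own illustration, not code from the paper, with two hypothetical rule vectors from the 8-dimensional space used later — is:

import math

def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def cosine(x, y):
    dot = sum(xi * yi for xi, yi in zip(x, y))
    nx = math.sqrt(sum(xi * xi for xi in x))
    ny = math.sqrt(sum(yi * yi for yi in y))
    return dot / (nx * ny)

r1 = [0, 0, 0, 0, 1, 0, 0, 4]   # example rule vectors
r2 = [0, 0, 0, 0, 1, 0, 0, 3]
print(euclidean(r1, r2), cosine(r1, r2))   # small distance, cosine close to 1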
26.3 The k-means method

In the literature one can find many versions of the k-means method. The simplest version tells us to choose k objects from the set at random as the centers of k groups and to assign the remaining objects to them. In each iteration the number of groups is invariable and only the composition of the groups can change. At the front of each group there is one representative (the center), which is the center of gravity of the group's set of document vectors [1]. The k-means algorithm:
1. For every cluster its center is computed.
2. Every considered object is assigned to the closest center.
3. If stabilization has not been reached yet, objects near the boundaries may move between clusters, and new, corrected centroids are determined.
4. Steps 2 and 3 are repeated until stabilization is achieved (no object changes its group).

Our goal is to obtain minimum distances inside the clusters and maximum distances between clusters.
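A minimal k-means loop matching the steps above could look as follows; this is our own sketch, and the rule vectors and the value of k are placeholders rather than the paper's actual data.

import math, random

def kmeans(vectors, k, iters=100):
    centers = random.sample(vectors, k)            # choose k initial centers at random
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:                          # assign each object to the closest center
            d = [math.dist(v, c) for c in centers]
            groups[d.index(min(d))].append(v)
        new_centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)          # recompute centroids (centers of gravity)
        ]
        if new_centers == centers:                 # stop when the structure is stable
            break
        centers = new_centers
    return centers, groups

rules = [[0, 0, 0, 0, 1, 0, 0, 4], [0, 0, 0, 1, 0, 0, 0, 0],
         [0, 0, 0, 0, 1, 0, 0, 3], [0, 1, 0, 1, 0, 0, 0, 0]]
centers, groups = kmeans(rules, k=2)
print(centers, groups)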
26.4 Salton's method

Initially we suppose that all documents are loose documents. Each document is subjected to a density test to check how many documents are situated in its neighborhood: at least N_1 documents should have a correlation coefficient with the tested document higher than some parameter p_1, and more than N_2 documents higher than p_2. If the document passes the density test, we determine the minimal and maximal groups and choose p_min as the minimal value of the correlation coefficient. The group is formed by the documents whose correlation with the central element is higher than p_min. Next, for each group, we create a centroid vector; it consists of the attributes describing the objects of the given group. This centroid vector is compared with the whole set. To create the final group, the threshold parameters are calculated again, and the whole process is repeated for all unconnected elements [2].
26.5 Description of the analysis

The analysis presented in this paper was carried out on a test set of 24 rules of different lengths. Its aim was the formation of two groups of documents and the selection of classification vectors for the created groups, so that new cases can be classified easily and the final value of the decision attribute can be selected. Groups of combined objects are created, and the retrieval process is carried out only on selected groups.

26.5.1 The guidelines

To simplify the problem, the objects are normalized to a fixed dimension. The values of the attributes are numbered in turn from 1 to n; for example, the condition "5 = 1, 8 = 4" corresponds to the vector [0 0 0 0 1 0 0 4]. In the first stage of classification the conditional parts of the rules were used. Each rule was replaced by an 8-dimensional fixed-length vector: each position of this vector holds the value of the given attribute if it occurs in the conditional part of the rule, and zero otherwise. As a result we get a data set with a uniform structure, which is easy to analyze. We want to create groups of similar objects (documents) with a representative at the front of each group. These objects are recorded in the new cases table. Apart from the conditional part
of a given rule, each entry also stores its weight. If an object is new, it is added to the new cases table with weight equal to one; otherwise the weight of the existing case is suitably increased.
Fig. 26.1. The structure of the new cases table for the examined set of rules
26.5.2 The cluster analysis method

We choose k cluster centers at random: k_1 ⟺ 5 = 1, 8 = 4 ⟺ [0 0 0 0 1 0 0 4]; k_2 ⟺ …, and assign the remaining objects to the nearest center (one of the initial groups is {…, x_18, x_20, x_21, x_23, x_24}).
In the following iterations we check the stability of this structure. To do this, we recompute the distance of each object to the group centers. As new group centers we choose the so-called centers of gravity, calculated according to the weighted average-link method. The central object, the so-called centroid C_i, is calculated as the average over all objects of the given cluster, in accordance with the following measure:

C_i = (1/N) Σ_{j=1}^{N} o_ji    (26.4)
where N is the number of objects and o_ji is the value of object o_j at the i-th position. In this way we obtain new group centers, which we use to form new groups. These are: k_1 = [0 0 0 0 1 0 0 3], k_2 = [0 0 0 1 0 0 0 0]. After the next iteration it turned out that the groups did not change their state, which means that at this moment we
can stop the cluster analysis: since no object moved to another group, the group centers did not change, so the distances of the individual objects have not changed either. If we wanted to aim at the best possible result, we should choose completely new group centers and compare the outcome with the results obtained so far, and finally choose the classification which best achieves both goals of cluster analysis.

26.5.3 Salton's method

In the proposed Rocchio algorithm, Salton requires certain established parameters (the so-called density-test parameters): p_1 > 0.85, N_1 = 3, p_2 > 0.7, N_2 = 5. Additionally, we choose one object X_c at random as the potential center of the first group: X_c1 ⟺ 5 = 1, 8 = 4 ⟺ [0 0 0 0 1 0 0 4]. Around this object we try to form a new group. We calculate the similarity of each object to the chosen one according to the cosine measure. The value of this measure belongs to the interval [0, 1]; a value close to 1 means that two objects are very similar to each other, and a value close to 0 means the reverse. The obtained coefficients are sorted in decreasing order; at the top of the hierarchy are the objects whose similarity to the group center is the largest. The chosen object passed the density test, so now, on the basis of the split of the group, we establish p_min. For the group formed in this way we create the centroid vector as the center of gravity of the vectors belonging to the maximal group. It turned out that this vector has the same form as the vector created in the cluster analysis method. Now it is necessary to run the density test for the centroid and finally establish the size of the group: we calculate the similarity of the initial group's objects with the established X_c = [0 0 0 0 1 0 0 3]. It turned out that m_1 = m_2, so we calculate p_min = 0.86 again, and all objects whose coefficient of similarity to X_c is higher than or equal to p_min are added to the first group. Therefore, after two iterations, the following structures exist: the centroid C_1 = [0 0 0 0 1 0 0 3] with the group X_1 = {x_1, x_3, x_4, x_19, x_22}; the centroid C_2 = [0 1 0 1 0 0 0 0] with the group X_2 = {x_6, x_9, x_10, x_11, x_12, x_13, x_14, x_21, x_23}; and the third group of loose documents L = X_3 = {x_2, x_5, x_7, x_8, x_15, x_16, x_17, x_18, x_20, x_24}.
26.6 The obtained results - Final conclusions

The group structure formed in this way lets us classify further new cases. We intend to examine the similarity of new objects to the objects that have already been classified and placed in the cases table. If the same vector as the one being analyzed already exists in the cases table, we increase the weight of this case. Additionally, we check the uniqueness of the decision for this case: if there is only one decision, the weight of this case is appropriately increased; otherwise, we appropriately increase the weight of each case, or of the one with the highest frequency. If the case is new, it is added at the end of the table with weight equal
to 1. As a result of classifying the same set of objects with both methods, two groups of rules were created. The states of these groups are very similar: the same objects belong to the corresponding groups, and even the centroids are the same. So the conclusion is that both methods lead to the same result; only the initial parameters of the two algorithms differ. No doubt, besides the similarities, both methods also have differences:

1. The k-means method generates all groups in each step of the algorithm and their number does not change, whereas Salton's method generates only one group in each iteration.
2. The k-means method assigns each object to the cluster with the smallest distance, while Salton's method allows an object that is similar to more than one group to belong to several groups at the same time; it can also be classified into the group of loose documents.

Given these observations, it is hard to decide which method is better, and we can ask how to measure the quality of the obtained classifications. At this stage we are able to say that the k-means method with the k-medoid modification (which will be the subject of further research) is more useful than Salton's method. This method calculates the total cost of each classification as the sum of all distances of objects to their centers; the classification with the smallest cost is the best possible solution. In our research we will use the cluster analysis method as the reference method for creating groups.
References

1. E. Gatnar: Symboliczne metody klasyfikacji danych, Warszawa 1998, PWN [in Polish]
2. G. Salton: SMART - automatyczny system wyszukiwania informacji, Warszawa 1975, WNT [in Polish]
3. R. Tadeusiewicz, M. Flasiński: Rozpoznawanie obrazów, Warszawa, PWN [in Polish]
27 Extracting Minimal Templates in a Decision Table Barbara Marszał-Paszek and Piotr Paszek Institute of Computer Science, University of Silesia, Będzińska 39, 41-200 Sosnowiec, Poland [email protected]
Summary. In 1991 the basic functions of evidence theory were defined in terms of concepts of the rough set theory. In this paper we use these functions in specifying minimal templates of decision tables. The problem of finding such templates is NP-hard; hence, we propose some heuristics based on genetic algorithms. Key words: templates, rough sets, evidence theory
27.1 Basic Concepts of the Rough Set Theory

In 1982 Z. Pawlak created the rough set theory as an innovative mathematical tool for describing knowledge, including uncertain and inexact knowledge [2]. In this theory knowledge is based on the possibility (capability) of classifying objects. The objects may be, for instance, real objects, statements, abstract concepts, or processes. We recall some basic definitions of the rough set theory [2].

Definition 1. Information system. A pair A = (U, A) will be called an information system, where U is a nonempty, finite set called the universe and A is a nonempty, finite set of attributes. Each attribute a ∈ A is a function a : U → V_a, where V_a is the set of values of attribute a, called the domain of the attribute.

One of the basic concepts of the rough set theory is the indiscernibility relation. It can be defined for a given information system as follows.

Definition 2. Indiscernibility relation. The B-indiscernibility relation IND(B) for B ⊆ A is defined by

IND(B) = {(x, y) ∈ U × U : a(x) = a(y) for all a ∈ B}.
The indiscernibility relation is an equivalence relation. The equivalence class of the relation IND(B) defined by an object x is the set of all objects B-indiscernible with x. By [x]_B we denote the equivalence class of x, i.e., the set [x]_B = {y ∈ U : x IND(B) y}.
Now we can introduce the definitions of the upper approximation and the lower approximation of a set, which make it possible to define rough concepts.

Definition 3. Upper and lower approximations of a set. For any X ⊆ U and B ⊆ A, the B-lower approximation B̲X of X and the B-upper approximation B̄X of X are defined by

B̲X = {x ∈ U : [x]_B ⊆ X},
B̄X = {x ∈ U : [x]_B ∩ X ≠ ∅}.

The set BN_B(X) = B̄X \ B̲X
is called the B-boundary region of X.

Definition 4. Positive region and negative region of a set. For each X ⊆ U and B ⊆ A, the B-positive region of the set X is defined by

POS_B(X) = B̲X.

The B-negative region of the set X is the complement (within the universe) of the B-positive one, i.e.,

NEG_B(X) = U \ B̄X.

Any decision table is a specific information system.

Definition 5. Decision table. Let A = (U, A) be an information system and let A = C ∪ D, where C, D are nonempty, disjoint subsets of A. The set C is called the set of condition attributes and the set D the set of decision attributes. The tuple A = (U, A, C, D) is referred to as a decision table.
27.2 Belief and Plausibility Functions in Rough Sets In 1991 A. Skowron and J. Grzymala-Busse proposed a clear way of combining the rough set theory [2] and the evidence theory [3]. In this section we recall from [4] the basic definitions and theorems that are indispensable for further considerations.
Let A = (U, A ∪ {d}) be a decision table, where d ∉ A is a distinguished attribute called the decision. The decision d determines a partition of the universe U into decision classes X_1, ..., X_{r(d)}, where r(d) = |{k : ∃x∈U d(x) = k}| (the number of different values of the decision attribute) is called the rank of d. Let App_Class_A(d) be the family of sets defined by

App_Class_A(d) = {A̲X_1, ..., A̲X_{r(d)}} ∪ {Bd_A(θ) : θ ⊆ Θ_A and |θ| > 1}

where

Bd_A(θ) = ∩_{i∈θ} ĀX_i ∩ ∩_{i∈Θ_A∖θ} (U ∖ ĀX_i)

and Θ_A = {1, ..., r(d)}. Now we define a function F_A on subsets of Θ_A with values in App_Class_A(d) by

F_A(θ) = A̲X_i    if θ = {i}, i ∈ {1, ..., r(d)},
F_A(θ) = Bd_A(θ)    if θ ⊆ Θ_A, |θ| > 1,

and the standard basic probability assignment m_A : 2^{Θ_A} → [0, 1] is defined [4] by

m_A(θ) = |F_A(θ)| / |U|

for any θ ⊆ Θ_A.
Theorem 1. For any θ ⊆ Θ_A the following equality holds [4]:

Bel_A(θ) = |A̲(∪_{i∈θ} X_i)| / |U|.

It shows that Bel_A(θ) (a belief function) is the ratio of the number of objects in U that may certainly be classified to the union ∪_{i∈θ} X_i to the number of all objects in U.
=
'^'
\u\
Hence, the plausibility function value for 6 is the ratio of the number of objects that may probably classified to the union Ui^gXi to the number of all objects in U.
27.3 Templates in a Decision Table

A template T in a decision table A = (U, A, C, d) is any sequence v_1, ..., v_n, where v_i ∈ V_{a_i} ∪ {*} and A = {a_1, ..., a_n}. If the symbol * appears in a given template, it means that the value of the corresponding attribute is not restricted by the template. We say that a given object x matches a given template T = (v_1, ..., v_n) if a_i(x) = v_i for each i such that v_i ≠ *. The size of T is equal to the number of elements of T different from *.

Minimal Template Problem. For a given decision table we would like to find minimal templates (i.e., having possibly the smallest number of positions different from *) that define sets with some relevant properties of the functions Bel and Pl over the decision table. Let A be the decision table. By A_T we denote the restriction of A to the template T, i.e., U_T = {x ∈ U : a_i(x) = v_i for all i ∈ {1, ..., n} such that v_i ≠ *}, and A_T is the set of attributes of A restricted to U_T. We consider the following problem:

Minimal Template Problem (MTP)
Input: decision table A; thresholds ε, κ ∈ (0, 1) and a natural number n.
Output: minimal (with respect to the length) templates T for which there exists a set θ ⊆ Θ_{A_T} with at most n elements satisfying the following conditions:

|Pl_{A_T}(θ) - Bel_{A_T}(θ)| < ε    (27.1)
Pl_{A_T}(θ) > 1 - ε    (27.2)
|U_T| / |U| > κ    (27.3)
The support of the template T should be sufficiently large (condition (27.3)), and the set of decisions in A_T should be well approximated, i.e., for some possibly small set θ ⊆ Θ_{A_T} the union ∪_{i∈θ} X_i of decision classes X_i (i ∈ θ) of A_T is approximated with the high quality described by conditions (27.1)-(27.2). Condition (27.1) states that the relative size of the boundary region of ∪_{i∈θ} X_i is sufficiently small, and condition (27.2) expresses the fact that the relative size of U_T − ∪_{i∈θ} X_i is sufficiently small. Any such template T can also be interpreted as a decision rule (association rule [1]) of the form α_T → d ∈ θ, where α_T = ∧{a_i = v_i : v_i ≠ *}. Such rules can be interesting in the case where there are no rules (with the right-hand side described by a single decision value) in a given decision table that have satisfactory support (see inequality (27.3)). Then we search for rules having a sufficiently large support with respect to some minimal sets θ of decision values. To extract such templates we use genetic algorithms.
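The quantities in conditions (27.1)-(27.3) are cheap to evaluate once the indiscernibility classes of A_T are known. The sketch below is our own illustration with an invented toy table; it computes Bel_{A_T}(θ) and Pl_{A_T}(θ) directly from Theorem 1 and Corollary 1.

# Toy decision table restricted to a template: rows are (condition values, decision).
rows = [((1, 0), 1), ((1, 0), 1), ((1, 1), 2), ((1, 1), 1), ((0, 1), 2)]

def bel_pl(theta):
    """Bel and Pl of a set of decision values theta, as in Theorem 1 / Corollary 1."""
    classes = {}
    for cond, dec in rows:
        classes.setdefault(cond, []).append(dec)
    lower = upper = 0
    for decs in classes.values():
        if set(decs) <= set(theta):        # class certainly inside the union of X_i, i in theta
            lower += len(decs)
        if set(decs) & set(theta):         # class possibly intersects the union
            upper += len(decs)
    return lower / len(rows), upper / len(rows)

bel, pl = bel_pl({1})
print(bel, pl, pl - bel)    # compare pl - bel and pl against the thresholds in (27.1)-(27.2)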
The templates can be used in the decomposition of large data sets into smaller ones that are feasible for analysis by the existing rough set algorithms. Notice that the result of the decomposition will be a forest of decision trees rather than a single decision tree. For applications, some more advanced versions of the minimal template problem can be considered, such as the critical pattern problem. Such critical patterns are defined by adding one more requirement to the templates considered previously: any extension of the extracted pattern T also has properties analogous to those of the template T.
27.4 Templates Defined by Approximate Reducts with Respect to Decision Value Sets

In this section we consider an approach for constructing, from a given decision table, decision tables that are semi-optimal with respect to quality measures analogous to those defined in the previous section. Such tables are defined by subsets of condition attributes. This leads to decreasing the discernibility between objects; however, some unions of decision classes can still be approximated with high quality. These attribute subsets can be treated as (approximate) reducts, assuming that some grouping of decision values in the original decision table can be performed. Let A = (U, A, C, d) be a decision table and let Θ_d = d(U). We define a partial order on pairs (B, θ), where B ⊆ A and θ ⊆ Θ_d, by

(B, θ) ≤ (B', θ') iff B ⊆ B' and θ ⊆ θ'.    (27.4)
For any B ⊆ A, by A_B we denote the restriction of the decision table A = (U, A, C, d) to B, i.e., A_B = (U, B ∪ {d}, B, d). We consider the following problem:

Minimal Reduct Problem (MRP)
Input: decision table A; thresholds k, l, ε ∈ (0, 1), k < l, and a natural number n.
Output: minimal pairs (B, θ) such that B ⊆ A and θ ⊆ Θ_{A_B} with at most n elements, satisfying the following conditions:

|Pl_{A_B}(θ) - Bel_{A_B}(θ)| < ε    (27.5)
Pl_{A_B}(θ) > 1 - ε    (27.6)
k|U/IND(A)| < |U/IND(B)| < l|U/IND(A)|    (27.7)
The conditions (27.5)-(27.6) are analogous to those formulated for the previous problem. The requirement (27.7) states that the number of B-indiscernibility classes is reduced compared with the number of A-indiscernibility classes (it is less than l|U/IND(A)|) but is still significant (it is greater than k|U/IND(A)|). We use a genetic algorithm to generate semi-optimal solutions of the above problem. The discussed reducts can be used for the decomposition of larger data tables into smaller ones that are feasible for the existing rough set based algorithms.
27.5 Conclusion

In this paper it was suggested that the relationships between the rough set theory and the evidence theory can be used in finding minimal templates for a given decision table, also for a table with grouped decision classes. Extracting the templates from data is a problem that consists in finding certain reducts in a given decision table. Any such reduct is an attribute set with a minimal number of attributes which warrants, among others, that the difference between the belief function and the plausibility function is sufficiently small. The discussed reducts can be used in the decomposition of larger data tables into smaller ones tractable by the existing rough set algorithms. The last statement is a subject of our current research.

Acknowledgments. We wish to express our thanks to Professor Andrzej Skowron for his helpful comments during the preparation of this work.
References

1. Friedman, J.H., Hastie, T., and Tibshirani, R.: The elements of statistical learning: Data mining, inference, and prediction. Springer, Heidelberg 2001.
2. Pawlak, Z.: Rough Sets: Theoretical aspects of reasoning about data. Kluwer Academic Publishers, Dordrecht (1991).
3. Shafer, G.: A mathematical theory of evidence. Princeton University Press (1976).
4. Skowron, A. and Grzymala-Busse, J.: From the Rough Set Theory to the Evidence Theory. In: R.R. Yager, M. Fedrizzi, J. Kacprzyk (eds.), Advances in the Dempster-Shafer Theory of Evidence. New York: Wiley (1994) 193-236.
28 Programming Bounded Rationality Hans-Dieter Burkhard Institute of Informatics Humboldt University, D-10099 Berlin, Germany [email protected]
Summary. Research on Artificial Intelligence and Robotics helps to understand problems of rationality. Autonomous robots have to act and react in complex environments under the constraints of their bounded resources. Simple daily tasks are much more difficult to implement than playing chess. Soccer-playing robots are considered as a test field for rational behavior. Our implementations are inspired by the belief-desire-intention model.
28.1 Introduction

Control of autonomous robots and agents in dynamical environments is interesting from a cognitive point of view as well as from an application point of view. Different architectures have been inspired by cognitive issues as well as by technical requirements. It is commonly accepted that simple stimulus-response behavior as well as deliberative decisions are both useful for different requirements. Hybrid architectures are built to combine both. The underlying conflict between complex computations for efficient behavior and bounded resources is known as bounded rationality. In his book [2], Bratman has investigated a model based on the concepts of belief (what the agent supposes the world to be), desires (what the agent desires, but does not necessarily try to achieve), and intentions (what the agent intends to achieve and the actions he wants to perform for those achievements). There is a sophisticated process of refinement of intentions and plans which finally leads to actions. Implementations of this model are known as BDI architectures (BDI = belief, desire, intention, [7], [11]). Actually, the usage of the notions in the implementations is not unique. There were successful models in a multi-modal logical framework. BDI approaches are often identified with these models and their applications. In contrast, our approach uses the BDI concepts as the basis for data structures in object-oriented implementations. The aim is to implement and maintain a data structure which corresponds to the idea of intentions as hierarchical partial plans that are completed only when it is necessary. We especially address problems known as upwards and recognizant
failures: In hybrid architectures, certain failures are recognized at the lower levels but need handling on higher levels. As an example we use the soccer domain from the RoboCup initiative. The author would like to thank the previous and current members of the RoboCup teams "AT Humboldt" (Simulation League) and "German Team" (Sony Four Legged Robot League) for a lot of fruitful discussions. The paper could not have been written without their theoretical and practical work. The work is funded by the German Research Association (DFG) within the main research program 1125 "Cooperating teams of mobile robots in dynamic and competitive environments".
28.2 Robot Control in Dynamic Environments

Control of autonomous robots in dynamic environments is often considered from the viewpoint of moving vehicles with the need for fast short-term reactions (e.g. obstacle avoidance) but sufficient time for long-term planning. Layered architectures with simple "reactive" low-level behavior and complex high-level behavior are state of the art (cf. [1],[6]). Only the low-level reactions are considered time critical, while re-planning after the occurrence of unexpected events might take longer. The vehicle must perform an immediate stop if the road is suddenly closed, but then it might wait until a new path is planned. This need not be true in other scenarios. Soccer can serve as an example. During an offensive, the players of team A are oriented forward. Suddenly, when the opponents get control over the ball (e.g. after a missed pass by team A), team A should immediately switch to defense. What does that mean? Firstly, players should immediately stop running forward; they no longer need to run free. Up to this point, it is comparable to the situation of a vehicle which stops moving before an obstacle. Secondly, it means adopting defensive play immediately: instead of running free (trying to keep a distance from opponents), they have to mark (try to come closer to opponents and to attack them). There is no time for making new plans; the situation becomes different from the case of vehicles.

28.2.1 Soccer Robots: The RoboCup Initiative

The soccer scenario has proven to be very useful for the discussion of problems and for the evaluation of proposed solutions for autonomous mobile robots in dynamic environments. Such environments are characterized by fast changes; plans may be invalidated by unpredictable events. It is not possible to predict future events in general. Moreover, sensory information is incomplete and unreliable. Robots have to be aware that their skills may not be successful and that their plans may fail. One may theoretically think about a plan to play the ball via several players from the goal kick to the opponents' goal, but nobody would expect that plan to work. Note that there is a great difference to a chess program: It is easy to write a program for finding the ultimate best moves; it is "only" a question of complexity to run this program. But nobody is able to write a similar program for soccer-playing robots.
Different leagues of RoboCup [8], [4] are devoted to different aspects. Leagues with real robots investigate the problems of the most suitable hardware and its integrated usage (body materials, drives, legs, sensors, energy, control etc.). The simulation league uses a virtual environment, where software aspects can be studied long before the related hardware is available in practice. RoboCup Rescue investigates the problems of rescue robots. While soccer robots have to work totally autonomously without any outside control, the challenge of rescue robots is the coordination of humans and robots. Rescue robots are assistant robots. In case of a disaster, there will not be enough human experts to control several hundreds or thousands of robots. Moreover, it is questionable whether an outside human controller could receive and process all relevant data to make robots move safely on disrupted ground. But humans can give global advice, and if a robot finds a victim, then humans will have to decide about further steps. The vision of RoboCup is stated as follows: "By the year 2050, develop a team of fully autonomous humanoid robots that can win against the human world soccer champion team." The challenge behind this vision is twofold: The robots should act autonomously, and they must be accepted as opponents of a human team. This means that a gun is not allowed, that shape and speed must respect certain constraints, and a lot of further requirements. In fact, it is a vision: Nobody knows if the goal can be reached at all. But one should imagine the development from the first aircraft to the first man in space, or the development from the first computers to the success of Deep Blue in chess. RoboCup is not understood as simply a competition; the RoboCup community explores the strengths and limits of autonomous robots. Having in mind a long-term vision, new research challenges are discussed and fixed year by year. There is a common understanding that solutions and programs are disclosed for the use of all teams. This explains the huge progress of RoboCup robots since the start in 1997. It is not really important whether robots can eventually win against humans. What counts is the development of science and techniques by evaluation in competition (comparable to the development of motor cars and aircraft one hundred years ago), and the exchange of solutions. And it is a great amount of fun, not least for those young people who will see the outcomes in fifty years. They have their own competition: the games of RoboCup Junior [9].

28.2.2 Horizontal Modularization

Software programs for robot control have to deal with a lot of different aspects, ranging from scene interpretation through decision/planning/coordination to actor control. Many efforts and discussions concern appropriate structures and technologies of software design (cf. e.g. [1], [6], [10]). Natural cognition may serve as a model. There exist some commonly used concepts (belief, plan, goal, ...) borrowed from such models. This makes communication easier for system developers and programmers. But there are also objections against the usage of mental notions for machines. Control items for robots include:
• sensors and a perception unit to process inputs from the environment,
• behavior control (with different complexities ranging from simple stimulus-response behavior to long-term deliberative behavior),
• actors and basic action control to act in the environment (sometimes using direct feedback with sensors).
Communication capabilities are included in both the sensors and the actors. The operation according to this first "horizontal" modularization is referred to as the "sense-think-act" cycle: First the sensory inputs are interpreted, next decisions take place in order to determine appropriate actions, and then actions are performed. Simple control structures perform this cycle just as it is. More complex tasks need planning and coordination. The "think" step may maintain some memory to keep past commitments. Those commitments (goals, plans) may affect current decisions. The "sense" step may maintain another memory ("worldmodel") to keep older information for current use. The worldmodel is important if the environment is not totally observable at any time point. The soccer ball may be hidden by other players. Hence it is useful to have an idea of hidden objects by memorizing their former appearances. No memory is needed in chess: there the players have complete access to the actual state on the chess board. Besides the "horizontal" modularization according to sense-think-act, there may be different "vertical" layers according to complexity issues. The decision processes are often split into a fast reactive layer with short decision cycles, and a deliberative layer for long-term planning with longer cycles. Each level may have its own sense-think-act cycle. The synchronization of such complex layers is a real challenge for software technology.
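To make the distinction between the two memories concrete, here is a minimal sketch of a sense-think-act cycle with two persistent states, a worldmodel for belief and a store for commitments; the class and method names are illustrative assumptions, not part of any actual RoboCup code.

```python
class Agent:
    def __init__(self):
        self.worldmodel = {}     # belief state: past-directed memory of observations
        self.commitments = []    # commitment state: future-directed goals and plans

    def sense(self, observation):
        # Interpret the sensory input and merge it into the worldmodel,
        # keeping older information about objects that are currently hidden.
        self.worldmodel.update(observation)

    def think(self):
        # Decide on the next action; prior commitments constrain the choice,
        # so the agent does not re-deliberate from scratch in every cycle.
        if not self.commitments:
            self.commitments.append("position")
        return self.commitments[0]

    def act(self, action):
        print("executing", action)

    def cycle(self, observation):
        self.sense(observation)
        self.act(self.think())
```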
28.3 How to Implement a Double Pass?

28.3.1 Simple Skills

Soccer robots need basic skills like pass, score, dribble, intercept, position. Simple stimulus-response behavior is sufficient to some extent. Interception of a moving ball can serve for illustration. A very simple player runs in a straight line to the place where he sees the ball. While the ball is moving, he adjusts his direction every time he looks for the ball, and as a result he follows a curved path. A somewhat more skillful player anticipates the optimal point for interception and runs directly to this point. As discussed in [3], there are different possibilities for the anticipation of the optimal intercept point, i.e. for determining an optimal speed vector v(d, u) given the distance d and the speed u of the ball. Machine learning methods are broadly used in RoboCup, but one might also think of complex calculations using physical laws. Such an intercept is still realized as stimulus-response behavior in a cycle: observe the ball - determine the optimal intercept point - run to this point. Then there is another difficulty caused by noisy data and/or imprecise calculations: the calculated intercept point may vary for each observation. This leads to oscillating behavior when
the robot adapts strictly to the latest calculation. Fast adaptation may lead to worse results, while some "inertia" would be better. If stability must be enforced by software (if simple physical inertia is not sufficient), the program has to compare older results with newer ones and make appropriate interpolations. Therefore, a memory for older decisions is necessary.

28.3.2 Coordination

More complex problems are illustrated by basic coordination problems. The decision processes become more and more complex (and subject to further stability problems) as the time horizon is enlarged. Here are some examples of decision processes in the RoboCup scenario:
• A player decides if he can intercept the ball, i.e. if the ball is reachable for him. The decision process can use the procedures for computing v(d, u) from above. It can be extended to calculate the interception point p(d, u) and time t(d, u).
• A player decides if he can intercept the ball before any other player can do so. He has to compare his own chances with the interception times of the other players (e.g. using the methods to calculate p(d, u) and t(d, u) from the viewpoint of other players).
• A player has to decide on the chances of a pass. If he were to kick the ball with a fixed speed vector u, he could determine the player who can intercept first by the calculation method from above. Such calculations can be done for several hypothetical kick directions. After evaluating different kicks, the player can decide for a certain pass, or for some other action if there is no good chance for passing.
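A rough sketch of how such a pass evaluation could look, assuming a very crude interception model (straight-line ball motion, players running at constant speed); the geometry and helper names are simplifications chosen for illustration, not the decision procedure used in the actual teams.

```python
import math

def intercept_time(player_pos, player_speed, ball_pos, ball_vel, horizon=5.0, dt=0.1):
    """Earliest time at which the player can reach the (linearly extrapolated) ball."""
    t = dt
    while t <= horizon:
        bx = ball_pos[0] + ball_vel[0] * t
        by = ball_pos[1] + ball_vel[1] * t
        if math.hypot(bx - player_pos[0], by - player_pos[1]) <= player_speed * t:
            return t
        t += dt
    return math.inf

def evaluate_pass(kicker_pos, teammates, opponents, kick_dirs, kick_speed, player_speed):
    """Return the kick direction for which a team mate intercepts before every opponent."""
    best = None
    for ang in kick_dirs:
        vel = (kick_speed * math.cos(ang), kick_speed * math.sin(ang))
        t_own = min(intercept_time(p, player_speed, kicker_pos, vel) for p in teammates)
        t_opp = min(intercept_time(p, player_speed, kicker_pos, vel) for p in opponents)
        if t_own < t_opp and (best is None or t_own < best[1]):
            best = (ang, t_own)
    return best   # None means: no good chance for passing
```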
Up to this point, the methods are comparable to decision procedures as used in chess: try to imagine what happens for different alternatives and choose the best one. The projection into the future uses some kind of simulation of possible futures. Unfortunately, those decision procedures do not scale up in time. It is impossible to make an anticipation even for standard situations like a corner kick. On the other hand, corner kicks provide patterns of behavior which are worth using for robot control. A central problem is the incompleteness of the underlying "plans". It is not possible to set up a complete recipe for intended actions. The plans are usually not fully instantiated even during their execution. Therefore it is sometimes neglected that there is a plan at all. In any case, classical planning methods using search strategies are of limited use. As an example, we consider the course of actions during a double pass. We will use this example extensively in this paper. The course of actions is known only in principle at the beginning. The details will be known afterwards. At the end, the description from the viewpoint of the first player could contain items like this:
1. Dribbling from point p1 with speed v1 to point p2 between time t1 and time t2.
2. Kick with speed vector u1 at time t3. (Pass to team mate.)
3. Run from point p3 with speed v2 to point p4 between time t4 and time t5. (Run over the opponent.)
4. Run from point p5 with speed v3 to point p6 between time t6 and time t7. (Run for reposition.)
5. Run from point p7 with speed v4 to point p8 between time t8 and time t9. (Run for reposition.)
6. Intercept at time t10.
The parameters of this "script" cannot be defined in advance. For example, after the kick at time t3, only the parameters p1, v1, p2, t1, t2, u1, t3 are known, while the others are still undefined. Besides incompletely known physical constraints, they depend on the behavior of the team mate and the opponent. The question is how one can deal efficiently with such incomplete plans. How is it possible to implement behaviors like a double pass or other more complex standard situations if classical planning is not applicable? Only partial plans can be instantiated during deliberation (e.g. the scheme of the double pass), while the concrete parameters are determined by the principle of least commitment. Moreover, there is a hierarchy leading from "general" intentions to somewhat more "specific" ones. The general intention (play double pass) is broken down into specific intentions dribble, kick etc. The intention dribble is again composed of simpler actions (run, ball handling, ...) and so on. As time proceeds, the concrete lower-level intentions are completed, but there is no complete plan from the beginning. Actually, these requirements correspond to the analysis in [2], chapter 3.1: "Partiality and hierarchy combine with the inertia of plans to give many intentions and actions a hybrid character: at one and the same time, a new intention or action may be both deliberative in one respect and nondeliberative in another. An intention or action may be the immediate upshot of deliberation, and so deliberative. But that very deliberation may have taken as fixed a background of prior intentions and plans that are not up for reconsideration at the time of the deliberation. I may hold fixed my intention to earn a doctorate in philosophy while deliberating about what school to go on, what to write a thesis on, and so on. It is by way of such plans - plans that are partial, hierarchical, resist reconsideration, and eventually control conduct - that the connection between our deliberation and our action is systematically extended over time. The partiality of such plans is essential to their usefulness to us. But on the other side of the coin of partiality are the patterns of reasoning I have been emphasizing: reasoning from a prior intention to further, more specific intentions, or to further intentions concerning means or preliminary steps. In such reasoning we fill in plans in ways required for them successfully to guide our conduct." The implementation of processes for this kind of planning is a question of appropriate data structures. Such data structures must implement something like a "mental" state.
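A minimal sketch of such a partial "script" with parameters left open until they can be determined, following the principle of least commitment; the dictionary representation is an illustrative assumption.

```python
# Double-pass script from the viewpoint of the first player.
# Parameters that cannot be fixed in advance stay None and are filled in by need.
double_pass_role1 = [
    {"step": "dribble",    "frm": "p1", "to": "p2", "speed": "v1", "time": ("t1", "t2")},
    {"step": "kick",       "vector": "u1", "time": "t3"},            # pass to the team mate
    {"step": "run",        "frm": "p3", "to": None, "time": None},   # run over the opponent
    {"step": "reposition", "frm": None, "to": None, "time": None},
    {"step": "reposition", "frm": None, "to": None, "time": None},
    {"step": "intercept",  "time": None},                            # back pass, time unknown
]

def fill_parameters(script, index, **known):
    """Complete a step of the script once its parameters become knowable."""
    script[index].update(known)
```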
28.4 "Mental" States

28.4.1 Belief and Commitment

By common understanding, simple stimulus-response behavior does not use any state. Whenever the agent receives the same input, it reacts with the same behavior. A closer consideration may reveal some further underlying assumptions like determinism, appropriate correspondence between input and output, appropriate cycle rates etc. On the other hand, if the actions of the agent depend not only on the most recent input, then there must be a state for memorizing things that happened in the past. Such things may concern past observations and past decisions of the agent. The concept of state is used in different ways. Sometimes it simply means the different outcomes of a switch assignment. In this paper, the crucial point for talking about a ("persistent") state is the fact that its contents persist over a longer time period than only one sense-think-act cycle. According to their contents and impacts, we distinguish between belief states and commitment states. The belief state contains information concerning past and recent events in the environment; it is therefore called the "worldmodel". Belief can be understood as the overall notion for the information derived from the sensors. It contains the inferred model of the external world. Therefore it processes data from sensors (including communicated information). It may also contain information about internal facts (e.g. battery status) as derived by internal sensors. Usually, the agent does not have direct access by sensors to all relevant information. Therefore, the agent may use inferential methods based on general knowledge about the world. It may simulate the expected course of affairs to get some idea about the state of the world. Usually, the internal representation of the outside world may not be correct. Hence, this representation is called belief, and not knowledge. The inconsistencies may originate in unreliable data and misleading assumptions. While the usage of the concept of belief is similar in all approaches, the concepts related to commitments are used in different ways (corrupting their usefulness in discussions to a certain extent). Commitment states contain the information about goals, plans etc. This information originates from the internal decisions of the agent concerning desirable events and facts in the future. Bratman proposes a segmentation of the commitment state into desires, intentions, and plans. As cited above, intentions and plans are closely related: "Intentions are plans writ large." The more intentions are refined, the more they become steps of a plan to be performed. Intentions and plans describe and determine future actions that the agent has already committed to perform. In contrast, desires are pre-commitments for structuring the decision process. They are candidates for forthcoming intentions. Both belief and commitment states are used to memorize contents from the past into the present and the future. But the contents of the belief states are concerned with events that happened in the past, while the commitment states are concerned with events that should happen in the future. According to this distinction, belief
states are past-directed, commitment states are future-directed. Note that this differs from the use of past, present and future states in [6]. Interestingly, the simulation methods of the worldmodel used for extrapolating present facts from older ones are also useful for extrapolation into the future: The agent can imagine consequences of possible actions and events in the deliberation process. Therewith he can evaluate the results of possible plans and choose the best one.

28.4.2 The Option Tree

As discussed above, plans/intentions are hierarchically organized. The appropriate data structure is a tree. A double pass intention consists of the sub-intentions dribble, pass, run, reposition, intercept. Dribble consists of further sub-intentions and so on. Intentions are partially determined during execution: There are choices left for later commitment, e.g. how to perform dribbling. Only one of the existing alternatives will become an intention. Others will still remain desirable, but without commitment at that moment in time (they are in the status "desire"). Others may simply not be candidates for commitment. The underlying general concept (the domain for desires and intentions) is that of options. Options describe alternative courses of actions which may become intentions according to commitments. Options are hierarchically structured. Options are composed of sub-options: The double pass option consists of the sub-options dribble, pass, run, reposition, intercept. Options can be realized in different ways: The overall option play-soccer can be realized by the option offensive, or by the option defensive, or by some other useful option. We have two kinds of branches in the option tree:
• Choice-branches: They lead to different alternatives for performing the option at the top of the branch. A pass can be performed by a forward-kick, a sideward-kick, a bicycle-kick etc.
• Sequence-branches: They lead to a sequence of sub-options which must be performed to realize the option at the top of the branch (like the sub-options dribble, pass, run, reposition, intercept for a double pass).
The process of commitment to a complete intention means subsequent top-down commitments to a special sub-option at the related choice-branches. For example, the first part of a commitment might be offensive. Offensive can be realized in different ways. Therefore a next commitment is necessary, for example double pass. Then we have the sequential sub-options dribble, pass, run, reposition, intercept. Each of them can be realized in different ways. Commitments are necessary for the alternatives of each of these sub-options: for the style of dribbling, for a special kind of pass etc. The commitments need not be performed at the same time. Instead, they may be performed by need. But once all commitments have been made, we can talk about a completed intention. It has the form of a sub-tree in the option tree. Such a complete intention tree contains exactly one successor for each choice-branch
starting from the top of the tree, and it contains all sub-options for the sequence-branches in this tree. An example from the soccer domain is given in Figure 28.1. The numbers (e.g. in DoublePass/1) denote the role (here: the first player) in a cooperative plan.
[Figure 28.1 shows an option tree whose nodes include Offensive, Defensive, Score, DoublePass/1, DoublePass/2, ChangeWings/1, Attack, OffsideTrap, and Kick.]
Fig. 28.1. Option Tree with Intention Subtree for DoublePass/1

According to our discussion from above, the idea behind this description of intentions is the definition of a partial hierarchical plan for the activities. At any concrete time point, the robot has to act according to the current intentions. It is necessary that there is a process of completing the intention in time. A proposed implementation will be described below in Section 28.6. Intentions as described here provide the means for describing the commitment state of the robot. There are two aspects:
1. The options at the bottom of a choice-branch in the option tree are annotated according to their commitment status: Roughly speaking, annotations may be of the form "intended", "desired", "not desired". The options which are marked "intended" form the intention tree. Actually, the annotations in our implementation are numbers (e.g. according to expected utilities). Intentions are defined top-down according to the highest scores. Desires are defined according to the next highest scores.
2. The options at the bottom of the sequence-branches in the intention tree are annotated according to their execution status: Their annotations are of the form "active", "waiting", "done". There is exactly one active option at each level of the intention tree. The active options form a path from the top of the tree to a certain leaf determining the current action. This path is called the activation path. At the time when the first player passes to the second one, the activation path (cf. Figure 28.1) consists of PlaySoccer-Offensive-DoublePass/1-Pass- ... down to a concrete action (e.g. a kick command with specified power and direction). The
other options are marked "done" if they have already been executed, or "waiting" if they will be executed in the future. Further annotations are needed for the parameters: we call this the agenda. It contains the identification of players (e.g. which players are cooperating in the double pass) and parameters of actions (e.g. anticipated intercept points, kick directions). The specifications are postponed as far as possible. Altogether, the annotations in the option tree describe the mental state concerning the commitments of the robot. It is one uniform data structure over all levels of the hierarchy for all related purposes, no matter whether it concerns high-level or low-level control.
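A minimal sketch of how such an annotated option tree might be represented; the field names and the way scores encode the commitment status are illustrative assumptions, not the data structure actually used in the teams' programs.

```python
class Option:
    """Node of the option tree with commitment and execution annotations."""
    def __init__(self, name, branch_kind="choice"):
        self.name = name
        self.branch_kind = branch_kind   # kind of branch below: "choice" or "sequence"
        self.children = []
        self.score = 0.0                 # commitment annotation; the highest score marks the intention
        self.status = "waiting"          # execution annotation: "active", "waiting" or "done"
        self.agenda = {}                 # parameters: cooperating players, intercept points, kick data

    def add(self, child):
        self.children.append(child)
        return child

    def intended_child(self):
        """At a choice-branch, the child with the highest score is the intended one."""
        return max(self.children, key=lambda c: c.score) if self.children else None

    def activation_path(self):
        """Follow the 'active' options from this node down to the current leaf."""
        path, node = [self], self
        while node.children:
            active = next((c for c in node.children if c.status == "active"), None)
            if active is None:
                break
            path.append(active)
            node = active
        return path
```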
28.5 Redeliberation: When to Abandon a Double Pass?

28.5.1 Stability vs. Adaptation

Dynamic environments, incomplete and imprecise information, unreliable actors - there is no guarantee of success; plans may fail to reach their goals. The idea behind intentions is guidance of further deliberation, for reasons of complexity and stability of behavior. Hence intentions should be stable. But if there is enough evidence of failure, intentions should be cancelled to prevent fanaticism. But what are the right criteria? We have already mentioned the problem of oscillating behavior in the case of too-early adaptation. In the extreme case, the agent might not reach any goal if he always decides for another plan before completing any of them: Buridan's ass starves between two piles of hay. Usually both piles are considered to be equally attractive, but sensor noise might cause the attractiveness of the piles to oscillate. To be completely rational, the agent would have to decide not only on the evaluation of the single options. He should take into account the consequences and costs of changing plans, the possibility of later improvements etc. Especially for coordination with other agents, stability has a very large impact - and changing plans might result in a lot of overhead for negotiation and new commitments. As long as resources are bounded, attempts at stability may fail in certain cases, but will perform better on the whole over longer times. In natural systems, stability is often forced by physical constraints: inertia forces smooth movements; restricted communication forces agents to follow former agreements in cooperation. There are good reasons to keep old intentions, but it must be possible to cancel intentions if necessary. The program of the robot has to manage such critical situations. Consider the situation during a double pass after the first player has passed to his team mate. He runs forward over the opponent and trusts that his team mate will later pass back to him. Sometimes critical situations appear with the danger that the team mate will not reach the ball. The intention forces the player to keep running forward. But if there is enough evidence of failure, the intention has to be cancelled. Moreover, the new situation has to be managed by re-deliberation.
Re-deliberation is of course needed if the double pass definitely fails, i.e. if the team mate does not reach the ball, but an opponent does. In this case, our player should not continue the double pass actions by running forward. Instead he should switch immediately to defense. Instead of running away from the opponent, he must try to come close to him. Re-deliberation may also be desirable in the case of unexpectedly better alternatives. After intercepting the ball, the team mate might have the possibility of scoring because of a mistake by the opponent goalie. Then he should not pass back to the first player according to the double pass script. He should take the chance to score. It is a difficult task for the programmer to implement the two contradictory requirements:
1. efficiency of decision procedures to avoid time-consuming re-considerations as long as things evolve as expected, and at the same time
2. maintaining the possibility of changing behavior in the case of unexpected courses of events.
Often it is necessary not only to cancel an intention, but also to clean up further activities, similar to an interrupt in a runtime system.

28.5.2 Failing Upwards

The situation becomes especially difficult if a failure is first recognized at a lower level of control, but needs handling on a higher level. This situation is called failing upwards. We consider again the situation during a double pass when the second player loses the ball. Both players of the team are affected; both have to switch from offensive to defensive play. This means a switch - a re-deliberation - on a higher level of control. But there is an important difference:
• The second player, the expected receiver of the pass, cannot continue according to his double pass / role 2 intention. The failure becomes known to him immediately.
• The first player could still perform actions according to his double pass / role 1 intention; he could continue running forward. But there is no reason to do that anymore. It is the harder problem of "upwards recognition": It must be assured by some means that the failure is recognized on the higher level for the switch from offensive to defensive play.
A crucial point is the real-time constraints of an application. In soccer, the switch from offensive to defensive play should be performed immediately. This means the real-time constraints affect all levels of control (unlike the re-planning of a path for vehicle control).
28.6 How to Perform a Double Pass

The special requirements we have identified so far are:
• Need for possibly time-consuming long-term deliberation.
• Need for re-deliberation in case of upwards failures under real-time constraints.
There is no choice: Taking into account that deliberation may be time consuming, and that fast decisions for actions are necessary, we have to have two parallel processes (threads or something similar) as in hybrid architectures: one for deliberation, the other for fast actions. But in our approach, we use the complete data structure of the option tree for both processes. The hierarchy allows behaviors and plans to be described in a uniform way, ranging from single actions on the lowest level up to long-term plans on the highest levels. Partial plans are represented by only partially defined annotations, i.e. partially defined intention trees and partially defined parameters in the agenda. Both processes traverse the whole option tree (respectively intention tree). They add or modify the annotations of the trees according to long-term deliberation and to short-term decisions, respectively. Hence they are called passes. The passes perform different tasks: The Deliberator performs the time-consuming calculations for the choice of goals and for long-term planning. It sets up a partial hierarchical plan (intention tree). Following the least-commitment idea, the plan is refined as time goes on. Normally the deliberator does not have time problems since it works with sufficient lead time. Time-critical decisions are left to the executor. The Executor performs the immediate decisions. Based on the preparatory work of the deliberator, its decision space is restricted to the current intention. It has to determine the concrete parameters, e.g. for a kick, at the time when the kick is to be performed, using the newest sensor information. On the lower levels it controls behaviors similar to the reactive layers of hybrid control. But before doing that, it has to traverse the higher levels in order to check conditions and to perform decisions if necessary. It is important to realize that both processes are independently running passes through all levels of the hierarchy. This is in contrast to the runtime organization in layered architectures (where short-term decisions only affect the lowest level) and in imperative programming (where only the procedure on top of the stack is active).

28.6.1 Deliberator: When to Perform a Double Pass?

While the player runs to intercept the ball, he can already make guesses about the situation when he will have control over the ball: could there be a chance for scoring, would it be desirable to change the wings, is a partner for a pass available, and so on. He can set up possible (partial) plans for the future. They are called desires, not intentions, because there is no commitment to any of them. Which of them receives the status of an intention will be decided later. Note that different desires can be set up, possibly by different threads. They provide the possibility to be prepared for the handling of recognizant failures. Consider the players during the double pass. There may be an already prepared alternative plan P (a desire) for defensive play. If it unfortunately happens that the opponents get the ball, then the players cancel the intention double pass and can switch immediately
to the newly installed intention for defensive play according to the prepared plan P, without loss of time. The deliberator has to choose from all possibilities: it decides on the prior intention. Since there is no need to fill in all the details, deliberation can start very early, e.g. while a player is still running to intercept the ball. Of course, these evaluations are only temporary; thus they remain in the status of desires with different degrees of desirability. The degrees are expressed as annotations at the choice-branches in the option tree. They are updated as time goes on until the player gets control over the ball. Then the highest-ranked desire becomes the new intention. For example, this new intention is change wings. Note that until this time point, the executor was working on an older intention terminating with the intercept. Now, when the old intention is completed, the new intention for change wings provides a new scope for the executor. Meanwhile, the deliberator can already evaluate the options desirable after the player has finished his actions for the change wings intention. Actually, the choices and evaluations by the deliberator are very complex. In general, it has to process a very large search space. The search space is based on the known parameters of the ball and of the 22 players with respect to localization, speed and further conditions. Additional parameters like distances and guesses about opponent tactics can be derived. Not all parameters really influence a decision. The hierarchical structure helps to split the space into more manageable parts. Standard situations provide generic cases for cooperative play. Using methods from Case Based Reasoning (CBR, cf. [5]), a concrete situation can be matched to a standard situation. For example, a triggering feature for the double pass is an opponent in the way of an offensive player controlling the ball. The standard situation (the "case") provides a standard scheme ("solution") for an intention. Using CBR methods for adaptation, a concrete intention can be specified. The option hierarchy serves as a structure for describing cases.

28.6.2 Executor: How to Realize a Double Pass?

The executor has to guarantee that relevant checks and decisions are performed under real-time constraints. In complex environments, it is not possible to do the complete deliberation, or re-deliberation in the case of failure, in a short time. The restricted scope of the activation path in the intention tree is a means to minimize the work of the executor. There it can perform relevant steps on all levels of the hierarchy. Again, we consider the work of the executor during the double pass while the first player is running over the opponent. The condition "is offensive play still applicable" (i.e. does the team maintain control over the ball) is checked while traversing the high level concerning offensive or defensive play. For this condition, it does not matter what kind of offensive play (which sub-intention) is running. As long as the ball is controlled by the own team, the executor steps down to the next lower level. There, on some intermediate level, the player checks if his team mate has already passed the ball back to him. If not, he continues to run to a free space behind the opponent. Otherwise he changes to the next sub-option run for intercept. It proceeds this way
down to the basic level, following and modifying the activation path. Parameters of the agenda are calculated if necessary. In each cycle, the executor starts by traversing the higher levels in order to check conditions and to perform decisions if necessary. This permits an efficient handling of events that would otherwise cause "upwards failures". In our example, the condition "does the team still maintain control over the ball" is already checked on the high level. If the ball is lost, the executor can immediately stop further traversing of the double pass intention. Instead it can switch to an appropriate alternative desire for defensive play, which becomes the new intention. Hopefully, such an alternative has been prepared by the deliberator in time. But at least, there will be no waste of time for hopelessly continuing the double pass. Thus, failures are not checked at the low level and then reported to the higher levels. Instead, the executor checks for possible failures on the levels they belong to. The crucial point behind this is a problem of software technology. In a stack-oriented runtime system with control given to the last called procedure (the procedure on top of the runtime stack), we have two alternatives:
•
The last called procedure performs the necessary tests. In our example, there is a procedure active which performs running forward. This procedure has to test if the team mate still maintains control over the ball. There is some higher level procedure which performs the tests according to the impacts of the consequences. This is the procedure for performing the double pass, which has to be cancelled if the ball is lost.
The latter alternative is the conceptually correct choice. But in a stack-oriented architecture, the procedure becomes active only after the called subroutines have terminated. This can result in a considerable delay. If tests are performed and reported upwards by the low-level procedure run forward, this procedure will become overloaded by related tests for all higher-level options using it. Skills like running forward are used in different contexts. Tests of the relevant conditions for all these contexts would result in overload. Different run forward skills for different contexts would lead to repeated parts of code.
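To illustrate why the executor re-traverses the hierarchy top-down in every cycle instead of leaving all tests to the lowest-level procedure, here is a minimal sketch building on the hypothetical Option class from above; the placement of condition functions in the agenda is an assumption made for illustration.

```python
def executor_cycle(root, worldmodel):
    """One executor cycle: walk the activation path top-down, checking the condition
    that belongs to each level before acting on the current leaf."""
    path = root.activation_path()
    for option in path:
        check = option.agenda.get("condition")   # e.g. "team controls the ball" at the offensive level
        if check is not None and not check(worldmodel):
            option.status = "done"               # cancel this intention on its own level ...
            return "re-deliberate"               # ... and switch to a prepared alternative desire
    return path[-1].name                         # concrete action, e.g. a parametrized kick
```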
28.7 Conclusion

The problem of coordinated actions in robot soccer was considered from the viewpoint of bounded rationality. The proposed architecture ("Double Pass Architecture", because of its two passes for the deliberator and the executor) was inspired by the work of Bratman: Prior intentions set the screen for later decisions which lead to further intentions, and finally to actions. Intentions and plans are hierarchically structured and only partially defined. To maintain a related "mental" state for commitments, our architecture uses the data structure of the option tree. A commitment state is given by annotations of this tree. The annotations define the current intention
tree, the activation path in this tree, and the parameters in the agenda. Additional sub-trees may define desires as candidates for later intentions. Deliberator and executor are to some extent motivated by [2], chapter 3.3: "Practical reasoning, then, has two levels: prior intentions and plans pose problems and provide a filter on options that are potential solutions to those problems; desire-belief reasons enter as considerations to be weighed in deliberating between relevant and admissible options." There is still ongoing work on the suitable distribution of work between deliberator and executor. An experimental version is under development for the simulation league of RoboCup in the project "Architectures and Learning on the Base of Mental Models" of the research program 1125 "Cooperating teams of mobile robots in dynamic and competitive environments" granted by the German Research Association (DFG). An important feature is the uniform data structure and the handling of so-called upwards failures. Actually, they are not reported upwards anymore because they are already detected on the right level of responsibility. An often discussed practical problem concerns the question whether the explicit treatment of complex behaviors is necessary at all. Without explicit representation, a double pass may simply emerge from appropriately tuned simpler behaviors. In fact, such emergent behavior can sometimes be observed in RoboCup competitions. Emergence can occur by chance only, or by intentional fine-tuning of simpler behaviors by the programmer. At least in the latter case, the double pass is implicitly present in the program. But the question is how far this approach can scale up. Our approach is based on an explicit representation using the data structure of the option tree. For an outside observer it makes no difference whether the observed behavior is implicitly or explicitly implemented. We are back at an interesting cognitive question: Which "mental" states have a real representation in the mind, and which are only useful ascriptions of observed behavior?
References
1. Arkin R C (1998) Behavior Based Robotics. MIT Press
2. Bratman M E (1987) Intentions, Plans, and Practical Reason. Harvard University Press, Massachusetts
3. Burkhard H D (2002) Real Time Control for Autonomous Mobile Robots. Fundamenta Informaticae 51(3):251-270
4. Kitano H, Asada M, Kuniyoshi Y, Noda I, Osawa E, Matsubara H (1997) RoboCup: A challenge problem for AI. AI Magazine 18(1):73-85
5. Lenz M, Bartsch-Spörl B, Burkhard H D, Wess S (eds) (1998) Case Based Reasoning Technology. From Foundations to Applications. LNAI 1400, Springer
6. Murphy R R (2000) Introduction to AI Robotics. MIT Press
7. Rao A S, Georgeff M P (1991) Modeling agents within a BDI-architecture. In: Fikes R, Sandewall E (eds) Proc. of the 2nd Int. Conf. on Principles of Knowledge Representation and Reasoning (KR'91)
8. RoboCup. The Robot World Cup Initiative: www.robocup.org. Annual Proceedings of the RoboCup Workshops/Symposia appear in the Springer LNAI Series
9. RoboCupJunior website: www.robocupjunior.org
10. Weiß G (ed) (1999) Multiagent Systems. A Modern Approach to Distributed Artificial Intelligence. MIT Press
11. Wooldridge M (1999) Intelligent Agents. In: [10]: 27-78
29 Generalized Game Theory's Contribution to Multi-agent Modelling: Addressing Problems of Social Regulation, Social Order, and Effective Security

Tom R. Burns¹, Jose Castro Caldas², and Ewa Roszkowska³,⁴

¹ Uppsala Theory Circle, Department of Sociology, University of Uppsala, Box 624, 75126 Uppsala, Sweden. [email protected]
² Dinamia-ISCTE, Av. Das Forcas Armadas, Lisboa 1649-026, Portugal. [email protected]
³ Faculty of Economics, University of Bialystok, Warszawska 63, 15-062 Bialystok, Poland
⁴ Bialystok School of Economics, Choroszczanska 31, 15-732 Bialystok, Poland. erosz@w3cache.uwb.edu.pl
Summary. The paper is divided into three parts. In Section 29.1 of the paper, Generalized Game Theory (GGT) is outlined, and its applications in formalizing key social science concepts such as institutions, social relationships, roles, judgment, and games are presented. Institutions operate as a type of social algorithm, organizing and regulating agents playing different roles as they engage in deliberation and judgment activities and make and implement collective decisions. Section 29.2 of the paper will present simple multi-agent simulation models and selected results of the simulation. Section 29.3 will briefly outline an agenda for societal research based on the application of GGT to explaining and managing problems of insecurity and social disorder in multi-agent systems. In the GGT perspective, the problem of security can be formulated in terms of regulating a system and its agents, and dealing with social disorder and crisis.
29.1 An Outline of GGT: Game Structures and Game Processes

GGT can be characterized as a cultural institutional approach to game conceptualization and analysis. Social theory concepts such as norm, value, belief, role, social relationship, and institution, as well as game interaction structures, can be defined in a uniform way in terms of rules and rule complexes.
Given a concrete situation S_t in context t (time, space, social environment), a general game structure is represented as a particular rule complex G(t) ([5],[12]).¹ Informally speaking, a rule complex is a set consisting of rules and/or other rule complexes. The G(t) complex includes as subcomplexes of rules the players' social roles vis-a-vis one another along with relevant norms and other rules. Suppose that a group or collective I = {1, ..., m} of actors is involved in a game G(t). ROLE(i, t, G) denotes actor i's role complex in G(t) (we drop the "G" indexing of the role). ROLE(I, t) denotes a role configuration of all actors in I engaged in G(t). For every i = 1, ..., m, ROLE(i, t) is a subcomplex of ROLE(I, t) and the latter complex is a subcomplex of G(t), i.e.

ROLE(i, t) ⊆_g ROLE(I, t) ⊆_g G(t)    (29.1)
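A minimal sketch of how a rule complex and the subcomplex relation of (29.1) could be represented as a recursively nested container of rules; the string encoding of rules and the naive subcomplex test are illustrative assumptions, not the authors' formal construction.

```python
class RuleComplex:
    """A rule complex: a set-like container whose elements are rules (here plain strings)
    or other rule complexes."""
    def __init__(self, *elements):
        self.elements = list(elements)

    def rules(self):
        """All rule occurrences anywhere in the complex (flattened)."""
        out = []
        for e in self.elements:
            out.extend(e.rules() if isinstance(e, RuleComplex) else [e])
        return out

    def is_subcomplex_of(self, other):
        """Naive check: every rule occurring in self also occurs in other."""
        return all(r in other.rules() for r in self.rules())

# Illustrative role complexes for a two-player game G(t)
role_1 = RuleComplex("value: fairness", "norm: reciprocate", "act: propose")
role_2 = RuleComplex("value: fairness", "norm: reciprocate", "act: accept or reject")
G_t = RuleComplex(role_1, role_2, RuleComplex("rule of the game: alternate turns"))
assert role_1.is_subcomplex_of(G_t)   # ROLE(i, t) is a subcomplex of G(t), cf. (29.1)
```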
The game structure G(t) then consists of a configuration of two or more roles together with R, some general rules (and rule complexes) of the game:

G(t) = [ROLE1, ROLE2, ..., ROLEk; R].    (29.2)
R contains rules (or rule complexes) which describe and regulate the game, such as the "rules of the game", general norms, practical rules (for instance, initiation and stop rules in a procedure or algorithm) and meta-rules, indicating, for instance, how seriously or strictly the roles and rules of the game are to be implemented, and possibly rules specifying ways to adapt or to adjust the rule complexes to particular situations. G(t) also contains rules defining legitimate or appropriate players, their legitimate or appropriate options, their values, preferences, or utility functions, and their bases of judgment and action determination. The game structure G(t) is distinguished from the game process. The game process entails the participating agents applying and performing the appropriate rules and rule complexes of the game structure in a given time and context situation t. The game process entails "playing the game," that is, agents collecting information, making judgments, performing their roles, possibly innovating, and, in general, exercising "human agency." More specifically, a rule complex, let us say C ⊆_g R, contains

¹ In practice, a totality of rules can have a more complicated structure than merely a set of rules. The notion of rule complex was introduced as a generalization of a set of rules. Formally, a rule complex is a set obtained according to the following formation rules: Any finite set of rules is a rule complex. If C, D are rule complexes, then C ∪ D and p(C) are rule complexes. If C ⊆ D and D is a rule complex, then C is a rule complex. A rule complex C is a subcomplex of a rule complex D, C ⊆_g D, if C = D or C is obtained from D by deletion of some (or all) occurrences of rules in D or by removal of redundant parentheses. The rule complex is a major concept in GGT. The motivation behind the development of this concept has been to consider repertoires of rules in all their complexity, with complex interdependencies among the rules, and hence not merely to consider them as sets of rules. The organization of rules in rule complexes provides us with a powerful tool to investigate and describe various sorts of rules with respect to their functions, such as values, norms, judgment rules, prescriptive rules, and meta-rules, as well as more complex objects consisting of rules, such as roles, routines, algorithms, models of reality, social relationships, and games.
rules defining what is a bona fide player (as opposed to, for instance, an "outsider" or a non-human "agent" such as a "robot" or "nature"). In classical game terms, C would specify that all players are "rational" beings (with well-defined goals, consistent in their preferences, and exercising freedom of choice), etc. In real social life, however, there are typically social categories such as "gender" (male, female, or "mixtures") or "under age" (for instance, < 18) defining roles along with specified action opportunities and constraints. There are often rules in C defining or specifying the characteristics of people assigned to particular roles, for instance, persons of a particular social class, status, profession or occupation. We shall also see that persons in their roles may also be distinguished in terms of their level of commitment or performance (see the simulation studies later). An actor's role is specified in GGT in terms of a few basic cognitive and normative components, that is, rule subcomplexes (see Figure 29.1). Key role components are the following:
1. VALUE(i, t) represents actor i's value complex and provides inputs to generating evaluations and preferences through judgment processes. i's value complex in situation t consists of evaluative rules assigning value to relevant things, states of the world, deeds, and people in the situation, defining, among other things, what is "good", "bad", "right", "wrong", "acceptable", "unacceptable". The strength or resilience of a value reflects commitment, a type of meta-value (see Section 29.2). Rules in VALUE(i, t) may specify (or generate) preference relations among a set of objects X, where X may be the available strategies or the outcomes of interaction in a game matrix, or persons or classes of persons. And, of course, VALUE(i, t) may contain a well-defined "utility function."
2. MODEL(i, t) - This subcomplex represents actor i's belief structure about herself, others, and her environment as well as relevant conditions and constraints in the situation S_t. It contains beliefs (which are a type of rule) about important characteristics of the game situation and its context.
3. ACT(i, t) represents the repertoire of acts, strategies, routines, programs, and actions available to actor i in her particular role in the game situation S_t. These are acts that are obligated, allowed, or possible. Other actions may be allowed only under special circumstances, or may be discouraged or forbidden. Such a normative action complex includes general principles, guidelines, directives, and regulations indicating what to do and not do in the course of interacting with others in playing the game. It includes rules specifying a set of strategies (or principles for generating or discovering strategies) in actor i's role vis-a-vis others in game G(t).
4. J(i, t) - The judgment complex consists of rules which enable the agent i to come to conclusions about truth, validity, value, or choice of strategic action(s) in a given situation. In general, judgment is a process of operation on objects. The types of objects on which judgments can operate are: values, norms, beliefs, data, strategies, and outcomes. Also there are different kinds of outputs or conclusions of judgment operations, such as evaluations, beliefs, data,
procedures, or rule complexes. For our purposes here, we concentrate on action judgments and decisions.
Judgment is a core concept in GGT ([6],[7],[8],[10]). The major basis of judgment is a process of comparing and determining similarity. The capacity of actors to judge similarity or likeness (up to some threshold, which is specified by a meta-rule or norm of stringency) plays a major part in the construction, selection, and judgment of action. This is also the foundation for rule-following or rule-application activity.²

Definition 1. Given a player i with a judgment complex J(i, t) in situation S_t, i applies J(i, t) in operating on a vector of objects a = (a1, ..., an), where a1, ..., an ∈ U, and U is a universe of objects. This process we refer to as the judgment process. The objects, actor i, and situation t specify the context of judgment. The judgment process results in conclusions, which we denote as J(i, t)(a). For a given situation, we can simply write J(a). It is also possible that no rule or rule complex is produced, and we write J(a) = ∅.

29.1.1 The Principle of Action Determination: A Type of Judgment

In making their judgments and decisions about an action or plan B (or between A and B), players activate relevant or appropriate values, norms, and commitments. These are used in the assessment of options through a comparison-evaluation process. In determining or deciding on an action, a player compares and judges the similarity between an option B, or a pair of options A and B, and the appropriate, primary value or goal which the actor is oriented to realizing or achieving in the situation. More precisely, the actor tries to determine if a finite set of expected or predicted qualia or attributes of option B, Q(B), is sufficiently similar to the set of qualia Q(v) which the norm or value v (or a vector of values) prescribes.

² This relates to the Wittgensteinian problem of "following a rule" ([18],[19]). From the GGT perspective, "following a rule" (or rule complex) entails several phases and a sequence of judgments: in particular, activation and application together with relevant judgments such as those of appropriateness for a given situation or judgments of applicability. To apply a rule (or rule complex), one has to know (1) the conditions under which the application is possible or allowed and (2) the particular conditions of execution or application of the rule (in part, whether other rules may have to be applied earlier). The application of a rule (or rule complex) is not then simply a straightforward matter of "following" and "implementing" it: the conditions of execution may be problematic; the situation (or situational data) may not fit or be fully coherent with respect to the rule (or rule complexes); actors may reject or refuse to seriously implement a rule (or rule complex); a rule (or rule complex) may be incompatible or inconsistent with another rule that is to be applied in the situation. In general, actors may experience ambiguity, contradiction, dilemmas, and predicaments in connection with "following a rule", making for a problematic situation and possibly the unwillingness or inability to "follow a particular rule."
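A rough sketch of such a similarity judgment, comparing the expected qualia of an option with the qualia prescribed by a value, with a threshold playing the part of the norm of stringency; the attribute encoding and the similarity measure are illustrative assumptions (the principle of action determination below formalizes the underlying idea).

```python
def judgment(q_option, q_value, stringency=0.8):
    """J(i,t)(Q(B), Q(v)): judge whether the expected qualia of option B are
    sufficiently similar to the qualia prescribed by value v."""
    if not q_value:
        return True
    shared = sum(1 for key, val in q_value.items() if q_option.get(key) == val)
    return shared / len(q_value) >= stringency

def determine_action(options, q_value, stringency=0.8):
    """Select an option whose qualia are judged sufficiently similar to Q(v)."""
    for name, q_option in options.items():
        if judgment(q_option, q_value, stringency):
            return name
    return None   # no option judged sufficiently similar; the actor must look further

# Illustrative use: a value prescribing the qualia of an acceptable outcome
value_qualia = {"keeps_agreement": True, "share": "equal"}
options = {
    "propose_equal_split":  {"keeps_agreement": True, "share": "equal"},
    "propose_greedy_split": {"keeps_agreement": True, "share": "unequal"},
}
print(determine_action(options, value_qualia))   # -> "propose_equal_split"
```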
The principle of action determination states: Given an actor i in ROLE(i,t), which entails the value v (or a vector of values) in VALUE(i,t) specifying the qualia and standards Q(v) that are to be focused on and realized in role decisions and performances in interaction situation G(t), actor i in G(t), guided by value v (or the vector of values), tries to construct, or to find and select, an action pattern or option B, where B is characterized by qualia or properties Q(B) which satisfy the following rough or approximate equation:³

J(i,t)[Q(B), Q(v)] = sufficiently similar        (29.3)
that is, the conclusion of the judgment process is that Q(B) is judged sufficiently similar⁴ to Q(v).

³ Elsewhere we ([10],[16]) have elaborated this model using a fuzzy set conceptualization. The general formulation of this equation relates to the notion of "satisficing" introduced by Simon [17].

⁴ In a game with given action alternatives for each of the players, a judgment procedure such as optimization may be used under some conditions. Actors proceed to compare and determine the similarity or goodness of fit between the value indications Q(v) and the expected consequences Q(.) of the different action alternatives. The actor chooses - and tries to produce - those actions which minimize the difference between the anticipated consequences of the actions and the prescribed or indicated consequences of the value v. This is following or applying a value in another sense than that of constructing an action on the basis of a value. More precisely, for two alternatives, A and B, the consequences of each alternative, Q(A) and Q(B), are compared to the consequences prescribed by value v, Q(v). Consider that the actors can compare the results of their judgments by means of a relation <. Suppose that the results of actor 1's judgments of the two options A and B are d1(A) and d1(B), that is, the degrees to which A and B realize or satisfy v, respectively. Moreover, it is judged that d1(A) < d1(B), that is, that action B better realizes v than action A. Finally, the actor chooses that action which is maximal with respect to <, namely, in this simple example, B.

29.1.2 Game Processes

Interaction or games taking place under well-defined conditions entail the application and implementation of relevant rule complexes. This is usually not a mechanical process. Actors conduct situational analyses; they find that rules have to be interpreted, filled in, and adapted to the specific circumstances (see footnote 2). Some interaction processes may be interrupted or blocked because of application problems: contradictions among rules, situational constraints or barriers, social pressures from actors within G(t), and also pressures originating from outside the game situation, that is, the larger context (see later). In general, then, human agents do not only apply relevant values and norms specified in their roles. They bring to their roles values and norms from other relationships. For instance, their roles as parents may come into play and affect performance in work roles (or vice versa). They also develop personal "interests" in the course of playing their roles, and these may violate the spirit if not the letter of norms and values defining appropriate role behavior. More extremely, they may reject compliance and willfully deviate, for reasons of particular interests
or even ideals (as modeled in our simulation studies). Finally, agents may misinterpret, mis-analyze, and, in general, make mistakes in applying and performing rules. In sum, role behavior is not fully predictable or reliable. We investigate and model multi-agent social systems in which the agents have different roles and role relationships. Most modern social systems of interest can be characterized in this way. That is, there are already pre-existing institutional arrangements or social structures (see Figure 29.1). How are games played? Two or more roles (1, 2, 3, ..., m) in relation to one another generate interaction patterns, outcomes, and developments. Consider, for simplicity's sake, a two-role model:
[Figure 29.1 is a schematic showing a social agent (e.g., AGENT B with MODEL(2,t), ROLE2, J(2,t), VALUE(2,t), and ACT(2,t)) embedded in CULTURE/INSTITUTIONAL ARRANGEMENTS and in PHYSICAL ECOSYSTEM STRUCTURES (time, space, and other conditions), producing INTERACTIONS AND OUTCOMES.]
Fig. 29.1. Two role model of interaction embedded in Cultural-Institutional and Natural context

Classical games are special cases, namely closed games with given players as well as given action alternatives and outcomes. The players have anomic type social relationships and operate with specified, fixed value complexes or preferences orienting them to self-interested pursuit of their own interests or values [8]. Such closed game situations, with specified players and given value and judgment complexes (for instance, maxmin or another optimization procedure; see footnote 4) as well as given alternatives and outcomes, are analytically distinguishable from open game situations.
Open and closed games are distinguished more precisely in terms of the properties of the action complex, ACT(I,t,G), for the group of players I at time t in game G(t) (see [8]). In closed game conditions, ACT(i,t,G) is specified and invariant for each actor i in I, situation S_t, and game G(t). Such closure is characteristic of classical games (as well as parlour games), whereas most real human games and interaction processes are open. In open games, the actors participating in G(t) construct or "fill in" ACT(I,t,G) in the course of non-routine interactions. Also, in such open games, actors may change values (including changes in preference structures and utility functions), models, and judgment complexes. For instance, in a bargaining process, the actors may alter their strategies or introduce new strategies - or develop particular feelings and undergo shifts in their values and judgment complexes - during the course of their negotiations. In such bargaining processes, the particular social relationships among the actors involved - whether relations of anomie, rivalry, or solidarity - guide the construction of options and the patterning of interaction and outcomes. In general, for each actor i in I, her repertoire of actions, ACT(i,t,G), is constructed by her (and possibly others) in the course of her interactions. She tends to do this in accordance with the norms and values relevant to her role at t. In open game situations, actors may construct and elaborate strategies and outcomes in the course of interaction, for instance in the case of a bargaining game in market exchange ([8],[16]). There is a socially constructed "bargaining space" (settlement possibilities) varying as a function of the particular social relationship in the context of which the bargaining interactions take place. The relationship - the particular social rules and expectations associated with the relationship - makes for greater or lesser deception and communicative distortion, greater or lesser transaction costs, and greater or lesser likelihood of successful bargaining. The difficulties - and transaction costs - of reaching a settlement are greatest for pure rivals. They would be more likely to risk missing a settlement than pragmatic "egoists." This is because rivals tend to suppress the potential cooperative features of the game situation in favor of pursuing their rivalry. Pure "egoists" are more likely to effectively resolve some of the collective action dilemmas in the bargaining setting. Friends may exclude bargaining altogether as a precaution against undermining their friendship. Or, if they do choose to conduct business together, their tendencies to self-sacrifice may make for bargaining difficulties and increased transaction costs in reaching a settlement [8].

Let us consider the role relationship {ROLE(1), ROLE(2), R} of actors 1 and 2, respectively, in their positions in an institutional arrangement in which they play a game G(t). Such role relationships typically consist of shared as well as interlocked rule complexes. A shared rule or rule complex refers to the condition that it belongs to both role complexes, ROLE(1) and ROLE(2).
The concept of interlocked complementary rule complexes means that a rule in one actor's role complex concerning his or her behavior toward the other has a corresponding rule in the other actor's complex. For instance, in the case of a superordinate-subordinate role relationship, a rule k in ROLE(1) specifies that actor 1 has the right to ask 2 certain
questions, to make particular evaluations, and to direct actions and sanction 2. In 2's complex there is a rule or rule complex m, obligating 2 to recognize and respond appropriately to actor 1 asking questions, making particular evaluations, directing certain actions, and sanctioning actor 2.

Human action is multi-dimensional and open to differing interpretations and judgment processes. The focus may be, for instance, on: (i) the outcomes of the action ("consequentialism" or "instrumental rationality"); (ii) the adherence to a norm or law prescribing particular action(s) ("duty theory"); (iii) the emotional qualities of the action ("feel good theory"); (iv) the expressive qualities of the action (action oriented to communication and the reaction of others, as in "dramaturgy"); (v) or combinations of these. Role incumbents exercise their agency and focus on specific qualia in particular contexts because, among other reasons, (1) such behavior is prescribed by their roles, (2) such behavior is institutionalized in the form of routines, or (3) the actors have no time or computational capability to deal with other qualia. Thus, games may be played out in different ways, as actors operate within constraints, determine their choices and actions, and, in general, exercise their agency ([10],[16]):

• routine interactions. The actors utilize habitual modalities (bureaucratic routines, standard operating procedures (s.o.p.'s), etc.) in their interaction.
• consequentialist-oriented interactions. Actors pay attention to the outcomes of their actions, applying values in determining their choices and behavior on the basis of outcomes realizing values.
• normativist-oriented interactions. Actors pay attention to, and judge on the basis of norms, the qualities or attributes of action and interaction, applying role-specific norms in determining what are right and proper actions.
• emotional interactions.
• symbolic communication and rituals.
Or, there may be some combination of these, including mixtures such as agents oriented to outcomes interacting with others oriented to qualities of the action. Or, someone following a routine interacts with another agent who operates according to a "feel good" principle. For our purposes here, we focus on the first three patterns.

(1) Routine interactions (including standard operating procedures (s.o.p.'s), rituals, etc.). In general, the actors operate fixed (without modifiability) interlocked algorithms in their interaction vis-a-vis one another: ALG1 ⊆g ROLE(1,t), ALG2 ⊆g ROLE(2,t).⁵

⁵ Algorithms are characterized as rule complexes, i.e. ALG1 = {r_11, r_12, ..., r_1m} and ALG2 = {r_21, r_22, ..., r_2n}, where each r_1i ∈ ALG1 and r_2j ∈ ALG2 is a rule or complex of rules. Interlocked algorithms means that the conclusions of at least one subset (subcomplex) of each algorithm provide the premises for at least one subset (subcomplex) of the other. Thus, a failure of one or more rules in a subset (subcomplex) to be activated means a failure in the overall procedure.

They have either a common rule r_0 ∈ R or rules with common premises, but with conclusions that are different for each of them. Under particular conditions (specified by the premises), the rule or rules initiate the performance of ALG1 and ALG2 for actors 1 and 2, respectively. These are interlocking,
meaning that realization in performance of the conclusions of some rule r_1i (or rule complex) in ALG1 provides the premises for a rule (or rule complex) r_2j in ALG2. In a certain sense the two algorithms are sub-routines making up a role-relationship algorithm, ALG = {ALG1, ALG2}, that is, a more global algorithm which is at the same time a subcomplex of the complex {ROLE(1), ROLE(2), R}. The sub-routines in an operative algorithm ALG characterized by process equilibrium are consistent. Consistency means, in a GGT practical perspective, that the implementation of rule(s) in ALG1 in its (their) proper order does not interfere with the performance of rule(s) in ALG2. Typically, there will be rules for turn-taking and other rules indicating which actor takes the lead under certain conditions, etc. (see [4], concerning interactions between superordinates and subordinates). As the actors enact the interlocked algorithms, certain steps (applications of particular designated rules) are followed.

Routine interaction equilibria. The interaction will be a process equilibrium if all the local conditions for applying and executing the algorithms are satisfied and the interlocking algorithms are consistent. Note that the agents in the purest case of routine interaction do not pay attention to results and do not make explicit value judgments, as in the cases of consequentialist and normativist oriented action and interaction. They attend to following the rules of the algorithm. Blockage of performance of the global algorithm and disequilibrium will result whenever: (1) the two sub-algorithms contain at least one contradiction, such that the conclusions of a rule in ALG1 contradict the premises of a rule to be executed subsequently in ALG2;⁶ (2) one or both actors make mistakes or mis-perform, so that the conclusions of a rule in ALG1 contradict the premises of a rule to be performed subsequently in ALG2; (3) in the context of performance, not all the situational or local conditions for applying and executing rules in the procedures are satisfied. Routine interaction is disrupted or blocked by unexpected conditions, or more generally chaos, so that certain situational conditions necessary for the performance of the interlocked role algorithms are not met.

In the following two cases, action determination takes place according to the application of values in value judgments - this according to the principle of action determination (equation 29.3). The role relationship is characterized by quasi-algorithms (that is, not complete) and rule complexes. The value-directed judgment processes construct and/or select among available alternatives, programs, and rule complexes in general. In other words, the actors check to see if their values are realized in actions they undertake vis-a-vis one another. In their "strategic" or instrumental actions and interactions, the focus of the actors is on the consequences (con) of actions and interactions, that is, v specifies Q(v) which Q(con(A)) is to satisfy. In the case of normativist (that is, "non-consequentialist") action, the actors focus on the properties (pro) of actions and interactions themselves, that is, v specifies Q(v) which Q(pro(A)) is to realize. Cognitively and evaluatively, these are substantially different modalities of action determination.
⁶ A weaker version is that they fail to satisfy the premises of a rule to be performed subsequently in ALG2 (that is, the conclusions of a later rule may contradict earlier rules, but this is not a problem when actors automatically and locally perform in the way indicated here).
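As a toy operational reading of routine interaction, the sketch below is a hypothetical illustration (the Rule structure, the fact labels, and the single turn-taking order are our assumptions, not the authors' formalism): each performed rule adds its conclusions to a shared set of facts, and the routine counts as a process equilibrium only if every rule finds its premises satisfied when its turn comes; a missing premise corresponds to the kind of blockage discussed above.

```python
# Hypothetical sketch of interlocked algorithms ALG1 and ALG2: conclusions of
# one actor's rules become facts that may serve as premises of the other's.
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    actor: int
    premises: frozenset     # facts that must already hold
    conclusions: frozenset  # facts produced when the rule is performed

def run_routine_interaction(turn_order, initial_facts):
    """Perform rules in turn-taking order; return (process_equilibrium, facts)."""
    facts = set(initial_facts)
    for rule in turn_order:
        if not rule.premises <= facts:   # a local condition fails
            return False, facts          # blockage -> disequilibrium
        facts |= rule.conclusions
    return True, facts                   # every step performed: process equilibrium

# Assumed superordinate/subordinate routine with turn-taking built into the order.
alg1 = [Rule(1, frozenset({"start"}), frozenset({"question asked"})),
        Rule(1, frozenset({"answer given"}), frozenset({"evaluation made"}))]
alg2 = [Rule(2, frozenset({"question asked"}), frozenset({"answer given"}))]

ok, state = run_routine_interaction([alg1[0], alg2[0], alg1[1]], {"start"})
print(ok)  # True: each rule's conclusions supply the premises of the next
```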
(2) Consequentialist-oriented interactions. Given the role complex of actors 1 and 2, {ROLE(1), ROLE(2), R}, the actors orient to trying to realize role-specified values in the outcomes or payoffs of action(s) under consideration. The actors focus on dimensions and qualia of action outcomes, "states of the world", or "payoffs" associated either with a constructed action or with a set of action alternatives under consideration in a choice situation; in a word, v specifies Q(v) which Q(con(A)) is to satisfy. One form of such a mode of action determination is found in classical game theory, entailing the players of the game attempting to maximize or optimise a result or outcome for self. In this case, the actors are assumed to be self-interested, autonomous agents. Related forms of interaction have been investigated by Burns (see [2]) and in [8],[10],[16]. These entail, among others, (1) variation in the goals of the actors: actors may be oriented strategically to one another, for instance in trying to help one another ("other-orientation"), or they may be oriented to joint or collectively beneficial outcomes; (2) the actors orient in their game situation to multiple values, which often results in dilemmas, or divergent or contradictory conclusions about what to do (see [8]). Typically, their action judgment process will involve the use of procedures such as weighting schemes, lexicographic orderings, and other methods in order to resolve dilemmas. The value orientations, the relationship between the actors' values, and their modalities of action judgment will depend on the type of social role relationship between the actors. For instance:

1. Solidary actors expect one another to determine action(s) which realize a collective value or mutual satisfaction, "a just division", etc. These are the appropriate values which would appear in the action determination principle and which the actors would try to realize in constructing and/or selecting actions in the course of their interactions.

2. Competitive or rivalry relationship. Each actor is motivated (operating with value orientations) to try to construct or find strategies that give results better for self than for the other ("relative gain"). The methods of classical game theory as well as other methods may be used for constructing or selecting actions that would give desired results.

3. Relationship of indifference (strictly speaking, this is not a role relationship). The actors only concern themselves with the best result for self, ignoring the other agent. Depending on situational conditions, the agents may be motivated to cooperate (for instance, when payoffs are convergent for the actors in the situation) or to compete (when payoffs are divergent in the situation).

Consequentialist-oriented equilibria. In classical game theory, autonomous agents concern themselves with their particular self-interest. One major result is the Nash equilibrium, from which no actor in the game can improve his or her individual situation by choosing an action or outcome differing from the equilibrium. GGT considers another variant, more sophisticated (and realistic) than either preference ordering or maximization of utility. Each actor engaged in a game has, in the context of their particular role relationship, a value complex defining a minimally acceptable level
as well as an ideal or maximum goal to aim for. So, an outcome that satisfies the minimum for each actor would be a type of equilibrium. However, those who had hoped for a better result are likely to be disappointed and partially dissatisfied and are inclined to search for other possibilities. So the equilibrium under such conditions is an unstable one. The more that the agents realize their more ideal expectations, the more stable the equilibrium. For instance, this is true of negotiated contracts and prices on a market. Related work ([8],[10],[16]) shows that there are multiple equilibria, and that the social relationships among the actors (for example, relations of solidarity, competition, or enmity) determine the particular equilibrium (or equilibria), as well as the lack of equilibria, that obtain in a particular game such as a prisoners' dilemma game. Actors with a competitive relationship, where each tries to outdo the other, will lack an equilibrium in the strict case. In consequentialist-oriented interaction, the actors cannot determine an equilibrium if they fail to obtain or to be able to use information about outcomes or information about the connection between actions and outcomes. Also, disequilibrium results if actors' expectations or predictions about outcomes satisfying values fail to materialize. A narrow focus on outcomes - ignoring the qualities, including ethical qualities, of action and interaction - implies that actors behave as if "the ends justify the means." On the one hand, this greatly simplifies judgmental computations. On the other hand, once actors consider the qualities of actions - and, more generally, once they are motivated by and take into account multiple values - they are likely to be faced with dilemmas and tendencies to vacillating behavior [8].

(3) Normativist-oriented interactions. In a context t, the rule complex {ROLE(1,t), ROLE(2,t), R} specifies how actors in their roles should act vis-a-vis one another. The actors pay attention to such qualities of the interaction as, for instance, "cooperation", "taking one another into account", or "fair play". These determinations entail a comparison-judgment of an action or actions focusing on its (their) qualities that satisfy or realize one or more norms referring to the intrinsic properties of the actions. That is, v specifies Q(v) which Q(pro(A)) is to satisfy. Again, actors in solidary relationships focus on producing actions and interactions that are defined as "cooperative", as "solidary", as "fair play", etc. Rivals would focus on producing competitive-like activities.

Normativist equilibria. Such equilibria obtain when the actors are able to satisfy sufficiently the value(s), that is, norm(s), which apply to their actions and interaction. Of course, if the satisfaction level is minimal, the equilibrium may be an unstable one, because one or more actors would be inclined to try to improve performance of a given action scheme or to construct another scheme. Stability would, of course, obtain if they judge other schemes to be unavailable. However, under such conditions of dissatisfaction, the equilibrium would be an unstable one [10]. Obviously, disequilibrium obtains if conditions in the situation make it impossible to perform the appropriate actions correctly. In addition, mistakes may be made so that the actual performances do not satisfy the norm.
The situation would also be problematic if the actors cannot obtain sufficient information either to enact the norm or to judge the qualities of their actions (for instance, Hamlet is uncertain if
he should apply the norm of revenge, revenging his father's death, since he is not sure if the uncle (with his mother) murdered the father). A narrow focus only on the intrinsic properties of action - ignoring the consequences - is highly problematic in general. That is, the actors consider action as "right" regardless of outcomes. Those who are in competitive social relationships generate actions that have the qualities of "competitiveness", "one-upmanship", etc. "Equilibrium" patterns would entail the actors generating activities satisfying the norms of competitive action. Note that there are equilibrium interactions among rivals when the focus is on the qualities of the action rather than on the outcomes (where equilibrium is not possible in a strict sense, as pointed out earlier).

Several remarks are in order about the different judgment modalities for determining action: (1) Interaction processes may be characterized by some combination of routine, outcome-oriented, and normatively oriented determinations, for instance combinations of instrumental or strategic calculation with normative considerations. However, as indicated earlier, this requires particular judgment procedures to resolve value dilemmas or conflicts when they arise in the course of interaction. (2) Actors who interact according to a common rule regime or complex are aware that they are playing a particular game together - that is, with given rules, to which the players orient (but which they may adhere to or comply with to varying degrees). G(t) is not just any collection of rules whatever. Of course, the actors may disagree (or try to deceive one another) about what the actual rules are. (3) The GGT conception of a game does not contain a procedure to "solve the game." In other words, the concept of a game structure is not a means or a procedure to solve it, for instance to resolve problems of coordination or conflict which it may entail. The players of a game have, of course, procedures to determine action and, possibly, also to resolve conflicts [9]. (4) In the GGT framework, rules are distinguished from the performance of rules, namely the process of applying and implementing rules in concrete activities (in time and space). Among these activities is not only the performance of particular action rules such as norms and prescribed procedures but also adaptation to interaction situations and conditions. (5) In the GGT perspective, the results of a performed game are: (a) patterns of interaction which are largely predictable, within rough limits, on the basis of the game complex G(t), the actors' roles in the game, and situational conditions as well as the larger context (see Figure 29.1); (b) some outcomes will be equilibria, others not. When one or another outcome is a normative equilibrium (satisfying or realizing common norms or values), then it is likely to be a stable, enduring result. Otherwise, one or more of the players (or even external players) may challenge it. Interaction is not mechanical; (c) role performances do not necessarily result in equilibria. Certain game contexts - and configurations of rule complexes - have a greater likelihood of ending in stable results or outcomes than others.

In sum, the game structure G(t) in GGT is a rule complex whose subcomplexes are the roles that different agents play vis-a-vis one another, and the roles are made up of subcomplexes representing key behavioral functions:

{MODEL(i,t), VALUE(i,t), ACT(i,t), J(i,t)} ⊆g ROLE(i,t) ⊆g ROLE(I,t) ⊆g G(t)        (29.4)
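Equation (29.4) can also be read as a simple nesting of containers. The Python sketch below uses dictionaries as a stand-in for rule complexes; the concrete keys and default entries are illustrative assumptions rather than part of the GGT formalism.

```python
# Illustrative nesting corresponding to (29.4): {MODEL, VALUE, ACT, J} inside
# ROLE(i,t), inside ROLE(I,t), inside the game complex G(t).

def make_role(i, t):
    return {
        "MODEL": {"situation_model": f"actor {i}'s model of G({t})"},
        "VALUE": {"norms": [], "commitments": []},
        "ACT":   {"repertoire": []},      # may be filled in during open games
        "J":     {"stringency": 0.5},     # assumed judgment parameter
    }

def make_game(players, t, shared_rules=None):
    return {"t": t,
            "R": shared_rules or {},
            "ROLE(I,t)": {i: make_role(i, t) for i in players}}

G = make_game(players=[1, 2], t=0)
print(sorted(G["ROLE(I,t)"][1]))   # ['ACT', 'J', 'MODEL', 'VALUE']
```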
GGT treats games, roles and role relationships, judgment, and interaction processes as socially embedded, as indicated in Figure 29.1.
29.2 Applications: Simple Simulation Models of Social Regulation

Formulating multi-agent game models in the GGT framework provides new insights and policy options. Typically, however, analytic results can only be obtained in cases of highly simplified and stylised situations, for instance 2- or 3-agent closed games. For dealing with greater complexity - as well as for exploring the consequences of any given proposal for institutional change - simulation is an essential tool [11]. A variety of situations of social regulation of interest to scientists and policymakers may be tackled within this framework. Our multi-agent models start with an institution or institutional arrangement, that is, a specified rule regime that is applied to a population of agents in a particular institutional situation or domain. The regime defines the roles of the agents, their access to resources, and their rights and obligations with respect to such key actions as complying with norms or deviating from them, monitoring, judging, and sanctioning. A simple application related to social regulation and the maintenance of order will be presented briefly in the form of a simulation model that provides a framework for further extensions. The model was implemented with REPAST.⁷

The basic game situation involves a type of collective action game, where participants choose to contribute or not to contribute to a collective good. This might involve making a contribution or adhering to a social norm benefiting each and all but entailing sacrifices or "costs" to individual agents. There may be some degree of social regulation and control. Two general types of regulation are of interest: (1) in one, the population of agents (I), citizens, interact and influence one another within local network relationships; they are then subject to neighborhood or community regulation, not specialized, centralized regulation; (2) in the other situation, there is a specialized, central regulator A who tries to enforce in the population of agents (I) adherence to a norm (which may be an official policy or law). For this limited paper, we report only on the results of simulation runs with the first type of model.

The Basic Model: Diffuse Regulative Processes (that is, self- or organic regulation)

The model involves the following constitutive elements: a world or interaction domain and a population of citizens/neighbors who differ in their levels of commitment to the social norm and in their communicative capabilities and who act locally as "neighbors" vis-a-vis one another. We start with a population of citizens or neighbors who have defined roles in relation to one another. In GGT we define the value orientations, cognitive models, action and judgment complexes of the population or of various sub-populations of the population.

⁷ The programs for the basic model are found at http://www.soc.uu.se/research/utc/download/
The world is a 2-D grid (a torus) inhabited by agents (one in each cell). In each time unit of the simulation, agents engage in collective action (modeled as a game of public good provision). On each iteration, as many public good games as there are cells in the grid are played. The neighborhoods playing the game in each iteration are randomly selected (one neighborhood may play the game more than once in one iteration). A neighborhood is a square of eight cells. In the public good game each agent has the choice of contributing or not contributing (a 0-1 choice). Thus:

1. We start with a population I of individuals who interact in neighborhoods.

2. We define the value, cognitive (or model), action, and judgment complexes of the population or of the various sub-populations of the entire population.

(A) Value factor: the level of commitment to complying with community norm(s) varies across the population of individuals and may change over time. (i) Case 1: There is a subpopulation of "good citizens" (Type 1). They are committed, to varying degrees, to the role of good citizen, that is, to complying with laws or norms. There is also a subpopulation of citizens (Type 0) that rejects, to varying degrees, the role of good citizen and develops another role, an anti-social role with a commitment to anti-community norms. (ii) Case 2: Case 1 is extended by considering dynamic feedback, where the levels and patterns of compliance in the neighborhood affect actors' levels of commitment to the "good citizen" role (or its antithesis).

(B) Cognitive model factor: The agents observe what their neighbours are doing (whether they are complying or not complying). They take the local pattern into account in making judgments about how to behave in the situation. Note that this "neighborhood principle" may be expanded to include more of the population than just the closest neighbors.

(C) Action factor: The action opportunities in the situation are to comply or not comply. An expanded version takes into account the communication of judgments or feelings among agents, for instance, "I am not complying but would like to," or, "I am complying but would rather not."

(D) Judgment factor: The agent's disposition for compliance or non-compliance is jointly determined by the degree of prior commitment of the agent and his or her perception of the rate of compliance in the neighbourhood. This disposition is a measure of the likelihood of a certain kind of behavior. In case 2, the degree of commitment is updated whenever an agent acts, increasing or decreasing depending on whether the rate of compliance in the relevant neighbourhood is higher or lower than the agent's prior degree of commitment.
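A rough Python re-implementation of this basic model is sketched below. It is not the authors' REPAST program (available at the address given in the footnote above); the equal weighting of commitment and perceived compliance in the disposition, the initialization of commitments at the type averages, and the grid size are assumptions made only to show the moving parts: the torus, the eight-cell neighborhoods, the 0-1 compliance choice, and the case 2 commitment feedback.

```python
# Illustrative sketch of the diffuse-regulation model (assumptions noted above).
import random

SIZE = 20  # assumed grid size

def neighbors(x, y):
    """Eight surrounding cells on the torus."""
    return [((x + dx) % SIZE, (y + dy) % SIZE)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

def init_grid(frac_type1=0.5, avg_com0=0.1, avg_com1=0.9):
    grid = {}
    for x in range(SIZE):
        for y in range(SIZE):
            t = 1 if random.random() < frac_type1 else 0
            grid[(x, y)] = {"type": t,
                            "commitment": avg_com1 if t == 1 else avg_com0,
                            "complied": False}
    return grid

def play_iteration(grid, rate_adj=0.0):
    """One iteration: as many public good games as cells, neighborhoods drawn at random."""
    for _ in range(SIZE * SIZE):
        cx, cy = random.randrange(SIZE), random.randrange(SIZE)
        cells = neighbors(cx, cy)
        rate = sum(grid[c]["complied"] for c in cells) / len(cells)
        for c in cells:
            a = grid[c]
            disposition = 0.5 * a["commitment"] + 0.5 * rate  # assumed mixing rule
            a["complied"] = random.random() < disposition
            if rate_adj:  # case 2: commitment drifts toward neighborhood compliance
                a["commitment"] += rate_adj * (rate - a["commitment"])

grid = init_grid(frac_type1=0.2)
for _ in range(100):
    play_iteration(grid, rate_adj=0.01)
print(sum(a["complied"] for a in grid.values()) / len(grid))  # overall compliance rate
```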
29.2.1 Simulation results with the basic model

The main results obtained in a computational experiment that included 10 runs of 2500 iterations of the simulation for each set of parameter values (see the parameters and values tested in Table 29.1) may be summarized in the following general conclusions:
[Figure 29.2 schematic: the agent's degree of commitment and its perception of neighbourhood behaviour jointly determine behaviour in case 1 and case 2; in case 2 only, a feedback loop from neighbourhood behaviour adjusts the degree of commitment.]
Fig. 29.2. The Basic Simulation Model
In case 1 the simulation converges to rates of partial compliance that positively depend on the initial proportion of actor types, the relative levels of commitment of the two populations, and the relative communication rate between the two populations (see Figure 29.3). In case 2 the simulation converges to levels of compliance that positively depend only on the proportion of types (the variation of relative level of commitment or relative communication rate has little or no effect).

Table 29.1. Simulation Parameters

Parameters                                                     Values tested
Average Degree of Commitment for Agents of Type 0 (AvgCom0)    0.1, 0.5, 0.9
Average Degree of Commitment for Agents of Type 1 (AvgCom1)    0.1, 0.5, 0.9
Communication Rate for Agents of Type 0 (ComR0)                0.1, 0.5, 0.9
Communication Rate for Agents of Type 1 (ComR1)                0.1, 0.5, 0.9
# agents type 1 / # agents type 0 (PropT)                      0.25, 1, 4
Rate of Adjustment of Commitment (RateAdj)                     0 (case 1), 0.01 (case 2)
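A sketch of the experiment loop around the model is given below, reusing init_grid and play_iteration from the previous sketch; the way runs are averaged, the omission of the communication-rate parameters, and the translation of PropT into a type 1 fraction are our assumptions about how such a sweep might be organized, not a description of the authors' actual experiment code.

```python
# Assumed experiment structure: 10 runs of 2500 iterations per parameter setting
# (case 1: RateAdj = 0), averaging the final compliance rate over runs.
from itertools import product
from statistics import mean

def run_once(prop_t, avg_com0, avg_com1, rate_adj=0.0, iterations=2500):
    frac_type1 = prop_t / (1 + prop_t)   # PropT = (# type 1) / (# type 0)
    grid = init_grid(frac_type1=frac_type1, avg_com0=avg_com0, avg_com1=avg_com1)
    for _ in range(iterations):
        play_iteration(grid, rate_adj=rate_adj)
    return sum(a["complied"] for a in grid.values()) / len(grid)

def case1_experiment(runs=10):
    results = {}
    for prop_t, com0, com1 in product((0.25, 1, 4), (0.1, 0.5, 0.9), (0.1, 0.5, 0.9)):
        rates = [run_once(prop_t, com0, com1) for _ in range(runs)]
        results[(prop_t, round(com1 / com0, 2))] = mean(rates)
    return results  # keyed by (PropT, relative commitment AvgCom1/AvgCom0)
```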
With respect to the problem of norm sustainability, the simulation results suggest the following. The sustainability of a norm depends crucially on: (a) the proportion in the population of those who share the norm; (b) the comparative levels of commitment of those who share the norm and those who oppose it; (c) the comparative communicative capabilities of both groups. A committed and/or communicative minority may shift the balance of norm compliance (non-compliance) in its favour. For instance, this is the result obtained for case 1 with PropT = 0.25, ComR0=0, ComR1=0, AvgCom0=0.1, AvgCom1=0.9 (see Table 29.2 and Figure 29.3). The sustainability of norm compliance also depends on the spatial configuration of the initial distribution of types of agents. For instance, in case 1, with PropT =
0.25, ComR0=0, ComR1=0, AvgCom0=0.1, AvgCom1=0.9, an initial random distribution leads to convergence close to a 70% compliance rate in the population. A compact minority of type 1 agents leads, on the other hand, to an outcome of only 30% compliance. A clustered or compact minority may shift the balance of norm compliance/non-compliance in its favour. For instance, in case 1, with PropT = 0.25, ComR0=0, ComR1=0, AvgCom0=0.9, AvgCom1=0.1, an initial random distribution leads to convergence close to a 0% rate of compliance in the population. A compact minority of type 1 agents leads, on the other hand, to an outcome of over 10% compliance. Simulation runs with "walls"⁸ perpetuate (or enable the sustainability of) a particular group, whether a group of good citizens or of anti-social types.

Table 29.2. Simulation results of Case 1: Relative commitment, proportion of actor types, and rate of compliance.
                 Relative Commitment (AvgCom1/AvgCom0)
            0.11   0.20   0.56   1.00   1.80   5.00   9.00
PropT 0.25    3%     5%    12%    20%    32%    52%    65%
      1.00   11%    18%    36%    50%    64%    83%    89%
      4.00   34%    46%    70%    80%    88%    95%    97%

The basic model allows for extensions that may be used to explore specialized and centralized regulative processes. These multi-agent systems include at least two types of social roles, that of a regulator and those of citizens or neighbors. The public agency A is responsible for solving the collective action problem - that is, the contribution of individual citizens to a collective good or compliance with a norm or law. The agency tries to control the actions of the population(s) of societal agents (such as citizens, business enterprises, public agencies). It does this by monitoring, assessing, sanctioning, etc. It determines a public policy or strategy judged to ensure compliance and enforces it through one or more concrete measures (incentives, sanctions, moral appeals, or persuasion). Formal regulatory actions are assumed to cost money and other resources, but the resources can be allocated in different ways (over time and space, and in relation to concrete events and developments). The population of citizens I is required or expected to adhere to the norm, policy, or law, and the regulator tries to steer or control them. In such a model, the population may or may not be homogeneous in its moral predisposition (see below). Different agency strategies and resource levels are run in relation to the population I, which is predisposed to, for instance, conditional contribution (or non-contribution). The level of compliance is a function of the perceived level of monitoring and sanctioning by the regulator (and possibly also by neighbors) concerning compliance with a norm which prescribes contribution or compliance.

⁸ A "wall" is a space on the grid which is left empty so that neighborhoods are split - there is no direct influence or communication across a wall.
[Figure 29.3 is a shaded chart of the average compliance rate (in bands from below 20% up to 80%-100%) as a function of relative commitment (AvgCom1/AvgCom0, from 0.11 to 9.00) and PropT (0.25, 1.00, 4.00), for ComR0=0 and ComR1=0.]
Fig. 29.3. Computational experiment results: Average rate of compliance in the population in case 1 (data from Table 29.2)

In the extended model, the following role structures are specified: (1) the value complexes of the regulatory and citizen groups; (2) the cognitive models of the regulatory and citizen groups; (3) the regulator's and citizen groups' action opportunities and constraints; (4) the judgment functions of the regulator and citizen groups. A key aspect of the model is that failure to effectively regulate agents' behavior may result in a major system failure. For instance, in the case of electricity, excessive electricity consumption overloads the system and a "blackout" or crisis occurs. "Blackout" can refer not only to a collapse of electrical supply but also to other types of societal "breakdown." It is a concept that would apply to any major resource depletion (or pollution crisis or widespread violence) that threatens to undermine and destabilize a functioning social order, whether temporarily or over some extended period of time - a time factor that can be investigated using the types of models that we have developed. Such models enable us to systematically explore the relationships between regulatory agent value orientations and goals, budget levels, other resource constraints, allocation of resources among different implementation strategies, and the character and behavior of the population of regulatees. The model - and variants of it - allows us to investigate changes in strategy - for instance, relating to resource use - as a function of policies, budget and other resource constraints, and methods and patterns of enforcement. The policies and programs of agent A impact the different populations differentially, because of the diverse values, interests, and capabilities of the populations. When one considers that the population I may be divided into sub-populations, who
not only have varying degrees of commitment to law and order but themselves engage in regulatory processes, then we are in a position to utilize a technology to investigate multiple regulative processes in a complex society.
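To indicate how the extension with a specialized regulator might be coded, the following toy Python sketch is offered; the budget-limited monitoring, the sanction size, and the way sanctioning and perceived monitoring feed into citizens' dispositions are purely our assumptions, since the internals of the extended model are not reported here.

```python
# Toy sketch of centralized regulation: regulator A monitors a budget-limited
# sample each round, sanctions detected non-compliers, and citizens respond to
# the perceived level of monitoring and to their own (possibly raised) commitment.
import random

def regulator_round(citizens, budget, sanction=0.2, perception_gain=0.1):
    monitored = random.sample(citizens, min(budget, len(citizens)))
    for c in monitored:
        if not c["complied"]:
            c["commitment"] = min(1.0, c["commitment"] + sanction)  # assumed sanction effect
    perceived_monitoring = len(monitored) / len(citizens)
    for c in citizens:
        disposition = min(1.0, c["commitment"] + perception_gain * perceived_monitoring)
        c["complied"] = random.random() < disposition

citizens = [{"commitment": random.uniform(0.0, 0.6), "complied": False} for _ in range(400)]
for _ in range(50):
    regulator_round(citizens, budget=40)
print(sum(c["complied"] for c in citizens) / len(citizens))  # compliance under regulation
```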
29.3 An Agenda for Research: GGT Institutional Analysis Applied to Security Problems in Multi-agent Systems

In the GGT perspective, institutions are understood as durable social rule systems or regimes, structuring and regulating agents' games (interactions) and the social system as a whole ([4],[3]). Their operation may be considered from different perspectives. Rational choice theory (as well as classical game theory) tends to view institutions as imposing constraints on choice or as influencing the cost-benefit analyses of self-interested rational individuals. In other accounts the role of institutions is broader: they not only constrain or influence through incentives, they define roles with particular values and obligations, providing motivations for action that transcend self-interest by appealing to the agent's social identity and collective understanding. Analysing problems of security from an institutionalist perspective and understanding how institutions regulate and stabilise social systems - and why they sometimes fail to do so - demands an understanding of the social and psychological mechanisms that mediate institutions and behaviour [13]. Research into such mechanisms suggests that besides constraining and influencing through incentives, institutions carry with them expressive and normative dimensions. These frame and affect the interpretation of situations, influencing individual judgements, attitudes, and behaviour.

Research such as that based on GGT has been developed beyond the narrow boundaries of the rational choice and classical game paradigm ([1],[2],[8],[15],[16]). It recognizes agents who are faced with, and try to resolve, judgment dilemmas, e.g., between personal preferences and obligations associated with social roles and norms. They may embark on courses of action that the mere logic of "incentives" would ignore or forbid in the name of rationality. More importantly, it suggests that incentives, namely pecuniary ones, undermine in some contexts the disposition of most individuals to express - and to act in terms of - solidarity and cooperative intentions, exercising self-restraint and making sacrifices for the common good. Control over individual action through increased monitoring and generalized incentives is likely, under some conditions, to be self-defeating. Transparent incentive structures may transform the interpretations and predispositions of actors in the situation by framing choices in cost-benefit terms instead of community-commitment terms. Increased monitoring may convey a message of breakdown in cooperation, or the impression of excessive control by a regulator, possibly triggering shirking and other forms of "free-riding" [14].

More generally, the problem of security can be formulated in the GGT perspective in terms of regulating multi-agent systems, dealing with social disorder and crisis. Such regulation is organized and carried out through particular institutions, which
may fail to function properly or effectively under particular conditions (as in several of our simulation runs). For instance:

1. Private property rule regimes may increase or decrease conflict and social disorder. Regimes which allow for overlapping rights or claims, that is, unclear or imprecise boundaries or fences, may be conflict-generating and result in social disorder. Therefore, the design of such regimes is typically of critical importance to social order and security.

2. Institutional arrangements operating as regulatory devices may decrease conflict and social disorder on labour, commodity, and financial markets. Again, the designed arrangements may be reasonably effective or tragically ineffective. For instance, in advanced countries, there are elaborate institutional arrangements to regulate and resolve labor-management conflicts: namely, systems of negotiation, mediation, and arbitration, among others. Similarly, on commodity markets there are product safety controls, product guarantees, etc. On the international level, the World Trade Organization (WTO) arrangements regulate trade and exchange in many areas, reducing conflict, instability, and disorder. Also on the international level, major conflicts between debtor and creditor countries and their banks are regulated through negotiation and the intervention of powerful, developed countries (as in dealing with the Latin American and Asian financial crises). Under some conditions, however, regulatory arrangements fail to resolve or constrain social conflict, but rather increase it, as debtor countries decide they must default on loans, generating insecurity and disorder. In general, regulatory institutions may fail in a number of ways.

3. Money and financial systems are subject to regulation, conflict, and crisis. For instance, money and financial markets are regulated on the national level, but increasingly, as they function globally, they tend to be unstable. There are powerful forces of "escalation" and rapid, destabilizing developments and crises.

4. Institutionalized societal conflict-resolution procedures such as multi-lateral negotiation, judiciary processes, and democratic systems, among others, are means of regulating conflicts and social instability in many contemporary societies [9]. There are limitations to such methods. For instance, "resolutions" may not be accepted by committed minorities. What then? In societies with deep cleavages, certain democratic forms such as "first past the post" and majority rule do not resolve conflict and maintain social order, but rather tend to intensify conflict and generate disorder. In such "democratic" social systems, a majority readily dominates a minority (or minorities), and the latter may reject such domination and mobilize to neutralize or even undermine it, setting the stage for confrontation and escalating conflict. Similarly, a judicial or arbitration procedure to settle conflicts depends on people's belief or confidence in its competence, impartiality, and objectivity. Again, there may be substantial groups in society who distrust, and act in opposition to, the judiciary or arbitrator because these are coupled with or identified with a majority or oppressive group, or because they have a history of unjust, arbitrary, or incompetent actions.
"Integrating institutions" such as Republican arrangements treat all citizens equally - and insist that all act equally in the public sphere. However, this may take the form of insisting on keeping religion or religious symbols out of the public sphere. Many religions or religious groups can accept this. But some religious groups (Christian, Jewish, and Muslim) may find it problematic because they may not separate so neatly the private and public spheres, as the Republican model demands. So, the elimination of scarves, as in France, can be seen or experienced as provocative to many Muslims. In general, some institutions in a given context operate to integrate, others to separate (establishing walls between) agents in society. Under which conditions would "walling" institutions function effectively and when would they fail? (See footnote 8). Demographics and resource conditions as well as types and levels of interdependencies across "walls" are important factors, among others, in determining conditions of success or failure of one type of arrangement or another. A number of societal regulatory problem-situations may be identified and analysed using the GGT approach: 1. The regulator problematique: finding sustainable order and security within the constraints of bounded knowledge and scarce resources. 2. Unanticipated consequences of social action in complex systems (non-calculable risks). 3. Social dilemmas and tragic choices. For instance, the tragic choice between security and civil rights and liberties in the face of rising terrorism. 4. Regulatory dialectical game: the interaction between regulator and regulatees is such that as the regulator tries to increase the effectiveness (degree or reliability) of control, the regulatees search for and find counter-measures and loopholes (or sufficient numbers of them do so that social order is disrupted and destabilized). 5. Escalating Processes: Conflict processes may be subject to positive feedback loops. This is characteristic also of armaments races. Other societal processes characterizable in this way are bandwagon effects and speculative bubbles. 6. Self-fulfilling (and self-defeating) prophesies as key mechanisms in social life. Examples are many: bandwagon effects, stock market bubbles, socio-economic depressions which are "predicted". In general, a major principle of social life is that what people believe to be real is often real in its consequences. This may be exemplified in extreme cases by beliefs about "powerful" ghosts in the Middle Ages, or "the multitude of Communists" said to be operating during the McCarthy Period in the USA; or, the threat of genetically modified organisms (GMOs) or other material "threats" that may or may not be real but the belief in their "reality" has real consequences as a result of their impact on human action and interaction. Also, what people believe not to be real has typically real consequences. The earlier, widely-held belief about the non-transferability of BSE ("mad-cow disease") to humans illustrate this. In conclusion, institutional analysis based on GGT suggests key mechanisms of societal order and stability and also appropriate design of regulatory institutions to operate effectively in dealing with security and social order issues. Security depends
on detecting and recognizing dangers or threats and responding to these appropriately and effectively. In general, GGT indicates socio-cognitive, value, judgmental, and organizational factors that play a role in limiting or undermining effective regulation, for instance:

1. A socio-cognitive model or frame is oriented to the wrong dimensions and misses or misinterprets, for instance, threat-signals; as a result, there is a failure to detect, understand, or respond to real dangers or hazards.

2. Similarly, institutionalised values may lead key agents to ignore or downplay particularly threatening hazards or dangers.

3. The established capabilities to deal with particular dangers or hazards have not been developed, or are limited in scale, so that key regulatory systems are easily overloaded.

4. Cultural patterns of communication may be such that, for instance, agents conceal or distort information, contributing to failures of detection or of effective response to particular dangers or hazards.

5. Fragmentation among agents - due to ethnic, class, and gender reasons as well as other bases of social cleavage - contributes to blockage of information exchange and knowledge sharing. Such fragmentation may also arise in the context of deficient social organization, such that those responsible for security or social order fail to effectively share information and knowledge relating to major hazards and dangers.

6. Institutional hierarchy tends to block significant information flow, for instance about performance failures (administrative structures are typically hierarchical in character and illustrate time and time again the problem of blocked or distorted information flow and predictable performance failings).

In sum, modern social systems are complex and require sophisticated and multi-dimensional theoretical frameworks and models with which to understand and regulate them. GGT is a promising candidate in this regard.

Acknowledgement. We are grateful to Nora Machado for contributing to the conceptualization and representation of Figure 29.1.
References

1. Anderson E. (2000). "Beyond Homo Economicus: New Developments in Theories of Social Norms." Philosophy and Public Affairs, 29, no. 2.
2. Burns T. R. (1990). "Models of Social and Market Exchange: Toward a Sociological Theory of Games and Human Interaction." In: C. Calhoun, M. W. Meyer, and W. R. Scott (eds), Structures of Power and Constraints: Essays in Honor of Peter M. Blau. New York: Cambridge University Press.
3. Burns T. R., Carson M. (2002). "Actors, Paradigms, and Institutional Dynamics: The Theory of Social Rule Systems Applied to Radical Reforms." In: R. Hollingsworth, K.H. Muller, E.J. Hollingsworth (eds), Advancing Socio-Economics: An Institutionalist Perspective. Oxford: Rowman and Littlefield.
4. Burns T.R., Flam H. (1987). The Shaping of Social Organization: Social Rule System Theory with Applications. London: Sage Publications.
5. Burns T. R., Gomolinska A. (1998). "Modeling Social Game Systems by Rule Complexes." In: L. Polkowski and A. Skowron (eds.), Rough Sets and Current Trends in Computing. Berlin/Heidelberg: Springer-Verlag, 581-584.
6. Burns T. R., Gomolinska A. (2001). "Socio-cognitive Mechanisms of Belief Change: Application of Generalized Game Theory to Belief Revision, Social Fabrication, and Self-fulfilling Prophesy." Cognitive Systems, Vol. 2(1), 39-54.
7. Burns T. R., Gomolinska A. (2000). "The Theory of Socially Embedded Games: The Mathematics of Social Relationships, Rule Complexes, and Action Modalities." Quality and Quantity: International Journal of Methodology, Vol. 34(4): 379-406.
8. Burns T.R., Gomolinska A., Meeker L. D. (2001). "The Theory of Socially Embedded Games: Applications and Extensions to Open and Closed Games." Quality and Quantity: International Journal of Methodology, Vol. 35(1): 1-32.
9. Burns T.R., Roszkowska E. (2005). "Conflict and Conflict Resolution: A Societal-Institutional Approach." In: M. Raith (ed.), Procedural Approaches to Conflict Resolution. Berlin/London: Springer. In press.
10. Burns T.R., Roszkowska E. (2004). "Fuzzy Games and Equilibria: The Perspective of the General Theory of Games on Nash and Normative Equilibria." In: Pal S.K., Polkowski L., Skowron A. (eds), Rough-Neural Computing: Techniques for Computing with Words. Springer-Verlag, 435-470.
11. Caldas J. C., Coelho H. (1999). "The Origin of Institutions: Socio-economic Processes, Choice, Norms, and Conventions." Journal of Artificial Societies and Social Simulation, Vol. 2, no. 2. http://www.soc.surrey.ac.uk/JASSS/2/2/1.html
12. Gomolinska A. (1999). "Rule Complexes for Representing Social Actors and Interactions." Studies in Logic, Grammar, and Rhetoric, Vol. 3(16): 95-108.
13. Hodgson G.M. (2002). "The Evolution of Institutions: An Agenda for Future Theoretical Research." Constitutional Political Economy, 13, pp. 111-127.
14. Kahan D.M. (2002). "The Logic of Reciprocity: Trust, Collective Action and Law." Yale Law and Economics Research Paper No. 201. Yale Law School, John M. Olin Center for Studies in Law, Economics, and Public Policy. New Haven, Conn.: Yale University.
15. Leijonhufvud A. (1993). "Toward a Not-Too-Rational Macroeconomics." Southern Economic Journal, Vol. 60:1, 1-13.
16. Roszkowska E., Burns T.R. (2002). "Fuzzy Judgment in Bargaining Games: Diverse Patterns of Price Determination and Transaction in Buyer-Seller Exchange." Paper presented at the First World Congress of Game Theory, Bilbao, Spain, 2000. Available at http://www.soc.uu.se/publications/fulltext/tb_market-pricing-game.doc.
17. Simon H. (1969). The Sciences of the Artificial. Cambridge: MIT Press.
18. Winch P. (1958). The Idea of a Social Science and Its Relation to Philosophy. London: Routledge & Kegan Paul.
19. Wittgenstein L. (1958). Remarks on the Foundations of Mathematics. Oxford: Blackwell.
30
Multi-Agent Decision Support System for Disaster Response and Evacuation

Alexander Smirnov, Michael Pashkin, Nikolai Chilov, Tatiana Levashova, and Andrew Krizhanovsky

St. Petersburg Institute for Informatics and Automation, The Russian Academy of Sciences, 39, 14th Line, St. Petersburg, 199178, Russia
{smir,michael,nick,oleg,aka}@mail.iias.spb.su
Summary. The paper describes an agent-based approach and its application to intelligent support of disaster response and evacuation operations. The approach is based on the idea of knowledge logistics, which stands for the integration and transfer of the right knowledge from distributed sources to the right person, within the right context, at the right time, for the right purpose. The problem is represented as configuring a network of knowledge sources, and the approach is accordingly called the "KSNet-approach". The paper concentrates on such aspects as the multi-agent architecture and the knowledge representation formalism, and presents an application of the approach via a case study.
30.1 Introduction

Disaster response and evacuation operations are very likely to be based on a number of different, quasi-volunteered, vaguely organized groups of people, non-government organizations, institutions providing humanitarian aid, and also army troops and official governmental initiatives. Here many participants will be ready to share information with some well-specified community [1]. Therefore, to manage such operations, efficient knowledge sharing between the multiple participating parties is required. This knowledge must be pertinent, clear, and correct, and it must be processed and delivered to appropriate locations in a timely manner, so that it can provide for situation awareness. This is even more important when the operations involve coalitions uniting resources both of government (military, security service, community service, etc.) and of non-government organizations. As a result, systems aimed at intelligent support of disaster response and evacuation operations have to meet a number of requirements, including (i) support of knowledge sharing, (ii) a distributed architecture for collaborative work, (iii) interoperability with other information systems, (iv) dynamic (on-the-fly) problem solving, (v) the ability to work with uncertain information, (vi) a constraint network notation for real-world problem description, and others.
Since successful operation management can be achieved through knowledge of the status and dynamics of the situation and its comprehension, it can be stated that the right knowledge from distributed sources has to be integrated and transferred to the right person within the right context at the right time to the right purpose. The aggregate of these interrelated activities is referred to as Knowledge Logistics [2].
30.2 Approach to Knowledge Logistics for Disaster Response and Evacuation Operations
Knowledge logistics (KL) takes place in a network-centric environment. Unlike hierarchical organizations with fixed commander-subordinate relationships, the nodes of a network-centric environment are autonomous decision-making units that can serve other units and also be served by them. With regard to computer systems, the network-centric environment is based on advanced information technologies such as intelligent agents, ontology management, Web intelligence, the Semantic Web and markup languages. Support of disaster response and evacuation operations in the network-centric environment requires rapid processing and analysis of a large body of up-to-date (preferably real-time) information from distributed and heterogeneous sources (experts, electronic documents, real-time sensors, weather forecasts, etc.). Hence, one of the key components of situational awareness is fusion of information from different sources. The most influential fusion model in the area of information fusion is the JDL Data Fusion Model [3]. It combines five levels of fusion: 0) sub-object data assessment, 1) object assessment, 2) situation assessment, 3) impact assessment, and 4) process refinement. The approach presented here combines KL and information fusion at level 2 (situation assessment) and is based on such advanced information technologies as intelligent agents, markup languages, ontology management, and others.
30.2.1 Multiagent Architecture
As an implementation of the approach the system called "KSNet" (acronym of Knowledge Source Network) has been developed. This system uses intelligent software agents to provide access to distributed heterogeneous knowledge sources. Multiagent systems offer an efficient way to understand, manage, and use distributed, large-scale, dynamic, open, and heterogeneous computing and information systems [4,5]. A multi-agent system architecture based on the Foundation for Intelligent Physical Agents (FIPA) Reference Model [6] was chosen as the technological basis for the system, since it provides standards for heterogeneous interacting agents and agent-based systems, and specifies ontologies and negotiation protocols. The FIPA-based technological kernel agents used in the system are: wrapper (interaction with knowledge sources), facilitator ("yellow pages" directory service for the agents), mediator (task execution control), and user agent (interaction with users). The following problem-oriented agents specific to KL, and scenarios of their collaboration, have been developed: translation agent (translation of terms between different vocabularies), knowledge
fusion (KF) agent (performing KF operations), configuration agent (efficient use of knowledge sources), ontology management agent (performing ontology operations), expert assistant agent (interaction with experts), and monitoring agent (verification of knowledge sources). A community of agents is represented in Fig. 30.1 according to the above described principles and functions of the system "KSNet". A detailed description of the multi-agent architecture can be found in [7]. Below, two agents specific to disaster response and evacuation operations are described.
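To make the division of responsibilities concrete, the sketch below shows how such a set of agent roles and a facilitator-style directory could be represented in code. It is an illustration only: the role names mirror the agent list above, but the class and function names are assumptions and do not come from the actual KSNet implementation.

// Hypothetical sketch of KSNet-style agent roles and a facilitator directory.
#include <map>
#include <string>
#include <vector>

enum class Role {
    Wrapper, Facilitator, Mediator, UserAgent,          // FIPA technological kernel agents
    Translation, KnowledgeFusion, Configuration,        // problem-oriented agents
    OntologyManagement, ExpertAssistant, Monitoring
};

class FacilitatorDirectory {                            // "yellow pages" service
public:
    void advertise(const std::string& agentName, Role role) {
        directory_[role].push_back(agentName);
    }
    std::vector<std::string> lookup(Role role) const {
        auto it = directory_.find(role);
        return it == directory_.end() ? std::vector<std::string>() : it->second;
    }
private:
    std::map<Role, std::vector<std::string>> directory_;
};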
Fig. 30.1. Agent-based architecture for the system "KSNet" (KB - knowledge base, KF - knowledge fusion)
30.2.2 Monitoring Agent Since disaster response and evacuation operations take place in a dynamic environment, continuous run-time monitoring and tracing of this environment is one of the key factors of success. This means that successful disaster response and evacuation operations can be achieved through the comprehension of the situation, knowledge of the status and dynamics of its elements and prediction of the states of the operation environment in the future. In the KSNet-approach a monitoring agent is provided for permanent checking of the knowledge sources to have actualized information about the current situation. It enables timely planning of activities in a network-centric environment since decision makers will have actualized information. 30.2.3 Knowledge Fusion Agent Agents and services for disaster response and evacuation operation support have to be dynamic, in other words on-the-fly business operation planning and management based on adaptive agents is required. Possibilities to add new agents and remove or modify existing ones based on the results of monitoring have to be provided. For
this purpose the described approach implements adaptive agents. These agents may modify themselves when solving a particular task. For example, within the KSNet-approach there is an agent attached to an application that is responsible for configuration problem solving (e.g., coalition configuration, routing, etc.) based on existing knowledge. The task is described by an ontology stored in an ontology library. Upon receiving a task the application loads an appropriate ontology and generates an executable module for its solution "on-the-fly". A novel "on-the-fly" compilation mechanism is proposed to solve such varying problems. In rough outline, this mechanism is based on the following concepts (Fig. 30.2):
- a pre-processed user request defines (1) which ontologies are to be used for the problem domain description, and (2) which knowledge sources are to be used;
- C++ code is generated on the basis of information extracted from (1) the user request (goal, goal objects, etc.), (2) appropriate ontologies (classes, attributes, and constraints), and (3) suitable knowledge sources;
- the compilation is performed in the environment of a C++ project prepared in advance;
- failed compilations/executions do not fail the work of the system as a whole; an appropriate error message is generated.
Fig. 30.2. The concept of the "on-the-fly" compilation mechanism
The essence of the proposed on-the-fly compilation mechanism is to write the ontology elements (classes, attributes, constraints) to a C++ file directly so that it can be compiled into an ILOG-powered program (as mentioned above, ILOG was chosen as the constraint solver in this project). The service responsible for problem solving creates the C++ file based on these data and inserts the generated source code into a Microsoft Visual Studio project prepared in advance. The project is compiled in order to create an executable file in the form of a dynamic-link library (DLL). After that the service calls a function from the DLL to solve the problem.
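A minimal sketch of this flow is given below (write generated C++ to a file, compile it into a DLL, load the DLL and call the exported solver). The file names, compiler command line and exported function signature are assumptions made for the example; they are not the actual KSNet project settings, and the ILOG-specific constraint code is omitted.

// Illustrative "on-the-fly" compilation: generate C++, build a DLL, call the solver.
// Paths, the compiler invocation and the Solve() signature are assumed for the sketch.
#include <windows.h>
#include <cstdlib>
#include <fstream>
#include <string>

typedef int (*SolveFn)();                         // assumed signature of the generated solver

bool solveGeneratedTask(const std::string& generatedConstraintCode) {
    // 1. Write the ontology-derived code (classes, attributes, constraints) to a C++ file.
    std::ofstream src("generated_task.cpp");
    src << "extern \"C\" __declspec(dllexport) int Solve() {\n"
        << generatedConstraintCode                // fragment produced from the ontology
        << "\n    return 0;\n}\n";
    src.close();

    // 2. Compile it into a DLL within a prepared project (compiler call is illustrative).
    if (std::system("cl /nologo /LD generated_task.cpp /Fe:generated_task.dll") != 0)
        return false;                             // failed compilation: report an error, do not crash

    // 3. Load the DLL and call the exported solver function.
    HMODULE dll = LoadLibraryA("generated_task.dll");
    if (dll == NULL) return false;
    SolveFn solve = (SolveFn)GetProcAddress(dll, "Solve");
    bool ok = (solve != NULL) && (solve() == 0);
    FreeLibrary(dll);
    return ok;
}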
The experiments showed that for complex tasks the compilation time is significantly less than the time taken to solve the task by the generated program.
30.2.4 Ontologies and Knowledge Imprecision
The main principles considered during development of the presented approach and the system "KSNet" originate from characteristics of modern e-business applications. These applications widely use ontologies as a common language for business process / enterprise modelling [8-9]. Since agents must represent their knowledge via ontologies [5], the FIPA ontology definition was used for the ontology descriptions. Thus, the described approach is focused on utilizing reusable knowledge through ontological descriptions [10], with the object-oriented constraint network paradigm considered as a common knowledge representation notation, which correlates with the semantic metadata representation concept of the Semantic Web project. As a general model of ontology representation in the system "KSNet", an object-oriented constraint network paradigm was proposed [2]. This model defines the common ontology notation used in the system. According to this representation an ontology A is defined as A = (O, Q, D, C), where: O - a set of object classes ("classes"), each of the entities in a class being considered as an instance of the class; Q - a set of class attributes ("attributes"); D - a set of attribute domains ("domains"); and C - a set of constraints. However, when dealing with knowledge, uncertainties may arise due to the following reasons: (i) a lack of information, (ii) invalidity of information, (iii) subjectivity, (iv) a lack of knowledge about a problem, (v) unverbalizability of the problem, (vi) imprecision of the problem solving methods. Taking uncertainties into account is especially important for operations such as disaster response that have to be fast and take place under extreme conditions (e.g., damaged communications). To process the uncertain knowledge, the formalism of fuzzy object-oriented constraint networks described as (O, Q, D, C^, W, T, Ip) has been chosen, where C^ is a set of constraints, each constraint containing a membership function μ with values in [0, 1] and an associated weight ω_c representing its importance or priority; W is a weight scheme, i.e. a function combining the satisfaction degree μ(c) of a constraint with ω_c to estimate the weighted satisfaction degree μ_w(c); T is an aggregation function, which performs simple partial regulating on the defined values, defining C^; and Ip is the information content (instances of classes) of the constraint network, which has a probabilistic nature. Constraints of attributes belonging to classes, compatibility structural constraints, hierarchical structural constraints and "one-level" structural constraints are hard constraints. All of them have to be satisfied in the found solution, i.e. for each of them ω_c = 1. Some functional constraints and constraints of domains belonging to attributes can be considered as soft constraints. Within the KSNet-approach the following types of uncertainties have been selected: (i) variable contents and structures of knowledge sources, (ii) uncertainty present in knowledge sources, (iii) low assurance of experts in their knowledge, (iv) complexity of an application domain formalization, (v) terminological conflicts
during translation of knowledge from one ontology to another, (vi) complexity of user request recognition, and (vii) incompatibility of knowledge stored in different sources. This list does not pretend to cover all possible types of uncertainties.
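The tuple notation above maps naturally onto a small data structure. The sketch below is only an illustration of how (O, Q, D) and the weighted fuzzy constraints of C^ might be held in memory; all type and field names are assumptions, not the KSNet internal representation.

// Hypothetical in-memory form of a (fuzzy) object-oriented constraint network.
#include <functional>
#include <string>
#include <vector>

struct Attribute   { std::string name; std::vector<std::string> domain; };   // Q with its domain from D
struct ObjectClass { std::string name; std::vector<Attribute> attributes; }; // element of O

struct FuzzyConstraint {                         // element of C^
    std::string description;
    double weight;                               // omega_c: importance/priority
    std::function<double(const std::vector<double>&)> membership;  // mu(c) in [0, 1]
};

struct FuzzyConstraintNetwork {
    std::vector<ObjectClass>     classes;        // O (with Q and D inside)
    std::vector<FuzzyConstraint> constraints;    // C^
    // W: a simple weight scheme combining mu(c) with omega_c.
    static double weightedSatisfaction(const FuzzyConstraint& c, double mu) {
        return c.weight * mu;
    }
};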
30.3 Case Study
The case study considers a fictitious Binni region [11]. The aim of the Binni scenario is to provide a rich environment, focusing on new aspects of coalition problems and new technologies, demonstrating the ability of distributed systems to intelligently support the supply of services in an increasingly dynamic environment. For the task of mobile hospital creation presented here, the following knowledge sources can be considered: supplies-related information (required quantities of materials, required times of delivery), available suppliers (constraints on suppliers' capabilities, capacities, locations), available providers of transportation services (constraints on available types, routes, and time of delivery), geography and weather of the Binni region (constraints on types, routes, and time of delivery, e.g. by air, by trucks, by off-road vehicles). The problem of automatic knowledge discovery is a topic for future research. For the case study the list of knowledge sources containing information for user request processing was defined by an expert team. This list of sources is not fixed. The scalable architecture of the system KSNet allows seamless attachment of new sources in order to obtain new features and to take into account more factors for the tasks solved. The general problem considered in this case study for the system "KSNet" has the following formulation: "Define suppliers, transportation routes and schedules for building a hospital of given capacity at given location by given time". The following required information was defined:
- hospital related information (constraints on its structure, required quantities of components, required delivery schedules);
- available suppliers (constraints on suppliers' capabilities, capacities, locations);
- available providers of transportation services (constraints on available types, routes, and time of delivery);
- geography and weather of the Binni region (constraints on types, routes, and time of delivery, e.g. by air, by trucks, by off-road vehicles).
30.3.1 Application and Task Ontologies for Hospital Configuration As a result of the analysis of the problem the following application ontology describing the problem was built. A number of existing ontologies corresponding to the described problem were found in Internet's ontology libraries [12 - 17]. These ontologies represent a hospital in different manners using different representation formats. Firstly, the ontologies were imported from the source formats into the system notation (e.g., ontology parts corresponding to the hospital representation of the North American Industry Classification System (NAICS) code [15] and United
Nations Standard Products and Services Code (UNSPSC) [16] were imported from DAML+OIL source format). After that, they were included into the ontology library, henceforth they can be reused for the solution of similar problems. Next, ontology parts relevant to the request were combined into a single ontology (Fig.30.3). Fig.30.3 has the following notation. Reused ontology classes (the classes adopted from the Internet's ontology libraries) are shown by firm lines, reused classes that were renamed are shown by dotted lines, new ontology classes (the classes included by experts) are outlined by thick lines, firm unidirectional arrows represent hierarchical relationships "is-a", dotted unidirectional arrows represent hierarchical relationships "part-of", double-headed arrows show associative relationships. Ontology part corresponding to AO included into the case study is represented by the shaded area. The built application ontology was expanded with regard to extension of the class
"Hospital configuration". This class represents a complex task that was split into subtasks. As a result a task ontology of hospital configuration was built (Fig. 30.4).
Fig. 30.3. "Mobile hospital" application ontology
Fig. 30.4. Task ontology "Hospital configuration" (BOM - Bill of Materials)
Within the complex task the following subtasks were defined:
Hospital allocation. This subproblem is devoted to finding the most appropriate location for a hospital to be built, considering such factors as the location of the disaster, water resources, nearby cities and towns, communications facilities (e.g., locations of airports, roads, etc.) and the decision maker's choice and priorities.
Logistics. This subproblem is devoted to finding the most efficient ways of delivering the hospital's components from available suppliers, considering such factors as communications facilities (e.g., locations of airports, roads, etc.), their conditions (e.g., good, damaged or destroyed roads), weather conditions (e.g., rains, storms, etc.) and the decision maker's choice and priorities.
Components Definition. This subproblem is devoted to finding the most efficient components for the hospital, considering such factors as component suppliers, their capacities, prices, transportation time and costs, and the decision maker's choice and priorities.
The subtask of "Staff definition" is shown greyed out since it was out of the scope of the project.
30.3.2 Example Solutions of the Problem
In order to provide up-to-date routing plans the system monitors the current situation in the region. For this purpose an emulated news Web site has been implemented that contains information about weather and events in the considered region. A specially designed wrapper reads the news and finds which cities/areas are not currently available for transportation. Besides, it reads weather conditions and accordingly corrects transportation time and costs for the appropriate routes.
Fig. 30.5. Routing plan for the minimize time preference (in this solution four vehicles/vehicle groups are used to provide maximum concurrency).
The presented example illustrates finding a routing plan for the same conditions but with different user preferences, namely: minimize time; minimize time, then costs; minimize both time and costs; minimize costs, then time; minimize costs. The results for the different choices are presented and compared in Fig. 30.5 - Fig. 30.7. For illustration of the results a map is
generated that uses the following notation. Dots are the cities of the region. The city of Aida is the city where the hospital is to be located. The bigger cities are the cities where suppliers are located (Libar, Higgville, Ugwulu, Langford, Nedalla, Laki, Dado). Transportation routes are shown as lines. The colored trucks denote the routes of particular vehicles/vehicle groups.
Fig. 30.6. Routing plan for the minimize both time and costs and minimize costs, then time preferences (in this solution three vehicles/vehicle groups are used).
Fig. 30.7. Routing plan for the minimize costs preference (in this solution one vehicle/vehicle group is used to provide minimum of costs).
Fig. 30.8. Routing plans for different criteria (time and costs minimization preferences).
Fig. 30.8 represents a comparison of the routing plans created for the different criteria. As can be seen, while the importance of one of the parameters increases (e.g., the importance of costs increases from left to right), the value of that parameter decreases (the red line with diamonds for the costs) and vice versa (the green line with squares for the time).
30.4 Conclusions Within the presented approach knowledge logistics is coupled with information fusion based on constraint satisfaction techniques. Utilizing ontologies and compatibility of the employed ontology notation with modem standards enables semantic interoperability with other knowledge-based systems and services and facilitates knowledge sharing. Application of constraint networks allows rapid problem manipulation by adding/changing/removing its components (objects, constraints, etc.) and usage of such existing efficient technologies as ILOG. Agent-based architecture increases scalability, efficiency and interoperability of the system "KSNet". Applicability of the approach is illustrated via a case study of on-the-fly portable hospital configuration as a problem of health service logistics. Acknowledgements Some parts of the research were done as parts of the ISTC partner project # 1993P funded by Air Force Research Laboratory at Rome, NY, the project # 16.2.44 of the research program "Mathematical Modelling and Intelligent Systems", project #1.9 of the research program "Fundamental Basics of Information Technologies and Computer Systems" of the Russian Academy of Sciences, grant # 02-01-00284 of the Russian Foundation for Basic Research. Some prototypes were developed using software granted by ILOG Inc.
References
1. Pechoucek, M., Marik, V., Barta, J.: CplanT: an acquaintance model based coalition formation multi-agent system. In: Proc. of the Second Int. Workshop of Central and Eastern Europe on Multi-Agent Systems (CEEMAS'2001) (2001), pp. 209-216.
2. Smirnov, A., Pashkin, M., Chilov, N., Levashova, T., Haritatos, R.: Knowledge source network configuration approach to knowledge logistics. Int. J. of General Systems, Vol. 32, No. 3 (2003) pp. 251-269.
3. Salerno, J., Hinman, M., Boulware, D., Bello, P.: Information fusion for situational awareness. In: Proc. of the 6th Int. Conf. on Information Fusion (2003) pp. 507-513.
4. Payne, T., Singh, R., Sycara, K.: Communicating Agents in Open Multi-Agent Systems. In: First GSFC/JPL Workshop on Radical Agent Concepts (WRAC) (2002) pp. 365-371.
5. Weiss, G. (ed.): Multiagent Systems: a Modern Approach to Distributed Artificial Intelligence. The MIT Press, Cambridge MA, USA (2000).
6. Foundation for Intelligent Physical Agents (FIPA), 2004. http://www.fipa.org.
7. Smirnov, A., Pashkin, M., Chilov, N., Levashova, T.: Agent-based support of mass customization for corporate knowledge management. Eng. Applications of Artificial Intelligence, Vol. 16, No. 4 (2003) pp. 349-364.
8. Goossenaerts, J., Pelletier, C.: Enterprise Ontologies and Knowledge Management. In: Proc. of the 1st Int. Conf. on Concurrent Enterprising "Engineering the Knowledge Economy through Co-operation" (2001) pp. 281-285.
9. O'Leary, D. E.: Different Firms, Different Ontologies, and No One Best Ontology. IEEE Intelligent Systems, September/October (2000) pp. 72-78.
10. Guarino, N., Welty, C.: Towards a Methodology for Ontology-based Model Engineering. In: Bezivin, J., Ernst, J. (eds.) Proc. of the ECOOP-2000 Workshop on Model Engineering (2000).
11. Rathmell, R. A.: A Coalition Force Scenario "Binni - Gateway to the Golden Bowl of Africa." In: A. Tate (ed.) Proc. of the Int. Workshop on Knowledge-Based Planning for Coalition Forces (1999) pp. 115-125.
12. Clin-Act (Clinical Activity). The ON9.3 Library of Ontologies: Ontology Group of IPCNR (a part of the Institute of Psychology of the Italian National Research Council (CNR)) (2000). URL: http://saussure.irnikant.rm.cnr.it/onto/.
13. Hpkb-Upper-Level-Kernel-Latest: Upper Cyc / HPKB IKB Ontology with links to SENSUS, Version 1.4. Ontolingua Ontology Server (1998). URL: http://www-ksl-svc.stanford.edu:5915.
14. Weather Theory. Loom ontology browser. Information Sciences Institute, The University of Southern California (1997). URL: http://sevak.isi.edu:4676/loom/shuttle.html.
15. North American Industry Classification System code. DAML Ontology Library, Stanford University (2001). URL: http://opencyc.sourceforge.net/daml/naics.daml.
16. The UNSPSC Code (Universal Standard Products and Services Classification Code). DAML Ontology Library, Stanford University (2001). URL: http://www.ksl.stanford.edu/projects/DAML/UNSPSC.daml.
17. WebOnto: Knowledge Media Institute (KMI). The Open University, UK (2002). URL: http://eldora.open.ac.uk:3000/webonto.
31 Intelligent System for Environmental Noise Monitoring
Andrzej Czyzewski (1), Bozena Kostek (1,2), and Henryk Skarzynski (2)
(1) Gdansk University of Technology, ul. Narutowicza 11/12, PL-80-952 Gdansk, Poland, [email protected], http://www.multimed.org/
(2) Institute of Physiology and Pathology of Hearing, ul. Pstrowskiego 1, 01-952 Warsaw, [email protected]
Summary. The telemonitoring system developed at the Multimedia Systems Department of the Gdansk University of Technology, aimed at monitoring environmental noise levels, is discussed. A system presentation is provided, consisting of descriptions of the following elements: noise measurement units, computer noise measuring software, an Internet multimedia noise monitoring service, and soft computing algorithms applied to the analysis of the system database content. The results of noise measurements were compared with acquired subjective opinions on noise annoyance. A new GIS layer was produced on the basis of this study, employing data produced with the soft computing algorithms. The engineered intelligent application may help in diminishing the occurrence of hearing problems and other diseases caused by environmental and industrial noise.
31.1 Introduction
A considerable portion of hearing and psychosomatic diseases is caused by excessive industrial, urban and traffic noise or any unwanted sounds occurring in everyday life. Consequently, it is expected that a reduction of their occurrence will be achieved as a result of implementation of the solutions that have been developed within the project scope. The latest technological advances in information technology were used in the course of the project realization [1][2]. Consequently, it is shown in the paper that the presented solutions are based on some innovative ideas and inexpensive technical means for measuring noise and assessing its annoyance [6][7][8]. The intelligent processing of the resulting data allows fast evaluation of the influence of noise on humans. It is expected that implementation of the noise telemonitoring system covering the whole country will contribute to raising the awareness of society and authorities with regard to the influence of noise on health. Furthermore, it turns out to be an essential factor in the future improvement of environmental noise conditions.
31.2 System design
The recently developed multimedia noise monitoring system is addressed to all users interested in problems related to noise. It offers not only objective noise measuring methods, but also electronic questionnaires for a subjective opinion survey. The noise measuring system consists of the following functional elements: a USB device with a measuring microphone which is used for signal acquisition (the device can be connected to any PC computer equipped with a USB interface) and software for calculating noise parameters (Fig. 31.1) according to valid norms [3].
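One of the basic parameters such software computes is the equivalent continuous sound level. The sketch below shows the generic textbook computation of Leq from calibrated sound-pressure samples; it is given only as an illustration and is not the measurement code of the described system.

// Equivalent continuous sound level Leq (dB) from calibrated pressure samples.
#include <cmath>
#include <vector>

double equivalentLevelDb(const std::vector<double>& pressurePa) {
    const double p0 = 20e-6;                     // reference sound pressure, 20 micropascals
    double meanSquare = 0.0;
    for (double p : pressurePa)
        meanSquare += (p / p0) * (p / p0);
    meanSquare /= static_cast<double>(pressurePa.size());
    return 10.0 * std::log10(meanSquare);        // unweighted Leq in dB
}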
Fig. 31.1. Sample window of the noise measurement program.
The application cooperates with the USB device and another system component, which is an Internet service with a specially dedicated database open to exploration with data mining algorithms. For the needs of remote and continuous noise measurements, another device based on a specially designed microcomputer was developed. The device enables continuous measuring of noise and makes possible wireless transmission of results to the database of the system. Consequently, the system was equipped with a modem for data transmission by the GPRS protocol. The elements of the device are presented in Fig. 31.2. The implemented calibration method allows the use of the developed sound interface and the measuring microphone. When the measurements are over, the program can send results through the Internet to the central server for their common storage, processing and analysis. Noise measurement in the Multimedia Noise Monitoring System uses the client's PC system or a dedicated embedded miniature PC computer system for automatic measurements. The prototype device developed to achieve these goals offered extended functionality compared to the requirements of the MNMS (Multimedia Noise Monitoring System) system [2]. The extensions resulted from the idea that the same device
could also be used in telemedical applications developed earlier by the staff of the Multimedia Systems Department with a close co-operation with Warsaw-based Institute of Physiology and Pathology of Hearing [4,5].
Fig. 31.2. Components of the environmental noise monitoring system: a) compact computer; b) GPRS (General Packet Radio Service) sender; c) GPS (Global Positioning System) receiver
It was assumed that every measurement would be triggered by the PC host and would consist of two phases. During the first phase the system records a noise sample which is used for further calculations. The acquisition of geographical location data takes place afterwards. After successful measurement completion the device is put into suspension mode in order to reduce power consumption and prolong its effective use outdoors. The proposed device acts as a meter allowing for a simultaneous measurement of noise and geographical localization data. This can ease and speed up the generation of noise maps and allows quick discerning of places where the noise level is dangerously high. The device itself, along with the accompanying software, should be an interesting alternative to noise level meters currently available on the market. The scheme of the general MSMH system characteristic is seen in Fig. 31.3. The system architecture allows for measurement data acquisition using two methods. The first one is based on the application of a specially prepared network communication protocol. It is also used to control automatic measurement stations. The second method involves direct data entry into the database using the SQL text protocol. The database is accessible through a Web service. A dedicated server has been set up in order to support the monitoring devices. TCP/IP communication support has been implemented, compliant with the prepared protocol. The server supporting the TCP/IP protocol and the communication with the SQL database has been equipped with additional abstraction classes.
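The two-phase measurement cycle and the transmission step described above can be summarized by the following sketch. All device and network functions are hypothetical placeholders (with stub bodies so the fragment compiles); the real firmware and protocol of the system are not reproduced here.

// Sketch of the measurement cycle: record a noise sample, read the GPS position,
// send the result over GPRS and suspend the device. Stubs stand in for real hardware.
#include <string>
#include <vector>

struct GpsFix { double latitude; double longitude; };
struct Measurement { std::vector<double> samples; GpsFix position; };

std::vector<double> recordNoiseSample(int seconds) {        // stub for the audio front-end
    return std::vector<double>(seconds * 8000, 0.0);
}
GpsFix readGpsPosition() { return GpsFix{0.0, 0.0}; }       // stub for the GPS receiver
bool sendOverGprs(const std::string&, const Measurement&) { return true; }  // stub for the GPRS link
void enterSuspendMode() {}                                   // stub for power management

void measurementCycle(const std::string& serverHost) {
    Measurement m;
    m.samples  = recordNoiseSample(5);          // phase 1: acquire the noise sample
    m.position = readGpsPosition();             // phase 2: acquire geographical location data
    sendOverGprs(serverHost, m);                // transmit results to the central database
    enterSuspendMode();                         // reduce power consumption between measurements
}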
Fig. 31.3. The general MSMH system lay-out
A common C++ class lets clients be independent from the operating system on which the server runs. The Gnome Database Access library allows significant flexibility in the server's co-operation with various SQL databases. Such an approach resulted from the need for easier modification and upgrade of the system in the future. The second key server component is the Web service responsible for the presentation and processing of data collected by the engineered multimedia noise monitoring system. PHP (PHP Hypertext Pre-processor) has been selected as the principal programming language. The service modules responsible for database operations and chart generation have been underscored. This approach, separating the presentation part and the data storage part (operated by the use of static templates of Web pages) from the service logic, enables simple modification of the content and behavior of the service. The Web service of the Multimedia Noise Monitoring System consists of a number of modules and sub-modules, whose mutual relations are presented in Fig. 31.4. The main part of the Web service consists of three basic modules: Administrative Panel, System User Zone and Operational Module.
31.3 Presenting Acquired Data One of the main functions of Multimedia Noise Monitoring System is the presentation of data in a form comprehensible for the user. The function offers two methods of data visualization: traditional charts and noise maps.
Fig. 31.4. Web Service structure
31.3.1 Noise map modules The engineered service has been designed for displaying simple noise maps. The main feature of a noise map is the visualization of sound intensity in a specifically defined area. In various GIS (Geographical Information System) systems noise maps are one of many information layers presented to the user. In case of the noise maps module of the system the multi-layer rule has also been maintained, but it is simplified in order to allow its display in a Web browser. The presentation of the system's noise map consists of a series of overlapping raster images. An example of a noise map for Gdansk University of Technology is presented in Fig. 31.5. When the map is displayed in a Web browser, specific images are adjusted to the zoom and offset coefficient entered by the user. In a hierarchy of a series of images a noise map layer is presented as the lowest positioned layer. Raster image presenting the sound intensity information in a given area is read from the Web server, just like the remaining images. In majority of cases
it is a final result of calculations and simulations in a given application supporting noise maps creation.
Fig. 31.5. Sample acoustic map - noise at the area of Gdansk University of Technology
31.3.2 Measurement Results Data Visualization The user must specify which region the searched measurement point is located in. For the selected region a list of cities with measurement devices is displayed. After selecting the city the user will see the map of the selected area with marked measurement points. The final selection concerns a specific measurement point. It can be selected by clicking a box on the map or by selecting a measure point from the list. For each measurement point one can specify a time range, for which specific parameters will be presented. An example of a selected measure point can be observed in Fig. 31.6. After selecting a measurement point and specifying a required time range one can display the results in graphic or table form. Measurement card for a given point contains a table including available noise parameters and a chart presenting the results in a graphic form. Fig. 31.7 presents an example of a page containing the results of measurements. By clicking a selected parameter in the table one can add of remove it from the chart. To simplify the process of viewing the results for other points, appropriate links have been added. Therefore one can select another measurement point for the same city or specify a new location.
31.3.3 Visualizing Survey Results
The Web service offers access to the survey to every interested user. The survey enables users to express their own, subjective opinion about the acoustic climate in their place of residence. Subjective research is a perfect addition to objective measurements, as it allows collecting information about noise nuisance directly from the inhabitants of an area. Survey results are automatically processed by the system. A number of result presentation methods have been prepared. They may be charted on the map of regions of the country, for a given city in the form of circle charts, or in the form of collective circle charts for the whole country. The user may select an appropriate presentation method.
Fig. 31.6. Selecting location at which the measurements are made.
31.4 Analysis of database content The content of the database can be analyzed in various ways. One possibility is assessing the subjective impression of loudness of environmental noise by people living or working in the areas endangered with excessive noise levels. This is a way of assessing hearing acuity together with overall sensitivity to sound which may be determined also by some psychological causes (tiredness, nervousness or reverse: habituation effects). The second case assumes that overall response to noise revealing noise annoyance and possibly the risk of hearing diseases may be determined also on the basis of analyzing data gathered from subjective opinions.
Fig. 31.7. Web page displaying results of acoustic measurements.
The hearing sensitivity is understood in this context as sensitivity to perceived sound (employing the hearing sense and psychological determinants). The sensitivity decrease could be objective (hearing loss) or subjective (habituation effects). This kind of study seems very interesting for hearing pathologists, psychologists and environmental engineers. The analysis can be done employing statistical or data mining tools. We decided to perform database querying with two data mining approaches:
1. based on assessing fuzzy rules and the perception-based data processing principles introduced by Zadeh [9]. The difference d between the typical (regular) noise loudness impression and the impression reported by system users (expressed in dB) is the subject of our interest, because it can reflect the decrease of generally understood hearing sensitivity. It is assumed that the users express their subjective impressions in natural language (from NONE to ULTRA LOUD), thus fuzzy logic provides a suitable tool for the computing in our case [4];
2. based on collecting subjective assessment results in a decision table and applying the rough set data analysis method. This approach was successfully tried earlier with regard to subjective opinion processing related to the "computing with words" concept [10].
31.4.1 Case 1: Perception-based data processing
In the first discussed case there are two premises. One is associated with the information on regular loudness scaling; in further considerations it is represented by
the Norm variable. The other premise is associated with the investigated results of noisy areas inhabitants' subjective loudness impression; it is represented by the Sub variable. In order to differentiate the fuzzy sets associated with individual premises, labels of fuzzy sets associated with the first premise use lower case letters while those of fuzzy sets associated with the second premise employ upper case letters. On the basis of available information one can design a rule base according to the following guidelines:
- Premises pointing to consistence of the loudness sensation evaluation for regular loudness scaling and for the investigated loudness scaling generate a decision stating no scaling difference, marked with the label "none", e.g.:
If Norm is soft AND Sub is SOFT THEN d is none
A zero (none) difference is a special case of difference. The experimental results lead to the conclusion that the output of the described fuzzy system can be described by a set of thirteen membership functions (Fig. 31.8) expressing the difference between the loudness sensation evaluation in noisy conditions and the evaluation for regular hearing. Fuzzy sets obtained in this fashion can be described with the following labels (describing the difference size): the MF in the middle of Fig. 31.8 corresponds to the label none; the remaining labels are: very small, very small+, small, small+, medium, medium+, large, large+, very large, very large+, total, total+. Labels marked with the "+" sign denote a positive difference (hypersensitivity). From the mid MF to the left the assigned labels refer to the negative difference.
Fig. 31.8. Output membership functions.
If the given result of loudness scaling differs by one category of loudness sensation evaluation, the decision is associated with the output labeled "small" in the case of a negative difference or "small+" for a positive difference, e.g.:
If Norm is very soft AND Sub is NONE THEN d is small
IF Norm is loud AND Sub is ULTRA LOUD THEN d is small+
It was found that differences by two or more categories are less frequent; however, they are also supported by adequate decision rules, e.g.:
IF Norm is soft AND Sub is LOUD THEN d is medium+
IF Norm is ultra loud AND Sub is SOFT THEN d is large
IF Norm is medium AND Sub is ULTRA LOUD THEN d is large+
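Rules of this form can be evaluated in the usual Mamdani fashion: the strength of each rule is the minimum of the two premise memberships, and a crisp d is obtained by a weighted average of the consequent centers. The sketch below assumes triangular membership functions, as in the text; the concrete shapes and the rule set are illustrative, not the system's actual rule base.

// Minimal evaluation of rules "IF Norm is <A> AND Sub is <B> THEN d is <C>".
#include <algorithm>
#include <vector>

struct Triangle {
    double a, b, c;                              // left zero, peak, right zero
    double mu(double x) const {
        if (x <= a || x >= c) return 0.0;
        return (x <= b) ? (x - a) / (b - a) : (c - x) / (c - b);
    }
};

struct Rule { Triangle norm; Triangle sub; double dCenter; };   // consequent center of d

double crispDifference(const std::vector<Rule>& rules, double normLevel, double subLevel) {
    double num = 0.0, den = 0.0;
    for (const Rule& r : rules) {
        double strength = std::min(r.norm.mu(normLevel), r.sub.mu(subLevel));  // AND = min
        num += strength * r.dCenter;
        den += strength;
    }
    return (den > 0.0) ? num / den : 0.0;        // defuzzified value of d
}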
The fuzzy rule processing followed by defuzzification allows for calculating crisp values of d for various points on the acoustical map of the investigated region. First the definition of membership functions for the input variables is needed. An example of a set of membership functions for the frequency band of 500 Hz, obtained by approximating the factual values of the membership functions with triangles, is illustrated in Fig. 31.9. In practice such functions should be determined for various frequency sub-bands, typically with center frequencies of 500 Hz, 1 kHz, 2 kHz and 4 kHz. The approximation of fuzzy set boundaries is done algorithmically on the plane containing the scattered data. The algorithm used by the authors to determine the triangular membership functions involves the following steps:
Fig. 31.9. Approximation of fuzzy sets boundaries. Dots represent (processed) answers of noise monitoring system users as to their subjective noise loudness impression (expressed in terms of degree of membership to individual loudness categories). The X-axis represents measured noise levels. Labels of membership functions are: NONE, VERY SOFT, SOFT, MEDIUM, LOUD, VERY LOUD, ULTRA LOUD.
- Finding the value of the first element belonging to the given fuzzy set (the value of the first argument for which the factual membership function takes a non-zero value);
- For determining the first arm of the triangle, considering all the elements of the factual membership function MF fulfilling the condition:
x : (∀ i) (MF(x_i) − MF(x_{i−1})) > 0        (31.1)
where i - indices of arguments of the membership function MF fulfilling condition (31.1);
- Calculating parameters a1 and b1 of the straight line y = a1·x + b1;
- For determining the second arm of the triangle, considering all the elements of the factual membership function MF fulfilling the condition:
x : (∀ i) (MF(x_i) − MF(x_{i−1})) <= 0        (31.2)
where i - indices of arguments of the membership function MF fulfilling condition (31.2);
- Calculating parameters a2 and b2 of the straight line y = a2·x + b2;
- Calculating the point of intersection of the straight lines y = a1·x + b1 and y = a2·x + b2 (determining the triangle vertex);
- Calculating the zeros of both lines (a code sketch of this fitting procedure is given below).
Lden = 101. ^ . — f 12 .10-ir^ -F 4 • 10
r^"^ + 8 • 10-^ro— j
(31.3)
in which: -
Lday is the A-weighted long-term average sound level as defined in ISO 19962: 1987, determined over all the day periods of a year; Levening is the A-wcightcd long-term average sound level as defined in ISO 19962: 1987, determined over all the evening periods of a year; Lnight is the A-weighted long-term average sound level as defined in ISO 19962: 1987, determined over all the night periods of a year;
408
Andrzej Czyzewski, Bozena Kostek, and Henryk Skarzynski
Adequate norms define permissible noise levels as shows Tab. 31.1 presenting demands for quiet areas. The assessment of the annoyance provided by the proposed system is based on both: the measurement procedures resulting in Lden value, and a collection of information related to the respondent's subjective evaluation. The rough set data analysis serves in the engineered system as an expert procedure allowing for the correlation between objective measure and subjective evaluation. Therefore the main objective is to compare assessments of exposure to noise done by a human with the objective data coming from measuring module of the system. The respondent's answers are obtained from the electronic questionnaires. The corresponding data are collected in the Pawlak's decision table along with the Lden value (Tab. 31.2). The attributes Ai,...,Am are related to parameters such as: respondent's age, perception of loudness, noise occurrence, and noise type, vulnerability to distraction by noise, to noise interference on communication and work performance, to anxiety & fear, and finally to subjective perception of stress. Among the attributes contained in the Tab. 31.2 one can see those related to the categories of loudness perception. This is a way to find the correlation between subjective perceptions and objectively measured values basing on rough set-based data processing. The data may have descriptive character (string values) or can be expressed by numerical ranges. The following decision attribute set is valid for descriptive values: {none, low, medium, high, very high, ultra high}.
Table 31.1. Normative values of noise indicators. Residential areas and noise-sensitive buildings housing public institutions (schools, hospitals, nursing homes, etc.) Single buildings in the open country Service enterprises (hotels, offices, etc.): Recreational areas where people stay overnight (holiday houses, allotment gardens, caravan parks, etc.): Other recreational areas where people do not stay overnight
Equivalent level, Lden Maximum level, LAFmax 55 dB 70 dB
55 dB 60 dB
70 dB 75 dB
50 dB
65 dB
55 dB
70 dB
The form of rules derived on the basis of analyzed cases contained in the respondents' database filelds is of the following form: (attribute-Ai)=(valueMil) and. and (attributeJirn)= (valueMnm) => (decisionjy)^ {value.di) The data are gathered from all respondents over a period of time. Having collected results for number of respondents, these data are then processed by the rough
31 Intelligent System for Environmental Noise Monitoring
409
Table 31.2. Decision table containing respondents' data. Respondent/ attribute Ai A2 an ai2 ti
•^m
D
t2
^21 022
CL2m
di d2
tn
0"nl
an2
0>nm
dn
dim
set algorithm [10]. In Tab. 31.3 some records from data collected by the system are shown. These data are related to street noise evaluation. Table 31.3. Database of respondents ' records. B C D Respondent/ Parameters A cont. cont. 1 85 loud i n 85 very loud impuls. freq.
I J Annoyance med. med. 54 high 54
high
Denotations: A - -Lden [dB] B - Loudness {none, very soft, soft, medium, very loud, ultra loud} C - Type of noise {impulsive, non-stationary, stationary, continuous} D - Occurrence {rare,fi^equent,often, continuous} E - Distraction{loWy medium, high} F - Communication Interference{low, medium, high} G - Performance Interference {low, medium, high} H - Anxiety & Fear {low, medium, high, very high, ultra high} I - Stress{low, medium, high} J-Age category{positive integer value} Annoyance - {very low, low, medium, high, very high} The first step of rough set processing is related to the elimination of rows in decision tables that are duplicated (superfluous data elimination). Further steps result in generation of rules and rough set measure, and computation of reducts allowing obtaining the reduced form of rules based on the indispensable attributes only [4] [5] [7]. Examples of rules processed by the system are shown below. One can see that some cases will be contradictory depending on the respondent's age and his/her sensitivity to the noise exposure, and in addition to the type of noise. IF ^=85 A B=loud A C=cont A i:)=cont. A £'=med. AF=med. AG=med. A iJ=low A /=med. A J=54 => Annoyance=med. IF A=85 A B=very loud A C=impulsive A jD=freq. A £;=high A F=high A G=med. A i7=high A /=high A J=54=^ Annoyance=high The real interest does not concern, however, the meaning of the above shown rules which is obvious to any data analyst, but it is related to the rough measure /J,RS associated with these rules. That is because the rough measure can be used in the
410
Andrzej Czyzewski, Bozena Kostek, and Henryk Skarzynski
system as a weighting factor for determining the correlation between the objective values of measured L^en and subjectively evaluated annoyance. The accuracy of decisions produced by the intelligent database analysis algorithm is expected to grow higher as the number of respondents' records is increased.
31.5 Conclusions Discerning correlation of objectively measured quantities and perception-based categories is one of most important problems in many disciplines of science, including acoustics. The engineered noise telemonitoring system was designed in such a way that it allows to measure noise in endangered areas and to study the influence of environmental noise on humans. Two kinds of soft computing algorithms were employed to that end originating from fuzzy sets and rough set theories. The engineered intelligent application may help in diminishing hearing problems and other diseases occurrence caused by environmental & industrial noise.
References 1. Czyzewski A., Kotus J., Web-Based Acoustic Noise Measurement System. 116th Audio Engineering Society Convention, Preprint No. 6006., Berlin, 08-11 May, 2004. 2. Czyzewski A., Kotus J., „Universal system for diagnosing environmental noise" Management of Environmental Quality, pp. 294-305, vol. 15, No. 3, 2004. 3. Directive 2002/49/Ec of the European Parliament. 4. Czyzewski A., Kostek B., Suchomski P., Automatic Assesment of the Hearing Aid Dynamics Based on Fuzzy Logic - Part I., Konferencja TASTED, the Third lASTED Int. Conf on Artificial Intelligence and Applications, September 8-10, 2003 Benalmadena, Spain 5. http://www.telewelfare.com/ 6. Miedema H. M. E., Vos H., Noise sensitivity and reactions to noise and other environmental conditions, J. Acoust. Soc. Am. 113 (3), 1492-1504 (2003). 7. D. Ouis: Annoyance from road traffic noise: a review. Journal of Environmental Psychology, 21,101-120 (2001). 8. M. Heinonen-Guzejev, H. S. Vuorinen, J. Kaprio, K. Heikkilag, H. Mussalo-Rauhamaa, Self-report of transportation noise exposure, annoyance and noise sensitivity in relation to noise map information. Journal of Sound and Vibration 234 (31.2), 191-206 (2000). 9. Zadeh L.A., A New Direction in AI: Toward a Computational Theory of Perceptions. AI Magazine 22(31.1): 73-84 (2001) 10. B. Kostek, „Computing with words" Concept Applied to Musical Information Retrieval. Electronic Notes in Theoretical Computer Science, 2003, No 4, vol. 82.
32
Multi-agent and Data Mining Technologies for Situation Assessment in Security-related Applications Vladimir Gorodetsky, Oleg Karsaev, and Vladimir Samoilov SPIIRAS, 39, 14-th Liniya, St. Petersburg, 199178, Russia {gor,ok,samovl}@mail.iias.spb.su Summary. The paper considers one of the topmost security related problems that is situation assessment. Specific classification and data mining issues associated with this task and methods of their solution are the subjects of the paper. In particular, the paper discusses situation assessment data model specifying situation, approach to learning of situation assessment, generic architecture of multi-agent situation assessment systems and software engineering issues. Detection of abnormal use of computer network is a case study used for demonstration of the main research results.
32.1 Introduction Security-related problems, which recently became of great concern for human society, constitute a new class of applications within information technology scope. Among such applications, the most important ones are those associated with security of critical state infrastructures including computer networks and information systems assurance, safeguard and restoration of critical enterprises like nuclear power plants, electrical power grids, etc. Other important class of such applications covers assessment of threat and prognosis of development of situations associated with large scale natural and man-made disasters and mitigation of their negative impact on the environment. Very specific class of security-related applications is caused by the necessity to predict terrorist intents and counteract against terrorist attacks. The list of such security related applications of topmost concerns can be continued. From information technology point of view, security-related applications possess a number of common very specific properties making extremely difficult the development of the corresponding decision making and control systems. Among such properties, the most specific ones are multiplicity of distributed data sources, heterogeneity, incompleteness, uncertainty and temporal nature of input data to be fused for decision making, large scale and distributed nature of decision making problem, etc. The above impulses new research in the area of distributed intelligent information systems ([11], [12], [1], [8], [15], etc.) whose main objective is a so-called situa-
412
Vladimir Gorodetsky, Oleg Karsaev, and Vladimir Samoilov
tional awareness task that is understood as the in-depth comprehension, prediction, and management of what is going within the system and environment of interest. The experience accumulated with regard to situational awareness problem allowed creating a general model of data processing within respective applications, socalled JDL model^ [14]. It considers hierarchy of tasks associated with the situation awareness-related applications (Fig. 32.1). In the conmionly accepted the situational awareness is a situation-centric problem, whose most significant subtasks are Object Assessment often referred to as Data Fusion and Situation Assessment referred to as Information Fusion. Both these tasks are currently the subjects of intensive research ([1], [8], [15], etc.]). Level 0: Preproeessing of sensor data Sensor 1 Level 1: Object assessment Sensor 2 l_J
Level 2: Situation assessment Level 3: Impact and/or threat assessment
Sensor N
Sensor and resource management
iir-( Level 4 j ^ Process refinement
Data Base Management System Fusion DB Support DB
Fig. 32.1. JDL model of data and information fusion [Salemo-01] Certain important aspects of the situation assessment task constitute the main subjects of this paper. On the one hand, distributed nature of situation assessment system input data necessitates the use of distributed architecture. In this respect the paper takes advantage of multi-agent architecture for systems in question. On the other hand, incomplete and temporal nature of input data makes the decision making problem rather specific, and this issue is also a subject of the paper. It is further shown that due to the aforementioned properties of input data the classification has to be produced based on multiple asynchronous data streams. Unfortunately both learning of classification and classification itself for such kind of data are poorly investigated. The paper proposes an approach that allows coping with the respective learning and classification problems. The subsequent part of the paper is organized as follows. Section 32.2 introduces the basic notions associated with the situation assessment task and outlines specific features of input data model used for situation assessment. Section 32.3 presents briefly the developed and completely implemented methodology of situation assessment based on information fusion and outlines the multi-agent architecture of a particular security-related application that is a detection of the security status of computer networks. Section 32.4 describes an approach intending training of classifiers ^JDL model was developed by Joint Directories Research Laboratories of the US Air Force within the framework of Information Fusion Initiative.
Section 32.5 considers the implementation of multi-agent situation assessment systems and demonstrates it with an example of such a system developed by the authors. The conclusion summarizes the research results and outlines future work.
32.2 On-line Situation Assessment Update: Peculiarities of Input Data
Situation assessment is the topmost task in the security-related scope. A situation is a characteristic of a system constituted by semi-autonomous objects (situation objects) having particular goals and operating in a coordinated mode to achieve a certain goal of the system as a whole. A situation object can be physical (e.g., technical means participating in a rescue operation) or abstract (e.g., components of software where traces of an attack against a computer are manifested). The situation and its objects are characterized by their "states", which take values from finite sets of labels. The situation assessment task (or, rather, the "situation state assessment" task) is a classification task aiming to determine the current state of the situation; its essence is that at each given time instant a label is mapped to the situation. Situation and object states are of a dynamic nature, and therefore situation assessment is a real-time task. Situation-related information arrives continuously from multiple distributed sensors. As a rule, the outputs of these sensors come into the situation assessment system with different frequencies and in an irregular mode, jointly constituting asynchronous input data streams that have to be processed by the situation assessment system in order to update the current situation state on-line. The example below, taken from computer network security, demonstrates the peculiarities of the input of a situation assessment system; it considers the anomaly detection task. It is assumed that the security status of a computer network can take values from the binary set {"Normal", "Abnormal"}. It is also assumed, for simplicity, that four data sources resulting from preprocessing of network traffic constitute the input of the security status assessment system^; they are:
1. connection-related vectors of binary sequences specifying the six-component stream of IP packet headers;
2. statistical attributes of connections manifested in the traffic (such as duration, status, total number of connection packets, etc.);
3. statistical attributes of traffic during short time (5 sec) intervals, presented by four features specifying integral characteristics of the input traffic, such as the total numbers of connections and services of various types during the last 5 sec; and
4. statistical attributes of traffic for long time intervals, composed of the same statistics as the previous ones but averaged over a chosen number of connections.
^The whole case study from the intrusion detection scope that is used for validation of the situation assessment technology under development includes multiple data sources at the traffic level, operating system level and application level.
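To make the third kind of data source concrete, the sketch below computes a few short-window traffic statistics in Python. The specific feature names and the record format (dicts with 'time', 'service' and 'dst' keys) are our own assumptions for illustration; the paper does not list the exact four features used in the case study.

```python
from collections import Counter

def short_window_features(connections, now, window=5.0):
    """Integral traffic features over the last `window` seconds (assumed feature set)."""
    recent = [c for c in connections if now - window <= c["time"] <= now]
    by_service = Counter(c["service"] for c in recent)
    return {
        "n_connections": len(recent),                      # total connections in the window
        "n_services": len(by_service),                     # distinct service types
        "max_same_service": max(by_service.values(), default=0),
        "n_distinct_hosts": len({c["dst"] for c in recent}),
    }

# Example: features at time 105.0 from three recent connection records.
sample = [
    {"time": 101.2, "service": "http", "dst": "10.0.0.5"},
    {"time": 103.7, "service": "http", "dst": "10.0.0.6"},
    {"time": 104.9, "service": "smtp", "dst": "10.0.0.5"},
]
print(short_window_features(sample, now=105.0))
```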
Fig. 32.2 illustrates part of these data streams graphically. The datasets of the above kinds, used below to demonstrate the properties of the developed approach to mining asynchronous data streams, were obtained from Tcpdump/Windump data processed by the TCPtrace utility and by some other ad hoc programs developed for this purpose.
Fig. 32.2. Multiplicity of input data streams used for anomaly detection based on data of the network traffic level
Sensor data are collected continuously. One of their peculiarities is that they are time-stamped, that the particular data streams enter the situation assessment system with different frequencies, and that they possess finite "life times", which can vary considerably between streams. The finiteness of the life time means that after a certain time has elapsed a part of the data becomes useless for situation assessment. Therefore, at the time of a situation assessment update some attributes may not have an assigned value, and thus the input data vector to be used for the update contains missing values. Fig. 32.3 demonstrates this fact. Indeed, let us assume that new data ("events") arrive at the times T1, T2, T3 and T4 and that, according to the necessity to update the situation assessment in real time, at the same times T1, T2, T3 and T4 the situation assessment system has to make a decision about the current computer network security status. The decision at time T1 is initiated by the arrival of the data, denoted Z1, about the most recently completed connection. At that moment the life times of the most recently received data corresponding to the traffic statistics aggregated over 5 sec, Z2, and to the traffic statistics aggregated over 100 connections, Z3, have not yet elapsed, which is why these data, together with the newly arrived Z1, constitute the fully instantiated input Z(T1) = <Z1, Z2, Z3>. One can verify that the same holds at the decision making time T2. At the times T3 and T4 the situation is different. Indeed, at the time T3 the decision is initiated by the arrival of data Z2 corresponding to the traffic statistics aggregated over 5 sec. At that moment the life time of the most recently received data Z3, corresponding to the traffic statistics aggregated over 100 connections, has not yet elapsed, so these data can be used for on-line decision making, whereas the life time of the data corresponding to the most recently completed connection, Z1, has already elapsed (and a new connection is still in progress), so the data corresponding to Z1 are useless. Therefore, the input at the time T3 contains a missing value in place of the data Z1 (see Fig. 32.3). The same happens
at the time T4, when, due to the elapsed life time of the data Z3, the input of the system assessing the computer network security status contains a missing value in the last position.
(Figure content: (1) connections flow; (2) aggregation for 5 sec; (3) aggregation for 100 connections; input data with missing values. Legend: life times of events, expiration of life times, arrival of new events initiating on-line decision update, * denotes a missing value; Z(T1) = <Z1, Z2, Z3>, Z(T2) = <Z1, Z2, Z3>, Z(T3) = <*, Z2, Z3>, Z(T4) = <Z1, Z2, *>.)
Fig. 32.3. Explanation of missingness nature in input of situation assessment system
As a conclusion to the above example, it can be stated that the asynchronous nature of the situation assessment system inputs and the finite life times of these inputs make it necessary to make decisions on the basis of data with missing values. In the general case, some kind of prognosis of the missing values can be used. Unfortunately, in the example in question this is not possible at all because, for instance, at the moment of decision making a new connection may correspond to the activity of a new user, in which case there is no correlation between the previous and the next portions of connection-related data.
Fig. 32.4. Decision Fusion Methodology
It is important to note that, for various reasons, certain attributes of the input data streams of a situation assessment system can also be missing in the general case; for example, if airborne data are used, such data can
be missing due to meteorological factors, object masking, etc. Thus, missingness of data is a specific property of the input of situation assessment systems, and in many cases it is impossible to impute the missing values on the basis of statistical properties of the input; the above example from the computer network security scope demonstrates this. Thus, a specific problem posed by situation assessment is that it is a classification task with missing values. Respectively, training and testing of situation assessment systems destined for on-line classification update reduce to data mining and knowledge discovery from data sets containing missing values. The respective approach and techniques are briefly considered in this paper as well.
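The life-time mechanism illustrated by Fig. 32.3 can be reproduced in a few lines of code. The sketch below is a worked example with purely illustrative arrival times and life-time values (not the case-study values); it only shows how expired data turn into missing values in the input vector Z(T).

```python
from dataclasses import dataclass

@dataclass
class Event:
    stream: str       # data source name
    time: float       # arrival time of the value
    life_time: float  # how long the value remains usable

    def alive_at(self, t: float) -> bool:
        return self.time <= t <= self.time + self.life_time

def input_vector(events, streams, t):
    """Z(t): latest still-alive value of each stream, '*' where everything has expired."""
    z = []
    for s in streams:
        alive = [e for e in events if e.stream == s and e.alive_at(t)]
        z.append(f"{s}(t={max(alive, key=lambda e: e.time).time})" if alive else "*")
    return z

# Illustrative arrival times and life times (assumed).
events = [
    Event("Z3", 0, 15), Event("Z2", 1, 6), Event("Z1", 2, 5),
    Event("Z2", 5, 6),  Event("Z1", 6, 5),
    Event("Z2", 12, 6), Event("Z1", 16, 5),
]
for T in (2, 6, 12, 16):   # decision times T1..T4, each triggered by a newly arrived event
    print(T, input_vector(events, ["Z1", "Z2", "Z3"], T))
# T1, T2 -> fully instantiated; T3 -> <*, Z2, Z3>; T4 -> <Z1, Z2, *>
```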
32.3 On-line Situation Assessment Update: Methodology and Multi-agent Architecture
Let us outline the methodology of situation assessment based on the ideas of information fusion corresponding to level 2 of the JDL model (see Fig. 32.1). This methodology determines how to allocate data and information processing functions between the data source-based level and the meta-level. Several approaches to the fusion of data and information exist ([3]). The methodology used here assumes an information fusion architecture with at least two levels: local classification mechanisms produce decisions regarding the object states on the basis of particular data sources, and these decisions are then combined at the meta-level. This methodology is advantageous in many respects: (1) it considerably decreases the communication overhead; (2) it is applicable to applications where the data structures of particular sources are heterogeneous, since only local decisions, represented in binary or categorical scales, are forwarded to the upper level; (3) a number of effective and efficient algorithms exist for combining such decisions at the upper level; and (4) it preserves the privacy of the source data. A generalized structure demonstrating this information fusion methodology is depicted in Fig. 32.4.
Fig. 32.5. Anomaly detection system architecture
As concerns the above example of the anomaly detection task, at its first level four classifiers
produce decisions based on the particular data sources. At the second level, which deals with the resulting asynchronous binary data streams, these decisions are combined. The respective multi-agent architecture of the anomaly detection system is presented in Fig. 32.5. This architecture consists of two agents of different classes:
• the Network Traffic-based Alerts agent, NTA-agent, which is responsible for the detection of abnormal user activity based on the particular data streams produced by the Network Traffic Sensor, NTS, and
• the Alert Correlation (information fusion) agent, AC-agent, which is responsible for combining the alerts generated by the classifiers of the NTA-agent.
The basic functionalities of the NTA-agent are (1) the transformation of the raw data structures resulting from traffic data preprocessing (performed by the NTS component indicated in Fig. 32.5) into feature structures (this function is realized by the NT-F component of the NTA-agent), and (2) producing classifications, Normal or Alert, for each particular data stream of the feature structures. The latter functions are realized by the classifiers Q_A(Cnc) and Q_B(Cnc) for the connection-level data and by two further Q_B classifiers for the traffic statistics aggregated over 5 sec and over 100 connections; here Cnc stands for Connection and Ssn stands for Session. Accordingly, the architecture of the NTA-agent comprises (1) the component computing the feature structures over the raw data and (2) the components performing classification (alert generation) on the basis of the particular data streams represented in terms of the feature structures. The architecture of the AC-agent includes two components, of which the Q_B classifier is the main one. It is responsible for on-line combining of the decisions produced by the classifiers of the NTA-agent, thus producing the on-line assessment of the host security status, Normal or Abnormal. In intrusion detection this procedure is referred to as "alert correlation". The Syn component (Syn is an abbreviation of "Synchronization") is responsible for detecting "too old" data. This component carries out the "synchronization" of data in the following manner. Up to the time of receiving a new message, the AC-agent (more exactly, its Syn component) keeps the previous decisions of all first-level classifiers (in our case a vector of four attributes, the labels of the host security status produced by the classifiers of the NTA-agent) together with their time stamps. When a new message is received, the Syn component changes the value of the respective attribute and deletes the values of attributes that are "out of date". The updated data vector, which may contain missing values, is forwarded to the Q_B classifier responsible for "alert correlation". The simplified architecture of the anomaly detection system described above presents, in its general features, a generic architecture of many situation assessment systems intended for on-line update of the situation status. The differences between particular instances of such systems mainly concern the number of data sources used, the number of agents and the number of decision making levels. Nevertheless, in most cases such a multi-agent system has to comprise more than one level of data processing and decision making, a component providing "synchronization", and a component responsible for combining the first-level decisions.
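The synchronization step performed by the Syn component can be summarized operationally. The sketch below is our own illustration (class, method and source names are assumptions, not taken from the implementation): it keeps the most recent time-stamped decision of every first-level classifier, replaces it when a new alert message arrives, drops decisions whose life time has expired, and returns the resulting vector, possibly containing missing values, for the meta-level classifier Q_B.

```python
import time

MISSING = None  # stands for a missing value in the combined input vector

class Syn:
    """Keeps the latest time-stamped decision of each first-level classifier."""

    def __init__(self, sources, life_times):
        self.sources = sources        # identifiers of the first-level classifiers
        self.life_times = life_times  # per-source life time in seconds (assumed values)
        self.latest = {}              # source -> (decision, time stamp)

    def on_message(self, source, decision, timestamp=None):
        """Called whenever a first-level classifier sends a new alert/decision."""
        self.latest[source] = (decision, timestamp or time.time())

    def current_vector(self, now=None):
        """Decision vector for the meta-level classifier; expired entries become missing."""
        now = now or time.time()
        vector = []
        for s in self.sources:
            if s in self.latest:
                decision, ts = self.latest[s]
                if now - ts <= self.life_times[s]:
                    vector.append(decision)
                    continue
                del self.latest[s]    # "out of date" -> forget it
            vector.append(MISSING)
        return vector

# Example with assumed source names and life times.
syn = Syn(["Cnc_A", "Cnc_B", "Ssn_5", "Ssn_100"],
          {"Cnc_A": 10, "Cnc_B": 10, "Ssn_5": 15, "Ssn_100": 120})
syn.on_message("Cnc_A", "Alert")
print(syn.current_vector())   # missing values for the sources not heard from yet
```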
32.4 Data Mining with Missing Values for Situation Assessment
Data mining with missing values is a special problem that has been investigated for a long time. Most researchers mainly rely on methods based on a reasonable assignment ("imputation") of the missing values exploiting mostly statistical ideas, but such approaches are not applicable to situation assessment due to the substantial variety of the dynamics of the input data streams. Unfortunately, an approach based on the imputation idea is not relevant to many situation assessment applications. This is the reason why direct mining of data with missing values has to be used for this application. An approach to direct mining of data with missing values that does not assume imputation was proposed in [6]. The idea exploited in it is conceptually simple: if we arbitrarily assigned the missing values of the training dataset, we would be able to extract the set of maximally general rules, MGR [10], using existing techniques like AQ [9], RIPPER [2], GK2 [7], etc. It is important to note that different assignments would lead to different MGR sets. It was shown in [6] that among the different assignments of the missing values of the training dataset two specific variants exist that lead to sets of MGR serving as a lower bound R_low and an upper bound R_upper for any set of MGR R* corresponding to an arbitrary assignment:

R_low ⊑ R* ⊑ R_upper,    (32.1)
where ⊑ denotes the deducibility relation. Let us outline how these assignments can be found and how the bounds R_low and R_upper can be computed. Let t(i) be an arbitrary i-th instance of the training dataset, let k be the index of the chosen seed [10], and let I_k be the index set of the seed attributes with assigned values. While searching for MGR for the seed t(k), the columns of the training dataset whose indexes are outside the set I_k are ignored. Let us denote the index set of missing values in a negative example t(l) by I_l^- and the corresponding set in a positive example t(r), r ≠ k, by I_r^+. Let us consider two variants of missing-value assignment in the sets of negative examples NE and positive examples PE:
t_i^l = ¬t_i^k, i ∈ I_l^-, l ∈ NE;   t_i^r = t_i^k, i ∈ I_r^+, r ∈ PE    (32.2)

t_i^l = t_i^k, i ∈ I_l^-, l ∈ NE;   t_i^r = ¬t_i^k, i ∈ I_r^+, r ∈ PE    (32.3)
The assignment (32.2) maximally increases both the distinctions between the seed and the negative examples and the similarities between the seed and the other positive examples. On the contrary, the assignment (32.3) maximally increases both the similarities between the seed and the negative examples and the distinctions between the seed and the other positive examples. The assignment (32.2) can be called "optimistic": it cannot decrease either the generality or the coverage factor of any rule of the MGR extracted from an arbitrarily assigned source dataset. Under the assignment (32.3), which can be called "pessimistic", neither the generality nor the coverage factors of the rules extracted from an arbitrarily assigned source dataset can be increased.
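Operationally, the two assignments are simple to state. The following sketch is our own illustration of rules (32.2) and (32.3), not code from [6]; it assumes a binary attribute encoding with True/False values and None for a missing value.

```python
def assign(dataset, labels, seed_index, optimistic=True):
    """Fill missing values relative to the seed example: (32.2) if optimistic, (32.3) otherwise.

    dataset: list of attribute vectors (True/False/None); labels: list of class labels.
    Attributes undefined in the seed are left untouched, since they are ignored
    when searching for maximally general rules for that seed."""
    seed = dataset[seed_index]
    seed_class = labels[seed_index]
    filled = []
    for row, label in zip(dataset, labels):
        new_row = list(row)
        for i, v in enumerate(new_row):
            if v is not None or seed[i] is None:
                continue                         # keep known values; skip undefined seed attributes
            same_class = (label == seed_class)   # positive example with respect to the seed
            if optimistic:
                # (32.2): copy the seed value in positive examples, negate it in negative ones
                new_row[i] = seed[i] if same_class else (not seed[i])
            else:
                # (32.3): the opposite, "pessimistic" assignment
                new_row[i] = (not seed[i]) if same_class else seed[i]
        filled.append(new_row)
    return filled
```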
The above statements provide a general framework for direct mining of data with missing values. It is obvious that the rule set R_low belongs to the set of MGR under search. The other rules have to be selected from the rule set R_upper. Let us explain how this can be done. Let the alternative classes of situations be denoted Q and ¬Q. The selection algorithm used in this research consists of the following steps, applied to each seed:
1. Assign the missing values of the training dataset "optimistically", as in (32.2), and mine the rule sets R_upper for the classes Q and ¬Q: R_upper(Q) and R_upper(¬Q).
2. Assess the quality of the extracted rule sets R_upper(Q) and R_upper(¬Q) using certain evaluation criteria based on the testing dataset, and select the best rules from these sets for use in the reasoning mechanism.
3. Design the classification mechanism and assess its performance quality.
4. If this quality is not satisfactory, return to step 2 and repeat the rule selection.
The other procedures are the same as those used in the ordinary case [10]. An experiment simulating the training of the anomaly detection system case study allows an optimistic evaluation of the developed approach to on-line situation assessment update. Indeed, applied to the anomaly detection task, which uses data of the network traffic level, data of the operating system log and data of the application level, this approach showed an estimated probability of correct classification of about 0.99 when tested on testing and training samples with about 20% missing values.
32.5 Implementation Issues
An important issue for multi-agent situation assessment systems is the technology for their analysis, design, implementation and deployment. In this research the MASDK 3.0 software tool, supporting all the stages of the multi-agent technology, is used [5].
Fig. 32.6. MASDK software tool components and their interaction
This software tool, implementing the Gaia methodology [16], consists of the following components (Fig. 32.6):
1. the system kernel, a data structure for XML-based representation and storage of the specification of the target applied MAS;
2. an integrated set of user-friendly editors supporting the user's activity in specifying the applied MAS;
3. a library of C++ classes implementing what is usually called the Generic agent, integrating the reusable components of agents;
4. the communication platform to be installed on the particular computers of a network; and
5. the generator of software agent instances, which produces the source C++ code and the executable code of the software agent instances, as well as the software needed for deployment of the MAS over the already installed communication platform.
The specification of the applied MAS in the system kernel is carried out using editors structured in three levels. The editors of the first level, used to describe the applied MAS at the analysis stage, are: (1) the application ontology editor; (2) the editor describing the roles, the names of the agent classes, and the high-level schemes of the roles' interactions; and (3) the editor describing the roles' interaction protocols. The editors of the second level, supporting the specification of agent classes at the design stage, are: (1) the editor specifying the model of the meta-level behavior of an agent (while analyzing input messages and interacting with the user and the environment); (2) the editor specifying particular agent functions and behavior scenarios in terms of state machines; (3) the editor specifying the software agent's private ontology; and (4) the editor specifying the initial state of the agent class mental model. The editors of the third level support the specification of the MAS components needed for its deployment. The applied MAS specification produced by the designers using the above editors is stored as an XML file in the system kernel. Generation of the source (C++) and executable codes is performed automatically. The case study on the anomaly detection system described in section 32.2, used throughout the paper to demonstrate the proposed solutions, was implemented with the MASDK 3.0 software tool. All the classifiers were trained with the Distributed Data Mining tool also developed by the authors [4]. One of its components implements the algorithm for direct mining of data with missing values described in the previous section; it was used for training the meta-level classifier, denoted Q_B in Fig. 32.5. This system was trained and tested on the well known DARPA data. Fig. 32.7 demonstrates graphically the performance of this multi-agent anomaly detection system over a certain time period lasting about one hour. In the bottom part, the time intervals where an intrusion takes place are shown in black, whereas the intervals without intrusions are shown in white. The top part of Fig. 32.7 presents the performance results of the developed anomaly detection system, using the same colors for the corresponding decisions produced by it. The decisions corresponding to false alarms and missed detections are presented below
the time axis, whereas the decisions corresponding to correct anomaly detections are given above this axis in black. The correct detections of normal user activity are given in white above the time axis.
Fig. 32.7. Visualization component of the implemented anomaly detection system
It can be seen that, although only the traffic-based data source was used and the training dataset contains a rather high percentage of missing values (about 20%), the results are not too bad, although they are far from "ideal". It should also be noted that the purpose of the above experiments at the current stage of the research was not to evaluate the algorithm developed for direct mining of data with missing values, but to validate the architecture as well as the developed design and implementation technology destined for the engineering of situation assessment systems supporting on-line update of the situation assessment.
32.6 Conclusion
The paper is devoted to certain key issues of the situation assessment problem. It considers a situation assessment task statement that accounts for the fact that the input of any situation assessment system is composed of asynchronous data streams possessing various life times, and that this input can contain missing values. Another important peculiarity of the situation assessment task statement, very significant in practice, is that the situation assessment has to be updated on-line, i.e. such systems operate in real-time mode. The novel results presented in the paper are as follows:
1. A new, sound approach to direct mining of data with missing values based on the computation of upper and lower bounds of the sets of maximally general rules that can be extracted from arbitrarily assigned training data with missing values.
2. A two-level multi-agent architecture for situation assessment systems making decisions on the basis of asynchronous data streams arriving from multiple sources.
The main results of the paper were used in the design and implementation of a software prototype of a multi-agent anomaly detection system operating on the basis of multiple data sources. Future research will aim at further validation of the paper's results via the design and implementation of multi-agent software prototypes for other security-related applications.
Acknowledgement We wish to thank the European Office of Aerospace Research and Development of the USAF (Project 1993P) and the Russian Foundation for Basic Research (grant # 04-01-00494) for support of this research.
References
1. Ben-Bassat, M., Freedy, A.: Knowledge Requirements and Management in Expert Decision Support Systems for (Military) Situation Assessment. IEEE Transactions on Systems, Man and Cybernetics, vol. 12 (1982) 479-490
2. Cohen, W.: Fast Effective Rule Induction. Machine Learning: Proceedings of the 12th International Conference, CA, Morgan Kaufmann (1995)
3. Goodman, I., Mahler, R., Nguyen, H.: Mathematics of Data Fusion. Kluwer Academic Publishers (1997)
4. Gorodetsky, V., Karsaev, O., Samoilov, V.: Software Tool for Agent-Based Distributed Data Mining. Proceedings of the IEEE Conference "Knowledge Intensive Multi-agent Systems" (KIMAS 03), Boston, USA (2003)
5. Gorodetski, V., Karsaev, O., Kotenko, I., Khabalov, A.: Software Development Kit for Multi-agent Systems Design and Implementation. In: B. Dunin-Keplicz, E. Nawarecki (Eds.), From Theory to Practice in Multi-agent Systems. Lecture Notes in Artificial Intelligence, Vol. 2296 (2002) 121-130
6. Gorodetsky, V., Karsaev, O.: Mining of Data with Missing Values: A Lattice-based Approach. Proceedings of the International Workshop on the Foundation of Data Mining and Discovery, Japan (2002) 151-156
7. Gorodetsky, V., Karsaev, O.: Algorithm of Rule Extraction from Learning Data. Proceedings of the 8th International Conference "Expert Systems & Artificial Intelligence" (EXPERSYS-96) (1996) 133-138
8. Greenhill, S., Venkatesh, S., Pearce, A., Ly, T.C.: Representations and Processes in Decision Modeling. DSTO Aeronautical and Maritime Research Laboratory, Australia, DSTO-GD-0318 (2002)
9. Michalski, R.: A Theory and Methodology of Inductive Learning. In: Carbonell, J.G., Michalski, R.S., Mitchell, T.M. (Eds.), Machine Learning, vol. 1. Tioga, Palo Alto (1983) 83-134
10. Michalski, R., Kaufman, K.: Data Mining and Knowledge Discovery: A Review of Issues and a Multistrategy Approach. In: Machine Learning and Data Mining: Methods and Applications, John Wiley and Sons (1997)
11. Proceedings of the Fifth International Conference on Information Fusion (IF-2002). Annapolis, MD, July 7-11 (2002)
12. Proceedings of the Sixth International Conference on Information Fusion (IF-2003). Melbourne, Australia, July 13-17 (2003)
13. Salerno, J., Hinman, M., Boulware, D.: Building a Framework for Situation Assessment. Proceedings of the 7th International Conference on Information Fusion, Sweden (2004)
14. Salerno, J.: Information Fusion: A High-level Architecture Overview. In: CD Proceedings of Fusion-2002, Annapolis, MD (2002) 680-686
15. Than, C. L., Greenhill, S., Venkatesh, S., Pearce, A.: Multiple Hypotheses Situation Assessment. Proceedings of the 6th International Conference on Information Fusion, Australia (2003) 972-978
16. Wooldridge, M., Jennings, N.R., Kinny, D.: The Gaia Methodology for Agent-Oriented Analysis and Design. Journal of Autonomous Agents and Multi-Agent Systems, vol. 3 (2000) 285-312
33 Virtual City Simulator for Education, Training, and Guidance
Hideyuki Nakanishi
Department of Social Informatics, Kyoto University, Kyoto 606-8501, Japan
[email protected] http://www.lab7.kuis.kyoto-u.ac.jp/~nuka/
33.1 Introduction
Since smooth evacuation is important to safeguard our lives, we are taught how to evacuate in preparation for a disaster. For example, fire drills are conducted in schools. However, such sporadic and pre-arranged real-world training can only give us very limited experience. It is also rare to conduct fire drills in large-scale public spaces such as central railway stations, even though they are places where vast numbers of people gather. Crowd simulations should be used to compensate for this lack of opportunities for first-hand experience. Even if we have learned how to evacuate before an emergency happens, we still need appropriate guidance during the emergency itself. For example, large buildings have to be equipped with many emergency exits and their signs. However, these architectural guidance objects are not very flexible, and human guidance is necessary to help an escaping crowd while adapting to the needs of the moment. Crowd simulations should also be used to assist such guidance tasks. Crowd simulations aimed at learning about evacuation and at guiding escaping people would therefore be very beneficial. Multi-agent simulations are already known as a technology that deals effectively with the complex behavior of escaping crowds [9, 28]. However, conventional simulations are not tailored for learning about or guiding an evacuation. Since they are only designed for analyzing crowd behavior, they do not take human involvement much into account. For example, crowds are usually represented as moving particles, and it is not easy to interpret such a symbolic representation. Moreover, it is almost impossible for users to become a part of the simulated crowds and experience a virtual evacuation. To compensate for this incapability, and as part of the Digital City Project [12], we have developed "FreeWalk" [25], a virtual city simulator that allows human involvement. In FreeWalk, multi-agent simulations that include human beings can be designed. In this paper, I describe FreeWalk's design and how it is used for learning and guidance purposes. First, the capability to involve humans is described. Next, an
experiment to evaluate its effectiveness for learning is explained. Finally, the first prototype of a guidance system is introduced.
33.2 FreeWalk
Virtual training environments have already been used for single-person tasks (e.g. driving vehicles) and are becoming popular for multi-party tasks because they can significantly decrease the cost of group training. In these environments it is easier to gather many trainees, since they participate in the training as avatars by entering the virtual space through a computer network. In addition, it is possible to practice a dangerous task repeatedly, since the trainees are inherently safe. In such virtual environments "social agents", the software agents that have social interaction with people [24], play an important role in two ways: 1) by sharing the same group behavior, social agents become colleagues of the human trainees and can decrease the number of human participants necessary to carry out a large-scale group training; 2) social agents can play a predefined role within the training. Scripted training scenarios enable social agents to perform their assigned roles [6]. Human participants can then learn something through the interaction with the social agents, which behave according to the training scenario. We have developed a platform for simulating social interaction in virtual space called FreeWalk, whose primary application is virtual training. We integrated diverse technologies related to virtual social interaction, e.g. virtual environments, visual simulations, and lifelike characters [30]. In FreeWalk, lifelike characters enable virtual collaborative events such as virtual meetings, training, and shopping in distributed virtual environments. You can conduct distributed virtual trainings [8, 21, 38] in which lifelike characters serve as the colleagues of human trainees [32]. FreeWalk is not only for training but also for communication and collaboration [7, 19, 33]. You can use lifelike characters as the facilitators of virtual communities [11]. These characters and the human participants can use verbal and nonverbal communication skills to talk with one another [5]. FreeWalk can also be a browser of 3D geographical contents [20] in which lifelike characters guide your navigation [16] and populate the contents [36]. To allow users to be involved in a multi-agent simulation, each virtual human in FreeWalk can be either an avatar or an agent. 'Avatar' means a virtual human manipulated by a user through the keyboard, mouse, and other devices. 'Agent' means a virtual human controlled by an outside program connected to FreeWalk. FreeWalk has a common interaction model for both agents and avatars while at the same time providing different interfaces for them, so they can interact with each other based on the same model. Agents are controlled through the application program interface (API). Avatars are controlled through the user interface (UI). FreeWalk does not distinguish agents from avatars. Figure 33.1 roughly shows the distributed architecture of FreeWalk. An agent is controlled through the platform's API. A human participant enters the virtual space as an avatar, which he/she controls through the UI devices connected to the platform. Each character can be controlled from any client.
Fig. 33.1. Architecture of FreeWalk
FreeWalk uses a hybrid architecture in which the server administrates only the list of the current members existing in the virtual space and each client administrates the current states of all characters. This architecture enables agents and people to interact socially with each other, based on the same interaction model, in a distributed virtual space. Currently, FreeWalk is connected with the scenario description language Q [13]. It took a long time to construct agents that can socially interact with people, since such agents need to play various roles and each role needs its specific behavioral repertory. We thought it should be easier to design the external role of an agent instead of its internal mechanism, whereas previous studies had focused on the internal mechanism rather than on the external role [16]. Q is therefore a language for describing an agent's external role through an "interaction scenario", which is an extended finite state machine whose input is the perceived cues, whose output is an action, and in which each state corresponds to a scene. Each scene includes a set of interaction rules, each of which is a pair made up of a conditional cue and the consequent series of actions. Each rule is of the form: "if the agent perceives the event A, then the agent executes the actions B and C." FreeWalk agents behave according to the assigned scenario. FreeWalk and the language processor of Q are connected by a shared memory, through which the Q processor calls FreeWalk's API functions to evaluate the cues and actions described in the current scene. FreeWalk's virtual city makes a multi-agent simulation more intuitive and understandable. The spatial structure and the crowd behavior of the simulation are represented as 3D photo-based models. Since the camera viewpoint can be changed freely, users can observe the simulation through a bird's-eye view of the virtual city and can also experience it by controlling their avatars using first-person views. FreeWalk uses neither prepared gait animations nor simplified collision models, in order to keep
the correspondence between the crowd behavior and the graphical representation of the virtual city. The VRML model that is basically used for drawing the virtual city is also used as a geometric model to detect collisions with the spatial structure and to generate gait animations. Animations are generated based on a hybrid algorithm of kinematics and dynamics [37]. To reduce the building cost, each VRML model was constructed as the combination of pictures taken by digital cameras and a simple geometric model based on the floor plan. A simple model also helps to reduce the workload of collision detection. It is also possible to represent the actual state of an existing real-world crowd in real time. To achieve this, it is necessary to synchronize the events simulated in FreeWalk with those occurring in the real world. FreeWalk provides an interface used to connect with a sensor network through the Internet. FreeWalk uses physical and social rules to robustly synchronize the movements of the human figures with those of the real-world crowds. Based on the positions captured by the sensors, FreeWalk determines the next position of the corresponding human figure. This next position is modified according to the social rules described in the Q language. (Examples of such rules are flocking behaviors such as following others and keeping a fixed distance from them [31], and cultural behaviors such as forming a line to go through a ticket gate or forming a circle to have a conversation [17].) Then, the next position is modified again based on the pedestrian model to avoid collisions with others, walls, or pillars [28]. Finally, the gait animation is generated.
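The "interaction scenario" concept described above (scenes holding rules of the form "if cue A is perceived, execute actions B and C") can be illustrated without reproducing Q's actual syntax. The Python sketch below is only a conceptual model; all scene names, cues and actions are our own illustrative inventions, and the perceive/execute hooks stand in for the simulator's API.

```python
class Scene:
    """One state of an interaction scenario: a list of (cue, actions, next_scene) rules."""
    def __init__(self, rules):
        self.rules = rules

class ScenarioAgent:
    """Minimal interpreter of the rule form 'if cue A is perceived, execute actions B, C'."""
    def __init__(self, scenes, start, perceive, execute):
        self.scenes, self.current = scenes, start
        self.perceive, self.execute = perceive, execute   # hooks into the simulator's API

    def step(self):
        for cue, actions, next_scene in self.scenes[self.current].rules:
            if self.perceive(cue):
                for action in actions:
                    self.execute(action)
                self.current = next_scene
                return

# Illustrative scenario for a leader agent in the follow-me method (names are ours).
leader_scenario = {
    "wait": Scene([("alarm_sounds", ["approach('nearest_evacuees')"], "lead")]),
    "lead": Scene([("near_evacuees", ["say('follow me')", "walk_to('exit')"], "done")]),
    "done": Scene([]),
}
```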
33.3 Learning Evacuation
FreeWalk enables users to experience crowd behavior from their first-person view (FP in Fig. 33.2). With this viewpoint they can practice decision making, since they control their avatars based on what their personal view shows them. In FreeWalk users can also observe the overall crowd behavior from a bird's-eye view (BE in Fig. 33.2). This view is more effective for understanding an overall crowd behavior than first-person views are. Since both views have different advantages, we conducted an experiment to compare them and to derive their synergetic effects, in order to find out the best way to learn evacuation and the kinds of superiority of FP. In the experiment, we tested each view and the effect of viewing a combination of both in different orders. We compared four groups: experiencing a first-person view (FP group); observing a bird's-eye view (BE group); experiencing a first-person view before observing a bird's-eye view (FP-BE group); and observing a bird's-eye view before experiencing a first-person view (BE-FP group). The subjects were 96 college students; 24 subjects were assigned to each group. A previous real-world experiment [34] had given us a gauge to measure the subjects' own understanding of the resulting crowd behavior. That experiment demonstrated how the following two evacuation methods cause different crowd behaviors: 1) In the follow-direction method, the leaders point their arms at the exit and shout, "the exit is over there!" to indicate the direction. They do not escape until all evacuees have gone out. 2) In the follow-me method, they do not indicate the
Fig. 33.2. Two views for learning evacuation
direction. To a few of the nearest evacuees they whisper, "follow me" and proceed to the exit. This behavior creates a flow toward the exit. The evacuation simulation was constructed based on this previous experiment [22]. At the beginning of the simulation, everyone was in the left part of the room, which was divided into left and right parts by the center wall, as shown in the BE view of Fig. 33.2. Four leaders had to lead sixteen evacuees to the correct exit on the right side and prevent them from going out through the incorrect exit in the left part. In the FP simulations, six of the evacuees were subjects and the rest were agents, so four FP simulations were conducted in each of the three groups that include FP simulations. In the BE simulations, both evacuees and leaders were all agents. In the experiment, subjects observed and experienced the two different crowd behaviors caused by the two evacuation methods explained above. We used the resulting behaviors as questions and the causing methods as answers: in a quiz consisting of 17 questions, subjects read the description of each crowd behavior and chose one of the two methods as an answer. They took the quiz before and after the experiment. We used a t-test to find significant differences between the scores of the pre- and post-quizzes. A significant difference meant that the subjects could learn the asked-about aspect of crowd behavior through their observation and experience. Table 33.1 summarizes the results of the t-test on nine questions. Since no group could answer the other eight questions correctly, they are omitted. Even though the
results depend on the design of the quiz, it seems clear that a bird's-eye observation was necessary to understand the crowd's behavior. The FP group could not answer questions no. 3 to 9, which were related to the overall crowd behavior. However, the first-person experience was not worthless. It is interesting that the BE-FP group could answer questions no. 6 and 7, which the BE and FP-BE groups could not. This result implies that background knowledge of the overall behavior enabled the subjects to infer individuals' actions (the following behavior) and their outcome (the formation of a group) from their first-person experiences. They could understand how they interacted with other evacuees because they controlled their avatars by themselves, and they could understand that the others interacted with each other in the same way because they knew the overall behavior beforehand. The ranking of the four ways to learn evacuation is illustrated in Fig. 33.3: BE-FP was the best way, BE and FP-BE were next, and FP was the worst. We found that the best way is to observe first and then experience.
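The statistic reported in Table 33.1 is a one-sided paired t-test on pre- and post-quiz scores with df = 23 (24 subjects per group). A minimal reproduction of that computation, with entirely hypothetical scores and using SciPy as a stand-in for the original analysis tool, could look as follows (the alternative argument requires a reasonably recent SciPy; older versions report two-sided p-values that must be halved).

```python
from scipy import stats

# Hypothetical pre- and post-quiz scores of the 24 subjects of one group.
pre  = [7, 8, 6, 9, 7, 8, 6, 7, 9, 8, 7, 6, 8, 7, 9, 6, 7, 8, 7, 9, 6, 8, 7, 8]
post = [9, 9, 8, 10, 8, 9, 7, 9, 10, 9, 8, 7, 9, 8, 10, 7, 8, 9, 8, 10, 7, 9, 8, 9]

# One-sided paired t-test: did the scores increase after the session?
t, p = stats.ttest_rel(post, pre, alternative="greater")
print(f"t({len(pre) - 1}) = {t:.2f}, p = {p:.4f}")
```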
33.4 Guiding Evacuation
Our living space consists of our home, office and public spaces. Studies on remote communication have predominantly focused on the first two. The primary issue of these studies is how to use computer network technologies to connect distributed spaces. These studies have proposed various designs and technologies but share the same goal, the reproduction of face-to-face (FTF) communication environments. For example, media space research tried to connect distributed office spaces [2]. Telepresence and shared workspace research explored ways to integrate distributed deskwork spaces [4, 14]. Spatial workspace collaboration research dealt with spatially configured workspaces [18].
Table 33.1. Summary of the results of the quiz (one-sided paired t-test)
No.  Question (the answer to all items is the follow-me method)         FP      BE      FP-BE   BE-FP
1    Leaders are the first to escape.                                   4.3***  2.2*    2.3*    4.0***
2    Leaders do not observe evacuees.                                   2.8**   4.4***  4.0***  4.2***
3    Leaders escape like evacuees.                                      1.6     2.2*    1.9*    2.9**
4    One's escape behavior is caused by others' escape behavior.        1.2     2.1*    3.3**   2.9**
5    Nobody prevents evacuees from going to the incorrect exit.         1.6     4.9***  3.7***  4.5***
6    Evacuees follow other evacuees.                                    1.3     1.0     0.7     2.1*
7    Evacuees form a group.                                             1.6     0.5     1.2     1.9*
8    Leaders and evacuees escape together.                              0.7     2.0*    0.2     3.4**
9    Evacuees try to behave the same as other evacuees.                 0.9     2.5*    1.5     0.9
*p<.05, **p<.01, ***p<.001 (df=23)
Fig. 33.3. Ranking of the four ways to learn evacuation
CVE research proposed using virtual environments as virtual workspaces [1], and this kind of effort still continues [15]. A more recent issue is how to use these technologies to enhance collocated spaces [10]. We tackled a third but increasingly important issue: how to use the technologies to support remote communication in large-scale public spaces such as a central railway station. Those spaces have characteristic participants: staff administrating the space and visitors passing through it. Remote communication between them is important because a vast number of people gather there and appropriate guidance for crowd control is critical. Currently, surveillance cameras and announcement speakers connect the staff in a control room with the visitors in a public space. The staff can observe the visitors thanks to the cameras and talk to them through the speakers. This traditional communication system is not enough for individual guidance. The off-site staff in a control room can give overall guidance to all the visitors, but on-site staff working in the public space is necessary to give location-based guidance to each visitor. We devised a new way to guide each visitor remotely and a new communication environment for it, since conventional environments aimed at the reproduction of FTF communication cannot be adapted to the case where every visitor is a candidate for on-demand guidance. The results of the evacuation learning experiment described in the previous section have several implications for the design of this communication environment. The surveillance cameras enable the staff to watch many fragmentary views of the public space. However, the results showed that bird's-eye views were better than first-person views; thus, a single global view is better than a collection of fragmentary views. The announcement speakers can convey only uniform information to all the visitors.
However, the results showed that the group experiencing a first-person view could understand the situation much better if they observed the bird's-eye view after their first-person experience. This means that the announced information should teach the visitors the overall situation surrounding them; thus, the visitors need not only overall information, e.g. that a fire has broken out, but also site-specific information like "too many people are rushing into the stairs in front of you." Another limitation of the announcement speakers is that they cannot support two-way communication. The most interesting result was that the best way to learn the crowd behavior was to observe it first and experience it afterward. This result implies that the staff can derive useful information from the visitors; thus, two-way communication is better than one-way communication. We built FreeWalk into an evacuation guidance system in which a public space is monitored by vision sensors instead of surveillance cameras and information is transmitted by mobile phones instead of announcement speakers. Figure 33.4 is a snapshot of our guidance system and of the escaping passengers on a station's platform, with their mobile phones at hand. You can see a pointing person who stands in front of a large-scale touch screen. Suppose that this person is a station staff officer working in a control room. The screen displays the bird's-eye view of the simulated station visualized by FreeWalk. The walking behavior of the human figures in the virtual station is generated according to the positional data transmitted from the real station, which is equipped with vision sensors that track the movements of the real-world escaping passengers. In the snapshot, the man is pointing at a human figure, which represents one of the passengers. When the touch screen detects this pointing operation, the system immediately activates the connection between the officer's headset and the passenger's mobile phone. This coupling is possible because the headset is connected to a PC equipped with a special interface card that can control audio connections between the PC and several telephone lines. This simple coupling between the pointing operation and the audio activation makes it easy for the staff to begin and end an instruction. As described above, the system provides the staff with a single global view of the public space and with two-way communication channels to particular visitors, so that the staff can supply the visitors with site-specific information. Kyoto Station in Kyoto City is a central railway station with more than 300,000 visitors per day. To install our evacuation guidance system in the station, we attached a vision sensor network to it: 12 sensors in the concourse area and 16 sensors on the platform. Figure 33.5(a) is the floor plan, on which the black dots show the sensors' positions, and Figure 33.5(b) shows how they have been installed. The vision sensor network can track passengers between the platform and the ticket gate. In Figure 33.5(c) you can see a CCD camera and a reflector with a special shape [23]. If we could expand the field of view (FOV) of each camera, we could reduce the number of required cameras. However, a widened FOV causes minus (barrel) distortion in the images taken by conventional cameras. The reflector of our vision sensor can eliminate such distortion: its shape makes a plane that perpendicularly intersects the optical axis of the camera be projected perspectively onto the camera plane.
As shown in Figure 33.5(d), this optical contrivance makes it possible to have a large FOV without distortion.
Fig. 33.4. Evacuation guidance system
From the images taken by the cameras, the regions of moving objects are extracted using the background subtraction technique. The position of each moving object is determined on the basis of geographical knowledge, including the positions of the cameras, the occlusion edges in the views of the cameras, and the boundaries of the walkable areas.
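The first stage of this pipeline, background subtraction followed by blob extraction, can be prototyped with standard tools. The snippet below uses OpenCV purely as a stand-in for the actual sensor firmware; it is a sketch of the technique, not the implementation used in Kyoto Station, and the mapping from pixel centroids to floor positions is left to the geometric knowledge mentioned above.

```python
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

def blob_centroids(frame, min_area=200):
    """Foreground blobs of one camera frame -> pixel centroids of moving objects."""
    mask = subtractor.apply(frame)                  # background subtraction
    mask = cv2.medianBlur(mask, 5)                  # remove speckle noise
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    centroids = []
    for c in contours:
        if cv2.contourArea(c) < min_area:           # ignore tiny regions
            continue
        m = cv2.moments(c)
        centroids.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
    return centroids

# Mapping a pixel centroid to a floor position would then use the camera's pose,
# the occlusion edges, and the walkable-area boundaries described in the text.
```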
(Figure panels include (b) installation of the sensors and (e) the simulated passengers.)
Fig. 33.5. Virtual Kyoto Station
Figure 33.5(e) is a screenshot of the simulated passengers synchronized with their retrieved positions. We named the communication form supported by our guidance system "transcendent communication" [26]. In transcendent communication, a user watches the bird's-eye view of the real world to grasp its situation and points at a particular location within the view to select a person or group of people to talk to. Figure 33.6 explains the difference between distributed communication and transcendent communication. In distributed communication, a virtual space is used to connect real spaces, each of which contains its own participant. Therefore, the virtual space is a synthetic space, which transmits nonverbal cues such as proxemics and eye contact [27]. The goal of distributed communication is the reproduction of collocated communication. In transcendent communication, a virtual space is used to visualize the bird's-eye view of the real space. Therefore, the virtual space is a projective space, which represents the real-world situation. The goal of transcendent communication is not the reproduction of collocated communication but the production of asymmetric communication. Collocated communication is symmetric, since everyone has his/her first-person view to observe the others and can control the conversation floor. In distributed communication, participants should basically be reciprocal, since privacy is an important issue and intrusiveness should be avoided [3, 35]. On the contrary, in transcendent communication the bird's-eye view helps a transcendent participant, e.g. someone from the station staff, become an intrusive observer who can administrate the immanent participants, in this case the passengers. Transcendent participants need to observe immanent participants, but immanent participants
do not need to observe transcendent participants. And only transcendent participants can control communication channels.
Fig. 33.6. Transcendent communication
Figure 33.7 presents an example of "transcendent guidance", that is, guidance via transcendent communication. In Figure 33.7(a), the staff is watching a virtual station platform where too many people are rushing into the stairs on the right, while the left stairs are not crowded at all. The staff determines that a group of people following behind in the right crowd is not safe and should be guided towards the left stairs. In Figure 33.7(b), the staff connects communication channels with the group to begin guidance, instructing them to switch their destination from the right stairs to the left stairs, as shown in Figure 33.7(c). The current implementation in Kyoto Station does not allow us to do exactly this due to technological limitations: the image processing function of the vision sensor network does not work if the platform is crowded, and there is no mechanism to automatically track the phone numbers of the passengers. However, advances in perceptual user interfaces (PUI) [29] may soon eliminate these implementation issues. Communication channel control should be very efficient in order to give good guidance. To explore its interaction design, we implemented two different kinds of user interfaces. In the GUI version shown in Figure 33.4, touching characters and talking are coupled. In the PUI version shown in Figure 33.8, we used an eye-tracking device instead of a touch screen to couple gazing at characters and talking. The PUI version gives a much more seamless feeling than the GUI version, since a vocal channel is established immediately when a user looks at the character he or she wants to talk to. However, gaze is a single pointing device, while a touch screen lets a user use at least two devices, i.e. his or her two hands. Even if the screen can only detect a single touched spot at a given time, two hands are more efficient than a single hand or a gaze.
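The essential interaction shared by both interfaces is to couple the selection of a human figure (by touch or by gaze) with opening a voice channel to the corresponding passenger's phone. The sketch below captures that coupling; the open_call/close_call functions are placeholders for the telephone interface card driver, and the mapping from character identifiers to phone numbers is assumed to exist, which, as noted above, the current implementation does not yet automate.

```python
class GuidanceChannel:
    """Couples pointing/gazing at a character with a voice channel to that passenger."""

    def __init__(self, phone_numbers, open_call, close_call):
        self.phone_numbers = phone_numbers  # character id -> mobile phone number (assumed mapping)
        self.open_call, self.close_call = open_call, close_call  # telephony driver hooks
        self.active = None                  # character currently being guided

    def on_select(self, character_id):
        """Touch screen or eye tracker reports that a character is selected."""
        if self.active == character_id:
            return
        self.on_release()
        self.active = character_id
        self.open_call(self.phone_numbers[character_id])   # staff headset <-> passenger phone

    def on_release(self):
        """Selection ended (finger lifted, gaze moved away): drop the channel."""
        if self.active is not None:
            self.close_call(self.phone_numbers[self.active])
            self.active = None
```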
(Figure panels: (a) Watch, (b) Connect, (c) Guide.)
Fig. 33.7. Transcendent guidance
Fig. 33.8. Gaze-and-talk interaction
33.5 Conclusion
We presented two examples of crisis management applications of our virtual city simulator, FreeWalk. The first application is virtual evacuation simulation, where
learners can observe multi-agent crowd behavior simulations described in the Q language and also take part in the simulation as avatars. The second application is the transcendent guidance system, which visualizes real-world pedestrians in the virtual city and enables location-based remote guidance. The key feature of our simulator is this inclusion of humans in crowd behavior simulations of urban spaces. In the simulations, each person can be simulated as an agent, an avatar, or a projective agent that visualizes context information retrieved from a real-world person walking around a smart environment. The development of the two applications showed that the design principles of real-world systems can be derived from virtual simulations: we designed the transcendent guidance system based on the results of the evacuation simulation experiment. The transfer of the design principles was made possible by the correspondence between the two different viewpoints (first-person and bird's-eye views) and the two different kinds of users (transcendent and immanent users). We think this study implies a new method of software design.
Acknowledgements. This work was conducted as part of the Digital City Project supported by the Japan Science and Technology Agency. I express my special gratitude to Toru Ishida, the project leader. This work would also have been impossible without the contribution of Satoshi Koizumi and Hideaki Ito. I express my thanks to the Municipal Transportation Bureau and General Planning Bureau of Kyoto City for their cooperation. I received a lot of support in the construction of the simulation environment from Toshio Sugiman, Shigeyuki Okazaki, and Ken Tsutsuguchi. Thanks to Hiroshi Ishiguro, Reiko Hishiyama, Shinya Shimizu, Tomoyuki Kawasoe, Toyokazu Itakura, CRC Solutions, Mathematical Systems, and CAD Center for their efforts in the development and experiment. The source code of FreeWalk and Q is available at http://www.lab7.kuis.kyoto-u.ac.jp/freewalk/ and http://www.digitalcity.jst.go.jp/Q/.
References
1. Benford, S., Greenhalgh, C., Rodden, T. and Pycock, J. Collaborative Virtual Environments. Communications of the ACM, 44(7), 79-85, 2001.
2. Bly, S. A., Harrison, S. R. and Irwin, S. Media Spaces: Bringing People Together in a Video, Audio, and Computing Environment. Communications of the ACM, 36(1), 28-47, 1993.
3. Borning, A. and Travers, M. Two Approaches to Casual Interaction over Computer and Video Networks. International Conference on Human Factors in Computing Systems (CHI91), 13-19, 1991.
4. Buxton, W. Telepresence: Integrating Shared Task and Person Spaces. Canadian Conference on Graphics Interface (GI92), 123-129, 1992.
5. Cassell, J., Bickmore, T., Billinghurst, M., Campbell, L., Chang, K., Vilhjalmsson, H. and Yan, H. Embodiment in Conversational Interfaces: Rea. International Conference on Human Factors in Computing Systems (CHI99), 520-527, 1999.
6. Cavazza, M., Charles, F. and Mead, S. J. Interacting with Virtual Characters in Interactive Storytelling. International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS2002), 318-325, 2002.
7. Greenhalgh, C. and Benford, S. Massive: A Collaborative Virtual Environment for Teleconferencing. ACM Transactions on Computer-Human Interaction, 2(3), 239-261, 1995.
8. Hagsand, O. Interactive Multiuser VEs in the DIVE System. IEEE MultiMedia, 3(1), 30-39, 1996.
9. Helbing, D., Farkas, I.J. and Vicsek, T. Simulating Dynamical Features of Escape Panic. Nature, 407(6803), 487-490, 2000.
10. Huang, E. M. and Mynatt, E. D. Semi-Public Displays for Small, Co-located Groups. International Conference on Human Factors in Computing Systems (CHI2003), 49-56, 2003.
11. Isbell, C. L., Kearns, M., Kormann, D., Singh, S. and Stone, P. Cobot in LambdaMoo: A Social Statistics Agent. National Conference on Artificial Intelligence (AAAI2000), 36-41, 2000.
12. Ishida, T. Digital City Kyoto: Social Information Infrastructure for Everyday Life. Communications of the ACM, 45(7), 76-81, 2002.
13. Ishida, T. Q: A Scenario Description Language for Interactive Agents. IEEE Computer, 35(11), 54-59, 2002.
14. Ishii, H., Kobayashi, M. and Arita, K. Iterative Design of Seamless Collaboration Media. Communications of the ACM, 37(8), 83-97, 1994.
15. Jancke, G., Venolia, G. D., Grudin, J., Cadiz, J. J. and Gupta, A. Linking Public Spaces: Technical and Social Issues. International Conference on Human Factors in Computing Systems (CHI2001), 530-537, 2001.
16. Johnson, W.L., Rickel, J.W. and Lester, J.C. Animated Pedagogical Agents: Face-to-Face Interaction in Interactive Learning Environments. International Journal of Artificial Intelligence in Education, 11, 47-78, 2000.
17. Kendon, A. Spatial Organization in Social Encounters: the F-formation System. In: A. Kendon (Ed.), Conducting Interaction: Patterns of Behavior in Focused Encounters, Cambridge University Press, 209-237, 1990.
18. Kuzuoka, H. Spatial Workspace Collaboration: a SharedView Video Support System for Remote Collaboration Capability. International Conference on Human Factors in Computing Systems (CHI92), 533-540, 1992.
19. Lea, R., Honda, Y., Matsuda, K. and Matsuda, S. Community Place: Architecture and Performance. Symposium on Virtual Reality Modeling Language (VRML97), 41-50, 1997.
20. Linturi, R., Koivunen, M. and Sulkanen, J. Helsinki Arena 2000 - Augmenting a Real City to a Virtual One. In: T. Ishida, K. Isbister (Eds.), Digital Cities, Technologies, Experiences, and Future Perspectives. Lecture Notes in Computer Science 1765, Springer-Verlag, New York, 83-96, 2000.
21. Macedonia, M. R., Zyda, M. J., Pratt, D. R., Barham, P. T. and Zeswitz, S. NPSNET: A Network Software Architecture for Large-Scale Virtual Environments. Presence, 3(4), 265-287, 1994.
22. Murakami, Y., Ishida, T., Kawasoe, T. and Hishiyama, R. Scenario Description for Multi-agent Simulation. International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS2003), 369-376, 2003.
23. Nakamura, T. and Ishiguro, H. Automatic 2D Map Construction using a Special Catadioptric Sensor. International Conference on Intelligent Robots and Systems (IROS2002), 196-201, 2002.
24. Nakanishi, H., Nakazawa, S., Ishida, T., Takanashi, K. and Isbister, K. Can Software Agents Influence Human Relations? - Balance Theory in Agent-mediated Communities. International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS2003), 717-724, 2003.
25. Nakanishi, H. FreeWalk: A Social Interaction Platform for Group Behavior in a Virtual Space. International Journal of Human Computer Studies, 60(4), 421-454, 2004.
26. Nakanishi, H., Koizumi, S., Ishida, T. and Ito, H. Transcendent Communication: Location-Based Guidance for Large-Scale Public Spaces. International Conference on Human Factors in Computing Systems (CHI2004), 655-662, 2004.
27. Okada, K., Maeda, F., Ichikawa, Y. and Matsushita, Y. Multiparty Videoconferencing at Virtual Social Distance: MAJIC Design. International Conference on Computer Supported Cooperative Work (CSCW94), 385-393, 1994.
28. Okazaki, S. and Matsushita, S. A Study of Simulation Model for Pedestrian Movement with Evacuation and Queuing. International Conference on Engineering for Crowd Safety, 271-280, 1993.
29. Pentland, A. Perceptual Intelligence. Communications of the ACM, 43(3), 35-44, 2000.
30. Prendinger, H. and Ishizuka, M. Life-Like Characters: Tools, Affective Functions, and Applications. Springer-Verlag, 2004.
31. Reynolds, C. W. Flocks, Herds, and Schools: A Distributed Behavioral Model. International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH87), 25-34, 1987.
32. Rickel, J. and Johnson, W. L. Animated Agents for Procedural Training in Virtual Reality: Perception, Cognition, and Motor Control. Applied Artificial Intelligence, 13, 343-382, 1999.
33. Sugawara, S., Suzuki, G., Nagashima, Y., Matsuura, M., Tanigawa, H. and Moriuchi, M. Interspace: Networked Virtual World for Visual Communication. IEICE Transactions on Information and Systems, E77-D(12), 1344-1349, 1994.
34. Sugiman, T. and Misumi, J. Development of a New Evacuation Method for Emergencies: Control of Collective Behavior by Emergent Small Groups. Journal of Applied Psychology, 73(1), 3-10, 1988.
35. Tang, J. C. and Rua, M. Montage: Providing Teleproximity for Distributed Groups. International Conference on Human Factors in Computing Systems (CHI94), 37-43, 1994.
36. Tecchia, F., Loscos, C. and Chrysanthou, Y. Image-Based Crowd Rendering. IEEE Computer Graphics and Applications, 22(2), 36-43, 2002.
37. Tsutsuguchi, K., Shimada, S., Suenaga, Y., Sonehara, N. and Ohtsuka, S. Human Walking Animation based on Foot Reaction Force in the Three-dimensional Virtual World. Journal of Visualization and Computer Animation, 11(1), 3-16, 2000.
38. Waters, R. C. and Barrus, J. W. The Rise of Shared Virtual Environments. IEEE Spectrum, 34(3), 20-25, 1997.
34 Neurocomputing for Certain Bioinformatics Tasks Shubhra Sankar Ray, Sanghamitra Bandyopadhyay, Pabitra Mitra, and Sankar K. Pal Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India {shubhra_r, sanghami, pabitra_r, sankar}@isical.ac.in
Summary. Different bioinformatics tasks like gene sequence analysis, gene finding, protein structure prediction and analysis, gene expression with microarray analysis, and gene regulatory network analysis are described along with some classical approaches. The relevance of intelligent systems and neural networks to these problems is mentioned. Different neural network based algorithms to address the aforesaid tasks are then presented. Finally, some limitations of the current research activity are provided. An extensive bibliography is included. Key words: biological data mining, soft computing, computational biology, genomics, proteomics, multilayer perceptron, self organizing map
34.1 Introduction Bioinformatics can be viewed as the use of computational methods to make biological discoveries [1]. It is an interdisciplinary field involving biology, computer science, mathematics and statistics to analyze biological sequence data, genome content and arrangement, and to predict the function and structure of macromolecules. The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be derived [2]. There are three important sub-disciplines within bioinformatics:
a) development of new algorithms and models to assess different relationships among the members of a large biological data set in a way that allows researchers to access existing information and to submit new information as it is produced;
b) analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures; and
c) development and implementation of tools that enable efficient access and management of different types of information.
Artificial neural networks (ANN), a biologically inspired technology, are machinery for adaptation and curve fitting, guided by the principles of biological neural
network. ANN have been studied for many years with the hope of achieving human-like performance, particularly in the field of pattern recognition. They are efficient, adaptive and robust classifiers, producing near-optimal solutions and achieving high speed via massive parallelism. Therefore, the application of ANN for solving certain problems of bioinformatics, which need optimization of computation requirements and robust, fast, close approximate solutions, appears to be appropriate and natural. Moreover, the errors generated in experiments with bioinformatics data can be handled by the robust characteristics of ANN and minimized during the training process. The problem of integrating ANN and bioinformatics constitutes a new research area. This article provides a survey of the various neural network based techniques that have been developed over the past few years for different bioinformatics tasks. In Section 34.2 we describe the elements of bioinformatics along with their biological basis. In Section 34.3 different bioinformatics tasks are explained. Then we explain the relevance of ANN in bioinformatics in Section 34.4. Different ANN based methods available to address the bioinformatics tasks are explained in Section 34.5 and Section 34.6. Finally, conclusions and some future research directions are presented.
34.2 Elements of Bioinformatics DNA (deoxyribonucleic acid) and proteins are biological macromolecules built as long linear chains of chemical components. A DNA strand consists of a large sequence of nucleotides, or bases. For example, there are more than 3 billion bases in human DNA sequences. DNA plays a fundamental role in different bio-chemical processes of living organisms in two respects. First, it contains the templates for the synthesis of proteins, which are essential molecules for any organism [3]. The second role in which DNA is essential to life is as a medium to transmit hereditary information (namely the building plans for proteins) from generation to generation. The units of DNA are called nucleotides. One nucleotide consists of one nitrogen base, one sugar molecule (deoxyribose) and one phosphate. The four nitrogen bases are denoted by the letters A (adenine), C (cytosine), G (guanine) and T (thymine). A linear chain of DNA is paired to a complementary strand. The complementary property stems from the ability of the nucleotides to establish specific pairs (A-T and G-C). The pair of complementary strands then forms the double helix that was first suggested by Watson and Crick in 1953. Each strand therefore carries the entire information, and the biochemical machinery guarantees that the information can be copied over and over again even when the "original" molecule has long since vanished. A gene is primarily made up of a sequence of triplets of the nucleotides (exons). Introns (non-coding sequences) may also be present within a gene. Not all portions of the DNA sequences are coding. A coding zone indicates that it is a template for a protein. As an example, for the human genome only 3%-5% of the sequence is coding, i.e., constitutes the genes. There are sequences of nucleotides within the
DNA that are spliced out progressively in the process of transcription and translation. In brief, the DNA consists of three types of non-coding sequences. 1. Intergenic regions: Regions between genes that are ignored during the process of transcription 2. Intragenic regions (or Introns): Regions within the genes that are spliced out from the transcribed RNA to yield the building blocks of the genes, referred to as Exons 3. Pseudogenes: Genes that are transcribed into the RNA and stay there, without being translated, due to the action of a nucleotide sequence.
Fig. 34.1. Various parts of a DNA: exons, introns, junk DNA, the gene, and intergenic regions (IR)
Proteins are made up of 20 different amino acids (or "residues"), which are denoted by 20 different letters of the alphabet. Each of the 20 amino acids is coded by one or more triplets (or codons) of the nucleotides making up the DNA. Based on the genetic code the linear string of DNA is translated into a linear string of amino acids, i.e., a protein via mRNA [3].
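To make the codon-based coding relationship concrete, here is a minimal Python sketch that translates a DNA coding sequence into a one-letter amino acid string. It is only an illustration: the codon table is deliberately truncated (a complete table has 64 entries) and the input sequence is invented.

```python
# Minimal sketch: translating a DNA coding sequence into amino acids.
# The codon table below is intentionally incomplete (a full table has 64 entries).
CODON_TABLE = {
    "ATG": "M",  # methionine, also the usual start codon
    "TTT": "F", "TTC": "F",             # phenylalanine
    "GGA": "G", "GGC": "G",             # glycine
    "TAA": "*", "TAG": "*", "TGA": "*"  # stop codons
}

def translate(dna: str) -> str:
    """Translate a DNA string into a one-letter amino acid string."""
    protein = []
    for i in range(0, len(dna) - 2, 3):          # walk the sequence codon by codon
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' marks codons missing from this toy table
        if aa == "*":                            # a stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGTTTGGATAA"))  # -> "MFG" for this made-up sequence
```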
34.3 Bioinformatics Tasks The different biological problems studied within the scope of bioinformatics can be broadly classified into two categories: genomics and proteomics which include genes, proteins, and amino acids. We describe below different tasks involved in their analysis along with their utilities. 34.3.1 Gene Sequence Analysis The evolutionary basis of sequence alignment is based on the principles of similarity and homology [4]. Similarity is a quantitative measure of the fraction of two genes which are identical in terms of observable quantities. Homology is the conclusion drawn from data that two genes share a common evolutionary history; no metric is associated with this. The tasks of sequence analysis are as follows:
Sequence Alignment An alignment is a mutual arrangement of two or more sequences, that exhibits where the sequences are similar, and where they differ. An optimal alignment is one that exhibits the most correspondences and the least differences. It is the alignment with the highest score but may or may not be biologically meaningful. Basically there are two types of alignment methods, global alignment and local alignment. Global alignment [5] maximizes the number of matches between the sequences along the entire length of the sequence. Local alignment [6] gives a highest scoring to local match between two sequences. Gene Finding and Promoter Identification In general DNA strand consists of a large sequence of nucleotides, or bases. Due to the huge size of the database, manual searching of genes, which code for proteins, is not practical. Therefore identification of the genes from the large DNA sequences is an important problem in bioinformatics [7]. A cell mechanism recognizes the beginning of a gene or gene cluster with the help of a promoter. The promoter is a key regulatory sequence before each gene in the DNA that serves as an indication to the cellular mechanism (transcription) that a gene is ahead. For example, the codon AUG (which codes for methionine) also signals the start of a gene. Recognition of regulatory sites in DNA fragments has become particularly popular because of the increasing number of completely sequenced genomes and mass application of DNA chips. 34.3.2 Protein Analysis Proteins are polypeptides, formed within cells as a linear chain of amino acids [8]. Within and outside of cells, proteins serve a myriad of functions, including structural roles (cytoskeleton), as catalysts (enzymes), transporter to ferry ions and molecules across membranes, and hormones to name just a few. There are twenty different amino acids that make up essentially all proteins on earth. Different tasks involved in protein analysis are as follows: Multiple Sequence Alignment In order to characterize protein families, one needs to identify shared regions of homology in a multiple sequence alignment; (this happens generally when a sequence search revealed homologies in several sequences). The clustering method can do alignments automatically but are subjected to some restrictions. Manual and eye validations are necessary in some difficult cases. The most practical and widely used method in multiple sequence alignment is the hierarchical extensions of pairwise alignment methods, where the principal is that multiple alignments is achieved by successive application of pairwise methods. Multiple amino acid sequence alignment techniques [1] are usually performed to fit one of the following scopes:
(a) finding the consensus sequence of several aligned sequences; (b) helping in the prediction of the secondary and tertiary structures of new sequences; and (c) providing preliminary step in molecular evolution analysis using phylogenetic methods for constructing phylogenetic trees. Protein Motif Search Protein motif search [7] allows search for a personal protein pattern in a sequence (personal sequence or entry of Gene Bank). Proteins are derived from a limited number of basic building blocks (domains). Evolution has shuffled these modules giving rise to a diverse repertoire of protein sequences, as a result of it proteins can share a global or local relationship. Protein motif search is a technique for searching sequence databases to uncover common domains/motifs of biological significance that categorize a protein into a family. Structural Genomics Structural genomics is the prediction of 3-dimensional structure of a protein from the primary amino acid sequence [9]. This is one of the most challenging tasks in bioinformatics. The four levels of protein structure (Figure 34.2) are (a) Primary structure: the sequence of amino acids that compose the protein, (b) Secondary structure: the spatial arrangement of the atoms constituting the main protein backbone, such as alpha helices and beta strands, (c) Tertiary structure: formed by packing secondary structural elements into one or several compact globular units called domains, and (d) Final protein may contain several polypeptide chains arranged in a quaternary structure.
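As a concrete companion to the global alignment method [5] introduced in Section 34.3.1, the following sketch fills the standard dynamic-programming score table for two short sequences. The match, mismatch, and gap scores are arbitrary illustrative values, not parameters taken from the chapter.

```python
# Sketch of global alignment scoring in the spirit of Needleman-Wunsch [5].
# Match/mismatch/gap scores are arbitrary illustrative choices.
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag,                   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,  # gap in b
                              score[i][j - 1] + gap)  # gap in a
    return score[n][m]

print(global_alignment_score("GATTACA", "GCATGCA"))
```

Local alignment [6] differs mainly in never letting a cell drop below zero and in taking the best cell anywhere in the table, which is what lets it pick out the highest-scoring local match.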
Fig. 34.2. Different levels of protein structures
Sequence similarity methods can predict the secondary and tertiary structures based on homology to known proteins. Secondary structure prediction can be made
using Chou-Fasman [9], GOR, neural network, and nearest neighbor methods. Tertiary structure prediction methods involve energy minimization, molecular dynamics, and stochastic searches of conformational space. 34.3.3 Gene Expression and Microarrays Gene expression is the process by which a gene's coded information is converted into the structures present and operating in the cell. Expressed genes include those that are transcribed into mRNA and then translated into protein and those that are transcribed into RNA but not translated into protein (e.g., transfer and ribosomal RNAs). Not all genes are expressed, and gene expression analysis involves the study of the expression level of genes in the cells under different conditions. Conventional wisdom is that gene products that interact with each other are more likely to have similar expression profiles than if they do not [10]. Microarray technology [11] allows expression levels of thousands of genes to be measured at the same time. Comparison of gene expression between normal and diseased (e.g., cancerous) cells is also done by microarray. There are several names for this technology - DNA microarrays, DNA arrays, DNA chips, gene chips, and others. A microarray is typically a glass (or some other material) slide, onto which DNA molecules are attached at fixed locations (spots). There may be tens of thousands of spots on an array, each containing a huge number of identical DNA molecules (or fragments of identical molecules), of lengths from twenty to hundreds of nucleotides. For gene expression studies, each of these molecules ideally should identify one gene or one exon in the genome; however, in practice this is not always so simple due to families of similar genes in a genome. The spots are either printed on the microarrays by a robot, or synthesized by photo-lithography (similar to computer chip production) or by ink-jet printing. Many unanswered, and important, questions could potentially be answered by correctly selecting, assembling, analyzing, and interpreting microarray data. Clustering is commonly used in microarray experiments to identify groups of genes that share similar expressions. It may also help in identifying promoter sequence elements that are shared among genes.
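As a hint of how such clustering might look in practice, the sketch below groups genes with similar expression profiles using a plain k-means loop. K-means is chosen only for illustration (the chapter does not prescribe a particular clustering algorithm), and the expression matrix is random stand-in data.

```python
import numpy as np

# Toy sketch: group genes with similar expression profiles by k-means.
# The expression matrix is random stand-in data: rows = genes, columns = conditions.
rng = np.random.default_rng(0)
expression = rng.normal(size=(100, 8))          # 100 genes measured under 8 conditions

def kmeans(data, k=3, iters=20):
    centers = data[rng.choice(len(data), k, replace=False)]  # random initial centers
    for _ in range(iters):
        # assign each gene to its nearest cluster center
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # move each center to the mean of the genes assigned to it
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return labels

labels = kmeans(expression)
print("genes per cluster:", np.bincount(labels))
```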
and are generally not too far from the respective genes, though some regulatory effects can be located as far as 30,000 bases away, which makes the definition of the promoter difficult. Transcription factors control gene expression by binding the gene's promoter and either activating (switching on) the gene's transcription, or repressing it (switching it off). Transcription factors are gene products themselves, and therefore in turn can be controlled by other transcription factors. Transcription factors can control many genes, and some (probably most) genes are controlled by combinations of transcription factors. Feedback loops are also possible. Therefore we can talk about gene regulation networks.
34.4 Relevance of Neural Network in Bioinformatics Artificial neural network (ANN) models try to emulate the biological neural network with electronic circuitry. Recently, ANN have found widespread use for classification tasks and function approximation in bioinformatics. For bioinformatics data analysis, mainly two types of networks are employed: "supervised" neural networks (SNN) and "unsupervised" neural networks (UNN). The main applications of SNN (e.g. the multilayer perceptron) are function approximation, classification, pattern recognition and feature extraction, and prediction. Moreover, they are able to detect second and higher order correlations in patterns. This is especially important in biological systems, which frequently display nonlinear behavior. These networks are able to determine the relevant features in unknown data after training (with known data). This principle coined the term "supervised" networks. Correspondingly, "unsupervised" networks (e.g. Kohonen self-organizing maps) can be applied to clustering and feature extraction tasks with new (unknown) data. The main characteristics of ANN are:
a) adaptability to new data/environment
b) robustness/ruggedness to failure of components
c) speed via massive parallelism
d) optimality w.r.t. error
Let us now explain the functioning of ANN in bioinformatics with an example of protein secondary structure prediction from a linear sequence of amino acids (Figure 34.4). Step 1: In the ANN usually a certain number of input "nodes" are each connected to every node in a hidden layer. Step 2: Every residue in a PDB (Protein Data Bank) entry can be associated to one of the three secondary structures (HELIX, SHEET or neither: COIL). ANN are designed with 21 input nodes (one for each residue including a null residue) and three output nodes coding for each of the three possible secondary structure assignments (HELIX, SHEET and COIL). Step 3: Each node in the hidden layer is then connected to every node in the final output layer.
Step 4: The input and output nodes are restricted to binary values (1 or 0) when loading the data onto the network during training, and the weights are then modified by the program itself during the training process.
Step 5: HELIX can be coded as 0,0,1 on the three output nodes; SHEET can be coded as 0,1,0 and COIL as 1,0,0. A similar binary coding scheme can be used for the 20 input nodes for the 20 amino acids.
Step 6: To consider a moving window of n residues at a time, the input layer should contain 20 x n nodes plus one node at each position for a null residue.
Step 7: Each node will "decide" to send a signal to the nodes it is connected to, based on evaluating its transfer function after all of its inputs and connection weights have been summed.
Step 8: Over 100 protein structures were used to train the network.
Step 9: Training proceeds by holding a particular data item constant on both the input and output nodes and iterating the network in a process that modifies the connection weights until the changes made to them approach zero.
Step 10: When such convergence is reached, the network is said to be trained and is ready to receive new (unknown) experimental data.
Step 11: Now the connection weights are not changed, and the values of the hidden and output nodes are calculated in order to determine the structure of the input sequence of proteins.
Selection of unbiased and normalized training data, however, is probably just as important as the network architecture in the design of a successful NN.
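A small sketch may help visualize the coding scheme of Steps 2-6: each residue in a window is one-hot encoded over 21 positions (20 amino acids plus a null residue), and the three output nodes carry the HELIX/SHEET/COIL code. The residue ordering, window size, and example sequence below are arbitrary choices for illustration.

```python
# Sketch of the input/output coding described in Steps 2-6. The residue ordering,
# window size, and example sequence are arbitrary choices, not values from the chapter.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"        # 20 residues; index 20 is the null residue
STRUCTURE_CODES = {"HELIX": (0, 0, 1), "SHEET": (0, 1, 0), "COIL": (1, 0, 0)}

def encode_window(window):
    """One-hot encode a window of residues into a 21 * n binary input vector."""
    vec = []
    for res in window:
        bits = [0] * 21
        idx = AMINO_ACIDS.find(res)          # -1 if res is the null placeholder '-'
        bits[idx if idx >= 0 else 20] = 1
        vec.extend(bits)
    return vec

window = "-AKVLG-"                           # 7-residue window padded with null residues
x = encode_window(window)
print(len(x), sum(x))                        # 147 input bits, exactly 7 of them set
print(STRUCTURE_CODES["HELIX"])              # target coding for the central residue
```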
Fig. 34.3. A linear chain of amino acids is applied as input to the ANN
34.5 ANN in Bioinformatics Let us now describe the different attempts made using ANNs in certain tasks of bioinformatics in the broad domains of sequence analysis, structure prediction, and gene analysis described in Section 34.3.
34.5.1 Sequence Alignment Given inputs extracted from an aligned column of DNA bases and the underlying Perkin Elmer Applied Biosystems (ABI) fluorescent traces, Allex et al. [12] trained a neural network to correctly determine the consensus base for the column. They empirically compared five representations; one uses only base calls and the others include trace information. The networks that incorporate trace information into their input representations attained the most accurate results for the consensus sequence. Consensus accuracies ranging from 99.26% to 99.98% are achieved for coverages from two to six aligned sequences. In contrast, the network that only uses base calls in its input representation has over double that error rate. GenTHREADER is a neural network architecture that predicts similarity between gene sequences [13]. The effects of sequence alignment score and pairwise potential are the network outputs. GenTHREADER was successfully used for structure prediction in two cases: case 1: ORF MG276 from Mycoplasma genitalium was predicted to share structural similarity with IHGX; case 2: MG276 shares a low sequence similarity (10% sequence identity) with IHGX. In [14] a molecular alignment method with the Hopfield Neural Network (HNN) is discussed. Other related investigations in sequence analysis are available in [15, 16]. 34.5.2 Gene Finding and Promoter Identification The application of artificial neural networks for discriminating the coding system of eukaryotic genes is investigated in [17]. For over 300 genes, different discrimination models are built which are relevant to gene promoter regions, poly(A) signals, splice site locations of introns and noose structures. The results showed that as long as the coding length is definite, the only correct coding region can be chosen from the large number of possible solutions discriminated by neural networks. In [18] the quantitative similarity among tRNA gene sequences was acquired by analysis with an artificial neural network. The evolutionary relationship derived from the ANN results was consistent with those from other methods. A new sequence was recognized to be a tRNA-like gene by a neural network on the basis of this similarity analysis. The work of Lukashin et al. [19] is one of the earlier investigations that discussed the problem of recognition of promoter sites in the DNA sequence in a neural network framework. The learning process involves a small (of the order of 10%) part of the total set of promoter sequences. During this procedure the neural network develops a system of distinctive features (key words) to be used as a reference in identifying promoters against the background of random sequences. The learning quality is then tested with the whole set. The efficiency of promoter recognition has been reported as 94 to 99% and the probability of an arbitrary sequence being identified as a promoter is 2 to 6%.
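To picture the kind of input such promoter-recognition networks receive, the sketch below one-hot encodes a fixed-length DNA window into a binary vector, the usual first step before feeding a sequence to a feed-forward classifier. The window length and example sequence are arbitrary; this is not the encoding of any specific study cited above.

```python
# Sketch: one-hot encoding of a DNA window as input for a promoter classifier.
# Window length and example sequence are arbitrary illustrative choices.
BASES = "ACGT"

def encode_dna_window(window):
    """Turn a DNA window into a flat binary vector, 4 bits per base (unknown base -> all zeros)."""
    vec = []
    for base in window.upper():
        bits = [0, 0, 0, 0]
        if base in BASES:
            bits[BASES.index(base)] = 1
        vec.extend(bits)
    return vec

window = "TATAATGCGC"                 # 10-base window (made up)
x = encode_dna_window(window)
print(len(x), x[:8])                  # 40 input bits; first two bases shown as 4-bit groups
```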
In [20] a multilayered feed-forward ANN architecture is trained for predicting whether a given nucleotide sequence is a mycobacterial promoter sequence. The ANN is used in conjunction with the caliper randomization (CR) approach for determining the structurally/functionally important regions in the promoter sequences. This work shows that ANNs are an efficient tool for predicting mycobacterial promoter sequences and determining structurally/functionally important sub-regions therein. Other related investigations in this field are available in [21, 22, 23]. 34.5.3 Protein Analysis The most successful techniques for prediction of the three-dimensional structure of a protein rely on aligning the sequence of a protein of unknown structure to a homologue of known structure. Such methods fail if there is no homologue in the structural database, or if the technique for searching the structural database is unable to identify homologues that are present. The work of Qian et al. [24] is one of the earlier investigations that discussed the protein structure prediction problem in a neural network framework. After training the ANN with more than 100 X-ray crystal structures of globular proteins, a prediction accuracy of 64% was obtained for the secondary structure of non-homologous proteins. Rost et al. [25, 26] took advantage of the fact that a multiple sequence alignment contains more information about a protein than the primary sequence alone. Instead of using a single sequence as input into the network, they used a sequence profile that resulted from the multiple alignments. Design of ANN with bi-directional training and the use of the entire protein sequence as simultaneous input instead of a shifting window of fixed length has led to prediction accuracy above 71%. In [27] a method has been developed using ANN for the prediction of beta-turn types I, II, IV and VIII. For each turn type, two consecutive feed-forward back-propagation networks with a single hidden layer have been used. The first sequence-to-structure network has been trained on single sequences as well as on PSI-BLAST PSSM. The output from the first network, along with the PSIPRED [28] predicted secondary structure, has been used as input for the second-level structure-to-structure network. The networks have been trained and tested on a non-homologous data set of 426 protein chains by seven-fold cross-validation. The prediction performance for each turn type is improved by using multiple sequence alignment, a second-level structure-to-structure network and PSIPRED predicted secondary structure information. Wood et al. [29] compared the cascade-correlation ANN architecture [30] with back-propagation ANN using a constructive algorithm and found that cascade-correlation achieves predictive accuracies comparable to those obtained by back-propagation, in a shorter time. Ding et al. [31] used Support Vector Machine (SVM) and Neural Network (NN) learning methods as base classifiers for protein fold recognition, without relying on sequence similarity. Other related investigations in protein structure prediction are available in [32, 33, 34, 35, 36, 37, 38].
34.5.4 Gene Expression and Microarray Most of the analysis of the enormous amount of information provided on microarray chips with regard to cancer patient prognosis has relied on clustering techniques and other standard statistical procedures. These methods are inadequate in providing the reduced gene subsets required for perfect classification. ANNs trained on microarray data from DLBCL lymphoma patients have, for the first time, been able to predict the long-term survival of individual patients with 100% accuracy [39]. Here it is shown that differentiating the trained network can narrow the gene profile to less than three dozen genes for each classification and artificial neural networks are a superior tool for digesting microarray data. Bicciato et al. [40] described computational procedure for pattern identification, feature extraction, and classification of gene expression data through the analysis of an autoassociative neural network model. The identified patterns and features contain critical information about gene-phenotype relationships observed during changes in cell physiology. The methodology has been tested on two different microarray datasets, acute human leukemia and the human colon adenocarcinoma. Vohradsky [41] used artificial neural networks as a model of the dynamics of gene expression. The significance of the regulatory effect of one gene product on the expression of other genes of the system is defined by a weight matrix. The model considers multigenic regulation including positive and/or negative feedback. The process of gene expression is described by a single network and by two linked networks where transcription and translation are modeled independently. Each of these processes is described by different networks controlled by different weight matrices. Methods for computing the parameters of the model from experimental data are also shown. On the basis of published microarray data, Ando et al. [42] described a fuzzy neural network (FNN) model to analyze gene expression profiling data for the precise and simple prediction of survival of DLBCL patients. From data on 5857 genes, this model identified four genes (CD 10, AA807551, AA805611 and IRF-4) that could be used to predict prognosis with 93% accuracy. Relevant investigations for Gene Expression and Microarray is also available in [43,44, 45, 46, 47]. 34.5.5 Gene Regulatory Network Adaptive double self-organizing map (ADSOM) [48] provides a clustering technique for identifying gene regulatory networks. It has a flexible topology and it performs clustering and cluster visualization simultaneously, thereby requiring no a-priori knowledge about the number of clusters. ADSOM is developed on a technique known as double self-organizing map (DSOM). DSOM combines features of the popular self-organizing map (SOM) with two-dimensional position vectors to provide a visualization tool to decide how many clusters are needed, but its free parameters are difficult to control to guarantee correct results and convergence. ADSOM updates its free parameters during training and it allows convergence of its position vectors to a fairly consistent number of clusters provided that its initial number
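The weight-matrix view of gene regulation used in [41] can be pictured with a tiny simulation. The sketch below is a generic sigmoidal network update, not Vohradsky's exact formulation: positive weights act as activation, negative weights as repression, and the weight matrix, decay rate, and initial state are all invented.

```python
import numpy as np

# Generic sketch of a weight-matrix gene regulation model in the spirit of [41]:
# each gene's next expression level is a sigmoid of the weighted influence of all genes.
W = np.array([[ 0.0,  2.0, -1.5],     # gene 0 activated by gene 1, repressed by gene 2
              [-1.0,  0.0,  0.0],     # gene 1 repressed by gene 0
              [ 0.5,  0.0,  0.0]])    # gene 2 weakly activated by gene 0
decay = 0.2
x = np.array([0.1, 0.9, 0.3])         # initial expression levels (made up)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for t in range(20):
    x = (1 - decay) * x + decay * sigmoid(W @ x)   # relax toward the regulated target

print(np.round(x, 3))                  # steady expression pattern of the toy network
```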
of nodes is greater than the expected number of clusters. The number of clusters can be identified by visually counting the clusters formed by the position vectors after training. The reliance of ADSOM in identifying the number of clusters is proven by applying it to publicly available gene expression data from multiple biological systems such as yeast, human, and mouse. It may be noted that gene regulatory network analysis is a very recent research area, and neural network applications to it are scarce. Using simulated data, Ritchie et al. [49] optimized back propagation neural network architecture using genetic programming to improve the ability of neural networks to model, identify, characterize and detect nonlinear gene-gene interactions in studies of common human diseases. They showed that the genetic programming optimized neural network is superior to the traditional back propagation neural network approach in terms of predictive ability and power to detect gene-gene interactions when non-functional polymorphisms are present.
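Because ADSOM and DSOM build on the standard self-organizing map, a bare-bones SOM training loop may help make the mechanism concrete. The following sketch is not the ADSOM algorithm of [48]; it is plain Kohonen-style training on made-up expression profiles, with arbitrary grid size, learning rate, and neighbourhood schedule.

```python
import numpy as np

# Minimal Kohonen SOM sketch (not the ADSOM of [48]): map gene expression
# profiles onto a small 2D grid. All data and hyperparameters are made up.
rng = np.random.default_rng(1)
profiles = rng.normal(size=(200, 10))              # 200 genes, 10 conditions
grid_h, grid_w = 4, 4
weights = rng.normal(size=(grid_h, grid_w, 10))    # one prototype vector per grid node
coords = np.array([[i, j] for i in range(grid_h) for j in range(grid_w)]
                  ).reshape(grid_h, grid_w, 2)

for epoch in range(30):
    lr = 0.5 * (1 - epoch / 30)                    # decaying learning rate
    sigma = 2.0 * (1 - epoch / 30) + 0.5           # decaying neighbourhood radius
    for x in profiles:
        # best matching unit = grid node whose prototype is closest to x
        dists = np.linalg.norm(weights - x, axis=2)
        bmu = np.unravel_index(dists.argmin(), dists.shape)
        # pull the BMU and its grid neighbours toward x
        grid_dist = np.linalg.norm(coords - np.array(bmu), axis=2)
        influence = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))[..., None]
        weights += lr * influence * (x - weights)

print("trained prototype grid:", weights.shape)    # (4, 4, 10)
```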
34.6 Other Bioinformatics Tasks Using ANN Dopazo et al. [50] described a unsupervised growing self-organizing neural network that expands itself following the taxonomic relationships existing among the sequences being classified. The binary tree topology of this neural network, opposite to other more classical neural network topologies, permits an efficient classification of sequences. The growing nature of this procedure allows to stop it at the desired taxonomic level without the necessity of waiting until a complete phylogenetic tree is produced. The time for convergence is approximately a linear function of the number of sequences. This neural network methodology is an excellent tool for the phylogenetic analysis of a large number of sequences. Parbhane et al. [51] utilize an artificial neural network (ANN) for the prediction of DNA curvature in terms of retardation anomaly. ANN captured the phase information and increased helix flexibility. Base pair effects in determining the extent of DNA curvature has been developed. The network predictions validate the known experimental results and also explain how the base pairs affect the curvature. The results suggest that ANN can be used as a model-free tool for studying DNA curvature. Drug resistance is a very important factor influencing the failure of current HIV therapies. The ability to predict the drug resistance of HIV protease mutants may be useful in developing more effective and longer lasting treatment regimens. In [52] a classifier was constructed based on the sequence data of various drug resistant mutants. Self-organizing maps were first used to extract the important features and cluster the patterns in an unsupervised manner. This was followed by subsequent labelling based on the known patterns in the training set. The classifier using the structure information is able to correctly recognize the previously unseen mutants with an accuracy of between 60 and 70%. The method is superior to a random classifier. Neural network computations on DNA and RNA sequences are used in [53] to demonstrate that data compression is possible in these sequences. The result implies
that a certain discrimination should be achievable between structured vs random regions. The technique is illustrated by computing the compressibility of short RNA sequences such as tRNA.
34.7 Conclusions and Discussion The rationale for applying computational approaches to facilitate the understanding of various biological processes is mainly a) to provide a more global perspective in experimental design, and b) to capitalize on the emerging technology of database-mining - the process by which testable hypotheses are generated regarding the function or structure of a gene or protein of interest by identifying similar sequences in better characterized organisms. Neural networks appear to be a very powerful artificial intelligence (AI) paradigm to handle these issues [54]. Other soft computing tools, like fuzzy set theory and genetic algorithms, integrated with ANN [55], may also be used, e.g., based on the principles of Case Based Reasoning [56]. Even though the current approaches in biocomputing are very helpful in identifying patterns and functions of proteins and genes, they are still far from being perfect. They are not only time-consuming, requiring Unix workstations to run on, but might also lead to false interpretations and assumptions due to necessary simplifications. It is therefore still mandatory to use biological reasoning and common sense in evaluating the results delivered by a biocomputing program. Also, for evaluation of the trustworthiness of the output of a program it is necessary to understand its mathematical/theoretical background to finally come up with a useful and sensible analysis.
References
1. Baldi, P., Brunak, S.: Bioinformatics: The Machine Learning Approach. MIT Press, Cambridge, MA (1998)
2. Altman, R.B., Valencia, A., Miyano, S., Ranganathan, S.: Challenges for intelligent systems in biology. IEEE Intelligent Systems 16 (2001) 14-20
3. Setubal, J., Meidanis, J.: Introduction to Computational Molecular Biology. International Thomson Publishing, 20 Park Plaza, Boston, MA 02116 (1999)
4. Nash, H., Blair, D., Grefenstette, J.: Comparing algorithms for large-scale sequence analysis. Proc. 2nd IEEE International Symposium on Bioinformatics and Bioengineering (BIBE'01) (2001) 89-96
5. Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48 (1970) 443-453
6. Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. Journal of Molecular Biology 147 (1981) 195-197
7. Fickett, J.W.: Finding genes by computer: The state of the art. Trends in Genetics 12 (1996) 316-320
8. Salzberg, S.L., Searls, D.B., Kasif, S.: Computational Methods in Molecular Biology. Elsevier Science, Amsterdam (1998)
9. Chou, P., Fasman, G.: Prediction of the secondary structure of proteins from their amino acid sequence. Advances in Enzymology 47 (1978) 145-148
10. Luscombe, N.M., Greenbaum, D., Gerstein, M.: What is bioinformatics? A proposed definition and overview of the field. Yearbook of Medical Informatics (2001) 83-100
11. Quackenbush, J.: Computational analysis of microarray data. National Review of Genetics 2 (2001) 418-427
12. Allex, C.F., Shavlik, J.W., Blattner, F.R.: Neural network input representations that produce accurate consensus sequences from DNA fragment assemblies. Bioinformatics 15 (1999) 723-728
13. Jones, D.T.: GenTHREADER: An efficient and reliable protein fold recognition method. Journal of Molecular Biology 287 (1999) 797-815
14. Arakawa, M., Hasegawa, K., Funatsu, K.: Application of the novel molecular alignment method using the Hopfield Neural Network to 3D-QSAR. J Chem Inf Comput Sci. 43 (2003) 1396-1402
15. Hirst, J.D., Sternberg, M.J.: Prediction of structural and functional features of protein and nucleic acid sequences by artificial neural networks. Biochemistry 31 (1992) 7211-7218
16. Petersen, S.B., Bohr, H., Bohr, J., Brunak, S., Cotterill, R.M., Fredholm, H., Lautrup, B.: Training neural networks to analyse biological sequences. Trends Biotechnol. 8 (1990) 304-308
17. Cai, Y., Chen, C.: Artificial neural network method for discriminating coding regions of eukaryotic genes. Comput Appl Biosci. 11 (1995) 497-501
18. Sun, J., Song, W.Y., Zhu, L.H., Chen, R.S.: Analysis of tRNA gene sequences by neural network. J Comput Biol. 2 (1995) 409-416
19. Lukashin, A.V., Anshelevich, V.V., Amirikyan, B.R., Gragerov, A.I., Frank-Kamenetskii, M.D.: Neural network models for promoter recognition. J Biomol Struct Dyn. 6 (1989) 1123-1133
20. Kalate, R.N., Tambe, S.S., Kulkarni, B.D.: Artificial neural networks for prediction of mycobacterial promoter sequences. Comput Biol Chem. 27 (2003) 555-564
21. Sherriff, A., Ott, J.: Applications of neural networks for gene finding. Adv Genet. 42 (2001) 287-297
22. Reese, M.G.: Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome. Comput Chem. 26 (2001) 51-56
23. Mahadevan, I., Ghosh, I.: Analysis of E. coli promoter structures using neural networks. Nucleic Acids Res. 22 (1994) 2158-2165
24. Qian, N., Sejnowski, T.J.: Predicting the secondary structure of globular proteins using neural network models. Journal of Molecular Biology 202 (1988) 865-884
25. Rost, B., Sander, C.: Improved prediction of protein secondary structure by use of sequence profiles and neural networks. Proc. National Academy of Sciences USA 90 (1993) 7558-7562
26. Rost, B., Sander, C.: Prediction of protein secondary structure at better than 70% accuracy. Journal of Molecular Biology 232 (1993) 584-599
27. Kaur, H., Raghava, G.P.: A neural network method for prediction of beta-turn types in proteins using evolutionary information. Bioinformatics (2004), accepted
28. McGuffin, L.J., Bryson, K., Jones, D.T.: The PSIPRED protein structure prediction server. Bioinformatics 16 (2000) 404-405
29. Wood, M.J., Hirst, J.D.: Predicting protein secondary structure by cascade-correlation neural networks. Bioinformatics 20 (2004) 419-420
30. Pasquier, C., Promponas, V.J., Hamodrakas, S.J.: PRED-CLASS: cascading neural networks for generalized protein classification and genome-wide applications. Proteins 44 (2001) 361-369
31. Ding, C.H., Dubchak, I.: Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics 17 (2001) 349-358
32. Berry, E.A., Dalby, A.R., Yang, Z.R.: Reduced bio basis function neural network for identification of protein phosphorylation sites: comparison with pattern recognition algorithms. Comput Biol Chem. 28 (2004) 75-85
33. Shepherd, A.J., Gorse, D., Thornton, J.M.: A novel approach to the recognition of protein architecture from sequence using Fourier analysis and neural networks. Proteins 50 (2003) 290-302
34. Pollastri, G., Baldi, P., Fariselli, P., Casadio, R.: Improved prediction of the number of residue contacts in proteins by recurrent neural networks. Bioinformatics 17 (2001) 234-242
35. Lin, K., May, A.C., Taylor, W.R.: Threading using neural nEtwork (TUNE): the measure of protein sequence-structure compatibility. Bioinformatics 18 (2002) 1350-1357
36. Cai, Y.D., Liu, X.J., Chou, K.C.: Prediction of protein secondary structure content by artificial neural network. J Comput Chem. 24 (2003) 727-731
37. Dietmann, S., Frommel, C.: Prediction of 3D neighbours of molecular surface patches in proteins by artificial neural networks. Bioinformatics 18 (2002) 167-174
38. Riis, S., Krogh, A.: Improving prediction of protein secondary structure using structured neural networks and multiple sequence alignments. Journal of Computational Biology 3 (1996) 163-183
39. O'Neill, M.C., Song, L.: Neural network analysis of lymphoma microarray data: prognosis and diagnosis near-perfect. BMC Bioinformatics 4 (2003) 13-20
40. Bicciato, S., Pandin, M., Didone, G., Bello, C.D.: Pattern identification and classification in gene expression data using an autoassociative neural network model. Biotechnol Bioeng. 81 (2003) 594-606
41. Vohradsky, J.: Neural network model of gene expression. FASEB J. 15 (2001) 846-854
42. Ando, T., Suguro, M., Hanai, T., Kobayashi, T., Honda, H., Seto, M.: Fuzzy neural network applied to gene expression profiling for predicting the prognosis of diffuse large B-cell lymphoma. Jpn J Cancer Res. 93 (2002) 1207-1212
43. Sawa, T., Ohno-Machado, L.: A neural network-based similarity index for clustering DNA microarray data. Comput Biol Med. 33 (2003) 1-15
44. Spicker, J.S., Wikman, F., Lu, M.L., Cordon-Cardo, C., Workman, C., Orntoft, T.F., Brunak, S., Knudsen, S.: Neural network predicts sequence of TP53 gene based on DNA chip. Bioinformatics 18 (2002) 1133-1134
45. Herrero, J., Valencia, A., Dopazo, J.: A hierarchical unsupervised growing neural network for clustering gene expression patterns. Bioinformatics 17 (2001) 126-136
46. Software, P.: PNN Technologies. (Pasadena, CA)
47. Liang, Y., George, E.O., Kelemen, A.: Bayesian Neural Network for Microarray Data. Technical Report (Department of Mathematical Sciences, University of Memphis, Memphis, TN 38152, U.S.A.)
48. Ressom, H., Wang, D., Natarajan, P.: Clustering gene expression data using adaptive double self-organizing map. Physiol. Genomics 14 (2003) 35-46
49. Ritchie, M.D., White, B.C., Parker, J.S., Hahn, L., Moore, J.H.: Optimization of neural network architecture using genetic programming improves detection and modeling of gene-gene interactions in studies of human diseases. BMC Bioinformatics 4 (2003) 28-36
50. Dopazo, J., Carazo, J.M.: Phylogenetic reconstruction using an unsupervised growing neural network that adopts the topology of a phylogenetic tree. J. Mol. Evol. 44 (1997) 226-233
51. Parbhane, R.V., Tambe, S., Kulkarni, B.D.: Analysis of DNA curvature using artificial neural networks. Bioinformatics 14 (1998) 131-138
52. Draghici, S., Potter, R.B.: Predicting HIV drug resistance with neural networks. Bioinformatics 19 (2003) 98-107
53. Alvager, T., Graham, G., Hutchison, D., Westgard, J.: Neural network method to analyze data compression in DNA and RNA sequences. J Chem Inf Comput Sci. 37 (1997) 335-337
54. Pal, S.K., Polkowski, L., Skowron, A.: Rough-neuro Computing: A Way of Computing with Words. Springer, Berlin (2003)
55. Pal, S.K., Mitra, S.: Neuro-fuzzy Pattern Recognition: Methods in Soft Computing Paradigm. John Wiley, NY (1999)
56. Pal, S.K., Shiu, S.C.K.: Foundations of Soft Case Based Reasoning. John Wiley, NY (2004)
35 Rough Set Based Solutions for Network Security* Guoyin Wang, Long Chen, and Yu Wu Institute of Computer Science and Technology Chongqing University of Posts and Telecommunications Chongqing, 400065, P.R. China [email protected] Summary. The problem of network security and intrusion detection is discussed at first, and then some data-mining-based methods are presented to solve these problems. The problems, possibilities, and methods of data mining solutions for intrusion detection are further analyzed. The art of rough-set-based solutions for network security and an application frame are also discussed. It is shown that rough-set-based methods are promising in terms of detection accuracy, requirement for training data, and efficiency. Rough-set-based new techniques such as data reduction, incremental mining, uncertain data mining, and initiative data mining are suggested for intrusion detection systems. Key words: rough set, network security, intrusion detection system, data mining
35.1 Introduction With the rapid growth of interconnections among computer systems, network-based computer systems are playing increasingly vital roles in modern society. They become the target of intrusions by potential intruders. Network security is becoming a major challenge. In order to meet this challenge, intrusion detection systems (IDS) are designed to protect network information systems. Intrusion detection is important in the network security framework. An IDS evaluates suspected intrusions and signals an alarm once a suspected intrusion happens. An IDS also watches for attacks that originate from within a system. Intrusion detection techniques are generally categorized into anomaly detection and misuse detection. Misuse detection systems use patterns of well-known attacks or weak spots of the system to match and identify known intrusion patterns or signatures. Anomaly detection systems attempt to quantify the usual or acceptable behav-
* This paper is partially supported by the National Natural Science Foundation of P. R. China (No. 60373111), the Key Science and Technology Research Foundation of the State Education Ministry of P. R. China, the Application Science Foundation of Chongqing of P. R. China, and the Science and Technology Research Program of the Municipal Education Committee of Chongqing of P. R. China.
iors and flag irregular activities that deviate significantly from the established normal usage profiles as anomalies (i.e. potential intrusions). Another useful classification for intrusion detection systems is according to their data source [1]. Normally, the data source determines the types of intrusions that can be detected. The two general categories are host-based detection and network-based detection. For host-based systems, the data source is from an individual host on the network. In particular, these systems employ their host's operating system audit trail as the main source of input. Because host-based systems directly monitor the host data files and operating system processes, they can determine exactly which host resources are the targets of a particular attack. Recent research in intrusion detection techniques has shifted from user based intrusion detection to process based intrusion detection. Process based intrusion detection tools analyze the behavior of executing processes for possible intrusive activity. The premise of process based intrusion detection is that most computer security violations are made by misusing programs. When a program is misused, its behavior will differ from its normal usage. With the rapid development of computer networks, some traditional single-host based intrusion detection systems have been modified to monitor a number of hosts on a network. They transfer the monitored information from multiple monitored hosts to a central site for processing. These are termed distributed intrusion detection systems, such as IDES [2,3], NSTAT [4], and AAFID [5]. Network-based intrusion detection systems employ network traffic as the main source of input. There are a set of traffic sensors within the network. These sensors perform local analysis, detect suspicious events and report to a central location. Recent developments in network oriented intrusion detection systems have moved the focus from network traffic to the computational infrastructure (the hosts and their operating systems) and the communication infrastructure (the network and its protocols). They use the network as just a source of security-relevant information. Network-based intrusion detection systems have been widened to address large, complex network environments. Examples of this trend include GrIDS (Graph based Intrusion Detection System) [6], EMERALD [7], NetStat [8], and CARDS (Coordinated Attack Response and Detection System) [9]. Intrusion detection systems can be also divided into passive or reactive intrusion detection systems. In a passive system, the IDS detects a potential security breach, logs the information and signals an alert. In a reactive system, the IDS responds to the suspicious activity by logging off a user or reprogramming the firewall to block network traffic from the suspected malicious source.There are several open questions for intrusion detection techniques [10]: •
• Soundness of approach: Does the approach actually detect intrusions? Is it possible to distinguish anomalies related to intrusions from those related to other factors?
• Completeness of approach: Does the approach detect most, if not all, intrusions, or is a significant proportion of intrusions undetectable by this method?
•
457
Timeliness of approach: Can we detect most intrusions before significant damage is done? Choice of metrics, statistical models, and profiles: What metrics, models, and profiles provide the best discriminating power? Which are cost-effective? What are the relationships between certain types of anomalies and different methods of intrusion? System design: How should a system based on the model be designed and implemented? Feedback: What effect should a detection of an intrusion have on the target system? Should an IDS system automatically direct the system to take certain actions? Social implications: How will an intrusion detection system affect the user community it monitors? Will it deter intrusion? Will the users feel their data are better protected? Will it be regarded as a step towards "big brother"? Will its capabilities be misused to that end?
The following criteria should be the goal of intrusion detection systems [11]. • • •
Completeness: All operations in a valid trace should be classified as valid (or normal). Consistency: For every invalid trace (or intrusion trace), a valid access specification should classify at least one operation as bad (or invalid). Compactness: The specification should be concise so that it can be inspected by a human and be suitable for real-time detection. One simple compactness measure is the number of rules (or clauses) in a specification. Predictability: The specification should be able to explain future execution traces with a low false alarm rate. Detectability: The specification should fit closely to the actual valid behavior and reject future execution traces that are intrusions.
In the following, problems and possible solutions of data mining in network security are discussed in section 35.2. Then, rough set based incremental rule generation algorithm is introduced in section 35.3. Several rough-set-based methods for network security are further discussed in section 35.4 in detail. Finally, in section 35.5, we conclude the paper and give some suggestions for future work.
35.2 Data Mining in Network Security Some commonly used techniques in data mining are: artificial neural networks, fuzzy sets, rough sets, decision trees, genetic algorithms, nearest neighbor method, statistics based rule induction, linear regression and linear predictive coding, et al. Most existing information systems have security flaws that render them susceptible to intrusions, penetrations, and other forms of abuse. Finding and fixing all these deficiencies is not feasible for technical and economic reasons. It is not easy to replace existing systems with known flaws with more secure systems because these
systems have attractive features that are missing in the more-secure systems, or for economic reasons. Developing secure systems is extremely difficult, if not generally impossible. Even the most secure systems are vulnerable to abuses by insiders who misuse their privileges. Thus, the development of real-time intrusion detection systems is needed and important. However, real-time detection of previously unseen attacks with high accuracy and a low false alarm rate is still a challenge. Many recent approaches for intrusion detection have applied data mining techniques, which have been empirically proven effective [10,12]. One major drawback of data mining based approaches is that the data required for training is very expensive to produce [13]. Data mining based IDSs collect data from sensors that monitor some aspect of a system. Sensors may monitor network activity, system calls used by user processes, or file system access. They extract predictive features from the raw data stream being monitored to produce formatted data that can be used for detection. A detection model determines whether the data is intrusive. Algorithms for building detection models are also usually classified into two categories: misuse detection and anomaly detection. Misuse detection models are typically obtained by training on a large set of data in which the attacks have been manually labeled. It is very expensive to produce these data because each piece of data must be labeled as either normal or some particular attack. Anomaly detection models compare sensor data to normal patterns learned from a large amount of training data. The data used for training is required to be purely normal and does not contain any attacks. This data can be very expensive because the process of manually cleaning the data is quite time consuming. Models trained on data gathered from one environment may not perform well in some other environment. This means that in order to obtain the best intrusion detection models, data must be collected from all possible environments in which the intrusion detection system will be deployed. The cost of generating data sets can be very expensive and the cost incurred is a significant barrier to IDS deployment. Because possible malicious behaviors and intruder actions are potentially infinite, it is difficult and impossible to demonstrate all of them from a finite training corpus. Furthermore, the previously unseen attack is often the greatest threat. Finally, for reasons of privacy, it is desirable that a user-based anomaly-detection agent only employs data that originate from the profiled user or are publicly available. Releasing traces of one's own normal behaviors, even to assist the training of someone else's anomaly detector, runs the risk that the data will be abused to subvert the original user's security mechanisms. Thus, in many cases, only positive instances are available for training. Learning from only positive examples presents a challenge for classification, since it can easily lead to overgeneralization. The accuracy of data mining based detection models depends on sufficient training data and suitable feature set. According to the above analysis of intrusion detection systems, we can find that there are many unsolved problems in intrusion detection systems, such as,
• How to design the model for describing the characteristics of normal behaviors and abnormal behaviors?
• How to design the model for describing the characteristics of specific misuse intrusion behaviors?
• How to reduce the cost of collecting, storing, and processing the huge amounts of source data for mining knowledge of intrusion behaviors?
• How to adjust intrusion detection systems, at low cost, to the growth of source data and the changing of users' behaviors?
• How to implement efficient intrusion detection to reduce the damages caused by intrusions?
• How to increase the detection rate while decreasing the false positive rate at the same time?
It should be possible to solve the above problems using some specific data mining techniques. Many data mining techniques have been used to model normal and abnormal behaviors, and many kinds of misuse intrusions. The critical issue is how to combine multiple simple detection models into one integrated intrusion detection system. Since it is very expensive to produce the training data required by data mining based intrusion detection systems, rough set [14] might be a potential method for reducing the cost of collecting, storing, and processing the huge amounts of source data. Rough set theory has a unique advantage in data reduction. It can be used to process the data available at present and recommend the importance degrees of the different data generated by each sensor for distinguishing each kind of intrusion behavior from normal behaviors. Thus, we could know what features are important for intrusion detection, and what data is sufficient for each intrusion detection system. We need not obtain, store, or process unnecessary sensor data. The data obtained from the IDS monitors grows quickly. The behaviors and methods of intruders also vary often. In order to detect new intrusion behaviors, data mining based intrusion detection systems need to be updated frequently by mining all collected data again, including previously mined data and newly arriving data. Generally, data mining processes are time consuming and very expensive. It would be very helpful if an incremental data mining method were available. Fortunately, researchers have developed several data mining algorithms with incremental learning ability, such as the rough set and rule tree based incremental knowledge acquisition algorithm [15]. With such an algorithm, we need not mine the previously mined data again while mining newly arriving data. New detection models and knowledge can be extracted from the new data and added to those mined from the old data. Thus, the detection ability of an intrusion detection system can grow over time.
35.3 Rough Set Based Incremental Rule Generation Algorithm

There are thousands of variations of known attacks, and new attacks emerge in networks every day. How to update the rule set is a severe problem for every intrusion detection system. One technical solution in classical rough set theory is to
combine new attack records with the original training data as new training data, and then repeat the procedure of rule generation with the new training data. Nevertheless, with the quick growth of training data, it becomes more and more difficult and time-consuming to maintain and update the rule set. To some extent, it seems that the knowledge acquired before is useless, since all knowledge has to be re-studied even when only one new attack appears. In addition, most of the work is repetitive and time-consuming. To solve this problem, incremental learning would be an appropriate choice. The human brain is a typical example of incremental learning. For example, when an undergraduate is learning new knowledge, he need not learn again the knowledge he has already learned in elementary school and high school. He can update his knowledge structure with the new knowledge. An incremental knowledge acquisition algorithm (RRIA) [15] based on rough set and rule tree is introduced to update the rule set of an intrusion detection system. When a new attack type or a new variation of an attack appears, we need not repeat the procedure of rule generation with the whole training data. The only thing we should do is to generate new rules from the new attack records and then combine them with the original rule set. The original rule set is in the following rule tree format [15]:
1) A rule tree is composed of one root node, some leaf nodes and some internal nodes.
2) The root node represents the whole rule set.
3) Each path from the root node to a leaf node represents a rule.
4) Each internal node represents an attribute test. Each branch represents a possible value of an attribute in the rule set. If an attribute is reduced in some rules, a special branch is needed to represent it, and the value of the attribute in such a rule is taken to be "*". "*" is different from any possible value of the attribute.
When a new rule needs to be added into the rule tree, the following incremental learning is done. Processing more than one record simply repeats this process.
1) Begin with the root node of the rule tree.
2) Scan the rule tree and find a path that matches the new rule. If there are several such paths, we use some strategy to select one. A matched rule is thus obtained.
3) If the decision value of the matched rule is different from that of the new rule, re-study the data (rules) related to the matched rule.
4) If the new rule is not matched, insert it into the rule tree.
Experimental results show that the RRIA algorithm has higher speed and an equal (or slightly higher) recognition rate compared with classical rough set algorithms. Because the RRIA algorithm can reduce the data records required for rule acquisition and shorten the runtime of updating the rule set while maintaining satisfactory performance, it is appropriate for processing huge data online.
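To make the rule tree representation and the incremental update step more concrete, the following is a minimal Python sketch of such a structure. It is an illustration only, not the authors' implementation; the class and method names (RuleTree, insert, match) and the choice of resolving multiple matching paths by taking the first one found are our own assumptions.

```python
WILDCARD = "*"  # value used for attributes that were reduced away in a rule

class RuleTree:
    """A rule tree: each path from the root to a leaf represents one rule."""

    def __init__(self):
        self.root = {}  # nested dicts: attribute value -> subtree

    def insert(self, condition_values, decision):
        """Add a rule; condition_values is a tuple of attribute values ('*' allowed)."""
        node = self.root
        for value in condition_values:
            node = node.setdefault(value, {})
        node["_decision"] = decision

    def match(self, record):
        """Return the decision of the first matching path, or None if no path matches."""
        return self._match(self.root, list(record))

    def _match(self, node, values):
        if not values:
            return node.get("_decision")
        head, rest = values[0], values[1:]
        # either an exact branch or a wildcard branch can match this attribute value
        for branch in (head, WILDCARD):
            if branch in node:
                result = self._match(node[branch], rest)
                if result is not None:
                    return result
        return None

    def add_incrementally(self, condition_values, decision):
        """Sketch of the incremental step: insert only when no consistent match exists."""
        matched = self.match(condition_values)
        if matched is None:
            self.insert(condition_values, decision)   # no matching path: add the rule
        elif matched != decision:
            # conflicting decision; a real system would re-study the records behind
            # the matched rule here, instead of simply inserting the new rule
            self.insert(condition_values, decision)

# Example with two condition attributes and decisions 'normal' / 'attack'
tree = RuleTree()
tree.insert(("tcp", WILDCARD), "normal")
tree.add_incrementally(("udp", "high"), "attack")
print(tree.match(("tcp", "low")))   # -> normal
print(tree.match(("udp", "high")))  # -> attack
```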
35.4 Rough Set Based Methods for Network Security

35.4.1 Junk Email Detection Based on Rough Set

Email is very familiar to us, and it is very important to eliminate unsolicited emails, or junk emails. Rough set based filters can be used to detect junk emails on the Internet. One major technique is to build filters along the email transfer route. We noticed that many junk email filters had not made use of all the security information in an email, much of which exists in the email header rather than in the text and attachment. Let us have a look at our decision table for email headers, which helps to judge whether an email is a junk email or not [16]. Eleven condition attributes and one decision attribute are defined as follows.

Condition Attributes:
A. Number of "Received" fields, that is, the number of times the email has been relayed. One "Received" per relay.
B. Number of addressees.
C. Number of email route interruptions. For example, it is a route interruption when the receiver's domain name and IP address in the former "Received" field are different from those in the latter "Received" field.
D. Number of mismatches between a domain name and its corresponding IP address. This attribute is rather important. Due to the dynamics of domain names and limits on network resources, it is difficult to obtain this attribute in our tests. Therefore, its default value is zero in this paper.
E. Number of missing domain names of the sending host after "from" in a "Received" field.
F. Number of missing domain names of the receiving host after "by" in a "Received" field.
G. Number of missing IP addresses of the sending host after "from" in a "Received" field.
H. Whether the original sender address in the "From" field is accordant with that in the "Received" field. The original sender address is given in the last "Received" field after "from" or "by".
I. Whether the destination address in the "To" field is accordant with that in the "Received" field. The latter is the actual receiver.
J. If the "Delivered-To" field exists, whether it is accordant with the "To" field. Its default value is 1 (yes).
K. If the "Return-Path" field exists, whether it is accordant with the "From" field. Its default value is 1 (yes).

Decision Attribute:
L. Whether the email is junk or not. Legitimate emails take value 1 and junk emails take value 2.

There are several processes to mine knowledge from such a decision table, such as:
• Preprocessing of the data, including dealing with missing attribute values and data discretization.
• Attribute reduction.
• Value reduction.
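As an illustration of how the decision table of Section 35.4.1 might be represented and pruned before rule generation, here is a small hedged sketch in Python. The example rows and the trivial "drop constant attributes" step are our own illustrative assumptions and do not reproduce the reduction algorithm actually used by the authors.

```python
# Toy decision table for email headers: condition attributes A-K, decision L
# (1 = legitimate, 2 = junk). The rows below are invented for illustration only.
HEADER = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L"]

rows = [
    # A   B  C  D  E  F  G  H  I  J  K  L
    [2,   1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1],   # looks legitimate
    [5,  30, 2, 0, 1, 1, 1, 0, 0, 1, 0, 2],   # many relays/addressees -> junk
    [4,  25, 1, 0, 1, 0, 1, 0, 0, 1, 0, 2],
]

def drop_constant_attributes(header, table):
    """A naive stand-in for attribute reduction: remove condition attributes
    whose value never varies, since they cannot help discriminate classes."""
    keep = [i for i in range(len(header) - 1)
            if len({row[i] for row in table}) > 1]
    keep.append(len(header) - 1)  # always keep the decision attribute
    return [header[i] for i in keep], [[row[i] for i in keep] for row in table]

reduced_header, reduced_rows = drop_constant_attributes(HEADER, rows)
print(reduced_header)   # attributes D and J (constant in this toy table) are dropped
print(reduced_rows)
```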
We obtained some useful junk email detection knowledge from email headers. Our simulation results demonstrate that, when mining a selected corpus of baleful emails, the filter has high efficiency and a high identification rate.

35.4.2 IDS Architecture Based on Rough Set

A network-based IDS based on rough set theory was developed recently [17]. The system architecture is shown in Fig. 35.1. Since the detection efficiency of a single host would be very low in a high-speed network, a distributed framework is adopted. The system is mainly composed of a Sensor, a Protocol Decoder, a Rule Generation Component, an Intrusion Detection Module, an Alarm/Response Plug-in, and an Administrator. Their functions are as follows:
• Sensor is responsible for collecting network data. It is actually an interface to capture the information flowing through a network card on a machine or through the span port of a switch. Evidently, its location determines the scope of intrusion detection. For example, intrusion detection can be done on a single machine, a network segment, or a gateway.
• Protocol Decoder analyzes the raw data collected by the Sensor. First, it reassembles the data according to their protocol. Then, it converts raw packet and connection data into a format that the Rule Generation Component and the Intrusion Detection Module can use.
• Rule Generation Component integrates rough set theory with rule generation. The whole procedure of rule generation was already discussed in Section 35.3. In this component, a rule model is generated in the format of a rule tree. Each path from the root node of the rule tree to a leaf node represents a rule.
• Intrusion Detection Module examines the rule tree generated by the Rule Generation Component and matches the incoming raw data against the rules. The results of rule matching are transferred to the Alarm/Response Plug-in, which handles them using a strategy defined in advance.
• Alarm/Response Plug-in is responsible for dealing with the results from the Intrusion Detection Module. When external attacks occur, it notifies the administrator by means of e-mails, console alerts, log entries, or a visualization tool.
• Administrator is the interface between the intrusion detection system and its users. Users can update the rule set manually by checking the log file, define the detection strategy, etc.
As shown in Fig. 35.1, the system performs its task in two phases: a rule training phase and a detection phase. In the rule training phase, audit data labeled with attacks are used as training data for rule generation. The output of this phase is a rule tree. Notably, this phase is executed offline, before intrusion detection. In the detection phase, the actual detection is implemented by matching incoming network data against the rule tree. The incoming data are labeled as normal behavior or as a certain attack. Based on the detection results, the Alarm/Response Plug-in takes action.
Fig. 35.1. Architecture of the IDS
The system is capable of extracting a set of detection rules from network packet features. It is effective and suitable for online intrusion detection with low cost and high efficiency. Simulations on the KDDCUP data [18] show that rough set theory is very appropriate for intrusion detection. The incremental rule generation algorithm RRIA introduced in the last section is used to detect new attacks. It can update the rule set quickly and conveniently. Compared with other methods, our method requires a smaller training data set and less effort to collect training data.

35.4.3 Other Rough Set Related Methods for Network Security

Another method was presented for anomaly intrusion detection with low cost and high efficiency [19]. It extracts detection rules, using a rough set algorithm, from the system call sequences generated during the normal execution of a process, which are taken as the normal behavior model. It is capable of detecting the abnormal operating status of a process and thus reporting a possible intrusion. Compared with other methods, it requires a smaller training data set and less effort to collect training data, and it is more suitable for real-time detection. Empirical results show that this method is promising in terms of detection accuracy, required training data and efficiency. Not only can rule generation based on rough set theory be used for network security; other rough set concepts may also be useful. For example, rough inclusion is used for matching normal behaviors and abnormal behaviors [20]. All the above methods reach a common conclusion: the rough set method is suitable and promising for network security.
35.5 Conclusions and Future Work

We have introduced the current status of intrusion detection systems (IDS) and data mining research, discussed some problems in the technologies, methods and models of intrusion detection, and presented some possible data mining based ways of solving these problems. Rough set based methods for network security, using techniques such as data reduction and incremental mining, have been discussed. Simulation experiment results show that the rough set method is suitable and promising for network security. It is also very expensive to obtain information from all possible sensors. It would save time and money if we could mine from partial or incomplete data. In addition, the behaviors of normal computer users and network users differ greatly, and inconsistent data are often generated by detection sensors. Thus, it is also very important for data mining based intrusion detection systems to deal with uncertain data containing incomplete or inconsistent records. Besides some traditional uncertain data processing techniques like statistical methods, some new techniques for processing uncertain data based on rough set theory have been developed in recent years [21,22]. They could be helpful in coping with these problems. Some intrusion actions might not be noticed even if they have occurred before, owing to the limitations of our knowledge about intrusion behaviors, and new intrusion techniques are constantly developed and used by intruders. Some automatic (self, initiative) data mining algorithms driven by the data itself [23] would be useful for mining the knowledge of this kind of intrusion action and improving the detection rate. Our long-term goal is to design and build an intelligent, accurate, and flexible intrusion detection system with distributed and real-time characteristics. The system should have low false negative and false positive rates, and should not be easily fooled by small variations in intrusion patterns. Scalable knowledge acquisition methods, uncertain data mining, and initiative data mining may be integrated into one system to improve its performance in the future.
References

1. Noel, S., Wijesekera, D. and Youman, C. (2002). Modern Intrusion Detection, Data Mining, and Degrees of Attack Guilt, in: Applications of Data Mining in Computer Security, Daniel Barbara and Sushil Jajodia (eds.), Kluwer Academic Publishers.
2. Denning, D.E. (1987). An Intrusion-Detection Model, IEEE Transactions on Software Engineering, vol. 13, pp. 222-232.
3. Lunt, T.F. (1993). A Survey of Intrusion Detection Techniques, Computers and Security, vol. 12(4), pp. 405-418.
4. Kemmerer, R.A. (1997). NSTAT: A Model-based Real-time Network Intrusion Detection System, University of California Santa Barbara, Department of Computer Science, Santa Barbara, CA, Technical Report TR 1997-18.
5. Spafford, E.H. and Zamboni, D. (2000). Intrusion Detection Using Autonomous Agents, Computer Networks, vol. 34, pp. 547-570.
6. Staniford-Chen, S., Cheung, S., Crawford, R., Dilger, M., Frank, J., Hoagland, J., Levitt, K., Wee, C., Yip, R. and Zerkle, D. (1996). GrIDS - A Graph Based Intrusion Detection System for Large Networks, Proceedings of the 19th National Information Systems Security Conference, Baltimore, MD, pp. 361-370.
7. Neumann, P.G. and Porras, P.A. (1999). Experience with EMERALD to Date, Proceedings of the First Usenix Workshop on Intrusion Detection and Network Monitoring, Santa Clara, CA, pp. 73-80.
8. Vigna, G. and Kemmerer, R.A. (1998). NetSTAT: A Network-based Intrusion Detection Approach, Proceedings of the 14th Annual Computer Security Applications Conference, Phoenix, AZ, pp. 25-34.
9. Yang, J., Ning, P., Wang, X.S. and Jajodia, S. (2000). CARDS: A Distributed System for Detecting Coordinated Attacks, Proceedings of the IFIP TC11 16th Annual Working Conference on Information Security, pp. 171-180.
10. Warrender, C., Forrest, S. and Pearlmutter, B. (1999). Detecting Intrusions Using System Calls: Alternative Data Models, 1999 IEEE Symposium on Security and Privacy, IEEE Computer Society, pp. 133-145.
11. Ko, C. (2000). Logic Induction of Valid Behavior Specifications for Intrusion Detection, 2000 IEEE Symposium on Security and Privacy, Berkeley, California, USA, pp. 142-153.
12. Lee, W., Stolfo, S.J. and Mok, K. (1999). A Data Mining Framework for Building Intrusion Detection Models, 1999 IEEE Symposium on Security and Privacy, pp. 120-132.
13. Eskin, E., Miller, M., Zhong, Z.D., Yi, G., Lee, W. and Stolfo, S. (2000). Adaptive Model Generation for Intrusion Detection Systems, Proceedings of the ACM CCS Workshop on Intrusion Detection and Prevention, Athens, Greece.
14. Wang, G.Y. (2001). Rough Set Theory and Knowledge Acquisition, Xi'an: Xi'an Jiaotong University Press.
15. Zheng, Z., Wang, G.Y. and Wu, Y. (2003). A Rough Set and Rule Tree Based Incremental Knowledge Acquisition Algorithm, LNAI 2639, Springer-Verlag, pp. 122-129.
16. Wu, Y., Li, Z.J., Luo, P. and Wang, G.Y. (2003). A New Anti-Spam Filter Based on Data Mining and Analysis of Email Security, Data Mining and Knowledge Discovery: Theory, Tools, and Technology V, pp. 147-154.
17. Li, Z.J., Wu, Y., Wang, G.Y., Hai, Y.J. and He, Y.R. (2004). A New Framework for Intrusion Detection Based on Rough Set Theory, SPIE Defense and Security Symposium, Orlando, Florida, USA, accepted and to appear.
18. http://kdd.ics.uci.edu/databases/kddcup99/
19. Cai, Z.M., Guan, X.H., Shao, R., Peng, Q.K. and Sun, G.J. (2003). A Rough Set Theory Based Method for Anomaly Intrusion Detection in Computer Network Systems, Expert Systems, vol. 20(5), pp. 251-259.
20. Li, X.J., Huang, Y. and Huang, H.K. (2003). A Computing Immune Model Based on Poisson Procedure and Rough Inclusion, Chinese Journal of Computers, vol. 26(1), pp. 71-76.
21. Wang, G.Y. (2002). Extension of Rough Set under Incomplete Information Systems, 2002 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1098-1103.
22. Wang, G.Y. and Liu, F. (2000). The Inconsistency in Rough Set Based Rule Generation, The Second International Conference on Rough Sets and Current Trends in Computing (RSCTC'2000), Canada, pp. 370-377.
23. Wang, G.Y. and He, X. (2003). A Self-learning Model under Uncertain Condition, Journal of Software, vol. 14(6), pp. 1096-1102.
36 Task Assignment with Dynamic Token Generation

Alessandro Farinelli, Luca Iocchi, Daniele Nardi, and Fabio Patrizi
University of Rome La Sapienza, Dipartimento di Informatica e Sistemistica
Via Salaria 113, Rome, Italy
[email protected]
Summary. The problem of assigning tasks to a group of agents acting in a dynamic environment is a fundamental issue for a MAS and is relevant to several real world applications. Several techniques have been studied to address this problem; however, when the system needs to scale up in size, communication quickly becomes an important issue to address. Moreover, in several applications the tasks to be assigned evolve dynamically and are perceived by agents during mission execution. In this paper we present a distributed task assignment approach that ensures very low communication overhead and is able to manage dynamic task creation. The basic idea of our approach is to use tokens to represent tasks to be executed; each team member creates, executes and propagates tokens based on its current knowledge of the situation. We test and evaluate our approach by means of experiments using the RoboCup Rescue simulator.
36.1 Introduction

The problem of assigning tasks to a group of agents or robots acting in a dynamic environment is a fundamental issue for Multi Agent Systems (MAS) and Multi Robot Systems (MRS) and is relevant to several real world applications. Many techniques have been studied to address this problem in different scenarios, providing solutions that in different ways approximate the optimal solution of the Generalized Assignment Problem (GAP), which consists in assigning a predefined set of tasks (or roles) to a set of agents, maximizing an overall utility function that takes into account the capabilities of all the agents in the team. While GAP requires the definition of a static set of tasks, which must thus be known in advance, in many application domains the tasks to be accomplished are not known a priori, but are discovered dynamically during the execution of the mission. Furthermore, when the system needs to scale up in size, communication quickly becomes an important issue to address. The problem of dynamic task assignment has been studied and experimented with by many researchers both in the MRS (e.g. [3, 16, 10]) and in the MAS (e.g. [6, 4, 7, 13]) communities. Several different aspects of the problem have been investigated and several approaches proposed. However, the growing complexity of the missions in which
robots and agents are involved pushes toward the development of novel solutions for task assignment, which are able to address the more challenging issues posed by the applications. For example, auction based approaches to task assignment have been shown to fail in the RoboCup Rescue domain, due to their high communication requirements [8]. In this paper we present a distributed task assignment approach that is able to dynamically discover new tasks to be accomplished according to the situation perceived by the agents during the execution of their activities, and to ensure very low communication overhead. We focus on task assignment for teams operating in environments that need to meet (soft) real time constraints in their mission execution, where the agents involved have similar functionalities but possibly varied capabilities. The reference scenario we are interested in has the following characteristics: i) the domain and the number of agents involved pose strict constraints on communication; ii) agents may perform one or more tasks, but within resource limits; iii) too many agents fulfilling the same task leads to conflicts that need to be avoided; iv) tasks are discovered during mission execution. The basic idea of our approach is derived from previous work based on token passing [12]. Tokens are used to represent tasks that must be executed by the agents, and each team member creates, executes and propagates these tokens based on its knowledge of the environment. The basic approach rests on the assumption that one token is associated with every task to be executed and that the token is maintained by the agent that is performing the task, or passed to another agent if the agent holding the token is not in a position to perform it. In the case of dynamic discovery of the tasks to be performed, and thus of dynamic token generation, the token passing approach must be appropriately extended in order to limit the number of tokens associated with the same task. Indeed, in our reference scenario optimal performance is obtained when there is a limited number of agents cooperating on the same task; when too many agents operate on a single task, the overall performance decreases, since they ignore other tasks that evolve in the dynamic environment. The algorithm presented in this paper allows every agent to generate tokens dynamically whenever a task to be accomplished is perceived, while limiting the number of tokens associated with the same task and minimizing the bandwidth (i.e. communication messages among agents) required. We test and evaluate our approach by means of experiments on a simulated scenario that models a team of fire-fighters engaged in fighting fires in a city. To this end, we use the RoboCup Rescue simulator, which models the evolution of fires in the buildings of a city, city traffic, fire-fighter actions for extinguishing fires, and communication among the fire-fighters. In this scenario, the locations of the fires are not known a priori and the fire-fighter agents find them during their activities; in addition, fires may unpredictably spread to adjacent buildings if not extinguished in time. Moreover, communication constraints are very strict, since messages are both limited and costly (in terms of simulation time steps). The results reported in this paper show that the proposed extension of the token passing approach provides good performance in this scenario, while maintaining a very low communication bandwidth and thus significantly increasing the scalability
of the system. Therefore, the proposed approach is particularly well suited for large scale teams operating in dynamic environments, compared to other dynamic task assignment methods that require a wider communication bandwidth.
36.2 Problem Definition

The definition of the problem considered in this paper is derived from the GAP problem [14], which consists in assigning a set of tasks (or roles) $R = \{r_1, \dots, r_m\}$ to a set of agents (or entities) $E = \{e_1, \dots, e_n\}$, with different capabilities for each task $Cap(e_i, r_j) \in [0,1]$ (i.e. a reward for the team when agent $e_i$ performs task $r_j$), different resources needed by the agents for performing each task, $Resources(e_i, r_j)$, and the resources available to each agent, $e_i.resources$. An allocation matrix $A$ is used for establishing the task assignment: $a_{i,j} = 1$ if and only if agent $e_i$ is assigned to task $r_j$. The goal of the GAP problem is to find an allocation matrix that maximizes the overall capability function:

$$f(A) = \sum_i \sum_j Cap(e_i, r_j) \times a_{i,j}$$

subject to:

$$\forall i \quad \sum_j Resources(e_i, r_j) \times a_{i,j} \le e_i.resources$$

$$\forall j \quad \sum_i a_{i,j} \le 1$$
For example, in the rescue scenario that we have considered in our experiments, the tasks are fires to be extinguished and the agents are fire fighter brigades. The capability of a fire fighter to extinguish a fire may depend on several parameters; however, a good approximation is to consider the capability as a function of the distance from the fire. Clearly, if the nearest fire fighter is allocated to each fire, the team gains a reward in terms of total traveled distance and time to extinguish all the fires. Resources are represented by the amount of water needed to put out fires. The above formulation is well defined for a static environment, where agents and tasks are fixed and capabilities and resources do not depend on time. However, in several applications it is useful or even necessary to solve a similar problem where the defined parameters change with time. For example, in the above mentioned rescue scenario, all the defined parameters clearly depend on time (e.g. fire fighter capabilities are strongly dependent on the evolution of the environment). Indeed, several methods for dynamic task assignment implicitly take this aspect into consideration, providing solutions that consider the dynamics of the world and derive a task allocation that approximates the solution of the GAP problem at each time step (see for example [3, 16, 10, 8]).
The method described in this paper follows the line described above, and aims at solving the GAP problem when the set of tasks $R$ is not known a priori when the mission starts, but is discovered and dynamically updated during task execution. To describe our method we will use the following notation. We denote that the set $R$ depends on time by writing $R(t) = \{r_1, \dots, r_{m(t)}\}$, where $m(t)$ is the number of tasks considered at time $t$, and we express the capabilities and the resources depending on time as $Cap(e_i, r_j, t)$, $Resources(e_i, r_j, t)$, and $e_i.resources(t)$. The dynamic allocation matrix is denoted by $A_t$, in which $a_{i,j,t} = 1$ if and only if agent $e_i$ is assigned to task $r_j$ at time $t$. Consequently, the problem is to find a dynamic allocation matrix that maximizes the following function

$$\sum_t \sum_i \sum_{j=1}^{m(t)} Cap(e_i, r_j, t) \times a_{i,j,t}$$

subject to:

$$\forall t\ \forall i \quad \sum_{j=1}^{m(t)} Resources(e_i, r_j, t) \times a_{i,j,t} \le e_i.resources(t)$$

$$\forall t\ \forall j \in \{0, \dots, m(t)\} \quad \sum_i a_{i,j,t} \le 1$$
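To illustrate how a candidate allocation can be scored against this formulation at a single time step, the following is a small Python sketch. It only evaluates the objective and constraints stated above; the toy numbers and the "one fire per brigade" budget are our own assumptions and are not part of the authors' algorithm.

```python
# Score a single-time-step allocation for the GAP formulation above.
# cap[i][j]: capability of agent i for task j; res[i][j]: resources needed;
# avail[i]: resources available to agent i; alloc[i][j] in {0, 1}.

def objective(cap, alloc):
    return sum(cap[i][j] * alloc[i][j]
               for i in range(len(cap)) for j in range(len(cap[0])))

def feasible(res, avail, alloc):
    n_agents, n_tasks = len(res), len(res[0])
    # each agent must stay within its resource budget
    agents_ok = all(sum(res[i][j] * alloc[i][j] for j in range(n_tasks)) <= avail[i]
                    for i in range(n_agents))
    # each task is assigned to at most one agent
    tasks_ok = all(sum(alloc[i][j] for i in range(n_agents)) <= 1
                   for j in range(n_tasks))
    return agents_ok and tasks_ok

# Toy instance: 2 fire brigades, 3 fires (capability roughly inverse to distance).
cap   = [[0.9, 0.2, 0.5],
         [0.3, 0.8, 0.4]]
res   = [[1, 1, 1],
         [1, 1, 1]]
avail = [1, 1]                      # each brigade can take on one fire here
alloc = [[1, 0, 0],
         [0, 1, 0]]                 # greedy choice: each brigade takes its best fire

assert feasible(res, avail, alloc)
print(objective(cap, alloc))        # -> 1.7
```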
36.3 Token Generation for Task Allocation

The main idea of the token passing approach is to regulate access to task execution through the use of tokens, i.e. only the agent currently holding the token can execute the task. Following this approach, the communication needed to guarantee that each task is performed by one agent at a time is dramatically reduced (see [2]). If a task can benefit from the simultaneous work of several agents, we can decide to create several tokens referring to the same task. However, when tokens are generated and tasks are perceived by agents during mission execution, conflicts on tasks may arise. In this paper we deal with two kinds of conflicts: the first is due to the fact that the same task can be perceived by several agents during the mission, and if no explicit procedure is used the allocation process has no control over the maximum number of agents operating on such a task; this can lead to a considerable waste of resources and result in poor performance. The second type of conflict arises when an agent accomplishes a task and other tokens referring to the same task are still active, causing agents to waste precious time trying to accomplish terminated tasks. We explicitly address these problems by proposing an extension to the algorithm presented in [2]. In the following, a Task refers to the physical object or event that the agent perceives and that implies an activity to be executed (e.g. a fire to be extinguished); therefore, given a perceived object o we define the related task T(o). A Token comprises the physical object related to the task and an identification number that distinguishes different tokens for the same task; therefore, given a task T(o) we
may have a number s of tokens TK(o,1), ..., TK(o,s). The main idea of the proposed algorithm is that when an agent perceives a task, it records this information in a local structure and announces the presence of the task to all its team mates. Only the agent that first perceives a new task (e.g. a fire) creates one or more tokens for it; conflicts that might arise due to simultaneous perception are addressed and solved as explained later. Whenever an agent accomplishes a task, it announces the task termination to the entire team, and each of the team members removes the tokens referring to the accomplished task from its local structures. Using this approach, conflicting tokens can still be created for two main reasons: i) Contemporary task discovery: two agents e1 and e2 perceive a new task t, creating a set of tokens Tk(t,1), ..., Tk(t,s) at exactly the same time, so that both agents hold different tokens referring to the same task. ii) Message asynchrony: assume we have three agents e1, e2, e3; if e1, immediately after the creation of a new set of tokens Tk(t,1), ..., Tk(t,s), decides to send one of them, say Tk(t,j), to agent e3, this token will not be found in the local structure of e1 when the announce message of e2 arrives and therefore will not be deleted; for e3 we can have two situations: a) the message referring to token Tk(t,j) arrives before the announce message of e2, or b) the announce message of e2 arrives before the message referring to token Tk(t,j). In both situations the token Tk(t,j) will not be deleted, and the conflict will not be solved. Both these problems have been addressed and solved in our approach, as explained later in this section. The algorithm uses the following data structures: i) the Known Tasks Set (KTS) contains, at each time step, all the tasks that have been perceived by all the agents; ii) the Token Set (TkS) is the set of tokens the agent currently holds; iii) the Temporary Token Set (TmpTkS) contains the tokens created by the agent in the current time step; iv) the Accomplished Tasks Set (ATS) contains, at each time step, all the tasks that have been accomplished by all the agents. Each of these data structures is local to one agent. v) A message has three fields: type ∈ {announce, accomplishedTask, token}; task, which contains information about the perceived task (e.g. fire position) and is valid when type is announce or accomplishedTask; finally, the token field is valid only when the message is of type token and contains information about the token (e.g. task position, id number, visited agents, etc.). Whenever an agent detects a new task through its perception, it adds the new task to the KTS, creates s tokens referring to the task and adds them to the TmpTkS, then it announces the new task to all its team members (Algorithm 1, OnPercReceived). When a team member accomplishes a task, it sends an accomplishedTask message to all its team mates and updates its ATS (Algorithm 1, OnTaskAccomplishment). When a team member receives a message, it updates its local structures as explained in Algorithm 1, OnMsgReceived. Whenever a task is perceived, a new token is generated only if that task is not present in the KTS. After tokens have been processed (Algorithm 1, TokenManagement), the TmpTkS is copied into the TkS. Assuming that messages cannot get lost, Algorithm 1 guarantees that when an agent a perceives a task T that has already been discovered before (i.e.
that is present in the KTS), it will not create new tokens for it, correctly assuming that someone else already has the token(s) for T.
Algorithm 1: Procedures for on-line token generation

OnPercReceived(task)
  if task ∉ KTS then
    KTS := KTS ∪ {task}
    TmpTkS := TmpTkS ∪ {Tk(task,1), ..., Tk(task,s)}
    Send(Msg(Announce, task))

OnMsgReceived(Msg)
  if Msg.type == AccomplishedTask then
    ATS := ATS ∪ {Msg.task}
  if Msg.type == Announce then
    if Msg.task ∉ KTS then
      KTS := KTS ∪ {Msg.task}
    else if Msg.senderId > MyId then
      TmpTkS := TmpTkS \ {Tk(Msg.task, j) | j = 1, ..., s}
      if CurrentTask == Msg.task then
        StopCurrentTask()
  if Msg.type == Token then
    TkS := TkS ∪ {Msg.token}

OnTaskAccomplishment(task)
  ATS := ATS ∪ {task}
  Send(Msg(AccomplishedTask, task))

TokenManagement()
  TkS := TkS \ ATS
  TokenSet := ChooseTokenSet(TkS)
  SendTokenSet := TkS \ TokenSet
  Send(Msg(Token, SendTokenSet))
  TkS := TkS ∪ TmpTkS
  StartTask(ChooseTask(TokenSet))
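As a rough, non-authoritative illustration of how these procedures fit together inside a single agent, the following Python sketch mirrors Algorithm 1. The class layout, the send callback, the message dictionaries and the simple utility-sorted ChooseTokenSet are our own assumptions, made only to keep the example self-contained.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    task: str      # e.g. an identifier of a fire
    index: int     # distinguishes the s tokens created for the same task

class TokenAgent:
    def __init__(self, my_id, send, tokens_per_task=3, capacity=1):
        self.my_id, self.send = my_id, send          # send(msg_dict) broadcasts a message
        self.s, self.capacity = tokens_per_task, capacity
        self.kts, self.ats = set(), set()            # known tasks / accomplished tasks
        self.tks, self.tmp_tks = set(), set()        # tokens held / tokens created this step
        self.current_task = None

    def on_perc_received(self, task):                # cf. Algorithm 1, OnPercReceived
        if task not in self.kts:
            self.kts.add(task)
            self.tmp_tks |= {Token(task, i) for i in range(1, self.s + 1)}
            self.send({"type": "announce", "task": task, "sender": self.my_id})

    def on_msg_received(self, msg):                  # cf. Algorithm 1, OnMsgReceived
        if msg["type"] == "accomplished":
            self.ats.add(msg["task"])
        elif msg["type"] == "announce":
            if msg["task"] not in self.kts:
                self.kts.add(msg["task"])
            elif msg["sender"] > self.my_id:         # lower id yields on a conflict
                self.tmp_tks = {t for t in self.tmp_tks if t.task != msg["task"]}
                if self.current_task == msg["task"]:
                    self.current_task = None
        elif msg["type"] == "token":
            self.tks.add(msg["token"])

    def on_task_accomplishment(self, task):          # cf. Algorithm 1, OnTaskAccomplishment
        self.ats.add(task)
        self.send({"type": "accomplished", "task": task, "sender": self.my_id})

    def token_management(self, utility):             # cf. Algorithm 1, TokenManagement
        self.tks = {t for t in self.tks if t.task not in self.ats}
        # greedy choice: keep the most useful tokens within the agent's capacity
        kept = set(sorted(self.tks, key=utility, reverse=True)[: self.capacity])
        for token in self.tks - kept:                # pass the remaining tokens on
            self.send({"type": "token", "token": token, "sender": self.my_id})
        self.tks = kept | self.tmp_tks
        self.tmp_tks = set()
        if kept:
            self.current_task = max(kept, key=utility).task
```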
Notice that OnPercReceived, OnMsgReceived and OnTaskAccomplishment are asynchronous procedures, triggered by particular events; in principle, all possible interleavings of their executions could occur. However, if we assume that each procedure is atomic (which is a reasonable assumption since no synchronization among agents is involved), we can guarantee that there will never be two tokens referring to the same task in the system for longer than the time required for the announce messages to reach all the team members. In fact, as explained above, conflicting tokens may be created in case of contemporary task discovery or due to message asynchrony. The problem of contemporary task discovery is considered and solved by the procedure OnMsgReceived: when the agents receive the announce messages, the one with the lower static priority, represented in the procedure by the lower Id number, will delete the token for task t from its TmpTkS, solving the conflict; if t is already being
executed by the agent with the lower static priority, that agent will stop its execution, yielding to the higher priority agent the possibility of executing the task. The problem arising from message asynchrony is avoided thanks to the distinction between temporary tokens (stored in TmpTkS) and normal tokens (stored in TkS). In fact, assuming that the time needed for an announce message to reach all the agents is less than one simulation step (i.e. assuming that messages are synchronized with agent execution), the use of a Temporary Token Set guarantees that the conflicts will be detected and avoided. Otherwise, a higher communication overhead is needed in order to recover from such conflicts. Setting a static fixed priority among agents can obviously result in non-optimal behavior of the team; for example, assuming that Cap(e1,rj,t1) > Cap(e2,rj,t1), following the static priority we may yield access to task rj to the less capable agent. However, while in theory the difference among capabilities can be unbounded, in general, when tasks are discovered using perception, agents perceive tasks when they are close to the object location (e.g. if two fire fighters perceive the same fire, their distances from the fire are comparable), and therefore the loss of performance due to the use of a fixed priority is limited. Once a token has been created and added to the TkS, the token-based access to tasks requires that each agent decides whether to execute the tasks represented by the tokens it currently has or to pass the tokens on. The TokenManagement procedure of Algorithm 1 describes how tokens are processed: each agent removes the accomplished tasks in ATS from its TkS, then it chooses a set of tokens it can execute (ChooseTokenSet(TkS)). Each agent follows a greedy policy in this decision process, i.e. it tries to maximize its utility given the tokens it can currently access and its resource constraints. However, each agent also considers whether it is in the best interest of the team for it to execute the tasks represented by its tokens. The key question is whether passing a token on will lead to a more capable team member taking on the token. Using probabilistic models of the members of the team and of the tasks that need to be assigned, a team member can choose the minimum capability an agent should have in order to take on a token. Each agent sends the remaining tokens to its team mates, following a round robin policy, and copies the TmpTkS into the TkS. Finally, each agent chooses the best task (e.g. for fire fighters this could be the nearest fire) among the TokenSet it currently has (ChooseTask(TokenSet)) and starts the task execution.
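The keep-or-pass decision just described can be pictured with a very small sketch. The inverse-distance capability model and the way the minimum-capability threshold is derived (a median over estimated team capabilities) are purely illustrative assumptions, not the probabilistic model used by the authors.

```python
def capability(agent_pos, fire_pos):
    """Illustrative capability: higher when the fire fighter is closer to the fire."""
    dx, dy = agent_pos[0] - fire_pos[0], agent_pos[1] - fire_pos[1]
    return 1.0 / (1.0 + (dx * dx + dy * dy) ** 0.5)

def should_keep_token(my_capability, expected_team_capabilities, quantile=0.5):
    """Keep the token only if our capability reaches a threshold a more capable
    team mate would be expected to exceed (here simply a quantile of the team's
    estimated capabilities, standing in for a probabilistic model)."""
    ranked = sorted(expected_team_capabilities)
    threshold = ranked[int(len(ranked) * quantile)]
    return my_capability >= threshold

mine = capability((10, 0), (12, 2))
team = [capability(p, (12, 2)) for p in [(0, 0), (30, 5), (11, 1)]]
print(should_keep_token(mine, team))   # -> True: keep the token in this toy instance
```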
36.4 Experiments and Results

We tested our task assignment approach in the RoboCup Rescue environment [5]. RoboCup Rescue provides an ideal simulation environment to test allocation strategies for teams of rescue agents. We focus on a real city map of Foligno, in Italy [9], so as to test the performance of our approach in a realistic disaster rescue environment where agents must navigate narrow streets and passages. Here, a team of fire brigades must fight fires in real time, while facing the uncertainty of fire spreading and the dynamism that arises from several factors: (i) agents have a limited
view of the world, and do not know in advance the initial positions of the fires (ignition points); (ii) the way fires spread cannot be precisely predicted; (iii) agents can be blocked in narrow passages. To show that the algorithm presented in Section 36.3 does actually avoid conflicts of both types, we implemented three different allocation strategies. The first strategy, referred to as Token Passing (TP), is a plain implementation of the token based algorithm: no announce procedure is used, but agents record in a Known Fire List the fires they perceive, to prevent different agents from creating two tokens for the same fire. This strategy does not enforce any constraint on the maximum number of agents simultaneously fighting the same fire. The second strategy, referred to as TP with Announce (TPA-n), makes use of the announce procedure to enforce that no more than n agents simultaneously fight the same fire; however, this strategy does not address the second kind of conflict, so situations may arise in which agents try to fight already extinguished fires. The third strategy, referred to as TPA-n with AccomplishedTask (TPAA-n), makes use of both the announce and the AccomplishedTask messages, avoiding both types of conflicts. In all the strategies the token processing procedure is the same, and the capability to execute a task is computed considering the distance between the fire fighting agent and the fire, and whether the agent is blocked in a narrow passage. If an agent is blocked, it sends out the task it is currently executing and chooses a different task from its set. The set of tasks to be executed is computed by choosing the nearest fire f and keeping up to K fires whose distance from f is lower than a fixed threshold T (a sketch of this selection is given below). The threshold T and the number of tokens each agent can retain are statically defined and are computed considering global information, such as the number of agents involved in the simulation and their distribution on the map. For a detailed discussion on how these static values can be computed we refer to [11]. We tested each strategy under different operative conditions, changing the extinguishing power of the fire fighting agents. We start each simulation from the same initial configuration, comprising 10 fire fighting agents and 18 ignition points distributed as shown in Fig. 36.1. In these experiments we assume that messages cannot be lost and that their delay is not higher than one simulation step (i.e. agent execution is synchronized with message passing); moreover, we set the number of tokens to be created for each task to a fixed number (three in the performed experiments). While it is possible to dynamically change this number during mission execution depending on the environment situation, in these experiments we focus on studying how conflicts influence the performance of the fire fighting agents, leaving the problem of how many tokens are needed for each task, and how to deal with possible lost messages and unpredictable delays, to later investigation. From the performed experiments we extracted the extinguish time, as the time needed to put out all the fires; the number of point to point messages exchanged among agents per time step; the number of broadcast messages sent by agents per time step; the total traveled distance per agent; and, finally, the total number of conflicts, as the number of times during the entire simulation that more than three agents have the same fire as target.
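The following minimal Python sketch illustrates the task-set selection described above (nearest fire plus at most K fires within threshold T of it). The Euclidean distance function and the concrete values of K and T are illustrative assumptions only.

```python
import math

def choose_task_set(agent_pos, fires, K=3, T=150.0):
    """Pick the nearest fire f, then keep up to K fires lying within distance T of f."""
    if not fires:
        return []
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    nearest = min(fires, key=lambda f: dist(agent_pos, f))
    close = [f for f in fires if dist(nearest, f) <= T]
    close.sort(key=lambda f: dist(agent_pos, f))   # prefer fires closer to the agent
    return close[:K]

fires = [(120, 80), (130, 90), (400, 400), (125, 70)]
print(choose_task_set((100, 100), fires))   # the cluster near (120, 80); the far fire is ignored
```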
Fig. 36.1. Foligno map used in the experiments
Table 36.1. Results obtained averaging 10 simulations (standard deviations in brackets)

                           TP             TPA-3           TPAA-3
Ext. time                  67 [0.7]       59.63 [16.6]    50.62 [2.5]
Ptp msg per time step      1.4 [0.04]     1.8 [0.56]      1.7 [0.053]
Bcast msg per time step    0 [0]          0.68 [0.63]     1.63 [0.13]
Trav. dist. per agent      2495 [198]     3201 [527]      2221 [195]
Conflicts                  26.62 [1.92]   0 [0]           0 [0]
In Table 36.1 we report the results obtained from the simulations performed. Each reported value is the average over ten repetitions of the simulation under the same operative conditions, along with the computed standard deviation (reported in brackets). From the table it is possible to see that the TPAA-3 strategy consistently outperforms the TP strategy, with a higher but still acceptable number of messages. Moreover, the traveled distance for each agent is smaller on average, showing that better results are reached with a smaller waste of resources. The performance of the TPA-3 strategy is on average in the middle with respect to TP and TPAA-3; however, this strategy is characterized by a very high variance, especially regarding the extinguish time and traveled distance. The high variance is due to the fact that the
strategy does not avoid the second type of conflict, possibly generating considerable resource waste. In the performed experiments we used values for the extinguishing power ranging from 6000 (water units per minute) upward, to model situations where it is useful for the agents' allocation to be balanced among the different tasks. Indeed, we found that similar relationships among the strategies hold when increasing the extinguishing power from 6000 (results reported in Table 36.1) to 8000 and 10000.
36.5 Conclusions and Future Work

Task allocation is a very widely studied area, and several approaches have been presented in the literature addressing different issues and techniques, ranging from forward looking optimal models [8], to market or auction based techniques [16, 4], to symbolic matching [15] and Distributed Constraint Optimization Problem based algorithms [6]. However, the growing complexity of applications for MAS and MRS requires novel solutions for task assignment which are able to address specific features posed by the domain, such as dynamic task evolution, strict constraints on communication, and soft real time constraints to be met. Token based approaches have proved to be well suited for task allocation in such scenarios [13, 1]; however, the specific problems of dynamic token generation and conflict resolution had not been considered yet. In this paper we take a step in this direction, proposing an extension to the token approach able to address this issue while keeping a reasonably low communication overhead. Moreover, we present first experimental results obtained for our approach, showing that it is actually applicable in a rescue scenario and is able to resolve conflicts, improving the performance of the rescue teams. Several other issues need to be further addressed; in particular, we intend to test our algorithm with different types of rescue teams, such as ambulances or police forces. The ambulance case is particularly interesting because it is important to enforce the constraint that only one agent can take care of a civilian, since no further benefit is given to the team by having more than one ambulance trying to pick up the same civilian; therefore we plan to further test our approach with ambulances. When dealing with different force types, constrained tasks come into play; for example, an ambulance agent could need to have a blocked road freed by a police agent in order to pick up a civilian, and an evaluation of our approach in such situations is particularly interesting. Finally, in our working scenario we assumed that no messages can be lost; this is quite a strong assumption, which can easily be violated in real world applications, so an interesting extension of our method will be devoted to explicitly dealing with such situations.

Acknowledgment

This effort was partially funded by the U.S. Air Force European Office of Scientific Research under grant number 033065 and by the project "Simulation and Robotic
Systems for intervention in emergency scenarios" within the program COFIN03 of the Italian MIUR, grant number 2003097252.
References

1. A. Farinelli, P. Scerri, and M. Tambe. Allocating and reallocating roles in very large scale teams. In First Int. Workshop on Synthetic Simulation and Robotics to Mitigate Earthquake Disaster, Padua, Italy, July 2003.
2. A. Farinelli, P. Scerri, and M. Tambe. Building large-scale robot systems: Distributed role assignment in dynamic, uncertain domains. In Representation and Approaches for Time-Critical Decentralized Resources/Role/Task Allocation (AAMAS Workshop), 2003.
3. B. Gerkey and M. J. Mataric. Multi-robot task allocation: Analyzing the complexity and optimality of key architectures. In Proc. of the Int. Conf. on Robotics and Automation (ICRA'03), Taipei, Taiwan, Sep 14-19, 2003.
4. L. Hunsberger and B. Grosz. A combinatorial auction for collaborative planning. In Proceedings of the Fourth International Conference on Multi-Agent Systems (ICMAS-2000), pages 151-158, 2000.
5. H. Kitano, M. Asada, Y. Kuniyoshi, I. Noda, E. Osawa, and H. Matsubara. RoboCup: A challenge problem for AI. AI Magazine, 18(1):73-85, Spring 1997.
6. R. Mailler, V. Lesser, and B. Horling. Cooperative negotiation for soft real-time distributed resource allocation. In Proceedings of AAMAS'03, 2003.
7. P. J. Modi, H. Jung, W. Shen, M. Tambe, and S. Kulkarni. A dynamic distributed constraint satisfaction approach to resource allocation. In Proc. of Constraint Programming, 2001.
8. R. Nair, T. Ito, M. Tambe, and S. Marsella. Task allocation in the RoboCup Rescue simulation domain. In Proceedings of the International Symposium on RoboCup, 2002.
9. D. Nardi, A. Biagetti, G. Colombo, L. Iocchi, and R. Zaccaria. Real-time planning and monitoring for search and rescue operations in large-scale disasters. Technical report, University "La Sapienza", Rome, 2002. http://www.dis.uniroma1.it/~rescue/.
10. L. E. Parker. ALLIANCE: An architecture for fault tolerant multirobot cooperation. IEEE Transactions on Robotics and Automation, 14(2):220-240, 1998.
11. P. Scerri, A. Farinelli, S. Okamoto, and M. Tambe. Allocating roles in extreme teams. In AAMAS 2004 (Poster), New York, USA, 2004.
12. P. Scerri, A. Farinelli, S. Okamoto, and M. Tambe. Token approach for role allocation in extreme teams: analysis and experimental evaluation. In 13th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises (WETICE-2004), Modena, Italy, 2004.
13. P. Scerri, D. V. Pynadath, L. Johnson, P. Rosenbloom, N. Schurr, M. Si, and M. Tambe. A prototype infrastructure for distributed robot-agent-person teams. In Proceedings of AAMAS, 2003.
14. D. Shmoys and E. Tardos. An approximation algorithm for the generalized assignment problem. Mathematical Programming, 62:461-474, 1993.
15. G. Tidhar, A. S. Rao, and E. A. Sonenberg. Guided team selection. In Proceedings of the Second International Conference on Multi-Agent Systems, 1996.
16. R. Zlot, A. Stentz, M. B. Dias, and S. Thayer. Multi robot exploration controlled by a market economy. In Proc. of the Int. Conf. on Robotics and Automation (ICRA'02), pages 3016-3023, Washington DC, May 2002.
37
DyKnow: A Framework for Processing Dynamic Knowledge and Object Structures in Autonomous Systems

Fredrik Heintz and Patrick Doherty*
Department of Computer and Information Science, Linkoping University, 581 83 Linkoping, Sweden

* Both authors are supported by grants from the Wallenberg Foundation, Sweden and NFFP 539 COMPAS.

Summary. Any autonomous system embedded in a dynamic and changing environment must be able to create qualitative knowledge and object structures representing aspects of its environment on the fly from raw or preprocessed sensor data in order to reason qualitatively about the environment. These structures must be managed and made accessible to deliberative and reactive functionalities which are dependent on being situationally aware of the changes in both the robotic agent's embedding and internal environment. DyKnow is a software framework which provides a set of functionalities for contextually accessing, storing, creating and processing such structures. The system is implemented and has been deployed in a deliberative/reactive architecture for an autonomous unmanned aerial vehicle. The architecture itself is distributed and uses real-time CORBA as a communications infrastructure. We describe the system and show how it can be used in execution monitoring and chronicle recognition scenarios for UAV applications.
37.1 Introduction

Research in cognitive robotics is concerned with endowing robots and software agents with higher level cognitive functions that enable them to reason, act and perceive in a goal-directed manner in changing, incompletely known, and unpredictable environments. Research in robotics has traditionally emphasized low-level sensing, sensor processing, control and manipulative tasks. One of the open challenges in cognitive robotics is to integrate techniques from both disciplines and develop architectures which support the seamless integration of low-level sensing and sensor processing with the generation and maintenance of higher level knowledge structures grounded in the sensor data. Knowledge about the internal and external environments of a robotic agent is often both static and dynamic. A great amount of background or deep knowledge is required by the agent in understanding its world and in understanding the dynamics
in the embedding environment where objects of interest are cognized, hypothesized as being of a particular type or types and whose dynamics must be continuously reasoned about in a timely manner. This implies signal-to-symbol transformations at many levels of abstraction with different and varying constraints on real-time processing. Much of the reasoning involved with dynamic objects and the dynamic knowledge related to such objects involves issues of situation awareness. How can a robotics architecture support the task of getting the right information in the right form to the right functionalities in the architecture at the right time in order to support decision making and goal-directed behavior? Another important aspect of the problem is the fact that this is an on-going process. Data and knowledge about dynamic objects has to be provided continuously and on-the-fly at the rate and in the form most efficient for the receiving cognitive or reactive robotics functionality in a particular context. Context is important because the most optimal rates and forms in which a robotic functionality receives data are often task and environmentally dependent. Consequently, autonomous agents must be able to declaratively specify and re-configure the character of the data received. How to define a change, how to approximate values at time-points where no value is given and how to synchronize collections of values are examples of properties that can be set in the context. By robotic functionalities, we mean control, reactive and deliberative functionalities ranging from sensor manipulation and navigation to high-level functionalities such as chronicle recognition, trajectory planning, and execution monitoring. The paper is structured as follows. We start with section 37.2 where a larger scenario using the proposed framework is described. In section 37.3, the UAV platform used in the project is briefly described. In section 37.4, DARA, a Distributed Autonomous Robotics Architecture for UAVs is briefly described. DyKnow is an essential module in this architecture. In sections 37.5 and 37.6, the basic structure of the DyKnow framework and the dynamic knowledge and object structures is described. In sections 37.7.2 and 37.7.3, two deliberative functionalities which use the DyKnow framework are considered, chronicle recognition and execution monitoring, in addition to the dynamic object repository (DOR) described in section 37.7.1. We conclude in section 37.8 with a discussion of the role of the DyKnow framework and some related work.
37.2 An Identification and Track Scenario

In order to make these ideas more precise, we will begin with a scenario from an unmanned aerial vehicle project the authors are involved in which requires many of the capabilities discussed so far. Picture the following scenario. An autonomous unmanned aerial vehicle (UAV), in our case, a helicopter, is given a mission to identify and track a vehicle with a particular signature in a region of a small city. The signature is provided in terms of color and size (and possibly 3D shape). Assume that the UAV has a 3D model of
the region in addition to information about building structures and the road system. These models can be provided or may have been generated by the UAV itself. Additionally, assume the UAV is equipped with a GPS (global positioning system) and an INS (inertial navigation system) for navigation purposes and that its main sensor is a camera on a pan/tilt mount. Let's consider the processing from the bottom up, even though in reality there will be many feedback loops in the UAV architecture. One way for the UAV to achieve its task would be to initiate a reactive task procedure (parent procedure) which calls the system's image processing module with the vehicle signature as a parameter. The image processing module might then try to identify colored blobs in the region of the right size, shape and color as a first step. These object descriptions would have to be sent to a module in the architecture called the dynamic object repository (DOR) which is responsible for the dynamic management of such objects. Each of these vision objects would contain features related to the image processing task such as RGB values with uncertainty bounds, length and width in pixels, position in the image, a sub-image of the object which can be used as a template for tracking, an estimate of velocity, etc. From the perspective of the UAV, these objects are only cognized to the extent that they are moving colored blobs of interest, and the feature data being collected should continue to be collected while tracking those objects perceived to be of interest. What objects are of interest? The parent procedure might identify that or those objects which are of interest based on a similarity measure according to size, color and movement. In order to do this, the DOR would be instructed to create one or more world objects and link them to their respective vision objects. At this point the object is cognized at a more qualitative level of abstraction, yet its description in terms of its linkage structure contains both cognitive and pre-cognitive information which must be continuously managed and processed due to the interdependencies of the features at various levels. A world object could contain additional features such as position in a geographic coordinate system rather than the low-level image coordinate. Generating a geographic coordinate from an image coordinate continuously, called co-location, is a complex process that involves combining dynamic data about features from several different objects such as the camera object, helicopter object and world objects, together with data from an onboard geographical information system (GIS) module which is also part of the architecture. One would require a computational unit of sorts that takes streamed data as input and outputs a new stream at a higher level of abstraction representing the current geographical coordinate of the object. This co-location process must occur in real-time and continually occur as the world object is tracked. This implies that all features for all dynamic objects linked to the world object in focus have to be continually updated and managed. At this point, the parent task may want to make a comparison between the geographical coordinate and the position of that coordinate in terms of the road system for the region, information of which is stored in the onboard GIS. This indexing
mechanism is important since it allows the UAV to reason qualitatively about its spatial surroundings. Let's assume this is done and after some period of tracking and monitoring the stream of coordinates, the parent procedure decides that this looks like a vehicle that is following the road. On-road objects might then be created for each of the world objects that pass the test and linked to their respective world objects. An on-road object could contain more abstract and qualitative features such as position in a road segment which would allow the parent procedure to reason qualitatively about its position in the world relative to the road, other vehicles on the road, and other building structures in the vicinity of the road. At this point, streams of data are being generated and computed for many of the features in the linked object structures at many levels of abstraction as the helicopter tracks the on-road objects. The parent procedure could now use static knowledge stored in onboard knowledge bases and the GIS together with this dynamic knowledge to hypothesize as to the type of vehicle. The hypothesis would of course be based on the linkage structure for an on-road object and various features at different levels of abstraction. Assume the parent procedure hypothesizes that the on-road object is a car. A car object could then be created and linked to the existing linkage structure with additional high-level feature information about the car. Whether or not the sum of streamed data which makes up the linkage structure represents a particular type of conceptual entity will only ever remain a hypothesis which could very well change, based on changes in the character of the streams of data. Monitors, users of these structures, would have to be set up to observe such changes and alert the parent procedure if the changes become too abnormal relative to some criteria determined by the parent procedure. Abnormality is a concept that is well-suited for being reasoned about at a logical level and the streamed data would have to be put into a form amenable to this type of processing. How then can an architecture be set up to support the processes described in the UAV scenario above? This is the main topic of this paper and in it we propose a software system called the DyKnow Framework.^
("DyKnow" is pronounced like "Dino" in "Dinosaur" and stands for Dynamic Knowledge and Object Structure Processing.)

37.3 The WITAS UAV Platform

The WITAS Unmanned Aerial Vehicle Project [1, 2] is a long-term basic research project whose main objectives are the development of an integrated hardware/software VTOL (Vertical Take-Off and Landing) platform for fully-autonomous missions and its future deployment in applications such as traffic monitoring and surveillance, emergency services assistance, photogrammetry and surveying. (WITAS, pronounced "vee-tas", is an acronym for the Wallenberg Information Technology and Autonomous Systems Laboratory at Linkoping University, Sweden.) The WITAS Project UAV platform we use is a slightly modified Yamaha RMAX (Fig. 37.1). It has a total length of 3.6 m (including main rotor), a maximum take-off
weight of 95 kg, and is powered by a 21 hp two-stroke engine. Yamaha equipped the radio controlled RMAX with an attitude sensor (YAS) and an attitude control system (YACS).
Fig. 37.1. The WITAS RMAX Helicopter

The hardware platform consists of three PC104 embedded computers (Fig. 37.2). The primary flight control (PFC) system consists of a PIII (700 MHz) processor, a wireless Ethernet bridge and the following sensors: an RTK GPS (serial) and a barometric altitude sensor (analog). It is connected to the YAS and YACS (serial), the image processing computer (serial) and the deliberative computer (Ethernet). The image processing (IP) system consists of a second PC104 embedded computer (PIII 700 MHz), a color CCD camera (S-VIDEO, serial interface for control) mounted on a pan/tilt unit (serial), a video transmitter (composite video) and a recorder (miniDV). The deliberative/reactive (D/R) system runs on a third PC104 embedded computer (PIII 700 MHz) which is connected to the PFC system with Ethernet using CORBA event channels. The D/R system is described in more detail in the next section. For further discussion, it is important to note that computational processes are executed concurrently on distributed hardware. Data flow is both synchronous and asynchronous, and the concurrent distributed nature of the hardware platform contributes to diverse latencies in data flow throughout the system.
37.4 DARA: A Distributed Autonomous Robotics Architecture

The DARA system [3] consists of both deliberative and reactive components which interface to the control architecture of the primary flight controller (PFC). Current flight modes include autonomous take-off and landing, pre-defined and dynamic trajectory following, vehicle tracking and hovering. We have chosen real-time
CORBA [4] as a basis for the design and implementation of a loosely coupled distributed software architecture for our aerial robotic system.

Fig. 37.2. DARA Hardware Schematic

The communication infrastructure for the architecture is provided by CORBA facilities and services. Fig. 37.3 depicts an (incomplete) high-level schematic of some of the software components used in the architecture. Each of these may be viewed as a CORBA server/client providing or requesting services from the others and receiving data and events through both real-time and standard event channels. The modular task architecture (MTA), which is part of DARA, is a reactive system design in the procedure-based paradigm developed for loosely coupled heterogeneous systems such as the WITAS aerial robotic system. Reactive behaviors are implemented as task procedures (TPs) which are executed concurrently and are essentially event-driven. A TP may open its own (CORBA) event channels and call its own services (both CORBA and application-oriented services such as path planners), including functionalities in DyKnow.
(We are currently using TAO/ACE. The ACE ORB is an open-source implementation of CORBA 2.6.)
Fig. 37.3. DARA Software Schematic (components include the Task Planner Service, Path Planner Service, Chronicle Recognition Service, Prediction Service, Task Procedure Execution Module (TPEM), Helicopter Controller, Physical Camera Controller, Image Controller, Qualitative Signal Processing Controller, Image Processing Module (IPM) with the IPAPI runtime, Geographical Data Repository, Knowledge Repository and Dynamic Object Repository)
37.5 DyKnow

Given the distributed nature of both the hardware and software architectures in addition to their complexity, one of the main issues is getting data to the right place at the right time in the right form and being able to transform the data to the proper levels of abstraction for use by high-level deliberative functionalities and middle-level reactive functionalities. DyKnow is designed to contribute to achieving this. Ontologically, we view the external and internal environment of the agent as consisting of entities representing physical and non-physical objects, properties associated with these entities, and relations between entities. We will call such entities objects, and those properties or relations associated with objects will be called features. Features may be static or dynamic and parameterized with objects. Due to the potentially dynamic nature of a feature, that is, its ability to change value through time, a fluent is associated with each feature. A fluent is a function of time whose range is the feature's type. For a dynamic feature, the fluent values will vary through time, whereas for a static feature the fluent will remain constant through time. Some examples of features would be the estimated velocity of a world object, the current road segment of an on-road object, and the distance between two car objects. Each fluent associated with these examples implicitly generates a continuous stream of time-tagged values of the appropriate type. Additionally, we introduce locations, policies, computational units and fluent streams which refer to aspects of fluent representations in the actual software architecture. A location is intended to denote any pre-defined physical or software location that generates feature data in the DARA architecture. Some examples would be onboard or offboard databases, CORBA event channels, physical sensors or their device interfaces, etc. In fact, a location will be used as an index to reference a representational structure associated with a feature. This structure denotes the process which implements the fluent associated with the feature. A fluent implicitly represents a stream of data, a fluent stream. The stream is continuous, but can only ever be approximated in an architecture. A policy is intended to represent a particular contextual window or filter used to access a fluent. Particular functionalities in the architecture may need to sample the stream at a particular rate or interpolate values
in the stream in a certain manner. Policies will denote such collections of constraints. Computational units are intended to denote processes which take fluent streams as input, perform operations on these streams and generate new fluent streams as output. Each of these entities is represented either syntactically or in the form of a data structure within the architecture, and many of these data structures are grounded through sensor data perceived through the robotic agent's sensors. In addition, since declarative specifications of both features and policies that determine views of fluent streams are first-class citizens in DyKnow, a language for referring to features, locations, computational units and policies is provided; see [5] for details. One can view DyKnow as implementing a distributed qualitative signal processing tool where the system is given the functionality to generate dynamic representations of parts of its internal and external environment in a contextual manner through the use of policy descriptors and feature representation structures. The dynamic representations can be viewed as collections of time series data at various levels of abstraction, each time series representing a particular feature and each bundle representing a particular history or progression. Another view of such dynamic representations, and one which is actually put to good use, is to interpret the fluent stream bundles as partial temporal models in the logical sense. These partial temporal models can then be used on the fly to interpret temporal logical formulas in TAL (temporal action logic) or other temporal formalisms. Such a functionality can be put to good use in constructing execution monitors, predictive modules, diagnostic modules, etc. The net result is a very powerful mechanism for dealing with a plethora of issues associated with focus of attention and situational awareness.
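To make these notions more concrete, the following minimal sketch (our own illustration in Python, not part of the DyKnow implementation; all names such as sample_policy, computational_unit and co_locate are hypothetical) shows how a fluent stream, a sampling policy and a computational unit could be composed:

from typing import Callable, Iterator, Tuple

Sample = Tuple[float, object]  # (time stamp, value)

def sample_policy(stream: Iterator[Sample], period: float) -> Iterator[Sample]:
    """Policy: down-sample a fluent stream to (at most) one value per 'period' seconds."""
    last_emitted = None
    for t, value in stream:
        if last_emitted is None or t - last_emitted >= period:
            last_emitted = t
            yield (t, value)

def computational_unit(fn: Callable[[object], object],
                       stream: Iterator[Sample]) -> Iterator[Sample]:
    """Computational unit: apply 'fn' to each sample, producing a new fluent stream."""
    for t, value in stream:
        yield (t, fn(value))

# Example: a raw image-coordinate stream transformed into a (hypothetical)
# geographic-coordinate stream and sampled at roughly 1 Hz.
def raw_image_positions() -> Iterator[Sample]:
    for i in range(100):
        yield (i * 0.1, (320 + i, 240))      # pretend pixel coordinates at 10 Hz

def co_locate(pixel_xy):                      # placeholder for the real co-location process
    x, y = pixel_xy
    return (58.4 + x * 1e-6, 15.6 + y * 1e-6)

geo_stream = computational_unit(co_locate, raw_image_positions())
for t, coord in sample_policy(geo_stream, period=1.0):
    print(t, coord)

Here the policy is reduced to a single sampling-rate constraint; a real DyKnow policy would bundle several such constraints (sampling, interpolation, ordering) declaratively.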
37.6 Dynamic Object Structure in DyKnow

An ontologically difficult issue involves the meaning of an object. In a distributed architecture such as DARA, information about a specific object is often distributed throughout the system; some of this information may be redundant and it may often even be inconsistent due to issues of precision and approximation. For example, given a car object, it can be part of a linkage structure which may contain other objects such as on-road, world and vision objects. For an example of a linkage structure see Fig. 37.4. In addition, many of the features associated with these objects are computed in different manners in different parts of the architecture with different latencies. One candidate definition for an object could be the aggregate of all features which take the object as a parameter. But an object only represents some aspects of an entity in the world. To represent that several different objects actually represent the same entity in the world, links are created between those objects. It is these linkage structures that represent all the aspects of an entity which are known to the UAV agent. It can be the case that two linkage structures in fact represent the same entity in the world but the UAV agent is unable to determine this. Two objects may even be of the same type but have different linkage structures associated with them. For example, given two car objects, one may not have an on-road
object, but an off-road object, as part of its linkage structure. It is important to point out that objects as intended here have some similarities with OOP objects, but many differences.
Fig. 37.4. An example object linkage structure: VisionObject #2 -> WorldObject #3 -> OnRoadObject #5 -> CarObject #7
To create and maintain these object linkage structures we use hypothesis generation and validation. Each object is associated with a set of possible hypotheses. Each possible hypothesis is a relation between two objects associated with constraints between the objects. To generate a hypothesis, the constraints of a possible hypothesis must be satisfied. Two different types of hypotheses can be made depending on the types of the objects. If the objects have different types then a hypothesis between them is represented by a link. If they have the same type then a hypothesis is represented by a codesignation between the objects. Codesignations hypothesize that two objects representing the same aspect of the world are actually identical, while a link hypothesizes that two objects represent different aspects of the same entity. A link can be hypothesized when a reestablish constraint between two existing objects is satisfied or an establish constraint between an object and a newly created object is satisfied. In the anchoring literature these two processes are called reacquire and find [6]. Since the UAV agent can never be sure its hypotheses are true, it has to continually verify and validate them against its current knowledge of the world. To do this, each hypothesis is associated with maintenance constraints which should be satisfied as long as the hypothesis holds. If the constraints are violated then the hypothesis is removed. The maintenance and hypothesis generation constraints are represented using the linear temporal logic (LTL) with intervals [7] and are checked using the execution monitoring module which is part of the DyKnow framework. For a more detailed description see [8].
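As a rough illustration of how hypothesis generation and validation could be organized (the class names below are ours and purely hypothetical; in DyKnow the establish and maintenance constraints are temporal formulas checked by the execution monitoring module):

from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Hypothesis:
    """A link (different types) or codesignation (same type) between two objects."""
    obj_a: object
    obj_b: object
    establish: Callable[[object, object], bool]   # must hold to create the hypothesis
    maintain: Callable[[object, object], bool]    # must keep holding while it is active
    active: bool = False

@dataclass
class LinkageManager:
    hypotheses: List[Hypothesis] = field(default_factory=list)

    def generate(self, hyp: Hypothesis) -> bool:
        # Create the link only if the establish (or reestablish) constraint is satisfied.
        if hyp.establish(hyp.obj_a, hyp.obj_b):
            hyp.active = True
            self.hypotheses.append(hyp)
            return True
        return False

    def validate(self) -> None:
        # Periodically re-check maintenance constraints; violated hypotheses are removed.
        for hyp in list(self.hypotheses):
            if not hyp.maintain(hyp.obj_a, hyp.obj_b):
                hyp.active = False
                self.hypotheses.remove(hyp)

A vision object and a world object would, for instance, stay linked only while a maintenance constraint such as "predicted and observed positions remain within a distance bound" keeps holding; as soon as it is violated, the hypothesis is removed.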
37.7 Applications using DyKnow

In the following subsections, we will show how the DyKnow framework can be used to generate fluent streams for further processing by two important deliberative functionalities in the DARA system, chronicle recognition and execution monitoring. Both are implemented in the UAV system. Before doing this, we provide a short description of the Dynamic Object Repository (DOR), an essential part of the DARA which uses the DyKnow framework to provide other functionalities in the system with information about the properties of dynamic objects, most often constructed from sensor data streams.
37.7.1 The Dynamic Object Repository

The Dynamic Object Repository (DOR) is essentially a soft real-time database used to construct and manage the object linkage structures described in Section 37.6. The DOR is implemented as a CORBA server, and the image processing module interfaces to the DOR and supplies vision objects. Task procedures in the MTA access feature information about these objects via the DyKnow framework, creating descriptors on-the-fly and constructing linkages. Computational units are used to provide values for more abstract feature properties associated with these objects. For example, the co-location process involving features from the vision, helicopter and camera objects, in addition to information from the GIS, uses computational units to output geographical coordinates. These are then used to update the positional features in world objects linked to the specific vision objects in question. Objects are referenced via unique symbols which are created by the symbol generation module which is part of the DOR. Each symbol is typed using pre-defined domains such as car, world-object, vision-object, vehicle, etc. Symbols can be members of more than one domain and are used to instantiate feature representations and as indexes for collecting information about features which take these symbols as arguments. Since domains collect symbols which reference a certain type of object, one can also conveniently ask for information about collections or aggregates of objects. For example, "take all vision objects and process a particular feature for each in a certain manner".

37.7.2 An Application to Chronicle Recognition

Chronicles are used to represent complex occurrences of activity described in terms of temporally constrained event structures. In this context, an event is defined as a change in the value of a feature. For example, in a traffic monitoring application, a UAV might fly to an intersection and try to identify how many vehicles turn left, turn right or drive straight through a specific intersection. In another scenario, the UAV may be interested in identifying vehicle overtaking. Each of these complex activities can be defined in terms of one or more chronicles. In the WITAS UAV, we use the CRS chronicle recognition system developed by France Telecom. CRS is an extension of IxTeT [9]. Our chronicle recognition module is wrapped as a CORBA server. As an example, suppose we would like to recognize vehicles passing through an intersection. Assume cars are being identified and tracked through the UAV's camera as it hovers over a particular intersection. Recall that the DOR generates and maintains linkage structures for vehicles as they are identified and tracked. It can be assumed that the following structured features exist:

pos = position(DOR, policy1, car1)
roadseg = road_segment(DOR, roadSegment(pos), policy2, car1)
incross = in_crossing(DOR, inCrossing(roadseg), policy3, car1)
pos is a feature of a car object and its fluent stream can be accessed via the DOR as part of its linkage structure. roadseg is a complex feature whose value
is calculated via a computational unit roadSegment which takes the geographical position of a world object associated with the car object as argument and uses this as an index into the GIS to return the road segment that the vehicle is in. Similarly, incross is a complex feature whose value is produced using a computational unit that takes the roadseg fluent stream as input and returns a boolean output stream, representing whether the car is in a crossing or not, calculated via a lookup in the GIS. For the sake of brevity, a car is defined to pass through an intersection if its road segment type is initially not a crossing, then it is eventually in a road segment that is a crossing, and then it is again in a road segment that is not a crossing. In this case, if the fluent stream generated by incross produces samples going from false to true and then eventually from true to false within a certain time frame, then the car is recognized as passing through a crossing. The chronicle recognition system would receive such streams and recognize two change events which match its chronicle definition, and thereby recognize that the car has passed through the crossing. The stream itself requires some modification, and policy3 specifies this via a monotonic time constraint and a change constraint. The monotonic time constraint would make sure the stream is ordered, i.e. the time stamps of events increase monotonically. The change constraint specifies how change is defined for this stream. There are several alternatives which can be used (a small sketch follows the list):

- any change policy: any difference between the previous and current value is a change;
- absolute change policy: an absolute difference between the previous and current value larger than a parameter delta is a change;
- relative change policy: a normalized difference between the previous and current value larger than a parameter delta is a change.
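As a small sketch (the function names are ours, not part of CRS or DyKnow), the three change policies and their use for detecting the two change events of the crossing chronicle could look as follows:

def any_change(prev, curr, delta=None):
    return prev != curr

def absolute_change(prev, curr, delta):
    return abs(curr - prev) > delta

def relative_change(prev, curr, delta):
    # Normalized difference; prev is assumed to be non-zero here.
    return abs(curr - prev) / abs(prev) > delta

def change_events(stream, changed, delta=None):
    """Turn a time-ordered fluent stream [(t, value), ...] into change events."""
    events, prev = [], None
    for t, value in stream:
        if prev is not None and changed(prev, value, delta):
            events.append((t, prev, value))
        prev = value
    return events

# Detecting a car passing through a crossing from the boolean incross stream:
incross_stream = [(0, False), (1, False), (2, True), (3, True), (4, False)]
events = change_events(incross_stream, any_change)
passed = [e[2] for e in events] == [True, False]   # false->true, then true->false
print(events, passed)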
There are obvious variations on these policies for different types of signal behavior. For example, one might want to deal with oscillatory values due to uncertainty of data, etc. The example used above is only intended to provide an overview as to how DyKnow is used by other modules and is therefore simplified.

37.7.3 An Application to Execution Monitoring

The WITAS UAV architecture has an execution monitoring module which is based on the use of a temporal logic, LTL (linear temporal logic with intervals [7]), which provides a succinct syntax for expressing highly complex temporal constraints on activity in the UAV's internal environment and even aspects of its embedding environment. For example, safety and liveness conditions can easily be expressed. Due to page limitations we can only briefly describe this functionality. Essentially, we appeal to the intuitions about viewing bundles of fluent streams as partial models for a temporal logic and evaluating formulas relative to this model. In this case though, the model is fed piecewise (state-wise) to the execution monitor via a state extraction mechanism associated with the execution monitor. A special progression algorithm [7] is used which evaluates formulas in a current state and returns a new formula which, if true on the future states, would imply that the formula is true for the complete time-line being generated.
The DyKnow system is ideal for generating such streams and feeding these to the execution monitor. Suppose we would like to make sure that two task procedures (all invocations) in the reactive layer of the DARA, called A and B, can never execute in parallel. For example, A and B may both want to use the camera resource. This safety condition can be expressed in LTL as the temporal formula always ¬(∃x ∃y tp_name[x]="A" ∧ tp_running[x]=true ∧ tp_name[y]="B" ∧ tp_running[y]=true), where "always" in the formula is the modal operator for "at all times". To monitor this condition the execution monitor requires fluent streams for each of the possible instantiations of the parameterized features tp_name and tp_running, which can be generated by the reactive layer of the DARA. These are fed to the instantiated execution monitor which applies the progression algorithm to the temporal formula above relative to the fluent streams generated via the DyKnow framework. This algorithm is run continuously. If the formula evaluates to false at some point, an alert message is sent to a monitor set up by the functionality interested in this information, and modifications in the system configuration can be made.
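A minimal sketch of how such a safety formula could be checked state by state is given below (this is our own simplification, not the progression algorithm of [7], which handles full LTL with intervals; a state here is a hypothetical snapshot of the tp_name and tp_running features):

def phi(state):
    """phi: some invocation of A and some invocation of B are running in this state.
    A state is a dict mapping invocation ids to (tp_name, tp_running)."""
    running = {name for name, on in state.values() if on}
    return "A" in running and "B" in running

def progress_always_not(states):
    """Check the formula 'always not phi' over a stream of states.
    Returns a violation as soon as the formula is falsified, otherwise keeps monitoring."""
    for i, state in enumerate(states):
        if phi(state):
            return ("violated", i)        # alert the interested monitor
    return ("still holds", None)           # no violation observed so far

# Example state stream from the reactive layer (hypothetical invocation ids):
stream = [
    {1: ("A", True),  2: ("B", False)},
    {1: ("A", True),  2: ("B", True)},    # both running -> safety violated here
]
print(progress_always_not(stream))         # ('violated', 1)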
37.8 Related Work

The DyKnow framework is designed for a distributed, real-time and embedded environment [10, 11] and is developed on top of an existing middleware platform, real-time CORBA [12], using the real-time event channel [13], the notification [14] and the forthcoming real-time notification [15] services. One of the purposes of this work is the creation of a knowledge processing middleware capability, i.e. a framework for interconnecting different knowledge representation and reasoning services, grounding knowledge in sensor data and providing uniform interfaces for processing and management of generated knowledge and object structures. The framework is quite general and is intended to serve as a platform for investigating a number of pressing issues associated with the processing and use of knowledge on robotic platforms with soft and hard real-time constraints. These issues include anchoring, or more generally symbol grounding, signal-to-symbol transformations, information fusion, contextual reasoning, and focus of attention. Examples of application services which use the middleware capabilities are execution monitoring services, anchoring services and chronicle recognition services. We are not aware of any similar frameworks, but the framework itself uses ideas from many diverse research areas, mainly related to real-time, active, temporal and time-series databases [16, 17, 18], data stream management [19, 20, 21], and work in the area of knowledge representation and reasoning. The main differences between DyKnow and the database and data stream approaches are that we have a different data model based on the concepts of features and fluents, and that we have many views or representations of the same feature data in the system, each with different properties depending on the context where the feature is used, as described by a policy.
References

1. Doherty, P.: Advanced research with autonomous unmanned aerial vehicles. In: Proceedings of the 9th International Conference on Principles of Knowledge Representation and Reasoning. (2004)
2. Doherty, P., Granlund, G., Kuchcinski, K., Sandewall, E., Nordberg, K., Skarman, E., Wiklund, J.: The WITAS unmanned aerial vehicle project. In: Proceedings of the 14th European Conference on Artificial Intelligence. (2000) 747-755
3. Doherty, P., Haslum, P., Heintz, F., Merz, T., Nyblom, P., Persson, T., Wingman, B.: A distributed architecture for autonomous unmanned aerial vehicle experimentation. In: Proceedings of the 7th International Symposium on Distributed Autonomous Robotic Systems. (2004)
4. Object Computing, Inc.: TAO Developer's Guide, Version 1.3a (2003) See also http://www.cs.wustl.edu/~schmidt/TAO.html
5. Heintz, F., Doherty, P.: DyKnow: An approach to middleware for knowledge processing. Journal of Intelligent and Fuzzy Systems (2004)
6. Coradeschi, S., Saffiotti, A.: An introduction to the anchoring problem. Robotics and Autonomous Systems 43 (2003) 85-96
7. Lamine, K.B., Kabanza, F.: Reasoning about robot actions: A model checking approach. In: Advances in Plan-Based Control of Robotic Agents. LNAI (2002) 123-139
8. Heintz, F., Doherty, P.: Managing dynamic object structures using hypothesis generation and validation. In: Proceedings of the AAAI Workshop on Anchoring Symbols to Sensor Data. (2004)
9. Ghallab, M.: On chronicles: Representation, on-line recognition and learning. In: Proceedings of the International Conference on Knowledge Representation and Reasoning (KR-96). (1996)
10. Schmidt, D.C.: Adaptive and reflective middleware for distributed real-time and embedded systems. Lecture Notes in Computer Science 2491 (2002)
11. Schmidt, D.C.: Middleware for real-time and embedded systems. Communications of the ACM 45 (2002) 43-48
12. Schmidt, D.C., Kuhns, F.: An overview of the real-time CORBA specification. IEEE Computer 33 (2000) 56-63
13. Harrison, T., Levine, D., Schmidt, D.C.: The design and performance of a real-time CORBA event service. In: Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA-97). Volume 32, 10 of ACM SIGPLAN Notices, New York, ACM Press (1997) 184-200
14. Gruber, R., Krishnamurthy, B., Panagos, E.: CORBA notification service: Design challenges and scalable solutions. In: 17th International Conference on Data Engineering. (2001) 13-20
15. Gore, P., Schmidt, D.C., Gill, C., Pyarali, I.: The design and performance of a real-time notification service. In: Proc. of the 10th IEEE Real-time Technology and Application Symposium. (2004)
16. Eriksson, J.: Real-time and active databases: A survey. In: Proc. of 2nd International Workshop on Active, Real-Time, and Temporal Database Systems. (1997)
17. Ozsoyoglu, G., Snodgrass, R.T.: Temporal and real-time databases: A survey. IEEE Trans. Knowl. Data Eng. 7 (1995) 513-532
18. Schmidt, D., Dittrich, A.K., Dreyer, W., Marti, R.W.: Time series, a neglected issue in temporal database research? In: Proceedings of the International Workshop on Temporal Databases, Springer-Verlag (1995) 214-232
19. Abadi, D., Carney, D., Cetintemel, U., Cherniack, M., Convey, C., Lee, S., Stonebraker, M., Tatbul, N., Zdonik, S.: Aurora: A new model and architecture for data stream management. VLDB Journal (2003)
20. Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: Proceedings of 21st ACM Symposium on Principles of Database Systems (PODS 2002). (2002)
21. The STREAM Group: STREAM: The Stanford stream data manager. IEEE Data Engineering Bulletin, 26(1) (2003)
38 Classifier Monitoring using Statistical Tests

Rafal Latkowski and Cezary Glowinski

SAS Institute, ul. Gdanska 27/31, 01-633 Warszawa, Poland ([email protected])
Warsaw University, Institute of Computer Science, ul. Banacha 2, 02-097 Warszawa, Poland ([email protected])
Summary. This paper addresses methods for early detection of the classifier fall-down phenomenon, which makes it possible to react in advance and avoid making incorrect decisions. For many applications it is essential that decisions made by machine learning algorithms be as accurate as possible. The proposed approach consists in applying a monitoring mechanism only to the results of classification, which does not cause additional computational overhead. An empirical evaluation of the monitoring method is presented, based on data extracted from simulated robotic soccer as an example of an autonomous agent domain and on synthetic data that stands for a standard industrial application.
38.1 Introduction

The achievements of machine learning make it possible to apply it to many areas. Predictive models and the classifiers built with their help not only enable us to create autonomous agents, but are also commonly used in business and industry. It is essential that decisions made by machine learning algorithms be as accurate as possible. Otherwise they cannot achieve the expected targets, wherever they are applied: in marketing, in industry or in autonomous systems. Generally speaking, the correctness of decision making strictly depends on the accuracy of the applied classifier. Obviously, the accuracy of the classifier is measured during the training phase. While creating the predictive model we select for deployment the model that achieves the highest accuracy and stability measured over prepared test data sets. Such verification is not possible during the productive life cycle of the classifier, when it is applied to real data gathered in a dynamic and nondeterministic environment. The question that arises from such a situation is: how can we trust the results of the classifier? The first phenomenon that makes it doubtful to trust the classifier is that every natural process evolves in time, e.g., customers learn about other offers and products, machines change their physical parameters and autonomous agents learn new strategies, which is frequently described as "concept drift" (see, e.g., [7]). It is a known fact that the classification results continuously get weaker, and such a process is called ageing of the model. Usually the process of model ageing is slow and reporting
is employed to identify it in an a posteriori process, when the actual decision is known. The actual value of the decision is not known at the same point in time as when the classification is made, but, depending on the application, from a fraction of a second up to several months after the classification. The second phenomenon is a sudden change of the process, of a revolutionary character, e.g., the introduction of a completely new product on the market, a machine failure, or reprogramming an autonomous agent with a new meta-strategy of learning. The sudden classifier ageing, or classifier fall-down, phenomenon can be a consequence of many circumstances, even errors or changes in data preprocessing. It is a very dangerous phenomenon because it results in making wrong decisions for a period of time (a couple of months in the worst case), which can result in severe losses. To better express the necessity of classifier monitoring let us take some examples. The first example is related to autonomous agents. The open research community concentrated on robotic soccer and the RoboCup world championships has the aim of having, by 2050, a team of autonomous humanoid robots compete against a human team of soccer players (see, e.g., [4]). Many research groups build software simulators or hardware robots to achieve this goal. Such an artificial soccer player should have a special classifier that recognizes the strategy of the opponent. This classifier can be misled by an opponent that is completely reprogrammed or comes from a newly created team. In such a situation the classifier fall-down phenomenon can result in losing the game. The second example comes from a business application. Telecommunication operators collect a lot of data on their customers. This data is used, e.g., to avoid customer resignations by predicting them in advance. Such systems for customer retention suffer from the classifier fall-down phenomenon, e.g., when completely new categories of products are introduced. With false predictions the marketing campaigns are directed not to the desired target group. In this case reduced accuracy results in measurable losses even compared to the case without a classifier at all. This paper addresses methods for early detection of the classifier fall-down phenomenon, which makes it possible to react in advance and avoid making incorrect decisions. The proposed method consists in applying a monitoring mechanism only to the results of classification, which does not cause additional computational overhead. The paper is organized as follows. In the next section the classifier monitoring method is described. Section 38.3 provides an empirical evaluation with a detailed description of the data sets and experiments. Section 38.4 contains final conclusions and remarks.
38.2 Method description

38.2.1 Motivation

The initial idea for how to monitor a classifier could be to check the distributions of the variables that are used to make the decision (predictors). In such an approach all variables are independently tested before classification is performed. This approach can be applied only to cases where the distribution of one variable is significantly different in the training and test sets. If the distribution is changing on more than one variable, then even changes that are insignificant on any single variable can result in classifier fall-down. The
proposed approach is free from such deficiencies because it consists in testing the classifier's answers. There is also another common situation that results in classifier fall-down. The training data used to build the model does not cover the full scope of the universe, because, even when the universe is finite, it is enormously huge. We believe that inductive learning finds the proper generalization of the presented facts. However, in real applications the classification of objects very far from those presented in the training phase results in poor accuracy. The one-variable test can easily fail to capture such a situation. Some solutions to this problem have been proposed (see, e.g., [6]), but they assume monitoring the object space by nearest neighbor methods or neural networks. These algorithms require additional computational effort comparable to the cost of creating the classifier itself. Our approach requires only time linearly proportional to the number of objects in the test and training sets.

38.2.2 Classifier Monitoring

The proposed approach consists in applying a monitoring mechanism only to the results of classification. The classifier monitoring compares the distribution of answers on the data set used for training with the distribution of answers on the data set currently being classified. If the applied test shows a significant difference, then it is a signal to perform a detailed check of the classifier and, e.g., build a new model. There are a number of statistical tests for comparing different properties of one, two or a number of distributions. In this research we utilize nonparametric statistical tests and we do not assume any particular distribution. Only several statistical tests satisfy such conditions, in particular the Wilcoxon rank sum test (equivalent to the Mann-Whitney test) and the Kolmogorov-Smirnov test (see, e.g., [2, 3, 5]). These tests detect differences in location and shape of two distributions. The Wilcoxon and Kolmogorov-Smirnov tests have the advantage of making no assumption about the distribution of data, i.e., they are non-parametric and distribution free. The result of the classification process can usually be of two types. The simpler type is a one-valued decision that assigns the classified object to a particular decision class. The more expressive result of classification is the probability vector that assigns to each possible decision a predicted probability that the classified object belongs to the considered decision class. For our research we use the second type of answer, which gives more detailed information on how the model works on the provided data. The classification or prediction process frequently proceeds in bunches or in data streams, where not one object is classified, but a whole set of objects. Such a situation occurs when we are performing stand-alone tests on previously prepared data or when classification (prediction) is performed for, e.g., the total base of customers. The result of classification is then a set of answers, i.e., probability assignments. In this paper we are limited to a binary decision (yes or no), which corresponds to the classification that an object belongs to a concept versus the classification that the object does not belong to the concept. The procedure of classifier monitoring is the following (cf. Fig. 38.1):

1. Let C be a classifier, T = {t_1, ..., t_n} the data set used for training and P = {p_1, ..., p_m} the new data set currently being classified.
2. Select one decision class d for which the probability assignments will be considered. From now on we will assume that C|d : U → [0,1] gives the probability assignment that an object x belongs to decision class d with probability C|d(x) = s.
3. Prepare the set of probability assignments S_T, called the scoring, for the data set T used for training. The set S_T = {s_1^T, ..., s_n^T} consists of all answers of the classifier C such that C|d(t_i) = s_i^T.
4. Prepare the set of probability assignments (scoring) S_P for the data set P currently being classified. The set S_P = {s_1^P, ..., s_m^P} consists of all answers of the classifier C such that C|d(p_i) = s_i^P. The scoring S_P can be computed without knowing the actual decision value, so also before gathering the data on the decision.
5. Perform a statistical test on S_T and S_P that assesses whether the changes in the classification process are significant or not. If the test value exceeds a specified threshold, then notify of a potential classifier fall-down.
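A minimal sketch of steps 3 to 5 (our own illustration, not the SAS Enterprise Miner implementation used by the authors; it assumes a scikit-learn-style classifier exposing predict_proba and uses SciPy's two-sample Kolmogorov-Smirnov and Wilcoxon rank-sum tests):

import numpy as np
from scipy.stats import ks_2samp, ranksums

def monitor_classifier(clf, X_train, X_new, class_index=1, threshold=0.2):
    """Compare the scoring on the training data with the scoring on new data.

    Returns the KS statistic, the Wilcoxon (rank-sum) statistic and a flag
    signalling a potential classifier fall-down.
    """
    s_train = clf.predict_proba(X_train)[:, class_index]   # scoring S_T
    s_new = clf.predict_proba(X_new)[:, class_index]       # scoring S_P
    ks_stat, _ = ks_2samp(s_train, s_new)
    w_stat, w_pvalue = ranksums(s_train, s_new)
    return {
        "ks": ks_stat,
        "wilcoxon": w_stat,
        "wilcoxon_p": w_pvalue,
        "fall_down_suspected": ks_stat > threshold,
    }

The threshold value here is purely illustrative; as discussed below, choosing it in general remains an open issue.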
Test Set Scoring for test set
Fig. 38.1. The procedure of classifier monitoring applies a statistical test to the results of classification.

38.2.3 Classifier Fall-Down Identification

The proposed approach for classifier monitoring consists in comparing two scorings: one for the training data and one for the currently classified data. There are several issues concerning proper classifier fall-down identification using this approach. The empirical evaluation presented further on shows that not all statistical tests are applicable to this problem, even in spite of satisfying the requirements, e.g., that a test is model free. Besides the method of classifier monitoring presented here, we also evaluated another approach that compares not the scorings, but the distributions of the tested objects over the final leaves of the decision tree. However, in this approach we found no test or measure that correctly recognizes the classifier fall-down phenomenon. The Wilcoxon signed rank test, the cosine measure, the Kullback-Leibler divergence measure and the six-sigma rule either do not capture the classifier fall-down or notify of a nonexistent one. We suspect that the problem with those measures comes from the fact that they do not consider the actual score value s that is assigned to each decision tree leaf. If we consider the Kolmogorov-Smirnov test on two scorings, then this test depends not only on the distribution of objects over the decision tree leaves but also on the actual score value in each leaf. The empirical distribution function (EDF) of the scoring, which is used to calculate the KS test, can be fully determined from the distribution of objects over the leaves combined with the leaf score values. Perhaps other measures that also take into consideration the score values of the leaves can be successfully applied to this problem. In fact, the transformation of the Kolmogorov-Smirnov test from the EDF to the
distribution of objects over the leaves combined with the leaf score values results in a reduction of the computational complexity of testing and in a great compression of the classifier control data that has to be stored. An unresolved issue is how to estimate the optimal threshold value that delimits the predicted acceptable classifier accuracy from an accuracy fall-down. Even if we specify precisely the border between acceptable and unacceptable classifier accuracy, it is unknown how to estimate this threshold. In our research we are familiar with the considered data and classifier properties, so the threshold can be determined based on expert experience. However, we do not have a general answer on how to estimate the threshold for the proposed statistical tests. The proposed classifier monitoring is able to detect the accuracy fall-down only if there are some differences in the descriptions of the classified objects. We can imagine another situation, where all object descriptions are untouched, but the concept itself is changing. Although such a case is unobserved in real applications, it is possible, e.g., to generate the same synthetic data but with another concept labeling, where the differences are only in the decision attribute (target variable). There is no method at all to identify this prior to knowing the actual decision (concept), as it touches the problem of learning the proper concept itself. In particular, the proposed method of classifier monitoring is not able to recognize such a situation.
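This compression can be illustrated with a short sketch (our own reading of the idea): it suffices to store, for each leaf of the decision tree, its score value and the number of training objects that reached it, because the EDF of the scoring is a step function determined by exactly these quantities.

import numpy as np

def ks_from_leaves(train_leaf_counts, new_leaf_counts, leaf_scores):
    """Kolmogorov-Smirnov statistic computed from per-leaf statistics.

    train_leaf_counts, new_leaf_counts: number of objects reaching each leaf;
    leaf_scores: the score value assigned by each leaf (same order).
    Leaves with identical scores should be merged beforehand.
    """
    scores = np.asarray(leaf_scores, dtype=float)
    order = np.argsort(scores)                       # EDF jumps at sorted leaf scores
    p_train = np.asarray(train_leaf_counts, dtype=float)[order]
    p_new = np.asarray(new_leaf_counts, dtype=float)[order]
    edf_train = np.cumsum(p_train) / p_train.sum()
    edf_new = np.cumsum(p_new) / p_new.sum()
    return np.max(np.abs(edf_train - edf_new))

# Example: three leaves with scores 0.1, 0.5 and 0.9
print(ks_from_leaves([50, 30, 20], [20, 30, 50], [0.1, 0.5, 0.9]))   # 0.3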
38.3 Empirical Evaluation

38.3.1 Data Description

We used two groups of data sets for the experimental evaluation of the proposed method. The first group is synthesized in such a way that it simulates an industrial data mining application. The second group is extracted from the RoboCup World Championship 2003 in the soccer simulation league.

Table 38.1. The results of experiments with synthetic data, where the decision tree classifier was induced for the first data set.

Data Set | Accuracy | Error rate | Standardized Wilcoxon Statistic | P-value Wilcoxon Test | Kolmogorov-Smirnov Statistic
1        | 83.83%   | 16.17%     | 0                               | 1                     | 0
2        | 70.83%   | 29.17%     | 0.409571                        | 0.682121              | 0.017434
3        | 57.20%   | 42.80%     | -0.3174                         | 0.75094               | 0.031917
4        | 43.71%   | 56.29%     | 0.200541                        | 0.841057              | 0.037072
The data sets simulating an industrial application are synthesized. They contain samples from two multinormal distributions in the eight-dimensional space [0,1]^8. There are four data sets, where the standard deviations are constant, but the locations get closer in consecutive data sets. Each data set contains about 10000 observations (objects). The data sets from the RoboCup domain are extracted from log files of soccer simulator games that were held at the finals of the RoboCup World Championship 2003. The data contain the overall information about the playfield, like the positions of the players or the number of already executed actions of each type. Each simulated player on the playfield was
manually marked as to whether it plays using an offensive strategy (attacker) or a defensive strategy (defender or goalie). The data was desymmetrized and transformed into a special form, where each record describes one player at a given time point of the game. The finally transformed data contains 46 conditional attributes and one decision (target) attribute, namely strategy. There are eight data sets collected from four games with four participating teams, so each team is represented in two data sets. Each data set contains about 70000 observations (objects).

38.3.2 Experiments

We carried out experiments separately for the RoboCup domain data sets and the synthetic data sets. The experiments were performed using an algorithm for decision tree induction implemented in SAS Enterprise Miner (see, e.g., [1]). The automatically generated scoring code allows storing both the scoring and the distribution of leaves. The first group of experiments was carried out for the synthetic data sets. The decision tree model was induced for the first data set, where the centers of the two normal distributions are distant. Then the classifier was applied to all four data sets. The classification results were gathered and tested as described in the previous section. The results of this experiment are presented in Table 38.1. The first data set was used in both training and testing. In the case of the first data set we can observe the highest classification accuracy and obviously no differences detected by the statistical tests at all. The consecutive data sets, which contain samples from closer distributions, are classified worse by the model induced for the first data set. The Wilcoxon statistic does not capture the essential classifier fall-down that occurs for the third and fourth data sets. In the case of the Kolmogorov-Smirnov statistic we can easily observe that the first and second data sets receive values less than 0.2, while the third and fourth ones more than 0.2. If we put a threshold at the level 0.2, then the Kolmogorov-Smirnov statistic perfectly detects the classifier fall-down. The experiments for the data sets from the RoboCup domain were performed differently. The model for predicting strategy was built for each data set. Each classifier was applied to all data sets. There are eight data sets, so also eight models were induced. In total 8 × 8 = 64 experiments were carried out to cover all combinations. Such a procedure simulates a strategy detection classifier that is faced with an unknown team or a known team but in another game. The results of classification accuracy are presented in Table 38.2. As we expect, the diagonal elements, which correspond to classifying the data set on which the model was built, present fully accurate or almost fully accurate classification. A similar observation holds for classifying the same team, for which the model was built, but from the other game. The weakest classification accuracy in this category is 97.07% for the model built on team TsinghuAeolus in game 4 (the final) and tested on game 1 (a third level group game). The classification accuracy for other teams varies from 36.36% up to 100%. The results of the Kolmogorov-Smirnov test are presented in Table 38.3.
Table 38.2. The accuracy results of experiments with data sets from the RoboCup domain.

Training data set   | Test data set
                    | TsinghuAeolus     | UvA_Trilearn      | Everest           | Brainstormers03
                    | Game1    Game4    | Game2    Game4    | Game2    Game3    | Game1    Game3
TsinghuAeolus G1    | 100%     98.56%   | 99.03%   97.13%   | 91.44%   96.53%   | 94.48%   99.25%
TsinghuAeolus G4    | 97.07%   100%     | 89.69%   87.96%   | 99.39%   99.07%   | 99.34%   95.26%
UvA_Trilearn G2     | 98.26%   99.86%   | 99.99%   99.59%   | 99.47%   97.37%   | 98.81%   98.93%
UvA_Trilearn G4     | 97.14%   90.19%   | 98.13%   100%     | 76.84%   76.4%    | 78.28%   96.11%
Everest G2          | 97.61%   100%     | 89.91%   89.28%   | 100%     98.66%   | 98.63%   96.12%
Everest G3          | 99.36%   98.98%   | 88.32%   88.1%    | 99.26%   99.99%   | 99.25%   93.27%
Brainstormers03 G1  | 36.36%   63.64%   | 63.64%   72.73%   | 72.73%   45.45%   | 100%     100%
Brainstormers03 G3  | 36.36%   63.64%   | 63.64%   72.73%   | 72.73%   45.45%   | 100%     100%

Table 38.3. The Kolmogorov-Smirnov statistic results of experiments with data sets from the RoboCup domain.

Training data set   | Test data set
                    | TsinghuAeolus     | UvA_Trilearn      | Everest           | Brainstormers03
                    | Game1    Game4    | Game2    Game4    | Game2    Game3    | Game1    Game3
TsinghuAeolus G1    | 0        0.0007   | 0.0042   0.0143   | 0.0393   0.0135   | 0.0037   0.0030
TsinghuAeolus G4    | 0.0147   0        | 0.0515   0.0602   | 0.0010   0.0017   | 0.0027   0.0237
UvA_Trilearn G2     | 0.0090   0.0007   | 0        0.0017   | 0.0027   0.0112   | 0.0060   0.0056
UvA_Trilearn G4     | 0.0143   0.0490   | 0.0076   0        | 0.1158   0.1180   | 0.1086   0.0194
Everest G2          | 0.0120   0.0001   | 0.0504   0.0536   | 0        0.0015   | 0.0013   0.0194
Everest G3          | 0.0016   0.0039   | 0.0583   0.0595   | 0.0018   0        | 0.0030   0.0329
Brainstormers03 G1  | 0.1818   0.1364   | 0.1364   0.0909   | 0.0455   0.0909   | 0        0
Brainstormers03 G3  | 0.1817   0.1363   | 0.1363   0.0909   | 0.0454   0.0909   | 0        0
Fig. 38.2. The classification accuracy and statistical test results on data from the RoboCup domain (classification accuracy, KS statistic and Wilcoxon p-value plotted for all 64 experiments). The results are sorted by classification accuracy.
The results presented in this table are almost perfectly correlated with the accuracy results. The diagonal elements are obviously equal to zero, and classification of the same team gives a KS-test value below 0.015. Figure 38.2 presents the same results in graphical form, where the experiments are sorted with respect to classification accuracy. It is easy to observe that while the accuracy is decreasing the KS-test value is almost
always increasing. If we set the threshold between 0.04 and 0.045 then all 22 worst classification results, in the range from 36% to 90%, are recognized as doubtful. If we set the threshold between 0.061 and 0.09 then the classification accuracy fall-down from the level of 88% to 78% is correctly recognized except for the two worst experiments. It means that 12 out of 14 cases are correctly recognized. The p-value of the Wilcoxon rank sum test, presented in Figure 38.2, does not manifest similar properties. The p-value for experiments with 100% classification accuracy is 1.0. However, for the other experiments the p-value is extremely variable and is almost zero also for tests with classification accuracy above 90%.
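For readers who wish to get a feel for this kind of evaluation, the following is a rough end-to-end sketch of the synthetic drift experiment of Section 38.3.2 (our own illustration using scikit-learn and SciPy instead of SAS Enterprise Miner; the exact means, covariances and class priors of the original data sets are not reported, so only the qualitative pattern of Table 38.1 should be expected):

import numpy as np
from scipy.stats import ks_2samp
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
dim, n = 8, 5000

def make_set(separation):
    """Two multinormal classes in [0,1]^8 whose centers are 'separation' apart."""
    a = rng.normal(0.5 - separation / 2, 0.1, size=(n, dim))
    b = rng.normal(0.5 + separation / 2, 0.1, size=(n, dim))
    X = np.clip(np.vstack([a, b]), 0, 1)
    y = np.r_[np.zeros(n), np.ones(n)]
    return X, y

sets = [make_set(s) for s in (0.4, 0.3, 0.2, 0.1)]   # locations getting closer
X0, y0 = sets[0]
clf = DecisionTreeClassifier(max_depth=5).fit(X0, y0)
s_train = clf.predict_proba(X0)[:, 1]                 # scoring on the training set

for i, (X, y) in enumerate(sets, start=1):
    s_new = clf.predict_proba(X)[:, 1]                # scoring on the current set
    acc = clf.score(X, y)
    ks, _ = ks_2samp(s_train, s_new)
    print(f"data set {i}: accuracy {acc:.2%}, KS statistic {ks:.3f}")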
38.4 Conclusions

The empirical evaluation shows that the application of a proper statistical test makes it possible to detect classifier malfunctioning. The experimental results showed that the Kolmogorov-Smirnov test is recommended for detecting the classifier fall-down phenomenon. The proposed method can be applied to monitor any type of classifier under the assumption that it generates a scoring in the form of a probability estimation, e.g., the probability of belonging to a decision class. The proposed approach is suitable for the detection of classification accuracy fall-down in the case of binary classifiers. For other purposes it is necessary to extend the scoring definition in order to apply similar statistical tests or to replace the testing technique. The other deficiency of the proposed method is the lack of strict guidelines on how to determine the proper threshold value and its confidence interval. In our further research we will try to overcome this problem by providing strict estimations of the possible classification accuracy fall-down with respect to the KS-test value. Although the presented experiments were carried out using a decision tree induction algorithm, there is no obstacle to applying this method to other classifiers, e.g., those based on decision rules or artificial neural networks. The proposed method of classifier monitoring is applicable to classifiers induced by any algorithm. The only requirement is the availability of a scoring or similar probability-like values produced by the classifier.
References

1. Data Mining Using SAS Enterprise Miner: A Case Study Approach, Second Edition. SAS Publishing (2003)
2. Conover W.J.: Practical Nonparametric Statistics, Second Edition. John Wiley & Sons (1980)
3. Hollander M., Wolfe D.A.: Nonparametric statistical inference. John Wiley & Sons (1973)
4. Kaminka G.A., Lima P.U., Rojas R.: RoboCup 2002: Robot Soccer World Cup VI. LNCS 2752. Springer (2003)
5. Koronacki J., Mielniczuk J.: Statystyka dla studentow kierunkow technicznych i przyrodniczych. WNT (2001)
6. Liu Y., Menzies T., Cukic B.: Data Sniffing — Monitoring of Machine Learning for Online Adaptive Systems. In: 14th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'02). IEEE (2002)
7. Freund Y., Mansour Y.: Learning under persistent drift. In: S. Ben-David, editor, Proceedings of EuroCOLT'97. LNCS 1208, 94-108. Springer (1997)
39 What Do We Learn When We Learn by Doing? Toward a Model of Dorsal Vision*

Ewa Rauch

Linkoping University, Sweden (on leave), [email protected]

Summary. Much effort in computer science is currently focused on developing architectures for multi-agent adaptive systems capable of monitoring the environment and detecting security threats. I present here one such architecture developed by evolution and implemented in the neural mechanisms of the human brain: the dorsal visual system. I claim that the dorsal visual system in the human brain can be modeled as two cooperating rough agents which monitor the environment and guide other systems. The two agents' adaptation capabilities can be modeled on the basis of research in neuroscience related to the processes of implicit learning from experience. In the paper I first present arguments behind my claim. Next, I show how studying the dorsal visual system may help to improve human-machine interaction. Finally, I suggest how the conjectures presented here can be tested experimentally.

Key words: categorization, dorsal vision, finite state automata, guidance, knowledge discovery, patterns, proportional monitoring, rough set theory, templates

*I would like to express my gratitude to Prof. Andrzej Skowron for his advice. His comments are an invaluable help to me. The research has been supported by the grant 3 T11C 002 26 from the Ministry of Scientific Research and Information Technology of the Republic of Poland.
39.1 Introduction

Building monitoring software systems capable of operating in "out-of-human-reach" environments, like deep seas or earthquake areas, still presents a serious challenge. Such systems need to be quite autonomous and adapt to new environments with minimal involvement of human expertise. They have to provide informative guidance for other software systems and to alert human operators when a threat is detected. Ease of communication with human operators is of the highest importance. The cognitive load that is put on people operating existing monitoring systems is much too high: "Instead of, how it is today, having 4 people occupied with interpreting information from a large sky telescope, we want to have a single operator working in front of screens with information from 4 such telescopes" [22]. How can this goal be achieved? Probably not by improving existing artifacts. Despite many important achievements, traditional Artificial Intelligence (AI) has not
provided efficient means for building such software. Complex environments, changing rapidly and largely unpredictably, cannot be monitored using approaches based on prespecified knowledge, like statistics, cause-effect relationships, or domain models. These approaches require a lot of human expertise and are relatively inflexible. As Waltz and Kasif say about statistical decision theory, "the models we can devise effectively are rarely accurate," while methods of traditional AI "rely on laborious hand-coding, have difficulty coping with uncertainty and change" [23]. Moreover, monitoring systems are oriented toward predictions, while traditional AI approaches usually provide rich diagnostic knowledge, which is necessary in breakdown situations, but often not useful for prediction. As Cobos et al. point out, in diagnosis and in prediction "the temporal structure of the information provided to a reasoner may vary (e.g., multiple events followed by a single event vs. a single event followed by a multiple events)" [2]. Traditional AI approaches are organized around global, inflexible, usually hand-coded, models. Waltz and Kasif stress that this is not a proper way of building monitoring systems required to act in dynamic and uncertain environments. They posit that local models, learned over small neighborhoods from the most relevant instances, should be used instead [23]. It might seem that we need to invent a new architecture, but is it really true? In this paper I show that an architecture composed of two autonomous cooperating subsystems capable of monitoring a complex environment and adapting to its changing character has already been invented and is actually realized in many implementations. This architecture provides for building reliable monitoring systems which learn local models without supervision and generate efficient guidance information without increasing cognitive load unless absolutely necessary. This architecture underlies the dorsal visual system in the human brain, invented and perfected over ages by Evolution. Previously relatively unknown, because of its impenetrability by conscious access, the dorsal visual system is now amenable to investigation in the healthy human brain using neuroimaging procedures. It may be observed, studied, and computationally modeled using new, so called "soft", AI technologies. This paper is organized as follows. First, I present arguments behind my claim that the dorsal visual system in the human brain can be modeled as two monitoring rough agents. Next, I discuss how studying the dorsal visual system may help to improve human-machine interaction. Finally, I suggest how the conjectures presented here can be tested experimentally.
39.2 Modeling the dorsal system as two monitoring rough agents
In order to show that the dorsal visual system can be modeled as two monitoring rough agents, I first present evidence that it is a system which monitors the visual environment and provides guidance to other systems. Later, I refer to research which suggests that the dorsal visual system adapts to changing conditions by implicit learning from experiences of interactions with objects in the new environment. Then, I present a patient case which demonstrates the visual performance that is possible when visual information from the environment is provided exclusively by a single, retino-cortical dorsal pathway. It shows how powerful and autonomous each of the two dorsal visual pathways in the healthy human brain is. Having shown that
the dorsal visual system is an adaptive system which has two, relatively autonomous, subsystems, I draw on the most often used definition of an agent as a software system which is adaptive, cooperative, and autonomous, and conceptualize the dorsal visual system as two agents. Finally, I present some conjectures underlying the work on modeling the dorsal visual system and explain why its computational model has to be built using tools based on rough set theory [19], one of the new technologies in AI.
39.2.1 Monitoring and the dorsal visual system
The world around us is an ever-changing, complex environment. In order to act advantageously we need to monitor it. Gruber and Goschke present recent behavioral and functional neuroimaging studies suggesting that mechanisms of background-monitoring operate continuously in the human brain. These mechanisms, which are located in the fronto-parietal regions, interpret information that selective attention dismisses as task-irrelevant. Though potentially distracting for the ongoing action, such information cannot be ignored completely because in a changing environment task-irrelevant information may signal serious threats or important chances. The background-monitoring mechanisms, working for the most part silently and in hiding, interrupt an ongoing action and update working memory when necessary. This way the executive control centers in the human brain may be "influenced by the occurrence of unexpected or significant stimuli outside the current focus of attention. [...] This may enable the individual to interrupt and adapt behavior in response to significant and/or unexpected events." [10] People gather information from the world via the senses. The most often used is, probably, vision. Two anatomically and functionally separate visual systems operate in the human brain: the ventral visual system and the dorsal visual system. In the ventral visual system there are two retino-cortical pathways, each one leading from the eye to the temporal lobe in the same hemisphere. In the dorsal visual system each of the two retino-cortical pathways crosses the midline and goes to the contralateral parietal lobe. The ventral visual system, also called the 'what'-system, getting its input from the central visual field, is very well understood, since it is accessible to conscious access and to verbal report [9]. It was extensively studied over the years, mainly as a consequence of interest in human-computer interaction (HCI). Because the ventral visual system processes only stimuli coming from the central visual field around the current focus of attention, the background-monitoring processes take their input from the stimuli carried by the dorsal visual system. It means that the dorsal visual system in the human brain is responsible for monitoring events in the visual environment and gathering information which may signal threats or unexpected chances.
39.2.2 Adaptivity and the dorsal visual system
In order to act advantageously in the complex and changing world, Evolution equipped us with adaptation capabilities. "The ability to detect unusual, significant, or possibly dangerous events is fundamental for adapting to a rapidly changing environment and insuring survival of the organism" [16]. When a behavior, understood here as a sequence of actions, successful in the old environment is no longer advantageous, it must be adapted to the new environment. The dorsal visual system, which
is responsible for visual guidance of actions [18], must also quickly adapt to the new conditions. Since the dorsal visual system is impenetrable to conscious access, it cannot be provided with explicit knowledge from the cognitive mechanisms and must learn autonomously from experiences of interactions with the new environment. Recent results in the area of sequential learning, which is closely related to learning action sequences, improve our understanding of how the dorsal visual system learns. Keele et al. present a theory of sequence learning - a single theory which has grown out of years of independent studies in several laboratories with different origins. This theory suggests that there are two learning systems and that they represent sequential regularities in different ways. The first system, called multidimensional, builds cross-dimensional associations between events. The second system, unidimensional, is strongly modular. Each module is capable of automatically extracting regularities from its input events. Learning in such modules is implicit and dual-task conditions do not disrupt it. The dichotomy unidimensional/multidimensional closely resembles the dorsal/ventral dichotomy in the human visual system, but with respect to the dorsal visual system the theory suggests "a more general role for this system, proposing modules that provide the representations for an organized sequences of actions" [12]. Hence, the dorsal visual system is an adaptive system. It implicitly learns regularities in the environment as well as sequences of actions.
39.2.3 Autonomy/cooperation and the dorsal visual system
Though the ventral visual system has been well understood for quite a long time, the dorsal visual system is still largely a mystery. We know that it is very sensitive to luminance changes and motion, but we have very poor understanding of what is really happening with the low-spatial-resolution stimuli gathered outside the central visual field and carried via the magnocellular channels to the parietal lobes. Before the advent of brain imaging tools, the dorsal visual system was difficult to study in healthy humans as it is not available for conscious access. To study it in animals was impossible because it is silenced by anesthetics. Since human interaction with computers was assumed to be channeled exclusively via the ventral visual system, the dorsal visual system as a medium of communication between a human operator and a machine was never really studied. Despite some lesion studies providing evidence that it is actually a powerful channel for gathering visual information, there was no strong pressure to investigate how the dorsal visual system works. Now, with modern tools allowing us to see a healthy brain in action, the situation is different and the interest in studying the dorsal visual system grows very quickly. In [14] the case is reported of a 30-year-old man who at the age of 3 years completely lost his ventral visual system and one retino-cortical pathway in his dorsal visual system. All visual information that his brain uses is carried by the retino-cortical magnocellular-dorsal pathway leading from his right eye to the left parietal lobe. Despite the fact that he is legally blind - he has no conscious vision whatsoever - he is capable of driving a motorcycle and playing fast ball games. He is capable of moving in unfamiliar environments and with poor lighting, relying only on landmarks to direct his movements toward a goal.
His visuomotor performance is very good - he easily catches two table tennis balls and juggles with them. At the same time, because of his perceptual
limitations, he cannot choose his food at the cafeteria. He is able to grasp an object precisely and name it, though he has no conscious perception of its size, color or orientation. He can neither read nor write, but is fluent in Braille. He "offers the quite unusual opportunity of evaluating what it is to see [...] through a dorsal system on the basis of magnocellular information alone" [14]. This patient case proves that we strongly underestimate the dorsal visual system and its importance for human performance. The architecture of a computational model of the dorsal visual system has to be composed of two agents, because, as we can see from the above, the dorsal visual system has two, relatively autonomous, adaptive subsystems.
39.2.4 The dorsal visual system as a multi-agent monitoring architecture
From what was presented above it follows that the dorsal visual system may be seen as a monitoring multi-agent architecture implemented in the neural structures of the human brain. A symbolic computational model developed on the basis of this architecture would make it possible to test hypotheses related to the dorsal visual system as well as to multi-agent monitoring systems in general. Though nowadays neuroscience is studying the dorsal visual system very actively and many researchers have attempted to model some aspects of it, so far I know of no symbolic model of the dorsal visual system that could provide for computational testing of its monitoring capabilities. Aristotle once said that what we have to learn to do, we learn by doing. His main interest was the phenomenon of change and its importance for human cognitive functioning in the world. Recent results in attention research provide evidence that changes are important not only for human cognition, but also for human body performance. Changes which we are not aware of may be detected by the dorsal visual system and affect our actions [8]. There is some evidence that, while performing interactive actions, people unconsciously and unintentionally learn regularities in environmental changes reflecting effects of the actions [13]. These learned regularities, temporal and spatial, in sequences of changes, may guide attention as well as anticipatory action preparation mechanisms in order to improve performance [6]. Generally speaking, performance can be based on endogenous cues (top-down, memory based, cognitive expectations) or on exogenous cues (bottom-up, sensory events) [6]. Performance based on endogenous cues requires deliberate effort and awareness, while performance based on exogenous cues, detected with or without awareness, is fast and almost effortless. The dorsal visual system learns regularities in sequences of changes in a stream of spatio-temporal information [12]. On this basis it may sometimes predict the time delay between two changes and find the next location of interest. These two kinds of information, approximate spatial locations and temporal intervals, are, respectively, spatial and temporal exogenous cues and may influence performance [4]. Spatial exogenous cues provide benefits only if they reliably predict the location of interest. Because of the costs inherent in attentional shifts, spatial exogenous cues which are not predictive and require reorienting of attention are not advantageous. Today, very little is known about the mechanisms employing temporal exogenous cues. I suggest that temporal exogenous cues are employed by the background-
monitoring mechanisms and that, while predictive temporal cues provide for efficient monitoring strategies based on proportional monitoring [1], unpredictive temporal cues make it necessary to use inefficient strategies based on periodic monitoring. I suggest that "learning-by-doing" may be seen as unconsciously extracting more and more predictive spatial or temporal exogenous cues. It is accompanied by a decrease in performance guided by the ventral visual system and an increase in performance guided by the dorsal visual system. This paper is a part of a work toward a computational model of the dorsal visual system in the human brain. The model is aimed at explaining how the unconscious processes of action guidance, which operate in the dorsal visual system in the human brain [18], can extract regularities from visual stimuli and how these regularities can be used in environment monitoring and in the category-invention process of unsupervised learning [3]. The model will also show how finite state machines, suggested as a computational analogue of the neurobiological mechanisms [20], can be induced from the extracted regularities and used by the dorsal visual system in the human brain for representing knowledge learned from experience. Following the ideas presented by Metzinger and Gallese, the model will explain how action ontologies may emerge in the process of unconscious "learning-by-doing". The work is in a very early phase. Currently, the focus is on modeling the mechanisms which provide for autonomous discovery of predictive temporal knowledge in categorical time-series generated from timestamped snapshots and using it for temporal prediction, as well as on modeling the mechanisms for autonomous discovery of approximate spatial knowledge from sequences of snapshots and using it for spatial prediction. The dorsal visual system in the human brain is proposed to be modeled as two agents. The agents extract information from streams of imperfect data and use it on-line for task guidance. The extracted information is also used off-line for learning from experience. Since the dorsal visual system is not accessible for conscious access, each agent extracts information solely from its input stream. The dorsal visual system is proposed to be modeled using rough agents, i.e., agents performing approximate reasoning based on rough sets [19, 17]. According to what results in neuroscience tell us about the dorsal visual system and its monitoring capabilities, it should be modeled as an unsupervised learning system discovering knowledge from an evolving stream of imperfect data. Actually, the tools based on rough set theory provide the best means for modeling such systems [7].
39.3 The dorsal visual system and human-machine interaction
The work presented here on modeling the dorsal visual system is aimed at a better understanding of what we unconsciously learn when we "learn-by-doing". In the area of human-machine interaction it may help to reduce the attentional load on the human operator if designers of graphical user interfaces (GUIs) take into consideration the fact that the dorsal visual system is a channel through which the human brain may effortlessly gather action guidance information. Instead of designing GUIs based on endogenous cues, which require deliberate effort and awareness, GUIs may generate exogenous cues whenever not a decision, but a visually guided action, is required.
Getting in this way some unnecessary attentional load off the human operator is, surely, a step toward having a single operator working with 4 telescopes at once. Another potential advantage of the work on modeling the dorsal visual system is preventing health problems related to using computers at work. The recent rapid development of brain imaging techniques has opened ways for studying unconscious processes in the brain of a person performing a task. We know today for sure that unconsciously registered spatial as well as temporal information may strongly affect performance of the musculoskeletal system in the human body [11]. A graphical user interface (GUI) almost continuously generates an evolving stream of graphical stimuli. Since designers take into consideration only the spatial organisation of visual information, the temporal regularities, if any, are created accidentally and may provide the dorsal visual system with some unpredictive temporal exogenous cues. In sensitive people this erroneously activates anticipatory action preparation mechanisms, increases the level of neuromotor noise together with the accompanying limb stiffness [24], and eventually may lead to neurological injuries. In order to prevent such problems, GUIs for sensitive computer users must control the temporal aspects of information generated on the screen. I believe that some musculoskeletal and neurological complaints related to using computers at work, like carpal tunnel syndrome, may be avoided if designers pay attention to the temporal regularities generated on the computer screen. The computational model proposed here may provide means for testing individual differences in sensitivity to temporal exogenous cues as well as the temporal regularity of a GUI.
39.4 Computational experiments related to the model
Conjectures underlying the proposed model of the dorsal visual system can be seen as an attempt to explain grounding action words in perception. Computational experiments can be designed following the approach described by Regier and Carlson, who construct spatial templates for visualization of some linguistic spatial categories and use computational models for predicting acceptability ratings. The predicted ratings are later compared with human behavioral data. Similarly, temporal templates can be created with some linguistic action categories related to these templates, and computational predictions can be compared with behavioral data.
39.5 Final comment
I hope that by taking into consideration the processes operating in the dorsal visual system we may build interactive tools which do not harm people using them in everyday work.
References
1. M.S. Atkin, P.R. Cohen: Monitoring in Embedded Agents, Computer Science Technical Report 95-66, University of Massachusetts, Amherst, MA, 1995.
2. P.L. Cobos, F.L. Lopez, A. Cano, J. Almaraz, D.R. Shanks: Mechanisms of Predictive and Diagnostic Causal Induction, Journal of Experimental Psychology: Animal Behavior Processes, 2002, 28(4), 331-346.
3. J.P. Clapper, G.H. Bower: Adaptive Categorization in Unsupervised Learning, Journal of Experimental Psychology: Learning, Memory, and Cognition, 2002, 28(5), 908-923.
4. J.T. Coull, A.C. Nobre: Where and when to pay attention: The neural systems for directing attention to spatial locations and to time intervals as revealed by both PET and fMRI, Journal of Neuroscience, 1998, 18(18), 7426-7435.
5. J.T. Coull, C.D. Frith, C. Büchel, A.C. Nobre: Orienting attention in time: behavioural and neuroanatomical distinction between exogenous and endogenous shifts, Neuropsychologia, 2000, 38(6), 808-819.
6. J.T. Coull: fMRI studies of temporal attention: allocating attention within, or towards, time, Cognitive Brain Research, 2004, in press.
7. I. Duntsch, G. Gediga: Rough set data analysis. In: Encyclopedia of Computer Science and Technology, Marcel Dekker, 2000.
8. D. Fernandez-Duque, I.M. Thornton: Explicit Mechanisms Do Not Account for Implicit Localization and Identification of Change: An Empirical Reply to Mitroff et al. (2002), Journal of Experimental Psychology: Human Perception and Performance, 2003, 29(5), 846-858.
9. M.A. Goodale, D.A. Westwood: An evolving view of duplex vision: separate but interacting cortical pathways for perception and action, Current Opinion in Neurobiology, 2004, 14, 1-9.
10. O. Gruber, T. Goschke: Executive control emerging from dynamic interactions between brain systems mediating language, working memory and attentional processes, Acta Psychologica, 2004, 115, 105-121.
11. K. Imanaka, I. Kita, K. Suzuki: Effects of nonconscious perception on motor response, Human Movement Science, 2002, 21, 541-461.
12. S. Keele, R. Ivry, U. Mayr, E. Hazeltine, H. Heuer: The Cognitive and Neural Architecture of Sequence Representation, Psychological Review, 2003, 110(2), 316-339.
13. W. Kunde, J. Hoffmann, P. Zellmann: The impact of anticipated action effects on action planning, Acta Psychologica, 2002, 109, 137-155.
14. S. Le, D. Cardebat, K. Boulanouar, M-A. Henaff, F. Michel, D. Milner, C. Dijkerman, M. Puel, J-F. Demonet: Seeing, since childhood, without ventral stream: a behavioural study, Brain, 2002, 125, 58-74.
15. T. Metzinger, V. Gallese: The emergence of a shared action ontology: Building blocks for a theory, Consciousness and Cognition, 2003, 12, 549-571.
16. E. Nagy, G.F. Potts, K. Loveland: Sex-related ERP differences in deviance detection, International Journal of Psychophysiology, 2003, 48(3), 285-292.
17. S.K. Pal, L. Polkowski, A. Skowron: Rough-Neural Computing: Techniques for Computing with Words, Series in Cognitive Technologies, Springer-Verlag, Heidelberg, 2004.
18. R. Passingham, I. Toni: Contrasting the Dorsal and Ventral Visual Systems: Guidance of Movement versus Decision Making, NeuroImage, 2001, 14, S125-S131.
19. Z. Pawlak: Rough Sets. Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, Dordrecht, 1991.
20. K.M. Petersson, C. Forkstam, M. Ingvar: Artificial syntactic violations activate Broca's region, Cognitive Science, 2004, 28, 383-407.
21. T. Regier, L. Carlson: Grounding Spatial Language in Perception: An Empirical and Computational Investigation, Journal of Experimental Psychology: General, 2002, 130(2), 273-298.
22. K. Sycara: personal communication, MSRAS, Plock, 2004.
23. D. Waltz, S. Kasif: On Reasoning from Data, ACM Computing Surveys, 1995, 27(3), 356-359.
24. A.W.A. Van Gemmert, G.P. Van Galen: Auditory stress effects on preparation and execution of graphical aiming: A test of the neuromotor noise concept, Acta Psychologica, 1998, 98, 81-101.
40
Rough Mereology as a Language for a Minimalist Mobile Robot's Environment Description
Lech Polkowski (1, 2) and Adam Szmigielski (1)*
(1) Polish-Japanese Institute of Information Technology, Warsaw, Poland
(2) Department of Mathematics and Computer Science, Univ. of Warmia and Mazury, Olsztyn, Poland
{polkow,aszmigie}@pjwstk.edu.pl
Summary. Rough Mereology is a paradigm for approximate reasoning proposed by Polkowski and Skowron in 1994. The basic primitive notion of Rough Mereology is the notion of a part to a degree, and the functor returning this value is called the rough inclusion. Rough Mereology has been shown to be a very flexible tool in problems of Approximate Reasoning, including reasoning in many-agent systems, spatial reasoning, granulation of knowledge problems, and the computing with words paradigm, among others. Here, it is applied to provide a language aimed at describing the mobile robot environment in the case of a minimalist mobile robot equipped with an omnidirectional sonar and navigating in the environment of a sonar GPS. Mereological constructs are applied toward definitions of geometric notions, and navigation algorithms are based on these geometric notions. Some results of experiments with the real robot Pioneer 2DX are reported, showing the applicability of rough mereology in mobile robot navigation tasks.
40.1 Introduction
Space considerations play a very important role in tasks of mobile robot navigation and control. It stems from the rather obvious fact that in order to effectively navigate in an environment the mobile robot should either possess knowledge about the environment, which may be encoded within the robot as a map of the environment [1] or as a database of objects to be recognized - facilitating a transition from perception to action [7]. Another possibility is to engage the robot in interaction with the environment and to build spatial constructs from sensory readings directly, without resorting to a dedicated planner or reasoning center. In this way, it seems, one implements the embodied intelligence idea of Brooks [3].

* The simulation and real experiment results presented here formed a part of this author's PhD dissertation, of which the first author was the supervisor, defended at the Institute of Automation and Computer Science of the Department of Electronics and Information Techniques of the Warsaw University of Technology in May, 2004.
Spatial models of a mobile robot environment should be effective in the sense of robot control complexity, but they should also have sufficient expressive power in the sense of human understanding. While standard quantitative models are very well developed and powerful, a precise description cannot by itself illuminate expressions like "near", "in front of", etc. Qualitative models are very useful in mobile robot navigation, but they may fail when we try to locate the robot with linguistic expressions like "near", "between", or when we want to express the intention of a robot movement like "move further away from" or "turn a little left". In implementing a program that would lead to a robot controlled by natural language phrases as mentioned above, one first has to translate phrases of language into constructs related to spatial properties of the robot environment and then to relate those properties to specific configurations of the sensory readings. For these reasons, we have selected Rough Mereology as a language for approximate reasoning to be used in constructing a set of notions that a robot may use to relate itself to its environment. In order to stress the method and its relative strength, we have decided on a minimalist robot, equipped only with an omnidirectional sonar as its only sensor for detecting environment objects, and interacting with the environment by communicating with a network of sonar receivers by means of a sonar emitter. As a result of the interaction and sensory abilities, two circles are created: one centered at the robot, of radius equal to the distance to the nearest obstacle, a boundary to a region called the collision-free region, and the other centered at a receiver, delimiting a region called the receiver region. The two regions intersect, allowing for a degree of partial containment measured as the ratio of the area of the intersection of the regions to the area of the larger region. The numerical value obtained in this way is used in definitions of geometric notions like "between", "near to", etc., in terms of which algorithms for robot navigation are written down. In what follows, we outline the basic notions of mereology and rough mereology, we describe the idea of the reasoning process devised for our robot, we present the description of the physical system of the robot, and finally we present some results of experiments.
40.2 Mereology
Mereology is a theory of sets that - in contrast to the classical, or naive, set theory based on the primitive notion of an element - assumes the relation of being a part as its primitive notion, cf. [9]. The primitive notion of mereology is a predicate (relation) π of being a part. Given a universe of entities, π does satisfy the following requirements:
(PRT1) it is not true that there exists an entity x with the property that xπx;
(PRT2) if xπy and yπz then xπz,
meaning that the relation π is non-reflexive and transitive, i.e., it is a pre-order relation on the given universe. (!) In order to relate this non-standard notion to the familiar notions of naive set theory, we may observe that the relation ⊂ of strict containment is a part relation on any family of sets. The notion of an element, el, can be defined as x el y ⟺ xπy ∨ x = y. Then the el relation is a partial order on the universe of entities; in particular, x el x for any entity x.
40.3 Rough Mereology
Lesniewski's mereology can be extended by adding a predicate of being a part to a degree r, denoted by μ_r, where r ∈ [0, 1], see [10]. The formula x μ_r y can be read: "x is a part of y to a degree r". We recall the set of axioms the predicate μ_r should satisfy. These axioms stem from our intuitive understanding of the nature of partial containment.
• RM1: x μ_1 y ⟺ x el y; this means that being a part to the degree 1 is equivalent to being an element in the mereological sense. (!) In the model where the part relation is strict containment of sets ⊂, (RM1) does mean that the region (set) x is included in the region y either in a strict way or both regions are equal.
• RM2: x μ_1 y ⟹ ∀z (z μ_r x ⟹ z μ_r y), meaning the monotonicity property: any region z is a part of the region y to a degree not smaller than that to which z is a part of x, if only x is an element of y.
• RM3: (x = y ∧ x μ_r z) ⟹ y μ_r z; this means that the identity of individuals is a congruence with respect to μ.
• RM4: (x μ_r y ∧ s < r) ⟹ x μ_s y, which establishes the meaning of "a part to degree at least r". It means that if a region is a part to some degree, then it is also a part to any smaller degree.
Rough mereology can make the bridges between the qualitative and quantitative spatial description by the numerical determination of region-based relations. We recall the idea of Pawlak and Skowron [8] of a rough membership function. For two non-empty sets of objects in an information table, x, y, let

μ_x(y) = |x ∩ y| / |y|,   (40.1)

where |x| stands for the cardinality of x. Obviously, the formula extends to the continuous case with |x| the area of x.
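To make the definition concrete, the following C sketch evaluates both the rough membership of (40.1) and the mutual inclusion degrees used later in (40.8), for objects encoded as small bit sets. The bit-mask encoding and all identifiers are illustrative assumptions of ours, not part of the cited systems.

```c
#include <stdio.h>

typedef unsigned int set_t;        /* a subset of a small universe as a bit mask */

static int card(set_t s)           /* |s| */
{
    int n = 0;
    while (s) { n += (int)(s & 1u); s >>= 1; }
    return n;
}

/* Rough membership of (40.1): mu_x(y) = |x n y| / |y|.                        */
static double mu(set_t x, set_t y)
{
    return card(y) ? (double)card(x & y) / card(y) : 0.0;
}

/* Degree to which x is a part of y: |x n y| / |x| (degree 1 <=> x el y, RM1). */
static double part_deg(set_t x, set_t y)
{
    return card(x) ? (double)card(x & y) / card(x) : 0.0;
}

int main(void)
{
    set_t x = 0x0Fu;   /* {0,1,2,3} */
    set_t y = 0x3Cu;   /* {2,3,4,5} */

    printf("mu_x(y)     = %.3f\n", mu(x, y));        /* 2/4 = 0.500        */
    printf("deg(x in y) = %.3f\n", part_deg(x, y));  /* 2/4 = 0.500        */
    printf("deg(x in x) = %.3f\n", part_deg(x, x));  /* 1.000, i.e. x el x */
    return 0;
}
```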
40.4 Rough-Mereological Geometry System
Predicates μ_r may be regarded as weak distance functions in the context of geometry. From this point of view we may apply μ in order to define basic notions of rough-mereological geometry. We recall a notion of distance κ in the rough mereological universe [9],

κ(x, y) = r ⟺ r = min{max{u, w : x μ_u y ∧ y μ_w x}}.   (40.2)

Rough mereological distance, or shorter mereo-distance, is defined in an opposite way to the traditional distance - the smaller r, the greater the distance. Using formula (40.2) the predicate Nrr, to be nearer, proposed by van Benthem [2] can be described as

z Nrr(x, y) ⟺ (κ(z, x) > κ(z, y)).   (40.3)

z Nrr(x, y) means that the region z is closer to the region x than to the region y. If neither the region z is closer to the region x than to the region y nor the region z is closer to the region y than to the region x, then both regions x, y are at the same mereo-distance to the region z. Formally we can define equi-distance as a predicate Eq(x, y),

z Eq(x, y) ⟺ (¬(x Nrr(z, y)) ∧ ¬(y Nrr(z, x))).   (40.4)

In our system, in which regions are disks, equidistance is the collection of disks whose centers lie on the straight line perpendicular to the segment linking the centers of the disks x and y and crossing it at the middle. We may also adopt other predicates, e.g., the predicate to be between, Btw [2], defined as

z Btw(x, y) ⟺ ∀u (u = z ∨ x Nrr(z, u) ∨ y Nrr(z, u)).   (40.5)

Elementary calculations show that the mereo-distance of two disks x, y is minimal (i.e., maximal as to the value) in the case when the radii of both disks are equal and either center lies on the boundary of the other disk; then,

κ(x, y) = 2/3 - √3/(2π) = max.   (40.6)

With the use of this minimal distance, we define the predicate near that signals that the object (robot) x is reaching its target (receiver) y,

x near y ⟺ ∃u [u el x ∧ κ(u, y) = max].   (40.7)
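For disks, the degrees entering (40.2) reduce to ratios of the lens-shaped intersection area to the disk areas, so the mereo-distance can be computed directly. The C sketch below is our own illustration (all helper names are invented); for two equal disks with either center on the other's boundary it reproduces the extreme value of (40.6).

```c
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Area of the intersection (lens) of two disks with radii r1, r2 and
 * center distance D.                                                   */
static double lens_area(double r1, double r2, double D)
{
    if (D >= r1 + r2) return 0.0;                   /* disjoint disks    */
    if (D <= fabs(r1 - r2)) {                        /* one inside other  */
        double r = (r1 < r2) ? r1 : r2;
        return M_PI * r * r;
    }
    double a1 = r1 * r1 * acos((D * D + r1 * r1 - r2 * r2) / (2.0 * D * r1));
    double a2 = r2 * r2 * acos((D * D + r2 * r2 - r1 * r1) / (2.0 * D * r2));
    double a3 = 0.5 * sqrt((-D + r1 + r2) * (D + r1 - r2) *
                           ( D - r1 + r2) * (D + r1 + r2));
    return a1 + a2 - a3;
}

/* Mereo-distance of two disks: the smaller of the two inclusion degrees,
 * as in (40.2) and (40.8).                                              */
static double kappa(double r1, double r2, double D)
{
    double common = lens_area(r1, r2, D);
    double d1 = common / (M_PI * r1 * r1);
    double d2 = common / (M_PI * r2 * r2);
    return (d1 < d2) ? d1 : d2;
}

int main(void)
{
    double kmax = 2.0 / 3.0 - sqrt(3.0) / (2.0 * M_PI);  /* value of (40.6) */
    double k = kappa(1.0, 1.0, 1.0);   /* equal disks, center on boundary   */
    printf("kappa = %.4f, max of (40.6) = %.4f\n", k, kmax);
    printf("near (40.7) holds: %s\n", fabs(k - kmax) < 1e-9 ? "yes" : "no");
    return 0;
}
```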
40.5 System description
Our navigation system consists of one omnidirectional ultrasonic sensor, located on the robot, to measure the distance to the nearest obstacle, and a set of receivers placed in the robot environment. Correct placement of the receivers is important for a proper description of the topology of the environment. The robot also has an ultrasonic emitter to dialogue with the receiver(s). As the moving platform, the mobile robot Pioneer P2DX with a differential drive was used. Our robot has two degrees of freedom - it can go forward and rotate. Consequently, movement control is limited to two control parameters - the rotation angle α and the length of movement s.
Measurement interpretation
The set of measurements consists of the radius of the non-collision region d and the radii of the receiver regions l1, l2, . . . The configuration of the non-collision region and the receiver region is shown in Figure 40.1.
Fig. 40.1. Configuration of non-collision region and receiver region
Let us denote the non-collision region as R_S - the disk(S, d), and the receiver region as R_O - the disk(O, l). The point S is always located on circle(O, l), so the mereo-distance κ_SO of these two regions can be calculated using formula (40.2) as

κ_SO = min{ |R_S ∩ R_O| / |R_O|, |R_S ∩ R_O| / |R_S| }.   (40.8)
The rough mereological distance depends not only on the distance l from the transmitter to the receiver but also on the radius d of the non-collision region.
From linguistic to spatial description
If we define spatial relations within mereology, we can also treat them semantically [5]. In this way complementary interpretations of the defined relations are introduced. From one point of view we can understand them as semantical (psychological), from the other as a description of the physical space. Rough mereology can deal with real-world data to specify the defined relations numerically. This duality can make the bridges between qualitative and quantitative spatial description.
natural language spatial expressions (e.g. near, far) → rough-mereological spatial relations (e.g. near, far) → physical space (numerically described)
Fig. 40.2. Translation scheme from linguistic to physical space description
The conversion scheme from language expressions to low-level descriptions via rough mereology is shown in Fig. 40.2. The major difficulty is to distinguish the set of spatial relations, formalizing some spatial intuitions, that is useful for a specific task.
Navigation system
The architecture of our navigation system is shown in Fig. 40.3. The system works in a feedback control loop. The controller uses the rough mereological space description in its reasoning.
Fig. 40.3. Architecture of the navigation system: the control goal, expressed in natural language and as a mereological relation, is given to the controller; sonar data are interpreted in the mereology context and passed to spatial reasoning (mereo-geometry description), which yields the rotation angle and path for the robot moving platform with its omnidirectional sonar
The set value of the controller (the control goal), in opposition to classical regulators, is an expression of natural language.
40.6 Realization of robot tasks
In all robot navigation tasks, we will use the scheme presented in Fig. 40.2. Following that scheme, the first step is to express our intention of robot movement in
natural language. After that we should find the mereological relation to describe this intention. The last step is a numerical description, necessary in robot control. The main problem is to specify the relation that should express our intention.
40.6.1 Coming near to the receiver region
In robot navigation, we may want the robot to come closer to the receiver region. It is easy to observe that for the region configuration presented in Fig. 40.1 both regions have the smallest mereo-distance if and only if they have the same area (the value being equal to 2/3 - √3/(2π) ≈ 0.391). From the control point of view, every region included in the non-collision region is also a non-collision region. If there exists a non-collision region x whose distance to a receiver region y is minimal, then the non-collision region x is near to the receiver region y, in symbols x near y. The intention of movement,
• GOAL OF CONTROL: move the robot near to the receiver region
can be replaced by a formula
• MEREOLOGICAL DESCRIPTION: x near y.
The goal of control can be obtained by finding the proper robot orientation and continuing the movement. The algorithm of control is based on a set of rules that have been induced from an analysis of the robot movements. We denote with the symbols κ(O, J), κ(O, S), κ(O, P), respectively, the mereo-distances between the receiver region at O and regions (disks) of radius d centered, respectively, at the points J, S, P in Fig. 40.1. Clearly, κ(O, J) < κ(O, S) < κ(O, P). The symbol κ will denote the distance between the non-collision and the receiver regions after the movement is made. Table 40.1 presents basic rules determining the rotation angle α of the robot as a function of the distance κ.

Table 40.1. Rotation angle as a function of mereo-distance

mereo-distance | center distance | angle of rotation
κ(O, P)        | l - d           | 0
κ(O, S)        | l               | π/2
κ(O, J)        | l + d           | π
The form of the function κ → α, of which Table 40.1 is an approximation, is set as

α = arctan((κ(O, P) - κ(O, J)) · . . .).   (40.9)

The direction of rotation is set by the decision rule
if α_present > α_previous then change the direction of rotation,   (40.10)

where α_present, α_previous denote the current, respectively previous, value of the rotation angle. The algorithm stops when the non-collision region is greater than the receiver region. This criterion depends on the robot's environment. When the robot moves in a wide space it can stop at a much greater metric distance to the receiver in comparison to the case when the robot moves in a bounded space, limited by walls (e.g., a corner).
40.6.2 Driving the robot in a straight line
The intention of movement:
• GOAL OF CONTROL: move the robot in a straight line,
can be replaced by a mereological formula
• MEREOLOGICAL DESCRIPTION: x π Eql,
where Eql is the straight line constrained by equation (40.4) - all regions (points) meeting the requirements of equation (40.4) are on this straight line.
40.6.3 Turning the robot
Turning the robot is a more advanced task of robot navigation, which can be reduced to simpler tasks. That process can be executed in three stages: 1. go straight, 2. reach the point of changing the navigating pair of receivers, 3. go following the new navigating points. The first and third stages are tasks of 'going along the straight line'. The problem is to identify the moment of switching the pair of receivers. The line segments of the navigating trajectory do not have to be perpendicular.
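As a summary of the coming-near step of Sect. 40.6.1, the C sketch below implements only a piecewise-linear reading of Table 40.1 together with the direction-flip rule (40.10); it does not reproduce the analytic form (40.9), and all identifiers and sample values are our own assumptions rather than the actual controller code.

```c
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Reference mereo-distances for a non-collision disk placed at P, S, J
 * (center distances l-d, l, l+d from the receiver center O), cf. Table 40.1. */
struct refs { double k_P, k_S, k_J; };

/* Rotation angle for the measured mereo-distance k: piecewise-linear
 * approximation of Table 40.1 (k_P -> 0, k_S -> pi/2, k_J -> pi).           */
static double rotation_angle(struct refs r, double k)
{
    if (k >= r.k_P) return 0.0;
    if (k <= r.k_J) return M_PI;
    if (k >= r.k_S)                              /* between k_S and k_P */
        return (M_PI / 2.0) * (r.k_P - k) / (r.k_P - r.k_S);
    return M_PI / 2.0 +                          /* between k_J and k_S */
           (M_PI / 2.0) * (r.k_S - k) / (r.k_S - r.k_J);
}

/* Decision rule (40.10): reverse the direction if the angle grew. */
static int direction(double a_present, double a_previous, int dir)
{
    return (a_present > a_previous) ? -dir : dir;
}

int main(void)
{
    struct refs r = { 0.30, 0.20, 0.12 };   /* sample reference values   */
    double a_prev = M_PI / 2.0, k = 0.17;   /* hypothetical measurement  */
    int dir = +1;

    double a = rotation_angle(r, k);
    dir = direction(a, a_prev, dir);
    printf("rotate %.2f rad, direction %+d\n", a, dir);
    return 0;
}
```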
40.7 Experimental results
The tasks of robot navigation described previously were simulated and tested in real-world conditions. In the real-world experiments we used the Pioneer P2DX robot, produced by ActivMedia Corporation. In Fig. 40.4 an exemplary trajectory of turning the robot and coming near to the receiver region is presented, in the case when the robot was initially oriented at 90°.
Fig. 40.4. Task of turning and coming near to the receiver region at initial orientation 90°
40.8 Conclusion
In this paper we have presented a method for robot environment description based on a rough mereological approach, using disks as primitive objects best suited to the robot's sensory abilities (i.e., an omnidirectional sonar and a system of sonar receivers). A scheme for converting natural language spatial expressions into low-level physical descriptions and a dynamical control system using rough mereological spatial reasoning have been proposed and tested in real-world conditions.
References
1. Arkin R (1998) Behavior Based Robotics. MIT Press, Cambridge MA
2. van Benthem JFAK (1983) The Logic of Time. Reidel, Dordrecht
3. Brooks RA (2002) Robot: The Future of Flesh & Machines. ePenguin
4. Coventry KR (1996) Spatial preposition, functional relation and lexical specification. In: Olivier P, Gapp (eds) Representation and Processing of Spatial Expression. Lawrence Erlbaum Associates
5. Srzednicki JTJ, Rickey VF (eds) (1984) Lesniewski's Systems: Ontology and Mereology. Ossolineum, Wroclaw
6. Lukasiewicz J, O zasadzie sprzecznosci u Arystotelesa. PWN, Warszawa
7. Ming Xie (2003) Fundamentals of Robotics. Linking Perception to Action. World Scientific, New Jersey
8. Pawlak Z, Skowron A (1994) Rough membership function. In: Yager RR, Fedrizzi M, Kacprzyk J (eds) Advances in the Dempster-Shafer Theory of Evidence. Wiley, New York
9. Polkowski L (200x) Mereological foundations to approximate reasoning. This volume
10. Polkowski L, Skowron A (1997) Rough mereology: A new paradigm for approximate reasoning. Intern. J. Approx. Reas. 15(4): 333-365
11. Szmigielski A (2002) Rough mereological localization and navigation. In: Proceedings RSCTC 2002, Rough Sets and Current Trends in Computing. Lecture Notes in AI 2475, Springer, Berlin
12. Szmigielski A, Polkowski L (2003) Computing from words via rough mereology in mobile robot navigation. In: Proceedings IROS 2003, IEEE/RSJ International Conference on Intelligent Robots and Systems. Las Vegas, USA
41
Data Acquisition in Robotics
Krzysztof Luks
Polish-Japanese Institute of Information Technology, ul. Koszykowa 86, 02-008 Warszawa, Poland
kluks@ai.pjwstk.edu.pl
Summary. This paper presents the data acquisition system for a humanoid head robot built at the PJIIT Robotics and Multiagent Systems Laboratory. The real-time operating system QNX Neutrino was used together with a dedicated hardware driver to provide a low-overhead, high-throughput data acquisition system. The driver architecture was determined by the environment it was operating in and the software it interfaced with.
41.1 Introduction
In many applications in robotics one needs to gather data from sensors probing the outside environment. In typical cases sensors will range from CCD cameras to sonars and gyroscopes to simple temperature sensors. The need for data obtained by means of sensors arises from the necessity of converting sensory readings ("perceptions") to actions taken by robotic systems; this conforms to the metaphor of "linking perceptions to actions" [1]. In all cases the result of a sensor's measurement has to be converted into digital form and transferred via some kind of interface to the computer's memory. Only then can the processing software gain access to it. Some sensors have the analog to digital converter integrated or may even be the converter itself (e.g. CCD matrix cameras), while others need a separate converting element. The system may also need to provide a feedback link to enable applications to send control commands to hardware components. Possible commands include controlling camera focus, picture resolution or data sampling frequency. Some sensors are more "intelligent" than others and require less sophisticated controlling software. E.g. cameras can implement picture adjustment algorithms, such as white balancing or auto-focus, in hardware, significantly reducing the processing power required to acquire images of satisfying quality. The difficulty in getting sensor data to the computer depends mainly on the nature of the data - its amount and transfer speed. In the case of streaming high resolution video, special care has to be taken to reduce transfer overhead to a minimum so that it won't use up all the system resources. A single camera with a 512 by 512 pixel image with
Fig. 41.1. Generic sensing information flow model, based on [1]
256 intensity levels per pixel generating 30 images a second produces 24 megabytes of data every second [2]. Also, the system environment in which the data processing application will run has to be taken into account. With many processes running on the same computer system, proper care needs to be taken to ensure uninterrupted and timely execution of all programs. Failure to meet this requirement may lead to data loss which in consequence can cause damage to the robot, its malfunction, or pose a threat to people or property. In the most basic case a data acquisition system will consist of a simple sensor integrated with an analog to digital converter connected via a serial port to a, possibly embedded, dedicated computer system. A good example of such a setup is a temperature sensor from which only a few bytes of data are read every few seconds. These issues can be addressed by using a hard realtime operating system which will ensure that time constraints are met.
41.2 Concepts of realtime computing
There are many definitions of realtime computing. Quoting the comp.realtime newsgroup FAQ, one can define a realtime system as one in which the correctness of the computations not only depends upon the logical correctness of the computation but also upon the time at which the result is produced. If the timing constraints of the system are not met, system failure is said to have occurred. Traditionally, realtime operating systems have been used in "mission-critical" environments requiring realtime capability, where failure to perform computations in a certain time frame can result in harm to persons or property. Such systems include for example medical equipment or industrial process monitoring. Recently, however, another field of application of realtime computing has become popular: systems where failure to meet time constraints results in a financial penalty or a considerable loss of quality of service. Such systems include consumer
multimedia devices, where dropped video frames exceeding a certain amount make such a device unacceptable to the customer.
41.2.1 Non-realtime operating systems
The key characteristic that separates an RTOS from a conventional OS is the predictability needed to meet the requirements above. A conventional (monolithic) OS uses "fair" process and thread scheduling algorithms. This doesn't guarantee that realtime threads finish their processing on time. Also, priority information is, in most cases, lost during kernel calls being performed on behalf of a client thread. This results in unpredictable delays preventing an activity from completing on time.
41.2.2 Realtime microkernel architecture
The microkernel architecture used in the QNX RTOS leaves only the most basic tasks to be performed by the OS kernel. These tasks include managing threads and processes and passing messages between them. Scheduling for execution is done on a per-thread basis and high-priority threads can preempt lower-priority ones when they become ready for execution.
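A minimal POSIX sketch of this per-thread, priority-based scheduling is given below. The calls are standard pthread functions rather than anything taken from the QNX documentation, and the chosen priority offset is arbitrary; on QNX Neutrino such a thread preempts lower-priority ones as soon as it becomes ready.

```c
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *worker(void *arg)
{
    (void)arg;
    /* time-critical work would run here, at the priority set by the creator */
    printf("high-priority worker running\n");
    return NULL;
}

int main(void)
{
    pthread_attr_t     attr;
    struct sched_param prm;
    pthread_t          tid;

    pthread_attr_init(&attr);
    /* use the attribute's scheduling settings instead of inheriting ours */
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);   /* fixed priorities */
    prm.sched_priority = sched_get_priority_min(SCHED_FIFO) + 10;
    pthread_attr_setschedparam(&attr, &prm);

    if (pthread_create(&tid, &attr, worker, NULL) != 0) {
        fprintf(stderr, "pthread_create failed (insufficient rights?)\n");
        return 1;
    }
    pthread_join(tid, NULL);
    return 0;
}
```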
Fig. 41.2. Microkernel message bus
All other functionality, such as device drivers and OS services, exists as separate processes and doesn't run within the kernel. In QNX such processes are called resource managers and use the IPC interface provided by the microkernel as a message
bus to exchange data (Fig. 41.2). This provides complete network transparency, as the microkernel automatically recognises whether a data transfer can be accomplished by a simple memory copy operation or by use of a local area network. Separating device drivers from the core kernel has the additional benefit of protecting the system from accidental memory corruption caused by badly written code. Also, all processing is done at a priority determined by the thread on whose behalf it is performed [3].
41.2.3 QNX Resource Managers
Resource managers register a pseudo-file element in the pathname space (e.g. /dev/nudaq). Such pseudo-files can be accessed by standard POSIX I/O functions like open(), read(), write(). When this happens, the resource manager receives an open request, followed by read and write requests. Resource managers not only deal with hardware devices, but can also provide functionality like filesystem interfaces. They are not restricted to handling just open(), read() and write() calls but can support any functions that are based on a file descriptor or file pointer, as well as other forms of IPC. In QNX Neutrino, resource managers are responsible for presenting an interface to various types of devices. In other operating systems, the managing of actual hardware devices (e.g. serial ports, parallel ports, network cards, and disk drives) or virtual devices (e.g. /dev/null, a network filesystem, and pseudo-ttys) is associated with device drivers. But unlike device drivers, the Neutrino resource managers execute as processes separate from the kernel.
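From the client's point of view the resource manager therefore looks like an ordinary file. The sketch below reads one block from /dev/nudaq with plain POSIX calls; the path comes from the text, while the 16-bit sample format is our assumption, not a documented interface.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int16_t buf[256];                     /* assumed: signed 16-bit samples */
    int fd = open("/dev/nudaq", O_RDONLY);
    if (fd == -1) {
        perror("open /dev/nudaq");
        return 1;
    }

    ssize_t n = read(fd, buf, sizeof buf); /* served by the resource manager */
    if (n > 0)
        printf("read %zd bytes, first sample %d\n", n, buf[0]);

    close(fd);
    return 0;
}
```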
41.3 Data acquisition subsystem in project Paladyn
Fig. 41.3. Data acquisition scheme in project Paladyn (hardware layer: gyroscope, microphones, NuDaq 9112; software layer: image stabilisation module, sound sources separation module, microphone resource manager)
41.4 NuDaq 9112
The analog to digital converter used for data acquisition purposes in project Paladyn was the NuDaq 9112 32-bit PCI card. It has 16 single-ended or 8 differential analog inputs and its own FIFO buffer. The card supports sampling with a frequency up to 110 kHz. Conversion can be initiated by one of three sources:
Software Trigger: A single value conversion is performed when a 1 is written into NuDaq's STR register. This mode is suitable for low frequency conversion because of the big CPU overhead imposed by subsequent writes to the STR register.
Timer Pacer: The NuDaq card is equipped with 3 programmable counters that can be used to trigger conversion at a fixed frequency. When used together with DMA data transfer this mode is suitable for high speed conversion with very low CPU usage.
External Trigger: An external frequency generator can be used to synchronise NuDaq's conversion speed with external devices.
41.4.1 Data transfer modes
When the conversion is complete and data is stored in the card's internal buffer, one of the following modes is used to transfer data to the computer's memory:
Polling: Used in conjunction with the software trigger. The software must check the state of the DRDY bit, which is set to 1 when data become available. Then it can read the converted value from the data register.
Interrupt driven transfer: The NuDaq 9112 can use hardware interrupts to send data to the PC. In this mode the card generates an interrupt each time the conversion is completed. One can set up an interrupt handler that will copy the data from the card's buffer to PC memory. This mode is asynchronous.
Direct Memory Access: The card "pushes" data to a pre-allocated buffer in the computer's memory and notifies it by signalling an interrupt when the transfer is complete. This method uses a double buffering technique that, combined with DMA transfer, reduces CPU usage to almost 0%.
41.4.2 Driver architecture
The core element of the NuDaq driver is a double buffer holding data transferred by the card and making it available to other processes via shared memory. One half of the double buffer holds samples that are being transferred from the A/D converter while the other half holds samples that can be read by other processes. When the first half becomes full, their roles are exchanged. During driver initialisation the PCI bus is programmed with the physical address of an intermediate contiguous buffer that is used as temporary storage for data copied from NuDaq's FIFO buffer to the main driver buffer (Fig. 41.4). Each time the FIFO buffer fills, the NuDaq card orders the PCI controller to copy data to the contiguous buffer.
Resource manager interface (/dev/nudaq) is used both for configuring the hardware and for retrieving data.
Fig. 41.4. Driver architecture using only the resource manager interface
The DMA architecture requires that the memory area to which data is copied by the PCI controller is contiguous and is located in the first 16 megabytes of computer memory. Additionally, the buffer address passed to the PCI controller must be a physical memory address. Therefore an additional intermediate buffer was added to comply with the above requirements.
41.4.3 Driver operation
During startup the driver performs initialisation of the PCI bus and the NuDaq card. Next, all buffers are allocated and an interrupt handler thread is spawned. This handler is responsible for copying data from the intermediate buffer to the double buffer. Then two files are registered in the local namespace: /dev/PCI9112W0 and /dev/shmem/nudaq. The former is used by clients to send control commands, the latter is an access point to the shared memory area containing sampled data. The driver then awaits incoming events. The main thread responds to the opening of the /dev/PCI9112W0 file, while the interrupt handler thread sleeps and is awakened when NuDaq's FIFO fills up and the PCI controller finishes transferring this data to the intermediate buffer. It then copies the data to the appropriate half of the double buffer and checks if the active half needs to be changed. In such an event the driver changes the beginning of the shared memory area address to the address of the currently active half of the double buffer.
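The half-swapping step performed by the interrupt handler thread can be sketched as follows. This is only an outline with invented names (nudaq_drain_fifo, publish_half, SAMPLES_PER_HALF) and stubbed hardware access, not the actual driver source.

```c
#include <stdint.h>
#include <stdio.h>

#define SAMPLES_PER_HALF 8            /* tiny value, only for this demo        */

static uint16_t dbl_buf[2][SAMPLES_PER_HALF]; /* the driver's double buffer    */
static int      active   = 0;                 /* half currently being filled   */
static size_t   fill_pos = 0;                 /* write position in that half   */

/* Stand-ins (invented for this sketch) for the real driver pieces:
 * nudaq_drain_fifo copies samples arriving through the intermediate DMA
 * buffer, publish_half re-points the shared-memory window at the full half.  */
static size_t nudaq_drain_fifo(uint16_t *dst, size_t max)
{
    static uint16_t next = 0;
    size_t n = (max < 4) ? max : 4;           /* pretend the FIFO held 4 samples */
    for (size_t i = 0; i < n; i++) dst[i] = next++;
    return n;
}

static void publish_half(const uint16_t *half, size_t count)
{
    printf("published half starting with sample %u (%zu samples)\n",
           (unsigned)half[0], count);
}

/* Called from the interrupt handler thread after each DMA transfer completes. */
static void on_dma_complete(void)
{
    fill_pos += nudaq_drain_fifo(&dbl_buf[active][fill_pos],
                                 SAMPLES_PER_HALF - fill_pos);
    if (fill_pos == SAMPLES_PER_HALF) {       /* this half is full: swap roles  */
        publish_half(dbl_buf[active], SAMPLES_PER_HALF);
        active   = 1 - active;                /* clients read the full half     */
        fill_pos = 0;                         /* while the other one refills    */
    }
}

int main(void)
{
    for (int i = 0; i < 4; i++)               /* simulate four DMA interrupts   */
        on_dma_complete();
    return 0;
}
```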
Only configuration requests are passed through /dev/nudaq. Data is read from a shared memory segment pointing to the filled half of the driver's double buffer.
Fig. 41.5. Driver architecture using shared memory
In this way clients only deal with a single buffer available through the shared memory access point /dev/shmem/nudaq. The client library offers the function AI_AsyncDblBufferHalfReady, which checks if the current half of the double buffer is ready to be read. It uses the /dev/PCI9112W0 resource manager interface to complete this task. If this function returns a non-zero value the client can copy data from the shared memory region to its local buffer.
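On the client side this access point can be mapped with standard POSIX shared-memory calls. The sketch below is an illustration only: the buffer size is assumed, and the half-ready test is modelled by a hypothetical helper (half_ready) that reads a one-byte status flag from the control device, since the exact signature of the vendor call AI_AsyncDblBufferHalfReady is not reproduced here.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HALF_BYTES 8192            /* assumed size of one double-buffer half */

/* Hypothetical stand-in for the library's half-ready test. */
static int half_ready(int ctl_fd)
{
    uint8_t flag = 0;
    return (pread(ctl_fd, &flag, 1, 0) == 1) && flag;
}

int main(void)
{
    int ctl = open("/dev/PCI9112W0", O_RDONLY);
    int shm = shm_open("/nudaq", O_RDONLY, 0);      /* i.e. /dev/shmem/nudaq */
    if (ctl == -1 || shm == -1) {
        fprintf(stderr, "device or shared memory not available\n");
        return 1;
    }

    const int16_t *samples = mmap(NULL, HALF_BYTES, PROT_READ, MAP_SHARED, shm, 0);
    if (samples == MAP_FAILED) { perror("mmap"); return 1; }

    int16_t local[HALF_BYTES / sizeof(int16_t)];
    while (!half_ready(ctl))
        usleep(1000);                               /* wait for a full half  */
    memcpy(local, samples, HALF_BYTES);             /* copy into own buffer  */
    printf("copied %zu samples, first = %d\n",
           sizeof local / sizeof local[0], local[0]);

    munmap((void *)samples, HALF_BYTES);
    close(shm);
    close(ctl);
    return 0;
}
```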
41.5 Applications
The NuDaq board controlled by the described driver was successfully used in project Paladyn for reading data from microphones and gyroscopes. The board was set up to sample data from two microphones at a 44 kHz frequency and from a piezoelectric gyroscope at a 2 kHz frequency. The sampled data was read by two concurrently running applications: the image stabilisation module and the sound separation engine. They both wait for data to become available and copy samples from the shared memory buffer to their local address space.
41.6 Conclusions
Thanks to the flexible nature of the QNX microkernel architecture it was possible to build a stable and reliable driver. Retaining source-level compatibility with the Linux client library was also possible. The driver was tested in project Paladyn, where it was used to obtain sound samples from microphones and data from gyroscopes and accelerometers.
References
1. Xie M (2003) Fundamentals of Robotics. Linking Perception to Action. World Scientific, New Jersey London Singapore Hong Kong
2. Arkin RC (1998) Behavior-Based Robotics. MIT Press, Cambridge London
3. QNX Software Systems (2003) QNX system documentation
4. Luks K (2003) System stabilizacji obrazu i akwizycji danych dla glowy robota humanoidalnego. MSc Thesis, Polish-Japanese Institute of Information Technology, Warsaw (in Polish)
42
Spatial Sound Localization for Humanoid
Lech Blazejewski
Polish-Japanese Institute of Information Technology, PJIIT Robotics and Multiagent Systems Lab., Koszykowa 86, 02-008 Warsaw, Poland
[email protected]
Summary. The problem of sound source separation and localization is a challenging task not commonly addressed in today's humanoid robotics projects. Additional (prior to visual) spatial auditory information is required to control the humanoid head's attention more accurately. This is done in order to augment human-robot interaction. The humanoid head robot operates in a noisy, dynamic environment. A system capable of handling random noise in order to separate sound sources and localize them in the robot's coordinate system is presented.
Key words: humanoid robot, auditory localization, spectral analysis
42.1 Introduction
The idea of embodiment lies beneath most of today's humanoid robotics projects. In order to achieve the robot's intelligent behaviour and make it very human-like, it is necessary to make available to the robot the same sensory modalities that are available to humans. Embodiment could be the answer to the problem. It allows emulation of human senses as artificial sensory systems. It is a good idea to start the embodiment process when constructing a humanoid head, as it contains the greatest number of human senses. The robot's intelligent behaviour relates to its interaction with the surrounding world. In this paper we discuss the problem of sound localization. Sound source separation and localization is an important task on the way to attention control. Some events (such as a talking person) happening in the robot's surroundings could possibly be out of sight. With the detection of a significant sound source and the ability to localize it, the robot could turn its attention to the auditory event, saccading into a new area. With the help of an auditory localization system, the robot's attention would be much more flexible, allowing a new modality of interaction. A humanoid head named „Paladyn" has been constructed in order to sustain human-like sensory systems - optokinetic and auditory modules. The auditory hardware consists of two microphones 18 cm apart, mounted on the head's corpus. The model is assumed to be linear, as there is no spherical casing - the sound can easily pass
through the head interior, where the shortest possible way between the ears is approximated by a straight line.
Fig. 42.1. The "Paladyn" humanoid head
We will first discuss localization methods for the head's spatial configuration. Then we will focus on the application of the whole auditory module, explaining technical issues of the process of sound localization for the "Paladyn" head.
42.2 Binaural localization methods
In the late 19th century John William Strutt, also known as Lord Rayleigh, formulated the so-called duplex theory. In this model he explained the process of human sound localization using classical wave physics. Duplex theory considers a sinusoidal sound source located to one side of the human head. The sound reaching the further ear is delayed in time and has a lower amplitude. Strutt's theory defines two major localization cues: Interaural Time Difference (ITD) and Interaural Intensity Difference (IID). A typical human head is about 18 cm in diameter (the ear-to-ear distance, reflected in the robotic head construction). The model of the head is simplified to two receivers separated by a linear or spherical distance. The Interaural Time Difference is derived straightforwardly from the phase difference of the waves perceived at the same time instant. When the sound source is located to one side of the head, the signal received in the closer ear has a different phase than that perceived in the other ear at the same time. This cue is unambiguous for low-frequency sounds, where the wavelength is longer than the interaural distance. In the case of higher-frequency sounds there is growing ambiguity, because the measured phase differences may span different wave cycles.
For sounds of wavelengths shorter than the interaural distance, diffraction is very small. The waves are scattered and suppressed, creating a significant acoustical shadow on the opposite side of the head. The difference in amplitude of the sound received in both ears can reach up to 20 dB. This sort of shadowing does not occur for long sound waves, for which the diffraction is strong. The sound localization cues are therefore limited in accuracy as a function of frequency. For a head of linear structure, where the ears (microphones) are separated only by a linear distance of 18 cm, the ITD cue is useful up to frequencies of about 1911 Hz. Humans perceive sounds in the 20 Hz to 20 kHz range. For the linear head, sound wave diffraction (as well as the validity of the IID cue) starts from about 1911 Hz and gets stronger with higher frequencies. The conclusion is that the ITD and IID cues are complementary and, used together, enable us to localize sounds across the whole audible spectrum. The localization methods presented in the duplex theory explain horizontal localization only. Moreover, the IID and ITD cues can produce uncertain data that could be falsely interpreted; a geometric analogy explains the problem.
42.2.1 ITD and sound localization
The ITD can be understood as the time of flight of the wave over the extra distance between the ears (microphones). That distance can be expressed as the difference (Interaural Distance Difference, IDD) of the path lengths (D_{left}, D_{right}) travelled by the sound to both ears:

IDD = D_{left} - D_{right}   (42.1)
Fig. 42.2. Geometry of linear head model
With an assumed constant speed of sound in air, V_{sound} \approx 344 m/s, the IDD can be expressed as the path travelled by a sound wave in the time ITD:
IDD = ITD \cdot V_{sound}   (42.2)
The remaining problem is to define the relationship between the IDD and the sound source position in the head's surroundings. The construction of our humanoid head „Paladyn" can be approximated as linear in terms of interaural space, which leads to quite a simplification of the model: L and R are the left and right ear (microphone), separated by the robot's head diameter B, and S represents the sound source. When all these parameters are known, the IDD can be expressed as the difference

IDD = |LS| - |RS| = \sqrt{(x + B/2)^2 + y^2} - \sqrt{(x - B/2)^2 + y^2}   (42.3)
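The relations (42.2) and (42.3) are easy to check numerically. The following sketch (Python; the 18 cm interaural distance and 344 m/s speed of sound are the values used in the text, all names are illustrative) computes the IDD both from an assumed ITD and from the source geometry.

import math

SPEED_OF_SOUND = 344.0   # m/s, value used in the text
B = 0.18                 # interaural distance of the "Paladyn" head, in metres

def idd_from_itd(itd_s: float) -> float:
    """Path-length difference (42.2): IDD = ITD * V_sound."""
    return itd_s * SPEED_OF_SOUND

def idd_from_geometry(x: float, y: float) -> float:
    """Path-length difference (42.3) for a source S(x, y) and ears at (+-B/2, 0)."""
    d_left = math.hypot(x + B / 2, y)
    d_right = math.hypot(x - B / 2, y)
    return d_left - d_right

# Example: a source 1 m away, 30 degrees to the right of the median plane
x, y = math.sin(math.radians(30)), math.cos(math.radians(30))
print(idd_from_geometry(x, y))                    # positive: the left path is longer
print(idd_from_geometry(x, y) / SPEED_OF_SOUND)   # the ITD this IDD corresponds to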
The ITD cue alone is not enough to localize the source precisely. There is an infinite number of pairs of paths D_{left}, D_{right} resulting in an identical ITD value. In fact, all the possible sources are located on the surface of a hyperboloid, also called a cone of confusion.
Fig. 42.3. Hyperbola asymptotes approximate the sound source location
As can be seen, at a sufficient distance the direction towards the sound source converges to the asymptote of the hyperbola. We need to rewrite the IDD equation in order to find the asymptote formula:

x^2/\alpha - y^2/\beta = 0   (42.4)

where

\alpha = IDD^2/4, \qquad \beta = (B^2 - IDD^2)/4   (42.5)

The asymptotes can be written as functions of the variables \alpha and \beta:

A_1: y = \sqrt{\beta/\alpha}\, x, \qquad A_2: y = -\sqrt{\beta/\alpha}\, x   (42.6)

The inclination of the asymptote over the x (binaural) axis corresponds uniquely to the location of the sound source and is expressed as:

\phi_{A_1} = \arctan\sqrt{\beta/\alpha}, \qquad \phi_{A_2} = \arctan(-\sqrt{\beta/\alpha})   (42.7)
This localization approach is unable to differentiate the front-back location of the sound source.
42.2.2 IID and sound localization
When the sound source emits wavelengths smaller than the diameter of the head, scattering of the wave occurs, which produces significant sound suppression and acoustical shadowing. This results in a noticeable amplitude difference (IID) between the signals recorded in both ears. The IID cue is directionally dependent. With the sound located in front of the head the IID is approximately equal to zero; with the source moving to the side of the head, one receiver acquires sound of weaker amplitude. The IID is the measure of the difference of acoustical power dissipated on the receivers, expressed in decibels:

IID = 20 \log_{10}(S_{right}/S_{left})   (42.8)

where S_{right} and S_{left} are the signals or the signal groups acquired in the right and left ear. The data processing is done on discrete samples. The IID estimation can be done on each individual pair of samples from the left and right ear, and also on the mean value of all the samples in the left and right sampling windows. IID-based localization only assists ITD localization in resolving the left-right ambiguity.
Fig. 42.4. Interaural Intensity Differences as a function of sound source location and intensity. Courtesy of C. J. Moore [2]
42.3 Binaural cues estimation
The practical problem in the implementation of the sound localization system reduces to the estimation of the binaural cues: ITD and IID. The system solves the problem in two separate ways: full spectral analysis and correlation-based processing. The resulting ITDs and IIDs are then combined in the DPF (see the diagram), giving enough knowledge to estimate the sound source position.
Fig. 42.5. Auditory localization module diagram
42.3.1 A/D Conversion
First the analog signals from the microphones are A/D converted. We use the NuDaq PCI-9112 converter, allowing conversion speeds up to 55 kHz per channel (when converting two channels) at 12-bit resolution. Because of the card's DMA capability, hard real-time conversion and analysis became possible. The choice of conversion speed was a tradeoff between the perceived frequency range, the time available for analysis and how precise the angular estimation of the source could be. The human audible range lies between 20 Hz and 20 kHz. According to the Nyquist law the sampling frequency should be 40 kHz or higher. The practical choice was 44 kHz. That choice fixed the sample length at 22.7 µs, which leads to a maximum theoretical precision of localization (for the linear head model) of 3.9°. Another problem was the length of the sampling window. Again it was a tradeoff between its size and the hard real-time requirements of the auditory localization system. A window of 2048 samples has been chosen, lasting 46 ms and providing plausible quality.
(When calculating the ITD through the correlation of two signals, the minimal step of the correlation is also the minimal computable difference; the size of this minimal step, identical to the sample length, can be understood as the resolution of the correlation process. With the linear „Paladyn" head, 18 cm in diameter, the time that sound travels from one ear to the other equals 523 µs; taking the sample period T = 22.7 µs, we divide it into about 23 sections, i.e. distinguishable localization slices in the 90-degree localization area, which leads to the 3.9-degree localization precision.)
The auditory localization system works in cycles. Each cycle
starts with the A/D conversion, which lasts 46 ms; during that time all the computation (the estimation of ITD/IID and the localization) has to be finished before the new batch of sample data arrives. This enables the auditory system to work at about 21 Hz, which complies with the real-time strategy.
42.3.2 Discrete Fourier Transform
All the localization analysis is done in the frequency domain. The next step of data processing is the time-to-frequency domain shift using the FFTW (Fastest Fourier Transform in the West) implementation of the Fourier Transform. This process is biologically plausible, as an analogous process of frequency coding is found in the human cochlea. To improve the resolution of the spectrogram we use the zero supplementation technique (zero padding). The resulting spectrogram resolution is about 10 Hz.
42.3.3 Spectral processing
Spectral localization processing occurs in the frequency domain. All the frequencies representing pure tones are analyzed.
Tone extraction
Tone extraction helps to avoid analysis of the spectral leakage peaks. It significantly lowers the computation time, as the processing occurs only for peaks f_n meeting the condition:

f_{peak} := f_n \text{ such that } f_{n-1} < f_n > f_{n+1}, \text{ where } f_n > \text{threshold}   (42.9)
Spectral ITD calculation
The Interaural Time Difference is proportional to the phase difference (IPD, Interaural Phase Difference) of any given subband pair. Using the Fourier transform of the interaural signal we calculate the IPD for every peak frequency in the spectrum:
'^""••'='-•" (l|life§) -"'- (IHfes) '-••»' IPD is expressed in radians. The evalutaion of ITD^ is straightforward, given the frequency of the peak: IPD ITDf^... = ^i^:r(42.11) ^'^ Jpeak
Footnotes: the zero supplementation consists of supplementing the 2048 window samples with an additional 2048 zeros; a peak is a local maximum whose amplitude exceeds the given threshold; by subband pair we understand any pair of peaks of the interaural signals that comply with the tone condition; by interaural signal we mean the two signals received at the two spatially separated ears; the ITD is measured in seconds of delay.
Spectral IID calculation
For every peak frequency in the analyzed time slice there is also a slight intensity difference (varying with the frequency and the source position). The IID is expressed as a measure of the momentary power difference:

IID_{f_{peak}} = \frac{P_R(f_{peak})}{P_L(f_{peak})}   (42.12)

where

P_L(f_{peak}) = (\Re(S_L(f_{peak})))^2 + (\Im(S_L(f_{peak})))^2   (42.13)

P_R(f_{peak}) = (\Re(S_R(f_{peak})))^2 + (\Im(S_R(f_{peak})))^2   (42.14)
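A minimal sketch of this spectral path, assuming 44 kHz sampling and zero-padded windows as described above; the function and variable names are illustrative, and the IID is computed as the power ratio of (42.12) as reconstructed here.

import numpy as np

FS = 44000          # sampling frequency used in the text (Hz)

def spectral_cues(left: np.ndarray, right: np.ndarray, threshold: float = 1.0):
    """Per-peak ITD (42.10-42.11) and IID (42.12-42.14) from one analysis window."""
    n = len(left)
    sl = np.fft.rfft(left, 2 * n)        # zero padding doubles the window, as in the text
    sr = np.fft.rfft(right, 2 * n)
    freqs = np.fft.rfftfreq(2 * n, d=1.0 / FS)
    mag = np.abs(sl)
    cues = []
    for i in range(1, len(mag) - 1):
        # tone condition (42.9): a local maximum above the threshold
        if mag[i] > mag[i - 1] and mag[i] > mag[i + 1] and mag[i] > threshold:
            ipd = np.angle(sl[i]) - np.angle(sr[i])          # (42.10)
            ipd = (ipd + np.pi) % (2 * np.pi) - np.pi        # wrap to (-pi, pi]
            itd = ipd / (2 * np.pi * freqs[i])               # (42.11)
            p_left = sl[i].real ** 2 + sl[i].imag ** 2       # (42.13)
            p_right = sr[i].real ** 2 + sr[i].imag ** 2      # (42.14)
            iid = p_right / p_left                           # (42.12)
            cues.append((freqs[i], itd, iid))
    return cues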
IID enforced ITD localization
The IID cue is measured in decibels and, in the case of a real-time system, can only be used for left-right sound source differentiation. For frequencies up to about 1911 Hz it is worth calculating the ITD (in the case of the linear model of the „Paladyn" head). For higher frequencies there is growing ambiguity, and only the IID remains as a localization cue. Because calculating the IID and ITD is computationally cheap, the processing is done for all the tonal peaks in the spectrum. With the known ITD and IID, the interaural distance and the speed of sound in air V_{sound} = 344 m/s, the localization of the peaks is derived from the hyperbola asymptote equation:

\phi_{A_1,A_2} = \pm\arctan\left(\frac{\sqrt{B^2 - (V_{sound} \cdot ITD)^2}}{V_{sound} \cdot ITD}\right)   (42.15)
There is a left-right and front-back localization ambiguity, because a single ITD value reflects four possible positions of the sound source. This can be avoided using the IID cue. The IID is a measure of the proportion between the left and right signal powers (as shown in equation 42.12). If P_{left} > P_{right} (the signal is stronger on the left side), the value of the IID lies in [0, 1]; when the situation is opposite, P_{right} > P_{left}, the values are between [1, ∞).
Footnotes: it is possible, however, to use the IID information to create a directional filtering function of the head; such localization methods are related to HRTFs (Head Related Transfer Functions) and, being computationally expensive and not robust to random noise, are of no use in real-time, real-world reactive auditory systems. The interaural distance is 18 cm for the „Paladyn" head.
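The mapping from the cues to an angle can be sketched as follows; this is a minimal illustration, and the sign convention (angle measured from the binaural axis, with the IID ratio deciding the side) is an assumption made for the example only.

import math

SPEED_OF_SOUND = 344.0
B = 0.18   # interaural distance (m)

def azimuth_from_itd(itd_s: float, iid: float) -> float:
    """Asymptote inclination over the binaural axis (42.15), in degrees.

    iid is the power ratio P_right / P_left, used here only to pick the side:
    the returned angle is measured from the +x (right-ear) direction.
    """
    idd = min(abs(itd_s) * SPEED_OF_SOUND, B * 0.999)   # clamp numerical overshoot
    if idd == 0.0:
        phi = 90.0                                      # source on the median plane
    else:
        phi = math.degrees(math.atan(math.sqrt(B * B - idd * idd) / idd))
    return phi if iid > 1.0 else 180.0 - phi            # right side vs. left side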
42.3.4 Correlation processing
Still, even with localized tonal frequencies we need to refine the sound sources. Each sound source consists of one or more tonal frequencies. A good way to extract the major sources is to correlate the two signals in the time domain. We can understand the correlation of the two (left and right ear) signals as a similarity expressed as a function of the time delay τ. Every peak in the correlogram means a higher similarity between the signals at a given delay (interpreted as an ITD) and, in fact, a possible angular position of a sound source. To simplify the detection of local maxima in the correlogram, the input signals are half-wave rectified. This means treating all the negative values of the time-domain signals as zero (so the redundant information enclosed in the signal is lost). Because the aim of the correlation process is to extract the ITD, represented by τ seconds as read in the correlogram, it is appropriate to apply low-pass filtration (LPF) to the 20 Hz-20 kHz signal. As we stated, the ITD cue is valid, in the case of the „Paladyn" humanoid head, up to 1911 Hz. The work of J. Blauert and W. Cobben showed 800 Hz to be the optimal cutoff frequency. Because the major aim of the humanoid head is to interact with its environment, especially humans, it should be able to localize sounds at typical speech frequencies. The compromise value for the cut-off frequency was therefore set to 1500 Hz.
Weighted cross-correlation
The perceived sound has been digitized, transformed, half-wave rectified and low-pass filtered. Still, there is a need to enhance the SNR (Signal to Noise Ratio) of the signal. To achieve a more reliable ITD estimation through the correlation of the two signals, an additional weighting is performed. The correlation known as SCOT (Smooth Coherence Transform) is applied using the correlated and autocorrelated signals:

\Psi_{SCOT}(m) = \frac{G_{LR}(m)}{\sqrt{G_{LL}(m) \cdot G_{RR}(m)}}   (42.16)

where G_{LR} and G_{LL}, G_{RR} are respectively the correlated and autocorrelated signals in the frequency domain:

S_L(t) \otimes_r S_R(t) = FS_L(m) \cdot FS_R(m)   (42.17)
The problem of ITD estimation is thus simplified to the estimation of the maximum value of the correlogram. (The accuracy of this method is limited by the resolution of the given discrete signal window.)
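A compact sketch of this SCOT-weighted correlation path, assuming equal-length left and right sample windows; the names and the small stabilising constant are illustrative.

import numpy as np

FS = 44000

def scot_itd(left: np.ndarray, right: np.ndarray) -> float:
    """ITD estimate from the SCOT-weighted cross-correlation (42.16-42.17)."""
    n = len(left)
    # half-wave rectification, as described for the correlation path
    l = np.maximum(left, 0.0)
    r = np.maximum(right, 0.0)
    fl = np.fft.rfft(l, 2 * n)
    fr = np.fft.rfft(r, 2 * n)
    g_lr = fl * np.conj(fr)                       # cross-spectrum (42.17)
    g_ll = fl * np.conj(fl)
    g_rr = fr * np.conj(fr)
    weight = np.sqrt(np.abs(g_ll) * np.abs(g_rr)) + 1e-12
    corr = np.fft.irfft(g_lr / weight)            # SCOT correlogram (42.16)
    corr = np.concatenate((corr[-(n - 1):], corr[:n]))   # lags -(n-1) .. n-1
    lag = np.argmax(corr) - (n - 1)
    return lag / FS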
Direction-pass filtering
The last filtering process, which leads to distinguishing separate sound sources and their spatial localization, is DPF (Direction Pass Filtering). In fact the DPF amounts to comparing the sound sources detected by correlation with the localization of the separate spectral tones.
Fig. 42.6. Direction Pass Filter
First the DPF extracts local maxima in the weighted correlogram (up to three, as laboratory testing shows). The area between each local maximum and the surrounding minima is then called a separate source. The source has strong directionality at the ITD represented by the maximum and weaker directionality at the falloffs. Then all the localized frequencies (found in the spectral processing procedure) are compared with the refined sources. If at least one of the localized frequencies complies with a localized source, the source is treated as valid and representative. Such information is then stored and used in the process of humanoid head behaviour.
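A schematic rendering of this comparison step, under the assumption that the correlation path yields up to three candidate ITDs and the spectral path the per-peak cues from the previous section; the tolerance value (about one sample period at 44 kHz) is illustrative.

def direction_pass_filter(corr_itds, spectral_cues, tolerance=2.3e-5):
    """Keep correlation-derived sources confirmed by at least one spectral peak.

    corr_itds     -- ITD values of the (up to three) correlogram maxima, in seconds
    spectral_cues -- list of (frequency, itd, iid) tuples from the spectral path
    tolerance     -- maximum ITD mismatch accepted
    """
    sources = []
    for itd in corr_itds:
        matching = [c for c in spectral_cues if abs(c[1] - itd) <= tolerance]
        if matching:                      # at least one tonal peak agrees
            sources.append({"itd": itd, "peaks": matching})
    return sources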
42.4 Conclusion
The binaural localization system proved functional and reliable, localizing sound sources with up to 30-degree accuracy (and up to 2 sources at a time) in a noisy, everyday environment. Although it is not as accurate as human sound localization (which is fully spatial and much more accurate in the frontal plane), its main goal, the control of visual attention, has been achieved. The auditory module itself, because of its modular design, is fully scalable and provides all the low-level audio data. This information can be used by any other auditory module of the humanoid head without any redundant work. The first goal of the „Paladyn" project, to create low-level sensory behaviours, has been achieved.
References
1. Blauert J (2001) Spatial Hearing. The Psychoacoustics of Human Sound Localization. MIT Press, Cambridge MA
2. Moore B (1999) Wprowadzenie do psychologii slyszenia. PWN, Warszawa
3. Hartmann W (1998) Signals, sound and sensation. Springer, Berlin Heidelberg New York
4. Lindsay H, Norman A (1984) Procesy przetwarzania informacji u czlowieka. Wprowadzenie do psychologii. PWN, Warszawa
5. Knapp C, Carter C (1976) The generalized correlation method for estimation of time delay. IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-24, No. 4:320-327
6. Natale L, Matta G, Sandini G (2002) Development of auditory-evoked reflexes: visuo-acoustic cues integration in a binocular head. Robotics and Autonomous Systems 39:87-106
7. Irie R (1995) Robust sound localization: an application of an auditory perception system for humanoid robot. MIT, Cambridge MA
8. Martin K (1995) A computational model of spatial hearing. MIT, Cambridge MA
9. Wasson G (1995) Using acoustic information to control visual attention. University of Virginia
10. Nakadai K, Okuno H, Kitano H (1998) A method of peak extraction and its evaluation for humanoid. Japan Science and Technology Corp, Tokyo
11. Blauert J, Cobben W (1978) Some considerations of binaural cross correlation analysis. Acustica 39:96-104
12. Yost W, Gourevitch (1987) Physical acoustics and measurements pertaining to directional hearing. Directional Hearing 2:3-33, Springer-Verlag, Berlin Heidelberg New York
13. Feddersen W, Sandel D, Jeffress T (1957) Localization of high frequency tones. J. Acoust. Soc. Am. 29:988-911
14. Scassellati B (1999) A binocular, foveated active vision system. MIT, Cambridge MA
15. Murray D (1993) Design of stereo heads. Active Vision, MIT Press, Cambridge MA
16. Nakadai K, Lourens T, Okuno H (2000) Active audition for humanoid. Japan Science and Technology Corp, Tokyo
17. Nakadai K, Lourens T, Kitano H (2000) Exploiting auditory fovea in humanoid-human interaction. Japan Science and Technology Corp, Tokyo
18. Okuno H, Nakadai K, Kitano H (2002) Non-verbal eliza-like human behaviours in human-robot interaction through real time auditory and visual multiple-talker tracking. Japan Science and Technology Corp, Tokyo
19. Nakadai K, Hidai K, Okuno H (2001) Real-time multiple speaker tracking by multi-modal integration for mobile robots. Japan Science and Technology Corp, Tokyo
20. Okuno H, Nakadai K, Lourens T (2001) Separating three simultaneous speeches with two microphones by integrating auditory and visual processing. Japan Science and Technology Corp, Tokyo
21. Okuno H, Nakadai K, Lourens T (2000) Humanoid active audition system. Japan Science and Technology Corp, Tokyo
22. Nakadai K, Hidai K, Okuno H (2001) Real-time active human tracking by hierarchical integration of audition and vision. Japan Science and Technology Corp, Tokyo
23. Nakadai K, Matsui T, Okuno H (2001) Active audition system and humanoid exterior design. Japan Science and Technology Corp, Tokyo
43 Oculomotor Humanoid Active Vision System Piotr Kazmierczak Polish-Japanese Institute of Information Technology PJIIT Robotics and Multiagent Systems Laboratory, Koszykowa 86, 02-008 Warsaw, Poland [email protected]
Summary. This paper presents a stereo active vision system for a humanoid head developed at the PJIIT Robotics and Multiagent Systems Lab. The system is able to foveate on a salient object and then maintain that object in the focus of attention. The task decomposition was based on the developmental approach of human infants, thus allowing step-by-step acquisition of skills. Visual behaviors matching eye movements were implemented on a network of computers running a real-time operating system.
Key words: humanoid robot, active vision, visual behaviors, tracking
43.1 Introduction
The main motivation for creating humanoid robots is that human-like intelligence needs human-like interaction. Biologically inspired robots seem to be an ideal platform for active vision and visual attention experiments. Interaction with objects or humans is very important for learning behaviors. Vision plays an important role in directing attention to the object and gathering information [9]. Because we believe that the world cannot be represented in symbolic form, we have adopted an alternative methodology [2] which emphasises embodiment and a developmental approach to task decomposition, thus allowing step-by-step acquisition of skills. The developmental approach gives a structured decomposition and provides a gradual increase in complexity [7].
43.1.1 Robot platform specification
The humanoid head PALADYN consists of visual, auditory and inertial systems. This paper outlines only the visual characteristics of our robot. Readers interested in the binaural localization and inertial systems are referred to other work. Our hardware configuration consists of several Pentium-IV class computers running the QNX RTOS (controller, visual and auditory modules) as well as GNU/Linux (speech synthesis system). All of the machines communicate via 100 Mbps Ethernet.
Front and side view of the 5 DoF robotic active vision platform capable of mimicking human eye movements. Each eye consists of a pair of cameras (wide and narrow field of view) corresponding to the peripheral (low res) and foveal (high res) areas of the retina.
Fig. 43.1. PALADYN, humanoid active vision head.
Peripheral accessories include a Galil DMC-1850 motion control card (mechanical DoF control), two AdLink NuDAQ-9112 data acquisition cards (sound sampling, accelerometer/gyro communication) and four Imagenation PXC-200AL framegrabbers (visual data acquisition).
Mechanical overview
The mechanical system of our robot was inspired by the physiology of humans. It is believed that millions of years of evolution resulted in a visual system well adapted to the environment where people live. The oculomotor control system of our robot mimics three basic voluntary movements (described in more detail in section 43.2). In order to express those movements our mechanical implementation has 5 degrees of freedom (right-left separately for each eyeball, up-down for both eyes, up-down for the neck and right-left rotation for the whole head).
Camera system
Each eyeball consists of a pair of color cameras: a peripheral camera used primarily for motion detection in the field of view, and a high-resolution foveal camera used for detailed analysis of the characteristics of the object. This configuration gives us enough ability to imitate the oculomotor control and voluntary visual behaviors expressed by humans.
43.1.2 Software environment The whole system runs on a network of QNX computers. QNX was chosen because of its scalability, portability, memory protection and QNET transparent network communication. Visual behaviors and attention modules make extensive use of Intel's Open Computer Vision Library. OpenCV proved to be very useful in many areas (human-computer interaction systems, object identification, segmentation, face/gesture recognition and motion tracking).
43.2 Eye movements
The ability to mimic human eye movements is important to keep an interesting object in the central field of view of the robot (the fovea). This requires constant translation of visual information into appropriate motor commands such as a change in position or velocity of the motor. Human eye movements are the combination of three basic voluntary movements [6] (saccade, smooth pursuit and vergence):
• Saccade. Ballistic eye movement in response to a stimulus or position error. As a result of its execution the target is placed in the fovea. Operates in open loop (no visual feedback during execution).
• Smooth pursuit. Following movement in response to the target velocity error on the retina. Maintains a moving object on the fovea. Operates in closed loop (continuous visual feedback).
• Vergence. Movement activated in response to the relative image disparity on both retinas. Adjusts the eyes for viewing objects at varying depth.
and two involuntary movements [6] (VOR and OKN):
• Vestibulo-ocular Reflex. Stabilises the eyes during rapid head motions. Compensates head rotation through the stimulation of the semi-circular canals and the otolith organs located in the inner ear. Maintains the direction of gaze by counter-rotating the eyes.
• Opto-kinetic Nystagmus. Stabilises the image on the retina. Measures the optical flow on the retina (for motions slower than VOR). Following by the eyes of any large moving pattern.
All of the voluntary eye movements were implemented on the humanoid robot PALADYN. Movements of the eyes consist of saccades and smooth pursuit operating alternately. Vergence and smooth pursuit operate in parallel. Visual behaviors performed by the voluntary movements include locating, fixation, wandering and tracking of targets. One of the most important algorithms used by the components of the visual system is normalized cross-correlation. The idea is to find the image region of I that is most similar to (best matches) the given template T. Normalization of the images additionally makes it less prone to errors resulting from luminance change
during camera movements. Given the input image I (raw camera image) of size W x H and a template image T of size w x h, the resulting image has dimensions (W - w + 1) x (H - h + 1), where the value at each location (x, y) characterises the similarity between T and the part of the input image (the rectangle with the top-left corner at (x, y) and the bottom-right corner at (x + w - 1, y + h - 1)). The similarity is calculated as follows:

C(x,y) = \frac{\sum_{y'=0}^{h-1}\sum_{x'=0}^{w-1} T(x',y') \cdot I(x+x', y+y')}{\sqrt{\sum_{y'=0}^{h-1}\sum_{x'=0}^{w-1} T(x',y')^2 \cdot \sum_{y'=0}^{h-1}\sum_{x'=0}^{w-1} I(x+x', y+y')^2}}   (43.1)
where I(x, y) is the pixel value at the searched location (x, y) and T(x, y) is the value of the template pixel at location (x, y). The last step is to find the location for which the resulting image value is the highest. The distance between the maximum location and the center of the image I gives the displacement; a sketch of this correlation step is given below.
43.2.1 Saccade to stimuli
Saccades are extremely fast eye movements that focus the selected target on the fovea (the high-resolution area of the retina). One of the characteristic features of this type of movement is that the precise distance from the center of the image to the location of the target is known before saccade execution. In the case of an artificial system, executing a saccade means moving the eye motors so that the resulting distance from the target to the FoV center is 0. One method would require a translation of pixel values (distance x, y) to the respective offset values for each axis. Because of nonlinearities resulting from the distortion of the wide-angle lens (peripheral cameras), a simple mapping between the distance on the image plane and the motor offsets necessary to precisely foveate is not satisfactory. Marjanovic [5] presented a solution to this problem. Our system adapted his method and learned to saccade to salient stimuli, decreasing the error displacement resulting from the nonlinearities of the lens. The initial step requires the construction of interpolated saccade maps that contain precalculated offset values for the appropriate areas of the image. Each part of the input image of size 8 x 8 corresponds to one cell in the double-channel saccade matrix. Each channel contains motor offset values for x and y respectively. The active area of the input image is delimited by a rectangle whose margins depend on T, I, w, h, i.e. the template, the input image, and their width and height respectively.
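A direct, unoptimised rendering of (43.1); OpenCV's matchTemplate in its normalized cross-correlation mode computes the same map much faster, so this sketch is only meant to make the formula concrete. The function names are illustrative.

import numpy as np

def normalized_cross_correlation(image: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Direct implementation of (43.1); image and template are 2-D grayscale arrays."""
    H, W = image.shape
    h, w = template.shape
    out = np.zeros((H - h + 1, W - w + 1))
    t = template.astype(float)
    t_energy = np.sum(t ** 2)
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            patch = image[y:y + h, x:x + w].astype(float)
            denom = np.sqrt(t_energy * np.sum(patch ** 2)) + 1e-12
            out[y, x] = np.sum(t * patch) / denom
    return out

def displacement(image, template):
    """Offset of the best match from the image centre, as used for saccades."""
    c = normalized_cross_correlation(image, template)
    y, x = np.unravel_index(np.argmax(c), c.shape)
    cy = (image.shape[0] - template.shape[0]) // 2
    cx = (image.shape[1] - template.shape[1]) // 2
    return x - cx, y - cy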
The standard size of the images used by all of the visual modules is 384 x 288 (exactly half of the PAL resolution of 768 x 576 in each dimension). The interpolated saccade map for that resolution is 48 x 36. At first the system defines a constant factor for a single pixel (corresponding to the encoder offset that moves the eye by one pixel). Knowing the factors for each axis (x and y, determined during self-calibration), each matrix cell becomes initialized with the distance from the center of the matrix (24, 18) to the cell coordinates (x, y), expressed in encoder offset values. Once, at the beginning, the saccade matrix gets initialized with default values determined during self-calibration. For each column y in each row x of the matrix, every point p_{x,y} holds the distance from the center of the matrix:

p_{x,y} = (x - w/2, \; y - h/2)   (43.3)

where w, h are respectively the number of columns and rows in the matrix. For each cell (x, y) a new value gets defined:

o_x = (p_x F_m) F_p, \qquad o_y = (p_y F_m) F_t   (43.4)
where o is the new value of the cell, p the coordinates of the new point (after conversion), F_m the interpolation factor (here 8) and F_p, F_t the pan and tilt constant factors for a single pixel, respectively. For each learning trial or target point, the visual target coordinates get converted as follows:

p_{x,y} = (x, y) / F_m   (43.5)

The image patch surrounding the target point gets converted to grayscale and saved in T for future use. The robot attempts to saccade to the target location using the map estimate in p_{x,y}. The central patch of the new image gets converted to grayscale and saved in C for future use. Normalized cross-correlation is performed on T and C. If the maximum value of the resulting image is below the given threshold, then the object is declared as lost. If the error value (the displacement of T in relation to C) is greater than 0 for any axis, then the offset value corresponding to the target point in the matrix gets updated:

o_x = o_x - (e_x F_p), \qquad o_y = o_y - (e_y F_t)   (43.6)

where o is the offset, e the error, and F_p, F_t the pan and tilt factors, respectively.
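The saccade map initialisation and learning rule can be summarised as follows; this is a sketch with illustrative calibration constants and names, not the robot's actual implementation.

import numpy as np

class SaccadeMap:
    """Interpolated saccade map: per-cell pan/tilt encoder offsets (43.3-43.6).

    F_m is the interpolation factor (8 in the text); f_p and f_t stand for the
    per-pixel pan and tilt factors obtained during self-calibration.
    """

    def __init__(self, cols=48, rows=36, f_m=8, f_p=1.0, f_t=1.0):
        self.f_m, self.f_p, self.f_t = f_m, f_p, f_t
        xs = np.arange(cols) - cols / 2           # p_x = x - w/2   (43.3)
        ys = np.arange(rows) - rows / 2           # p_y = y - h/2
        px, py = np.meshgrid(xs, ys)
        self.ox = px * f_m * f_p                  # o_x = (p_x F_m) F_p   (43.4)
        self.oy = py * f_m * f_t                  # o_y = (p_y F_m) F_t

    def cell(self, target_x, target_y):
        """Pixel coordinates -> map cell (43.5): p = (x, y) / F_m."""
        return int(target_y // self.f_m), int(target_x // self.f_m)

    def offsets(self, target_x, target_y):
        r, c = self.cell(target_x, target_y)
        return self.ox[r, c], self.oy[r, c]

    def learn(self, target_x, target_y, err_x, err_y):
        """Update after a learning trial (43.6): o := o - e * F, per axis."""
        r, c = self.cell(target_x, target_y)
        self.ox[r, c] -= err_x * self.f_p
        self.oy[r, c] -= err_y * self.f_t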
43.2.2 Smooth pursuit tracking
Smooth pursuit is one of the slowest voluntary eye movements. The purpose of this movement is to keep a moving object in the center of the fovea. In an artificial visual system such as the humanoid head PALADYN, smooth pursuit means constantly keeping the moving object in the field of view of the narrow (high-resolution) camera, which enables the system to process the image without the risk of losing valuable information. Smooth pursuit gets activated automatically after a saccade to the object. Pursuit will continue for slow motions and as long as the object stays in the field
LEFT: saccade internals: before and after execution of the move (cross-correlation). RIGHT: interactive training program (user defined or randomly chosen target location). Fig. 43.2. Saccade mechanism
LEFT: saccade map after initialization with values obtained during self-calibration. RIGHT: saccade map being dynamically updated during learning trials. Fig. 43.3. Saccade map viewer
of view. Losing the tracked object triggers a saccade to the most salient object in the activation map (see section 43.3.5). At first (right after the saccade) the central patch (50 x 50) of the current image gets converted to grayscale and saved for future use. For each consecutive frame, normalized cross-correlation is run on the saved template and the central patch of the current image (100 x 100). The distance (x, y) from the maximum value in the resulting image to the center of the current image, multiplied by acc_{LR|T} and vel_{LR|T}, gives the new values for the motor velocities and accelerations (acc: acceleration, vel: velocity, LR, T: left-right and tilt DoF respectively). Smooth pursuit stops as soon as the maximum value in the resulting image is below the constant 0.95. This is treated as if the object was lost, which naturally triggers a saccade to a new target (possibly the lost one, if it is moving). Because of natural distortions of the tracked object (scale changes, occlusions etc.), in some cases the tracking template must be updated. Each time the maximum value of the resulting image drops below 0.98, the template gets updated with the current central
patch of the image (50 x 50). Other, less computationally intensive methods of tracking based on the optical flow are currently under heavy investigation. In the nearest future a Kalman-like predictor for non-linear systems will be integrated.
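A schematic pursuit loop following the description above, using OpenCV template matching on grayscale frames; the gain constants and the set_motor interface are assumptions made for illustration only.

import cv2
import numpy as np

ACC_GAIN, VEL_GAIN = 0.5, 0.2        # illustrative acceleration/velocity factors

def central_patch(img: np.ndarray, size: int) -> np.ndarray:
    cy, cx = img.shape[0] // 2, img.shape[1] // 2
    return img[cy - size // 2: cy + size // 2, cx - size // 2: cx + size // 2]

def pursue(frames, set_motor):
    """Schematic smooth-pursuit loop (template correlation on the central patch)."""
    frames = iter(frames)
    template = central_patch(next(frames), 50)       # saved right after the saccade
    for image in frames:
        search = central_patch(image, 100)
        result = cv2.matchTemplate(search, template, cv2.TM_CCORR_NORMED)
        _, score, _, (mx, my) = cv2.minMaxLoc(result)
        if score < 0.95:                             # object lost -> back to saccades
            return
        dx = mx - (search.shape[1] - template.shape[1]) // 2
        dy = my - (search.shape[0] - template.shape[0]) // 2
        if score < 0.98:                             # refresh the tracking template
            template = central_patch(image, 50)
        set_motor("left-right", dx * VEL_GAIN, dx * ACC_GAIN)
        set_motor("tilt", dy * VEL_GAIN, dy * ACC_GAIN)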
LEFT: following of the moving object (motor velocity update based on disparity). RIGHT: smooth pursuit at work (real time visualisation of the pursuit algorithm).
Fig. 43.4. Smooth pursuit mechanism
43.2.3 Binocular vergence
Vergence is the slowest voluntary eye movement. By measuring the image disparity between both retinas it adjusts the eyes for viewing objects at varying depth. Other depth perception methods include stereovision, but these have much higher requirements and rarely work in real-time configurations. Vergence is important for object manipulation, figure-ground separation and collision avoidance. The adopted approach to disparity measurement is very simple. The vergence module makes use of the normalized cross-correlation algorithm to find regions of the image that match the template. The idea behind the vergence is the assumption that, by utilising saccades and smooth pursuit, the tracked object always stays in the center of the dominant eye. Thus measuring the horizontal disparity between both images (from both peripheral cameras) is realised by measuring the disparity between the center of the best-matching template location in the non-dominant eye and the center of the image. The resulting value (x axis), multiplied by constant factors (acc_R, vel_R), drives the velocity and acceleration of the non-dominant eye. Vergence and smooth pursuit running in parallel allow real-time tracking of moving objects in three-dimensional space. Future plans include re-implementation and integration of a zero-disparity filter [4].
LEFT: disparity error drives acceleration and velocity of the non-dominant eye motor. RIGHT: estimating horizontal disparity between image templates from both cameras.
Fig. 43.5. Vergence mechanism
43.3 Pre-attentive selection of stimuli
It is impossible to process every single piece of information received from the environment. Attention acts as a mechanism of early selection of relevant stimuli, thus reducing the computational requirements of the system. The most common view of human attention is two-level processing: pre- and post-attentive. Pre-attentive (parallel) processing works beyond the conscious mind and is fully automatic. This level of attention was implemented on our robot. Post-attentive (serial) processing is available to conscious inspection and planning. This level of processing is beyond the scope of the project.
43.3.1 Motion saliency
Motion is one of the most important stimuli allowing humans and other biological systems to extract moving objects from the background. Orienting the head towards objects in motion is very important for proper human-robot interaction. Even insignificant motions like an unconscious blink can be distinguished by the robotic system and trigger a saccade to that salient stimulus. Keeping eye contact also influences the way people treat and interact with robots, resulting in a more natural interaction. The motion saliency map is based on the difference between two consecutive frames received by the same camera. Every incoming frame I (at the standard resolution of the visual modules) is converted to grayscale, down-sampled and saved in a ring of frame buffers. The output image O requires down-sampling the input image I with prior Gaussian smoothing (structuring element size for morphological operations: 5 x 5):

O_W = I_W/2 + 1, \qquad O_H = I_H/2 + 1   (43.7)

For each new frame O the motion saliency map M is constructed as follows:
M(x, y) = |O_t(x, y) - O_{t-1}(x, y)|   (43.8)

where O_t and O_{t-1} are the current and previous down-sampled images, respectively.
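A minimal sketch of this motion saliency computation using OpenCV; pyrDown halves each dimension (rounding up), which approximates the sizes given in (43.7). Names are illustrative.

import cv2
import numpy as np

def motion_saliency(prev_small, frame):
    """Down-sample the new frame (43.7) and build the motion map (43.8)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    small = cv2.pyrDown(cv2.GaussianBlur(gray, (5, 5), 0))   # smoothing + down-sampling
    if prev_small is None:
        return small, np.zeros_like(small)
    saliency = cv2.absdiff(small, prev_small)                # M(x, y) = |O_t - O_{t-1}|
    return small, saliency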
43.3.2 Habituation
One of the distinguishing features of the human visual system is the ability to dynamically change the object of attention. Infants respond strongly to novel stimuli, but soon habituate and respond less as familiarity increases [3]. A saccade places the object of interest in the central area of the image plane (in the hi-res narrow FoV camera corresponding to the fovea), thus we can assume that the object on which the robot is fixating is located in the fovea. A simple mechanism of habituation was adopted, based on the previous work of Scassellati (2001) and Breazeal (2000) [8, 1]. Following a saccade, the central areas of the habituation map receive maximum values. The purpose of this is to keep the interesting object in the center of the image right after the movement. If the object remains static and none of the motors are running, the saliency values begin to decrease over time, approaching a minimum after a few seconds. This prevents fixation on the central object, eventually resulting in a ballistic saccade to a new salient stimulus located at the periphery of the image. Figure 43.6 shows the habituation mechanism at work.
LEFT: internals of the habituation mechanism, changes of the map values over time. RIGHT: building activation map combining habituation, motion and sound saliency. Fig. 43.6. Habituation mechanism
43.3.3 Sound localization Although not directly connected with visual attention, information from binaural sound localization system can influence robot's actions by providing additional information about the environment surrounding the robot.
Moving objects often generate sound which, combined with other salient stimuli, can greatly improve the accuracy of saccade activation. The saliency map for sound source localization uses information such as the signal strength S (value in the range 0-255), the signal location L (value in the range 0-383; the signal must come from a source that is actually visible) and the accuracy level P (where a lower value means higher accuracy). PALADYN is capable of localizing a sound source in one-dimensional space, thus each sound event is represented as a bar on the saliency map. A bar of width 2P and value S is placed at each location L. Values decrease gradually from the center of the bar at S to each border, where they approach 0. An example saliency map for sound localization is presented in Figure 43.7.
LEFT: bars on the image represent sound source locations and strength of the signal. RIGHT: activation map combining motion and sound saliency.
Fig. 43.7. Sound saliency map
43.3.4 Other saliency maps
There were attempts to integrate the robotic system with other types of saliency maps such as depth perception (a stereovision algorithm), color (including a skin color feature map), and face recognition. Due to very high computational requirements those maps were not fully integrated at the current stage.
43.3.5 Building activation map
Each saliency map represents the distribution of a specific stimulus across the visual field (motion, sound, habituation, other cues). The attention process combines the saliency maps to produce the activation map. Each saliency map is represented as a grayscale image where values represent the amount of saliency (from no stimulus to maximum stimulus: 0-255). Constructing the activation map consists of the summation of every single saliency map S (with respect to the weight a) with the actual activation map A:
A = S \cdot a + A' \cdot (1 - a) + 1   (43.9)
After the summation, the resulting activation map is segmented and binary thresholded. The area and centroid of each segment in the resulting image are calculated. The centroid of the biggest segment becomes the new saccade target.
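A sketch of the combination and target-selection steps; the weights, the threshold and the omission of the small additive constant of (43.9) are simplifications made for illustration.

import cv2
import numpy as np

def update_activation(activation, saliency_maps, weights, threshold=128):
    """Blend saliency maps into the activation map (43.9) and pick a saccade target."""
    activation = activation.astype(np.float32)
    for s, a in zip(saliency_maps, weights):
        activation = s.astype(np.float32) * a + activation * (1.0 - a)
    binary = (activation >= threshold).astype(np.uint8)
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
    target = None
    if n > 1:                                  # label 0 is the background
        biggest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
        target = tuple(centroids[biggest])     # (x, y) of the new saccade target
    return activation, target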
43.4 Conclusions and future work
The system presented in this paper makes use of biologically inspired voluntary oculomotor behaviors such as saccades, smooth pursuit and vergence. The visual behaviors, operating cooperatively, perform real-time 3D tracking of objects. Future plans include full integration of the non-voluntary movements (gaze stabilisation), optical flow, a Kalman-like estimator and a zero-disparity filter, as well as additional saliency maps for depth (stereovision), skin and color.
43.5 Acknowledgements
This work was based on the MSc thesis of the author. The author wishes to thank prof. dr hab. Lech Polkowski (project supervisor) and the other members of The PALADYN Group for their great support, enthusiasm and contribution: Piotr Ciesielski, Lech Blazejewski, Sebastian Pawlak and Krzysztof Luks.
References
1. Breazeal, C.L. (2000) Sociable Machines: Expressive Social Exchange Between Humans and Robots. PhD thesis, Massachusetts Institute of Technology
2. Brooks, R.A. (1991) Intelligence without representation. Artificial Intelligence, 47:139-160
3. Carey, S. and Gelman, R. (1991) The Epigenesis of Mind. Lawrence Erlbaum Associates, Hillsdale, NJ
4. Coombs, D. and Brown, C.M. (1993) Real-Time Binocular Smooth-Pursuit. IJCV, 11 (2):147-165
5. Marjanovic, M., Scassellati, B. and Williamson, M. (1996) Self-taught Visually-Guided Pointing for a Humanoid Robot. In Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, pages 35-44
6. Robinson, D. (1968) The Oculomotor Control System: A Review. Proceedings of the IEEE, 56 (6)
7. Scassellati, B. (1998) Building Behaviors Developmentally: A New Formalism. AAAI Spring Symposium '98, AAAI Press
8. Scassellati, B. (2001) Foundations for a Theory of Mind for a Humanoid Robot. PhD thesis, Massachusetts Institute of Technology
9. Yamato, J. (1998) Tracking moving object by Stereo Vision Head with Vergence for Humanoid Robot. Master's thesis, Massachusetts Institute of Technology
44 Crisis Management via Agent-based Simulation Grzegorz Dobrowolski and Edward Nawarecki Institute of Computer Science, AGH University of Science and Technology, Krakow, Poland [email protected]
Summary. Based on original definitions of agent, system, activity and resources, a model of critical situations in multi-agent systems is proposed. Next, the model is used to introduce the idea of a monitoring sub-system, which plays an important role in two sketched architectures of critical-situation management for virtual and real cases of multi-agent system applications.
44.1 Introduction The contribution deals with a class of intelligent decentralized systems that meet the agent paradigm [1,2]. There can be systems that both are designed from scratch as multi-agent ones (operating in the virtual world, e.g. network information services, virtual enterprises) and function in the reality as a set of cooperating autonomous subsystems of whatever origin (e.g. transportation systems, industrial complexes). Such systems (virtual as well as real) are marked by possibility of arising critical situations that can be caused by both outer (e.g. undesirable interference or the forces of nature) and inner (e.g. resource deficit, local damages) factors. Generally, crisis is interpreted here as threat of loss (partial or complete) of the system functionality. Difficulties of crisis identification, evaluation of possible effects and prevention (anti-crisis) actions come from general features of multi-agent systems (autonomy of the agent's decisions, lack of global information) as well as their dynamics (consequences appear after an operation in unpredictable manner) [3, 4]. Intention of the reported work is to propose a formalization of an agent and multiagent system description that can serve as a base for analysis of the system operation, especially in critical situations. The formalization allows, among others, for specification of how the analyzed system can be monitored and, in consequence, for creation of a simulation model [5, 6] of its behavior in the face of a particular crisis. Results of the simulation studies are the scenarios of the crisis progress. Investigation of the scenarios leads to finding a strategy of avoiding the crisis or, at least, reducing its effects. Considerations are carried out at the appropriate level of generality so as it could be possible to adjust them to specificity of the particular application.
The article is organized as follows. The first part (sections 44.2-44.4) is devoted for presentation of basic issues of multi-agent systems. They comprise definitions of: agent, system, activity, resources, and so on. Section 44.5 gives a proposed model of the critical situation with discussion of its most important aspects, e.g. a need and function of a monitoring sub-system. Then, Section 44.6 introduces some assumptions and architecture solutions of how to manage with the defined critical situations in two general types of multi-agent systems mentioned above.
44.2 Models of an Agent and System
The model of an agent presented below (see also [7]) is constructed according to the black-box schema: a part of a domain is taken out and constitutes a system through specification of all the interactions observed. In this way the model effectively reflects mainly the agent's ability to interact with his neighborhood and other agents (the features that are a base for multi-agent system creation), but it also defines, in general, his internal (abstract) architecture. The approach leads to the adoption of the following assumptions, crucial to the model.
1. Excluding physical impact on an agent, the rest of the agent-neighborhood interface is fully controlled by the agent himself.
2. An agent operates in a discrete manner. His activity is a finite sequence of (elementary) actions performed by him.
3. An agent decides on the sequence in the sense of both the actions to do and the moments of time of their initiation.
4. The basic mechanism of the agent's model is the sequential initiation of actions, called further the mechanism of choice.
The assumptions seem to be general enough to encompass possible algorithms and implementation techniques of artificial agents as well as to describe the presence of human beings in multi-agent systems.
Definition 1 Agent \mathcal{A} is a three-tuple of the form:

\mathcal{A} = \{A, S, F \subseteq S \times A \times S\}   (44.1)

where: A is a finite set of (elementary) actions of agent \mathcal{A}; S is a finite set of internal states of agent \mathcal{A}; F is a three-element relation describing the permitted succession of states and actions of agent \mathcal{A}: in the given state the agent can perform an action (the second element) that leads him to a new state (the third element). Sustaining the cause-effect conjunction requires relation F to have the following property:

(s, a, s_1) \in F \wedge (s, a, s_2) \in F \Rightarrow s_1 = s_2   (44.2)
Relation F reflects the possible combinations of states and actions. Each state implies a subset of allowed actions.
Description 1 Let A_s describe the set of actions of agent \mathcal{A} admissible in state s:

A_s = \{a \in A : (s, a, \cdot) \in F\}, \quad A, F \in \mathcal{A}   (44.3)
no matter which states they lead to. Assigning and performing a particular action from A_s is an effect of the choice mechanism of an agent (an elementary action can be performed several times). Now let us focus our attention on the external characteristics of an agent, i.e. how he behaves, not how he decides about actions.
Definition 2 Let the mapping f of the form:

s_{i+1} = f_{a_i}(s_i) \iff \exists (s_i, a_i, s_{i+1}) \in F : F \in \mathcal{A}   (44.4)
denote performing action a_i \in A and changing the agent's state. To emphasize the cause-effect conjunction, the appropriate states are indexed with natural numbers.
Definition 3 A manifold but finite application of mapping f, represented in the formula beneath by the operator \otimes, is called the activity of agent \mathcal{A}:

f_{a_j} \otimes f_{a_{j+1}} \otimes \cdots \otimes f_{a_k} = f_{\{a_j, a_{j+1}, \ldots, a_k\}} : a_j, a_{j+1}, \ldots, a_k \in A \in \mathcal{A}   (44.5)
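Definitions 1-3 can be rendered directly in code. The following sketch is illustrative only; the names and the representation of F as a set of triples are assumptions, not part of the formal model.

from dataclasses import dataclass

@dataclass
class Agent:
    """Minimal rendering of Definition 1: states S, actions A and relation F."""
    actions: set
    states: set
    F: set  # triples (state, action, next_state); (44.2) demands a unique next_state

    def admissible(self, s):
        """A_s from (44.3): actions allowed in state s."""
        return {a for (q, a, _) in self.F if q == s}

    def step(self, s, a):
        """Mapping f_a from (44.4): perform action a in state s."""
        nxt = [s2 for (q, b, s2) in self.F if q == s and b == a]
        if not nxt:
            raise ValueError("action not admissible in this state")
        return nxt[0]

    def run(self, s0, plan):
        """Activity (44.5): a finite sequence of admissible actions."""
        s = s0
        for a in plan:
            s = self.step(s, a)
        return s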
The agent's activity can also be denoted as the sequence of elementary actions chosen and performed by him. Having the above definition of an agent, let us show how such agents can be linked to form a multi-agent system.
Definition 4 Multi-agent system \Omega is a two-tuple of the form:

\Omega = \{\{\mathcal{A}^i\}_{i=1,\ldots,N} ; I\}   (44.6)
It consists of a finite set of agents and a relation called the interaction relation:

I = \bigcup_{i,j} P^{ij} \; ; \quad A^i \times A^j \supseteq P^{ij} \ni (a^i, a^j) \; ; \; A^i \in \mathcal{A}^i ; A^j \in \mathcal{A}^j ; \mathcal{A}^i, \mathcal{A}^j \in \Omega
Relation I is symmetric: (a^i, a^j) \in I \rightarrow (a^j, a^i) \in I. The interaction relation describes potential interactions (cooperation) among the agents of the system. Interactions are realized when the appropriate actions are performed by the agents. None of these actions can be performed independently (separately). Consequently, the activity of a multi-agent system can be defined as a composition of the activities of its members.
Definition 5 The activity of multi-agent system \Omega is a two-tuple of the form:

\{X, \succ_\Omega\}   (44.7)

where X is a quotient set of the union of the agents' activities by the extended interaction relation:

X = \left( \bigcup_{i=1,\ldots,N} \{a_i : a_i \in A^i\} \right) / I \; : \; A^i \in \mathcal{A}^i ; \mathcal{A}^i \in \Omega

and \succ_\Omega \subseteq X \times X is a quotient relation of the union of the preceding relations \succ_{\mathcal{A}^i} in the agents' activities by the extended interaction relation:

\succ_\Omega = \left( \bigcup_i \succ_{\mathcal{A}^i} \right) / I \; : \; \mathcal{A}^i \in \Omega
^8
^«-i
c
Fig. 44.1. Activity of a multi-agent system. Ovals indicate interactions between agents.
44.3 Resources in Multi-agent Systems Introduction of resources into the model of a multi-agent system allows for description of phenomena of consumption of various materials or goods coming from outside of the system, production and deposition of such resources in the neighborhood (environment) as well as internal exchange of them among agents of the system. ^For the sake of simplicity some labels and indexes are neglected.
44 Crisis Management via Agent-based Simulation
555
The basic assumption is here that changes of a resource scheme are caused by the agents as a consequence of performing actions. Incorporation of the resources into the model is simultaneously done at three levels: the level of a single agent that consumes and produces them, of a multi-agent system where production and consumption accompany cooperation actions and of a joined sub-system (environment) that is to model balance of the resources in order to reflect their interchange among the agents during cooperation and flows to and from the neighborhood. Definition 6 Agent with resources AR is an extension of tuple A of the agent from definition 1: AR = {A, RA^TAC AX RA XJR} (44.8) where: RA finite set of resource names relating to the agent's actions A; TA three-element relation describing for each action which resources (second element) and of what quantity (third element, IR - real numbers) are consumed (negative values) or produced (positive values) as an effect of its performing. In the same way we can define a multi-agent system with resources. Aggregate QR is augmented with a finite set of resource names relating to the system RQ and analogous relation TQ. A minimal structure of the environment sub-system for modelling problems with resources is a set of balanced equations reflecting their exchange. For a given activity of an agent or multi-agent system, exploiting respective definitions, a system of balance equations for the considered resources can be constructed. In the case of a single agent AR3 A and its activity {a^ : a^ G A} the equations are: ^ z ^ - h 2 ; ° = 0 : {ai,r,Zr)eTAeAR,
WreR
(44.9)
where components {z^ e TRjreR completing the equations reflect the agent's neighborhood. The balance equations for activity of a multi-agent system {X , >:Q} encompass all actions of both the agents and their cooperations. The balance equations are written under assumptions, generally justified, that the resources intervened in a multi-agent system are additive (their quantities can be added) and proposed balances have physical interpretation.
44.4 Goals and Strategies of a Multi-agent System Due to the essence of existence and creation of multi-agent systems for almost each of them there are: •
the global goal and, possibly, some secondary goals (connected with the global one), i.e. while the particular system has been built;
556 •
Grzegorz Dobrowolski and Edward Nawarecki local (individual) goals of the agents representing their autonomy, that can be contradictory mutually or with respect to the global one.
Respectively, there are strategies that applied in the system or by its agents are to cause attaining the goals. Attaining a local goal (and thus realization of a local strategy) means the particular activity (sequence of actions) appropriately chosen by the choice mechanism of an agent. It can be described in the following way. Definition 7 Agent with strategy Au is an extension of aggregate A (def. 1) of the form: Au = {A, {us}ses} (44.10) where set {us}ses called the agent's strategy is a family of mappings (choice functions) such that: \/seS
3us:ADAs-^aeAs;
A,SeA
(44.11)
If all sets of admissible actions in the agent's states are of single element (y s e S card(i4s) = 1; A,S e A), then such agent is called reactive. Its strategy is obviously trivial. The agent's decision making process is only sketched here. Deeper consideration can be carried out if detailed description of the agent's state and the rest of elements from definition 7 is done. For the sake of explanation, let us assume that information possessed by an agent is a component of his state. During activity this information is modified. Based on performed actions, it can be observed from the outside as evolution of the choice mechanism. If the agent's state can be evaluated, the utility theory may be a core of the mechanism. Because the global goal is imposed externally on a system, its description as well as the corresponding strategy ought to be introduced independently of realization of the local strategies. The only way to do it is exploiting the notion of system activity. Definition 8 A strategy of multi-agent system Q is choice function UQ of the form: UQ : V —^ V{V)
(44.12)
where V is a set of all possible activities of the system (V{V) -family of subsets of V), The corresponding (chosen) subset V is called the global goal. Although it is not assumed in the above definition, function UQ ought to choose not empty and not too big subsets of activities. The strategy is often defined in the form of assumptions with respect to the given evaluation of the system activity. Two possibilities can be mentioned here as examples. The first one is a request the system to attain the specified final state (strategy is formed by all activities that lead to that state). The second possibility comes from description of the system activities via balance equations of resources. Then, formulation of a strategy is similar to an optimization problem, e.g. operate with minimal consumption of energy.
44 Crisis Management via Agent-based Simulation
557
Let us discuss now the question of implementation of both kinds of strategies. Contrary to the global strategy local ones are directly built into the agents. The general assumption about rationality of agents guarantees realization of their strategies. Implementation of the global strategy is very difficult mainly because of autonomy of agents. Definition 8 just characterizes the goal not showing how to achieve it. This is yet another formulation of the general problem of multi-agent systems, namely how to organize them.
44.5 Modeling a Critical Situation

Let us take the following general description of critical situations in multi-agent systems as a point of departure. A critical situation is recognized as a particular state, or sequence of states, that violates or leads to a violation of the global or local (the agents') goals of a system. Thus critical situations can be local (concerning a single agent) or global (involving a group of agents or all of them). A local crisis may entail a global one in the future, but the functional abilities of a system very often allow the consequences at the global level to be avoided. This phenomenon results directly from the basic features of multi-agent systems; one may say that some anti-crisis mechanisms (in the above sense) are already incorporated. On the contrary, the threat of a global crisis usually requires specially designed mechanisms. A crisis within a group of agents is treated here as global because the state is described in a similar way and because such a crisis must emerge with respect to a partial or side goal of the system. The above characteristics allow general conditions of the management of critical situations to be defined:
• possibility of observation (monitoring) of the system state based on observation of the agents' states individually,
• adoption of adequate ways of evaluating a state in order to obtain operational criteria for the recognition of critical situations,
• availability of appropriate anti-crisis mechanisms.
The degree of realization of the above postulates can be regarded as a determinant of the system's immunity against a crisis. As has already been signalled, a multi-agent system, flexible by nature, has some elements of these mechanisms implemented either as parts of the agents' algorithms or in the way of communication or organization of the system (or a sub-system). Let us discuss the conditions first for the case of local critical situations. In the obvious way an agent monitors its own state and evaluates it on its own. In state $s$ it determines the set $A_s$; a significant reduction of this set can be an indication of a crisis. If the agent must consider actions like "do nothing" or self-destruction, this is not only an indication but also a kind of remedy. Although the application of the agent's strategy in a given state is oriented towards a decision, it is also an evaluation of the state. If some ranking of the actions is
prepared according to the utility coefficients, its values can be used to formulate a crisis criterion; a decline in utility can then be regarded as a sign of a crisis. Finally, if both mechanisms turn out to be insufficient, the choice function can be augmented with an element intended for monitoring crises and for triggering specially built-in anti-crisis actions.

A similar analysis with respect to global critical situations is somewhat harder. This is because of the problem of determining the state of a multi-agent system. The state can easily be defined as the composition of the agents' states, but its calculation is usually operationally impossible. This situation results from the following features of such a system:
• There is no synchronization mechanism strong enough to determine the simultaneity of the agents' states.
• The system state is highly multi-dimensional, so the high cost of information acquisition has to be taken into account.
• The agents are autonomous; they usually intend to disclose only as much information as is necessary for the system to operate.
In the general case it is assumed that agent $j$ reveals just a sub-space of its state $s^{*j}$ or some evaluation of its state $v^j(s^j)$. The restriction of the state is accepted as a report, while the evaluation is regarded as subjective. Of course, the interpretation of the above information is known throughout the system. It is worth mentioning that a state also comprises information about the history, so the evaluation can have a dynamic character. Putting all the descriptions of the agents' states together, possibly in a single place, and regarding them as simultaneous is the only way to construct a description of the state of the whole system. Let us assume for further discussion that a monitoring sub-system operates and the following evaluation of the state of a multi-agent system is possible:

$v^0 = v_0\bigl(s^{*1}, \ldots, v^j(s^j), \ldots\bigr) \in \Omega$   (44.13)
The evaluation — as in the case of a single agent — can be oriented towards critical situations. Adopting a special shape of the evaluation functions and an appropriate definition of subsets of their values opens the possibility of specialized tracking of the system states. For example, the values can be given as linguistic values: normal, preferred, danger, crisis. In its simplest form, tracking can be just the memorization of monitoring data and their introduction into an evaluation procedure. Following the ways of defining the global goal pointed out earlier, two kinds of critical situations can be introduced: direct and indirect. A direct one means the threat of losing the operability of the system as a consequence of the unavailability of some agents' actions. The primary cause of an indirect critical situation is a lack of resources (a violation of the appropriate balance) which, in turn, produces a deficit of functionality. Both kinds can be detected by the monitoring sub-system, based either on individual evaluations indicating a loss of functionality or on observation of the distribution of some resource crucial to the agents' or the system's activity.
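A minimal sketch of such an evaluation, with an assumed aggregation rule, thresholds and attribute names (none of them come from the chapter), could look as follows:

```python
# Hedged sketch of the monitoring evaluation of eq. (44.13): individual agent
# evaluations are aggregated into a system-level value and mapped onto the
# linguistic scale mentioned in the text (normal, preferred, danger, crisis).

def evaluate_system(agent_utilities, resource_level, min_resource=10.0):
    v0 = sum(agent_utilities) / len(agent_utilities)   # aggregated evaluation
    if any(u == 0 for u in agent_utilities):            # simplistic proxy for an agent
        return "crisis"                                  # losing all actions (direct)
    if resource_level < min_resource:                    # balance violation (indirect)
        return "danger"
    return "preferred" if v0 > 0.8 else "normal"

print(evaluate_system([0.9, 0.7, 0.85], resource_level=42.0))   # "preferred"
```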
44.6 Structure of Management of Critical Situations

The particular causes of crises vary. They depend both on the application of the multi-agent system and on the technical solutions used, so it seems hard, or even pointless, to search for a management scheme that is both universal and detailed. Let us focus our attention on two characteristic cases that can be applied in several variants; they will be elaborated with some implementation solutions. As already said, the problem of management of critical situations encompasses two important sub-problems: monitoring of the system state and the management itself. When agents operate in a virtual environment (a computer system), gathering information about the system state is a standard problem of computer science; only the amount of data may be essential for the chosen implementation. The open question, however, is how to recognize critical situations and deal with them. The algorithm of the management block may follow different ideas; two of them are worth pointing out:
• management based on previously elaborated patterns of critical situations,
• on-line management by forcing particular reactions (changes) in the multi-agent system.
Fig. 44.2. Management Structure for the Case of Virtual System

A general diagram for the discussed case is shown in figure 44.2. A multi-agent system (MAS) is observed by a monitoring sub-system (Monitor) that feeds data to a management block using patterns. Such a structure is appropriate when — among other cases — the system is designed for computation (softcomputing), or when an agent-based simulation is a model of some real system applied in the off-line mode (e.g. a transportation net or a production system). The problem of monitoring becomes much more complicated when the agents of the observed system have strong autonomy, i.e. they may not want to be observed, or when direct observation is difficult for various reasons. Such a situation arises in open heterogeneous information systems (access to information may be limited because of the lack of a shared ontology) or in real-life systems monitored in the on-line mode (e.g. direct observation of traffic, trade or production complexes, or monitoring of the environment).
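By way of illustration only (the names, the pattern representation and the reaction are assumptions, not taken from the chapter), the loop of figure 44.2 can be sketched as follows:

```python
# Hedged sketch of the pattern-based management loop: the monitor samples the MAS
# state, the management block compares it against stored crisis patterns and, on a
# match, triggers the corresponding anti-crisis reaction.

def management_cycle(mas, monitor, patterns):
    state = monitor(mas)                       # monitoring sub-system
    for pattern, reaction in patterns:         # previously elaborated patterns
        if pattern(state):
            reaction(mas)                      # forced change in the MAS
            return pattern, reaction
    return None

# Example pattern: too few operational agents.
patterns = [
    (lambda s: s["active_agents"] < 3, lambda mas: mas.update(mode="degraded")),
]
mas = {"active_agents": 2, "mode": "normal"}
management_cycle(mas, monitor=lambda m: m, patterns=patterns)
print(mas)   # {'active_agents': 2, 'mode': 'degraded'}
```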
In the presence of difficulties in reproducing the agents' states and of restricted means of interfering in their activities, it is justified to extend the diagram of figure 44.2 with a new block designed to fill the multi-agent model with real data (see fig. 44.3). There are then two monitoring sub-systems: rMonitor and vMonitor.
Fig. 44.3. Management Structure for the Case of Real System

The former serves as a deliverer of information about the real multi-agent system (rMAS), while the latter is mainly used for aggregation of the data generated by the model (vMAS). The functions of the management block are also enriched: besides the discussed control functions with respect to vMAS, oriented towards studying variants of anti-crisis reactions, there are also means for enforcing (or initializing) the application of the elaborated variants in real-life conditions (rMAS). It should be mentioned that the structures of figures 44.2 and 44.3 depict only the functionalities proposed in the approach; they do not have to be separated into blocks (parts) in a concrete implementation. In particular, the monitoring subsystem vMonitor may be a sort of overlay on the agent model; it may even be regarded as an environment of vMAS. Another implementation solution may be proposed that consists in extending the society of agents: groups of agents with specially oriented functions are introduced into rMAS (sometimes into vMAS as well):
• monitoring agents that observe the behavior of the original agents (e.g. by tapping them),
• preventing agents that realize the anti-crisis policy by interfering in the environment or in the activity of the original agents (e.g. by prohibiting some of their actions).
Current research activities of the Institute of Computer Science, AGH, are directed towards the realization of the blocks of figures 44.2 and 44.3. In particular, agent-based simulators of production complexes and transport services, as well as pilot versions of monitoring systems, are being realized; the reader is referred to the INFOCAST system [8] as the most mature example. Simultaneously, studies on management algorithms for critical situations are also carried out.
44.7 Summary

The considerations of this article have concerned the application of the agent approach to the problem of management of critical situations. The approach has been discussed in two aspects:
• the conduct in the face of a menace to the operation of a multi-agent system that is, in this case, of a virtual nature (e.g. information or softcomputing systems realized using agent technology);
• the use of a multi-agent system for decision making (or decision support) in critical situations that (can) happen in the real world (e.g. in communication or power systems, or in the [ecological] environment).
Although the procedures for the two cases are different, the approach to analysis and the tools — especially software ones — used for crisis management turn out to be similar. They encompass:
• a formal description of a multi-agent system (virtual or real) allowing the simulation model to be built;
• a system (or systems) that monitors the operation of the multi-agent system under consideration;
• a decision block that uses the results of monitoring or simulation and, moreover, is able to enforce the elaborated anti-crisis strategies.
The proposed variants of utilization of the above elements do not take into consideration all nuances of possible applications but are a good point of departure for further research in the field that seems to be of great theoretical importance as well.
References

1. Weiss G (ed) (1999) Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. The MIT Press
2. Ferber J (1999) Multi-Agent Systems. Addison-Wesley
3. Wu S, Soo V (1999) Risk control in multi-agent coordination by negotiation with a trusted third party. In: Thomas D (ed) Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI-99, Vol. 1). Morgan Kaufmann Publishers, 500-505
4. Collins J, Tsvetovas M, Sundareswara R, van Tonder J, Gini M, Mobasher B (1999) Evaluating risk: flexibility and feasibility in multi-agent contracting. In: Etzioni O, Muller J P, Bradshaw J M (eds) Proceedings of the Third International Conference on Autonomous Agents (Agents'99). Seattle, WA, USA, ACM Press, 350-351
5. Uhrmacher A M, Gugler K (2000) Distributed, parallel simulation of multiple, deliberative agents. In: Bruce D, Donatiello L, Turner S (eds) Proceedings of the 14th Workshop on Parallel and Distributed Simulation (PADS 2000). Los Alamitos, CA, IEEE Press, 101-110
6. Zhao Z, Belleman R G, van Albada G D, Sloot P M A (2002) Scenario switches and state updates in an agent-based solution to constructing interactive simulation systems. In: Proceedings of the Communication Networks and Distributed Systems Modeling and Simulation Conference (CNDS 2002), 3-10
7. Dobrowolski G, Nawarecki E (2001) Multi-agent system in a decentralized control problem. In: Binder Z (ed) Management and Control of Production and Logistics 2000. PERGAMON, 445-450
8. Kluska-Nawarecka S, Dobrowolski G, Marcjan R (2001) INFOCAST - A system for quality control procedures and diagnosis of casting defects. Acta Metallurgica Slovaca 7:441-446
45
Monitoring in Multi-Agent Systems: Two Perspectives*

Marek Kisiel-Dorohinicki

Institute of Computer Science, AGH University of Science and Technology, Krakow, Poland
[email protected]

Summary. The subject of the paper is the discussion of monitoring issues in multi-agent systems, and particularly the infrastructure that should be built into an agent platform so that its use does not bring much trouble to the designer. In the course of the paper two cases are considered, depending on the kind of core MAS infrastructure applied: information-intensive and computational systems. For each case some general remarks on how to build a monitoring subsystem are presented, and a more detailed description of a particular prototype realisation is given.
* This work was partially sponsored by the State Committee for Scientific Research (KBN), grant no. 4 T11C 027 22.

45.1 Introduction

It seems obvious that the implementation and deployment of computer systems must be supported by tools allowing one to check whether they are working properly. Of course, this is also true for multi-agent systems, yet in this case the problem is of vast importance not only for the (human) designer or even a user, but also for the agents — the components of the system. Since the cooperation of autonomous agents depends on their proper behaviour, this should be verified as much as possible by the partners before proceeding with the identified strategy of interaction. In both cases we need some mechanism that allows for the detection and diagnosis of possible failures (critical situations) with respect to the goal to be achieved by the whole MAS or by a group of cooperating agents (a team). This is only possible assuming on-line monitoring of the states and behaviour of the interacting agents and drawing immediate conclusions on the possibility of the emergence of a crisis. It depends on the particular application and design policy whether these conclusions are to be drawn by the user, by some external software tools, or by the agents themselves — but this problem is out of the scope of this paper. In complex multi-agent environments the problem of efficient monitoring of often distributed and heterogeneous agents is not trivial, and many concepts of the monitoring policy have already been proposed (e.g. [5, 3]). Yet most papers leave out the
implementation problems, or show only some preliminary solutions strongly related to a particular platform, which gives no indication of how to deal with the problem in a more common way. Furthermore, some tools provide system monitoring support for (human) users, but not for the agents of the system (e.g. [4]). This paper aims at filling this gap by introducing a general model of a monitoring subsystem for MAS based on the concept of monitoring services. The essence of the approach is to allow for on-line processing of the required (ordered) information via autonomous monitoring agents, coordinated by a local authority (manager) that also plays the role of a directory of monitoring resources for both agents and external clients (and thus is called the monitoring services provider). The paper is organised as follows. After a short review of the considered subtypes of MAS and their infrastructure in section 45.2, the concept of monitoring services and a general structure of a monitoring subsystem are presented in section 45.3. Sections 45.4 and 45.5 discuss in more detail the monitoring infrastructure for the cases of information-intensive and computational MAS respectively, together with their prototype implementations.
45.2 The infrastructure of multi-agent systems

Today the term multi-agent system covers a variety of different systems. In the course of the paper we will concentrate on two extreme cases:

information-intensive systems consist of cognitive (reasoning), socially aware, and service-oriented (client-server model) agents, often designed to work in a distributed, heterogeneous and open environment (the global network), and utilising
Fig. 45.1. The structure of information-intensive MAS
rich (knowledge-oriented) inter-agent (point-to-point) communication with sophisticated collaboration schemata (fig. 45.1);

computational systems consist of a relatively large number of reactive, lightweight (computationally simple) agents, designed to work in parallel but rather not network-aware, thus often utilising simple broadcast communication (fig. 45.2).

Typical applications of the so-called information-intensive MAS include: information searching and gathering, knowledge acquisition and management, user assistance, etc. The systems of the second group differ essentially from these; thus it was proposed to call such systems mass (or massively) multi-agent systems [7]. Their field of application can mainly be:
• simulation of problems that are distinguished by their granularity, in the sense of the existence of a large number of similar objects that manifest some kind of autonomy, and
• soft computing restricted to problems that can be reduced to searching some space for elements with given features; objects similar to those mentioned above are used as a means for the search.
Fig. 45.2. The structure of computational MAS

The main difference between these types of multi-agent systems is the implementation technology and, in consequence, the relative load of infrastructure. Infrastructure may be understood here as everything in the system that is not an agent (e.g. communication facilities or registration and directory services). In an information-intensive MAS there is relatively lightweight infrastructure, which means that most of the software (code) is concentrated in the agents. This also means that the infrastructure knows little about the states of the agents. On the contrary, in a computational MAS the agents have a very simple and often identical structure, and so the infrastructure is relatively heavy and, which is of vast importance in this context, knows much
about the agents' states. Nevertheless, it must be stressed that absolute load of infrastructure may be much higher in an information-intensive MAS, because of the complexity of the implementation environment and the needs of the technology used.
45.3 Monitoring infrastructure and services

For both information-intensive and computational MAS, building effective and efficient monitoring mechanisms is not an easy task. This is mainly because of the number and variety of agents, which produce huge amounts of data that quickly become out-of-date. What is more, information concerning groups of interacting agents, or resulting from longer observation of agent activity, is often indispensable. The problems of distribution and heterogeneity of agents are also of vast importance, especially in open environments. In this context the proposed solution assumes local on-line processing of the required information via monitoring services, available both to the agents and to external software tools via dedicated interaction protocols. Monitoring services should allow for obtaining information about:
• the physical structure of the MAS (agents present in a given location) — if not provided by the platform,
• the logical structure of the MAS (classes of agents, their properties) — if available in a particular application,
• the actual state of a given agent (node, team),
• changes of the state of a given agent (node, team) — this is a subscription rather than a single inquiry.
Fig. 45.3. A general structure of a monitoring subsystem for MAS

A general structure of a monitoring subsystem for MAS is presented in fig. 45.3. The acquisition and processing of the required information is actually realized by monitoring agents of two kinds:
1. domain-specific agents with built-in monitoring functionality,
2. dedicated agents which obtain the required information e.g. by observing the behaviour or overhearing the communication of other agents.
The creation and activity of the various monitoring entities is coordinated by the monitoring services provider, which is a local authority responsible for the management of all monitoring resources in a particular location (host). Since some directory of monitoring resources is indispensable to support the processing and delegation of monitoring services, the monitoring services provider also delivers appropriate interfaces for the agents of the system and for external clients, to facilitate identification of agents, their properties and their actual state. Cooperation between providers in different locations allows for remote monitoring of agents (e.g. distributed teams or mobile agents). Even though at this level of abstraction the structure of the monitoring subsystem looks similar for both information-intensive and computational systems, the implementation differs a lot. Monitoring in the former case should be based mainly on autonomous monitoring agents, which may be built on top of the monitored agents, or acquire the required information from other (monitored) agents. In the latter case much information about the monitored agents is available in the infrastructure, and thus the monitoring agents should be tightly integrated with the platform.
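As an illustration only, a monitoring services provider of this kind might be sketched as below; all identifiers are assumptions, and the sketch ignores distribution and security issues:

```python
# Hedged sketch of a per-host monitoring services provider: a directory of monitoring
# resources answering the four kinds of requests listed in Sect. 45.3, with optional
# delegation of a state query to a provider on another host.

class MonitoringServicesProvider:
    def __init__(self, host):
        self.host = host
        self.registry = {}        # agent id -> {"cls": ..., "state": ...}
        self.peers = {}           # host -> remote provider (for distributed teams)
        self.subscribers = {}     # agent id -> list of callbacks

    def register(self, agent_id, cls, state):
        self.registry[agent_id] = {"cls": cls, "state": state}

    def physical_structure(self):
        return list(self.registry)                        # agents present on this host

    def logical_structure(self):
        return {a: r["cls"] for a, r in self.registry.items()}

    def state_of(self, agent_id, host=None):
        if host and host != self.host:                    # delegate to a remote provider
            return self.peers[host].state_of(agent_id)
        return self.registry[agent_id]["state"]

    def subscribe(self, agent_id, callback):              # change notification, not polling
        self.subscribers.setdefault(agent_id, []).append(callback)

    def update(self, agent_id, new_state):
        self.registry[agent_id]["state"] = new_state
        for cb in self.subscribers.get(agent_id, []):
            cb(agent_id, new_state)
```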
45.4 Monitoring infrastructure for information-intensive MAS

In an information-intensive MAS only some data may be acquired via the platform's directory services, and most information must be acquired from the agents: directly, if possible, or otherwise by observing the behaviour or overhearing the communication of other agents. In the first case domain-specific agents must be monitoring-aware — equipped with some monitoring module, which often assumes a specific agent architecture with built-in monitoring functionality. In the second case dedicated monitoring agents may be delegated to observe (overhear) and draw conclusions on the domain-specific agents' states. This is only possible if supported by the core infrastructure of the platform, and it obviously violates the agents' autonomy. It may also be conceptually difficult if the internal agent architecture is not known. Nevertheless, it does not require any mechanism built into the agent structure.
Fig. 45.4. Monitoring infrastructure for information-intensive MAS
As described in section 45.2, information-intensive multi-agent systems should be applicable in open heterogeneous environments. This is because contemporary distributed software architectures require efficient access to and exchange of resources and services, which assumes cooperation between systems supplied by different vendors. Yet interoperability between a variety of different environments for agent technology will not be possible until a sufficient set of standards is available and widely used by developers [2, 8]. This is why the prototype implementation of the monitoring subsystem was realised in conformance with FIPA¹ specifications. The agent platform used for the implementation facilitates agent identification and localisation (directory services) as well as communication — the exchange of ACL messages (message router), according to FIPA specifications. It also provides a basic agent structure that supports sending and receiving messages (protocol stack and ACL message parser). This structure was extended with a monitoring module that provides the values of any agent attributes using the standard Java reflection API. This way every agent in the platform may easily be monitored directly, without any additional effort of the designer of the particular application. The monitoring services provider was realised as a regular agent registered in the directory facility, and thus every agent of the system is able to discover and use its monitoring functionality. The most important part of the realisation was the design of appropriate communication protocols — the definition of ACL messages with a specific content language exchanged between:
• agents (clients) and the monitoring services provider,
• the monitoring services provider and monitoring agents,
• monitoring services providers in different locations.
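For illustration, a request and a subscription exchanged between a client and the monitoring services provider might look as follows when written down as plain data; the performatives follow FIPA ACL, while the ontology name, content-language name, agent names and attribute names are assumptions, since the paper does not show the concrete content language:

```python
# Illustrative only: a FIPA-ACL-like request asking the provider for the current state
# of one agent, and the corresponding subscription, represented as plain dictionaries.

state_query = {
    "performative": "request",
    "sender": "client-gui@host1",
    "receiver": "monitoring-provider@host1",
    "ontology": "monitoring",               # assumed ontology name
    "language": "monitoring-content-v0",    # assumed content-language name
    "content": {"action": "get-state", "agent": "seller-agent-17",
                "attributes": ["queueLength", "lastContractValue"]},
}

subscription = dict(state_query, performative="subscribe",
                    content={"action": "notify-on-change", "agent": "seller-agent-17"})
```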
To facilitate effective communication, the protocols allow for the aggregation of requests and of the obtained data. Further work will first concentrate on an implementation on a well-established FIPA-compliant platform (such as JADE). Also, a more general content language (e.g. based on KIF) will be proposed. Because of our main application area (distributed expert systems), monitoring agents and the processing of acquired knowledge for rule-based representation will be considered.
¹ Foundation for Intelligent Physical Agents; for further reference see http://www.fipa.org

45.5 Monitoring infrastructure for computational MAS

In a computational MAS the acquisition of the required information may be realised mostly by the core infrastructure, since it "knows" a lot about the agents' states. Thus the monitoring subsystem should be tightly integrated with the platform, as fig. 45.5 shows. Due to the huge number of agents in the system, caching and statistical processing of the acquired (numerical) data may be of vast importance for efficient communication between the distributed components of the monitoring infrastructure. The prototype implementation of the monitoring subsystem for a computational MAS was realised for the AgWorld platform [1] — a software framework facilitating
agent-based implementations of distributed evolutionary computation systems (both evolutionary MAS and flock-based MAS models [6]).

Fig. 45.5. Monitoring infrastructure for computational MAS

AgWorld is a PVM-based library of C++ (or Java) classes². The main components of its structure are:
Resource is a passive entity consumed by Agents (used as a means of management of evolutionary processes).
Agent represents an individual or a flock (a unit of evolution and migration).
Place constitutes an abstraction of the local environment (also a unit of distribution).
Path is a directed connection between Places (facilitates communication and migration).
World is responsible for the management of Places and Paths.
Every active object in the system (e.g. Agent, Place or World) is a specialization of the SimulationObject class, where the core monitoring functionality is implemented. This way all these objects become monitoring-aware and the monitoring infrastructure is automatically integrated with the whole platform. The difference is only in the communication protocols used between particular elements of the structure:
• for the sake of efficiency, direct method calls are used between Agents in one Place (which is a single PVM process),
• PVM library functions are used to send messages between possibly distributed Places and the World,
• XML-based socket communication is dedicated to external clients.
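The idea that the monitoring functionality lives in a common base class can be illustrated with a rough sketch; Python is used here only for brevity, and the real AgWorld classes and their API are not reproduced:

```python
# Rough analogue of the SimulationObject idea: every active object specializes a common
# base class exposing its (numerical) attributes to the monitoring subsystem, so Agents,
# Places and the World are monitoring-aware by construction.

class SimulationObject:
    def monitored_attributes(self):
        # expose the object's public numerical attributes
        return {k: v for k, v in vars(self).items()
                if not k.startswith("_") and isinstance(v, (int, float))}

class Place(SimulationObject):
    def __init__(self, name):
        self.name = name
        self.energy = 100.0
        self.agent_count = 0

class Agent(SimulationObject):
    def __init__(self, energy):
        self.energy = energy

place = Place("island-0")
print(place.monitored_attributes())   # {'energy': 100.0, 'agent_count': 0}
```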
Further work on the monitoring infrastructure for computational systems will focus on extending the communication protocols (e.g. to support management functionality). Other transport protocols for external clients will be proposed — a CORBA IDL interface is nearly ready. Simultaneously, a common monitoring infrastructure for PVM- and MPI-based platforms is being prepared; integration with grid services is also considered.
² For further reference see http://agworld.sourceforge.net
45.6 Concluding remarks

Because of its limited length this paper is surely not a complete guide to the design and implementation of monitoring mechanisms in multi-agent systems. Yet the author has tried to identify the most important problems and to propose possibly general solutions for two different classes of MAS implementations, based on experience gained during numerous projects in various application domains. To make the considerations more comprehensible, they were illustrated by concrete realisations of general-purpose monitoring subsystems for information-intensive and computational MAS respectively — of course only selected aspects of their design were discussed. The material presented is obviously work in progress, and thus the descriptions of both prototypes were completed with the expected strategies of further development. It seems that the interaction protocols dedicated to monitoring could be described more formally, to allow for interoperability between the monitoring subsystems of different platforms and for the implementation of common external tools supporting not only the analysis and visualisation of the obtained data but also the management of the monitored agents.
References

1. A. Byrski, L. Siwik, and M. Kisiel-Dorohinicki. Designing population-structured evolutionary computation systems. In T. Burczynski, W. Cholewa, and W. Moczulski, editors, Methods of Artificial Intelligence (AI-METH 2003). Silesian Univ. of Technology, Gliwice, Poland, 2003.
2. J. Dale and E. Mamdani. Open standards for interoperating agent-based systems. Software Focus, 1(2), 2001.
3. J. Dix, T. Eiter, M. Fink, A. Polleres, and Y. Zhang. Monitoring agents using declarative planning. Fundamenta Informaticae, 57(2-4):345-370, 2003. Short version appeared in: Günther, Kruse, Neumann, editors, Proceedings of KI 2003, LNAI 2821, Springer Verlag, 2003.
4. J. R. Graham, D. McHugh, M. Mersic, F. McGeary, M. V. Windley, D. Cleaver, and K. S. Decker. Tools for developing and monitoring agents in distributed multi-agent systems. In T. Wagner and O. Rana, editors, Infrastructure for Agents, Multi-Agent Systems, and Scalable Multi-Agent Systems, LNAI 1887. Springer Verlag, 2001.
5. G. A. Kaminka, D. V. Pynadath, and M. Tambe. Monitoring teams by overhearing: A multi-agent plan-recognition approach. Journal of Artificial Intelligence Research, 17:83-135, 2002.
6. M. Kisiel-Dorohinicki. Agent-based models and platforms for parallel evolutionary algorithms. In M. Bubak, G. D. van Albada, P. M. A. Sloot, and J. Dongarra, editors, Computational Science - ICCS 2004, Part III, LNCS 3038. Springer Verlag, 2004.
7. E. Nawarecki, M. Kisiel-Dorohinicki, and G. Dobrowolski. Organisations in the particular class of multi-agent systems. In B. Dunin-Keplicz and E. Nawarecki, editors, From Theory to Practice in Multi Agent Systems, LNAI 2296. Springer-Verlag, 2002.
8. P. O'Brien and R. Nicol. FIPA - Towards a standard for software agents. BT Technology Journal, 16(3):51-59, 1998.
46

Multi-Agent Environment for Management of Crisis in an Enterprises-Markets Complex

Jaroslaw Kozlak

Department of Computer Science, AGH University of Science and Technology, Al. Mickiewicza 30, 30-059 Krakow, Poland

Summary. A very important research problem is the elaboration of a methodology for an efficient reaction to crisis events. The research goal presented in this work is the creation of a simulation environment for analysing and estimating methods of reaction to crisis events, such as foreseeing potential crisis events and taking them into consideration while making plans, or creating plans which may be modified after a crisis event has occurred, with limited disadvantageous consequences. The research is illustrated by a system composed of an environment containing renewable common resources, as well as a set of enterprises and markets.

Key words: crisis situations, multi-agent systems, supply chains
46.1 Introduction

Many problems which are of importance in today's world are characterised by a high degree of complexity and are influenced by different heterogeneous entities with different levels of autonomy. Exemplary problems of this kind are production planning and transport planning. For the realization of such planning processes it may be useful to apply a multi-agent approach, which offers methodologies for dealing with such situations. The multi-agent approach is based on the realisation of systems composed of interacting, intelligent and autonomous entities, called agents, which construct local plans; as a consequence of the interactions occurring between them, global plans emerge which are affected by the whole complex system. The multi-agent approach has proved its usefulness for solving standard production planning problems (job shop, open shop, flow shop), especially in their dynamic versions (e.g. [11]). Research is also being performed on the application of the multi-agent approach to more complex problems which better reflect reality. One can mention applications to supply-chain management [2, 6, 8]. For example, in [8] an environment was presented for creating systems for supply-chain management, with two distinguished classes of component agents: structural elements (representing physically existing production enterprises, storehouses, brokers, sellers etc.) and control elements responsible for the flow of resources in the system.
The market approach is also used [9, 10] for the development of efficient supply chains with multi-agent systems. A very important research problem is the elaboration of a methodology for an efficient reaction to crisis events. It is assumed that crisis situations occur after those events which entail a reaction, even at the cost of a deterioration in the quality of realisation of the requests adopted in the plans made so far. The research goal presented in this work is the creation of a simulation environment for analysing and estimating methods of reaction to crisis events, such as foreseeing potential crisis events and taking them into consideration while making plans, or creating plans which may be modified after a crisis event has occurred, with limited disadvantageous consequences. The research is illustrated by a system composed of an environment containing renewable common resources, as well as a set of enterprises and markets [4]. Enterprises carry out a production process (they obtain output resources as a result of the transformation of a suitable configuration of input resources); they may also obtain resources from the environment or exchange resources with other enterprises, paying the price set by the market. The enterprises' plans have the form of workflows representing series of resource transformations; one of their main evaluation criteria is the profit obtained. The markets determine the market price for rendering particular services and mediate in the transfer of resources between the enterprises. We assume here that a crisis situation is the arrival of a high-priority production order which cannot be rejected. The experiments performed are to demonstrate some abilities of pro-active planning, which can foresee the incoming requests and tries to take them into consideration while constructing the plan.
46.2 Overview of Agent Planning Methods

Particular agents make plans which best comply with their goals. On the basis of the private goals of agents, it is possible to establish global plans in which different agents participate. It is possible to distinguish the following kinds of distributed planning (DP) [3]:

Cooperative Distributed Planning (CDP) - the objective is to establish a good common global plan. Particular entities participate in the construction of the plan; agents exchange information about their plans, which are constructed and adjusted to the needs of the realization of the global common plan. Full cooperation of the participating agents is assumed.

Negotiated Distributed Planning (NDP) - the reference point is the successful realization of the private, local goal of each agent. The development of a plan embracing a group of agents results from negotiation held among the agents and coordination of the actions performed by particular agents.

Distributed Continual Planning (DCP) [3] - the planning process is distributed among the agents and the development of plans is carried out together with the execution of the actions intended for their realization.

For such kinds of planning problems one can distinguish two approaches: reactive (based on the idea of the best possible reaction
to events that have occurred, leading to plan modification) and pro-active (a more or less precise model of potential events which will have an influence on the plans is used, and one tries to predict them during plan construction). In the case of pro-active planning, it is possible to differentiate between methods aimed at constructing robust solutions, which give the highest chance that the solutions obtained remain valid after future dynamic events, and methods aimed at creating flexible solutions, which can be easily modified and adapted in case of need [1].
46.3 Multi-Agent Systems and Crisis Situations

According to the main ideas of the multi-agent approach, it is assumed that as a result of multi-agent cooperation the system functionality should increase, thanks to cooperation among the agents which makes it possible to perform additional, often more complex, actions [7]. The autonomy of agents should support the adjustment of the system to the changes which appear after crisis situations take place. In the case of a crisis situation, agents become aware of the changes and modify their goals in response to them.
46.4 Model Features

The objective of the work is the creation of a universal multi-agent environment with the possibility of testing different techniques of agent planning. It should also be possible to generate crisis situations and to observe the modification of agents' plans aimed at their adjustment to the changed situation. The environment should possess the following features:
• a possibility of testing different methods of construction of agent plans,
• a possibility of modelling (in a simple form) various kinds of crisis situations associated with agent functionality as well as with the free resources in the environment or those possessed by agents,
• a possibility of examining different methods of reaction to crisis situations.

46.5 Model Description

The model is composed of an environment and four types of agents: enterprises, producers, customers and markets.

46.5.1 Environment

The environment constitutes the place where agents are located. It is also a space containing resources, which are renewable (their quantity increases in each given period of time; the value of the increase depends on the current quantity of resources).
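One possible reading of such a renewal rule is sketched below; the formula and its parameters are assumptions made only for illustration, since the text does not specify them:

```python
# Assumed renewal rule: each period the free quantity grows by an amount that depends
# on the current quantity, bounded by the capacity of the environment.

def renew(quantity, rate=0.1, capacity=1000.0):
    return min(capacity, quantity + rate * quantity * (1.0 - quantity / capacity))

q = 200.0
for _ in range(3):
    q = renew(q)    # 216.0, then about 232.9, then about 250.8
```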
46.5.2 Agent-Enterprise

The role of the agent-enterprise is the coordination of the agent-producers subordinated to it. An agent-enterprise AE may be described as $(EP, gr, g)$, where:
EP - the set of agent-producers subordinated to the agent's decision module,
gr - the set of coordination rules (resources, actions, negotiated rules),
g - the goal function, whose value depends on the configuration of resources.

46.5.3 Agent-Producer

An agent-producer transforms configurations of resources. It is described as the 5-tuple $AP_i = (AE, g_i, P_i, R_i, K_i)$, where:
AE - the identifier of the enterprise owning the agent-producer,
$g_i$ - the goal function, whose value depends on the configuration of resources,
$P_i$ - the plan of the agent (a series of actions executed successively),
$R_i$ - the resource configuration $R = (ro_1, ro_2, \ldots, ro_n, rf_1, rf_2, \ldots, rf_n)$, where $ro_1, ro_2, \ldots, ro_n$ are the quantities of resources owned by the agent and $rf_1, rf_2, \ldots, rf_n$ are the quantities of free resources owned by the agent,
$K_i$ - knowledge about the environment, expressed as a list of triplets $(a_i, o_j, p_{ij})$, where $a_i$ is an action performed by the agent, $o_j$ is an operation describing the results of the action execution and its influence on the agent configuration and its neighbourhood (the environment and other agents), and $p_{ij}$ is the probability that the execution of action $a_i$ on $(ro_1, ro_2, \ldots, ro_n, rf_1, rf_2, \ldots, rf_n)$ causes the result $o_j$.
An action $a_i(j)$ is described by its type $i$, its force of execution $j$ (which expresses the quantity of resources used and the anticipated quantity of resources obtained as the result of the action) and its execution time. There are the following kinds of actions: actions on the environment, transformations of resources, and exchanges of resources between agents.

46.5.4 Agent-Customer

An agent-customer $AC = (g, R)$ is described by a utility function $g$ (preferences concerning the resource configuration) and the current configuration of owned resources $R$.

46.5.5 Agent-Market

In the system there is a set of agent-markets. The role of the agent-market is mediation in the exchange of resources between agents, definition of the equilibrium prices of action execution, and coupling an agent-producer with an agent-customer for the exchange of goods. A market $M_i$ is described as the tuple $(a_i, pr_i, qo_i, qd_i, lp_i)$, where:
$a_i$ - the type of the resource-exchange action;
$pr_i$ - the current price for action execution;
$qd_i$ - the number of demands for action execution;
$qo_i$ - the number of offers of action execution;
$lp_i$ - the list of agent couples (producer-consumer) performing an exchange of resources.

The algorithm of market functioning is described as follows:

begin cycle
  wait for offers and demands during n steps
  if |demand - offer| < epsilon then keep service price
  if demand - offer > epsilon then price += correction(price, demand - offer)
  if offer - demand > epsilon then price -= correction(price, offer - demand)
end cycle

46.5.6 Agent-Disturber

The role of the agent-disturber is the introduction of disturbances into the system, leading to the occurrence of crisis situations. To this end, it is necessary to generate particular events, described in 46.5.8. The agent-disturber AD is represented as $(car, caa, cr)$, where:
car - a vector describing the probabilities that an agent will be deprived of one resource of a particular type,
caa - a vector describing the probabilities that an agent will be deprived of the ability to perform a particular type of operation (by removing the capability of performing certain actions),
cr - a vector describing the probabilities that a particular type of critical request, which the agents have to execute, arrives.

46.5.7 Planning Algorithms

A plan consists of a series of actions performed in given time periods. Plan development is based on the random generation of a set of action series that can be performed according to the knowledge possessed by the agent. Plans are estimated on the basis of the goal function values calculated for the agent's resource configurations at the end of plan realization. An additional criterion may be the probability of successful realization of the plan so defined. It is then possible either to choose the best plan or to perform a random selection, e.g. based on the roulette rule. The constructed planning algorithms are adapted for use in a dynamically changing environment. The plan may be modified when (see fig. 46.1):
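The market cycle above can be transcribed directly into runnable form; the correction function below is an assumption (a simple proportional adjustment), since the paper does not define it, and ties are resolved with a non-strict comparison so that every case is covered:

```python
# Runnable transcription of the market price-adjustment cycle (assumed correction rule).

def correction(price, imbalance, k=0.01):
    return k * price * imbalance          # proportional to the excess demand or supply

def market_cycle(price, offers, demands, epsilon=1):
    if abs(demands - offers) < epsilon:
        return price                       # keep the service price
    if demands - offers >= epsilon:
        return price + correction(price, demands - offers)
    return price - correction(price, offers - demands)

p = 10.0
for offers, demands in [(5, 9), (7, 7), (9, 4)]:
    p = market_cycle(p, offers, demands)
    print(round(p, 3))    # 10.4, 10.4, 9.88
```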
• The given action was not successfully performed (the obtained result diverged from the assumed result by more than the tolerance limit) or became impossible to realize (as a result of changes in the agent's surroundings, for example associated with a crisis situation).
• A better plan is found, which leads to a higher value of the goal function in the given time horizon.
Fig. 46.1. Plan construction and realization
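The plan-generation and roulette-selection step described in Sect. 46.5.7 can be sketched as follows; the plan representation and the stand-in scores are assumptions for illustration only:

```python
# Sketch: generate random admissible action sequences, score each one by a goal
# function applied to the resulting resource configuration (here replaced by a
# stand-in value), and pick one plan by the roulette rule.
import random

def roulette_select(plans, scores):
    total = sum(scores)
    r = random.uniform(0, total)
    acc = 0.0
    for plan, score in zip(plans, scores):
        acc += score
        if r <= acc:
            return plan
    return plans[-1]

def make_plans(actions, length, n):
    return [[random.choice(actions) for _ in range(length)] for _ in range(n)]

actions = ["harvest", "transform", "exchange"]
plans = make_plans(actions, length=4, n=10)
scores = [random.random() + 0.1 for _ in plans]   # stand-in for the goal function value
chosen = roulette_select(plans, scores)
```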
46.5.8 Crisis Situations

It is possible to distinguish the following kinds of crisis situations leading to the necessity of plan modification by agents:
• overexploitation of the resources accessible in the environment,
• deprivation of resources possessed by an agent (in this case, the agent has to adapt its goals to the changed configuration),
• deprivation of an agent of the capability of performing certain actions,
• introduction of certain critical requests which the enterprise has to execute.
Pro-active planning may make it possible to predict some disadvantageous events (especially if the probabilities of their occurrence can be calculated).
46.6 Realization

A prototype version of the system was implemented in Java. The research also includes work on two environments for multi-agent planning in environments composed of virtual enterprises and markets, based on the agent platforms JADE [12] and MadKit [13].
46.7 Experimental Results

Preliminary experiments concerning the reaction of the system to disturbances and the applicability of pro-active planning were performed. The percentage of agents' plans successfully realized was analyzed. The goals of the agents and the system configuration were selected so that the probability of successful plan realization was 100 percent. Then disturbances, based on depriving agents of resources in their possession, were introduced into the system. Depending on the strength of the disturbances, the probability of successful realization of the agents' plans decreases (fig. 46.2). When the quantity of removed resources was about 30, it was impossible for the agents to realize their plans because of the lack of resources. The average value expected to be obtained after plan realization was 5056.68. The applied mechanisms of pro-active planning, based on taking into consideration the appearance of disturbances during plan realization and creating plans with slightly lower requirements concerning the resources, limited the average value of the plans constructed by the agents (to 5041.8), but slightly increased the chances of correct realization despite disturbances (the results are especially promising for disturbances of limited strength).
[Plot: percentage of successfully realized plans versus the number of disturbances, for the "norm" and "pro-act" variants.]
Fig. 46.2. Plans successfully realized with or without pro-active planning mechanisms

In the future, it is necessary to perform more experimental research for different configurations of the analysed agents and different values of the goal functions.
46.8 Conclusions and Future Works

In this paper, the idea of an environment composed of virtual enterprises and markets was presented. The main goal of the realization of this system is its application in
different experiments concerning planning and scheduling performed by groups of autonomous agents. Of special importance for us is the construction of plans that take into account the crisis situations described in section 46.5.8. The following works are intended:
• further work on the development of the environment;
• modelling of crisis situations and analysis of the results obtained after applying different planning approaches;
• introduction of uncertain events and taking them into consideration in the planning algorithms.
References

1. Davenport A, Beck J. Survey of Techniques for Scheduling with Uncertainty. Unpublished manuscript. http://www.eil.utoronto.ca/EIL/profiles/chris/zip/uncertainty-survey.ps
2. Fox M, Barbuceanu M, Teigen R (2000) Agent-Oriented Supply-Chain Management. The International Journal of Flexible Manufacturing Systems, 12:165-188. Kluwer Academic Publishers, Boston
3. desJardins M, Durfee E, Ortiz C, Wolverton M (1999) A survey of research in distributed, continual planning. AI Magazine, 4:3-22
4. Kozlak J (2001) Management of the renewable resources in the open multi-agent system. In: Binder Z (ed) Management and Control of Production and Logistics. Pergamon/Elsevier Science
5. Liu J-S, Sycara K (1995) Multiagent coordination in tightly coupled real-time environments. In: Lesser V (ed) Proceedings of the International Conference on Multi-Agent Systems. MIT Press
6. Parunak V, VanderBok R (1998) Modeling The Extended Supply Network. Industrial Technology Institute, working paper
7. Steels L (1990) Cooperation between distributed agents through self-organisation. In: Demazeau Y, Muller J-P (eds) Decentralized AI. Elsevier Science Publishers B.V., North-Holland
8. Swaminathan J, Smith S, Sadeh N (1998) Modeling Supply Chain Dynamics: A Multiagent Approach. Decision Sciences, Volume 29, Number 3
9. Walsh S, Wellman M (1999) Modeling Supply Chain Formation in Multiagent systems. In: IJCAI-99 Workshop on Agent Mediated Electronic Commerce
10. Wellman M, Walsh W, Wurman P, MacKie-Mason J (2001) Auction protocols for decentralized scheduling. In: Games and Economic Behavior 35:271-303
11. Yoo M-J, Muller J-P (2002) Using Multi-Agent System for Dynamic Job Shop Scheduling. In: Proceedings of ICEIS 2002
12. JADE (2003) Java Agent Development Framework. http://sharon.cselt.it/projects/jade/
13. The MadKit Project (a Multi-Agent Development Kit). http://www.madkit.org/
47
Behavior Based Detection of Unfavorable Events Using the Multiagent System

Krzysztof Cetnarowicz¹, Edward Nawarecki¹, and Gabriel Rojek²

¹ Institute of Computer Science, AGH University of Science and Technology, Al. Mickiewicza 30, 30-059 Krakow, Poland
[email protected]
² Department of Computer Science in Industry, AGH University of Science and Technology, Al. Mickiewicza 30, 30-059 Krakow, Poland
roj [email protected]

Summary. This article presents an attempt at the creation of a security system which makes possible the automatic detection of danger in a protected area. The protected areas considered are real-world systems, e.g. airports, shops or city centers. Automatic processing of the behavior of persons acting in the considered area should make it possible to indicate actors whose behavior has, or will have, effects that are unfavorable for the considered system - a danger for the other actors in the secured area.
47.1 Introduction

The danger of destructive attacks seems to be more and more frequent in the contemporary world. The main problem is to detect the risk of an attack as soon as possible. The detection of such dangers must comply with the following requirements:
• detection of dangers early enough to undertake security-protection actions,
• detection of new kinds of danger that are unknown a priori,
• constant monitoring of the area which should be protected.
Analyzing the historical development of human societies, we find that a society can meet a new kind of danger and protect itself against it. A society can create and update moral principles and penal codes and, using them, may detect unusual behavior and identify ill deeds. This is a very flexible and efficient way of detecting new kinds of danger, but it is not practical against contemporary menaces because it takes a lot of time. Using computer simulation we can accelerate the way mentioned above and obtain a method that is efficient in both detection and time.
47.2 Real world system

A surveyed real-world system (e.g. an airport) may in general be considered as an environment with resources and actors. Resources (e.g. cars, rubbish bins) have characteristic properties and create the environment. Actors acting in the environment change the resources. The actions of actors modifying resources may be monitored and then analyzed and used to determine unusual behavior. However, in general most systems are so-called open systems. In an open system there is a migration of actors, and new actors (new kinds of actors) may come into the given system. So in an open system we have to deal with new types of actors and new kinds of behavior of these new-type actors, and the existing, stale models of behavior become useless. In general we can say that the main goal is to find a common property of the good behavior of good actors and to take the opposite behavior as the wrong one. Then we can use negative selection to identify ill deeds and unfavorable behavior and, in turn, find their authors.

47.2.1 Real world system simulation

We can build a computer model of the real system that simulates the events that take place in the real system. Then we can realize a real-time simulation of the real system, where the simulated events are driven by the real events of the real system. In the model:
• the real resources are represented by simulated resources,
• the actors of the real system are represented by agents.
In this way we build a multiagent model of the real system. The multiagent system may be provided with social procedures that fulfil the needed tasks.
47.3 Actors, actions, behavior and estimation of behavior

An actor in the real world undertakes actions. The actions collected in a given period of time form a sequence that defines the actor's behavior. The more actions of the actor under consideration are known, the more we can say about the behavior of that actor. The order of the actions (undertaken by an agent) is essential for evaluating its behavior: the same actions in a different order may indicate behaviors that are judged in different ways. The behavior of an individual is thus a sequence of actions, and the length of this sequence is meaningful: a longer sequence enables behavior to be estimated with higher confidence. It is impossible to define simply what a long or short sequence is; the length of the sequence depends on the features of the individual (actor) that is observed. For example, considering people observed in an airport, we are unable to say anything about a given passenger who has just entered the monitored area; but when we notice that he has left packages in five rubbish bins, we can estimate his behavior as bad and should start an alarm. The essence of this article is to discuss how
an alarm can be started automatically, which leads us to the problem of automatic behavior evaluation.
47.4 Estimation of behavior in multiagent systems

An agent that represents a given actor of the real world undertakes actions that may be considered as objects. The objects create a sequence which is registered by the agents observing another acting agent. In a real-world society all agents (all individuals in the society) observe (and estimate) all individuals whose actions should be noticed. In the simulated model of the real-world society (real-world system) we can create one agent which estimates the behavior of all individuals in the monitored area. The actions of agents, as registered objects, may be processed in order to decide whether the behavior of the agent is good or bad. This enables the evaluation of the agent as good or bad and implies an adequate action. It should be mentioned that the notions of good and bad quoted here do not have an absolute meaning. A good behavior, or a good agent, is one desirable for the given system (monitored area) in which the evaluation takes place. A bad agent is an undesirable agent in the given system, although it is possible that it is good in another one.

47.4.1 Estimation inspired by immunological mechanisms

Particular mechanisms, inspired by immunological mechanisms (as found in [3, 4, 5, 6]), are applied to estimate the behavior of an agent. The proposed mechanisms operate on the actions committed by the agents under observation. The structures used by the immunological mechanism have the form of sequences (chains) of actions performed by the agents. The length of a chain is defined as $l$ (i.e. every chain contains $l$ objects). Every object represents one action undertaken by an agent under observation. The algorithm of the proposed mechanism is the following:
• observe and store (permanently) the actions (corresponding objects) undertaken by every agent visible in the environment (system);
• after a given number of observed actions (undertaken by every agent), generate the corresponding detectors;
• when the detectors are generated, evaluate the behavior of every agent in the system.
The generated detectors have the form of fragments of sequences (subsequences) composed of objects representing actions. A subsequence may be considered a detector if it does not appear in the sequence of actions of any agent which is considered good (represents good behavior). The process of detector creation takes place after a given number of memorized actions - we first need to gather knowledge about the undertaken actions. Behavior estimation consists of the verification of the sequences of actions (of all agents in the monitored area). A sequence of actions is considered bad if it contains a detector as a subsequence. The behavior of an agent is evaluated as bad if its sequence of actions is similar (to a given degree) to the detectors.
47.5 Division profile Our aim is to obtain a class of agent activity whose goal is to observe others agents in society and possible other elements of the environment. Those observations should be made in order to distinguish individuals whose behavior is unfavorable or incorrect (bad) for the observer. Such distinguished bad individuals should be adequately treated (e.g. convicted, avoided, liquidated) which should also be formed by a division profile. In the case of a multiagent system which is a simulation of real world system, it is desirable to equip every agent in the system with mentioned mechanisms, so the security is assured by all agents existed in the system. This mechanisms built-in into exemplary agent create division profile of this agent in interpretation of M-agent theory (presented e.g. [1, 2]). Division profile mechanisms are built-in into all agents existing in simulated system, but every agent posses his own instantion of division profile. In environment of simulated system there are added two common for all agents structures: •
• board of actions F - the structure in which the actions (objects representing actions) of all agents in the environment are stored; only the last h actions are stored, and every action is accompanied by a note of the agent by whom this action was undertaken;
• board of removal O - the structure which collects the results of functioning of the division profiles of all agents in the system.
Referring to the synopsis of the immunologically inspired algorithm of behavior evaluation presented in Sect. 47.4.1, the functioning of the division profile mechanisms can be split into three stages, as shown in Fig. 47.1.
Fig. 47.1. Stages of division profile functioning: creation of collection W, generation of detector set R, and behavior evaluation of neighboring agents (qualifying particular agents as "bad")
47.5.1 Creation of collection W
The collection W is named the collection of own actions. This collection includes correct, "good" sequences of actions. The collection W should consist of action-object sequences of length l undertaken by the agent-observer (the agent that owns the considered instance of the division profile). This is correct because of the assumption that the actions which the agent itself undertakes are evaluated as good by him. Presuming that
the last h actions undertaken by every agent are stored, the own collection W will contain h - l + 1 elements.

47.5.2 Generation of detector set R
The algorithm of detector generation refers to negative selection, the method of T-lymphocyte generation (presented in [3, 4, 5, 6]). From the set R0 of generated sequences of length l, those reacting with any sequence from collection W are rejected. The sequences in the set R0 represent every possible action in every possible order (possible actions are the actions that can be undertaken by any agent in the system). Reaction of sequences means that the elements of those sequences are the same. The sequences from R0 which pass such negative selection create the set of detectors R.

47.5.3 Algorithm of actions evaluation
Once the detector set of the agent-observer is generated, it is used to find bad sequences among the action-object sequences of the other agents (stored in the board of actions F). Actions evaluation is performed by the agent-observer (the agent whose division profile is considered) for every agent in the environment separately. Assuming there are j agents in the system, the evaluation of actions is made by the agent-observer j times, once per agent. An exemplary agent a_i has an attributed sequence N_i stored in the board F, where i indicates the number of the agent. A coefficient m_i is also attributed to this agent a_i. At the beginning of actions evaluation the coefficients m_i are set to zero for all agents. For an agent a_i, every subsequence of length l of the sequence N_i is compared with every detector from the set R, as shown in Fig. 47.2. If any element of the detector set matches any subsequence of N_i, the coefficient m_i is incremented. Matching of sequences means that the elements of the compared sequences are the same. At the end of the actions evaluation, every agent has an attributed coefficient indicating the number of matches: m_i is the number of counted matches for agent a_i. The bigger the number of matches of an agent, the more this agent can be considered bad. The agent-observer chooses the agent or agents with the greatest value of the coefficient and sends a demand to delete that agent or agents. To this demand a coefficient equal to the number of matches of the agent concerned is attributed.

47.5.4 Board of removal O
The board of removal O is a structure in the environment with mechanisms which accumulate all deletion demands sent by the agents evaluating behavior. The board O is not part of the division profile of any agent, but it is a part of the evaluation mechanisms shared by all agents which evaluate behavior. The board of removal is an array O = (o_1, o_2, ..., o_i, ..., o_j), where o_i is the coefficient attributed to agent a_i and j is the number of agents in the system. The rules of exploitation of the board of removal are the following:
Fig. 47.2. Algorithm of actions evaluation presented for agent a_i (j - number of all agents in the environment, N_i - sequence of actions undertaken by agent a_i)
1. the board O is reset (all coefficients are set to zero) at the beginning of every constant time period Δt;
2. on receiving a removal demand concerning an exemplary agent a_i, with the coefficient m_i attributed to this demand: o_i := o_i + m_i;
3. at the end of every constant time period Δt the agent (or agents) a_d is removed which is characterized by:
• o_d = max(o_1, o_2, ..., o_j),
• o_d > OU, where OU is a constant called the sensitivity of recognition.
A sketch of this evaluation and removal cycle is given below.
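The following is a minimal illustration rather than the authors' code; the names board_F, detectors, all_demands and OU are ours and only mirror the notions defined above.

def evaluate_agents(board_F, detectors, l):
    # for every observed agent a_i count m_i: the number of length-l subsequences
    # of its stored action sequence N_i that match a detector
    matches = {}
    for agent_id, sequence in board_F.items():
        chains = [tuple(sequence[k:k + l]) for k in range(len(sequence) - l + 1)]
        matches[agent_id] = sum(1 for chain in chains if chain in detectors)
    return matches

def removal_demand(matches):
    # the observer demands deletion of the agent(s) with the greatest number of matches,
    # attributing to the demand a coefficient equal to that number of matches
    top = max(matches.values(), default=0)
    return {agent_id: m for agent_id, m in matches.items() if m == top and m > 0}

def board_of_removal(all_demands, OU):
    # accumulate o_i := o_i + m_i over one period Δt and remove the agent(s) whose
    # accumulated coefficient is maximal and exceeds the sensitivity of recognition OU
    O = {}
    for demands in all_demands:          # one demand dictionary per observing agent
        for agent_id, m in demands.items():
            O[agent_id] = O.get(agent_id, 0) + m
    top = max(O.values(), default=0)
    return [agent_id for agent_id, o in O.items() if o == top and o > OU]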
47.6 Experiment
In order to confirm the effectiveness and utility of the proposed solutions, a multiagent system was implemented. In the environment of the multiagent system there exist two types of resources:
• resources of type A,
• resources of type B.
Resources are used by agents, but refilling of the resources is only possible when the resources are used simultaneously. Two types of agents acting in the environment of the research system can be distinguished:
• type g=0 - agents which in every constant time period Δt take one unit of a randomly selected resource (A - 50%, B - 50%); type g=0 agents need units of resources to refill their energy; if the energy level of a type g=0 agent goes down, this agent is eliminated;
• type g=1 - agents which in every constant time period Δt take one unit of resource A; their existence does not depend on their energy level; type g=1 agents are also called intruders.
There are some similarities between the actions of the agents acting in the research system (taking resource A / taking resource B) and the actions of real world actors, e.g. leaving baggage / taking baggage (at a repository).

47.6.1 Results: intruders inside the environment
In this part of the research three cases were studied:
• a case with only type g=0 agents in the system, without division profile mechanisms - initially there are 50 type g=0 agents in the system, which do not have any security mechanisms;
• a case with type g=0 agents and type g=1 agents, without division profile mechanisms - initially there are 35 type g=0 agents and 15 type g=1 agents; none of the agents has any security mechanism;
• a case with type g=0 agents and type g=1 agents, with division profile mechanisms - initially there are 35 type g=0 agents and 15 type g=1 agents; all agents in the system are equipped with the division profile mechanisms with parameters h = 18, l = 5, OU = 300 (a toy sketch of this setting follows the list).
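The toy reconstruction below is our own sketch, not the published experiment code. It shows why the division profile catches the intruder in this setting: a type g=0 agent draws A or B at random in every period, while a type g=1 agent always takes A, so a chain of five consecutive A actions is very unlikely to occur in the observer's own 18-action history and therefore survives negative selection as a detector that fires against the intruder.

import random

h, l = 18, 5                                          # parameters used in the experiment
random.seed(1)

observer = [random.choice("AB") for _ in range(h)]    # type g=0 agent: random A/B actions
intruder = ["A"] * h                                  # type g=1 agent (intruder): always takes A

def chains(seq):
    # all contiguous subsequences of length l of an action history
    return {tuple(seq[i:i + l]) for i in range(h - l + 1)}

# chains of the intruder that never occur in the observer's own behavior act as
# detectors and fire against the intruder's sequence
print(chains(intruder) - chains(observer))
# typically {('A', 'A', 'A', 'A', 'A')}; empty only if the observer itself happened
# to take resource A five times in a row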
In those three cases the system was simulated for 300 time periods and 10 simulations were performed. The diagram in Fig. 47.3 shows the average numbers of agents in the separate time periods. In the two cases of the system with agents without division profile mechanisms: if there are no intruders in the simulated system, all type g=0 agents can exist without any disturbance. The existence of intruders in the system causes problems with executing the tasks of type g=0 agents, which die after some time periods. The bad agents still remain in the system, which becomes blocked by them. In the case of the system with agents with division profile mechanisms: in the environment the last 18 actions undertaken by every agent are stored. After 18 actions have been undertaken by every agent, detectors of length l = 5 are constructed. Agents use their division profile mechanisms to calculate which neighboring agents they want to eliminate. An agent demands the elimination of those neighbors which have the maximum number of detector matches. Agents present their demands to the environment together with the number of matches. The environment counts the matches in the presented demands and eliminates agents as described for the division profile mechanisms. The constant OU is set to 300. As the results presented in Fig. 47.3 verify, after the detectors were constructed the intruders were distinguished due to the division profile mechanisms.
Fig. 47.3. Average number of agents over time in three cases: initially 50 type g=0 agents; initially 35 type g=0 agents and 15 type g=1 agents (intruders), all agents without division profile mechanisms; initially 35 type g=0 agents and 15 type g=1 agents (intruders), all agents with division profile mechanisms
At the same time the distinguished agents were deleted, which makes it possible for the type g=0 agents to function freely.

47.6.2 Results: intruders penetrating the environment
A case with mobile intruders was also simulated. Initially there are 40 type g=0 agents. After 20 constant time periods new agents start entering the system. In every time period one intruder enters the system; this process continues up to the 80th time period, so 60 type g=1 agents enter in total. Two cases were simulated:
• agents without any security mechanisms;
• all agents equipped with division profile mechanisms with parameters h = 18, l = 5, OU = 300 (the mobile agents are also equipped with division profile mechanisms) - after 18 actions have been undertaken by every agent the detectors are constructed, so from then on agents can distinguish bad from good agents.
The system was simulated for 300 time periods and 10 simulations were performed. The diagram in Fig. 47.4 shows the average numbers of agents in the separate time periods. Knowing that all mobile agents are bad, it might seem that all incoming agents should be eliminated immediately. This is not the case, because agents distinguish bad agents on the basis of behavior estimation.
Fig. 47.4. The system with mobile intruders penetrating the system; agents without or with built-in division profile (curves: agents using division profile mechanisms vs. agents without division profile)
It is only possible to evaluate agents which have already presented their behavior, i.e. have undertaken the 18 actions required by the division profile mechanisms. Every new bad agent entering the system is destroyed after he has presented his behavior, i.e. has undertaken 18 actions. Because agents undertake one action per constant time period Δt, every new bad agent is killed after 18 constant time periods Δt of his functioning.
47.7 Conclusion
This paper presents a discussion of automated security assurance for a real world system such as an airport or a shopping center. The real world system can be simulated in order to monitor and analyze the actions which are undertaken by actors in the real world. A multiagent model can be used in which the environment of the multiagent system refers to the protected area: resources represent real world resources (e.g. cars, baggage) and agents represent real world actors (persons). Using a multiagent system it is possible to equip every agent with mechanisms whose goal is to distinguish bad and good agents. These mechanisms were named the division profile in our work. The presented mechanisms with the immunological approach operate on objects which represent observed actions. The environment of the multiagent system is designed with some additional mechanisms (mechanisms of action storing and agent removal) in order to support the security mechanisms with which all agents are equipped. In order to confirm the effectiveness of our conception of unfavorable event detection, a system which refers to a simple real world system (leaving / taking baggage at a repository) was simulated. The obtained results indicate that it is possible to detect threats unknown a priori (i.e. unknown in the process of security system creation). The presented mechanisms also enable constant monitoring of the protected area
588
Krzysztof Cetnarowicz, Edward Nawarecki, and Gabriel Rojek
and, as the tests verified, make it possible to evaluate behavior before its unfavorable consequences occur.
References
1. Cetnarowicz K.: M-agent architecture based method of development of multiagent systems. Proc. of the 8th Joint EPS-APS International Conference on Physics Computing, ACC Cyfronet, Kraków (1996)
2. Cetnarowicz K., Nawarecki E., Żabińska M.: M-agent Architecture and its Application to the Agent Oriented Technology. Proc. of the DAIMAS'97, St. Petersburg (1997)
3. Forrest S., Perelson A. S., Allen L., Cherukuri R.: Self-nonself Discrimination in a Computer. In: Proc. of the 1994 IEEE Symposium on Research in Security and Privacy, IEEE Computer Society Press, Los Alamitos (1994) 202-212
4. Forrest S., Perelson A. S., Allen L., Cherukuri R.: A Change-detection Algorithm Inspired by the Immune System. IEEE Transactions on Software Engineering, IEEE Computer Society Press, Los Alamitos (1995)
5. Hofmeyr S. A., Forrest S.: Architecture for an Artificial Immune System. Evolutionary Computation, vol. 7, No. 1 (2002) 45-68
6. Wierzchoń S. T.: Sztuczne systemy immunologiczne: teoria i zastosowania. Akademicka Oficyna Wydawnicza Exit, Warszawa (2001)
48 Intelligent Medical Systems on Internet Technologies Platform
Beata Zielosko and Andrzej Dyszkiewicz
Institute of Computer Science, University of Silesia, Będzińska 39, 41-200 Sosnowiec, Poland
[email protected] [email protected]
48.1 Introduction
The amount of accumulated medical data creates the problem of its efficient processing, the aim of which is to obtain cross-sectional knowledge from these data. Thanks to computer science, this type of problem can be successfully attacked with intelligent techniques such as genetic algorithms, neural networks, fuzzy sets or rough sets. The last of these gives good results in the classification of patients at subsequent treatment and rehabilitation stages [7]. Considering the peculiarity of medical data (i.e. the number and diversity of parameters and the many stages of diagnosis and treatment), the usage of rough sets in creating advisory medical systems can shorten the time needed for diagnosis. It can minimize treatment expenses, raise the quality of doctors' work and decrease the risk of making a mistake during diagnosis. Placing these solutions on the .Net platform and implementing them as Web Services gives new possibilities for the usage of artificial intelligence in intelligent medical systems.
48.2 A role of data unification in medical metrology
The usage of different types of diagnosis within the scope of motion functions, spontaneous emission and metabolism across many hospital units creates some communication problems. They concern the transmission and estimation of the current condition of a patient and of the clinical results achieved with the application of various kinds of therapy. This is important especially when a patient is transferred from one hospital unit to another, or discharged from hospital to home with the intention of re-admission for further therapy in the future. In the above-mentioned situations it is sometimes difficult to characterize objectively the exit state of a patient and to compare it with the state after staying at home or in another hospital unit. Doctors from other units are forced to duplicate many initial descriptive actions. This has an influence on the peculiarity of medical data [1]. There exist diseases with many attributes, and sometimes it is difficult to indicate the most important symptoms. Some features are more important than others in the process of diagnosis, or some features are typical only for individual stages of a disease.
Besides, the symptoms of one disease may be caused by another. It is not always easy to indicate all the attributes of a disease, and as a result we get incomplete data. In such situations, unification of the methodology and the use of measuring methods which register discrete symptoms of life could have a good influence on communication between various medical structures [2]. Uniform measuring methods, applied e.g. in an orthopedics unit, a neurology unit or a rehabilitation unit, will help to continue the measuring procedures started in another organizational unit. This will help to characterize the patient's state precisely [4]. It will then enable an objective assessment of the quality of medical services and the application of the most effective methods of therapy, which at the same time generate the lowest economic expenses. To achieve unification and the possibility of data exchange between hospital units or individual healthcare centers, there are some international standards that allow transmission of medical data in electronic form. The most popular ones are:
— HL7 (Health Level 7)—a standard of data exchange in text form, applied in the USA. The latest version, HL7 3.0, introduces an object-oriented information model of health services, described in UML (Unified Modeling Language).
— DICOM (Digital Imaging and Communication in Medicine)—a standard of exchange of medical images. In the case of the DICOM norm, every application which is compatible with the norm must have a "Conformance Statement" document, in which the author of the application characterizes which services are compatible with DICOM. The "Conformance Statement" is sometimes selective, which means that two applications can both be compatible with the norm but not interact with each other, because they are compatible in different scopes.
— UN/EDIFACT (United Nations/Electronic Document Interchange For Administration, Commerce and Transport)—a standard used in Europe and Poland. The UN/EDIFACT norm is applied for administrative and reporting purposes. It is mainly used to implement external data exchange, e.g. the relation between a provider of medical services and the NFZ (the Polish national health fund).
The above standards of data transmission are often used only in big health centers. Sometimes every medical unit has its own standard of accumulation, processing and transmission of data, depending on the applied system. This makes communication between medical units difficult. One solution to this problem is to apply XML (Extensible Markup Language) within the above standards as a unification method for the transmission of medical data.
48.3 Web Services and XML—elements of the .Net platform
The main aim of the .Net platform is to reduce the building and operation of distributed systems to a simple form and to assure their efficiency, scalability and security [8]. .Net offers modern technologies for creating software that cooperates with the internet; Web Services are one example. Web Services can be characterized as independent software components available in a network, used to provide specified functions and services [10]. This means that a Web Service written in Java and available on a Linux system can be called by an application written in Visual Basic and available on a Windows system.
Communication between Web Services and clients takes place through the standard internet protocols HTTP (HyperText Transfer Protocol) and SOAP (Simple Object Access Protocol). HTTP is used for sending requests to Web Services and returning the reply messages from them; SOAP defines the way of calling services and transferring return values. SOAP messages are formed in the XML language. We can design an application with many modules (a multi-modular application), with the possibility of passing data as parameters, and also use the individual modules as Web Services. These Web Services can be modernized and scaled depending on the needs of the client application or on the development of the implemented algorithms. An additional advantage of this solution is the layered construction (presentation layer, business rules layer and data access layer), which makes the application independent of the operating system or device platform used on the client's side [3]. Web Services allow programmers to access individual functions which are available in an application. In this way there is no need to create a whole application every time we want to create a new one or add some new functions to an existing one. Owing to the continuous development of intelligent techniques of data processing and the economic aspects of the designed informatics systems, the choice of the .Net platform as a technology for creating diagnosis support systems, and the choice of the XML language as the standard for transmission of medical data, seems to be an interesting solution. XML is an extensible markup language used as a universal document format. It defines the way of publishing contents of databases and it determines the format of data exchanged between applications and systems from different producers. XML is also the basic language of data description used by Web Services [6]. When creating an XML document, we do not have to use a narrow, predefined set of markups (as in the HTML (HyperText Markup Language) language); we can create our own elements for describing the document. The descriptive information included in a document, i.e. the set of information about the format and origin of the data, is called metadata. In this way a standard document defines an XML application (a so-called XML vocabulary) [11]. Such an application creates the document's framework and can be used to describe a specified type of document, e.g. MathML, an XML application for formatting equations. The large flexibility in creating one's own elements in XML documents requires precise rules of syntax. Usually an XML application is defined by a DTD (Document Type Definition), which is an optional component of an XML document. Other ways to declare the contents of XML documents are: CSS (Cascading Style Sheets), XSL (eXtensible Stylesheet Language) and XML Schemas. These files include instructions for formatting the elements of XML documents and are joined to a document by a proper declaration, e.g. <?xml-stylesheet type="text/xsl" href="default.xsl"?>, where "default.xsl" is the name of the XSL file. Using XML Schemas, an XML document can be presented by means of the DOM (Document Object Model). The XML file is then described by means of a tree composed of nodes; these nodes are objects with various methods and properties.
48.4 Multi modular medical system of patients' diagnosis
As an example I would like to present the possibility of adapting elements of rough sets in the .Net technology to create a medical system of patients' diagnosis. The usage of rough sets helps in solving problems connected with patients' classification [9].
Fig. 48.1. Multi modular medical system of patients' diagnosis
The proposed solution is a multi-channel acquisition of measurement data synchronized with a common time base. The data will be measured by a four-channel photoplethysmograph coupled with a four-channel spirometer and a four-channel thermometer [5]. An important issue in this system is the synchronous collection of data from the sensors. On the basis of this information and the disease symptoms, the system can help a doctor in deciding about a patient's diagnosis. Knowledge about correlations between the data obtained from the research can be an indirect result of the work of this system. The system will assess the reactions of the human body and emotional conditions; those conditions will be registered as a change of breath frequency, which modifies the pulse and as a consequence influences the temperature of the organs of the human body. Because data are collected synchronously from the individual modules included in the system, it can be used e.g. in intensive care, in monitoring a patient's condition away from hospital, and in telemedicine. In further research this system will be used as one of the elements of a system for diagnosing patients with scoliosis. On the basis of expert knowledge and analysis of the results obtained from the research, a decision table is constructed and sent as an XML file. This file is a parameter for a function implemented in a Web Service. Such a function makes the data available for computation by other Web Services. For example, a function of one Web Service returns data in the form of abstraction classes. The abstraction classes are passed as parameters to the next Web Service, which generates e.g. a core. Other Web Services with other implemented functions allow e.g. decision rules to be generated. The business rules of the application layer placed on the server include the algorithms used to process the data. This assures that we do not have to modify functions implemented in other Web Services if we want to change one of those algorithms.
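A speculative sketch of the kind of function such a Web Service could expose is given below; the XML layout and all names (decision-table, case, abstraction_classes) are our own assumptions, not a published schema. It parses a decision table received as XML and groups the cases into abstraction (indiscernibility) classes over the chosen condition attributes, which is the first step of the rough-set analysis mentioned above.

import xml.etree.ElementTree as ET
from collections import defaultdict

SAMPLE = """
<decision-table>
  <case id="p1" temperature="high" pulse="fast" decision="ill"/>
  <case id="p2" temperature="high" pulse="fast" decision="ill"/>
  <case id="p3" temperature="low" pulse="slow" decision="healthy"/>
</decision-table>
"""

def abstraction_classes(xml_text, condition_attributes):
    # group case identifiers by identical values of the chosen condition attributes
    classes = defaultdict(list)
    for case in ET.fromstring(xml_text).findall("case"):
        key = tuple(case.get(a) for a in condition_attributes)
        classes[key].append(case.get("id"))
    return list(classes.values())

print(abstraction_classes(SAMPLE, ["temperature", "pulse"]))   # [['p1', 'p2'], ['p3']]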
Features such as scalability, multi-modular construction, distributed structure, device independence, built-in standards of data exchange, and operating system independence permit aggregating this technology with rough sets. This solution could give interesting results in the form of new services available in a global network.
48.5 Summary
The era of the static WWW (World Wide Web) is ending. The internet is now a platform for newer and newer services and standards, such as XML or script languages. Electronic shops, banks, institutions and schools are more and more popular. The .Net platform can be an efficient environment for the exchange of data and services between applications working in individual medical units. The above examples show that the implementation of intelligent data processing techniques as Web Services on the .Net platform opens new possibilities in designing distributed decision support systems and autonomic computing.
Fig. 48.2. Diagnosis support system based on Web Services and XML standard
References
1. Brodziak A (1974) Formalizacja naturalnego wnioskowania diagnostycznego. Psychonika - teoria struktur i procesów informatycznych centralnego systemu nerwowego człowieka i jej wykorzystanie w informatyce. PAN, Warszawa
2. Doroszewski J (1990) Komputerowe wspomaganie diagnostyki medycznej. In: Nałęcz M, Problemy Biocybernetyki i Inżynierii Biomedycznej. WKL, Warszawa
3. Dunway R (2003) Visual Studio .NET. Mikom, Warszawa
4. Dyszkiewicz A, Wróbel Z (2001) Elektromechaniczne procedury diagnostyki i terapii w rehabilitacji. Problemy Biocybernetyki i Inżynierii Biomedycznej pod redakcją Macieja Nałęcza, Warszawa
5. Dyszkiewicz A, Zielosko B, Wakulicz-Deja A, Wróbel Z (2004) Jednoczesna akwizycja wielopoziomowo sprzężonych parametrów organizmu człowieka krokiem do wyższej swoistości wnioskowania diagnostycznego. MPM Krynica
6. Esposito D (2002) Building Web Solutions with ASP .NET and ADO .NET. MS Press, Redmond
7. Komorowski J, Pawlak Z, Polkowski L, Skowron A: Rough Sets: A Tutorial
8. Mackenzie D, Sharkey K (2002) Visual Basic .NET dla każdego. Helion, Gliwice
9. Pawlak Z (1991) Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer, Dordrecht
10. Panowicz L (2004) Software 2.0: Darmowa platforma .Net 1: 18-26
11. Young J. Michael (2000) XML krok po kroku. Wydawnictwo RM, Warszawa
Author Index
Bandyopadhyay, Sanghamitra, 439
Bazan, Jan, 191
Bell, David, 227
Bieniawski, Stefan, 31
Błażejewski, Lech, 527
Burkhard, Hans-Dieter, 347
Burns, Tom R., 363
Castro Caldas, José, 363
Cetnarowicz, Krzysztof, 579
Chen, Long, 455
Chilov, Nikolai, 385
Czyżewski, Andrzej, 397
Dardzińska, Agnieszka, 133
Dobrowolski, Grzegorz, 551
Doherty, Patrick, 479
Dunin-Kęplicz, Barbara, 69
Düntsch, Ivo, 179
Dyszkiewicz, Andrzej, 589
El Fallah-Seghrouchni, Amal, 53
Farinelli, Alessandro, 467
Fioravanti, Fabio, 99
Gediga, Günther, 179
Głowiński, Cezary, 493
Gomolińska, Anna, 203
Gorodetsky, Vladimir, 411
Grabowski, Adam, 215
Guo, Gongde, 179, 227
Heintz, Fredrik, 479
Iocchi, Luca, 467
Johnson, Rodney W., 85
Karsaev, Oleg, 411
Kaźmierczak, Piotr, 539
Kisiel-Dorohinicki, Marek, 563
Kostek, Bożena, 397
Koźlak, Jarosław, 571
Krizhanovsky, Andrew, 385
Latkowski, Rafał, 493
Levashova, Tatiana, 385
Liao, Zhining, 227
Luks, Krzysztof, 519
Marszał-Paszek, Barbara, 339
Melich, Michael E., 85
Michalewicz, Zbigniew, 85
Mitra, Pabitra, 439
Moshkov, Mikhail Ju., 239
Nakanishi, Hideyuki, 423
Nardi, Daniele, 467
Nawarecki, Edward, 551, 579
Nguyen Hung Son, 249
Nguyen Sinh Hoa, 249
Nowak, Agnieszka, 333
Pal, Sankar K., 439
Pashkin, Michael, 385
Paszek, Piotr, 339
Patrizi, Fabio, 467
Pawlak, Zdzisław, 3
Peters, James F., 13
Pettorossi, Alberto, 99
Polkowski, Lech, 117, 509
Proietti, Maurizio, 99
Raś, Zbigniew W., 133, 261
Rauch, Ewa, 501
Ray, Shubhra Sankar, 439
Rojek, Gabriel, 579
Roszkowska, Ewa, 363
Ryjov, Alexander, 147
Samoilov, Vladimir, 411
Schmidt, Martin, 85
Sergot, Marek, 161
Simiński, Roman, 273
Skarżyński, Henryk, 397
Skowron, Andrzej, 191
Smirnov, Alexander, 385
Staruch, Bożena, 293
Stepaniuk, Jarosław, 305
Szczuka, Marcin, 281
Szmigielski, Adam, 509
Ślęzak, Dominik, 281
Tzacheva, Angelina A., 261
Verbrugge, Rineke, 69
Wakulicz-Deja, Alicja, 273, 333
Wang, Guoyin, 455
Wang, Hui, 179, 227
Wei, Ling, 317
Wolpert, David H., 31
Wróblewski, Jakub, 281
Wu, Yu, 455
Zhang, Wenxiu, 317
Zielosko, Beata, 589